UTF-8 iteration / ord() behaves unexpectedly

Peter Ehses

Hi, I'm trying to iterate over a string containing German umlauts (ä, ö, ü) to get each character's Unicode code point as a decimal number.

It works for single characters:

print(ord(b'ä'))
print(ord(bytes('ä', "utf-8")))

outputs

228
228

It doesn't, however, work once I iterate over the characters.

text = "aldköäü"
for index in range(len(text)):
    char = text[index]
    print("char:")
    print(char)
    print(char.encode("utf-8"))
    print(ord(bytes(char, "utf-8")))

outputs

char:
a
b'a'
97
char:
l
b'l'
108
char:
d
b'd'
100
char:
k
b'k'
107
char:
��
b'\xf6\xe4\xfc\x00'
Traceback (most recent call last):

File "<stdin>", line 52, in <module>

TypeError: ord() expected a character, but string of length 4 found

How can I get this to work? I'm using a WiPy 3.0 and plan to use the same code on a GPy.
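One possible reading of the output above: the bytes b'\xf6\xe4\xfc' are exactly what "öäü" looks like when encoded as Latin-1 (ISO 8859-1) rather than UTF-8, which would suggest the script reached the board in a single-byte encoding. This is an assumption drawn only from the printed bytes, not something confirmed for the WiPy. A minimal sketch, run on desktop Python 3, that compares the two encodings:

```python
# Bytes printed by the device for the failing character (copied from the output above).
observed = b'\xf6\xe4\xfc\x00'

# The same three umlauts, in the order they appear in "aldköäü", encoded two ways.
as_latin1 = "öäü".encode("latin-1")
as_utf8 = "öäü".encode("utf-8")

print(as_latin1)  # one byte per umlaut
print(as_utf8)    # two bytes per umlaut

# Check which encoding the observed bytes start with.
matches_latin1 = observed.startswith(as_latin1)
matches_utf8 = observed.startswith(as_utf8)
print("Latin-1:", matches_latin1, "UTF-8:", matches_utf8)
```

If this reading is right, saving the source file as UTF-8 in the editor before uploading would be the first thing to check.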

Peter Ehses

For future reference: Here's a solution.

text = "aouäöü"

print(text)
print(len(text))

for char in text:
    print(char)
    print(ord(bytes(char, "utf-8")))
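As a hedged alternative: in standard Python 3, ord() accepts a one-character str directly, so the round trip through bytes is not needed. Encoding an umlaut to UTF-8 first yields two bytes, which ord() rejects in CPython. The sketch below assumes the source file is saved as UTF-8. The helper name code_points is only illustrative, and the behaviour on Pycom firmware has not been verified here.

```python
text = "aouäöü"


def code_points(s):
    """Return the Unicode code point of each character in s as a list of ints."""
    return [ord(ch) for ch in s]


# Iterate over characters (not bytes) and print each code point in decimal.
for ch in text:
    print(ch, ord(ch))

points = code_points(text)

# For comparison: an umlaut takes two bytes in UTF-8, so it is not a valid
# single-byte argument for ord().
umlaut_utf8_length = len("ä".encode("utf-8"))
```

Iterating with "for ch in text" walks the string character by character, so each ord() call sees exactly one character.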
