UTF-8 iteration / ord() behaves unexpectedly
-
Hi, I'm trying to iterate over a string containing German umlauts (äöü) to get each character's Unicode code point as a decimal number.
It works for single characters:
print(ord(b'ä'))
print(ord(bytes('ä', "utf-8")))
outputs
228
228
It doesn't, however, work once I iterate over the characters.
text = "aldköäü"
for index in range(len(text)):
    char = text[index]
    print("char:")
    print(char)
    print(char.encode("utf-8"))
    print(ord(bytes(char, "utf-8")))
outputs
char:
a
b'a'
97
char:
l
b'l'
108
char:
d
b'd'
100
char:
k
b'k'
107
char:
���
b'\xf6\xe4\xfc\x00'
Traceback (most recent call last):
  File "<stdin>", line 52, in <module>
TypeError: ord() expected a character, but string of length 4 found
How can I get this to work? I'm using a WiPy 3.0 and plan to use the same code on a GPy.
-
For future reference: Here's a solution.
text = "aouäöü"
print(text)
print(len(text))
for char in text:
    print(char)
    print(ord(bytes(char, "utf-8")))
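For comparison, here is a minimal sketch of how this looks in standard Python 3 (CPython), assuming the source is saved as UTF-8. It has not been checked on the WiPy or GPy, so MicroPython may behave differently. In CPython, ord() takes the one-character string directly, and going through bytes(char, "utf-8") would fail for the umlauts because each one encodes to two bytes. The last lines only illustrate that the bytes in the failing output above match the Latin-1 encoding of "öäü", which may hint that the text reached the board as Latin-1 rather than UTF-8; that is a guess, not something confirmed in this thread.

```python
# Sketch for standard Python 3; MicroPython on the WiPy/GPy may differ.
text = "aouäöü"

# ord() on a one-character str returns the Unicode code point directly.
code_points = [ord(char) for char in text]
print(code_points)

# UTF-8 needs two bytes per umlaut, so ord(bytes(char, "utf-8"))
# would raise a TypeError for them in CPython.
utf8_lengths = [len(char.encode("utf-8")) for char in text]
print(utf8_lengths)

# The bytes b'\xf6\xe4\xfc' from the failing output are the Latin-1
# encoding of "öäü" (assumption: the input was not UTF-8 encoded).
latin1_bytes = "öäü".encode("latin-1")
print(latin1_bytes)
```

If the code points come out right with ord(char) on the device as well, the detour through bytes() is not needed.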