In the previous article I covered encoding in Python 2. The most direct and effective way to solve encoding problems in Python 2 is to convert every external string to unicode as soon as it enters the program, and to work only with unicode inside Python. Python 3 has made many improvements in this area.
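The "convert at the edge, work in Unicode inside" rule carries over directly to Python 3. The sketch below is written for Python 3 and assumes UTF-8 as the external encoding. `to_str` and `to_bytes` are hypothetical helper names used for illustration, not library functions:

```python
def to_str(value, encoding="utf-8"):
    """Hypothetical helper: return str, decoding bytes at the program boundary."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


def to_bytes(value, encoding="utf-8"):
    """Hypothetical helper: return bytes, encoding str only when data leaves the program."""
    if isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, bytes):
        return value
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


# Usage: normalize incoming data once, work with str inside,
# and convert back to bytes only on the way out.
incoming = "héllo".encode("utf-8")  # stands in for bytes read from a socket or file
text = to_str(incoming).upper()
outgoing = to_bytes(text)
print(text == "HÉLLO")  # True
```

Keeping the conversions at the boundary means the rest of the code never has to guess which type it is holding.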
Updated on May 2, 2017
In Python 3, bytes holds raw 8-bit values, and str holds Unicode characters. To represent Unicode characters as binary data (raw 8-bit values), they must be encoded, and the most common encoding is UTF-8. In both Python 2 and 3, Unicode characters are not tied to any particular binary encoding, so the encode method is needed to turn them into binary data (and decode to turn binary data back into characters).
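A minimal Python 3 sketch of the encode/decode round trip; the sample text is arbitrary, and GBK is used only to show that another codec yields different bytes for the same characters:

```python
text = "中文"                  # str: two Unicode characters
data = text.encode("utf-8")   # bytes: their UTF-8 encoding, raw 8-bit values

print(len(text), len(data))   # 2 6  (each of these characters takes 3 bytes in UTF-8)

# Decoding with the same encoding restores the original str.
restored = data.decode("utf-8")
print(restored == text)       # True

# The characters themselves are encoding-independent; another codec
# produces a different byte sequence for the same str.
gbk_data = text.encode("gbk")
print(gbk_data == data)       # False
```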
In Python 3, bytes and str are never equivalent, not even when both are empty, so you must pay attention to the type whenever you pass a sequence of characters around.
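A small Python 3 sketch of that strictness. `concat_raises` is a hypothetical helper written only for this demonstration:

```python
def concat_raises(a, b):
    """Hypothetical helper: return True if a + b raises TypeError."""
    try:
        a + b
    except TypeError:
        return True
    return False


# bytes and str never compare equal, even with identical or empty content.
print(b"" == "")        # False
print(b"abc" == "abc")  # False

# Mixing the two types with + raises instead of converting implicitly.
print(concat_raises("abc", b"abc"))  # True
print(concat_raises("ab", "c"))      # False
```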
In Python 2, str holds raw 8-bit values, and unicode holds Unicode characters.
In Python 2, if a str contains only 7-bit ASCII characters (such as English text), str and unicode instances behave as though they were the same type, so the following operations work:
- You can concatenate a str and a unicode with the + operator
- You can compare a str and a unicode with == and !=
Python 3.x changed the default source-code encoding and the default system encoding from ASCII to UTF-8, which avoids errors with Chinese text.
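A quick Python 3 check of both defaults. In Python 2, `sys.getdefaultencoding()` returns `'ascii'`; note that this default is separate from the locale encoding used for files and the console, which can still vary by platform:

```python
import sys

# Python 3 reads source files as UTF-8 by default (PEP 3120), so this
# non-ASCII literal needs no "# -*- coding: utf-8 -*-" header.
greeting = "你好"

# Default codec for str.encode() / bytes.decode() called without an argument.
print(sys.getdefaultencoding())                        # utf-8
print(greeting.encode() == greeting.encode("utf-8"))  # True
```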
Second, a variable a defined as a str (the equivalent of unicode in 2.x) can be printed to the Windows console without raising an encoding error.
In other words, str in 3.x corresponds to unicode in 2.x, and str in 2.x corresponds to bytes in 3.x.
The return value is bytes; calling decode on it converts it to str, and that is all it takes.
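The call that returned bytes is not shown here, so the sketch below uses `subprocess.check_output` as an assumed stand-in: by default it returns the child's output as bytes, and decoding (assuming UTF-8 or plain ASCII output) yields a str:

```python
import subprocess
import sys

# Run a child Python process; by default check_output returns bytes.
raw = subprocess.check_output([sys.executable, "-c", "print('hello')"])
print(type(raw))  # <class 'bytes'>

# Decode to str; strip() removes the trailing newline (\n or \r\n).
output = raw.decode("utf-8").strip()
print(output)     # hello
```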
Result: what is read from the file is already str (unicode in 2.x terms), so there is no need to transcode it.
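A minimal Python 3 sketch of reading in text mode. The file name and contents are made up for the example, and `encoding="utf-8"` is passed explicitly because, when it is omitted, the default depends on the platform locale:

```python
import os
import tempfile

# Write some non-ASCII text to a temporary file as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("编码测试\n")

# Text mode ("r") decodes while reading: read() already returns str.
with open(path, "r", encoding="utf-8") as f:
    content = f.read()

print(type(content))             # <class 'str'>
print(content == "编码测试\n")    # True
```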
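Note that a file handle returned by open works with str in both versions, but in Python 2 str means raw binary data, while in Python 3 it means Unicode characters. As a result, code that passes bytes to a file opened in text mode runs fine in Python 2 but raises an error in Python 3.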
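This is because in Python 3 a file opened in text mode requires str values, not bytes; open itself takes care of the encoding (and accepts an encoding argument). Opening the file in binary mode ("wb") instead accepts bytes and works in both 2 and 3.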
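A Python 3 sketch of the failure and the binary-mode fix; random bytes from `os.urandom` stand in for arbitrary binary data, and the file name is hypothetical:

```python
import os
import tempfile

payload = os.urandom(10)  # 10 random bytes, i.e. binary data
path = os.path.join(tempfile.mkdtemp(), "random.bin")

# Text mode ("w") in Python 3 only accepts str, so writing bytes fails.
try:
    with open(path, "w") as f:
        f.write(payload)
    text_mode_failed = False
except TypeError:
    text_mode_failed = True
print(text_mode_failed)  # True

# Binary mode ("wb") accepts bytes; this form also works in Python 2.
with open(path, "wb") as f:
    f.write(payload)
```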
Reading works the same way: opening the file with rb is compatible with both 2 and 3.
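A short sketch of reading in binary mode and decoding explicitly, assuming the file holds UTF-8 text (the file name and contents are made up for the example):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("中文".encode("utf-8"))

# "rb" returns the file's raw bytes (type bytes in 3, str in 2) unchanged.
with open(path, "rb") as f:
    raw = f.read()
print(type(raw))  # <class 'bytes'>

# Decoding explicitly turns them into a str of Unicode characters.
text = raw.decode("utf-8")
print(len(text))  # 2
```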
[Continuation of Python 2 encoding](http://thief.one/2017/04/14/1/)