The beauty of Python 3 encoding

Which is the end of the string, the flowers fall off the shoulders, confuse


The previous article covered encoding in Python 2, where the most direct and effective fix for encoding problems is to convert every external string to unicode at the boundary and work only with unicode inside the program. Python 3 improves on this area considerably.

Differences between bytes, str, and unicode

Updated on May 2, 2017

Python 3: bytes and str

In Python 3, bytes holds raw 8-bit values and str holds Unicode characters. Unicode characters can be represented as binary data (raw 8-bit values) in several ways; the most common encoding is UTF-8. Unicode strings in Python 2 and 3 carry no associated binary encoding, so you call encode to turn them into binary data and decode to turn binary data back into Unicode characters.
In Python 3, bytes and str are never equivalent, not even when both are empty, so pay attention to the type whenever you pass character sequences around.
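As a minimal sketch of the conversion described above, the two helpers below normalize a value to str or to bytes. The names to_str and to_bytes are illustrative and not part of the standard library, and UTF-8 is assumed as the default encoding.

```python
def to_str(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is str."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


text = "caf\u00e9"        # str: 4 Unicode characters
raw = to_bytes(text)      # bytes: 5 bytes, because the accented e takes 2 bytes in UTF-8

print(len(text), len(raw))
print(to_str(raw) == text)   # the round trip gives back the original text
print(b"" == "")             # an empty bytes object never equals an empty str
```

Calling these helpers at the edges of a program (input and output) keeps the code in between working with a single type.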

Python 2: str and unicode

In Python 2, str holds raw 8-bit values and unicode holds Unicode characters.
If a str contains only 7-bit ASCII characters (English characters), unicode and str instances behave almost like the same type, and the following operations work:

  • You can concatenate str and unicode with the + operator
  • You can compare str and unicode with == and !=
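For contrast, here is a small sketch of how Python 3 treats the same two operations when bytes and str are mixed. It runs under Python 3 only; the variable names are just for illustration.

```python
ascii_bytes = b"abc"
ascii_text = "abc"

# Concatenation: Python 2 allowed str + unicode for ASCII content,
# but Python 3 refuses to mix bytes and str.
try:
    ascii_bytes + ascii_text
    concat_ok = True
except TypeError:
    concat_ok = False

# Comparison: bytes and str are never equal in Python 3,
# even when both hold the same ASCII content.
equal = ascii_bytes == ascii_text

print(concat_ok, equal)

# Decode first, then the two values can be combined.
joined = ascii_bytes.decode("ascii") + ascii_text
print(joined)
```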

System and source code encoding

Python 3.x changed the default source code encoding and the default string encoding from ASCII to UTF-8, which avoids the errors that Chinese text used to trigger.

>>> import sys
>>> print(sys.getdefaultencoding())
utf-8
>>> print(sys.getfilesystemencoding())
utf-8
>>>

Next, a is defined as str (equivalent to unicode in 2.x), and printing it to the Windows console raises no encoding error.

>>> a="Hello"
>>> print(a)
Hello

String encoding

>>> a="Hello"
>>> print(type(a))
<class 'str'>
>>> b=a.encode("utf-8")
>>> print(type(b))
<class 'bytes'>
>>>

So str in 3.x corresponds to unicode in 2.x, and str in 2.x corresponds to bytes in 3.x.
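To make the str/bytes relationship more concrete, the sketch below encodes the same text with two codecs and shows what happens when the bytes are decoded with the wrong one. The choice of GBK as the second codec is only an example.

```python
text = "\u4f60\u597d"            # two Chinese characters meaning "hello"

utf8 = text.encode("utf-8")      # 3 bytes per character here
gbk = text.encode("gbk")         # 2 bytes per character here
print(len(utf8), len(gbk))

# Decoding with the codec used for encoding restores the text.
print(utf8.decode("utf-8") == text)

# Decoding with a different codec does not restore it.
wrong = utf8.decode("gbk", errors="replace")
print(wrong == text)

# ASCII cannot represent these bytes at all.
try:
    utf8.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
print(ascii_failed)
```

The same str can map to different byte sequences, which is why the codec has to be known whenever bytes cross a boundary.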

Web page encoding


Result: the response body comes back as bytes; call decode to convert it to str.
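A minimal sketch of that decode step, without touching the network: raw_body stands in for the bytes that a call such as urllib.request.urlopen(url).read() returns. The helper decode_body and the UTF-8 default are assumptions made for illustration; a real page may declare a different charset.

```python
def decode_body(raw, charset="utf-8"):
    """Decode a bytes response body into str (hypothetical helper)."""
    # errors="replace" substitutes undecodable bytes instead of raising.
    return raw.decode(charset, errors="replace")


# Stand-in for the bytes a web request would return.
raw_body = "<title>\u4f60\u597d</title>".encode("utf-8")

html = decode_body(raw_body)
print(type(raw_body).__name__, type(html).__name__)
```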

File encoding


Result: content read from a file opened in text mode is str (unicode in 2.x), so no transcoding is needed.
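One way to see this is to write a small file and read it back in text mode. This is a sketch under assumptions: the temporary path and the explicit encoding="utf-8" argument are choices made here, not taken from the original example.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode: open encodes str to bytes on write...
with open(path, "w", encoding="utf-8") as f:
    f.write("\u4f60\u597d\n")

# ...and decodes bytes back to str on read.
with open(path, "r", encoding="utf-8") as f:
    content = f.read()

print(type(content).__name__)
```

Passing encoding explicitly avoids depending on the platform default, which can differ between systems.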

The open function

Note that in Python 2 a file handle returned by open works with str (raw binary), while in Python 3 a handle opened in text mode works with str (Unicode characters). The following code therefore runs in Python 2 but raises a TypeError in Python 3:

with open("test","w") as w:
    w.write(b"123")

Python 3 requires the value written to a text-mode handle to be str, not bytes, because open applies an encoding itself.
Solution: open the file in binary mode.

with open("test","wb") as w:
    w.write(b"123")

The same applies to reading: opening the file with rb keeps the code compatible with both 2 and 3.
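The binary-mode round trip can be sketched as follows under Python 3. The temporary path is an assumption for the demonstration, and the try block shows the opposite mismatch: a binary handle rejects str.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "test.bin")

payload = "123".encode("utf-8")   # bytes, the type a binary handle expects

with open(path, "wb") as f:
    f.write(payload)

with open(path, "rb") as f:
    data = f.read()               # bytes come back unchanged

print(data, data.decode("utf-8"))

# Writing str to a binary handle fails in Python 3.
try:
    with open(path, "wb") as f:
        f.write("123")
    str_rejected = False
except TypeError:
    str_rejected = True
print(str_rejected)
```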

Related reading

[Continuation of Python 2 encoding](http://thief.one/2017/04/14/1/)

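Title: The beauty of Python 3 encoding

Author: nmask

Published: April 18, 2017, 10:04

Last updated: July 11, 2019, 16:07

Original link: https://thief.one/2017/04/18/01/en/

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and the author when reposting.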
