The butterfly is beautiful, but in the end the butterfly cannot fly across the sea.
A quick look at this string of characters suggests unicode-encoded content, but something seems to be missing, so I started a series of experiments.
I want to figure out what this string is. First, I tested a unicode-encoded string to see what it looks like.
The experiment shows that a unicode string looks like this: u'\u4f60\u597d', which actually represents the Chinese for "Hello". As for why entering a at the prompt shows the escaped unicode content while print a outputs the Chinese "Hello" in str format: entering a bare variable shows its repr, whereas the print statement in Python automatically converts unicode into str format before output. If you are not familiar with unicode and str, please go back to the beginning of this article and read the earlier article on encoding first. I think it will help you.
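The original interactive session is not reproduced here. As a rough sketch of the same observation in Python 3 (an assumption on my part: Python 3's `str` plays the role of Python 2's `unicode`, and `ascii()` stands in for echoing the variable at the Python 2 prompt):

```python
# Python 3 sketch; the article's own session was Python 2.
a = '\u4f60\u597d'        # two code points, written as escapes

escaped_view = ascii(a)   # escape-only view, like echoing `a` in Python 2
print(escaped_view)       # '\u4f60\u597d'
print(a)                  # 你好
```

The value is the same in both cases; only the way it is displayed differs.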
Now that we know what a unicode string looks like, we can rule out the strange string being one. Why? Obviously, because there is no u in front of it.
Are you a little confused at this point? Although it has no u prefix (as in u"\u4f60…."), it really does look like unicode characters. Don't worry, let me introduce the difference between string variable encoding and string content encoding.
Note: I coined the two terms above myself; they are not official terminology. If there is any inaccuracy, please bear with me.
The so-called string variable encoding is what we usually call encoding: str versus unicode, where a str may in turn be utf-8, gbk, gb2312, and so on. Checking it is very simple: use the type function.
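A small sketch of the type check, written for Python 3 rather than the Python 2 used in the article. The mapping in the comments is my assumption about the closest equivalents, and the variable names are only illustrative:

```python
# Python 3 sketch: Python 2's `unicode` corresponds to Python 3's `str`,
# and Python 2's `str` is closest to Python 3's `bytes`.
text = 'test'       # text string (Python 2: u"test")
raw = b'test'       # byte string (Python 2: "test")

print(type(text).__name__)   # str
print(type(raw).__name__)    # bytes
```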
We can see that unicode or str describes the encoding of the string variable, regardless of its content. If we define a="test", then a is str-encoded; if instead we define a=u"test", then a is unicode-encoded. So let me ask: what is the encoding of test? (The question is about test, not a.)
Some people will say that test is an ordinary string. Yes, it is indeed a string, and it is the content of a. The same reasoning applies below.
Now suppose a is itself a string in str format.
What about the content itself? That's right, the content itself is a unicode-encoded string. OK, let's test this experimentally.
Let's take a look at the same content encoded in several common encoding formats:
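The original listing is missing, so here is a minimal Python 3 sketch of what such a comparison might look like. The sample text `你好` and the choice of three codecs are my own assumptions:

```python
text = '你好'

utf8_bytes = text.encode('utf-8')          # b'\xe4\xbd\xa0\xe5\xa5\xbd'
gbk_bytes = text.encode('gbk')             # b'\xc4\xe3\xba\xc3'
escaped = text.encode('unicode_escape')    # b'\\u4f60\\u597d'

for name, value in [('utf-8', utf8_bytes),
                    ('gbk', gbk_bytes),
                    ('unicode_escape', escaped)]:
    print(name, value)
```

The utf-8 and gbk results are runs of `\x..` bytes, while the unicode_escape result is plain ASCII text made of `\u....` sequences.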
Please note the encoded content above and observe its characteristics; then we will look at the strange string.
We see that the variable a is in str format.
We add a u in front of the quotes, which makes the variable a a unicode string, and its content is \u8fdd\u6cd5\u8fdd\u89c4.
Next, print a gives the same result as the previous step, because print converts a from unicode to str; the content just appears with fewer slashes.
Then I assigned the content of a, i.e. \u8fdd\u6cd5\u8fdd\u89c4, to the variable b, again adding a u in front of the quotes to make it unicode. When I ran print b, something magical happened: the output was converted into Chinese. My guess at the reason is that the print statement not only converts the string variable into str, but also converts its content into str.
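A hedged Python 3 sketch of the difference between the two variables. Here a raw literal stands in for "escape text stored as content" and a normal literal stands in for the Python 2 `u"..."` form; both choices are assumptions made for illustration:

```python
a = r'\u8fdd\u6cd5\u8fdd\u89c4'   # raw literal: backslashes are ordinary characters
b = '\u8fdd\u6cd5\u8fdd\u89c4'    # escapes are interpreted as code points

print(len(a), a)   # 24 \u8fdd\u6cd5\u8fdd\u89c4
print(len(b), b)   # 4 违法违规
```

Seen this way, the "magic" is less about print and more about when the `\u` sequences get interpreted: in `b` that happens as the literal is parsed, in `a` it never happens.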
The above example defines the variable a as unicode-encoded while its content is str (utf-8) encoded. When a is printed, the print statement tries to convert the content of a into str, but since the content is already str-encoded, garbled characters appear. The opposite direction works fine.
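A possible Python 3 reproduction of that garbled case, assuming the content was the utf-8 bytes of `你好` written with `\x` escapes inside a text literal (the article's actual value is not shown):

```python
# Six code points whose values are really UTF-8 byte values.
a = '\xe4\xbd\xa0\xe5\xa5\xbd'

print(a)   # mojibake, not Chinese

# Map each code point back to one byte, then decode those bytes as UTF-8.
repaired = a.encode('latin-1').decode('utf-8')
print(repaired)   # 你好
```

The `latin-1` round trip is just one way to recover the bytes; it works because latin-1 maps code points 0 to 255 directly onto single bytes.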
You may find this strange. We defined the content of the variable a as \u8fdd\u6cd5\u8fdd\u89c4, yet the strange string looks like this.
It seems to have more slashes. Don't be impatient: after reading the following test, you will understand the difference between the two.
Simply put, the strange string is the content after two rounds of unicode encoding.
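To make "two rounds of unicode encoding" concrete, here is a minimal Python 3 sketch using the `unicode_escape` codec. Treating the strange string as a doubly escaped form of `违法违规` is an assumption based on the content quoted in this article:

```python
original = '违法违规'

# Escape once, then escape the escaped text again.
once = original.encode('unicode_escape').decode('ascii')
twice = once.encode('unicode_escape').decode('ascii')
print(once)    # \u8fdd\u6cd5\u8fdd\u89c4
print(twice)   # \\u8fdd\\u6cd5\\u8fdd\\u89c4

# Undo it with two decoding passes.
back_once = twice.encode('ascii').decode('unicode_escape')
back_twice = back_once.encode('ascii').decode('unicode_escape')
print(back_twice)   # 违法违规
```

Each pass doubles the backslashes on the way out and halves them on the way back, which would account for the extra slashes.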
Of course, turning it into Chinese can be done with a built-in function. I split the demo into steps only to show more clearly what each step means.
Convert unicode-encoded content to Chinese (note: the content, not the string variable)
Convert str-encoded content to Chinese (note: the content, not the string variable)
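The two conversions might be sketched in Python 3 as follows. The helper names `unescape_unicode` and `unescape_utf8_bytes` are hypothetical, and both assume the input is ASCII-only escape text; Python 2's `string-escape` codec does not exist in Python 3, so the second helper goes through `unicode_escape` and `latin-1` instead:

```python
def unescape_unicode(content):
    """Turn literal \\uXXXX escape text into the characters it names."""
    return content.encode('ascii').decode('unicode_escape')


def unescape_utf8_bytes(content):
    """Turn literal \\xNN escape text holding UTF-8 bytes into characters."""
    # Interpret the \xNN escapes, then map code points 0-255 back to bytes.
    raw = content.encode('ascii').decode('unicode_escape').encode('latin-1')
    return raw.decode('utf-8')


print(unescape_unicode(r'\u4f60\u597d'))                  # 你好
print(unescape_utf8_bytes(r'\xe4\xbd\xa0\xe5\xa5\xbd'))   # 你好
```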
Added on April 27, 2017
When the variable a is decoded with decode('utf-8'), besides the type of a changing from str to unicode, the content of a is also utf-8 decoded, so the result shows some extra slashes.
When the variable a is decoded with decode('unicode-escape'), it seems that only the variable itself is decoded into unicode, and its content does not change.
We know that print encodes both the variable and its content into str, so the second example outputs Chinese, while the first example outputs unicode-style content with some slashes missing, because it still needs to be converted once more.
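For comparison, a Python 3 sketch of the two decodes, assuming a held the escape text for `违法违规` as bytes (the original value is not shown). In Python 3 terms, `'utf-8'` leaves the backslash sequences as literal text, while `'unicode_escape'` interprets them:

```python
a = b'\\u8fdd\\u6cd5\\u8fdd\\u89c4'   # bytes holding escape text

as_utf8 = a.decode('utf-8')              # still escape text, now a str
as_escape = a.decode('unicode_escape')   # escapes interpreted as code points

print(as_utf8)     # \u8fdd\u6cd5\u8fdd\u89c4
print(as_escape)   # 违法违规
```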
Of course, there is a simpler way to do the conversion in this example, as follows:
That was a fast ride, and I don't know whether you are carsick. If you can't sort out the encoding relationships above, just remember the last two functions.
Watching familiar Chinese characters appear on the screen, I excitedly threw the transcoded content to a certain division and eagerly awaited the reward, looking forward to the sunset from the top-floor room of the Super 8 Hotel and savoring that strange smirk. Until a final line jumped out on the screen: 8:00, see you at the theater.
April 21, 2017
There is a list of lists whose fields are in unicode format. When the list is output, the contents look like this:
How can the contents of the list be shown as Chinese? My guess is that when the list is output, the Chinese inside is automatically encoded, so you can do this:
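The original snippet is missing, so this is only a guess at the idea, written for Python 3. The sample `rows` data is made up, and `ascii()` is used to imitate the escaped output that Python 2 produces when printing such a list (in Python 3, `print(rows)` already shows the Chinese directly):

```python
rows = [['你好', '世界'], ['违法', '违规']]

# Escape-only view, similar to what printing the list gives in Python 2.
escaped = ascii(rows)
print(escaped)

# Decode the escapes in the printed form back into readable characters.
readable = escaped.encode('ascii').decode('unicode_escape')
print(readable)   # [['你好', '世界'], ['违法', '违规']]
```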
[The beauty of Python3 encoding](http://thief.one/2017/04/18/1/)