Every trade has its own specialty; that is all there is to it.
When I first learned Python multithreading, almost everything I found online said the same thing: Python has no real multithreading, and Python threads are "chicken ribs" (barely worth having). I did not know enough to judge at the time; I only understood that Python has the GIL (Global Interpreter Lock), which lets only one thread run at a time and is released when a thread hits an I/O operation. So is Python multithreading really that useless? To settle the question, I decided to test it myself.
After comparing multithreaded tests in Python and Java, I found that Python threads are less efficient than Java threads, but nowhere near bad enough to be called chicken ribs. How do they compare with the other mechanisms, then?
I have read many blog posts in which people advise using Python multiprocessing instead of multithreading, because processes are not restricted by the GIL. So I started solving some concurrency problems with multiple processes and ran into a few pitfalls along the way. Fortunately, searching resolved most of them, and I wrote a brief summary in [Python multi-process](http://thief.one/2016/11/23/Python-multiprocessing/).
So can multiple processes completely replace multithreading? Don’t worry, read on.
Coroutines are a hot topic at the moment. Unlike threads, coroutines are not switched by the operating system; the switching is written into the code and controlled by the programmer, so the usual thread-safety problems do not arise. The topic is broad and deep, so I will not cover it in detail here and will write about it separately in the future.
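To make programmer-controlled switching concrete, here is a minimal sketch using the standard-library asyncio module. That choice is an assumption for illustration; the tests in this article may well have used a different coroutine library. The names `worker` and `log` are made up. Each coroutine gives up control only at an `await`.

```python
import asyncio

log = []  # shared list; no lock needed because switches happen only at await points

async def worker(name, steps):
    for i in range(steps):
        log.append(f"{name}{i}")   # runs without being interrupted
        await asyncio.sleep(0)     # explicit switch point: let other coroutines run

async def main():
    # Run two coroutines concurrently on a single thread.
    await asyncio.gather(worker("A", 3), worker("B", 3))

asyncio.run(main())
print(log)
```

The two workers interleave, but only where the code says so, which is why there is no need for a lock around `log`.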
So the prevailing view online is simply to use multiple processes or coroutines instead of multithreading (leaving aside options such as switching programming language or interpreter). Let's test the performance difference between the three. To be fair, both I/O-intensive and CPU-intensive workloads should be considered, so I collected two sets of data.
For the I/O-intensive test, I chose the most common crawler task and measured how long the crawler takes to access bing. (This mainly compares multithreading and coroutines; single-threaded and multi-process runs are not tested because they are not necessary here.)
Visit 10 times
data time: 0.380326032639
Visit 50 times
data time: 1.3358900547
Visit 100 times
data time: 2.42984986305
Visit 300 times
data time: 6.633330099106
As the results show, coroutines are indeed more efficient than multithreading as the number of concurrent requests grows, but when concurrency is low the difference between the two is small.
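A rough sketch of how an I/O-bound timing like this can be set up. To keep it self-contained it does not touch the network: `fake_request` is a hypothetical stand-in that only sleeps, and the function names and the 0.05 s delay are assumptions, not the crawler code behind the numbers above.

```python
import threading
import time

def fake_request(delay=0.05):
    """Hypothetical stand-in for an HTTP request: the thread just waits."""
    time.sleep(delay)  # like real blocking I/O, sleeping releases the GIL

def run_sequential(n):
    """Issue n requests one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fake_request()
    return time.perf_counter() - start

def run_threaded(n):
    """Issue n requests in n threads and return the elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_request) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

sequential = run_sequential(10)
threaded = run_threaded(10)
print(f"sequential: {sequential:.3f}s  threaded: {threaded:.3f}s")
```

Because the waiting overlaps, the threaded run should finish in roughly the time of a single request, which is the effect the GIL does not prevent.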
For the CPU-intensive test, I chose some scientific-computing functions and measured the time required. (This compares a single thread, multithreading, coroutines, and multiple processes.)
- 10 concurrent tasks: [multi-process] 2.1s [multi-thread] 3.8s [coroutine] 4.0s [single thread] 3.5s
- 20 concurrent tasks: [multi-process] 3.8s [multi-thread] 7.6s [coroutine] 7.7s [single thread] 7.6s
- 30 concurrent tasks: [multi-process] 5.9s [multi-thread] 11.4s [coroutine] 11.5s [single thread] 11.3s
Under the CPU-intensive test, multiple processes are clearly faster than the rest, while multithreading, coroutines, and a single thread perform about the same. This is because only the multi-process version uses the full computing power of the CPU: while the code runs, you can see that only the multi-process version keeps the CPU fully occupied.
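A comparison of this kind can be sketched as follows. `cpu_task` is a hypothetical pure-Python workload, not the scientific-computing functions used for the numbers above, and the worker count and job sizes are arbitrary assumptions. The sketch uses multiprocessing.Pool for processes and its thread-backed counterpart from multiprocessing.dummy for threads.

```python
import time
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

def cpu_task(n):
    """Hypothetical CPU-bound job: sum of squares in pure Python."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed_map(pool_cls, jobs, workers=4):
    """Run cpu_task over jobs with the given pool class; return (results, seconds)."""
    start = time.perf_counter()
    with pool_cls(workers) as pool:
        results = pool.map(cpu_task, jobs)
    return results, time.perf_counter() - start

if __name__ == "__main__":  # required for process pools on platforms that spawn
    jobs = [2_000_000] * 8
    _, thread_time = timed_map(ThreadPool, jobs)
    _, process_time = timed_map(Pool, jobs)
    print(f"threads: {thread_time:.2f}s  processes: {process_time:.2f}s")
```

On a multi-core machine the process pool would be expected to finish noticeably sooner, since the threads take turns holding the GIL.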
From the two sets of data, it is easy to see that Python multithreading is not so useless after all. If it were, why has Python 3 not removed the GIL? The Python community is divided on this question, which I will not discuss here; we should respect the decision of the father of Python.
So when should you use multithreading, multiple processes, or coroutines? Presumably the answer is already obvious.
For I/O-intensive programs such as concurrent crawlers, use multithreading or coroutines (in my own tests the gap between them was not particularly obvious); for scientific computing and other CPU-intensive programs, use multiple processes. Of course, this conclusion assumes a non-distributed setup, tested on a single server.
The answer has been given, so is this article finished? Not quite: since Python multithreading is still useful, let’s introduce its usage.
multiprocessing.dummy is used in much the same way as multiprocessing, except that you add .dummy when importing the package. The difference is that it is backed by threads rather than processes.
For usage, see [Multiprocessing usage](http://thief.one/2016/11/23/Python-multiprocessing/).
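A minimal sketch of the import swap described above; `square` and the pool size of 4 are made up for illustration.

```python
from multiprocessing.dummy import Pool  # thread-backed Pool with the multiprocessing API

def square(x):
    return x * x

with Pool(4) as pool:                    # 4 worker threads
    results = pool.map(square, range(5))

print(results)  # [0, 1, 4, 9, 16]
```

Changing the import to `from multiprocessing import Pool` would run the same code in processes instead, although process pools should then be created under an `if __name__ == "__main__":` guard.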
threading is the multithreading module that comes with Python. There are two main ways to create a thread with it: one is to subclass threading.Thread, the other is to call threading.Thread() directly with a target function. The two usages are introduced separately below.
Create a thread by calling threading.Thread().
Description: The two Thread() parameters used here are target, the function to be executed by the child thread (pass the function itself, not a call to it), and args, a tuple of the arguments to pass to it. Thread() returns a thread object; call the object’s start() method to start the child thread.
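A minimal sketch of this usage; `greet` and `results` are hypothetical names chosen for the example.

```python
import threading

results = []

def greet(name, times):
    """Hypothetical worker function run in the child thread."""
    for _ in range(times):
        results.append(f"hello {name}")

t = threading.Thread(target=greet, args=("world", 2))  # args must be a tuple
t.start()   # begin running greet("world", 2) in a new thread
t.join()    # wait for the child thread to finish
print(results)
```

For a single argument, remember the trailing comma, as in `args=("world",)`.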
The methods of the thread object:
- start() starts thread execution
- run() defines what the thread does; start() calls it in the new thread
- join(timeout=None) blocks the calling thread until this thread ends; if timeout is given, it blocks for at most timeout seconds
- getName() returns the name of the thread (in current Python 3, prefer the name attribute)
- setName() sets the name of the thread (in current Python 3, prefer the name attribute)
- isAlive() returns a boolean indicating whether this thread is still running (spelled is_alive() in current Python; isAlive() was removed in Python 3.9)
- isDaemon() returns the thread’s daemon flag (in current Python 3, prefer the daemon attribute)
- setDaemon(daemonic) sets the daemon flag of the thread to daemonic (must be called before start())
- t.setDaemon(True) marks thread t as a daemon thread: when the main thread ends, the program exits without waiting for t
Functions of the threading module:
- threading.enumerate() returns a list of the threads that are currently alive
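The methods above can be tried out with a short sketch like the following. It uses the Python 3 spellings (`is_alive()`, and the `daemon` and `name` attributes); `nap` and the thread name "napper" are made up for illustration.

```python
import threading
import time

def nap(seconds):
    time.sleep(seconds)

t = threading.Thread(target=nap, args=(0.2,), name="napper")
t.daemon = True                 # attribute form of setDaemon(True); set before start()
t.start()

alive_while_running = t.is_alive()                  # the thread is still sleeping
names = [th.name for th in threading.enumerate()]   # names of all live threads

t.join(timeout=1)               # block for at most 1 second
alive_after_join = t.is_alive() # the thread has finished by now
print(alive_while_running, names, alive_after_join)
```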
Create a thread by subclassing threading.Thread.
Description: This method subclasses threading.Thread and overrides its run() method.
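A minimal sketch of the subclassing approach; the `Counter` class is hypothetical and only counts in a loop.

```python
import threading

class Counter(threading.Thread):
    """Hypothetical thread subclass: counts up to `limit`."""

    def __init__(self, limit):
        super().__init__()      # always initialise the base Thread first
        self.limit = limit
        self.count = 0

    def run(self):              # start() arranges for run() to execute in the new thread
        for _ in range(self.limit):
            self.count += 1

c = Counter(1000)
c.start()
c.join()
print(c.count)  # 1000
```

Calling `c.run()` directly would execute the loop in the current thread; it is `start()` that creates the new thread.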
We often need the return value of each child thread. However, the usual way of getting a return value from a function call does not work with threads, so a different approach is needed.
Description: The first question in getting a return value from a thread is when the child thread ends, that is, when the return value is ready to be read. You can use the isAlive() method (is_alive() in current Python) to determine whether a child thread is still running.
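One way to do this, sketched under the assumption that the result is stored on the thread object; `ResultThread` and `add` are hypothetical names. Here join() is used to wait for completion, after which is_alive() is False and the result can be read safely.

```python
import threading

class ResultThread(threading.Thread):
    """Hypothetical helper: stores the function's return value on the thread object."""

    def __init__(self, func, *args):
        super().__init__()
        self.func = func
        self.args = args
        self.result = None

    def run(self):
        self.result = self.func(*self.args)

def add(a, b):
    return a + b

threads = [ResultThread(add, i, i) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()                 # once join() returns, the result has been set
results = [t.result for t in threads]
print(results)  # [0, 2, 4]
```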
When many tasks need to be executed, we often need to limit the number of threads. The threading module comes with a way to do this.
Description: The above program limits the number of concurrent threads to 10; exceeding this number will raise an exception.
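A sketch of limiting concurrency with threading.BoundedSemaphore, which is an assumption about the mechanism meant here. In this sketch, threads beyond the limit wait for a free slot rather than fail; BoundedSemaphore itself raises ValueError only if it is released more times than it was acquired. The names `task`, `limiter`, and `peak` are made up.

```python
import threading
import time

MAX_THREADS = 10
limiter = threading.BoundedSemaphore(MAX_THREADS)
lock = threading.Lock()
running = 0
peak = 0   # highest number of tasks observed inside the limited section

def task():
    global running, peak
    with limiter:                 # blocks while MAX_THREADS tasks are already inside
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)          # stand-in for real work
        with lock:
            running -= 1

threads = [threading.Thread(target=task) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # never more than MAX_THREADS
```

Note that all 50 thread objects are still created; the semaphore only caps how many of them do work at the same time.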
Besides the built-in mechanism, we can also design other solutions:
Both of the approaches above work; I prefer the second one.
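A minimal sketch of one hand-rolled scheme of this kind, assuming a fixed pool of worker threads fed from queue.Queue. The names `worker`, `tasks`, and `results` are made up for illustration, and this is not necessarily how the two approaches above were implemented.

```python
import queue
import threading

def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:          # sentinel: no more work for this worker
            tasks.task_done()
            break
        results.append(item * 2)  # stand-in for real work
        tasks.task_done()

tasks = queue.Queue()
results = []
workers = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(4)]
for w in workers:
    w.start()
for n in range(20):
    tasks.put(n)
for _ in workers:
    tasks.put(None)               # one sentinel per worker
tasks.join()                      # wait until every queued item has been processed
for w in workers:
    w.join()
print(sorted(results))
```

Only four threads ever exist here, however many tasks are queued, which is the main difference from creating one thread per task.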
For multi-process questions, see [Python multi-process](http://thief.one/2016/11/23/Python-multiprocessing/); other questions about multithreading can be discussed in the comments below.
Statement: This article cannot claim to be original; it draws on articles by many experts on the Internet. I have only tested some Python multithreading issues here and briefly introduced the basic usage of Python multithreading, in the hope of helping newcomers.