Python multithreading is not useless

As the saying goes, every trade has its own specialty.

When I first learned about Python multithreading, almost every search result said the same thing: Python does not have "real" multithreading, and Python threads are all but useless. I did not know any better at the time; I only understood that Python has a Global Interpreter Lock (GIL), that only one thread can run at a time, and that the lock is released when a thread hits an IO operation so another thread can be switched in. So, are Python threads really useless? To settle that doubt, I decided to test it myself.

After running multithreaded tests in both Python and Java, I found that Python's multithreading is indeed less efficient than Java's, but nowhere near useless. So how does it compare with Python's other concurrency mechanisms?

Opinion: replace multithreading with multiprocessing

I have read a lot of blog posts, and some people argue that you should use Python multiprocessing instead of multithreading because multiple processes are not restricted by the GIL. So I started using multiprocessing to solve some concurrency problems, hit a few pitfalls along the way, and solved most of them with some searching. I later wrote a brief summary in [Python multi-process](http://thief.one/2016/11/23/Python-multiprocessing/).
So can multiprocessing completely replace multithreading? Don't rush; let's keep reading.

Opinion: coroutines are the best solution

The concept of coroutines is quite popular at the moment. A coroutine differs from a thread in that the switching is not done by the operating system but by the programmer's code; because the programmer controls when the switch happens, there are no thread-safety issues in the usual sense. Coroutines are a broad and deep topic that this article will not cover in detail; I will write about them separately later.
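As a minimal sketch of the idea (using gevent, the same library the tests below rely on; the worker function and the number of steps are made up for illustration), two greenlets hand control back and forth at explicit yield points:

#! -*- coding:utf-8 -*-
import gevent

def worker(name):
    for i in range(3):
        print '%s step %d' % (name, i)
        gevent.sleep(0)  # explicitly yield control so the other greenlet can run

# The output of A and B interleaves, but only at the points we chose to yield.
jobs = [gevent.spawn(worker, 'A'), gevent.spawn(worker, 'B')]
gevent.joinall(jobs)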

Test Data

The opinions found online boil down to replacing multithreading with either multiprocessing or coroutines (leaving aside switching programming language, interpreter, and so on), so let's measure the performance difference between the three. A fair test has to consider both IO-intensive and CPU-intensive workloads, so two sets of data are measured.

IO-intensive test

For the IO-intensive test I chose the most common crawler scenario: timing how long it takes to fetch bing.com repeatedly. (This mainly compares multithreading and coroutines; single-threaded and multiprocess versions are not tested here because there is little point for pure IO.)
Test code:

#! -*- coding:utf-8 -*-
from gevent import monkey;monkey.patch_all()
import gevent
import time
import threading
import urllib2

def urllib2_(url):
    try:
        urllib2.urlopen(url, timeout=10).read()
    except Exception, e:
        print e

def gevent_(urls):
    jobs = [gevent.spawn(urllib2_, url) for url in urls]
    gevent.joinall(jobs, timeout=10)
    for i in jobs:
        i.join()

def thread_(urls):
    a = []
    for url in urls:
        t = threading.Thread(target=urllib2_, args=(url,))
        a.append(t)
    for i in a:
        i.start()
    for i in a:
        i.join()

if __name__ == "__main__":
    urls = ["https://www.bing.com/"] * 10
    t1 = time.time()
    gevent_(urls)
    t2 = time.time()
    print 'gevent-time:%s' % str(t2 - t1)
    thread_(urls)
    t4 = time.time()
    print 'thread-time:%s' % str(t4 - t2)

Test Results:
Visit 10 times
gevent-time: 0.380326032639
thread-time: 0.376606941223
Visit 50 times
gevent-time: 1.3358900547
thread-time: 1.59564089775
Visit 100 times
gevent-time: 2.42984986305
thread-time: 2.5669670105
Visit 300 times
gevent-time: 6.633330099106
thread-time: 10.7605059147
As the results show, coroutines do become more efficient than threads as the concurrency level grows, but at lower concurrency the difference between the two is small.

CPU-intensive test

For the CPU-intensive test I timed a simple piece of scientific-style computation. (This compares single-threaded, multithreaded, coroutine, and multiprocess versions.)
Test code:

#! -*- coding:utf-8 -*-
from multiprocessing import Process as pro
from multiprocessing.dummy import Process as thr
from gevent import monkey;monkey.patch_all()
import gevent

def run(i):
    lists=range(i)
    list(set(lists))

if __name__=="__main__":
    '''
    Multiprocessing
    '''
    for i in range(30):  ##10-2.1s 20-3.8s 30-5.9s
        t=pro(target=run,args=(5000000,))
        t.start()
    '''
    Multithreading
    '''
    # for i in range(30):  ##10-3.8s 20-7.6s 30-11.4s
    #     t=thr(target=run,args=(5000000,))
    #     t.start()
    '''
    Coroutine
    '''
    # jobs=[gevent.spawn(run,5000000) for i in range(30)]  ##10-4.0s 20-7.7s 30-11.5s
    # gevent.joinall(jobs)
    # for i in jobs:
    #     i.join()
    '''
    Single thread
    '''
    # for i in range(30):  ##10-3.5s 20-7.6s 30-11.3s
    #     run(5000000)

Test Results:

  • Concurrent 10 times: [multi-process] 2.1s [multi-threaded] 3.8s [coroutine] 4.0s [single thread] 3.5s
  • Concurrent 20 times: [multi-process] 3.8s [multi-threaded] 7.6s [coroutine] 7.7s [single thread] 7.6s
  • Concurrent 30 times: [multi-process] 5.9s [multi-thread] 11.4s [coroutine] 11.5s [single thread] 11.3s

As the results show, multiprocessing clearly outperforms the others under a CPU-intensive workload, while multithreading, coroutines, and a single thread perform about the same. This is because only multiprocessing actually uses more than one CPU core; watching CPU usage while the code runs confirms that only the multiprocess version keeps multiple cores busy.
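If you want the number of worker processes to follow the machine's core count instead of a hard-coded loop, a multiprocessing.Pool sketch like the one below is a common pattern (it reuses the run() function from the test above; the pool size and job count are only illustrative):

#! -*- coding:utf-8 -*-
from multiprocessing import Pool, cpu_count

def run(i):
    lists = range(i)
    list(set(lists))

if __name__ == "__main__":
    pool = Pool(cpu_count())        # one worker process per CPU core
    pool.map(run, [5000000] * 30)   # spread the 30 jobs across the worker processes
    pool.close()
    pool.join()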

Conclusion of this article

These two sets of data make it easy to see that Python multithreading is not so worthless after all. If it were, why hasn't Python 3 removed the GIL? The Python community is divided on that question, which I will not get into here; let's respect the decision of the father of Python.
So when should you use multithreading, multiprocessing, or coroutines? The answer should be obvious by now.
For IO-intensive programs such as concurrent crawlers, use multithreading or coroutines (in my tests the gap between them is not particularly large); for CPU-intensive programs such as scientific computing, use multiprocessing. Of course, this conclusion assumes a single server rather than a distributed setup.
Now that the answer has been given, is the article finished? Since Python multithreading is still useful, let's go over how to use it.

multiprocessing.dummy module

multiprocessing.dummy has the same interface as multiprocessing; the only difference is that you add .dummy to the import, and the "processes" it creates are actually threads.
For usage, see [Multiprocessing usage](http://thief.one/2016/11/23/Python-multiprocessing/).
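For example, here is a minimal sketch (the URL list and pool size are placeholders) that uses multiprocessing.dummy.Pool, a pool of threads with the multiprocessing.Pool interface, to fetch pages concurrently:

#! -*- coding:utf-8 -*-
from multiprocessing.dummy import Pool   # same interface as multiprocessing.Pool, backed by threads
import urllib2

def fetch(url):
    try:
        return len(urllib2.urlopen(url, timeout=10).read())
    except Exception, e:
        return None

if __name__ == "__main__":
    urls = ["https://www.bing.com/"] * 10
    pool = Pool(4)                  # 4 worker threads
    sizes = pool.map(fetch, urls)   # blocks until every URL has been fetched
    pool.close()
    pool.join()
    print sizes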

threading module

This is Python's built-in threading module. There are two main ways to create threads: one is to subclass threading.Thread, the other is to call threading.Thread() directly; both usages are introduced below.
Creating a thread with threading.Thread():
Code:

import threading

def run(i):
    print i

for i in range(10):
    t=threading.Thread(target=run,args=(i,))
    t.start()

Description: Thread() takes two main parameters here: target, the function the child thread should execute, and args, a tuple of arguments to pass to it. Creating the thread returns a Thread object; calling that object's start() method starts the child thread.

Methods of a Thread object:

  • start() starts thread execution
  • run() the method that defines the thread's activity (override it when subclassing)
  • join(timeout=None) blocks the calling thread until this thread ends; if timeout is given, it blocks at most timeout seconds
  • getName() returns the thread's name
  • setName() sets the thread's name
  • isAlive() boolean flag indicating whether the thread is still running
  • isDaemon() returns the thread's daemon flag
  • setDaemon(daemonic) sets the thread's daemon flag to daemonic (must be called before start())
  • t.setDaemon(True) marks the thread as a daemon thread, so that when the main thread exits the daemon child threads are terminated along with it

Module-level functions of threading (see the sketch after this list):

  • threading.enumerate() returns a list of all Thread objects currently alive; len(threading.enumerate()) gives the number of running threads
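A short sketch that ties these methods together (the worker function and the sleep time are made up for illustration):

import threading
import time

def run(i):
    time.sleep(1)
    print 'thread %d done' % i

threads = []
for i in range(3):
    t = threading.Thread(target=run, args=(i,))
    t.setName('worker-%d' % i)       # setName()/getName() manage the thread name
    t.setDaemon(False)               # the daemon flag must be set before start()
    threads.append(t)
    t.start()

print len(threading.enumerate())     # threads currently alive, including the main thread

for t in threads:
    t.join()                         # block until each worker has finished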

Creating a thread by subclassing threading.Thread:
Code:

import threading

class test(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
    def run(self):
        try:
            print "code one"
        except:
            pass

threads=[]
for i in range(10):
    cur=test()
    cur.start()
    threads.append(cur)   # keep every thread object so each one can be joined
for cur in threads:
    cur.join()

Description: this approach subclasses threading.Thread and overrides its run() method.

Getting a return value from a thread

Sometimes we need the return value of each child thread. The usual way of returning a value from an ordinary function call does not apply to threads, so we need another way to collect the child threads' results.
Code:

import threading

class test(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
    def run(self):
        self.tag=1
    def get_result(self):
        if self.tag==1:
            return True
        else:
            return False

f=test()
f.start()
while f.isAlive():   # busy-wait until the child thread has finished
    continue
print f.get_result()

Description: the first question when collecting return values is: when does the child thread end, and when should we read the result? The isAlive() method tells us whether a child thread is still running.
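An alternative worth knowing (not from the original post, just a sketch) is to have each child thread put its result on a thread-safe Queue.Queue and to join() the workers instead of busy-waiting on isAlive():

import threading
import Queue

def run(i, results):
    results.put((i, i * i))          # push (thread index, result) onto the shared queue

results = Queue.Queue()              # Queue.Queue is thread-safe, so no extra locking is needed
threads = [threading.Thread(target=run, args=(i, results)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()                         # wait for every worker to finish

while not results.empty():
    print results.get()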

Controlling the number of running threads

When there are many tasks to execute, we often need to cap the number of threads running at once. The threading module's BoundedSemaphore can be used for this.
Code:

import threading

maxs=10  ## maximum number of concurrent threads
threadLimiter=threading.BoundedSemaphore(maxs)

class test(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
    def run(self):
        threadLimiter.acquire()  # acquire a slot; blocks if maxs threads already hold one
        try:
            print "code one"
        except:
            pass
        finally:
            threadLimiter.release()  # release the slot

threads=[]
for i in range(100):
    cur=test()
    cur.start()
    threads.append(cur)
for cur in threads:
    cur.join()

Description: the program above limits the number of threads running the payload at the same time to 10; threads beyond that simply block in acquire() until a slot is released (a BoundedSemaphore only raises an error if it is released more times than it was acquired).
In addition to what the standard library provides, we can also design our own scheme:

import threading

threads=[]
'''
Create all threads
'''
for i in range(10):
    t=threading.Thread(target=run,args=(i,))   # run() is the worker function defined earlier
    threads.append(t)
'''
Start the threads in the list
'''
for t in threads:
    t.start()
    while True:
        # Check how many threads are running: if fewer than 5, break out of the while
        # loop so the for loop can start the next thread; otherwise keep waiting here.
        if(len(threading.enumerate())<5):
            break

Both of these approaches work; personally I prefer the following one.

Thread Pool

import threadpool

def ThreadFun(arg1,arg2):
    pass

def main():
    device_list=[object1,object2,object3...,objectn]   # placeholder: the devices to be processed
    task_pool=threadpool.ThreadPool(8)                  # 8 is the number of threads in the pool
    request_list=[]                                     # holds the task requests
    # First build the list of task requests
    for device in device_list:
        request_list.extend(threadpool.makeRequests(ThreadFun,[((device,),{})]))
    # Put every task into the pool; the worker threads pick the tasks up and process them.
    # map() is used here; look it up if you are not familiar with it.
    map(task_pool.putRequest,request_list)
    # Block until all tasks have been processed
    task_pool.wait()

if __name__=="__main__":
    main()

For multiprocessing questions, see [Python multi-process](http://thief.one/2016/11/23/Python-multiprocessing/); other questions about multithreading can be discussed in the comments below.

Disclosure: this article does not claim to be original; it draws on articles by many experts online. I have only tested the Python multithreading questions I cared about and briefly introduced the basic usage of Python multithreading, in the hope of helping newcomers.

Title: Python multithreading is not useless

Author: nmask

Published: February 17, 2017, 13:02

Last updated: August 16, 2019, 14:08

Original link: https://thief.one/2017/02/17/Python multi-threaded chicken year is not ribbed/

License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Please keep the original link and author when reposting.
