Python coroutine


Truly knowledgeable people grow the way an ear of wheat grows: while the ear is still empty, the wheat shoots up fast and holds its head proudly high; once it is ripe and full, it becomes modest and bows down. — Mai Mang

The previous post discussed whether Python multithreading is worth using at all. Some readers agreed, while others disagreed, arguing that coroutines are far stronger than multithreading and that, next to coroutines, multithreading is useless. Fine, I agree with that. But the point of the previous article was not to compare multithreading with coroutines; it was that for IO-bound programs, multithreading is still useful.

As for coroutines, I claimed that their efficiency is beyond what multithreading can match, yet I did not actually know much about them. So over the past few days I have gone through some material and sorted out what I learned, and I share it here for your reference. Please correct me if anything is wrong, thank you.

Disclaimer: this article is an entry-level introduction to coroutines. Experts, please take a detour; beginners, be careful not to fall into the pits.

Concept

A coroutine, also known as a micro-thread or fiber (English: coroutine), works like this: while function A is executing, it can be interrupted at any point to execute function B, after which execution switches back and function A resumes (control can move back and forth freely). This is not an ordinary function call (there is no call statement); the whole process looks like multithreading, yet only one thread is involved.

Advantages

  • Extremely high execution efficiency: switching between subroutines (functions) is not thread switching; it is controlled by the program itself, so there is none of the overhead of switching threads. Compared with multithreading, the more threads there are, the more pronounced the performance advantage of coroutines becomes.
  • No locking mechanism is needed: since there is only one thread, there are no simultaneous-write conflicts on variables, so shared resources can be managed without locks, which again makes execution much more efficient.

  • Note: coroutines excel at IO-bound programs, but CPU-bound work is not their strength. To make full use of the CPU, you can combine multiple processes with coroutines, as sketched below.
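To make that concrete, here is a minimal sketch of the multi-process + coroutine combination (assuming Python 3.4+ with the standard asyncio and multiprocessing modules; the names fetch and worker are illustrative, not from the original text): each worker process runs its own event loop and handles its share of the IO-bound tasks with coroutines.

import asyncio
import multiprocessing

@asyncio.coroutine
def fetch(i):
    yield from asyncio.sleep(1)      # stands in for a real IO operation
    return i

def worker(ids):
    # each process gets its own event loop, with one coroutine per task
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    done, _ = loop.run_until_complete(asyncio.wait([fetch(i) for i in ids]))
    loop.close()
    return sorted(t.result() for t in done)

if __name__ == '__main__':
    p = multiprocessing.Pool(2)                       # one process per CPU core
    print(p.map(worker, [range(0, 5), range(5, 10)]))

The processes keep the CPU cores busy, while the coroutines inside each process absorb the IO waits.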

Those are just the concepts of coroutines, which may sound abstract, so let's talk about code. This post mainly covers the use of coroutines in Python. Python 2 has limited support for them: generators implement part of the idea, but not all of it, while the third-party gevent module offers a better implementation. Python 3.4 introduced the asyncio module, which supports coroutines well.

Python2.x coroutine

Coroutine options in Python 2.x:

  • yield
  • gevent

Python 2.x does not have many modules that support coroutines; gevent is the most common one. Before getting to gevent, the yield option deserves a quick look.
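Here is a minimal sketch of the yield-based style (plain Python 2, no third-party modules): consumer is a generator that receives values through send(), and control jumps back and forth between the two functions inside a single thread.

def consumer():
    while True:
        n = yield                 # pause here until a value is sent in
        print "consuming", n

def producer(c):
    c.next()                      # prime the generator up to its first yield
    for n in range(3):
        print "producing", n
        c.send(n)                 # switch into consumer; resume here afterwards
    c.close()

producer(consumer())

Each send() jumps into consumer and each yield jumps back: exactly the interrupt-and-resume behavior described in the concept section.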

Gevent

Gevent is a third-party library that implements coroutines through greenlets. The basic idea is:
When a greenlet encounters an IO operation, such as a network access, it automatically switches to another greenlet, and once the IO completes it switches back at a suitable time to continue execution. Since IO operations are very time-consuming, a program would otherwise often sit in a waiting state; with gevent switching coroutines for us automatically, some greenlet is always running rather than waiting on IO.

Install

pip install gevent

  • The latest version seems to support Windows; in earlier tests it did not seem to run on Windows.

Usage

Let's first look at a simple crawler example:
# -*- coding: utf-8 -*-
import gevent
from gevent import monkey; monkey.patch_all()   # patch blocking stdlib IO
import urllib2

def get_body(i):
    print "start", i
    urllib2.urlopen("http://cn.bing.com")       # blocking IO: gevent switches here
    print "end", i

tasks = [gevent.spawn(get_body, i) for i in range(3)]
gevent.joinall(tasks)

Output:

start 0
start 1
start 2
end 2
end 0
end 1

Explanation: from the output, get_body first prints “start”; then, when urllib2 blocks on IO, gevent automatically switches to the next task (printing the next “start”), and the matching “end” only runs once urllib2 returns. In other words, the program never sits waiting for urllib2 to come back with a result; it skips ahead and picks up the return value once the request completes. Notably, only one thread executes throughout, so this is not the same thing as multithreading.
Now let's rewrite it with threads and compare:

import threading
import urllib2

def get_body(i):
    print "start", i
    urllib2.urlopen("http://cn.bing.com")
    print "end", i

for i in range(3):
    t = threading.Thread(target=get_body, args=(i,))
    t.start()

Output:

start 0
start 1
start 2
end 1
end 2
end 0

Note: judging by the results, multithreading and coroutines have the same effect here; both switch away when IO blocks. The difference is that multithreading switches between threads, while a coroutine switches context within one thread (think of it as switching which function is executing). Switching threads clearly costs more than switching contexts, so as the number of threads grows, coroutines become increasingly more efficient than multithreading. (I would guess that switching between processes costs the most of all.)

Gevent Instructions
  • monkey patching turns blocking modules into non-blocking ones. Mechanism: switch automatically whenever an IO operation is encountered. You can also switch manually with gevent.sleep(0) (swap it into the crawler code and you get the same context switching), as the sketch below shows.
  • gevent.spawn starts a coroutine; its arguments are the function followed by that function's arguments.
  • gevent.joinall waits until all the coroutines have finished.
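Here is a minimal sketch of the manual switch (no monkey patching involved): every gevent.sleep(0) voluntarily hands control to the other greenlet.

import gevent

def task(name):
    for step in range(3):
        print name, step
        gevent.sleep(0)           # yield control to the next greenlet

gevent.joinall([gevent.spawn(task, 'a'), gevent.spawn(task, 'b')])

The two greenlets interleave (a 0, b 0, a 1, b 1, ...) even though no IO ever happens.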

Python3.x coroutine

For Python 3.5 coroutines you can also move on to: [Python 3.5 coroutine learning research](https://thief.one/2018/06/21/1/)

To test coroutines under Python 3.x, I installed a Python 3.6 environment with virtualenv.
Coroutine options in Python 3.x:

  • asyncio + yield from (Python 3.4)
  • asyncio + async/await (Python 3.5)
  • gevent

The asyncio module, introduced in Python 3.4, supports coroutines very well.

asyncio

asyncio is a standard library introduced in Python 3.4 with built-in support for asynchronous IO. In asyncio, asynchronous operations are performed via yield from inside a coroutine.

Usage

Example (requires Python 3.4 or later):

import asyncio

@asyncio.coroutine
def test(i):
    print("test_1", i)
    r = yield from asyncio.sleep(1)
    print("test_2", i)

loop = asyncio.get_event_loop()
tasks = [test(i) for i in range(5)]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

Output:

test_1 3
test_1 4
test_1 0
test_1 1
test_1 2
test_2 3
test_2 0
test_2 2
test_2 4
test_2 1

Explanation: the output shows the same effect achieved with gevent: execution switches on IO (so every test_1 is printed first, and the test_2 lines follow once the waits complete). One thing was unclear to me: why are the test_1 lines not printed in order, unlike gevent's output? Most likely asyncio.wait stores the tasks in a set internally, so the order in which they are first scheduled is not guaranteed (an expert is welcome to correct me).

asyncio instructions

@asyncio.coroutine marks a generator as a coroutine, and we then hand this coroutine to the EventLoop to execute.
test() first prints test_1; the yield from syntax then lets us call another generator conveniently. Since asyncio.sleep() is itself a coroutine, the thread does not wait for it but breaks off to run the next round of the message loop. When asyncio.sleep() returns, the thread receives its return value through yield from (None here) and goes on to the next statement.
Think of asyncio.sleep(1) as an IO operation that takes one second. During that second the main thread does not wait; it runs whatever other coroutines in the EventLoop are ready to execute, which is how concurrency is achieved. A quick timing check of this claim follows.
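Here is a small sketch of that check (same Python 3.4 syntax as above): five coroutines that each "wait on IO" for one second finish in about one second in total, not five.

import asyncio
import time

@asyncio.coroutine
def test(i):
    yield from asyncio.sleep(1)   # stands in for a 1-second IO operation

loop = asyncio.get_event_loop()
start = time.time()
loop.run_until_complete(asyncio.wait([test(i) for i in range(5)]))
print("elapsed: %.1fs" % (time.time() - start))   # roughly 1.0, not 5.0
loop.close()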

async/await

To simplify asynchronous IO and make it easier to identify, Python 3.5 introduced the new async and await syntax, which makes coroutine code more concise and readable.
Note that async and await are the new syntax for coroutines. Adopting it is a simple two-step replacement:

  • Replace @asyncio.coroutine with async;
  • Replace yield from with await.

Usage

Example (requires Python 3.5 or later):

import asyncio

async def test(i):
    print("test_1", i)
    await asyncio.sleep(1)
    print("test_2", i)

loop = asyncio.get_event_loop()
tasks = [test(i) for i in range(5)]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

The output is the same as before.
Explanation: compared with the previous section, the only changes are that yield from becomes await and @asyncio.coroutine becomes async; everything else is unchanged.

Gevent

Usage is the same as in Python 2.x; a Python 3 version of the earlier crawler is sketched below.
If the introduction above has already made the difference between multithreading and coroutines clear to you, then I see no need for a benchmark. As threads multiply, multithreading's main overhead goes into thread switching, whereas coroutines switch within a single thread, so their overhead is much smaller; that may be the fundamental performance difference between the two. (Personal opinion.)
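For completeness, here is the earlier crawler ported to Python 3 syntax as a sketch (print() function and urllib.request in place of urllib2; everything else is identical):

import gevent
from gevent import monkey; monkey.patch_all()     # patch blocking stdlib IO
import urllib.request

def get_body(i):
    print("start", i)
    urllib.request.urlopen("http://cn.bing.com")  # blocking IO: gevent switches
    print("end", i)

tasks = [gevent.spawn(get_body, i) for i in range(3)]
gevent.joinall(tasks)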

Asynchronous crawler

Perhaps most people who care about coroutines want to use them for crawlers (coroutines handle IO blocking well), yet I found that the usual urllib and requests cannot be combined with asyncio directly, probably because those crawler modules are themselves synchronous (or maybe I just have not found the right usage). So how do you use coroutines to write an asynchronous crawler?
Here are a few approaches I know of:

  • grequests (an asynchronous version of the requests module); a short sketch follows this list
  • an ordinary crawler module combined with gevent (my preference)
  • aiohttp (documentation seems scarce; I have not used it yet)
  • asyncio's built-in crawler functionality (also fairly difficult)
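As a sketch of the first option (assuming pip install grequests), grequests' documented get/map interface builds a batch of requests and sends them concurrently:

import grequests

urls = ["http://cn.bing.com"] * 3
reqs = (grequests.get(u) for u in urls)   # build the requests without sending
for r in grequests.map(reqs):             # send them all concurrently
    print(r and r.status_code)            # r is None if a request failed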

Purpose: controlling the number of coroutines (using gevent's pool):

from bs4 import BeautifulSoup
import requests
import gevent
from gevent import monkey, pool
monkey.patch_all()

jobs = []
links = []
p = pool.Pool(10)            # at most 10 coroutines run concurrently
urls = [
    'http://www.google.com',
    # ... another 100 urls
]

def get_links(url):
    r = requests.get(url)
    if r.status_code == 200:
        soup = BeautifulSoup(r.text, "html.parser")
        links.extend(soup.find_all('a'))   # collect every <a> tag

for url in urls:
    jobs.append(p.spawn(get_links, url))
gevent.joinall(jobs)
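If you do not need the individual job handles, the same thing can be written more compactly with the pool's own map, which blocks until every URL has been processed (still at most 10 at a time):

p.map(get_links, urls)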

Where I learned this:

The content of this article draws on [Liao Xuefeng's Python tutorial](http://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/001432090954004980bd351f2cd4cc18c9e6c06d855c498000), recommended for newcomers.

Title: Python coroutine

Author: nmask

Published: 2017-02-20 11:02

Last updated: 2019-08-16 15:08

Original link: https://thief.one/2017/02/20/Python coroutine/

License: CC BY-NC-ND 4.0 International. When reposting, please keep the original link and author.
