How did you get out of the haze of life?
Take a few more steps
Some time ago I wrote about [Selenium+PhantomJS usage and performance optimization](http://thief.one/2017/03/01/Phantomjs%E6%80%A7%E8%83%BD%E4%BC%98%E5%8C%96/) and, along the way, about [the pitfalls I hit with a Selenium+PhantomJS crawler](http://thief.one/2017/03/01/Phantomjs%E7%88%AC%E8%BF%87%E7%9A%84%E9%82%A3%E4%BA%9B%E5%9D%91/). In practice, however, those optimizations did not really improve PhantomJS performance, and the crawler was no faster. After a reader pointed it out, I realized that the way I was using PhantomJS was itself the problem, so no amount of tuning could fundamentally improve performance. This article covers the right way to use PhantomJS.
I previously drove PhantomJS through Selenium. Selenium wraps some of PhantomJS's functionality and ships a Python module, so PhantomJS can be used indirectly from Python. My view now is that it is time to drop Selenium+PhantomJS. First, this wrapper has not been updated for a long time (no one maintains it). Second, Selenium implements only part of PhantomJS's functionality, and that part is incomplete.
The official PhantomJS documentation shows that PhantomJS is far more capable than the subset Selenium wraps. PhantomJS provides a variety of APIs; see [PhantomJS API introduction](http://thief.one/2017/03/13/Phantomjs-Api%E4%BB%8B%E7%BB%8D/). The most commonly used are the PhantomJS Webservice (the `webserver` module) and PhantomJS WebPage (the `webpage` module): the former starts an HTTP service, and the latter issues HTTP requests.
Python sends each task as an HTTP request; the PhantomJS Webservice receives the task, processes it, and returns the result to Python. Task scheduling, storage, and other complex operations are left to Python, which can issue its requests to the PhantomJS Webservice asynchronously. Note that a single PhantomJS Webservice currently supports only 10 concurrent requests. To go beyond that, you can run several PhantomJS Webservice instances on one server, each on a different port, or cluster multiple servers behind nginx as a reverse proxy.
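The multi-port setup described above could be handled on the Python side roughly as follows. This is only a sketch under stated assumptions: the `InstancePool` class and the port numbers are hypothetical and not part of the original scripts, and the cap of 10 comes from the per-instance limit mentioned above.

```python
import itertools
import threading

# Assumption: each PhantomJS Webservice instance handles at most 10 requests
# at a time. The endpoints below are hypothetical examples.
MAX_CONCURRENT_PER_INSTANCE = 10
ENDPOINTS = [
    "http://127.0.0.1:8080",
    "http://127.0.0.1:8081",
    "http://127.0.0.1:8082",
]


class InstancePool:
    """Pick instances round-robin and cap the in-flight requests per instance."""

    def __init__(self, endpoints, limit=MAX_CONCURRENT_PER_INSTANCE):
        # One bounded semaphore per instance tracks its free slots.
        self._slots = {ep: threading.BoundedSemaphore(limit) for ep in endpoints}
        self._cycle = itertools.cycle(endpoints)
        self._lock = threading.Lock()

    def acquire(self, timeout=None):
        """Take a slot on the next instance; return its base URL, or None on timeout."""
        with self._lock:
            endpoint = next(self._cycle)
        if self._slots[endpoint].acquire(timeout=timeout):
            return endpoint
        return None

    def release(self, endpoint):
        """Give the slot back once the request to this instance has finished."""
        self._slots[endpoint].release()
```

A worker would call `acquire()` before sending a task and `release()` in a `finally` block afterwards. With nginx in front as a reverse proxy, this bookkeeping would move into the proxy configuration instead.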
Create a new test.js and write the following code:
Purpose: handle HTTP requests, read the URL from each request, and take a screenshot or fetch the page source.
Once the script is running, the web service listens locally on port 8080.
Create a new http_request.py and write the following code:
Purpose: deliver tasks asynchronously and concurrently.
When the Python script runs, 10 tasks are delivered asynchronously. The PhantomJS server receives each URL, starts processing it, and returns the results once the 10 tasks have been processed.
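The delivery step described above can be sketched with the standard library alone. This is not the original http_request.py. It assumes the service listens on 127.0.0.1:8080 and reads the target URL from the raw POST body. It also uses a thread pool in place of an asynchronous HTTP client. The names `deliver_task` and `deliver_all` are hypothetical.

```python
import concurrent.futures
import urllib.request

# Assumptions (not from the original script): the PhantomJS Webservice listens
# on 127.0.0.1:8080 and takes the target URL as the raw POST body.
SERVICE_URL = "http://127.0.0.1:8080/"


def deliver_task(target_url, service_url=SERVICE_URL, timeout=60):
    """POST one target URL to the service; return (target_url, reply or error text)."""
    request = urllib.request.Request(
        service_url, data=target_url.encode("utf-8"), method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return target_url, response.read().decode("utf-8", errors="replace")
    except OSError as exc:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return target_url, "error: {}".format(exc)


def deliver_all(target_urls, service_url=SERVICE_URL, workers=10, timeout=60):
    """Deliver all tasks concurrently with at most `workers` requests in flight."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(deliver_task, url, service_url, timeout)
            for url in target_urls
        ]
        # Results come back in the same order as target_urls.
        return [future.result() for future in futures]
```

Calling `deliver_all` with a list of 10 URLs keeps up to 10 requests in flight, which matches the concurrency limit of a single PhantomJS Webservice. If the service is not running, each entry comes back with an `error:` string instead of raising.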
Symptom: the screenshot is a black screen.
Cause: the screenshot is taken before the web page has finished loading.
Solution: check the status value passed to the open callback to determine whether the page has loaded.
Symptom: the program fails with a Windows error.
Solution: upgrade to the latest version of PhantomJS.
Symptom: memory usage grows too large, and the PhantomJS process stops with an error.
Cause: PhantomJS did not release the memory.
Solution: call page.close() once the page has been opened and processed.
Symptom: none of the screenshots succeed.
Cause: page.close is called too early. Because onLoadFinished is non-blocking, page.close should be placed inside the open callback.
Please credit the source when reposting: [The right way to use PhantomJS | nMask's Blog](http://thief.one/2017/03/31/Phantomjs%E6%AD%A3%E7%A1%AE%E6%89%93%E5%BC%80%E6%96%B9%E5%BC%8F/)
Address of this article: http://thief.one/2017/03/31/Phantomjs%E6%AD%A3%E7%A1%AE%E6%89%93%E5%BC%80%E6%96%B9%E5%BC%8F/
[[phantomjs series] The right way to use PhantomJS](http://thief.one/2017/03/31/Phantomjs%E6%AD%A3%E7%A1%AE%E6%89%93%E5%BC%80%E6%96%B9%E5%BC%8F/)
[[phantomjs series] PhantomJS API introduction](http://thief.one/2017/03/13/Phantomjs-Api%E4%BB%8B%E7%BB%8D/)
[[phantomjs series] Pitfalls of crawling with Selenium+PhantomJS](http://thief.one/2017/03/01/Phantomjs%E7%88%AC%E8%BF%87%E7%9A%84%E9%82%A3%E4%BA%9B%E5%9D%91/)
[[phantomjs series] Selenium+PhantomJS performance optimization](http://thief.one/2017/03/01/Phantomjs%E6%80%A7%E8%83%BD%E4%BC%98%E5%8C%96/)