Looked at but not seen, listened to but not heard: this is what is meant by the "xi-yi" (the imperceptible) of the Way;
This series is divided into four parts: tactics, tools, concealment, and summary. This article is the summary, and it mainly covers how to detect and prevent black hat SEO. The first three articles lay the groundwork for this last one; after all, what we really need to do as security engineers is help customers defend against attacks and resist the black-market industry.
The previous articles introduced many black hat SEO techniques. As a webmaster or operations manager, how can you tell whether your website has been compromised and is being exploited for black hat SEO? I won't cover intrusion detection in general, since it is beyond the scope of this article; we only discuss how to detect black hat SEO abuse. Here are a few ideas.
Monitor file changes in the server's web directory. Black hat SEO generally needs to modify the web directory (adding files or changing file contents). Some techniques, however, achieve their goal just by changing the nginx configuration, so the configuration files of servers such as nginx must be monitored as well.
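A minimal sketch of this kind of monitoring, assuming a baseline-and-compare approach: hash every file under the web root (plus individual config files such as nginx.conf) with SHA-256, then diff a later snapshot against the baseline. The function names and the paths in the usage note are illustrative assumptions, not the interface of any particular tool.

```python
import hashlib
import os


def snapshot(paths):
    """Map every file under the given paths to its SHA-256 digest.

    `paths` may mix directories (walked recursively) and single files,
    e.g. a web root plus an nginx configuration file.
    """
    digests = {}
    for root in paths:
        if os.path.isfile(root):
            files = [root]
        else:
            files = [
                os.path.join(dirpath, name)
                for dirpath, _dirnames, filenames in os.walk(root)
                for name in filenames
            ]
        for path in files:
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in chunks so large files do not have to fit in memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            digests[path] = h.hexdigest()
    return digests


def diff_snapshots(old, new):
    """Report files that were added, removed, or modified between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return {"added": added, "removed": removed, "modified": modified}
```

For example, take `baseline = snapshot(["/var/www/html", "/etc/nginx/nginx.conf"])` once from a known-good state (both paths are assumptions about a typical layout), re-run `snapshot` on a schedule, and alert whenever `diff_snapshots` reports anything. Keeping the baseline off the monitored server makes it harder for an intruder to rewrite it.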
Summary: Internal monitoring resembles anti-tampering detection, but it targets only web page hijacking. Besides reacting to changes in file contents, it must also react to newly created files and other activity, including changes to server configuration files.
Black hat SEO techniques fundamentally work by deceiving search engines, so detection can naturally start from the search engine too: search for the site together with sensitive terms such as gambling or pornography and check whether such content shows up. Because web page hijacking can control the displayed content dynamically (for example, returning different content depending on where the click came from or which region the visitor is in), the detection program has to probe along multiple dimensions.
These dimensions include, but are not limited to:
- Probe the target site from IP addresses in different regions
- Probe the target site at different times of day
- Visit the target site with different User-Agent (UA) strings
- Reach the target site by different routes (through a Baidu search result redirect, or by visiting the domain directly)
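One way to cover the UA and access-route dimensions is to request the same URL with several header combinations and group the responses. This is a sketch under assumptions: the UA strings and the Baidu referer below are examples chosen for illustration, `fetch` is pluggable, and the default `http_fetch` follows HTTP redirects but does not execute JavaScript. The region and time dimensions would need proxies or probes located in those regions plus a scheduler, which are not shown.

```python
import hashlib
import itertools
import urllib.request

# Example probe dimensions (assumptions): an ordinary browser vs. Baidu's crawler,
# and a direct visit vs. one that appears to come from a Baidu search result.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "baiduspider": "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
}
REFERERS = {
    "direct": None,
    "baidu": "https://www.baidu.com/s?wd=example",
}


def build_probes(user_agents=USER_AGENTS, referers=REFERERS):
    """Cross every UA with every Referer, giving one header set per probe."""
    probes = {}
    for (ua_name, ua), (ref_name, ref) in itertools.product(user_agents.items(), referers.items()):
        headers = {"User-Agent": ua}
        if ref is not None:
            headers["Referer"] = ref
        probes[f"{ua_name}/{ref_name}"] = headers
    return probes


def http_fetch(url, headers, timeout=10):
    """Plain HTTP fetch: follows HTTP redirects, does NOT run JavaScript."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl(), resp.read()


def compare_probes(url, fetch=http_fetch, probes=None):
    """Fetch `url` once per probe and group probe names by (final URL, body hash).

    More than one group means the site answered differently depending on who
    appeared to be asking, a common sign of cloaking.
    """
    groups = {}
    for name, headers in (probes or build_probes()).items():
        final_url, body = fetch(url, headers)
        key = (final_url, hashlib.sha256(body).hexdigest())
        groups.setdefault(key, []).append(name)
    return list(groups.values())
```

Calling `compare_probes("http://example.com/")` returns one list of probe names per distinct response. A single group suggests consistent content; several groups deserve a closer look. Pages that embed per-request tokens or timestamps will also differ, so in practice the body should be normalized before hashing.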
The detection steps are:
- Fetch the search engine's results
- Simulate a browser visiting each result page
- Analyze the page source and other elements
- Apply matching rules to decide whether the site has been hijacked
This step requires crawling the search engine. For example, to determine whether the thief.one website has been hijacked, we can search Baidu for "site:thief.one porn". You need to collect the keywords yourself, then use a crawler to scrape Baidu's search results.
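The query step can be sketched as building one result-page URL per keyword and page. The keyword list is whatever you have collected; the `wd` (query) and `pn` (result offset) parameter names reflect Baidu's public search URLs as commonly observed and are an assumption that may change. Fetching and parsing the result pages is left out.

```python
from urllib.parse import urlencode

BAIDU_SEARCH = "https://www.baidu.com/s"


def build_queries(domain, keywords, pages=1, per_page=10):
    """Build one Baidu result-page URL per (keyword, page) pair.

    The `site:` operator restricts results to `domain`; `pn` is the offset
    of the first result on the page (assumed to step by `per_page`).
    """
    urls = []
    for kw in keywords:
        for page in range(pages):
            params = {"wd": f"site:{domain} {kw}", "pn": page * per_page}
            urls.append(f"{BAIDU_SEARCH}?{urlencode(params)}")
    return urls
```

For instance, `build_queries("thief.one", ["porn"])` yields `["https://www.baidu.com/s?wd=site%3Athief.one+porn&pn=0"]`.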
Clearly this step means contending with Baidu's anti-crawling measures: the crawler must avoid being blocked and still retrieve Baidu's search results correctly. For crawling search engines, see:
[Crawling search engines: searching for you a thousand times on Baidu](https://thief.one/2017/03/17/%E7%88%AC%E6%90%9C%E7%B4%A2%E5%BC%95%E6%93%8E%E4%B9%8B%E5%AF%BB%E4%BD%A0%E5%8D%83%E7%99%BE%E5%BA%A6/)
[Crawling search engines: Sogou](https://thief.one/2017/03/19/%E7%88%AC%E5%8F%96%E6%90%9C%E7%B4%A2%E5%BC%95%E6%93%8E%E4%B9%8B%E6%90%9C%E7%8B%97/)
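Part of staying unblocked is simply keeping the request rate low and backing off when requests fail. The following is a generic pattern, not the method used in the linked articles: a randomized delay before every request and exponential backoff after failures. `fetch` is any callable you supply that raises on failure or blocking; `PoliteFetcher` is a hypothetical helper name.

```python
import random
import time


class PoliteFetcher:
    """Wrap a fetch function with a randomized delay and retry with backoff.

    `fetch` takes a URL and returns the response body; it should raise an
    exception when the request fails or the crawler has been blocked.
    """

    def __init__(self, fetch, min_delay=2.0, max_delay=5.0, retries=3, sleep=time.sleep):
        self.fetch = fetch
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.retries = retries
        self.sleep = sleep  # injectable so tests need not actually wait

    def get(self, url):
        for attempt in range(self.retries + 1):
            # Random jitter keeps the request rate low and irregular.
            self.sleep(random.uniform(self.min_delay, self.max_delay))
            try:
                return self.fetch(url)
            except Exception:
                # Broad catch for the sketch; narrow it to your client's errors.
                if attempt == self.retries:
                    raise
                # Exponential backoff before the next attempt: 2, 4, 8, ... seconds.
                self.sleep(2 ** (attempt + 1))
```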
After collecting the result links, we need to replay each URL to obtain the page content. This step must be able to execute the JavaScript embedded in the page and follow wherever the page redirects. [PhantomJS](http://thief.one/2017/03/31/Phantomjs%E6%AD%A3%E7%A1%AE%E6%89%93%E5%BC%80%E6%96%B9%E5%BC%8F/) is recommended, though other WebKit-based tools can also be used.
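A headless browser is the robust way to follow JavaScript redirects. As a cheap pre-filter, the page source can also be scanned for redirects that appear literally. The sketch below is only that: the patterns are illustrative assumptions, and obfuscated code (built with string concatenation, `eval`, or loaded from an external script) will slip past it, which is exactly why real browser execution is needed.

```python
import re

# Common literal redirect patterns (illustrative, far from exhaustive).
JS_REDIRECT_PATTERNS = [
    # window.location = "...", location.href = '...', top.location = "..."
    re.compile(r"""(?:window\.|document\.|top\.|self\.)?location(?:\.href)?\s*=\s*['"]([^'"]+)['"]""", re.I),
    # location.replace("...") / location.assign("...")
    re.compile(r"""location\.(?:replace|assign)\(\s*['"]([^'"]+)['"]\s*\)""", re.I),
    # <meta http-equiv="refresh" content="0;url=...">
    re.compile(r"""<meta[^>]+http-equiv=['"]?refresh['"]?[^>]*url=([^'">\s]+)""", re.I),
]


def find_static_redirects(html):
    """Return redirect targets that appear literally in the page source."""
    targets = []
    for pattern in JS_REDIRECT_PATTERNS:
        targets.extend(m.group(1) for m in pattern.finditer(html))
    return targets
```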
Python can be used to parse the page source, title, URL, JavaScript, and so on. The most convenient approach is to extract each of these fields, process the data, and feed it to a machine learning algorithm to train a model.
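A sketch of the parsing step using only the standard library's `html.parser`. The specific features (title, script count, hosts of external scripts, iframe count) are illustrative choices, not the feature set the author used; the resulting dict could be vectorized for a model or checked against rules.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageFeatures(HTMLParser):
    """Collect a few simple features from a page's HTML source."""

    def __init__(self, page_url):
        super().__init__()
        self.page_host = urlparse(page_url).hostname
        self.title = ""
        self.scripts = 0
        self.external_scripts = []  # hosts of <script src> outside the page's own host
        self.iframes = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "script":
            self.scripts += 1
            src = attrs.get("src")
            if src:
                host = urlparse(src).hostname  # None for relative paths
                if host and host != self.page_host:
                    self.external_scripts.append(host)
        elif tag == "iframe":
            self.iframes += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_features(page_url, html):
    parser = PageFeatures(page_url)
    parser.feed(html)
    parser.close()
    return {
        "url": page_url,
        "title": parser.title.strip(),
        "script_count": parser.scripts,
        "external_script_hosts": sorted(set(parser.external_scripts)),
        "iframe_count": parser.iframes,
    }
```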
Regular expressions can be used to build a rule base that matches known black hat SEO features. Machine learning can also be used to classify the pages; with one algorithm, we raised accuracy to about 90%.
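A rule base can be as simple as a list of named regular expressions, each applied to one field of the parsed page. The rules and the threshold below are hypothetical examples (Chinese terms are included because the targeted pages are usually Chinese); a working rule base would be built from real hijacked samples.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    field: str  # which page field the rule inspects, e.g. "title" or "source"
    pattern: re.Pattern


# Hypothetical example rules, not a vetted rule base.
RULES = [
    Rule("gambling_title", "title", re.compile(r"casino|betting|博彩|赌场|彩票", re.I)),
    Rule("porn_title", "title", re.compile(r"porn|色情", re.I)),
    Rule("js_redirect", "source", re.compile(r"window\.location|location\.replace", re.I)),
]


def match_rules(page, rules=RULES):
    """Return the names of rules whose pattern matches the corresponding page field.

    `page` is a dict of text fields such as {"title": ..., "source": ...};
    missing fields are treated as empty strings.
    """
    return [r.name for r in rules if r.pattern.search(page.get(r.field, ""))]


def is_hijacked(page, rules=RULES, threshold=1):
    """Flag a page when at least `threshold` rules fire."""
    return len(match_rules(page, rules)) >= threshold
```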
Summary: External detection is harder. Black hat SEO currently targets mainly Baidu, so this amounts to examining Baidu's search results. Simulating browser access is also a major challenge. Most important of all is the final machine learning step: how to train the model.
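The article does not say which algorithm was used, so the following is a swapped-in illustration of the training step: a multinomial naive Bayes text classifier with add-one (Laplace) smoothing, written from scratch to stay self-contained. `tokenize` splits on runs of word characters, which is adequate for English but would treat a run of Chinese characters as a single token; Chinese pages would need a real word segmenter. Any real training data must come from labeled pages.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on runs of word characters (naive; see lead-in)."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        self.totals = {lbl: sum(c.values()) for lbl, c in self.token_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(count / n_docs)
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore tokens never seen in training
                score += math.log((self.token_counts[label][t] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Train with `NaiveBayes().fit(token_lists, labels)` and call `predict(tokens)` on a new page. In practice a library implementation (for example scikit-learn's `MultinomialNB`) and a held-out test set would be the usual way to train and measure accuracy.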
Most black hat SEO promotes the online gambling industry, which increases the risk that Internet users become hooked on online gambling, and no small number of lives have been ruined by it. Some black hat SEO also promotes prohibited goods such as guns, ammunition, and drugs, which makes things easier for criminals. I used to think the black-market industry did little real harm, but studying black hat SEO showed me that its damage goes well beyond the economic. So who should be held responsible?
First, website administrators cannot escape blame. Their weak security awareness leaves their sites poorly secured, which leads to intrusions and ultimately turns the sites into part of the black-market chain. In several similar incidents I handled myself, webmasters were often indifferent: even after their site had been exploited by black hat SEO, they felt it did no harm to the site itself and showed little concern.
Second, search engines should bear some responsibility, because black hat SEO is aimed mainly at them. Put bluntly, it exploits flaws in search engine ranking algorithms to raise the weight of illegal websites. Most Internet users in China rely on search engines to find information online. Since search engines have the power to decide which resources are shown to users, they also have an obligation to ensure those resources are safe and legitimate.
If you are an ordinary Internet user, the best way to fight black hat SEO is to report illegal websites you come across to the [Security Alliance](https://www.anquan.org/) or to the search engine.
If you are a webmaster, strengthen your website's security and patch vulnerabilities promptly; if you find that your site has been compromised, contact technical staff right away.
[Black Hat SEO Analysis: Summary](https://thief.one/2017/09/28/4/)
[Black Hat SEO Analysis: Concealment](https://thief.one/2017/09/28/3/)
[Black Hat SEO Analysis: Tools](https://thief.one/2017/09/28/2/)
[Black Hat SEO Analysis: Tactics](https://thief.one/2017/09/28/1/)