Invisible, it means that the road is in the same way as Xiyi;
This series is divided into four parts: techniques, tools, stealth, and summary. This article is the techniques part; it introduces the concept of black hat SEO and some commonly used techniques.
First of all, I must admit that black hat SEO is an old topic. I can easily imagine someone in the comment section complaining that the author is just rehashing old material. I agree with that view, but after thinking it over carefully I feel a sense of helplessness. Why can an illicit technique that has been overused for so long, and that causes huge losses to the Internet industry every year, persist to this day? Is it technically hard to defeat, or do vested interests lead people to turn a blind eye?
When I found that public resources contain few introductions to this underground technique, and none of them detailed, the reason was easy to guess. To help build a healthier Internet environment, I will analyze the techniques of the black hat SEO underground industry using real cases, in the hope of resonating with the security community so that we can resist it together.
Because this article was drafted about a year ago and I have not been studying the related techniques since then, please forgive any inaccuracies or outdated content.
Interlude: interestingly, just a few days ago a friend asked me about black hat SEO. A website run by a friend of his had somehow started showing gambling content on its pages, and the content regenerated automatically after being deleted. At a loss, the operator turned to him for help.
SEO stands for search engine optimization: improving a site's search engine ranking through on-site and off-site optimization. Where there is SEO technology there are practitioners. Those known as white hat SEOs are professionals who help improve a site's ranking through fair SEO techniques.
Of course, where there is white there is black. White hat optimization is a slow process: a new site often needs several years of optimization and promotion to earn a good ranking. Some people who wanted to raise their rankings quickly therefore began to study ways of cheating at SEO, which gave birth to black hat SEO. Black hat SEO refers to SEO techniques that raise a site's ranking quickly through cheating or hacking, such as black links (dark chains), station groups, website hijacking (search engine hijacking), and bridge pages. Black hat SEO can raise rankings quickly, but it is cheating that violates search engine rules, so the site is easily penalized or de-indexed ("K'd").
Black hat SEO has many techniques, and they are constantly updated. The most common ones include building station groups with wildcard DNS resolution (pan-parsing), invading high-weight websites to plant dark chains, invading high-weight websites to hijack web pages, tampering with the content of high-weight websites, using the secondary directories of high-weight websites for promotion pages, and modifying the nginx configuration to reverse-proxy a directory. Below I introduce some of the common techniques using real cases.
Wildcard DNS resolution (pan-parsing) makes it possible to build a station group quickly, because one first-level domain can generate countless second-level domain names. A station group tool is also needed: the group requires many pages with different content, and building them by hand is obviously impractical. The purpose of building a station group is to attract a large number of search engine crawlers quickly and to increase the number of the site's pages that search engines index. The following is a screenshot of a station group built on wildcard second-level domain names:
Note that the second-level domain names in the screenshot above are not each bound to their own DNS record. Instead, the host record is set to *, which is wildcard resolution. A program on the server controls which page content is returned for each constructed second-level domain name, so the search engine mistakenly treats every second-level domain name as a separate website.
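From a defender's point of view, a wildcard record is easy to probe for. The following is a minimal sketch, assuming that a domain whose randomly generated subdomains all resolve has a `*` record. The function names, the probe count, and the injectable `resolver` parameter are illustrative choices, not part of any tool mentioned in this article.

```python
import random
import socket
import string
from typing import Callable, Optional


def resolve(hostname: str) -> Optional[str]:
    """Resolve a hostname to an IPv4 address, or return None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def random_label(length: int = 16) -> str:
    """Build a random subdomain label that nobody would configure on purpose."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choice(alphabet) for _ in range(length))


def has_wildcard_record(
    domain: str,
    resolver: Callable[[str], Optional[str]] = resolve,
    probes: int = 3,
) -> bool:
    """Return True if every random subdomain of `domain` resolves to an address.

    Assumption: random labels resolve only when a wildcard (*) record exists.
    """
    answers = [resolver(f"{random_label()}.{domain}") for _ in range(probes)]
    return all(answer is not None for answer in answers)
```

Calling `has_wildcard_record("example.com")` performs real DNS lookups. Some resolvers rewrite non-existent names to their own landing page, so a positive result is worth confirming against the domain's authoritative name server.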
Wildcard resolution has real advantages: it is user-friendly (a visitor who mistypes the second-level domain name still reaches the target site), and pages get indexed by search engines more quickly. Because of these advantages, many webmasters use it to increase how much of their site is indexed. Used improperly, however, wildcard resolution can cause serious harm.
There are many ways to use wildcard resolution for black hat SEO. Depending on whether the website or its DNS server has to be compromised, I divide them into intrusive and non-intrusive methods.
Real case: a few months ago we found a large number of betting pages on an important government website. Screenshots of the evidence follow:
After analysis, I found that this technique uses wildcard resolution. The screenshots show a large number of second-level and even third-level domain names under this government website. These domain names are randomly constructed, and visiting them redirects to illegal pages such as betting and pornography, while visiting the first-level domain returns normal content. Leaving aside for now which techniques the redirect uses, the wildcard record alone shows that the site's DNS resolution records have been tampered with. We have reason to believe that the hacker obtained control over DNS resolution for this domain and pointed it at a server the hacker had prepared. The purpose is obvious: to get search engines to index the second-level and third-level domain names quickly, and thereby to divert traffic to the illegal pages.
From the intrusion characteristics of this government website we reconstructed the course of the event. The hacker obtained DNS resolution permission for the government website through an intrusion (how is still unknown), then added a wildcard record pointing to a server the hacker had prepared. A dynamic-language program on that server returns different pages depending on which second-level domain name is requested. Because the government website itself carries high weight, the second-level domain pages were quickly indexed by Baidu, which diverted traffic to the illegal pages. The advantage of this method is that the website itself does not have to be compromised; only control of the domain's DNS resolution is needed (which, of course, is not easy to obtain).
Real case: a few days ago we found a website (sdddzg.cn) using wildcard resolution for malicious promotion. After examining the site's behavior, we tried visiting second-level domain names that we constructed ourselves. Screenshots follow.
Visiting a constructed second-level domain name:
The result returned:
Note the returned page content and the URL. When we constructed different second-level domain names, the returned content differed, yet resolving the IP addresses showed that every name was served by the same server. Clearly this domain uses wildcard resolution, so how does it control the changes in page content?
The source code of the returned page shows that the source code of jiang.gov.cn is embedded in it.
This technique is not hard to implement in code on the server. First, read the requested second-level domain name; then fetch the content of the site that name refers to and embed its source code in the page. If the constructed second-level domain name is not a complete domain address (for example 1.sdddzg.cn), a random piece of source code is returned instead. The advantage of this approach is that no website has to be compromised, only a server of one's own is needed, but the promotion effect is not as good.
Planting dark chains in web pages is a relatively outdated technique and is used less today, because search engines can now detect this kind of cheating. For completeness I introduce it briefly here. Dark chains, also called black links, are hidden links and one of the cheating techniques of black hat SEO. The purpose of planting a dark chain is simple: to increase the number of external links pointing to a website and so raise its ranking. The main implementations are CSS, JS, and DIV+JS.
For details, see: [Black Hat SEO: Dark Chains](http://thief.one/2016/10/12/%E9%BB%91%E5%B8%BDSEO%E4%B9%8B%E6%9A%97%E9%93%BE/)
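For site owners who want to check their own pages for CSS-based dark chains, here is a minimal detection sketch. It assumes that hidden links are marked by inline styles such as `display:none` or a large negative offset on the link or one of its ancestors. The pattern list and the names `HiddenLinkFinder` and `find_hidden_links` are illustrative, and links hidden through external stylesheets or JS are not covered.

```python
import re
from html.parser import HTMLParser
from typing import List

# Inline-style patterns often used to hide links from human visitors (heuristic list).
HIDING_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|"
    r"(?:left|top|text-indent)\s*:\s*-\d{3,}",
    re.IGNORECASE,
)

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HiddenLinkFinder(HTMLParser):
    """Collect href values of <a> tags that are hidden by an inline style."""

    def __init__(self) -> None:
        super().__init__()
        self.stack = []          # open elements as (tag, is_hidden) pairs
        self.hidden_links = []   # hrefs of links that visitors cannot see

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        style = attributes.get("style") or ""
        parent_hidden = bool(self.stack) and self.stack[-1][1]
        hidden = parent_hidden or bool(HIDING_STYLE.search(style))
        if tag == "a" and hidden and attributes.get("href"):
            self.hidden_links.append(attributes["href"])
        if tag not in VOID_TAGS:
            self.stack.append((tag, hidden))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray closers.
        for index in range(len(self.stack) - 1, -1, -1):
            if self.stack[index][0] == tag:
                del self.stack[index:]
                break


def find_hidden_links(html: str) -> List[str]:
    finder = HiddenLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_links
```

Running `find_hidden_links` over the home page and templates of a site gives a short list of links to review by hand; legitimate hidden menus will also show up, so the output is a starting point rather than a verdict.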
Real case: when I first studied black hat SEO a year ago, I found an interesting technique. It is rather crude, but it works. While writing this article I deliberately looked for a typical case to share; screenshots of the evidence follow.
Some web pages display the value of a URL parameter in the page. Experience tells me that if this feature is handled badly it can lead to XSS vulnerabilities, and I now have to recognize that it is also used for black hat SEO. The attacker places promotional keywords in a URL or POST body (commonly via the search box), then submits the resulting page to a spider pool so that search engines index it. This method is typically used to promote QQ numbers, profit-making websites, and the like (similar to advertising). When someone searches for certain keywords (pornographic resources, for example), the page shows up in the results and promotes the attacker's account or website. This is only a means of promotion, though, and does not involve traffic diversion.
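A site owner can check whether a page echoes query parameters in this way. The sketch below assumes you already have the URL that was requested and the HTML that came back (from an access log replay, for example). The function name and the `min_length` threshold are illustrative.

```python
from typing import List
from urllib.parse import parse_qsl, urlsplit


def reflected_parameters(url: str, body: str, min_length: int = 4) -> List[str]:
    """Return the names of query parameters whose decoded value appears in the body.

    Very short values are skipped because they match page text by coincidence.
    """
    reflected = []
    for name, value in parse_qsl(urlsplit(url).query):
        if len(value) >= min_length and value in body:
            reflected.append(name)
    return reflected
```

Pages that reflect parameters are candidates for a `noindex` directive or for blocking in robots.txt, so that search engines do not index arbitrary text supplied in the URL.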
Web hijacking, also known as website hijacking or search engine hijacking, is currently the most popular practice in black hat SEO. The reasons can be summarized as: easy to get indexed, and hard to detect. It is easy to get indexed because search engines still lack a good mechanism for detecting this kind of cheating, so web hijacking can still divert a large amount of traffic. It is hard to detect because web hijacking is relatively well hidden, and non-technical staff rarely notice it.
Web hijacking can be divided into server-side hijacking, client-side hijacking, Baidu snapshot hijacking, Baidu search hijacking, and so on.
Web hijacking can take the form of a hijacked redirect or of hijacked page content (which differs from directly tampering with the page content). It is currently widely used in highly profitable industries such as private game servers and gambling.
A few months ago I handled a web hijacking case. Gambling-related content had appeared on a government website (outside its news pages), which was clearly non-compliant. Ruling out the possibility that an administrator had added it by mistake, the site had most likely been hacked. I first visited the indexed link. A normal government page appeared in the browser, and an instant later the page jumped to a betting page.
Figure 1 shows the normal government page:
Figure 2 shows the betting page:
The domain name of the betting page is www.0980828.com, clearly not the government website's domain name xxxx.gov.cn. Given this behavior and years of security experience, I could roughly guess that the site had been hijacked. Analyzing the packets captured during the process above showed that a piece of illegal code had been embedded in the site's front page.
This code is hosted on the server 22.214.171.124; a lookup of the server information shows that it is located in Japan.
Requesting this code returns content that redirects to www.0980828.com.
At this point the cause of the redirect is clear: a piece of code was illegally embedded in the xxxx.gov.cn page, and when the page is visited it triggers a jump to the betting page. This is the most basic and common form of search engine hijacking, and it has many variations. Finally, I logged in to the web server to investigate and found that a large number of HTML files had been tampered with: a reference to the external JS had been written at the beginning of each file. The intrusion therefore went as follows: the hacker got into the server through vulnerabilities in the web application (in fact a weak password on the management backend plus arbitrary file upload), then hijacked the pages by tampering with the server's static files in bulk. There are many ways to hijack a web page, far more than one case can cover. For more detail, read on.
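Tampering of this kind can be spotted by listing every external script a page references and comparing the hosts with an allowlist. The following is a minimal sketch, assuming the site owner can enumerate the script hosts that are legitimately used; `trusted_hosts` and the function names are illustrative.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urlsplit


class ScriptSourceCollector(HTMLParser):
    """Collect the src attribute of every <script> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def untrusted_script_sources(html: str, trusted_hosts: Set[str]) -> List[str]:
    """Return script URLs whose host is not in the allowlist.

    Relative URLs have no host and are treated as same-site, so they are skipped.
    """
    collector = ScriptSourceCollector()
    collector.feed(html)
    collector.close()
    suspicious = []
    for src in collector.sources:
        host = urlsplit(src).hostname
        if host is not None and host not in trusted_hosts:
            suspicious.append(src)
    return suspicious
```

Applied to every static HTML file under the web root, a check like this would have flagged the script reference written at the top of each tampered file in the case above. Comparing file hashes against a known-good backup is a stronger complement.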
Server-side hijacking is also called global hijacking. It works by modifying the website's dynamic-language files so that they check the source of each visit and decide what to return. It is typically done by modifying files with suffixes such as asp/aspx/php, so that page content is rendered dynamically.
Files such as Global.asa, Global.asax, conn.asp, and conn.php are special: they are loaded every time a dynamic script runs, before the target script executes. So the attacker only needs to write code in Global.asa that inspects information about the visitor (the source of the visit and so on): if the visitor is a spider, return the keyword page (the site being promoted); if it is a user, return the normal page.
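Because server-side hijacking shows spiders and users different content, one way to look for it is to request the same URL with a browser User-Agent and a spider User-Agent and compare the results. The sketch below makes several assumptions: the two User-Agent strings are example values, the word-set comparison is a crude similarity measure, and the `fetcher` parameter exists only so the comparison can be exercised without network access. Hijack code that verifies spider IP addresses will not be fooled by a spoofed User-Agent.

```python
import re
import urllib.request
from typing import Callable, Set

# Example User-Agent values; adjust them to the spiders and browsers you care about.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")
SPIDER_UA = ("Mozilla/5.0 (compatible; Baiduspider/2.0; "
             "+http://www.baidu.com/search/spider.html)")


def fetch(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Download a page while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def token_set(text: str) -> Set[str]:
    """Split a page into a set of lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def cloaking_score(url: str, fetcher: Callable[[str, str], str] = fetch) -> float:
    """Return 0.0 when both views share all tokens and 1.0 when they share none."""
    as_browser = token_set(fetcher(url, BROWSER_UA))
    as_spider = token_set(fetcher(url, SPIDER_UA))
    union = as_browser | as_spider
    if not union:
        return 0.0
    return 1.0 - len(as_browser & as_spider) / len(union)
```

A score close to 0 means both visitors saw essentially the same page. A high score is a reason to inspect the dynamic-language files named above, though pages with rotating content can also score above 0.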
There are many ways to hijack on the client side, but the two most common are JS hijacking and header hijacking.
JS hijacking plants malicious JS code in the target page to control redirects, hide page content, hijack windows, and so on. The JS can be written directly into the page source after compromising the server, or written into the database, because some pages render database content.
JS hijacking code case:
The following code redirects a visitor who arrives by clicking a search engine result to the betting page, while a visitor who enters the URL directly is sent to a 404 page.
Code analysis: the code decides what to do based on the referer. If the referer is empty, it jumps to the 404 page; if the visit comes from a search engine, which shows up in the referer value, the code performs the redirect. If the goal is only to show different content, the php or asp code can be modified; to hijack visitors arriving from a search engine's search box, JS code can perform the redirect locally in the browser. The JS logic can of course be extended without limit: for example, the first visit from an IP within a day can be handled normally while the rest of the visits are redirected.
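The behavior described in this analysis leaves recognizable traces in script source, which makes a simple static check possible. The following is a heuristic sketch, assuming the script is not obfuscated: it flags code that reads `document.referrer`, names a search engine, and assigns to `location`. The regular expressions and function names are illustrative, and packed or encoded scripts will slip past them.

```python
import re
from typing import Dict

READS_REFERRER = re.compile(r"document\.referrer", re.IGNORECASE)
# Matches assignments such as location = ... or location.href = ...,
# plus calls to location.replace(...) and location.assign(...).
REDIRECTS = re.compile(
    r"\blocation(?:\.href)?\s*=(?!=)|\blocation\.(?:replace|assign)\s*\(",
    re.IGNORECASE,
)
# Heuristic list of search engine names; extend it as needed.
SEARCH_ENGINE = re.compile(r"baidu|google|sogou|so\.com|bing|sm\.cn", re.IGNORECASE)


def referrer_redirect_indicators(script: str) -> Dict[str, bool]:
    """Report which of the three suspicious traits appear in a script."""
    return {
        "reads_referrer": bool(READS_REFERRER.search(script)),
        "redirects": bool(REDIRECTS.search(script)),
        "names_search_engine": bool(SEARCH_ENGINE.search(script)),
    }


def looks_like_referrer_hijack(script: str) -> bool:
    """Return True only when all three traits are present in the same script."""
    return all(referrer_redirect_indicators(script).values())
```

Requiring all three traits keeps false positives down, since analytics scripts often read the referrer without redirecting. Any script that is flagged still needs to be read by a person before drawing conclusions.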
Header hijacking adds a special tag to the head of the HTML code; the code is as follows:
Header hijacking uses the Meta Refresh tag to take traffic away.
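Since this form of hijacking relies on a meta refresh tag, a page can be checked for one with the standard library HTML parser. This is a minimal sketch; the names `MetaRefreshFinder` and `meta_refresh_targets` are illustrative, and the target pattern only handles the common `url=...` form of the `content` attribute.

```python
import re
from html.parser import HTMLParser
from typing import List

# Extracts the redirect target from a content value such as "0; url=http://...".
REFRESH_TARGET = re.compile(r"url\s*=\s*['\"]?([^'\";\s]+)", re.IGNORECASE)


class MetaRefreshFinder(HTMLParser):
    """Collect the redirect targets of <meta http-equiv="refresh"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        match = REFRESH_TARGET.search(attributes.get("content", ""))
        if match:
            self.targets.append(match.group(1))


def meta_refresh_targets(html: str) -> List[str]:
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.targets
```

Any target that points to a domain the site owner does not control deserves a closer look.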
Some hackers like to tamper with page content directly after invading a website, for example by posting their own QQ number or turning the page into an illegal page. I despise hackers who do this, because the practice is both the most harmful and the lowest-level. It is harmful because tampering with page content directly can cause irreparable loss to the website; it is low-level because it is easily discovered and does little for promotion.
Using the secondary directories of a high-weight website means that, after invading the website, the hacker creates many promotion pages under its secondary directories. To divert traffic, hackers usually need a large number of such pages, so they use parasite programs to create them automatically. This method also requires invading a high-weight website and obtaining permissions on its server. Unlike web hijacking, it relies on the weight of the website itself and creates many promotion pages in its directories, whereas web hijacking focuses on hiding itself and can render page content dynamically for each visitor. In practice, hackers often combine the two. This method also differs from using wildcard resolution for black hat SEO: both exploit the weight of a high-weight website, but wildcard resolution uses second-level domain names while this method uses secondary directories. The effect of the two is the same.
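One way to notice pages planted in secondary directories is to group the URLs that a search engine or an access log reports for the site by their first directory, then look at directories the site owner does not recognize. The sketch below assumes such a URL list is available; `known_directories` and the function name are illustrative.

```python
from collections import Counter
from typing import Iterable, Set
from urllib.parse import urlsplit


def unknown_directory_counts(urls: Iterable[str], known_directories: Set[str]) -> Counter:
    """Count pages per first-level directory that is not in the known set."""
    counts = Counter()
    for url in urls:
        segments = [part for part in urlsplit(url).path.split("/") if part]
        if len(segments) < 2:
            continue  # a page in the web root has no secondary directory
        directory = segments[0]
        if directory not in known_directories:
            counts[directory] += 1
    return counts
```

A directory that nobody on the team created and that suddenly holds many indexed pages is a strong hint that pages were generated automatically.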
Cases that use the secondary directories of a high-weight website resemble the wildcard resolution cases, so I will not describe them in detail here. Since this technique often relies on parasite programs, as mentioned above, let us look at what a parasite program is.
[Black Hat SEO Series] Dark Chains
[Black Hat SEO Series: Web Hijacking](https://thief.one/2016/10/12/%E9%BB%91%E5%B8%BDSEO%E4%B9%8B%E7%BD%91%E9%A1%B5%E5%8A%AB%E6%8C%81/)
[Black Hat SEO Series: Page Redirects](https://thief.one/2016/10/10/%E9%BB%91%E5%B8%BDSEO%E4%B9%8B%E9%A1%B5%E9%9D%A2%E8%B7%B3%E8%BD%AC/)
[Black Hat SEO Series: Basics](https://thief.one/2016/10/09/%E9%BB%91%E5%B8%BDSEO%E4%B9%8B%E5%9F%BA%E7%A1%80%E7%9F%A5%E8%AF%86/)
Summary: underground techniques keep improving, and we cannot stand still!
[Black Hat SEO Analysis: Summary](https://thief.one/2017/09/28/4/)
[Black Hat SEO Analysis: Stealth](https://thief.one/2017/09/28/3/)
[Black Hat SEO Analysis: Tools](https://thief.one/2017/09/28/2/)
[Black Hat SEO Analysis: Techniques](https://thief.one/2017/09/28/1/)