What is Baiduspider and How to Manage It

An important factor to consider when managing your website's search engine presence is web crawlers. Search bots, or “spiders,” regularly visit your pages, and what they find determines how your content ranks in search results.

Even though every search engine automatically sends spiders to your pages, you still have some control over how they will interact with your website. Therefore, it is important to manage spider settings correctly in order to have a positive effect on your SEO.

When running a website in China, or in the Chinese language, you will deal with Baidu's spider bot, Baiduspider. Although it is similar to Google's crawler, Googlebot, it is important to understand how Baiduspider differs, how to block it, and how to leverage it in a way that will boost your website performance.


What Is Baiduspider?

Baiduspider is the official name of the crawling spider used by Baidu, the most important search engine in China.

As you probably know, Google Search is not accessible in mainland China. Therefore, you need to work with Baiduspider when targeting the Chinese market.

How Baiduspider Works

As with any other search engine crawler, Baiduspider is automated software that regularly checks the content of your website and collects information that is then used to index your pages in the search engine's database.

Every time Baiduspider visits your pages it will look for specific information such as the structure of your pages, quality of content, content updates, keywords, and so on. The crawling process is divided into two steps: 1) the spider crawls the page and puts it in storage and 2) it creates a list of links on your page to be checked later.
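The two crawling steps described above can be illustrated with a minimal Python sketch. This is not Baidu's implementation; it only shows the general store-then-queue pattern. The names LinkCollector, crawl_step, storage, and frontier, as well as the example.com URLs, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def crawl_step(url, html, storage, frontier):
    """Step 1: store the page. Step 2: queue its links for a later visit."""
    storage[url] = html
    collector = LinkCollector(url)
    collector.feed(html)
    for link in collector.links:
        if link not in storage and link not in frontier:
            frontier.append(link)


# Hypothetical page content; a real crawler would download it first.
storage, frontier = {}, []
page = '<a href="/about">About</a> <a href="https://example.com/blog">Blog</a>'
crawl_step("https://example.com/", page, storage, frontier)
print(frontier)
```

A real crawler would then take the next URL from the queue, download it, and repeat the same step.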

With the data collected, Baidu will rank your content. A webpage is placed higher in the search results if Baidu deems it valuable, or it is downgraded if the content doesn’t meet Baidu’s guidelines.

You likely neither want nor need Baiduspider to visit all your pages. Therefore, you need to understand both how to block Baiduspider and how to make your content appealing to it.

Many webmasters wonder if Baiduspider crawls Chinese websites only, or if it also crawls websites outside of China.

To answer this question, keep in mind that the Baiduspider bot reaches your website like any other visitor. Therefore, if a user in China can access your website, the Baidu robot can too. However, when Baidu crawls an overseas website with high page rendering time or latency, the overall ranking is likely to be low, and you will have little visibility in China. For this reason, if your target is the Chinese market, you should host your website in China.

Baiduspider user-agents

To collect information, Baiduspider bot uses different agents. You will find a list of them here divided by scope:

Baidu’s Product Name | Baidu’s User Agent
Baidu Web/Mobile Search | Baiduspider
Baidu Image Search | Baiduspider-image
Baidu Video Search | Baiduspider-video
Baidu News Search | Baiduspider-news
Baidu Bookmark Search | Baiduspider-favo
Baidu Business Search | Baiduspider-ads
Baidu Union Search | Baiduspider-cpro
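As a rough illustration of how the user agents listed above could be recognized in your server logs, here is a small Python sketch. The function name baidu_product and the sample User-Agent strings are assumptions for illustration, and the matching is a simple substring check. Keep in mind that a User-Agent header can be spoofed, so a match only suggests that a request came from Baidu.

```python
# Mapping of user-agent tokens to Baidu products, taken from the list above.
BAIDU_AGENTS = {
    "baiduspider-image": "Baidu Image Search",
    "baiduspider-video": "Baidu Video Search",
    "baiduspider-news": "Baidu News Search",
    "baiduspider-favo": "Baidu Bookmark Search",
    "baiduspider-ads": "Baidu Business Search",
    "baiduspider-cpro": "Baidu Union Search",
    "baiduspider": "Baidu Web/Mobile Search",
}


def baidu_product(user_agent):
    """Return the Baidu product for a User-Agent header, or None if no token matches."""
    ua = user_agent.lower()
    # Check longer tokens first so "baiduspider-image" is not reported
    # as the generic "baiduspider".
    for token in sorted(BAIDU_AGENTS, key=len, reverse=True):
        if token in ua:
            return BAIDU_AGENTS[token]
    return None


# Sample header values, used here only for illustration.
print(baidu_product("Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"))
print(baidu_product("Baiduspider-image"))
print(baidu_product("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))
```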

Should I Allow the Baiduspider to Crawl My Site?

If you are targeting the Chinese market, you must absolutely let Baiduspider crawl your website. Moreover, you need to understand the basic guidelines set by Baidu to ensure that your pages are indexed properly. We will consider this topic in the next section “How to Rank Higher on Baidu.”

However, if you are not interested in the Chinese market, you may want to prevent the Baiduspider bot from crawling your website, which saves bandwidth for real users. Later in this article, we explain how to block Baiduspider.

How to Leverage Baiduspider to Rank Higher on Baidu

SEO campaigns in China must comply with Baidu's requirements, which resemble Google’s standards in some ways but differ in many others.

For instance, websites hosted overseas do not perform well in China. This is due to the Great Firewall. Unless your website is hosted in China, pages will load slowly or may not be accessible at all. As is true in the western world, a slow website will be penalized by both search engines and users.

Since speed is such a critical factor, it is highly advisable to implement a Content Delivery Network (CDN) on your website. A CDN will play a major role in speeding up your website in China due to the size of the country. With a CDN, your user will have access to your content from the fastest server closest to their position. This will reflect positively when Baiduspider visits your website.

Other major differences are connected to the homepage of your website. Baidu focuses significantly on the homepage, whereas for Google, internal pages are more relevant. Additionally, on Baidu, fresh content is more important than long, in-depth articles.

Another important point is that Baiduspider has a difficult time understanding JavaScript and Flash content. Providing an HTML alternative will make the bot's job easier.

These are just some of the tricks that will help you rank higher on Baidu. However, if you want to boost your website performance in China, we suggest you read our extensive guide on “How to do SEO in China.”

Can Baidu Bot Really Understand the Content of My Website?

Now that you know how the Baidu bot works, you might be wondering whether it can fetch, understand, and render all the elements of your site. This is a crucial step, as you need to be sure that Baidu considers all the elements of your site for SEO positioning.

The good news is that Baidu has its own version of Google Search Console, called Baidu Webmaster Tools or Baidu Ziyuan. If you have a webmaster account, you can run an analysis to detect crawling issues and inspect the HTML content as crawled by the bot.

If you don't have a webmaster account, you can run a test with Merkle's free Fetch and Render tool. Just be sure to select the Baidu bot as the user agent.

How to Block Baidu Spider

The main reason for blocking Baiduspider entirely is that you aren’t interested in targeting the Chinese market. Blocking it avoids wasting bandwidth that your visitors could use and keeps the bot from needlessly slowing down your website.

You may also not want certain pages to be fetched and indexed by the search engine, such as author pages, tag pages, and so on. In this case, you will need to block the Baiduspider bot from these specific pages.

To prevent Baiduspider from crawling your website, you need to add rules to a robots.txt file in the root of your site. You can use this file to block Baiduspider across the whole website or only on selected pages.

These are some examples of the robots.txt you can use:

To prevent Baidu from crawling the whole website:

User-agent: Baiduspider
Disallow: /

To prevent Baidu from crawling video:

User-agent: Baiduspider-video
Disallow: /

To prevent Baidu from crawling images:

User-agent: Baiduspider-image
Disallow: /
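Before deploying rules like these, it can help to check how a parser interprets them. The sketch below uses Python's standard urllib.robotparser module. The sample rules and the example.com URL are assumptions for illustration, and Python's matching logic is not guaranteed to be identical to Baidu's, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Sample rules: block Baiduspider everywhere, allow every other crawler.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical page URL used only for the check.
url = "https://example.com/products/"
baidu_allowed = parser.can_fetch("Baiduspider", url)
other_allowed = parser.can_fetch("Googlebot", url)

print("Baiduspider allowed:", baidu_allowed)
print("Googlebot allowed:", other_allowed)
```

To test a live site instead of an inline string, RobotFileParser also offers set_url() and read() to download the actual robots.txt file.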

Some users report that, despite their robots.txt rules, Baidu may ignore the file and continue crawling the website. Others report that Baidu can crawl quite aggressively, visiting their website too often and too intensively.

If this is the case, and you are not targeting a Chinese audience, you should block Baiduspider to prevent its crawling from hurting your website's speed.

In addition to using robots.txt, another way to solve this problem is to ban Baiduspider's IP addresses on your server. This may take some time, since the search engine uses many IP addresses.

This is a list of some of the many Baidu Spider IPs:

220.181.32.11 220.181.32.16 220.181.32.22 220.181.32.49 220.181.32.51 220.181.32.64 220.181.32.68 220.181.32.98 220.181.50.207 220.181.50.220 61.135.168.131 61.135.168.14 61.135.168.173 61.135.168.39
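As a sketch of how such a ban could be checked in application code, the Python snippet below tests a request's address against the addresses listed above. The names BLOCKED_IPS and is_blocked are assumptions for illustration. The list is partial and may be out of date, and in practice this kind of blocking is usually configured at the firewall or web server rather than in the application.

```python
import ipaddress

# The partial list of Baiduspider IPs given above; it may be incomplete or outdated.
BLOCKED_IPS = {
    ipaddress.ip_address(ip)
    for ip in (
        "220.181.32.11 220.181.32.16 220.181.32.22 220.181.32.49 "
        "220.181.32.51 220.181.32.64 220.181.32.68 220.181.32.98 "
        "220.181.50.207 220.181.50.220 61.135.168.131 61.135.168.14 "
        "61.135.168.173 61.135.168.39"
    ).split()
}


def is_blocked(remote_addr):
    """Return True if the client address is on the block list."""
    try:
        return ipaddress.ip_address(remote_addr) in BLOCKED_IPS
    except ValueError:
        # Not a valid IP address, so it cannot be on the list.
        return False


print(is_blocked("220.181.32.11"))
print(is_blocked("8.8.8.8"))
```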

Another option, if your web server runs PHP, is to have a script detect requests from Baidu and delay its response to each page request by up to 999 seconds.

In addition, if you want to prevent Baidu from showing a cached snapshot of your pages, you can use the following meta tag: <meta name="Baiduspider" content="noarchive">. If you don't want specific pages to be indexed, use the standard noindex meta tag: <meta name="robots" content="noindex" />
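To confirm that a page actually carries these meta tags, a short Python sketch such as the following can help. The class name RobotsMetaParser and the sample page markup are assumptions for illustration. It only reads the tags and says nothing about how Baidu acts on them.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="Baiduspider">."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", "baiduspider"):
            # A content attribute may hold several comma-separated directives.
            values = {
                item.strip().lower()
                for item in attr.get("content", "").split(",")
                if item.strip()
            }
            self.directives.setdefault(name, set()).update(values)


# Sample markup, used here only for illustration.
page = (
    '<head><meta name="Baiduspider" content="noarchive">'
    '<meta name="robots" content="noindex" /></head>'
)
meta = RobotsMetaParser()
meta.feed(page)
print(meta.directives)
```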

Conclusion

Understanding how Baidu crawls your website is important when deciding which pages you want to be indexed and which you do not. Moreover, understanding which elements of your website are inspected by Baiduspider, and how the bot understands them, will help you optimize your resources to achieve better positioning in the Baidu search results.

If you need assistance in optimizing your website for the Chinese market, get in touch with our experts. QPSoftware has been actively present in the Chinese web environment for over a decade, and we have helped many companies become successful in China. Contact us today. We can assist you in optimizing your website for Baidu, creating social media campaigns in China, and using all the marketing tools necessary in the Chinese market.
