The crawl budget is the amount of resources that Google spends on crawling websites. Since these resources are not unlimited, there is, at least in theory, a risk that Google's 'budget' will be too small for all of your site's URLs. But how big are the budget and the risk, and what can you do about it?
Regular and complete crawling of web pages is crucial for content to show up in Google searches. However, not even Google has unlimited resources at its disposal. Therefore, the crawl budget per website is limited.
Crawl budget – a definition
The crawl budget can be described as the maximum number of pages that the Googlebot can crawl.
It is made up of two elements:
- Crawl rate: The crawl rate limit depends primarily on crawl health, i.e. how quickly a website responds to requests. As a website operator, you can also specify a limit in Google Search Console.
- Crawl demand (also called crawl requirement): How high the crawl demand is depends on how popular the URLs are. Stale and outdated content lowers it, while certain changes, such as a domain move, can increase it.
Crawl rate and crawl demand are taken together to calculate the crawl budget.
Note: In addition to the crawl budget, there is also the index budget, which determines how many pages can be indexed. The difference becomes clear on a site with many unreachable subpages that return a 404 error code: crawling these pages consumes crawl budget, but because they cannot be indexed, they do not consume index budget.
According to a Google blog post, URLs with little added value have a negative effect on crawling and indexing:
- Soft error pages (soft 404s)
- Hacked pages
- Duplicate content
- Spam and low-quality content
- Faceted navigation and session IDs
- Infinite spaces
In these cases, you should expect Google to reduce its crawling activities.
How you can influence the crawl budget
It follows from the points above that the first step towards a larger crawl budget is to create high-quality content.
There are a number of other ways to influence crawling and indexing, however:
- Optimise the internal linking so that the crawler can easily find all of the important content
- Opt for a flat page architecture with few layers
- Remove duplicate content or mark it with canonical tags (see the canonical tag example after this list)
- Repair broken links
- Use the robots.txt file to prevent the Googlebot from crawling unimportant pages (see the robots.txt sketch after this list)
- Update content regularly
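The last two measures are easy to illustrate. The snippets below are minimal sketches rather than prescriptions: the domain example.com, the /internal-search/ path and the sessionid parameter are placeholders for whatever duplicate or low-value URLs exist on your own site.

A canonical tag in the head of a duplicate page points Google to the preferred version of the content:

```html
<!-- Placed in the <head> of the duplicate page; the href names the preferred URL -->
<link rel="canonical" href="https://www.example.com/products/blue-shirt/">
```

A robots.txt file in the site root keeps the Googlebot away from pages that do not need to be crawled, for example internal search results or session-ID variants:

```
# robots.txt – example directives for the Googlebot
User-agent: Googlebot
# Internal search result pages (placeholder path)
Disallow: /internal-search/
# Any URL carrying a session ID parameter (placeholder parameter name)
Disallow: /*?sessionid=
```

Keep in mind that robots.txt only controls crawling, not indexing: a blocked URL can still end up in the index if other sites link to it, so canonical tags or noindex remain the right tools for dealing with duplicate content itself.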
How important is the crawl budget?
Google itself is relaxed about this question: owners of small or medium-sized websites with up to a few thousand URLs need not fear that the crawl budget will be insufficient. Prioritisation only makes sense for very large sites and for sites that automatically generate content based on URL parameters.
This does not mean that crawling and indexing are irrelevant for smaller sites. They, too, benefit when the Googlebot can index important pages without problems and ignore inferior content.