How to simulate Googlebot using Chrome (2024)

At DeepCrawl I helped debug thousands of technical SEO issues each year on some of the largest enterprise websites in the world.

I created a Googlebot simulator in Chrome to quickly replicate and debug complex technical SEO issues. I called it the Chromebot technique.

In this guide, I’m going to explain how to make your own Googlebot simulator in Google Chrome to debug complex technical SEO issues.

Click here to jump to the Googlebot Chrome technique.

What is the Chromebot technique?

The Chromebot technique is a simple non-code solution which allows a human to configure Chrome settings so they act like the Googlebot crawler (crawling, not rendering).

It can help SEO specialists identify unique crawling and indexing issues on a website.

Why use this technique?

I’ve used this technique a lot at DeepCrawl when debugging countless client crawling and indexing issues.

It’s a fairly simple but effective non-code technique to help technical SEOs think more like a search engine crawler and less like a human.

Many websites do funny things when a Googlebot user-agent requests pages.

How do you know what settings the Googlebot crawler uses?

All of the settings are based on the time I spent chatting with engineers, studying the documentation around Googlebot, and updating DeepCrawl's Page Rendering Service documentation.

I’ve listed the original documents that I’ve based the settings on:

What do you need for this technique?

All you need is Google Chrome Canary and a Virtual Private Network (VPN).

Why simulate Googlebot in Google Chrome?

There are four core benefits to using this technique which I will briefly explain.

1. Debugging in Google Chrome

I have debugged hundreds of websites in my time at DeepCrawl. Third party web crawling tools are amazing but I’ve always found that they have limits.

When trying to interpret results from these tools I always turn to Chrome to help understand and debug complex issues.

Google Chrome is still my favourite non-SEO tool to debug issues and when configured it can even simulate Googlebot to validate what crawling tools are picking up.

2. Googlebot uses Chromium

Gary clarified that Googlebot uses its own custom-built solution for fetching and downloading content from the web, which is then passed on to the indexing systems.

There is no evidence to suggest that the Googlebot crawler uses Chromium or Chrome; however, Joshua Giardino at IPullRank makes a great argument about Google using Chromium to create a browser-based web crawler.

Google Chrome, like many other browsers, is also based on the open-source Chromium project.

It makes sense then to use a Chromium browser to simulate Googlebot web crawling to better understand your website.

3. Unique SEO insights

Using Google Chrome to quickly interpret web pages like Googlebot can help you understand exactly why there are crawling or indexing issues in minutes.

Rather than waiting for a web crawler to finish running, I can use this technique to quickly debug potential crawling and indexing issues.

I then use the crawling data to see the extent of an issue.

4. Googlebot isn’t human

The web is becoming more complex and dynamic.

It's important to remember that when debugging crawling and indexing issues you are a human and Googlebot is a machine. Many modern sites treat these two users differently.

Google Chrome, which was designed to help humans navigate the web, can now help a human view a site like a bot.

How to setup Googlebot simulator

Right, enough of the why. Let me explain how to create your own Googlebot simulator.

Download Google Chrome

I'd recommend downloading Chrome Canary and not using your own Google Chrome browser (or, if you've switched to Firefox, then use Google Chrome).

The main reason for this is because you will be changing browser settings which can be a pain if you forget to reset them or have a million tabs open. Save yourself some time and just use Canary as your dedicated Googlebot simulator.

Download or use a VPN

If you are outside the United States then make sure you have access to a Virtual Private Network (VPN), so you can switch your IP address to the US.

This is because by default Googlebot crawls from the US, and to truly simulate crawl behaviour you have to pretend to be accessing a site from the US.

Chrome Settings

Once you have these downloaded and set up it’s time to configure Chrome settings.

I have provided an explanation of why you need to configure each setting, but the original idea of using Chromebot came to me when I rewrote the Page Rendering Service guide.

Web Dev Tools

The Web Developer Tools UI is an important part of viewing your website like Googlebot. To make sure you can navigate around the console you will need to move the Web Dev Tools into a separate window.

Remember that your DevTools window is linked to the tab you opened it in. If you close that tab in Google Chrome the settings and DevTools window will also close.

It is very simple to do this; all you need to do is:

  1. Right-click on a web page and click inspect element (or CTRL+SHIFT+I)
  2. Navigate to the right side, click on the 3 vertical dots, and select the far-left dock side option (undock into a separate window).

The Web Dev Tool console is now in a separate window.

User-agent token

A user-agent string – a line of text – is a way for applications to identify themselves to servers or networks. To simulate Googlebot we need to update the browser's user-agent to let a website know we are Google's web crawler.
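The same idea can be sketched outside the browser with a few lines of Python, which is handy for spot-checking whether a server treats Googlebot requests differently. This is a minimal stdlib sketch, not part of the original technique: `fetch_as_googlebot` is a hypothetical helper name, and the Chrome version token in the smartphone string changes over time (check Google's crawler documentation for the current values).

```python
import urllib.request

# Googlebot user-agent tokens, based on Google's published crawler strings.
# The Chrome version token in the smartphone UA changes over time.
GOOGLEBOT_SMARTPHONE = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile "
    "Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
GOOGLEBOT_DESKTOP = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

def fetch_as_googlebot(url: str, mobile: bool = True) -> bytes:
    """Request a URL while identifying as Googlebot via the User-Agent header."""
    ua = GOOGLEBOT_SMARTPHONE if mobile else GOOGLEBOT_DESKTOP
    req = urllib.request.Request(url, headers={"User-Agent": ua})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Example (performs a live network request, so it is left commented out):
# html = fetch_as_googlebot("https://example.com")
```

Comparing the response fetched this way against one fetched with your normal browser UA is a quick first check for user-agent-based cloaking or redirects.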

Command Menu

Use the Command Menu (CTRL + Shift + P) and type “Show network conditions” to open the network condition tab in DevTools and update the user-agent.

Manual

To do this, navigate to the separate Web Dev Tools window and press the Esc button. This will open up the console.

Click on the three little buttons on the left of the console tab.

In the list of options, click on the network conditions. This will open the network conditions tab next to the console tab.

In the network conditions tab, scroll down and untick the 'Select automatically' option under the user agent setting.

Google Chrome will now allow you to change the user-agent string of your browser to Googlebot or Googlebot Mobile.

I usually set it to Googlebot Mobile by default, given mobile-first indexing. Although I'd recommend checking in Google Search Console to see which Googlebot crawls your website most often.

The Googlebot user-agent will automatically use the dev (beta) Chrome version, not the stable version. This isn't usually an issue for 99% of websites, but if you need to you can input a custom UA from stable Chrome.

Now you’ve changed the user-agent, close the console (press ESC again).

Enable stateless crawling

Googlebot crawls web pages stateless across page loads.

The Google Search developer documentation states that this means each new page crawled uses a fresh browser and does not use the cache, cookies, or location to discover and crawl web pages.

Our Googlebot simulator also needs to replicate being stateless (as much as it can) across each new page loaded. To do this you’ll need to disable the cache, cookies, and location in your Chrome.
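Statelessness can also be sketched in code. This is an illustrative stdlib sketch, not Google's implementation: the helper names are hypothetical, and the point is simply that every page load gets a brand-new, empty cookie jar so nothing carries over between fetches.

```python
import http.cookiejar
import urllib.request

def fresh_opener() -> urllib.request.OpenerDirector:
    """Build an opener with a brand-new, empty cookie jar.

    Nothing carries over from any previous page load, mimicking the
    stateless, fresh-browser-per-page behaviour described in Google's docs.
    """
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

def fetch_stateless(url: str) -> bytes:
    """Fetch a URL with a fresh opener and a cache-bypassing request header."""
    opener = fresh_opener()  # new cookie jar for every single page load
    req = urllib.request.Request(url, headers={"Cache-Control": "no-cache"})
    with opener.open(req) as resp:
        return resp.read()
```

A session-based crawler would reuse one opener across URLs; creating a fresh one per fetch is the code-level equivalent of the Chrome settings below.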

Disable the cache

Command Menu

Use the Command Menu (CTRL + Shift + P) and type “Disable Cache” to disable the cache when DevTools is open.

Manual

To disable the cache, go to the Network panel in DevTools and check the 'Disable cache' checkbox.

Disable cookies

Manual

In Chrome navigate to chrome://settings/cookies. In the cookies settings choose the option to "Block third-party cookies".

Disable location

In Chrome navigate to chrome://settings/content/location in your browser. Toggle the "Ask before accessing (recommended)" option to "Blocked".

Disable Service Workers

Googlebot disables interfaces relying on the Service Worker specification. This means it bypasses any Service Worker that might cache data, and fetches URLs directly from the server.

To do this navigate to the Application panel in DevTools, go to Service Workers, and check the ‘Bypass the network’ option.

Once disabled, the browser will be forced to always request a resource from the network and not use a Service Worker.

Disable JavaScript

The Googlebot crawler does not execute any JavaScript when crawling.

The crawling and rendering sub-systems are further explained in the Understand the JavaScript SEO basics guide and Googlebot & JavaScript: A Closer Look at the WRS at TechSEO Boost 2019.

[Diagram: an overview of Googlebot's crawling and rendering systems]

Googlebot is a very complex system and even the diagram above is an oversimplification. However, the Googlebot crawler must first fetch, download, and inspect a web page regardless of rendering.

For more information on how to diagnose rendering issues check out my How to Debug JavaScript SEO issues in Chrome guide.

It's important to make sure we can inspect server-side HTML, HTTP status codes, and resources without JavaScript in our Googlebot simulator.

Command Menu

Use the Command Menu (CTRL + Shift + P) and type “Disable JavaScript” to quickly disable JavaScript.

Manual

To disable JavaScript in Chrome, navigate to DevTools and click on the settings cog.

Then check the ‘Disable JavaScript’ box.

Now when you use your Googlebot simulator you'll only be inspecting the initial server-side HTML. This will help you better understand if there are any link, content, or HTTP status code issues causing the crawler problems.
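A tiny parser sketch makes the point concrete: a non-rendering crawler sees only the anchors present in the raw server-side HTML, while links injected by JavaScript are invisible to it. This is a self-contained stdlib illustration (the sample HTML and class name are made up for the example):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags in raw, server-side HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Raw HTML as the server sent it: one real link, plus a link that would
# only exist after JavaScript runs (so a non-rendering crawler never sees it).
raw_html = """
<html><body>
  <a href="/products">Products</a>
  <script>
    document.body.innerHTML += '<a href="/js-only">JS only</a>';
  </script>
</body></html>
"""

parser = LinkExtractor()
parser.feed(raw_html)
print(parser.links)  # → ['/products']
```

If an important link only appears in the rendered DOM and not in output like this, you have found a crawling dependency on JavaScript.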

Network Panel

Finally, it is time to configure the Network panel. It is in this area in DevTools where you will be spending a lot of time as Googlebot.

The Network panel is used to make sure resources are being fetched and downloaded. It is in this panel that you can inspect the metadata, HTTP headers, content, etc. of each individual URL downloaded when requesting a page.

However, before we can inspect the resources (HTML, CSS, IMG) downloaded from the server like Googlebot, we need to update the headers to display the most important information in the panel.

Go to the Network panel in DevTools (now a separate window). On the table in the panel right click on the column headers and select the headings listed below to be added as columns in the network panel (remove any others not listed).

I have also provided a brief explanation of each heading and why they should be added.

Status

The HTTP status code of the URL being downloaded from the server. Googlebot will alter its crawling behaviour depending on the type of HTTP status code – one of the most critical pieces of information to understand when auditing URLs.
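As a rough mental model, the way a Googlebot-like crawler reacts to status codes can be sketched as a simple mapping. This is a deliberately simplified illustration based on Google's public documentation on how HTTP status codes affect crawling and indexing, not Google's actual implementation:

```python
def crawl_action(status: int) -> str:
    """Roughly how a Googlebot-like crawler reacts to an HTTP status code.

    A deliberately simplified, illustrative mapping only.
    """
    if 200 <= status < 300:
        return "process content for indexing"
    if status in (301, 302, 307, 308):
        return "follow the redirect target"
    if status == 404:
        return "treat URL as not found; may drop from index"
    if status == 410:
        return "treat URL as permanently gone"
    if status == 429 or 500 <= status < 600:
        return "back off and retry later; slow the crawl"
    return "handle case by case"

print(crawl_action(301))  # → follow the redirect target
```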

Scheme

Displays the insecure http:// or secure https:// scheme of the resource being downloaded. Googlebot prefers to crawl and index HTTPS URLs so it's important to get a good understanding of the scheme being used by resources on a page.

Domain

Displays the domain from which the resources were downloaded. It's important to understand if important content relies on an external CDN, API, or subdomain, as Googlebot might have trouble fetching the content.

Remote address

Google Chrome lists the IP address of the host where the resources are being downloaded. As the crawl budget of a website is based on the IP address of the host and not on the domain, it is important to also take into account the IP address of each URL fetched.

Type

The MIME type of the requested resource. It's important to make sure important URLs are labeled with the correct MIME type, as different types of Googlebot are interested in different types of content (HTML, CSS, IMG).

Size

The combined size of the response headers plus the response body, as delivered by the server. It’s important to improve the site speed of a website, as this can help both yourusers and Googlebot access your site quicker.

Time

The total duration, from the start of the request to the receipt of the final byte in the response. The response of your server can affect the crawl rate limit of Googlebot. If the server slows down then the web crawler will crawl your website less.

Priority

The browser's best guess at which resources to load first. This is not how Googlebot prioritises URLs to crawl, but it can be useful to see which resources are prioritised by the browser (using its own heuristics).

Last Modified

The Last-Modified response HTTP header contains the date and time at which the origin server believes the resource was last modified. This response can be used by Googlebot, in combination with other signals, to help prioritise crawling on a site.
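The mechanics of reusing a Last-Modified value can be sketched with a conditional request. This is a stdlib illustration of the standard If-Modified-Since pattern (the helper name and example URL are made up), not a claim about how Googlebot itself is implemented: a server honouring the header answers 304 Not Modified for unchanged pages, letting a crawler skip re-downloading them.

```python
from email.utils import parsedate_to_datetime
import urllib.request

def conditional_request(url: str, last_modified: str) -> urllib.request.Request:
    """Build a request asking the server to return the body only if the
    resource changed since `last_modified` (an HTTP-date string)."""
    return urllib.request.Request(
        url, headers={"If-Modified-Since": last_modified}
    )

# A Last-Modified header captured on a previous crawl can be parsed for
# bookkeeping, then reused verbatim on the next conditional fetch.
previous = "Wed, 21 Oct 2015 07:28:00 GMT"
dt = parsedate_to_datetime(previous)
req = conditional_request("https://example.com/page", previous)
print(req.get_header("If-modified-since"))  # → Wed, 21 Oct 2015 07:28:00 GMT
```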

US IP Address

Once you have updated the Network panel headers in Chrome DevTools your Googlebot simulator is almost ready.

If you want to use it straight away you need to switch to a US IP address.

Googlebot crawls from the United States of America. For this reason, I'd always recommend changing your IP address to the US when using your Googlebot simulator.

It’s the best way to understand how your website behaves when visited by Googlebot. For example, if a site is blocking visitors with US IP addresses or geo-redirects visitors based on their location, this might cause issues with Google crawling and indexing a website.

I, Googlebot Chrome

Once your IP address is switched you are ready to go and have your own Googlebot simulator.

If you want to test to see if it works, go to angular.io or eventbrite.com. These websites require JavaScript to load content and links – with JavaScript disabled these sites won't load content properly in the interface.

Frequently Asked Questions

Does the simulator work for just one tab?

Yes. Chrome DevTools settings apply only to the tab you currently have open. Opening a new tab will cause the Disable JavaScript and user-agent settings to be reset.

Other Chrome-based settings (cookies, service workers) will still be configured.

Does this help to debug JavaScript SEO issues?

Yes, this technique can be used to debug JavaScript SEO issues on a website when comparing view-source to rendered HTML. Although there might be better extensions and tools to do this at scale.

Do I need to update the settings every time?

Once your tab is closed you’ll need to update the following settings:

  • Disable JavaScript
  • Update User-agent token

All other settings will have been saved by the browser.

Why do I need to use Chrome Canary?

I only suggest using this to stop you from messing up your Chrome browser and having to spend time going back and forth between settings.

If you use Firefox or Safari then just download the normal Google Chrome.

What if I've already built this in headless Chrome or through some other automation?

First off, well done! If you’re like me and don’t (currently) have the time/capacity to learn new coding languages then this non-code method is great to get started.
