
How we consume Google APIs for Indexing and URL checking

Ivan Radunovic

In this article, you’ll learn how Jetindexer finds out which pages are not indexed, when it requests their indexing, which limits apply, and how it handles edge cases.

Requesting indexing for a new URL in the Google Search Console interface is straightforward, but doing the same through the API gets complicated quickly.

Since manually submitting URLs is a dealbreaker, we turned to automating this task.

Checking Google Search Results?

Every site has or at least should have a sitemap.

A sitemap is a table of contents for the entire website.

Sitemaps are intended to be used by crawlers or bots; you can read more about them here: Understanding Sitemaps and Their Importance in Indexing.

When you sign up for a Jetindexer account, Jetindexer visits your Search Console account and loads all sites and their sitemaps.

Processing Sitemaps

Every sitemap has many URLs inside.

Depending on its size, we process each sitemap 2 to 6 times a day and check it for updates.

When we detect a new page in the sitemap, we save it into our database and schedule an inspection of it.
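As a rough illustration of this step, the sketch below parses a sitemap and reports URLs that are not yet stored. It assumes a plain `urlset` sitemap in the sitemaps.org 0.9 format; the function names and the `known_urls` set (standing in for the database) are hypothetical and not Jetindexer's actual code.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> set[str]:
    """Return every <loc> value found in a urlset sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def find_new_urls(sitemap_xml: str, known_urls: set[str]) -> set[str]:
    """URLs present in the sitemap but missing from the (hypothetical) store."""
    return extract_urls(sitemap_xml) - known_urls
```

Every URL returned by `find_new_urls` would then be saved and queued for inspection.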

URL Inspection API

To check whether a URL is in the Google index and get its exact status, the URL Inspection API provides the index.inspect method.

First, you need to enable this API; check out the video here.

You could inspect the provided URL with a number of 3rd party tools, but this is the only 100% verified approach.

HTTP request

POST https://searchconsole.googleapis.com/v1/urlInspection/index:inspect

This endpoint accepts 3 properties in the body:

  • inspectionUrl – this is a required string that represents a fully qualified URL to inspect. It must belong to the property specified in the next property.
  • siteUrl – this is a required string that represents the URL of the property as defined in the Search Console.
  • languageCode – optional property, a language code representing the content, default is en-US

If the property in Search Console is a Domain property, siteUrl must start with the sc-domain: prefix.

If it’s a URL-prefix property, siteUrl must end with a trailing slash, like https://example.com/
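The rules above can be sketched as a small helper that prepares the request body. This is only an illustration: the helper names `normalize_site_url` and `build_inspect_body` are made up for this example, and the body still has to be sent as an authenticated POST to the endpoint shown earlier.

```python
def normalize_site_url(site: str, domain_property: bool) -> str:
    """Format siteUrl the way the property type requires."""
    if domain_property:
        # Domain properties are addressed as sc-domain:example.com
        return site if site.startswith("sc-domain:") else f"sc-domain:{site}"
    # URL-prefix properties need a trailing slash.
    return site if site.endswith("/") else site + "/"


def build_inspect_body(inspection_url: str, site_url: str,
                       language_code: str = "en-US") -> dict:
    """Body for index:inspect, using the three properties described above."""
    return {
        "inspectionUrl": inspection_url,
        "siteUrl": site_url,
        "languageCode": language_code,
    }
```

For example, `normalize_site_url("example.com", domain_property=True)` yields `sc-domain:example.com`, while a URL-prefix property such as `https://example.com` gets its trailing slash added.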

Inspection Results

The URL Inspection API returns a JSON response.

The most important key in the response is indexStatusResult. Here is an example of its content:

{
  "coverageState": "Submitted and indexed",
  "crawledAs": "MOBILE",
  "googleCanonical": "https://trguj.me/Sve-za-nju/botanic-therapy-sampon-400ml-24064",
  "indexingState": "INDEXING_ALLOWED",
  "lastCrawlTime": "2023-11-17T17:25:26Z",
  "pageFetchState": "SUCCESSFUL",
  "referringUrls": null,
  "robotsTxtState": "ALLOWED",
  "sitemap": null,
  "userCanonical": "https://trguj.me/Sve-za-nju/botanic-therapy-sampon-400ml-24064",
  "verdict": "PASS"
}

Most properties are self-explanatory. The last one, verdict, tells us whether the URL is present in the Google index.

Verdict can have 5 values:

  • VERDICT_UNSPECIFIED
  • PASS
  • PARTIAL
  • FAIL
  • NEUTRAL

You can see these verdicts inside Jetindexer on the Site Details page.

Site Details page on Jetindexer showing verdicts

We treat only the PASS value as a success. URLs with a PARTIAL or FAIL verdict are considered to have some kind of issue.
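A minimal sketch of that rule, applied to the indexStatusResult object shown earlier. The returned labels ("indexed", "issue", "unknown") are invented for this example, and mapping NEUTRAL and VERDICT_UNSPECIFIED to "unknown" is an assumption, since the article does not say how those two values are handled.

```python
def classify_verdict(index_status_result: dict) -> str:
    """Map the API verdict to a coarse, illustrative status label."""
    verdict = index_status_result.get("verdict", "VERDICT_UNSPECIFIED")
    if verdict == "PASS":
        return "indexed"
    if verdict in ("PARTIAL", "FAIL"):
        return "issue"
    # Assumption: NEUTRAL and VERDICT_UNSPECIFIED carry no clear signal.
    return "unknown"
```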

Limits when Inspecting URLs

The Google Search Console API applies limits per website property.

A single property can request 2,000 URL inspections per day and cannot send more than 600 queries per minute.

If your property has 10,000 URLs, it will take Jetindexer 5 days to inspect every URL in it.
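The arithmetic behind that estimate is a ceiling division of the URL count by the daily quota. The function name below is hypothetical and only illustrates the calculation:

```python
import math

DAILY_INSPECTION_QUOTA = 2000  # per property, as stated above


def days_to_inspect(url_count: int, daily_quota: int = DAILY_INSPECTION_QUOTA) -> int:
    """Number of days needed to inspect every URL once at full daily quota."""
    return math.ceil(url_count / daily_quota)


print(days_to_inspect(10_000))  # 5
```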

Instead of waiting 5 days to load everything, Jetindexer loads all impressions from the last 2 weeks and compares the pages that appear there with your URLs. You will see a P (possibly indexed) label next to these URLs. These pages are still checked with the Inspection API once quota is available.

You can use the Google Cache link in the right column to check a specific URL manually.
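One way to picture this labeling is shown below. It is a sketch under assumptions: the set of pages with impressions is taken as already loaded, `inspected` maps URLs to verdicts already returned by the Inspection API, and the "PENDING" label for everything else is invented for this example.

```python
def label_urls(sitemap_urls: set[str],
               urls_with_impressions: set[str],
               inspected: dict[str, str]) -> dict[str, str]:
    """Assign a display label to every known URL."""
    labels = {}
    for url in sitemap_urls:
        if url in inspected:
            labels[url] = inspected[url]  # real verdict wins
        elif url in urls_with_impressions:
            labels[url] = "P"  # possibly indexed
        else:
            labels[url] = "PENDING"  # hypothetical label: not checked yet
    return labels
```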

Permissions when Inspecting URLs

To inspect URLs in a Search Console property, the user sending the requests needs Owner permission on that property.

Search Console Settings showing Users and permissions

Inside Jetindexer, you’ll see an indicator for the permissions on the property.

Webmaster Scope

Before it can submit URL inspection requests, Jetindexer requires authenticated users to authorize the webmasters scope.

https://www.googleapis.com/auth/webmasters

This scope allows Jetindexer to submit URL inspection requests on behalf of the user.

Request Indexing using Google Indexing API

The Indexing API allows any site to push new content to Google or request a removal.

Sending a request to the Indexing API is equivalent to manually requesting indexing through the Google Search Console UI.

Google Indexing API has the following endpoints:

  • Update a URL
  • Remove a URL
  • Get notification status

We’ll focus only on the Update a URL request since it’s the most important one.

Update a URL request

The Update a URL request informs Google about new or updated pages on your site.

POST https://indexing.googleapis.com/v3/urlNotifications:publish

The body of the request consists only of 2 properties:

  • url – This is a full URL to the page on your site that you want to index
  • type – Type of the request you’re sending, in this case, URL_UPDATED

The Indexing API returns HTTP status 200 when the request succeeds. This does not mean the page is indexed instantly; it only confirms that the request went through.
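As an illustration, the request above could be assembled with Python's standard library as follows. This is a sketch, not Jetindexer's implementation: `build_publish_request` is a hypothetical helper, and `access_token` is assumed to be an OAuth 2.0 access token obtained elsewhere with the indexing scope.

```python
import json
import urllib.request

PUBLISH_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"


def build_publish_request(page_url: str, access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a URL_UPDATED notification."""
    body = json.dumps({"url": page_url, "type": "URL_UPDATED"}).encode("utf-8")
    return urllib.request.Request(
        PUBLISH_ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: the token was obtained through a separate OAuth flow.
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; a 200 response then means the notification was accepted, not that the page is already indexed.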

Limits when requesting Indexing

Here, the limits apply at the GCP project level: by default, one project can send 200 indexing requests per day.

In order to increase the number of requests sent daily, you can connect multiple GCP projects.

You can read more about Connecting a GCP project in our guide.

Jetindexer handles projects and their usage limits for you. You can connect multiple projects and decide which Search Console Property can submit requests and how many.
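To make the idea concrete, here is one possible way to spread requests across several projects. The `Project` class and the "pick the project with the most remaining quota" strategy are assumptions made for this sketch; the article does not describe how Jetindexer actually allocates quota.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    name: str
    daily_limit: int = 200  # default Indexing API quota per project
    used_today: int = 0

    @property
    def remaining(self) -> int:
        return self.daily_limit - self.used_today


def pick_project(projects: list[Project]) -> Optional[Project]:
    """Reserve one request on the project with the most quota left."""
    available = [p for p in projects if p.remaining > 0]
    if not available:
        return None  # every project is exhausted for today
    chosen = max(available, key=lambda p: p.remaining)
    chosen.used_today += 1
    return chosen
```

With two default projects connected, this allows 400 indexing requests per day before `pick_project` starts returning `None`.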

Indexing Scope

In order to submit requests on behalf of a user, Jetindexer needs indexing scope.

https://www.googleapis.com/auth/indexing

Same as for the Webmaster Scope, the user needs to be an owner of the Search Console Property in order to use Indexing API on it.

Scheduling tasks

In order not to break limits, Jetindexer uses a system of scheduled jobs and multiple queues.

Besides the limits already mentioned for the Indexing and URL Inspection APIs, there are also limits at the authenticated-user level.

A single user can submit 60 requests a minute in total to these APIs. Spreading tasks between GCP Projects and tracking all these limits is a challenging task.
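A per-user limit like this can be enforced with a sliding-window limiter. The class below is a generic sketch of that technique, not a description of Jetindexer's queue system; the class name and the injectable `clock` parameter (used to make it testable) are choices made for this example.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds interval."""

    def __init__(self, max_calls: int = 60, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._calls = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self) -> bool:
        """Return True and record the call if it fits in the window."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self._window:
            self._calls.popleft()
        if len(self._calls) < self._max_calls:
            self._calls.append(now)
            return True
        return False
```

A scheduled job would call `try_acquire()` before each API request and put the task back on the queue when it returns `False`.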

Priorities when Checking and Submitting URLs

Inside Jetindexer, you can decide which sitemaps of the property will be checked and indexed.

This is helpful when you have too many pages to fit in a daily limit.

In a number of cases, there are sitemaps that are not a priority. With this feature, you can stop checking them and spend all your requests on the other more important sitemaps of the property.

Handling Timeouts

Timeouts can occur at any step of the process:

  • timeout when loading a sitemap
  • timeout when inspecting URLs
  • timeout when submitting URLs

The most common timeouts happen on client servers when their sitemaps are too large.

It’s super important to follow best practices regarding the size of sitemaps. Read about them here.

Jetindexer has a robust retry mechanism when timeouts occur.
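The article does not describe how the retry mechanism works, so the following is only a generic sketch of one common approach, retrying with exponential backoff. The function name, the attempt count, and the delays are all assumptions.

```python
import time


def retry_on_timeout(operation, max_attempts: int = 4, base_delay: float = 1.0,
                     sleep=time.sleep):
    """Run operation(), retrying on TimeoutError with doubling delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Wait 1x, 2x, 4x, ... the base delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

The same wrapper could be applied to each step listed above: loading a sitemap, inspecting a URL, or submitting a URL.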