In this article, you’ll learn how Jetindexer determines which pages are not indexed, when it requests their indexing, which limits apply, and how edge cases are handled.
Requesting indexing for a new URL in the Google Search Console interface is straightforward, but things get complicated quickly once you look into the API.
Since submitting URLs manually is a dealbreaker at any real scale, we automated this task.
Every site has, or at least should have, a sitemap.
A sitemap is a table of contents for the entire website.
Sitemaps are intended to be used by crawlers or bots; you can read more about them here: Understanding Sitemaps and Their Importance in Indexing.
When you sign up for a Jetindexer account, Jetindexer visits your Search Console account and loads all sites and their sitemaps.
Each sitemap lists many URLs.
Depending on the size of a sitemap, we process it 2 to 6 times a day and check for updates.
When we detect a new page in the sitemap, we save it to our database and schedule an inspection for it.
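As a rough illustration of the detection step, the sketch below parses a sitemap and returns the URLs that are not yet known. It assumes a standard sitemaps.org XML document. The function names and the use of a plain set in place of a database are hypothetical and are not Jetindexer's actual implementation.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org documents.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(sitemap_xml: str) -> set:
    """Return every non-empty <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text and loc.text.strip()
    }


def find_new_urls(sitemap_xml: str, known_urls: set) -> set:
    """URLs listed in the sitemap that are not stored yet.

    `known_urls` stands in for a database lookup (an assumption).
    """
    return extract_urls(sitemap_xml) - known_urls


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/new-page</loc></url>
</urlset>"""

new_pages = find_new_urls(SAMPLE, known_urls={"https://example.com/"})
```

Each URL in `new_pages` would then be saved and queued for inspection.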
To check whether a URL is in the Google index and get its exact status, Google provides the index.inspect API method.
First, you need to enable this API. Check out the video here.
You could inspect a URL with a number of third-party tools, but this is the only 100% verified approach.
POST https://searchconsole.googleapis.com/v1/urlInspection/index:inspect
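This endpoint accepts 3 properties in the body: inspectionUrl (the page to inspect), siteUrl (the Search Console property), and languageCode (optional, the language used for translated messages).
If the property in Search Console is a Domain property, siteUrl has to have the sc-domain: prefix.
If it’s a URL-prefix property, siteUrl has to end with a trailing slash, like https://example.com/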
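The siteUrl rules are easy to get wrong, so here is a minimal sketch of building the request body. The field names inspectionUrl, siteUrl, and languageCode come from the URL Inspection API. The helper names and the `is_domain_property` flag are assumptions made for illustration.

```python
def normalize_site_url(prop: str, is_domain_property: bool) -> str:
    """Format a Search Console property the way the API expects it."""
    if is_domain_property:
        # Domain properties need the "sc-domain:" prefix.
        return prop if prop.startswith("sc-domain:") else f"sc-domain:{prop}"
    # URL-prefix properties need a trailing slash.
    return prop if prop.endswith("/") else prop + "/"


def build_inspection_body(page_url: str, prop: str,
                          is_domain_property: bool,
                          language_code: str = "en-US") -> dict:
    """Build the JSON body for a POST to urlInspection/index:inspect."""
    return {
        "inspectionUrl": page_url,
        "siteUrl": normalize_site_url(prop, is_domain_property),
        "languageCode": language_code,
    }


body = build_inspection_body(
    "https://example.com/some-page", "https://example.com", False
)
```

The resulting dictionary is sent as JSON together with an OAuth access token for a user who has access to the property.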
The URL Inspection API returns a JSON response.
The most important key in the response is indexStatusResult. Here is its content:
{
  "coverageState": "Submitted and indexed",
  "crawledAs": "MOBILE",
  "googleCanonical": "https://trguj.me/Sve-za-nju/botanic-therapy-sampon-400ml-24064",
  "indexingState": "INDEXING_ALLOWED",
  "lastCrawlTime": "2023-11-17T17:25:26Z",
  "pageFetchState": "SUCCESSFUL",
  "referringUrls": null,
  "robotsTxtState": "ALLOWED",
  "sitemap": null,
  "userCanonical": "https://trguj.me/Sve-za-nju/botanic-therapy-sampon-400ml-24064",
  "verdict": "PASS"
}
Most properties are self-explanatory. The last one, verdict, tells us whether the URL is present in the Google index.
Verdict can have 5 values: VERDICT_UNSPECIFIED, PASS, PARTIAL, FAIL, and NEUTRAL.
You can see these verdicts inside Jetindexer on the Site details page.
We treat only PASS as a success. PARTIAL and FAIL mean the URL has some kind of issue.
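The verdict handling described above can be sketched as a small function. The returned labels ("indexed", "issue", "unknown") are hypothetical, and treating every other verdict as "unknown" is an assumption, since only PASS, PARTIAL, and FAIL are covered above.

```python
def classify_verdict(index_status_result: dict) -> str:
    """Map an indexStatusResult object to a coarse status label."""
    verdict = index_status_result.get("verdict", "VERDICT_UNSPECIFIED")
    if verdict == "PASS":
        return "indexed"   # the only value treated as a success
    if verdict in ("PARTIAL", "FAIL"):
        return "issue"     # the URL has some kind of problem
    return "unknown"       # assumption: anything else is inconclusive


status = classify_verdict({"verdict": "PASS", "coverageState": "Submitted and indexed"})
```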
The Google Search Console API has limits per website property.
A single property can request 2,000 URL inspections per day and cannot send more than 600 queries per minute.
If your property has 10,000 URLs, it will take Jetindexer 5 days to inspect every URL.
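The estimate is a simple ceiling division. The sketch below assumes the full daily quota of 2,000 inspections is spent on a single property every day. The function name is hypothetical.

```python
import math

DAILY_INSPECTION_QUOTA = 2000  # URL inspections per property per day


def days_to_inspect(url_count: int, daily_quota: int = DAILY_INSPECTION_QUOTA) -> int:
    """Number of days needed to inspect every URL once."""
    return math.ceil(url_count / daily_quota)


days = days_to_inspect(10_000)
```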
Instead of waiting 5 days to load everything, Jetindexer loads all impressions from the last 2 weeks and compares the pages found there with your URLs. You will see a P (possibly indexed) label next to the URLs that received impressions. These pages are still checked with the Inspection API once quota is available.
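A minimal sketch of this labeling logic follows. It assumes the pages with impressions are already loaded into a set and that inspected URLs are stored with their verdicts. The "PENDING" label and the function name are hypothetical. Only the P label comes from Jetindexer.

```python
def label_urls(sitemap_urls, pages_with_impressions, inspected_verdicts):
    """Return {url: label} for every URL in the sitemap.

    An inspected URL keeps its API verdict. An uninspected URL that had
    impressions gets "P" (possibly indexed). Everything else is "PENDING".
    """
    labels = {}
    for url in sitemap_urls:
        if url in inspected_verdicts:
            labels[url] = inspected_verdicts[url]
        elif url in pages_with_impressions:
            labels[url] = "P"
        else:
            labels[url] = "PENDING"
    return labels


labels = label_urls(
    ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    pages_with_impressions={"https://example.com/a", "https://example.com/b"},
    inspected_verdicts={"https://example.com/a": "PASS"},
)
```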
You can use the Google Cache link in the right column to manually check a specific URL.
To check a Search Console property, the user sending requests needs Owner permission on it.
Inside Jetindexer, you’ll see an indicator for the permissions on the property.
Before it can submit URL inspection requests, Jetindexer requires authenticated users to authorize the webmasters scope:
https://www.googleapis.com/auth/webmasters
This scope allows Jetindexer to submit URL inspection requests on behalf of the user.
The Indexing API allows any site to push new content to Google or request a removal.
Sending a request to the Indexing API is equivalent to manually requesting indexing through the Google Search Console UI.
The Google Indexing API supports the following operations: updating a URL, removing a URL, getting the status of a notification, and sending batch requests.
We’ll focus only on Update URL, since it’s the most important one.
The Update URL request informs Google about new or updated pages on your site.
POST https://indexing.googleapis.com/v3/urlNotifications:publish
The body of the request consists of only 2 properties: url (the page address) and type (URL_UPDATED for an update request).
The Indexing API returns status 200 on a successful request. This does not mean the page is indexed instantly. It only confirms that the request was accepted.
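For illustration, here is a sketch that builds the Update URL request with Python's standard library. The body fields url and type, and the value URL_UPDATED, come from the Indexing API. The function name is hypothetical, and the access token is assumed to be an OAuth token obtained elsewhere.

```python
import json
import urllib.request

PUBLISH_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"


def build_publish_request(page_url: str, access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) an Update URL notification."""
    body = json.dumps({"url": page_url, "type": "URL_UPDATED"}).encode("utf-8")
    return urllib.request.Request(
        PUBLISH_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )


request = build_publish_request("https://example.com/new-page", "ACCESS_TOKEN")
```

Passing `request` to `urllib.request.urlopen` would send it. A 200 status then only confirms that Google accepted the notification.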
Here, the limits apply at the GCP project level. By default, you can send 200 indexing requests per day per project.
In order to increase the number of requests sent daily, you can connect multiple GCP projects.
You can read more about Connecting a GCP project in our guide.
Jetindexer handles projects and their usage limits for you. You can connect multiple projects and decide which Search Console Property can submit requests and how many.
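To show why multiple projects help, the sketch below spreads indexing requests across projects that each have their own daily quota. It is not Jetindexer's actual scheduler. The class, the function names, and the "most remaining quota first" strategy are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class GcpProject:
    name: str
    daily_limit: int = 200  # default Indexing API quota per project
    used_today: int = 0

    @property
    def remaining(self) -> int:
        return max(self.daily_limit - self.used_today, 0)


def pick_project(projects):
    """Return the project with the most remaining quota, or None."""
    available = [p for p in projects if p.remaining > 0]
    if not available:
        return None
    return max(available, key=lambda p: p.remaining)


def assign_urls(projects, urls):
    """Assign each URL to a project; returns (assignments, leftover_urls)."""
    assignments, leftover = [], []
    for url in urls:
        project = pick_project(projects)
        if project is None:
            leftover.append(url)  # no quota left today, retry tomorrow
            continue
        project.used_today += 1
        assignments.append((url, project.name))
    return assignments, leftover


projects = [GcpProject("project-a", daily_limit=2), GcpProject("project-b", daily_limit=1)]
assigned, leftover = assign_urls(projects, ["u1", "u2", "u3", "u4"])
```

With two connected projects at the default limit, the same logic would allow 400 requests per day instead of 200.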
To submit requests on behalf of a user, Jetindexer needs the indexing scope:
https://www.googleapis.com/auth/indexing
As with the webmasters scope, the user needs to be an owner of the Search Console property to use the Indexing API on it.
To stay within these limits, Jetindexer uses a system of scheduled jobs and multiple queues.
Besides the limits mentioned for the Indexing and Inspection APIs, there are also limits at the authenticated-user level.
A single user can submit a total of 60 requests per minute to these APIs. Spreading tasks across GCP projects while tracking all of these limits is challenging.
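One common way to respect a per-minute limit is a sliding-window limiter. The sketch below is a generic illustration, not a description of Jetindexer's queue system. The class name and the injectable `clock` parameter are assumptions made so the behavior can be checked without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any window of `period` seconds."""

    def __init__(self, max_calls=60, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._calls = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self) -> bool:
        now = self.clock()
        # Forget calls that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False  # caller should requeue the job and try later


fake_time = [0.0]
limiter = SlidingWindowLimiter(max_calls=2, period=60.0, clock=lambda: fake_time[0])
results = [limiter.try_acquire() for _ in range(3)]
```

A job that gets `False` would stay in the queue until a slot frees up.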
Inside Jetindexer, you can decide which sitemaps of the property will be checked and indexed.
This is helpful when you have too many pages to fit within the daily limit.
Often, some sitemaps are not a priority. With this feature, you can stop checking them and spend all your requests on the more important sitemaps of the property.
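The effect of selecting sitemaps can be sketched as a simple planning step: only URLs from enabled sitemaps compete for the daily budget. The function name, the data shapes, and the "in listed order" priority are assumptions for illustration.

```python
def plan_daily_checks(sitemaps, enabled, daily_budget):
    """Pick URLs to check today.

    sitemaps: dict mapping a sitemap name to its list of URLs.
    enabled: sitemap names to check, in priority order.
    daily_budget: maximum number of URLs to check today.
    """
    plan = []
    for name in enabled:
        for url in sitemaps.get(name, []):
            if len(plan) >= daily_budget:
                return plan
            plan.append(url)
    return plan


sitemaps = {
    "products.xml": ["p1", "p2"],
    "tags.xml": ["t1"],
    "blog.xml": ["b1", "b2"],
}
# tags.xml is disabled, so none of the budget is spent on it.
plan = plan_daily_checks(sitemaps, enabled=["products.xml", "blog.xml"], daily_budget=3)
```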
Timeouts can occur at any step of the process.
The most common ones happen on client servers when their sitemaps are too large.
It’s very important to follow best practices regarding sitemap size. Read about them here.
Jetindexer has a robust retry mechanism for handling timeouts.
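Jetindexer's retry mechanism is not described in detail here, so the following is only a generic sketch of one common approach: retrying with exponential backoff. The function name, the attempt count, and the delays are assumptions. Real HTTP clients often raise their own timeout exceptions, which would need to be caught instead of the built-in TimeoutError.

```python
import time


def retry_on_timeout(operation, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** attempt)


# Simulated sitemap fetch that times out twice before succeeding.
state = {"calls": 0}
delays = []


def flaky_fetch():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("sitemap fetch timed out")
    return "<urlset/>"


result = retry_on_timeout(flaky_fetch, sleep=delays.append)
```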