Why doesn’t Google index my content properly?

Jono Alderson, 11 December 2018 (SEO basics, Technical SEO)

Before Google can rank your content, it needs to discover it, be permitted and able to evaluate it, and index it. If any of these processes go wrong, you might find that your pages don’t show up in the search results.

Most of the time, you can rely on Google to correctly index your content, all by itself. After all, this process is one of the foundational parts of what Google is, and does.

Simply putting your content online isn’t always enough, however.

If you have technical issues, low-quality content, or incorrect indexing controls, those processes of discovery, evaluation and indexing might go wrong.

Discovery

In order to index a page, Google has to be able to find it. That means that something has to link to it – whether that’s another indexed page on the same site, or a page on another site.

Depending on the relevance and quality of the places it’s linked from, it might take a little time for Google to schedule following those links and finding your pages.

That also means that the page can’t be ‘hidden’ – which, for example, might mean content being password protected, blocked via robots.txt, or only available to users in certain countries.
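One way to check whether a robots.txt rule is hiding a page is to test the URL against the rules yourself. The following is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the URLs and the is_crawlable helper are hypothetical, for illustration only, and Python's parser is not guaranteed to interpret every rule exactly as Google does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/example-page/",
        "https://www.example.com/private/report/",
    ):
        status = "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(f"{url}: {status}")
```

To test a live site instead, you could call set_url() with the site's robots.txt address and then read() on the parser before calling can_fetch().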

Evaluation

When Google has discovered the page, it will digest the content (including the HTML code and related assets) to assess the quality and relevance.

During this process, there are a number of things which can result in Google choosing not to index a page. They include:

When it determines that the content of the page is ‘low quality’. E.g., if there’s a very low word count, or if the content is a close/direct duplicate of another page. Particularly ‘over-optimized’ or ‘spammy’ pages may also be ignored.
When it discovers specific indexing instructions on the page (such as a meta robots tag, or a canonical URL tag pointing at a different page). Google will make a judgment call in cases like this about whether it should honor the instructions, but chances are, it’ll choose not to include the page.
When it can’t see/access the content. For websites which rely heavily on JavaScript, or those which include content in complex or non-standard ways, Google might not be able to consume the page content. As far as Google is concerned, it may be an empty (or low-quality) page.
When it has to process heavy JavaScript. Google might schedule a ‘follow-up’ crawl to dig deeper before deciding whether, and what, to index. The time this takes can vary considerably, based on Google’s resourcing and its prioritization of your pages.
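The on-page indexing instructions mentioned above can be spotted by inspecting a page's HTML. Here's a rough sketch using Python's standard html.parser module; it looks for a meta robots tag containing noindex and for a canonical link that points at a different URL. The sample HTML, the class name and the indexing_signals helper are all hypothetical, and the URL comparison is a plain string match, so treat it as a starting point rather than a complete audit.

```python
from html.parser import HTMLParser


class IndexingSignalParser(HTMLParser):
    """Collects meta robots directives and the canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.robots_directives = []
        self.canonical_url = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            # The content attribute is a comma-separated list, e.g. "noindex, follow".
            content = attributes.get("content") or ""
            self.robots_directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]
        elif tag == "link" and "canonical" in (attributes.get("rel") or "").lower().split():
            self.canonical_url = attributes.get("href")


def indexing_signals(html: str, page_url: str) -> dict:
    """Report whether the HTML asks not to be indexed or canonicalizes elsewhere."""
    parser = IndexingSignalParser()
    parser.feed(html)
    return {
        "noindex": "noindex" in parser.robots_directives,
        # Simple string comparison: no URL normalization is attempted.
        "canonical_elsewhere": parser.canonical_url is not None
        and parser.canonical_url != page_url,
    }


# Hypothetical page markup, for illustration only.
SAMPLE_HTML = """
<html><head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="https://www.example.com/other-page/">
</head><body><p>Hello</p></body></html>
"""

if __name__ == "__main__":
    print(indexing_signals(SAMPLE_HTML, "https://www.example.com/example-page/"))
```

Note that this only reads the HTML as delivered. Instructions sent in HTTP headers, or tags injected later by JavaScript, would not show up here.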

Indexing

If you’ve passed all of those tests, then your content should be successfully indexed and should turn up when you search for it.

HINT: try doing a ‘site’ search on Google (e.g., site:https://www.example.com/example-page/) to see if a specific URL has been included in the index.
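If you want to check several URLs this way, you could generate the search links rather than typing each query. A small sketch, assuming the familiar https://www.google.com/search?q=... address format; the site_search_url helper is a hypothetical name, and the links are meant to be opened in a browser by hand.

```python
from urllib.parse import urlencode


def site_search_url(page_url: str) -> str:
    """Build a Google search link for a 'site:' query on one specific URL."""
    query = f"site:{page_url}"
    # urlencode percent-encodes the colon and slashes inside the query value.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(site_search_url("https://www.example.com/example-page/"))
```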

Bear in mind that, once a page is in the index, that doesn’t mean it’ll stay there forever! Google repeatedly crawls and re-evaluates content – so if your quality drops, or if you accidentally prevent Google from evaluating the content, then your page might get dropped out of the index.

Any questions? Let us know in the comments!
