How to Eliminate Extra Pages in Google Index: A Complete Guide

E-commerce websites often end up with more pages in the Google index than they should have. When you analyze the website using the Google Search Console Index Coverage report, you get a clear picture of these extra pages, but you may not know how to eliminate them. In this blog, we explain step by step why these pages are created and how you can remove them.

Let’s start with the basics first.

Why are these extra pages created?

On an e-commerce website, extra pages mainly appear because extra URLs are generated. This happens when people search for products on your website in a specific size, color, or other specific parameter: a new URL is generated for that particular product in that size or color.

This causes a different webpage to be generated. Even though it is not an entirely different product, this webpage gets indexed just like the main product page once Google discovers it via any link.

Once this happens, you have different web pages for the same product in different sizes and colors, all linked to the main product page. All of these URLs are now indexed for a single product, which causes extra web pages on the website.
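
To see how many indexed URLs really belong to one product, you can group them by stripping the variant parameters. The following is a minimal sketch, assuming the variants are expressed as query parameters named size and color (hypothetical names; your store may use different ones, or path segments instead).

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical variant parameters; adjust to whatever your store uses.
VARIANT_PARAMS = {"size", "color"}


def strip_variant_params(url):
    """Return the URL with variant-only query parameters removed."""
    parts = urlsplit(url)
    # Keep only the query parameters that are not variant selectors.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in VARIANT_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_variant_urls(urls):
    """Group URLs that differ only in variant parameters under one base URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_variant_params(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    indexed = [
        "https://example.com/dresses/1234",
        "https://example.com/dresses/1234?color=red",
        "https://example.com/dresses/1234?color=red&size=m",
    ]
    for base, variants in group_variant_urls(indexed).items():
        print(base, "has", len(variants), "indexed URL(s)")
```

Any group with more than one URL is a candidate for the cleanup described in the rest of this guide.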

How to Eliminate Extra Pages?

Now that we have understood the basics of how these extra pages are generated and how the extra URLs get indexed, let’s look at how to get rid of them.

Google says that using canonical tags is the best approach to eliminate extra pages from the Google index. With canonical tags, you can point all the product variation URLs to the same main product page and reduce the number of extra web pages indexed.

Furthermore, Google says that canonical tags are the best available approach for pages that are identical and differ only in color and size.

Google also said that, “A canonical URL is the URL of the page that Google thinks is most representative from a set of duplicate pages on your site. For example, if you have URLs for the same page (example.com?dress=1234 and example.com/dresses/1234), Google chooses one as canonical. The pages don’t need to be absolutely identical; minor changes in sorting or filtering of list pages don’t make the page unique (for example, sorting by price or filtering by item color).”

Additionally, “If you have a single page that’s accessible by multiple URLs, or different pages with similar content … Google sees these as duplicate versions of the same page. Google will choose one URL as the canonical version and crawl that, and all other URLs will be considered duplicate URLs and crawled less often.”

How does Google decide between the canonical URL and the other versions? If you don’t tell Google which URL is canonical, it will make the decision itself, or it may treat both of them as being of equal weight.
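
As an illustration, a canonical tag is a link element with rel="canonical" placed in the head of each variant page. Here is a minimal sketch that builds such a tag, assuming the main product page is simply the variant URL without its query string. That rule and the function name are assumptions for this example, not something Google prescribes.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(variant_url):
    """Build a canonical link tag that points a variant URL at its main page.

    Assumption: the main product page is the same URL with the query string
    and fragment removed.
    """
    parts = urlsplit(variant_url)
    main_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Escape the URL so it is safe inside an HTML attribute.
    return '<link rel="canonical" href="{}">'.format(escape(main_url, quote=True))


if __name__ == "__main__":
    print(canonical_link_tag("https://example.com/dresses/1234?color=red&size=m"))
```

The same tag would be emitted on every size and color variation, so all of them name one URL as the preferred version.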

Alternative Approach

Apart from using canonical tags, there are two alternative ways to get rid of extra pages:

Using robots meta tags to block individual pages
Using robots.txt to block pages

For better understanding:

Robots.txt

The problem with robots.txt is that it does not guarantee that Google will remove the extra pages from the index.

On this Google Search Central said: “A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. Also, a disallow directive in robots.txt does not guarantee the bot will not crawl the page. That is because robots.txt is a voluntary system. However it would be rare for the major search engine bots not to adhere to your directives.”

This method is not an optimal choice and is not recommended by Google, but it is an alternative to canonical tags for getting rid of extra pages.
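
If you do use robots.txt, it helps to check which URLs a rule actually blocks before deploying it. Below is a minimal sketch using Python's standard urllib.robotparser, assuming a hypothetical rule that disallows everything under /variants/ (your variation URLs may live elsewhere).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks every URL under /variants/ for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /variants/
"""


def is_crawl_allowed(url, user_agent="Googlebot"):
    """Return True if the rules above allow the given crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/variants/1234-red",
        "https://example.com/dresses/1234",
    ):
        print(url, "allowed" if is_crawl_allowed(url) else "blocked")
```

Note that this standard-library parser does simple prefix matching, so it may not evaluate wildcard patterns the way Google's crawler does. Treat it as a rough check, and remember that blocking crawling is not the same as removing a page from the index.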

Robots Meta Tags

On robots meta tags, Google says: “The robots meta tag lets you utilize a granular, page-specific approach to controlling how an individual page should be indexed and served to users in Google Search results.”

The best approach is to place the robots meta tag in the <head> of a webpage. Afterwards, either prompt the bots to recrawl the page by submitting an XML sitemap, or let the bots crawl naturally, which can take around 90 days.

Once you have added the robots meta tag and the bots come back to crawl the page, they encounter the tag and stop showing the page in the search results.
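
To confirm that a variant page really carries the tag, you can scan its HTML for a robots meta tag. The sketch below uses Python's standard html.parser and assumes the directive you rely on is noindex, the standard value for keeping a page out of search results. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def has_noindex(html_text):
    """Return True if the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print("noindex present:", has_noindex(page))
```

Running this over the HTML of each variant URL gives a quick list of pages that are still missing the tag.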

Concluding Thoughts

To eliminate extra pages, the best approach is to use canonical tags, as they point all the variation URLs to the main product page only, which in turn removes the extra pages from the index.

The alternative approaches are robots.txt and robots meta tags, which are useful if you wish to stop these extra pages from being indexed at all. They help you keep these extra pages out of the index and get rid of them without using canonical tags.

Hence, it is advisable to remove extra pages from the Google index for better visibility of your website and web pages. This also gives your customers or users a clearer picture of your website and removes any confusion between its different web pages.

FAQ

How do I remove unwanted links from Google search?

If you wish to remove unwanted links from Google search and eliminate extra pages, you can add canonical tags or report the unwanted links as spam in Google Search Console, which will remove these extra links.

How do I remove 404 pages from Google index?

Locate the HTTP errors and click on them, check for the URLs that you wish to remove from Google search, identify all the webpages that return 404 errors, and remove the selected pages.

How do I remove an old website from Google search?

Find the URL in your Search Console property, open the Removals tool, start a new removal request, select the URL that you wish to remove, and confirm the removal.

How long does it take Google to index a new website?

In general, Google can take anywhere from a few days to a few weeks to index a new website.

Ravi Kumar
