Remove duplicate content URL entries from Google

Friday, March 15, 2013

If you see duplicate URL entries for your site's pages on Google and do not know how to fix them, here is a rundown of how to do it. These duplicate URLs usually come from session URLs or archive pages.
In other words, they are duplicate content URLs that exist on your own site: the site URL in date archive format, the site URL with session entries, or the same URL with different parameters, all showing the same content under different URL entries in the search engine. You can track them down by typing this command into Google:

site:yoursitename.com

Press Enter and you will see your site's URL entries, including those where different parameters lead to the same pages.

If you manage your blog with Blogspot/Blogger, you may notice that a number of archive pages get indexed alongside each new post you create. Search engines probably do not like these duplicate archive pages.
Blogger gives you a way to remove those archive pages from Google's index: you tell Google not to index them, because they are archive pages with duplicate URLs.

Fix the duplicate content URL entries in Blogger:
If you are using the old Blogspot/Blogger interface, go to Settings >> Archive and set it to no archive. If you are using the new interface, you need to add a short piece of code that tells Google not to index these archive pages, which is a pretty straightforward approach. The same code also works with the old interface: just copy the snippet given below and paste it into your template.

Sample duplicate archive page entries:
Typical examples are URLs that end with a date and the word archive, or URLs that contain a special character such as a question mark followed by parameters.

To fix them, open the HTML section of your Blogger template, find the <head> tag, and add the following code right after it.

<b:if cond='data:blog.pageType == &quot;archive&quot;'>
<meta content='noindex,noarchive' name='robots'/>
</b:if>

The same problem can occur on your own website, where session URLs show up as duplicates of your normal page URLs.
Once you fix this issue, Google will show the exact number of published posts in its index.
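
For the own-website case mentioned above, the same robots meta tag can be used outside Blogger. The following is only a minimal sketch: the example.com address and the sessionid parameter are hypothetical, and it assumes your server can emit the tag only when a page is requested through a session URL.

```html
<!-- Hypothetical duplicate copy, e.g. http://www.example.com/page.html?sessionid=12345 -->
<head>
  <title>Example page</title>
  <!-- Ask crawlers not to index or cache this session-URL copy -->
  <meta name="robots" content="noindex,noarchive">
</head>
```

Do not place this tag on the normal version of the page, otherwise the page you want in the index would be dropped as well.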

After Googlebot recrawls your site, the index should list only your actual web pages and posts, with no extra URLs from your own site carrying the same content.
You may see an improvement in ranking after this classic SEO fix.

Some other attributes you can use for noindex:
The following sample targets only the MSN search crawler (msnbot) and asks it not to index the page, so the page does not appear in MSN search results:
<meta name="msnbot" content="noindex">
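
If the aim is only to hide the cached link rather than the whole listing, noarchive is the directive that targets the cached copy, whereas noindex removes the page from the results altogether. A minimal sketch, assuming the crawler honors noarchive under the same msnbot name used above:

```html
<!-- Assumption: msnbot honors noarchive under its own meta name -->
<!-- Keeps the page in the results but asks the crawler not to show a cached link -->
<meta name="msnbot" content="noarchive">
```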
