While browsing around Google Webmaster Central, I happened upon the HTML Improvements section and saw a duplicate title tags warning for replytocom URLs. I was pretty sure that nothing I wrote was duplicated within my own domain or on my other sites. After all, Google frowns upon duplicate content.
Upon further digging, it turns out this is a WordPress issue. When threaded comments are enabled, WordPress adds a ?replytocom=xxx suffix to the post URL in each comment’s Reply link, so every such link leads to the same page under a different URL. I wonder why WordPress does such a thing. Didn’t they know that duplicate content can hurt a site’s ranking on Google, especially with the Panda algorithm?
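To make the duplication concrete, here is a minimal Python sketch that strips the replytocom parameter from a URL. The example.com URLs and the strip_replytocom helper are made up for illustration; they are not part of WordPress or any Google tool.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_replytocom(url: str) -> str:
    """Return the URL with any replytocom query parameter removed."""
    parts = urlsplit(url)
    # Keep every query parameter except replytocom.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "replytocom"
    ]
    # The fragment is dropped as well, since it only affects scrolling in the browser.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Hypothetical URLs, for illustration only.
urls = [
    "https://example.com/my-post/",
    "https://example.com/my-post/?replytocom=12#respond",
    "https://example.com/my-post/?replytocom=34#respond",
]

# All three collapse to a single page once the parameter is removed.
canonical = {strip_replytocom(u) for u in urls}
print(canonical)
```

Running a list of crawled URLs through a helper like this gives a rough idea of how many of them are really the same post.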
Anyway, I found the solution and thought I would share it here, and at the same time keep a personal note for myself on my blog.
To check whether you are also facing this replytocom issue in your WordPress installation, head over to Google Webmaster Central and check under:
-> Search Appearance
--> HTML Improvements
---> Duplicate Title Tags
It will show you how many pages on your WordPress website Google has indexed with duplicate title tags. In my case only about 6 were detected. I’m pretty sure there are more such pages, but somehow the Google bots only managed to crawl those.
To stop Google from crawling these auto-generated replytocom pages, head over to:
-> Crawl
--> URL Parameters
---> Add Parameter
Key in replytocom as the parameter and set the options as per the picture above.
You can check the parameter by looking through “Show example URLs”. I saw that I have many such pages, though I wonder why they weren’t crawled earlier.
Anyway, save the new parameter and wait for Google Webmaster Central to refresh its data.
In addition, you may want to add this rule to the robots.txt file located in the root directory of your website, under the User-agent: * group:
Disallow: /*?replytocom
This should keep not only the Google bots but also any other spiders that honor wildcard rules from crawling your replytocom duplicate content.
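For a quick sanity check of a wildcard rule such as /*?replytocom, the sketch below hand-rolls the matching in Python. It assumes Google-style semantics, where * matches any run of characters and a trailing $ anchors the end of the URL; other crawlers may behave differently. The function names and sample paths are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters, a trailing '$' anchors
    the end, and the pattern otherwise matches as a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' in place of each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path_and_query: str, disallow: str) -> bool:
    """True if the path (including its query string) matches the Disallow pattern."""
    return robots_pattern_to_regex(disallow).match(path_and_query) is not None


rule = "/*?replytocom"
print(is_blocked("/my-post/?replytocom=12", rule))  # a reply link
print(is_blocked("/my-post/", rule))                # the normal post URL
```

Under these assumptions, the rule only matches when replytocom comes right after the question mark. A URL like /?p=5&replytocom=7 would slip through, so sites using plain permalinks may need an extra pattern.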
Apart from the above two methods, a third option is to use a WordPress plugin. However, I decided against installing one, as that would mean more processes running in WordPress. If the above settings work, there is no need to install an additional plugin.
Since I’ve only just set up the above parameters, I’ll have to wait for Google Webmaster Central to refresh its data before I can see the end result.