Wednesday, February 2, 2011

Duplicate content and its issues.

Webmasters often disagree about how Google and other search engines view and treat duplicate content.
Before presenting the results of our in-depth research, we need to put "duplicate content" in proper perspective.

What is duplicate content?
Duplicate content is more or less identical content appearing on the same site or on different sites.
This definition immediately suggests that duplicate content comes in two main types:

A) More or less identical content appearing on the same site
Google classifies these into two types:

1. Duplicate content on the same site that is deceptive in origin and malicious in intent.
In this category, some webmasters deliberately duplicate content on their websites in order to manipulate search engine rankings and web traffic to their advantage.

2. Unintentional duplicate content on the same site, without any deceptive intent.
This can arise without anyone planning it, for example:
* Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
* Store items shown or linked via multiple distinct URLs
* Printer-only versions of web pages
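
A webmaster who wants to find this kind of accidental duplication on their own site could start with something like the sketch below. It is only an illustration, not a description of how Google or any other search engine detects duplicates: it catches pages whose text is identical after lowercasing and collapsing whitespace, and nothing subtler. The URLs, page texts, and the function names `fingerprint` and `group_duplicates` are all hypothetical.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing runs of whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Return groups of URLs whose normalized text is identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical data: the same store item reachable via two distinct URLs.
pages = {
    "http://example.com/item?id=7": "Blue  Widget, 10 cm",
    "http://example.com/widgets/blue": "blue widget, 10 cm",
    "http://example.com/about": "About us",
}
duplicate_groups = group_duplicates(pages)
```

Here `duplicate_groups` holds one group containing the two widget URLs, while the "About us" page is left out because nothing else matches it.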

How do Google and other search engines view and treat more or less identical content appearing on the same site?
Our in-depth research has revealed the following:
In the first scenario, where the duplication is premeditated and malicious or deceptive in origin, Google frowns on it and will take steps to sanction the offending sites, since their action violates Google's webmaster guidelines.
Such sanctions may include complete removal from the Google index.
In the second scenario, where the duplication arises unintentionally and without malicious intent, Google will not penalize the webmaster. Instead, it indexes only the one version of the duplicated pages that it considers the best fit for that content.
The site's listings on the search engine results pages (SERPs) will therefore not be relegated to the supplementary listing, as is often claimed.
Duplication can still affect rankings indirectly, however: if links to the webmaster's pages are split among the various versions, each version ends up with a lower per-page PageRank.
Webmasters are therefore advised to take proactive steps to address duplicate content issues on their websites, so that visitors see the content the webmaster wants them to see.
The following steps can be taken to achieve this:
- Use 301 redirects. If, for example, you have restructured your site, you can use a 301 (permanent) redirect to send visitors and spiders to the updated content.
- Use Webmaster Tools to indicate your preferred domain to Google, and do the equivalent for other search engines.
- Minimize similar content. If, for example, you have similar content on different pages of your site, you can expand one of them enough to distinguish it from the other.
- Be consistent in your internal linking structure. Once you pick a particular format for writing URLs, stick to it.
- Ensure that your sitemap file lists the preferred version of each URL.
- Understand your content management system. For example, a blog may show the same content on the home page, a permalink page, a category page, and an archive page.
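
Several of the steps above (301 redirects, a preferred domain, a consistent URL format) come down to choosing one URL per page and sending every other variant to it. The following is a minimal sketch of that idea, assuming a site whose preferred domain is `www.example.com` and whose `sessionid` and `print` query parameters never change the page content. Both assumptions, and the function names, are hypothetical and would need to match your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"       # assumption: the preferred domain
BARE_HOST = "example.com"                # assumption: the non-preferred variant
IGNORED_PARAMS = {"sessionid", "print"}  # assumption: params that do not change content


def preferred_url(url: str) -> str:
    """Rewrite a URL into the single form the site wants indexed."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = PREFERRED_HOST
    # Drop parameters that only create alternate URLs for the same content.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    path = parts.path or "/"
    return urlunsplit((parts.scheme, host, path, urlencode(query), ""))


def redirect_for(url: str):
    """Return (301, target) if the request should be redirected, else None."""
    target = preferred_url(url)
    return (301, target) if target != url else None


result = redirect_for("http://example.com/shoes?print=1&color=red")
```

In this sketch, `result` is `(301, "http://www.example.com/shoes?color=red")`, while a request that already uses the preferred form gets `None` and is served normally. The same `preferred_url` function could be used when generating internal links and sitemap entries, so that all three agree.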
The following approach to addressing duplicate content issues, however, is not recommended by Google:
- Blocking crawler access to the duplicates on your website, whether with a robots.txt file or by other methods. If search engine bots cannot crawl these pages, they cannot identify them as duplicates and will have to treat them as separate, unique pages.
A more acceptable solution is to allow search engine bots to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects.
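
As an illustration of the rel="canonical" option, the sketch below builds the link element that a duplicate page would carry in its `<head>` to point at the preferred URL. The helper name `canonical_link` and the example URL are hypothetical; how the tag actually gets into your templates depends on your content management system.

```python
from html import escape


def canonical_link(preferred: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    # Escape the URL so characters such as & are valid inside the attribute.
    return '<link rel="canonical" href="{}">'.format(escape(preferred, quote=True))


tag = canonical_link("http://www.example.com/shoes?color=red&size=9")
```

Every variant of the page (the printer version, the session-tagged URL, and so on) would emit the same tag, so crawlers can still fetch the duplicates but are told which URL to treat as the original.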
Where duplicate content causes search engines to crawl too much of a webmaster's site, the webmaster can also adjust the crawl rate setting in Webmaster Tools.
B) More or less identical content appearing on different sites.
Again, this can be of two types:
1. With malicious intent or deceptive in origin
As indicated above, this violates Google's webmaster guidelines, and once notified Google will take the necessary steps to sanction the offending webmasters.
This applies, for example, to scrapers who misappropriate and republish your site's content.
With the measures Google has put in place, scrapers are unlikely to affect the originating webmaster's site rankings. However, a webmaster who is particularly frustrated by such scrapers is at liberty to file a DMCA request to claim ownership of the content and request removal of the other site from Google's index.
2. In line with good practice, e.g. syndicated content
In this case, there is no contravention of Google's webmaster guidelines, and so no penalty results.