Adam Lasnik, Google’s “search evangelist,” has written an informative blog post addressing concerns about duplicate content. I’ve summarized what he wrote and what you can do to avoid the issue.
Duplicate content is a concern when content within or across websites is either identical or very similar. Since Google’s goal is to have a “diverse cross-section of unique content” in the search results, it will filter the less original content out of its listings. In most cases, then, exact duplicates are simply filtered out, as Adam says:
…if your site has articles in “regular” and “printer” versions and neither set is blocked in robots.txt or via a noindex meta tag, we’ll choose one version to list … in the vast majority of cases, the worst thing that’ll befall webmasters is to see the “less desired” version of a page shown in our index.
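To make the “noindex meta tag” part of that quote concrete, here is a minimal Python sketch that checks whether a page carries a robots meta tag with a noindex directive. The sample pages and the names `RobotsMetaParser` and `has_noindex` are my own illustrations, not anything from Adam’s post, and a real crawler may interpret pages differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag contains "noindex"."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the HTML asks robots not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


# Hypothetical pages, for illustration only.
PRINTER_PAGE = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Printer version</body></html>"
)
REGULAR_PAGE = "<html><head><title>Article</title></head><body>Regular version</body></html>"

printer_blocked = has_noindex(PRINTER_PAGE)
regular_blocked = has_noindex(REGULAR_PAGE)
```

With the tag on the printer version only, the regular version is the one left available for listing.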
From a webmaster perspective, Adam offers some useful tips to address these duplicate content issues (among other webmaster tips):
- Use a robots.txt file to block folders that you don’t want the crawler to access (in the “regular” versus “printer” example, you would probably want to Disallow the printer files).
- Use 301 (permanent) redirects after a redesign so that spiders are told where the content has moved and know where to look for it.
- Keep your site structure consistent — it’s just good organization.
- Where possible, use country-specific top-level domains (TLDs). If you have a .de domain, Google is likely to recognize it as a German-language site, and this works better than using subdomains such as de.domain.com to denote German or fr.domain.com to denote French.
- Syndicate carefully. If your content is republished on other sites, make sure each copy links back to the original article. Even so, the Google algorithm will show whichever version it considers most appropriate.
- Tell Google which form of your domain you prefer. Use the webmaster tools to let Google know that you want www.domain.com to be referenced rather than just domain.com.
- Make sure there’s content there. Don’t post “stubs” or placeholders. Google doesn’t want that, and its users aren’t interested in areas on a website devoid of any content.
- Understand your CMS. Know how it creates pages, since it may publish the same content at more than one URL.
- File a complaint. In the worst cases, you can tell Google about infringement issues by filling out a DMCA form to report the offending site.
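As a rough illustration of the robots.txt tip above, the following Python sketch uses the standard library’s `urllib.robotparser` to confirm that a rule blocks the printer versions while leaving the regular articles crawlable. The `/print/` folder, the example URLs, and the helper name `is_crawlable` are assumptions made for the example; your own site layout will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /print/ folder name is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs for the "regular" and "printer" versions of one article.
regular_ok = is_crawlable(ROBOTS_TXT, "http://www.domain.com/articles/duplicate-content.html")
printer_ok = is_crawlable(ROBOTS_TXT, "http://www.domain.com/print/duplicate-content.html")
```

Checking the rules this way before uploading the file is a cheap guard against accidentally disallowing the pages you do want listed.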
Thank you, Adam, for your useful tips and for setting the record straight.