Duplicate Content Penalty and Filter — Do They Exist? How Do They Work?
Duplicate content. This problem causes a lot of confusion among webmasters and bloggers, even though Google has explained it repeatedly, both in the Google Webmaster Help Center and on the Google Webmaster Central Blog. (See the resources section below for further reading.)
Is it so difficult to understand?
It is, to a certain extent, mainly because there are a lot of myths around the Internet. Not everything you see, read, or hear online is true. This includes, but is not limited to, blogs, forums, video sites, article directories, and other sites.
Before you read the details, you need to understand the difference between a duplicate content penalty and a duplicate content filter. The truth is, there is no such thing as a duplicate content penalty.
When most people talk about penalties in search engine rankings, they mean points deducted from a page or site, which make it harder for that page to rank in search engine results pages.
So, there is no penalty involved because of duplicate content. If you are a publisher or blogger who creates useful and unique content, you should not worry about it from this standpoint.
However, there is another kind of duplicate content issue that may cause problems or even a penalty. If your blog consists mainly of content scraped from other sites, you may be penalized, especially if the site as a whole adds no value to the Web.
As of this writing, this kind of penalty is far from perfect, but as search engines improve, we can all hope for better. If there is one thing to learn from the “duplicate content penalty,” it is that bloggers should contribute value instead of using a cookie-cutter approach to rank in search engines.
Types of Duplicate Content
There are two major scenarios for issues related to duplicate content:
- Duplicate content within your own domain. This is when identical content appears in multiple places (URLs) on the same site.
- Cross-domain duplicate content. This is when identical content from your site appears on other domains. It deserves attention mainly when substantive blocks of content are involved.
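To make the first scenario concrete, here is a minimal Python sketch that groups URLs on your own site whose body text is identical. It is only an illustration, not how any search engine detects duplicates. The URLs, the page text, and the helper names `fingerprint` and `group_duplicates` are all hypothetical, and the sketch assumes you have already extracted the main text of each page.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash the text after lowercasing and collapsing runs of whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages):
    """Return groups of URLs whose text is identical after normalization."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical pages: URL -> extracted body text.
pages = {
    "https://example.com/post/hello": "Hello world.  This is my post.",
    "https://example.com/post/hello?print=1": "Hello world. This is my post.",
    "https://example.com/about": "About this blog.",
}
duplicates = group_duplicates(pages)
print(duplicates)
```

Exact hashing only catches identical text. Pages that differ by a sidebar or a date stamp would need a fuzzier comparison, which is beyond this sketch.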
Duplicate Content Is to Be Expected
Duplicate content is an expected issue on the Web. People post the same press release to multiple sites, including their own domain. Certain news sites exist to sell content to other sites, which often republish it as is. This creates multiple copies of the same content on different sites and domains.
Search engines certainly are aware of this possibility.
Certain content management systems and forum software also generate multiple versions of the same content for different uses, such as a version for mobile devices or a version for printing.
As a blogger and webmaster, you have two choices:
- Help search engines resolve duplicate content issues, so that visitors see the version you want them to see in the search engine results pages.
- Let search engines handle the issue. You may not get exactly the result you expect, but search engines do their part to make sure the version shown on the results page is the most relevant one.
In-Site Duplicate Content Issue
Blog software such as WordPress displays the same piece of content multiple times: on the homepage, the post-specific page (permalink), category pages, tag pages, and so on. If you show the full content of each entry, or a large chunk of it, on all of these pages, search engines may have a hard time identifying which page is the representative version.
You can help by using a Sitemap to tell search engines about your preferred version. This is not a guarantee, but it is still a way to indicate your canonical preference. The Sitemap method is also useful for a dynamic website, that is, one that shows the same content for different query parameters.
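As a rough illustration of the Sitemap approach, the Python sketch below builds a minimal Sitemap that lists each preferred URL once. The URLs and the helper names `strip_query` and `build_sitemap` are hypothetical. It also assumes that query parameters on your site never select different content; if your blog uses URLs such as `?p=123` to pick a post, stripping the query string like this would be wrong.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

# Namespace defined by the Sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def strip_query(url):
    """Drop the query string and fragment; keep scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def build_sitemap(urls):
    """Return Sitemap XML that lists each preferred URL exactly once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        preferred = strip_query(url)
        if preferred in seen:
            continue  # Another variant of this page is already listed.
        seen.add(preferred)
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = preferred
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URL variants as they might appear in a crawl of your own site.
urls = [
    "https://example.com/post/hello",
    "https://example.com/post/hello?print=1",
    "https://example.com/post/hello?utm_source=feed",
    "https://example.com/about",
]
sitemap = build_sitemap(urls)
print(sitemap)
```

The three variants of the "hello" post collapse into a single entry, so the Sitemap names only the version you prefer.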
Another way to keep the wrong version out of the search engine listings is to use the robots.txt file to block search engines from crawling the duplicate pages. On a blog, you can also add the nofollow attribute to the a tags that link to those pages, which prevents page authority from being passed to them. Nofollow alone is not enough, because even if you consistently apply it to every link to a certain page on your own site, other bloggers may still link to that page. That is why you need to combine it with robots.txt.
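Before relying on robots.txt rules, it helps to check which URLs they actually block. The sketch below uses Python's standard `urllib.robotparser` module for that. The rules themselves are hypothetical and assume a blog whose tag and category archives live under `/tag/` and `/category/`; adjust the paths to your own URL structure. The helper name `is_crawlable` is also just for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of the archive pages that repeat
# post content, and leave the permalink pages open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /category/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="*"):
    """Return True if the rules above allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(is_crawlable("https://example.com/tag/seo/"))          # False
print(is_crawlable("https://example.com/2009/05/my-post/"))  # True
```

A check like this only tells you what the file permits. Whether a given crawler honors the rules is up to that crawler.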
Cross-Domain Duplicate Content Issue
As explained above, there are certain cases where duplicate content is to be expected. However, search engine spammers are also constantly looking for good-quality content, which they copy as is and put on a different site, with a different design, with the intention of monetizing it.
In the first case, where your content is republished legitimately, you can require that people who publish your content link back to your site. This is possible with article directories, although not all webmasters comply with the guidelines for content use.
In the second case, if the sploggers (spam bloggers) promote the content more heavily than you do, they may end up with a higher ranking than your page.
Unfortunately, there is currently no quick and foolproof solution to this. Google does allow you to file a DMCA request to claim ownership of the content and ask for the copy to be removed from Google’s index, but the process may take a long time.
If search engine traffic is part of your traffic strategy, you should promote your content and build backlinks to your blog. Consider this as important as creating solid blog content, if not more.
Finally, if you choose to publish the same piece of syndicated content on your blog, Google explicitly requests that you enhance and expand the content on your site to make it unique.
Resources
- Duplicate content — Google Webmaster Help Center
- Duplicate content due to scrapers
- Duplicate content summit at SMX Advanced
- Google, duplicate content caused by URL parameters, and you
- Demystifying the duplicate content penalty