The Dirty Truth About Duplicate Content

If you run a website or blog, you’ll eventually have to address the topic of duplicate content. It’s no secret that Google and other search engines place a great deal of value on unique, original content. Websites with unique content tend to rank higher; it’s just that simple. To understand the real effect of duplicate content, and how to prevent it, we must take a deeper look at this issue.

Duplicate Content: The Scoop

Duplicate content can appear in many different ways, the most obvious being webmasters copying content from other websites. There are hundreds of article directories which allow webmasters to copy and publish their articles, assuming they leave the author bio and backlink intact. As long as the webmaster follows the directory’s respective guidelines, they can legally publish the article on their website without fear of being hit with a copyright infringement suit, but this doesn’t necessarily mean it’s a good idea.

Publishing articles, stories, blog posts or any other “bulk” content on your site that’s already been published on other websites will likely offer little-to-no SEO benefit. In fact, it could have a negative effect by lowering your site’s ranking.

Duplicate content may also occur within a website. If you publish the same exact blog post on two different webpages, search engines will view it as duplicate content. This is especially troubling for WordPress users, as WordPress automatically creates multiple locations for new content by default (we’ll get to that later).
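
To make the idea of internal duplicates concrete, here is a minimal sketch of how you might spot pages on your own site that carry the exact same text. It assumes you have already collected each page’s body text into a dictionary; the URLs, the sample text and the function names are hypothetical, and this is not a description of how any search engine detects duplicates.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_exact_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose body text is identical after normalization."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)
    # Keep only fingerprints shared by more than one URL.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical URLs and text, for illustration only.
pages = {
    "https://example.com/blog/my-post/": "Ten tips for better headlines.",
    "https://example.com/category/tips/my-post/": "Ten tips  for better\nheadlines.",
    "https://example.com/about/": "About this site.",
}
print(find_exact_duplicates(pages))
```

Any group this prints is a set of URLs serving the same post, which makes it a candidate for consolidation.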

Matt Cutts Talks About Duplicate Content

In one of his latest YouTube videos, Matt Cutts, head of Google’s Webspam team, answers the question: “How does Google handle duplicate content and what negative effects can it have on rankings from an SEO perspective?” Some people are under the impression that all duplicate content is bad, but Cutts lays this rumor to rest.

“It’s important to realize that if you look at content on the web, something like 25 or 30 percent of all of the web’s content is duplicate content. People will quote a paragraph of a blog and then link to the blog, that sort of thing. So it’s not the case that every single time there’s duplicate content it’s spam, and if we made that assumption the changes that happened as a result would end up probably hurting our search quality rather than helping our search quality,” says Cutts in his latest YouTube Q&A.

Key Points From Cutts’s Video Response:

  • Roughly 25 to 30 percent of the web’s content is duplicate.
  • Duplicate content in the form of legal jargon, terms and conditions, privacy policy, technical specs, etc. is fine.
  • Duplicate content that’s automatically generated through software may be perceived as spam.
  • Google views duplicate content as a collective entity, and as such, it may rank only one of the websites publishing it. (Note: Cutts didn’t reveal how Google chooses which website to rank, although he did suggest PageRank is a factor.)
  • Avoid creating webpages with similar content. If possible, consolidate websites with similar themes and topics together.
  • Duplicate content penalties from Google are rare, although they do still occur.
  • What most webmasters believe is a duplicate content penalty is actually Google determining which webpage is the most relevant.
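
If you want a rough way to judge whether two of your own pages count as “similar,” one common textbook approach is to compare overlapping word sequences (shingles) and compute their Jaccard similarity. The sketch below is only an illustration under that assumption; Cutts did not describe how Google measures similarity, and the function names and shingle size are arbitrary choices.

```python
import re
from typing import Set, Tuple


def shingles(text: str, size: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0  # Both texts are too short to compare.
    return len(sa & sb) / len(sa | sb)


score = jaccard_similarity(
    "the quick brown fox jumps",
    "the quick brown fox sleeps",
)
print(score)
```

A score near 1.0 means the two pages share most of their wording; where you draw the line for “too similar” is a judgment call for your own site.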

Duplicate Content on WordPress Websites

With over 75 million websites and counting, WordPress is the world’s most popular content management system (CMS). In its default out-of-the-box setup, however, it can also suffer from some duplicate content issues, which is something that users need to be aware of.

Let’s say you install WordPress on your website and publish a new post. You may assume this post would only show up in one location, but WordPress may also display it on the archive, author, tag and category pages. Rather than publishing your content to a single URL, it’s now available on five different URLs.

The good news is that you can easily fix duplicate content issues associated with WordPress by using an SEO plugin such as Yoast SEO. You can read more about the Yoast SEO plugin, and other important WordPress SEO issues, in our previous blog post here.

Note: another important step to prevent duplicate content in WordPress is to set your RSS feed to display a summary rather than the entire post. There are websites out there which automatically scrape and publish RSS feeds from WordPress blogs and sites. If your feed displays 100% of your post content, these scrape-and-publish websites will essentially republish all of your content.

Other Duplicate Content Prevention Tips:

  • Use canonical URLs for webpages with similar content.
  • Limit the use of “boilerplate” content on your website.
  • Be consistent with your internal linking.
  • Stick with top-level domains (TLDs) where possible, for example when handling country-specific content.
  • Define your preferred indexing structure in Google Webmaster Tools.
  • If multiple pages contain similar information, consolidate them together.
  • Be conscious of websites wrongfully copying and publishing your content.
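
As an illustration of the first tip, a canonical URL is usually declared with a link element carrying rel="canonical" in the page’s head. The sketch below shows one way to build that tag from a messy URL. The helper names are hypothetical, and dropping the entire query string is an assumption that only holds if your site does not use query parameters to select content.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Lowercase the scheme and host, and drop the query string and fragment.

    Assumption: query parameters (such as tracking tags) never change
    which content the page shows.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the link element to place in the page's head."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


# Hypothetical URL, for illustration only.
print(canonical_link_tag("https://Example.com/blog/my-post/?utm_source=feed#comments"))
```

Every variant of the page would carry the same tag, telling search engines which URL you prefer to have indexed.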