Duplicate content is the boogeyman of the SEO world. Depending on who you ask and what you read, the definition and scope of duplicate content vary wildly. That’s why in this article we will break down what duplicate content really is and clear up the misconceptions about what search engines (Google, mostly) will and won’t tolerate.
What is Duplicate Content?
Well, it’s as simple as it sounds: duplicate content is content that’s in more than one place. Content that matches verbatim (or close to it) the words of another article, or at least, content that is similar in structure and verbiage to another.
What most people think of as duplicate content is a copy/paste of the same article to multiple places around the internet. You can see this kind of thing happen when content is scraped illegally to be posted on less-than-reputable websites, when content is syndicated from one place to another, or when you own multiple websites and post the same content for added reach.
But duplicate content is also the use of the same phrases and sections on a site. If you have a template for guest posts, for instance, that reads exactly the same except for the writer’s name and website, that’s duplicate content.
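To make “verbatim (or close to it)” a little more concrete, here is a minimal sketch of one common way to score how similar two pieces of text are: break each into overlapping three-word chunks (“shingles”) and measure how many chunks they share (Jaccard similarity). This is only an illustration. The function names and sample bios are made up for this example, and nothing here is meant to describe how Google actually detects duplicates.

```python
import re

def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole thing as one chunk.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)

# Hypothetical guest-post template: identical except for the writer's name.
bio_a = "This guest post was written by Jane Doe. You can read more of their work on their website."
bio_b = "This guest post was written by John Roe. You can read more of their work on their website."
unrelated = "Our bakery opens at seven every morning and sells fresh sourdough."

print(jaccard_similarity(bio_a, bio_a))      # identical text
print(jaccard_similarity(bio_a, bio_b))      # templated near-duplicate
print(jaccard_similarity(bio_a, unrelated))  # nothing in common
```

The templated bios score well above the unrelated text but below a perfect match, which is exactly the “close to verbatim” middle ground described above.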
The thing is, between a quarter and a third of the internet is duplicate content. If there’s a website, then its content is somewhere else. No doubt it’s been scraped, its content stolen and later reposted elsewhere. It has blurbs and snippets that have been reused.
But the question of the hour is whether or not any of that has caused Google to penalize it. Has the scraped and plagiarized site lost rankings because of the duplication?
The answer is probably not.
Despite popular belief, it’s pretty hard for duplicate content to get you into trouble. The topic is full of anecdotes, myths, urban legends, and folklore passed down from marketer to marketer over the years. And like any story or tale, it gets taller and more exaggerated as it’s told. Let’s see if we can find the kernels of truth in these urban legends and misconceptions.
Will Google Blacklist Your Site for Duplicate Content?
Variations of this claim float around everywhere: that having even one instance of duplicate content will put you on Google’s bad side.
Maybe it’s a blacklisting from Google, or maybe it’s a penalty and the site ranks lower in various query results. But what if it’s something out of your control? Scrapers take your content against your wishes. Or what if it’s something you do purposefully? Like re-posting guest articles or fleshing out a secondary site or even syndicating content. Perhaps you have a template you use for interviews with the same questions repeated week after week after week.
Is that duplicate content? 100% absolutely yes. Is Google going to blacklist/penalize your site for it? Probably not.
You see, it takes a lot for Google to blacklist a site. If you’re not hosting malware, phishing scams, or just straight-up spam, the likelihood of your being blacklisted is close to nil. And as for penalizing your site, Google has said numerous times that it does not penalize for duplicate content (or as Google’s Matt Cutts puts it in a video on the subject, “duplicate content is not really treated as spam”).
This means that, as Matt says in that video, if there are two websites with the same content, Google’s search algorithms will determine which website is the most relevant and provides the most value to users, and then display that result.
In cases like this, Google recognizes scraped content. Those websites are easy for Google and its algorithms to find. In fact, you’ve probably run across content you knew was stolen and seen how horrible the website was: full of ads, badly formatted, poorly designed, and just a heinous experience altogether. And worst of all? Nothing else on the site helped you with what you were searching for except this one tiny excerpt you found.
That’s why Google takes search intent into account so much. Even if you have duplicate content, if it’s valuable content (and the rest of the site is valuable to users, too), you will be displayed in search results over websites with the exact same article.
Google Penalizes Thin Content, Not Duplicate
The reason that your site would be prioritized in search rankings over the duplicates is that their websites are full of what is known as thin content. That means that articles on these sites are short, the site itself is an unfocused mishmash of topics across many niches and industries, and it probably has an incredibly high bounce rate.
Or, in other words, they’re nearly useless articles on nearly useless sites.
However, it’s not just copy/paste scraper sites that create thin content. No, you can create plenty of thin content of your own without much trouble. So you need to be careful.
Keyword stuffing is the first way you fall into the thin-content hole. Your article sounds like it solves a problem or answers a question, but instead, it just awkwardly works in the keyphrase multiple times while tip-toeing around the subject itself in the name of length and word count.
On the other hand, if you write too-short articles, you’re once again proliferating thin content. You want to answer the question of the searcher, and you also want to go into detail about it and provide as much value as possible. You want to have internal links to other articles you’ve written on the topic, as well as external references. These show Google that you’ve done your research and care about providing your readers value, and they also make it so that when you do get scraped, you get a handful of links back to your site that might one day work as referral traffic. (You would get next to no link juice from those sites.)
Take this article, for instance. We hope that it ranks higher for the question in the title (“Should Publishers Still Be Scared of Duplicate Content?”) than a site that has a couple of paragraphs that rephrase “no, 30% of the web is duplicate content. Just don’t spam and you’ll be fine.” That’s thin content. We are trying to provide value and expand on the idea, rather than leaving it as “nah, don’t worry too much about it.”
Canonical Links and Other Ways to Do Duplicate Content Right
The thing is, we know you’re going to worry about it. At least a little. We do, too. Everyone does. That’s why we want to give you a couple of options for handling the duplicate content that you will inevitably have out there in the wild. These do, however, address only full duplication. For snippet and excerpt and incidental duplication, as long as the content itself is sound, and you’re using the boilerplate as a vehicle for quality, you will be fine.
Using a canonical link tag is probably the best bet you have for keeping your duplicate content in check. While a lot goes on under the hood with a rel="canonical" tag, what it boils down to is that you’re telling Google that the URL you provide in it is the real deal and the one they should index.
For instance, if you have an article published at example.com/your-article, but you want to re-post that content on your own site, you’d include a tag on the reposted one that looks like this:
<link rel="canonical" href="https://example.com/your-article" />
Keep in mind, however, that this is a request for Google to honor, not a demand. Google reserves the right to determine which is the better source to rank based on its internal metrics and algorithms, though it rarely ignores the request.
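If you want to double-check that a reposted page actually carries the tag, a few lines of Python will do it. The sketch below uses only the standard library’s html.parser module; the class and function names (CanonicalFinder, find_canonical) and the sample pages are made up for this example, and a real audit would also need to fetch the page first.

```python
from html.parser import HTMLParser
from typing import Optional

class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")

def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

# Hypothetical reposted page pointing back at the original article.
repost = """
<html><head>
  <title>Your Article (reposted)</title>
  <link rel="canonical" href="https://example.com/your-article" />
</head><body>...</body></html>
"""
no_tag = "<html><head><title>No canonical here</title></head></html>"

print(find_canonical(repost))
print(find_canonical(no_tag))
```

A result of None on a page you have reposted is your cue to add the tag.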
On WordPress, adding the canonical tag can be tricky, so you can use a plugin to easily do it. The aptly named Canonical SEO Content Syndication plugin works very well for this.
Another way you can handle duplicate content — at least in terms of whole articles at a time — is to simply redirect the URL from one to another. If you have reposted or updated an article on your site, you don’t want the old one hanging out, vying for Google’s attention. So you throw a 301 redirect to tell Google and other engines where the new content lives, and that one gets most of the link juice passed its way.
The same applies to articles on other sites, too. If you move domains or have the post across multiple sites, you can choose the primary home by simply redirecting the duplicates. You retain link juice, and Google eventually sorts out that it’s been redirected and begins indexing the target site instead.
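For the curious, here is roughly what a 301 looks like at the server level. This is a minimal sketch using Python’s built-in http.server module, assuming a made-up table of old paths and a made-up target URL. On a real site you would normally set this up in your web server configuration or with a redirect plugin rather than hand-rolling it.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping of retired URLs to the article's one true home.
REDIRECTS = {
    "/old-article": "https://example.com/your-article",
    "/2015/duplicate-post": "https://example.com/your-article",
}

def resolve(path: str) -> tuple[int, dict[str, str]]:
    """Return the status code and headers to send for a requested path."""
    clean = path.split("?", 1)[0].rstrip("/") or "/"
    target = REDIRECTS.get(clean)
    if target is None:
        return 404, {}
    # 301 means "moved permanently": browsers and crawlers should use
    # the URL in the Location header from now on.
    return 301, {"Location": target}

class RedirectHandler(BaseHTTPRequestHandler):
    """Answers GET requests with a 301 for known old paths, else a 404."""

    def do_GET(self) -> None:
        status, headers = resolve(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()

print(resolve("/old-article"))
print(resolve("/something-else"))
```

To try it locally, you could pass RedirectHandler to http.server.HTTPServer and call serve_forever(). The important part is simply the 301 status paired with a Location header pointing at the page you want indexed.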
So…Should You Worry About Duplicate Content?
No. Not really. The chances that you will be penalized for it are minimal, and there are easy-enough ways to protect yourself just in case (canonical links being the primary defense). As long as you create content mindfully and look at why your audience is visiting your site and what answers they need, you won’t have to worry about duplicating content. Unless you’re a content scraper. But you’re not. So you’re safe.
What is your strategy regarding duplicate content on your websites?
Article featured image by Chonnajak / shutterstock.com