The canonical URL is a mystery to many people, and its purpose is sometimes confused with that of a 301 redirect. People might know it has to do with SEO, but aren’t quite sure where or when to apply it. In WordPress in particular (versus a static HTML site), it can be hard to add a canonical URL to each page by hand without a plugin, because theme templates generate the <head> section for many pages at once.
This article aims to clear up some of the questions WordPress users may have about the canonical URL. Non-WordPress users may also find it helpful to learn the principles and then apply them to their own content management system or development practices.
Do note however that you may find this article hard to follow if you don’t have much of a technical background with WordPress or basic HTML and search engine optimization (SEO). We’ll be throwing terms around that you may want to familiarize yourself with first. But at least in this article you’ll have a base from which to start googling more on the subject, to get a fuller picture.
What is the Canonical URL element?
The canonical URL (also referred to as rel=canonical, the canonical tag, and other names) is an element in a page’s HTML that tells search engines which URL is the preferred version when they find multiple versions of the same page on your website, or even elsewhere on the web. It is used to solve some complicated duplicate content issues, and is sometimes a better tool than a 301 redirect.
Google has written a great, simple explanation of the purpose of canonical URLs, and I strongly recommend reading it. They’ve made it as clear as possible.
You might think that your site doesn’t contain duplicate content, and it’s great if you’ve taken care not to repeat text on multiple pages. If you haven’t, the repetition could dilute your rankings in search engines.
If you have repeated text on your site, think about this seriously: if you were a search engine trying to answer a user’s query, would you give that user two identical pages in a list of results? Of course not! That’s useless to them. You would give as much variety as you can in your search engine results pages (SERPs), in the hope of answering their query well.
So if you repeat content over and over on your site, you can, and should, expect that Google is not going to rank all of those pages. That’s a problem if you care about search engine rankings and being found through SERPs.
The Duplicate Content URLs You Didn’t Know You Had
OK, but let’s go back and assume you did the right thing and made sure all your pages are totally unique. You could still have some ‘hidden’ duplicate URLs that you are not aware of (they’re not really hidden; I’m just describing them that way). This may surprise you, but search engines can treat the following URLs as completely separate pages, even though, to you, they are the same and they all display the same content:
http://www.examplesite.com (notice the www?)
https://examplesite.com (notice the https?)
http://www.examplesite.com/ (notice the slash at the end?)
This is why you need the canonical URL in the <head> section of the HTML on all your pages. It tells search engines which version of the above URL types (or others) you want them to treat as the real one.
This means that YES, you need to make a final decision about whether or not to use the www in all your links and web marketing. This needs to be your linking strategy both on and off your site, everywhere you refer to your website. Everyone who uses your URL should know it: employees, subcontractors, partners, directories you list in, people who link to you – everyone.
You also need to decide whether you want the slash at the end, and whether to use https (which you need if you accept sensitive information, like credit card numbers, through your site). Pick one version and stick to it. If I were you, I would pick the one you have been using the most, to make the headache less painful when it’s time to fix your URLs.
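To make the decision concrete, here is a minimal Python sketch of what ‘picking one version’ means. The function name `canonicalize` and its three switches are hypothetical, not part of WordPress or any plugin; it simply rewrites any of the URL variants above into the single form you chose. It also drops ports and #fragments, which is a simplifying assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str, *, https: bool, www: bool, trailing_slash: bool) -> str:
    """Rewrite a URL into the one version you decided to use everywhere (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = "https" if https else "http"

    # .hostname is lowercased and has any port stripped (a simplifying assumption).
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    if www:
        host = "www." + host

    # An empty path and "/" both mean the home page.
    path = parts.path or "/"
    if trailing_slash and not path.endswith("/"):
        path += "/"
    elif not trailing_slash and path != "/":
        path = path.rstrip("/")

    # Keep the query string, drop any #fragment.
    return urlunsplit((scheme, host, path, parts.query, ""))


variants = [
    "http://www.examplesite.com",
    "https://examplesite.com",
    "http://www.examplesite.com/",
]
for v in variants:
    print(canonicalize(v, https=True, www=False, trailing_slash=True))
```

With https, no www and a trailing slash chosen, all three variants print as https://examplesite.com/, which is the single URL your canonical tag would then name.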
Thankfully, if you are using WordPress, most of this will be easy to solve. We will get into the right types of plugins and things you’ll need to set up later in this article.
But there are still more situations in which the canonical URL comes in handy.
Duplicate Content Created By Taxonomies
Let’s say you write an article and include it in multiple blog categories and tags in WordPress (these are called ‘taxonomies’). People do this all the time. Or let’s say you run an e-commerce store and your products appear in multiple categories. Now the same content appears, by intention, on multiple URLs, to make it easier for users to navigate your site. For example:
You want your users to find the chocolate truffles in both categories: “candy” and “food.” That’s fine. But which of these URLs do you want search engines to pay attention to? Remember, they’re not going to rank both of them, so you have to pick. That’s where the canonical URL comes into play. Placed on one of these pages, it tells search engines, “Hey, this is the same content as that other page; please rank that one and not this one.”
Remember this is a request – no search engine is obligated to ‘obey’ your canonicalization, and they can ignore it if they think it’s wrong.
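Here is a small Python sketch of the idea, using two hypothetical URLs for the truffles (your real category URLs will differ). It builds the same canonical tag for every duplicate, so each page, including the preferred one, points at the URL you picked. A self-referencing canonical on the preferred page is common practice; the helper names are illustrative only.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the <link> element that goes in a page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


def tags_for_duplicates(duplicate_urls: list, preferred: str) -> dict:
    """Map every duplicate URL to the same canonical tag pointing at `preferred`."""
    if preferred not in duplicate_urls:
        raise ValueError("the preferred URL should be one of the duplicates")
    tag = canonical_link_tag(preferred)
    return {url: tag for url in duplicate_urls}


# Hypothetical URLs for the same product listed in two categories.
duplicates = [
    "https://examplesite.com/candy/chocolate-truffles/",
    "https://examplesite.com/food/chocolate-truffles/",
]
for url, tag in tags_for_duplicates(duplicates, preferred=duplicates[1]).items():
    print(url, "->", tag)
```

Escaping the URL matters once it contains characters such as `&` in a query string, which must appear as `&amp;` inside an HTML attribute.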
(Note: since we’ve mentioned taxonomies, please have a read through another article I wrote about properly organizing your website content. The use of overlapping content in multiple tags and categories creates other SEO issues in your archives that you’ll need to solve. This is apart from the canonical URL usage described in this article.)
Using the Cross Domain Canonical URL for Duplicate Content on External URLs
There is a final reason you need the canonical URL element that we’ll bring up here (there are more, but they get complicated, and the principle is still the same). It’s when you publish content on your site that also appears on other sites. The most obvious case of this is with syndication, such as press releases.
So, your company publishes a press release and posts it on your site. That’s a legitimate thing to do. But traditionally, press releases are free game for any other content publisher to use. They are meant to be shared and copied. That’s the whole premise behind syndication networks like PRWeb. It’s an age-old form of marketing.
But this creates a conflict with your SEO efforts. To a search engine bot, the press release on your website has the same content as the copies on news sites. So where is the original? Which URL gets ranked in the SERPs? Remember, search engines will generally pick just one.
Usually search engines will pick for you, unless you make a suggestion to them. And you do that with the canonical URL. In the case of press releases however, it’s unlikely you are going to get every single news outlet that publishes your article to point their canonical URL element to your site. Remember, it can be a mystery to many. I also doubt they will all take the time to find out the original source of the content and code their HTML properly. They are publishing several articles a day.
So that leaves you to take care of it on your site. If I were you, I would add a canonical URL to the page containing your press release and point it at the copy on the main syndication network where you originally posted it for distribution. For example, point to the copy of the release on PRWeb.com (if you used that service). Just my two cents.
To give you a real life example of a non-press-release situation when the canonical URL would be appropriate, see this article I wrote on KISSmetrics about a year ago:
Shortly afterwards, Entrepreneur.com picked up the article, because they had an agreement to do so with KISSmetrics. (Remember they had permission! You can’t just do this whenever you feel like it!)
Here is that article’s URL:
Now we have the same content on two URLs. Technically that’s duplicate content, and duplicate content is ‘bad,’ remember? But fear not! If you view the source code of the Entrepreneur.com article, you’ll find this:
<link rel="canonical" href="http://blog.kissmetrics.com/click-worthy/" />
This tells search engines where the content originally appeared, which is the right thing to do. It also removes the suspicion of scraping content in the eyes of search engine bots, which have no other way of knowing your legal right to publish copyrighted work.
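If you want to check a page’s source without reading through it by hand, a short Python sketch using the standard library’s `html.parser` can pull out the canonical tag. The `find_canonical` helper is hypothetical, and it only reads the first `<link rel="canonical">` in the HTML you give it; it does not fetch pages or follow redirects.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us, and by default
        # self-closing tags like <link ... /> are routed here too.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html_text: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.canonical


page = '<html><head><link rel="canonical" href="http://blog.kissmetrics.com/click-worthy/" /></head></html>'
print(find_canonical(page))
```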
You would not want an entire site that only publishes other people’s articles though. The canonical URL element probably won’t help you there as far as ranking goes. So don’t over-use this tactic.
When You Can’t Use the Canonical URL for External Duplicate Content
I want to bring this up because I see it a lot. If you are going to write a company description or personal bio on your website, I would not recommend using that same wording on your social media profiles or other places on the web.
If you write the same thing over and over again on your LinkedIn Company page, your Google Plus Business Page, and elsewhere, you are essentially duplicating content. It wouldn’t be appropriate to use a canonical URL on your ‘About’ page and point it to a social media profile, because you want your ‘About’ page to rank in its own right. In this case, please just write a completely new description for external use. I do this for all my SEO clients.
How to Use the Canonical URL in WordPress
There is more than one way to do this, but I’m just going to give you the best way I know: use the WordPress SEO plugin by Joost de Valk.
As soon as you install this plugin on your site, it takes care of a lot of your SEO, including the canonical URL on what I have called your ‘hidden’ URLs (see above). But there are still settings to pay attention to.
In the screenshot below, you can see that on a single post or page editing screen (this goes for Custom Post Types too if you have them enabled), the WordPress SEO box has many fields and settings. To control the Canonical URL, which you’d want to do for things like press releases and external duplicate content, click on the “Advanced” tab:
This plugin makes the canonical URL element really simple. All you have to do is enter the full URL (not a relative path) of the original source of the content you are publishing on this page. In other words, the page that copies the content is the one that needs the canonical URL in its HTML <head>. The copier has the responsibility to tell search engines that it copied. Make sense?
OK, but that is only the general principle. In the examples above, sometimes you want to declare yourself the copier, even if you are not, because the situation would otherwise be hard to control. Press releases are a good example. You can publish the press release on your company site and, if you want, attribute the original source of the content to the syndication network you are using. In this sense, you would be acting like one of the syndicators.
(By the way, this is just my advice for this scenario. It may not be ‘sanctioned’ advice, and other SEO experts may disagree, so use it at your own discretion. Google claims to be good at finding the original source of content, and the canonical pointer is only a suggestion; search engines can ignore it.)
In other cases, the canonical URL you enter here will be an internal URL on your site that contains the duplicate content. Let’s say, for example, you describe your product on a static page to sell to wholesale customers, but you use that same product description in the e-commerce section of your site where people buy at retail prices. You can tell search engines which URL you prefer them to rank by using the canonical URL field in the WordPress SEO settings.
Note: you do not need to specify the www or non-www version of your URL on each and every page using these settings; the WordPress SEO plugin does that automatically. Only use these settings when the URLs are very different, or the content is on another domain.
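Because the canonical field expects a full URL, it can help to check a value before pasting it in. This Python sketch is an assumption about how you might do that, not something the plugin provides: the hypothetical `is_absolute_url` accepts only http or https URLs that include a domain, and the hypothetical `make_absolute` uses the standard `urljoin` function to turn a relative path into a full URL against your chosen site root.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute_url(value: str) -> bool:
    """True only for full http(s) URLs that include a domain name."""
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def make_absolute(path_or_url: str, site_root: str) -> str:
    """Resolve a relative path against your chosen site root (hypothetical helper)."""
    return urljoin(site_root, path_or_url)


print(is_absolute_url("/press/product-launch/"))
print(make_absolute("/press/product-launch/", "https://examplesite.com/"))
```

A value that already is a full URL passes through `urljoin` unchanged, so running everything through `make_absolute` is harmless.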
HTTPS Sitewide Canonical URL Setting Using WordPress SEO
There’s still more you can do with canonical URLs using the WordPress SEO plugin. If you are using an SSL certificate (such as for e-commerce), you can choose to force your canonical tag to use the https version of your URLs, in case the pages can be reached by both http and https. You would go to SEO > Permalinks and scroll to where it says “Canonical Settings” in your WordPress dashboard.
Clicking on the drop down list there will give you the option of picking which type of URL you want to be the canonical URL element in the <head> tag of all your pages:
When NOT to use the Canonical URL Element
First off, read this article on the Google Webmaster Central blog about common mistakes people make with the rel=canonical URL. Make sure you, or your web developer, are not doing any of these.
Secondly, don’t use a canonical URL in the following scenarios:
When you want to do a 301 redirect
If you want to redirect one page to another, so that users who type in an ‘old’ URL, or who click on a dead link, are automatically taken to the ‘new’ URL, you need to use a 301 redirect. Don’t use the canonical URL for this, even though the two can have similar effects in SEO.
(For more on this subject, see this article on Moz.com from when rel=canonical first became available. Note that advancements have since been made to allow for cross-domain usage).
Think of the difference this way: with a redirect, there is only one place where the content appears, and you are forcing all visitors to go to that one page. You would do this, for example, if you were moving your site to a new domain, or setting up new URL structures during a site rebuild. You would also use a 301 redirect to send people to either the www or non-www version of your site (this ensures that anyone following a link to the wrong version lands on the right one).
With a canonical URL, the same content can appear on multiple pages around the web to serve users, while one ‘original’ source is still identified. In other words, multiple pages containing the same content can exist and be viewed.
Back in 2011, however, Rand Fishkin did an interesting experiment in which he used a canonical URL in the header of all pages on an old domain to help a newer, different domain rank better. Surprisingly, it worked. He tells the story in this post, which also explains clearly why the canonical URL is so relevant for cross-domain content syndication. I’m not sure this would work today, so if you try it, treat it as an experiment!
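To make the contrast concrete, here is a hypothetical Python sketch of the decision a 301 redirect makes: the server answers with status 301 and a new location, so the visitor never sees the old URL’s content. In practice you would configure this on your server or with a plugin such as Redirection; `redirect_for`, the preferred host, and the old-to-new path map are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def redirect_for(
    url: str, preferred_host: str, moved_paths: Dict[str, str]
) -> Optional[Tuple[int, str]]:
    """Return (301, new_location) if the request should be redirected, else None."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = moved_paths.get(parts.path, parts.path)

    if host == preferred_host and path == parts.path:
        return None  # Already the right URL: serve the page normally.

    # Permanent redirect: visitors and search engines go to the one real URL.
    location = urlunsplit((parts.scheme, preferred_host, path, parts.query, ""))
    return (301, location)


moved = {"/old-page/": "/new-page/"}  # hypothetical site rebuild
print(redirect_for("http://www.examplesite.com/about/", "examplesite.com", moved))
print(redirect_for("http://examplesite.com/old-page/", "examplesite.com", moved))
print(redirect_for("http://examplesite.com/about/", "examplesite.com", moved))
```

A canonical tag, by contrast, never changes what the visitor sees; the page is served normally and only the tag in its <head> names the preferred URL.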
When you want search engines to ignore a page
Remember that the rel=canonical URL element is not the solution for all duplicate content issues. SEO is much more complex than that, and sometimes the more appropriate solution is to no-index a page with a robots meta tag instead, which is why the WordPress SEO plugin includes settings for it.
I recommend to my own SEO clients that they no-index pages that are not desirable entry points into their site, and are not really useful to most visitors. For example, do you really need your ‘Terms and Conditions’ page, or your login pages, to be appearing in search results, ever? No, not really. You want to make way for more valuable content to rank. It’s the sales pages, product descriptions and informative blog posts that matter.
I also recommend using the no-index rule for pages with very, very little content (since they make your site seem too sparse), and for archives containing duplicate content. In WordPress this would apply to the author archives, date archives and, in my method, all the tag archives (since these pages contain the same content as category archives). You also want to no-index Custom Post Types and their archives if they are only functioning to feed content onto other pages of your site.
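The rules above could be sketched in Python like this. The page-type labels and the 150-word threshold for ‘very, very little content’ are hypothetical stand-ins for whatever your own site uses; the output is the standard robots meta tag that goes in a page’s <head>. In WordPress, the WordPress SEO plugin’s settings can handle this for you.

```python
from typing import Optional

# Hypothetical labels for the kinds of pages discussed above.
NOINDEX_PAGE_TYPES = {
    "author_archive",
    "date_archive",
    "tag_archive",
    "terms_and_conditions",
    "login",
}
MIN_WORDS = 150  # Hypothetical cut-off for "very, very little content".

# "follow" still lets crawlers follow the links on a no-indexed page.
ROBOTS_NOINDEX = '<meta name="robots" content="noindex, follow" />'


def robots_meta(page_type: str, word_count: Optional[int] = None) -> str:
    """Return the robots meta tag for a page, or "" to leave it indexable."""
    if page_type in NOINDEX_PAGE_TYPES:
        return ROBOTS_NOINDEX
    if word_count is not None and word_count < MIN_WORDS:
        return ROBOTS_NOINDEX
    return ""


print(robots_meta("tag_archive"))
print(repr(robots_meta("post", word_count=900)))
```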
Note: when you no-index something, be sure to remove it from your sitemap as well, or it will create errors in Google Webmaster Tools.
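If you maintain a sitemap yourself, or want to double-check one, this Python sketch removes no-indexed URLs from a standard XML sitemap using the standard library’s `xml.etree.ElementTree`. The `drop_noindexed` name and the example URLs are hypothetical, and it assumes a regular `<urlset>` sitemap rather than a sitemap index file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def drop_noindexed(sitemap_xml: str, noindexed: set) -> str:
    """Return the sitemap with every <url> whose <loc> is no-indexed removed."""
    # Keep the sitemap namespace as the default (no "ns0:" prefixes on output).
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    for url_el in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = (url_el.findtext(f"{{{SITEMAP_NS}}}loc") or "").strip()
        if loc in noindexed:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")


sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://examplesite.com/about/</loc></url>"
    "<url><loc>https://examplesite.com/tag/chocolate/</loc></url>"
    "</urlset>"
)
print(drop_noindexed(sitemap, {"https://examplesite.com/tag/chocolate/"}))
```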
Fixing Up Your URLs to Match Your Canonical Pointer
Remember how, earlier, we talked about choosing one version of your URL to use in all your links from now on? Once you do that, you’ll need to ‘clean up’ the URLs on your site and elsewhere so they use the version you’ve decided on. So, let’s say you decided to use the non-www version of your site. Now you need to check whether your internal site links, or the places that link to you on the web, are using the www version. If they are, it would be wise to make an effort to change them. Yes, it can be manual and tedious labor, but it’s worth it.
For a faster route, if you know what you are doing, you can use a tool like Search Replace DB by Interconnect/it to do a quick replacement of all URLs on your site. But please, only use this if you know how to handle it and understand what it does.
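To show what such a replacement does, here is a Python sketch that rewrites links from the www host to the non-www host in a piece of HTML. `rewrite_host` and the domains are hypothetical. Note that a plain text replacement like this is not safe to run on a raw WordPress database, because serialized PHP data stores string lengths that would no longer match; handling that correctly is one reason tools like Search Replace DB exist.

```python
import re


def rewrite_host(content: str, old_host: str, new_host: str) -> str:
    """Point http(s) links at old_host to new_host, leaving everything else alone."""
    # The lookahead stops "www.examplesite.com" from matching inside a longer
    # host such as "www.examplesite.com.au".
    pattern = re.compile(
        r"(https?://)" + re.escape(old_host) + r"(?=[/\"'\s?#:<]|$)",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: m.group(1) + new_host, content)


html = '<a href="http://www.examplesite.com/about/">About us</a>'
print(rewrite_host(html, "www.examplesite.com", "examplesite.com"))
```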
There are also plugins that can do a find-and-replace from within your WordPress dashboard, but use them at your discretion. Also, be sure to remove anything that connects to your database as soon as you are done using it, to avoid security risks.
To take care of dead links within your WordPress posts and pages, try a plugin like Redirection, which will make this task easier on you.
When you are done with all that, log into your Google Webmaster Tools account, add both the www and non-www versions of your site, and then set your preferred domain.
To Conclude: Use the Canonical URL for your SEO Benefit
Hopefully we’ve cleared up the confusion around the Canonical URL and how it affects your SEO. If you’re still confused, I strongly recommend checking out the links in this article for more references. But the great thing is, now that you know how to use it, it has the potential to produce great results for your SEO!
Duplicate content is a hard thing to manage for most business owners with limited time to write (as I’ve learned while working with many of them). Thankfully search engines have recognized that sometimes, there’s a legitimate need for the same content to be contained on more than one URL. They’ve provided the tool for us to use, and WordPress plugin authors have made it easy to implement, so let’s take advantage of it!
Article image thumbnail by phipatbig / Shutterstock.com