A web page is considered "duplicate" when the majority of its content also appears on another page, whether on the same domain or a different one. Search engines also understand that there are valid reasons to have pages with similar (but not identical) content. Dynamic, CMS-driven web servers often create duplicate versions of a product page. To alleviate duplicate content issues, developers and marketers use a canonical tag, which tells search engine bots which "canon" page should receive the search engine value for a group of similar pages.
If there are several versions of the same content, we can pick one "canonical" version and point the search engines at it, which resolves the duplicate content problem. A canonical URL tells Google and other search engines, when they crawl a website, which URL to index that page's content under.
For example, take a look at the URLs below.
https://nectarspot.com
www.nectarspot.com
Both URLs refer to the same homepage content, but the URLs themselves are slightly different. This can be an issue for search engines, which may treat them as two separate pages. In these cases, we should specify a canonical link for the search engine.
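To make the idea concrete, here is a minimal Python sketch that folds scheme and www variants of a URL onto one form. The choice of https and the bare (non-www) host as the preferred form is an assumption made for illustration, and canonical_home is a hypothetical helper name; a site may just as well prefer the www version.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the preferred form is https with no "www." prefix.
PREFERRED_SCHEME = "https"


def canonical_home(url: str) -> str:
    """Map scheme/www variants of a URL onto one preferred form (hypothetical helper)."""
    # A bare "www.example.com" has no scheme, so urlsplit would read it as a path.
    if "://" not in url:
        url = f"{PREFERRED_SCHEME}://{url}"
    parts = urlsplit(url)
    # .hostname is already lowercased; ports and credentials are dropped in this sketch.
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    return urlunsplit((PREFERRED_SCHEME, host, path, parts.query, ""))


variants = ["https://nectarspot.com", "www.nectarspot.com"]
canonical = {canonical_home(u) for u in variants}
print(canonical)  # {'https://nectarspot.com/'}
```

Both variants collapse to a single URL, which is the one a canonical link on either page would point to.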
Search engine algorithms often penalize e-commerce portals and generic websites for having "duplicate content" in order to eliminate low-quality content from search results.
You may prefer people to reach your website via:
Rather than:
Using canonicals, we can keep things clean.
When the same content is reachable under a variety of URLs, it becomes more difficult to get consolidated metrics for that piece of content. Canonical URLs help keep things simple and organized, especially when it comes to reporting performance to your client.
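As a rough illustration of the reporting benefit, the Python sketch below rolls page-view counts for several URL variants up under the canonical URL each one declares. The URLs, the counts, and the consolidate helper are all hypothetical; a real report would read them from a crawl and an analytics export.

```python
from collections import Counter

# Hypothetical data: each crawled URL mapped to the canonical URL its page declares.
declared_canonical = {
    "https://example.com/shoes?utm_source=mail": "https://example.com/shoes",
    "https://example.com/shoes?ref=footer": "https://example.com/shoes",
    "https://example.com/shoes": "https://example.com/shoes",
    "https://example.com/hats": "https://example.com/hats",
}

# Hypothetical page-view counts per raw URL, as an analytics export might list them.
page_views = {
    "https://example.com/shoes?utm_source=mail": 40,
    "https://example.com/shoes?ref=footer": 25,
    "https://example.com/shoes": 10,
    "https://example.com/hats": 5,
}


def consolidate(views, canonical_of):
    """Sum view counts under each URL's declared canonical URL."""
    totals = Counter()
    for url, count in views.items():
        # A URL with no declared canonical is reported under itself.
        totals[canonical_of.get(url, url)] += count
    return totals


totals = consolidate(page_views, declared_canonical)
print(totals["https://example.com/shoes"])  # 75
print(totals["https://example.com/hats"])   # 5
```

Three rows for the shoes page become one, so the report shows a single figure per piece of content.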
In Shopify, the canonical_url object returns the canonical URL for the current page. The canonical URL is the page's "default" URL with any URL parameters removed. It can be output like this:
<link rel="canonical" href="{{ canonical_url }}" />
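For platforms without such an object, the same idea can be approximated by hand. The Python sketch below strips the query string and fragment from a request URL and renders the link tag. It is only an illustration of "the default URL with parameters removed", not Shopify's actual implementation, and canonical_link_tag is a hypothetical helper name.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(request_url: str) -> str:
    """Render a canonical link tag for a URL with its parameters removed (hypothetical helper)."""
    parts = urlsplit(request_url)
    # Keep scheme, host, and path; drop the query string and fragment.
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Escape the URL so it is safe inside a double-quoted HTML attribute.
    return f'<link rel="canonical" href="{escape(clean, quote=True)}" />'


tag = canonical_link_tag(
    "https://example.com/products/red-shirt?variant=123&utm_source=mail"
)
print(tag)  # <link rel="canonical" href="https://example.com/products/red-shirt" />
```

Dropping every parameter is a simplification: if a parameter changes what the page shows in a way that deserves its own indexed URL, it should be kept.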
Below are the common mistakes to avoid when using the canonical tag on your e-commerce site:
Rel=canonical should only appear in the <head> of the page. A canonical tag in the <body> section of the page will be ignored.
We should not block a canonicalized URL in robots.txt. Blocking the URL prevents Google from crawling it, which means Google cannot see any canonical tags on that page. That, in turn, prevents any "link equity" from being transferred from the non-canonical page to the canonical one.
We should not combine noindex and rel=canonical, because the two send conflicting signals. Google usually prioritizes the canonical tag over the 'noindex' tag. If you want a page to be both noindexed and canonicalized, use a 301 redirect. Otherwise, use rel=canonical.
Setting a 4XX HTTP status code for a canonicalized URL has a similar effect to using the 'noindex' tag. Google will not be able to see the canonical tag and transfer "link equity" to the canonical version.
Paginated pages must not be canonicalized to the first page in the series. Instead, self-referencing canonicals should be used on all paginated pages.
Multiple rel=canonical tags on one page will be ignored by Google. In many cases, this happens because tags are inserted into a system at different points, for example by the CMS, the theme, and plugin(s).
The hreflang tag is used to specify the language and geographical targeting of a webpage. When using hreflang, we must "specify a canonical page within the same language, or the most effective substitute language."
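Several of the mistakes above can be checked mechanically. The following Python sketch, using only the standard library, flags a canonical tag outside the <head>, multiple canonical tags, noindex combined with rel=canonical, and a page blocked by robots.txt. The class and function names, the sample markup, and the sample robots.txt are all hypothetical, and the checks are a simplification of how a real crawler reads a page.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class CanonicalAudit(HTMLParser):
    """Collect canonical links and robots meta directives (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_canonicals = []
        self.body_canonicals = []
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            # Record where the tag was found so misplaced ones can be reported.
            found = self.head_canonicals if self.in_head else self.body_canonicals
            found.append(a.get("href"))
        elif tag == "meta" and (a.get("name") or "").lower() == "robots":
            if "noindex" in (a.get("content") or "").lower():
                self.noindex = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def audit(html: str, robots_txt: str, page_url: str) -> list:
    """Return a list of canonical-tag problems found for one page."""
    parser = CanonicalAudit()
    parser.feed(html)
    problems = []
    if parser.body_canonicals:
        problems.append("canonical tag outside <head>")
    if len(parser.head_canonicals) > 1:
        problems.append("multiple canonical tags")
    if parser.noindex and parser.head_canonicals:
        problems.append("noindex combined with rel=canonical")
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    # If the page itself is blocked, its canonical tag can never be read.
    if not robots.can_fetch("Googlebot", page_url):
        problems.append("page blocked by robots.txt")
    return problems


sample_html = """<html><head>
<meta name="robots" content="noindex">
<link rel="canonical" href="https://example.com/collections/shoes">
<link rel="canonical" href="https://example.com/collections/shoes?page=1">
</head><body>
<link rel="canonical" href="https://example.com/">
</body></html>"""
sample_robots = "User-agent: *\nDisallow: /collections/\n"

for problem in audit(sample_html, sample_robots,
                     "https://example.com/collections/shoes?sort=price"):
    print(problem)
```

The sample page deliberately triggers all four checks. The 4XX, pagination, and hreflang points are not covered here because they need live HTTP responses or knowledge of the site's URL structure.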
The canonical link is a powerful tool in an SEO's toolbox. Especially for larger sites, canonicalization can be critical and lead to major SEO improvements.