Network Working Group                                            M. Ohye
Internet-Draft                                                  J. Kupke
Intended status: Informational                          October 14, 2011
Expires: April 16, 2012
The Canonical Link Relation
draft-ohye-canonical-link-relation-04
[RFC5988] specified a way to define relationships between links on the web. This document describes a new type of such relationship, "canonical," which designates the preferred URI from a set of identical or vastly similar ones.
Distribution of this document is unlimited. Comments should be sent to the IETF Apps-Discuss mailing list (see https://www.ietf.org/mailman/listinfo/apps-discuss).
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 16, 2012.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The canonical link relation specifies the preferred URI from a set of URIs that return identical or vastly similar content, making it possible for references to the context URI to be updated to reference the target URI.
The most common application of the canonical link relation is specifying the preferred version of a URI among duplicate content pages created by the addition of parameters (e.g., session IDs, tracking IDs, category, or sort information).
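As an illustration of the parameter case described above, the following minimal Python sketch derives a preferred URI by dropping query parameters that do not change the content. The parameter names in NON_CONTENT_PARAMS and the function name preferred_uri are assumptions made for this example; which parameters are safe to drop depends entirely on the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that do not change the returned content.
# A real site has to decide this list for itself.
NON_CONTENT_PARAMS = {"sid", "category", "sort"}


def preferred_uri(uri):
    """Return `uri` without non-content parameters (fragment dropped)."""
    parts = urlsplit(uri)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in NON_CONTENT_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


print(preferred_uri(
    "http://www.example.com/page.php?item=purse&category=bags&sid=1234"
))
# prints http://www.example.com/page.php?item=purse
```

A site could use a helper of this kind when generating the target of its canonical declaration, so that every parameterized variant names the same preferred URI.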
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
The canonical (target) URI MUST identify content that duplicates, is extremely similar to, or is a superset of the content at the context (referring) URI. Authors who declare the canonical link relation ought to anticipate that applications such as search engines can:
A resource SHOULD NOT specify more than one canonical link relation.
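A publisher may want to check a page for accidental multiple declarations. The sketch below, which uses only the Python standard library, collects every HTML link element whose rel attribute contains "canonical". The names CanonicalCollector, canonical_targets, and declares_multiple are hypothetical, and the sketch inspects HTML only, not HTTP header fields.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collect href values of <link> elements with rel containing 'canonical'."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated list of case-insensitive relation types.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.targets.append(attr["href"])


def canonical_targets(html):
    """Return the canonical targets declared in an HTML document, in order."""
    collector = CanonicalCollector()
    collector.feed(html)
    return collector.targets


def declares_multiple(html):
    """True if the document declares more than one canonical link relation."""
    return len(canonical_targets(html)) > 1
```

For example, declares_multiple applied to a page with a single canonical link element returns False, while a page carrying two such elements returns True and deserves a closer look.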
The target/canonical URI MAY:
The target/canonical URI SHOULD NOT designate:
The following example illustrates:
If the preferred version of a URI and its content exists at:
http://www.example.com/page.php?item=purse
Then duplicate content URIs such as:
http://www.example.com/page.php?item=purse&category=bags
http://www.example.com/page.php?item=purse&category=bags&sid=1234
may designate the canonical link relation in HTML as specified in [REC-html401-19991224]:
<link rel="canonical" href="http://www.example.com/page.php?item=purse">
or as a relative URI:
<link rel="canonical" href="page.php?item=purse">
or alternatively, in the HTTP header field as specified in Section 5 of [RFC5988]:
Link: <http://www.example.com/page.php?item=purse>; rel="canonical"
This signals to automated programs, such as search engines, that these are duplicates of the canonical URI: http://www.example.com/page.php?item=purse.
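A consumer of these declarations might read the HTTP header form and resolve relative references roughly as follows. This is a minimal sketch, not a full implementation of the Link header grammar in [RFC5988]: the regular expression handles only simple values such as the one shown above (for instance, it does not handle commas inside quoted parameter values), and the function name canonical_from_link_header is an assumption of this example.

```python
import re
from urllib.parse import urljoin

# Matches "<target>" followed by zero or more ";name=value" parameters.
# Simplified on purpose; see the caveats in the surrounding text.
LINK_VALUE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def canonical_from_link_header(header_value, context_uri):
    """Return the absolute canonical URI from a Link header value, or None."""
    for match in LINK_VALUE.finditer(header_value):
        target, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            rels = value.strip('"').lower().split()
            if name.lower() == "rel" and "canonical" in rels:
                # Relative targets are resolved against the context URI.
                return urljoin(context_uri, target)
    return None


context = "http://www.example.com/page.php?item=purse&category=bags"
print(canonical_from_link_header(
    '<http://www.example.com/page.php?item=purse>; rel="canonical"', context
))
# prints http://www.example.com/page.php?item=purse

# The relative form from the HTML example resolves to the same URI.
print(urljoin(context, "page.php?item=purse"))
# prints http://www.example.com/page.php?item=purse
```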
Automated programs may then select the canonical value as the display URI (such as in search results). Additional URI properties, such as indexing and ranking signals, can be transferred as well.
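How an automated program consolidates duplicates is implementation specific and is not defined by this document. Purely as an illustration, the sketch below groups crawled URIs under the canonical URI each one declares and sums a per-URI signal over each group. The function names and the inbound-link counts are invented for the example.

```python
from collections import defaultdict


def cluster_by_canonical(declared):
    """Group URIs by declared canonical target.

    `declared` maps each crawled URI to the canonical URI it declares,
    or to None when it declares none (it then represents itself).
    """
    clusters = defaultdict(list)
    for uri, canonical in declared.items():
        clusters[canonical or uri].append(uri)
    return dict(clusters)


def transfer_signals(clusters, signal):
    """Sum a hypothetical per-URI signal over every member of each cluster."""
    return {
        canonical: sum(signal.get(uri, 0) for uri in members)
        for canonical, members in clusters.items()
    }


PREFERRED = "http://www.example.com/page.php?item=purse"
declared = {
    PREFERRED: None,
    PREFERRED + "&category=bags": PREFERRED,
    PREFERRED + "&category=bags&sid=1234": PREFERRED,
}
# Invented inbound-link counts, for illustration only.
inbound_links = {
    PREFERRED: 10,
    PREFERRED + "&category=bags": 3,
    PREFERRED + "&category=bags&sid=1234": 1,
}

clusters = cluster_by_canonical(declared)
print(transfer_signals(clusters, inbound_links))
# prints {'http://www.example.com/page.php?item=purse': 14}
```

In this sketch the cluster key doubles as the display URI. A real consumer would likely weigh the declaration against other evidence before accepting it.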
Before adding the canonical link relation, verification of the following is recommended:
IANA is asked to register the Canonical Link Relation below as per [RFC5988].
Relation Name:
Description:
Reference:
Notes:
Application Data:
When a site is compromised, the canonical link relation can be implemented with malicious intent to designate the attacker's URI as the preferred version of the content. While this technique is largely unnoticeable to humans, automated programs may cluster the compromised resource as duplicative of the attacker's designated canonical, transferring properties such as link popularity away from the resource to the attacker's URI.
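One way a site owner might watch for this kind of abuse is to flag canonical targets that point at hosts the owner does not control. The sketch below is only an assumption about what such monitoring could look like: the function name unexpected_canonical and the trusted-host set are hypothetical, and a canonical target on another host is not by itself evidence of compromise.

```python
from urllib.parse import urljoin, urlsplit


def unexpected_canonical(context_uri, canonical_href, trusted_hosts):
    """True if the canonical target resolves to a host outside trusted_hosts."""
    # Resolve relative references against the page that declares them.
    target = urljoin(context_uri, canonical_href)
    host = (urlsplit(target).hostname or "").lower()
    return host not in {h.lower() for h in trusted_hosts}


page = "http://www.example.com/page.php?item=purse&category=bags"
trusted = {"www.example.com"}

print(unexpected_canonical(page, "page.php?item=purse", trusted))
# prints False
print(unexpected_canonical(page, "http://attacker.example.net/purse", trusted))
# prints True
```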
In designating a canonical URI, please see [RFC3986] for information on URI encoding.
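For instance, characters outside the set that [RFC3986] allows in a URI are percent-encoded before the URI is placed in a canonical declaration. The following sketch uses urllib.parse.quote from the Python standard library; the function name encode_uri and the choice of characters left untouched are assumptions of this example, and non-ASCII host names are not handled.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_uri(uri):
    """Percent-encode the path and query of `uri` as UTF-8.

    "%" is treated as safe so that existing escapes are not encoded twice;
    as a consequence, a stray literal "%" is left as-is.
    """
    parts = urlsplit(uri)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, query, parts.fragment)
    )


print(encode_uri("http://www.example.com/page.php?item=leather purse"))
# prints http://www.example.com/page.php?item=leather%20purse
print(encode_uri("http://www.example.com/caf\u00e9"))
# prints http://www.example.com/caf%C3%A9
```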
[REC-html401-19991224] | Le Hors, A., Raggett, D. and I. Jacobs, "HTML 4.01 Specification", W3C Recommendation REC-html401-19991224, December 1999. Latest version available at |
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. |
[RFC2616] | Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. |
[RFC3986] | Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005. |
[RFC5988] | Nottingham, M., "Web Linking", RFC 5988, October 2010. |
Automated programs that implement support for the canonical link relation include: