Internet Engineering Task Force | H. VandeSompel |
Internet-Draft | Los Alamos National Laboratory |
Intended status: Informational | M.L. Nelson |
Expires: October 30, 2011 | Old Dominion University |
 | R.D. Sanderson |
 | Los Alamos National Laboratory |
 | April 28, 2011 |
HTTP framework for time-based access to resource states -- Memento
draft-vandesompel-memento-01
The HTTP-based Memento framework bridges the present and past Web by interlinking current resources with resources that encapsulate their past. It facilitates obtaining representations of prior states of a resource, available from archival resources in Web archives or version resources in content management systems, by leveraging the resource's URI and a preferred datetime. To this end, the framework introduces datetime negotiation (a variation on content negotiation), and new Relation Types for the HTTP Link header aimed at interlinking resources with their archival/version resources. It also introduces various discovery mechanisms that further support bridging the present and past Web.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 30, 2011.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
This specification uses the terms "resource", "request", "response", "entity", "entity-body", "entity-header", "content negotiation", "client", "user agent", "server" as described in RFC 2616 [RFC2616], and it uses the terms "representation" and "resource state" as described in W3C.REC-aww-20041215 [W3C.REC-aww-20041215].
In addition, the following terms specific to the Memento framework are introduced:
The state of an Original Resource may change over time. Dereferencing its URI at any specific moment in time during its existence yields a representation of its then current state. Dereferencing its URI at any time past its existence no longer yields a meaningful representation, if any. Still, in both cases, resources may exist that encapsulate prior states of the Original Resource. Each such resource, named a Memento, has its own URI that, when dereferenced, returns a representation of a prior state of the Original Resource. Mementos may, for example, exist in Web archives, Content Management Systems, or Revision Control Systems.
Examples are:
Mementos for Original Resource http://www.ietf.org/ :
Mementos for Original Resource http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol :
Mementos for Original Resource http://www.w3.org/TR/webarch/ :
In the abstract, Memento introduces a mechanism to access versions of Web resources that:
The core components of Memento's mechanism to access resource versions are:
1. The abstract notion of the state of a resource identified by URI-R as it existed at some time Tj. Note the relationship with the ability to identify the state of a resource at some datetime Tj by means of a URI as intended by the proposed Dated URI scheme I-D.masinter-dated-uri [I-D.masinter-dated-uri].
2. A bridge from the present to the past, consisting of:
3. A bridge from the past to the present, consisting of an appropriately typed link from a resource identified by URI-M, which encapsulates the state a resource identified by URI-R had at some datetime Tj, to the resource identified by URI-R.
Section 2 and Section 3 of this document are concerned with specifying an instantiation of these abstractions for resources that are identified by HTTP(S) URIs, whereas Section 4 details approaches to discover TimeGates, TimeMaps, and Mementos on the HTTP(S) Web by other means than typed links.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
When needed for extra clarity, the following conventions are used:
The Memento framework is concerned with Original Resources, TimeGates, Mementos, and TimeMaps that are identified by HTTP or HTTPS URIs. Details are only provided for resources identified by HTTP URIs but apply similarly to those with HTTPS URIs.
The Memento framework operates at the level of HTTP request and response headers. It introduces two new headers ("Accept-Datetime", "Memento-Datetime"), introduces new values for two existing headers ("Vary", "Link"), and uses an existing header ("Location") without modification. All these headers are described below. Other HTTP headers are present or absent in Memento response/request cycles as specified by RFC 2616 [RFC2616].
The "Accept-Datetime" request header is used by a user agent to indicate it wants to retrieve a representation of a Memento that encapsulates a past state of an Original Resource. To that end, the "Accept-Datetime" header is conveyed in an HTTP GET/HEAD request issued against a TimeGate for an Original Resource, and its value indicates the datetime of the desired past state of the Original Resource. The "Accept-Datetime" request header has no defined meaning for HTTP methods other than HEAD and GET.
The "Memento-Datetime" response header is used by a server to indicate that the response contains a representation of a Memento, and its value expresses the datetime of the state of an Original Resource that is encapsulated in that Memento. The URI of that Original Resource is provided in the response, as the Target IRI (see RFC5988 [RFC5988]) of a link provided in the HTTP "Link" header that has a Relation Type of "original" (see Section 2.2).
The presence of a "Memento-Datetime" header and associated value for a given resource constitutes a promise that the resource is stable and that its state will no longer change. This means that, in terms of the Ontology for Relating Generic and Specific Information Resources (see W3C.gen-ont-20090420 [W3C.gen-ont-20090420]), a Memento is a FixedResource.
As a consequence, "Memento-Datetime" headers associated with a Memento MUST be "sticky" in the following ways:
Values for the "Accept-Datetime" header consist of a MANDATORY datetime expressed according to the RFC 1123 [RFC1123] format, which is formalized by the rfc1123-date construction rule of the BNF in Figure 2, and an OPTIONAL interval indicator expressed according to the iso8601-interval rule of the BNF in Figure 2. The datetime MUST be represented in Greenwich Mean Time (GMT).
Examples of "Accept-Datetime" request headers with and without an interval indicator:
    Accept-Datetime: Thu, 31 May 2007 20:35:00 GMT
    Accept-Datetime: Thu, 31 May 2007 20:35:00 GMT; -P3DT5H;+P2DT6H
The user agent uses the MANDATORY datetime value to convey its preferred datetime for a Memento; it uses the OPTIONAL interval indicator to convey it is interested in retrieving Mementos that reside within this interval around the preferred datetime, and not interested in Mementos that reside outside of it. Not using an interval indicator is equivalent to expressing an infinite interval around the preferred datetime.
The interval mechanism can be regarded as an implementation of the functionality intended by the q-value approach that is used in regular content negotiation. The q-value approach is not supported for Memento's datetime negotiation because it is well-suited for negotiation over a discrete space of mostly predictable values, not for negotiation over a continuum of unpredictable datetime values.
    accept-dt-value  = rfc1123-date *SP [ iso8601-interval ]
    rfc1123-date     = wkday "," SP date1 SP time SP "GMT"
    date1            = 2DIGIT SP month SP 4DIGIT
                       ; day month year (e.g., 20 Mar 1957)
    time             = 2DIGIT ":" 2DIGIT ":" 2DIGIT
                       ; 00:00:00 - 23:59:59 (e.g., 14:33:22)
    wkday            = "Mon" | "Tue" | "Wed" | "Thu" | "Fri" | "Sat" | "Sun"
    month            = "Jan" | "Feb" | "Mar" | "Apr" | "May" | "Jun" |
                       "Jul" | "Aug" | "Sep" | "Oct" | "Nov" | "Dec"
    iso8601-interval = ";" *SP "-" duration *SP ";" *SP "+" duration
    duration         = "P" ( dur-date | dur-week )
    dur-date         = ( dur-day | dur-month | dur-year ) [ dur-time ]
    dur-year         = 1*DIGIT "Y" [ dur-month ] [ dur-day ]
    dur-month        = 1*DIGIT "M" [ dur-day ]
    dur-day          = 1*DIGIT "D"
    dur-time         = "T" ( dur-hour | dur-minute | dur-second )
    dur-hour         = 1*DIGIT "H" [ dur-minute ] [ dur-second ]
    dur-minute       = 1*DIGIT "M" [ dur-second ]
    dur-second       = 1*DIGIT "S"
    dur-week         = 1*DIGIT "W"
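As a non-normative illustration, the following Python sketch parses an "Accept-Datetime" value into a preferred datetime and, when an interval indicator is present, the bounds of the interval. The function names are hypothetical. The sketch follows the grammar as written, under which a time part can only follow a date part, and it is deliberately narrower than the BNF: it converts only week, day, and time components, and rejects year and month components because they have no fixed length.

```python
import re
from datetime import timedelta
from email.utils import parsedate_to_datetime

# accept-dt-value = rfc1123-date *SP [ ";" *SP "-" duration *SP ";" *SP "+" duration ]
_VALUE = re.compile(
    r"^(?P<date>\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} GMT)"
    r" *(?:; *-(?P<before>P[0-9A-Z]+) *; *\+(?P<after>P[0-9A-Z]+))?$"
)
# Supported subset of "duration": weeks, or days with an optional time part.
_DURATION = re.compile(
    r"^P(?:(?P<w>\d+)W"
    r"|(?P<d>\d+)D(?:T(?=\d)(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+)S)?)?)$"
)


def parse_duration(text):
    """Convert a supported duration such as 'P3DT5H' into a timedelta."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {key: int(val) for key, val in match.groupdict().items() if val is not None}
    return timedelta(weeks=parts.get("w", 0), days=parts.get("d", 0),
                     hours=parts.get("h", 0), minutes=parts.get("m", 0),
                     seconds=parts.get("s", 0))


def parse_accept_datetime(value):
    """Return (preferred, earliest, latest); the bounds are None without an interval."""
    match = _VALUE.match(value.strip())
    if match is None:
        raise ValueError(f"not an accept-dt-value: {value!r}")
    preferred = parsedate_to_datetime(match["date"])
    if match["before"] is None:
        return preferred, None, None
    return (preferred,
            preferred - parse_duration(match["before"]),
            preferred + parse_duration(match["after"]))
```

A server built along these lines could map the ValueError to an error response; that mapping is left out of the sketch.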
Values for the "Memento-Datetime" headers MUST be datetimes expressed according to the rfc1123-date construction rule of the BNF in Figure 2; they MUST be represented in Greenwich Mean Time (GMT).
An example "Memento-Datetime" response header:
Memento-Datetime: Wed, 30 May 2007 18:47:52 GMT
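For illustration only, the header above can be produced and read back with Python's standard library. The two function names are hypothetical, and the sketch assumes the caller supplies a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def memento_datetime_header(moment):
    """Render an aware datetime as a Memento-Datetime header line in GMT."""
    # usegmt=True emits the literal "GMT" suffix required by rfc1123-date.
    return "Memento-Datetime: " + format_datetime(
        moment.astimezone(timezone.utc), usegmt=True)


def parse_memento_datetime(header_line):
    """Return the aware datetime carried by a Memento-Datetime header line."""
    _, _, value = header_line.partition(":")
    return parsedate_to_datetime(value.strip())


captured = datetime(2007, 5, 30, 18, 47, 52, tzinfo=timezone.utc)
line = memento_datetime_header(captured)
print(line)
print(parse_memento_datetime(line) == captured)
```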
The "Vary" response header is used in responses to indicate the dimensions in which content negotiation was successfully applied. This header is used in the Memento framework to indicate whether datetime negotiation was applied or is supported by the responding server.
For example, this use of the "Vary" header indicates that datetime is the only dimension in which negotiation was applied:
Vary: negotiate, accept-datetime
The use of the "Vary" header in this example shows that both datetime negotiation, and media type content negotiation were applied:
Vary: negotiate, accept-datetime, accept
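A user agent could test for these values as sketched below. The function name is hypothetical, and the comparison is case-insensitive as an assumption of this sketch.

```python
def datetime_negotiation_indicated(vary_value):
    """True when a Vary value lists both 'negotiate' and 'accept-datetime'."""
    tokens = {token.strip().lower() for token in vary_value.split(",")}
    return {"negotiate", "accept-datetime"} <= tokens


print(datetime_negotiation_indicated("negotiate, accept-datetime, accept"))
print(datetime_negotiation_indicated("accept"))
```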
The "Location" header is used as defined in RFC 2616 [RFC2616]. Examples are given in Section 3 below.
The "Link" response header is specified in RFC5988 [RFC5988]. The Memento framework introduces new Relation Types to convey typed links among Original Resources, TimeGates, Mementos, and TimeMaps. Already existing Relation Types, among others, aimed at supporting navigation among a series of ordered resources may also be used in the Memento framework. This is detailed in Link Header Relation Types [Link-Header-Relation-Types], below.
The "Link" header specified in RFC5988 [RFC5988] is semantically equivalent to the "<LINK>" element in HTML, as well as the "atom:link" feed-level element in Atom RFC 4287 [RFC4287]. By default, the origin of a link expressed by an entry in a "Link" header (named Context IRI in RFC5988 [RFC5988]) is the IRI of the requested resource. This default can be overridden using the "anchor" attribute in the entry.
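The following Python sketch shows one way a client might split a "Link" header value into entries and resolve each entry's Context IRI, falling back to the requested URI when no "anchor" attribute is given. It is a simplified reading of RFC5988 rather than a complete parser: the function name and the dictionary layout are hypothetical, and escaped quotes inside quoted values are not handled.

```python
import re

# One entry: <target> followed by zero or more ;name=value parameters.
_ENTRY = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s",;]+))*)')
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s",;]+))')


def parse_link_header(value, requested_uri):
    """Return a list of dicts with context, target, rel tokens and attributes."""
    links = []
    for target, params in _ENTRY.findall(value):
        attrs = {}
        for name, quoted, bare in _PARAM.findall(params):
            # Keep the first occurrence of a repeated parameter.
            attrs.setdefault(name.lower(), quoted or bare)
        links.append({
            "context": attrs.get("anchor", requested_uri),
            "target": target,
            "rel": attrs.get("rel", "").split(),
            "attrs": attrs,
        })
    return links
```

Commas inside quoted values, such as the one in a "datetime" attribute, do not split an entry, because entries are recognised by their angle-bracketed target.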
The Relation Types used in the Memento framework are listed in the remainder of this section, and their use is summarized in the table below. Appendix A shows a Memento request/response cycle that uses all the Relation Types that are introduced here.
Relation Type | Original Resource | TimeGate | Memento |
---|---|---|---|
original | NA, except see Section 3.1.2.1 | REQUIRED, 1 | REQUIRED, 1 |
timegate | RECOMMENDED, 0 or more | NA | RECOMMENDED, 0 or more |
timemap | NA | RECOMMENDED, 0 or more | RECOMMENDED, 0 or more |
memento | NA, except see Section 3.1.2.1 | REQUIRED, 1 or more | REQUIRED, 1 or more |
"original" -- A "Link" header entry with a Relation Type of "original" is used to point from a TimeGate or a Memento to their associated Original Resource. In both cases, an entry with the "original" Relation Type MUST occur exactly once in a "Link" header. Details for the entry are as follows:
"timegate" -- A "Link" header entry with a Relation Type of "timegate" is used to point from either an Original Resource or a Memento to a TimeGate for the Original Resource. In both cases, the use of an entry with the "timegate" Relation Type is RECOMMENDED. Since more than one TimeGate can exist for any Original Resource, multiple entries with a "timegate" Relation Type MAY occur, each with a distinct Target IRI. Since a TimeGate has no mime type, the "type" attribute MUST NOT be used on Links with a "timegate" Relation Type. Details for the entry are as follows:
"timemap" -- A "Link" header entry with a Relation Type of "timemap" is used to point from either a TimeGate or a Memento to a TimeMap resource from which a list of Mementos known to the responding server is available. Use of an entry with the "timemap" Relation Type is RECOMMENDED, and, since multiple serializations of a TimeMap are possible, multiple entries with a "timemap" Relation Type MAY occur, each with a distinct Target IRI, and each with a MANDATORY "type" attribute to convey the mime type of the TimeMap serialization. Details for the entry are as follows:
Further details about TimeMap serializations are provided in Section 3.4.
"memento" -- A "Link" header entry with a Relation Type of "memento" is used to point from both a TimeGate and a Memento to various Mementos for an Original Resource. This link MUST include a "datetime" attribute with a value that matches the "Memento-Datetime" of the Memento that is the target of the link; that is, the value of the "Memento-Datetime" header that is returned when the URI of the linked Memento is dereferenced. In addition, the link MAY include an "embargo" attribute to convey the datetime until which the Memento will remain inaccessible. The value for both the "datetime" and "embargo" attributes MUST be a datetime expressed according to the rfc1123-date construction rule of the BNF in Figure 2 and it MUST be represented in Greenwich Mean Time (GMT). This link MAY also include a "license" attribute to associate a license with the Memento; the value for the "license" attribute SHOULD be a URI. The link SHOULD also include a "type" attribute to convey the mime type of the Memento that is the target of the link. Use of entries with the "memento" Relation Type is REQUIRED and it MUST be as follows:
For all responses to HTTP HEAD/GET requests issued against a TimeGate or a Memento in which a Memento is selected or served by the responding server:
For all responses to HTTP HEAD/GET requests issued against an existing TimeGate or Memento in which no Memento is selected or served by the responding server:
Note that the Target IRI of some of these links may coincide. For example, if the selected Memento actually is the first Memento known to the server, only three distinct "memento" links may result. The value for the "datetime" attribute of these links would be the datetimes of the first (equal to selected), next, and most recent Memento known to the responding server.
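The coinciding-target case can be sketched as follows. This hypothetical Python helper builds the "memento" entries of a "Link" header and merges Relation Types when two roles share a Target IRI. It assumes a server outside the scope of RFC5829 and therefore uses "prev" and "next"; the short URIs and datetime strings in the usage lines are placeholders.

```python
def memento_links(first, prev, selected, nxt, last):
    """Build 'memento' Link entries; each argument is (uri, datetime) or None."""
    merged = {}  # uri -> (navigation rel tokens, datetime); insertion-ordered
    roles = (("first", first), ("prev", prev), (None, selected),
             ("next", nxt), ("last", last))
    for role, memento in roles:
        if memento is None:
            continue
        uri, when = memento
        rels, _ = merged.setdefault(uri, ([], when))
        if role is not None:
            rels.append(role)
    entries = []
    for uri, (rels, when) in merged.items():
        rel = " ".join(rels + ["memento"])
        entries.append(f'<{uri}>; rel="{rel}"; datetime="{when}"')
    return ", ".join(entries)


# The selected Memento is also the first one, so three entries result.
print(memento_links(first=("u0", "d0"), prev=None, selected=("u0", "d0"),
                    nxt=("u1", "d1"), last=("u9", "d9")))
```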
The summary is as follows:
Web Linking RFC5988 [RFC5988] allows for the inclusion of links with different Relation Types but the same Target IRI, and hence the Relation Types introduced by the Memento framework MAY be combined with others as deemed necessary. As the "memento" Relation Type focuses on conveying the datetime of a linked Memento, Relation Types that allow navigating among the temporally ordered series of Mementos known to a server are of particular importance. In this regard, the Relation Types listed in the table below SHOULD be considered for combination with the "memento" Relation Type. A distinction is made between responding servers that can be categorized as systems that are the focus of RFC5829 [RFC5829] (such as version control systems) and others that cannot (such as Web archives). Note that, in terms of RFC5829 [RFC5829], the last Memento (URI-Mn) is the version prior to the latest (i.e. current) version.
Memento Type | RFC5829 system | non-RFC5829 system |
---|---|---|
First Memento (URI-M0) | first | first |
Last Memento (URI-Mn) | last | last |
Selected Memento (URI-Mj) | NA | NA |
Memento prior to selected Memento (URI-Mi) | predecessor-version | prev |
Memento next to selected Memento (URI-Mk) | successor-version | next |
This section describes the HTTP interactions of the Memento framework for a variety of scenarios. First, Figure 6 provides a schematic overview of a successful request/response chain that involves datetime negotiation. Dashed lines depict HTTP transactions between user agent and server. Appendix A shows these HTTP interactions in detail for the case where the Original Resource resides on one server, whereas both the TimeGate and the Mementos reside on another. Scenarios also exist in which all these resources are on the same server (for example, Content Management Systems) or on different servers (for example, an aggregator of TimeGates). Note that, in Step 2 and Step 6, the HTTP status code of the response is shown as "200 OK", but a series of "206 Partial Content" responses could be substituted without loss of generality.
    1: UA --- HTTP GET/HEAD; Accept-Datetime: Tj ---------------> URI-R
    2: UA <-- HTTP 200; Link: URI-G ----------------------------- URI-R
    3: UA --- HTTP GET/HEAD; Accept-Datetime: Tj ---------------> URI-G
    4: UA <-- HTTP 302; Location: URI-Mj; Vary; Link:
       URI-R,URI-T,URI-M0,URI-Mn,URI-Mi,URI-Mj,URI-Mk -------- URI-G
    5: UA --- HTTP GET URI-Mj; Accept-Datetime: Tj -------------> URI-Mj
    6: UA <-- HTTP 200; Memento-Datetime: Tj; Link:
       URI-R,URI-T,URI-G,URI-M0,URI-Mn,URI-Mi,URI-Mj,URI-Mk -- URI-Mj
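The six steps can be read as the following user agent sketch. It is non-normative and makes several assumptions: the transport is passed in as a hypothetical callable "request(method, uri, headers)" that returns a (status, headers, body) tuple, header names are looked up with the exact spelling shown, and only the successful path of the figure is covered.

```python
import re


def find_timegate(link_value):
    """Return the first Target IRI whose rel contains 'timegate', else None."""
    pattern = r'<([^>]*)>[^<]*?rel="([^"]*)"'
    for target, rel in re.findall(pattern, link_value or ""):
        if "timegate" in rel.split():
            return target
    return None


def fetch_memento(uri_r, accept_datetime, request, default_timegate=None):
    """Walk Steps 1 to 6 and return the Memento response tuple."""
    headers = {"Accept-Datetime": accept_datetime}
    # Steps 1-2: ask the Original Resource which TimeGate it recommends.
    _, response_headers, _ = request("HEAD", uri_r, headers)
    uri_g = find_timegate(response_headers.get("Link")) or default_timegate
    if uri_g is None:
        raise LookupError("no TimeGate known for " + uri_r)
    # Steps 3-4: datetime negotiation with the TimeGate.
    status, response_headers, _ = request("GET", uri_g, headers)
    if status != 302:
        raise LookupError(f"TimeGate answered {status}")
    # Steps 5-6: retrieve the Memento the TimeGate selected.
    return request("GET", response_headers["Location"], headers)
```

Passing the transport in as a parameter keeps the sketch independent of any particular HTTP library and lets it be exercised against canned responses.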
The following sections detail the specifics of HTTP interactions with Original Resources, TimeGates, Mementos, and TimeMaps under various conditions.
This section details HTTP GET/HEAD requests targeted at an Original Resource (URI-R).
In order to try and discover a TimeGate for the Original Resource, the user agent SHOULD issue an HTTP HEAD or GET request against the Original Resource's URI. Use of the "Accept-Datetime" header in the HTTP HEAD/GET request is OPTIONAL.
Figure 7 shows the use of HTTP HEAD indicating the user agent is not interested in retrieving a representation of the Original Resource, but only in determining a TimeGate for it. It also shows the use of the "Accept-Datetime" header anticipating that the user agent will set it for the entire duration of a Memento request/response cycle.
    HEAD / HTTP/1.1
    Host: a.example.org
    Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
    Connection: close
The response of the Original Resource's server to the user agent's HTTP HEAD/GET request of Step 1, for the case where the Original Resource exists, is as it would be in a regular HTTP request/response cycle, but in addition MAY include an HTTP "Link" header with a Relation Type of "timegate" that conveys the URI of the Original Resource's TimeGate as the Target IRI of the Link. Multiple HTTP Links with a relation type of "timegate" MAY be provided to accommodate situations in which the server is aware of multiple TimeGates for an Original Resource. The actual Target IRI provided in the "timegate" Link may depend on several factors including the datetime provided in the "Accept-Datetime" header, and the IP address of the user agent. A response for this case is illustrated in Figure 8.
    HTTP/1.1 200 OK
    Date: Thu, 21 Jan 2010 00:02:12 GMT
    Server: Apache
    Link: <http://arxiv.example.net/timegate/http://a.example.org>
       ; rel="timegate"
    Content-Length: 255
    Connection: close
    Content-Type: text/html; charset=iso-8859-1
Servers that actively maintain archives of their resources SHOULD include the "timegate" HTTP "Link" header because this link is an important way for a user agent to discover TimeGates for those resources. This includes servers such as Content Management Systems, Version Control Systems, and Web servers with associated transactional archives Fitch [Fitch]. Servers that do not actively maintain archives of their resources MAY include the "timegate" HTTP "Link" header as a way to convey a preference for TimeGates for their resources exposed by a third party archive. This includes servers that rely on Web archives such as the Internet Archive to archive their resources.
The server of the Original Resource MUST treat requests with and without an "Accept-Datetime" header in the same way:
The "Memento-Datetime" header MAY be applied to an Original Resource directly to indicate it is a FixedResource (see W3C.gen-ont-20090420 [W3C.gen-ont-20090420]), meaning that the state of the Original Resource has not changed since the datetime conveyed in the "Memento-Datetime" header, and as a promise that it will not change anymore beyond it. This may occur, for example, for certain stable media resources on news sites. In case the user agent's preferred datetime is equal to or more recent than the datetime conveyed as the value of "Memento-Datetime" in the server's response in Step 2, the user agent SHOULD conclude it has located an appropriate Memento, and it SHOULD NOT continue to Step 3.
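The user agent's decision described here could be sketched as below. The function name and the dictionary of response headers are hypothetical, and both datetimes are assumed to be well-formed rfc1123-date values.

```python
from email.utils import parsedate_to_datetime


def original_is_suitable(preferred, response_headers):
    """True when the Step 2 response already is an appropriate Memento.

    That is the case when the response carries a Memento-Datetime header
    and the preferred datetime is equal to or more recent than its value.
    """
    value = response_headers.get("Memento-Datetime")
    if value is None:
        return False
    return parsedate_to_datetime(preferred) >= parsedate_to_datetime(value)


stable = {"Memento-Datetime": "Fri, 20 Mar 2009 11:00:00 GMT"}
print(original_is_suitable("Thu, 21 Jan 2010 00:00:00 GMT", stable))
print(original_is_suitable("Tue, 11 Sep 2001 20:35:00 GMT", stable))
```

When the function returns False, the user agent would continue with Step 3.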
Figure 9 illustrates such a response to a request for the resource with URI http://a.example.org/pic that has been stable since it was created. Note the use of both the "memento" and "original" Relation Types for links that have as Target IRI the URI of the Original Resource.
    HTTP/1.1 200 OK
    Date: Thu, 21 Jan 2010 00:02:12 GMT
    Server: Apache
    Link: <http://a.example.org/pic>
       ; rel="original memento"
       ; datetime="Fri, 20 Mar 2009 11:00:00 GMT"
    Memento-Datetime: Fri, 20 Mar 2009 11:00:00 GMT
    Content-Length: 255
    Connection: close
    Content-Type: text/html; charset=iso-8859-1
Cases may also exist in which a resource becomes stable at a certain point in its existence, but changed previously. In such cases, the Original Resource may know about a TimeGate that is aware of its prior history and hence MAY also include a link with a "timegate" Relation Type. This is illustrated in Figure 10, where the "memento" and "original" Relation Types are used as in Figure 9, and the existence of a TimeGate to negotiate for Mementos with datetimes prior to Fri, 20 Mar 2009 11:00:00 GMT is indicated.
    HTTP/1.1 200 OK
    Date: Thu, 21 Jan 2010 00:02:12 GMT
    Server: Apache
    Link: <http://a.example.org/pic>
       ; rel="original memento"
       ; datetime="Fri, 20 Mar 2009 11:00:00 GMT",
     <http://arxiv.example.net/timegate/http://a.example.org/pic>
       ; rel="timegate"
    Memento-Datetime: Fri, 20 Mar 2009 11:00:00 GMT
    Content-Length: 255
    Connection: close
    Content-Type: text/html; charset=iso-8859-1
Servers SHOULD also provide a "timegate" HTTP "Link" header in responses to requests for an Original Resource that the server knows used to exist, but no longer does. This allows the use of an Original Resource's URI as an entry point to representations of its prior states even if the resource itself no longer exists. A server's response for this case is illustrated in Figure 11.
    HTTP/1.1 404 Not Found
    Date: Thu, 21 Jan 2010 00:02:12 GMT
    Server: Apache
    Link: <http://arxiv.example.net/timegate/http://a.example.org/gone>
       ; rel="timegate"
    Content-Length: 255
    Connection: close
    Content-Type: text/html; charset=iso-8859-1
In case the server is not aware of the prior existence of the Original Resource, its response SHOULD NOT include a "timegate" HTTP Link. Section 3.1.2.3 details what the user agent's behavior should be in such cases.
A user agent MAY ignore the TimeGate returned in Step 2. However, when engaging in a Memento request/response cycle, a user agent SHOULD NOT proceed immediately to Step 3 by using a TimeGate of its own preference but rather SHOULD always start the cycle by issuing an HTTP GET/HEAD against the Original Resource (Step 1, Figure 7) as it is an important way to learn about dedicated or preferred TimeGates for the Original Resource. Also, cases exist in which the response in Step 2 will not provide a "timegate" link, including:
In all these cases, the user agent SHOULD attempt to determine an appropriate TimeGate for the Original Resource, either automatically or interactively supported by the user. The discovery mechanisms described in Section 4 can support the user agent in this regard.
This section details HTTP GET/HEAD requests targeted at a TimeGate (URI-G).
In order to negotiate with a TimeGate, the user agent MUST issue an HTTP HEAD or GET against its URI, its request MUST include the "Accept-Datetime" header to express its datetime preference, and the use of that header MUST be as described in Section 2.1.1.1. The URI of the TimeGate may have been provided as the Target IRI of a "timegate" HTTP "Link" header in the response from the Original Resource (Step 2, Figure 8), or may have resulted from another discovery mechanism (see Section 4) or user interaction. Such a request is illustrated in Figure 12.
    GET /timegate/http://a.example.org HTTP/1.1
    Host: arxiv.example.net
    Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
    Connection: close
In order to respond to a datetime negotiation request (Step 3, Section 3.2.1), the server uses an internal algorithm to select the Memento that best meets the user agent's datetime preference, and redirects to it. The exact nature of the selection algorithm is at the server's discretion but SHOULD be consistent. A variety of approaches can be used including selecting the Memento that is nearest in time (either past or future) or nearest in the past relative to the requested datetime. Special cases for datetime negotiation with a TimeGate exist, and they are addressed in Section 3.2.2.3 through Section 3.2.2.7.
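The two selection strategies mentioned above can be sketched in Python as follows. This is an illustration rather than a prescribed algorithm: the function names, the (datetime, uri) pair layout, and the archive holdings are all hypothetical.

```python
from datetime import datetime, timezone


def select_nearest(mementos, preferred):
    """Return the (datetime, uri) pair closest to `preferred`, past or future."""
    # On a tie, min() keeps the earliest list element, so a stable list
    # order gives a consistent answer.
    return min(mementos, key=lambda memento: abs(memento[0] - preferred))


def select_nearest_past(mementos, preferred):
    """Return the latest pair not after `preferred`, else the first Memento."""
    past = [memento for memento in mementos if memento[0] <= preferred]
    return max(past) if past else min(mementos)


def utc(*fields):
    return datetime(*fields, tzinfo=timezone.utc)


# Illustrative holdings of a Web archive.
ARCHIVE = [
    (utc(2001, 9, 11, 20, 30, 51),
     "http://arxiv.example.net/web/20010911203051/http://a.example.org"),
    (utc(2001, 9, 11, 20, 36, 10),
     "http://arxiv.example.net/web/20010911203610/http://a.example.org"),
    (utc(2001, 9, 11, 20, 47, 33),
     "http://arxiv.example.net/web/20010911204733/http://a.example.org"),
]
preferred = utc(2001, 9, 11, 20, 35, 0)
print(select_nearest(ARCHIVE, preferred)[1])
print(select_nearest_past(ARCHIVE, preferred)[1])
```

For the preferred datetime shown, the two strategies pick different Mementos, which is why a server should apply one of them consistently.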
In cases where the TimeGate exists, and the datetime provided in the user agent's "Accept-Datetime" header can be parsed and is not out of the user agent's range (see Section 3.2.2.5), the server selects a Memento based on the user agent's datetime preference. The response MUST have a "302 Found" HTTP status code, and the "Location" header MUST be used to convey the URI of the selected Memento. The "Vary" header MUST be provided and it MUST include the "negotiate" and "accept-datetime" values to indicate that datetime negotiation has taken place. The "Link" header MUST be provided and contain links with Relation Types subject to the considerations described in Section 2.2. Such a response is illustrated in Figure 13.
    HTTP/1.1 302 Found
    Date: Thu, 21 Jan 2010 00:06:50 GMT
    Server: Apache
    Vary: negotiate, accept-datetime
    Location: http://arxiv.example.net/web/20010911203610/http://a.example.org
    Link: <http://a.example.org>; rel="original",
     <http://arxiv.example.net/timemap/http://a.example.org>
       ; rel="timemap"; type="application/link-format",
     <http://arxiv.example.net/web/20000915112826/http://a.example.org>
       ; rel="first memento"; datetime="Fri, 15 Sep 2000 11:28:26 GMT",
     <http://arxiv.example.net/web/20080708093433/http://a.example.org>
       ; rel="last memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
     <http://arxiv.example.net/web/20010911203610/http://a.example.org>
       ; rel="memento"; datetime="Tue, 11 Sep 2001 20:36:10 GMT",
     <http://arxiv.example.net/web/20010911203051/http://a.example.org>
       ; rel="prev memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
     <http://arxiv.example.net/web/20010911204733/http://a.example.org>
       ; rel="next memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
    Content-Length: 0
    Content-Type: text/plain; charset=UTF-8
    Connection: close
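A client or test harness might check the "Link" header of such a response against the occurrence counts summarized in Section 2.2, as in this hedged sketch. The function name is hypothetical, and its input is assumed to be the list of "rel" values already extracted from the header, one string per entry.

```python
from collections import Counter


def check_relation_counts(rel_values):
    """Return a list of problems found in a TimeGate or Memento response."""
    counts = Counter(rel for value in rel_values for rel in value.split())
    problems = []
    if counts["original"] != 1:
        problems.append("exactly one 'original' link is REQUIRED")
    if counts["memento"] < 1:
        problems.append("at least one 'memento' link is REQUIRED")
    return problems


# The rel values of the response shown above.
print(check_relation_counts([
    "original", "timemap", "first memento", "last memento",
    "memento", "prev memento", "next memento",
]))
```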
Note that if a user agent's "Accept-Datetime" header does not convey an interval indicator, and conveys a datetime that is either earlier than the datetime of the first Memento or later than the datetime of the most recent Memento known to the server, the server's response is as just described yet entails the selection of the first or most recent Memento, respectively. This approach is consistent with interpreting the absence of an interval indicator in the user agent's request as an indication of an infinite interval around its preferred datetime (see Section 2.1.1.1).
This is illustrated in Figure 14 that shows the response from a TimeGate exposed by a MediaWiki server to a request by a user agent that has an "Accept-Datetime: Mon, 31 May 1999 00:00:00 GMT" header. Note that a link is provided with a "successor-version" Relation Type but not with a "predecessor-version" Relation Type.
    HTTP/1.1 302 Found
    Server: Apache
    Content-Length: 709
    Content-Type: text/html; charset=utf-8
    Date: Thu, 21 Jan 2010 00:09:40 GMT
    Location: http://a.example.org/w/index.php?title=Clock&oldid=1493688
    Vary: negotiate, accept-datetime
    Link: <http://a.example.org/w/Clock>; rel="original",
     <http://a.example.org/Special:TimeMap/http://a.example.org/w/Clock>
       ; rel="timemap",
     <http://a.example.org/w/index.php?title=Clock&oldid=1493688>
       ; rel="first memento"; datetime="Sun, 28 Sep 2003 01:42:00 GMT",
     <http://a.example.org/w/index.php?title=Clock&oldid=1493854>
       ; rel="successor-version memento"
       ; datetime="Tue, 30 Sep 2003 14:28:00 GMT",
     <http://a.example.org/w/index.php?title=Clock&oldid=337446696>
       ; rel="last memento"; datetime="Tue, 12 Jan 2010 19:55:00 GMT"
    Connection: close
When interacting with a TimeGate, the regular content negotiation dimensions (media type, character encoding, language, and compression) remain available. It is the TimeGate server's responsibility to honor (or not) such content negotiation, and in doing so it MUST always first select a Memento that meets the user agent's datetime preference, and then consider honoring regular content negotiation for it. As a result of this approach, the returned Memento will not necessarily meet the user agent's regular content negotiation preferences. Therefore, it is RECOMMENDED that the server provides HTTP Links with a "memento" Relation Type pointing at Mementos that do meet the user agent's regular content negotiation requests and that have a Memento-Datetime value in the temporal vicinity of the user agent's preferred datetime value.
In case, in Step 3, a user agent issues a request to a TimeGate and fails to include an "Accept-Datetime" request header, the response MUST be handled as in Section 3.2.2.1, with a selection of the most recent Memento known to the responding server.
Because the finest datetime granularity expressible using the RFC 1123 [RFC1123] format used in HTTP is seconds level, cases may occur in which a TimeGate server is aware of multiple Mementos that meet the user agent's datetime preference. This may occur in Content Management Systems with very high update rates. The response in this case MUST be handled as in Section 3.2.2.1, with the selection of one of the matching Mementos.
As an example, Figure 15 shows a hypothetical response from a TimeGate on a MediaWiki server to a request for a Memento for the Original Resource http://a.example.org/w/Clock for which two Mementos exist for the user agent's preferred datetime.
    HTTP/1.1 302 Found
    Server: Apache
    Content-Length: 705
    Content-Type: text/html; charset=utf-8
    Date: Thu, 21 Jan 2010 00:09:40 GMT
    Vary: negotiate, accept-datetime
    Location: http://a.example.org/w/index.php?title=Clock&oldid=322586071
    Link: <http://a.example.org/w/Clock>; rel="original",
     <http://a.example.org/Special:TimeMap/http://a.example.org/w/Clock>
       ; rel="timemap";type="application/link-format",
     <http://a.example.org/w/index.php?title=Clock&oldid=1493688>
       ; rel="first memento"; datetime="Sun, 28 Sep 2003 01:42:00 GMT",
     <http://a.example.org/w/index.php?title=Clock&oldid=337446696>
       ; rel="last memento"; datetime="Tue, 12 Jan 2010 19:55:00 GMT",
     <http://a.example.org/w/index.php?title=Clock&oldid=322586071>
       ; rel="memento"; datetime="Sun, 31 May 2009 15:43:00 GMT",
     <http://a.example.org/w/index.php?title=Clock&oldid=326164283>
       ; rel="memento successor-version"
       ; datetime="Sun, 31 May 2009 15:43:00 GMT",
     <http://a.example.org/w/index.php?title=Clock&oldid=326164283>
       ; rel="memento predecessor-version"
       ; datetime="Sun, 31 May 2009 15:41:24 GMT"
    Connection: close
In case, in Step 3, a user agent conveys an interval indicator, and the responding server is not aware of any Mementos with datetimes within the expressed interval, the server's response MUST have a "406 Not Acceptable" HTTP status code. The use of the "Vary" header MUST be as described in Section 3.2.2.1. The use of the "Link" header MUST be as described in Section 2.2. Specifically, the use of links with a "memento" Relation Type MUST follow the rules for the case where no Memento is selected by the responding server (Section 2.2.1.4).
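The choice between a "302 Found" and a "406 Not Acceptable" response can be sketched as below. The function name and the parameters "before" and "after" (the two durations of the interval indicator, already converted to timedelta values) are hypothetical, and nearest-in-time selection is only one possible strategy.

```python
from datetime import datetime, timedelta, timezone


def negotiate(memento_datetimes, preferred, before=None, after=None):
    """Return (status, selected datetime or None) for a TimeGate request."""
    lower = preferred - before if before is not None else None
    upper = preferred + after if after is not None else None
    candidates = [moment for moment in memento_datetimes
                  if (lower is None or moment >= lower)
                  and (upper is None or moment <= upper)]
    if not candidates:
        return 406, None  # no Memento within the requested interval
    return 302, min(candidates, key=lambda moment: abs(moment - preferred))


# Illustrative holdings and a preferred datetime of Tue, 11 Sep 2001 20:35:00 GMT.
held = [datetime(2000, 9, 15, 11, 28, 26, tzinfo=timezone.utc),
        datetime(2001, 9, 10, 8, 22, 0, tzinfo=timezone.utc),
        datetime(2008, 7, 8, 9, 34, 33, tzinfo=timezone.utc)]
wanted = datetime(2001, 9, 11, 20, 35, 0, tzinfo=timezone.utc)
print(negotiate(held, wanted, timedelta(hours=5), timedelta(hours=6)))
print(negotiate(held, wanted))
```

Omitting both bounds models the infinite interval, in which case a Memento is always selected as long as the server holds at least one.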
Figure 16 shows a user agent using an "Accept-Datetime" header conveying an interval of interest starting 5 hours before and ending 6 hours after Tue, 11 Sep 2001 20:35:00 GMT. Figure 17 shows the "406 Not Acceptable" response from the TimeGate that has links to the first and last Memento, as well to a Memento outside of the user agent's interval yet in the temporal vicinity of its preferred datetime.
GET /timegate/http://a.example.org HTTP/1.1
Host: arxiv.example.net
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT; -P5H;+P6H
Connection: close
HTTP/1.1 406 Not Acceptable
Date: Thu, 21 Jan 2010 00:06:50 GMT
Server: Apache
Vary: negotiate, accept-datetime
Link: <http://a.example.org>; rel="original",
 <http://arxiv.example.net/timemap/http://a.example.org>
   ; rel="timemap";type="application/link-format",
 <http://arxiv.example.net/web/20000915112826/http://a.example.org>
   ; rel="memento first"; datetime="Tue, 15 Sep 2000 11:28:26 GMT",
 <http://arxiv.example.net/web/20080708093433/http://a.example.org>
   ; rel="memento last"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
 <http://arxiv.example.net/web/20000915112826/http://a.example.org>
   ; rel="memento"; datetime="Mon, 10 Sep 2001 08:22:00 GMT"
Content-Length: 1732
Connection: close
Content-Type: text/plain; charset=UTF-8
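The interval check that leads to the "406 Not Acceptable" response of Figure 17 can be sketched as follows. This is a minimal, non-normative Python sketch. It assumes the interval indicator has the shape shown in Figure 16 ("-P5H;+P6H") and handles only whole-day (D) and whole-hour (H) durations; the accept-dt-value rule of Figure 2 remains authoritative. The function names parse_accept_datetime and timegate_status are hypothetical.

```python
import re
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Assumption: only durations such as P5H or P2D are handled here,
# not the full grammar of the accept-dt-value rule (Figure 2).
_DURATION = re.compile(r"^([+-])P(\d+)([DH])$")
_UNITS = {"D": "days", "H": "hours"}


def parse_accept_datetime(value):
    """Return (preferred, start, end); start and end are None if absent."""
    parts = [part.strip() for part in value.split(";")]
    preferred = parsedate_to_datetime(parts[0])
    start = end = None
    for part in parts[1:]:
        match = _DURATION.match(part)
        if match is None:
            raise ValueError("unsupported interval indicator: %r" % part)
        sign, amount, unit = match.groups()
        delta = timedelta(**{_UNITS[unit]: int(amount)})
        if sign == "-":
            start = preferred - delta
        else:
            end = preferred + delta
    return preferred, start, end


def timegate_status(value, memento_datetimes):
    """Return 406 if an interval is given and no Memento lies inside it."""
    _, start, end = parse_accept_datetime(value)
    if start is None and end is None:
        # No interval indicator: the 406 case described here does not apply.
        return 302
    inside = [
        dt for dt in memento_datetimes
        if (start is None or dt >= start) and (end is None or dt <= end)
    ]
    return 302 if inside else 406
```

With the request of Figure 16, the interval runs from 15:35:00 GMT on 11 Sep 2001 to 02:35:00 GMT on 12 Sep 2001. None of the three Mementos listed in Figure 17 falls inside it, so the sketch yields 406.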
If, in Step 3, a user agent conveys a value for the "Accept-Datetime" request header that does not conform to the accept-dt-value construction rule of the BNF in Figure 2, the TimeGate server's response MUST have a "400 Bad Request" HTTP status code. In all other respects, responses in this case MUST be handled as described in Section 3.2.2.5.
Cases may occur in which a user agent issues a request against a TimeGate that does not exist. This may, for example, occur when a user agent uses internal knowledge to construct the URI of an assumed, yet non-existent TimeGate. In these cases, the response from the target server MUST have a "404 Not Found" HTTP status code, and SHOULD include a "Vary" header that includes the "negotiate" and "accept-datetime" values as an indication that, generally, the server is capable of datetime negotiation. The response MUST NOT include a "Link" header with any of the Relation Types introduced in Section 2.2.1.
In the above, the safe HTTP methods GET and HEAD are described for TimeGates. TimeGates MAY support the safe HTTP methods OPTIONS and TRACE in the way described in RFC 2616 [RFC2616]. Unsafe HTTP methods (i.e. PUT, POST, DELETE) MUST NOT be supported by a TimeGate. Such requests MUST yield a response with a "405 Method Not Allowed" HTTP status code, and MUST include an "Allow" header to convey that only the HEAD and GET (and OPTIONALLY the OPTIONS and TRACE) methods are supported. In addition, the response MUST have a "Vary" header that includes the "negotiate" and "accept-datetime" values to indicate the TimeGate supports datetime negotiation. Figure 18 shows such a response.
HTTP/1.1 405 Method Not Allowed
Date: Thu, 21 Jan 2010 00:02:12 GMT
Server: Apache
Vary: negotiate, accept-datetime
Allow: HEAD, GET
Content-Length: 255
Connection: close
Content-Type: text/html; charset=iso-8859-1
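The method handling described above can be illustrated with a small, non-normative Python sketch. The function name timegate_method_check and its optional_supported parameter are hypothetical; the sketch only derives the status code and the "Allow" and "Vary" headers that the text requires.

```python
def timegate_method_check(method, optional_supported=False):
    """Return None for an allowed method, else (status, headers) for a 405."""
    allowed = ("HEAD", "GET")
    if optional_supported:
        # OPTIONS and TRACE MAY be supported by a TimeGate.
        allowed += ("OPTIONS", "TRACE")
    if method in allowed:
        return None
    headers = {
        "Allow": ", ".join(allowed),
        "Vary": "negotiate, accept-datetime",
    }
    return 405, headers
```

For a POST request against a TimeGate that supports only HEAD and GET, the sketch produces the status and headers of Figure 18.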
When a user agent issues an HTTP HEAD/GET request against a resource whose URI it found as the Target IRI of an entry in the "Link" header with a "timegate" Relation Type, it SHOULD NOT assume that the targeted resource effectively is a TimeGate and hence will behave as described in Section 3.2.2.
A user agent MUST decide it has reached a TimeGate if the response to an HTTP HEAD/GET request against the resource's URI contains a "Vary" header that includes the "negotiate" and "accept-datetime" values. If the response does not, the user agent MUST decide it has not reached a TimeGate and proceed as follows:
Resources that are not TimeGates (i.e. do not behave as described in Section 3.2.2) MUST NOT use a "Vary" header that includes the "accept-datetime" value.
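The TimeGate test on the "Vary" header can be sketched in Python as follows. This is a minimal illustration, assuming response headers are available as a list of (name, value) pairs; the function name is_timegate is hypothetical.

```python
def is_timegate(header_pairs):
    """Return True if Vary includes both negotiate and accept-datetime."""
    tokens = set()
    for name, value in header_pairs:
        # Header names are case-insensitive; Vary may be repeated.
        if name.lower() == "vary":
            tokens.update(t.strip().lower() for t in value.split(","))
    return {"negotiate", "accept-datetime"} <= tokens
```

Applied to the headers of the response in Figure 18, the sketch reports a TimeGate; applied to a response without a "Vary" header, or with a "Vary" header that lacks either value, it does not.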
This section details HTTP GET/HEAD requests targeted at a Memento (URI-M).
In Step 5, the user agent issues an HTTP GET request against the URI of a Memento. The user agent MAY include an "Accept-Datetime" header in this request, but the existence or absence of this header MUST NOT affect the server's response. The URI of the Memento may have resulted from a response in Step 4, or the user agent may simply have happened upon it. Such a request is illustrated in Figure 19.
GET /web/20010911203610/http://a.example.org HTTP/1.1
Host: arxiv.example.net
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
If the Memento requested by the user agent in Step 5 exists, the server's response MUST have a "200 OK" HTTP status code (or "206 Partial Content", where appropriate), and it MUST include a "Memento-Datetime" header with a value equal to the archival datetime of the Memento, that is, the datetime of the state of the Original Resource that is encapsulated in the Memento. The "Link" header MUST be provided and contain links subject to the considerations described in Section 2.2. The Target IRI and, when applicable, the datetime values in the "Link" header associated with the "memento" Relation Type SHOULD be the same as conveyed in Step 4, in case the TimeGate and the selected Memento reside on the same server. However, they MAY be different in case the TimeGate and the selected Memento reside on different servers.
Figure 20 illustrates the server's response to the request issued against a Memento in Step 5 (Figure 19).
HTTP/1.1 200 OK
Date: Thu, 21 Jan 2010 00:09:40 GMT
Server: Apache-Coyote/1.1
Memento-Datetime: Tue, 11 Sep 2001 20:36:10 GMT
Link: <http://a.example.org>; rel="original",
 <http://arxiv.example.net/timemap/http://a.example.org>
   ; rel="timemap"; type="application/link-format",
 <http://arxiv.example.net/timegate/http://a.example.org>
   ; rel="timegate",
 <http://arxiv.example.net/web/20000915112826/http://a.example.org>
   ; rel="first memento"; datetime="Tue, 15 Sep 2000 11:28:26 GMT",
 <http://arxiv.example.net/web/20080708093433/http://a.example.org>
   ; rel="last memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
 <http://arxiv.example.net/web/20010911203610/http://a.example.org>
   ; rel="memento"; datetime="Tue, 11 Sep 2001 20:36:10 GMT",
 <http://arxiv.example.net/web/20010911203610/http://a.example.org>
   ; rel="prev memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
 <http://arxiv.example.net/web/20010911203610/http://a.example.org>
   ; rel="next memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
Content-Length: 23364
Content-Type: text/html;charset=utf-8
Connection: close
The server's response MUST include the "Memento-Datetime" header regardless of whether the user agent's request contained an "Accept-Datetime" header. This is the way by which resources make explicit that they are Mementos. Due to the sparseness of Mementos in most archives, the value of the "Memento-Datetime" header returned by a server may differ (significantly) from the value conveyed by the user agent in "Accept-Datetime".
Although a Memento encapsulates a prior state of an Original Resource, the entity-body returned in response to an HTTP GET request issued against a Memento may very well not be byte-for-byte the same as an entity-body that was previously returned by that Original Resource. There are several reasons why the two are likely to differ while still conveying substantially the same information. These include format migrations as part of a digital preservation strategy, URI-rewriting as applied by some Web archives, and the addition of banners as a means to brand Web archives.
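How a user agent might read the "Memento-Datetime" header and measure its distance from the requested datetime is sketched below in Python. The sketch is illustrative only: it assumes headers are available as (name, value) pairs and treats any value that parses as an RFC 1123 date as usable. The function names memento_datetime and datetime_gap are hypothetical.

```python
from email.utils import parsedate_to_datetime


def memento_datetime(header_pairs):
    """Return the parsed Memento-Datetime, or None if absent or unparsable."""
    for name, value in header_pairs:
        if name.lower() == "memento-datetime":
            try:
                return parsedate_to_datetime(value.strip())
            except (TypeError, ValueError):
                return None
    return None


def datetime_gap(accept_datetime, header_pairs):
    """Return abs(Memento-Datetime - Accept-Datetime), or None if no Memento."""
    found = memento_datetime(header_pairs)
    if found is None:
        return None
    return abs(found - parsedate_to_datetime(accept_datetime))
```

For the request of Figure 19 and the response of Figure 20, the gap is 70 seconds (20:35:00 GMT requested, 20:36:10 GMT returned).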
Cases may occur in which a TimeGate's response (Step 4) points at a Memento that actually does not exist, resulting in a user agent's request (Step 5) for a non-existent Memento. In this case, the server's response MUST have the expected "404 Not Found" HTTP Status Code and it MUST NOT contain a "Memento-Datetime" header.
Cases may occur in which a server that hosts Mementos does not expose a TimeGate for those Mementos. This can, for example, be the case if the server's Mementos result from taking a snapshot of the state of a set of Original Resources from another server at the time this other server is being retired. As a result, only a single Memento per Original Resource is hosted, making the introduction of a TimeGate unnecessary. But it may also be the case for servers that host multiple Mementos for an Original Resource but consider exposing TimeGates too expensive.
In cases of Mementos without associated TimeGates, responses to a request for a Memento by a user agent MUST be as described in Section 3.3.2 with the exception that it will not contain an HTTP Link with a "timegate" Relation Type pointing at a TimeGate exposed by the responding server. It MAY still contain such a Link pointing at a TimeGate exposed elsewhere. Depending on whether one or more Mementos are hosted for an Original Resource, the response may or may not have an HTTP Link with a "timemap" Relation Type. However, the response MUST still contain a "Memento-Datetime" response header with a value that corresponds to the archival datetime of the Memento.
Figure 21 illustrates the server's response to the request issued against a Memento in Step 5 (Figure 19) for the case that Memento has no associated TimeGate. In this example, it is also assumed there is only one Memento for the Original Resource, and hence the Links with Relation Types "memento", "first", "last" all point at the same - responding - Memento.
HTTP/1.1 200 OK
Date: Thu, 21 Jan 2010 00:09:40 GMT
Server: Apache-Coyote/1.1
Memento-Datetime: Tue, 11 Sep 2001 20:36:10 GMT
Link: <http://a.example.org>; rel="original",
 <http://arxiv.example.net/web/20010911203610/http://a.example.org>
   ; rel="first last memento"
   ; datetime="Tue, 11 Sep 2001 20:36:10 GMT"
Content-Length: 23364
Content-Type: text/html;charset=utf-8
Connection: close
Note that a server issuing a response similar to that of Figure 21 does not imply that there is no server whatsoever that exposes a TimeGate; it merely means that the responding server neither provides nor is aware of the location of a TimeGate.
When following the redirection provided by a confirmed TimeGate (see Section 3.2.3), a user agent SHOULD NOT assume that the targeted resource effectively is a Memento and hence will behave as described in Section 3.3.2.
A user agent MUST decide it has reached a Memento if the response to an HTTP HEAD/GET request against the resource's URI contains a "Memento-Datetime" header with a legitimate value. If the response does not, the following applies:
A TimeMap is introduced to support retrieving a comprehensive list of all Mementos for a specific Original Resource, known to a responding server. The entity-body of a response to an HTTP GET request issued against a TimeMap's URI:
The entity-body of a response from a TimeMap MAY be serialized in various ways, but the link-value format serialization MUST be supported. In this serialization, the entity-body MUST be formatted in the same way as the value of an HTTP "Link" header, and hence MUST comply with the "link-value" construction rule of "Section 5. The Link Header Field" of RFC5988 [RFC5988], and the media type of the entity-body MUST be "application/link-format" as introduced in I-D.ietf-core-link-format [I-D.ietf-core-link-format]. All links conveyed in this serialization MUST be interpreted as having the URI of the Original Resource as their Context IRI. The URI of the Original Resource is provided in the entity-body as the Target IRI of the link with an "original" Relation Type.
In order to retrieve the link-value serialization of a TimeMap, a user agent SHOULD use an "Accept" request header with a value set to "application/link-format". This is shown in Figure 22.
GET /timemap/http://a.example.org HTTP/1.1
Host: arxiv.example.net
Accept: application/link-format;q=1.0
Connection: close
If the TimeMap requested by the user agent exists, the server's response MUST have a "200 OK" HTTP status code (or "206 Partial Content", where appropriate). Note that a TimeMap is itself an Original Resource for which Mementos may exist. For example, a response from a TimeMap could provide a "timegate" Link to a TimeGate via which prior TimeMap versions are available. In this case, the use of the "Link" header is subject to all considerations described in Section 2.2, with the TimeMap acting as the Original Resource.
However, in case a TimeMap wants to explicitly indicate in its response headers for which Original Resource it is a TimeMap, it MUST do so by including an HTTP "Link" header with the following characteristics:
Because the Context IRI of this HTTP Link is not the URI of the TimeMap, as per RFC5988 [RFC5988], the default Context IRI must be overridden by using the "anchor" attribute with a value of the URI of the Original Resource.
The response from the TimeMap to the request of Figure 22 is shown in Figure 23. The response header shows the TimeMap explicitly conveying the URI of the Original Resource for which it is a TimeMap; for practical reasons the entity-body in the example has been abbreviated. Notice also the use of the "license" and "embargo" attributes introduced in Section 2.2.1.4 on the "memento" links in the TimeMap.
HTTP/1.1 200 OK
Date: Thu, 21 Jan 2010 00:06:50 GMT
Server: Apache
Link: <http://arxiv.example.net/timemap/http://a.example.org>
   ; anchor="http://a.example.org"; rel="timemap"
   ; type="application/link-format"
Content-Length: 4883
Content-Type: application/link-format
Connection: close

<http://a.example.org>;rel="original",
 <http://arxiv.example.net/timemap/http://a.example.org>
   ; rel="timemap";type="application/link-format",
 <http://arxiv.example.net/timegate/http://a.example.org>
   ; rel="timegate",
 <http://arxiv.example.net/web/20000620180259/http://a.example.org>
   ; rel="first memento";datetime="Tue, 20 Jun 2000 18:02:59 GMT"
   ; license="http://creativecommons.org/publicdomain/zero/1.0/",
 <http://arxiv.example.net/web/20091027204954/http://a.example.org>
   ; rel="last memento";datetime="Tue, 27 Oct 2009 20:49:54 GMT"
   ; license="http://creativecommons.org/publicdomain/zero/1.0/"
   ; embargo="Tue, 19 Apr 2011 00:00:00 GMT",
 <http://arxiv.example.net/web/20000621011731/http://a.example.org>
   ; rel="memento";datetime="Wed, 21 Jun 2000 01:17:31 GMT"
   ; license="http://creativecommons.org/publicdomain/zero/1.0/",
 <http://arxiv.example.net/web/20000621044156/http://a.example.org>
   ; rel="memento";datetime="Wed, 21 Jun 2000 04:41:56 GMT"
   ; license="http://creativecommons.org/publicdomain/zero/1.0/",
 ...
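A user agent that receives a TimeMap in the link-value serialization needs to split it into links and attributes. The following Python sketch shows one possible way to do so. It is not a complete RFC 5988 parser: it assumes that every attribute value is a quoted string, as in Figure 23, and it ignores anything else, including the trailing "..." of the abbreviated example. The function names parse_link_format and mementos are hypothetical.

```python
import re

# One link-value: <target> followed by zero or more ;name="value" pairs.
# Assumption: attribute values are always double-quoted.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
_PARAM = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def parse_link_format(text):
    """Return a list of (target_iri, attributes) tuples.

    The "rel" attribute is split into a list of Relation Types.
    """
    links = []
    for match in _LINK.finditer(text):
        target, raw_params = match.groups()
        params = dict(_PARAM.findall(raw_params))
        params["rel"] = params.get("rel", "").split()
        links.append((target, params))
    return links


def mementos(links):
    """Return (target_iri, datetime string) for links typed as memento."""
    return [
        (target, params.get("datetime"))
        for target, params in links
        if "memento" in params["rel"]
    ]
```

Matching on quoted strings rather than splitting on commas matters here, because the datetime values themselves contain a comma.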
Section 3 describes how TimeGates, Mementos, Original Resources, and TimeMaps can be discovered by following HTTP Links with Relation Types "timegate", "memento", "original", and "timemap", respectively.
Naturally, some of these links can also be embedded into representations of resources that have a media type that allows embedding of typed links. For example, an Original Resource that has an HTML representation can include a "timegate" link by using HTML's LINK element, e.g. <link href="http://arxiv.example.net/timegate/http://a.example.org" rel="timegate">. The use of such embedded links is also subject to the considerations of Section 2.2.
In this section additional approaches are introduced that support batch discovery of TimeGates, TimeMaps, and Mementos. The approaches leverage the Robots Exclusion Protocol and a special-purpose profile of Atom Feeds named TimeMap Feeds.
The Robots Exclusion Protocol's robots.txt file [robotstxt.org] is commonly used by Web site owners to give instructions about their site to Web robots. It is used both to protect resources hosted by a server from crawling and to facilitate discovering them. This document introduces the "TimeGate" and "Archived" directives for robots.txt to provide a server-wide mechanism to support TimeGate discovery that SHOULD be used by:
A robots.txt file MAY contain zero or more occurrences of the "TimeGate" directive, and each occurrence MUST be followed by one or more associated "Archived" directives. The meaning of the directives is as follows:
For example, consider a wiki at http://a.example.org/w/ that supports the Memento framework and exposes TimeGates to access the wiki's history pages at base URL http://a.example.org/w/index.php/Special:TimeGate/. An actual TimeGate for the wiki's http://a.example.org/w/My_Title page would then be at http://a.example.org/w/index.php/Special:TimeGate/http://a.example.org/w/My_Title. This wiki SHOULD make its TimeGates discoverable by using the directives shown in Figure 24 in its robots.txt file.
TimeGate: http://a.example.org/w/index.php/Special:TimeGate/
Archived: a.example.org/w/
As another example, consider a server of Original Resources at http://a.example.org/ and http://www.a.example.org/ that is aware that its resources are regularly crawled by a Web archive that generally exposes TimeGates at base URL http://arxiv.example.net/timegate/ and hence has TimeGate http://arxiv.example.net/timegate/http://a.example.org/ to access Mementos for http://a.example.org/. This server SHOULD make the remote TimeGates discoverable by including the directives shown in Figure 25 in its robots.txt file:
TimeGate: http://arxiv.example.net/timegate/
Archived: a.example.org/
Archived: www.a.example.org/
Finally, consider a Web archive that crawls a wide range of Original Resources, and exposes TimeGates to access the resulting Mementos at base URL http://arxiv.example.net/timegate/. In order to make its TimeGates discoverable, this Web archive SHOULD include the directives shown in Figure 26 in its robots.txt file:
TimeGate: http://arxiv.example.net/timegate/
Archived: *
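The way a user agent might use these directives is sketched below in Python. The sketch is non-normative and rests on a reading of Figures 24 to 26 only: it assumes that an "Archived" value is either "*" or a scheme-less URI prefix, and that a TimeGate URI is formed by appending the URI of the Original Resource to the "TimeGate" base URL. The function names parse_timegate_directives and timegate_for are hypothetical.

```python
def parse_timegate_directives(robots_txt):
    """Return a list of (timegate_base, [archived_prefix, ...]) pairs."""
    groups = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop robots.txt comments
        if ":" not in line:
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        if name.lower() == "timegate":
            groups.append((value, []))
        elif name.lower() == "archived" and groups:
            # An Archived directive belongs to the preceding TimeGate.
            groups[-1][1].append(value)
    return groups


def timegate_for(original_uri, groups):
    """Return the TimeGate URI for original_uri, or None if not covered."""
    without_scheme = original_uri.split("://", 1)[-1]
    for base, prefixes in groups:
        for prefix in prefixes:
            if prefix == "*" or without_scheme.startswith(prefix):
                return base + original_uri
    return None
```

With the directives of Figure 24, the sketch maps http://a.example.org/w/My_Title to http://a.example.org/w/index.php/Special:TimeGate/http://a.example.org/w/My_Title, the TimeGate URI given in the wiki example above.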
Atom Feeds [RFC4287] are commonly used to support discovery of news items by humans and are also frequently used for automated discovery by a variety of applications. This section introduces a profile of Atom Feeds named TimeMap Feeds intended to support batch discovery of TimeMaps. The discovery of TimeMap Feeds is in its turn supported by the new "TimeMapFeed" directive for robots.txt.
TimeMap Feeds are special-purpose Atom Feeds that SHOULD be published by servers to support batch discovery of their Mementos. The following are the essential characteristics of a TimeMap Feed:
Further details about the use of feed-level and entry-level elements in a TimeMap Feed are provided in Section 4.2.1.1 and Section 4.2.1.2, respectively.
This section discusses the use of feed-level Atom elements in TimeMap Feeds. All elements are as specified in Atom [RFC4287], yet additional constraints or guidelines apply to some when used in TimeMap Feeds.
As the content of the atom:id element, a tag URI [RFC4151] or an HTTP URI equal to the one provided as the value of the "href" attribute of the MANDATORY feed-level "atom:link" element with a "rel" attribute equal to "self" is RECOMMENDED.
The atom:author element MUST occur exactly once, and is constructed as follows:
The atom:category element MUST occur at least once, and its use is as follows:
Figure 27 shows the use of feed-level elements for a TimeMap Feed published by the server http://arxiv.example.net/.
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://arxiv.example.net/timemapfeeds/feed1</id>
  <title>Feed 1 of arxiv.example.net TimeMaps</title>
  <updated>2011-05-01T12:34:00Z</updated>
  <author>
    <name>Example Web Archive</name>
    <uri>http://arxiv.example.net/</uri>
    <email>admin@arxiv.example.net</email>
  </author>
  <rights>Content of this feed is public domain.</rights>
  <icon>http://arxiv.example.net/images/icon.png</icon>
  <category term=".be"
    scheme="http://purl.org/memento/categories/archived"/>
  <category term="webarchive"
    scheme="http://purl.org/memento/categories/class"/>
  <category term="TimeMapFeed"
    scheme="http://purl.org/memento/categories/feedtype"/>
  <link rel="self"
    href="http://arxiv.example.net/timemapfeeds/feed1"/>
  <link rel="license"
    href="http://creativecommons.org/publicdomain/zero/1.0/"/>
  ... entries go here ...
</feed>
This section discusses the use of entry-level Atom elements in TimeMap Feeds. All elements are as specified in Atom [RFC4287], yet additional constraints or guidelines apply to some when used in TimeMap Feeds.
The content of the atom:id element MUST be a tag URI [RFC4151] as specified by the "timemap-tagURI" construction rule of Figure 28, and it MUST have the URI of the Original Resource as the value for the "or-uri" component of that rule. If the feed is moved or copied, the tag URI that is provided as the value of the atom:id element MUST remain the same.
timemap-tagURI = "tag:" taggingEntity ":" or-uri
taggingEntity  = DNSname "," "2011"
DNSname        = DNScomp *( "." DNScomp )           ; see RFC 1035
DNScomp        = alphaNum [*(alphaNum /"-") alphaNum]
alphaNum       = DIGIT / ALPHA
or-uri         = scheme ":" hier-part [ "?" query ] ; see RFC 3986
scheme         = "http" / "https"
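Constructing an entry-level atom:id value according to the rule of Figure 28 can be sketched in Python as follows. The sketch is illustrative: the function name timemap_tag_uri is hypothetical, the DNS name is not validated against the DNSname rule, and URIs with a fragment are rejected because the or-uri rule has no fragment component.

```python
from urllib.parse import urlsplit


def timemap_tag_uri(dns_name, original_uri):
    """Build "tag:" DNSname ",2011:" or-uri for an Original Resource."""
    parts = urlsplit(original_uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError("or-uri scheme must be http or https")
    if parts.fragment:
        raise ValueError("or-uri has no fragment component")
    return "tag:%s,2011:%s" % (dns_name, original_uri)
```

For the server arxiv.example.net and the Original Resource http://a.example.org/, the result is tag:arxiv.example.net,2011:http://a.example.org/.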
The atom:author element MUST NOT be used. Authorship information for an entry is inherited as follows:
The atom:updated element MUST be used and its value MUST change whenever the entry changes, including when the TimeMap conveyed by the entry (by-value or by-reference) changes.
The atom:link element MUST occur at least once and its use is as follows:
If the entry does not contain an atom:link element pointing to a TimeMap serialized according to the link-value format, then the atom:content element MUST be used to directly contain such a TimeMap wrapped in a CDATA section. The "type" attribute of this atom:content element MUST be used and it MUST have the value "application/link-format" (see Section 3.4).
Figure 29 shows the use of entry-level elements for a TimeMap Feed published by the server http://arxiv.example.net/.
<feed xmlns="http://www.w3.org/2005/Atom">
  ... feed information ...
  <entry>
    <id>tag:arxiv.example.net,2011:http://a.example.org/</id>
    <title/>
    <updated>2011-05-01T12:34:00Z</updated>
    <link rel="alternate timemap" type="application/link-format"
      href="http://arxiv.example.net/timemap/http://a.example.org"/>
    <link rel="original" href="http://a.example.org/" />
    <link rel="timegate"
      href="http://arxiv.example.net/timegate/http://a.example.org"/>
  </entry>
  ... more entries ...
</feed>
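A consumer of a TimeMap Feed needs to extract, per entry, the links to the TimeMap, the Original Resource, and the TimeGate. The Python sketch below shows one way to do this with the standard library. It is a minimal illustration that assumes a well-formed feed without the "..." placeholders of Figure 29; the function name timemap_feed_entries is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def timemap_feed_entries(feed_xml):
    """Return one dict per entry: {"id": ..., "links": {rel: href}}."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        links = {}
        for link in entry.findall(ATOM + "link"):
            # A rel attribute may carry several Relation Types,
            # e.g. "alternate timemap"; index the href under each.
            for rel in link.get("rel", "").split():
                links.setdefault(rel, link.get("href"))
        entries.append({"id": entry.findtext(ATOM + "id"), "links": links})
    return entries
```

For the entry of Figure 29, the "timemap" and "alternate" keys both map to the TimeMap URI, while "original" and "timegate" map to the Original Resource and its TimeGate.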
Servers that publish TimeMap Feeds SHOULD make them discoverable by using the "TimeMapFeed" directive for robots.txt that is introduced here.
A robots.txt file MAY contain zero or more occurrences of the "TimeMapFeed" directive, and its meaning is as follows:
Figure 30 shows an excerpt of the robots.txt file of the server at http://arxiv.example.net/ that hosts two TimeMap Feeds to make its Mementos discoverable.
TimeMapFeed: http://arxiv.example.net/timemapfeeds/feed1
TimeMapFeed: http://arxiv.example.net/timemapfeeds/feed2
Servers can support discovery of their Mementos by crawlers through the use of the Robots Exclusion Protocol, but SHOULD do so in a manner that conveys to crawlers and mirroring applications that the sticky Memento-Datetime behavior (see Section 2.1.1) MUST be respected. To that end, servers SHOULD use the "User-agent" and "Allow" directives of the Robots Exclusion Protocol in the following manner:
Figure 31 shows the robots.txt for a server that generally disallows crawling, yet allows agents that respect the sticky Memento-Datetime behavior to crawl Mementos in the /web/ path.
User-agent: *
Disallow: /

User-agent: memento
Allow: /web/
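How a crawler might evaluate the robots.txt file of Figure 31 can be sketched with Python's standard urllib.robotparser module. This is an illustration only: the helper may_crawl and the user agent name "otherbot" are hypothetical, and the standard parser knows nothing about the sticky Memento-Datetime behavior itself; it only evaluates the "User-agent", "Allow", and "Disallow" directives.

```python
from urllib.robotparser import RobotFileParser

# The directives of Figure 31.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: memento
Allow: /web/
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

An agent identifying itself as "memento" is allowed to fetch http://arxiv.example.net/web/20010911203610/http://a.example.org, whereas an agent that matches only the "*" group is not.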
This memo requires IANA to register the Accept-Datetime and Memento-Datetime HTTP headers defined in Section 2.1.1 in the appropriate IANA registry.
This memo requires IANA to register the "Link" header Relation Types "original", "timegate", "timemap", and "memento" defined in Section 2.2.1 in the appropriate IANA registry.
This memo requires IANA to register the "datetime", "license", and "embargo" attributes for Link headers with a "memento" Relation Type, as defined in Section 2.2.1.4 in the appropriate IANA registry.
Provision of a "timegate" HTTP "Link" header in responses to requests for an Original Resource that is protected (e.g., 401 or 403 HTTP response codes) is OPTIONAL. The inclusion of this Link when requesting authentication is at the server's discretion; cases may exist in which a server protects the current state of a resource, but supports open access to prior states and thus chooses to supply a "timegate" HTTP "Link" header. Conversely, the server may choose to not advertise the TimeGate URIs (e.g., they exist in an intranet archive) for unauthenticated requests.
Authentication, encryption and other security related issues are otherwise orthogonal to Memento.
v02 2011-05-11 HVDS MLN RS draft-vandesompel-memento-01
v01 2010-11-11 HVDS MLN RS First public version draft-vandesompel-memento-00
v00 2010-10-19 HVDS MLN RS Limited circulation version
2010-07-22 HVDS MLN First internal version
The Memento effort is funded by the Library of Congress. Many thanks to Kris Carpenter Negulescu, Michael Hausenblas, Erik Hetzner, Larry Masinter, Gordon Mohr, Mark Nottingham, David Rosenthal, Ed Summers for early feedback. Many thanks to Samuel Adams, Scott Ainsworth, Lyudmilla Balakireva, Frank McCown, Harihar Shankar, Brad Tofel for early implementations.
[RFC5988] | Nottingham, M., "Web Linking", RFC 5988, October 2010. |
[I-D.ietf-core-link-format] | Shelby, Z, "CoRE Link Format", Internet-Draft draft-ietf-core-link-format-03, March 2011. |
[RFC2616] | Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. |
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. |
[RFC5829] | Brown, A., Clemm, G. and J. Reschke, "Link Relation Types for Simple Version Navigation between Web Resources", RFC 5829, April 2010. |
[RFC4287] | Nottingham, M. and R. Sayre, "The Atom Syndication Format", RFC 4287, December 2005. |
[RFC4151] | Kindberg, T. and S. Hawke, "The 'tag' URI Scheme", RFC 4151, October 2005. |
[RFC1123] | Braden, R., "Requirements for Internet Hosts - Application and Support", STD 3, RFC 1123, October 1989. |
[I-D.masinter-dated-uri] | Masinter, L, "The 'tdb' and 'duri' URI schemes, based on dated URIs", Internet-Draft draft-masinter-dated-uri-08, January 2011. |
[W3C.gen-ont-20090420] | Berners-Lee, "Architecture of the World Wide Web", April 2009. |
[W3C.REC-aww-20041215] | Jacobs and Walsh, "Architecture of the World Wide Web", December 2004. |
[Fitch] | Fitch, "Web site archiving - an approach to recording every materially different response produced by a website", July 2003. |
[robotstxt.org] | "Robots Exclusion Protocol", August 2010. |
Step 1 : UA --- HTTP GET/HEAD; Accept-Datetime: Tj ---------> URI-R

HEAD / HTTP/1.1
Host: a.example.org
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close

Step 2 : UA <-- HTTP 200; Link: URI-G ----------------------- URI-R

HTTP/1.1 200 OK
Date: Thu, 21 Jan 2010 00:02:12 GMT
Server: Apache
Link: <http://arxiv.example.net/timegate/http://a.example.org>
   ; rel="timegate"
Content-Length: 255
Connection: close
Content-Type: text/html; charset=iso-8859-1

Step 3 : UA --- HTTP GET/HEAD; Accept-Datetime: Tj ---------> URI-G

GET /timegate/http://a.example.org HTTP/1.1
Host: arxiv.example.net
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close

Step 4 : UA <-- HTTP 302; Location: URI-Mj; Vary; Link:
         URI-R, URI-T, URI-M0, URI-Mn, URI-Mi, URI-Mj, URI-Mk ---- URI-G

HTTP/1.1 302 Found
Date: Thu, 21 Jan 2010 00:06:50 GMT
Server: Apache
Vary: negotiate, accept-datetime
Location:
 http://arxiv.example.net/web/20010911203610/http://a.example.org
Link: <http://a.example.org>; rel="original",
 <http://arxiv.example.net/web/20000915112826/http://a.example.org>
   ; rel="first memento"; datetime="Tue, 15 Sep 2000 11:28:26 GMT",
 <http://arxiv.example.net/web/20080708093433/http://a.example.org>
   ; rel="last memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
 <http://arxiv.example.net/timemap/http://a.example.org>
   ; rel="timemap"; type="application/link-format",
 <http://arxiv.example.net/web/20010911203610/http://a.example.org>
   ; rel="memento"; datetime="Tue, 11 Sep 2001 20:36:10 GMT",
 <http://arxiv.example.net/web/20010911203610/http://a.example.org>
   ; rel="prev memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
 <http://arxiv.example.net/web/20010911203610/http://a.example.org>
   ; rel="next memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
Content-Length: 0
Content-Type: text/plain; charset=UTF-8
Connection: close

Step 5 : UA --- HTTP GET URI-Mj; Accept-Datetime: Tj -------> URI-Mj

GET /web/20010911203610/http://a.example.org HTTP/1.1
Host: arxiv.example.net
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close

Step 6 : UA <-- HTTP 200; Memento-Datetime: Tj; Link:
         URI-R, URI-T, URI-G, URI-M0, URI-Mn, URI-Mi, URI-Mj, URI-Mk ---- URI-Mj

HTTP/1.1 200 OK
Date: Thu, 21 Jan 2010 00:09:40 GMT
Server: Apache-Coyote/1.1
Memento-Datetime: Tue, 11 Sep 2001 20:36:10 GMT
Link: <http://a.example.org>; rel="original",
 <http://arxiv.example.net/web/20000915112826/http://a.example.org>
   ; rel="first memento"; datetime="Tue, 15 Sep 2000 11:28:26 GMT",
 <http://arxiv.example.net/web/20080708093433/http://a.example.org>
   ; rel="last memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
 <http://arxiv.example.net/timemap/http://a.example.org>
   ; rel="timemap"; type="application/link-format",
 <http://arxiv.example.net/timegate/http://a.example.org>
   ; rel="timegate",
 <http://arxiv.example.net/web/20010911203610/http://a.example.org>
   ; rel="memento"; datetime="Tue, 11 Sep 2001 20:36:10 GMT",
 <http://arxiv.example.net/web/20010911203610/http://a.example.org>
   ; rel="prev memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
 <http://arxiv.example.net/web/20010911203610/http://a.example.org>
   ; rel="next memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
Content-Length: 23364
Content-Type: text/html;charset=utf-8
Connection: close