DECADE                                                      R. Alimi, Ed.
Internet-Draft
Intended status: Informational                             A. Rahman, Ed.
Expires: February 03, 2012              InterDigital Communications, LLC
                                                          Y. R. Yang, Ed.
                                                           Yale University
                                                           August 02, 2011
A Survey of In-network Storage Systems
draft-ietf-decade-survey-05
This document surveys deployed and experimental in-network storage systems and describes their applicability for DECADE.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on February 03, 2012.
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
DECADE (DECoupled Application Data Enroute) is an architecture that provides applications with access to in-network storage. With access to in-network storage, content distribution applications can be designed to place less load on network infrastructure, such as last-mile links. See [I-D.ietf-decade-problem-statement] for further discussion.
A major motivation for DECADE is the substantial increase in capacity and decrease in cost offered by storage systems. For example, over the last two decades, there has been at least a 30-fold increase in the amount of storage available at a given price (for flash memory and hard disk drives) [StorageTrends_1], [StorageTrends_2].
High-capacity and low-cost in-network storage devices introduce substantial opportunities. One example of in-network storage is content caches supporting Web and Peer-to-Peer (P2P) content. In contrast to existing content caches, whose control resides fully with the owners of the caching devices, DECADE also allows applications to control access to their allocated in-network storage, as well as the resources consumed while accessing that storage (bandwidth, connections, and storage space). While DECADE is designed in the context of P2P applications, it may be useful to other applications as well. This document provides details on deployed and experimental in-network storage solutions, and evaluates their suitability for DECADE.
We note that the survey presented in this document is only representative of the work in this area. Rather than attempting an exhaustive enumeration, we have chosen some typical techniques that have led to derivative works.
This document uses terms defined in [I-D.ietf-decade-problem-statement].
In-network storage has been used previously in numerous scenarios to reduce network traffic and enable more efficient content distribution. This section presents a brief history of content distribution techniques and illustrates how DECADE relates to past approaches. Systems have been developed with particular use cases in mind. Thus, this survey is not meant to point out shortcomings of existing solutions, but rather to indicate where certain capabilities required in DECADE [I-D.ietf-decade-reqs] are not provided by existing systems.
In the early stage of Internet development, most Web content was stored at a central server and clients requested Web content from the central server. In this architecture, the central server was required to provide a large amount of bandwidth. Web browsing is still a primary activity on today's Internet. As more and more users access Web content, a central server can become overloaded. The use of web caches is one technique to reduce load on a central server. Web caches store frequently-requested content, and provide bandwidth for serving the content to clients.
The ongoing growth of broadband technology in the worldwide market has been driven by the hunger of customers for new multimedia services as well as Web content. In particular, the use of audio and video streaming formats has become common for delivery of rich information to the public - both residential and business.
Simply installing more Web caches will not be enough to overcome the challenge of massive multimedia consumption. Moving content closer to the consumer results in greater network efficiency, improved QoS, and lower latency, while facilitating personalization of content through broadband content applications. A representative edge technology is the Content Delivery Network (CDN): a large-scale distributed network of servers located close to the edges of the Internet for efficient delivery of digital content, including various forms of multimedia.
Although CDNs are an effective means of information access and delivery, two barriers keep them from becoming a more common service: cost and replication integrity. Deploying a CDN for publicly available content is expensive: it requires administrative control over nodes with large storage capacity at geographically dispersed locations with adequate connectivity. CDNs can be scalable but, due to this administrative and cost overhead, are not rapidly deployable for the common user.
The emergence and maturation of P2P has allowed improvements to many network applications. P2P allows the use of client resources, such as CPU, memory, storage, and bandwidth, for serving content. This can reduce the amount of resources required of a content provider. Multimedia content delivery using various P2P or peer-assisted frameworks has been shown to greatly reduce the dependence on CDNs and central content servers. However, the popularity of P2P applications has resulted in increased traffic on ISP networks. P2P caches (both transparent and non-transparent) have been introduced as a way to reduce this burden. Though they can be effective in reducing traffic in certain areas of ISP networks, P2P caches have their shortcomings. First, they are application-dependent and thus difficult to keep up-to-date with new and evolving P2P application protocols. Second, applications may benefit from explicit control of in-network storage, which P2P caches do not provide. See [I-D.ietf-decade-problem-statement] for further discussion.
DECADE aims to provide a standard protocol allowing P2P applications (including Content Providers) to make use of in-network storage to reduce the traffic burden on ISP networks, while enabling P2P applications to control access to content they have placed in in-network storage.
Before surveying individual technologies, we describe the basic components of in-network storage. For consistency and for ease of comparison, we use the same model to evaluate each storage technology in this document.
Note that the network protocol(s) used by a given storage system are also an important part of the design. We omit details of particular protocol choices in this document.
Data Access Interface: A set of operations available to a client user for accessing data in the in-network storage. Solutions typically allow both read and write operations, though the mechanisms for doing so can differ drastically.
Data Management Operations: Storage systems may provide users the ability to manage stored content. For example, operations such as delete and move may be provided to users. In this survey, we focus on data management operations that are provided to client users and omit those provided to system administrators.
Data Search Capability: Some storage systems may provide the capability to search or enumerate content that has been stored. In this survey, we focus on search capabilities that are provided to client users and omit those provided to system administrators. An example of a client search would be finding the list of items stored by a given user over a given period of time.
Access Control: Storage systems typically allow a client user, content owner, or some other entity to define the access policies for the in-network storage. The in-network storage system then checks the authorization of a user before it stores or retrieves content. We define three types of access control authorization: public-unrestricted, public-restricted, and private.
Public-unrestricted refers to content on an in-network storage system that is widely available to all clients (i.e., without restrictions). An example is accessing Wikipedia on the Web, or anonymous access to FTP sites.
Public-restricted refers to content on an in-network storage system that is available to a restricted (though still potentially large) set of clients, but which does not require any confidential credentials from the client. An example is content (e.g., a TV show episode) on the Internet that is only viewable in selected countries or networks (i.e., white/black lists or black-out areas).
Private refers to content on an in-network storage system that is only made available to one or more clients presenting the required confidential credentials (e.g., password or key). This content is not available to anyone without the proper confidential access credentials.
Note that a combination of access control types may be applicable for a given scenario. For example, the retrieval (read) of content from an in-network storage system may be public-unrestricted, but the storage (write) to the same system may be private.
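As a non-normative illustration, the following Python sketch shows how a storage node might map these access control types onto an authorization check, including a per-operation combination of types. All field names and policies here are hypothetical, not drawn from any surveyed system.

   # Hypothetical sketch of the three access control types.
   def is_authorized(policy, client):
       if policy["type"] == "public-unrestricted":
           return True                       # available to all clients
       if policy["type"] == "public-restricted":
           # restricted set, but no confidential credentials required
           return client.get("region") in policy["allowed_regions"]
       if policy["type"] == "private":
           # confidential credentials (e.g., password or key) required
           creds = policy["credentials"]
           return creds.get(client.get("user")) == client.get("secret")
       return False

   # A single object may combine types per operation, e.g., a
   # public-unrestricted read but a private write:
   policies = {
       "read":  {"type": "public-unrestricted"},
       "write": {"type": "private", "credentials": {"alice": "s3cret"}},
   }
   assert is_authorized(policies["read"], {"user": "bob"})
   assert is_authorized(policies["write"],
                        {"user": "alice", "secret": "s3cret"})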
Resource Control Interface: The interface through which users manage the resources on the in-network storage that can be used by other peers, e.g., the bandwidth or connections. The storage system may also allow users to indicate the time period for which resources are granted.
Discovery Mechanism: The mechanism through which users find the location of in-network storage, as well as its access interface, resource control interface, or other interfaces.
Storage Mode: Storage systems may use one of the following modes of storage: file system, object-based, or block-based.
A file system typically organizes files into a hierarchical tree structure. Each level of the hierarchy normally contains one or more directories each with one or more files. A file system may also be flat or use some other organizing principle.
We define an object-based storage mode as one which stores discrete chunks of data (e.g., IP datagrams or another type of aggregation useful to an application) without a pre-defined hierarchy or meta structure.
We define a block-based storage mode as one which stores a raw sequence of bytes, with a client being able to read and/or write data at offsets within that sequence. Data is typically accessed in blocks for efficiency. A common example of this storage mode is raw access to a hard disk.
In this survey, we define Storage Mode to refer to how data is structured within the system, which may not be the same as how it is accessed by a client. For example, a caching system may cache objects with hierarchical names, but may internally use an object-based Storage Mode.
This section surveys in-network storage systems using the methodology defined above. The survey includes some systems that are widely deployed today, some systems that are just being deployed, and some experimental/futuristic systems. The survey covers both traditional client-server architectures and P2P architectures. The surveyed systems are listed in alphabetical order. Also, for each system, a brief explanation is given of the relevance to DECADE.
Amazon S3 (Simple Storage Service) [AmazonS3] provides an online storage service using web (HTTP) interfaces. Users create buckets, and each bucket can contain stored objects. Users are provided an interface through which they can manage their buckets. Amazon S3 is a popular backend store for other services. Other related storage services are the Blob Service provided by Windows Azure [Azure] and Google Storage for Developers [GoogleStorage].
Applicability to DECADE: A very widely used (deployed) example of in-network storage. Amazon leases the storage to third-party companies for disparate services. In particular, Amazon S3 has a rich model for authorization (using signed queries) to integrate with a wide variety of use cases. A focus for Amazon S3 is scalability. Particular simplifications are the absence of a general, hierarchical namespace and the inability to update the contents of existing data.
Data Access Interface: Users can read and write objects.

Data Management Operations: Users can delete previously-stored objects.

Data Search Capability: Users can list the contents of buckets to find objects matching desired criteria.

Access Control: All methods of access control are supported for clients: public-unrestricted, public-restricted, and private.
For example, access to stored objects can be restricted by owner, a list of other Amazon Web Service users, all Amazon Web Service Users, or open to all users (anonymous access). Another option is for the owner to generate and sign a query (e.g., a query to read an object) that can be used by any user until an owner-defined expiration time.
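As a non-normative sketch, the following Python code constructs a signed (pre-signed) read query in the style of S3's query-string authentication as documented at the time of writing; the credentials, bucket, and object names are placeholders, and real requests involve additional details.

   import base64, hashlib, hmac, time
   from urllib.parse import quote

   def presign_get(access_key, secret_key, bucket, key, lifetime=3600):
       expires = int(time.time()) + lifetime  # owner-defined expiration
       string_to_sign = "GET\n\n\n%d\n/%s/%s" % (expires, bucket, key)
       sig = base64.b64encode(
           hmac.new(secret_key.encode(), string_to_sign.encode(),
                    hashlib.sha1).digest()).decode()
       return ("https://%s.s3.amazonaws.com/%s"
               "?AWSAccessKeyId=%s&Expires=%d&Signature=%s"
               % (bucket, key, access_key, expires, quote(sig, safe="")))

   # Any user holding this URL may read the object until it expires.
   print(presign_get("AKIAEXAMPLE", "secret", "my-bucket", "my-object"))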
Resource Control Interface: Not provided.

Discovery Mechanism: Users are provided a well-known DNS name (either a default provided by Amazon, or one customized by a particular user). Users accessing S3 storage use DNS to discover an IP address where S3 requests can be sent.

Storage Mode: Object-based, with the extension that objects can be organized into user-defined buckets.
BranchCache [BranchCache] is a feature integrated into Windows (Windows 7 and Windows Server 2008R2) that aims to optimize enterprise branch office file access over the WAN links. The main goals are to reduce WAN link utilization and improve application responsiveness by caching and sharing content within a branch while still maintaining end-to-end security. BranchCache allows files retrieved from the web servers and file servers located in headquarters or datacenters to be cached in remote branch offices, and shared among users in the same branch accessing the same content. BranchCache operates transparently by instrumenting the HTTP and SMB components of the networking stack. It provides two modes of operation: Distributed Cache and Hosted Cache.
In both modes, a client always contacts a BranchCache-enabled content server first to get the content identifiers for local search. If the content is cached locally, the client then retrieves the content within the branch. Otherwise, the client will go back to the original content server to request the content. The two modes differ in how the content is shared.
In the Hosted Cache mode, a locally provisioned server acts as a cache for files retrieved from the servers. After getting the content identifiers, the client first consults the cache for the desired file. If it is not present in the cache, the client retrieves it from the content server and sends it to the cache for storage.
In the Distributed Cache mode, a client first queries other clients in the same network using the Web Services Discovery multicast protocol. As in the Hosted Cache mode, the client retrieves the file from the content server if it is not available locally. After retrieving the file (either from another client or the content server), the client stores the file locally.
The original content server always authorizes requests from clients. Cached content is encrypted, and clients can only decrypt the data using keys derived from metadata returned by the content server. In addition to instrumenting the networking stack at clients, content servers must also support BranchCache.
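The following Python sketch illustrates, non-normatively, the retrieval flow described above, using in-memory stand-ins for a BranchCache-enabled content server and a Hosted Cache. The per-block hashing and messages are simplified assumptions, not the actual BranchCache wire formats.

   import hashlib

   def block_id(data):
       return hashlib.sha256(data).hexdigest()

   class ContentServer:
       """Stand-in for a BranchCache-enabled content server."""
       def __init__(self, files):
           self.files = files                  # url -> list of blocks
       def get_content_ids(self, url):         # identifiers only (cheap)
           return [block_id(b) for b in self.files[url]]
       def get_block(self, url, wanted):       # full data, over the WAN
           return next(b for b in self.files[url]
                       if block_id(b) == wanted)

   class HostedCache:
       """Stand-in for a branch-local cache (Hosted Cache mode)."""
       def __init__(self):
           self.blocks = {}
       def lookup(self, bid):
           return self.blocks.get(bid)
       def store(self, bid, data):
           self.blocks[bid] = data

   def retrieve(url, server, cache):
       out = []
       for bid in server.get_content_ids(url):    # 1. IDs from server
           data = cache.lookup(bid)               # 2. try the branch
           if data is None:
               data = server.get_block(url, bid)  # miss: over the WAN
               cache.store(bid, data)             # share within branch
           out.append(data)
       return b"".join(out)

   server = ContentServer({"/doc": [b"hello ", b"world"]})
   cache = HostedCache()
   assert retrieve("/doc", server, cache) == b"hello world"  # fills cache
   assert retrieve("/doc", server, cache) == b"hello world"  # local hit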
Applicability to DECADE: BranchCache is an example of an in-network storage system primarily targeted at enterprise networks. It supports both a P2P-like mode (Distributed Cache) and a client-server mode (Hosted Cache). Integration into the Microsoft OS will ensure wide distribution of this in-network storage technology.

Data Access Interface: Clients transparently retrieve (read) data from a cache (other clients or a Hosted Cache), since BranchCache operates by instrumenting the networking stack. In Hosted Cache mode, clients write data to the Hosted Cache once it is retrieved from the content server.

Data Management Operations: Not provided.

Data Search Capability: Not provided.

Access Control: The access control method for clients is private. For example, transferred content is encrypted and can only be decrypted by keys derived from data received from the original content server. Though data may be transferred to unauthorized clients, end-to-end security is maintained by only allowing authorized clients to decrypt the data.

Resource Control Interface: The storage capacity of the caches on clients and Hosted Caches is configurable by system administrators. The Hosted Cache further allows configuration of the maximum number of simultaneous client accesses. In the Distributed Cache mode, exponential back-off and throttling mechanisms are used to prevent reply storms for popular content requests. A client will also spread data block accesses among multiple serving clients that have the content (complete or partial) to improve latency and provide some load balancing.

Discovery Mechanism: The Distributed Cache mode uses multicast for discovery of other clients and content within a local network. Currently, the Hosted Cache mode uses policy provisioning or manual configuration of the server used as the Hosted Cache. In this mode, the address of the server may be found via DNS.

Storage Mode: Object-based.
Cache-and-Forward (CNF) [PRDW08] is an architecture for content delivery services in the future Internet. In this architecture, storage can be exploited at nodes within the network, either directly at routers or deployed near the routers. CNF is based on the concept of store-and-forward routers with large storage, providing for opportunistic delivery to occasionally disconnected mobile users and for in-network caching of content. The proposed CNF protocol uses reliable hop-by-hop transfer of large data files between CNF routers in place of an end-to-end transport protocol like TCP.

Applicability to DECADE: An example of an experimental in-network storage system that would require storage space on (or near) a large number of routers in the Internet if it were deployed. As the name of the system implies, it would provide short-term caching rather than long-term network storage.

Data Access Interface: Users implicitly store content at Cache-and-Forward routers by requesting files. End hosts read content from in-network storage by submitting queries for content.

Data Management Operations: Not provided.

Data Search Capability: Not provided.

Access Control: The access control method is public-restricted (to any client that is part of the Cache-and-Forward network).

Resource Control Interface: Not provided.

Discovery Mechanism: A query including a location-independent content ID is sent to the network and routed to a Cache-and-Forward router, which handles retrieval of the data and forwarding to the end host.

Storage Mode: Object-based (with objects representing individual files). The architecture proposes to cache large files in storage within the network, though objects could be made to represent smaller chunks of larger files.
The Cloud Data Management Interface (CDMI) is a specification to access and manage cloud storage. CDMI is specified by the Storage Networking Industry Association (SNIA).
CDMI is a functional interface that applications can use to create, retrieve, update and delete data elements from the cloud. As part of this interface the client will be able to discover the capabilities of the cloud storage offering and use this interface to manage containers and the data that is placed in them. In addition, metadata can be set on containers and their contained data elements through this interface [CDMI].
CDMI follows a traditional client server model, and operates over an HTTP interface using the Representational State Transfer (REST) model. Similar to Amazon S3 buckets (see Section 4.1), users may create containers into which data objects may be stored. Even though data objects may be accessed via a user-defined name within a container, it is also possible to access data objects by a storage-defined Object ID which is provided in the response upon creation of a Data Object.
Applicability to DECADE: CDMI is an important initiative to standardize storage interfaces for cloud storage, which is rapidly becoming an important class of storage service. In particular, it specifies a set of operations for creating, reading, writing, and managing data objects at a remote server (or set of servers) via the HTTP protocol.

Data Access Interface: Users can read and write data objects, and also update data in existing data objects. CDMI specifies operations in which data objects are embedded as fields inside a JavaScript Object Notation (JSON) object, but the protocol also defines interfaces in which the contents of data objects can be read and written via simple HTTP GET/PUT operations.
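As a non-normative sketch of these operations, the following Python fragment (using the third-party 'requests' library) creates and reads a data object; the endpoint, container, and header details are placeholders and are simplified relative to the full CDMI specification [CDMI].

   import json, requests

   BASE = "https://cdmi.example.com"   # hypothetical CDMI endpoint
   HDRS = {"X-CDMI-Specification-Version": "1.0",
           "Content-Type": "application/cdmi-object",
           "Accept": "application/cdmi-object"}

   # Create (write) a data object inside a container; in CDMI-style
   # operations the value travels as a field of a JSON body.
   r = requests.put(BASE + "/mycontainer/greeting.txt", headers=HDRS,
                    data=json.dumps({"mimetype": "text/plain",
                                     "value": "hello decade"}))
   object_id = r.json()["objectID"]    # storage-assigned Object ID

   # Read it back by user-defined name or by Object ID ...
   by_name = requests.get(BASE + "/mycontainer/greeting.txt",
                          headers=HDRS)
   by_id = requests.get(BASE + "/cdmi_objectid/" + object_id,
                        headers=HDRS)

   # ... or fetch the raw value as a plain (non-CDMI) HTTP GET.
   raw = requests.get(BASE + "/mycontainer/greeting.txt")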
Data Management Operations: Users can delete existing data objects. The create operation also supports modes in which the created object is copied or moved from an existing data object.

Data System Metadata also allows users to configure policies regarding a time-to-live after which a data object is automatically deleted, as well as the redundancy with which a data object is stored.

Data Search Capability: Users may list the contents of containers to locate data objects matching any desired criteria.

Access Control: All methods of access control for clients are supported: public-unrestricted, public-restricted, and private. In particular, CDMI allows access to data objects to be protected by ACLs, which can allow or restrict access based on user, group, administrative status, or whether a user is authenticated or anonymous.

Resource Control Interface: CDMI supports the attributes 'cdmi_max_latency' and 'cdmi_max_throughput' (set either at the level of containers or for a specific data object), which control the level of service offered to users accessing a particular data object.

Discovery Mechanism: Users are provided a well-known DNS name. The DNS name is resolved to determine the IP address to which requests may be sent.

Storage Mode: Object-based, with the extension that objects can be organized into user-defined containers.
A Content Delivery Network (CDN) provides services that improve network performance by maximizing bandwidth, improving accessibility, and maintaining correctness through content replication. CDNs offer fast and reliable applications and services by distributing content to cache or edge servers located close to users. See [PR07] for an additional taxonomy and survey.
A CDN has some combination of content-delivery, request-routing, distribution and accounting infrastructure. The content-delivery infrastructure consists of a set of edge servers (also called surrogates) that deliver copies of content to end-users. The request-routing infrastructure is responsible for directing client requests to appropriate edge servers. It also interacts with the distribution infrastructure to keep an up-to-date view of the content stored in the CDN caches. The distribution infrastructure moves content from the origin server to the CDN edge servers and ensures consistency of content in the caches. The accounting infrastructure maintains logs of client accesses and records the usage of the CDN servers. This information is used for traffic reporting and usage-based billing.
In practice, a CDN typically hosts static content including images, video, media clips, advertisements, and other embedded objects for Web viewing. A focus for CDNs is the ability to publish and deliver content to end-users in a reliable and timely manner. A CDN focuses on building its network infrastructure to provide the following services and functionalities: storage and management of content; distribution of content among surrogates; cache management; delivery of static, dynamic and streaming content; backup and disaster recovery solutions; and monitoring, performance measurement and reporting.
Examples of existing CDNs are Akamai, Limelight, and CloudFront.
The following description uses the term "content provider" to refer to the entity purchasing a CDN service, and the term "client" to refer to the subscriber requesting content via the CDN from the content provider.
Applicability to DECADE: A very widely used (deployed) example of in-network storage for multimedia content. The existence and operation of the storage is totally transparent to the end user. A CDN typically requires a strong business relationship between the content providers and content distributors, and often the business relationship extends to the ISPs.

Data Access Interface: A CDN is typically a closed system, and generally provides only a read (retrieve) interface to clients; it typically does not provide a write (store) interface to clients. The content provider can access network edge servers and store content on them, or edge servers can retrieve content from content providers. Client nodes can only retrieve content from edge servers.

Data Management Operations: A content provider can manage the data distributed across cache nodes, such as moving popular data objects from one cache node to another, or deleting rarely-accessed data objects. Client nodes, however, cannot perform these operations.

Data Search Capability: A content provider can search or enumerate the data each cache node stores. Client nodes cannot perform search operations.

Access Control: All methods of access control (for reading) are supported for clients: public-unrestricted, public-restricted, and private. Some CDN edge servers allow usage of HTTP basic authentication with the origin server, restrictions by IP address, or a token-based technique that lets the origin server apply its own authorization criteria.

As mentioned previously, clients typically cannot write to the CDN; writing is typically a private operation for the content providers.

Resource Control Interface: Not provided.

Discovery Mechanism: Content providers can directly locate internal CDN cache nodes on which to store content, since they typically have an explicit business relationship. Clients can locate CDN nodes through DNS or other redirection mechanisms.

Storage Mode: Though objects are addressed by URLs, which typically refer to objects in a hierarchical fashion, the storage mode is typically object-based.
The Delay-Tolerant Network (DTN) [RFC4838] is an evolution of an architecture originally designed for the Interplanetary Internet. The Interplanetary Internet is a communication system envisioned to provide Internet-like services across interplanetary distances in support of deep space exploration. The DTN architecture can be utilized in various operational environments characterized by severe communication disruptions, disconnections and high-delays (e.g., a month long loss of connectivity between two planetary networks because of high solar radiation due to sun spots). The DTN architecture is thus suitable for environments including deep space networks, sensor-based networks, certain satellite networks and underwater acoustic networks.
A key aspect of the DTN is a store-and-forward overlay layer called the "Bundle Protocol" or "Bundle Layer" that exists between the transport and application layers [RFC5050]. The Bundle Layer forms a logical overlay that employs persistent storage to help combat long-term network interruptions by providing a store-and-forward service. While traditional IP networks are also based on store-and-forward principles, the amount of time a packet is kept in "storage" at a traditional IP router is typically on the order of milliseconds (or less). In contrast, the DTN architecture assumes that most Bundle Layer nodes will use some form of persistent storage (e.g., hard disk, flash memory) for DTN packets because of the nature of the DTN environment.
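As a non-normative sketch, the following Python fragment mimics the Bundle Layer behaviour described above: bundles rest in persistent storage until a contact becomes available, or until their lifetime expires. The structures are simplified assumptions; persistence is mimicked with an on-disk shelf.

   import shelve, time

   class BundleNode:
       def __init__(self, path):
           self.store = shelve.open(path)   # persistent across restarts

       def receive(self, bundle_id, payload, lifetime):
           self.store[bundle_id] = (payload, time.time() + lifetime)

       def try_forward(self, link_up, send):
           for bid in list(self.store.keys()):
               payload, expiry = self.store[bid]
               if time.time() > expiry:     # lifetime expired: discard
                   del self.store[bid]
               elif link_up():              # opportunistic contact
                   send(bid, payload)
                   del self.store[bid]      # forwarded: release storage

   node = BundleNode("bundles.db")
   node.receive("bundle-1", b"science data", lifetime=86400)
   node.try_forward(link_up=lambda: True,
                    send=lambda bid, p: print("sent", bid))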
Applicability to DECADE: An example of an experimental in-network storage system that would require fundamental changes to the Internet protocols.

Data Access Interface: Users implicitly cause content to be stored (until successfully forwarded) at Bundle Layer nodes by initiating/terminating any transaction that traverses the DTN.

Data Management Operations: Users can implicitly cause deletion of content stored at Bundle Layer nodes via a "Time To Live" type parameter that the user can control (for transactions originating from that user).

Data Search Capability: Not provided.

Access Control: The access control method is public-restricted (to any client that is part of the DTN) or private.

Resource Control Interface: Not provided.

Discovery Mechanism: A Uniform Resource Identifier (URI) approach is used as the basis of the addressing scheme for DTN transactions (and subsequent store-and-forward routing through the DTN network).

Storage Mode: Object-based. DTN applications send data to the Bundle Layer, which then breaks the data into segments. These segments are routed through the DTN network and stored in Bundle Layer nodes as required (before being forwarded).
Named Data Networking (NDN) [NDN] is a research initiative that proposes to move to a new model of addressing and routing for the Internet. NDN uses "named data" based routing and forwarding to replace the current IP-address-based model. NDN also uses name-based data caching in the routers.

Each NDN Data packet will be assigned a content name and will be cryptographically signed. Data delivery is driven by the requesting end. Routers disseminate name-based prefix announcements using routing protocols like Intermediate System to Intermediate System (IS-IS) or Border Gateway Protocol (BGP). The requester sends out an "Interest" packet which identifies the name of the data that it wants. Routers that receive this Interest packet remember the interface it came from and then forward it using a name-based routing protocol. Once an Interest packet reaches a node that has the desired data, a named Data packet is sent back, carrying both the name and content of the data, along with a digital signature of the producer. This named Data packet is then forwarded back to the original requester on the reverse path of the Interest packet [NDN_Proposal].

A key aspect of NDN is that routers have the capability to cache the named data. If a request for the same data (i.e., the same name) comes to a router, the NDN router will forward the named data stored locally to fulfill the request. The proponents of NDN believe that the network can be designed to naturally match data delivery characteristics, rather than communication between endpoints, because data delivery has become the primary use of the network.
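The following Python sketch gives a non-normative model of this forwarding behaviour: a router checks its local cache, remembers the arrival interface of each Interest, and returns Data along the reverse path. Table and message structures are simplified assumptions about a design that is still evolving.

   class NdnRouter:
       def __init__(self, fib):
           self.cs = {}    # content store: name -> named Data packet
           self.pit = {}   # pending Interests: name -> arrival faces
           self.fib = fib  # name prefix -> next-hop router

       def on_interest(self, name, in_face):
           if name in self.cs:                  # cached: reply locally
               in_face.deliver(name, self.cs[name])
               return
           self.pit.setdefault(name, set()).add(in_face)
           prefix = max((p for p in self.fib if name.startswith(p)),
                        key=len, default=None)  # longest-prefix match
           if prefix is not None:
               self.fib[prefix].on_interest(name, self)

       def deliver(self, name, data):           # called when this router
           self.on_data(name, data)             # is the previous hop

       def on_data(self, name, data):
           self.cs[name] = data                 # cache for new requesters
           for face in self.pit.pop(name, ()):  # reverse-path delivery
               face.deliver(name, data)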
Applicability to DECADE: An example of an experimental in-network storage system that would require storage space on a large number of routers in the Internet. Named Data packets would be kept in storage in the NDN routers and provided to new requesters of the same data.

Data Access Interface: Users implicitly store content at NDN routers by requesting content (named Data packets) from the network. Subsequent requests by different users for the same content will cause the named Data packets to be read from the NDN routers' in-network storage.

Data Management Operations: Users do not have the direct ability to delete content stored in the NDN routers. However, there will be some type of "Time To Live" parameter associated with the named Data packets, though this has not yet been specified.

Data Search Capability: Not provided.

Access Control: All methods of access control for clients are supported: public-unrestricted, public-restricted, and private. The basic security mechanism in NDN is for the sender to digitally sign the content (named Data packet) that it sends. It is envisioned that a complete access control system can be built on top of this, though this has not yet been specified.

Resource Control Interface: Not provided.

Discovery Mechanism: Names are used as the basis of the addressing and discovery scheme for NDN (and subsequent store-and-forward routing through the NDN network). NDN names are assumed to be hierarchical and deterministically constructible. This is still an active area of research.

Storage Mode: Object-based. NDN sends named Data packets through the network; these packets are routed through the NDN network and stored in NDN routers.
Similar to NDN (see Section 4.7), Network of Information (NetInf) [NetInf] is another information centric approach in which the named data objects are the basic component of the networking architecture. NetInf is thus moving away from today's host centric networking architecture where the nodes in the network are the primary objects. In today's network the information objects are named relative to the hosts they are stored on (e.g., http://www.example.com/information-object.txt).
The NetInf naming and security framework builds the foundation for an information centric security model that integrates security deeply into the architecture. In this model, trust is based on the information itself. Information Objects (IOs) are given a unique name with cryptographic properties. Together with additional metadata, the name can be used to verify the data integrity as well as several other security properties, like self-certification, name persistency, and owner authentication and identification. The approach also gives some benefits over the security model in today's host centric networks, as it minimizes the need for trust in the infrastructure, including the hosts providing the data, the channel, or the resolution service.
In NetInf the information objects are published into the network. They are registered with a Name Resolution Service (NRS). The NRS is also used to register network locators that can be used to retrieve data objects that represent the published IOs. When a receiver wants to retrieve an IO, the request for the IO is resolved by the NRS into a set of locators. These locators are then used to retrieve a copy of the data object from the "best" available source(s). NetInf is open to use any type of underlying transport networks. The locators can thus be a heterogeneous set, e.g., IPv4, IPv6, MAC, etc.
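As a non-normative sketch, the following Python fragment models the publish/resolve/retrieve cycle described above with an in-memory NRS; the ID format and locator strings are illustrative placeholders.

   class NameResolutionService:
       def __init__(self):
           self.locators = {}                # IO id -> set of locators

       def register(self, io_id, locator):   # performed at publish time
           self.locators.setdefault(io_id, set()).add(locator)

       def resolve(self, io_id):             # request -> set of locators
           return self.locators.get(io_id, set())

   nrs = NameResolutionService()
   nrs.register("io:abc123", "tcp://cache1.example.net")   # cached copy
   nrs.register("io:abc123", "tcp://origin.example.com")   # origin copy

   # A receiver resolves the ID and retrieves from the "best" source;
   # the preference function here (sorting) is an arbitrary stand-in.
   best = sorted(nrs.resolve("io:abc123"))[0]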
NetInf will make extensive use of caching of information objects in the network and will provide network functionality similar to what overlay solutions like Content Distribution Networks (CDNs) and P2P distribution networks (e.g., BitTorrent) provide today.
Applicability to DECADE: An example of an experimental information-centric network architecture that would require storage space for storing and caching information objects on a large number of NetInf nodes in the Internet.

Data Access Interface: Users publish IOs with specific IDs into the network by sending a register message to the NRS stating that the IO with the specific ID is available. When another user wishes to retrieve the IO, it uses the given ID to make a request. The ID is then resolved by the NRS, and the IO is delivered from a nearby in-network storage location.

Data Management Operations: Users do not have the direct ability to delete content stored in the NetInf nodes. However, there can be some type of "Time To Live" parameter associated with the information objects, though this has not yet been specified.

Data Search Capability: Not provided.

Access Control: All methods of access control for clients are supported: public-unrestricted, public-restricted, and private. The basic security mechanism in NetInf is for the publisher to digitally sign the content of the information objects that it publishes. It is envisioned that a complete access control system can be built on top of this, though this has not yet been specified.

Resource Control Interface: Not provided.

Discovery Mechanism: NetInf IDs are used for naming and accessing information objects. The IDs are resolved by the NRS into locators that are used for routing and transport of data through the transport networks. This is still an active area of research.

Storage Mode: Object-based. From an application perspective, NetInf can be used for publishing entire files or chunks of files. NetInf is agnostic to the application perspective and treats everything as information objects.
Redundancy Elimination (RE) (e.g., [AVA09]) identifies and removes repeated content from network transfers. This technique has been proposed to improve network performance in many types of networks, such as ISP backbones and enterprise access links. One example of a redundancy elimination proposal is SmartRE, proposed by Anand et al., which focuses on network-wide redundancy elimination. In packet-level redundancy elimination, forwarding elements are equipped with additional storage which can be used to cache data from forwarded packets. Upstream routers may replace packet data with a fingerprint that tells a downstream router how to decode and reconstruct the packet based on cached data.
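The following Python sketch is a non-normative model of this mechanism: an upstream node replaces previously seen payloads with a short fingerprint, and the downstream node reconstructs them from its cache. Real proposals such as SmartRE fingerprint sliding windows within packets and coordinate cache allocation across routers; this sketch caches whole payloads only.

   import hashlib

   class ReNode:
       def __init__(self):
           self.cache = {}                    # fingerprint -> payload

       def encode(self, payload):             # upstream router
           fp = hashlib.sha1(payload).digest()
           if fp in self.cache:
               return ("ref", fp)             # shrink the packet
           self.cache[fp] = payload
           return ("raw", payload)

       def decode(self, packet):              # downstream router
           kind, body = packet
           if kind == "raw":
               self.cache[hashlib.sha1(body).digest()] = body
               return body
           return self.cache[body]            # reconstruct from cache

   up, down = ReNode(), ReNode()
   pkt = b"x" * 1400
   assert down.decode(up.encode(pkt)) == pkt  # first copy sent raw
   assert down.decode(up.encode(pkt)) == pkt  # repeat sent as fingerprint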
Applicability to DECADE: An example of an experimental in-network storage system that would require a large amount of associated packet processing at routers if it were ever deployed.

Data Access Interface: Redundancy elimination is typically transparent to the user. Writing into the storage is done by transferring data that has not already been cached; storage is read when users transmit data identical to previously-transmitted data.

Data Management Operations: Not provided.

Data Search Capability: Not provided.

Access Control: The access control method is public-restricted (to any client that is part of the RE network). Note that the content provider still retains control over which peers receive the requested data; the returned data is "compressed" as it is transferred within the network.

Resource Control Interface: Not provided. The content provider still retains control over the rate at which packets are sent to a peer. The packet size within the network may be reduced.

Discovery Mechanism: No discovery mechanism is necessary; routers can apply redundancy elimination without the users' knowledge.

Storage Mode: Object-based, with "objects" being data from packets transmitted within the network.
OceanStore [OceanStore] is a storage platform developed at the University of California, Berkeley, that provides globally-distributed storage. OceanStore implements a model in which multiple storage providers can pool resources together; major focuses are thus resiliency, self-organization, and self-maintenance.
By utilizing Byzantine agreement and erasure codes to store data at primary replicas, the protocol is resilient to the compromise of some storage nodes.
Applicability to DECADE: An example of an experimental in-network storage system that provides a high degree of resilience to network failure scenarios.

Data Access Interface: Users may read and write objects.

Data Management Operations: Objects may be replaced by newer versions, and multiple versions of an object may be maintained.

Data Search Capability: Not provided.

Access Control: Provided, but specifics for clients are unclear from the available references.

Resource Control Interface: Not provided.

Discovery Mechanism: Users require an entry point into the system in the form of one storage node that is part of OceanStore. If a hostname is provided, the address of a storage node may be determined via DNS.

Storage Mode: Object-based.
There is a growing number of popular online Photo Sharing (storing) systems. For example, the Kodak Gallery system [KodakGallery01] serves over 60 million users and stores billions of images [KodakGallery02]. Other well-known examples of Photo Sharing systems include Flickr [Flickr] and ImageShack [ImageShack]. There are also a number of popular blogging services, such as Tumblr [Tumblr], which specialize in sharing large numbers of photos and other multimedia content (e.g., video, text, audio) as part of their service. All of these in-network storage systems utilize both free and paid subscription models.

Most Photo Sharing systems use a traditional client-server architecture, though a minority of systems also offer a P2P mode of operation. The client-server architecture is typically based on HTTP, with a browser client and a web server.
Applicability to DECADE: A very widely used (deployed) example of in-network storage where the end user has direct visibility and extensive control of the system. The typical end user interface is an HTTP-based web browser.

Data Access Interface: Users can read (view) and write (store) photos.

Data Management Operations: Users can delete previously stored photos.

Data Search Capability: Users can tag photos and/or organize them using sophisticated web photo album generators. Users can then search for objects (photos) matching desired criteria.

Access Control: The access control method for clients is typically either private or public-unrestricted. For example, writing (storing) to a photo blog is typically private to the owner of the account, while all other clients can view (read) the contents of the blog (i.e., public-unrestricted). Some photo sharing websites provide private read access to allow sharing with a limited set of friends.

Resource Control Interface: Not provided.

Discovery Mechanism: Usually by manually logging on to a central web page for the service and entering the appropriate information to access the desired content. The address to which the client connects is usually determined by DNS using the hostname from the provided URL.

Storage Mode: File-based. Photos are usually stored as files, which can then be organized into meta-structures (e.g., albums, galleries) using sophisticated web photo album generators.
Caching of P2P traffic is a useful approach to reduce P2P network traffic, because objects in P2P systems are mostly immutable and the traffic is highly repetitive. In addition, making use of P2P caches does not require changes to P2P protocols, and caches can be deployed transparently to clients.
P2P caches operate similarly to web caches, in that they temporarily store frequently-requested content. Requests for content already stored in the cache can be served from local storage instead of requiring the data to be transmitted over expensive network links.
Two types of P2P caches exist: non-transparent P2P caches and transparent P2P caches. A non-transparent cache appears as a super peer: it explicitly peers with other P2P clients. For a transparent cache, once a P2P cache is established, the network transparently redirects P2P traffic to the cache, which either serves the file directly or passes the request on to a remote P2P user and simultaneously caches that data. Transparency is typically implemented using deep packet inspection (DPI). DPI products identify and pass P2P packets to the P2P caching system so that it can cache the traffic and accelerate it.

To enable operation with existing P2P software, P2P caches directly support P2P application protocols. A large number of P2P protocols are used by P2P software, and hence must be supported by caches, leading to higher complexity. Additionally, these protocols evolve over time, and new protocols are introduced.
Applicability to DECADE: An example of in-network storage for P2P systems. However, unlike DECADE, the existence and operation of the storage is totally transparent to the end user.
For a transparent P2P cache, the components are as follows.

Data Access Interface: P2P content is cached (stored) and supplied (retrieved) locally so that network traffic is reduced, but this is transparent to P2P users: they implicitly use the data access interface (in the form of their native P2P application protocol) to store or retrieve content.

Data Management Operations: Not provided.

Data Search Capability: Not provided.

Access Control: The access control method is typically public-restricted (to any client that is part of the P2P channel or swarm).

Resource Control Interface: Not provided.

Discovery Mechanism: The use of Deep Packet Inspection (DPI) means no discovery mechanism is provided to P2P users; the cache is transparent to them. Since DPI is used to recognize P2P applications' private protocols, P2P cache implementations must be updated as new applications are added and existing protocols evolve.

Storage Mode: Object-based. Chunks (typically, the unit of transfer amongst P2P clients) of content are stored in the cache.
For a non-transparent P2P cache, the components are as follows.

Data Access Interface: P2P content is cached (stored) and supplied (retrieved) locally so that network traffic is reduced. P2P users implicitly store to and retrieve from the cache using the P2P application's native protocol.

Data Management Operations: Not provided.

Data Search Capability: Not provided.

Access Control: The access control method is typically public-restricted (to any client that is part of the P2P channel or swarm).

Resource Control Interface: Not provided.

Discovery Mechanism: A cache pretends to be a normal peer in order to join the P2P overlay network. Other P2P users find these cache nodes through the overlay routing mechanism, seeing them simply as normal neighbor nodes.

Storage Mode: Object-based. Chunks (typically, the unit of transfer amongst P2P clients) of content are stored in the cache.
Usenet is a distributed Internet-based discussion (message) system. Usenet messages are arranged as a set of "newsgroups" that are classified hierarchically by subject. Usenet information is distributed and stored among a large conglomeration of servers that store and forward messages to one another in so-called news feeds. Individual users may read messages from and post messages to a local news server, typically operated by an ISP. This local server exchanges articles with other servers. In this fashion, a message is copied from server to server and eventually reaches every server in the network [Usenet].

Traditional Usenet as described above operates as a P2P network between the servers, and in a client-server architecture between the user and their local news server. The user requires a Usenet client installed on their computer and a Usenet server account (through their ISP). However, with the rise of web browsers, the Usenet architecture is evolving to be web-based. The most popular example of this is Google Groups, where Google hosts all the newsgroups and client access is via a standard HTTP-based web browser [GoogleGroups].
Applicability to DECADE: A historically very important and widely used (deployed) example of in-network storage in the Internet. Use of the system is rapidly declining, but efforts have been made to preserve the stored content for historical purposes.

Data Access Interface: Users can read and post (store) messages.

Data Management Operations: Users sometimes have a limited ability to delete messages that they previously posted.

Data Search Capability: Traditionally, users could manually search through the newsgroups, as they are classified hierarchically by subject. The newer web-based systems also provide automatic search based on keyword matches.

Access Control: The access control method is either public-unrestricted or private (to client members of a given newsgroup).

Resource Control Interface: Not provided.

Discovery Mechanism: Usually by manually logging on to the Usenet account. DNS may be used to resolve hostnames to their corresponding addresses.

Storage Mode: File system. Messages are usually stored as files, which are then organized hierarchically by subject into newsgroups.
Web caches [GH09] have been widely deployed by many ISPs since the late 1990s to reduce bandwidth consumption and web access latency. A web cache stores copies of web documents (e.g., HTML pages, images) between server and client to reduce bandwidth usage, server load, and perceived lag. A web cache server is typically shared by many clients; it stores copies of documents passing through it, and subsequent requests may be satisfied from the cache if certain conditions are met.

Another form of cache is the client-side cache, typically implemented in web browsers. A client-side cache keeps a local copy of all pages recently displayed by the browser; when the user returns to one of these pages, the local cached copy is reused.

A related protocol enabling P2P applications to use web caches is HPTP (HTTP-based Peer-to-Peer) [GYZ07]. It proposes sharing chunks of P2P files/streams using the HTTP protocol with cache-control headers.
Applicability to DECADE: A very widely used (deployed) example of in-network storage for the key Internet application of web browsing. The existence and operation of the storage is transparent to the end user in most cases. The content caching time is controlled by Time-To-Live parameters associated with the original content. The principle of web caching is to speed up web page retrieval by reusing content previously requested by one user to serve subsequent users.

Data Access Interface: Users explicitly read from a web cache by making requests, but they cannot explicitly write data into it. Data is implicitly stored in the web cache when users request content that is not already cached and that meets the policy restrictions of the cache provider.

Data Management Operations: Not provided.

Data Search Capability: Not provided.

Access Control: The access control method for clients is public-unrestricted. It is important to note that if content is authenticated or encrypted (e.g., HTTPS, SSL), it will not be cached. Also, if the content is flagged as private (vs. public) at the HTTP level by the origin server, it will not be cached.
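As a non-normative sketch, the following Python fragment captures these cacheability rules; header handling is greatly simplified relative to HTTP's full caching model.

   def may_cache(scheme, response_headers):
       if scheme == "https":                   # end-to-end encrypted
           return False
       cc = response_headers.get("Cache-Control", "")
       if "private" in cc or "no-store" in cc: # origin forbids sharing
           return False
       return True

   def freshness_lifetime(response_headers):
       cc = response_headers.get("Cache-Control", "")
       for directive in cc.split(","):
           k, _, v = directive.strip().partition("=")
           if k == "max-age":
               return int(v)                   # cache lifetime (seconds)
       return 0

   assert not may_cache("https", {})
   assert may_cache("http", {"Cache-Control": "public, max-age=600"})
   assert freshness_lifetime({"Cache-Control": "max-age=600"}) == 600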
Resource Control Interface: Not provided.

Discovery Mechanism: Web caches can be transparently deployed between web servers and web clients, employing DPI for discovery. Alternatively, web caches can be explicitly discovered by clients using techniques such as DNS or manual configuration.

Storage Mode: Object-based. Web content is keyed within the cache by HTTP request fields, such as Method, URI, and Headers.
The following observations about the surveyed in-network storage systems are made in the context of DECADE as defined by [I-D.ietf-decade-problem-statement].
The majority of the surveyed systems were designed for client-server architectures and do not support P2P. However, there are some important exceptions; in particular, some of the newer technologies, such as BranchCache and P2P caches, do support a P2P mode.

The P2P cache systems are interesting in that they do not require changes to P2P applications themselves. However, this is also a limitation, in that the caches must support each application protocol individually.

Many of the surveyed systems were designed for caching as opposed to long-term network storage. Thus, during DECADE protocol design, it should be carefully considered whether a caching mode should be supported in addition to a long-term network storage mode. There is typically a trade-off between providing a caching mode and long-term (and usually also reliable) storage with regard to some performance metrics. Note that [I-D.ietf-decade-problem-statement] identifies issues with classical caching from a DECADE perspective, such as the fact that P2P caches typically do not allow users to explicitly control content stored in the cache.
Certain components of the surveyed systems are outside of the scope of DECADE. For example, a protocol used for searching across multiple DECADE servers is out of scope. However, applications may still be able to implement such functionality if DECADE exposes the appropriate primitives. This has the benefit of keeping the core in-network storage systems simple, while permitting diverse applications to design mechanisms that meet their own requirements.
Today, most in-network storage systems follow some variant of the authorization model of public-unrestricted, public-restricted, and private. For DECADE, the authorization model may need to evolve to support resource owner (e.g., end user) authorization in addition to network authorization.
This section surveys existing storage and other related protocols, and comments on the usage of these protocols to satisfy DECADE's use cases. The surveyed protocols are listed alphabetically.
HTTP [RFC2616] is a key protocol for the World Wide Web. It is a stateless client-server protocol that allows applications to be designed using the REST model. HTTP is often associated with downloading (reading) content from web servers to web browsers, but it also has support for uploading (writing) of content to web servers. It has been used as the underlying protocol for other protocols such as WebDAV.
HTTP is used in some of the most popular in-network storage systems surveyed previously, including CDNs, Photo Sharing, and Web Caches. Usage of HTTP by a storage protocol implies that no extra software is required in the client (i.e., a web-based client), as all standard web browsers support HTTP.
Data Access Interface: Basic read and write operations are supported (using the HTTP GET, PUT, and POST methods).
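As a non-normative sketch, the following Python fragment (using the third-party 'requests' library) exercises these operations against a hypothetical server.

   import requests

   BASE = "http://storage.example.com"    # hypothetical HTTP server

   # Write (upload) content to a client-chosen resource name with PUT ...
   requests.put(BASE + "/data/item1", data=b"some content")

   # ... or hand it to a server-side handler with POST.
   requests.post(BASE + "/upload",
                 files={"file": ("item1", b"some content")})

   # Read (download) it back with GET.
   body = requests.get(BASE + "/data/item1").content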
Data Management Operations: Not provided.

Data Search Capability: Not provided.

Access Control: All methods of access control for clients are supported: public-unrestricted, public-restricted, and private.

The majority of web pages are public-unrestricted for reading but do not allow any uploading of content. In-network storage systems range from private or public-unrestricted for Photo Sharing, described in Section 4.11.5, to public-unrestricted for Web Caching, described in Section 4.14.5.

Resource Control Interface: Not provided.

Discovery Mechanism: Manual configuration is typically used. Clients typically address HTTP servers by providing a hostname, which is resolved to an address using DNS.

Storage Mode: HTTP is a protocol and thus does not define a storage mode. However, a non-collection resource can typically be thought of as a "file". These files may be organized into collections, which typically map onto the HTTP path hierarchy, creating the illusion of a file system.

Comments: HTTP is based on a client-server architecture and thus is not directly applicable to the DECADE focus on P2P. Also, HTTP offers only a rudimentary toolset for storage operations compared to some of the other storage protocols.
Small Computer System Interface (SCSI) is a set of protocols enabling communication with storage devices such as disk drives and tapes; internet SCSI (iSCSI) [RFC3720] is a protocol enabling SCSI commands to be sent over TCP. As in SCSI, iSCSI allows an Initiator to send commands to a Target. These commands operate on the device level as opposed to individual data objects stored on the device.
Data Access Interface: Read and write commands indicate which data is to be read or written by specifying the offset (using Logical Block Addressing) into the storage device. The size of the data to be read or written is an additional parameter in the command.
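As a non-normative sketch, the following Python fragment shows the address arithmetic implied by Logical Block Addressing; the block size is an assumption (512-byte logical blocks are common).

   BLOCK_SIZE = 512                       # bytes per logical block

   def byte_range(lba, num_blocks):
       """Byte range a READ/WRITE command covers on the Target."""
       start = lba * BLOCK_SIZE           # Logical Block Addressing
       return (start, start + num_blocks * BLOCK_SIZE)

   # Reading 8 blocks starting at LBA 2048 covers bytes
   # [1048576, 1052672) of the device.
   assert byte_range(2048, 8) == (1048576, 1052672)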
Data Management Operations: Since commands operate at the device level, management operations differ from those of traditional file systems. Management commands for SCSI/iSCSI include explicit device control, such as starting and stopping the device and formatting the device.

Data Search Capability: SCSI/iSCSI does not provide the ability to search for particular data within a device. Note that such capabilities can be implemented outside of iSCSI.

Access Control: With respect to access to devices, the access control method is private. iSCSI uses CHAP [RFC1994] to authenticate initiators and targets when accessing storage devices. However, since SCSI/iSCSI operates at the device level, neither authentication nor authorization is provided for individual data objects. Note that such capabilities can be implemented outside of iSCSI.

Resource Control Interface: Not provided.

Discovery Mechanism: Manual configuration may be used. Alternatively, the internet Storage Name Service (iSNS) [RFC4171] provides the ability to discover available storage resources.

Storage Mode: As a protocol, iSCSI does not explicitly have a storage mode; it provides an Initiator block-level access to the storage device.
The Network File System (NFS) is designed to allow users to access files over a network in a manner similar to how local storage is accessed. NFS is typically used in local area network or enterprise settings, though changes made in later versions make it easier to operate over the Internet.
Data Access Interface: Traditional file-system operations such as read, write, and update (overwrite) are provided. Locking is provided to support concurrent access by multiple clients.

Data Management Operations: Traditional file-system operations such as move and delete are provided.

Data Search Capability: Users have the ability to list the contents of directories to find filenames matching desired criteria.

Access Control: All methods of access control for clients are supported: public-unrestricted, public-restricted, and private. For example, files and directories can be protected using read, write, and execute permissions for the file's owner, group, and the public (others). Also, NFSv4.1 has a rich ACL model allowing a list of Access Control Entries (ACEs) to be configured for each file or directory. The ACEs can specify per-user read/write access to file data, file/directory attributes, creation/deletion of files in a directory, etc.
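As a non-normative sketch, the following Python fragment models NFSv4-style ACE evaluation: entries are examined in order, only entries whose "who" matches the requester are considered, and access accumulates until the request is satisfied or denied. The structures are simplified relative to [RFC5661].

   ALLOW, DENY = "ALLOW", "DENY"

   def check_access(acl, user, groups, wanted):
       granted = set()
       for ace in acl:                    # order matters in NFSv4 ACLs
           if (ace["who"] != user and ace["who"] not in groups
                   and ace["who"] != "EVERYONE@"):
               continue                   # "who" does not match requester
           if ace["type"] == DENY and wanted & ace["mask"]:
               return False               # simplified: any overlap denies
           if ace["type"] == ALLOW:
               granted |= wanted & ace["mask"]
           if wanted <= granted:
               return True
       return False

   acl = [{"who": "alice", "type": ALLOW, "mask": {"READ", "WRITE"}},
          {"who": "EVERYONE@", "type": ALLOW, "mask": {"READ"}}]
   assert check_access(acl, "alice", set(), {"WRITE"})
   assert not check_access(acl, "bob", set(), {"WRITE"})
   assert check_access(acl, "bob", set(), {"READ"})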
Resource Control Interface: While disk space quotas can be configured, a quota typically limits only the total amount of storage allocated to a particular user. User control of bandwidth and connections used by remote peers is not provided.

Discovery Mechanism: Manual configuration is typically used. Clients address NFS servers by providing a hostname and a directory that should be mounted. DNS may be used to look up an address for the provided hostname.

Storage Mode: As a protocol, there is no defined internal storage mode; however, implementations typically store data on an underlying file system. Note that extensions have been defined for alternate storage modes (e.g., block-based [RFC5663] and object-based [RFC5664]).
Comments: The efficiency and scalability of the NFS access control method is a concern in the context of DECADE. In particular, Section 6.2.1 of [RFC5661] states that: "Only ACEs that have a 'who' that matches the requester are considered."
Note that NFSv4.1's usage of RPCSEC_GSS provides support for multiple security mechanisms. Kerberos V5 is required, but others, such as X.509 certificates, are also supported by way of the GSS-API. Note, however, that NFSv4.1's usage of such security mechanisms is limited to linking a requesting user to a particular account maintained by the NFS server.
OAuth [RFC5849] is a protocol that enriches the traditional client-server authentication model for web resources. In particular, OAuth distinguishes the "client" from the "resource owner", thus enabling a resource owner to authorize a particular client for access (e.g., for a particular lifetime) to private resources.
We include OAuth in this survey so that its authentication model can be evaluated in the context of DECADE. OAuth itself, however, is not a network storage protocol.
Not provided.
Not provided.
Not provided.
Not provided. While similar in spirit to the WebDAV ticketing extension [I-D.ito-dav-ticket], OAuth instead uses the following process: (1) a client constructs a delegation request, (2) the client forwards the request to the resource owner for authorization, (3) the resource owner authorizes the request, and finally (4) a callback is made to the client indicating that its request has been authorized.
Once the process is complete, the client has a set of token credentials that grant it access to the protected resource. The token credentials may have an expiration time, and they can also be revoked by the resource owner at any time.
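As a concrete (and hedged) illustration of this delegation flow, the following Python sketch uses the third-party requests_oauthlib library. The endpoint URLs and credentials are hypothetical, as [RFC5849] leaves them to each service to define:

   from requests_oauthlib import OAuth1Session

   # Hypothetical endpoints; each service defines its own URLs.
   REQUEST_TOKEN_URL = "https://server.example.com/oauth/request_token"
   AUTHORIZE_URL = "https://server.example.com/oauth/authorize"
   ACCESS_TOKEN_URL = "https://server.example.com/oauth/access_token"

   client = OAuth1Session("client_key", client_secret="client_secret",
                          callback_uri="https://client.example.net/cb")

   # Step (1): the client constructs and sends a delegation request.
   client.fetch_request_token(REQUEST_TOKEN_URL)

   # Steps (2)-(3): the resource owner visits this URL and authorizes
   # the request; step (4): the server calls the client back with a
   # verifier value.
   print("Authorize at:", client.authorization_url(AUTHORIZE_URL))
   verifier = input("Verifier: ")

   # The client exchanges the authorized request for token
   # credentials, which may expire or be revoked at any time.
   tokens = client.fetch_access_token(ACCESS_TOKEN_URL,
                                      verifier=verifier)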
Not provided.
Not provided.
Not provided.
The ticketing mechanism requires server involvement, and the discussion relating to WebDAV's proposed ticketing mechanism (see Section 5.5.8) applies here as well.
WebDAV [RFC4918] is a protocol designed for Web content authoring. It is developed as an extension to HTTP (described in Section 5.1), meaning it can be simpler to integrate into existing software. WebDAV supports traditional operations for reading and writing storage, as well as other constructs, such as locking and collections, which are important when multiple users collaborate to author or edit a set of documents.
Traditional read and write operations are supported (using HTTP GET and PUT methods, respectively). Locking is provided to ease concurrent access by multiple clients.
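For illustration, these operations map directly onto HTTP methods. The following is a minimal Python sketch using the third-party requests library against a hypothetical WebDAV server:

   import requests

   BASE = "https://dav.example.com/container"  # hypothetical server

   # Write an object with PUT; read it back with GET.
   requests.put(BASE + "/notes.txt", data=b"draft text")
   body = requests.get(BASE + "/notes.txt").content

   # Request an exclusive write lock (Section 9.10 of RFC 4918); the
   # server returns a lock token in the Lock-Token response header.
   lock_xml = ('<?xml version="1.0" encoding="utf-8"?>'
               '<D:lockinfo xmlns:D="DAV:">'
               '<D:lockscope><D:exclusive/></D:lockscope>'
               '<D:locktype><D:write/></D:locktype>'
               '</D:lockinfo>')
   resp = requests.request("LOCK", BASE + "/notes.txt", data=lock_xml)
   lock_token = resp.headers.get("Lock-Token")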
WebDAV supports traditional file-system operations such as move, delete, and copy. Objects are organized into collections, and these operations can also be performed on collections. WebDAV also allows objects to have user-defined properties.
Users can list the contents of collections to find objects matching desired criteria. A SEARCH extension [RFC5323] has also been specified, allowing clients to list objects matching client-defined criteria.
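Continuing the sketch above, the management and listing operations use WebDAV-specific methods such as MOVE and PROPFIND (again, the server is hypothetical):

   import requests

   BASE = "https://dav.example.com/container"  # hypothetical server

   # Move an object; the Destination header names the new location
   # (Section 9.9 of RFC 4918).  DELETE removes an object outright.
   requests.request("MOVE", BASE + "/notes.txt",
                    headers={"Destination":
                             BASE + "/archive/notes.txt"})
   requests.delete(BASE + "/old.txt")

   # List a collection's members one level deep with PROPFIND; the
   # server replies with a 207 Multi-Status XML document enumerating
   # each member and its properties.
   propfind = ('<?xml version="1.0" encoding="utf-8"?>'
               '<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>')
   resp = requests.request("PROPFIND", BASE,
                           headers={"Depth": "1"}, data=propfind)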
All methods of access control for clients are supported: public-unrestricted, public-restricted and private.
For example, an ACL extension [RFC3744] is provided for WebDAV. ACLs allow both user- and group-based access control policies (relating to reading, writing, properties, locking, etc.) to be defined for objects and collections.
A ticketing extension [I-D.ito-dav-ticket] has also been proposed, but it has not progressed beyond an Internet-Draft. This extension allows a client to request that the WebDAV server create a "ticket" (e.g., for reading an object) that can be distributed to other clients. Tickets may be given expiration times, or may only allow a fixed number of uses. The proposed extension requires the server to generate tickets and maintain state for outstanding tickets.
An extension [RFC4331] allows disk space quotas to be configured for collections. The extension also allows WebDAV clients to query current disk space usage. User control of the bandwidth and connections used by remote peers is not provided.
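As a sketch, the quota properties defined by [RFC4331] (DAV:quota-available-bytes and DAV:quota-used-bytes) can be queried with PROPFIND against the same hypothetical server:

   import requests

   quota_query = ('<?xml version="1.0" encoding="utf-8"?>'
                  '<D:propfind xmlns:D="DAV:"><D:prop>'
                  '<D:quota-available-bytes/><D:quota-used-bytes/>'
                  '</D:prop></D:propfind>')
   resp = requests.request("PROPFIND",
                           "https://dav.example.com/container",
                           headers={"Depth": "0"}, data=quota_query)
   # The 207 Multi-Status response reports both byte counts for the
   # collection.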
Manual configuration is typically used. Clients address WebDAV servers by providing a hostname, which can be resolved to an address using DNS.
Though no storage mode is explicitly defined, WebDAV can be thought of as providing file-based storage to a client. A non-collection resource can typically be thought of as a "file". Files may be organized into collections, which typically map onto the HTTP path hierarchy.
The efficiency and scalability of the WebDAV access control method are a concern in the context of DECADE, for reasons similar to those stated in Section 5.3.8 for NFS. The proposed WebDAV ticketing extension partially alleviates this concern, but the particular technique may need further evaluation before being applied to DECADE. In particular, since DECADE clients may continuously upload and download a large number of small objects, and a single DECADE server may need to scale to many concurrent DECADE clients, requiring the server to generate tickets and maintain ticket state may not be the best design choice. Server-generated tickets can also increase latency for data transport operations, depending on the message flow used by DECADE.
The following observations about the surveyed storage and related protocols are made in the context of DECADE as defined by [I-D.ietf-decade-problem-statement].
All of the surveyed protocols were primarily designed for client-server architectures and not for P2P. However, it is conceivable that some of the protocols could be adapted to work in a P2P architecture.
Several popular in-network storage systems today use HTTP as their key protocol, even though it is not classically considered a storage protocol. HTTP is a stateless protocol that is widely used to build RESTful applications. It is a well-supported and widely implemented protocol, and its design can provide important insights for DECADE.
The majority of the surveyed protocols do not support low-latency access for applications such as live streaming. This is one of the key general requirements for DECADE.
The majority of the surveyed protocols do not support any form of resource control interface. A resource control interface allows users to manage the resources (e.g., bandwidth or connections) that other peers may consume on their in-network storage. Resource control is a key capability required for DECADE.
Nearly all surveyed protocols do, however, support the following capabilities required for DECADE: the ability for users to read and write content, some form of access control, some form of error indication, and the ability to traverse firewalls and NATs.
Though there have been many successful in-network storage systems, they have been designed for use cases different from those defined in DECADE. For example, many of the surveyed in-network storage systems and protocols were designed for client-server architectures and not P2P. No surveyed system or protocol has the functionality and features to fully meet the set of requirements defined for DECADE. DECADE aims to provide a standard protocol for P2P applications and content providers to access and control in-network storage, resulting in increased network efficiency while retaining control over the content shared with peers. Additionally, defining a standard protocol can reduce the complexity of in-network storage systems, since they no longer need to implement multiple P2P application protocols.
This document surveys existing in-network storage systems and does not introduce any security considerations beyond those of the surveyed systems.
For more information on security considerations of DECADE, see [I-D.ietf-decade-problem-statement].
This document does not have any IANA Considerations.
The editors would like to thank the following people for contributing to the development of this document:
- ZhiHui Lu
- Borje Ohlman
- Pang Tao
- Lucy Yong
- Juan Carlos Zuniga
The editors would like to thank the following people for providing valuable comments to various versions of this document: David Bryan, Tao Mao, Haibin Song, Ove Strandberg, Yu-Shun Wang, Richard Woundy, Yunfei Zhang, and Ning Zong.