Internet-Draft | new-uuid-format | March 2022 |
Peabody & Davis | Expires 2 October 2022 | [Page] |
This document presents new Universally Unique Identifier (UUID) formats for use in modern applications and databases.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 2 October 2022.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Many things have changed in the time since UUIDs were originally created. Modern applications have a need to create and utilize UUIDs as the primary identifier for a variety of different items in complex computational systems, including but not limited to database keys, file names, machine or system names, and identifiers for event-driven transactions.¶
One area in which UUIDs have gained popularity is database keys. This stems from the increasingly distributed nature of modern applications. In such cases, "auto increment" schemes often used by databases do not work well, as the effort required to coordinate unique numeric identifiers across a network can easily become a burden. The fact that UUIDs can be used to create unique, reasonably short values in distributed systems without requiring synchronization makes them a good alternative, but UUID versions 1-5 lack certain other desirable characteristics:¶
Non-time-ordered UUID versions such as UUIDv4 have poor database index locality, meaning that new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of this on the common structures used for indexes (B-tree and its variants) can be dramatic.¶
The 100-nanosecond, Gregorian epoch used in UUIDv1 timestamps is uncommon and difficult to represent accurately using a standard number format such as [IEEE754].¶
Introspection/parsing is required to order by time sequence, as opposed to being able to perform a simple byte-by-byte comparison.¶
Privacy and network security issues arise from using a MAC address in the node field of Version 1 UUIDs. Exposed MAC addresses can be used as an attack surface to locate machines and reveal various other information about such machines (minimally manufacturer, potentially other details). Additionally, with the advent of virtual machines and containers, MAC address uniqueness is no longer guaranteed.¶
Many of the implementation details specified in [RFC4122] involve trade-offs that are neither possible to specify for all applications nor necessary to produce interoperable implementations.¶
[RFC4122] does not distinguish between the requirements for generating a UUID and those for an application that simply stores one, which are often different.¶
Due to the aforementioned issues, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has led to numerous implementations over the past 10+ years solving the same problem in slightly different ways.¶
While preparing this specification, the following 16 different implementations were analyzed for trends in total ID length, bit layout, lexical formatting/encoding, timestamp type, timestamp format, timestamp accuracy, node format/components, collision handling and multi-timestamp tick generation sequencing.¶
[LexicalUUID] by Twitter¶
[ShardingID] by Instagram¶
[Elasticflake] by P. Pearcy¶
[orderedUuid] by IT. Cabrera¶
An inspection of these implementations and the issues described above has led to this document which attempts to adapt UUIDs to address these issues.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The following UUIDs are hereby introduced:¶
RFC EDITOR PLEASE DELETE THIS SECTION.¶
draft-03¶
- Reworked the draft body to make the content more concise¶
- UUIDv6 section reworked to just the reorder of the timestamp¶
- UUIDv7 changed to simplify timestamp mechanism to just millisecond Unix timestamp¶
- UUIDv8 relaxed to be custom in all elements except version and variant¶
- Introduced Max UUID.¶
- Added C code samples in Appendix.¶
- Added test vectors in Appendix.¶
- Version and Variant section combined into one section.¶
- Changed from pseudo-random number generators to cryptographically secure pseudo-random number generator (CSPRNG).¶
- Combined redundant topics from all UUIDs into sections such as Timestamp granularity, Monotonicity and Counters, Collision Resistance, Sorting, and Unguessability, etc.¶
- Split Encoding and Storage into Opacity and DBMS and Database Considerations¶
- Reworked Global Uniqueness under new section Global and Local Uniqueness¶
- Node verbiage only used in UUIDv6 all others reference random/rand instead¶
- Clock sequence verbiage changed simply to counter in any section other than UUIDv6¶
- Added Abbreviations section¶
- Updated IETF Draft XML Layout¶
- Added information about little-endian UUIDs¶
draft-02¶
- Added Changelog¶
- Fixed misc. grammatical errors¶
- Fixed section numbering issue¶
- Fixed some UUIDvX reference issues¶
- Changed all instances of "motonic" to "monotonic"¶
- Changed all instances of "#-bit" to "# bit"¶
- Changed "proceeding" verbiage to "after" in section 7¶
- Added details on how to pad 32 bit Unix timestamp to 36 bits in UUIDv7¶
- Added details on how to truncate 64 bit Unix timestamp to 36 bits in UUIDv7¶
- Added forward reference and bullet to UUIDv8 if truncating 64 bit Unix Epoch is not an option.¶
- Fixed bad reference to non-existent "time_or_node" in section 4.5.4¶
draft-01¶
- Complete rewrite of entire document.¶
- The format, flow and verbiage used in the specification have been reworked to mirror the original RFC 4122 and current IETF standards.¶
- Removed the topics of UUID length modification, alternate UUID text formats, and alternate UUID encoding techniques.¶
- Research into 16 different historical and current implementations of time-based universal identifiers was completed at the end of 2020 in an attempt to identify trends which have directly influenced design decisions in this draft document (https://github.com/uuid6/uuid6-ietf-draft/tree/master/research)¶
- Prototype implementations have been completed for UUIDv6, UUIDv7, and UUIDv8 in various languages by many GitHub community members. (https://github.com/uuid6/prototypes)¶
The variant bits utilized by UUIDs in this specification remain in the same octet as originally defined by [RFC4122], Section 4.1.1.¶
The next table details Variant 10xx (8/9/A/B) and the new versions defined by this specification. A complete guide to all versions within this variant has been included in Appendix C.1.¶
Msb0 | Msb1 | Msb2 | Msb3 | Version | Description |
0 | 1 | 1 | 0 | 6 | Reordered Gregorian time-based UUID specified in this document. |
0 | 1 | 1 | 1 | 7 | Unix Epoch time-based UUID specified in this document. |
1 | 0 | 0 | 0 | 8 | Reserved for custom UUID formats specified in this document. |
For UUID versions 6, 7 and 8, the variant field placement from [RFC4122] is unchanged. An example version/variant layout for UUIDv6 follows the table where M is the version and N is the variant.¶
The UUID format is 16 octets; the variant bits, in conjunction with the version bits described in the next section, determine finer structure.¶
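As a non-normative illustration, the version and variant of a UUID held as 16 octets in network byte order could be read as sketched below. The function names are hypothetical and not part of this specification. The sketch assumes the [RFC4122] placement: the version in the four most significant bits of octet 6 and the variant in the most significant bits of octet 8.

```c
#include <stdint.h>

/* Hypothetical helper: returns the 4 bit version number (0-15),
 * taken from the high nibble of octet 6. */
unsigned uuid_version(const uint8_t uuid[16])
{
    return (unsigned)(uuid[6] >> 4) & 0x0Fu;
}

/* Hypothetical helper: returns 1 when the two most significant bits of
 * octet 8 are "10" (first hex digit of that octet is 8, 9, A or B),
 * otherwise 0. */
int uuid_is_variant_10xx(const uint8_t uuid[16])
{
    return (uuid[8] & 0xC0) == 0x80;
}
```

An inspector would typically check the variant first and only then interpret the version nibble, since the version values in the table apply to this variant only.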
UUID version 6 is a field-compatible version of UUIDv1, reordered for improved DB locality. It is expected that UUIDv6 will primarily be used in contexts where there are existing v1 UUIDs. Systems that do not involve legacy UUIDv1 SHOULD consider using UUIDv7 instead.¶
Instead of splitting the timestamp into the low, mid and high sections from UUIDv1, UUIDv6 changes this sequence so timestamp bytes are stored from most to least significant. That is, given a 60 bit timestamp value as specified for UUIDv1 in [RFC4122], Section 4.1.4, for UUIDv6, the first 48 most significant bits are stored first, followed by the 4 bit version (same position), followed by the remaining 12 bits of the original 60 bit timestamp.¶
The clock sequence bits remain unchanged from their usage and position in [RFC4122], Section 4.1.5.¶
The 48 bit node SHOULD be set to a pseudo-random value; however, implementations MAY choose to retain the old MAC address behavior from [RFC4122], Section 4.1.6 and [RFC4122], Section 4.5. For more information on MAC address usage within UUIDs, see Section 8.¶
The format for the 16-byte, 128 bit UUIDv6 is shown in Figure 1.¶
With UUIDv6 the steps for splitting the timestamp into time_high and time_mid are OPTIONAL since the 48 bits of time_high and time_mid will remain in the same order. An extra step of splitting the first 48 bits of the timestamp into the most significant 32 bits and least significant 16 bits proves useful when reusing an existing UUIDv1 implementation.¶
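The reordering described above can be sketched in C as follows. This is a minimal, non-normative example which assumes both UUIDs are held as 16 octets in network byte order and that the input and output buffers do not overlap. The function name uuid_v1_to_v6 is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: converts a UUIDv1 (16 octets, network byte order)
 * into the equivalent UUIDv6. v1 and v6 must not overlap. */
void uuid_v1_to_v6(const uint8_t v1[16], uint8_t v6[16])
{
    /* Reassemble the 60 bit timestamp from the v1 low/mid/high fields. */
    uint64_t time_low = ((uint64_t)v1[0] << 24) | ((uint64_t)v1[1] << 16) |
                        ((uint64_t)v1[2] << 8)  |  (uint64_t)v1[3];
    uint64_t time_mid = ((uint64_t)v1[4] << 8) | (uint64_t)v1[5];
    uint64_t time_hi  = ((uint64_t)(v1[6] & 0x0F) << 8) | (uint64_t)v1[7];
    uint64_t ts = (time_hi << 48) | (time_mid << 32) | time_low;

    /* The 48 most significant timestamp bits are stored first. */
    v6[0] = (uint8_t)(ts >> 52);
    v6[1] = (uint8_t)(ts >> 44);
    v6[2] = (uint8_t)(ts >> 36);
    v6[3] = (uint8_t)(ts >> 28);
    v6[4] = (uint8_t)(ts >> 20);
    v6[5] = (uint8_t)(ts >> 12);

    /* Version 6 in its usual position, then the remaining 12 bits. */
    v6[6] = (uint8_t)(0x60 | ((ts >> 8) & 0x0F));
    v6[7] = (uint8_t)(ts & 0xFF);

    /* Clock sequence (including the variant) and node are unchanged. */
    memcpy(v6 + 8, v1 + 8, 8);
}
```

A caller would pass an existing UUIDv1 and receive the corresponding UUIDv6; only the first eight octets differ between the two.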
UUID version 7 features a time-ordered value field derived from the widely implemented and well known Unix Epoch timestamp source, the number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds excluded. It also has improved entropy characteristics over versions 1 or 6.¶
Implementations SHOULD utilize UUID version 7 over UUID version 1 and 6 if possible.¶
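A minimal, non-normative sketch of assembling a UUIDv7 is shown below. It assumes a layout of a 48 bit big-endian Unix millisecond timestamp, the 4 bit version, 12 random bits (rand_a), the 2 bit variant and 62 further random bits (rand_b). The function name, the parameter names and the choice to have the caller supply ten octets of CSPRNG output are assumptions made for illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds a UUIDv7 from a Unix timestamp in
 * milliseconds and ten octets of caller-supplied random data. */
void uuid_v7_build(uint64_t unix_ms, const uint8_t rnd[10], uint8_t out[16])
{
    /* 48 bit timestamp, most significant octet first. */
    out[0] = (uint8_t)(unix_ms >> 40);
    out[1] = (uint8_t)(unix_ms >> 32);
    out[2] = (uint8_t)(unix_ms >> 24);
    out[3] = (uint8_t)(unix_ms >> 16);
    out[4] = (uint8_t)(unix_ms >> 8);
    out[5] = (uint8_t)unix_ms;

    /* Fill the remaining ten octets with random data ... */
    memcpy(out + 6, rnd, 10);

    /* ... then overwrite the version (0111) and variant (10) bits. */
    out[6] = (uint8_t)(0x70 | (out[6] & 0x0F));
    out[8] = (uint8_t)(0x80 | (out[8] & 0x3F));
}
```

Passing the timestamp and random octets as arguments keeps the sketch deterministic; a real generator would obtain them from the system clock and a CSPRNG.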
UUID version 8 provides an RFC-compatible format for experimental or vendor-specific use cases. The only requirement is that the variant and version bits MUST be set as defined in Section 4. UUIDv8's uniqueness will be implementation-specific and SHOULD NOT be assumed.¶
The only explicitly defined bits are the Version and Variant, leaving 122 bits for implementation specific time-based UUIDs. To be clear: UUIDv8 is not a replacement for UUIDv4 where all 122 extra bits are filled with random data.¶
Some example situations in which UUIDv8 usage could occur:¶
An implementation would like to embed extra information within the UUID other than what is defined in this document.¶
An implementation has other application/language restrictions which inhibit the use of one of the current UUIDs.¶
The Max UUID is a special form of UUID that is specified to have all 128 bits set to 1. This UUID can be thought of as the inverse of the Nil UUID defined in [RFC4122], Section 4.1.7.¶
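As a non-normative illustration, the two special forms could be detected with an octet-wise comparison. The function names below are hypothetical and the sketch assumes the UUID is held as 16 octets.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every octet of the UUID equals value, otherwise 0. */
static int uuid_all_octets_equal(const uint8_t uuid[16], uint8_t value)
{
    size_t i;
    for (i = 0; i < 16; i++) {
        if (uuid[i] != value)
            return 0;
    }
    return 1;
}

/* Nil UUID: all 128 bits set to 0. */
int uuid_is_nil(const uint8_t uuid[16])
{
    return uuid_all_octets_equal(uuid, 0x00);
}

/* Max UUID: all 128 bits set to 1. */
int uuid_is_max(const uint8_t uuid[16])
{
    return uuid_all_octets_equal(uuid, 0xFF);
}
```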
The minimum requirements for generating UUIDs are described in this document for each version. Everything else is an implementation detail and up to the implementer to decide what is appropriate for a given implementation. That being said, various relevant factors are covered below to help guide an implementer through the different trade-offs among differing UUID implementations.¶
UUID timestamp source, precision and length were the topic of great debate while creating this specification. As such, choosing the right timestamp for your application is a very important topic. This section will detail some of the most common points on this topic.¶
Monotonicity is the backbone of time-based sortable UUIDs. Naturally, time-based UUIDs from this document will be monotonic due to an embedded timestamp; however, implementations can guarantee additional monotonicity via the concepts covered in this section.¶
Additionally, care MUST be taken to ensure UUIDs generated in batches are also monotonic. That is, if one thousand UUIDs are generated for the same timestamp, there needs to be sufficient logic for organizing the creation order of those one thousand UUIDs. For batch UUID creation, implementations MAY utilize a monotonic counter which SHOULD increment for each UUID created during a given timestamp.¶
For single-node UUID implementations that do not need to create batches of UUIDs, the embedded timestamp within UUID version 1, 6, and 7 can provide sufficient monotonicity guarantees by simply ensuring that the timestamp increments before creating a new UUID. For the topic of Distributed Nodes please refer to Section 6.3.¶
Implementations SHOULD choose one method for single-node UUID implementations that require batch UUID creation.¶
The following sub-topics cover methods behind incrementing either type of counter method:¶
The following sub-topics cover topics related solely to creating reliable fixed-length dedicated counters:¶
The following sub-topics cover rollover handling with either type of counter method:¶
Implementations MAY use the following logic to ensure UUIDs featuring embedded counters are monotonic in nature:¶
Compare the current timestamp against the previously stored timestamp.¶
If the current timestamp is equal to the previous timestamp, increment the counter according to the desired method and type.¶
If the current timestamp is greater than the previous timestamp, re-initialize the desired counter method to the new timestamp and generate new random bytes (if the bytes were frozen or being used as the seed for a monotonic counter).¶
Implementations SHOULD check if the currently generated UUID is greater than the previously generated UUID. If this is not the case, then any number of things could have occurred, such as, but not limited to, clock rollbacks, leap second handling or counter rollovers. Applications SHOULD embed sufficient logic to catch these scenarios and correct the problem, ensuring the next UUID generated is greater than the previous.¶
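The logic above can be sketched as a small piece of generator state. This is one possible, non-normative arrangement and rests on several assumptions that are not requirements of this document: a dedicated 12 bit counter, re-initialization of the counter to zero on a new timestamp, treating a clock rollback like a repeated timestamp, and reporting counter exhaustion to the caller. All names are hypothetical.

```c
#include <stdint.h>

#define UUID_COUNTER_MAX 0x0FFFu /* assumed 12 bit counter width */

struct uuid_clock_state {
    uint64_t last_ms;  /* timestamp used for the previous UUID */
    uint16_t counter;  /* counter used for the previous UUID */
};

/* Chooses the timestamp and counter for the next UUID so that the pair
 * never decreases. Returns 0 on success, or -1 when the counter is
 * exhausted for the current timestamp and the caller has to wait for
 * the clock to advance. */
int uuid_clock_next(struct uuid_clock_state *st, uint64_t now_ms,
                    uint64_t *ts_out, uint16_t *counter_out)
{
    if (now_ms > st->last_ms) {
        /* Newer timestamp: adopt it and re-initialize the counter. */
        st->last_ms = now_ms;
        st->counter = 0;
    } else {
        /* Same timestamp, or the clock moved backwards: keep the
         * stored timestamp and increment the counter instead. */
        if (st->counter == UUID_COUNTER_MAX)
            return -1;
        st->counter++;
    }
    *ts_out = st->last_ms;
    *counter_out = st->counter;
    return 0;
}
```

Because the stored timestamp is never replaced by a smaller one, a UUID built from the returned pair compares greater than its predecessor as long as the counter occupies the bits directly after the timestamp.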
Implementations SHOULD weigh the consequences of UUID collisions within their application when deciding between UUID versions that use entropy (random) versus the other components such as Section 6.1 and Section 6.2. This is especially true for distributed node collision resistance as defined by Section 6.3.¶
There are two example scenarios below which help illustrate the varying seriousness of a collision within an application.¶
UUIDs created by this specification MAY be used to provide local uniqueness guarantees. For example, ensuring UUIDs created within a local application context are unique within a database MAY be sufficient for some implementations where global uniqueness outside of the application context, in other applications, or around the world is not required.¶
Although true global uniqueness is impossible to guarantee without a shared knowledge scheme, a shared knowledge scheme is not required by UUID to provide uniqueness guarantees. Implementations MAY implement a shared knowledge scheme introduced in Section 6.3 as they see fit to extend the uniqueness guaranteed by this specification and [RFC4122].¶
Implementations SHOULD utilize a cryptographically secure pseudo-random number generator (CSPRNG) to provide values that are both difficult to predict ("unguessable") and have a low likelihood of collision ("unique"). A CSPRNG ensures the best of Section 6.4 and Section 8 are present in modern UUIDs.¶
Advice on generating cryptographic-quality random numbers can be found in [RFC4086].¶
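How random octets are obtained is platform specific. As a hedged example, the sketch below assumes a Unix-like system that exposes the kernel CSPRNG through the /dev/urandom device; other platforms provide different interfaces. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: fills buf with len octets read from /dev/urandom.
 * Returns 0 on success and -1 if the device cannot be opened or read. */
int uuid_random_bytes(uint8_t *buf, size_t len)
{
    FILE *f = fopen("/dev/urandom", "rb");
    size_t got;

    if (f == NULL)
        return -1;
    got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}
```

A generator would treat a failure here as fatal rather than fall back to a weaker source of randomness.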
UUIDv6 and UUIDv7 are designed so that implementations that require sorting (e.g. database indexes) SHOULD sort as opaque raw bytes, without need for parsing or introspection.¶
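Sorting as opaque raw bytes amounts to an unsigned octet-by-octet comparison. A minimal sketch using the C standard library is shown below; it assumes the UUIDs are stored as 16 octets each in network byte order, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: orders two 16 octet UUIDs by their
 * raw bytes, without parsing any field. */
int uuid_compare(const void *a, const void *b)
{
    return memcmp(a, b, 16);
}

/* Sorts an array of count UUIDs in ascending byte order. */
void uuid_sort(uint8_t (*uuids)[16], size_t count)
{
    qsort(uuids, count, 16, uuid_compare);
}
```

For UUIDv6 and UUIDv7 this byte order follows the embedded timestamp, because the most significant timestamp octets come first.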
Time ordered monotonic UUIDs benefit from greater database index locality because the new values are near each other in the index. As a result objects are more easily clustered together for better performance. The real-world differences in this approach of index locality vs random data inserts can be quite large.¶
UUID formats created by this specification SHOULD be lexicographically sortable while in the textual representation.¶
UUIDs created by this specification are crafted with big-endian byte order (network byte order) in mind. If little-endian style is required, a custom UUID format SHOULD be created using UUIDv8.¶
UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID, to whatever extent is possible. However, where necessary, inspectors should refer to Section 4 for more information on determining UUID version and variant.¶
For many applications, such as databases, storing UUIDs as text is unnecessarily verbose, requiring 288 bits to represent 128 bit UUID values. Thus, where feasible, UUIDs SHOULD be stored within database applications as the underlying 128 bit binary value.¶
For other systems, UUIDs MAY be stored in binary form or as text, as appropriate. The trade-offs of the two approaches are as follows:¶
Storing as binary requires less space and may result in faster data access.¶
Storing as text requires more space but may require less translation if the resulting text form is to be used after retrieval and thus may be simpler to implement.¶
DBMS vendors are encouraged to provide functionality to generate and store UUID formats defined by this specification for use as identifiers or left parts of identifiers such as, but not limited to, primary keys, surrogate keys for temporal databases, foreign keys included in polymorphic relationships, and keys for key-value pairs in JSON columns and key-value databases. Applications using a monolithic database may find using database-generated UUIDs (as opposed to client-generated UUIDs) provides the best UUID monotonicity. In addition to UUIDs, additional identifiers MAY be used to ensure integrity and feedback.¶
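The 288 bit figure for text follows from the usual 36 character form (32 hexadecimal digits plus 4 hyphens) at 8 bits per character. A non-normative sketch of producing that form from the 128 bit binary value is shown below; the function name is hypothetical and lowercase hexadecimal output is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the 8-4-4-4-12 text form of a 16 octet
 * UUID into out, which must have room for 36 characters plus the
 * terminating NUL. */
void uuid_to_text(const uint8_t uuid[16], char out[37])
{
    static const char hex[] = "0123456789abcdef";
    size_t i, pos = 0;

    for (i = 0; i < 16; i++) {
        /* Hyphens precede octets 4, 6, 8 and 10. */
        if (i == 4 || i == 6 || i == 8 || i == 10)
            out[pos++] = '-';
        out[pos++] = hex[uuid[i] >> 4];
        out[pos++] = hex[uuid[i] & 0x0F];
    }
    out[pos] = '\0';
}
```

Storing the 16 octets directly and converting only when a textual form is needed avoids the extra 160 bits per value.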
This document has no IANA actions.¶
MAC addresses pose inherent security risks and SHOULD NOT be used within a UUID. Instead, CSPRNG data SHOULD be selected from a source with sufficient entropy to ensure uniqueness among generated UUIDs. See Section 6.6 for more information.¶
Timestamps embedded in the UUID do pose a very small attack surface. The timestamp in conjunction with an embedded counter does signal the order of creation for a given UUID and its corresponding data but does not define anything about the data itself or the application as a whole. If UUIDs are required for use with any security operation within an application context in any shape or form then [RFC4122] UUIDv4 SHOULD be utilized.¶
The authors gratefully acknowledge the contributions of Ben Campbell, Ben Ramsey, Fabio Lima, Gonzalo Salgueiro, Martin Thomson, Murray S. Kucherawy, Rick van Rein, Rob Wilton, Sean Leonard, Theodore Y. Ts'o, Robert Kieffer, sergeyprokhorenko, and LiosK, as well as all of those in the IETF community and on GitHub who contributed to the discussions which resulted in this document.¶
This section details a function in C which converts from a UUID version 1 to version 6:¶
UUIDv8 will vary greatly from implementation to implementation. A good candidate use case for UUIDv8 is to embed exotic timestamps like the one found in this example which employs approximately 0.25 milliseconds and approximately 5 microseconds per timestamp tick as a 48 bit value.¶
Both UUIDv1 and UUIDv6 test vectors utilize the same 60 bit timestamp: 0x1EC9414C232AB00 (138648505420000000) Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00¶
Both UUIDv1 and UUIDv6 utilize the same values in clk_seq_hi_res, clock_seq_low, and node. All of which have been generated with random data.¶
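The relationship between the 60 bit Gregorian timestamp and the Unix Epoch can be checked with a short calculation. The sketch below assumes the well-known offset of 122192928000000000 intervals of 100 nanoseconds between 15 October 1582 and 1 January 1970; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Number of 100-nanosecond intervals between 1582-10-15 00:00:00 UTC
 * and 1970-01-01 00:00:00 UTC. */
#define UUID_GREGORIAN_UNIX_OFFSET 122192928000000000ULL

/* Hypothetical helper: converts a Gregorian timestamp counted in
 * 100-nanosecond intervals into Unix milliseconds. The input must not
 * be earlier than the Unix Epoch. */
uint64_t uuid_gregorian_to_unix_ms(uint64_t ticks)
{
    return (ticks - UUID_GREGORIAN_UNIX_OFFSET) / 10000ULL;
}
```

For the timestamp above, (138648505420000000 - 122192928000000000) / 10000 gives 1645557742000 milliseconds, that is 1645557742 seconds after the Unix Epoch.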
This example UUIDv7 test vector utilizes a well-known 32 bit Unix epoch with additional millisecond precision to fill the first 48 bits.¶
rand_a and rand_b are filled with random data.¶
The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00 represented as 0x17F22E279B0 or 1645557742000¶
This example UUIDv8 test vector utilizes a well-known 64 bit Unix epoch with nanosecond precision, truncated to the least-significant, right-most, bits to fill the first 48 bits through version.¶
The next two segments of custom_b and custom_c are filled with random data.¶
Timestamp is Tuesday, February 22, 2022 2:22:22.000000 PM GMT-05:00 represented as 0x16D6320C3D4DCC00 or 1645557742000000000¶
It should be noted that this example is just to illustrate one scenario for UUIDv8. Test vectors will likely be implementation specific and vary greatly from this simple example.¶
Msb0 | Msb1 | Msb2 | Msb3 | Version | Description |
0 | 0 | 0 | 0 | 0 | Unused |
0 | 0 | 0 | 1 | 1 | The Gregorian time-based UUID from [RFC4122], Section 4.1.3 |
0 | 0 | 1 | 0 | 2 | DCE Security version, with embedded POSIX UIDs from [RFC4122], Section 4.1.3 |
0 | 0 | 1 | 1 | 3 | The name-based version specified in [RFC4122], Section 4.1.3 that uses MD5 hashing. |
0 | 1 | 0 | 0 | 4 | The randomly or pseudo-randomly generated version specified in [RFC4122], Section 4.1.3. |
0 | 1 | 0 | 1 | 5 | The name-based version specified in [RFC4122], Section 4.1.3 that uses SHA-1 hashing. |
0 | 1 | 1 | 0 | 6 | Reordered Gregorian time-based UUID specified in this document. |
0 | 1 | 1 | 1 | 7 | Unix Epoch time-based UUID specified in this document. |
1 | 0 | 0 | 0 | 8 | Reserved for custom UUID formats specified in this document. |
1 | 0 | 0 | 1 | 9 | Reserved for future definition. |
1 | 0 | 1 | 0 | 10 | Reserved for future definition. |
1 | 0 | 1 | 1 | 11 | Reserved for future definition. |
1 | 1 | 0 | 0 | 12 | Reserved for future definition. |
1 | 1 | 0 | 1 | 13 | Reserved for future definition. |
1 | 1 | 1 | 0 | 14 | Reserved for future definition. |
1 | 1 | 1 | 1 | 15 | Reserved for future definition. |