Internet-Draft | Computerate Specifying | February 2021 |
Petit-Huguenin | Expires 5 August 2021 |
This document specifies a paradigm named Computerate Specifying, designed to simultaneously document and formally specify communication protocols. This paradigm can be applied to any document produced by any Standards Developing Organization (SDO), but this document specifically targets documents produced by the IETF.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 5 August 2021.¶
Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
If, as the unofficial IETF motto states, we believe that "running code" is an important part of the feedback provided to the standardization process, then as per the Curry-Howard equivalence [Curry-Howard] (that states that code and mathematical proofs are the same), we ought to also believe that "verified proof" is an equally important part of that feedback. A verified proof is a mathematical proof of a logical proposition that was mechanically verified by a computer, as opposed to just peer-reviewed.¶
The "Experiences with Protocol Description" paper from Pamela Zave [Zave11] gives three conclusions about the usage of formal specifications for a protocol standard. The first conclusion states that informal methods (i.e. the absence of verified proofs) are inadequate for widely used protocols. This document is based on the assumption that this conclusion is correct, so its validity will not be discussed further.¶
The second conclusion states that formal specifications are useful even if they fall short of the "gold standard" of a complete formal specification. We will show that a formal specification can be incrementally added to a document.¶
The third conclusion from Zave's paper states that the normative English language should be paraphrasing the formal specification. The difficulty here is that to be able to keep the formal specification and the normative language synchronized at all times, these two should be kept as physically close as possible to each other.¶
To do that we introduce the concept of "Computerate Specifying" (note that Computerate is a British English word). "Computerate Specifying" is a play on "Literate Programming", itself a play on "Structured Programming" (see [Knuth92] page 99). In the same way that Literate Programming enriches code by interspersing it with its own documentation, Computerate Specifying enriches a standard specification by interspersing it with code (or with proofs, as they are the same thing), making it a computerate specification.¶
Note that computerate specifying is not specific to the IETF, just like literate programming is not restricted to the combination of TeX and Pascal described in Knuth's paper. What this document describes is a specific instance of computerate specifying that combines [AsciiDoc] as formatting language and [Idris2] as programming language with the goal of formally specifying IETF protocols.¶
The remainder of this document is divided into 3 parts:¶
A tutorial on how to write a specification follows the Terminology section (Section 3). This tutorial is meant to be read in sequence, as concepts defined in earlier parts will not be repeated later. On the other hand the tutorial is designed to present information progressively and mostly in order of complexity, so it is possible to start writing effective specifications without reading or understanding the whole tutorial.¶
The tutorial begins by explaining how to write private specifications (Section 4), which are specifications that are not meant to be shared. Then the tutorial continues by explaining how to write a self-contained specification (Section 5), which is a specification that contains Idris code that relies only on the Idris Standard Library. Writing self-contained specifications is difficult and time-consuming, so the tutorial continues by explaining how to import specifications (Section 6) that contain reusable types and code. The tutorial ends with explanations on how to design an exportable specification (Section 7).¶
After the tutorial comes the description of all the packages and modules in the Computerate Specifying Standard Library (Section 8).¶
Appendix A explains how to install and use the associated tooling, and Appendix B contains the reference manual for the standard library.¶
Literate Idris2 code embedded in an AsciiDoc document, containing both formal descriptions and human language texts, and which can be processed to produce documents in human language.¶
Any text that contains the documentation of a protocol in the English language. A document is the result of processing a specification.¶
A specification created after a document was published, such that the generated document coincides with the published document.¶
In this document, the same word can be used either as an English word or as an Idris identifier used inside the text. To explicitly differentiate them, the latter is always displayed `like this`. E.g. `IdrisDoc` is meant to convey the fact that IdrisDoc in that case is an Idris module or type. On the other hand the word IdrisDoc refers to the IdrisDoc specification.¶
Similarly blocks of code, including literate code, are always sandwiched between "<CODE BEGINS>" and "<CODE ENDS>". Code blocks will be presented in their literate form only when necessary, i.e. when mixed AsciiDoc and Idris are required. However, in a computerate specification, Idris code must in fact be used in its literate form.¶
By convention an Idris function that returns a type and types themselves will always start with an uppercase letter. Functions not returning a type start with a lowercase letter.¶
For the standard library, the type names are also formed by taking the English word or expression, making the first letter of each word upper case, and removing any symbols like underscore, dash and space. Thus bitvector would become "Bitvector" after conversion as a type name but bit diagram would become "BitDiagram".¶
Nowadays documents at the IETF are written in a format named xml2rfc v3 [RFC7991] but unfortunately making that format computerate is not trivial, mostly because there is no simple solution to mix code and XML together in the same file. Instead, the [AsciiDoc] format forms the basis for specifications as it permits the generation of documents in the xml2rfc v3 format (among other formats) and also because it can be enriched with code in the same file.¶
AsciiRFC [I-D.ribose-asciirfc] and [Metanorma-IETF] describe a backend for the [Asciidoctor] tool that converts an AsciiDoc document into an xml2rfc v3 document. The AsciiRFC document states various reasons why AsciiDoc is a superior format for the purpose of writing standards, so that will not be discussed further. Note that the same team developed Asciidoctor backends [Metanorma] for other Standards Developing Organizations (SDO), making it easy to develop computerate specifications targeting the documents developed by these SDOs.¶
The code in a computerate specification uses the programming language [Idris2] in literate programming [Literate] mode using the Bird-style, by having each line of code starting with a ">" mark in the first column.¶
That same symbol is also used by AsciiDoc as an alternate way of defining a blockquote [Blockquotes], a way which is no longer available in a computerate specification. Bird-style code will simply not appear in the rendered document.¶
The result of Idris code execution can be inserted inside the AsciiDoc part of a specification by inserting that code fragment between the "{`" and "`}" strings. That code fragment must return a value of a type that implements the `Show` interface.¶
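The substitution of embedded fragments can be pictured with a small sketch. It is written in Python rather than Idris, it is not the actual Asciidoctor preprocessor, `expand` and its `env` parameter are hypothetical names, and Python's `str` merely stands in for the `Show` interface:

```python
import re

# An embedded fragment is written as {`expression`} inside the text.
FRAGMENT = re.compile(r"\{`(.+?)`\}")


def expand(text, env):
    """Replace each embedded fragment by the string form of its value."""
    def evaluate(match):
        # Evaluate with no builtins: only names supplied in env are visible.
        value = eval(match.group(1), {"__builtins__": {}}, env)
        return str(value)
    return FRAGMENT.sub(evaluate, text)


print(expand("The minimum value is {`bits // 32`}.", {"bits": 160}))
```

The point of the sketch is only that the text around a fragment is left untouched while the fragment itself is replaced by a computed value.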
A computerate specification is processed by an Asciidoctor preprocessor that does the following:¶
For instance the following document fragment taken from the computerate specification of [RFC8489]:¶
is rendered as¶
The Idris2 programming language has been chosen because its type system supports dependent and linear types [Type-Driven], and that type system is the language in which propositions are written. The Idris2 programming language also has reflection capabilities and support for meta-programming, also known as elaboration.¶
Following Zave's second conclusion, computerate specifying is not restricted to the specification of protocols, or to property proving. There is a whole spectrum of formalism that can be introduced in a specification, and we will present it in the remaining sections by increasing order of complexity. Note that because the specification language is a programming language, these usages are not exhaustive, and plenty of other usages can and will be found after the publication of this document.¶
Unlike an RFC, which is immutable after publication, the types and code in a specification will be improved over time, especially as new properties are proved or disproved. The latter will happen when a bug is discovered in a specification and a proof of negation is added to the specification, paving the way to a revision of the standard.¶
A private specification is a specification that is not meant to be shared. There are mainly two reasons for a specification to be kept private, as explained in the next sections.¶
In the simplest case, writing a specification with the goal of publishing the resulting document does not require sharing that specification. This is quite similar to what was done with xml2rfc before the IETF adopted RFC 7991 as the canonical format for Internet-Drafts and RFCs; most people used xml2rfc to prepare their document, but did not share the xml2rfc file beyond the co-authors of the document.¶
In that case writing a specification is straightforward: write the specification from scratch using AsciiDoc for the text and Idris for the formal parts.¶
One effective rule to quickly discover that the Idris code and the AsciiDoc document are diverging is to keep both of them as close as possible to each other. This is most effectively done by having the matching Idris code right after each AsciiDoc paragraph, so that it is then easy to compare each to the other.¶
Idris itself imposes an order in which types and code must be declared and defined, because it does not by default look for forward references. Because, by the rule above, the text will follow the order in which the Idris code is organized, the document generated by such a specification tends to be naturally easier to implement, because it induces the same workflow that a software implementer will follow when implementing the document.¶
A second reason to write a private specification is for the purpose of doing a review of an existing document, most likely of an Internet-Draft during the standardization process.¶
This is done by first turning the existing document into a specification by converting it into an AsciiDoc document, which can be done manually relatively easily. After this step, the specification can be enriched by adding some Idris code and replacing some of the text with the execution of Idris code fragments. Comparing the original document with a document generated by processing the specification makes it possible to validate that the original document is correct with regard to the formalism introduced.¶
Documents that are not generated from a specification do not always have a structure that follows the way a software developer will implement them. When that is the case it will be difficult to add the Idris code right after a paragraph describing its functionality, because the final code may not type-check because of the lack of support for forward references. It could be a signal that the text needs to be reorganized to be more software-development friendly.¶
One alternative is to use a technique named self-inclusion, which is the possibility to change the order of paragraphs in an AsciiDoc document while keeping the Idris code in an order that type-checks.¶
This is done by using tags to delimit the text that needs to be moved:¶
Then a self-include can move (instead of duplicating) the text inside the tags to a different place, without changing the order of the Idris code:¶
Note that the IETF Trust licences [TLP5] do not grant permission to distribute an annotated Internet-Draft as a whole, so redistributing such a specification would be a copyright license infringement. But as in this case the specification is not meant to be distributed, there is no infringement possible.¶
A self-contained specification is a specification where the Idris code does not use anything but the types and functions defined in its standard library, and thus does not require installing anything but Idris2 itself.¶
A specification uses Idris types to specify both how streams of bits are arranged to form valid Protocol Data Units (PDU) and how the exchange of PDUs between network elements is structured to form a valid protocol. In addition a specification can be used to prove or disprove a variety of properties for these types.¶
The PDUs in a communication protocol determine how data is laid out before it is sent over a communication link. Generally a PDU is described only in the context of the layer that this particular protocol is operating at, e.g. an application protocol PDU only describes the data as sent over UDP or TCP, not over Ethernet or Wi-Fi.¶
PDUs can generally be split into two broad categories, binary and text, and a protocol PDU mostly falls into one of these two categories.¶
PDU descriptions can be defined as specifications for at least three reasons: the generation of examples that are correct by construction, correctness in displaying the result of calculations, and correctness in representing the structure of a PDU. Independently of these reasons, a PDU description is a basic component of a specification that will probably be needed regardless.¶
Examples in protocol documents are frequently incorrect, which proves to have a significant negative impact as they are too often misused as normative text. See Appendix C for statistics about the frequency of incorrect examples in RFC errata.¶
Ensuring example correctness is achieved by adding the result of a computation (the example) directly inside the document. If that computation is done from a type that is (physically and conceptually) close to the normative text, then we gain some level of assurance that both the normative text and the derived examples will match.¶
Generating an example that is correct by construction always starts by defining a type that describes the format of the data to display. The Internet Header Format in section 3.1 of [RFC0791] will be used in the following sections as an example.¶
In this section we start by defining an Idris type, using a Generalized Algebraic Data Type (GADT). In that case we have only one constructor (`MkInternetHeader`) which is defined as a Product Type that "concatenates" all the fields of the Internet Header. One specific aspect of Idris types is that we can enrich the definition of each field with constraints that then have to be fulfilled when a value of that type is built.¶
where¶
An Idris type where the fields in a constructor are organized like the `InternetHeader` by ordering them in a sequence is called a Pi type - or, when there are no dependencies between fields as there is in `version = 4`, a Product type. Although there is no equivalent in most programming languages to a Pi type, Product types are known as classes in Java and struct in C.¶
Another way to organize a type is called the Sum type, which is a type with multiple constructors that act as alternatives. Sum types can be used in C with a combination of struct and union, and in Java 17 and later by using sealed interfaces and records.¶
Sum types have a dependent counterpart named a Sigma type, which is a tuple in which the type of the second element depends on the value of the first element. This is mostly returned by functions, with the returned Sigma type carrying both a value and a proof of the validity of that value.¶
From that point it is possible to define a value that fulfills all the constraints. The following values are taken from example 1 in [RFC0791] Appendix A.¶
The `=>` symbol after a constraint indicates that Idris should try to automatically find a proof that this constraint is met by the values in the example, which it successfully does in the example above.¶
The following example, where the constraints defined in the InternetHeader type are not met, will not type-check in Idris (an error message will be generated) and thus can not be used to generate an example.¶
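The effect of such constraints can be approximated outside Idris. The sketch below is Python, not the Idris type from this document: it keeps only a subset of the header fields, the constraints are assumptions derived from the RFC 791 field widths, and they are checked at run time when a value is built, whereas Idris rejects the value at type-checking time:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InternetHeader:
    version: int
    ihl: int            # header length in 32-bit words
    total_length: int   # datagram length in bytes
    flags: int
    offset: int         # fragment offset in 8-byte units

    def __post_init__(self):
        # Assumed constraints, derived from the RFC 791 field widths.
        checks = {
            "version = 4": self.version == 4,
            "5 <= ihl < 16": 5 <= self.ihl < 16,
            "ihl * 4 <= total_length < 65536":
                self.ihl * 4 <= self.total_length < 65536,
            "flags < 8": 0 <= self.flags < 8,
            "offset < 8192": 0 <= self.offset < 8192,
        }
        failed = [name for name, ok in checks.items() if not ok]
        if failed:
            raise ValueError("unmet constraints: " + ", ".join(failed))


# Values from example 1 in RFC 791 Appendix A: accepted.
example1 = InternetHeader(version=4, ihl=5, total_length=21, flags=0, offset=0)

# A header that breaks a constraint is rejected, so no example can be built.
try:
    InternetHeader(version=6, ihl=5, total_length=21, flags=0, offset=0)
except ValueError as error:
    print(error)
```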
The next step is to define an Idris function that converts a value of the type `InternetHeader` into the kind of bit diagram that is shown in Appendix A of [RFC0791].¶
Here we implement the `Show` interface, which makes it possible to define the ad hoc polymorphic function `show` for `InternetHeader`, a function that converts the value into the right character string. Idris names starting with a question mark, as in `?showPrec_rhs_1`, are so-called holes, which are placeholders for code to be written, while still permitting type-checking.¶
After replacing the hole by the actual code, the following embedded code can be used in the document to generate an example that is correct by construction, at least up to mistakes in the specification (i.e. the constraints in `InternetHeader`) and bugs in the `show` function.¶
will generate the equivalent AsciiDoc text:¶
This generated example is similar to the first of the examples in appendix A of RFC 791.¶
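As a rough illustration of what such a rendering function has to do, the following Python sketch (not the Idris `show` implementation; `separator`, `row` and `diagram` are hypothetical names) lays out one 32-bit word in the style of the RFC 791 diagrams, where each bit occupies two characters:

```python
def separator(bits=32):
    """Horizontal rule with one '+' per bit boundary."""
    return "+" + "-+" * bits


def row(fields):
    """Render one word; fields is a list of (label, width in bits)."""
    # A field of n bits spans 2n - 1 characters between its two bars.
    cells = [label.center(2 * width - 1) for label, width in fields]
    return "|" + "|".join(cells) + "|"


def diagram(words):
    """Render a list of words, each one a list of (label, width) fields."""
    lines = [separator()]
    for fields in words:
        lines.append(row(fields))
        lines.append(separator())
    return "\n".join(lines)


first_word = [
    ("Ver=4", 4),
    ("IHL=5", 4),
    ("Type of Service", 8),
    ("Total Length = 21", 16),
]
print(diagram([first_word]))
```

The sketch assumes that every label fits in its cell; a longer label would push the closing bar out of alignment.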
The previous section showed how to define a type that precisely describes a PDU, how to generate examples that are values of that type, and how to insert them in a document.¶
Our specification, which has the form of an Idris type, can be seen as a generalization of all the possible examples for that type. Now that we went through the effort of precisely defining that type, it would be useful to use it to also calculate statements about that syntax.¶
In RFC 791 the description of the field IHL states "[...]that the minimum value for a correct header is 5." The origin of this number may be a little mysterious, so it is better to use a formula to calculate it and insert the result instead.¶
Inserting a calculation is easy:¶
Here we can insert a code fragment that is using a function that is defined later in the document because the Idris code is evaluated before the document is processed.¶
Note the difference with examples: the number `5` is not an example of a value of the type `InternetHeader`, but a property of that type.¶
Systematically using the results of calculations on types in a specification makes it more resistant to mistakes that are introduced as a result of modifications.¶
The layout of a PDU, i.e. the size and order of the fields that compose it, can be represented in a document in various forms. One of them is just an enumeration of these fields in order, each field identified by a name and accompanied by some description of that field in the form of the number of bits it occupies in the PDU and how to interpret these bits.¶
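The origin of the number 5 can be made explicit with a short calculation. This is a Python sketch rather than the Idris function used by the specification, and `FIELD_BITS` and `minimum_ihl` are hypothetical names; the field widths are those of the fixed part of the header in RFC 791 section 3.1:

```python
# Width in bits of each field of the fixed part of the Internet Header.
FIELD_BITS = {
    "version": 4,
    "ihl": 4,
    "type_of_service": 8,
    "total_length": 16,
    "identification": 16,
    "flags": 3,
    "fragment_offset": 13,
    "time_to_live": 8,
    "protocol": 8,
    "header_checksum": 16,
    "source_address": 32,
    "destination_address": 32,
}


def minimum_ihl(field_bits):
    """Smallest header length, in 32-bit words, with no options present."""
    total = sum(field_bits.values())
    assert total % 32 == 0, "the fixed header must fill whole words"
    return total // 32


print(minimum_ihl(FIELD_BITS))  # prints 5
```

If a field were later resized or added, the inserted value would change with it instead of silently becoming stale.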
That layout can also be presented as text, as a list, as a table, as a bit diagram, at the convenience of the document author. In all cases, some parts of the description of each field can be extracted from our Idris type just like we did in Section 5.1.2.¶
RFC 791 section 3.1 represents the PDUs defined in it both as bit diagrams and as lists of fields.¶
A network protocol, which is how the various PDUs defined in a document are exchanged between network elements, can always be understood as a set of state machines. Unlike PDUs, which are generally described in a way that is close to their Idris counterpart, state machines in a document are generally only described as text.¶
Note that, just like an Idris representation of a PDU should also contain all the possible constraints on that PDU but not more, a state machine should contain all the possible constraints in the exchange of PDUs, but not less.¶
This issue is most visible in one of the two state machines defined in RFC 791, the one for fragmenting IP packets (the other is for unfragmenting packets). The text describes two different algorithms to fragment a packet but in that case each algorithm should be understood as one instance of a more general state machine. That state machine describes all the possible sequences of fragments that can be generated by an algorithm that is compliant with RFC 791 and it would be an Idris type that is equivalent to the following algorithm:¶
For a specific packet size, generate a list of all the binary values {b0,.., bN} with N being the packet size divided by 8 and rounded-up, and 0..N representing positional indexes for each of the 8 byte chunks of the packet.¶
For each binary value in that list, generate a list of values that represents the number of consecutive bits of the same value (e.g. `0b110001011` generates a `[2, 3, 1, 1, 2]` list), each such sequence representing a given fragment.¶
Remove from that list of lists any list that contains a number that, after multiplication by 8, is higher than the maximum size of a fragment.¶
For each remaining list in that list, generate the list of fragments, i.e. with the correct offset, length and More bit.¶
Generate all the possible permutations for each list of fragments.¶
We can see that this state machine takes into account the fact that an IP packet can not only be fragmented in fragments of various sizes - as long as the constraints are respected - but also that these fragments can be sent in any order.¶
Then the algorithms described in the document can be seen as generating a subset of all the possible lists of fragments that can be generated by our state machine. It is then easy to check that these algorithms cannot generate fragment lists that cannot be generated by our state machine.¶
As a consequence, the unfragment state machine must be able to regenerate a valid unfragmented packet for any of the fragment lists generated by our fragment state machine. Furthermore, the unfragment state machine must also take into account fragment lists that are modified by the network (itself defined as a state machine) in the following ways:¶
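The enumeration algorithm above can be sketched directly. The following is Python, not the Idris type that the specification would use, and `fragmentations` is a hypothetical name. A fragment is represented as an (offset in 8-byte units, length in bytes, More bit) triple, and, as an addition to the steps above, a bit pattern and its complement are counted once because they describe the same split:

```python
from itertools import groupby, permutations, product


def fragmentations(packet_size, max_fragment):
    """Yield every ordered fragment list allowed for a packet of that size."""
    chunks = -(-packet_size // 8)  # number of 8-byte chunks, rounded up
    seen = set()
    for bits in product((0, 1), repeat=chunks):
        # Lengths of the runs of identical bits, e.g. 110001011 -> 2,3,1,1,2.
        runs = tuple(len(list(group)) for _, group in groupby(bits))
        if runs in seen:
            continue  # the complemented pattern gives the same runs
        seen.add(runs)
        # Drop splits containing a fragment larger than the maximum size.
        if any(run * 8 > max_fragment for run in runs):
            continue
        fragments = []
        offset = 0
        for run in runs:
            length = min(run * 8, packet_size - offset * 8)
            more = offset + run < chunks
            fragments.append((offset, length, more))
            offset += run
        # The fragments of one split can be sent in any order.
        yield from permutations(fragments)


for fragment_list in fragmentations(16, 16):
    print(fragment_list)
```

For a 16-byte packet this yields the unfragmented packet plus the two orderings of its two 8-byte fragments. The number of lists grows very quickly with the packet size, so the sketch is only usable on small inputs.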
fragments can be dropped;¶
the fragments order can change (this is already covered by the fact that our fragment state machine generates all possible orders);¶
fragments can be duplicated multiple times;¶
fragments can be delayed;¶
fragments can be received that were never sent by the fragment state machine.¶
Then the algorithm described in the document can be compared with the unfragment state machine to verify that all states and transitions are covered.¶
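A minimal receiving-side sketch shows how some of these network modifications can be tolerated. It is Python, not an Idris state machine, `reassemble` is a hypothetical name, and the sketch only handles reordering, duplication and loss; it makes no attempt to detect fragments that were never sent:

```python
def reassemble(fragments):
    """Return the packet payload, or None if it cannot be rebuilt yet.

    Each fragment is (offset in 8-byte units, payload bytes, More bit).
    """
    by_offset = {}
    for offset, data, more in fragments:
        by_offset.setdefault(offset, (data, more))  # duplicates are ignored
    payload = b""
    expected = 0
    for offset in sorted(by_offset):  # arrival order does not matter
        data, more = by_offset[offset]
        if offset != expected:
            return None  # a fragment is missing (dropped or delayed)
        payload += data
        if not more:
            return payload  # last fragment reached
        if len(data) % 8:
            return None  # a non-final fragment must be a multiple of 8 bytes
        expected = offset + len(data) // 8
    return None  # the last fragment has not arrived


received = [(1, b"BBB", False), (0, b"A" * 8, True), (0, b"A" * 8, True)]
print(reassemble(received))
```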
Defining a state machine in Idris can be done in an ad-hoc way [Linear-Resources], particularly by using linear types that express resources' consumption.¶
Under the Curry-Howard equivalence, the Idris types that we created to describe PDUs and state machine are formal logic propositions, and being able to construct values from these types (like we did for the examples), is proof that these propositions are true. These are also called internal verifications [Stump16].¶
External verifications are made of additional propositions (as Idris types) and proofs (as code for these types) with the goal of verifying additional properties.¶
One kind of proof that one would want in a specification is related to isomorphism, i.e. a guarantee that two or more descriptions of a PDU or a state machine contain exactly the same information, but there are others.¶
The Idris types that are used for generating examples, calculations or representations are generally very close to the bit structure of the PDU. But some properties may be better expressed by defining more abstract types. We call the former Wire Types, and the latter Abstract Types.¶
As an example, the type in Section 5.1.1 is a wire type, because it follows exactly the PDU layout. But fragmentation can be more easily described using the following abstract type:¶
First the `version` field is eliminated, because it always contains the same constant.¶
Then the `flags` and `offset` fields are reorganized so as to provide four different alternate packets:¶
The `Full` constructor represents an unfragmented packet. It is isomorphic to a `MkInternetHeader` with `flags` and `offset` values of 0.¶
The `First` constructor represents the first fragment of a packet. It is isomorphic to a `MkInternetHeader` with a `flags` value of 1 and `offset` value of 0.¶
The `Next` constructor represents an intermediate fragment of a packet. It is isomorphic to a `MkInternetHeader` with a `flags` value of 1 and `offset` value different from 0.¶
Finally the `Last` constructor represents the last fragment of a packet. It is isomorphic to a `MkInternetHeader` with a `flags` value of 0 and `offset` value different from 0.¶
One of the main issues of having two types for the same data is ensuring that they both contain the same information, i.e. that they are isomorphic. To ensure that these two types are carrying the same information we need to define and implement four functions that, all together, prove that the types are isomorphic. This is done by defining the 4 types below, as propositions to be proven:¶
Successfully implementing these functions will prove that the two types are isomorphic. Note the usage of the `total` keyword to ensure that these are proofs and not mere programs.¶
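The shape of such an isomorphism can be illustrated by a sketch that checks both round trips exhaustively instead of proving them. It is Python, not Idris; it models only the More bit and the 13-bit offset of the wire type; `to_abstract` and `to_wire` are hypothetical names; and an exhaustive run-time check is a weaker guarantee than a proof accepted by a total type checker:

```python
def to_abstract(more, offset):
    """Map the wire view (More bit, offset) to one of the four alternatives."""
    if offset == 0:
        return ("First",) if more else ("Full",)
    return ("Next", offset) if more else ("Last", offset)


def to_wire(packet):
    """Map an abstract alternative back to (More bit, offset)."""
    kind = packet[0]
    if kind == "Full":
        return (False, 0)
    if kind == "First":
        return (True, 0)
    if kind == "Next":
        return (True, packet[1])
    return (False, packet[1])  # "Last"


OFFSETS = range(2 ** 13)  # the fragment offset field is 13 bits wide

# Round trip starting from every wire value.
wire_ok = all(
    to_wire(to_abstract(more, offset)) == (more, offset)
    for more in (False, True)
    for offset in OFFSETS
)

# Round trip starting from every abstract value.
abstract_values = [("Full",), ("First",)]
abstract_values += [(kind, offset) for kind in ("Next", "Last")
                    for offset in OFFSETS if offset != 0]
abstract_ok = all(to_abstract(*to_wire(p)) == p for p in abstract_values)

print(wire_ok and abstract_ok)
```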
For documents that describe a conversion between different data layouts, having a proof that guarantees that no information is lost in the process can be beneficial. For instance, we observe that syntax encoding tends to be replaced every ten years or so by something "better". Here again isomorphism can tell us exactly what kind of information we lost and gained during that replacement.¶
Here, for example, is the definition of a function that would verify an isomorphism between an XML format and a JSON format:¶
DeltaXML expresses what is gained by switching from XML to JSON, and DeltaJson expresses what is lost.¶
Be conservative in what you do, be liberal in what you accept from others.¶
-- Jon Postel, RFC 761
One of the downsides of having specifications is that there is no wiggle room possible when implementing them. An implementation either conforms to the specification or does not.¶
One analogy would be specifying a pair of gears. If one decides to have both of them made with tolerances that are too small, then it is very likely that they will not be able to move when put together. A bit of slack is needed to get the gears working smoothly together, but more importantly the cost of making these gears rises as their tolerances get tighter. There is an inflexion point where the cost of a high-precision gear outweighs its purpose.¶
We have a similar issue when implementing a specification, where having a fully conforming implementation may cost more money than it is worth spending. On the other hand a specification exists for the purpose of interoperability, so we need some guidelines on what to ignore in a specification to make it cost effective.¶
Postel's law proposes an informal way of defining that wiggle room by actually having two different specifications, one that defines a data layout for the purpose of sending it, and another one that defines a data layout for the purpose of receiving that data layout.¶
Existing documents express that dichotomy in the form of the usage of SHOULD/SHOULD NOT/RECOMMENDED/NOT RECOMMENDED [RFC2119] keywords. For example the SDP spec says that "[t]he sequence CRLF (0x0d0a) is used to end a line, although parsers SHOULD be tolerant and also accept lines terminated with a single newline character." This directly implies two specifications, one used to define an SDP when sending it, that enforces using only CRLF, and a second specification, used to define an SDP when receiving it (or parsing it), that accepts both CRLF and LF.¶
Note that the converse is not necessarily true, i.e. not all usages of these keywords are related to Postel's Law.¶
To ensure that the differences between the sending specification and the receiving specification do not create interoperability problems, we can use a variant of isomorphism, as shown in the following example (data constructors and code elided):¶
Here we define two data types, one that describes the data layout that is permitted to be sent (`Sending`) and one that describes the data layout that is permitted to be received (`Receiving`). For each data layout that is possible to send, there are one or more matching receiving data layouts. This is expressed by the function `to` that takes as input one Sending value and returns a list of Receiving values.¶
Conversely, the `from` function maps a Receiving data layout onto a Sending data layout. Note the asymmetry there, which prevents using a standard proof of isomorphism.¶
Then the `toFrom` and `fromTo` proofs verify that there is no interoperability issue by guaranteeing that each Receiving value maps to one and only one Sending instance and that this mapping is isomorphic.¶
All of this will provide a clear guidance of when and where to use a SHOULD keyword or its variants, without loss of interoperability.¶
As a trivial example, the following proves that accepting LF characters in addition to CRLF characters as end of line markers does not break interoperability:¶
Postel's Law is not limited to the interpretation of PDUs, as a state machine on the receiving side can also be designed to accept more than what a sending state machine can produce. A similar isomorphism proof can be used to ensure that this is done without loss of interoperability.¶
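The CRLF and LF case can be checked mechanically on sample data. The sketch below is Python, not the Idris proof; the names `to` and `from_` mirror the functions described above (`from` is a reserved word in Python); and it assumes that the content of a line never contains a bare CR or LF:

```python
from itertools import product


def to(sent):
    """All receiving forms of a sent text: each CRLF may arrive as CRLF or LF."""
    parts = sent.split("\r\n")
    gaps = len(parts) - 1
    forms = []
    for separators in product(("\r\n", "\n"), repeat=gaps):
        body = "".join(p + s for p, s in zip(parts, separators))
        forms.append(body + parts[-1])
    return forms


def from_(received):
    """Map a received text back to the single text that may have been sent."""
    return received.replace("\r\n", "\n").replace("\n", "\r\n")


sent = "v=0\r\ns=-\r\n"
received_forms = to(sent)
print(len(received_forms))                            # prints 4
print(all(from_(r) == sent for r in received_forms))  # prints True
```

Every accepted form maps back to exactly one sendable text, which is the property that the `toFrom` and `fromTo` proofs are meant to guarantee for all values rather than for one sample.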
When applied, the techniques described in Section 5.1 and Section 5.2 result in a set of types that represents the whole protocol. These types can be assembled together, using another set of types, to represent a simulation of that protocol that covers all sending and receiving processes.¶
The types can then be implemented, and that implementation acts as a proof that this protocol is actually implementable.¶
To make these pieces of code composable, a specification is split in multiple modules, each one represented as a unique function. The type of each of these functions is derived from the state machines described in Section 5.2, by bundling together all the inputs of the state machine as the input for that function, and bundling all the outputs of the state machine as the output of this function.¶
For instance the IP layer is really 4 different functions:¶
A function that converts between a byte array and a tree representation (parsing).¶
A function that takes a tree representation and a maximum MTU and returns a list of tree representations, each one fitting inside the MTU.¶
A function that accumulates tree representations of an IP fragment until a tree representation of a full IP packet can be returned.¶
A function that converts a tree representation into a byte array.¶
The description of each function is incomplete, as in addition to the input and the output listed, these functions need some ancillary data, in the form of:¶
state, which is basically values stored between evaluations of a function,¶
an optional signal, that can be used as an API request or response. As timers are a fundamental building block for communication protocols, common uses for that signal are to request the arming of a timer, and to receive the indication of the expiration of that timer.¶
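The shape of such functions can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the types of this specification: Fragment, fragment, reassemble and the signal strings "arm-timer" and "timer-expired" are hypothetical names, and a single packet is reassembled at a time (no packet identification).¶

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Fragment:
    offset: int     # position of the payload in the full packet, in octets
    more: bool      # True when further fragments follow this one
    payload: bytes


State = Dict[int, Fragment]  # fragments stored between evaluations, by offset


def fragment(payload: bytes, max_payload: int) -> List[Fragment]:
    """Stateless function: split a payload into pieces of at most max_payload octets."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    offsets = list(range(0, len(payload), max_payload)) or [0]
    return [
        Fragment(off, off + max_payload < len(payload), payload[off:off + max_payload])
        for off in offsets
    ]


def reassemble(state: State, frag: Optional[Fragment], signal: Optional[str] = None
               ) -> Tuple[State, Optional[bytes], Optional[str]]:
    """Stateful function: returns (new state, full payload if complete, outgoing signal)."""
    if signal == "timer-expired":
        return {}, None, None          # give up and drop the partial packet
    if frag is None:
        return state, None, None
    new_state = {**state, frag.offset: frag}
    out, offset = bytearray(), 0
    while offset in new_state:
        piece = new_state[offset]
        out += piece.payload
        if not piece.more:
            return {}, bytes(out), None
        if not piece.payload:
            break                      # empty non-final fragment: no progress possible
        offset += len(piece.payload)
    # Still incomplete: request a timer when the first fragment is stored.
    return new_state, None, ("arm-timer" if not state else None)


state, packet = {}, None
for frag in reversed(fragment(b"abcdefghij", 4)):   # deliver out of order
    state, packet, _ = reassemble(state, frag)
assert packet == b"abcdefghij" and state == {}
```

Passing the state and the signal explicitly, instead of hiding them in mutable variables, is what keeps each function composable with the others.¶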
Proving that a protocol does not loop is equivalent to proving that an implementation of the types for that protocol does not loop either, i.e., terminates. This is done by using the type described in Section 5.3.4 and making sure that it type-checks when the total keyword is used.¶
One of the ultimate goals of this document is to convince authors to use the techniques described in it to write their documents. Because doing so requires a lot of effort, an important intermediate goal is to show authors that the benefits of Computerate Specifying are worth the cost of learning and becoming proficient in these techniques.¶
The best way to reach that intermediate goal is to apply these techniques to documents that are in the process of being published by the IETF and, if issues are found, report them to the authors. Doing that on published RFCs, especially just after their publication, would be unnecessarily mean. On the other hand doing that on all Internet-Drafts as they are published would not be scalable.¶
The best place to do a Computerate Specifying oriented review is when a document enters IETF Last Call. These reviews would then be indistinguishable from the reviews done by a hypothetical Formal Specification Directorate. An argument can be made that, ultimately, writing a specification for a document could be an activity too specialized, just like Security reviews are, and that an actual Directorate should be assembled.¶
Alas, it is clear that writing a specification from scratch (as in Section 5) for an existing document takes far more time than the Last Call duration would allow. On the other hand the work needed could be greatly reduced if, instead of writing that specification from scratch, libraries of code were available for the parts that are reusable between successive specifications. These libraries fall into 3 categories:¶
General types and common presentations. E.g., bit diagrams are a very common way of presenting data, and so reusable types and functions to generate and compare them would accelerate a formalization. The libraries in that category are explained in Section 6.1, in Section 8.1, and its associated reference in Appendix B.¶
Types and common representations for meta-languages. A few meta-languages are used in documents to formalize some parts of them, so having libraries to formalize these meta-languages also helps accelerate their verification. The libraries in that category are explained in Section 6.2, in Section 8.2, and its associated reference in Appendix B.¶
Types and common representations for common protocols. Most documents are about modifying or defining new usages for existing protocols, which is why it makes sense to establish libraries of these existing protocols for reuse. The libraries in that category are explained in Section 6.3, in Section 8.3, and its associated reference in Appendix B.¶
Together these libraries form the Computerate Specifying Standard Library (Section 8).¶
These libraries are in fact computerate specifications that, instead of being private, are designed to export types and code and be imported in other computerate specifications. Section 7 describes how to build a specification that can be exported.¶
The types and code in a computerate specification form an Idris package, which is a collection of Idris modules. An Idris module forms a namespace hierarchy for the types and functions defined in it and is physically stored as a file.¶
Different types of specification can be combined. For instance an exporting library may import from another specification, and so on recursively, down to specifications that are both self-contained and exporting.¶
For convenience each public computerate specification, including the one behind this document, is available as an individual git repository. There is exactly one Idris package per git repository. Appendix A.5 explains how to gain access to these computerate specifications.¶
This document is itself generated from a computerate specification that contains data types and functions that can be reused in future specifications, and as a whole is part of the standard library for computerate specifying. The following sections describe the Idris modules defined in that specification.¶
The code described in Section 5 directly generates text that is to be embedded inside an AsciiDoc document. This is fine for small examples but AsciiDoc has quite a lot of escaping rules that are difficult to use in a consistent manner.¶
For this reason the specification behind this document provides a module named AsciiDoc that contains a set of types that can be used to guarantee that the AsciiDoc text generated is compliant with its specification. All these types implement the Show interface so they can be directly returned by the embedded code.¶
So instead of implementing a show function, a function returning an instance of one of the types can be executed directly as embedded code:¶
In the example above, the example function converts an InternetHeader value into an AsciiDoc block, which is automatically serialized as AsciiDoc text.¶
The AsciiDoc module is not limited to generating examples, but can be used to generate any AsciiDoc structure from Idris code. E.g., the tables in Appendix C are generated using that technique.¶
Section 8.1.1 provides a description of the AsciiDoc module.¶
Using an intermediary type will also make it possible to generate AsciiDoc that can in turn generate an xml2rfc v3 file that supports both text and graphical versions of a figure. This will be done by converting AsciiDoc blocks into <artwork> elements that contain both the 72-column formatted text and an equivalent SVG file, even for source code (instead of using the <sourcecode> element).¶
The type in Section 5.1.1 seems a good representation of the structure of the Internet Header, but the origin of many of the values in the constraints is not obvious, and as such they are still prone to errors. E.g., the calculation in Section 5.1.2 would be better if it used the type itself as the source for the calculated data.¶
It may also be more convenient to use types that already have some of the properties we need, instead of having to add a bunch of constraints to the Int type.¶
The truth of the matter is that the Idris standard library contains very few predefined types that are useful to specify the syntax of communication protocols. E.g., none of the builtin types (Int, Integer, Double, Char, String, etc.) are really suitable to describe a PDU syntax, and so should be avoided. For this reason, it is preferable to use the types provided by the Computerate Specifying standard library.¶
We are going to redefine the InternetHeader type, but using three modules from the standard library:¶
A sequence of bits, or bit-vector, is the most primitive type with which a packet can be described. This module provides a type BitVector n that represents a sequence of bits of fixed size n. The module also provides a set of functions to manipulate bit-vectors. See Section 8.1.2 for a description of the BitVector module.¶
The Unsigned module provides a type Unsigned n that is built on top of the BitVector module. In addition to the properties of a bit-vector, an Unsigned n is considered a number and so all the integer operations apply to it. See Section 8.1.3 for a description of the Unsigned module.¶
Some numbers (also called denominate numbers) are used in conjunction with a so-called unit of measure. The Dimension module provides a way to associate a dimension, in the form of a unit of measure, to an Idris number, including to the numbers defined in the Unsigned module. The Dimension module provides two dimensions, Data (with bit, octet, etc., as units of information) and Time (with second, millisecond, etc., as units of time). See Section 8.1.4 for a description of the Dimension module.¶
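To give an intuition of the first two modules, here is a minimal Python sketch. It is an approximation under stated assumptions: the class and method names are hypothetical and are not the API of the Idris modules, and the checks happen at run time, whereas Idris rejects the same errors at type-checking time.¶

```python
class BitVector:
    """A sequence of bits whose length is fixed at construction."""

    def __init__(self, bits):
        bits = tuple(bits)
        if any(b not in (0, 1) for b in bits):
            raise ValueError("a bit is either 0 or 1")
        self.bits = bits

    def __len__(self):
        return len(self.bits)

    def __eq__(self, other):
        return isinstance(other, BitVector) and self.bits == other.bits


class Unsigned(BitVector):
    """A bit-vector that is also a number, most significant bit first."""

    @classmethod
    def of(cls, width, value):
        if not 0 <= value < 2 ** width:
            raise ValueError(f"{value} does not fit in {width} bits")
        return cls((value >> i) & 1 for i in reversed(range(width)))

    def __int__(self):
        return sum(bit << i for i, bit in enumerate(reversed(self.bits)))

    def __add__(self, other):
        if len(self) != len(other):
            raise TypeError("operands must have the same width")
        # Overflow is an error here, it does not wrap around.
        return Unsigned.of(len(self), int(self) + int(other))


version = BitVector([0, 1, 0, 0])   # a constant pattern, not a number
ihl = Unsigned.of(4, 5)             # a 4-bit number
assert len(version) == 4 and int(ihl) == 5
assert int(ihl + Unsigned.of(4, 3)) == 8
```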
A redefinition of the type in Section 5.1.1 using the types in these modules would look like this:¶
This is a bit-vector, but it always contains the same value, so a constraint states that. Because bit-vectors are not integers, the value must be expressed by a list of O (for 0) and I (for 1) constructors.¶
This is an unsigned integer with a size of 4 bits. It is associated with a dimension, here the Data dimension, which is constrained to use the tetra unit (32-bit words). Basically a denominate number can only be added to or subtracted from numbers with the same dimension (but not necessarily with the same unit). E.g., adding the ihl value to the ttl value will be rejected by Idris, because that operation does not make sense. A denominate number can also be divided or multiplied by a dimensionless number.¶
These are defined as bit-vectors, because they are not really numbers - they do not need to be compared, or be part of a calculation. The number in this type (and all the others) is the number of bits allocated.¶
This is an unsigned number with a size of 16 bits, a Data dimension and a byte unit (8 bits). After casting as denominate numbers, subtracting ihl from length directly gives the size of the payload, without risk of scaling error.¶
This is an unsigned integer. Comparisons and calculations are possible on this field.¶
This is an unsigned number with a length of 13 bits, a Data dimension and an octa unit (64 bits). Again, adding or subtracting this value after casting to another of the same dimension is guaranteed to return the correct value.¶
This is a denominate number with Time as dimension and second as unit.¶
This is a variable length field that contains a list of options, which are defined in a separate type named Option.¶
This is a bit-vector whose length is variable.¶
As we can see, the noise in the definition of our type is greatly reduced by using these specialized types, which in turn makes it possible to add even more constraints.¶
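The behavior of denominate numbers can be approximated in Python as below. This is only a sketch: Quantity, UNITS and their methods are hypothetical names, and a dimension mismatch is reported at run time instead of being rejected by the type-checker. The unit scales (byte as 8 bits, tetra as 32 bits, octa as 64 bits) are the ones given in the text.¶

```python
from dataclasses import dataclass
from fractions import Fraction

# unit name -> (dimension, number of base units); bit and second are the base units.
UNITS = {
    "bit": ("Data", 1), "byte": ("Data", 8), "tetra": ("Data", 32), "octa": ("Data", 64),
    "second": ("Time", 1),
}


@dataclass(frozen=True)
class Quantity:
    dimension: str
    base: Fraction  # magnitude expressed in the base unit of the dimension

    @classmethod
    def of(cls, value, unit):
        dimension, scale = UNITS[unit]
        return cls(dimension, Fraction(value) * scale)

    def _same_dimension(self, other):
        if self.dimension != other.dimension:
            raise TypeError(f"cannot combine {self.dimension} with {other.dimension}")

    def __add__(self, other):
        self._same_dimension(other)
        return Quantity(self.dimension, self.base + other.base)

    def __sub__(self, other):
        self._same_dimension(other)
        return Quantity(self.dimension, self.base - other.base)

    def to(self, unit):
        dimension, scale = UNITS[unit]
        if dimension != self.dimension:
            raise TypeError(f"{unit} is not a unit of {self.dimension}")
        return self.base / scale


ihl = Quantity.of(5, "tetra")        # header length, in 32-bit words
length = Quantity.of(576, "byte")    # total length, in octets
ttl = Quantity.of(64, "second")

# Payload size in octets, with no manual scaling between words and octets.
assert (length - ihl).to("byte") == 556
```

Storing every magnitude in the base unit of its dimension is the design choice that removes scaling errors: units only matter when a value enters or leaves the system.¶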
We can even constrain the size of a field, as is done for the padding field below. In that case the length is calculated in the first constraint by calling the pad function, which calculates the number of bits needed to pad a value of a type that implements the Size interface to a word boundary, here 32 bits. The second constraint checks that, whatever the length of the padding field is, it is always equal to a zero-filled bit-vector, as returned by the function bitVector.¶
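The arithmetic behind these two constraints can be sketched in Python as follows. The function names mirror the text but the signatures are assumptions: sizes are plain integers in bits, and the padding is a list of bits.¶

```python
def pad(size_bits, boundary_bits=32):
    """Number of bits needed to reach the next multiple of boundary_bits."""
    if size_bits < 0 or boundary_bits <= 0:
        raise ValueError("sizes must be non-negative and the boundary positive")
    return -size_bits % boundary_bits


def padding_is_valid(options_bits, padding):
    """Both constraints: the length is the computed one, and every bit is zero."""
    return len(padding) == pad(options_bits) and all(bit == 0 for bit in padding)


assert pad(24) == 8      # three octets of options need one octet of padding
assert pad(32) == 0      # already on a word boundary
assert padding_is_valid(24, [0] * 8)
assert not padding_is_valid(24, [0] * 7 + [1])
```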
Dimensions can also be combined to seamlessly build more complex dimensions. For example, all "length" values of sent packets can be added up during a period of time, while keeping beginning and ending times as denominate numbers: dividing the length sum by the difference between the end time and the begin time gives us directly the data speed in bits per second (or whatever unit we prefer), with the guarantee that Idris will not let us mix oranges and apples.¶
Here's an example of a Sum type that implements some of the variants for an Option in an InternetHeader:¶
The imported types that we are using in the definition of our types all implement the Size interface, which provides a definition for the ad hoc polymorphic function size, a function that returns the size of a field as a dimensional number of dimension Data. This interface can be implemented for the type InternetHeader by making its size the sum of the sizes of all its fields:¶
We can then define a minimal header, and insert its size, using the right unit, in the document:¶
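The calculation can be checked by hand with a small Python sketch. The field widths are those of Figure 4 in RFC 791 for a header without options; the list layout and the helper names size and in_unit are hypothetical.¶

```python
# Field widths, in bits, of an Internet Header without options (RFC 791, Figure 4).
MINIMAL_HEADER = [
    ("version", 4), ("ihl", 4), ("tos", 8), ("length", 16),
    ("id", 16), ("flags", 3), ("offset", 13),
    ("ttl", 8), ("protocol", 8), ("checksum", 16),
    ("source", 32), ("destination", 32),
]


def size(fields):
    """Size of a header, in bits, as the sum of the sizes of its fields."""
    return sum(width for _, width in fields)


def in_unit(bits, unit_bits):
    """Express a size in a larger unit, refusing to silently truncate."""
    quotient, remainder = divmod(bits, unit_bits)
    if remainder:
        raise ValueError(f"{bits} bits is not a whole number of {unit_bits}-bit units")
    return quotient


assert size(MINIMAL_HEADER) == 160
assert in_unit(size(MINIMAL_HEADER), 8) == 20    # octets
assert in_unit(size(MINIMAL_HEADER), 32) == 5    # 32-bit words, the minimal ihl value
```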
A better solution than defining an ad hoc type for our state machines, as explained in Section 5.2, is to use Petri Nets. This specification defines a DSL for describing a Typed Petri Net (TPN), which is heavily influenced by Coloured Petri Nets (CPN) [CPN]. The original CPN adds some restrictions on the types that can be used in a Petri Net because of limitations in the underlying programming language, SML. As the underlying programming language used in TPN, Idris, does not have these limitations, any well-formed Idris type (including polymorphic, linear and dependent types) can be used directly in TPN.¶
From there it is easy to generate (using the non-deterministic monad in Idris) an interpreter for debugging and simulation purposes:¶
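The idea of such an interpreter can be sketched in Python. This is a much simplified stand-in and not the TPN DSL: tokens are plain counts instead of typed values, the net (places and transitions of a toy client/network/server exchange) is invented for illustration, and non-determinism is modelled by returning the list of all possible next markings.¶

```python
from collections import Counter

# A transition consumes the tokens in its inputs and produces those in its outputs.
TRANSITIONS = {
    "send":    ({"client_ready": 1}, {"in_flight": 1, "client_waiting": 1}),
    "lose":    ({"in_flight": 1}, {}),
    "deliver": ({"in_flight": 1}, {"server_received": 1}),
}


def enabled(marking, inputs):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking[place] >= n for place, n in inputs.items())


def step(marking):
    """Return every (transition, next marking) reachable in one firing."""
    successors = []
    for name, (inputs, outputs) in TRANSITIONS.items():
        if enabled(marking, inputs):
            nxt = Counter(marking)
            nxt.subtract(inputs)
            nxt.update(outputs)
            successors.append((name, +nxt))   # unary + drops the empty places
    return successors


start = Counter({"client_ready": 1})
[(name, after_send)] = step(start)            # only one transition is enabled
assert name == "send"
# The network may either lose or deliver the message: two possible futures.
assert sorted(n for n, _ in step(after_send)) == ["deliver", "lose"]
```

Exploring all the successors returned by step, instead of picking one, turns the same function into a simple state-space explorer.¶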
A Petri Net has the advantage that the same graph can be reused to derive other Petri Nets, e.g., Timed Petri Nets (that can be used to collect performance metrics) or Stochastic Petri Nets.¶
A TPN that covers a whole protocol (i.e. client, network, and server) is useful to prove the properties listed in Section 5.3.4, Section 5.3.5, and Section 5.3.6. But the TPN is also designed in a way that each of these parts can be defined separately from the others, making it a Hierarchical TPN.¶
Another usage of our Idris type would be to generate a textual representation of that type.¶
Figure 4 in RFC 791 is a good example of a representation of a data layout, here as a bit diagram. Because we already have an Idris type which is describing exactly the same thing, the idea of syntax representation is to convert that type into text, and insert it in place of the bit diagram.¶
For each textual representation of a type, it is possible to write a function that takes this type as a parameter and generates an AsciiDoc value that can then be inserted in the document.¶
Some documents use representations that are unique to them, but often multiple documents share the same representation, and so that function can also be shared between them. A set of such functions is available as part of the Computerate Specifying standard library.¶
The bit diagram is one of the most frequently used representations of a PDU specification in documents, so a function to convert an Idris type into a bit diagram is provided as part of the standard library.¶
That function takes as parameters an Idris type and a structure containing additional information, and returns an AsciiDoc value that can be inserted in the document.¶
The additional structure is a list of the properties associated to each field that are needed to generate the bit diagram. For a bit diagram the only property is a character string containing the name of the field.¶
For our InternetHeader type, that additional structure would look like this:¶
The Pdu type takes care of verifying that each name is unique in the structure, and that each name length does not exceed 2 * (size field) - 1, so it is guaranteed to fit in the bit diagram cell.¶
After that it is just a matter of inserting the function call in the document (the %runElab keyword indicates that the Idris code is using reflection elaboration, which is used to inspect a type).¶
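The two checks and the rendering can be illustrated with a short Python sketch. It is not the library function: bit_diagram is a hypothetical name, fields are given as explicit (name, width) pairs instead of being extracted from a type, the bit-number ruler is omitted, and a field is not allowed to cross a row boundary.¶

```python
def bit_diagram(fields, row_bits=32):
    """Render (name, width in bits) pairs as rows of a bit diagram."""
    names = [name for name, _ in fields]
    if len(set(names)) != len(names):
        raise ValueError("field names must be unique")
    separator = "+-" * row_bits + "+"
    lines, cells, used = [separator], [], 0
    for name, width in fields:
        # Each bit takes two characters, minus one for the shared cell border.
        if len(name) > 2 * width - 1:
            raise ValueError(f"{name!r} does not fit in {width} bits")
        cells.append(name.center(2 * width - 1))
        used += width
        if used > row_bits:
            raise ValueError("a field crosses a row boundary")
        if used == row_bits:
            lines += ["|" + "|".join(cells) + "|", separator]
            cells, used = [], 0
    if cells:
        raise ValueError("the last row is incomplete")
    return "\n".join(lines)


first_row = [("Version", 4), ("IHL", 4), ("Type of Service", 8), ("Total Length", 16)]
diagram = bit_diagram(first_row)
assert all(len(line) == 65 for line in diagram.split("\n"))
```

The limit of 2 * width - 1 characters explains why "Type of Service", at 15 characters, is the longest name that fits in an 8-bit field.¶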
Message Sequence Charts are a common way to represent an example of execution of a Petri Net, i.e. of the interactions between the underlying state machines. Although sequence charts are often implicitly used to describe a protocol, that description can only be partial.¶
Nonetheless, this is a very common way to show state machine related examples in a document, so a library will make it possible to specify an execution in a Petri Net (i.e. a list of Transitions and their Bindings), and convert it into a message sequence chart that can be inserted into the document. Optionally the message sequence chart generated can be followed by an ordered list of the messages themselves.¶
When different representations of a specification share some common characteristics, it is usual to generalize them into a formal language.¶
One shared limitation of these languages is that they cannot always formalize all the constraints of a specific data layout, so they have to be enriched with comments. One consequence of this is that they cannot be used as a replacement for the Idris types described in Section 5.1.1 or Section 6.1.2, types that are purposely designed to be as complete as possible.¶
Another consequence is the proliferation of these languages, with each new formal language trying to integrate more constraints than the previous ones. For that reason Computerate Specifying does not favor one formal language over the others, and will try to provide code to help use all of them.¶
Similarly to what was explained in Section 5.1, a set of types can be designed and then used to type-check instances of that formal language, and convert them into a textual representation. Most of the formal languages used at the IETF already come with a set of tools that can verify that the text representation in an RFC is syntactically correct, so that type does not add much to that.¶
On the other hand that type can be the target of a converter from an ad-hoc type. This will ensure that the generated instance of the formal language matches the specification, which is something that external tools cannot do.¶
When a PDU is described with a formal language, we end up with two descriptions, one using the Idris dependent type (and used to generate examples) and the other using the formal language.¶
Proving isomorphism requires generating an Idris type from the formal language instance, which is done using an Idris elaborator script.¶
In Idris, Elaborator Reflection [Elab] is a metaprogramming facility that permits writing code generating type declarations and code (including proofs) automatically.¶
For instance the ABNF language is itself defined using ABNF, so after converting that ABNF into an instance of the Syntax type (which is a holder for a list of instances of the Rule type), it is possible to generate a suite of types that represents the same language:¶
The result of the elaboration can then be used to construct a value of type Iso, which requires four total functions, two for the conversion between types, and another two to prove that sequencing the conversions results in the same original value.¶
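The structure of such a value can be sketched in Python. This is an analogy under stated assumptions, not the Iso type of this specification: the class and its field names are hypothetical, and the two proofs are replaced by a check over enumerated values, which only amounts to a proof when the enumeration is exhaustive, as it is for the 4-bit example used here.¶

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Generic, Iterable, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Iso(Generic[A, B]):
    to: Callable[[A], B]       # conversion from the first type to the second
    from_: Callable[[B], A]    # conversion back

    def holds_on(self, all_a: Iterable[A], all_b: Iterable[B]) -> bool:
        """Check that both round trips give back the original value."""
        return (all(self.from_(self.to(a)) == a for a in all_a)
                and all(self.to(self.from_(b)) == b for b in all_b))


def to_bits(n):
    return tuple((n >> shift) & 1 for shift in (3, 2, 1, 0))


def from_bits(bits):
    return sum(bit << shift for bit, shift in zip(bits, (3, 2, 1, 0)))


# Integers 0..15 and 4-bit tuples carry exactly the same information.
nibble = Iso(to_bits, from_bits)
assert nibble.holds_on(range(16), product((0, 1), repeat=4))

# Dropping the top bit loses information, so one round trip fails.
lossy = Iso(lambda n: to_bits(n % 8), from_bits)
assert not lossy.holds_on(range(16), product((0, 1), repeat=4))
```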
The following example generates an Idris type "SessionDescription" from the SDP ABNF. It then proves that this type and the Sdp type contain exactly the same information (the proofs themselves have been removed, leaving only the propositions):¶
As stated in Section 5.3.1, the Idris type and the type generated from the formal language are not always isomorphic, because some constraints cannot be expressed in that formal language. In that case isomorphism can be used to precisely define what information is missing in the formal language type. To do so, the generated type is augmented with a delta type, like so:¶
Then the DeltaSessionDescription type can be modified to include the missing information until the same function type-checks. After this we have a guarantee that we know all about the constraints that cannot be encoded in that formal language, and can check manually that each of them is described in a comment.¶
An interesting comment in [Momot16] states that if the input of an application is too complex to be expressed in ABNF without adding comments, it is too complex to be safe. The technique described in this section can be used to evaluate the safety of such ABNF by clearly specifying the impact of these additional comments.¶
Idris elaborator scripts will be developed for each formal language.¶
The following sections describe how these formal languages have been, or will be, themselves converted into types with the goal of importing them in computerate specifications.¶
Augmented Backus-Naur Form (ABNF) [RFC5234] is a formal language used to describe a text based data layout.¶
An ABNF can be described by defining a value for the types from the RFC5234.Main module:¶
That value can then be inserted in a document, which will convert it into a proper ABNF, so¶
is rendered as¶
See Section 8.2.2 for details on that package.¶
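The principle of describing a grammar as a value and then rendering it can be sketched in Python. The classes Ref, Lit, Cat and Alt and the functions render and rule are hypothetical and cover only a small subset of RFC 5234 (rule references, quoted strings, concatenation and alternation); they are not the types of the RFC5234.Main module.¶

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Ref:
    name: str                      # reference to another rule


@dataclass(frozen=True)
class Lit:
    text: str                      # quoted string


@dataclass(frozen=True)
class Cat:
    items: Tuple["Element", ...]   # concatenation


@dataclass(frozen=True)
class Alt:
    items: Tuple["Element", ...]   # alternation


Element = Union[Ref, Lit, Cat, Alt]


def render(element, in_cat=False):
    """Serialize an element, adding parentheses only where precedence requires them."""
    if isinstance(element, Ref):
        return element.name
    if isinstance(element, Lit):
        if '"' in element.text:
            raise ValueError("a quoted string cannot contain a double quote")
        return '"' + element.text + '"'
    if isinstance(element, Cat):
        return " ".join(render(item, in_cat=True) for item in element.items)
    text = " / ".join(render(item) for item in element.items)
    return "(" + text + ")" if in_cat else text


def rule(name, element):
    return f"{name} = {render(element)}"


eol = Alt((Cat((Ref("CR"), Ref("LF"))), Ref("LF")))
assert rule("eol", eol) == "eol = CR LF / LF"
```

Because the parentheses are computed from the structure of the value, a generated grammar cannot contain the precedence mistakes that are easy to make when writing ABNF by hand.¶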
Augmented Packet Header Diagram (APHD) [I-D.mcquistin-augmented-ascii-diagrams] is a formal language used to describe an augmented bit diagram in a machine-readable format.¶
It can be seen as an extension to the self-contained bit diagram in Section 5.1.3, where more information is extracted from the Idris type, and more properties are carried in the list of properties:¶
From the Idris type:¶
The size of a field in the Idris type is converted into the field's width.¶
The size constraints in Idris are converted into a variable size field (Section 4.1).¶
A constraint that reduces the possible values (like for the version field) is converted into a constraint on field value (Section 4.4).¶
Alternative constructors (i.e., a Sum type) generate a presence predicate (Section 4.2).¶
From the additional structure:¶
The description for each field is a value of AsciiDoc type, which makes it possible to format it correctly. In addition, it is possible to insert calculations or even other type representations in the description by using an AsciiDoc type that works similarly to code embedding.¶
Reusing the type in Section 6.1.2, the conversion process would partially look like this:¶
and is rendered as:¶
Cosmogol [I-D.bortzmeyer-language-state-machines] is a formal language designed to define state machines. The Internet-Draft will be retrofitted as a computerate specification to provide an internal Domain Specific Language (DSL) that permits specifying an instance of that language.¶
As a Petri Net can be seen as a set of state machines, it will be possible to extract part of a Petri Net and generate the equivalent state machine in Cosmogol format.¶
Protocols evolve over time, and the documents that standardize them also need to evolve with them. Each SDO has a specific set of methods to do so, from having the possibility of modifying a document, to systematically releasing a complete new document when a modification is needed. The IETF uses a combination of methods to update the documents that define a protocol.¶
One such method is to release a new document that completely replaces ("obsoletes") an existing protocol. E.g., TLS 1.2 [RFC5246] was completely replaced by TLS 1.3 [RFC8446] such that there is no need to read RFC 5246 to be able to implement RFC 8446.¶
Alternatively, when only part of a protocol needs modification, the method used is to issue a new document that updates only that specific part. E.g., RFC 2474 updates only the definition of the ToS field in the Internet Header defined in RFC 791, so reading both documents is required to implement the Internet Protocol. These two methods can be combined, as was done for RFC 2474, which obsoleted RFC 1349, itself the original update to RFC 791.¶
Systematically updating a protocol in new documents instead of replacing it means that sometimes many different documents have to be read before writing a modern implementation of a specific protocol. E.g., the DNS was originally defined in RFC 1034 and RFC 1035, but has been updated by more than 30 documents since, all of which must be read to implement that protocol.¶
In the DNS example we are not even counting definitions of codepoints as protocol updates. Defining new codepoints and their associated data is the third method used at the IETF to evolve a standard. That last method will be explored in more detail in Section 6.3.2, so the remainder of this section can focus on the two other methods.¶
Writing a computerate specification for a new document or a document that obsoletes another one is straightforward, as the specification will contain all the types that are needed to formalize it. On the other hand it is less clear what should go into a specification that updates another one.¶
A simplistic solution is to copy the whole Idris content from the original specification into the new one and modify that new content, but this creates a few problems:¶
Firstly, the content from the original specification will have to be copied again each time it is modified, as computerate specifications are meant to evolve, even if the underlying document did not.¶
Secondly the size of the code should be roughly proportional to the size of the document itself, so the actual update is made obvious from the content.¶
So instead of manually copying the content, an Idris elaboration can be used to copy it automatically and apply the minimal modifications needed at the same time.¶
But first the specification that will be updated needs to be prepared, by encapsulating the types in a function that will be used to generate the types themselves:¶
This code behaves exactly like the previous definition, with the major difference that the documentation is not generated for that type. Idris2 has been enhanced with the possibility to cache the result of an elaboration directly in the source code, and to automatically send a warning when the cache needs to be refreshed. The interactive command :gc <line> automatically generates the code followed by a %cacheElab line that indicates where the generated code ends, something like this:¶
The numbers on the %cacheElab line are hashes of, respectively, the elaboration code and the generated text, and make it possible to detect whether either was modified since the last time the code was cached.¶
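The detection logic can be illustrated in Python. Everything specific in this sketch is an assumption made for the illustration: the hash function (a truncated SHA-256), the exact layout of the marker line, and the names digest, cache_line and cache_status are not taken from the Idris2 implementation.¶

```python
import hashlib


def digest(text):
    """Short, stable hash of a piece of source text (assumed hash function)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def cache_line(elab_code, generated):
    """Marker recording the hashes of the elaboration code and of its output."""
    return f"%cacheElab {digest(elab_code)} {digest(generated)}"


def cache_status(line, elab_code, generated):
    """Compare the recorded hashes with the current content."""
    _, code_hash, text_hash = line.split()
    if code_hash != digest(elab_code):
        return "elaboration code changed: the cache must be refreshed"
    if text_hash != digest(generated):
        return "generated text was modified by hand"
    return "up to date"


marker = cache_line("declare internetHeader", "record InternetHeader where ...")
assert cache_status(marker, "declare internetHeader",
                    "record InternetHeader where ...") == "up to date"
```

Keeping two separate hashes is what distinguishes a stale cache from generated code that was edited after it was cached.¶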
With that we can import the definition of the InternetHeader type and clone it in our new specification:¶
The modification needed by the new document can be done by replacing the ToS field with the newly defined DSField, using the replace function:¶
At this point using elaboration caching would make it possible to check that the new type indeed uses the Dscp type instead of the old Tos type.¶
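The effect of such a transformation can be sketched in Python by modelling a type as an ordered list of (field name, type name) pairs. This is a deliberately naive model: the replace function below is a hypothetical stand-in for the elaboration, the field list is truncated, and the type names are plain strings used only for illustration.¶

```python
# A truncated, illustrative model of the original type: (field name, type name).
INTERNET_HEADER = [
    ("version", "BitVector 4"),
    ("ihl", "Unsigned 4"),
    ("tos", "Tos"),
    ("length", "Unsigned 16"),
]


def replace(fields, name, new_field):
    """Return a copy of fields where the field called name is swapped for new_field."""
    if [n for n, _ in fields].count(name) != 1:
        raise KeyError(f"expected exactly one field named {name!r}")
    return [new_field if n == name else (n, t) for n, t in fields]


updated = replace(INTERNET_HEADER, "tos", ("dsfield", "DSField"))
assert updated[2] == ("dsfield", "DSField")
# The original definition is untouched, and nothing else has changed.
assert INTERNET_HEADER[2] == ("tos", "Tos")
assert updated[:2] + updated[3:] == INTERNET_HEADER[:2] + INTERNET_HEADER[3:]
```

The updating specification then contains only the new field and one call, which keeps its size proportional to the size of the update.¶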
In contrast with the previous section, which describes how to formalize the unplanned evolution of a protocol, most protocols are designed with the potential for evolution, also known as extensibility. This potential is generally expressed as values for some fields that will later be assigned a new meaning.¶
The meaning for a new value will be defined in a new document, with all the documents giving new meanings to a field easily locatable in a registry.¶
Following up on our previous example, RFC 791 defines IP Options only for values 0, 1, 7, 68, 131, 136, and 137. These values, together with new values defined by other documents, are listed in the IP Option Numbers IANA registry. E.g., that IANA registry also defines, among others, value 25 in RFC 4782.¶
The values that are part of a registry are designed to be used with the protocol that defined that registry, so it makes sense to synthesise a Sum type of all these values in the computerate specification for the document that defined that registry.¶
Building that Sum type can be done by applying transformations to the original type, just like when modifying a protocol in a new specification. The difference is that the list of types that will be used in the Sum type needs to be collected from the registry, and updated each time the registry is updated.¶
Idris has a mechanism to read external data during type-checking (a feature known as Type Provider), mechanism that could be used to read the content of the registry. A registry generally contains the codepoint that identifies a new value and the name of the document that defines that value, but unfortunately the protocol registries do not contain enough information to automatically find the Idris2 type that matches a specific codepoint.¶
For instance IANA is the organization that maintains the registries for the IETF. The IP Option Numbers registry is an example of a registry that contains the list of all the IP Options that can be carried by the Internet Protocol. E.g., in that registry RFC 1191 contains the description for multiple entries, and so an additional mechanism is needed to find the Idris2 type for each of them.¶
That additional mechanism is abstracted as an extended registry that complements the existing registry, but for the sole purpose of finding the exact type to use for each codepoint to generate that Sum type.¶
Building an InternetHeader type that contains all the IP Options defined at the time of type-checking looks like this:¶
The %provide statement reads both the IANA registry and its associated extended registry and stores the result in the ipParameter constant. Then the %runElab statement repeatedly adds the types retrieved to the InternetHeader type.¶
Instead of having to manually maintain the extended registries, they can be automatically updated by information coming from the type-checking of the types in the respective computerate specifications that define new values, by binding a specific entry in a registry with the type in the specification.¶
The mechanism used is also based on a type provider, but this time to update the extended registry instead of reading from it:¶
Here the statements bind the types defined in mtuR and mtuT to codepoints 11 and 12 in the extended registry of IANA's IP Option Numbers registry.¶
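The relationship between the registry, the extended registry and the Sum type can be sketched in Python. The registry excerpt uses codepoints and documents mentioned in this section; everything else is hypothetical, including the function names and the type names, and dictionaries stand in for both the type providers and the generated Sum type.¶

```python
# Illustrative excerpt of the IP Option Numbers registry: codepoint -> defining document.
REGISTRY = {0: "RFC 791", 1: "RFC 791", 7: "RFC 791", 25: "RFC 4782"}


def bind(extended, codepoint, type_name):
    """Record in the extended registry which type formalizes a codepoint."""
    if codepoint not in REGISTRY:
        raise KeyError(f"codepoint {codepoint} is not in the registry")
    if extended.get(codepoint, type_name) != type_name:
        raise ValueError(f"codepoint {codepoint} is already bound to another type")
    return {**extended, codepoint: type_name}


def option_variants(extended):
    """Variants of the Sum type: the registered codepoints that have a bound type."""
    return {cp: extended[cp] for cp in sorted(REGISTRY) if cp in extended}


extended = {}
extended = bind(extended, 0, "EndOfOptionList")   # hypothetical type names
extended = bind(extended, 1, "NoOperation")
extended = bind(extended, 25, "QuickStart")

# Codepoint 7 is registered but no specification has bound a type to it yet.
assert option_variants(extended) == {0: "EndOfOptionList", 1: "NoOperation", 25: "QuickStart"}
```

Separating the binding step from the reading step mirrors the text: each specification contributes its own entries, and the Sum type is rebuilt from whatever is bound at type-checking time.¶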
Computerate specifications can formalize their content to make it reusable as a building block for other specifications. A specification that organizes its content along the guidelines presented in this section can become a part of the Computerate Specifying Standard Library.¶
To be part of the Standard Library, specifications must be organized in 4 components:¶
This is the formalization of the content of the standard as an Idris package i.e., a set of Idris modules (i.e. files) that exports some or all of the types and functions defined in it. The code of these Idris modules is generally interspersed with the content of the standard to form literate code.¶
This is a document section that guides the reader step by step in the use of the Idris package in a Computerate Specification. A tutorial may import the package itself to validate the examples provided as part of the tutorial. This section is considered informative.¶
This is a document section that explains the Idris package as a whole, i.e., grouping explanations by feature.¶
This is a document section that is auto-generated from the structured comments in the types and functions of the Idris package code. It lists all the types and functions in alphabetic order, including the comments on parameters.¶
This document is itself an Idris package that is part of the Standard Library: Section 7 contains the tutorial part of that package, Section 8.1 forms its description part, and Appendix B contains its reference.¶
For a retrofitted document, the code will be mixed with the existing standard to produce a Computerate Specification but the tutorial, description and reference parts cannot be added to that standard, so they have to be part of a separate document. It can be a new specification written for the express purpose of documenting that package. This is the case for this specification, which documents a selection of retrofitted Computerate Specifications that are part of the Standard Library. E.g., Section 6.2.1, and Section 8.2.2 are respectively the tutorial and the description for [RFC5234].¶
For a new document, the four components should be part of it. E.g., in this document Section 6.1.5.1, Section 8.1.5, and Appendix B.1.2 are respectively the tutorial, description, and reference for the BitDiagram module.¶
RFCs, Internet-Drafts and standard documents published by other SDOs did not start their life as computerate specifications, so to be able to use them as Idris packages they will need to be progressively retrofitted. This is done by converting the documents into AsciiDoc documents and then enriching them with code, in the same way that would have been done if the standard was developed directly as a computerate specification.¶
Converting the whole document into AsciiDoc and enriching it with code, instead of just maintaining a library of code, may seem a waste of resources. The reason for doing so is to be able to verify that the rendered text is equivalent to the original standard, which will validate the examples and formal languages.¶
Retrofitted specifications will also be made available as individual git repositories as they are converted.¶
Because the IETF Trust does not permit modifying an RFC as a whole (except for translation purposes), a retrofitted RFC uses transclusion, a mechanism that includes parts of a separate document at runtime. This way, a retrofitted RFC is distributed as two separate files: the original RFC in text form, and a computerate specification that contains only code and transclusions. The use of transclusions is explained in Appendix A.2.2.¶
Types and functions are exported by using the export keyword. Type constructors, interface functions and type function implementations can additionally be exported by prepending the keyword public to the export keyword.¶
Additionally, types that may be transformed should be declared as explained in Section 6.3.2, i.e. by wrapping them first in an exported function that uses a quote declaration, then generating them locally using a declare elaboration.¶
The AsciiDoc module provides a way to programmatically build an AsciiDoc document without having to worry about the particular formatting details.¶
Note that, unlike the AsciiDoc rendering process, which tries very hard to render a document under any circumstances, the types in this module are meant to generate only a correct document.¶
E.g., the string this is {`N "bold"}bold` will be rendered as this is bold. If the intent was to render the "bold" word in bold, then the string should have been this is {`Bold "bold"}`.¶
The Computerate Specifying Library provides a number of types and functions aimed at defining and manipulating the data types that are commonly found in Protocol Data Units (PDU). The most elementary type of data is the bit-vector, which is a list of individual bits. Bit-vectors are not always sufficient to describe the subtleties of the data types carried in a PDU, and several more precise types are built on top of them. See Section 8.1.3 for unsigned integers.¶
BitVector is a dependent type representing a list of bits, indexed by the number of bits contained in that list. The type is inspired by Chapter 6 of [Kroening16] and by [Brinkmann02].¶
A value of type BitVector n can be built as a series of zeros (bitVector) or by using a list of O (for 0) and I (for 1) constructors. E.g., [O, I, O, O] builds a bit-vector of type BitVector 4 with a value equivalent to 0b0100.¶
Bit-vectors can be compared for equality, but they are not ordered. They also are not numbers, and arithmetic operations cannot be applied to them.¶
Bit-vectors can be concatenated (concat), a smaller bit-vector can be extracted from an existing bit-vector (extract), or a bit-vector can be extended by adding a number of zeros in front of it (extend).¶
The usual unary bitwise operations (shiftL, shiftR, not) are defined for bit-vectors, as well as binary bitwise operations between two bit-vectors of the same size (and, or, xor).¶
Finally, it is possible to convert the bit at a specific position in a bit-vector into a Bool value (test).¶
This module makes it possible to manipulate denominate numbers, which are numbers associated with a unit. Examples of denominate numbers are cast (5, meter / second) (which uses a unit of speed), or cast (10, meter * meter * meter) (which uses a unit of volume).¶
In this module a denominate number is a value of type Denominate xs. It carries one number as a fraction. Its type is indexed over a list of dimensions, each associated with an exponent number. Altogether, this type can represent any unit that is derived directly or indirectly from the base dimensions defined in the Dimension type.¶
Denominate numbers are constructed by passing a tuple made of a number (either an Integer or a Double) and a unit to the cast function. E.g., cast (5, megabit) will build the denominate number 5 with the megabit unit.¶
Dimensionless denominate numbers can be constructed by using the none unit, as in cast (10, none).¶
Denominate numbers can be converted back into a tuple with the fromDenominate function.¶
Denominate numbers can be added, subtracted or negated (respectively +, -, and neg). All these operations can only be done on denominate numbers with the same exact dimension, and the result will also carry the same dimension. This prevents what is colloquially known as mixing apples and oranges.¶
For the same reason, adding a number to a non-dimensionless denominate number is equally impossible.¶
The *, /, and recip operations respectively multiply, divide and calculate the reciprocal of denominate numbers. These operations can be done on denominate numbers that have different types, and the result dimension will be derived from the dimensions of the arguments. E.g., multiplying cast (5, meter) by cast (6, meter) will return the equivalent of cast (30, meter * meter).¶
Multiplying a denominate number by a (dimensionless) number is also possible, e.g., multiplying cast (5, meter) by cast (10, none) will return the equivalent of cast (50, meter).¶
Ultimately, the goal is to insert in a computerate specification the value of a denominate number, together with its unit, as text. This is done by implementing the Show interface on a denominate number in its tuple form. E.g., fromDenominate (cast (5, meter / second)) (kilometer / hour) can be directly inserted in a document and will be substituted with the string 18 km/h.¶
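The dimension bookkeeping described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the names Unit, Denominate, cast and from_denominate mirror the text but are reimplemented here, the "length" dimension and the meter, kilometer and hour constants are illustrative, and dimension mismatches raise a TypeError at run time instead of being rejected by the Idris type checker.

```python
from dataclasses import dataclass
from fractions import Fraction


def _combine(a, b, sign):
    """Add (sign=1) or subtract (sign=-1) two exponent maps, dropping zeros."""
    exps = dict(a)
    for dim, exp in b:
        exps[dim] = exps.get(dim, 0) + sign * exp
    return tuple(sorted((d, e) for d, e in exps.items() if e != 0))


@dataclass(frozen=True)
class Unit:
    scale: Fraction  # size of one unit, expressed in base units
    dims: tuple      # sorted (dimension, exponent) pairs

    def __mul__(self, other):
        return Unit(self.scale * other.scale, _combine(self.dims, other.dims, 1))

    def __truediv__(self, other):
        return Unit(self.scale / other.scale, _combine(self.dims, other.dims, -1))


@dataclass(frozen=True)
class Denominate:
    value: Fraction  # magnitude, expressed in base units
    dims: tuple

    def _check(self, other):
        if not isinstance(other, Denominate) or other.dims != self.dims:
            raise TypeError("operands must carry the same dimension")

    def __add__(self, other):
        self._check(other)
        return Denominate(self.value + other.value, self.dims)

    def __sub__(self, other):
        self._check(other)
        return Denominate(self.value - other.value, self.dims)

    def __neg__(self):
        return Denominate(-self.value, self.dims)

    def __mul__(self, other):
        return Denominate(self.value * other.value,
                          _combine(self.dims, other.dims, 1))

    def __truediv__(self, other):
        return Denominate(self.value / other.value,
                          _combine(self.dims, other.dims, -1))


def cast(pair):
    """Build a denominate number from a (number, unit) tuple."""
    number, unit = pair
    return Denominate(Fraction(number) * unit.scale, unit.dims)


def from_denominate(d, unit):
    """Convert back to a (value, unit) tuple, expressed in the given unit."""
    if d.dims != unit.dims:
        raise TypeError("unit does not match the dimension of the value")
    return float(d.value / unit.scale), unit


# Illustrative units (assumptions, not the library's constants).
none = Unit(Fraction(1), ())
meter = Unit(Fraction(1), (("length", 1),))
second = Unit(Fraction(1), (("time", 1),))
kilometer = Unit(Fraction(1000), meter.dims)
hour = Unit(Fraction(3600), second.dims)

speed, _ = from_denominate(cast((5, meter / second)), kilometer / hour)
print(speed)  # 18.0
```

Storing the magnitude in base units keeps addition trivial; the unit is only needed again when the value is converted back into a tuple for display.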
For each dimension we define a list of constants that represent units of that dimension. Units that use a prefix are automatically generated, which is the case for SI units for the Time dimension (i.e., from yoctosecond to yottasecond), SI units (only positive powers of 10) for the Data dimension (i.e., from kilobit to yottabit), and IEC units (positive powers of 2) for the Data dimension (i.e., from kibibit to yobibit).¶
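The automatic generation of prefixed units can be sketched in Python as follows. The exact set of prefixes generated by the library is not listed in this document, so the sketch assumes the standard SI prefixes between yocto and yotta and the IEC binary prefixes between kibi and yobi; the function names si_units and iec_units and the min_exp parameter are hypothetical.

```python
from fractions import Fraction

# Standard SI prefixes with their power of 10 (assumed set).
SI_PREFIXES = {
    "yocto": -24, "zepto": -21, "atto": -18, "femto": -15, "pico": -12,
    "nano": -9, "micro": -6, "milli": -3, "centi": -2, "deci": -1,
    "deca": 1, "hecto": 2, "kilo": 3, "mega": 6, "giga": 9,
    "tera": 12, "peta": 15, "exa": 18, "zetta": 21, "yotta": 24,
}

# IEC binary prefixes with their power of 2.
IEC_PREFIXES = {
    "kibi": 10, "mebi": 20, "gibi": 30, "tebi": 40,
    "pebi": 50, "exbi": 60, "zebi": 70, "yobi": 80,
}


def si_units(base, min_exp=-24):
    """Map each prefixed unit name to its size in base units."""
    return {prefix + base: Fraction(10) ** exp
            for prefix, exp in SI_PREFIXES.items() if exp >= min_exp}


def iec_units(base):
    """Map each IEC-prefixed unit name to its size in base units."""
    return {prefix + base: Fraction(2) ** exp
            for prefix, exp in IEC_PREFIXES.items()}


time_units = si_units("second")            # yoctosecond .. yottasecond
data_si_units = si_units("bit", min_exp=3)  # kilobit .. yottabit
data_iec_units = iec_units("bit")           # kibibit .. yobibit
```

In this model, time_units["millisecond"] is Fraction(1, 1000), data_si_units["megabit"] is 1000000, and data_iec_units["kibibit"] is 1024.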
A bit diagram displays a graphical representation of a data layout at the bit level.¶
The BitDiagram type is used to build BitDiagram values.¶
The toAsciiDoc function converts a BitDiagram value into an AsciiDoc Literal Block which can be inserted directly in the document.¶
Ad hoc types can also be used to generate a bit diagram, by passing that type to the toDiagram function and the returned value to the toAsciiDoc function. The toDiagram function will build a field only for types that have an implementation for the Size interface. The function toDiagram also takes an auxiliary type Names that associates names with these types.¶
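To make the notion of a bit diagram concrete, the following Python sketch renders a list of named fields in the 32-bit-wide layout commonly used in RFCs. It is an illustration only: the function bit_diagram and its (name, size) input are hypothetical and do not reproduce the output of toAsciiDoc, and fields that span several rows are not handled.

```python
def bit_diagram(fields, width=32):
    """Render (name, size_in_bits) fields as an ASCII bit diagram.

    Each row must be filled exactly by whole fields; a field of n bits
    occupies 2 * n - 1 characters between its two "|" delimiters.
    """
    ruler_tens = " " + " ".join(
        str(i // 10) if i % 10 == 0 else " " for i in range(width))
    ruler_units = " " + " ".join(str(i % 10) for i in range(width))
    separator = "+" + "-+" * width
    lines = [ruler_tens.rstrip(), ruler_units, separator]

    row, used = [], 0
    for name, size in fields:
        cell_width = 2 * size - 1
        if size < 1 or used + size > width or len(name) > cell_width:
            raise ValueError(f"field {name!r} does not fit")
        row.append(name.center(cell_width))
        used += size
        if used == width:  # the row is complete, emit it
            lines += ["|" + "|".join(row) + "|", separator]
            row, used = [], 0
    if row:
        raise ValueError("last row is incomplete")
    return "\n".join(lines)


print(bit_diagram([
    ("Version", 4), ("IHL", 4), ("Type of Service", 8), ("Total Length", 16),
]))
```

The field names and sizes in the usage example are those of the first 32 bits of the IPv4 header.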
This module makes it possible to manipulate values that are in the very generic form of trees. These manipulations consist of removing or replacing a selected value or values in that tree.¶
The values to manipulate are selected using a path, which is a series of instructions used to move the focus of the manipulation up, down and sideways in the tree and to apply a predicate until a set of values is chosen.¶
The values selected are then either removed or replaced by a new value. The rest of the tree stays unmodified.¶
This mechanism is very generic and can be applied to any tree, but it is meant to modify the types defined in the Language.Reflection.TTImp and Language.Reflection.TT standard modules, with the goal of generating types that are derived from existing types.¶
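The select-then-edit mechanism described above can be sketched on a plain labelled tree. The Python below is a simplified model, not the module's API: the Node type, the step constructors down and up, and the functions select, remove and replace are hypothetical names, sideways moves are left out, and the root itself cannot be removed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def node_at(root, position):
    """Follow a tuple of child indexes from the root."""
    node = root
    for index in position:
        node = node.children[index]
    return node


def down(predicate):
    """Step: move the focus to every child that satisfies the predicate."""
    def step(root, positions):
        return [pos + (i,)
                for pos in positions
                for i, child in enumerate(node_at(root, pos).children)
                if predicate(child)]
    return step


def up(root, positions):
    """Step: move the focus to the parents of the current values."""
    return sorted({pos[:-1] for pos in positions if pos})


def select(root, path):
    """Apply each step of the path in turn, starting with the focus on the root."""
    positions = [()]
    for step in path:
        positions = step(root, positions)
    return positions


def _rebuild(node, selected, edit, position=()):
    """Copy the tree, applying `edit` to the selected values only."""
    children = []
    for i, child in enumerate(node.children):
        child_position = position + (i,)
        if child_position in selected:
            replacement = edit(child)
            if replacement is not None:  # None means "remove"
                children.append(replacement)
        else:
            children.append(_rebuild(child, selected, edit, child_position))
    return Node(node.label, children)


def remove(root, path):
    return _rebuild(root, set(select(root, path)), lambda node: None)


def replace(root, path, new_value):
    return _rebuild(root, set(select(root, path)), lambda node: new_value)
```

Both remove and replace return a new tree and leave the original untouched, which matches the statement that the rest of the tree stays unmodified.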
The augmented-ascii-diagram Idris package provides a set of modules that make it possible to generate parts of AsciiDoc documents that conform to the [I-D.mcquistin-augmented-ascii-diagrams] specification.¶
The AAD.Pdu type is used to define a PDU.¶
TBD.¶
The computerate command line tools are run inside a Docker image, so the first step is to install the Docker software or verify that it is up to date (https://docs.docker.com/install/).¶
Note that for the usage described in this document there is no need for Docker EE or for having a Docker account.¶
The following instructions assume a Unix-based OS, e.g. Linux or macOS. Lines separated by a "\" character are meant to be executed as one single line, with the "\" character removed.¶
To install the computerate tools, the fastest way is to download and install the Docker image using BitTorrent. The BitTorrent magnet URI for the version distributed with this version of the document is:¶
magnet:?xt=urn:btih:20d184da7a740dfecbb8b29464aa87610f95a316&dn=tools-05.tar.xz¶
After this, the image can be loaded in Docker as follows:¶
Note that a new version of the tooling is released at the same time a new version of this document is released, each time with a new BitTorrent magnet URI.¶
computerate Command
The Docker image main command is computerate, which takes the same parameters as the metanorma command from the Metanorma tooling:¶
The differences with the metanorma command are explained in the following sections.¶
The computerate command can process Literate Idris files (files with a "lidr" extension, aka lidr files), in addition to AsciiDoc files (files with an "adoc" extension, aka adoc files). When a lidr file is processed, all embedded code fragments (text between the prefix "{`" and the suffix "`}") are evaluated in the context of the Idris code contained in this file. Each code fragment (including the prefix and suffix) is then replaced by the result of that evaluation.¶
The computerate command can process included lidr files in the same way. The embedded code fragments in an included lidr file are processed in the context of that included file, not in the context of the including file. Idris modules (either from an idr or lidr file) can be imported the usual way.¶
The literate code (every line that starts with a ">" symbol in column 1) in a lidr file will not be part of the rendered document.¶
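The processing of a lidr file described above can be modelled as follows. This Python sketch is a stand-in only: it uses Python statements and expressions where the real tool uses Idris, the function name render_literate is hypothetical, and exec and eval are used for brevity, which makes the sketch unsuitable for untrusted input.

```python
import re

# An embedded code fragment: text between the prefix {` and the suffix `}.
FRAGMENT = re.compile(r"\{`(.+?)`\}")


def render_literate(source):
    """Evaluate embedded fragments and drop the literate code lines."""
    lines = source.splitlines()

    # Literate code: every line with ">" in column 1. The marker and one
    # optional following space are removed before the code is loaded.
    code = "\n".join(re.sub(r"^> ?", "", line)
                     for line in lines if line.startswith(">"))
    namespace = {}
    exec(code, namespace)  # stand-in for loading the Idris code of the file

    rendered = []
    for line in lines:
        if line.startswith(">"):
            continue  # literate code is not part of the rendered document
        # Each fragment, prefix and suffix included, is replaced by the
        # result of its evaluation in the context of the file's code.
        rendered.append(FRAGMENT.sub(
            lambda match: str(eval(match.group(1), namespace)), line))
    return "\n".join(rendered)


example = "> size = 32\n\nThe header is {`size // 8`} bytes ({`size`} bits) long."
print(render_literate(example))
```

The usage example prints an empty line followed by "The header is 4 bytes (32 bits) long.", showing that several fragments on the same line are substituted independently.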
The computerate command can process transclusions, a special form of AsciiDoc include that takes a range of lines as parameters:¶
The "sub" parameter permits modifying the copied content according to a regular expression. For instance the following converts references into the AsciiDoc format:¶
In the following example, the text is converted into a note:¶
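The effect of a transclusion (copy a range of lines, then optionally rewrite the copy with a regular expression) can be modelled in a few lines of Python. This is a sketch of the behaviour only: the function name transclude, the 1-based inclusive line range, and the (pattern, replacement) form of the sub argument are assumptions made for illustration and are not the syntax accepted by the computerate command.

```python
import re


def transclude(document, first, last, sub=None):
    """Copy lines first..last (1-based, inclusive) of a text document.

    `sub`, when given, is a (pattern, replacement) pair applied to the
    copied text with a regular expression substitution.
    """
    text = "\n".join(document.splitlines()[first - 1:last])
    if sub is not None:
        pattern, replacement = sub
        text = re.sub(pattern, replacement, text)
    return text


original = "Introduction\nThe grammar uses ABNF [RFC5234].\nImplementations vary."

# Convert a bracketed reference into an AsciiDoc cross-reference.
print(transclude(original, 2, 2, sub=(r"\[(RFC\d+)\]", r"<<\1>>")))

# Turn the copied text into an AsciiDoc note by prefixing it with "NOTE: ".
print(transclude(original, 3, 3, sub=(r"^", "NOTE: ")))
```

The first call prints "The grammar uses ABNF <<RFC5234>>." and the second prints "NOTE: Implementations vary."; in both cases the original text is left unmodified, which is the point of distributing the RFC and the computerate specification as separate files.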
The computerate command can include in a document the result of the generation of the IdrisDoc for a package. This is done by including a line like this:¶
The leveloffset attribute is used to adjust the level of the generated sections, which always have level 2.¶
Instead of generating a file based on the name of the input file, the computerate command generates a file based on the :name: attribute in the header of the document.¶
In addition to the "txt", "html", "xml", and "rfc" output formats supported by metanorma, the computerate command can also generate the "pdf" and "json" formats by using these names with the -x command line parameter.¶
If the type of document passed to the computerate command (options -t or --type) is one of the following, then the document will be processed directly using asciidoctor, and not metanorma: "html", "html5", "xhtml", "xhtml5", "docbook", "docbook5", "manpage", "pdf", and "revealjs". The asciidoctor-diagram extension is available in this mode with the following supported diagram types: "actdiag", "blockdiag", "ditaa", "graphviz", "meme", "mscgen", "nwdiag", "plantuml", and "seqdiag".¶
Because most references are stable, there is not much point in retrieving them each time the document is processed, even with the help of a cache, so lookup of external references is disabled.¶
The following command can be used to fetch an RFC reference:¶
The ietf.xml file then needs to be edited by removing the first two lines. After this, the xml file can be converted into an AsciiDoc document:¶
This will generate an ietf.adoc file that can be copied into the bibliography section. Note that the section level of the bibliographic item needs to be one up from the section level of the bibliography section.¶
One exception is a reference to a standard document that is under development, like an Internet-Draft.¶
In that case the best way is to have a separate script that fetches, edits and converts Internet-Drafts as separate files. These files can then be inserted dynamically in the bibliography section using includes.¶
The command to retrieve an Internet-Draft reference is as follows:¶
idr and lidr files can be loaded directly in the Idris REPL for debugging:¶
It is possible to directly modify the source code in the REPL by entering the :e command, which will load the file in an instance of VIM preconfigured to interact with the REPL.¶
The idris2-vim add-on (which provides interactive commands and syntax coloring) is augmented with a feature that enables both Idris and AsciiDoc syntax coloring. To enable it, add the following line at the end of every lidr file:¶
For convenience, the Docker image provides the latest version of the xml2rfc, aspell, and idnits tools.¶
The Docker image also contains an extended version of git that will be used to retrieve the computerate specifications as explained in Appendix A.5.¶
The following sections list the tools distributed in the Docker image that have been modified for integration with the computerate tool.¶
0.3.0 commit 05c9029¶
The interactive command :gc displays the result of an elaboration.¶
The types in TTImp can carry the documentation for the types that will be generated from them.¶
The %cacheElab directive caches the result of an elaboration in the source code instead of regenerating it at each type-checking.¶
A new idris2 wrapper sets the correct mappings for recursive build.¶
--mkdoc <ipkg-file> generates the package documentation in AsciiDoc on stdout.¶
Elaborations can be exported and documented.¶
package and depends in an ipkg file can use quoted strings.¶
--depends lists the dependencies.¶
--map <package>=<dir> maps between a package and a directory.¶
--paths now displays the paths after modification.¶
Fix boolean operators precedence.¶
Replace the literate processor with a faster one. Remove support for reversed Bird marks.¶
2.0.12¶
2.2.8¶
commit 964cebe¶
The current version of Docker in Ubuntu fails, but this can be fixed with the following commands:¶
:gc is currently broken.¶
Docstrings are not generated correctly.¶
Interactive commands are missing or not working well with literate code.¶
Changing the installation prefix requires two installations.¶
Documentation not generated for namespaces and records.¶
Recursive build incomplete.¶
RFC and I-D references are not correctly generated by relaton. The workaround is to remove the IETF docid and to add the following:¶
Code blocks escape a '>' in the first column. The workaround is to insert a space before the '>'.¶
Add documentation support for all types in TTImp.¶
:gc! should update the file.¶
%cacheElab should check hashes.¶
Add a way to generate a hole name.¶
Literate ipkg to merge the Main.adoc and ipkg files.¶
Merge bibliographies.¶
Extract bibliography from computerate specification.¶
Generate xml2rfc <contact> element.¶
Generate .rfc.xml and err file with the same name.¶
Generate rfc.xml as xml and xml under another extension so the xml2rfc file can be directly submitted to the IETF secretariat.¶
Generate sourcecode blocks from existing code.¶
Pass surrounding line for embedded code so the AsciiDoc module can process constrained elements.¶
Implement self-inclusion to reorder a document.¶
Backport embedded blocks from Coriander.¶
Starting vim in Docker often results in an invalid terminal size when a file is loaded. Using the following command line solves the problem:¶
This future tool will be able to convert an xml2rfc v3 file into an AsciiDoc file. It will also be able to update an already converted file without losing the Idris annotations.¶
The Builtin Computerate Specification Standard Library.¶
A module to generate valid AsciiDoc.¶
A block of text¶
Implements Show.¶
A type for inline text.¶
Embedded code.¶
Concatenate the second bit-vector after the first one.¶
Bitwise and between bit-vectors of identical size.¶
Build an empty bit-vector¶
Extend a bit-vector by n zero bits on the left side.¶
Extract a bit-vector.¶
Bitwise not of a bit-vector.¶
Bitwise or between bit-vectors of identical size.¶
Shift the bit-vector to the left by n bits, inserting zeros.¶
Shift the bit-vector to the right by n bits, inserting zeros.¶
Return a boolean that is True if the bit at position m is set.¶
Bitwise xor between bit-vectors of identical size.¶
A module that defines types, constants and operations on denominate numbers.¶
The multiplication operation between denominate numbers.¶
The addition operation between denominate numbers.¶
The subtraction operation between denominate numbers.¶
The division operation between denominate numbers.¶
The type of a denominate number for the data dimension.¶
The type of a dimensionless denominate number¶
The type of a denominate number for the time dimension.¶
Bit, the base unit of data.¶
The byte unit, as 8 bits.¶
The day, as unit of time.¶
Convert a denominate number into a tuple made of the dimensionless value (as a Double) calculated after applying a unit, and that unit.¶
Generate all the IEC units based on the bit, from kibibit to yobibit.¶
Generate all the SI units based on the bit, from kilobit to yottabit.¶
Generates all the SI units based on the second, from yoctosecond to yottasecond.¶
The hour, as unit of time.¶
The minute, as unit of time.¶
The negation operation of a denominate number.¶
The unit for a dimensionless denominate number.¶
The octa unit, as 64 bits.¶
The reciprocal operation of a denominate number.¶
Second, the base unit of time.¶
The tetra unit, as 32 bits.¶
The wyde unit, as 16 bits.¶
A module that defines types for Petri Net.¶
Retrieve the type of token that can be stored in the place.¶
Calculate the combined type of a list of inputs.¶
A multiset.¶
Non-determinism monad.¶
Calculate the combined type of a list of outputs.¶
A transition.¶
Convert a list of types into a tuple of types.¶
A module to transform values structured as trees, with specialization to transform types via elaboration.¶
A selection path¶
Add a value as a sibling to values in a tree that are selected by a path.¶
Add a binding between a codepoint and a type in an extended registry¶
Remove the values in a tree as selected by a path.¶
An unsigned number with a length.¶
An unsigned integer is just a wrapper around a bit-vector of the same size.¶
For sanity's sake, this type always assumes that the value of the first bit is 2 ^ (m - 1), with m the size of the unsigned int. In other words the first bit is the MSB, and the last bit (the one closest to Nil) is the LSB.¶
Implements Num, Integral, Eq, Ord, Size.¶
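The bit ordering stated above, together with the earlier example where [O, I, O, O] is equivalent to 0b0100, can be checked with a short Python sketch. The function names to_unsigned and from_unsigned are hypothetical and only model the MSB-first interpretation; they are not part of the library.

```python
def to_unsigned(bits):
    """Interpret a list of bits as an unsigned integer, MSB first.

    For m bits, the bit at index i (0 being the first bit) has the
    value 2 ** (m - 1 - i).
    """
    m = len(bits)
    return sum(bit << (m - 1 - i) for i, bit in enumerate(bits))


def from_unsigned(value, m):
    """Encode a value as m bits, MSB first."""
    if not 0 <= value < (1 << m):
        raise ValueError("value does not fit in m bits")
    return [(value >> (m - 1 - i)) & 1 for i in range(m)]


print(to_unsigned([0, 1, 0, 0]))  # 4
print(from_unsigned(4, 4))        # [0, 1, 0, 0]
```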
The ABNF Core rules.¶
An ASCII alphabetic character.¶
A "0" or "1" ASCII character.¶
Any ASCII character, starting at SOH and ending at DEL.¶
A Carriage Return ASCII character.¶
A Carriage Return ASCII character, followed by the Line Feed ASCII character.¶
Any ASCII control character.¶
Any ASCII digit.¶
A double-quote ASCII character.¶
Any hexadecimal ASCII character, in lower and upper case.¶
A Horizontal Tab ASCII character.¶
A Line Feed ASCII character.¶
A potentially empty string of spaces, horizontal tabs, or line terminators, each line terminator being followed by a space or horizontal tab.¶
An 8-bit value.¶
An ASCII space.¶
A printable ASCII character.¶
A potentially empty string of space, or horizontal tab.¶
A module to generate a valid ABNF.¶
Implements Show.¶
A module to generate augmented packet header diagrams.¶
A boolean expression¶
Implements ShowPrec, Show.¶
An expression¶
Implements ShowPrec, Show.¶
This module provides types for Internet Protocol Address.¶
An IP address.¶
Implements Size.¶
A class A address.¶
A class B address.¶
A class C address.¶
Types for the Internet Protocol.¶
Internet Protocol Header.¶
Implements Size.¶
Internet Protocol Header Options.¶
Implements Size.¶
End of Options.¶
No operation.¶
Security Option.¶
Loose Source and Record Route Option.¶
To quantify the potential benefits of using formal methods at the IETF, an effort to relabel the Errata database is under way.¶
The relabeling uses the following labels:¶
Label | Description |
---|---|
AAD | Error in an ASCII bit diagram |
ABNF | Error in an ABNF |
Absent | The erratum was probably removed |
ASN.1 | Error in ASN.1 |
C | Error in C code |
Diagram | Error in a generic diagram |
Example | An example does not match the normative text |
Formula | Error preventable by using Idris code |
FSM | Error in a State machine |
Ladder | Error in a ladder diagram |
Rejected | The erratum was rejected |
Text | Error in the text itself, no remedy |
TLS | Error in the TLS language |
XML | Error in an XML Schema |
At the time of publication the first 1600 errata, which represent 25.93% of the total, have been relabeled. Of these, 135 were rejected and 51 were deleted, leaving 1414 valid errata.¶
Label | Count | Percentage |
---|---|---|
Text | 977 | 69.09% |
Formula | 118 | 8.34% |
Example | 112 | 7.92% |
ABNF | 71 | 5.02% |
AAD | 49 | 3.46% |
ASN.1 | 40 | 2.82% |
C | 13 | 0.91% |
FSM | 13 | 0.91% |
XML | 12 | 0.84% |
Diagram | 6 | 0.42% |
TLS | 2 | 0.14% |
Ladder | 1 | 0.07% |
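The figures in the table above can be re-derived from the raw counts. The following Python sketch recomputes the number of valid errata and the per-label percentages; it assumes that the published percentages are truncated to two decimals rather than rounded, which is what the values in the table suggest.

```python
from decimal import Decimal, ROUND_DOWN

# Counts per label, copied from the table above.
counts = {
    "Text": 977, "Formula": 118, "Example": 112, "ABNF": 71, "AAD": 49,
    "ASN.1": 40, "C": 13, "FSM": 13, "XML": 12, "Diagram": 6, "TLS": 2,
    "Ladder": 1,
}

relabeled, rejected, deleted = 1600, 135, 51
valid = relabeled - rejected - deleted
total = sum(counts.values())


def percentage(count, total):
    """Share of `count` in `total`, truncated to two decimals."""
    share = Decimal(count) * 100 / Decimal(total)
    return share.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


for label, count in counts.items():
    print(f"{label} | {count} | {percentage(count, total)}% |")
```

The sum of the per-label counts equals the number of valid errata (1414), which confirms that every valid erratum carries exactly one label.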
Note that, as the relabeling is done in order of erratum number, at this point it covers mostly older RFCs. A change in tooling (e.g. ABNF verifiers) means that these numbers may drastically change as more errata are relabeled. But at this point it seems that 31.89% of errata could have been prevented with a more pervasive use of formal methods.¶
Thanks to Jim Kleck, Eric Petit-Huguenin, Nicolas Gironi, Stephen McQuistin, and Greg Skinner for the comments, suggestions, questions, and testing that helped improve this document and its associated tooling.¶
Thanks to Ronald Tse and the Ribose team for the metanorma and relaton tools and their diligence in fixing bugs and implementing improvements.¶
Stephane Bryant¶
Email: stephane.ml.bryant@gmail.com¶
Document:¶
Tooling:¶
Document:¶
Sections 2, 3, 4 and 5 have been completely reorganized, edited, and extended as a tutorial.¶
New Terminology section.¶
Add a new Standard Library section, that contains the description of all the Idris modules and external packages that will be available for developing specifications.¶
Improve bibliography.¶
Extend the CLI section to cover:¶
Generate IdrisDoc of standard library packages and modules as a new appendix.¶
Update errata stats.¶
More compact changelog.¶
Many modifications following Stephane's reviews.¶
Tooling:¶
Additional metanorma features:¶
Generate json file.¶
Various bug fixes in metanorma and relaton.¶
Additional Idris2 features:¶
Idris2 wrapper to load local packages.¶
New include processor to generate IdrisDoc.¶
Process multiple fragments on each line.¶
Add support for asciidoctor outputs, including revealjs and diagrams.¶
Embedding code must now return a value that implements Show. String values are then stripped of their first and last double-quotes.¶
Fix bug where transcluded text is converted into ASCII art.¶
Embedded code in examples in lidr files can now be escaped with \.¶
Replace Idris with Idris2 version 0.2.1.¶
Update metanorma to 1.1.4.¶
Update metanorma-ietf to 2.2.2.¶
Update xml2rfc to 3.0.0.¶
Downgrade idnits to 2.16.04.¶
Decommission the Docker image in dat://78f80c850af509e0cd3fd7bd6f5d0dd527a861d783e05574bbd040f0502da3c6.¶
Document:¶
Notes are now correctly displayed.¶
Add "Implementations Oriented Standards" section.¶
Add "Extended Registries" section and appendix.¶
Add paragraph about hierarchical petri nets.¶
Convert "Verified Code" section into a top level section, and expand it.¶
Add "Implementation-Oriented Standards" section.¶
Tooling:¶
Many bug fixes in metanorma-ietf.¶
Update xml2rfc to 2.40.1.¶
Rebuilding text for an RFC with xml2rfc now uses pagination.¶
Update metanorma-ietf to version 2.0.5.¶
The "computerate" command can directly generate PDF files.¶
Add support in xml2rfc for generating PDF files.¶
Add asciidoctor-revealjs.¶
Update metanorma to version 1.0.0.¶
Update metanorma-cli to version 1.2.10.1.¶
Document¶
Switch to rfcxml3.¶
Status is now experimental.¶
Many nits.¶
Fix incorrect errata stats.¶
Move acknowledgment section to the end.¶
Rewrite the APHD section (formerly known as AAD) to match draft-mcquistin-augmented-diagrams-01.¶
Fix non-ascii characters in the references.¶
Intermediate AsciiDoc representation for serializers.¶
Tooling¶
Document¶
New changelog appendix.¶
Fix incorrect reference, formatting in Idris code.¶
Add option to remove container in all docker run commands.¶
Add explanations to use the Idris REPL and VIM inside the Docker image.¶
Add placeholders for ASN.1 and RELAX NG languages.¶
New Errata appendix.¶
Nits.¶
Improve Syntax Examples section.¶
Tooling¶