Internet-Draft | Candidate Selection | November 2023 |
Hoffman | Expires 23 May 2024 | [Page] |
This document describes a process to randomly select a subset of named candidates from a larger set of candidates. The process uses an unpredictable value that can be trusted by all candidates.¶
This draft has a GitHub repository. Issues and pull requests can be made there.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 23 May 2024.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
It is common to need to pick a subset of people from a larger group using a random selection method. This is often done on an ad hoc basis, but for some selections, a more formal process is needed, particularly if the people in the larger group don't all trust the administrator of the selection process to be unbiased.¶
This document gives a simple, understandable process that can be done for groups and subsets of arbitrary size. The process is purposely transparent and reproducible. It works with any group of entities that have names: people, companies, locations, and so on.¶
As a simple example, a future leadership committee will have a fixed size. The members of the committee will be selected from a large pool of volunteers. Someone is in charge of collecting the names of the volunteers and making a randomized selection among them for the leadership committee. They can use the process in this document to make that selection in a way that is both provably random and understandable.¶
As described later in this document, the process can also be used for weighted selections (Section 5) and for randomly sorting lists of candidates (Section 6).¶
Due to the formatting used in this document, the reader is encouraged to read the HTML version, although the text version is still usable.¶
See [I-D.thomson-elegy-vrs] for a similar method as described here.¶
A few terms are used throughout this document:¶
The act of collecting names into a pool, making a random selection from the pool, and publishing the entire process in a clear and transparent method.¶
The person who performs the steps of the ceremony.¶
A person, organization, or other namable entity that is possibly being selected during the ceremony.¶
The name used by each candidate in the pool. The candidate name is expressed as a string of Unicode characters in UTF-8 format [Unicode] [UTF-8].¶
A publicly-visible string that is only known after the pool of candidates has been closed. (Note that this is different from what is normally called a "random number" or a "random string". True random numbers or strings are designed to be nearly impossible to predict, whereas D in this process has weak but sufficient randomness.)¶
The number of candidates that will be selected from the pool.¶
The steps in a ceremony that follows this process is given here. See Section 3 for more detail on the steps.¶
The CA starts the ceremony by performing the following steps at the same time:¶
Announces an end date for when the pool will be complete.¶
Announces a later date on which D, the difficult-to-predict string, will be selected.¶
Announces the source where D will be found on that later date.¶
Announces S, the number of candidates that will be selected.¶
Opens up the pool of candidates for submission.¶
Candidates submit their names to the pool until the closing date, and the CA puts the allowed names in the pool.¶
On the closing date, the CA publishes the candidate names from the pool with the hexadecimal value of the UTF-8 encoding for each candidate name.¶
On the date for selecting D, the CA gets D from the announced source.¶
The CA calculates the hashes used to make the selection. The CA concatenates each candidate name with D (name first, then D), uses the SHA-256 hash function [SHA-2] on the resulting string, and records the value of the hash as a UTF-8 string.¶
The CA arranges the set of hash values in alphabetic order from highest to lowest. They then select the S candidates from the top of the list (that is, the names whose hash values are largest).¶
Much of the trust in the selection process is based on the CA not being able to influence the selection. If the CA can choose, or even influence, the value of D, they can help establish the outcome of the selection. Similarly, if one or more of the candidates can influence the value of D, they can increase their chance of being selected.¶
To make the process trustworthy, the value of D must be unrelated to the CA or the candidates, and it must be selected only after the list of candidates is completed. The most important things for a ceremony is that the source is announced before the ceremony starts, that all participants and viewers of a ceremony can find the source on the date specified by the CA, that all candidates believe that no candidate can influence D on that date, and that everyone gets the same value when they go to the source for that date.¶
The process described in this document uses the closing value for the FTSE 100 Index on the particular day selected by the CA. The FTSE 100 Index is a long-established index based on 100 stocks; it is sometimes known by its stock ticker as "UKX". A common open source of those values is the Wall Street Journal. The daily closing for the FTSE 100 Index at the Wall Street Journal can currently be found here.¶
Note that the location for sources of daily closing values can change over time. The CA must check that the intended source is still active, and still available when the ceremony starts.¶
Values from the FTSE 100 Index in this procedure are always encoded as four digits, followed by a period character (U+002E), followed by two more digits, such as:¶
7623.10¶
If the FTSE 100 Index ever goes above 10,000, the encoded values would be five digits, followed by a period character (U+002E), followed by two more digits.¶
Although the procedure in this document uses the FTSE 100 Index as a public source of randomness, there are many other sources that can be used by a CA, as long as the source chosen is trusted by the candidates. There are many other stock indexes with enough stocks in them to make prediction of the exact value have less than a 0.1% chance. Having said that, using a future price of a single stock is probably not a good public source of randomness because candidates are likely to trust the variability of that less than the variability of a basket of stocks.¶
Some systems that use public sources of randomness use the results of an unrelated lottery, such as the type of lotteries that many countries hold. These are probably trusted by candidates not be able to be manipulated. However, lotteries normally are a set of numbers between 1 and 100, often five or more such numbers. If the CA uses such a lottery for this procedure, they need to specify how the numbers from the lottery of the chosen date will be combined, including whether or not the numbers from 1 to 9 need to be preceded by a "0" character.¶
There are other public sources of randomness, such as cameras pointed at lava lamps and so on. These are probably not good choices for the type of ceremony described in this document because the operators of such sources are not publicly trusted entities.¶
Note that some sources of randomness may have less randomness than it appears at first glance. There can be hidden biases towards certain values that are not obvious when looking at a small set of recent values. If a CA chooses a source for D other than the FTSE 100 Index, the data from source should be measured over a long period of time for unexpected biases toward values that a candidate can use to improve their chance of being selected.¶
The CA is the sole arbitrator for whether a candidate is allowed to enter the pool. The CA is also the sole arbitrator of what name string (in UTF-8) the candidate can use in the pool.¶
The order that the candidates join the pool does not affect the outcome of the selection process. Said another way, the pool is kept as an unordered set of candidates, not an ordered list of candidates.¶
It is a good practice for the CA to have consistent rules for the names, such as only using ASCII space characters (U+0020), only one space between each name part, no trailing spaces, and so on. These rules can be more difficult when the candidates are company names (such as whether the legal standing of the company such as "Inc." is included), but making consistent rules is not that difficult.¶
At the closing of submissions, the CA verifies that the number of candidates in the pool is larger than S. If the length is the same as S, the rest of the steps are unneeded (and could be confusing), because all candidates will automatically be selected. If the length is shorter than S, the ceremony stops because there are too few candidates.¶
The method for publishing the set of candidates is determined by the CA. Figure 3 gives an example of how a CA might publish this information.¶
On the day that the CA announced for the selection of D, the CA goes the the source they announced and gets D. After the CA retrieves D from the announced source, they encode D as a UTF-8 string. In the example of the FTSE 100 Index, a closing value for the day announced at the beginning of the ceremony might be "7623.10". This would be encoded in UTF-8 as the string of characters whose hex value is 0x373632332e3130.¶
Different programming libraries have different requirements for the input to hash functions.
Appendix A uses the built-in hashlib
library in Python, which requires that text strings have a specified encoding.¶
The process of selecting is simply taking the S candidates whose hash value is highest. This can easily be determined by sorting the text representation of the hash values in descending order because in UTF-8 and ASCII, digits have lower codepoints than letters.¶
To complete the process in a transparent manner, the CA should publish all known data for the ceremony. This includes S, D, the hexadecimal value of D, all of the information for each candidate, and the full list of selected candidates. Figure 5 shows an example of what this publication might look like.¶
Ceremonies don't always go as planned. For example, after a ceremony completes, one or more of the selected candidates might be removed from the selected set due to voluntary withdrawal or established rules (such as no two candidates being from the same geographic region). In such cases, no new ceremony is needed: the CA simply selects the next candidate(s) on the list that is ordered by hash values.¶
Similarly, if after the selection process is completed, the size S of the selected set needs to increase, the CA simply selects the next candidate(s) on the list that is ordered by hash values.¶
In some candidate selections, the CA wants to give candidates a weighted chance of being selected. For example, a legislature might select its leadership randomly, but weights the chance of being selected by the size of the membership of the political party in the legislature. The CA can create the pool with multiple names for each party, giving each name a number.¶
For example, assume a legislature has 27 members of the Orange party, 20 members of the Yellow party, and 7 members of the Green party. The CA could create a pool consisting of the names "Orange1", "Orange2", ... "Orange27", "Yellow1", "Yellow2", ... "Yellow20", "Green1", "Green2", ... "Green7". The selected party would be the one whose name appears in the first name of the list of hashes.¶
Some use cases do not involve a selection of candidates from a larger list, but instead sorting the list of candidates randomly. The process given in this document can be easily used to do this: set S to the size of the pool, peform the steps of the ceremony, and create the output list in the last step as all S candidates in alphabetic order from highest to lowest of the hash values.¶
This document has no IANA considerations.¶
The value D used in this process is explicitly not cryptographically strong; in fact, it might provide only a few bits of randomness. The FTSE 100 Index might be predictable after the third digit from the right, but not the last three digits, meaning that they only have randomness of about 10 bits. The value of D is concatenated into each candidate string before the whole string is hashed, so incorrectly predicting even one character of D completely changes the value of the hash for comparison.¶
A cryptographic hash function like SHA-256 has the property that changing any individual bit of the input will change every bit in the output with a 50% chance, regardless of the position of the bit in the input. Appending a small amount of randomness at the end of the input is just as effective as prepending the randomness at the beginning of the input nd just as effective as mixing the randomness throughout the input. The procedure in this document appends the string from the FTSE 100 Index at the end of the candidate name because it makes viewing the pre-hashed result easier while still causing the maximum change to the resulting hash value.¶
A candidate who has a lot of leeway in choosing their name can possibly increase their chance of being selected by as much as 0.1% with such source of randomness. The procedure in this document assumes that candidates have very little leeway in choosing their names; the CA must accept each name before it is put into the pool. The combination of the limited leeway for choosing the names in the pool and the necessity to predict D exactly in order to gain any benefit means that D needs much less randomness that a random number that would be used during encryption or authentication.¶
The following is a list of figures for an implementation of the procedure shown in this document.¶
The Python script in Figure 1 implements the algorithm from this document.¶
The file that contains the list of names is shown in Figure 2. (The names are the winners of the Nobel laureates in Literature for 2016 through 2021.)¶
A file showing the UTF-8 representation of the names from Figure 2 is shown in Figure 3. This file is suitable for showing to the candidates.¶
The file that contains the S and D on separate lines is shown in Figure 4.¶
Figure 5 shows the result of running the program with that file as input.¶