Copyright © 2005 jsd
ABSTRACT: We discuss the principles of a Hardware Random Generator (HRNG), also called a True Random Generator (TRNG). It is suitable for a wide range of applications, from the simplest benign applications to the most demanding high-stakes adversarial applications, including cryptography and gaming. It harvests entropy from physical processes, and uses that entropy efficiently. The hash saturation principle is used to distill the data, so that the output has virtually 100% entropy density. This is calculated from the laws of physics, not just statistically estimated, and is provably correct under mild assumptions. In contrast to a Pseudo-Random Generator, it has no internal state to worry about. In particular, we describe a low-cost high-performance implementation, using the computer’s audio I/O system.
Other keywords: TRNG, HRNG, hardware random number generator, uniform hash, unbiased coin flip.
Although there exist garden-variety “Random Number Generators” that are suitable for garden-variety applications, they fail miserably in high-stakes adversarial applications. See section 3 for a discussion of the range of applications. Some of the things that can go wrong are discussed in section 9.1.
In this paper, we explain how to construct a True Random Generator (TRNG), specifically a Hardware Random Generator (HRNG) – also called a High-Quality Random Generator or High-Entropy Random Generator – suitable for a wide range of applications, including extremely demanding ones. We will explain and then use some key theoretical ideas:
We have implemented a generator using these principles. Salient engineering features include:
If you ask three different people, you might get six or seven different definitions of “random”.
At a minimum, we need to consider the following ideas:
For example: Once upon a time I was working on a project that required a random generator. The pointy-haired boss directed me to use the saved program counter at each interrupt. I had to explain to him that it was an event-driven system, and therefore the saved PC was almost certain a constant, namely the address of the null job. Of course it was not reliably a constant, but it was also not reliably unpredictable. In other words, there was no nonzero lower bound on the entropy density.
The terminology is not well settled. I have heard intelligent experts call items 1, 2, 4, and 5 on the previous list “random”. I have also heard them call items 3, 4, and 5 “non-random” or “not really random”. Sometimes even item 2 is included in the catagory of “not really random”.
We consider the terms “random” and “non-random” to be vague, clumsy, and prone to misunderstanding. If a precise meaning is intended, careful clarification is necessary. This remains something of an unsolved problem, especially when talking about PRNGs.
In contrast, when talking about HRNGs, we can speak very precisely. We associate the terms entropy and entropy density with a very specific meaning.
A True Random Generator (such as a Hardware Random Generator) has the property that the outputs are unpredictable in principle.
At the opposite extreme there stands the classic Pseudo-Random Generator (PRNG), where the output is calculated using a secret internal state variable. A high-quality PRNG depends on the assumption that it is computationally infeasible for any adversary to infer the internal state by observing the outputs. More generally, the assumption is that the adversary will find it not worthwhile to infer the internal state. This is a moving target, depending to some degree on the value of what the pseudo-random distribution is protecting.
Pseudo-random generators, as a class, suffer from a number of problems. For starters, the state variable must be initialized, which is often remarkably problematic. Then the state variable must be preserved and defended for all time, which is problematic, too. Furthermore, one must make some crucial assumptions, namely the existence of a one-way function that has long-term resistance to cryptanalytic attack. All of modern cryptography rests on assumptions of this kind, but still these are unproven assumptions.
Initializing the secret internal state variable is called seeding the PRNG. A classic PRNG is seeded only once, at the start of operations. There is some finite amount of entropy in the seed. As the PRNG produces more and more output, the entropy density of the output goes asymptotically to zero.
The rand() or random() functions provided by typical computer languages are in fact classic pseudo-random generators of this kind.
Higher security can be obtained if the generator is re-seeded every so often. This could be considered an “improved” PRNG, but really it sits in a gray area between the classic PRNG and a TRNG. I prefer to call it a Stretched Random Generator (SRNG). Unlike a classic PRNG it has a non-zero lower bound on the entropy-density of its output. Unlike a TRNG, the entropy density is not 100%. Typically the entropy density of a SRNG is less than one part per thousand, sometimes as low as one part per billion.
The SRNG is a hybrid, with some internal state and some physical entropy-input. This may be able to produce symbols more copiously than a purely stateless system, while remaining highly resistant to cryptanalysis.
The concept of PRNG cannot stand on its own, because any PRNG requires a seed. We have to ask where the seed came from. If it came from some other PRNG, we have to ask where the other seed came from. The only way to avoid infinite regress is to use a HRNG to provide a seed.
To be clear: A HRNG can exist without a PRNG, but not vice versa.
Entropy is defined in terms of probability. In the same way, conditional entropy is defined in terms of conditional probability. The same idea applies to other types of random distributions, including pseudo-random distributions.
For example: Suppose I shuffle a deck of 52 cards. That provides 226 bits of entropy, assuming we pay attention only to the rank and suit, according to the usual convention. Now suppose I arrange another deck in exactly the same way. The two decks are equivalent. Either one separately would provide 226 bits of entropy, but the two of them together still provides only 226 bits. So the conditional entropy of the second deck – conditioned on having the first deck – is zero.
The basic objective is to provide a copious supply of randomly-generated numbers, with quality and quantity suitable for a wide range of applications. Some specific applications are listed in section 3. Quality requires, among other things, defending against various possible attacks. Some well-known attacks are discussed in section 9.1. A more detailed list of objectives and tradeoffs is discussed in section 2.2.
Right now, in this section, we give a broad outline of what is possible and what is not possible.
So far, we have analyzed the HRNG mainly in terms of entropy density, denoted by ρ. We choose a criterion ρmin very close to 100%, and then build a generator that produces ρ > ρmin. We are not pretending to achieve ρ = 100% exactly.
In any case, entropy density does not tell the whole story. For instance, consider a sequence of 8000 bits, consisting of 7999 completely random bits followed by 1 completely predictable bit. That would have an entropy density of 99.99% (equivalent to 159.98 bits of entropy in a 160 bit word) on average. However, in this case, the average doesn’t tell the whole story. Despite that high average, the sequence would be unsuitable for many applications. For instance it would be unsuitable for use as a stream cipher. Of the 28000 possible codewords, half of them are missing from the output ... and the attacker quite possibly knows which codes are missing.
In contrast, consider a 160-bit word containing 159.98 bits of entropy spread more-or-less evenly throughout the word. It would be very much harder for an adversary to exploit the fact that ρ is not quite 100%. We can begin to see why by reference to figure 2. It shows a pie chart with 64 sectors. If the probability were evenly distributed, there would be 6 bits of entropy However, if you look closely, you see that some sectors are bigger than average or smaller than average. The entropy is only 5.9 bits.
If we extend this to a system with 2160 codewords and 159.98 bits of entropy. The probability is almost but not quite evenly distributed across all possible codewords. No codewords are missing. Some of them are slightly more probable than others, so in theory if you see a particular codeword, you could guess that it will turn up again with higher-than-normal probability. However, in practice, that sort of direct approach does you no good, because even if that codeword were to have vastly too much probability, you would never see it again anyway. Note that 2160 is on the order of 1048, whereas the age of the universe is less than 1027 nanoseconds. So if you are waiting for a particular codeword to come up again, you will have to wait a very long time.
There are of course indirect approaches, such as cryptanalyzing the hash, so as to use the output to discover patterns on the input ... but this is believed to be computationally infeasible. Indeed, a cryptologic hash is designed to withstand attacks vastly harsher than this. Consider the contrast:
|For a PRNG, we need a hash that can withstand attacks when the output entropy density is 0.01%.||For a HRNG, we need a hash that can withstand attacks when the output entropy density is 99.99%.|
In the cryptology literature, there are two main types of arguments: information-theory arguments and computational-feasibility arguments. From an information-theory point of view, it is possible for your adversaries to break any block cipher (such as DES or AES) by searching the key-space. The amount of ciphertext they need to mount such an attack is given by the unicity distance, which is proportional to the key-length and inversely proportional to how much they know about the statistics of the plaintext. For example, just a few lines of ordinary (uncompressed) business correspondence grealy exceeds the unicity distance for commonly-used ciphers such as 3DES or AES-128, so an attack is possible in principle. Whether it is computationally feasible for your adversaries to mount such an attack is another question.
In contrast, information theory tells us that it is impossible to cryptanalyze a one-time pad system, if the pad is truly random. It absolutely does not matter how much computing power the adversariers bring to bear. The unicity distance is infinite.
So: When we say that the output of the HRNG has 159.98 bits of entropy spread throughout a 160-bit word, that is basically an information-theory statement, based on physics and statistics plus some mild assumptions about the mixing properties of the hash function. In contrast, to argue that your adversaries cannot exploit the deficit (the missing 0.02 bits of entropy) will presumably require a computational-feasibility argument that depends on the details of your application. Such application-specific arguments are beyond the scope of this paper, although we remark that using a HRNG ought to be much better than using some less-carefully designed RNG, and there are very many applications for which the HRNG seems more than good enough.
Here are some of the criteria on which a random generator can be judged:
These criteria conflict in complex ways:
These conflicts and tradeoffs leave us with more questions than answers. There are as-yet unanswered questions about the threat model, the cost of CPU cycles, the cost of obtaining raw entropy, et cetera. The answers will vary wildly from machine to machine.
Here is a list of typical applications where some degree of randomness is needed, followed by a discussion of the demands such applications place on the random generator:
The first three items require a random distribution with good statistical uniformity and independence, but do not usually require resistance against cryptanalytic attack. The molecules in a Monte Carlo simulation are not going to attack the Random Generator. Similarly, the shuffle used in a low-stakes game of Crazy Eights does not require a cryptographically strong RNG, since nobody is going to bother attacking it.
As the stakes get higher, the demands on the RNG get stricter. Implementing a multi-million dollar lottery requires the RNG to have excellent tamper-resistance as well as good statistical properties.
Game players need randomness, too. In the simple scissors/paper/stone game, choosing a move at random is the minimax strategy. More sophisticated games like poker require a mixed strategy, i.e. a mixture of deterministic and non-deterministic considerations. You must study the cards, study the other players, and sometimes make an unpredictable decision about whether to bluff or not.
Cryptography is far and away the predominant high-volume consumer of high-grade entropy. This is what motivated our search for better generators. Even a smallish secure-communications gateway can consume millions of bits in a one-minute period.
Item #7 requires special comment: Suppose you need to initialize the internal state in a PRNG. You can’t initialize it from that PRNG itself. If you derive the seed from another PRNG, we must ask how that was seeded. And so on. The only way to prevent an infinite regress is to use a hardware-based generator at some point. We conclude that there is really no such thing as a pure, all-software PRNG. That’s because directly or indirectly, each and every PRNG requires initialization from outside – that is, outside of the usual models of “computation”.
Figure 3 shows a block diagram of a High-Entropy Random Generator. The symbols it generates are random in the strictest sense.
The starting point for an HRNG is the unpredicability associated with one or more physical processes, such as radioactive decay or thermal noise. Such physical processes are partially random, in the sense that they contain contributions which are believed to be extraordinarily difficult for anyone to predict, perturb, or replay.
The raw symbols (as sampled from the data-acquisition apparatus) will, in general, be partly predictable and partly unpredictable, as discussed in section 9.3. The raw symbols are essentially 100% resistant to the attacks that plague PRNGs. That’s because those attacks are directed toward capturing the internal state, and the HRNG doesn’t have any internal state worth capturing.
To quantify the amount of randomness in the raw samples, we use the concept of surprisal. For a given state, the surprisal is defined as:
|$(i) := log(1/Pi) (1)|
where Pi is the probability of the state. For details, see appendix A. The entropy is defined as a weighted average of the surprisal, specifically the expectation of the surprisal:
where the sum runs over all possible states. Note that the entropy is a property of the overall probability distribution, whereas the surprisal is a property of a given state i; there is no such thing as the entropy of a single state.
Let’s consider an example. We read one byte of data from the data-acquisition system. There will be 256 possible states. Suppose that the data is almost entirely predictable:
Next, consider three bytes of data from the same source. The first byte has the value A1 or B1 (as before), the second byte is either A2 or B2, and the third byte is either A3 or B3. In each case there is 99-to-1 probability ratio, and the probabilities are independent. There are in principle 2563 states, but only eight states occur with any nonzero probability. You can easily enough calculate the surprisal in each case.
Continuing down this road, consider 2100 such bytes. There will be 22100 possible states that occur with nonzero probability. We can group them into 2101 classes, according to whether they contain 0, 1, ... 2100 copies of the secondary symbol. The most likely single class will be the one with 21 copies of B. Each and every state in this class will have a surprisal of 170 bits, which is just what we would expect, based on multiplying the source entropy of each byte by 2100 bytes.
The states containing 20 or 22 copies will be less likely, but only slightly less. Indeed, using the standard formula for the binomial distribution, we find there will be an 11% chance of getting 16 or fewer copies of B, which means an 11% chance that the surprise value will be 137 or less.
If, instead, we consider 5000 such bytes, the most-likely value for the surprisal is around 404 bits, and there is only about one chance in a billion that the surprisal will be less than 170 bits.
At this point we’ve got a substantial amount of surprisal, but the surprise density remains quite low, only 0.08 bits per byte. Our customers want our output to have a surprise density closer to 8 bits per byte, so we have some more work to do. This is the role of the hash function, as shown in figure 3.
Let us return for a moment to the case where only one byte has been received from the data-acquisition apparatus. We feed this data to the hash function. For definiteness, let’s use SHA-1 (reference 1) as the hash function. That is a fancy cryptologic hash function. (It is fancier than we really need, but it does the job.) It takes an arbitrarily long input and produces an output of width W = 160 bits. If we histogram the raw data, there are 256 states, of which two occur with nonzero probability. If we histogram the output of the hash function, there are 2160 states, of which two occur with nonzero probability.
The story is similar when there are three bytes of raw data. There are 8 states at the input to the hash function, which map to 8 states at the output. All of the surprisal present at the input is transported to the output, minus some infinitesimal allowance for the possibility that two inputs might produce the same hash-output (i.e. a hash collision).
The story changes slightly when we have 2100 bytes of raw data. The entropy density remains the same, so now we have Sin = 170 bits of entropy in the buffer, at the input to the hash function. There are 22100 states with nonzero probability, and 2170 states with appreciable probability. At the output, there cannot possibly be more than 2160 states, because the output is represented using 160 bits. There will be hash collisions ... lots of them. Each of the 2W output states will have on average 22100/2160 = 21940 inverse images, of which on average 2170 / 2160 = 210 occur with appreciable probability. In a moment we will clarify what “on average” means, but first let’s finish the calculation of the entropy of the hash-function output.
We histogram the output, which requires C = 2W cells in the histogram. Unlike the very sparsely-occupied histograms considered previously, this histogram is expected to have nonzero occupation in every cell. There are 2170 input states we need to worry about, each carrying a snippet of probability. Each such snipped carries roughly 2−170 of the total probability. (In the language of measure theory, which is the foundation of modern probability theory, the measure of each snippet is 2−170.)
Each time we apply the hash function to such an input, that snippet of probability lands in one of the cells of the output-histogram. If by some miracle all these contributions were exactly evenly distributed, the entropy of the output register would be exactly 160 bits. However, in reality, some of the cells will accumulate a few more snippets than average, and some a few less.
To an excellent approximation, the probability that a given cell will contain exactly m snippets is given by the binomial distribution:
|P(m | N,C) =|
|(1/C)m (1−1/C)N−m (3)|
where C is the number of cells (log(C) = W = 160 in this example) and N is the total number of snippets deposited into those cells (log(N) = Sin = 170). We can verify that the normalization is correct:
|P(m | N,C) = 1 (4)|
which conveys the unsurprising fact that in every cell there must be some number of snippets.
Now, to calculate the entropy, we need to apply the definition of entropy, equation 2, which requires summing over all cells in the output-histogram. It is convenient to re-arrange the sum as follows: we sort the cells into groups, according to their m value. We perform the sum group by group, taking into account the number of members in each group. There are C P(m | N,C) members in the group where the snippet-count is m. So the entropy is:
|Sout(C,N) = −|
|(m/N) log(m/N) C P(m | N,C) (5)|
and we recall that log(C) is the width of the output register, and log(N) is the amount of entropy present in the input buffer.1
The dependence of Sout on C and N, as given by equation 5, is quite a remarkable function. If N is small compared to C, then Sout just equals Sin. That is, all the input-entropy shows up as output-entropy, which makes sense, because there are no hash collisions. On the other hand if N is large compared to C, then Sout saturates and converges to W, the maximum entropy the output register can hold. This can be seen in the following table (using W = 160 bits in this example):
|input entropy||output entropy||% of 160|
This is very nice. It means we need only a few “extra” bits of entropy in the input buffer, to smash the output up against the asymptote, giving the output virtually 100% entropy density.
Note that equation 5 has a useful homogeneity property, provided C is not unreasonably small: If you add k bits of width to the output register (W), and add k bits of entropy to the input buffer (Sin), then you will get k more bits in the output register (Sout). For example, if we add 100 to the numbers in the next-to-last row in the table, we find that feeding 265 bits to a 260-bit hash function will produce 259.98 bits at the output.
To summarize the design: To produce a high entropy density we require:
In order for the HRNG to work, the hash function be reasonably well-behaved. As an example of what could go wrong, consider a few badly-behaved examples:
Weak hash functions have the property that they waste entropy. That is, their outputs are in some way more predictable than the outputs of a strong hash function would be. Obviously for our purposes we want to use a strong hash function.
The hash function cannot possibly create entropy, and we do not need it to do so. Instead, what we need is a hash function that does not waste entropy through unnecessary hash collisions. In the unsaturated regime, there should be very few collisions, so any entropy present at the input is passed to the output without loss. In the saturated regime, when the output approaches 100% entropy density, there will necessarily be collisions and loss of entropy.
Hash functions are used in a wide range of applications. As a first example, let us consider how hash functions are used in compilers: the input to the hash function is a symbol-name, and the output is used as an index into the symbol table. It is often much more efficient to manipulate the hash than to manipulate the corresponding input string.
When two different inputs produce the same hash, it is called a hash collision. This is not desirable from the compiler’s point of view. So, the essential properties of a compiler-grade hash function are:
- 1) It takes as input a variable-length string, and returns a fixed-length string, called the hash.
- 2) It is a function; that is, the same input produces the same output every time.
- 3) Vague anti-collision property: There should not be more hash collisions than necessary.
A cryptologic hash function has several additional properties:
- 4) Noninvertibility: For any given hash, it is is computationally infeasible to find an input that generates that hash.
- 5) Cryptologic anti-collision property: It is computationally infeasible to construct two different inputs that produce the same hash.
A typical application for a cryptologic hash is to compute a Hashed Message Authentication Code (HMAC), which can be used to detect and deter tampering with the message. See reference 2.
Property #4 (noninvertibility) only applies if there is a sufficient diversity of possible inputs. If the only possible inputs are “yes” and “no” then the adversary can easily search the input space; this is called a brute-force attack. When computing HMACs one prevents such a search by padding the message with a few hundred bits of entropy. The hash function itself has property #2 but not property #4, while the HMAC has property #4 but not property #2, because HMAC(message) = HASH(padding+message).
When building a High-Entropy Random Generator, we do not require noninvertibility at all. If the adversary knows the current hash-function output, we don’t care if the hash-function input can be inferred, because that has no bearing on any past or future outputs. The generator has no memory. This is one of the great attractions of this HRNG design, because nobody has ever found a function that is provably noninvertible.
In stark contrast, noninvertibility is crucial for any Pseudo-Random Generator. It is easy to arrange that the state variable (the input to the hash function) has plenty of bits, to prevent brute-force searching; that’s not the problem. The problem lies in initializing the state variable, and then defending it for all time. The defense rests on the unproven assumption that the hash function is noninvertible. What’s worse, the burden on the hash function is even heavier than it is in other applications: The hash function must not leak even partial information about its input. Otherwise, since the hash function is called repeatedly, the attacker would be able to gradually accumulate information about the generator’s internal state.
In discussions of this topic, the following notion often crops up:
- 6) Strict avalanche2 property: whenever one input bit is changed, every output bit must change with probability ½. See reference 3.
This statement is not, alas, strict enough for our purposes. Indeed, WEAKHASH-1 and WEAKHASH-2 satisfy this so-called strict avalanche criterion. To be useful, one would need to say something about two-bit and multi-bit changes in the input. We dare not require that every change in the input cause a change in the output, because there is a simple pigeon-hole argument: There are V = 2562100 input states and only C = 2160 output states. There will be hash collisions; see section 4.2.
To see what criteria are really needed, we will need a more sophisticated analysis. The most general representation of a hash function (indeed any function) is as a truth table. Table 2 shows a rather weak hash function.
The input symbols are delimited by parentheses to emphasize that they are variable-length strings; row 1 is the zero-length string. It should be obvious how to extend the table to cover arbitrary-length inputs. It should also be obvious how to generalize it to larger numbers of hashcodes.
The hashcodes in the Output column were constructed by cycling through all possible codes, in order. All possible codes occur, provided of course that there are enough rows, i.e. enough possible inputs. What’s more, all possible codes occur with nearly-uniform frequency, as nearly as can be. This construction produces exactly uniform frequency if and only if the number of rows is a multiple the number of possible codes, which we call the commensurate case. The example in the table is non-commensurate, so even though the codes are distributed as evenly as possible, complete evenness is not possible. In this example, the (00) code is slightly underrepresented.
Table 2 is a hash function in the sense that it maps arbitary-length inputs to fixed-length outputs, but it has very poor collision-resistance, and it has no vestige of one-wayness.
Starting from table 2 can construct a better hash function by permuting the rows in the Output column, as shown in table 3. If there are R rows in the table and C possible codes, there are V! / (V/C)!C distinct permutations. This is exact in the commensurate case, and a lower bound otherwise. This is a huge number in practical situations such as V = 2562100 and C = 2160.
If we temporarily assume a uniform probability measure on the set of all possible permutations, we can choose a permutation and use it to construct a hash function. Statistically speaking, almost all hash functions constructed in this way will have very good anti-collision properties and one-way properties (assuming V and C are reasonably large, so that it makes sense to make statistical statements).
This way of constructing hash functions is more practical than it might initially appear. Note that any cipher can be considered a permutation on the space of all possible texts of a given length. Hint: use a pigeon-hole argument: The ciphertext is the same length as the plaintext, and the fact that the cipher is decipherable guarantees that there is a one-to-one correspondence between ciphertexts and plaintexts.
This suggests a general method of constructing hash functions: Pad all intputs to some fixed length. Append a representation of the original length, so that we don’t have collisions between messages that differ only in the amount of padding required. Encipher. If it’s a block cipher, use appropriate chaining so that the resulting permutation isn’t a block-diagonal matrix. Pass the cipertext through a contraction function such as WEAKHASH-0 to produce a fixed-length output. That is:
|Pad → Encipher → Contract (6)|
For starters, this serves to answer a theoretical question: it is certainly possible for a hash function to generate all codes uniformly.
By the way, a hash constructed this way will have the one-way property, if the adversaries don’t know how to decipher your cipher. You could ensure this by using a public-key cipher and discarding the private key. (For what it’s worth, if you don’t need the one-way property, you could even use a symmetric cipher without bothering to keep the key secret.)
Another way to add the one-way property, for instance for a PRNG or SRNG, is to tack on a one-way function downstream of the cipher-based hash function. This may be more computationally efficient than using asymmetric crypto. A cipher with a random key will suffice. If the one-way function requires a key, consider that to be part of the keying and re-keying requirements of the PRNG or SRNG.
In any case, keep in mind that turbid does not require its hash to be one-way.
We remark that many modern hash functions including SHA-1 follow the general plan Pad → Scramble → Contract which is rather similar to scheme 6. A key difference is that the scramble function is not a normal cipher, because it is not meant to be deciphered. The Feistel scheme is conspicuously absent. Therefore we can’t easily prove that it is really a permutation. The pigeon-hole argument doesn’t apply, and we can’t be 100% sure that all hashcodes are being generated uniformly.
In section 5.3 we temporarily assumed a uniform distribution on all possible truth-tables. That’s not quite realistic; the mixing performed by SHA-1 (or any other off-the-shelf hash function) is not the same as would be produced by a randomly-chosen truth table. We need to argue that it is, with virtual certainty, impossible for anyone to tell the difference, even if the adversaries have unlimited computing power.
Rather than just assuming that the hash function has the desired property, let’s look a little more closely at what is required, and how such a property can come about.
In section 5.3 each row in the Input column was treated pretty much the same as any other row. It is better to assign measure to each input-state according to the probability with which our data-acquisition system generates that state.
We then sort the truth-table according to the probability of each input. (This sort keeps rows intact, preserving the Input/Output relationship, unlike the permutation discussed previously which permuted one column relative to the others.) In the Input column we list all the states (all 2562100 of them) in descending order of probability. In the Output column we find the corresponding hash. This allows us to see the problem with WEAKHASH-1: Because we are imagining that the data is an IID sequence of samples, with much less than 100% entropy density, a huge preponderance of the high-probability data strings will have the same hash, because of the permutation-invariance. Certain hashcodes will be vastly over-represented near the top of the list.
So we require something much stronger than the so-called strict avalanche property. Mainly, we require that anything that is a symmetry or near-symmetry of the raw data must not be a symmetry of the hash function. If single-bit flips are likely in the raw data, then flipping a single bit should move the data-string into a new hashcode. If flipping a few bits at a time is likely, or permutations are likely, then such things should also move the data-string to a new hashcode.
The input buffer will typically contain some bits that carry little or no entropy. In the space of all possible truth tables, there must exist some hash functions that look at all the wrong bits. However, these must be fantastically rare. Consider the cipher-based hash functions decribed previously. For almost any choice of cipher key, the function will permute the hashcodes so that a representative sample falls at the top of the list. The key need not be kept secret (since the adversaries have no power to change the statistics of the raw data).
Unless we are extraordinarily unlucky, the symmetries of natural physical processes are not symmetries of SHA-1.
Let us contrast how much a cryptologic hash function provides with how little we need:
|A cryptologic hash function advertises that it is computationally infeasible for an adversary to unmix the hashcodes.||What we are asking is not really very special. We merely ask that the hashcodes in the Output column be well mixed.|
|A chosen-plaintext (chosen-input) attack will not discover inputs that produce hash collisions with any great probability.||We ask that the data acquisition system will not accidentally produce an input pattern that unmixes the hashcodes.|
We believe that anything that makes a good pretense of being a cryptologic hash function is good enough for our purposes, with a wide margin of safety. If it resists attack when the adversary can choose the inputs, it presumably resists attack when the adversary can’t choose the inputs. See also section 5.5.
Note that we don’t care how much the adversary knows about the distribution from which the input samples are drawn. We just need to make sure that there is enough sample-to-sample variability. If there is some 60-cycle hum or other interference, even something injected by an adversary, that cannot reduce the variability (except in ultra-extreme cases).
Once noise is added to a signal, it is very hard to remove — as anyone who has ever tried to design a low-noise circuit can testify.
It is often suggested that we test whether the output of turbid is random, using packages such as Diehard (reference 4) and Maurer’s Universal Statistical Test (“MUST”, reference 5). We have several things to say in response:
- 1) SHA-1 does pass those tests.
- 2) SHA-1 passes a stricter test, the counter test described below.
- 3) Being able to pass such a test is necessary but far from sufficient as a criterion for the correctness of the turbid program. See section 7.
We note in passing a trivial weakness: There is a wide category of hash functions (including SHA-1) that operate on blocks of input data, and are Markovian in the sense that they remember only a limited amount of information about previous blocks. It has been shown that any hash in this category will be more vulnerable to multicollisions than an ideal hash would be. However, this is a chosen-plaintext attack, and is not even remotely a threat to turbid.
SHA-1 has been cryptanalyzed by experts. See reference 6, reference 7, and references therein. To date, no attacks have been reported that raise even the slightest doubts about SHA-1’s suitability for use in turbid.
However, bowing to popular demand, we performed some tests, as follows: We constructed a 60-bit counter using the LSB from each of 60 32-bit words; the higher 31 bits of each word remained zero throughout. These sparsely-represented 60-bit numbers were used, one by one, as input to the hash function. The resulting hashcodes were taken as the output, analogous to the output of turbid. We then applied Diehard and MUST.
As expected, SHA-1 passed these tests.
In relative terms, we consider this counter-based approach to be in some ways stricter than testing the output of turbid under normal operating conditions, because the counter has zero entropy. It has a goodly number of permanently-stuck bits, and the input to the hash function changes by usually only one or two bits each time.
However, in absolute terms, we consider all such tests ludicrously lame, indicative of the weakness of the tests, not indicative of the strength of SHA-1. We don’t want turbid to be judged by such low standards. See section 7 for more on this.
Note the contrast:
Understanding turbid requires some interdisciplinary skills. It requires physics, analog electronics, and cryptography. If you are weak in one of those areas, it could take you a year to catch up.
This section exists mainly to dispel some misconceptions. Sometimes people decide that the the laws of physics do not apply to electronics, or decide on the basis of “intuition” that the noise in their audio system is negligible. If you do not suffer from these particular misconceptions, feel free to skip to section 6.3.
|A digital logic gate has considerable noise immunity. Noise, if it is not too large, is eliminated at every step. This requires both nonlinearity and power dissipation.||A linear analog system cannot eliminate noise in the same way. The fundamental equations of motion do not allow it.|
In principle, there are many physical devices that could be used as the entropy source. We prefer the audio system (the “soundcard”) because such things are cheap, widely available, and easy to calibrate. We have a provable lower bound on the rate at which they produce entropy, and this rate in sufficient for a wide range of applications. This provides a way to make typical hosts very much more secure than they are now.
In contrast, a video frame-grabber card would produce entropy more copiously, but would be much more costly. Calibration and quality-assurance monitoring would be much more difficult.
There are many other sources that “might” produce usable entropy, but for which a provable nonzero lower bound is not available.
As foreshadowed above, we choose to focus on the Johnson noise. As discussed in reference 11 and reference 12, this noise is inherent in any electrical resistor. The laws of thermodynamic guarantee there will be ½kT of energy per degree of freedom, For a more elaborate discussion, including the extension to zero temperature, see reference 13.
We use the term “noise” in the quantitative technical sense, as a synonym for “thermal fluctuations”. (There are about a dozen non-technical, esthetic, and/or metaphorical meanings of the word, which are not relevant here.) We do not hook up a microphone to the audio system. If we were using a video frame-grabber we would not hook up a camera. We rely entirely on the thermal fluctuations of the electrons in the circuitry.
Johnson noise is not the only source of noise in the system. Sometimes it is not even the largest contribution. However, it has one great virtue: we can easily calculate its magnitude without having to know much about the innards of the system.
For reasons discussed in reference 12, the Johnson noise in any Ohmic resistor is nice white noise. Samples are independent and identically distributed (IID), according to a Gaussian normal distribution with zero mean and variance given by
|σ2 = 4 k T B R (7)|
where k is Boltzmann’s constant, T is the temperature (300 K), B is the power-averaged bandwidth as defined in section 12.8. Typical soundcards limit the bandwidth to slightly less than N/2, where N is the sampling frequency, and R is the relevant resistance. We recognize σ as the root-mean-square (RMS) voltage. Every resistor contributes additional noise; we focus on R = RIn, the real part of the input impedance.
Tangential remark: Equation 7 is the classical expression, valid for not-too-low temperatures and not-too-high frequencies. The criterion is ℏω ≪ kT, so at room temperature we are assuming the soundcard operates at frequencies small compared to 600 terahertz. This assumption is quite well-founded in practice.
At cryogenic temperatures and/or very high frequencies, equation 7 must be replaced by a fancier expression based on quantum statistical mechanics.
The Gaussian normal probability density is:
|dP(V | µ, σ) =|
where µ is the mean and σ is the standard deviation. The corresponding cumulative probability is:
|P(V | µ, σ) = ½ + ½ erf|
The data acquisition system will quantize the voltage in bins of size Q. That converts the continuous probability density in equation 8 into a set of discrete probabilities, i.e. the amount of probability in each bin. The entropy per sample is then just a sum over bins, in accordance with equation 2. We sometimes calculate it approximately by
|s ≈ log(σ / Q) + ½ log(2πe) (10)|
where the first term is fairly obvious from dimensional analysis – just ask how many bins fall under the high-probability part of the Gaussian. The second term is just a constant (2.05 bits). Equation 10 is accurate to a fraction of a percent (or better) whenever σ/Q is greater than 3.3. Otherwise it is more convenient and more accurate to calculate the entropy directly from the definition, by plugging equation 9 into equation 2.
However, we need to be a little bit careful. Think of the transfer function of the digitizer as a staircase. The worst-case scenario is when the Gaussian (representing the Johnson noise) is narrow and centered near the middle of a tread. Virtually all the probability lands in that one bin, so the entropy is virtually zero. In contrast, suppose we change µ so that the Gaussian exactly straddles one of the risers. Then half the probability lands in in one cell, and half in the neighboring cell. The entropy is one bit, no matter how narrow the Gaussian is. You might imagine the chance of the Gaussian straddling a riser is inversely proportional to σ, if you assume that all µ values are equally likely. However, we are not going to make any such assumption.
Remember, turbid is designed around provable lower bounds on the entropy. Since we cannot be certain where the Gaussian sits relative to the nearest riser, we assume the worst case, namely that it sits in the middle of a tread. In practice this doesn’t cost us much, because it produces a fairly tight lower bound. The bound stops being tight when σ/Q is quite small (less than 0.2 or so), which only happens when the soundcard has too little sensitivity for practical use.
Example: The Line-In port on the Thinkpad 600 is woefully lacking in sensitivity, and its impedance and bandwith are unimpressive also. As a consequence, it has a rather narrow Gaussian which could easily sit entirely (except for negligible wings) in the middle of a single tread, so we cannot derive a useful lower bound on the entropy. Although intuition suggests that the signal actually contains some entropy, we can’t prove it, so we give up on this port. We use the Mike-In port instead, which has vastly more sensitivity.
The key characteristics of various audio I/O systems are listed in the following table.
|Card||Q / µV||RIn / Ω||Kout||B||N||σ / µV||s / bits||ROut / Ω|
|Thinkpad cs4239 (Line)||19.6||7900||.037||9000||44100||1.07||0|
|Thinkpad cs4239 (Mike)||1.65||10100||.038||7300||44100||1.09||1.58|
Note that the CT4700 reports itself as “ENS1370” so it uses the ens1370.ctl control file.
As mentioned before, we need a lower bound on the entropy per sample, and that is what we have just calculated. This lower bound is used to ensure that the hash function’s buffer contains “enough” entropy, as discussed in section 4.
Let us briefly explore the consequences of mis-estimating the entropy:
- 1a) If we make a slight overestimate, it means that the hash function, rather than putting out 2160 different outputs, will put out some rather smaller ensemble. In mild cases this may not be noticeable in practice.
- 1b) A major overestimate may allow an adversary to focus an attack on the few output-words being produced.
- 2a) If we modestly underestimate the entropy content of the raw data, there is no serious harm done.
- 2b) A major underestimate will reduce the total randomness production capability of the machine. This means the processes that consume random symbols may block, waiting for the HRNG to produce more entropy.
We looked at the power spectrum of the signal produced by a number of different sound cards. Based on these observations, and on general engineering principles, we can list various hypotheses as to what physical processes could be at work, namely:
Some of these are undoubtedly fine sources of randomness. The only problem is that except for item #1, we cannot calculate their magnitude without a great deal more engineering knowledge about the innards of the card. Using these sources might produce a HRNG that is correct, but not provably correct. Since we want something that is provably correct, we simply ignore these other contributions. They can’t hurt, and almost certainly provide a huge margin of safety.
That is: We need randomness in the raw data, and we need to know a provable lower bound on the amount of randomness. Turbid consistently applies the principle that having extra randomness is OK, and underestimating the amount of randomness is OK. This may give us less-than-optimally efficient use of the available randomness, but it guarantees correctness. Turbid consistently chooses correctness over efficiency.
By way of analogy, let us consider the relationship between code-breakers and code-makers. This is a complex ever-changing contest. The code-breakers make progress now and then, but the code-makers make progress, too. It is like an “arms race” but much less symmetric, so we prefer the term cat-and-mouse game.
At present, by all accounts, cryptographers enjoy a very lopsided advantage in this game.
There is an analogous relationship between those who make PRNGs and those who offer tools to test for randomness. The PRNG has a hidden pattern. Supposedly the tester “wants” the pattern to be found, while the PRNG-maker doesn’t.
We remark that in addition to the various programs that bill themselves as randomness-testers, any off-the-shelf compression routine can be used as a test: If the data is compressible, it isn’t random.
To deepen our understanding of the testing issue, let’s consider the following scenario: Suppose I establish a web server that puts out pseudo-random bytes. The underlying PRNG is very simple, namely a counter strongly encrypted with a key of my choosing. Each weekend, I choose a new key and reveal the old key.
The funny thing about this scenario is the difference between last week’s PRNG and this week’s PRNG. Specifically: this week’s PRNG will pass any empirical tests for randomness that my adversary cares to run, while last week’s PRNG can easily be shown to be highly non-random, by means of a simple test involving the now-known key.
As a modification of the previous scenario, suppose that each weekend I release a set of one billion keys, such that the key to last week’s PRNG is somewhere in that set. In this scenario, last week’s PRNG can still be shown to be highly non-random, but the test will be very laborious, involving on the order of a billion decryptions.
Note that to be effective against this PRNG, the randomness-testing program will need to release a new version each week. Each version will need to contain the billions of keys needed for checking whether my PRNG-server is “random” or not. This makes a mockery of the idea that there could ever be an all-purpose randomness tester. Even a tester that claims to be “universal” cannot be universal in any practical sense.3 An all-purpose randomness tester would be tantamount to an automatic all-purpose encryption-breaking machine.
To paraphrase Dijkstra: Measurement can prove the absence of entropy, but it cannot prove the presence of entropy. More specifically, a “randomness testing” program will give an upper bound on the entropy density of a random generator, but what we need is a lower bound, which is something else entirely.
In section 6.4 we calculated a lower bound on the entropy density, based on a few macroscopic physical properties of the hardware. It is entirely appropriate to measure these macroscopic properties, and to remeasure them occasionally to verify that the hardware hasn’t failed. This provides a lower bound on the entropy, which is vastly more valuable than a statistically-measured upper bound.
As discussed in section 5.5, tests such as Diehard and Maurer’s Universal Statistical Test are far from sufficient to prove the correctess of turbid. They provide upper bounds, whereas we need a lower bound.
When applied to the raw data (at the input to the hash function) such tests report nothing more than the obvious fact that the raw data is not 100% random. When applied to the output of the hash function, they are unable to find patterns, even when the entropy density is 0%, a far cry from the standard (100%) we have set for ourselves.
A related point: Suppose you suspect a hitherto-undetected weakness in turbid. There are only a few possibilities:
The foregoing is a special case of a more general rule: it is hard to discover a pattern unless you have a pretty good idea what you’re looking for. Anything that could automatically discover general patterns would be a universal automatic code-breaker, and there is no reason to believe such a thing will exist anytime soon.
There are some tests that make sense. For instance:
On the one hand, there is nothing wrong with making measurements, if you know what you’re doing. On the other hand, people have gotten into trouble, again and again, by measuring an upper bound on the entropy and mistaking it for a lower bound.
The raison d’etre of turbid is that it provides a reliable lower bound on the entropy density, relying on physics, not relying on empirical upper bounds.
We ran Maurer’s Universal Statistical Test a few times on the output of turbid. We also ran Diehard. No problems were detected. This is totally unsurprising, and it must be emphasized that we are not touting this a serious selling point for turbid; we hold turbid to an incomparably higher standard. As discussed in section 7.2, we consider it necessary but not sufficient for a high-entropy random generator to be able to pass such tests.
To summarize this subsection: At runtime, turbid makes specific checks for common failures. As discussed in section 7.3 occasionally but not routinely apply general-purpose tests to the output.
We believe non-specific tests are very unlikely to detect deficiencies in the raw data (except the grossest of deficiencies), because the hash function conceals a multitude of sins. You, the user, are welcome to apply whatever additional tests you like; who knows, you might catch an error.
We must consider the possibility that something might go wrong with our high entropy random generator. For example, the front-end transistor in the sound card might get damaged, losing its gain (partially or completely). Or there could be a hardware or software bug in the computer that performs that hashing and control functions. We start by considering the case where the failure is detected. (The other case is considered in section 8.2.)
At this point, there are two major options:
We can ornament either of those options by printing an error message somewhere, but experience indicates that such error messages tend to be ignored.
If the throttling option is selected, you might want to have multiple independent generators, so that if one is down, you can rely on the other(s).
The choice of throttling versus concealment depends on the application. There are plenty of high-grade applications where concealment would not be appropriate, since it is tantamount to hoping that your adversary won’t notice the degradation of your random generator.
In any case, there needs to be an option for turning off error concealment, since it interferes with measurement and testing as described in section 7.
We can also try to defend against undetected flaws in the system. Someone could make a cryptanalytic breakthrough, revealing a flaw in the hash algorithm. Or there could be a hardware failure that is undetected by our quality-assurance system.
One option would be to build two independent instances of the generator (using different hardware and different hash algorithms) and combine the outputs. The combining function could be yet another hash function, or something simple like XOR.
Figure 3 can be simplified to
|Source → Digitizer → Hash (11)|
It has been repeatedly suggested that we should add an additional “whitening” step, resulting in
|Source → Digitizer → Hash → Whitener (12)|
using a cipher such as DES as the whitener.
We think such whitening schemes are a bad idea.
Specifically: There are only a few possibilities:
In general, people are accustomed to achieving reliability using the belt-and-suspenders approach, but that only works if the contributions are in parallel, not in series. A chain is only as strong as its weakest link.
In this case, it is a fallacy to think that a whitener can compensate for a weakness in the hash function. As an extreme example, consider the simplest possible hash function, one that just computes the parity of its input. This qualifies as a hash function, because it takes an arbitrary-length input and produces a constant-length output. Now suppose the data-acquisition system produces an endless sequence of symbols with even parity. The symbols have lots of variability, lots of entropy, but always even parity. The output of the hash function is an endless sequence of zeros. That’s not very random. You can run that through DES or any other whitener and it won’t get more random.
As a less extreme example, consider the WEAKHASH-2 function described in section 5.2. The whitener can conceal the obvious problem that all hashcodes have odd parity, but it cannot remedy the fundamental weakness that half the codes are going unused. The problem, no matter how well concealed, is still present at the output of the whitener: half the codes are going unused.
This is easy to understand in terms of entropy: once the entropy is lost, it cannot be recreated.
It is sometimes suggested that there may exist relatively-simple hash functions that have good mixing properties but require a whitener to provide other cryptologic strengths such as the one-way property. We respond by pointing out that the HRNG does not require the one-way property. We recognize that a Pseudo-Random Generator is abjectly dependent on a one-way function to protect its secret internal state, but the HRNG has no secret internal state.
We recognize that the HRNG may have some internal state that we don’t know about, such as a slowly-drifting offset in the data-acquisition system. However, we don’t care. The correctness of the final output depends on the variability in the raw data, via the hash-saturation principle. As long as there is enough unpredictable variability in the raw data, the presence of additional variability (predictable or not) is harmless.
As a practical matter, since SHA-1 is remarkably computationally efficient, it is hard to motivate the search for a “simpler” hash functions, especially if they require whitening or other post-processing.
Finally, we point out that the last two steps in scheme 12 are backwards compared to scheme 6. If you really want whiteness, i.e. good statistical properties, it is better to put the crypto upstream of the contraction.
For some applications, a well-designed PRNG may be good enough. However, for really demanding applications, at some point you might well throw up your hands and decide that implementing a good HRNG is easier.
A typical PRNG, once it has been seeded, depends on a state variable, a deterministic update function, and some sort of cryptologic one-way function. This allows us to classify typical PRNG attacks and failures into the following categories:
The historical record contains many examples of failed PRNGs. For example:
See reference 21 for an overview of ways in which PRNGs can be attacked.
The following scenario serves to illustrate a few more of the ideas enumerated above: Suppose you are building a network appliance. It has no mouse, no keyboard, no disk, and no other easy sources of randomness (unless you use the audio system, or something similar, as described in this paper). You want to sell millions of units. They start out identical, except possibly that each one has a network interface with a unique MAC address.
Seeding the PRNG with the MAC address is grossly inadequate. So the only way to have a halfway acceptable PRNG is to configure each unit by depositing into it a unique PRNG state vector. This means that each unit needs a goodly amount of writable but non-volatile storage; it can’t execute out of ROM and volatile RAM alone. Also, the stored state-vector needs to be updated from time to time; otherwise every time the machine is rebooted it will re-use the exact same numbers as last time, which would be an example of failure failure 3. Note that the standard Linux distributions only update the stored seed when the system is given a shutdown command – not if it goes down due to sudden power failure or software crash – which is unacceptable. You have to protect the state vector for all time. In particular, if a rogue employee sneaks a peek at the state vector (either on the machine itself, or on a backup tape, or whatever) and sells it to the adversaries, they can break all the traffic to and from the affected unit(s) from then on, which would be an example of failure failure 2. All these problems can be dealt with, but the cost might be so high that you give up on the PRNG and use a HRNG instead.
A subtle type of improper reseeding or improper stretching (failure 3) is pointed out in reference 22. If you have a source of entropy with a small but nonzero rate, you may be tempted to stir the entropy into the internal state of the PRNG as often as you can, whenever a small amount of entropy (ΔS) becomes available. This alas leaves you open to a track-and-hold attack. The problem is that if the adversaries had captured the previous state, they can capture the new state with only 2ΔS work by brute-force search, which is infinitesimal compared to brute-force capture of a new state from scratch. So you ought to accumulate quite a few bits, and then stir them in all at once (“quantized reseeding”). If the source of entropy is very weak, this may lead to an unacceptable interval between reseedings, which means, once again, that you may be in the market for a HRNG with plenty of throughput, as described in this paper.
The Linux kernel random generator /dev/random (reference 23) accumulates entropy in a pool and extracts it using a hash function. It is associated with /dev/urandom which is the same, but becomes a PRNG when the pool entropy goes to zero. Therefore in some sense /dev/urandom can be considered a stretched random generator, but it has the nasty property of using up all the available entropy from /dev/random before it starts doing any stretching. Therefore /dev/urandom provides an example of bad side effects (failure 4). Until the pool entropy goes to zero, every byte read from either /dev/random or /dev/urandom takes 8 bits from the pool. That means that programs that want to read modest amounts of high-grade randomness from /dev/random cannot coexist with programs reading large amounts of lesser-grade randomness from /dev/urandom. In contrast, the stretched random generator described in this paper is much better behaved, in that it doesn’t gobble up more entropy than it needs. See section 10, section 13.2, section 13.3, and section 13.4.
It is certainly possible for PRNG failures to be found by direct analysis of the PRNG output, for example by using statistical tools such as reference 5.
More commonly, though, PRNGs are broken by attacking the seed-storage and the seed-generation process. Here are some examples:
If a PRNG contains N bits of internal state, it must repeat itself with a period no longer than 2N. Obviously, N must be large enough to ensure that repetition never occurs in practical situations. However, although that is necessary, it is far from sufficient, and making the period longer is not necessarily a good way to make the PRNG stronger. By way of example: A 4000-bit Linear Feedback Shift Register (LFSR) has a period of 24000 or so, which is a tremendously long period ... but the LFSR can easily be cryptanalyzed on the basis of only 4000 observations (unless it is protected by a strong one-way function installed between the LFSR and the observable output). For a Stretched Random Generator (section 13.4), it is necessary to have a period long compared to the interval between reseedings, but it is not necessary to make it much longer than that. So, for a SRNG, a huge period is neither necessary nor sufficient. For a PRNG that is not being reseeded, a huge period is necessary but not sufficient.
This is worth emphasizing: One key difference between a genuinely entropic random generator and a pseudo-random generator is that for the PRNG you have to worry about where you get the initial seed, how you recover the seed after a crash/restart, and how you protect the seed for all time, including protecting your backup tapes. For the HRNG you don’t.
Attacks against the HRNG are very different from the usual sort of attack against a PRNG (such as are discussed in section 9.1).
The question is sometimes asked whether thermal noise is really “fundamentally” random. Well, it depends. Obviously, the magnitude of thermal noise depends on temperature, and theoretically an adversary could reduce the temperature of your computer to the point where the HRNG calibration (section 12) was no longer valid. In contrast, there are other processes, such as radioactive decay and quantum zero-point motion, that are insensitive to temperature under ordinary terrestrial conditions. This makes thermal noise in some sense “less fundamental”, but it’s not worth worrying about. If somebody can get close enough to your computer to radically change the temperature, attacks against the HRNG are the least of your worries. There are other attacks that are easier to mount and harder to detect.
This is a special case of a more-general observation: The most-relevant attacks against the HRNG don’t attack the fundamental physics; they attack later stages in the data acquisition chain, as we now discuss.
Suppose you choose radioactive decay, on the theory that it is a “more fundamentally” random process. The decay process is useless without some sort of data acquisition apparatus, and the apparatus will never be perfect. The detector will presumably have live-time / dead-time limitations and other nonidealities. An adversary can interfere with the data acquisition even if the fundamental decay process is beyond reach. Last but not least, the resulting Poisson distribution will not match the desired flat distribution (all symbols equally likely) that corresponds to the desired 100% entropy density. Therefore the acquired signal will not be one whit more useful than a signal derived from thermal noise. The same sort of postprocessing will be required.
Similar remarks apply to all possible sources of hardware randomness: we do not expect or require that the processes will be 100% random; we merely require that they contain some randomness.
The most significant threat to the HRNG is the possibility of gross failure. A transistor could burn out, whereupon the signal that contained half a bit of entropy per sample yesterday might contain practically nothing today. Or there could be a bug in the software.
Even if everything is working as designed, we have to be careful if the noise is small relative to Q, the the quantum of significance of the digitizer, i.e. the voltage represented by least-significant bit. Note that the quantum of significance Q is not necessarily the same as the quantum of representation QR. For instance, the Delta-1010 uses a 32-bit representation with 24 bits of significance. As discussed in section 6.4, we avoid problems by simply assuming the worst-case scenario; then we know there’s nothing an adversary can do to make things worse.
A more subtle threat arises if the digitizer has “missing codes” or “skimpy codes”. A soundcard that is advertised as having a 24-bit digitizer might really only have a 23-bit digitizer. Most customers wouldn’t notice.
An adversary could mount an active attack. The obvious brute-force approach would be to use a high-power electromagnetic signal at audio frequencies (VLF radio), in the hopes that it would couple to the circuits in the sound card. However, any signal of reasonable size would just add linearly to the normal signal, with no effect on the process of harvesting entropy.
A truly huge signal might overload and saturate the audio input amplifier. This would be a problem for us, because thermal noise would have no effect on an already-saturated output. That is, the small-signal gain would be impaired. An attack of this kind would be instantly detectable. Also, it is likely that if attackers could get close enough to your computer to inject a signal of this magnitude, they could mount a more direct attack. Remember, if we are using a 24-bit soundcard we have almost 16 bits of headroom (a factor of 10,000 or so) between the normal signal levels and the saturation level. I consider an attack that attempts to overload the amplifier so implausible that I didn’t write any code to detect it, but if you feel motivated, you could easily write some code to look at the data as it comes off the data-acquisition system and verify that it is not saturated. Having 16 bits (or more) of resolution is a great luxury. (It would be quite tricky to verify correct operation on other systems that try to get by with only one-bit or two-bit digitizers.)
Another possibility is interference originating within the computer, such as a switching power supply or a video card, that inadvertently produces a signal that drives the sound card into saturation. This is unlikely, unless something is badly broken. Such interference would cause unacceptable performance in ordinary audio applications, at levels very much lower than we care about. A decent soundcard is, by design, well shielded against all known types of interference. That is, a soundcard must provide good performance at low audio levels (little if any discernible additive noise, no matter what interference sources are present) and also good performance at high audio levels (little if any saturation). The possibility that an interference source that is normally so small as to be inaudible would suddenly become so large as to saturate a 24-bit ADC seems quite insignificant, unless there is a gross hardware failure. The odds here are not dominated by the statisics of the noise processes; they are dominated by the MTBF of your hardware.
Even if there is no interference, it may be that the sound card, or the I/O bus, radiates some signal that is correlated with the raw data that is being provided to the program. However, if you are worried about this sort of passive attack you need to be worried about compromising emanations (TEMPEST) from other parts of your system also. See reference 24. There is no reason to believe the audio hardware or HRNG software unduly increases your vulnerability in this area.
The least-fundamental threats are probably the most important in practice. As an example in this category, consider the possibility that the generator is running on a multiuser machine, and some user might (inadvertently or otherwise) change the mixer gain. To prevent this, we went to a lot of trouble to patch the ALSA system so that we can open the mixer device in “exclusive” mode, so that nobody else can write to it.
We do not need the samples to be completely independent. All we need is some independence.
Here’s an example that may help explain how this works.
Suppose we start with a deck of cards that has been randomly shuffled. It can provide log2(52!) bits of entropy. That’s a little more than 225 bits. Now suppose we have ten decks of cards all arranged alike. You could set this up by shuffling one of them and then stacking the others to match ... or by some more symmetric process. In any case the result is symmetric w.r.t interchange of decks. In this situation, we can choose any one of the decks and obtain 225 bits of entropy. The funny thing is that if we choose N of the decks, we still get only 225 bits of entropy, not N×225.
We can graph the amount of entropy we get, as a function of the number of decks we look at, as shown in figure 4.
This can be summarized by saying that entropy is not an extensive quantity in this situation.
The spooky aspect of this situation is the whack-a-mole aspect: You cannot decide in advance which one of the decks has entropy and which N−1 of them do not. That’s the wrong question. The first deck we choose to look at has 225 bits of entropy, and only then can we say that the other N−1 decks have zero additional entropy.
It is possible for a source to be partially dependent and partially independent. For example, suppose we take each of the ten aforementioned decks and “cut the deck” randomly and independently. “Cutting” means applying one of the 52 different full-length permutations. Now, the first deck we look at will provide 225 bits of entropy, and each one thereafter will provide 5.7 bits of additional entropy, since log2(52)=5.7. So in this situation, each deck can be trusted to provide 5.7 bits of entropy.
In this situation, requiring each deck to have no dependence on the others would be an overly strict requirement. We do not need full independence; we just need some independence, as quantified by the provable lower bound on the entropy. To repeat: we don’t need to quantify how much dependence there is; we just need to quantify how much independence there is. In our example, there is provably at least 5.7 bits of usable entropy per deck.
If you wanted, you could do a deeper analysis of this example, taking into account the fact that the entropy we are relying on, 5.7 bits, is not the whole story. We can safely use 5.7 bits as a lower bound, but under some conditions more entropy is available, as can be quantified by considering the joint probability distribution and computing the entropy of that distribution. Meanwhile the fact remains that we are not required to extract every last bit of available randomness. Under a wide range of practical conditions, it makes sense to engineer a random generator based on provable lower bounds, since that is good enough to get the job done, and a deeper analysis would not be worth the trouble.
For a typical personal workstation, there is only a modest demand for high-grade randomness. Uses include:
For such uses, I/O timing typically provides a perfectly adequate supply of entropy. The Linux kernel random generator is an example: it collects a few bits per second from timing mouse, keyboard, and disk events.
Servers are more problematic than workstations. Often they have a much greater demand for entropy, and less supply. An example is a server which terminates hundreds of IPsec tunnels. During start-up it requires more than a hundred thousand high-quality random bits in a short time, which is far more than can be buffered in /dev/random, and far more than can be collected by the kernel in the available time.
An even worse situation arises in small networked controllers, which have no mouse, no keyboard, no disk, nor any other good sources of entropy. If we want to establish secure IPsec or ssh connections to such boxes, we simply must come up with a source of entropy.
At this point, one often hears the suggestion that we send the box some new entropy over the network. This is not a complete solution, because although it helps defend against certain threats, it leaves others undefended. One problem is that if security is compromised once (perhaps because somebody stole the PRNG seed) then security is compromised more-or-less forever, because the adversary can read the network traffic that carries the new entropy.
The sound card can produce quite a bit of entropy. Even a low-end sound card can produce 44,000 bits per second. A Delta-1010 (which has a higher sample rate, higher resolution, and more channels) can produce a couple million bits per second.
If turbid runs as non-root, it will open its output FIFOs in your $HOME/dev/ directory, look for its control files in your $HOME/etc/turbid/ directory, and scribble a .pid file in your $HOME/var/run/ directory.
If turbid runs as root, it will use the system /dev/ and /etc/turbid/ /var/run/ directories.
It is remarkably easy to mis-configure a soundcard in such a way that it produces much less entropy than it otherwise would. This section outlines the general procedure; see appendix C for details on specific soundcards.
Note that we use the term soundcard loosely, to apply not just to individual cards per se, but also to audio I/O systems embedded on the mainboard, and to external pods connected via USB or whatever.
You can skip down to section 12.5 if you’ve been given a working mixer configuration (.ctl) file for your soundcard. You can skip all the way down to section 13 if you have the configuration file and you know the R-value, Q-value, and B-value for your card (input impedance, digitizer quantum, and bandwidth, perhaps from table 4).
hw:CARD=Intel,DEV=0 HDA Intel, CX20561 Analog Direct hardware device without any conversions
in which case you can invoke turbid --card hw:CARD=Intel,DEV=0.
card 0: Intel [HDA Intel], device 1: CX20561 Digital [CX20561 Digital] Subdevices: 1/1 Subdevice #0: subdevice #0
in which case you can invoke turbid --card hw:0,1 where the zero is the card number and the 1 is the device number on that card.
Plug something into the output port for the chosen channel and leave it plugged in from now on. That’s because I have seen situations where plugging and unplugging causes the mixer settings to get changed behind your back. The pulseaudio daemon appears to be the culprit, but there may be other culprits.
Connect headphones to the chosen output signal, by plugging them into the other port on the QA box, so that they receive a copy of the output signal. You could perhaps use loudspeakers instead of headphones, but for simplicity this document will speak in terms of headphones.
Use alsamixer to adjust the output to some reasonable level. You don’t want the output to clipped or otherwise distorted.
If you want to use a different filename, that’s fine, but then you will need to add the -m ... option to every turbid command from now on.
If you need to adjust the mixer in order to obtain a the desired output voltage, do so, and then save the configuration as described in step 6.
Note this assumes you are using a voltmeter that measures RMS voltage. If you are using an oscilloscope or something else that measures peak voltage, keep in in mind that the peak of a sine wave is 1.4 times the RMS value ... and the peak-to-peak excursion is 2.8 times the RMS value.
All the critical measurements should be made with the headphones disconnected, unless you are sure that they aren’t affecting the measurements.
Note that speakers (as opposed to headphones) are more likely to put a significant load on the output.
Conversely, the speaker-out port is likely to have a much lower impedance (compared to the line-out port), so it might be much less affected by a smallish load.
On a mono Mic port, the left channel (the tip) is the actual input. Beware that very commonly the ring is a power-supply output. This is scary, because if you jumper the Mic port to some other input port, you run the risk that the DC on the right channel could blow out your speaker.
Therefore, when configuring the input subsystem, by far the safest procedure is to use a signal that comes from somewhere else, not from the computer that is being configured.
The first step is to make sure audio input is working on your soundcard. This temporarily requires a well-behaved test signal. Your options include:
Note that in the turbid software, anything that is linear in voltage is represented relative to 1.0 = full scale. In contrast, anything that is measured in dB is measured relative to a reference level that is 15 dB below full scale. This means that a rail-to-rail square wave (such as might result from extreme clipping) is reported as +15 dB. The peak voltage of a zero-dB sine wave √2 higher than a zero-dB square wave. Here’s a numerical example of what you might see at this stage:
In other words, turbid needs to do input, it needs to do output, and the two things need to be independent.
Observe the “stdev” reading in dB. Adjust the mixer to enable the chosen input port as the capture source, and turn the associated gain sliders up as high as it can reasonably go (usually all the way). If there is a “Mic Boost” slider, turn it up also, as high as it can go without clipping. You want every element in the input-chain to be turned up as high as it can go without clipping.
At this point, you may find that the input readings are saturated no matter what you do, and any attempt to de-saturate them fails because you can’t turn the output down far enough. At this point you need to build a different box, with a 100-to-1 attentuator. In other words, we need a 40dB attentuator. The resistance values in figure 5 should work. Note that on a Thinkpad W500, even with the 40dB attenuator in place, I still needed to turn the output to -17 dB.
Check whether the stdev readings are saturated, as follows: Turn down the amplitude on the signal generator (West) by a few dB (turbid -F 550 -I 1.2 -A -5) or maybe many dB (turbid -F 550 -I 1.2 -A -25) and see if the stdev readings (on East) drop by the same number of dBs. Explore the range of amplitudes to find the point where saturation sets in. Set the amplitude so that the reading is 10 dB or so below saturation, and leave it there.
Note that you can give the -A ... option more than once, and turbid. will do the arithmetic for you. So you can use the first -A ... to set the level at a nice round number, and then use another -A ... to change things from there.
Do not cure saturation by lowering the gain of the input channel, because that would reduct the entropy production. Instead, just lower the amplitude of the applied signal.
If you are not using the Mike-In port, skip to section 12.4. On most consumer-grade soundcards, the Mike-In port is peculiar.4 It is mono. It records from what you would think of as the left channel. The wire that would have carried the right channel is present, but it is not an input; it supplies power from the soundcard to the microphone. The nominal open-circuit voltage is 5V and the source impedance is a couple of kohms, so the short-circuit current is about 2.5mA. You probably shouldn’t connect an ordinary Line-Out signal to a Mike-In port. That drives 5 volts backwards into the Line-Out port. It probably won’t blow out the port, but it still seems unsavory.
The risk of damage is even greater if you connect the Mike-In port of the soundcard to the Line-In port of your stereo. (This is easier to do than you might think, e.g. if you are using a Y-connector to monitor some signal using your stereo while simultaneously presenting that signal to the Mike-In port.) This puts a huge DC signal on the right channel of your speakers, and could quite easily burn out the woofer. Even if you use a custom cable with the right channel open-circuited, there will likely be an unsavory transient whenever the plug is partway inserted into the Mike-In jack, so you might want to insert that end first (and remove it last). All in all, the Mike-In port is an appallingly bad design. It should have provision for a second channel, and it should be shaped differently from other connectors to prevent the 5V power from going astray.
If the Mike-In signal has an AGC (Automatic Gain Control) feature that can be turned off, turn it off (using alsactl or whatever).
We need to get input and output working at the same time, not interfering with each other. This is called full duplex operation. To see whether output is working, play something using the play command, or the aplay command, or (best of all) turbid -F 440. Then hook a cable to the output of your soundcard (on East) and observe the voltage with the voltmeter (or, for initial qualitative checks, hook up a speaker or headphones).
Most likely, you will have to fiddle with mixer settings (using alsamixer) to get this to work. The real trick is to get output working without messing up the input. The three central requirements are:
When it is all working, you can produce an output and measure an input, using e.g. turbid -v -F 440.
Virtually all soundcards are capable of full duplex, but it is sometimes tricky to get it set up. A major problem is that mixers (such as alsamixer) don’t understand what you are trying to do. Suppose you have chosen to record from the Line-In channel. You want to mute that channel; otherwise the input signal will be routed directly to the output channel. (Such direct routing would be useful if we wanted an analog monitor of the input signal, but that’s not what we want.) The problem is that when you mute this channel, the mixer might disable it for input, which would not be what we want, either.
Here’s a typical workaround. The alsactl program is much more powerful than the alsamixer program. Set up the card as best you can recording and playback (as described in section 12.2) – even if you cannot yet achieve full duplex. Save the settings in a file, using e.g. alsactl -f record.ctl store. Then mute the input channel. Save the settings in a different file, e.g. using alsactl -f muted.ctl store. Compare the two files using diff --side-by-side record.ctl play.ctl. You may find that two things that have changed: one having to do with playback and one having to do with capture. Create a third file, duplex.ctl by copying one of the files and editing it by hand to incorporate the best features of the other file. Apply it using alsactl -f duplex.ctl restore. See if it works, using a command like turbid -v -m duplex.ctl ....
Once it works, rename duplex.ctl to a name appropriate to the sound device driver you’re using. Get the appropriate filename from turbid -h | grep mixer. Put it in your $HOME/etc/turbid/ directory, since that’s where the program looks for such things.
To check that full-duplex is working, check that on your machine (East) the command turbid -v -F 440 measures the same input voltage (stdev) as the command turbid -v. That is, the amplitude on the other machine (West) should be controlling the observed stdev signal on your machine, independent of what your machine is outputting.
Conversely, while running the command turbid -v -F 440 on your machine, observe the output level of your machine (using the voltmeter). Make sure that starting and stopping the input signal (coming from the other machine) has no effect on the output of your machine.
At this point, you may find it convenient to configure West for full-duplex operation also, by following the mirror-image procedure, but this is not necessary if your only goal is to configure East.
Since we are now sure that your machine (East) is operating full duplex, the calibration can be completed using your machine alone. Hook a cable from the output to the input, and hook up the voltmeter. An easy way to do this is to use the QA box (described in section 12.10), because it provides places to attach the voltmeter. We still want this to be a low-impedance connection, i.e. without the QA resistors in the circuit.
Run turbid -v -F 440. Using the voltmeter, measure the voltage. Adjust the output level using the -A ... option so that the voltage roughly equals the non-saturated value determined in section 12.3. Confirm this using the voltmeter reading and the stdev value. By way of illustration, suppose the amplitude is -8 and the resulting voltage is 200mV.
Restart turbid, passing it this value (turbid -v -A -8 -F 440 -V 0.200). If you are using a 40dB attentuator, measure the voltage at the high side, then divide by 100, and pass the lower number as the argument to -V. The program will print out the value of Qrep, the quantum of representation. This is a function of the voltage (as obtained from the voltmeter), the magnitude of the received signal, and the position of the least significant bit (LSB). For most 16-bit soundcards, the LSB is the 16th bit from the top. For the Delta-1010, the LSB is the 24th bit from the top.
We would like to know the quantum of significance, but we may not be able to measure it directly. Suppose the hardware makes a measurement, quantizes it, and then shifts it left 8 bits. Then by the time we see the data, the quantum of significance will be 256 times larger than the quantum of representation.
Experiment with different amplitudes (turbid -v -F 440 -A ... -V ...), pairing each amplitude with the corresponding observed voltage. The Q should come out the same each time, unless the amplitude is unduly large (saturation) or unduly small (noise pollution). Note that you will have to run turbid twice for each such observation: First you run it with a new -A value and no -V value, so you can observe the voltage with the voltmeter, and then, once you know the voltage, restart turbid passing it that -V value.
Restart turbid. Pass the Q value determined in section 12.5. The command might look something like turbid -v -A -8 -F 440 -V .200 -Q 1.31e-6. The program will display the quantity “Vin/Vmeter” which should be close to unity.
Now hook up the QA cable. From now on we will not be jumpering across the QA cable resistors; we are using them as a standard of comparison for calibrating the input impedance of the soundcard. Attach the voltmeter to the high side of the resistor, the side toward the output stage of the soundcard. (The low side will be measured by the input stage of the soundcard.)
Restart turbid, passing it the new voltmeter reading. This might be slightly higher than the previous reading (because the high impedance of the QA cable loads the soundcard output somewhat less than a direct connection to the soundcard input does), but not much higher (and certainly not lower).
Using this information, turbid will be able to determine the input impedance R of the soundcard port you are using.
Restart turbid, passing it this R value. The command should look something like turbid -v -A -8 -F 440 -V .200 -Q 1.31e-6 -R 9400. As a check, observe that the Vin/Vmeter/divider number is close to unity. (The Vin/Vmeter number itself will be small compared to unity, because there is a voltage divider between what the voltmeter is measuring and what the soundcard input is measuring.)
At this point turbid will also be displaying a number (Kout) which is basically the voltage (as would be measured by the voltmeter) corresponding to the internal representation of a certain “standard” sine wave.
Restart turbid, passing it this K value. From now on, we no longer need the voltmeter. You should erase the -V argument from the command line. We will rely on the calibrated output of the soundcard to complete the calibration of the input. This has the advantage that the soundcard can probably be trusted over a wider frequency range than the voltmeter can.
You can test this by changing the frequency of the calibrator. The command should look something like turbid -v -A -8 -F ... -Q 1.31e-6 -R 9400 -K .152. Verify that Vin/Vout/divider ratio remains near unity over a wide range of frequencies. It will start to drop off at high frequencies (above 15 or 20 kHz).
Restart turbid, passing it the option “-F BB”. Overall, command should look something like turbid -v -A -8 -F BB -Q 1.31e-6 -R 9400 -K .152. This does not actually emit a negative-frequency sine wave; instead it emist a broadband probe signal. The program will then print out a measurement of the bandwidth. See appendix D for an explanation of the measurement technique.
This completes the calibration process. See section 13 for a discussion of normal (non-calibration) operations.
Please send me <email@example.com> the .ctl file for your card, and the values for the -Q -R -B and -K settings. I will add it to the distribution, so that others may use that brand of cards without having fuss with calibration.
The QA circuit is used temporarily for calibration and used permanently for integrity checking. You don’t need a cable if you are neither interested in integrity checking, nor interested in calibrating your card (because you’ve been given a .ctl file for your mixer, along with the Q, R, and B values).
The following tools will be needed only temporarily, so if you don’t have them on hand you will have to buy or borrow them:
The easiest way to make a QA circuit is to build a box such as shown in figure 6. (If you are mass-producing things, you may want to build an integrity-monitoring cable with resistors built right into the cable. That is fine for integrity monitoring, but not so convenient for calibration. We will pursue that topic here.)
The following components will be needed, so if you don’t have them on hand you will have to go to an electronics-hobby store and buy them:
Figure 7 shows the circuit diagram for the QA box.
Here is the wirelist:
pin Jack1 Jack2 Jack3 signal tip X----R----X---------X (left) ring X----R----X---------X (right) shank X---------X---------X (ground)
where X denotes a soldered joint and R denotes a resistor.
To make a low-impedance connection between two cables, you plug one into Jack2 and the other into Jack3. To make a high-impedance connection between two cables, you plug one into Jack2 and the other into Jack1.
If you really want to economize, you can build a QA box without Jack3, since there are other ways of making a low-impedance connection. However, Jack3 is often convenient, so I recommend including it.
“Production mode” stands in contrast to “calibration mode”.
In production mode, you must provide values for Q, R, and B. No other parameters are needed, but there are a lot of options, as we now discuss.
To verify that turbid is working, you can look at the output, e.g. od -A n -t u1 -N 32 /dev/hrandom (or $HOME/dev/hrandom if you are running it as non-root). Also, turning up the verbosity level (turbid -v or turbid -vv) will display some information about the internal workings. The meaning of some of the dislayed items is documented in section 12.
If you are running a lot of legacy applications you might want to turn on the -f option as described in section 13.3. The program will write random bits to the output fifo, and will block waiting for somebody to read them. For example: turbid -Q 2.076e-07 -R 2800 -B 36200 -f. You can add the -d option if you want it to detach and run as a daemon.
The output of turbid is double-buffered, 512 bytes per buffer. The main reasons for this include:
Note that you have to be careful when using dd, because it unwisely counts blocks, not bytes. If you want to use dd to capture, say, a megabyte from /dev/random, it is advisable to use two copies of dd in a pipeline:
dd if=/dev/random obs=512 | dd bs=512 count=2k
The first dd performs reblocking while the second dd does the counting. A similar scheme is required at the output of turbid if you want to use dd with a blocksize that is not a divisior of 512 (and possibly under other conditions).
In code that you write yourself, no reblocking is required. You can keep out of trouble very easily, just by paying attention to the number of bytes actually returned by each read() request (which will sometimes be less than the number requested).
While we’re discussing buffering, note that the Linux /dev/random can buffer approximately 4096 bits, but you can’t trust ioctl(, RNDGETENTCNT, ) to tell you how much is really there. Sometimes when the ioctl reports 4096 bits, the buffer can be exhausted by reading as little as 342 bytes (2736 bits).
If you specify turbid -f, a subprocess will take some of the output and feeds it to /dev/random whenever the latter is running short of entropy. This means that applications such as GnuPG that rely on /dev/random will run nicely, without blocking.
Runtime efficiency would be improved by rewriting the applications to use /dev/hrandom directly, but this is not always convenient.
You should leave this feature turned off if you are not running as root, since /dev/random doesn’t trust deposits from non-root sources.
Also, you may want to leave it turned off under the following conditions: Suppose some application is consuming vast quantities of low-grade randomness from /dev/urandom (perhaps to obliterate a multi-gigabyte file or some such). Due to a misfeature of the Linux kernel random-number system, every byte you read from /dev/urandom takes one byte away from the pool of stored entropy in /dev/random. Therefore the feeder subprocess, in the attempt to keep /dev/random happy, will consume a great deal of entropy from /dev/hrandom, in competition with other processes that want to use /dev/hrandom.
As discussed in section 14, this problem is intrinsic to /dev/urandom. Avoid using /dev/urandom. The solution is to use /dev/srandom instead. See section 13.4.
The turbid program comes bundled with a Stretched Random Generator. Its entropy density is strictly greater than zero but less than 100%. To use it, read from /dev/srandom.
The SRNG seeds itself, and reseeds itself every so often, using entropy from the HRNG. Reading from /dev/srandom consumes entropy 10,000 times more sparingly than reading from /dev/hrandom or /dev/random or /dev/urandom. It is also less compute-intensive than /dev/urandom.
The principle of operation is as follows: In addition to the seeding/reseeding process, there is a state variable, a deterministic update function, and a hash function. Between reseedings, at each iteration we deterministically update the state variable and feed it to the hash function. The output of the hash function is the output of the SRNG.
The update function treats the state variable as three 32-bit registers. Each register has a random starting point. The first register is updated by adding a random odd addend. The second is a Linear Feedback Shift Register. The third is another LFSR, with a different period. (This design was chosen because each register can be separately tested, in contrast to one big 96-bit LFSR which would be harder to test.) Each register separately has a period on the order of 232 and combined they give a period on the order of 296. The period is billions upon billions of times longer than it needs to be, because after 10,000 iterations the state variable is strongly perturbed using 128 bits of pure entropy from the HRNG.
It is commonly assumed that having a super-long period is a hallmark of a good PRNG. However, that’s neither necessary nor sufficient. A 4000-bit LFSR has a period of 24000 or so, but it can easily be cryptanalyzed on the basis of only 4000 observations (unless it is protected by a strong one-way function installed between the LFSR and the observable output). It is necessary to have a period long compared to the interval between reseedings, but it is not necessary to make it much longer than that. In summary, if the period is long enough, making it longer confers no advantage.
Unlike the HRNG, the SRNG must assume that the hash function has the one-way property. That is, we assume an adversary (even after ∼10,000 observations of the output of the hash function) cannot infer anything useful about the state variable.
The defining property of a SRNG is that it has an average entropy density that is greater than zero but less than 100%. Let’s take a moment to discuss where in the output-stream this entropy is to be found. Suppose you have 1024 bits of entropy available for seeding your generator, which you then use to put out a very long output string.
For a discussion of the Fortuna class of random generators, and how that contrasts with turbid, see reference 26.
For a discussion of the Linux device driver for /dev/random and /dev/urandom, see reference 23.
There is a vast literature on hash functions in general. Reference 3 is a starting point. Reference 27 is a useful literature survey. Reference 28 was an influential early program for harvesting entropy from the audio system; it differs from turbid in not having calibration features. Reference 29 uses the physics of a spinning disk as a source of entropy.
Methods for removing bias from a coin-toss go back to von Neuman; see reference 30 and references therein; also reference 31.
Reference 32 suggests hashing 308 bits (biased 99% toward zero) and taking the low-order 5 bits of the hash to produce an unbiased result. This is correct as far as it goes, but it is inefficient, wasting 80% of the input entropy. Presumably that reflects an incomplete understanding of the hash-saturation principle, in particular the homogeneity property discussed in section 4.2, which would have led immediately to consideration of wider hashcodes. There is also no attempt to quantify how nearly unbiased the result is, the way table 1 does.
Thanks to Steve Bellovin, Paul Honig, and David Wagner for encouragement, incisive questions, and valuable suggestions.
Practical sources of randomness can be characterized in terms of the entropy density. The raw sources have an entropy density greater than zero and less than 100%. The methods disclosed here make it possible to produce an output that approaches 100% entropy density as closely as you wish, producing symbols that are random enough for any practical purpose. It is important to have a calculated lower bound on the entropy density. In contrast, statistical tests provide only an upper bound, which is not what we need.
It is possible to generate industrial-strength randomness at very low cost, for example by distilling the randomness that is present in ordinary audio interfaces.
In order to refine our notions of what is random and what is not, consider the following million-character strings. We start with
That seems distinctly non-random, very predictable, not very complex. Next, consider the string
That seems also non-random, very predictable, not very complex.
That is somewhat more interesting. The digits pass all known tests for randomness with one narrow exception (namely tests that check for digits of π). However, they must still be considered completely predictable. Finally, consider the two strings
The million-character string represented by example E5 is, as far as anybody knows, completely random and unpredictable. Example E4 is very similar, except that the letter “A” appears more often than it would by chance. This string is mostly unpredictable, but contains a small element of nonrandomness and predictability.
Following Solomonoff (reference 33 and reference 34) and Chaitin (reference 35) we quantify the surprisal of a string of symbols as follows:
Let z be a string of symbols. The elements of z are denoted zk and are drawn from some alphabet Z. The number of symbols in the alphabet is denoted #Z.
Let PC(z) be the probability that computer programs, when run on a given computer C, will print z and then halt. The surprisal is defined to be the negative logarithm of this probability, denoted $C(z) := − logPC(z). In this paper we choose to use base-2 logarithms, so that surprisal is measured in bits. Surprise value is also known as the unexpectedness. It is intimately related to entropy, as discussed below.
Note that we are using one probability distribution (the probability of choosing a computer program) to derive another (the probability PC(z) of a symbol string). To get this process started, we need to specify a method for choosing the programs. Better yet, using the notions of measure theory, we realize that probability need not involve choosing at all; instead all we really need is a method for assigning weight (i.e. measure) to each of the possible programs. Here is a simple method:5 Consider all possible programs of length exactly L* (measured in bits, since we are representing all programs as bit-strings). Assign each such program equal measure, namely 2−L*. Then test each one and count how many of them produce the desired output string z and then halt, when run on the given computer C. The measure of a string is the total measure of the set of programs that produce it. A string that can be produced by many different programs accumulates a large probability.
In this construction, a short program X (i.e. one which has a length L(X) less than L*) is not represented directly, but instead is represented by its children, i.e. programs formed from X by padding it with comments to reach the required length L*. There are at least 2L*−L(X) ways of doing the padding, and each way contributes 2−L* to the total measure. This means that a string z that can be produced by a short programs X will have a probability at least 2−L(X), no matter how large L* is. We take the limit as L* becomes very large and use this in the definition of PC(z).
Usually the probability PC(z) is dominated by (the children of) the shortest program that produces z. Therefore some people like to use the length of this shortest program as an estimate of the surprise value. This is often a good estimate, but it should not be taken as the definition.
The explicit dependence of PC(z) on the choice of computer C calls attention to an element of arbitrariness that is inherent in any definition of surprisal. Different people will assign different values to “the” surprisal, depending on what resources they have, and on what a priori information they have about the situation.
In an adversarial situation such as cryptography, we suggest that probabilities be defined in terms of the adversary’s computer. If you have multiple adversaries, they can be treated as a single adversary with a parallel computer.
In this section, for conciseness, we drop the subscript C ... but you should remember that the probabilities and related quantities are still implicitly dependent on the choice of C.
A relatively minor objection to this definition of surprisal is that PC(z) includes contributions from arbitrarily-long programs. That is a problem in theory, but in practice the sum is dominated by relatively short programs, and the limit converges quickly.
A much more serious objection is that even for modest-sized programs, the definition runs afoul of the halting problem. That is, there may well be programs that run for a million years withough halting, and we don’t know whether they would eventually produce string z and then halt. This means the surprisal, as defined here, is a formally uncomputable quantity.
We will duck both these issues, except for the following remarks.
In this section we shift attention from the unpredictability of strings to the unpredictability of the individual symbols making up the string.
Let us re-examine the examples given at the beginning of this section. Example E5 has surprisal $(z) very close to L(z) log(#Z). We classify strings of this sort as absolutely-random, by which we mean algorithmically-random.
Examples E1, E2, and E3 all have surprisal much less than their length. These strings are clearly not absolutely-random.
The interesting case is example E4. Intuitively, we think of this as “somewhat unpredictable” but more predictable than E5. To make this notion quantitative, we introduce the notion of surprise density. The following quantity tells us the surprise density of the string z in the region from i to j:
where Front(z,i) is the substring formed by taking the first i symbols of the string z.
The surprise density for examples E1, E2, and E3 is zero for any region not too near the beginning. The surprise density for example E5 is as large as it possibly could be, namely 100% of log(#Z). Example E4 illustrates the interesting case where the surprise density is quite a bit greater than zero, but not quite 100% of log(#Z).
As mentioned in section 1.2, we consider the unadorned term “random” to be ambiguous, because it means different things to different people. Some people think “random” should denote 100% surprise density, and anything less than that is “non-random” even if it contains a great deal of unpredictability. Other folks think that “random” refers to anything with an surprisal density greater than zero, and “non-random” means completely predictable. Yet other folks extend “random” to include pseudo-random strings, as long as they are “random enough” for some application of interest. Even some professionals in the field use the word this way; reference 36 and the more recent reference 37 speak of “deterministic random number generators”, although it could be argued that it is impossible in principle for a process to be both deterministic and random. The randomness of a PRNG comes from the seed. Calling a PRNG “deterministic” seriously understates the importance of the seed.
In the space of all possible strings, almost all strings are absolutely-random, i.e. are algorithmically-random, i.e. contain 100% surprise density. However, the strings we find in nature, and the strings we can easily describe, all have very little surprise density. We can summarize this by saying: “All strings are absolutely-random, except for the ones we know about”.
We can use similar ideas to describe PRNGs and contrast them with HRNGs.
|Most of modern cryptography revolves around notions of computational intractability. (I’m not saying that’s good or bad; it is what it is.||In contrast, there is another set of notions, including entropy and the related notion of unicity distance, that have got nothing to do with computational intractability.|
|The long strings produced by a good PRNG are conditionally compressible in the sense that we know there exists a shorter representation, but at the same time we believe them to be conditionally incompressible in the sense that the adversary has no feasible way of finding a shorter representation.||The long strings produced by a HRNG are unconditionally, absolutely incompressible. Specifically, the set of strings collectively cannot be compressed at all. As for each string individually, it is not compressible either, except possibly for trivially rare cases and/or trivial amounts of compression.|
In this section we climb down a bit from the formality of the previous section.
Let C be a machine that endlessly emits symbols from some alphabet Z. We assume the symbols are IID, that is, independent and identically distributed. That means we can calculate things on a symbol-by-symbol basis, without worrying about strings, the length of strings, or any of that. Let PC(i) denote the probability of the ith symbol in the alphabet.
Note: Nothing is ever exactly IID, but real soundcards are expected to be very nearly IID. At the least, we can say that they are expected to have very little memory. What we require for good random generation is a very mild subset of what is required for good audio performance.
The surprisal of a given symbol is
|$C(i) := log(1/PC(i)) (14)|
By extension we can talk about the probability of a string of symbols. Since the symbols are IID, the probability of a string z is:
where zk is the symbol in the kth position in the string. As an immediate corollary,
which just says that the surprisal is additive in a nice way.
The source entropy of the machine C is defined to be:
|PC(i) log(1/PC(i)) (17)|
Let’s be clear: The surprisal is a property of the symbol, while the entropy is a property of the source. In particular, the entropy is the appropriately-weighted average surprise value.
For a long string, we expect the surprisal of the string to converge to the entropy of the source times the length of the string.
Beware: In the literature it is common to see the word “entropy” used in places where surprisal would make more sense. In particular, the code and documentation for the Linux /dev/random talks about depleting the “entropy” in its buffer.
Let’s look again at the definition of entropy, equation 2. It is manifestly a single number that characterizes a certain average property of the distribution P. If you wanted a fully detailed description of the distribution, you would need to know Pi for all i (and even more information than that, unless the samples are IID).
As previously mentioned, the entropy S can be interpreted as the appropriately weighted average surprise-value.
Suppose we have a distribution so badly skewed that for some symbols, the surprisal is vastly smaller than the average surprisal. In this case, focussing on the average value – i.e. the entropy – is not going to solve all the world’s problems. The typical case may not be the same as the average case.
Specifically: for a skewed distribution, entropy is often not a good measure of cryptographic strength or resistance to cryptanalytic attack. You may care more about the minimum surprise value than about the average surprisal.
An example of a skewed distribution is shown in figure 8. The disk represents unit probability. We see that there is one symbol that monopolizes half the probability, and then there are N−1 other symbols that share the other half of the probability. The figure portrays the case where N=9. It is easy to show that the entropy for such a distribution is 1 + .5 log(N−1).
Here’s the basic idea for dealing with skewed distributions: When filling the buffer with raw data, we need to keep a running estimate of how much surprisal is in the buffer. Call this $′.
There are several variations on this basic scheme:
Additional things to keep in mind include:
The general rule is: Drawing a small sample from a highly-skewed distribution is not guaranteed to produce “average” behavior. Duh, no surprise there!
Again: the source entropy S in equation 2 is a single number that characterizes the “average” behavior of the source. For a small sample from a highly-skewed distribution, S doesn’t tell you everything you need to know. This has no bearing on the definition of entropy; entropy is always defined by equation 2. It just means that equation 2 by itself doesn’t solve all the world’s problems.
In practice, we don’t have too much of a problem with skewed distributions, for two reasons: First of all, the raw data from a typical soundcard is not terribly skewed. It could perhaps be skewed if the card is producing much less than 1 bit of surprisal per sample ... but such cards produce so little randomness that you ought to re-engineer the system to use a better card. Secondly, turbid is somewhat protected by the fact that the hash function has a reasonably-wide output word. By the time the input buffer contains enough randomness to saturate the hash function, the law of large numbers has taken hold sufficiently to wash out any small amount of skew that may be present. That is, the average surprisal (averaged over a bufferful of data) converges reasonably quickly to the source entropy.
This section is a collection of details about certain soundcards. (You can refer back to table 4 for a summary of the performance of a few soundcards. Also, section 12 provides step-by-step procedures that apply to “generic” soundcards.)
Soundcards generally include mixer functions. You have to initialize the mixer, to set the sliders on the features you want and/or to set the switches to bypass the features you don’t want. This is a bit of a black art, because the soundcards typically don’t come with any “principles of operation” documentation.
Graphical user interface (GUI) programs such as alsamixer may help you make some settings, but may not suffice to fully configure the mixer. All too often, the card has “routing switches” and other features that the GUI doesn’t know about. A reasonable procedure that gives complete control is to invoke “alsactl -f record.ctl store”, edit record.ctl, then “alsactl -f record.ctl restore”. Or you can use amixer to fiddle one setting without disturbing others.
The Intel High-Definition Audio spec (reference 38) describes the codec but says remarkably little about mixer functions.
This is yet another example of hype and deception from Creative Labs. The device itself, the box it comes in, and the advertising all prominently mention USB, 24-bit audio, and 96 kHz sampling rates. However, according to all available evidence, the device will not transmit 24-bit audio via USB to the DACs or from the ADCs (no matter what the data rate). It also will not transmit 96 kHz samples via USB to the DACs or from the ADCs (no matter what the word size).
Apparently the only way in which the Extigy can deal with 24 bits or 96 kHz is by shipping it digitally (e.g. via S/PDIF) to/from some other device.
It also has astonishingly poor input sensitivity. It’s about 5 times worse than a typical cheap sound card, and about 100 times worse than a decent 24-bit soundcard, as you can see from table 4.
The Extigy is no good for present purposes. I recommend you do not buy this product. Many other Creative Labs products have problems, too, so in fact I recommend that you never buy anything made by them.
This is a high-end system, with eight analog inputs, eight analog outputs, 96000 frames per second, and 24 bits of resolution. M-Audio was previously known as Midiman.
Each of the eight analog input and each of the eight analog outputs has a pushbutton that I call the “pro/con” switch. Professional studio equipment (pro) runs at lower impedances and lower signal levels than consumer-grade home audio equipment (con). That is, the pro setting (switch pushed in) means lower impedance and lower open-circuit voltage on output, and lower impedance and greater voltage-sensitivity on input. Whether you care about the lower impedance depends on the impedance of whatever the card is hooked to.
The card does not have an 8×8 or 10×10 mixer board. It is apparently 10×2. We bypass it, as discussed below.
The alsa driver always expects 10 channels of output: 8 analog plus 2 S/PDIF. If you are only interested in a subset thereof, you have to pad things so the buffer has 10 samples per frame anyway. Similarly, the alsa driver always produces 12 samples per frame on input: 8 analog plus 2 S/PDIF plus 2 from the mixer. For present purposes we use the 8 analog inputs and ignore the others, since they contain no additional entropy. Therefore you must pass the --channel-mask 255 option to turbid; otherwise the program will have less entropy than it thinks it has, by a factor of 8/12ths.
Each of the eight analog inputs is always routed directly to its own DAC, so the PCM input functionality is always unaffected by the sliders and switches on the mixer.
The routing can be set so that each of the eight analog outputs is fed directly from the PCM data. This is recommended. This makes the PCM output functionality independent of the sliders on the mixer.
Getting this to work requires the following tricks:
As usual, you will need the .ctl file (included in the turbid bundle) plus the settings in table 4.
The same jack doubles as a Line-In port and Mike-In port, depending on software settings. Unlike the typical Mike-In port as described in section 12.2, this Mike-In port is stereo and does not provide power to the microphone. This would be bad if you wanted to connect a microphone that needed power, but it’s ideal for our purposes. Basically it’s like an ordinary Line-In port, with higher sensitivity.
The principle of the bandwidth measurement is as follows: Equation 7 tells us the variance of the voltage right at the resistor. The quantity we actually measure has been amplified, according to some gain function G which depends on frequency. So a more complete description is
|4 k T R |G(f)|2 df (18)|
where the subscript o on σo indicates this is the quantity we observe, and fmax is some “large enough” frequency; the real, relevant bandwidth B is hidden in G, which goes to zero at large frequencies. Since G is presumably an even function of frequency, we can rewrite this as
|2 k T R |G(f)|2 df (19)|
and in a discrete (sampled) system this becomes
|2 k T R |G(k Δf)|2 Δf (20)|
where we identify the sampling frequency N = MΔf = 2 fmax, covering the whole range of positive and negative frequencies. We can write this in the suggestive form
|σo2 = 4 k T R G*2 B (21)|
where G*2 is some nominal mid-band power gain, if we identify
||G(k Δf)|2 Δf (22)|
and we call B the power-averaged bandwidth.
By definition the gain G is Vo / Vr, where Vr is the voltage right at the resistor, and Vo is what we observe.
This expression applies to any Vo/Vr ratio, whether Vr comes from thermal noise or from any other source.
We choose to inject a non-thermal probe signal Vr. It is constructed to be “white” – i.e. it has constant magnitude independent of frequency. That allows us to pull it outside the summation:
and by Parseval’s theorem we can rewrite that frequency-domain expression as a time-domain expression
It is easy to get a good lower bound on B. At compile time, we know the root-mean-square (RMS) magnitude of the internal representation of the probe signal; by construction it is -15dBFS. Then given K and R (and Zref) we know ⟨Vr2⟩ in the passband. At frequencies outside the passband this voltage will roll off a little, because of the anti-aliasing filter in the soundcard output stage. This will cause us to under-estimate B by a little, which is harmless. Finally we determine ⟨Vo2⟩ by summing the squares of the observed voltage samples and multiplying by the appropriate calibration factors (involving Q).
You might have thought based on the form of equation 18 that we would need to take a Fourier transform, but we don’t, because of Parseval’s theorem.
All of Volume I is available online:
Copyright © 2005 jsd