_ [Contents]

Copyright © 2005 jsd

High-Entropy Randomness Generator
 
John S. Denker

ABSTRACT: We discuss the principles of a High-Entropy Randomness Generator (also called a Hardware Random Number Generator, or True Random Number Generator) that is suitable for a wide range of applications, including cryptography, high-stakes gaming, and other highly adversarial applications. It harvests entropy from physical processes, and uses that entropy efficiently. The hash saturation principle is used to distill the data, resulting in virtually 100% entropy density. This is calculated, not statistically estimated, and is provably correct under mild assumptions. In contrast to a Pseudo-Random Number Generator, it has no internal state to worry about, and does not depend on unprovable assumptions about “one-way functions”. We also describe a low-cost high-performance implementation, using the computer’s audio I/O system.

Other keywords: TRNG, HRNG, hardware random number generator, uniform hash, unbiassed coin flip

For information about downloading the code and/or joining the discussion list, see http://www.av8n.com/turbid/

*   Contents

1  Introduction; Basic Notions

Although there exist garden-variety “Random Number Generators” that are suitable for garden-variety applications, they fail miserably in high-stakes adversarial applications. See section 2 for a discussion of the range of applications. Some of the things that can go wrong are discussed in section 9.1.

In this paper, we explain how to construct a High-Entropy Randomness Generator, suitable for a wide range of applications, including extremely demanding ones. We will explain and then use some key theoretical ideas:

We have implemented a generator using these principles. The result is what most people would call a True Random Number Generator. Salient engineering features include:

At the opposite extreme there stands the classic Pseudo-Randomness Generator (PRNG), which revolves around an internal state variable. The state variable must be initialized, which is often remarkably problematic. Then the state variable must be preserved and defended for all time, which is problematic, too. Furthermore, one must make some drastic, unproven assumptions, namely the existence of a one-way function that has long-term resistance to cryptanalytic attack. A PRNG has finite total entropy, so its output has asymptotically zero entropy density. The rand() or random() functions provided by typical computer languages are in fact Pseudo-Random.

Between these two extremes, we can have a Stretched Randomness Generator (SRNG) as discussed in section 13.4. This is a hybrid, with some internal state and some physical entropy-input. This may be able to produce symbols more copiously than a purely stateless system, while remaining highly resistant to cryptanalysis. It has an entropy density that is strictly greater than zero but less than 100%.

We consider the terms “random” and “non-random” to be clumsy and prone to misunderstanding. We prefer to discuss things in terms of entropy and entropy density, for the following reasons:

We will use the term random in an informal, heuristic sense, when precise meaning is not required. We will use the term entropy when we want to be precise.

2  Applications of Randomness Generators

Here is a list of typical applications where some degree of randomness is needed, followed by a discussion of the demands such applications place on the randomness generator:

  1. Monte Carlo integration, for solution of problems in physics, decision analysis, et cetera.
  2. Simulated annealing, e.g. for solution of combinatorial optimization problems.
  3. Assigning members to the “test” and “control” groups in a statistical study.
  4. Implementation of games and gambling, such as deciding the winner of a lottery.
  5. Decision-making for business, military, and gaming applications, such as choosing what lottery number to play.
  6. Cryptography.
  7. Generating seeds for Pseudo-Randomness Generators .

The first three items require random symbols with good statistical uniformity and independence, but do not usually require resistance against cryptanalytic attack. The molecules in a Monte Carlo simulation are not going to attack the Randomness Generator. Similarly, the shuffle used in a low-stakes game of Crazy Eights does not require a cryptographically strong RNG, since nobody is going to bother attacking it.

As the stakes get higher, the demands on the RNG get stricter. Implementing a multi-million dollar lottery requires the RNG to have excellent tamper-resistance as well as good statistical properties.

Game players need randomness, too. In the simple scissors/paper/stone game, choosing a move at random is the minimax strategy. More sophisticated games like poker require a mixed strategy, i.e. a mixture of deterministic and non-deterministic considerations. You must study the cards, study the other players, and sometimes make an unpredictable decision about whether to bluff or not.

Cryptography is far and away the predominant high-volume consumer of high-grade entropy. This is what motivated our search for better generators. Even a smallish secure-communications gateway can consume millions of bits in a one-minute period.

Item #7 requires special comment: Suppose you need to initialize the internal state in a PRNG. You can’t initialize it from that PRNG itself. If you derive the seed from another PRNG, we must ask how that was seeded. And so on. The only way to prevent an infinite regress is to use a hardware-based generator at some point. We conclude that there is really no such thing as a pure, all-software PRNG. That’s because directly or indirectly, each and every PRNG requires initialization from outside – that is, outside of the usual models of “computation”.

3  High-Entropy Randomness Generator

3.1  Overview

Figure 1 shows a block diagram of a High-Entropy Randomness Generator. The symbols it generates are random in the strictest sense.

hrng1
Figure 1: High-Entropy Randomness Generator

The starting point for an HERNG is the unpredicability associated with one or more physical processes, such as radioactive decay or thermal noise. Such physical processes are partially random, in the sense that they contain contributions which are believed to be extraordinarily difficult for anyone to predict, perturb, or replay.

The raw symbols (as sampled from the data-acquisition apparatus) will, in general, be partly predictable and partly unpredictable, as discussed in section 9.3. The raw symbols are essentially 100% resistant to the attacks that plague PRNGs. That’s because those attacks are directed toward capturing the internal state, and the HERNG doesn’t have any internal state worth capturing.

To quantify the amount of randomness in the raw samples, we use the concept of surprise value, defined as:

$(i) :=  log(1/Pi)              (1)

where the sum runs over all possible states of the object in question, and Pi is the probability of that state. For details, see appendix A.

Let’s consider an example. We read one byte of data from the data-acquisition system. There will be 256 possible states. Suppose that the data is almost entirely predictable:

Next, consider three bytes of data from the same source. The first byte has the value A1 or B1 (as before), the second byte is either A2 or B2, and the third byte is either A3 or B3. In each case there is 99-to-1 probability ratio, and the probabilities are independent. There are in principle 2563 states, but only eight states occur with any nonzero probability. You can easily enough calculate the surprise value in each case.

Continuing down this road, consider 2100 such bytes. There will be 22100 possible states that occur with nonzero probability. We can group them into 2101 classes, according to whether they contain 0, 1, ... 2100 copies of the secondary symbol. The most likely single class will be the one with 21 copies of B. This class (and every member of this class) will have a surprise value of 170 bits, which is just what we would expect, based on multiplying the source entropy by 2100.

The states containing 20 or 22 copies will be less likely, but only slightly less. Indeed, using the standard formula for the binomial distribution, we find there will be an 11% chance of getting 16 or fewer copies of B, which means an 11% chance that the surprise value will be 137 or less.

If, instead, we consider 5000 such bytes, there is only about one chance in a billion that the total surprise value will be less than 170 bits.

At this point we’ve got a substantial amount of surprise value, but the surprise density remains quite low, only 0.08 bits per byte. Our customers want our output to have a surprise density closer to 8 bits per byte, so we have some more work to do. This is the role of the hash function, as shown in figure 1.

Let us return for a moment to the case where only one byte has been received from the data-acquisition apparatus. We feed this data to the hash function. For definiteness, let’s use SHA-1 (reference 16) as the hash function. That is a fancy cryptologic hash function. (It is fancier than we really need, but it does the job.) It takes an arbitrarily long input and produces an output of width W = 160 bits. If we histogram the raw data, there are 256 states, of which two occur with nonzero probability. If we histogram the output of the hash function, there are 2160 states, of which two occur with nonzero probability.

The story is similar when there are three bytes of raw data. There are 8 states at the input to the hash function, which map to 8 states at the output. All of the surprise value present at the input is transported to the output, minus some infinitesimal allowance for the possibility that two inputs might produce the same hash-output (i.e. a hash collision).

The story changes slightly when we have 2100 bytes of raw data. The entropy density remains the same, so now we have Sin = 170 bits of entropy in the buffer, at the input to the hash function. There are 22100 states with nonzero probability, and 2170 states with appreciable probability. At the output, there cannot possibly be more than 2160 states, because the output is represented using 160 bits. There will be hash collisions ... lots of them. Each of the 2W output states will have on average 22100/2160 = 21940 inverse images, of which on average 2170 / 2160 = 210 occur with appreciable probability. In a moment we will clarify what “on average” means, but first let’s finish the calculation of the entropy of the hash-function output.

We histogram the output, which requires C = 2W cells in the histogram. Unlike the very sparsely-occupied histograms considered previously, this histogram will have nonzero occupation in every cell. There are 2170 input states we need to worry about, each carrying a snippet of probability, each measuring roughly 2−170. Each time we apply the hash function to such an input, that snippet of probability lands in one of the cells of the output-histogram. If by some miracle all these contributions were exactly evenly distributed, the entropy of the output register would be exactly 160 bits. But in fact some of the cells will accumulate a few more snippets than average, and some a few less.

To an excellent approximation, the probability that a given cell will contain exactly m snippets is given by the binomial distribution:

P(m | N,C) = 
N
m! (Nm)!
 (1/C)m (1−1/C)Nm              (2)

where C is the number of cells (log(C) = W = 160 in this example) and N is the total number of snippets deposited into those cells (log(N) = Sin = 170). We can verify that the normalization is correct:

N
m=0
 P(m | N,C) = 1              (3)

which conveys the unsurprising fact that in every cell there must be some number of snippets.

Now, to calculate the entropy, we need to apply equation 16, which requires summing over all cells in the output-histogram. It is convenient to re-arrange the sum as follows: we sort the cells into groups, according to their m value. We perform the sum group by group, taking into account the number of members in each group, namely C P(m | N,C) members in the group where the snippet-count is m. So the entropy is:

Sout(C,N) = −
N
m=0
 (m/N) log(m/NC P(m | N,C)               (4)

and we recall that log(C) is the width of the output register, and log(N) is the amount of entropy present in the input buffer.1

3.2  Saturation

Sout is quite a remarkable function. If N is small compared to C, then Sout just equals Sin. That is, all the input-entropy shows up as output-entropy. On the other hand if N is large compared to C, then Sout saturates and converges to W, the maximum entropy the output register can hold. This can be seen in the following table (using W = 160 bits in this example):

input entropyoutput entropy
SinSout
11
33
155154.97
160159.17
165159.98
170159.9993
Table 1: Hash Saturation

This is very nice. It means we need only a few “extra” bits of entropy in the input buffer, to smash the output up against the asymptote, giving the output virtually 100% entropy density.

Note that equation 4 has a useful homogeneity property, provided C is not unreasonably small: If you add k bits of width to the output register (W), and add k bits of entropy to the input buffer (Sin), then you will get k more bits in the output register (Sout). For example, if we add 100 to the numbers in the table above, we find that feeding 265 bits to a 260-bit hash function will produce 259.98 bits at the output.

To summarize the design: To produce a high entropy density we require:

4  Objectives and Non-Objectives

Generally speaking, our objective is to overcome, to the extent possible, the problems discussed in section 9.1. This section discusses what is possible and what is not possible.

So far, we have analyzed the HERNG mainly in terms of entropy density, denoted by ρ. We choose a criterion ρmin very close to 100%, and then build a generator that produces ρ > ρmin. We are not pretending to achieve ρ = 100% exactly.

But entropy density does not tell the whole story. For instance, consider a sequence of 8000 completely random bits followed by 1 completely predictable bit. That would have an entropy density of 99.99% (equivalent to 159.98 bits of entropy in a 160 bit word) on average. Despite that high average, the sequence would be unsuitable for many applications. For instance it would be unsuitable for use as a stream cipher.

In contrast, consider a 160-bit word containing 159.98 bits of entropy spread more-or-less evenly throughout the word. It would be very much harder for an adversary to exploit the fact that ρ is not quite 100%.

In the cryptology literature, there are two main types of arguments: information-theory arguments and computational-feasibility arguments. From an information-theory point of view, it is possible for your adversaries to break any block cipher (such as DES or AES) by searching the key-space. The amount of ciphertext they need to mount such an attack is given by the unicity distance, which is proportional to the key-length and inversely proportional to how much they know about the statistics of the plaintext. For example, just a few lines of ordinary (uncompressed) business correspondence grealy exceeds the unicity distance for commonly-used ciphers such as 3DES or AES-128, so an attack is possible in principle. Whether it is computationally feasible for your adversaries to mount such an attack is another question.

In contrast, information theory tells us that it is impossible to cryptanalyze a one-time pad system, if the pad is truly random. It absolutely does not matter how much computing power the adversariers bring to bear. The unicity distance is infinite.

So: When we say that the output of the HERNG has 159.98 bits of entropy spread throughout a 160-bit word, that is basically an information-theory statement, based on physics and statistics plus some mild assumptions about the mixing properties of the hash function. In contrast, to argue that your adversaries cannot exploit the deficit (the missing 0.02 bits of entropy) will presumably require a computational-feasibility argument that depends on the details of your application. Such application-specific arguments are beyond the scope of this paper, although we remark that using a HERNG ought to be much better than using some less-carefully designed RNG, and there are very many applications for which the HERNG seems more than good enough.

5  Hash Function Requirements

5.1  Some Counterexamples

In order for the HERNG to work, the hash function be reasonably well-behaved. As an example of what could go wrong, consider a few badly-behaved examples:

5.2  General Requirements

The hash function cannot possibly create entropy, and we do not need it to do so. Instead, what we need is a hash function that does not waste entropy through unnecessary hash collisions. In the unsaturated regime, there should be very few collisions, so any entropy present at the input is passed to the output without loss. In the saturated regime, when the output approaches 100% entropy density, there will necessarily be collisions and loss of entropy.

Hash functions are used in a wide range of applications. We start by considering how they are used in compilers: the input is a symbol-name, and the output is used as an index into the symbol table. It is often much more efficient to manipulate the hash than to manipulate the corresponding input string.

When two different inputs produce the same hash, it is called a hash collision. This is not desirable from the compiler’s point of view. So, the essential properties of a compiler-grade hash function are:

1)
It takes as input a variable-length string, and returns a fixed-length string, called the hash.
2)
It is a function; that is, the same input produces the same output every time.
3)
Vague anti-collision property: There should not be more hash collisions than necessary.

A cryptologic hash function has several additional properties:

4)
It is computationally infeasible to construct an input that produces a given hash.
5)
Cryptologic anti-collision property: It is computationally infeasible to construct two different inputs that produce the same hash.
6)
Noninvertibility: For any given hash, it is is computationally infeasible to find an input that generates that hash.

A typical application for a cryptologic hash is to compute a Hashed Message Authentication Code (HMAC), which can be used to detect and deter tampering with the message. See reference 4.

Property #6 (noninvertibility) only applies if there is a sufficient diversity of possible inputs. If the only possible inputs are “yes” and “no” then the adversary can easily search the input space; this is called a brute-force attack. When computing HMACs one prevents such a search by padding the message with a few hundred bits of entropy. The hash function itself has property #2 but not property #6, while the HMAC has property #6 but not property #2, because HMAC(message) = HASH(padding+message).

When building a High-Entropy Randomness Generator, we do not require noninvertibility at all. If the adversary knows the current hash-function output, we don’t care if the hash-function input can be inferred, because that has no bearing on any past or future outputs. The generator has no memory. This is one of the great attractions of this HERNG design, because nobody has ever found a function that is provably noninvertible.

In stark contrast, noninvertibility is crucial for any Pseudo-Randomness Generator. It is easy to arrange that the state variable (the input to the hash function) has plenty of bits, to prevent brute-force searching; that’s not the problem. The problem lies in initializing the state variable, and then defending it for all time. The defense rests on the unproven assumption that the hash function is noninvertible. What’s worse, the burden on the hash function is even heavier than it is in other applications: The hash function must not leak even partial information about its input. Otherwise, since the hash function is called repeatedly, the attacker would be able to gradually accumulate information about the generator’s internal state.

In discussions of this topic, the following notion often crops up:

7)
Strict avalanche2 property: whenever one input bit is changed, every output bit must change with probability ½. See reference 5.

This statement is not, alas, strict enough for our purposes. Indeed, WEAKHASH-1 and WEAKHASH-2 satisfy this so-called strict avalanche criterion. To be useful, one would need to say something about two-bit and multi-bit changes in the input. We dare not require that every change in the input cause a change in the output, because there is a simple pigeon-hole argument: There are V = 2562100 input states and only C = 2160 output states. There will be hash collisions; see section 3.2.

5.3  Constructing Uniform Hash Functions

To see what criteria are really needed, we will need a more sophisticated analysis. The most general representation of a hash function (indeed any function) is as a truth table. Table 2 shows a rather weak hash function.

Row Input Output
1 () (01)
2 (0) (10)
3 (1) (11)
4 (00) (00)
5 (01) (01)
6 (10) (10)
7 (11) (11)
8 (000) (00)
9 (001) (01)
10 (010) (10)
11 (011) (11)
12 (100) (00)
13 (101) (01)
14 (110) (10)
15 (111) (11)
Table 2: Truth Table: Weak Hash Function

The input symbols are delimited by parentheses to emphasize that they are variable-length strings; row 1 is the zero-length string. It should be obvious how to extend the table to cover arbitrary-length inputs. It should also be obvious how to generalize it to larger numbers of hashcodes.

The hashcodes in the Output column were constructed by cycling through all possible codes, in order. All possible codes occur, provided of course that there are enough rows, i.e. enough possible inputs. What’s more, all possible codes occur with nearly-uniform frequency, as nearly as can be. This construction produces exactly uniform frequency if and only if the number of rows is a multiple the number of possible codes, which we call the commensurate case. The example in the table is non-commensurate, so even though the codes are distributed as evenly as possible, complete evenness is not possible. In this example, the (00) code is slightly underrepresented.

Table 2 is a hash function in the sense that it maps arbitary-length inputs to fixed-length outputs, but it has very poor collision-resistance, and it has no vestige of one-wayness.

Starting from table 2 can construct a better hash function by permuting the rows in the Output column, as shown in table 3. If there are R rows in the table and C possible codes, there are V! / (V/C)!C distinct permutations. This is exact in the commensurate case, and a lower bound otherwise. This is a huge number in practical situations such as V = 2562100 and C = 2160.

Row Input Output
1 () (01)
2 (0) (10)
3 (1) (10)
4 (00) (00)
5 (01) (11)
6 (10) (01)
7 (11) (01)
8 (000) (11)
9 (001) (10)
10 (010) (00)
11 (011) (11)
12 (100) (00)
13 (101) (11)
14 (110) (01)
15 (111) (10)
Table 3: Truth Table: Improved Hash Function

If we temporarily assume a uniform probability measure on the set of all possible permutations, we can choose a permutation and use it to construct a hash function. Statistically speaking, almost all hash functions constructed in this way will have very good anti-collision properties and one-way properties (assuming V and C are reasonably large, so that it makes sense to make statistical statements).

This way of constructing hash functions is more practical than it might initially appear. Note that any cipher can be considered a permutation on the space of all possible texts of a given length. Hint: use a pigeon-hole argument: The ciphertext is the same length as the plaintext, and the fact that the cipher is decipherable guarantees that there is a one-to-one correspondence between ciphertexts and plaintexts.

This suggests a general method of constructing hash functions: Pad all intputs to some fixed length. Append a representation of the original length, so that we don’t have collisions between messages that differ only in the amount of padding required. Encipher. If it’s a block cipher, use appropriate chaining so that the resulting permutation isn’t a block-diagonal matrix. Pass the cipertext through a contraction function such as WEAKHASH-0 to produce a fixed-length output. That is:

Pad → Encipher → Contract                              (5)

For starters, this serves to answer a theoretical question: it is certainly possible for a hash function to generate all codes uniformly.

By the way, a hash constructed this way will have the one-way property, if the adversaries don’t know how to decipher your cipher. You could ensure this by using a public-key cipher and discarding the private key. If you don’t need the one-way property, you can use a symmetric cipher without bothering to keep the key secret.

Another way to add the one-way property, for instance for a PRNG or SRNG, is to tack on a one-way function downstream of the cipher-based hash function. This may be more computationally efficient than using asymmetric crypto. A cipher with a random key will suffice. If the one-way function requires a key, consider that to be part of the keying and re-keying requirements of the PRNG or SRNG.

In any case, keep in mind that turbid does not require its hash to be one-way.

We remark that many modern hash functions including SHA-1 follow the general plan PadScrambleContract which is rather similar to scheme 5. A key difference is that the scramble function is not a normal cipher, because it is not meant to be deciphered. The Feistel scheme is conspicuously absent. Therefore we can’t easily prove that it is really a permutation. The pigeon-hole argument doesn’t apply, and we can’t be 100% sure that all hashcodes are being generated uniformly.

5.4  Assigning Probability to the Inputs

In section 5.3 we temporarily assumed a uniform distribution on all possible truth-tables. That’s not quite realistic; the mixing performed by SHA-1 (or any other off-the-shelf hash function) is not the same as would be produced by a randomly-chosen truth table. We need to argue that it is, with virtual certainty, impossible for anyone to tell the difference, even if the adversaries have unlimited computing power.

Rather than just assuming that the hash function has the desired property, let’s look a little more closely at what is required, and how such a property can come about.

In section 5.3 each row in the Input column was treated pretty much the same as any other row. It is better to assign measure to each input-state according to the probability with which our data-acquisition system generates that state.

We then sort the truth-table according to the probability of each input. (This sort keeps rows intact, preserving the Input/Output relationship, unlike the permutation discussed previously which permuted one column relative to the others.) In the Input column we list all the states (all 2562100 of them) in descending order of probability. In the Output column we find the corresponding hash. This allows us to see the problem with WEAKHASH-1: Because we are imagining that the data is an IID sequence of samples, with much less than 100% entropy density, a huge preponderance of the high-probability data strings will have the same hash, because of the permutation-invariance. Certain hashcodes will be vastly over-represented near the top of the list.

So we require something much stronger than the so-called strict avalanche property. Mainly, we require that anything that is a symmetry or near-symmetry of the raw data must not be a symmetry of the hash function. If single-bit flips are likely in the raw data, then flipping a single bit should move the data-string into a new hashcode. If flipping a few bits at a time is likely, or permutations are likely, then such things should also move the data-string to a new hashcode.

The input buffer will typically contain some bits that carry little or no entropy. In the space of all possible truth tables, there must exist some hash functions that look at all the wrong bits. But these must be fantastically rare. Consider the cipher-based hash functions decribed previously. For almost any choice of cipher key, the function will permute the hashcodes so that a representative sample falls at the top of the list. The key need not be kept secret (since the adversaries have no power to change the statistics of the raw data).

Unless we are extraordinarily unlucky, the symmetries of natural physical processes are not symmetries of SHA-1.

Let us contrast how much a cryptologic hash function provides with how little we need:

A cryptologic hash function advertises that it is computationally infeasible for an adversary to unmix the hashcodes.   What we are asking is not really very special. We merely ask that the hashcodes in the Output column be well mixed.

A chosen-plaintext (chosen-input) attack will not discover inputs that produce hash collisions with any great probability.   We ask that the data acquisition system will not accidentally produce an input pattern that unmixes the hashcodes.

We believe that anything that makes a good pretense of being a cryptologic hash function is good enough for our purposes, with a wide margin of safety. If it resists attack when the adversary can choose the inputs, it presumably resists attack when the adversary can’t choose the inputs. See also section 5.5.

Note that we don’t care how much the adversary knows about the distribution from which the input samples are drawn. We just need to make sure that there is enough sample-to-sample variability. If there is some 60-cycle hum or other interference, even something injected by an adversary, that cannot reduce the variability (except in ultra-extreme cases).

Once noise is added to a signal, it is very hard to remove — as anyone who has ever tried to design a low-noise circuit can testify.

5.5  A Simple Test of the Hash Function

It is often suggested that we test whether the output of turbid is random, using packages such as Diehard (reference 6) and Maurer’s Universal Statistical Test (“MUST”, reference 7). We have several things to say in response:

1)
SHA-1 does pass those tests.
2)
SHA-1 passes a stricter test, the counter test described below.
3)
Being able to pass such a test is necessary but far from sufficient as a criterion for the correctness of the turbid program. See section 7.

We note in passing a trivial weakness: There is a wide category of hash functions (including SHA-1) that operate on blocks of input data, and are Markovian in the sense that they remember only a limited amount of information about previous blocks. It has been shown that any hash in this category will be more vulnerable to multicollisions than an ideal hash would be. However, this is a chosen-plaintext attack, and is not even remotely a threat to turbid.

SHA-1 has been cryptanalyzed by experts. See reference 20, reference 21, and references therein. To date, no attacks have been reported that raise even the slightest doubts about SHA-1’s suitability for use in turbid.

However, bowing to popular demand, we performed some tests, as follows: We constructed a 60-bit counter using the LSB from each of 60 32-bit words; the higher 31 bits of each word remained zero throughout. These sparsely-represented 60-bit numbers were used, one by one, as input to the hash function. The resulting hashcodes were taken as the output, analogous to the output of turbid. We then applied Diehard and MUST.

As expected, SHA-1 passed these tests.

In relative terms, we consider this counter-based approach to be in some ways stricter than testing the output of turbid under normal operating conditions, because the counter has zero entropy. It has a goodly number of permanently-stuck bits, and the input to the hash function changes by usually only one or two bits each time.

However, in absolute terms, we consider all such tests ludicrously lame, indicative of the weakness of the tests, not indicative of the strength of SHA-1. We don’t want turbid to be judged by such low standards. See section 7 for more on this.

6  Entropy of the Raw Data

6.1  Choice of Input Device

In principle, there are many physical devices that could be used as the entropy source. We prefer the audio system (the “soundcard”) because such things are cheap, widely available, and easy to calibrate. They produce entropy at a rate sufficient for most applications.

In contrast, a video frame-grabber card would produce entropy more copiously, but it would be much more costly. Calibration and quality-assurance monitoring would be much more difficult.

6.2  Johnson Noise

As foreshadowed above, we choose to focus on the Johnson noise. We use the term “noise” in the technical sense, as a synonym for “thermal fluctuations”. (We are not talking about “noise” in the acoustical sense. We do not hook up a microphone to the audio system. If we were using a video frame-grabber we would not hook up a camera. We rely on the thermal fluctuations of the electrons in the circuitry.)

Johnson noise is not the only source of noise in the system. It is not even the largest contribution. However, it has one great virtue: we can easily calculate its magnitude without having to know much about the innards of the system.

Johnson noise is nice white noise. Samples are independent and identically distributed (IID), according to a Gaussian normal distribution with zero mean and variance given by

σ2 = 4 k T   B   R              (6)

where k is Boltzmann’s constant, T is the temperature (300 K), B is the power-averaged bandwidth as defined in section 12.7 (typically a shade less than N/2, where N is the sampling frequency), and R is the relevant resistance. We recognize σ as the root-mean-square (RMS) voltage. Every resistor contributes additional noise; we focus on R = RIn, the real part of the input impedance.

Tangential remark: Equation 6 is the classical expression, valid for not-too-low temperatures and not-too-high frequencies. The criterion is ℏω ≪ kT, so at room temperature we are assuming the soundcard operates at frequencies small compared to 600 terahertz. This assumption is quite well-founded in practice.

At cryogenic temperatures and/or very high frequencies, equation 6 must be replaced by a fancier expression based on quantum statistical mechanics.

The Gaussian normal probability density is:

dP(V | µ, σ) = 
√(2πσ2)
 exp
−(V−µ)2 
2
  dV              (7)

where µ is the mean and σ is the standard deviation. The corresponding cumulative probability is:

P(V | µ, σ) =  ½ + ½ erf
V − µ 
σ √ 2
             (8)

The data acquisition system will quantize the voltage in bins of size Q. That converts the continuous probability density in equation 7 into a set of discrete probabilities, i.e. the amount of probability in each bin. The entropy per sample is then just a sum over bins, in accordance with equation 16. We sometimes calculate it approximately by

s ≈ log(σ / Q) + ½ log(2πe)              (9)

where the first term is fairly obvious from dimensional analysis – just ask how many bins fall under the high-probability part of the Gaussian. The second term is just a constant (2.05 bits). Equation 9 is accurate to a fraction of a percent (or better) whenever σ/Q is greater than 3.3. Otherwise it is more convenient and more accurate to calculate the entropy directly from the definition, by plugging equation 8 into equation 16.

However, we need to be a little bit careful. Think of the transfer function of the digitizer as a staircase. The worst-case scenario is when the Gaussian (representing the Johnson noise) is narrow and centered near the middle of a tread. Virtually all the probability lands in that one bin, so the entropy is virtually zero. In contrast, suppose we change µ so that the Gaussian exactly straddles one of the risers. Then half the probability lands in in one cell, and half in the neighboring cell. The entropy is one bit, no matter how narrow the Gaussian is. You might imagine the chance of the Gaussian straddling a riser is inversely proportional to σ, if you assume that all µ values are equally likely. But we are not going to assume that.

Remember, turbid is designed around provable lower bounds on the entropy. Since we cannot be certain where the Gaussian sits relative to the nearest riser, we assume the worst case, namely that it sits in the middle of a tread. In practice this doesn’t cost us much, because it produces a fairly tight lower bound. The bound stops being tight when σ/Q is quite small (less than 0.2 or so), which only happens when the soundcard has too little sensitivity for practical use.

Example: The Line-In port on the Thinkpad 600 is woefully lacking in sensitivity, and its impedance and bandwith are unimpressive also. As a consequence, it has a rather narrow Gaussian which could easily sit entirely (except for negligible wings) in the middle of a single tread, so we cannot derive a useful lower bound on the entropy. Although intuition suggests that the signal actually contains some entropy, we can’t prove it, so we give up on this port. We use the Mike-In port instead, which has vastly more sensitivity.

The key characteristics of various audio I/O systems are listed in the following table. The SB16 appears twice, for reasons discussed in section C.1.

Card Q / µV RIn / Ω Kout   B   N σ / µV s / bits ROut / Ω
Deskpro onboard 6.6  8800.1541850048000 1.630.3 
Delta-1010 (pro)  0.22 3020.22336500960001.284.65 192
Delta-1010 (con)  1.31110001.2436500960002.553.041005
Thinkpad cs4239 (Line) 19.6 7900.037 9000441001.070 
Thinkpad cs4239 (Mike)  1.6510100.038 7300441001.091.58 
SB16 (440Hz)  4.615400.0343600044100 2.91.57 
SB16 (10kHz)  3.215400.0341800044100 2.111.58 
SB64AWE  2.973000.040 4500441002.31.78 
Extigy (Line) 22.520200.32718000480002.47e-5 
Extigy (Mike) 12.4 6100.33212000480001.13e-7 
SB PC128 CT470011.7257191.48 4556441001.44e-4 
Table 4: Soundcard Properties

Note that the CT4700 reports itself as “ENS1370” so it uses the ens1370.ctl control file.

6.3  Lower Bounds

As mentioned before, we need a lower bound on the entropy per sample, and that is what we have just calculated. This lower bound is used to ensure that the hash function’s buffer contains “enough” entropy, as discussed in section 3.

Let us briefly explore the consequences of mis-estimating the entropy:

1a)
If we make a slight overestimate, it means that the hash function, rather than putting out 2160 different outputs, will put out some rather smaller ensemble. In mild cases this may not be noticeable in practice.
1b)
A major overestimate may allow an adversary to focus an attack on the few output-words being produced.
2a)
If we modestly underestimate the entropy content of the raw data, there is no serious harm done.
2b)
A major underestimate will reduce the total randomness production capability of the machine. This means the processes that consume random symbols may block, waiting for the HERNG to produce more entropy.

6.4  Other Contributions to the Audio Signal

We looked at the power spectrum of the signal produced by a number of different sound cards. Based on these observations, and on general engineering principles, we can list various hypotheses as to what physical processes could be at work, namely:

  1. Johnson noise, due to thermal fluctuations in the front-end input resistor.
  2. Johnson noise in other resistors and dissipative elements later in the processing chain.
  3. Shot noise and generation/recombination noise, arising in small structures because electrons are not infinitely small.
  4. 1/f noise.
  5. Hum at powerline frequency and harmonics thereof.
  6. Interference from other subsystems such as video cards and switching power supplies.
  7. Sums of the above.
  8. Intermodulation. For example, intermodulation occurs when we digitize a superposition of fairly-low-amplitude hum plus very-low-amplitude high-frequency noise. That’s because the digitizer is sensitive to the high-frequency noise component only when the hum component is near one of the digitizer steps.
  9. Various filter functions applied to the above.
  10. Deceptive signals injected by an adversary.

Some of these are undoubtedly fine sources of randomness. The only problem is that except for item #1, we cannot calculate their magnitude without a great deal more engineering knowledge about the innards of the card. Using these sources might produce a HERNG that is correct, but not provably correct. Since we want something that is provably correct, we simply ignore these other contributions. They can’t hurt, and almost certainly provide a huge margin of safety.

That is: We need randomness in the raw data, and we need to know a provable lower bound on the amount of randomness. Turbid consistently applies the principle that having extra randomness is OK, and underestimating the amount of randomness is OK. This may give us less-than-optimally efficient use of the available randomness, but it guarantees correctness. Turbid consistently chooses correctness over efficiency.

7  The Role of Measurement and Testing

7.1  Issues of Principle; Cat and Mouse

By way of analogy, let us consider the relationship between code-breakers and code-makers. This is a complex ever-changing contest. The code-breakers make progress now and then, but the code-makers make progress, too. It is like an “arms race” but much less symmetric, so we prefer the term cat-and-mouse game.

At present, by all accounts, cryptographers enjoy a very lopsided advantage in this game.

There is an analogous relationship between those who make PRNGs and those who offer tools to test for randomness. The PRNG has a hidden pattern. Supposedly the tester “wants” the pattern to be found, while the PRNG-maker doesn’t.

We remark that in addition to the various programs that bill themselves as randomness-testers, any off-the-shelf compression routine can be used as a test: If the data is compressible, it isn’t random.

To deepen our understanding of the testing issue, let’s consider the following scenario: Suppose I establish a web server that puts out pseudo-random bytes. The underlying PRNG is very simple, namely a counter strongly encrypted with a key of my choosing. Each weekend, I choose a new key and reveal the old key.

The funny thing about this scenario is the difference between last week’s PRNG and this week’s PRNG. Specifically: this week’s PRNG will pass any empirical tests for randomness that my adversary cares to run, while last week’s PRNG can easily be shown to be highly non-random, by means of a simple test involving the now-known key.

As a modification of the previous scenario, suppose that each weekend I release a set of one billion keys, such that the key to last week’s PRNG is somewhere in that set. In this scenario, last week’s PRNG can still be shown to be highly non-random, but the test will be very laborious, involving on the order of a billion decryptions.

Note that to be effective against this PRNG, the randomness-testing program will need to release a new version each week. Each version will need to contain the billions of keys needed for checking whether my PRNG-server is “random” or not. This makes a mockery of the idea that there could ever be an all-purpose randomness tester. Even a tester that claims to be “universal” cannot be universal in any practical sense.3 An all-purpose randomness tester would be tantamount to an automatic all-purpose encryption-breaking machine.

7.2  Necessary but Not Sufficient

To paraphrase Dijkstra: Measurement can prove the absence of entropy, but it cannot prove the presence of entropy. More specifically, a “randomness testing” program will give an upper bound on the entropy density of a randomness generator, but what we need is a lower bound, which is something else entirely.

In section 6.2 we calculated a lower bound on the entropy density, based on a few macroscopic physical properties of the hardware. It is entirely appropriate to measure these macroscopic properties, and to remeasure them occasionally to verify that the hardware hasn’t failed. This provides a lower bound on the entropy, which is vastly more valuable than a statistically-measured upper bound.

As discussed in section 5.5, tests such as Diehard and Maurer’s Universal Statistical Test are far from sufficient to prove the correctess of turbid. They provide upper bounds, whereas we need a lower bound.

When applied to the raw data (at the input to the hash function) such tests report nothing more than the obvious fact that the raw data is not 100% random. When applied to the output of the hash function, they are unable to find patterns, even when the entropy density is 0%, a far cry from the standard (100%) we have set for ourselves.

A related point: Suppose you suspect a hitherto-undetected weakness in turbid. There are only a few possibilities:

The foregoing is a special case of a more general rule: it is hard to discover a pattern unless you have a pretty good idea what you’re looking for. Anything that could automatically discover general patterns would be a universal automatic code-breaker, and there is no reason to believe such a thing will exist anytime soon.

There are some tests that make sense. For instance:

On the one hand, there is nothing wrong with making measurements, if you know what you’re doing. On the other hand, people have gotten into trouble, again and again, by measuring an upper bound on the entropy and mistaking it for a lower bound.

The raison d’etre of turbid is that it provides a reliable lower bound on the entropy density, relying on physics, not relying on empirical upper bounds.

7.3  Actual Measurement Results

We ran Maurer’s Universal Statistical Test a few times on the output of turbid. We also ran Diehard. No problems were detected. This is totally unsurprising, and it must be emphasized that we are not touting this a serious selling point for turbid; we hold turbid to an incomparably higher standard. As discussed in section 7.2, we consider it necessary but not sufficient for a high-entropy randomness generator to be able to pass such tests.

7.4  Summary

To summarize this subsection: At runtime, turbid makes specific checks for common failures. As discussed in section 7.3 occasionally but not routinely apply general-purpose tests to the output.

We believe non-specific tests are very unlikely to detect deficiencies in the raw data (except the grossest of deficiencies), because the hash function conceals a multitude of sins. You, the user, are welcome to apply whatever additional tests you like; who knows, you might catch an error.

8  Error Detection versus Error Concealment

8.1  Basics

We must consider the possibility that something might go wrong with our high entropy randomness generator. For example, the front-end transistor in the sound card might get damaged, losing its gain (partially or completely). Or there could be a hardware or software bug in the computer that performs that hashing and control functions. We start by considering the case where the failure is detected. (The other case is considered in section 8.2.)

At this point, there are two major options:

We can ornament either of those options by printing an error message somewhere, but experience indicates that such error messages tend to be ignored.

If the throttling option is selected, you might want to have multiple independent generators, so that if one is down, you can rely on the other(s).

The choice of throttling versus concealment depends on the application. There are plenty of high-grade applications where concealment would not be appropriate, since it is tantamount to hoping that your adversary won’t notice the degradation of your randomness generator.

In any case, there needs to be an option for turning off error concealment, since it interferes with measurement and testing as described in section 7.

8.2  Combining Generators

We can also try to defend against undetected flaws in the system. Someone could make a cryptanalytic breakthrough, revealing a flaw in the hash algorithm. Or there could be a hardware failure that is undetected by our quality-assurance system.

One option would be to build two independent instances of the generator (using different hardware and different hash algorithms) and combine the outputs. The combining function could be yet another hash function, or something simple like XOR.

8.3  “Whitening” Considered Unhelpful

Figure 1 can be simplified to

Source → Digitizer → Hash                               (10)

It has been repeatedly suggested that we should add an additional “whitening” step, resulting in

Source → Digitizer → Hash → Whitener              (11)

using a cipher such as DES as the whitener.

We think such whitening schemes are a bad idea.

Specifically: There are only a few possibilities:

In general, people are accustomed to achieving reliability using the belt-and-suspenders approach, but that only works if the contributions are in parallel, not in series. A chain is only as strong as its weakest link.

In this case, it is a fallacy to think that a whitener can compensate for a weakness in the hash function. As an extreme example, consider the simplest possible hash function, one that just computes the parity of its input. This qualifies as a hash function, because it takes an arbitrary-length input and produces a constant-length output. Now suppose the data-acquisition system produces an endless sequence of symbols with even parity. The symbols have lots of variability, lots of entropy, but always even parity. The output of the hash function is an endless sequence of zeros. That’s not very random. You can run that through DES or any other whitener and it won’t get more random.

As a less extreme example, consider the WEAKHASH-2 function described in section 5.2. The whitener can conceal the obvious problem that all hashcodes have odd parity, but it cannot remedy the fundamental weakness that half the codes are going unused. The problem, no matter how well concealed, is still present at the output of the whitener: half the codes are going unused.

This is easy to understand in terms of entropy: once the entropy is lost, it cannot be recreated.

It is sometimes suggested that there may exist relatively-simple hash functions that have good mixing properties but require a whitener to provide other cryptologic strengths such as the one-way property. We respond by pointing out that the HERNG does not require the one-way property. We recognize that a Pseudo-Randomness Generator is abjectly dependent on a one-way function to protect its secret internal state, but the HERNG has no secret internal state.

We recognize that the HERNG may have some internal state that we don’t know about, such as a slowly-drifting offset in the data-acquisition system. But we don’t care. The correctness of the final output depends on the variability in the raw data, via the hash-saturation principle. As long as there is enough unpredictable variability in the raw data, the presence of additional variability (predictable or not) is harmless.

As a practical matter, since SHA-1 is remarkably computationally efficient, it is hard to motivate the search for a “simpler” hash functions, especially if they require whitening or other post-processing.

Finally, we point out that the last two steps in scheme 11 are backwards compared to scheme 5. If you really want whiteness, i.e. good statistical properties, it is better to put the crypto upstream of the contraction.

9  Threats Against Randomness Generators

9.1  Threats to Pseudo-Random Number Generators

For some applications, a well-designed PRNG may be good enough. However, for really demanding applications, at some point you might well throw up your hands and decide that implementing a good HERNG is easier.

A typical PRNG, once it has been seeded, depends on a state variable, a deterministic update function, and some sort of cryptologic one-way function. This allows us to classify typical PRNG attacks and failures into the following categories:

  1. Improper seeding, i.e. internal state never properly initialized.
  2. Leakage of the internal state over time. This rarely involves direct cryptanalytic attack on the one-way function, leading to leakage through the PRNG’s output channel. More commonly it involves side-channels.
  3. Improper stretching of limited entropy supplies, i.e. improper reseeding of the PRNG, and other re-use of things that ought not be re-used.
  4. Bad side effects.
  5. Gross defects, such as outputs that are somewhat predictable even if you don’t know the internal state.

The historical record contains many examples of failed PRNGs. For example:

The following scenario serves to illustrate a few more of the ideas enumerated above: Suppose you are building a network appliance. It has no mouse, no keyboard, no disk, and no other easy sources of randomness (unless you use the audio system, or something similar, as described in this paper). You want to sell millions of units. They start out identical, except possibly that each one has a network interface with a unique MAC address.

Seeding the PRNG with the MAC address is grossly inadequate. So the only way to have a halfway acceptable PRNG is to configure each unit by depositing into it a unique PRNG state vector. This means that each unit needs a goodly amount of writable but non-volatile storage; it can’t execute out of ROM and volatile RAM alone. Also, the stored state-vector needs to be updated from time to time; otherwise every time the machine is rebooted it will re-use the exact same numbers as last time, which would be an example of failure failure 3. Note that the standard Linux distributions only update the stored seed when the system is given a shutdown command – not if it goes down due to sudden power failure or software crash – which is unacceptable. You have to protect the state vector for all time. In particular, if a rogue employee sneaks a peek at the state vector (either on the machine itself, or on a backup tape, or whatever) and sells it to the adversaries, they can break all the traffic to and from the affected unit(s) from then on, which would be an example of failure failure 2. All these problems can be dealt with, but the cost might be so high that you give up on the PRNG and use a HERNG instead.

A subtle type of improper reseeding or improper stretching (failure 3) is pointed out in reference 17. If you have a source of entropy with a small but nonzero rate, you may be tempted to stir the entropy into the internal state of the PRNG as often as you can, whenever a small amount of entropy (ΔS) becomes available. This alas leaves you open to a track-and-hold attack. The problem is that if the adversaries had captured the previous state, they can capture the new state with only 2ΔS work by brute-force search, which is infinitesimal compared to brute-force capture of a new state from scratch. So you ought to accumulate quite a few bits, and then stir them in all at once (“quantized reseeding”). If the source of entropy is very weak, this may lead to an unacceptable interval between reseedings, which means, once again, that you may be in the market for a HERNG with plenty of throughput, as described in this paper.

The Linux kernel random number generator /dev/random (reference 22) accumulates entropy in a pool and extracts it using a hash function. It is associated with /dev/urandom which is the same, but becomes a PRNG when the pool entropy goes to zero. Therefore in some sense /dev/urandom can be considered a stretched randomness generator, but it has the nasty property of using up all the available entropy from /dev/random before it starts doing any stretching. Therefore /dev/urandom provides an example of bad side effects (failure 4). Until the pool entropy goes to zero, every byte read from either /dev/random or /dev/urandom takes 8 bits from the pool. That means that programs that want to read modest amounts of high-grade randomness from /dev/random cannot coexist with programs reading large amounts of lesser-grade randomness from /dev/urandom (such as a disk-wipe operation). In contrast, the stretched randomness generator described in this paper is much better behaved, in that it doesn’t gobble up more entropy than it needs. See section 10, section 13.2, section 13.3, and section 13.4.

It is certainly possible for PRNG failures to be found by direct analysis of the PRNG output, for example by using statistical tools such as reference 7.

More commonly, though, PRNGs are broken by attacking the seed-storage and the seed-generation process. Here are some examples:

  a) On a system that relies on /dev/urandom in the absence of sources of real entropy, the attackers capture a backup tape containing the /var/lib/.../random-seed file. Then they provoke a crash and restart, perhaps by interrupting the power. Then they know the output of /dev/urandom for all time thereafter.
  b) A related point: Suppose we are running a high-stakes lottery. Who provided the original seed for the randomness generator we will use? Even assuming (!) we can protect the seed for all times greater than t=0, where did the t=0 seed come from? If you provided it, how do I know you didn’t keep a copy?

If a PRNG contains N bits of internal state, it must repeat itself with a period no longer than 2N. Obviously, N must be large enough to ensure that repetition never occurs in practical situations. However, although that is necessary, it is far from sufficient, and making the period longer is not necessarily a good way to make the PRNG stronger. By way of example: A 4000-bit Linear Feedback Shift Register (LFSR) has a period of 24000 or so, which is a tremendously long period ... but the LFSR can easily be cryptanalyzed on the basis of only 4000 observations (unless it is protected by a strong one-way function installed between the LFSR and the observable output). For a Stretched Randomness Generator (section 13.4), it is necessary to have a period long compared to the interval between reseedings, but it is not necessary to make it much longer than that. So, for a SRNG, a huge period is neither necessary nor sufficient. For a PRNG that is not being reseeded, a huge period is necessary but not sufficient.

This is worth emphasizing: One key difference between a genuinely entropic randomness generator and a pseudo-randomness generator is that for the PRNG you have to worry about where you get the initial seed, how you recover the seed after a crash/restart, and how you protect the seed for all time, including protecting your backup tapes. For the HERNG you don’t.

9.2  Threats to the High-Entropy Randomness Generator

Attacks against the HERNG are very different from the usual sort of attack against a PRNG (such as are discussed in section 9.1).

The question is sometimes asked whether thermal noise is really “fundamentally” random. Well, it depends. Obviously, the magnitude of thermal noise depends on temperature, and theoretically an adversary could reduce the temperature of your computer to the point where the HERNG calibration (section 12) was no longer valid. In contrast, there are other processes, such as radioactive decay and quantum zero-point motion, that are insensitive to temperature under ordinary terrestrial conditions. This makes thermal noise in some sense “less fundamental”, but it’s not worth worrying about. If somebody can get close enough to your computer to radically change the temperature, attacks against the HERNG are the least of your worries. There are other attacks that are easier to mount and harder to detect.

This is a special case of a more-general observation: The most-relevant attacks against the HERNG don’t attack the fundamental physics; they attack later stages in the data acquisition chain, as we now discuss.

Suppose you choose radioactive decay, on the theory that it is a “more fundamentally” random process. The decay process is useless without some sort of data acquisition apparatus, and the apparatus will never be perfect. The detector will presumably have live-time / dead-time limitations and other nonidealities. An adversary can interfere with the data acquisition even if the fundamental decay process is beyond reach. Last but not least, the resulting Poisson distribution will not match the desired flat distribution (all symbols equally likely) that corresponds to the desired 100% entropy density. Therefore the acquired signal will not be one whit more useful than a signal derived from thermal noise. The same sort of postprocessing will be required.

Similar remarks apply to all possible sources of hardware randomness: we do not expect or require that the processes will be 100% random; we merely require that they contain some randomness.

The most significant threat to the HERNG is the possibility of gross failure. A transistor could burn out, whereupon the signal that contained half a bit of entropy per sample yesterday might contain practically nothing today. Or there could be a bug in the software.

Even if everything is working as designed, we have to be careful if the noise is small relative to Q, the the quantum of significance of the digitizer, i.e. the voltage represented by least-significant bit. Note that the quantum of significance Q is not necessarily the same as the quantum of representation QR. For instance, the Delta-1010 uses a 32-bit representation with 24 bits of significance. As discussed in section 6.2, we avoid problems by simply assuming the worst-case scenario; then we know there’s nothing an adversary can do to make things worse.

A more subtle threat arises if the digitizer has “missing codes” or “skimpy codes”. A soundcard that is advertised as having a 24-bit digitizer might really only have a 23-bit digitizer. Most customers wouldn’t notice.

An adversary could mount an active attack. The obvious brute-force approach would be to use a high-power electromagnetic signal at audio frequencies (VLF radio), in the hopes that it would couple to the circuits in the sound card. However, any signal of reasonable size would just add linearly to the normal signal, with no effect on the process of harvesting entropy.

A truly huge signal might overload and saturate the audio input amplifier. This would be a problem for us, because thermal noise would have no effect on an already-saturated output. That is, the small-signal gain would be impaired. An attack of this kind would be instantly detectable. Also, it is likely that if attackers could get close enough to your computer to inject a signal of this magnitude, they could mount a more direct attack. Remember, if we are using a 24-bit soundcard we have almost 16 bits of headroom (a factor of 10,000 or so) between the normal signal levels and the saturation level. I consider an attack that attempts to overload the amplifier so implausible that I didn’t write any code to detect it, but if you feel motivated, you could easily write some code to look at the data as it comes off the data-acquisition system and verify that it is not saturated. Having 16 bits (or more) of resolution is a great luxury. (It would be quite tricky to verify correct operation on other systems that try to get by with only one-bit or two-bit digitizers.)

Another possibility is interference originating within the computer, such as a switching power supply or a video card, that inadvertently produces a signal that drives the sound card into saturation. This is unlikely, unless something is badly broken. Such interference would cause unacceptable performance in ordinary audio applications, at levels very much lower than we care about. A decent soundcard is, by design, well shielded against all known types of interference. That is, a soundcard must provide good performance at low audio levels (little if any discernible additive noise, no matter what interference sources are present) and also good performance at high audio levels (little if any saturation). The possibility that an interference source that is normally so small as to be inaudible would suddenly become so large as to saturate a 24-bit ADC seems quite insignificant, unless there is a gross hardware failure. The odds here are not dominated by the statisics of the noise processes; they are dominated by the MTBF of your hardware.

Even if there is no interference, it may be that the sound card, or the I/O bus, radiates some signal that is correlated with the raw data that is being provided to the program. However, if you are worried about this sort of passive attack you need to be worried about compromising emanations (TEMPEST) from other parts of your system also. See reference 8. There is no reason to believe the audio hardware or HERNG software unduly increases your vulnerability in this area.

The least-fundamental threats are probably the most important in practice. As an example in this category, consider the possibility that the generator is running on a multiuser machine, and some user might (inadvertently or otherwise) change the mixer gain. To prevent this, we went to a lot of trouble to patch the ALSA system so that we can open the mixer device in “exclusive” mode, so that nobody else can write to it.

9.3  Partially Independent Samples

We do not need the samples to be completely independent. All we need is some independence.

Here’s an example that may help explain how this works.

Suppose we start with a deck of cards that has been randomly shuffled. It can provide log2(52!) bits of entropy. That’s a little more than 225 bits. Now suppose we have ten decks of cards all arranged alike. You could set this up by shuffling one of them and then stacking the others to match ... or by some more symmetric process. In any case the result is symmetric w.r.t interchange of decks. In this situation, we can choose any one of the decks and obtain 225 bits of entropy. The funny thing is that if we choose N of the decks, we still get only 225 bits of entropy, not N×225.

We can graph the amount of entropy we get, as a function of the number of decks we look at, as shown in figure 2.

non-extensive
Figure 2: Entropy is Non-Extensive

This can be summarized by saying that entropy is not an extensive quantity in this situation.

The spooky aspect of this situation is the whack-a-mole aspect: You cannot decide in advance which one of the decks has entropy and which N−1 of them do not. That’s the wrong question. The first deck we choose to look at has 225 bits of entropy, and only then can we say that the other N−1 decks have zero additional entropy.

It is possible for a source to be partially dependent and partially independent. For example, suppose we take each of the ten aforementioned decks and “cut the deck” randomly and independently. “Cutting” means applying one of the 52 different full-length permutations. Now, the first deck we look at will provide 225 bits of entropy, and each one thereafter will provide 5.7 bits of additional entropy, since log2(52)=5.7. So in this situation, each deck can be trusted to provide 5.7 bits of entropy.

In this situation, requiring each deck to have no dependence on the others would be an overly strict requirement. We do not need full independence; we just need some independence, as quantified by the provable lower bound on the entropy. To repeat: we don’t need to quantify how much dependence there is; we just need to quantify how much independence there is. In our example, there is provably at least 5.7 bits of usable entropy per deck.

If you wanted, you could do a deeper analysis of this example, taking into account the fact that the entropy we are relying on, 5.7 bits, is not the whole story. We can safely use 5.7 bits as a lower bound, but under some conditions more entropy is available, as can be quantified by considering the joint probability distribution and computing the entropy of that distribution. Meanwhile the fact remains that we are not required to extract every last bit of available randomness. Under a wide range of practical conditions, it makes sense to engineer a randomness generator based on provable lower bounds, since that is good enough to get the job done, and a deeper analysis would not be worth the trouble.

10  Magnitude of Supply and Demand for Entropy

For a typical personal workstation, there is only a modest demand for high-grade randomness. Uses include:

For such uses, I/O timing typically provides a perfectly adequate supply of entropy. The Linux kernel random number generator is an example: it collects a few bits per second from timing mouse, keyboard, and disk events.

Servers are more problematic than workstations. Often they have a much greater demand for entropy, and less supply. An example is a server which terminates hundreds of IPsec tunnels. During start-up it requires more than a hundred thousand high-quality random bits in a short time, which is far more than can be buffered in /dev/random, and far more than can be collected by the kernel in the available time.

An even worse situation arises in small networked controllers, which have no mouse, no keyboard, no disk, nor any other good sources of entropy. If we want to establish secure IPsec or ssh connections to such boxes, we simply must come up with a source of entropy.

At this point, one often hears the suggestion that we send the box some new entropy over the network. This is not a complete solution, because although it helps defend against certain threats, it leaves others undefended. One problem is that if security is compromised once (perhaps because somebody stole the PRNG seed) then security is compromised more-or-less forever, because the adversary can read the network traffic that carries the new entropy.

The sound card can produce quite a bit of entropy. Even a low-end sound card can produce 44,000 bits per second. A Delta-1010 (which has a higher sample rate, higher resolution, and more channels) can produce a couple million bits per second.

11  Installation

  1. Unpack the ALSA distribution (reference 25), including drivers, libraries, and utilities. ALSA version 0.9 is satisfactory; version 0.5 is not. Apply the patch in turbid/src/excl.patch, so that the mixer device can be opened in “exclusive” mode. To do that, cd to the directory in which you unpacked the turbid and alsa packages, and then incant something like patch -p1 < turbid/src/excl.patch. Then configure, make, and install the ALSA stuff (drivers, libraries, utilities). If you have never run ALSA on your system, you will need to run snddevices once. It may take some fussing to get the proper entries to describe your soundcard in /etc/modules.conf. Install the utils/alsasound script in /etc/rc.d/init.d/.
  2. Test the sound system. Verify that you can play a generic .wav file using aplay.
  3. Compile turbid. This should require nothing more than untarring the distribution and typing make in the turbid/src directory.
  4. If you want to run turbid as non-root, make dirs to create the needed directories.
  5. Test it and calibrate it as discussed in the next section. You don’t need to be root in order to compile and test turbid.
  6. Send me <jsd@av8n.com> the resulting .ctl file, and the values for the -Q -R -B and -K settings. I will add it to the distribution, so that others may use that brand of cards without having fuss with calibration.
  7. You can (if you want) become root and do a make install.
  8. If you want turbid to be started and stopped automatically according to the system runlevel, you need to install a few things by hand. There is a file turbid/src/init.d/turbid which you can use as a model for something to put in to /etc/rc.d/init.d/ (but don’t forget to edit the options, according to the calibration, as described below). This is not installed automatically; you have to do it by hand. Similarly you need to install by hand the symlinks in /etc/rc.d/rc?.d/.

If turbid runs as non-root, it will open its output FIFOs in your $HOME/dev/ directory, look for its control files in your $HOME/etc/turbid/ directory, and scribble a .pid file in your $HOME/var/run/ directory.

If turbid runs as root, it will use the system /dev/ and /etc/turbid/ /var/run/ directories.

12  Configuration and Calibration

It is remarkably easy to mis-configure a soundcard in such a way that it produces much less entropy than it otherwise would. This section outlines the general procedure; see appendix C for details on specific soundcards.

Note that we use the term soundcard loosely, to apply not just to individual cards per se, but also to audio I/O systems embedded on the mainboard, and to external pods connected via USB or whatever.

You can skip down to section 12.4 if you’ve been given a working mixer configuration (.ctl) file for your soundcard. You can skip all the way down to section 13 if you have the configuration file and you know the R-value, Q-value, and B-value for your card (input impedance, digitizer quantum, and bandwidth, perhaps from table 4).

12.1  Configuring Audio Output

You might want to run turbid -h to see what files it wants to use. In particular, take a look at the default for the -m option, which tells you what the default ALSA device is. Make sure this looks sensible. If it says “unknown” you’ve got a problem, probably due to hardware not installed or driver not loaded.

Now the fun begins. Fire up turbid -m "" -F 440 to invoke the calibration feature. It plays a chord: By default the left channel is Concert A (440 Hz) and the right channel is a perfect 4th above that (a factor of 1.3333).

Plug an ordinary stereo audio cable into the soundcard. In order of preference, you want to plug into one of the following ports: Line-Out, Headphone-Out, or perhaps Speaker-Out.

At the other end of the cable, use the voltmeter to measure the voltage of the chosen channel relative to ground (i.e. the shield). If you have made a QA box (section 12.9) it provides convenient places to clip onto ... but make sure the resistors are not in series with the voltmeter.

Select the AC Volts mode on the voltmeter. Adjust the mixer gain (using e.g. alsamixer) and/or restart turbid with an adjusted calibrator amplitude (turbid -m "" -F 440 -A ...) so that the voltage is around 0.2 volts RMS. If desired, save the mixer settings to a temporary file using alsactl, just to leave some breadcrumbs in the forest.

There are some obvious advantages and some subtle risks associated with using a speaker-system or headphones during this phase. One advantage is that it provides a convenient, immediate, qualitative indication of the output level, not requiring you to look at the voltmeter. Another advantage is that you can detect saturation (clipping) by its nasty sound. Alas, there are disadvantages because different speakers, headphones, and preamps have impedances varying over a huge range, from a few Ohms to a few dozen kOhms. (Contrast this with the voltmeter impedance, several MegOhms, which is high enough that it will never cause problems.) A speaker that works as desired when connected to the Speaker-Out jack might cause problems when connected to the Line-Out jack, and the problems might be subtle: even if it qualitatively works it might greatly perturb the signal level. So if you use a speaker-system as a qualitative check, once things are qualitatively working you should disconnect it and repeat the entire calibration process, starting from scratch, using just the voltmeter.

12.2  Configuring Audio Input

The first step is to make sure audio input is working on your soundcard. This temporarily requires a well-behaved test signal. Your options include:

Run a cable from the Line-Out port on West to the appropriate input port on East. If there is a Line-In port, start with that. We can come back and explore the other input ports (e.g. Mike-In) later. We want this to be a low-impedance connection, i.e. we do not want QA resistors (section 12.9) in series with the connection at this time.

Beware: although some machines have a stereo microphone input, others have a mono stereo input where you expect the left channel to be, i.e. the tip of the connector, while the ring of the connector is a power-supply output.

Use the voltmeter to verify that a reasonable-sized signal is present at the input to East.

On East, fire up a mixer-control program (e.g. alsamixer) in one window and fire up turbid in another. Select verbose mode, and turn off the calibrator-output; that is, don’t specify a -F option. You will also need to use the -m "" option so that turbid won’t monopolize the mixer device. The command should look something like turbid -v -m "".

Observe the “stdev” reading in dB. Adjust the mixer to enable the chosen input port as the capture source, and turn the associated gain sliders all the way up. This is important! Make sure every element in the input-chain is turned all the way up.

Check whether the stdev readings are saturated, as follows: Turn down the amplitude on the signal generator (West) by a few dB (turbid -F 550 -I 1.2 -A -5) or maybe many dB (turbid -F 550 -I 1.2 -A -25) and see if the stdev readings (on East) drop by the same number of dBs. Explore the range of amplitudes to find the point where saturation sets in. Set the amplitude so that the reading is 10 dB or so below saturation, and leave it there.

Do not cure saturation by lowering the gain of the input channel. Instead, just lower the amplitude of the applied signal.

If you are not using the Mike-In port, skip to section 12.3. On most consumer-grade soundcards, the Mike-In port is peculiar.4 It is mono. It records from what you would think of as the left channel. The wire that would have carried the right channel is present, but it is not an input; it supplies power from the soundcard to the microphone. The nominal open-circuit voltage is 5V and the source impedance is a couple of kOhm, so the short-circuit current is about 2.5mA. You probably shouldn’t connect an ordinary Line-Out signal to a Mike-In port. That drives 5 volts backwards into the Line-Out port. It probably won’t blow out the port, but it still seems unsavory.

The risk of damage is even greater if you connect the Mike-In port of the soundcard to the Line-In port of your stereo. (This is easier to do than you might think, e.g. if you are using a Y-connector to monitor some signal using your stereo while simultaneously presenting that signal to the Mike-In port.) This puts a huge DC signal on the right channel of your speakers, and could quite easily burn out the woofer. Even if you use a custom cable with the right channel open-circuited, there will likely be an unsavory transient whenever the plug is partway inserted into the Mike-In jack, so you might want to insert that end first (and remove it last). All in all, the Mike-In port is an appallingly bad design. It should have provision for a second channel, and it should be shaped differently from other connectors to prevent the 5V power from going astray.

If the Mike-In signal has an AGC (Automatic Gain Control) feature that can be turned off, turn it off (using alsactl or whatever).

12.3  Configuring Full Duplex

We need to get input and output working at the same time, not interfering with each other. This is called full duplex operation. To see whether output is working, play something using the play command, or the aplay command, or (best of all) turbid -F 440. Then hook a cable to the output of your soundcard (on East) and observe the voltage with the voltmeter (or, for initial qualitative checks, hook up a speaker or headphones).

Most likely, you will have to fiddle with mixer settings (using alsamixer) to get this to work. The real trick is to get output working without messing up the input. The three central requirements are:

When it is all working, you can produce an output and measure an input, using e.g. turbid -v -F 440.

Virtually all soundcards are capable of full duplex, but it is sometimes tricky to get it set up. A major problem is that mixers (such as alsamixer) don’t understand what you are trying to do. Suppose you have chosen to record from the Line-In channel. You want to mute that channel; otherwise the input signal will be routed directly to the output channel. (Such direct routing would be useful if we wanted an analog monitor of the input signal, but that’s not what we want.) The problem is that when you mute this channel, the mixer disables it for input, which is not what we want, either.

Here’s a typical workaround. The alsactl program is much more powerful than the alsamixer program. Set up the card for recording (as described in section 12.2) even if it isn’t full duplex yet. Save the settings in a file, using e.g. alsactl -f record.ctl store. Then mute the input channel. Save the settings in a different file, e.g. using alsactl -f play.ctl store. Compare the two files using diff –side-by-side record.ctl play.ctl. Typically you will find two things that have changed: one having to do with playback and one having to do with capture. Create a third file, duplex.ctl by copying one of the files and editing it by hand to incorporate the best features of the other file. Apply it using alsactl -f duplex.ctl restore. See if it works, using a command like turbid -v -m duplex.ctl ....

Once it works, rename duplex.ctl to a name appropriate to the sound device driver you’re using. Get the appropriate filename from turbid -h | grep mixer. Put it in your $HOME/etc/turbid/ directory, since that’s where the program looks for such things.

To check that full-duplex is working, check that on your machine (East) the command turbid -v -F 440 measures the same input voltage (stdev) as the command turbid -v. That is, the amplitude on the other machine (West) should be controlling the observed stdev signal on your machine, independent of what your machine is outputting.

Conversely, while running the command turbid -v -F 440 on your machine, observe the output level of your machine (using the voltmeter). Make sure that starting and stopping the input signal (coming from the other machine) has no effect on the output of your machine.

At this point, you may find it convenient to configure West for full-duplex operation also, by following the mirror-image procedure, but this is not necessary if your only goal is to configure East.

Now you can give back the other machine (West).

12.4  Calilbrating Input Sensitivity

Since we are now sure that your machine (East) is operating full duplex, the calibration can be completed using your machine alone. Hook a cable from the output to the input, and hook up the voltmeter. An easy way to do this is to use the QA box (described in section 12.9), because it provides places to attach the voltmeter. We still want this to be a low-impedance connection, i.e. without the QA resistors in the circuit.

Run turbid -v -F 440. Using the voltmeter, measure the voltage. Adjust the output level using the -A ... option so that the voltage roughly equals the non-saturated value determined in section 12.2. Confirm this using the voltmeter reading and the stdev value. By way of illustration, suppose the amplitude is -8 and the resulting voltage is 200mV.

Restart turbid, passing it this value (turbid -v -A -8 -F 440 -V 0.200). The program will print out the value of Q, the quantum of significance. This is a function of the voltage (as obtained from the voltmeter), the magnitude of the received signal, and the position of the least significant bit (LSB). For most 16-bit soundcards, the LSB is the 16th bit from the top. For the Delta-1010, the LSB is the 24th bit from the top.

Experiment with different amplitudes (turbid -v -F 440 -A ... -V ...), pairing each amplitude with the corresponding observed voltage. The Q should come out the same each time, unless the amplitude is unduly large (saturation) or unduly small (noise pollution). Note that you will have to run turbid twice for each such observation: First you run it with a new -A value and no -V value, so you can observe the voltage with the voltmeter, and then, once you know the voltage, restart turbid passing it that -V value.

12.5  Calibrating Input Impedance

Restart turbid. Pass the Q value determined in section 12.4. The command might look something like turbid -v -A -8 -F 440 -V .200 -Q 1.31e-6. The program will display the quantity “Vin/Vmeter” which should be close to unity.

Now hook up the QA cable. From now on we will not be jumpering across the QA cable resistors; we are using them as a standard of comparison for calibrating the input impedance of the soundcard. Attach the voltmeter to the high side of the resistor, the side toward the output stage of the soundcard. (The low side will be measured by the input stage of the soundcard.)

Restart turbid, passing it the new voltmeter reading. This might be slightly higher than the previous reading (because the high impedance of the QA cable loads the soundcard output somewhat less than a direct connection to the soundcard input does), but not much higher (and certainly not lower).

Using this information, turbid will be able to determine the input impedance R of the soundcard port you are using.

Restart turbid, passing it this R value. The command should look something like turbid -v -A -8 -F 440 -V .200 -Q 1.31e-6 -R 9400. As a check, observe that the Vin/Vmeter/divider number is close to unity. (The Vin/Vmeter number itself will be small compared to unity, because there is a voltage divider between what the voltmeter is measuring and what the soundcard input is measuring.)

12.6  Calibrating the Output Scale Factor

At this point turbid will also be displaying a number (Kout) which is basically the voltage (as would be measured by the voltmeter) corresponding to the internal representation of a certain “standard” sine wave.

Restart turbid, passing it this K value. From now on, we no longer need the voltmeter (or the -V argument); we will rely on the calibrated output of the soundcard to complete the calibration of the input. This has the advantage that the soundcard can probably be trusted over a wider frequency range than the voltmeter can.

You can test this by changing the frequency of the calibrator. The command should look something like turbid -v -A -8 -F ... -Q 1.31e-6 -R 9400 -K .152. Verify that Vin/Vout/divider ratio remains near unity over a wide range of frequencies. It will start to drop off at high frequencies (above 15 or 20 kHz).

12.7  Measuring the Bandwidth

Restart turbid, passing it “-1” instead of a frequency. The command should look something like turbid -v -A -8 -F -1 -Q 1.31e-6 -R 9400 -K .152. This will emit a broadband probe signal, rather than a sine wave. The program will then print out a measurement of the bandwidth. See appendix D for an explanation of the measurement technique.

This completes the calibration process. See section 13 for a discussion of normal (non-calibration) operations.

12.8  File a Report

Please send me <jsd@av8n.com> the .ctl file for your card, and the values for the -Q -R -B and -K settings. I will add it to the distribution, so that others may use that brand of cards without having fuss with calibration.

12.9  How to Make a Quality-Assurance Circuit

The QA circuit is used temporarily for calibration and used permanently for integrity checking. You don’t need a cable if you are neither interested in integrity checking, nor interested in calibrating your card (because you’ve been given a .ctl file for your mixer, along with the Q, R, and B values).

The following tools will be needed only temporarily, so if you don’t have them on hand you will have to buy or borrow them:

The easiest way to make a QA circuit is to build a box such as shown in figure 3. (If you are mass-producing things, you may want to build an integrity-monitoring cable with resistors built right into the cable. That is fine for integrity monitoring, but not so convenient for calibration. We will pursue that topic here.)

box20
Figure 3: QA Circuit Box

The following components will be needed, so if you don’t have them on hand you will have to go to an electronics-hobby store and buy them:

Here is the circuit diagram:

  pin     Jack1     Jack2     Jack3   signal

  tip       X----R----X---------X     (left)

  ring      X----R----X---------X     (right)

  shank     X---------X---------X     (ground)

where X denotes a soldered joint and R denotes a resistor.

To make a low-impedance connection between two cables, you plug one into Jack2 and the other into Jack3. To make a high-impedance connection between two cables, you plug one into Jack2 and the other into Jack1.

If you really want to economize, you can build a QA box without Jack3, since there are other ways of making a low-impedance connection. But I find Jack3 to be convenient.

13  Normal “Production” Operation

“Production mode” stands in contrast to “calibration mode”.

In production mode, you must provide values for Q, R, and B. No other parameters are needed, but there are a lot of options, as we now discuss.

To verify that turbid is working, you can look at the output, e.g. od -A n -t u1 -N 32 /dev/hrandom (or $HOME/dev/hrandom if you are running it as non-root). Also, turning up the verbosity level (turbid -v or turbid -vv) will display some information about the internal workings. The meaning of some of the dislayed items is documented in section 12.

13.1  Feeding the Kernel

If you are running a lot of legacy applications you might want to turn on the -f option as described in section 13.3. The program will write random bits to the output fifo, and will block waiting for somebody to read them. For example: turbid -Q 2.076e-07 -R 2800 -B 36200 -f. You can add the -d option if you want it to detach and run as a daemon.

13.2  Buffering

The output of turbid is double-buffered, 512 bytes per buffer. The main reasons for this include:

Note that you have to be careful when using dd, because it unwisely counts blocks, not bytes. If you want to use dd to capture, say, a megabyte from /dev/random, it is advisable to use two copies of dd in a pipeline:

dd if=/dev/random obs=512 | dd bs=512 count=2k

The first dd performs reblocking while the second dd does the counting. A similar scheme is required at the output of turbid if you want to use dd with a blocksize that is not a divisior of 512 (and possibly under other conditions).

In code that you write yourself, no reblocking is required. You can keep out of trouble very easily, just by paying attention to the number of bytes actually returned by each read() request (which will sometimes be less than the number requested).

While we’re discussing buffering, note that the Linux /dev/random can buffer approximately 4096 bits, but you can’t trust ioctl(, RNDGETENTCNT, ) to tell you how much is really there. Sometimes when the ioctl reports 4096 bits, the buffer can be exhausted by reading as little as 342 bytes (2736 bits).

13.3  Feeding the Kernel RNG

If you specify turbid -f, a subprocess will take some of the output and feeds it to /dev/random whenever the latter is running short of entropy. This means that applications such as GnuPG that rely on /dev/random will run nicely, without blocking.

Runtime efficiency would be improved by rewriting the applications to use /dev/hrandom directly, but this is not always convenient.

You should leave this feature turned off if you are not running as root, since /dev/random doesn’t trust deposits from non-root sources.

Also, you may want to leave it turned off under the following conditions: Suppose some application is consuming vast quantities of low-grade randomness from /dev/urandom (perhaps to obliterate a multi-gigabyte file or some such). Due to a misfeature of the Linux kernel random-number system, every byte you read from /dev/urandom takes one byte away from the pool of stored entropy in /dev/random. Therefore the feeder subprocess, in the attempt to keep /dev/random happy, will consume a great deal of entropy from /dev/hrandom, in competition with other processes that want to use /dev/hrandom.

As discussed in section 14, this problem is intrinsic to /dev/urandom. Avoid using /dev/urandom. The solution is to use /dev/srandom instead. See section 13.4.

13.4  Stretched Randomness Generator

The turbid program comes bundled with a Stretched Randomness Generator. Its entropy density is strictly greater than zero but less than 100%. To use it, read from /dev/srandom.

The SRNG seeds itself, and reseeds itself every so often, using entropy from the HERNG. Reading from /dev/srandom consumes entropy 10,000 times more sparingly than reading from /dev/hrandom or /dev/random or /dev/urandom. It is also less compute-intensive than /dev/urandom.

The principle of operation is as follows: In addition to the seeding/reseeding process, there is a state variable, a deterministic update function, and a hash function. Between reseedings, at each iteration we deterministically update the state variable and feed it to the hash function. The output of the hash function is the output of the SRNG.

The update function treats the state variable as three 32-bit registers. Each register has a random starting point. The first register is updated by adding a random odd addend. The second is a Linear Feedback Shift Register. The third is another LFSR, with a different period. (This design was chosen because each register can be separately tested, in contrast to one big 96-bit LFSR which would be harder to test.) Each register separately has a period on the order of 232 and combined they give a period on the order of 296. The period is billions upon billions of times longer than it needs to be, because after 10,000 iterations the state variable is strongly perturbed using 128 bits of pure entropy from the HERNG.

It is commonly assumed that having a super-long period is a hallmark of a good PRNG. But that’s neither necessary nor sufficient. A 4000-bit LFSR has a period of 24000 or so, but it can easily be cryptanalyzed on the basis of only 4000 observations (unless it is protected by a strong one-way function installed between the LFSR and the observable output). It is necessary to have a period long compared to the interval between reseedings, but it is not necessary to make it much longer than that. In summary, if the period is long enough, making it longer confers no advantage.

Unlike the HERNG, the SRNG must assume that the hash function has the one-way property. That is, we assume an adversary (even after ∼10,000 observations of the output of the hash function) cannot infer anything useful about the state variable.

The defining property of a SRNG is that it has an average entropy density that is greater than zero but less than 100%. Let’s take a moment to discuss where in the output-stream this entropy is to be found. Suppose you have 1024 bits of entropy available for seeding your generator, which you then use to put out a very long output string.

14  History, Context, and Acknowledgements

There is a vast literature on hash functions in general. Reference 5 is a starting point. Methods for removing bias from a coin-toss go back to von Neuman; see reference 23 and references therein; also reference 24.

Reference 19 suggests hashing 308 bits (biassed 99% toward zero) and taking the low-order 5 bits of the hash to produce an unbiassed result. This is correct as far as it goes, but it is inefficient, wasting 80% of the input entropy. Presumably that reflects an incomplete understanding of the hash-saturation principle, in particular the homogeneity property discussed in section 3.2, which would have led immediately to consideration of wider hashcodes. There is also no attempt to quantify how nearly unbiassed the result is, the way table 1 does.

We thank Steve Bellovin, Paul Honig, and David Wagner for encouragement, incisive questions, and valuable suggestions.

15  Conclusions

There is no reason to believe that anything in this world is completely random. The degree of randomness can be quantified in terms of the entropy density. The methods disclosed here make it possible to approach 100% entropy density as closely as you wish, producing symbols that are random enough for any earthly practical purpose. It is important to have a calculated lower bound on the entropy density. In contrast, measurement and testing can only hope to provide an upper bound on the entropy density.

It is possible to generate industrial-strength randomness at very low cost, for example by distilling the randomness that is present in ordinary audio interfaces.

A  The Definition of Randomness and Surprise Value

A.1  Introduction and Intuitive Examples

In order to refine our notions of what is random and what is not, consider the following million-character strings. We start with

Example E1
: “xxxxxxxx...xx”      (a million copies of “x”)

That seems distinctly non-random, very predictable, not very complex. Next, consider the string

Example E2
: “xyxyxyxy...xy”      (half a million copies of “xy”)

That seems also non-random, very predictable, not very complex.

Example E3
: “31415926...99”      (the first million decimal digits of π/10)

That is somewhat more interesting. The digits pass all known tests for randomness with one narrow exception (namely tests that check for digits of π). However, they must still be considered completely predictable. Finally, consider the two strings

Example E4
: “AukA1sVA...A5”     
Example E5
: “Aukq1sVN...G5”     

The million-character string represented by example E5 is, as far as anybody knows, completely random and unpredictable. Example E4 is very similar, except that the letter “A” appears more often than it would by chance. This string is mostly unpredictable, but contains a small element of nonrandomness and predictability.

A.2  Definitions

Following Solomonoff (reference 9 and reference 10) and Chaitin (reference 11) we quantify the surprise value of a string of symbols as follows:

Let z be a string of symbols. The elements of z are denoted zk and are drawn from some alphabet Z. The number of symbols in the alphabet is denoted #Z.

Let PC(z) be the probability that computer programs, when run on a given computer C, will print z and then halt. The surprise value is defined to be the negative logarithm of this probability, denoted $C(z) := − logPC(z). In this paper we choose to use base-2 logarithms, so that surprise value is measured in bits. Surprise value is also known as the unexpectedness. It is intimately related to entropy, as discussed below.

Note that we are using one probability distribution (the probability of choosing a computer program) to derive another (the probability PC(z) of a symbol string). To get this process started, we need to specify a method for choosing the programs. Better yet, using the notions of measure theory, we realize that probability need not involve choosing at all; instead all we really need is a method for assigning weight (i.e. measure) to each of the possible programs. Here is a simple method:5 Consider all possible programs of length exactly L* (measured in bits, since we are representing all programs as bit-strings). Assign each such program equal measure, namely 2L*. Then test each one and count how many of them produce the desired output string z and then halt, when run on the given computer C. The measure of a string is the total measure of the set of programs that produce it. A string that can be produced by many different programs accumulates a large probability.

In this construction, a short program X (i.e. one which has a length L(X) less than L*) is not represented directly, but instead is represented by its children, i.e. programs formed from X by padding it with comments to reach the required length L*. There are at least 2L*L(X) ways of doing the padding, and each way contributes 2L* to the total measure. This means that a string z that can be produced by a short programs X will have a probability at least 2L(X), no matter how large L* is. We take the limit as L* becomes very large and use this in the definition of PC(z).

Usually the probability PC(z) is dominated by (the children of) the shortest program that produces z. Therefore some people like to use the length of this shortest program as an estimate of the surprise value. This is often a good estimate, but it should not be taken as the definition.

The explicit dependence of PC(z) on the choice of computer C calls attention to an element of arbitrariness that is inherent in any definition of surprise value. Different people will assign different values to “the” surprise value, depending on what resources they have, and on what a priori information they have about the situation.

In an adversarial situation such as cryptography, we suggest that probabilities be defined in terms of the adversary’s computer. If you have multiple adversaries, they can be treated as a single adversary with a parallel computer.

A.3  Properties of the Surprise Value

In this section, for conciseness, we drop the subscript C ... but you should remember that the probabilities and related quantities are still implicitly dependent on the choice of C.

Upper bound:   If z is a string of bits, the surprise value $(z) can never exceed the length L(z) by more than a small amount, because we can write a program that contains z and just prints it verbatim. More generally, if z is a string of symbols, $(z) can never exceed L(z) log(#Z) by more than a small amount. Again, that’s because we can write a program that contains a literal representation of z and just prints it verbatim.
Tighter Upper Bound:   The surprise value cannot exceed the length of the compressed representation6 of z by more than a bounded amount, because we can write a program that contains the compressed representation and calls the uncompress utility.
Lower Bound:   With very few exceptions, the surprise value $(z) of any string z cannot be not much less than L(z) log(#Z). You can see why this must be so, using a simple pigeon-hole argument: Consider all bit-strings of length 500, and suppose 1% of them have entropy less then 490 bits. Well, we are talking about 1% of 2500 strings, each having probability 2−490 — which adds up to more than 100% probability. Oops.

A.4  Limitations

A relatively minor objection to this definition of surprise value is that PC(z) includes contributions from arbitrarily-long programs. That is a problem in theory, but in practice the sum is dominated by relatively short programs, and the limit converges quickly.

A much more serious objection is that even for modest-sized programs, the definition runs afoul of the halting problem. That is, there may well be programs that run for a million years withough halting, and we don’t know whether they would eventually produce string z and then halt. This means the surprise value, as defined here, is a formally uncomputable quantity.

We will duck both these issues, except for the following remarks.

A.5  Surprise Density

In this section we shift attention from the unpredictability of strings to the unpredictability of the individual symbols making up the string.

Let us re-examine the examples given at the beginning of this section. Example E5 has surprise value $(z) very close to L(z) log(#Z). We classify strings of this sort as absolutely-random, by which we mean algorithmically-random.

Examples E1, E2, and E3 all have surprise value much less than their length. These strings are clearly not absolutely-random.

The interesting case is example E4. Intuitively, we think of this as “somewhat unpredictable” but more predictable than E5. To make this notion quantitative, we introduce the notion of surprise density. The following quantity tells us the surprise density of the string z in the region from i to j:

σj|i(z) := 
$(Front(zj)) − $(Front(z,i))
ji
             (12)

where Front(z,i) is the substring formed by taking the first i symbols of the string z.

The surprise density for examples E1, E2, and E3 is zero for any region not too near the beginning. The surprise density for example E5 is as large as it possibly could be, namely 100% of log(#Z). Example E4 illustrates the interesting case where the surprise density is quite a bit greater than zero, but not quite 100% of log(#Z).

As mentioned in the introduction, we consider the unadorned term “random” to be ambiguous, because it means different things to different people. Some people think “random” should denote 100% surprise density, and anything less than that is “non-random” even if it contains a great deal of unpredictability. Other folks think that “random” refers to anything with an surprise density greater than zero, and “non-random” means completely predictable. Yet other folks extend “random” to include pseudo-random strings, as long as they are “random enough” for some application of interest. Even some professionals in the field use the word this way; NIST FIPS-140 speaks of “deterministic random number generators”.

In the space of all possible strings, almost all strings are absolutely-random, i.e. are algorithmically-random, i.e. contain 100% surprise density. However, the strings we find in nature, and the strings we can easily describe, all have very little surprise density. We can summarize this by saying: “All strings are absolutely-random, except for the ones we know about”.

We can use similar ideas to describe PRNGs and contrast them with HERNGs.

Most of modern cryptography revolves around notions of computational intractability. (I’m not saying that’s good or bad; it is what it is.   In contrast, there is another set of notions, including entropy and the related notion of unicity distance, that have got nothing to do with computational intractability.

The long strings produced by a good PRNG are conditionally compressible in the sense that we know there exists a shorter representation, but at the same time we believe them to be conditionally incompressible in the sense that the adversary has no feasible way of finding a shorter representation.   The long strings produced by a HERNG are unconditionally, absolutely incompressible. Specifically, the set of strings collectively cannot be compressed at all. As for each string individually, it is not compressible either, except possibly for trivially rare cases and/or trivial amounts of compression.

B  Entropy

B.1  Basics

In this section we climb down a bit from the formality of the previous section.

Let C be a machine that endlessly emits symbols from some alphabet Z. We assume the symbols are IID, that is, independent and identically distributed. That means we can calculate things on a symbol-by-symbol basis, without worrying about strings, the length of strings, or any of that. Let PC(i) denote the probability of the ith symbol in the alphabet.

Note: Nothing is ever exactly IID, but real soundcards are expected to be very nearly IID. At the least, we can say that they are expected to have very little memory. What we require for good randomness generation is a very mild subset of what is required for good audio performance.

The surprise value of a given symbol is

$C(i) := log(1/PC(i))              (13)

By extension we can talk about the probability of a string of symbols. Since the symbols are IID, the probability of a string z is:

PC(z) := 
 
k
 PC(zk)              (14)

where zk is the symbol in the kth position in the string. As an immediate corollary,

$C(z) := 
 
k
 $C(zk)              (15)

which just says that the surprise value is additive in a nice way.

The source entropy of the machine C is defined to be:

SC := 
 
i
  PC(i) log(1/PC(i))              (16)

Let’s be clear: The surprise value is a property of the symbol, while the entropy is a property of the source. In particular, the entropy is the appropriately-weighted average surprise value.

For a long string, we expect the surprise value of the string to converge to the entropy of the source times the length of the string.

Beware: In the literature it is common to see the word “entropy” used in places where surprise value would make more sense. In particular, the code and documentation for the Linux /dev/random talks about depleting the “entropy” in its buffer.

B.2  Skewed Distributions

Let’s look again at the definition of entropy, equation 16. It is manifestly a single number that characterizes a certain average property of the distribution P. If you wanted a fully detailed description of the distribution, you would need to know Pi for all i (and even more information than that, unless the samples are IID).

As previously mentioned, the entropy S can be interpreted as the appropriately weighted average surprise-value.

Suppose we have a distribution so badly skewed that for some symbols, the surprise value is vastly smaller than the average surprise value. In this case, focussing on the average value – i.e. the entropy – is not going to solve all the world’s problems. The typical case may not be the same as the average case.

Specifically: for a skewed distribution, entropy is often not a good measure of cryptographic strength or resistance to cryptanalytic attack. You may care more about the minimum surprise value than about the average surprise value.

An example of a skewed distribution is shown in figure 4. The disk represents unit probability. We see that there is one symbol that monopolizes half the probability, and then there are N−1 other symbols that share the other half of the probability. The figure portrays the case where N=9. It is easy to show that the entropy for such a distribution is 1 + .5 log(N−1).

skewed
Figure 4: Skewed Distribution

Here’s the basic idea for dealing with skewed distributions: When filling the buffer with raw data, we need to keep a running estimate of how much surprise value is in the buffer. Call this $’.

There are several variations on this basic scheme:

  1. Suppose we have full knowledge of the distribution P, so that we know the probability Pi for each and every i, i.e. the probability of each and every symbol in the raw-data alphabet. Then, for each sample of raw data, we increment $’ according to the actual surprise value of the sample. This allows us to use the number of samples actually needed. The number of samples required per output-word will fluctuate.
  2. At the opposite extreme, a crude yet provably correct approach is to rely on the minimum (not the average) per-symbol unpredictability. That is, we increment $’ according to log(1/P+), where P+ is the probability of the most-probable symbol. This causes us to overestimate the amount of raw data needed. It guarantees that even in the worst case, the buffer contains enough unpredictability to saturate the hash function.
  3. As an intermediate case, suppose we don’t have full knowledge of the distribution, but we have estimated (by histogramming if nothing else) the probabilities of the most-common symbols. Using that, we can arrange that $’ is a relatively mild underestimate of the surprise value in the buffer, which naturally leads us to mildly overestimate the amount of raw data needed.

Additional things to keep in mind include:

The general rule is: Drawing a small sample from a highly-skewed distribution is not guaranteed to produce “average” behavior. Duh, no surprise there!

Again: the source entropy S in equation 16 is a single number that characterizes the “average” behavior of the source. For a small sample from a highly-skewed distribution, S doesn’t tell you everything you need to know. This has no bearing on the definition of entropy; entropy is always defined by equation 16. It just means that equation 16 by itself doesn’t solve all the world’s problems.

In practice, we don’t have too much of a problem with skewed distributions, for two reasons: First of all, the raw data from a typical soundcard is not terribly skewed. It could perhaps be skewed if the card is producing much less than 1 bit of surprise value per sample ... but such cards produce so little randomness that you ought to re-engineer the system to use a better card. Secondly, turbid is somewhat protected by the fact that the hash function has a reasonably-wide output word. By the time the input buffer contains enough randomness to saturate the hash function, the law of large numbers has taken hold sufficiently to wash out any small amount of skew that may be present. That is, the average surprise value (averaged over a bufferful of data) converges reasonably quickly to the source entropy.

C  Soundcard Quirks

This section is a collection of details about certain soundcards. (You can refer back to table 4 for a summary of the performance of a few soundcards. Also, section 12 provides step-by-step procedures that apply to “generic” soundcards.)

Soundcards generally include mixer functions. You have to initialize the mixer, to set the sliders on the features you want and/or to set the switches to bypass the features you don’t want. This is a bit of a black art, because the soundcards typically don’t come with any “principles of operation” documentation.

Graphical user interface (GUI) programs such as alsamixer may help you make some settings, but may not suffice to fully configure the mixer. All too often, the card has “routing switches” and other features that the GUI doesn’t know about. A reasonable procedure that gives complete control is to invoke “alsactl -f record.ctl store”, edit record.ctl, then “alsactl -f record.ctl restore”. Or you can use amixer to fiddle one setting without disturbing others.

There are still a lot of ISA-bus soundcards floating around. You need to work a bit to get their I/O registers and IRQs correct. On Linux systems, the Red Hat “sndconfig” program is helpful. It may be necessary to capture a copy of /proc/isapnp before installing your soundcard; then insert (by hand) an appropriate “options” line in /etc/modules.conf so that /etc/init.d/alsasound will configure the card with non-conflicting registers and IRQs. Plug-and-play doesn’t mean there will be no conflicts, or that conflicts will be automagically resolved; mainly it means that you can re-configure the card’s addresses using software, without having to change hardware jumpers and switches.

C.1  Soundblaster Cards

C.2  Extigy

This is yet another example of hype and deception from Creative Labs. The device itself, the box it comes in, and the advertising all prominently mention USB, 24-bit audio, and 96 kHz sampling rates. However, according to all available evidence, the device will not transmit 24-bit audio via USB to the DACs or from the ADCs (no matter what the data rate). It also will not transmit 96 kHz samples via USB to the DACs or from the ADCs (no matter what the word size).

Apparently the only way in which the Extigy can deal with 24 bits or 96 kHz is by shipping it digitally (e.g. via S/PDIF) to/from some other device.

It also has astonishingly poor input sensitivity. It’s about 5 times worse than a typical cheap sound card, and about 100 times worse than a decent 24-bit soundcard, as you can see from table 4.

The Extigy is no good for present purposes. I recommend you do not buy this product. Many other Creative Labs products have problems, too, so in fact I recommend that you never buy anything made by them.

C.3  M-Audio Delta 1010

This is a high-end system, with eight analog inputs, eight analog outputs, 96000 frames per second, and 24 bits of resolution. M-Audio was previously known as Midiman.

Each of the eight analog input and each of the eight analog outputs has a pushbutton that I call the “pro/con” switch. Professional studio equipment (pro) runs at lower impedances and lower signal levels than consumer-grade home audio equipment (con). That is, the pro setting (switch pushed in) means lower impedance and lower open-circuit voltage on output, and lower impedance and greater voltage-sensitivity on input. Whether you care about the lower impedance depends on the impedance of whatever the card is hooked to.

The card does not have an 8×8 or 10×10 mixer board. It is apparently 10×2. We bypass it, as discussed below.

The alsa driver always expects 10 channels of output: 8 analog plus 2 S/PDIF. If you are only interested in a subset thereof, you have to pad things so the buffer has 10 samples per frame anyway. Similarly, the alsa driver always produces 12 samples per frame on input: 8 analog plus 2 S/PDIF plus 2 from the mixer. For present purposes we use the 8 analog inputs and ignore the others, since they contain no additional entropy. Therefore you must pass the –channel-mask 255 option to turbid; otherwise the program will have less entropy than it thinks it has, by a factor of 8/12ths.

Each of the eight analog inputs is always routed directly to its own DAC, so the PCM input functionality is always unaffected by the sliders and switches on the mixer.

The routing can be set so that each of the eight analog outputs is fed directly from the PCM data. This is recommended. This makes the PCM output functionality independent of the sliders on the mixer.

C.4  Thinkpad 600

Getting this to work requires the following tricks:

As usual, you will need the .ctl file (included in the turbid bundle) plus the settings in table 4.

The same jack doubles as a Line-In port and Mike-In port, depending on software settings. Unlike the typical Mike-In port as described in section 12.2, this Mike-In port is stereo and does not provide power to the microphone. This would be bad if you wanted to connect a microphone that needed power, but it’s ideal for our purposes. Basically it’s like an ordinary Line-In port, with higher sensitivity.

D  Bandwidth Measurement

The principle of the bandwidth measurement is as follows: Equation 6 tells us the variance of the voltage right at the resistor. The quantity we actually measure has been amplified, according to some gain function G which depends on frequency. So a more complete description is

σo2 = 
fmax


0
 4 k T R |G(f)|2 df              (17)

where the subscript o on σo indicates this is the quantity we observe, and fmax is some “large enough” frequency; the real, relevant bandwidth B is hidden in G, which goes to zero at large frequencies. Since G is presumably an even function of frequency, we can rewrite this as

σo2 = 
fmax


− fmax
 2 k T R |G(f)|2 df              (18)

and in a discrete (sampled) system this becomes

σo2 = 
M−1
k=0
 2 k T R |G(k Δf)|2 Δf              (19)

where we identify the sampling frequency N = MΔf = 2 fmax, covering the whole range of positive and negative frequencies. We can write this in the suggestive form

σo2 = 4 k T R G*2 B              (20)

where G*2 is some nominal mid-band power gain, if we identify

B = 
 ½   
G*2
 
M−1
k=0
 |G(k Δf)|2 Δf              (21)

and we call B the power-averaged bandwidth.

By definition the gain G is Vo / Vr, where Vr is the voltage right at the resistor, and Vo is what we observe.

B = 
 ½   
G*2
 
M−1
k=0
 
|Vo(k Δf)|2 
|Vr(k Δf)|2 
Δf              (22)

This expression applies to any Vo/Vr ratio, whether Vr comes from thermal noise or from any other source.

We choose to inject a non-thermal probe signal Vr. It is constructed to be “white” – i.e. it has constant magnitude independent of frequency. That allows us to pull it outside the summation:

B = 
 ½   
G*2 |Vr|2
M−1
k=0
  |Vo(k Δf)|2 Δf
  = 
 ½ M Δf   
G*2 |Vr|2 Δf
M−1
k=0
  |Vo(k Δf)|2 Δf
             (23)

and by Parseval’s theorem we can rewrite that frequency-domain expression as a time-domain expression

B = 
 ½ M Δf   
G*2 |Vr|2 Δt
M−1
k=0
  |Vo(k Δt)|2 Δt
  = 
 N 
2
 
 1   
G*2
 
 ⟨Vo2⟩ 
Vr2⟩ 
             (24)

It is easy to get a good lower bound on B. At compile time, we know the root-mean-square (RMS) magnitude of the internal representation of the probe signal; by construction it is -15dBFS. Then given K and R (and Zref) we know Vr2 in the passband. At frequencies outside the passband this voltage will roll off a little, because of the anti-aliasing filter in the soundcard output stage. This will cause us to under-estimate B by a little, which is harmless. Finally we determine ⟨Vo2⟩ by summing the squares of the observed voltage samples and multiplying by the appropriate calibration factors (involving Q).

You might have thought based on the form of equation 17 that we would need to take a Fourier transform, but we don’t, because of Parseval’s theorem.

E  References

  1. The VENONA project http://www.nsa.gov/docs/venona/index.html especially http://www.nsa.gov/docs/venona/venona_remember.html
  2. Ian Goldberg and David Wagner, “Randomness and the Netscape Browser” http://www.cs.berkeley.edu/~daw/papers/ddj-netscape.html (Break of Netscape SSL PRNG.)
  3. John Viega and Gary McGraw, Building Secure Software Addison-Wesley 2001. See especially the section “How to cheat in online gambling” in chapter 10.
  4. FIPS 198, HMAC - Keyed-Hash Message Authentication Code (March 2002). http://csrc.nist.gov/publications/fips/fips198/fips-198a.pdf
  5. Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone, Handbook of Applied Cryptography, CRC Press (1996). http://www.cacr.math.uwaterloo.ca/hac/ The strict avalanche property is discussed on page 277.
  6. George Marsaglia, “A battery of tests for random number generators”. http://stat.fsu.edu/~geo/diehard.html
  7. Ueli Maurer, “A Universal Statistical Test for Random Bit Generators”. Advances in Cryptology - CRYPTO ’90, Lecture Notes in Computer Science, Springer-Verlag, vol. 537, pp. 409-420, 1990, Final version in J. of Cryptology (1992). Abstract: http://www.crypto.ethz.ch/cgi-bin/show?what=abstractlabel=Maurer90b Code: http://matrix.irb.hr/~mario/ftp/pub/random/utest.tgz
  8. National Security Agency “TEMPEST Fundamentals”. http://cryptome.org/nacsim-5000.htm
  9. Solomonoff, R.J., “A Preliminary Report on a General Theory of Inductive Inference”, Report V-131, Zator Co., Cambridge, Mass, Feb 4 1960.
  10. Solomonoff, R.J. “A Formal Theory of Inductive Inference I and II”, Information and Control 7, 1-22 & 224-254.
  11. Gregory J. Chaitin, Algorithmic Information Theory, Cambridge University Press, 1987 http://www.cs.auckland.ac.nz/CDMTCS/chaitin/cup.pdf
  12. Don Davis, Ross Ihaka, and Philip Fenstermacher, “Cryptographic Randomness from Air Turbulence in Disk Drives”, Proceedings of Crypto 94, Springer-Verlag Lecture Notes in Computer Science, No. 839, 1994.
  13. Peter Gutman, “Software Generation of Practically Strong Random Numbers”,
    http://www.cs.auckland.ac.nz/~pgut001/pubs/random.pdf
  14. Charles Surya, “Network Noise and Intermodulation Distortion” Chapter 3 of Lecture Notes, http://www.eie.polyu.edu.hk/~ensurya/lect_notes/commun_cir/Ch3/Chapter3.htm
  15. FIPS 140-1 SECURITY REQUIREMENTS FOR CRYPTOGRAPHIC MODULES http://www.itl.nist.gov/fipspubs/fip140-1.htm
  16. FIPS 180-1 SECURE HASH STANDARD http://www.itl.nist.gov/fipspubs/fip180-1.htm
  17. J. Kelsey, B. Schneier, and N. Ferguson, “Yarrow-160: Notes on the Design and Analysis of the Yarrow Cryptographic Pseudorandom Number Generator”, Sixth Annual Workshop on Selected Areas in Cryptography, Springer Verlag, August 1999.
    http://www.counterpane.com/yarrow-notes.html
  18. Terry Ritter, “Random Number Machines: A Literature Survey” http://www.ciphersbyritter.com/RES/RNGMACH.HTM
  19. D. Eastlake, 3rd, S. Crocker, J. Schiller, “Randomness Recommendations for Security” (1994) http://www.ietf.org/rfc/rfc1750.txt
  20. F. Chabaud, and A. Joux, “Differential collisions in SHA-0” CRYPTO’98, H. Krawczyk ed., LNCS 1462, pp 56–71, (1998) http://fchabaud.free.fr/English/Publications/sha.pdf
  21. Charanjit S. Jutla and Anindya C. Patthak “Is SHA-1 conceptually sound?” (2005). http://eprint.iacr.org/2005/350.ps
  22. Theodore T’so, “random.c – A strong random number generator” http://www.cs.berkeley.edu/~daw/rnd/linux-rand
  23. Yuval Peres, “Iterating von Neumann’s Procedure for Extracting Random Bits” The Annals of Statistics 1992, pp 590-597. http://www.stat.berkeley.edu/~peres/mine/vn.pdf
  24. Michael Mitzenmacher, “Tossing a Biased Coin” http://www.fas.harvard.edu/~libcs124/CS/coinflip3.pdf
  25. Advanced Linux Sound Architecture http://alsa-project.org

1
You can drop the m=0 term in the sum, because m log(m) is zero for m=0. If this zero isn’t obvious to you, take the limit using l’Hospital’s rule.
2
The metaphor here is that a small change in the input-conditions (throwing a snowball) provokes a 100% change in the output-conditions (knocking loose all the snow on the mountainside). In meteorology and complex-systems theory, the same notion is called the “butterfly effect”.
3
I do not wish to discuss other less-practical more-technical notions of universality, which are very widely misunderstood.
4
See section C.4 for an exception.
5
Other methods are possible. It can be shown that the choice of method doesn’t matter much.
6
As always, we have to choose some units for measuring length and surprise value, and use those units consistently. This is tantamount to choosing a consistent base for our logarithms and exponentials. A conventional and convenient unit is the bit.
[Contents]

Copyright © 2005 jsd

_