4. RANDOM NUMBER GENERATION AND ENCRYPTION
parametrized with four integers, as follows:
µ        the modulus                    µ > 0
α        the multiplier                 0 ≤ α < µ
γ        the increment                  0 ≤ γ < µ
$x_0$    the starting value, or seed    0 ≤ $x_0$ < µ
If $x_n$ is the current value of the “random seed,” then a call to the random number
generator first computes
(4.0.28)    $x_{n+1} = (\alpha x_n + \gamma) \bmod \mu$
as the seed for the next call, and then returns $x_{n+1}/\mu$ as an independent observation of
a pseudo-random number which is uniformly distributed in (0, 1).
a mod b is the remainder in the integer division of a by b. For instance 13 mod 10 =
3, 16 mod 8 = 0, etc.
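The update rule (4.0.28) can be sketched in R as follows. This is only a minimal illustration, not a generator fit for real use: the name `make_lcg` and the toy parameters are assumptions made for this example, and ordinary doubles are used, which keeps the product αx exact as long as it stays below $2^{53}$.

```r
# Minimal sketch of a linear congruential generator (illustrative names).
make_lcg <- function(alpha, gamma, mu, seed) {
  stopifnot(mu > 0, alpha >= 0, alpha < mu, gamma >= 0, gamma < mu,
            seed >= 0, seed < mu)
  x <- seed
  function() {
    x <<- (alpha * x + gamma) %% mu  # next seed, equation (4.0.28)
    x / mu                           # value handed back to the caller
  }
}

# Toy parameters, chosen only to keep the numbers small.
draw <- make_lcg(alpha = 5, gamma = 3, mu = 16, seed = 7)
u <- c(draw(), draw(), draw())       # seeds 6, 1, 8 give 0.375, 0.0625, 0.5
```

The closure keeps the seed as hidden state, which mimics the side effect described above without touching R's own .Random.seed.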
The selection of α, γ, and µ is critical here. We need the following criteria:
• The random generator should have a full period, i.e., it should produce all
numbers 0 ≤ x < µ before repeating. (Once one number is repeated, the
whole cycle is repeated.)
• The function should “appear random.”
• The function should be efficient to implement with 32-bit arithmetic.
If µ is prime and γ = 0, then for certain values of α the period is µ − 1, with only the
value 0 missing. For 32-bit arithmetic, a convenient value of µ is $2^{31} - 1$, which is a
prime number. Of the more than 2 billion possible choices for α, only a handful
satisfy all three criteria.
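How strongly the period depends on α can be seen with a small prime modulus. The sketch below uses µ = 31 instead of $2^{31} - 1$ purely for illustration, and `period` is a hypothetical helper written for this example.

```r
# Period of x_{n+1} = (alpha * x_n) mod mu, started at x0 (illustrative helper).
# Assumes mu is prime and alpha is not a multiple of mu, so the loop terminates.
period <- function(alpha, mu, x0 = 1) {
  x <- (alpha * x0) %% mu
  n <- 1
  while (x != x0) {
    x <- (alpha * x) %% mu
    n <- n + 1
  }
  n
}

period(3, 31)  # 30 = mu - 1: every nonzero value is visited
period(2, 31)  # 5, because 2^5 = 32 leaves remainder 1
```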
Problem 75. Convince yourself by some examples that the following holds for all a, b, and µ:
(4.0.29)    $a \cdot b \bmod \mu = a \cdot (b \bmod \mu) \bmod \mu$
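One way to try many examples at once is a numerical spot check in R. The ranges below are arbitrary, and a check of this kind is of course not a proof.

```r
# Numerical spot check of (4.0.29) for random non-negative integers.
set.seed(1)                                   # arbitrary seed
a  <- sample(0:1000, 200, replace = TRUE)
b  <- sample(0:1000, 200, replace = TRUE)
mu <- sample(1:1000, 200, replace = TRUE)     # moduli must be positive

lhs <- (a * b) %% mu
rhs <- (a * (b %% mu)) %% mu
all(lhs == rhs)
```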
In view of Problem 75, the multiplicative congruential random generator is
based on the following procedure: generate two sequences of integers $a_i$ and $b_i$ as
follows: $a_i = x_1 \cdot \alpha^i$ and $b_i = i \cdot \mu$. $a_i$ is multiplicative and $b_i$ additive, and µ is a
prime number which is not a factor of α. In other words, $a_i$ and $b_i$ have very little to
do with each other. Then for each i find $a_i - b_{s(i)}$, where $b_{s(i)}$ is the largest b which is
smaller than or equal to $a_i$, and then form $(a_i - b_{s(i)})/\mu$ to get a number between 0
and 1. This is a measure of the relationship between two processes which have very little
to do with each other, and therefore we should not be surprised if this interaction
turns out to look “random.” Knuth writes [Knu81, p. 10]: “taking the remainder
mod µ is somewhat like determining where a ball will land in a spinning roulette
wheel.” Of course, this is a heuristic argument. There is a lot of mathematical
theory behind the fact that linear congruential random number generators are good
generators.
If γ = 0 then the maximum period is shorter, namely µ − 1, because
any sequence which contains 0 has 0 everywhere. But not having to add γ at every
step makes computation easier.
Not all pairs α and µ give good random number generators, and one should only
use random number generators which have been thoroughly tested. There are some
examples of bad random number generators used in certain hardware or software
programs.
Problem 76. The dataset located at www.econ.utah.edu/ehrbar/data/randu.txt
(which is available as dataset randu in the R-base distribution) has 3 columns and
400 rows. Each row is a consecutive triple of numbers generated by the old VAX
FORTRAN function RANDU running under VMS 1.5. This random generator,
which is discussed in [Knu98, pp. 106/7], starts with an odd seed $x_0$; the (n + 1)st
seed is $x_{n+1} = (65539 x_n) \bmod 2^{31}$, and the data displayed are $x_n/2^{31}$ rounded to 6
digits. Load the data into xgobi and use the Rotation view to check whether you
can see something suspicious.
Answer. All data are concentrated in 15 parallel planes. All triplets of observations of randu
fall into these planes; [Knu98, pp. ??] has a mathematical proof. VMS versions 2.0 and higher use
a different random generator.
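The planes come from a linear relation between three consecutive seeds. Since $65539 = 2^{16} + 3$, squaring gives $65539^2 \equiv 6 \cdot 65539 - 9 \pmod{2^{31}}$, and therefore $x_{n+2} = (6 x_{n+1} - 9 x_n) \bmod 2^{31}$. The sketch below regenerates a RANDU sequence and checks this relation. It does not read the data file, the seed 1 is an arbitrary odd choice, and `randu_seq` is a name made up for this example.

```r
# Regenerate RANDU seeds; doubles are exact here because 65539 * x < 2^53.
randu_seq <- function(n, seed = 1) {
  x <- numeric(n)
  x[1] <- (65539 * seed) %% 2^31
  for (i in seq_len(n - 1)) x[i + 1] <- (65539 * x[i]) %% 2^31
  x
}

x <- randu_seq(1200)
n <- length(x)
# 9 x_n - 6 x_{n+1} + x_{n+2} is a multiple of 2^31 for every triple.
resid <- (9 * x[1:(n - 2)] - 6 * x[2:(n - 1)] + x[3:n]) %% 2^31
all(resid == 0)

# For u = x / 2^31 the combination 9 u1 - 6 u2 + u3 is therefore an integer,
# and each integer value corresponds to one plane.
u <- x / 2^31
k <- 9 * u[1:(n - 2)] - 6 * u[2:(n - 1)] + u[3:n]
sort(unique(k))
```

Because each u lies between 0 and 1, the integer k can only range from −5 to 9, which is where the count of 15 planes comes from.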
4.1. Alternatives to the Linear Congruential Random Generator
One of the common fallacies encountered in connection with random number
generation is the idea that we can take a good generator and modify it a little in
order to get an “even more random” sequence. This is often false.
Making the value dependent on the two preceding values increases the maximum
possible period to $\mu^2$. The simplest such generator, the Fibonacci sequence
(4.1.1)    $x_{n+1} = (x_n + x_{n-1}) \bmod \mu$
is definitely not satisfactorily random. But specific other combinations are good:
(4.1.2)    $x_n = (x_{n-100} - x_{n-37}) \bmod 2^{30}$
is one of the state-of-the-art random generators used in R.
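A subtractive lagged Fibonacci recurrence with lags 100 and 37 and modulus $2^{30}$ can be sketched as follows. This shows only the recurrence. The 100 starting values are an arbitrary placeholder, not the seeding procedure of any real implementation, and `lagged_fib` is a name made up for this example.

```r
# Sketch of a subtractive lagged Fibonacci generator with lags 100 and 37.
lagged_fib <- function(n, init) {
  stopifnot(length(init) == 100)
  x <- c(init, numeric(n))
  for (j in 101:(100 + n)) {
    x[j] <- (x[j - 100] - x[j - 37]) %% 2^30
  }
  x[-(1:100)]                      # drop the starting values
}

init <- (69069 * (1:100) + 12345) %% 2^30   # placeholder start values
x <- lagged_fib(500, init)
u <- x / 2^30                               # scaled to [0, 1)
```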
Using more work to get from one number to the next, not mere addition or
multiplication:
(4.1.3)    $x_{n+1} = (\alpha x_n^{-1} + \gamma) \bmod \mu$
where $x_n^{-1}$ denotes the multiplicative inverse of $x_n$ modulo µ.
Efficient algorithms exist but are not in the repertoire of most computers. This
generator is completely free of the lattice structure of multiplicative congruential
generators.
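For a small prime modulus, a generator of this inversive type can be sketched in R. The brute-force inverse, the convention that the inverse of 0 is taken to be 0, the toy parameters, and the names `mod_inverse` and `make_icg` are all assumptions of this example. An efficient implementation would compute the inverse with the extended Euclidean algorithm instead.

```r
# Sketch of an inversive congruential generator for a small prime modulus.
mod_inverse <- function(x, mu) {
  if (x == 0) return(0)                      # assumed convention for 0
  which((x * seq_len(mu - 1)) %% mu == 1)    # brute force, fine for small mu
}

make_icg <- function(alpha, gamma, mu, seed) {
  x <- seed
  function() {
    x <<- (alpha * mod_inverse(x, mu) + gamma) %% mu
    x / mu
  }
}

draw <- make_icg(alpha = 5, gamma = 3, mu = 31, seed = 1)
u <- replicate(10, draw())
```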
Combine several random generators: If you have two random generators $x_n$ and $y_n$ with
modulus µ, use
(4.1.4)    $(x_n - y_n) \bmod \mu$
The Wichmann-Hill portable random generator uses this trick.
Randomizing by shuffling: If you have two generators $x_n$ and $y_n$, put the first k observations
of $x_n$ into a buffer, call them $v_1, \ldots, v_k$ (k = 100 or so). Then construct $x_{n+1}$ and
$y_{n+1}$. Use $y_{n+1}$ to generate a random integer j between 1 and k, use $v_j$ as your next
random observation, and put $x_{n+1}$ in the buffer at place j. This still gives the same
values as $x_n$ but in a different order.
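The shuffling scheme can be sketched in R as follows. The two input streams are taken from R's own runif() merely as stand-ins for two generators, and `shuffle_stream` is a hypothetical name.

```r
# Sketch of randomizing by shuffling; x and y are two uniform(0,1) streams.
shuffle_stream <- function(x, y, k = 100) {
  stopifnot(length(x) > k, length(y) >= length(x) - k)
  v <- x[1:k]                      # buffer filled with the first k values of x
  out <- numeric(length(x) - k)
  for (i in seq_along(out)) {
    j <- floor(k * y[i]) + 1       # random position between 1 and k
    out[i] <- v[j]                 # next random observation
    v[j] <- x[k + i]               # refill the slot with the new x value
  }
  list(out = out, buffer = v)
}

set.seed(1)                        # arbitrary seed
x <- runif(1000)
y <- runif(1000)
res <- shuffle_stream(x, y)
# Output plus what is left in the buffer are exactly the values of x:
all(sort(c(res$out, res$buffer)) == sort(x))
```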
4.2. How to test random generators
Chi-Square Test: partition the outcomes into finitely many categories and test
whether the relative frequencies are compatible with the probabilities.
Kolmogorov-Smirnov test for continuous distributions: uses the maximum distance between the empirical distribution function and the theoretical distribution
function.
Now there are 11 kinds of empirical tests, either on the original $x_n$ which are
supposedly uniform between 0 and 1, or on integer-valued $y_n$ between 0 and d − 1.
Equidistribution: either a Chi-Square test that the outcomes fall into d intervals,
or a Kolmogorov-Smirnov test.
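Both variants of the equidistribution test are available in R's stats package. The sketch below applies them to R's current default generator. The sample size, the number of intervals d = 10, and the seed are arbitrary choices.

```r
# Equidistribution checks with tests from R's stats package.
set.seed(1)          # arbitrary seed, for reproducibility only
u <- runif(10000)
d <- 10              # number of equal-width intervals (arbitrary)

counts <- table(cut(u, breaks = seq(0, 1, length.out = d + 1)))
chi <- chisq.test(counts)     # equal cell probabilities by default
ks  <- ks.test(u, "punif")    # distance between empirical and uniform cdf

c(chi_square = chi$p.value, kolmogorov_smirnov = ks$p.value)
```

Very small p-values would speak against equidistribution. A single run is only one draw of the test statistic, so repeated runs with different seeds give a better picture.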
Serial test: that all integer pairs in the integer-valued outcome are equally likely.
Gap test: for 0 ≤ α < β ≤ 1, a gap of length r is a sequence of r + 1 consecutive
numbers in which the last one is in the interval [α, β) and the others are not. Count the
occurrences of such gaps, and make a Chi-Square test with the probabilities of such
occurrences. For instance, if α = 0 and β = 1/2 this computes the lengths of “runs
above the mean.”
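Extracting the gap lengths from a sequence is a one-liner in R. The sketch assumes the half-open interval [a, b), and `gap_lengths` is a name made up for this example.

```r
# Gap lengths for the interval [a, b): number of misses before each hit.
gap_lengths <- function(u, a, b) {
  hits <- which(u >= a & u < b)
  diff(c(0, hits)) - 1
}

u <- c(0.7, 0.8, 0.1, 0.2, 0.9, 0.3)
gap_lengths(u, a = 0, b = 0.5)   # hits at positions 3, 4, 6: gaps 2, 0, 1

# For independent uniform numbers a gap has length r with probability
# p * (1 - p)^r, where p = b - a; these feed the Chi-Square comparison.
p <- 0.5
p * (1 - p)^(0:4)
```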
Poker test: consider groups of 5 successive integers and classify them into the
7 categories: all different, one pair, two pairs, three of a kind, full house, four of a
kind, five of a kind.
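The classification into the seven categories depends only on how often each value occurs within the group. A possible sketch, with the hypothetical helper `classify_hand` and digits 0 to 9 as an arbitrary choice of integer range:

```r
# Classify a group of 5 integers by its pattern of repeated values.
classify_hand <- function(hand) {
  stopifnot(length(hand) == 5)
  counts <- sort(as.vector(table(hand)), decreasing = TRUE)
  switch(paste(counts, collapse = ""),
         "11111" = "all different",
         "2111"  = "one pair",
         "221"   = "two pairs",
         "311"   = "three of a kind",
         "32"    = "full house",
         "41"    = "four of a kind",
         "5"     = "five of a kind")
}

set.seed(1)                                   # arbitrary seed
y <- matrix(sample(0:9, 5000, replace = TRUE), nrow = 5)
table(apply(y, 2, classify_hand))             # observed category counts
```

The observed counts would then be compared with the theoretical category probabilities in a Chi-Square test.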
Coupon collector's test: observe the length of sequences required to get a full set
of integers 0, . . . , d − 1.
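The segment lengths can be collected with a simple scan over the sequence. In this sketch an incomplete segment at the end is discarded, which is an assumption of the example, and `coupon_lengths` is a hypothetical name.

```r
# Lengths of the successive segments needed to see every value 0, ..., d - 1.
coupon_lengths <- function(y, d) {
  seg <- numeric(0)
  seen <- logical(d)
  start <- 1
  for (i in seq_along(y)) {
    seen[y[i] + 1] <- TRUE
    if (all(seen)) {                 # full set collected
      seg <- c(seg, i - start + 1)
      seen[] <- FALSE
      start <- i + 1
    }
  }
  seg                                # an incomplete final segment is dropped
}

coupon_lengths(c(0, 0, 1, 2, 2, 1, 0, 1, 1), d = 3)   # segments of length 4 and 3
```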
Permutation test: divide the input sequence of the continuous random variable
into t-element groups and look at all possible relative orderings of these t-tuples.
There are t! different relative orderings, and each ordering has probability 1/t!.
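In R the relative ordering of a group can be read off with order(). The sketch below counts the orderings for t = 3 and compares them with equal probabilities. The sample size and seed are arbitrary, and `ordering_counts` is a name made up for this example.

```r
# Relative ordering of each t-tuple, encoded as a string such as "2-3-1".
ordering_counts <- function(u, t) {
  m <- matrix(u[seq_len(t * (length(u) %/% t))], nrow = t)
  table(apply(m, 2, function(g) paste(order(g), collapse = "-")))
}

set.seed(1)                          # arbitrary seed
tab <- ordering_counts(runif(6000), t = 3)
tab                                  # counts for the 3! = 6 orderings
chisq.test(tab)$p.value              # equal probabilities 1/t! by default
```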
Run test: counts runs up, but don’t use the Chi-Square test since subsequent runs
are not independent; a long run up is likely to be followed by a short run up.
Maximum-of-t test: split the sample into batches of equal length t and take the
maximum of each batch. Taking these maxima to the tth power should again give
an equidistributed sample.
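The reason is that the maximum M of t independent uniform numbers satisfies $\Pr[M \le m] = m^t$, so that $M^t$ is again uniform. A sketch of the transformation, with the hypothetical helper `max_of_t` and arbitrary sample size and seed:

```r
# Maximum-of-t transformation: batch maxima raised to the t-th power.
max_of_t <- function(u, t) {
  m <- matrix(u[seq_len(t * (length(u) %/% t))], nrow = t)
  apply(m, 2, max)^t
}

max_of_t(c(0.1, 0.5, 0.2, 0.9, 0.3, 0.4), t = 3)   # 0.5^3 and 0.9^3

set.seed(1)                                        # arbitrary seed
v <- max_of_t(runif(5000), t = 5)
ks.test(v, "punif")$p.value                        # equidistribution check
```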
Collision tests: 20 consecutive observations are all smaller than 1/2 with probability $2^{-20}$; and every other partition defined by combinations of bigger or smaller
than 1/2 has the same probability. If there are only $2^{14}$ observations, then on the
average each of these partitions is populated only with probability 1/64. We count
the number of “collisions”, i.e., the number of partitions which have more than 1 observation in them, and compare this with the binomial distribution (the Chi-Square test
cannot be applied here).
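Counting the collisions can be sketched as follows. Each block of 20 numbers is mapped to one of the $2^{20}$ partitions through the pattern of values below and above 1/2. The function name `count_collisions` and the use of runif() as the generator under test are assumptions of this example, and the comparison with the reference distribution is not shown.

```r
# Assign each block of `bits` observations to a cell via the bit pattern
# of (u < 1/2), then count the cells that hold more than one block.
count_collisions <- function(u, bits = 20) {
  nblocks <- length(u) %/% bits
  b <- matrix(u[seq_len(bits * nblocks)] < 0.5, nrow = bits)
  cell <- colSums(b * 2^(0:(bits - 1)))     # cell index in 0 .. 2^bits - 1
  sum(table(cell) > 1)
}

# Tiny example with 2 bits: blocks (0.1, 0.9), (0.1, 0.9), (0.9, 0.9)
count_collisions(c(0.1, 0.9, 0.1, 0.9, 0.9, 0.9), bits = 2)   # one shared cell

set.seed(1)                                 # arbitrary seed
count_collisions(runif(20 * 2^14))          # 2^14 blocks in 2^20 cells
```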
Birthday spacings test: lagged Fibonacci generators consistently fail it.
Serial correlation test: a statistic which looks like a sample correlation coefficient
and which can be easily computed with the Fast Fourier transform.
Tests on subsequences: equally spaced subsequences are usually worse than the
original sequence if it is a linear congruential generator.
4.3. The Wichmann Hill generator
The Wichmann Hill generator defined in [WH82] can be implemented in almost
any high-level language. It used to be the default random number generator in R,
but version 1.0 of R has different defaults.
Since even the largest allowable integers in ordinary programming languages
are not large enough to make a good congruential random number generator, the
Wichmann Hill generator is the addition mod 1 of 3 different multiplicative congruential generators which can be computed using a high-level programming language. [Zei86] points out that due to the Chinese Remainder Theorem, see [Knu81,
p. 286], this is equivalent to one single multiplicative congruential generator with
α = 1655 54252 64690 and µ = 2781 71856 04309. Since such long integers cannot
be used in ordinary computer programs, Wichmann-Hill’s algorithm is an efficient
method to compute a congruential generator with such large numbers.
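The connection with the Chinese Remainder Theorem can be checked numerically: the large modulus should be the product of the three small moduli 30269, 30307 and 30323 used by the generator, and the large multiplier should leave the three small multipliers 171, 172 and 170 as remainders. Both large numbers are below $2^{53}$, so R's doubles represent them exactly.

```r
# Check that the single large multiplier reproduces the three small ones.
m     <- c(30269, 30307, 30323)
alpha <- 16555425264690
mu    <- 27817185604309

prod(m) == mu    # the big modulus is the product of the three moduli
alpha %% m       # remainders 171, 172, 170: the three multipliers
```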
Problem 77. Here is a more detailed description of the Wichmann-Hill generator: Its seed is a 3-vector $(x_1, y_1, z_1)$ satisfying
(4.3.1)    $0 < x_1 \le 30269$
(4.3.2)    $0 < y_1 \le 30307$
(4.3.3)    $0 < z_1 \le 30323$
A call to the random generator updates the seed as follows:
(4.3.4)    $x_2 = 171 x_1 \bmod 30269$
(4.3.5)    $y_2 = 172 y_1 \bmod 30307$
(4.3.6)    $z_2 = 170 z_1 \bmod 30323$
and then it returns
(4.3.7)    $\left( \frac{x_2}{30269} + \frac{y_2}{30307} + \frac{z_2}{30323} \right) \bmod 1$
as its latest drawing from a uniform distribution. If you have R on your computer,
do parts b and c, otherwise do a and b.
• a. 4 points Program the Wichmann-Hill random generator in the programming
language of your choice.
Answer. A random generator does two things:
• It takes the current seed (or generates one if there is none), computes the next seed from it, and
stores this next seed on disk as a side effect.
• Then it converts this next seed into a number between 0 and 1.
The ecmet package has two demonstration functions which perform these two tasks separately for
the Wichmann-Hill generator, without side effects. The function next.WHseed() computes the next
seed from its argument (which defaults to the seed stored in the official variable .Random.seed), and
the function WH.from.current.seed() gets a number between 0 and 1 from its argument (which
has the same default). Both functions are one-liners:
next.WHseed <- function(integer.seed = .Random.seed[-1])
  (c(171, 172, 170) * integer.seed) %% c(30269, 30307, 30323)

WH.from.current.seed <- function(integer.seed = .Random.seed[-1])
  sum(integer.seed / c(30269, 30307, 30323)) %% 1
• b. 2 points Check that the 3 first numbers returned by the Wichmann-Hill
random number generator after setting the seed to 1 10 2000 are 0.2759128 0.8713303
0.6150737. (one digit in those 3 numbers is wrong; which is it, and what is the right
digit?)
Answer. The R-code doing this is ecmet.script(wichhill):
##This script generates 3 consecutive seeds, with the
##initial seed set as (1, 10, 2000), puts them into a matrix,
##and then generates the random numbers from the rows of
##this matrix:
my.seeds <- matrix(nrow=3, ncol=3)
my.seeds[1,] <- next.WHseed(c(1, 10, 2000))
my.seeds[2,] <- next.WHseed(my.seeds[1,])
my.seeds[3,] <- next.WHseed(my.seeds[2,])
my.unif <- c(WH.from.current.seed(my.seeds[1,]),
             WH.from.current.seed(my.seeds[2,]),
             WH.from.current.seed(my.seeds[3,]))
• c. 4 points Check that the Wichmann-Hill random generator built into R is
identical to the one described here.
Answer. First make sure that R will actually use the Wichmann-Hill generator (since it is not
the default): RNGkind("Wichmann-Hill"). Then call runif(1). (This sets a seed if there was none,
or uses the existing seed if there was one.) .Random.seed[-1] shows the present value of the random
seed associated with this last call, dropping the first number, which indicates which random generator
the seed is for and is not needed for our purposes. Therefore WH.from.current.seed(), which takes
.Random.seed[-1] as default argument, should give the same result as the last call of the official
random generator. And WH.from.current.seed(next.WHseed()) takes the current seed, computes
the next seed from it, and converts this next seed into a number between 0 and 1. It does not write
the updated random seed back. Therefore if we now issue the official call runif(1) again, we should
get the same result.
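For readers without the ecmet package, the comparison can be sketched in a self-contained way by writing out the update and output formulas directly. The variable names are illustrative. The sketch assumes that .Random.seed[-1] holds the three integers of the Wichmann-Hill seed, as described above.

```r
# Compare R's built-in Wichmann-Hill generator with the formulas above.
old_kind <- RNGkind("Wichmann-Hill")     # switch generator, remember the old one
u1 <- runif(1)                           # sets a seed if there was none
seed <- .Random.seed[-1]                 # current (x, y, z)

# The last draw is reproduced from the current seed ...
from_seed <- sum(seed / c(30269, 30307, 30323)) %% 1
# ... and the next draw is predicted from the updated seed.
next_seed <- (c(171, 172, 170) * seed) %% c(30269, 30307, 30323)
predicted <- sum(next_seed / c(30269, 30307, 30323)) %% 1
u2 <- runif(1)

c(isTRUE(all.equal(u1, from_seed)), isTRUE(all.equal(u2, predicted)))
RNGkind(old_kind[1])                     # restore the previous generator
```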