In his spare time, an engineer found flaws in the classic book "A Million Random Digits"

For anyone classically trained, the line between mathematics and computer science is very thin. I trust that any competent, classically trained Computer Scientist / Computer Engineer will have the training for the math involved.

Now, the specialization of the Mathematician and Computer Scientist on a personal level may make one more suited for the task than the other; but to dismiss the computer expert because of their field of study when discussing a mathematical analysis of the outputs of a computer program is ridiculous.

Do not mistake a code monkey or a programmer for a Computer Scientist or Computer Engineer.

With that being said, peer review of this case is important, and I hope that a vigorous discussion occurs.


Indeed. An actual computer scientist ought to be well qualified to judge the quality of random numbers, but computer science training does not make one a computer scientist.

I’ve got an undergraduate degree in computer science; I know just enough to know that I don’t know enough to judge the quality of random numbers myself and that if it’s important I should leave that to the experts.


So, in answer to the question “what you do when you are really really really bored?”…


To be honest, looking at his LinkedIn profile, he may be a bit light in qualifications for this analysis. Or not. (ETA: his LinkedIn looks like that of a well-seasoned IT professional, but it doesn’t show a heavy statistics background and doesn’t list his degree.)

I assume, however, that since he works for the RAND Corporation and this was published by the Wall Street Journal, his math has been checked by more experienced people. This would be pretty off the reservation if it was not vetted by their research arm, and I would imagine it would be the kind of thing that would create an opportunity to redefine your career if it wasn’t…


Define N as a random variable representing the number of runs of 4 equal digits in 50,000 digits.

Define N for Neerd!! Am I right? … No?


I’ll refrain from weighing in until I brush up on what The Chicago Manual of Style has to say about setting out lists of random numbers.


I’m not familiar with the options MATLAB gives you for RNG, or which ones he used, but most computers you would bother to run MATLAB on at this point have actual hardware RNGs. VIA got in early but is mostly dead now; Intel added a hardware RNG to Ivy Bridge in 2012 to serve RDRAND/RDSEED. AMD was later, but has had support for at least a couple of years.

Much less reason to risk being in von Neumann sin with modern hardware.

(Edit: that’s actually what surprises me about seeing the book described as being in continued active use, rather than just occupying an interesting place in the history of science and serving as a source of trustworthy-looking constants. You can get hardware RNGs good for hundreds of kilobits to multiple gigabits/s with what is now pretty prosaic hardware. Faster still if PRNGs seeded with authentic entropy are acceptable.)


Given how many of them will do it for you, it’s just plain inefficient to do it yourself.


You know what they say, foreworded is forewarned.


It’s really more of a historical relic. People might use it for demonstrations because it is well known and traditional, like using the Lenna image for image processing demonstrations, but there are many more options for random number generation now. Nobody needs to use the RAND sequence, and most people would typically need larger quantities of different random numbers anyway. Nobody should be relying on the statistical properties of a single, well-known, relatively short sequence of random numbers. I’m sure some people still are, but honestly, if they are, they are probably making other, much more significant errors in their analysis anyway.

This doesn’t sound particularly off. I haven’t worked through the math on this to verify that 40 is the expectation, much less the distribution of 4-runs. The distribution is probably not exactly Poissonian, but I think it should be fairly close for rare events like a run of 4. If that is the case and the expected number is 40, then 48 is only a little over one standard deviation (about 6.3, the square root of 40) away. Unless this was coupled with a pattern of under- or over-representation of runs, it’s not really worth talking about. Or someone can correct me if I’m wrong about the distribution.
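For anyone who wants to check that arithmetic, here is a rough sketch in Python. It assumes that a "run of 4" means a maximal run of exactly four equal digits (the article's definition may differ), that the digits are independent and uniform, and that the Poisson approximation is good enough. The function names are made up for illustration.

```python
import math
import random

def expected_runs(n_digits, run_len, base=10):
    """Expected number of maximal runs of exactly run_len equal digits
    among n_digits independent uniform digits (edge effects ignored)."""
    p_same = (1 / base) ** (run_len - 1)   # the next run_len-1 digits match the first
    p_break = 1 - 1 / base                 # the neighbour on each side differs
    return n_digits * p_same * p_break ** 2

def count_runs(digits, run_len):
    """Count maximal runs of exactly run_len equal digits in a sequence."""
    count = 0
    i = 0
    while i < len(digits):
        j = i
        while j < len(digits) and digits[j] == digits[i]:
            j += 1
        if j - i == run_len:
            count += 1
        i = j
    return count

mean = expected_runs(50_000, 4)   # about 40.5
sd = math.sqrt(mean)              # Poisson approximation: variance equals the mean
z = (48 - mean) / sd              # distance of 48 observed runs from the mean, in sd units

# Sanity check against one simulated sequence (seed chosen arbitrarily).
rng = random.Random(12345)
simulated = count_runs([rng.randrange(10) for _ in range(50_000)], 4)
print(f"expected={mean:.1f} sd={sd:.2f} z={z:.2f} simulated={simulated}")
```

Under those assumptions the expectation comes out near 40, and 48 lands a bit more than one standard deviation above it, which matches the estimate above.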

I’m not sure if this was intended as a joke, or odd wording on the part of the WSJ trying to explain this in “layman’s terms”, or what, but it doesn’t sound right.

Since the numbers are public it’s hard to imagine any system that relied on them being unbiased for security but that wouldn’t already be compromised just by your adversary knowing the sequence.

That was the case in the 50s, but it isn’t really true now. Modern CPUs have adequate true random number generators, with much better ones available as hardware or over the internet if you need them. We also have good cryptographic pseudo-random number generators, which are not random but should be unbiased relative to almost any process and are reproducible when that is needed.

Electronic distribution is now also easy so if you want to show that a simulation or analysis is reproducible, you don’t have to point someone at the RAND book, you can just upload the specific random digits you used along with the code.
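As a minimal sketch of that kind of reproducibility, the Python below draws a seed from the operating system's entropy source and then generates digits from a seeded PRNG, so publishing the seed (or the digits themselves) with the code lets others regenerate the same sequence. The helper name `make_digits` is hypothetical, and Python's default Mersenne Twister is used only for illustration; it is not a cryptographic generator.

```python
import os
import random

def make_digits(seed, count):
    """Return `count` decimal digits from Python's Mersenne Twister PRNG.
    Fine for simulations, but NOT cryptographically secure."""
    rng = random.Random(seed)
    return [rng.randrange(10) for _ in range(count)]

# Draw a seed from the operating system's entropy source, then publish it.
seed = int.from_bytes(os.urandom(16), "big")
digits = make_digits(seed, 1000)

# Anyone with the seed and the same generator can regenerate the same digits.
# (This assumes the same implementation of the random module on both ends.)
assert make_digits(seed, 1000) == digits
print("seed:", seed)
print("first 20 digits:", "".join(map(str, digits[:20])))
```

Uploading the digits themselves, as suggested above, avoids depending on the generator's implementation staying the same.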


So that’s how the NEST shuts down everyone’s opinion about whether they’re comfortable, offsetting randomly from anyone’s guess at some low-70-deg. F target. Or running the filament laden with Mt. Etna ash to series of 4-digit temperatures as an improved hearth. Sometimes the vacuum UV dichroism seems to bring such structural certainty to the dank, it does seem a bit chilly.


that’s pretty cool. i don’t know too much about it. the wikipedia article on hardware rng says:

Hardware random number generators generally produce only a limited number of random bits per second. In order to increase the available output data rate, they are often used to generate the “seed” for a faster cryptographically secure pseudorandom number generator, which then generates a pseudorandom output sequence at a much higher data rate.

so, re: the use of matlab to generate a comparison set of numbers… it sounds like it depends. if you’re still using a pseudo random algorithm after the rng, you’re not going to get a truly random distribution of numbers.

interesting to note the article also says:

Even though macroscopic processes are deterministic under Newtonian mechanics, the output of a well-designed device like a roulette wheel cannot be predicted in practice, because it depends on the sensitive, micro-details of the initial conditions of each use.

i wonder then if it’s possible that the roulette they used had some hidden bias.

i definitely like the idea that the set of numbers they generated is within the set of the possible distributions of random numbers – it sounds sensible. hopefully mathy people will poke into it more.

sometimes it’s hard to justify new processes if the old ones work. no need to re-invent the wheel and all that without some reason… maybe this becomes the reason.


I was told that any discipline with ‘science’ in its label wasn’t one. Cf. Christian Science, Political Science, Computer Science, etc.

If it’s a good PRNG, though, the output should be indistinguishable from truly random output, within whatever constraints the PRNG guarantees.

Well, any set of numbers is a possible result when generating a truly random set of numbers - but by statistically analyzing a supposedly random set, you can say what the probability is that it is the result of true randomness. It is not impossible to generate a million digits and have every single one be ‘5’, just extremely improbable.

Unfortunately it’s kinda hard to come up with a test which adequately captures all possible ways in which your output might be predictable rather than truly random, but since that problem turns out to be very important in a lot of areas of statistics and cryptography, there’s been quite a lot of research into it.
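To make that concrete, here is a sketch of one of the simplest such tests, a chi-square test on digit frequencies, in Python. It only measures how surprising the digit counts would be if the digits were uniform, so it says nothing about ordering or other kinds of predictability; a sequence like 0123456789 repeated forever would pass it perfectly. The function name and the seed are my own choices for illustration.

```python
import random
from collections import Counter

def chi_square_digits(digits):
    """Pearson chi-square statistic for digit frequencies against a
    uniform distribution over 0-9 (9 degrees of freedom)."""
    n = len(digits)
    expected = n / 10
    counts = Counter(digits)
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

CRITICAL_5_PERCENT = 16.919   # chi-square critical value, 9 d.o.f., alpha = 0.05

all_fives = [5] * 1000
print(chi_square_digits(all_fives))    # 9000.0, far above the critical value

rng = random.Random(2024)              # arbitrary seed, for repeatability
sample = [rng.randrange(10) for _ in range(100_000)]
print(chi_square_digits(sample))       # a decent PRNG usually stays below the critical value
```

A statistic above the critical value means frequencies that lopsided would turn up less than 5% of the time with uniform digits. That is why serious test suites combine many different tests rather than relying on one.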


Should get this guy to make my random music player, which currently wants to play the same dozen songs over and over from a USB stick with a thousand songs on it.

And forwarded is sent packing.


Yep. As an engineer, I endorse this comment.


Randomness is chaotic

We are certainly the worst of the bunch, I concur, but I’ve lost count of how many mechanical engineers I’ve met who are climate change deniers “because thermodynamics”, or that time an antenna engineer tried to lecture me on cryptography, or all the times mechanical engineers have tried to reinvent food “because biology is easy”.


Oh hell yes. I’ve got one of those mechanical engineers in my professional circle. Thinks AGW is bunk because water is a more significant greenhouse gas. Dude, call me when you see CO2 raining out of the sky at a rate of tons per second.

An interesting take, from a woman with a double degree in engineering and philosophy:

"the main skills you learn in a humanities degree are timeless: critical reading, critical thinking, communication of complex ideas, and most importantly (in my opinion) logical reasoning.

These skills have made me a far better engineer than I would have been without them, and I expect the same is true for most others with an arts degree, no matter which field they enter." [my emphasis]