29 January, 2010

True random numbers: linguistics may be of help

I'm completely ignorant in computer science, but even I know that it's tough to get true random numbers. The problem is that any software-based algorithm generates "pseudo-random" numbers, that is there is still some regularity in a row of those, and the sequence will repeat itself after a while.

People are used to get the real randomness from the outside, like picking up the thermal noise of the sound card, the displacements of mouse (that are supposed to be random), or even monitoring the atmosphere. But, there is at least one challenging method to get true random numbers out of the software part.

Let's take a block of English text (say, a .pdf of a paper), written by a native Russian speaker. What is to do, is to analyze the articles "a" and "the" over there. If the correct article is used, we have "1", if not – "0". In such a way, we get a truly random sequence of 1's and 0's, and we can generate true random base-10 numbers out of it.

Is there anyone who wants to give it a try? )

Take care,

Misha

3 comments:

Daniel Lemire said...

Interesting idea, but in what sense is it random? How do you know that the 1s are as likely as the 0s?

Plus, you'll need a lot of text if you only consider the articles. Why not look at the bits of a compressed document? Maybe that's a lot more random?

But there is a deeper issue. Whereas in some domains, such as cryptography, it is very important to have true random numbers, in most cases, true randomness is not needed. It just needs to pass as random. A Mersenne Twister is pretty good.

Lemeshko said...

Dear Daniel, thanks for the comments.

Well... The 0-1 distribution will be non-uniform. But the "real random" numbers picked up from the environment also have a non-uniform distribution, don't they?

Daniel Lemire said...

Don't forget that it is possible to download large files containing recorded true random numbers.

Back to heat sensors... I think that when they use heat sensors to generate random numbers, they look at the last digits only. So whereas the first digits of the temperature are certainly not random, the last few digits are very random.

Anyhow, the problem is that whatever you do, it must be "more random" than what you can get with a Mersenne Twister. Yes, a Mersenne Twister will eventually repeat... but by then, we will be hunting for a new planets as the Sun will be dying.