©2009 2010 2011 2012 Jennifer Harlow, Dominic Lee and Raazesh Sainudiin.
©2016 Raazesh Sainudiin.
Creative Commons AttributionNoncommercialShare Alike 3.0
How can we produce realisations from $Uniform(0,1)$, the fundamental random variable?
i.e., how can we produce samples $(x_1, x_2, \ldots, x_n)$ from $X_1, X_2, \ldots, X_n$ $\overset{IID}{\thicksim}$ $Uniform(0,1)$?
What is Sage doing when we ask for random()?

Modular arithmetic and number theory gives us pseudorandom number generators.
What can we do with samples from a $Uniform(0,1)$ RV? Why bother?
Answer:
We can use them to sample or simulate from other, more complex, random variables. These simulations can be used to understand realworld phenomenon such as:
The starting point for all of this is modular arithmetic ...
Modular arithmetic (arithmetic modulo $m$) is a central theme in number theory and is crucial for generating random numbers from a computer (in fancylingo, "machineimplementing objects in probability theory"). Being able to do this is essential for computational statistical experiments and Monte Carlo methods. Such computergenerated random numbers are technically called pseudorandom numbers.
In this worksheet we are going to learn to add and multiply modulo $m$ (this part of our worksheet is adapted from William Stein's SAGE worksheet on Modular Arithmetic for the purposes of linear congruential generators). If you want a more thorough treatment see http://en.wikipedia.org/wiki/Modular_arithmetic.
Remember when we talked about the modulus operator %? The modulus operator gives the remainder after division:




Arithmetic modulo $m$ is like usual arithmetic, except every time you add or multiply, you also divide by $m$ and return the remainder. For example, working modulo $m=12$, we have:
since $2$ is the remainder of the division of $14$ by $12$.
Think of this as like the hours on a regular analog clock. We already do modular addition on a regular basis when we think about time, for example when answering the question:
"If it is 8pm now then what time will it be after I have spent the 6 hours that I am supposed to dedicate this week toward this course? Answer: 2am."
Modular addition and multiplication with numbers modulo $m$ is well defined, and has the following properties (we will just assume them here  you'd cover them properly in a basic algebra course):
Let us make a matrix of results from addition and multiplication modulo 4 .
[1 2 3] [4 5 6] [1 2 3] [4 5 6] 
[(0, 0, 0), (0, 1, 1), (0, 2, 2), (0, 3, 3), (1, 0, 1), (1, 1, 2), (1, 2, 3), (1, 3, 0), (2, 0, 2), (2, 1, 3), (2, 2, 0), (2, 3, 1), (3, 0, 3), (3, 1, 0), (3, 2, 1), (3, 3, 2)] [(0, 0, 0), (0, 1, 1), (0, 2, 2), (0, 3, 3), (1, 0, 1), (1, 1, 2), (1, 2, 3), (1, 3, 0), (2, 0, 2), (2, 1, 3), (2, 2, 0), (2, 3, 1), (3, 0, 3), (3, 1, 0), (3, 2, 1), (3, 3, 2)] 
[0, 1, 2, 3, 1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2] [0, 1, 2, 3, 1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2] 
[0 1 2 3] [1 2 3 0] [2 3 0 1] [3 0 1 2] [0 1 2 3] [1 2 3 0] [2 3 0 1] [3 0 1 2] 
[0 0 0 0] [0 1 2 3] [0 2 0 2] [0 3 2 1] [0 0 0 0] [0 1 2 3] [0 2 0 2] [0 3 2 1] 

In the following interactive image (created by William Stein) we make an addition and multiplication table modulo $m$, where you can control $m$ with the slider. Try changing $m$ and seeing what effect it has:
Click to the left again to hide and once more to show the dynamic interactive window 

Look at this for a while. Answer the following questions (which refer to the default colour map) to be sure you understand the table:
You can change the colour map if you want to.
You try
You should be able to understand a bit of what is happening here. See that there are two list comprehensions in there. Take the line "matrix(m,m,[(i+j)%m for i in range(m) for j in range(m)])". The list comprehension part is "[(i+j)%m for i in range(m) for j in range(m)]". Let's pull it out and have a look at it. We have set the modulo m to 6 but you could change it if you want to:

This list comprehension doing double duty: remember that a list comprehension is like a short form for a for loop to create a list? This one is like a short form for one forloop nested within another forloop. We could recreate what is going on here by making the same list with two forloops. This one uses modulo m = 6 again.

Notice that the last statement in the list comprehension, "for j in range(m)", is the inner loop in the nested forloop.
The next step that Stein's interactive image is to make a matrix out of the list. We won't be doing matrices in detail in this course (we decided to concentrate on arrays instead), but if you are interested, this statement uses the matrix function to make a matrix out of the list listFormAdd. The dimensions of the matrix are given by the first two arguments to the matrix function, m, m.

Optionally, you can find out more about matrices from the documentation.
Try recreating the matrix for multiplication, just as we have just recreated the one for addition.


(end of You Try)
The simplest way to create a number modulo $m$ in Sage is to use the Mod(a,m) command. We illustrate this below.

Let's assign it to a variable so that we can explore it further:



We will compare myModInt to a 'normal' Sage integer:



We can see that myModInt and myInt are different types, but what does this mean? How do they behave?
Try addition:


Was this what you already expected?
What about multiplication?


What's going on here? As we said above, arithmetic modulo $m$ is like usual arithmetic, except every time you add or multiply, you also divide by $m$ and return the remainder. 8 x 6 is 48, and the remainder of 48 divided by 12 (12 is our modulo) is 0.
What about this one? What's happening here?


You can create the number giving your year of birth ($\pm 1$ year) in a similar way. For example, if you are 19 years old now then find the number 19 modulo 2010.


You try
Let us assign 10 modulo 12 to a variable now

And add 1 to now
Put in the expression to do this in Sage into the cell below


And subtract 13 from the previous expression.
Sage expression is


Also try adding 14 to the previous expression  the new postition is not shown on this clock but you should be able to see what it should be.

And multiplying 2 by now (or now by 2)

Try making and using some modular integers of your own. Make sure that you can predict the results of simple addition and multiplication operations on them: this will confirm that you understand what's happening.


What happens if you try arithmetic with two modular integers? Does it matter if the moduli of both operands a and b in say a+b are the same if a and b are both modular integers? Does this make sense to you?




(end of You Try)
A linear congruential generator (LCG) is a simple pseudorandom number generator  a simple way of imitating the $Uniform(0,1)$. "Pseudorandom" means that the numbers are not really random. We will look at what we mean by that as we find out about linear congruential generators.
The theory behind LCGs is easy to understand, and they are easily implemented and fast.
To make a LCG we need:
Using these inputs, the LCG generates numbers $x_1, x_2, \ldots x_{n1} $ where $x_{i+1}$ is calculated from $x_i$ as defined by the following recurrence relation:
$x_0,x_1,\ldots,x_{n1}$ is the sequence of pseudorandom numbers called the linear congruential sequence.
We can define a function parameterised by $(m,a,c,x_0,n)$ to give us a linear congruential sequence in the form of a list.
(Remember about function parameters? The function parameters here are m, a, c, x0, and n. Note that the return value of the function is a list.

You don't need to be able write this function, but you should be able to fully understand what it is doing. The function is merely implementing the pseudocode or algorithm of the linear congruential generator using a forloop and modular arithmetic: note that the generator produces integers modulo $m$.
Are all linear congruential generators as good as each other? What makes a good LCG?
One desirable property of a LCG is to have the longest possible period. The period is the length of sequence we can get before we get a repeat. The longest possible period a LCG can have is $m$. Lets look at an example.

You should be able to see the repeating pattern 9, 8, 6, 2, 11, 12, 14. If you can't you can see that the sequence actually contains 8 unique numbers by making a set out of it:

Changing the seed $x_0$ will, at best, make the sequence repeat itself over other numbers but with the same period:


At worst, a different seed might make the sequence get stuck immediately:



What about changing the multiplier $a$?



We want an LCG with a full period of $m$ so that we can use it with any seed and not get stuck at fixed points or short periodic sequences. This is a minimal requirement for simulation purposes that we will employ such sequences for. Is there anyway to know what makes a LCG with a full period? It turns out that an LCG will have a full period if and only if:
(Different conditions apply when $c=0$. The Proposition and Proof for this are in Knuth, The Art of Computer Programming, vol. 2, section 3.3).
Sage has a function gcd which can calculate the greatest common divisor of two numbers:

Sage can also help us to calculate prime factors with the prime_factors function:


$(m, a, c, x0, n) = (256, 137, 123, 13, 256)$ gives a linear congruential sequence with the longest possible period of 256. Let us see how these parameters satisfy the above three requirements while those earlier with $m=17$, $a=2$, and $c=7$ do not.








Thus, the parameters $(m, a, c, x_0, n) = (256, 137, 123, 13, 256)$ do indeed satisfy the three conditions to guarantee the longest possible period of 256. In contrast, for the LCG example earlier with $m=17$, $a=2$, and $c=7$, although $m$ and $c$ are relatively prime, i.e., $gcd(m,c)=1$, we have the violation that $a1=21=1$ is not divisible by the only prime factor $17$ of $m=17$. Thus, we cannot get a period of maximal length $17$ in that example.


Let us see if the parameters $(m, a, c, x_0, n) = (256, 137, 123, 13, 256)$ that satisfy the three conditions to guarantee the longest possible period of 256 do indeed produce such a sequence:


ourLcg is an example of a good LCG.
The next example will demonstrate what can go wrong.
Consider $m= 256$, $a = 136$, $c = 3$, $x_0 = 0$, $n = 15$.


But, since $a1=$135 is not divisible by 2, the only prime factor of $m=$256, we get into the fixed point 91 no matter where we start from. Try changing the seed $x_0$. Does it make a difference?

We can look at the linear congruential sequence generated by $(m,a,c,x_0,n)=(256,137,0,123,256)$ from Knuth's classic [The Art of Computer Programming, vol. 2, Sec. 3.3.4, Table 1, Line 5] and compare it with ourLcg.



Note that although ourLcg has maximal period of 256, lcgKnuth334T1L5 has period of 32. Let us look at them as points.
We can plot each number in the sequence, say ourLcg, against its index in the sequence  i.e. plot the first number 13 against 0 as the tuple (0, 13), the second number against 1 as the tuple (1, 112), etc. To do this, we need to make a list of the index values, which is simple using range(256) which as you know will give us a list of numbers from 0 to 255 going up in steps of 1. Then we can zip this list with the sequence itself to make the list of tuples .

Then we plot the points for ourLcg and for lcgKnuth334T1L5

We can see that in section 3.3.4, Table 1, line 5 in The Art of Computer Programming, Knuth is giving an example of a particularly bad LCG.
When we introducted LCGs, we said that using an LCG was a simple way to imiate the $Uniform(0, 1)$, but clearly so far we have been generating sequences of integers. How does that help?
To get a simple pseudo Uniform(0,1) generator, we scale the linear congruential sequence over [0, 1]. We can do this by dividing each element by the largest number in the sequence (256 in the case of ourLcg).
Important note: The numbers in the list returned by our linConGen function are integers modulo $m$.

You will need to convert these to Sage integers or Sage multiprecision floating point numbers using int() or RR() command before dividing by $m$. Otherwise you will be doing modular arithmetic when you really want the usual division. This may come up in assignments so make sure you have noted it!

Having sorted that out, we can use a list comprehension as a nice neat way to do our scaling:

This is more like it! We could have a look at this on a plot. Again again want tuples (index, element in scaled sequence at index position), which we can get using range(256)(to get the indexes 0, .., 255) and zip:

Now we have points on the real line. We could use a histogram to look at their distribution. If we are hoping that our LCG, once we have scaled the results, is imitating the $Uniform (0,1)$, what kind of shape would we want our histogram to be?

Is this roughly what you expected?
(Please note: we are using the histogram plots to help you to see the data, but you are not expected to be able to make one yourself.)
We could repeat this for the Knuth bad LCG example:

And show it as a histogram. Given the pattern above, what would you expect the histogram to look like?

The above generators are cute but not useful for simulating Lotto draws with 40 outcomes. Minimally, we need to increase the period length with a larger modulus $m$.
But remember that the quality of the pseudorandom numbers obtained from a LCG is extremely sensitive to the choice of $m$, $a$, and $c$.
To illustrate that having a large $m$ alone is not enough we next look at RANDU, an infamous LCG which generates sequences with strong correlations between 3 consecutive points, which can been seen if we manipulate the sequence to make 3dimensional tuples out of groups of 3 consecutive points.
In the cell below we make our scaled sequence in one step, using a list comprehension which contains the expression to generate the LCG and the scaling.

Have a look at the results as a histogram:

Now we are going to use some of the array techniques we have learned about to resize the sequence from the RANDU LCG to an array with two columns. We can then zip the two columns together to make tuples (just pairs or twotuples).





Let us resize the LCG to an array with three columns. We can then zip the three columns together to make tuples. The effect will be that if our original sequence was $x_0, x_1, x_2, x_3, \ldots, x_{n3}, x_{n2}, x_{n1}$, we will get a list of triplets, tuples of length three, $(x_0, x_3, x_6), (x_1, x_4, x_7), \ldots, (x_{n16}, x_{n13}, x_{n1})$. Unlike the pairs in 2D which seem wellscattered and random, triplets from the RANDU LCG are not very random at all! They all lie on parallel planes in 3D.


You can alter your perspective on this image using the mouse. From a particular perspective you can see that something has gone horribly wrong ... RANDU is a really ugly LCG.
The above generators are of low quality for producing pseudorandom numbers to drive statistical simulations. We end with a positive note with a LCG that is in use in the Gnu Compiler Collection. It does not have obvious problems as in small periods or as high a correlation as RANDU.


Even good LCG are not suited for realistic statistical simulation problems. This is because of the strong correlation between successive numbers in the sequence. For instance, if an LCG is used to choose points in an ndimensional space, the points will lie on, at most, $m^{1/n}$ hyperplanes. There are various statistical tests that one can use to test the quality of a pseudorandom number generator. For example, the spectral test checks if the points are not on a few hyperplanes. Of course, the Sample Mean, Sample Variance, etc. should be as expected. Let us check those quickly:
Recall that the population mean for a $Uniform(0,1)$ RV is $\frac{1}{2}$ and the population variance is $\frac{1}{12}$.



To go into this topic in detail is clearly beyond the scope of this course. You should just remember that using computers to generate random numbers is not a trivial problem and use care when employing them especially in higher dimensional or less smooth problems.
We will use a pseudorandom number generator (PRNG) called the Mersenne Twister for simulation purposes in this course. It is based on more sophisticated theory than that of LCG but the basic principles of recurrence relations are the same.
(The Mersenne Twister is a variant of the recursive relation known as a twisted generalised feedback register. See [Makato Matsumoto and Takuji Nishimura, "Mersenne Twister: A 623dimensionally equidistributed uniform pseudorandom number generator, ACM Transactions on Modelling and Computer Simulation, vol. 8, no. 1, Jan. 1998, pp. 320.], or the Wikipedia page.)
The Mersenne Twister has a period of $2^{19937}1 \approx 10^{6000}$ (which is essentially a Very Big Number) and is currently widely used by researchers interested in statistical simulation.

You try
Our example using prngs is going to use a very useful function from the pylab module called cumsum. This calculates a cumulative sum from a sequence of numeric values. What do we mean by a cumulative sum? If we have a sequence $x_0, x_1, x_2, x_3, \ldots, x_n$, then we can derive a sequence that is the cumulative sum,
$$x_0, \,x_0 + x_1,\, x_0+ x_1+x_2, \,\dots, \sum_{i=0}^nx_i$$
Try evaluating the next cell to get the cumulative sum of the sequence 0, 1, 2, ..., 9

You will see that the result is in a pylab.array. This can be useful, but all we need is a list, we can convert this back to a list with the array's tolist method:

Using the list function to make a list will do the same thing:

Try some lists and cumsums for yourself. Please note that cumsum and tolist (or list) may come up in assessments. This is your chance to become familiar with them.


Now, say we have some probabilities in a list and we want to get the cumulative sum of them. Wecan use cumsum as we did above. We are going to start with just two probabilties which (for reasons that will shortly become clear), weI will assign as values to variables pLeft and pRight.

Note that cumsum works on a tuple as well as a list, providing that the elements of the tuple are some kind of number.
You are now going to use some simulated $Uniform(0,1)$ samples to simulate the famous Drunkard's Walk. The idea of the Drunkard's Walk is that the Drunkard has no idea where he is going: at each decision point he makes a random decision about which way to go. We simulate this random decison using our $Uniform(0,1)$ samples.
We are going to have quite a limited version of this. At each decision point the Drunkard can either go one unit left, right, up or down. Effectively, he is moving around on a (x, y) coordinate grid. The points on the grid will be tuples. Each tuple will have two elements and will represent a point in 2d space. For example, (0,0) is a tuple which we could use as a starting point.
First, we look at a useful feature of tuples: we can unpack a tuple and assign its elements as values to specified variables:


We use this useful feature in the functions we have defined below. You now know what is happening in each step of these functions and should be able to understand what is happening.


You should be thinking "Hey that is a lot of duplicated code: I thought that functions should help us not to duplicate code!". You are completely right, but this example will be easier to understand if we just live with some bad programming for the sake of clarity.
Now, experience the flexibility of Python: We can have lists of functions!

To demonstrate how we can use our list of functions, take some path which is represented by a list of points (tuples):

If you want to, see if you can find the tuple which is the last point in the path so far using len(somePath) and the indexing operator [ ].

We put the functions into the list in order, left to right, so we know that the first function in the list (index 0) is makeLeftTurnPoint, and the second function in the list (index 1) is makeRightTurnPoint. We can now call makeLeftTurnPoint by calling movementFunctions[0], and passing it the arguement somePath which is our example path so far. We should get back a new point (tuple) which is the point to the immediate left of the last point in the list somePath:

This has not added the new point to the list, but we can do this as well:

Try doing a similar thing to find same to now find the next right point.


That's all very well, but what about some random moves? What we are now going to do is to use random( ) to make our decisions for us. We know that the number generated by random will be in the interval [0,1). If all directions (up, down, left, right) are equally probable, then each has probability 0.25. All directions are independent. The cumulative probabiltiies can be thought of as representing the probabilities of (up, up or down, up or down or left, up or down or left or right).

Using these accumulated probabilities we can simulate a random decision: if a realisation of a $Uniform(0,1)$, $u$, is such that $0 \le u < 0.25$ then we go left. If $0.25\le u < 0.50$ we go right, if $0.50 \le u < 0.75$, we go up, and if $0.75 \le u < 1$ we go down.
We can demonstrate this:

You will see that we have only dealt with the cases for left and right. You may well not get $n$ lines of output. You can have a go at improving this in very soon. First, one more thing ...
We can tidy up the if statement a bit by using another for loop and the break statement. The break statement can be used to break out a loop. In this case we want to compare u to each cumulative probability until we find a 'match' (u < cumulative probability) and then break out of the for loop.

Now, try adding the cases to deal with up and down.


Now we can combine all this together to make a simulated Drunkard's Walk: First, a little helper function to plot a list of points as lines.

Now the real stuff:

A bit boring? A bit onedimensional? See if you can add up and down to the Drunkard's repetoire. You will need to:
Try it and see!






