©2009 2010 2011 2012 Jennifer Harlow, Dominic Lee and Raazesh Sainudiin.
Creative Commons AttributionNoncommercialShare Alike 3.0
This is a course about computational statistical experiments with Monte Carlo methods.
Official Description: This course is about the generation of random numbers and their uses, including computer simulations to mimic and contrast random realworld phenomena. It will provide an intuitive and practical understanding of the basic methods in computational statistics, and show how to implement statistical algorithms to manipulate, visualise and comprehend various aspects of realworld data.
Who does computational statistical experiments?
A statistical experimenter is a person who conducts a statistical experiment. Roughly, an experiment is an action with an empirically observable outcome (data) that cannot necessarily be predicted with certainty (in the sense that a repetition of the experiment may result in a different outcome). An experimenter attempts to learn about a phenomenon through the outcome of an experiment. An experimenter is often a decisionmaker, scientist or engineer.
A simple example of an experiment is the Quincunx and a more popular one is the NZ Lotto or the British Lotto (See British Lottery animation). Other experiments close to home we may see in this course include earth quakes in NZ, sea shells along New Brighton beach, etc.
Recent technological advances are facilitating computationally intensive statistical experiments based on possibly massive amounts of empirical observations, in a manner that was not viable a decade ago. Hence, a successful decisionmaker, scientist or engineer in most specialisations today is a computational statistical experimenter.
A computational statistical experimenter has to tell a machine what to do with the data, i.e. program the machine. In addition, statistical experimenters use a mathematically formal way of thinking about their experiments. They use set theory, probability theory and other branches of pure and applied mathematics through established statistical theory to reach their administrative, scientific and engineering decisions from their data.
This course is designed to help you take the first steps along this path.
What is Sage and why are we using it?
We will be using Sage for our handson work in this course. Sage is a free opensource mathematics software system licensed under the GPL.
Sage can be used to study mathematics and statistics, including algebra, calculus, elementary to very advanced number theory, cryptography, commutative algebra, group theory, combinatorics, graph theory, exact linear algebra, optimization, interactive data visualization, randomized or Monte Carlo algorithms, scientific and statistical computing and much more. It combines various software packages into an integrative learning, teaching and research experience that is well suited for novice as well as professional researchers.
Sage is a set of software libraries built on top of Python, a widely used general purpose programming language. Sage greatly enhance Python's already mathematically friendly nature. It is one of the languages used at Google, US National Aeronautic and Space Administration (NASA), US Jet Propulsion Laboratory (JPL), Industrial Light and Magic, YouTube, and other leading entities in industry and public sectors (read more...). Scientists, engineers, and mathematicians often find it well suited for their work. Obtain a more thorough rationale for Sage from Why Sage? and Success Stories, Testimonials and News Articles. Jump start your motivation by taking a Sage Feature Tour right now!
For course history and archive from 2009, 2008 and 2007 please see STAT 218: Computational Methods in Statistics.
This is an interactive SAGE worksheet from the SAGE Notebook and interactive means...
We will embed relevant videos in the SAGE Notebook. To wrap up the ideas so far, See the chief economist at Google talk about statistics being the dream job for 2010's, free virtual academic resources such as The Khan Academy and a rapid introduction to SAGE Notebook in the following three videos.
We will formally present mathematical and statistical concepts in the SAGE Notebook.
\[ \sum_{i=1}^5 i = 1+2+3+4+5=15, \qquad \prod_{i=3}^6 i = 3 \times 4 \times 5 \times 6 = 360 \]
\[\binom{n}{k}:= \frac{n!}{k!(nk)!}, \qquad \lim_{x \to \infty}\exp{(x)} = 0 \]
\[\{\alpha, \beta, \gamma, \delta, \epsilon, \zeta, \mu,\theta, \vartheta, \phi, \varphi, \omega, \sigma, \varsigma,\Gamma, \Delta, \Theta, \Phi, \Omega\}, \qquad \forall x \in X, \quad \exists y \leq \epsilon, \ldots\]
We will use interactive visualisations to convey concepts when possible. See the Taylor Series interact by Harald Schilly below.
However, there is no substitute for taking notes on paper with a pencil or pen. The exam is based on pencilpaper skills!
We will write computer programs within cells in the SAGE Notebook right after we learn the concepts. Thus, there is a significant overlap between lectures and labs in this course. We will interactively visualise and analyse live data with our computer programs in the SAGE Notebook.
Let us visualize the Stock Market data, fetched from Yahoo and Google by William Stein next by placing the cursor in the cell below and clicking the evaluate link at the bottom left of the purple rectangular outline of the cell.
Click to the left again to hide and once more to show the dynamic interactive window 
Let us fetch the CO2 data from US National Oceanic and Atmospheric Association and see it using the program by Marshall Hampton.
Traceback (most recent call last): if a_line.find('Creation:') !=
1:
File "", line 1, in <module>
File "/tmp/tmppIhmeE/___code___.py", line 5, in <module>
co2data =
U.urlopen('ftp://ftp.cmdl.noaa.gov/ccg/co2/trends/co2_mm_mlo.txt').readl\
ines()
File "/home/sagebuild/sage7.0/local/lib/python/urllib2.py", line
154, in urlopen
return opener.open(url, data, timeout)
File "/home/sagebuild/sage7.0/local/lib/python/urllib2.py", line
431, in open
response = self._open(req, data)
File "/home/sagebuild/sage7.0/local/lib/python/urllib2.py", line
449, in _open
'_open', req)
File "/home/sagebuild/sage7.0/local/lib/python/urllib2.py", line
409, in _call_chain
result = func(*args)
File "/home/sagebuild/sage7.0/local/lib/python/urllib2.py", line
1412, in ftp_open
fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
File "/home/sagebuild/sage7.0/local/lib/python/urllib2.py", line
1434, in connect_ftp
persistent=False)
File "/home/sagebuild/sage7.0/local/lib/python/urllib.py", line 875,
in __init__
self.init()
File "/home/sagebuild/sage7.0/local/lib/python/urllib.py", line 887,
in init
self.ftp.cwd(_target)
File "/home/sagebuild/sage7.0/local/lib/python/ftplib.py", line 562,
in cwd
return self.voidcmd(cmd)
File "/home/sagebuild/sage7.0/local/lib/python/ftplib.py", line 254,
in voidcmd
return self.voidresp()
File "/home/sagebuild/sage7.0/local/lib/python/ftplib.py", line 229,
in voidresp
resp = self.getresp()
File "/home/sagebuild/sage7.0/local/lib/python/ftplib.py", line 222,
in getresp
raise error_temp, resp
urllib2.URLError: <urlopen error ftp error: 421Access denied  wrong
user name or password
421 aborted>
Traceback (most recent call last): if a_line.find('Creation:') != 1: File "", line 1, in <module> File "/tmp/tmppIhmeE/___code___.py", line 5, in <module> co2data = U.urlopen('ftp://ftp.cmdl.noaa.gov/ccg/co2/trends/co2_mm_mlo.txt').readlines() File "/home/sagebuild/sage7.0/local/lib/python/urllib2.py", line 154, in urlopen return opener.open(url, data, timeout) File "/home/sagebuild/sage7.0/local/lib/python/urllib2.py", line 431, in open response = self._open(req, data) File "/home/sagebuild/sage7.0/local/lib/python/urllib2.py", line 449, in _open '_open', req) File "/home/sagebuild/sage7.0/local/lib/python/urllib2.py", line 409, in _call_chain result = func(*args) File "/home/sagebuild/sage7.0/local/lib/python/urllib2.py", line 1412, in ftp_open fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout) File "/home/sagebuild/sage7.0/local/lib/python/urllib2.py", line 1434, in connect_ftp persistent=False) File "/home/sagebuild/sage7.0/local/lib/python/urllib.py", line 875, in __init__ self.init() File "/home/sagebuild/sage7.0/local/lib/python/urllib.py", line 887, in init self.ftp.cwd(_target) File "/home/sagebuild/sage7.0/local/lib/python/ftplib.py", line 562, in cwd return self.voidcmd(cmd) File "/home/sagebuild/sage7.0/local/lib/python/ftplib.py", line 254, in voidcmd return self.voidresp() File "/home/sagebuild/sage7.0/local/lib/python/ftplib.py", line 229, in voidresp resp = self.getresp() File "/home/sagebuild/sage7.0/local/lib/python/ftplib.py", line 222, in getresp raise error_temp, resp urllib2.URLError: <urlopen error ftp error: 421Access denied  wrong user name or password 421 aborted> 
This lab will start by showing you some of the basic numeric capabilities of SAGE.
To start with, we are going to be using SAGE like a handheld calculator. Let's perform the basic arithmetic operations of addition, subtraction, multiplication, division, exponentiation, and remainder over the three standard number systems: Integers denoted by $\mathbb{Z}$, Rational Numbers denoted by $\mathbb{Q}$ and Real Numbers denoted by $\mathbb{R}$. Let us recall the real number line and the basics of number systems next.
Let us get our fingers dirty with some numerical operations in SAGE. Note that anything after a '#' symbol is a comment  comments are ignored by SAGE but help programmers to know what's going on.
Example 1: Integer Arithmetic
Try evaluating the cell containing 1+2 below by placing the cursor in the cell and pressing <SHIFT><ENTER>. Next, modify the expression and evaluate it again. Try 3+4, for instance.
3 3 
1 1 
The multiplication operator is '*', the division operator is '/'.
12 12 

The exponentiation operator is '^'.

Being able to finding the remainder after a division is surprisingly useful in computer programming.

Another way of referring to this is 11 modulus 3, which evaluates to 2. '%' is the modulus operator.
You try
Try typing in and evaluating some expressions of your own. You can get new cells above or below an existing cell by placing the cursor just above or below an existing cell and clicking after you see a purple bar.

What happens if you put space between the characters in your expression, like "1 + 2" instead of "1+2"?.


Example 2: Operator Precedence for Evaluating Arithmetic Expressions
Sometimes we want to perform more than one arithmetic operation with some given integers. Suppose, we want to "divide 12 by 4 then add the product of 2 and 3 and finally subtract 1." Perhaps this can be achieved by evaluating the expression "12/4+2*31"?
But could that also be interpreted as "divide 12 by the sum of 4 and 2 and multiply the result by the difference of 3 and 1"?
In programming, there are rules for the order in which arithmetic operations are carried out. This is called the order of precedence.
The basic arithmetic operations are: +, , *, %, /, ^. The order in which operations are evaluated are as follows:
When operators are at the same level in the list above, what matters is the evaluation order (right to left, or left to right).
Operator precedence can be forced using parenthesis.


Operator precedence can be forced using nested parentheses. When our expression has nested parenthesis, i.e., one pair of parentheses inside another pair, the expression inside the innermost pair of parentheses is evaluated first.

You try
Try writing an expression which will subtract 3 from 5 and then raise the result to the power of 3.

Find out for yourself what we mean by the precedence for exponentiation (^) being from right to left: What do you think the expression 3^3^2 would evaluate to? Is it the same as (3^3)^2 (27 squared) or 3^(3^2) (3 raised to the power 9)? Try typing in the different expressions to find out:



Find an expression which will add the squares of four numbers together and then divide that sum of squares by 4.

Find what the precedence is for the modulus operator % that we discussed above: try looking at the difference between the results for 10%2^2 and 10%2*2 (or 10^2+2). Can you see how Sage is interpreting your expressions?
Note that when you have two operators at the same precedence level (like % and *), then what matters is the order  left to right or right to left. You will see this when you evaluate 10%2*2.



Does putting spaces in your expression make any difference?

The lesson to learn is that it is always good to use the parentheses: you will make it clear to someone reading your code what you mean to happen as well as making sure that the computer actually does what you mean it to!
Try these videos to get some practice if you are rusty with order of operations.
Example 3: Rational Arithmetic
So far we have been dealing with integers. Integers are a type in SAGE. Algebraically speaking, integers, rational numbers and real numbers form a ring. This is something you will learn in a maths course in Group Theory or Abstract Algebra.
<type 'sage.rings.integer.Integer'> <type 'sage.rings.integer.Integer'> 
However, life with only integers is a bit limited. What about values like '1/2'?
This brings us to the rational numbers.
<type 'sage.rings.rational.Rational'> <type 'sage.rings.rational.Rational'> 
Try evaluating the cell containing 1/2+2 below. Next, modify the expression and evaluate it again. Try 1/3+2/4, for instance.

You can do arithmetic with rationals just as we did with integers.




You try
Write an expression which evaluates to 1 using the rationals 1/3 and 1/12, some integers, and some of the arithmetical operators  there are lots of different expressions you can choose, just try a few.

What does Sage do with something like 1/1/5? Can you see how this is being interpreted? What should we do if we really want to evaluate 1 divided by 1/5?


Try adding some rationals and some integers together  what type is the result?

Example 4: Real Arithmetic (multiprecision floatingpoint arithmetic)
Recall that real numbers include integers, rational numbers irrational numbers like the square root of 2, pi and Euler's constant e. Real numbers can be thought of as all the numbers in the real line between negative infinity and positive infinity. Real numbers are represented in decimal format, for e.g. 234.4677878.
We can do arithmetic with real numbers and can combine them with integer and rational types in SAGE. Compare the results of evaluating the expressions below to the equivalent expressions using rationals above.






Technical note: Computers can be made to exactly compute in integer and rational arithmetic. But, because computers with finite memory (all computers today!) cannot represent the uncountably infinitely many real numbers, they can only mimic or approximate arithmetic over real numbers using finitely many computerrepresentable floatingpoint numbers.
You try
Find the type of 1/2.

Try a few different ways of getting the same result as typing ((((1/5) / (1/10)) * (0.1 * 2/5) + 4/100))*5/(3/5)  this exact expression has already been put in for you in the cell below you could try something just using floating point numbers. Then see how important the parentheses are around rationals when you have an expression like this  try taking some of the out.



Example 5: Assignment and variables
In SAGE, the symbol = is the assignment operator. You can assign a numerical value to a variable in SAGE using the assignment operator. This is a good way to store values you want to use or modify later.
(If you have programmed before using a a language like C or C++ or Java, you'll see that SAGE is a bit different because in SAGE you don't have to say what type of value is going to be assigned to the variable.)

Just typing the name of a variable to get the value works in the Sage Notebook, but if you are writing a program and you want to output the value of a variable, you'll probably want to use something like the print command

Variables can be strings as well as numbers. Anything you put inside quote marks will be treated as a string by SAGE.


You can reassign different values to variable names. Using SAGE you can also change the type of the values assigned to the variable (not all programming languages allow you to do this)



You can assign values to more than one variable on the same line, by separating the assignment expressions with a semicolon ";". However, it is usually best not to do this because it will make your code easier to read (it is hard to spot the other assignments on a single line after the first one).

You try
Try assigning some values to some variables  you choose what values and you choose what variable names to use. See if you can print out the values you have assigned.


Assign the value 2 to a variable named x
On the next line down in the same cell, assign the value 3 to a variable named y
Then (on a third line) put in an expression which will evaluate x + y

Now try reassigning a different value to x and reevaluating x + y


Example 6: Truth statements and Boolean values
Consider statements like "Today is Friday" or "2 is greater than 1" or " 1 equals 1": statements which are either true or not true (i.e., false). SAGE has two values, True and False which you'll meet in this situation. These value are called Booleans values, or values of the type Boolean.
In SAGE, we can express statements like "2 is greater than 1" or " 1 equals 1" with relational operators, also known as value comparison operators. Have a look at the list below.
Lets try some really simple truth statements.

Let us evaluate the following statement.

We can use these operators on variables as well as on values. Again, try assigning different values to x and y, or try using different operators, if you want to.

Note that when we check if something equals something else, we use "==", a double equals sign. This is because '=', a single equals sign, is the assignment operator we talked about above. Therefore, to test if x equals y we can't write 'x = y' because this would assign y to x, instead we use the equality operator '==' and write 'x == y'.
We can also assign a Boolean value to a variable.


If we want to check if two things are not equal we use '!='. As we would expect, it gives us the opposite of testing for equality

You try
Try assigning some values to two variables  you choose what values and you choose what variable names to use. Try some truth statements to check if they are equal, or one is less than the other.


Try some strings (we looked at strings briefly in Example 5 above). Can you check if two strings are equal? Can you check if one string is 'less than' (<) another string. How do you think that Sage is ordering strings (try comparing "fred" and "freddy", for example)?



Example 7: Mathematical constants
Sage has reserved words that are defined as common mathematical constants. For example, pi and e behave as you expect. Numerical approximations can be obtained using the .n()method, as before.


Let us learn elementary set theory. Sets are perhaps the most fundamental concept in mathematics.
Set is a collection of distinct elements. We write a set by enclosing its elements with curly brackets. Let us see some example next.
The set that contains no elements is the empty set. It is denoted by $$\boxed{\emptyset = \{\}} \ .$$
We say an element belongs to or does not belong to a set with the binary operators $$\boxed{\in \ \text{or} \ \notin} \ .$$ For example,
We say a set $C$ is a subset of a set $D$ and write
$$\boxed{C \subset D}$$
if every element of $C$ is also an element of $D$. For example,
We can add distinct new elements to an existing set by union operation denoted by $\cup$ symbol. For example
More formally, we write the union of two sets $A$ and $B$ as $$\boxed{A \cup B = \{x: x \in A \ \text{or} \ x \in B \}} \ .$$
The symbols above are read as "$A$ union $B$ is equal to the set of all $x$ suct that $x$ belongs to $A$ or $x$ belongs to $B$" and simply means that $A$ union $B$ or $A \cup B$ is the set of elements that belong to $A$ or $B$.
Similarly, the intersection of two sets $A$ and $B$ written as $$\boxed{A \cap B = \{x: x \in A \ \text{and} \ x \in B \}} $$ means $A$ intersection $B$ is the set of elements that belong to both $A$ and $B$.
For example
The set difference of two sets $A$ and $B$ written as $$\boxed{A \setminus B = \{x: x \in A \ \text{and} \ x \notin B \}} $$ means $A \setminus B$ is the set of elements that belong to $A$ and not belong to $B$.
For example
The equality of two sets $A$ and $B$ is defined in terms of subsets as follows: $$\boxed{A = B \quad \text{if and only if} \quad A \subset B \ \text{and} \ B \subset A} \ .$$
Two sets $A$ anb $B$ are said to be disjoint if $$\boxed{ A \cap B = \emptyset} \ .$$
Given a universal set $\Omega$, we define the complement of a subset $A$ of the universal set by $$\boxed{A^c = \Omega \setminus A = \{x: x \in \Omega \ \text{and} \ x \notin A\}} \ .$$
Let us gain more intuition by seeing the unions and intersections of sets interactively. The following interact is from interact/misc page of Sage Wiki.
Click to the left again to hide and once more to show the dynamic interactive window 
Now we'll look at how we can create and manipulate sets in SAGE.
Example 1: Making sets
In SAGE, you do have to specifically say that you want a set when you make it .








We can add new elements to a set.

But remember from lectures that sets contain distinct elements.

You try
Try making the set Z={4,5,6,7} next. The instructions are in the two cells below


Make a set with the value 2/5 (as a rational) in it. Try adding 0.4 (as a floating point number) to the set. Does Sage do what you expect?


Example 2: Subsets
In lectures we talked about subsets of sets.
Recall that Y is a subset of X if all the elements in Y are also in X.

If you have time: We say Y is a proper subset of X if all the elements in Y are also in X but there is at least one element in X that it is not in Y. If X is a (proper) subset of Y, then we also say that Y is a (proper) superset of X.




Example 3: More set operations
Now let's have a look at the other set operations we talked about in lectures: intersection, union, and difference.
Recall that the intersection of X and Y is the set of elements that are in both X and Y.

The union of X and Y is the set of elements that are in either X or Y.

The set difference between X and Y is the set of elements in X that are not in Y.

You try
Try some more work with sets of strings below.


Fruit and colours are different to us as people, but to the computer, the string 'orange' is just the string 'orange' whether it is in a set called fruit or a set called colours.

Try a few other simple subset examples  make up your own sets and try some intersections, unions, and set difference operations. The best way to try new possible operations on a set such as X we just created is to type a period after X and hit <TAB> key. THis will bring up all the possible methods you can call on the set X.





Infact, there are two ways to make sets in SAGE. We have so far used the python set to make a set. However we can use the SAGE Set to maka sets too. SAGE Set is more mathematically consisitent. If you are interested in the SAGE Set go to the source and work through the reference on sets. But, first let us appreciate the difference between set and Set!







After the tutorial, a few questions came up about arithmetical operators, numbers, and sets. Here are some short comments and demonstrations of things we did not cover in the tutorial.
Operator precedence
We gave a list for the precence of basic arithmetic operators:
Note that When operators are at the same level in the list above, what matters is the evaluation order (right to left, or left to right).
This means that if we have the expression 10%2*2, this is the equivalent of (10%2)*2 = 0*2 = 0 because the modulus operator % is at the same precedence level as the multiplication operator * and the order of evaluation is left to right.

However, if we have 10%2^2 then this is the equivalent of 10%(2^2) because the exponentiation operator ^ is at a higher level than the modulus operator %.

Sage real literals and Python floating point numbers
You don't need to know this but it might be interesting if you are into computing ...
We showed how you can find the type of a number value and we demonstrated that by default, Sage makes 'real' numbers like 3.1 into Sage real literals. This is an extrafancy number type Sage developed because Sage is used extensively by mathematicians and others who want their number types to have very specific and 'mathy' properties. If you were just using Python (the programming language underlying most of Sage) then a value like 3.1 would be a floating point number or float type. Python has some interesting extra operators that you can use with Python floating point numbers, which also work with the Sage rings integer type but not with Sage real literals.







One of the differences of Sage rings integers to plain Python integers is that result of dividing one Sage rings integer by another as a rational. This probably seems very sensible, but it is not what happens at the moment with Python integers.

In doing this, Python is similar to many other programming languages (like C, C++, Java ...) but it can be a bit unexpected unless you are used to it. Python is however changing (see this article for the gory details if you are really interested...). For the case when people actually want what they have called "floor division" (essentially getting the whole number part of the result of the division), then there is a special floor division operator //. Again, this works with Sage rings integers (any Python integers) and floats but not with Sage real literals.



Python also provides a rather nice function for getting both the whole number and the remainer parts of the division at once //. You guessed it .. this works with Sage rings intgers (any Python integers) and floats but not with Sage real literals.



In the tutorial we showed the .n method. If X is some Sage real literal and we use X.n(20) we will be asking for 20 bits of precision, which is about how many bits in the computer's memory will be allocated to hold the number. If we ask for X.n(digits=20) will be asking for 20 digits of precision, which is not the same thing. Also note that 20 digits of precision does not mean showing the number to 20 decimal places, it means all the digits including those in front of the decimal point.


If you want to actually round a number to a specific number of decimal places, you can also use the round(...) function.


Sets
In class we talked about sets and in the tutorial a few questions came up about the order in which Sage displays the elements in a set. The key is to remember that sets are unordered: a set {1, 2, 3} is the same as the set {2, 1, 3} is the same as the set {3, 1, 2} ... (soon we will talk about ordered collections like lists where order is important  for the moment just remember that a set is about collections of unique values or objects). Remember also that the (lower case s) set is a python type (in contrast to the 'more mathy' Sage Set with a capital S). For many 'nonmathy' computing purposes, what matters about a set is the speed of being able to find things in the set without making it expensive (in computer power) to add and remove things, and the best way of doing this (invisible to the programmer actually using the set) is to base the set on a hash table. Don't worry if you don't understand a word of that  you don't need to know it for this course  but I think that in practical terms what is does mean is that the ordering that Sage uses to actually display a set is related to the hash values it has given to the elements in the set which may be totally unrelated to what you or I would think of as any obvious ordering based on the magnitude of the values in the set, or the order in which we put them in, etc. Remember, you don't need to know anything about hash values and the 'underthehood' construction of sets for this course  this explanation is just to give a bit more background on something that some students commented on in the tutorial.
(However, if you are interested, you could have a look at this link for a bit more about the technical aspects of sets.)


Python also provides something called a frozenset, which you can't change like an ordinary set


You can find more about the Python set and frozenset types in the Python builtin types documentation (the sets stuff is just over half way down that page).
