©2009 Jennifer Harlow, Dominic Lee and Raazesh Sainudiin.
Creative Commons Attribution-Noncommercial-Share Alike 3.0
This is an interactive Sage worksheet from the Sage Notebook GUI.
Example 1: A revision on functions
We are going to be writing quite a few of our own functions in this Lab, so we will start with a brief revision example on defining functions.
In SAGE we can define our own functions. You can think of functions as reusable procedures for doing useful things that you might want to have available in lots of other, more specialised, sections of code. The beauty of a function is that it is a way of generalising your program instructions: When you want to use a function you pass it the value or values you want it to use and (usually - see below) get back an answer which has been calculated using those values.
Think of writing some code to sort out who out of your classmates can join your basketball team. It is a very politically incorrect heightist team, unfortunately, so only people 1.5m tall or more can join, and also they have to be available on Tuesday nights. You've got this information in the form of records for each person (note the use of tuples for the records).
|
|
Each classmate has a record encoded as a tuple containing 3 pieces of information (name, height in metres, and availability on Tuesday evenings) about the classmate it represents.
Now we want to find out who should be in our team. We can index into the tuple (remember the indexing operator [ ])to get at the values in the first, second and third positions (for example, myClassmate1[0]gets the value at the first position of the tuple myClassmate1), and we could 'hard-code' the whole thing:
|
|
As you can see, that is repetitive to code, boring, and very inflexible. In fact, whenever you see repeated lines in your code that seem to be doing the same basic thing with different values, it is a good signal that it is time to write a function.
Here is an expanded version of the information we gave you in Lab 2 about defining a function:
The function name is preceded by the keyword def for definition.
After the function name you put the function parameters within a pair of parentheses ( ).
The function parameters are variable names that are used within the body of the function, do with them whatever it is you want the function to do. We'll come back to them in a minute....
After we have defined the function by its name followed by its parameters in parentheses, we end the line with a colon : before continuing to the next line.
Now, we are ready to write the body of the function. This must be indented and it is customary to leave 4 white spaces (as we have with all our other indentations). SAGE does this for you automatically if you have ended the line above with a colon :.
It is a matter of courteous programming practice to have a docstring for your function, i.e., comments on what the function does. The Docstring is returned when we ask SAGE for help on the function.
Inside the body of the function we do whatever it is we want to do. Most of the time, but not always, we want the function to tell us the result of these calculations, i.e. we want the function to return something when it has finished. We do this with the keyword return followed by the expression to be returned..
In the SAGE Notebook you can define a function in one cell, evalutate it, and then use the function in other cells further down. More generally, if you want to put more code below the body of the function (i.e., code that is not part of the function body) you have to move the start of the line backwards from the indentation that indicated the function body, so that it aligns with the def keyword that started the function definition.
|
|
|
|
Now you need to really think about what is going on here. The function has parameters classmate and minHeight. These are named in the parentheses following the def keyword and the function name printInTeam. The code inside the body of the function uses these names in its calculations.
When we define the parameters, we are defining variable names that the function knows about and can use. This is what makes the function an efficient way of dealing with different actual values for classmate and minHeight. Notice that by having minHeight as a parameter, rather than hard-coded as 1.5 inside the function body, we make our function even more flexible: Not only can it deal with lots of different classmates, but we could also use exactly the same function if we changed our minds about the minimum height.
|
|
What is happening when our code has a line like "printInTeam(myClassmate1, mh)"? We are calling the function printInTeam and passing it the arguments myClassmate1 and mh. In this case these arguments are variable names that we have assigned to values.
When the function is called like this, it assigns its own variable names specified by the parameter names to values based on the arguments passed in. classmate is assigned to the same value (in this case, a tuple) as myClassmate1 and minHeight is assigned to the same value as mh.
In the next line, printInTeam(myClassmate2, mh), the classmate parameter variable name is assigned to the same value (in this case, a tuple) as myClassmate2 (and minHeight is again assigned to the same value as mh). The same happens for each of the other classmates in turn.
You'll notice that the printInTeam(...) function does not actually return anything, it just prints out a line. It has given us a bit more flexibility, but what if we don't want to print things out? We might want to use this function in the middle of a huge program that deals with lots and lots of classmate tuples to make lots of different teams and thousands of lines of output are not going to be helpful.
What about changing the function to return something? We could return a boolean value, ie true or false, to say whether the person tuple passed as the argument to the function meets our criteria.
|
|
Lets look at the this alternative. Functions that return boolean values are often given a name that starts with is... It's just a kind of convention and it means that you, as a programmer, can just see a function name like isInTeam(...) and, without even looking at the docstring, be pretty certain that it is going to return a boolean value, i.e. true or false.
Here we can see the keyword return, followed by the expression for the value we want to return. In this case we are returning the value assigned to a variable named retValue.
Note that this function uses a kind of shortcut to do the if.. else calculation that printInTeam() did. It specifies a default value for what it is going to return (we have given this variable name retValue). The default is false. The function body only changes default to true if the team membership criteria are satisfied. Doing this is a good way of programming for two reasons: First, in this case it saves writing the 'else' block and, in general, shorter code is faster and easier to understand. Secondly, it makes sure that we return a safe default value no matter how complex our function then gets.
|
|
What we are doing here is calling the function successively for each classmate. For each function call, a value (true or false) is being returned. However, the SAGE notebook only gives the value of the last expresssion evaluated, which in the cell above is isInTeam(myClassmate3, mh).
Our new function also does not give us the names, so why is it better than printInTeam()? Because, by returning a boolean value, it is more flexible than a function that just prints something out.
Lets create the kind of situation we might have if we had lots of classmate record tuples to deal with. We'd have them in some kind of collection, say a list (maybe an array, but we'll use list here):
|
|
classmateList is a list of tuples, each tuple containing 3 pieces of information (name, height in metres, and availability on Tuesday evenings) about the classmate it represents. Now say we wanted the same kind of print out that we had before. We can use a for loop on the list, like this:
|
|
The if clause of the for loop is calling the function isInTeam(...) and using the boolean value that is returned to decide whether to execute the if block code (print cm[0], "is in the team") or the else block code (print cm[0], "cannot be in the team").
cm is the name we have used for the loop variable, and each value (i.e., tuple) in the list is assigned to it in turn. cm[0] indexes into the tuple the loop variable cm is currently assigned to and gets the name of the person that tuple represents.
But we can also use the same function in lots of different ways. What about using it to actually make our team?
|
|
We can then ask SAGE to tell us what our team is:
|
|
You may also have seen that we could make this code shorter by just using a list comprehension::
|
|
If we had lots of different teams to make, we'd have to repeat this code a lot of times too. That's not good, but in that situation we would define a function that uses isInTeam(...) to make teams.
|
|
|
|
The function makeTeam(...) has two parameters, clist and minValue. It uses a list comprehension to make the team list. The list comprehension calls the function isInTeam(...), as described above for the for loop.
Note that because the list comprehension allows us to make the team list with a single expression, the function body only has to have the return keyword and the list comprehension expression that makes the list to return (if you used the for loop method the function body would have to first make the list using the for loop and then return the list). Most functions do have more than one line in the function body, but occasionally a one-line function is all you need.
You should now understand a bit better what we said at the start of the lab about functions. They allow us to write flexible, reusable code, i.e. to avoid repetition and build larger blocks from smaller blocks. There is one other use for functions even if you don't think that you'll want to use the function in lots of different circumstances. This is just being able to break down your code into manageble blocks (functions), each of which fulfils a clear, self-contained purpose. This is described as modular programming. This kind of code is much easier to read and understand and maintain. You will also almost inevitably find that you reuse key blocks of code more than you thought you would when you first wrote it, so it's useful to make nice functions right from the start. Function parameters allow us to pass arguments to functions, and the function usually passes back, or returns, the result of its calculations. You can nest function calls within functions - that is what building up from your blocks is all about.
Think about how you could use a similar function to take lots of tuple records on people and calculate statistics like the proportion of a class who are over a certain height. Then think a bit wider - if you have some sort of data, you can use a function to do some analysis on the data and then summarise the results.
Example 2: The Unifom(0,1) Random Variable
Recall that the Uniform(0,1) random variable is the fundamental model as we can transform it to any other random variable, random vector or random structure. The PDF f and DF F of X \sim Uniform(0,1) are:
f(x) = \begin{cases} 0 & \text{if} \ x \notin [0,1] \\ 1 & \text{if} \ x \in [0,1] \end{cases}
F(x) = \begin{cases} 0 & \text{if} \ x < 0 \\ 1 & \text{if} \ x > 1 \\ x & \text{if} \ x \in [0,1] \end{cases}
We use the Mersenne twister pseudo-random number generator to mimic independent and identically distributed draws from the uniform(0,1) RV.
random() |
0.78936494292946802 |
[0.67222825545249176, 0.83903559995064014, 0.78275085347771289] |
[0.98142074805915858, 0.92795736068071921, 0.0068193102833066233] |
[0.077606477890307946, 0.32538553422543337, 0.53712793508137602] |
[0.75101715871455732, 0.46689825924970674, 0.65162159443316914] |
|
|
|
|
[0.64096933778601184, 0.94335465796599494, 0.51471298033567725] |
256526L |
[0.64096933778601184, 0.94335465796599494, 0.51471298033567725] |
256526L |
[0.31709291282858576, 0.47750941216422715, 0.24185704293897292] |
2676676766L |
|
|
![]() ![]()
|
|
|
Example 3: The Uniform(\theta_1,\theta_2) Random Variable
Recall that the PDF f(x;\theta_1,\theta_2) and DF F(x;\theta_1,\theta_2) of X \sim Uniform(\theta_1,\theta_2) is:
f(x;\theta_1,\theta_2) = \begin{cases} 0 & \text{if} \ x \notin [\theta_1,\theta_2] \\ 1/(\theta_2 - \theta_1) & \text{if} \ x \in [\theta_1,\theta_2] \end{cases}
F(x; \theta_1,\theta_2) = \begin{cases} 0 & \text{if} \ x < \theta_1 \\ 1 & \text{if} \ x > \theta_2 \\ (x-\theta_1)/(\theta_2 - \theta_1) & \text{if} \ x \in [\theta_1,\theta_2] \end{cases}
We are going to write some functions to help us to do inversion sampling from the Uniform(\theta_1,\theta_2) RV:
This function codes the inverse of the CDF of a uniform distribution parameterised by theta1 and theta2. Given a value between 0 and 1 for the parameter u, it returns the height of the inverse CDF at this point, i.e. the value in the range theta1 to theta2 where the CDF evaluates to u.
|
|
|
|
|
|
[[x == 5, y == 1]] |
[x == -(theta1 - theta2)*u + theta1] |
|
|
Then we can use this function, which takes a single u and returns a single inverse CDF value, inside another function to generate a number of samples. Naturally, this function will be defined with a parameter for the number of values we want and when we call it we will pass in values for the parambers theta1, theta2, and n.
|
|
What is happening here? We are using a list comprehension and the built-in SAGE random() function to make a list of random numbers in the interval [0, 1]. The length of the list is determined by the value of n. Inside the body of the function we assign this list to a variable named us (i.e., u plural). We then use another list comprehension to make our simulated sample. This list comprehension works by calling our function uniformFInverse(...) and passing in values for theta1 and theta2 together with each u in us in turn.
We can demonstrate this with some values for the uniform parameters and the number of samples to simulate.
[-1.9155177315140861, -3.1238919198537882, -1.0396023491097184, 0.023590481951211828, 2.6541256073768018, 2.8737192266671627, 1.0504652628234998, 0.79930568097353216, -4.1762713054375009, 1.0474183778088957, 2.9129925944531347, -2.1478416772184761, -2.6548474694463096, 3.5613779755054829, -3.773111626708201, -3.1861291897364907, -1.4818362515426866, 2.6820325271719057, -2.0246665889158755, 1.5248879917178488, 3.4152326612117179, 3.5350273631754821, 1.5523726228930892, 0.27701234470572711, 0.17917950427028195, 0.21095407508888542, 0.96134368059568054, 1.7805714079564225, -1.1782668877562976, -2.9402062789262851] |
Much more fun, we can make an interactive plot which uses the uniformSample(...) function to generate and plot while you choose the parameters and number to generate
|
|
Example 4: The Exponential(\lambda) Random Variable
For a given \lambda > 0, an Exponential(\lambda) Random Variable has the following PDF f and DF F:
f(x;\lambda) = \lambda e^{-\lambda x}
F(x;\lambda) = 1 - e^{-\lambda x}
We can code some functions for the PDF and DF of an exponential parameterised by lam like this. [Note that we cannot use the name 'lambda' for the parameter because in SAGE (and Python), the term 'lambda' has a special meaning.]
|
|
|
|
You can see the shapes of the PDF and CDF for different values of \lambda using the interactive plot below.
|
|
We are going to write some functions to help us to do inversion sampling from the Exponential(\lambda) RV. Recall that that we said that the Uniform(0,1) random variable is the fundamental model as we can transform it to any other random variable, random vector or random structure. The function below shows how we can transform a Uniform(0,1) random variable into an Exponential(\lambda) random variable using the inverse CDF. We can obtain this inverse CDF by solving by hand or by using the solve function as follows.
[x == log(-1/(u - 1))/lam] |
[x == -log(-u + 1)/lam] |
The function exponentialFInverse(u, lam) codes the inverse of the CDF of an exponential distribution parameterised by lam. Given a value between 0 and 1 for the parameter u, it returns the height of the inverse CDF of the exponential distribution at this point, i.e. the value where the CDF evaluates to u. The function exponentialSample(n, lam) uses exponentialFInverse(...) to simulate n samples from an exponential distribution parameterised by lam. Note how it does this: it generates n uniform random samples and then transforms each one into an exponential sample using exponentialFInverse(...).
|
|
We could try using a similar interactive plot to that for the uniform random samples for the exponential. We can see that we seem to get less of the higher values, but it a bit hard to relate this to the shape of the exponential PDF, so we also visualise the sample as a histogram:
|
|
Example 5: The Cauchy(\lambda) Random Variable
A standard Cauchy Random Variable has the following PDF f and DF F:
f(x;\lambda) = \frac{1}{\pi(1+x^2)}
F(x;\lambda) = \frac{1}{\pi}arctan(x) + 0.5
We can code some functions for the PDF and DF of the standard Cauchy like this.
|
|
|
|
You can see the shapes of the PDF and CDF using the plot below.
|
|
We can perform inversion sampling on the Cauchy RV by transforming a Uniform(0,1) random variable into a Cauchy random variable using the inverse CDF.
The function cauchyFInverse(u) codes the inverse of the CDF of the standard Cauchy distribution. Given a value between 0 and 1 for the parameter u, it returns the height of the inverse CDF of the standard cauchy at this point, i.e. the value where the CDF evaluates to u. The function cauchySample(n) uses cauchyFInverse(...) to simulate n samples from a standard Cauchy distribution.
|
|
And we can visualise these simulated samples with an interactive plot:
|
|
Example 6: The Bernoulli(\theta) Random Variable
The Bernoulli(\theta) RV X with PMF f(x;\theta) and DF F(x;\theta) parameterised by some real \theta\in [0,1] is a discrete random variable with only two possible outcomes.
f(x;\theta)= \theta^x (1-\theta)^{1-x} \mathbf{1}_{\{0,1\}}(x) =
\begin{cases}
\theta & \text{if} \ x=1,\\
1-\theta & \text{if} \ x=0,\\
0 & \text{otherwise}
\end{cases}
F(x;\theta) =
\begin{cases}
1 & \text{if} \ 1 \leq x,\\
1-\theta & \text{if} \ 0 \leq x < 1,\\
0 & \text{otherwise}
\end{cases}
Here are some functions for the PMF and DF for a bernoulli:
|
|
We can see the effect of varying \theta interactively:
|
|
We can simulate a sample from a bernoulli distribution by transforming input from a Uniform(0,1) distribution using the floor() function in SAGE. Recall that \lfloor x \rfloor, the 'floor of x' is the largest integer that is smaller than or equal to x. For example, \lfloor 3.8 \rfloor = 3.
3 |
Using floor, we can do inversion sampling from the Bernoulli(\theta) RV using the the Uniform(0,1) random variable that we said is the fundamental model.
The function bernoulliFInverse(u, theta) codes the inverse of the CDF of a bernoulli distribution parameterised by theta. The function bernoulliSample(n, theta) uses bernoulliFInverse(...) to simulate n samples from a bernoulli distribution parameterised by theta.
|
|
Lets try a small number of samples:
[1, 0, 1, 0, 0, 1, 0, 0, 0, 0] |
Now lets explore the effect of interactively varying n and \theta:
|
|
Example 6: The equi-probable de Moivre(\theta) Random Variable
The de~Moivre(\theta_1,\theta_2,\ldots,\theta_k) RV is the natural generalisation of the Bernoulli (\theta) RV to more than two outcomes. Take a die (i.e. of of a pair of dice): there are 6 possible outcomes from tossing a die if the die is a normal six-sided one (the outcome is which face is the on on the top). To start with we can allow the possibility that the different faces could be loaded so that they have different probabilities of being the face on the top if we throw the die. In this case, k=6 and the parameters \theta_1, \theta_2, ...\theta_6 specify how the die is loaded, and the number on the upper-most face if the die is tossed is a de Moivre random variable parameterised by \theta_1,\theta_2,\ldots,\theta_6. If \theta_1=\theta_2=\ldots=\theta_6= \frac{1}{6} then we have a fair die.
Here are some functions for the equi-probable de Moivre PMF and CDF where we code the possible outcomes as the numbers on the faces of a k-sided die, i.e, 1,2,...k.
|
|
Use the interactive plots to look at the PMF and CDF for different values of k.
|
|
Because we are defining our outcomes as 1, 2, ... k, we use \lceil \rceil or the ceiling function in our inversion sampler.
|
|
A small sample:
[4, 6, 3, 3, 1, 4, 2, 2, 1, 5] |
Interactive sampling:
|
|
|
|
|
|
Congratulations: this is the end of the essential section of this worksheet. You have covered everything you need to know for this lab session.
You can use the rest of the lab time to go back over the information we have covered above to make sure that it really makes sense to you. Remember, everything up to this point is essential for passing the course (getting a C grade). If you have more time read the help docs for various new functions and try the examples in them.
Try the various examples in sag.misc.randstate
|
|
|
|
|
|
|
|