©2009 Jennifer Harlow, Dominic Lee and Raazesh Sainudiin.
Creative Commons Attribution-Noncommercial-Share Alike 3.0
This is an interactive Sage worksheet from the Sage Notebook GUI.
This lab is split into two parts. In the first part we will work through, as a class, the things that you need to know. This should take half of the lab time. In the second half of the lab you will continue at your own pace and we will be there to help with anything you want to ask about. You can spend this time going back over the things we covered in the first half to make sure that you understand them properly. If you are completely happy with the first part we have provided some more optional material for you to work on in the lab and on your own.
The functions for obtaining and manipulating earthquake data in this section are based on ideas in Section 5.2.5 of Miller & Ranum, Python: Programming in Context (2009). The source for New Zealand earthquake data is http://magma.geonet.org.nz/resources/quakesearch/.
We first define a function that reads our data from a text file into a convenient array.
We have now defined a function getData(...) which we can use to read data from a file into an array
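The cell that defines getData is not reproduced in this text. As a minimal sketch of what such a function might look like, assuming a comma-separated file with one header line (the parameters headerlines and sep are illustrative and not taken from the worksheet):

```python
import numpy as np

def getData(filename, headerlines=1, sep=","):
    """Read a delimited text file into a 2-D numpy array of strings.

    Sketch only: headerlines and sep are assumed parameters.
    Every row is expected to have the same number of fields.
    """
    rows = []
    with open(filename, "r") as f:
        for i, line in enumerate(f):
            if i < headerlines:
                continue              # skip the header line(s)
            line = line.strip()       # drop the trailing line ending
            if line:                  # ignore blank lines
                rows.append(line.split(sep))
    return np.array(rows)
```

Unlike the output shown later in the worksheet, this sketch strips the line ending, so the last field of each row does not carry a trailing '\r\n'.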
Let us assign the names of the comma-separated values (CSV) files NZearthquakes1Jul08to6Aug08.csv and NZearthquakes1Jul09to6Aug09.csv, which contain NZ earthquakes in the time intervals [1/July/2008, 6/August/2008] and [1/July/2009, 6/August/2009], to the string variables myFilename2008 and myFilename2009, respectively.
Now we assign the array returned by getData with appropriate arguments to myData2008 and myData2009.
What exactly is in the arrays myData2008 and myData2009 just returned by getData? Let's find out, shall we?
array([['2931322', '-37.88412', '177.87352', ..., '37.77383', '2.421', '63.9307\r\n'],
       ['2931325', '-37.26902', '176.51445', ..., '38.14913', '3.34', '220.2202\r\n'],
       ['2931332', '-39.47651', '175.69447', ..., '59.8815', '1.839', '17.9686\r\n'],
       ...,
       ['2949711', '-40.7494', '174.62494', ..., '40.23068', '2.217', '12\r\n'],
       ['2949714', '-38.51364', '176.21767', ..., '36.03949', '2.96', '171.0209\r\n'],
       ['2949715', '-41.38802', '172.34175', ..., '25.78879', '2.932', '5\r\n']],
      dtype='|S10')

array([['3117514', '-44.99823', '168.53069', ..., '12.58069', '1.863', '5\r\n'],
       ['3117515', '-47.38385', '165.79785', ..., '19.09013', '2.744', '33\r\n'],
       ['3117537', '-40.38247', '176.09354', ..., '37.45829', '2.553', '33.809\r\n'],
       ...,
       ['3134964', '-40.78112', '174.41324', ..., '11.8528', '4.322', '50.269\r\n'],
       ['3134966', '-39.41886', '175.83319', ..., '48.71931', '2.272', '53.3487\r\n'],
       ['3134981', '-39.78736', '176.8123', ..., '53.7128', '2.241', '45.7776\r\n']],
      dtype='|S10')

<type 'numpy.ndarray'>
array(['3117514', '-44.99823', '168.53069', '2157734', '5569493', '2009',
       '7', '1', '0', '59', '12.58069', '1.863', '5\r\n'],
      dtype='|S10')
'3117514'
<type 'numpy.string_'>
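The outputs above are consistent with a few simple inspection steps: asking for the type of the array, pulling out its first row, and then the first entry of that row. A small sketch of these steps follows, using a one-row stand-in array (the name myData is hypothetical) built from the first row printed above. Under Python 3 the reported types and dtype differ from the Python 2 output shown here (for example numpy.str_ and a '<U...' dtype), but the point is the same: every entry is a string, not a number.

```python
import numpy as np

# One-row stand-in for myData2009, copied from the first row shown above.
myData = np.array([
    ['3117514', '-44.99823', '168.53069', '2157734', '5569493',
     '2009', '7', '1', '0', '59', '12.58069', '1.863', '5\r\n'],
])

print(type(myData))         # the container is a numpy.ndarray
print(myData.shape)         # (number of rows, number of columns)

firstRow = myData[0]        # a 1-D array holding one earthquake record
firstEntry = myData[0][0]   # the first field of that record
print(firstRow)
print(firstEntry)
print(type(firstEntry))     # a numpy string type, not an int or a float
```

Because the entries are strings, they must be converted before any arithmetic can be done with them.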
It is necessary to do error checks on raw data. If you expect data in the real world to be pre-checked for you then you are in the wrong profession. Without careful error checks you cannot analyse data. We need two functions safeFloat and safeInt. See the docstrings for these functions now.
We have now defined a function safeFloat(...) which we can use to try to turn strings into floats
We have now defined a function safeInt(...) which we can use to try to turn strings into integers
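The bodies of safeFloat and safeInt are not reproduced in this text. One plausible shape for such functions, offered only as a sketch, is to attempt the conversion and return a success flag, the converted value, and an error message. The three-part return value is an assumption, chosen because the "error diagnosis" messages printed later suggest that a message is passed back to the caller.

```python
def safeFloat(obj):
    """Try to convert obj to a float.

    Returns (True, value, None) on success and
    (False, 0.0, error message) on failure. Sketch only.
    """
    try:
        return (True, float(obj), None)
    except (ValueError, TypeError) as e:
        return (False, 0.0, str(e))


def safeInt(obj):
    """Try to convert obj to an int, with the same return convention."""
    try:
        return (True, int(obj), None)
    except (ValueError, TypeError) as e:
        return (False, 0, str(e))


print(safeFloat('1.863'))
print(safeFloat(''))       # an empty field fails and reports why
print(safeInt('2009'))
```

The exact wording of the error message depends on the Python version: Python 2 reports "empty string for float()", as in the output of this worksheet, while Python 3 reports "could not convert string to float: ''".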
Let us next define a function that will return the magnitudes of the earthquakes from our data array. See the docstring of the function makeMagList now.
We have now defined a function makeMagList(...) which we can use to process some earthquake data
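The body of makeMagList is not reproduced in this text either. A minimal sketch, assuming that the magnitude is the second-to-last field of each 13-field row (index 11, which is '1.863' in the sample row printed earlier) and that rows whose magnitude cannot be converted are reported and skipped:

```python
def makeMagList(data, magCol=11):
    """Return the magnitudes in data as a list of floats.

    Sketch only: magCol=11 is an assumption based on the sample row.
    Rows with an unusable magnitude are reported and ignored.
    """
    mags = []
    for i, row in enumerate(data):
        try:
            mags.append(float(row[magCol]))
        except ValueError as e:
            print("Ignored row", i, "error diagnosis", e)
    return mags


# The sample row shown earlier, and a made-up copy with a missing magnitude.
goodRow = ['3117514', '-44.99823', '168.53069', '2157734', '5569493',
           '2009', '7', '1', '0', '59', '12.58069', '1.863', '5\r\n']
badRow = list(goodRow)
badRow[11] = ''   # hypothetical row, for illustration only

mags = makeMagList([goodRow, badRow])
print(mags)
```

Reporting the row number of each rejected record, rather than silently dropping it, makes it possible to go back to the raw file and check what went wrong.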
Let us assign the list of magnitudes returned by makeMagList(myData2008) to listMags2008 and by makeMagList(myData2009) to listMags2009.
Ignored row 48 error diagnosis empty string for float()
Ignored row 148 error diagnosis empty string for float()
Ignored row 150 error diagnosis empty string for float()
Ignored row 275 error diagnosis empty string for float()
Ignored row 291 error diagnosis empty string for float()
Ignored row 305 error diagnosis empty string for float()
Ignored row 311 error diagnosis empty string for float()
Ignored row 346 error diagnosis empty string for float()
Ignored row 348 error diagnosis empty string for float()
Ignored row 352 error diagnosis empty string for float()
Ignored row 425 error diagnosis empty string for float()
Ignored row 457 error diagnosis empty string for float()
Ignored row 517 error diagnosis empty string for float()
Ignored row 548 error diagnosis empty string for float()
Ignored row 597 error diagnosis empty string for float()
Ignored row 602 error diagnosis empty string for float()
Ignored row 604 error diagnosis empty string for float()
Ignored row 779 error diagnosis empty string for float()
Ignored row 797 error diagnosis empty string for float()
Ignored row 950 error diagnosis empty string for float()
Ignored row 1048 error diagnosis empty string for float()
Ignored row 1073 error diagnosis empty string for float()
Ignored row 1094 error diagnosis empty string for float()
Ignored row 1296 error diagnosis empty string for float()
Ignored row 1366 error diagnosis empty string for float()
Ignored row 1496 error diagnosis empty string for float()
Ignored row 1506 error diagnosis empty string for float()
Ignored row 1519 error diagnosis empty string for float()
Ignored row 1532 error diagnosis empty string for float()
Ignored row 1536 error diagnosis empty string for float()
Ignored row 1627 error diagnosis empty string for float()
Ignored row 1707 error diagnosis empty string for float()
Ignored row 1712 error diagnosis empty string for float()
Ignored row 1718 error diagnosis empty string for float()
Ignored row 1769 error diagnosis empty string for float()
Ignored row 1771 error diagnosis empty string for float()
Ignored row 1777 error diagnosis empty string for float()
Ignored row 1797 error diagnosis empty string for float()
Ignored row 1818 error diagnosis empty string for float()
Ignored row 1850 error diagnosis empty string for float()
Ignored row 1871 error diagnosis empty string for float()
Ignored row 1909 error diagnosis empty string for float()
Ignored row 2066 error diagnosis empty string for float()

Ignored row 45 error diagnosis empty string for float()
Ignored row 79 error diagnosis empty string for float()
Ignored row 127 error diagnosis empty string for float()
Ignored row 230 error diagnosis empty string for float()
Ignored row 442 error diagnosis empty string for float()
Ignored row 570 error diagnosis empty string for float()
Ignored row 607 error diagnosis empty string for float()
Ignored row 638 error diagnosis empty string for float()
Ignored row 720 error diagnosis empty string for float()
Ignored row 741 error diagnosis empty string for float()
Ignored row 801 error diagnosis empty string for float()
Ignored row 825 error diagnosis empty string for float()
Ignored row 888 error diagnosis empty string for float()
Ignored row 921 error diagnosis empty string for float()
Ignored row 1281 error diagnosis empty string for float()
Ignored row 1299 error diagnosis empty string for float()
Ignored row 1708 error diagnosis empty string for float()
Let us visualise the frequency of magnitudes in our data from 2008 (try 2009 too) over ten intervals using a histogram.
Note that the choice of ten bins over the range of magnitudes was arbitrary. Under an IID model, the relative frequency of magnitudes in any fixed interval is a consistent point estimate of the probability that an earthquake has a magnitude in that interval. However, the shape formed by the histogram's bar heights is not automatically a good estimate of the underlying density, because the histogram is sensitive to the number of bins used to construct it. Choosing the number of bins is a smoothing problem: with too many bins we under-smooth the histogram, with too few we over-smooth it, and in either case we fail to get a good nonparametric point estimate of the underlying probability density function. Let us visualise the frequency of magnitudes in our data from 2009 (try 2008 too) over a range of bin numbers to appreciate the over/under-smoothing problem interactively.
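The dependence on the number of bins can also be seen without a plot. The sketch below computes relative frequencies over equal-width bins in plain Python and applies it to a synthetic sample (1000 draws from a normal distribution, which is not earthquake data and is used only for illustration). The function name relativeFrequencies is hypothetical.

```python
import random

def relativeFrequencies(data, nbins):
    """Relative frequency of data in nbins equal-width bins over [min, max].

    Assumes data has at least two distinct values.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for x in data:
        # the maximum value would land at index nbins, so clamp it into the last bin
        k = min(int((x - lo) / width), nbins - 1)
        counts[k] += 1
    n = len(data)
    return [c / n for c in counts]


random.seed(0)
sample = [random.gauss(2.5, 0.7) for _ in range(1000)]   # synthetic stand-in data

for nbins in (5, 10, 50, 200):
    freqs = relativeFrequencies(sample, nbins)
    emptyBins = sum(1 for f in freqs if f == 0)
    print(nbins, "bins: tallest bar", max(freqs), "empty bins", emptyBins)
```

Whatever the number of bins, the relative frequencies sum to one. What changes is the shape: a few wide bins hide detail, while many narrow bins give a ragged profile with bars that reflect sampling noise more than the underlying density.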
1703
<type 'float'>
1301
We have now defined a function makeQuakeTimes(...) which we can use to process some earthquake data
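Neither the body of makeQuakeTimes nor a description of what it returns is reproduced in this text, so the following is only a guess at the kind of processing involved. It assumes that fields 5 to 10 of each row hold the year, month, day, hour, minute and second of the event (as the sample row printed earlier suggests) and returns the event times in seconds since the earliest event. It does not attempt any filtering of rows by location.

```python
from datetime import datetime, timedelta

def makeQuakeTimes(data, yearCol=5):
    """Return sorted event times in seconds since the earliest event.

    Sketch only: assumes year, month, day, hour, minute and second
    occupy six consecutive columns starting at yearCol.
    """
    times = []
    for i, row in enumerate(data):
        try:
            y, mo, d, h, mi = (int(row[yearCol + k]) for k in range(5))
            sec = float(row[yearCol + 5])
            times.append(datetime(y, mo, d, h, mi) + timedelta(seconds=sec))
        except ValueError as e:
            print("Ignored row", i, "error diagnosis", e)
    if not times:
        return []
    t0 = min(times)
    return sorted((t - t0).total_seconds() for t in times)


# The sample row shown earlier, and a made-up copy one hour later.
firstRow = ['3117514', '-44.99823', '168.53069', '2157734', '5569493',
            '2009', '7', '1', '0', '59', '12.58069', '1.863', '5\r\n']
laterRow = list(firstRow)
laterRow[8] = '1'   # hypothetical row: same event time shifted to hour 1

quakeTimes = makeQuakeTimes([laterRow, firstRow])
print(quakeTimes)
```

Differences between successive entries of such a list give the times between events.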
Omitted row with latitude -37.53203 longititude -179.80537
Omitted row with latitude -36.52448 longititude -179.99582
0.49965718630686601
0.95531164430209203
Omitted row with latitude -37.22483 longititude -179.88422
Omitted row with latitude -36.28675 longititude -179.25929
Omitted row with latitude -37.11357 longititude -179.92047
Omitted row with latitude -37.52361 longititude -179.95947
0.42090492010757791
0.49867108150408451
Try to visualise the locations of earthquake epicenters in the two years and compare them. Recall the procedures from the previous lab. The source for New Zealand earthquake data is http://magma.geonet.org.nz/resources/quakesearch/.