What can you do with Mathematics and Statistics?

318 days ago by raazesh.sainudiin

What can you do with some understanding of Mathematics and Statistics? 

by Raazesh Sainudiin and Miguel Moyers-Gonzalez to chelate feeds from James Degnan and Igor Rychkov and ...
for students at the University of Canterbury,
Christchurch, New Zealand, who want to be truly challenged and help shape the future.  Paul Brouwers improved the HTML typesetting.

Let us first hear what Google's chief economist has to say about Statistics and Data analysis (Don't forget that Mathematics is the language of Statistics!), shall we?
 
       

Mathematics: You can use the language of numbers to understand real-world phenomena.

Statistics: You can analyse information to help take action in the real world.

Some examples are...






Example 1: Environmental Statistics; CO2 data from the US National Oceanic and Atmospheric Administration

Let us fetch the CO2 data and plot it using the program by Marshall Hampton.

Let's grab the data live, shall we?

%hide
#%auto
# Author: Marshall Hampton - needs deprecated Warnings fixed - just removed linear regression for now
import urllib2 as U
import scipy.stats as Stat
co2data = U.urlopen('ftp://ftp.cmdl.noaa.gov/ccg/co2/trends/co2_mm_mlo.txt').readlines()
datalines = []
for a_line in co2data:
    if a_line.find('Creation:') != -1:
        cdate = a_line
    if a_line[0] != '#':
        temp = a_line.replace('\n','').split(' ')
        temp = [float(q) for q in temp if q != '']
        datalines.append(temp)
trdf = RealField(16)
@interact
def mauna_loa_co2(start_date = slider(1958,2011,1,1958), end_date = slider(1958, 2011,1,2010)):
    htmls1 = '<h3>CO2 monthly averages at Mauna Loa (interpolated), from NOAA/ESRL data</h3>'
    htmls2 = '<h4>'+cdate+'</h4>'
    html(htmls1+htmls2)
    sel_data = [[q[2],q[4]] for q in datalines if start_date < q[2] < end_date]
    c_max = max([q[1] for q in sel_data])
    c_min = min([q[1] for q in sel_data])
    #slope, intercept, r, ttprob, stderr = Stat.linregress(sel_data)
    #html(htmls1+htmls2+'<h4>Linear regression slope: ' + str(trdf(slope)) + ' ppm/year; correlation coefficient: ' + str(trdf(r)) + '</h4>')
    var('x,y')
    #show(list_plot(sel_data, plotjoined=True, rgbcolor=(1,0,0)) + plot(slope*x+intercept,start_date,end_date), xmin = start_date, ymin = c_min-2, axes = True, xmax = end_date, ymax = c_max+3, frame = False)
    show(list_plot(sel_data, plotjoined=True, rgbcolor=(1,0,0)) , xmin = start_date, ymin = c_min-2, axes = True, xmax = end_date, ymax = c_max+3, frame = False)
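The cell above comments out its linear regression step. As a rough illustration of what such a step computes, here is a minimal sketch in plain Python (not Sage) that parses lines the same way as the cell and fits a least-squares line by hand. The helper names `parse_data_lines` and `least_squares_line` are hypothetical, and the sample lines are synthetic values placed exactly on a straight line; they are not NOAA measurements.

```python
def parse_data_lines(lines):
    """Skip '#' comment lines and turn each remaining line into a list of floats."""
    rows = []
    for a_line in lines:
        if a_line.startswith('#') or not a_line.strip():
            continue
        rows.append([float(q) for q in a_line.split()])
    return rows

def least_squares_line(points):
    """Return (slope, intercept) of the ordinary least-squares line through (x, y) points."""
    n = float(len(points))
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic lines (NOT real measurements). Column 2 plays the role of the
# decimal date and column 4 the value, mirroring the q[2], q[4] indexing above.
sample_lines = [
    "# synthetic example, values lie on y = 300 + 2*(x - 2000)",
    "2000 1 2000.0 0.0 300.0",
    "2001 1 2001.0 0.0 302.0",
    "2002 1 2002.0 0.0 304.0",
    "2003 1 2003.0 0.0 306.0",
]
rows = parse_data_lines(sample_lines)
sel_data = [(q[2], q[4]) for q in rows]
slope, intercept = least_squares_line(sel_data)
print("slope per year: %.3f" % slope)
```

With real data the slope would estimate the average yearly increase over the selected date range; a straight line is only a crude summary of a curve that also has a seasonal cycle.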
       



Example 2: Natural Hazard Assessment; New Zealand earthquakes

Now we are ready to play with some more real data downloaded from the GeoNet Quake Search catalog site http://magma.geonet.org.nz/resources/quakesearch/.




Example 3: Let us next visualize stock market data fetched from Yahoo and Google, and fit smooth wiggly curves to it (nonlinear time series), with a program by William Stein.

Let's grab some fresh stocks, shall we?

%hide
#%auto
import urllib

class Day:
    def __init__(self, date, open, high, low, close, volume):
        self.date = date
        self.open=float(open); self.high=float(high); self.low=float(low); self.close=float(close)
        self.volume=int(volume)
    def __repr__(self):
        return '%10s %4.2f %4.2f %4.2f %4.2f %10d'%(self.date, self.open, self.high,
                   self.low, self.close, self.volume)

class Stock:
    def __init__(self, symbol):
        self.symbol = symbol.upper()

    def __repr__(self):
        return "%s (%s)"%(self.symbol, self.yahoo()['price'])

    def yahoo(self):
        url = 'http://finance.yahoo.com/d/quotes.csv?s=%s&f=%s' % (self.symbol, 'l1c1va2xj1b4j4dyekjm3m4rr5p5p6s7')
        values = urllib.urlopen(url).read().strip().strip('"').split(',')
        data = {}
        data['price'] = values[0]
        data['change'] = values[1]
        data['volume'] = values[2]
        data['avg_daily_volume'] = values[3]
        data['stock_exchange'] = values[4]
        data['market_cap'] = values[5]
        data['book_value'] = values[6]
        data['ebitda'] = values[7]
        data['dividend_per_share'] = values[8]
        data['dividend_yield'] = values[9]
        data['earnings_per_share'] = values[10]
        data['52_week_high'] = values[11]
        data['52_week_low'] = values[12]
        data['50day_moving_avg'] = values[13]
        data['200day_moving_avg'] = values[14]
        data['price_earnings_ratio'] = values[15]
        data['price_earnings_growth_ratio'] = values[16]
        data['price_sales_ratio'] = values[17]
        data['price_book_ratio'] = values[18]
        data['short_ratio'] = values[19]
        return data

    def historical(self):
        try:
            return self.__historical
        except AttributeError:
            pass
        symbol = self.symbol
        def get_data(exchange):
            name = get_remote_file('http://finance.google.com/finance/historical?q=%s:%s&output=csv'%(exchange, symbol.upper()),
                                   verbose=False)
            return open(name).read()
        R = get_data('NASDAQ')
        if "Bad Request" in R:
            R = get_data("NYSE")
        R = R.splitlines()
        headings = R[0].split(',')
        self.__historical = []
        try:
            for x in reversed(R[1:]):
                date, opn, high, low, close, volume = x.split(',')
                self.__historical.append(Day(date, opn,high,low,close,volume))
        except ValueError:
            pass
        self.__historical = Sequence(self.__historical,cr=True,universe=lambda x:x)
        return self.__historical

    def plot_average(self, spline_samples=10):
        d = self.historical()
        if len(d) == 0:
            return text('no historical data at Google Finance about %s'%self.symbol, (0,3))
        avg = list(enumerate([(z.high+z.low)/2 for z in d]))
        P = line(avg) + points(avg, rgbcolor='black', pointsize=4) + \
                 text(self.symbol, (len(d)*1.05, d[-1].low), horizontal_alignment='right', rgbcolor='black')
        if spline_samples > 0:
            k = 250//spline_samples
            spl = spline([avg[i*k] for i in range(len(d)//k)] + [avg[-1]])
            P += plot(spl, (0,len(d)+30), color=(0.7,0.7,0.7))
        P.xmax(260)
        return P

    def plot_diff(self):
        d = self.historical()
        if len(d) == 0:
            return text('no historical data at Google Finance about %s'%self.symbol, (0,3))
        diff = []
        for i in range(1, len(d)):
            z1 = d[i]; z0 = d[i-1]
            diff.append((i, (z1.high+z1.low)/2 - (z0.high + z0.low)/2))
        P = line(diff,thickness=0.5) + points(diff, rgbcolor='black', pointsize=4) + \
                 text(self.symbol, (len(d)*1.05, 0), horizontal_alignment='right', rgbcolor='black')
        P.xmax(260)
        return P

symbols = ['bsc', 'vmw', 'sbux', 'aapl', 'amzn', 'goog', 'wfmi', 'msft', 'yhoo', 'ebay', 'java', 'rht', ]; symbols.sort()
stocks = dict([(s,Stock(s)) for s in symbols])

@interact
def data(symbol = symbols, other_symbol='', spline_samples=(8,[0..15])):
    if other_symbol != '':
        symbol = other_symbol
    S = Stock(symbol)
    html('<h1 align=center><font color="darkred">%s</font></h1>'%S)
    S.plot_average(spline_samples).save('avg.png', figsize=[10,2])
    S.plot_diff().save('diff.png', figsize=[10,2])
    Y = S.yahoo()
    k = Y.keys(); k.sort()
    html('Price during last 52 weeks:<br>Grey line is a spline through %s points (do not take seriously!):<br> <img src="cell://avg.png">'%spline_samples)
    html('Difference from previous day:<br> <img src="cell://diff.png">')
    html('<table align=center>' + '\n'.join('<tr><td>%s</td><td>%s</td></tr>'%(k[i], Y[k[i]]) for i in range(len(k))) + '</table>')
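The two plots produced by the cell above rest on simple arithmetic: a daily midpoint, (high + low)/2, and the change in that midpoint from one day to the next. The following is a minimal sketch of just that arithmetic in plain Python, with no network access and no Sage plotting. The function names `daily_midpoints` and `day_to_day_differences` are hypothetical, and the three (high, low) pairs are made-up numbers, not real quotes.

```python
def daily_midpoints(days):
    """Midpoint of each day's (high, low) pair, as used for the 'average' plot."""
    return [(high + low) / 2.0 for high, low in days]

def day_to_day_differences(midpoints):
    """(day index, change in midpoint since the previous day), as used for the 'diff' plot."""
    return [(i, midpoints[i] - midpoints[i - 1]) for i in range(1, len(midpoints))]

# Made-up (high, low) prices for three consecutive days (NOT real market data).
days = [(12.0, 10.0), (13.0, 11.0), (12.0, 11.0)]
mids = daily_midpoints(days)
diffs = day_to_day_differences(mids)
print(mids)
print(diffs)
```

Differencing removes the overall price level and leaves the daily fluctuations, which is often the first step before modelling a time series.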
       



Example 4: Nonlinear Time Series of Chaotic Double Pendulum

We can measure how two close-by trajectories of a double pendulum diverge from each other. Check out the double pendulum that was recently built in our Department of Mathematics and Statistics.
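To get a feel for this divergence without the physical apparatus, one can integrate the textbook equations of motion for an idealised double pendulum (point masses on massless rods, no friction) from two nearly identical starting angles. The sketch below is only an illustration under those assumptions: the unit masses and lengths, the starting angles, the step size and the classical fourth-order Runge-Kutta integrator are all choices made here, not properties of the department's pendulum.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def derivs(state, m1=1.0, m2=1.0, l1=1.0, l2=1.0):
    """Time derivatives of (theta1, omega1, theta2, omega2) for an ideal double pendulum."""
    th1, w1, th2, w2 = state
    d = th1 - th2
    den = 2 * m1 + m2 - m2 * math.cos(2 * d)
    a1 = (-G * (2 * m1 + m2) * math.sin(th1)
          - m2 * G * math.sin(th1 - 2 * th2)
          - 2 * math.sin(d) * m2 * (w2 * w2 * l2 + w1 * w1 * l1 * math.cos(d))) / (l1 * den)
    a2 = (2 * math.sin(d) * (w1 * w1 * l1 * (m1 + m2)
                             + G * (m1 + m2) * math.cos(th1)
                             + w2 * w2 * l2 * m2 * math.cos(d))) / (l2 * den)
    return (w1, a1, w2, a2)

def rk4_step(state, dt):
    """Advance the state by one classical Runge-Kutta (RK4) step of size dt."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = derivs(state)
    k2 = derivs(shifted(state, k1, dt / 2))
    k3 = derivs(shifted(state, k2, dt / 2))
    k4 = derivs(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + e)
                 for s, a, b, c, e in zip(state, k1, k2, k3, k4))

def energy(state, m1=1.0, m2=1.0, l1=1.0, l2=1.0):
    """Total mechanical energy; it should stay nearly constant if the integration is sound."""
    th1, w1, th2, w2 = state
    kinetic = (0.5 * m1 * (l1 * w1) ** 2
               + 0.5 * m2 * ((l1 * w1) ** 2 + (l2 * w2) ** 2
                             + 2 * l1 * l2 * w1 * w2 * math.cos(th1 - th2)))
    potential = -(m1 + m2) * G * l1 * math.cos(th1) - m2 * G * l2 * math.cos(th2)
    return kinetic + potential

def distance(a, b):
    """Euclidean distance between two states in (angle, angular velocity) space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def simulate_pair(state_a, state_b, dt=0.001, steps=5000):
    """Integrate both pendulums side by side and record their separation at every step."""
    separations = [distance(state_a, state_b)]
    for _ in range(steps):
        state_a = rk4_step(state_a, dt)
        state_b = rk4_step(state_b, dt)
        separations.append(distance(state_a, state_b))
    return state_a, state_b, separations

# Two starts that differ by one millionth of a radian in the first angle.
start_a = (2.0, 0.0, 2.0, 0.0)
start_b = (2.0 + 1e-6, 0.0, 2.0, 0.0)
end_a, end_b, separations = simulate_pair(start_a, start_b)
print("initial separation:", separations[0])
print("separation after 5 simulated seconds:", separations[-1])
```

Plotting `separations` on a logarithmic scale against time is a common way to look for the roughly exponential growth that is typical of chaotic motion.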




Example 5: Statistical Genetics; Shedding light on interrelations (plant/animal breeding, Landcare Research, Department of Conservation).

A Coalescent Simulator by Marshall Hampton

%hide
def next_gen(x, selection=1.0):
    '''Creates the next generation from the previous; also returns parent-child indexing list'''
    next_x = []
    for ind in range(len(x)):
        if random() < (1 + selection)/len(x):
            rind = 0
        else:
            rind = int(round(random()*(len(x)-1)+1/2))
        next_x.append((x[rind],rind))
    next_x.sort()
    return [[x[0] for x in next_x],[x[1] for x in next_x]]

def coal_plot(some_data):
    '''Creates a graphics object from coalescent data'''
    gens = some_data[0]
    inds = some_data[1]
    gen_lines = line([[0,0]])
    pts = Graphics()
    ngens = len(gens)
    gen_size = len(gens[0])
    for x in range(gen_size):
        pts += point((x,ngens-1), hue = gens[0][x]/float(gen_size*1.1))
    p_frame = line([[-.5,-.5],[-.5,ngens-.5], [gen_size-.5,ngens-.5], [gen_size-.5,-.5], [-.5,-.5]])
    for g in range(1,ngens):
        for x in range(gen_size):
            old_x = inds[g-1][x]
            gen_lines += line([[x,ngens-g-1],[old_x,ngens-g]], hue = gens[g-1][old_x]/float(gen_size*1.1))
            pts += point((x,ngens-g-1), hue = gens[g][x]/float(gen_size*1.1))
    return pts+gen_lines+p_frame

d_field = RealField(10)

@interact
def coalescents(pop_size = slider(2,100,1,15,'Population size'), selection = slider(-1,1,.1,0, 'Selection for first taxon'), s = selector(['Again!'], label='Refresh', buttons=True)):
    print 'Population size: ' + str(pop_size)
    print 'Selection coefficient for first taxon: ' + str(d_field(selection))
    start = [i for i in range(pop_size)]
    gens = [start]
    inds = []
    while gens[-1][0] != gens[-1][-1]:
        g_index = len(gens) - 1
        n_gen = next_gen(gens[g_index], selection = selection)
        gens.append(n_gen[0])
        inds.append(n_gen[1])
    coal_data1 = [gens,inds]
    print 'Generations until coalescence: ' + str(len(gens))
    show(coal_plot(coal_data1), axes = False, figsize = [8,4.0*len(gens)/pop_size], ymax = len(gens)-1)
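Stripped of the plotting, the core of the simulator above is a resampling loop: every individual in the next generation copies the label of a randomly chosen parent, and the loop stops once everyone carries the same founder's label. Here is a minimal sketch of that loop in plain Python, simplified to the neutral case (every parent equally likely, so no selection for the first taxon). The names `next_generation` and `generations_until_single_ancestor` are hypothetical, and the seed is only there to make a run repeatable.

```python
import random

def next_generation(labels, rng):
    """Each child copies the label of a uniformly chosen parent; sorting groups equal labels."""
    return sorted(rng.choice(labels) for _ in labels)

def generations_until_single_ancestor(pop_size, seed=0):
    """Resample until every individual carries the same founder label.

    Returns the number of resampling steps and the final list of labels.
    """
    rng = random.Random(seed)
    labels = list(range(pop_size))  # founder i starts with label i
    count = 0
    # The list is kept sorted, so all labels are equal exactly when first == last.
    while labels[0] != labels[-1]:
        labels = next_generation(labels, rng)
        count += 1
    return count, labels

count, final_labels = generations_until_single_ancestor(15)
print("resampling steps until one founder's label remains:", count)
```

Running it with different seeds shows how variable this waiting time is, even for a fixed population size.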
       
 
       
 
       



Example 6: Understand Funny Fluids via Non-Newtonian Fluid Dynamics

The science that deals with Non-Newtonian fluids (or funny fluids :) ) is called Rheology; it studies the deformation and flow of matter. But what type of behaviour can we expect from these materials (or funny fluids)? Well, if the behaviour is neither that of an elastic solid (if you double the tension you double the extension) nor that of a Newtonian liquid (if you double the force you double the velocity gradient), then Rheology is your tool.

Some examples of these materials can be found in: synthetic-fibre and plastics-processing industries, liquid detergents (your shampoo or toothpaste), multigrade oils, paints, cosmetics, foods (chocolate and ice cream, for example) and biological fluids, just to name a few.
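The two "doubling" rules in the paragraph above are both linear laws, and a short numerical sketch can make the contrast with a funny fluid concrete. The code below uses the power-law (Ostwald-de Waele) model as one common textbook example of a Non-Newtonian relation; that model, the function names and all the numbers are illustrative choices made here, not material data.

```python
def hookean_stress(modulus, strain):
    """Elastic solid: stress is proportional to strain (extension)."""
    return modulus * strain

def newtonian_shear_stress(viscosity, shear_rate):
    """Newtonian liquid: shear stress is proportional to the velocity gradient."""
    return viscosity * shear_rate

def power_law_shear_stress(consistency, flow_index, shear_rate):
    """Power-law fluid: stress = K * (shear rate)^n, for a positive shear rate.

    n < 1 models shear thinning, n > 1 shear thickening, n = 1 recovers the Newtonian case.
    """
    return consistency * shear_rate ** flow_index

# Illustrative numbers only: compare the stress at a shear rate of 4 and of 8.
newtonian_ratio = newtonian_shear_stress(2.0, 8.0) / newtonian_shear_stress(2.0, 4.0)
thinning_ratio = power_law_shear_stress(2.0, 0.5, 8.0) / power_law_shear_stress(2.0, 0.5, 4.0)
print("Newtonian: doubling the shear rate multiplies the stress by", newtonian_ratio)
print("Shear-thinning (n = 0.5): doubling the shear rate multiplies the stress by", thinning_ratio)
```

For the shear-thinning case the stress grows by a factor of 2 raised to the power n rather than by 2, which is one simple way a material can fail both doubling rules.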


The above fluid is called a ferrofluid (a ferromagnetic fluid). You can make all these weird shapes by controlling the magnetic field around the fluid.

 
       



Example 7: Reconstructing Surfaces

Reconstructing surfaces is useful in geospatial research and image reconstructions.

Surface reconstruction from laser scanning

Particle-based fluid simulation of river flow

Particle-based fluid & granular matter simulation