Untitled

443 days ago by pub

Wikipages: Mean-Median-Mode, Variance-Standard Deviation, Histograms, Barchart (scroll down here to see "histogram" and "bar chart").

Here we look at the statistics commands mean, median, mode, variance and std, and try to create a histogram and a bar chart.

def random_between(j,k):
    a=int(random()*(k-j+1))+j
    return a
       
#Generate a random number between 1 and 100 and display it.
r=random_between(1,100)
print r
       
89
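A quick sanity check of the formula in random_between, written as a plain-Python sketch outside Sage. random() returns a float in [0,1), so int(random()*(k-j+1)) is one of 0, 1, ..., k-j, and adding j shifts it to j, ..., k. The seed and the number of draws are arbitrary choices made only for this illustration.

```python
import random

def random_between(j, k):
    # random.random() is in [0, 1), so the product is in [0, k-j+1)
    # and int() truncates it to one of 0, 1, ..., k-j.
    return int(random.random() * (k - j + 1)) + j

random.seed(2013)  # arbitrary seed, only to make the run repeatable
draws = [random_between(1, 100) for _ in range(10000)]

# Every draw should fall inside the closed interval [1, 100].
in_range = all(1 <= d <= 100 for d in draws)
print(in_range)
```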
#Here we create a list of n=10 random integers between b=1 and t=100 and sort this list.
b=1; t=100; n=10
list10=[ random_between(b,t) for j in range(n) ]
list10_s=sorted(list10)
print list10
print list10_s
       
[46, 96, 10, 22, 42, 2, 22, 90, 78, 74]
[2, 10, 22, 22, 42, 46, 74, 78, 90, 96]
#Here we find the mean (average) and median of the list we generated above. Mean and median are numbers!
mu=float(mean(list10))
m=float(median(list10))
print 'mean = ',mu, ' median = ', m
       
mean =  48.2    median =  44.0
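The two values above can be checked by hand. The sketch below is plain Python rather than Sage; mean_by_hand and median_by_hand are illustrative helper names, not Sage commands. The list is the one printed earlier. For an even number of items the median is the average of the two middle values of the sorted list, here 42 and 46.

```python
list10 = [46, 96, 10, 22, 42, 2, 22, 90, 78, 74]

def mean_by_hand(values):
    # Sum of the values divided by how many there are.
    return sum(values) / float(len(values))

def median_by_hand(values):
    # Middle value of the sorted list, or the average of the
    # two middle values when the list has even length.
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return float(s[mid])
    return (s[mid - 1] + s[mid]) / 2.0

mu = mean_by_hand(list10)
m = median_by_hand(list10)
print(mu)
print(m)
```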
#We find the mode of a couple of lists with category data (since the mode of numeric data like ours, where most values occur only once or twice, is not very informative). Mode is a list!
listA=['a','a','b','a','c']
print mode(listA)
listB=['a','a','b','b','c']
print mode(listB)
listC=['a','a','b','b','c','c']
print mode(listC)
       
['a']
['a', 'b']
['a', 'c', 'b']
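Why the mode comes back as a list: several values can tie for the highest count. A minimal plain-Python sketch of that idea follows; modes is an illustrative helper, not Sage's mode(), and it sorts the result, whereas the Sage output above is in no particular order.

```python
from collections import Counter

def modes(values):
    # Count each distinct value, then keep every value whose
    # count equals the highest count (ties are all returned).
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

listA = ['a', 'a', 'b', 'a', 'c']
listB = ['a', 'a', 'b', 'b', 'c']
listC = ['a', 'a', 'b', 'b', 'c', 'c']
print(modes(listA))
print(modes(listB))
print(modes(listC))
```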
#Here we create a list of n=1000 random integers between b=1 and t=100 and find the minimum and maximum values.
b=1; t=100; n=1000
list1=[ random_between(b,t) for j in range(n) ]
mn=min(list1)
mx=max(list1)
print 'min = ',mn,' max = ', mx
       
min =  1     max =  100
# Here we use the commands variance() and std() to get the variance and standard deviation (and float to get decimals).
sigma_sqrd=float(variance(list1))
print 'variance = ',sigma_sqrd
sigma=float(std(list1))
print 'standard deviation = ', sigma, sqrt(sigma_sqrd)
       
variance =  817.924435435
standard deviation =  28.5993782351 28.5993782351
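For reference, here is the variance computed by hand in plain Python. As far as I can tell, Sage's variance() divides by n-1 (the sample variance) unless told otherwise; treat that as an assumption and check the Sage documentation. The helper variance_by_hand and its bias flag are illustrative names. The last lines compute the theoretical variance of a uniform integer on 1..N, which is (N^2-1)/12, so 833.25 for N=100 with standard deviation about 28.87. The values printed above are close to that.

```python
from math import sqrt

def variance_by_hand(values, bias=False):
    # bias=False divides by n-1 (sample variance, needs n >= 2),
    # bias=True divides by n (population variance).
    n = len(values)
    mu = sum(values) / float(n)
    ss = sum((x - mu) ** 2 for x in values)
    return ss / (n if bias else n - 1)

list10 = [46, 96, 10, 22, 42, 2, 22, 90, 78, 74]
sample_var = variance_by_hand(list10)
population_var = variance_by_hand(list10, bias=True)
print(sample_var)
print(population_var)

# Theoretical variance of a uniform integer on 1..N is (N^2 - 1)/12.
N = 100
theoretical_var = (N ** 2 - 1) / 12.0
theoretical_std = sqrt(theoretical_var)
print(theoretical_var)
print(theoretical_std)
```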


Sage sucks at histograms and barcharts (jan 2013). You cannot adjust the bin intervals, the width of the bars, .... 

Alternatively read: http://trac.sagemath.org/sage_trac/ticket/9671

# By importing matplotlib you can get a decent histogram. Remember that our data can take on values in [0,100], so don't leave out range, and its endpoints must be given as decimals! Everything after that is extras.
import matplotlib.pyplot as plt
plt.hist(list1, bins=5, range=(0.,100.), facecolor='lightgreen', alpha=0.5, zorder=100)
plt.savefig('Histogram.png')
plt.close()
#The last two lines (savefig and close) are there to force a plot. This is a bug (jan 2013). For more extras: http://sagemath.wikispaces.com/Uniform+Distributions
       
#Here we create a (normalized) histogram of the list we generated in the previous block.
#We expect the histogram to be "reasonably" uniform (flat), or at least more uniform as n increases.
#Because we do not know how (or even if) we can enter actual intervals for the bins, we must make sure that the min and max are actually in the list.
num_bins=5
stats.TimeSeries(list1).plot_histogram(bins=num_bins,color='red', figsize=6, aspect_ratio=num_bins*1000, alpha=0.5)
       

                                
                            

                                
#To make a bar chart with our data, we want 5 bins (0,20],(20,40],...,(80,100], so our interval length is
int_len=((t-b)+1)/num_bins
print int_len
#We count our data
count_data=[]
for j in range(num_bins):
    print j
    count_data=count_data+[len([x for x in list1 if (x>int_len*j and x<=int_len*(j+1))])]
print count_data
       
20
0
1
2
3
4
[199, 188, 184, 230, 199]
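The same counting can be done in one pass over the data with explicit bin edges. This is a plain-Python sketch, not a Sage command: count_in_bins and the edges list are illustrative names, and the bins are half-open on the left, (lo, hi], to match the intervals used above. The test data is simply 1..100, so each bin should receive 20 values.

```python
from bisect import bisect_left

def count_in_bins(values, edges):
    # edges must be sorted; bin i is the interval (edges[i], edges[i+1]].
    counts = [0] * (len(edges) - 1)
    for x in values:
        # bisect_left returns the position of the first edge >= x,
        # so subtracting 1 gives the bin whose right end is >= x.
        i = bisect_left(edges, x) - 1
        if 0 <= i < len(counts):  # values outside all bins are skipped
            counts[i] += 1
    return counts

edges = [0, 20, 40, 60, 80, 100]
data = list(range(1, 101))
bin_counts = count_in_bins(data, edges)
print(bin_counts)  # [20, 20, 20, 20, 20]
```

Because the edges are passed in explicitly, unequal bin widths work the same way.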
#See below to get the bar chart to start at y=0; width=1 connects the bars.
bar_chart(count_data, width=1, color="lightblue", figsize=4)
       

                                
                            

                                
BC=bar_chart(count_data, width=1, color="lightblue", figsize=4)
show(BC, ymin=0)