Example 1B: Least Squares Quadratic Approximation.
This file uses the SAME data as Example 1 for Least Squares Linear Approximation.
The following measured data is recorded:
x=(60, 61, 62, 63, 65)
y=(3.1,3.6,3.8,4,4.1).
Use the method of least squares to find a quadratic function that best matches the data.
Find the standard deviation between the resulting function and the given data. (The least squares algorithm should yield the function that minimizes this standard deviation.)
Solution:
We plot the data points and see that there might be a quadratic correlation, that is, the data can be approximated by a parabola.
$f(x,{a_0},{a_1},{a_2}) = a_2 x^2+a_1x+a_0=ax^2+bx+c$


You must be very careful about the definition of n when applying formulas. In statistics, n = sample size = (number of points). In mathematics, n = number of subintervals = (number of points - 1).
This is particularly important when computing the standard deviation. Here we will use the statistical definition! There are 5 data points, so n = 5, where n is the number of points.
The number of unknown constants in our approximating function is 3, that is, $a_0$, $a_1$ and $a_2$ so m=3.
Since $f(x)=a_2x^2+a_1x+a_0$ is a parabola, we have $f(x)=a x^2+bx+c $ where $a=a_2$, $b=a_1$ and $c=a_0$.
We need to solve the (nonsquare) matrix equation AX=B using the least squares algorithm where
A is the nx3 dimensional array $A=\left[ x^2, x, 1 \right]$ (one row $[x_i^2, x_i, 1]$ per data point), X is the 3x1 dimensional array $\left[ {\begin{array}{*{20}{c} }a \\ b \\ c \end{array} } \right] $ and B is the nx1 dimensional array $B=\left[ y \right]$.
AX=B  $ \left[ {\begin{array}{*{20}{c}}{60^2}&{60}&1\\{61^2}&{61}&1\\{62^2}&{62}&1\\{63^2}&{63}&1\\{65^2}&{65}&1\end{array}} \right] \cdot \left[ {\begin{array}{*{20}{c} }a \\ b \\ c \end{array} } \right] = \left[ {\begin{array}{*{20}{c}}3.1\\{3.6} \\{3.8}\\{4}\\{4.1} \end{array}} \right] $ 

The syntax for x1 and y1 is easy, since they just need to be 1xn arrays. (We save x and y as variable names for plots.)
We point out that when accessing elements of a 1xn array, we do NOT have to write x1[0,m], but simply x1[m].
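A minimal NumPy sketch of this step, assuming the data are entered with `numpy.array` (the worksheet's own cell code is not reproduced in this copy). The name `x1sq` for the array of squares is hypothetical; the three `print` calls give the same three arrays listed next, up to NumPy's print formatting.

```python
import numpy as np

# 1-D arrays of the measured data (x and y stay free for plotting)
x1 = np.array([60, 61, 62, 63, 65])
y1 = np.array([3.1, 3.6, 3.8, 4.0, 4.1])
n = len(x1)  # statistical n: number of data points

# Hypothetical name for the squared x-values used in the x^2 column
x1sq = x1**2

# A 1-D array takes a single index, starting at 0: x1[2] is the 3rd point
third_x = x1[2]

print(x1)
print(x1sq)
print(y1)
```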
[60 61 62 63 65] [3600 3721 3844 3969 4225] [ 3.1 3.6 3.8 4. 4.1]
The syntax for the array A is VERY important because both of its dimensions are bigger than 1, so it must be written properly as an nx3 array.
We point out that when accessing elements of such an array, we must give both the row and the column index, and indexing starts at 0: A[3,2] gives the element in the 4th row, 3rd column.
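One way to build A in NumPy is to stack the three columns side by side with `numpy.column_stack`. This is a sketch under that assumption, not necessarily how the original cell was written; `print(A)` gives the array shown next.

```python
import numpy as np

x1 = np.array([60, 61, 62, 63, 65])

# Columns x^2, x and 1 placed side by side: an n x 3 array
A = np.column_stack([x1**2, x1, np.ones_like(x1)])

n, m = A.shape       # 5 rows (points), 3 columns (constants)
corner = A[3, 2]     # 4th row, 3rd column: the constant 1
square = A[3, 0]     # 4th row, 1st column: 63**2 = 3969

print(A)
```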
[[3600 60 1] [3721 61 1] [3844 62 1] [3969 63 1] [4225 65 1]]
B is an nx1 array, but here we can be lazy and use a 1xn array, since SAGE (through NumPy's lstsq) knows what to do with 1-dimensional arrays. The nice thing about doing this is that lstsq then yields a 1x3 array, one entry each for a, b and c.
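The effect of the lazy choice can be checked directly: NumPy's `lstsq` accepts either a 1-D right-hand side or an n x 1 column, and the solution comes back in the matching shape. A small sketch; the names `X_flat` and `X_col` are illustrative only.

```python
import numpy as np

x1 = np.array([60, 61, 62, 63, 65], dtype=float)
y1 = np.array([3.1, 3.6, 3.8, 4.0, 4.1])
A = np.column_stack([x1**2, x1, np.ones_like(x1)])

B = y1  # the lazy choice: a 1-D array of length n
print(B)

# 1-D right-hand side -> 1-D solution of length 3
X_flat = np.linalg.lstsq(A, B, rcond=None)[0]
# Genuine n x 1 column -> 3 x 1 solution
X_col = np.linalg.lstsq(A, B.reshape(-1, 1), rcond=None)[0]
```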
[ 3.1 3.6 3.8 4. 4.1]
Now we want to solve the (nonsquare) matrix equation AX=B using least squares. In MATLAB we could just write X=A\B.
Here we use the NumPy linalg command lstsq. (Remember: X is just the constants $a$, $b$ and $c$.)
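A sketch of the call, assuming a NumPy version that accepts `rcond=None` (which requests NumPy's default cutoff for small singular values). `lstsq` returns a 4-tuple: the solution, the residual sum of squares, the rank of A and its singular values; only the first entry is needed here. The two `print` calls correspond to the output shown next.

```python
import numpy as np

x1 = np.array([60, 61, 62, 63, 65], dtype=float)
y1 = np.array([3.1, 3.6, 3.8, 4.0, 4.1])
A = np.column_stack([x1**2, x1, np.ones_like(x1)])

# Least squares solution of the overdetermined system A X = y1
X, resid, rank, sv = np.linalg.lstsq(A, y1, rcond=None)

# Unpack in the column order of A: x^2, x, 1
a, b, c = X

print(X)
print(a, b, c)
```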
[-5.19145803e-02  6.68136966e+00 -2.10858321e+02]
-0.0519145802651 6.68136966126 -210.85832106

Now we want to calculate the "standard deviation" σ.
To do this, we calculate the sum S of the squares of the residuals, that is, the differences between the value of the approximating function and the measured value at each x.
This means we need an array y2 holding the values of the approximating function at each value of x.
We then calculate $S=\sum_{i=1}^{n} (y2_i-y1_i)^2$.
Finally, we calculate the standard deviation using the statistical definition, with n = number of points and m = number of constants: $\sigma=\large{\sqrt{\frac{S}{n-m}}}$.
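The three formulas translate line for line into NumPy. This sketch recomputes the coefficients so it runs on its own; its three `print` calls correspond to the three outputs that follow (y2, S and σ).

```python
import numpy as np

x1 = np.array([60, 61, 62, 63, 65], dtype=float)
y1 = np.array([3.1, 3.6, 3.8, 4.0, 4.1])
A = np.column_stack([x1**2, x1, np.ones_like(x1)])
a, b, c = np.linalg.lstsq(A, y1, rcond=None)[0]

# Values of the fitted parabola at each measured x
y2 = a * x1**2 + b * x1 + c

n = len(x1)  # number of data points
m = 3        # number of fitted constants a, b, c

S = np.sum((y2 - y1)**2)       # sum of squared residuals
sigma = np.sqrt(S / (n - m))   # statistical standard deviation

print(y2)
print(S)
print(sigma)
```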
[ 3.13136966 3.53107511 3.8269514 4.01899853 4.0916053 ]
0.00689248895434
0.0587047227842
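As a quick numerical check of the claim in the problem statement that least squares minimizes S (and therefore σ), one can nudge each coefficient away from its fitted value and confirm that S grows. The helper `sum_sq` and the step sizes are arbitrary choices for illustration.

```python
import numpy as np

x1 = np.array([60, 61, 62, 63, 65], dtype=float)
y1 = np.array([3.1, 3.6, 3.8, 4.0, 4.1])
A = np.column_stack([x1**2, x1, np.ones_like(x1)])
X = np.linalg.lstsq(A, y1, rcond=None)[0]

def sum_sq(coeffs):
    """Sum of squared residuals for coefficients [a, b, c]."""
    return np.sum((A @ coeffs - y1)**2)

S_min = sum_sq(X)

# Arbitrary small perturbations of a, b and c, in both directions
steps = [1e-4, 1e-3, 1e-2]
larger = []
for k, h in enumerate(steps):
    for sign in (1, -1):
        delta = np.zeros(3)
        delta[k] = sign * h
        larger.append(sum_sq(X + delta) > S_min)

print(S_min, all(larger))
```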
