# Least Squares

## Contents

Carl Heiles has written the definitive lab document on least squares fitting, and you should definitely read at least the lite version in the reference materials above. That being said, all the examples in the document are in IDL, and so this page will focus on performing those examples with Python. Note that all figures and equations referenced are those within Carl Heiles’ document.

This is also the first lab where you will really be working with error and the propagation of uncertainty, so it is highly recommended that you look over Chapter 3 of Taylor's error analysis text (the basics covered there are fair game for quizzes).

## 1 A Numerical Example and Its Solution in IDL

### 1.1 Generation of the numerical example

Suppose that we make four measurements of the angle y and we want to fit to a parabolic function in time t. In the notation of Equation 1.1, s would be unity, t the time, and u the time squared, so the number of unknowns is three (N = 3). Because there are four independent measurements (M = 4), the subscripts run from m = 0 → 3. Suppose that the four values of time are 5, 7, 9, 11.

First, we create the matrix X in Python,

X = np.zeros((N, M), dtype=float)   # here np.zeros((3, 4), dtype=float)

and then we populate it with numbers. In your own work, you would normally do this by reading a data file and transferring the numbers to the matrix using Python commands; to work through this example, you might manually type them in. After populating the matrix, in direct correspondence with Equation 2.1a we have $s_m = 1$, $t_m = \mathrm{time}_m$, $u_m = \mathrm{time}_m^2$:

\begin{align} {X} = \left[ \begin{array}{ccc} 1 & 5 & 25 \\ 1 & 7 & 49 \\ 1 & 9 & 81 \\ 1 & 11 & 121 \\ \end{array} \; \right] \; \end{align}\,\!

Suppose that the four measured values of y are (Equation 2.1c),

\begin{align} { Y} = \left[ \begin{array}{c} 142 \\ 168 \\ 211 \\ 251 \\ \end{array} \; \right] \; \end{align}\,\!

Figure 4.1 shows the datapoints, together with the fitted curve. In Python, you can generate a column vector like Y by creating an array,

Y = np.array([142, 168, 211, 251]),

and a matrix like X by using the np.matrix function, providing each of the columns of X as a list (so the Python variable X holds the transpose, with shape 3 × 4, of the 4 × 3 matrix displayed above):

X = np.matrix([[1,1,1,1],[5,7,9,11],[25,49,81,121]])
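Rather than typing the squares in by hand, you can build X directly from the array of times. The sketch below does this with a plain ndarray via np.vstack instead of np.matrix (np.matrix still works but is discouraged in modern NumPy); the variable names match the walkthrough, and the construction is one possible approach, not the only one:

```python
import numpy as np

# Measurement times and measured values y from the example above
time = np.array([5.0, 7.0, 9.0, 11.0])
Y = np.array([142.0, 168.0, 211.0, 251.0])

# One row per basis function: s = 1, t = time, u = time**2.
# This gives the 3 x 4 array (N x M) used in the rest of the walkthrough.
X = np.vstack([np.ones_like(time), time, time**2])

print(X.shape)  # (3, 4): N = 3 unknowns, M = 4 measurements
```

Each row of this X corresponds to one column of the 4 × 3 matrix displayed above, which is why the normal-equation products in the next section appear as np.dot(X, np.transpose(X)) rather than the XᵀX of the written equations.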

## 2 Solution of the Numerical Example in Python

In Python we calculate the normal equation matrices and denote the [α] in Equation 2.4a by XX:

XX = np.dot(X,np.transpose(X))

and we denote the [β] in Equation 2.4b by XY:

XY = np.dot(Y,np.transpose(X))

We can take the inverse of [α] (XX) in Python by:

XXI = np.linalg.inv(XX)

Finally, we can find the least-squares fitted quantities a by multiplying these two matrices,

a = np.dot(XY,XXI)
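Putting the steps above together, a minimal self-contained version (using a plain ndarray in place of np.matrix) looks like this; the printed coefficients match the solution given in the discussion section below:

```python
import numpy as np

time = np.array([5.0, 7.0, 9.0, 11.0])
Y = np.array([142.0, 168.0, 211.0, 251.0])
X = np.vstack([np.ones_like(time), time, time**2])  # shape (3, 4)

XX = np.dot(X, np.transpose(X))   # [alpha], shape (3, 3)
XY = np.dot(Y, np.transpose(X))   # [beta],  shape (3,)
XXI = np.linalg.inv(XX)           # [alpha]^-1
a = np.dot(XY, XXI)               # least-squares coefficients

print(a)  # approximately [96.625, 4.5, 0.875]
```

Because XX is symmetric, np.dot(XY, XXI) and np.dot(XXI, XY) give the same result here.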

In Python we denote the matrix of predicted values $\bar{y}_m$ by YBAR, which is,

YBAR = np.dot(a,X),

and we can also define the residuals in Y as,

DELY = Y - YBAR

In Python we can also denote $s^2$ in Equations 3.1 and 3.6 by s_sq and write

s_sq = np.dot(DELY/(M-N),np.transpose(DELY)),

or

s_sq = np.sum(DELY**2)/(M-N)
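For this example, both forms of s_sq give the same number. A self-contained sketch of the residual calculation (repeating the fit so it runs on its own):

```python
import numpy as np

time = np.array([5.0, 7.0, 9.0, 11.0])
Y = np.array([142.0, 168.0, 211.0, 251.0])
X = np.vstack([np.ones_like(time), time, time**2])
M, N = 4, 3

# Solve the normal equations as above
a = np.dot(np.dot(Y, X.T), np.linalg.inv(np.dot(X, X.T)))

YBAR = np.dot(a, X)               # predicted values, shape (4,)
DELY = Y - YBAR                   # residuals
s_sq = np.sum(DELY**2) / (M - N)  # sample variance of the fit

print(DELY)   # approximately [1., -3., 3., -1.]
print(s_sq)   # approximately 20.0
```

Note that with only M − N = 1 degree of freedom, s_sq is a very rough variance estimate, which is part of why the uncertainties in the next section come out so large.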

It is always a good idea to plot all three quantities (the measured values Y, the fitted values YBAR, and the residuals DELY) to make sure your fit looks reasonable and to check for bad datapoints.

To get the error in the derived coefficients, we need a way to select the diagonal elements of a matrix. Any N by N matrix has N diagonal elements; a convenient way to extract them as a length-N array is np.diag,

diag_elems = np.diag(NbyNmatrix)

where here the matrix of interest is XXI, the inverse of [α].

In Python, we define the variances of the N derived coefficients by vardc (think of "variances of derived coefficients"). You can find this as in Equation 3.7 from,

vardc = s_sq*diag_elems
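The coefficient uncertainties are then the square roots of vardc. A self-contained sketch of this step (again repeating the fit so the snippet runs on its own):

```python
import numpy as np

time = np.array([5.0, 7.0, 9.0, 11.0])
Y = np.array([142.0, 168.0, 211.0, 251.0])
X = np.vstack([np.ones_like(time), time, time**2])
M, N = 4, 3

XXI = np.linalg.inv(np.dot(X, X.T))         # [alpha]^-1
a = np.dot(np.dot(Y, X.T), XXI)             # coefficients
DELY = Y - np.dot(a, X)                     # residuals
s_sq = np.sum(DELY**2) / (M - N)            # sample variance

vardc = s_sq * np.diag(XXI)   # variances of the derived coefficients (Eq. 3.7)
errors = np.sqrt(vardc)       # 1-sigma uncertainties on the coefficients

print(errors)  # approximately [34.012, 9.000, 0.559]
```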

## 3 Discussion of the numerical example

For this numerical example, the solution (the array of derived coefficients) is,

\begin{align} { a} = \left[ \begin{array}{c} 96.6250 \\ 4.5000 \\ 0.8750 \\ \end{array} \; \right] \; \end{align}\,\!

and the errors in the derived coefficients (the square root of the σ2’s of the derived coefficients, i.e. $[\sigma^2_n]^{1/2}$, or in Python, np.sqrt(vardc)) are:

\begin{align} { \sigma _ a} = \left[ \begin{array}{c} 34.012 \\ 9.000 \\ .5590 \\ \end{array} \; \right] \; \end{align}\,\!
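As a sanity check (not part of the original IDL walkthrough), the same coefficients can be recovered with NumPy's built-in least-squares solver, which takes the M × N design matrix directly rather than going through the normal equations:

```python
import numpy as np

time = np.array([5.0, 7.0, 9.0, 11.0])
Y = np.array([142.0, 168.0, 211.0, 251.0])

# Design matrix with M = 4 rows and N = 3 columns (the transpose of
# the 3 x 4 array used in the normal-equation solution above)
A = np.column_stack([np.ones_like(time), time, time**2])

a, resid, rank, sv = np.linalg.lstsq(A, Y, rcond=None)
print(a)  # approximately [96.625, 4.5, 0.875]
```

lstsq solves the same problem by a more numerically stable route (via the singular value decomposition), which matters when [α] is nearly singular, as it is for badly covariant fits like this one.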

These results look horrible: the uncertainties are large fractions of the derived coefficients. The reason: we have specifically chosen an example with terrible covariance. And the great thing is that this can be fixed easily (at least in this case, though certainly not always), without taking more data! To address this issue, look to Section 5 of Least-Squares Lite.