# Least Squares

## 1 Overview

The method of least squares (also commonly referred to as least-squares fitting) is a procedure for finding the best-fit model curve for a given data set. The development of this method is commonly attributed to Carl Friedrich Gauss, who published it in 1809 and claimed to have used it since 1795, although Adrien-Marie Legendre was the first to publish it, in 1805. The most common version, linear least-squares fitting, can be used to fit the data with any function of the form

$f(\vec{x}, \vec{\beta }) = \beta _0 + \beta _1 x_1 + \beta _2 x_2 + ... + \beta _ N x_ N \,\!$

in which

1. Each explanatory variable $x_i$ in the function is multiplied by a unique unknown parameter $\beta_i$,
2. There is at most one unknown parameter with no corresponding explanatory variable, and
3. All of the individual terms are summed to produce the final function value.

In statistical terms, any function that meets these criteria is called a “linear function”. The term “linear” is used even though the model function produced may not itself be a straight line. This is because the unknown parameters are treated as the variables, and the explanatory variables are treated as the known coefficients multiplying those “variables”, making the problem a (usually overdetermined) system of linear equations that can be solved for the values of the unknown parameters. To distinguish between the various meanings of the word “linear”, the linear models discussed here are often said to be “linear in the parameters” or “statistically linear”.
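To make the "system of linear equations" concrete, it can help to write it out. The notation here is an assumption introduced for illustration and is not taken from the text above: $y_m$ is the $m$-th measured value and $x_{m,n}$ is the value of the $n$-th explanatory variable at the $m$-th of $M$ data points. Each data point then contributes one equation:

$y_m \approx \beta _0 + \beta _1 x_{m,1} + \beta _2 x_{m,2} + \cdots + \beta _ N x_{m,N}, \quad m = 0, \ldots , M-1 \,\!$

Every equation is linear in the unknowns $\beta _ n$, with the known $x_{m,n}$ acting as coefficients. When there are more data points than parameters, no choice of $\vec{\beta }$ generally satisfies all $M$ equations exactly, which is why a best-fit criterion is needed.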

If you are trying to fit more complicated functions that are not linear in their parameters, you can use non-linear least squares techniques. In practice, these are often solved by iterative refinement, in which each iteration solves a linearized version of the problem, so the core calculation tends to be similar. This makes linear regression a very valuable tool to understand, as it lies at the core of most modern data-fitting techniques.
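As a rough illustration of "each iteration is solved linearly", here is a minimal Gauss-Newton sketch in Python with NumPy. Gauss-Newton is one common choice of iterative scheme, not necessarily the one the text has in mind. The test model $\beta _0 + \beta _0 \beta _1 x$ (two parameters multiplied together, hence non-linear in the parameters), the function names, the noise-free synthetic data, and the starting guess are all assumptions made for this example.

```python
import numpy as np

def model(x, beta):
    """Hypothetical model that is non-linear in its parameters."""
    b0, b1 = beta
    return b0 + b0 * b1 * x

def jacobian(x, beta):
    """Partial derivatives of the model with respect to b0 and b1."""
    b0, b1 = beta
    return np.column_stack([1.0 + b1 * x, b0 * x])

def gauss_newton(x, y, beta_init, n_iter=10):
    """Refine beta by repeatedly solving a *linear* least squares problem."""
    beta = np.asarray(beta_init, dtype=float)
    for _ in range(n_iter):
        residuals = y - model(x, beta)
        J = jacobian(x, beta)
        # Linearized sub-problem: find the step that best explains the residuals.
        step, *_ = np.linalg.lstsq(J, residuals, rcond=None)
        beta = beta + step
    return beta

x = np.linspace(0.0, 4.0, 20)
y = model(x, (2.0, 0.5))  # noise-free synthetic data (assumed true values)
beta_fit = gauss_newton(x, y, beta_init=(1.0, 1.0))
print(beta_fit)
```

A fixed iteration count keeps the sketch short; a practical implementation would normally test for convergence and guard against steps that increase the residuals.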

For example, a function like

$f(\vec{x}, \vec{\beta }) = \beta _0 + \beta _1 x + \beta _2 x^2 \,\!$

or

$f(\vec{x}, \vec{\beta }) = \beta _0 + \beta _1 \ln{x} \,\!$

could easily be fit by a linear least squares model, whereas something like

$f(\vec{x}, \vec{\beta }) = \beta _0 + \beta _0 \beta _1 x \,\!$

would not, despite being linear in $\vec{x}$, because it is now quadratic in the parameters $\vec{\beta }$.
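The two linear-in-the-parameters examples above can be fit by building a matrix with one column per term ($1$, $x$, $x^2$ or $1$, $\ln x$) and handing it to a linear solver. The sketch below uses NumPy's `np.linalg.lstsq`; the variable names and the noise-free synthetic data (with arbitrarily chosen true parameters) are assumptions for illustration only.

```python
import numpy as np

x = np.linspace(1.0, 5.0, 10)  # strictly positive so that log(x) is defined

# Quadratic model: beta0 + beta1 * x + beta2 * x**2
y_quad = 1.0 + 2.0 * x + 3.0 * x**2  # synthetic data, assumed true parameters
A_quad = np.column_stack([np.ones_like(x), x, x**2])
beta_quad, *_ = np.linalg.lstsq(A_quad, y_quad, rcond=None)

# Logarithmic model: beta0 + beta1 * ln(x)
y_log = 0.5 + 1.5 * np.log(x)  # synthetic data, assumed true parameters
A_log = np.column_stack([np.ones_like(x), np.log(x)])
beta_log, *_ = np.linalg.lstsq(A_log, y_log, rcond=None)

print(beta_quad, beta_log)
```

Note that the columns are non-linear functions of $x$, yet the solver only ever sees a linear problem in $\vec{\beta }$. No such matrix can be written down for $\beta _0 + \beta _0 \beta _1 x$, because the coefficient of $\beta _1$ would itself depend on the unknown $\beta _0$.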

## 2 Why “Least Squares”?

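The method of least squares gets its name from the way the estimates of the unknown parameters are calculated. In the least squares method, the unknown parameters are estimated by minimizing the sum of the squared deviations between the data and the model (also called the residuals). For $M$ data points and a model with $N$ free parameters, this is equivalent to minimizing the variance of the residuals $\delta y_ m$, where $\delta y_ m$ is the difference between the $m$-th measured value and the model evaluated at that point: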

$s^2 = \frac{1}{M-N} \sum _{m=0}^{M-1} \delta y_ m^2 \,\!$

The minimization process reduces the overdetermined system of equations formed by the data to a sensible system of $N$ equations in $N$ unknowns, where $N$ is the number of parameters in the functional part of the model. This new system of equations is then solved to obtain the parameter estimates.
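The reduced square system described above is usually called the normal equations. With a design matrix $A$ (one row per data point, one column per parameter) and data vector $\vec{y}$, setting the derivative of the summed squared residuals to zero gives $A^T A \vec{\beta } = A^T \vec{y}$. The following is a minimal sketch of that step for a straight-line fit; the matrix notation, the variable names, and the seeded synthetic data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 50                                # number of data points
x = np.linspace(0.0, 10.0, M)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=M)  # synthetic noisy line

A = np.column_stack([np.ones(M), x])  # M equations (rows), N unknowns (columns)
N = A.shape[1]                        # number of fitted parameters

# Normal equations: an N x N system, regardless of how large M is.
beta = np.linalg.solve(A.T @ A, A.T @ y)

residuals = y - A @ beta
s2 = np.sum(residuals**2) / (M - N)   # variance of the residuals
print(beta, s2)
```

Solving the normal equations directly mirrors the text, but forming $A^T A$ can lose numerical precision for poorly conditioned problems; routines such as `np.linalg.lstsq` avoid forming it explicitly.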

## 3 Least Squares or Chi-Squared, Which Is Right for Me?

The main difference between least squares and chi-squared fitting lies in how individual measurements are weighted. Generally, you can think of least squares as unweighted, whereas in chi-squared fitting each data point is down-weighted by its intrinsic measurement error. In fact, the reduced chi-squared $\widehat{\chi ^2}$ is simply the variance of the residuals (i.e. $s^2$ as defined in the previous section) with each squared residual divided by the adopted intrinsic measurement variance $\sigma _ m^2$ of that data point.

$\widehat{\chi ^2} = \frac{1}{M-N} \sum _{m=0}^{M-1} \frac{\delta y_ m^2}{\sigma _ m^2} \,\!$

In theory, this sounds great! Your fit will account for the fact that some data points are less valuable than others, which seems like a great way to estimate the true values underlying a measured data set. But before you drop everything and switch all fitting efforts to chi-squared, remember that your fit will only be as good as your error estimates. That is to say, you must have accurate knowledge of the error on each measurement: any error in your errors will propagate into your fit and ultimately make it worse!
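One way to carry out a chi-squared fit of a linear model is to divide each equation by its $\sigma _ m$ and then solve the result as an ordinary least squares problem. The sketch below does this and evaluates the reduced chi-squared. The helper `weighted_fit`, the synthetic data, and the choice of error bars are hypothetical, introduced only for this example.

```python
import numpy as np

def weighted_fit(A, y, sigma):
    """Chi-squared fit of a linear model; returns (beta, reduced chi-squared)."""
    # Dividing each row by sigma_m down-weights the noisier measurements.
    A_w = A / sigma[:, None]
    y_w = y / sigma
    beta, *_ = np.linalg.lstsq(A_w, y_w, rcond=None)
    M, N = A.shape
    chi2_red = np.sum(((y - A @ beta) / sigma) ** 2) / (M - N)
    return beta, chi2_red

rng = np.random.default_rng(1)
M = 40
x = np.linspace(0.0, 10.0, M)
sigma = np.where(x < 5.0, 0.2, 1.0)          # assumed per-point error bars
y = 1.0 + 2.0 * x + rng.normal(size=M) * sigma
A = np.column_stack([np.ones(M), x])

beta, chi2_red = weighted_fit(A, y, sigma)

# Same data, but with every error bar underestimated by a factor of two.
beta_half, chi2_half = weighted_fit(A, y, 0.5 * sigma)
print(beta, chi2_red, chi2_half)
```

Scaling every $\sigma _ m$ by the same factor leaves the best-fit parameters unchanged but multiplies $\widehat{\chi ^2}$ by the inverse square of that factor, so the fit looks four times worse here purely because of the error estimates. If the errors are wrong by different amounts for different points, the relative weights change and the fitted parameters shift as well.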