Topics

- Derivation of normal equations for linear least squares solution
- Application of least-squares to geodesy
- Application of least squares as linear regression

Further reading

- The history of statistics: The measurement of uncertainty before 1900, by Stephen Stigler. An excellent account.
- Numerical methods for least squares problems by Åke Björck, 1996
- Legendre’s 1805 work on the orbit of comets contains the first published uses of least squares (translation).
- William Chauvenet’s appendix on linear least squares, published 1891.
- On the History of the Method of Least Squares, by Merriman, 1877, describes the first 13 “proofs” of the method.
- Sources in the history of probability and statistics, old but very comprehensive website.
- A nice non-technical historical survey of linear regression by Dan Kopf.
- Wikipedia’s article on least squares
- A tutorial history of least squares with applications to astronomy and geodesy by Yves Nievergelt, 2000.
- Legendre’s 1805 work on the orbit of comets contains the first published uses of least squares (translation).
- Laplace’s data as published by Bowditch.

Linear least squares is a way to solve systems of linear equations that can not otherwise be solved. The broad idea is to look for the thing that is *closest* to solving all of the equations, rather than solving all of the equations directly. The trick of the method that has lead to its wide adoption is that it minimizes the sum of the square errors between the observations and predictions, rather than the error itself. Let’s begin by briefly reviewing some of the basics of linear systems.

A linear system is a set of equations that we can arrange to look like \[\begin{align*}
a_{1,1} x_1 + a_{1,2} x_2 + \cdots + a_{1,n} x_n &= b_1,
\\
a_{2,1} x_1 + a_{2,2} x_2 + \cdots + a_{2,n} x_n &= b_2,
\\
\vdots & \quad \vdots
\\
a_{m,1} x_1 + a_{m,2} x_2 + \cdots + a_{m,n} x_n &= b_n,
\end{align*}\] where the \(a\)’s and \(b\)’s are all known numbers, and the \(x\)’s are the unknowns we would like to determine. This linear system can be written more compactly as a vector-matrix equation \[\mathbf{A x} = \mathbf{b}\quad\text{or}\quad \mathbf{A x} - \mathbf{b} = \mathbf{0}\] where \(\mathbf{b} = [ b _ i ]\) is an \(m\)-dimensional vector, \(\mathbf{A} = [a _ {ij}]\) is an \(m \times n\) matrix with \(m\) rows and \(n\) columns, and \(\mathbf{x} = [ x _ i]\) is an \(n\)-dimensional vector. When we are attempting to determine the set of unknowns \(\mathbf{x}\), each additional equation removes one dimension from the solution set. If the number of equations is equal to the number of unknowns (\(m = n\)), then the solution set is *usually* a zero-dimensional point and we can determine the value of all the unknowns exactly (the exceptions being cases where one or more of the equations are repeating information already present in the preceding equations). If there are fewer equations than unknowns (\(m < n\)), the linear system is underdetermined, and we can only say that the solutions lie in a certain \(n-m\) dimensional plane. If there are more equations than unknowns (\(m > n\)), then the linear system is *usually* overdetermined, and there are no solutions (unless enough of the equations are redundant).

Ideally, when we measure things, all our measurements are very accurate, and we can make just enough measurements so that we get a linear system having exactly as many independent equations as we have unknowns. In practice, however, our observations have errors, and we always want to check your work, in case an observation went really wrong. A good way to check our work for errors is to make more observations. But adding more observations means adding more equations to the system. So, we often have overdetermined systems. And since there’s always a little error in these observations, it is unlikely that a single set of unknown values will satisfy all the equations at once.

Although there is no solution to an overdetermined system, we can still look for the solution that is “closest” to solving all of our equations at once, \(x _ {\rm closest}\). There are many possible ways to measure “closest”. We could, for instance, calculate how much each equation is off by, and choose the value of \(x\) that minimized the largest of these errors. Or we could just sum the absolute values of all the errors and minimize this sum. The most popular approach is to pick values for our unknowns that minimize the sum of the squared errors for each equation – the unknowns that return the “least squares”. As it turns out, the vector \(x _ {\rm closest}\) with the least sum-of-squares-error solves a simplified linear system \[A^T A x _ {\rm closest} = A^T b.\] This equation is called the *normal equation* of the linear system.

Why the normal equation? The normal equation is derivable using standard minimization techniques from calculus – differentiate an error function and find where that derivative of the error vanishes. Because the error function involves only quadratics, it’s derivative will be a linear function. It is good to have a reference handy for matrix calculus.

First, observe that if you want to sum the squares of the components of any vector \(v\), you can do this by taking the dot-product of the vector with itself, or multiplying the vector by it’s own matrix transpose, \[\sum_i v_i^2 = v \cdot v = v^T v.\] Now, we define the square error \[\begin{align*}
E(x) &:= \sum _ i \left[ (Ax)_i - b_i \right]^2
\\
&= (Ax - b)^T (Ax-b)
\\
&= x^T A^T A x - x^T A b - b^T A x + b^T b
\end{align*}\] Since each of these terms is a scalar, \(x^T A b = (x^T A b)^T = b^T A x\), and \[\begin{align*}
E(x) = x^T A^T A x - 2 b^T A x + b^T b
\end{align*}\] To find the solution \(x _ {\rm closest}\) the gives the least square error, differentiate \(E(x)\) with respect to \(x\) and set the derivative to zero: \[\frac{dE}{dx} = 2 x^T A^T A - 2 b^T A = 0.\] If we now sort and transpose, we obtain the *normal equation* stated above. So, that’s where the normal equations come from.

One of the reasons the math works out so nicely is that the sum-of-squared-errors is always a convex function of the unknowns, and for convex functions on a bounded convex domain, there is always exactly one local minimum that is also a global minimum. When minimizing the raw error, there can be many different local minima. The normal equation also has a close connection to probability theory. If observation errors only occur in the constant vector \(b\) and these errors are normally distributed, then according to principles of probability, the solution of the normal equations gives the most likely values for the unknown vector \(x\).

**Pro tip**: For any of you familiar with the techniques of numerical linear algebra, it is often recommended to use a QR algorithm or SVD decomposition to find linear least-squares solutions rather than solving the normal equations by elimination with partial pivoting – these alternative algorithms are more robust in cases where the normal equations are sensitive to small measurement errors. *Computationally* this means, *do not* invert \(A^TA\) to recover \(x=(A^T A)^{-1} A^T b\). Use a built-in linear system solver like scipy’s `linalg.solve`

instead. The version of the linear least squares algorithm presented here is the simplest, most naive version. Many variations and enhancements have been developed over the last century (see Björck, 1996).

In the 1864 book *A manual of spherical and practical astronomy*, William Chauvenet illustrates the method with a problem for coastal surveying (v 2, p551). A Coastguard station at Pine Mount surveyed the angles between 4 neighboring stations, Joscelyne, Deepwater, Deakyne, and Burden as follows, measured to a precision of thousandths of an arcsecond (or roughly a millionth of a degree). The conversion of (degrees, minutes, seconds) to decimal form can be quickly accomplished with the python function `lambda d,m,s : ((s/60.+m)/60.+d)`

Stations | Angle between | Decimal form |
---|---|---|

Joscelyne to Deepwater | 65 11’ 52.500" | 65.197917 |

Deepwater to Deakyne | 66 24’ 15.553" | 66.404320 |

Deakyne to Burden | 87 2’ 24.703" | 87.040195 |

Burden to Joscelyne | 141 21’ 21.757" | 141.356044 |

But we also have one extra piece of information that hasn’t been accounted for – the sum of all these angles must be 360 degrees. If we add up all four observed angles, we get 359.998476, close to 360 degrees, but off by more than a thousandth of a degree. That’s definitely not correct to the millionth of a degree precision we expect, given the precision of the measurements reported. So, the task is to use this extra information about the sum of the angles to improve our accuracy, if we can.

If we let \(t\), \(u\), \(v\), and \(w\) be actual angles, we have 5 equations. \[\begin{align*} t &= 65.197917 \\ u &= 66.404320 \\ v &= 87.040195 \\ w &= 141.356044 \\ t + u + v + w &= 360 \end{align*}\] In matrix form, \[ A x = b\] where \[A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}, \quad x = \begin{bmatrix} t \\ u \\ v \\ w \end{bmatrix}, \quad b = \begin{bmatrix} 65.197917 \\ 66.404320 \\ 87.040195 \\ 41.356044 \\ 360 \end{bmatrix}. \]

The calculation from here on can be done by hand but are best performed by computer.

[Show code]

```
from scipy import *
from scipy import linalg
thetas = array([65.197917, 66.404320, 87.040195, 141.356044]) # directly observed values
A = zeros((5,4))
A[0:4,0:4] = eye(4)
A[4,0:4] = 1
b = zeros((5,1))
b[0:4,0] = thetas
b[4] = 360.
xbest = linalg.solve(A.T.dot(A), A.T.dot(b)) #
print("Corrections, in arc-seconds:\n")
print((xbest.T - thetas) * 60 ** 2)
```

We find \[A^T A = \begin{bmatrix}2 & 1 & 1 & 1\\1 & 2 & 1 & 1\\1 & 1 & 2 & 1\\1 & 1 & 1 & 2\end{bmatrix} \quad \text{and} \quad A^T b = \begin{bmatrix}425.197917\\426.40432\\447.040195\\501.356044\end{bmatrix}.\]

\[x _ {\text{closest}} = (A^T A)^{-1} (A^T b) = \frac{ 1 }{ 5 } \begin{bmatrix}4&-1&-1&-1\\-1&4&-1&-1\\-1&-1&4&-1\\-1&-1&-1&4\end{bmatrix} \begin{bmatrix}425.197917\\426.40432\\447.040195\\501.356044\end{bmatrix} = \begin{bmatrix}65.1982218\\66.4046248\\87.0404998\\141.3563488\end{bmatrix}. \]

We find the best correction is to add 1.097 arcseconds to each angle estimate. This makes intuitive sense, since without more information, we expect all four of the measurements to have the same kinds of errors, and our auxiliary condition that the angles must sum to 360 degrees also treats all four symmetrically.

Stations | Observed angle | Corrected angle | Correction |
---|---|---|---|

Joscelyne to Deepwater | 65° 11’ 52.500" | 65° 11’ 53.598" | 1.097" |

Deepwater to Deakyne | 66° 24’ 15.553" | 66° 24’ 16.649" | 1.097" |

Deakyne to Burden | 87° 2’ 24.703" | 87° 2’ 25.799" | 1.097" |

Burden to Joscelyne | 141° 21’ 21.757" | 141° 21’ 22.856" | 1.097" |

The method of least squares was developed almost simultaneously in France by Adrien-Marie Legendre (1805), in (what is now) Germany by Carl Gauss(1809), and in the United States by Robert Adrian (1808). (Adrian’s story is rarely told and somewhat controversial.) The very first public application of the method of least squares in 1805 was by Legendre to determine the figure of the earth and orbit of a comet. Laplace had previously attempted to solve the problem, and we will work with the data he collated. Since observations of natural variation in pendulum periods in 1672 started the debate, a number surveys and expeditions made careful measurements of the distance travelled to traverse 1 degree of latitude, including Maupertius’s expedition to Lapland (1736), the French Geodesic Mission to Peru (1739), and the Mason-Dixon observation in Pennsylvania (1768).

[ * Data : hide , shown as table , shown as CSV shown as QR * ]

```
# latitude L, Ds/DL, arc length
#
# Volume 2, book 3, section 41, page 445 of Bowditch's translation
# https://archive.org/stream/mcaniquecles02laplrich#page/444/mode/2up
#
# Formula: ds/dL = m sin(L)**2 + c
# and we wish to determine m and c from observations of the
#
# latitude L measured in grads (400 grads = 360 degrees = 2 pi rads)
#
# the length of one degree (ds/dL) is measured in double toises per grad
# (1 toises is 1.949 meters, so y*3.898)
#
# The last column is the actual arclength measured to estimate ds/dL,
# and not used in our calculations
#
0.00000, 25538.85, 3.4633
37.0093, 25666.65, 1.3572
43.5556, 25599.60, 1.6435
47.7963, 25640.55, 2.4034
51.3327, 25658.28,10.7487
53.0926, 25683.30, 3.2734
73.7037, 25832.25, 1.0644
```

Working with the approximate equation \[m \sin^2(\lambda) + c \approx \frac{ds}{d\lambda},\] where \(m\) and \(c\) are unknowns, Laplace was able to use existing observations of lattitude \(\lambda\) and curvature \(ds/d\lambda\) to obtain 7 linear equations. But being overdetermined, these equation’s didn’t have a simultaneous solution, and Laplace didn’t have a good method for solving these 7 equations for the 2 unknown variables \(c\) and \(m\), though he tried several methods.

Let’s consider the general problem of finding a line through several data points. Suppose we have an equation of the form \(m u + c = v\), and we observe a bunch of data points \((u _ i, v _ i)\) that we think should occur on this line, except for errors in \(v _ i\). \[\begin{align*}
m u _ 1 + c &= v _ 1 \\
m u _ 2 + c &= v _ 2 \\
\vdots \\
m u _ i + c &= v _ i \\
\vdots
\end{align*}\] The parameters \(m\) and \(c\) are our unknowns that we’d like to solve for, given our data points, but we have way more than 2 equations for our 2 unknowns, so we’ll use linear least-squares. If we re-arrange our equations into matrix form, \[\begin{gather*}
\begin{bmatrix} u _ 1 & 1 \\ u _ 2 & 1 \\ \vdots & \vdots \\ u _ i & 1 \\ \vdots & \vdots \end{bmatrix}
\begin{bmatrix} m \\ c \end{bmatrix}
=
\begin{bmatrix} v _ 1 \\ v _ 2 \\ \vdots \\ v _ i \\ \vdots \end{bmatrix}
\end{gather*}\] So, then \[\begin{gather*}
A = \begin{bmatrix} u _ 1 & 1 \\ u _ 2 & 1 \\ \vdots & \vdots \\ u _ i & 1 \\ \vdots & \vdots \end{bmatrix}, \quad
x = \begin{bmatrix} m \\ c \end{bmatrix}, \quad
b = \begin{bmatrix} v _ 1 \\ v _ 2 \\ \vdots \\ v _ i \\ \vdots \end{bmatrix}.
\end{gather*}\] We can now calculate \(A^T A\) and \(A^T b\) and then solve the normal equation \(A^T A x = A^T b\) for \(x\), i.e. for the parameters \(m\) and \(c\), using the python code

`x = scipy.linalg.solve(A.T.dot(A), A.T.dot(b))`

to determine the best fit.

[Show code]

```
from scipy import *
from numpy import *
from matplotlib.pyplot import *
fname = "LaplaceEarth.csv"
data = loadtxt(fname, delimiter=',', comments='#')
n = max(data.shape)
latitude = data[:,0:1]*(360./400)*(pi/180) # convert grads to rads
# using '0:1' keeps result a 2-d array
u = sin(latitude)**2
A = hstack([u, ones((n, 1))])
b = data[:,1] # b becomes is a 1-d array
m, c = linalg.solve( A.T.dot(A), A.T.dot(b))
print("Data: \n", data)
print(A.T.dot(A))
print(A.T.dot(b))
print("=== Laplace 1 ===")
print("\tu*25525.20 + 308.202 = v")
print("=== best least squares fit ===")
print("\tu*%f + %f = v"%(m,c))
plot(u, b, 'bo', markersize=10, label="Laplace's Data")
plot(u, u*m + c, 'k-', linewidth=2, label='Linear least squared error')
plot(u, u*25525.20 + 308.202, 'g:', linewidth=2, \
label="Laplace's weighted least error")
legend(framealpha=0.0)
xlim(0,1)
xlabel('Transformed Lattitude ($\sin^2 \lambda$)',fontsize=18)
ylabel('Inverse curvature ($ds/d\lambda$)\nin double-toises per gradian', \
fontsize=18)
# png figures work well in web browsers
savefig('LaplaceEarthLstSq.png',transparent=True,bbox_inches='tight')
# pdf figures work well in LaTex documents and for publishing
# because they are vector-base, and can be rescaled without losing
# quality.
savefig('LaplaceEarthLstSq.pdf',bbox_inches='tight')
```

When we evaluate the normal equations for Laplace’s data, taking \(u_i = \sin^2 \lambda_i\) and \(v_i = \left( ds/d\lambda\right) _ i\), we find \[A^T A = \begin{bmatrix} 7. & 3.07473 \\ 3.07473 & 1.74303 \end{bmatrix}, \quad A^T b = \begin{bmatrix} 179619.48 \\ 79022.77 \end{bmatrix},\] which is equivalent to the 2 by 2 linear system \[\begin{align} 3.07473 m + 7 c &= 179619.48,\\ 1.74303 m + 3.07473 c &= 79022.77. \end{align}\] If we now solve this linear system, we find the best least-squares fit is \(c \approx 25519\), \(m \approx 319.60\), and the regression line through the data is given as \[319.60 \sin^2 \lambda + 25519 = \frac{ds}{d\lambda} .\] Laplace used his own approach to minimizing the error, leading to the line \(308.202 \sin^2 \lambda + 25525 = ds/d\lambda\).

When we plot the data and the regression line together, as in the figure above, we see the least-squares regression line does a good job of bisecting the data, but the data itself has enough variation to be scattered away from the line. Though calculated differently, there is essentially no difference between Laplace’s result and ours when allowing for the uncertainty of the data. The data’s trend is clear – inverse curvature increases with lattitude, so curvature decreases with latitude, implying that the earth bulges at the equator and is flatter at the poles.

Using our regression-line fit obtained above, and solving for our original variables, the elliptical distortion \(\epsilon = 2m /(2m+3 c)\approx 0.008280\). Hooke and Newton were correct – the earth bulges at the equator. We also derive the estimate, after converting back to standard units, of \(r \approx 6.38\) Mm, an estimate within 0.3 % of the modern value. These calculations do not settle the problem of determining the figure of the earth. There were significant errors in these data (as you might have guessed from their plot), and the method of local approximation to the curvature is not making full use of the data available. In 1826, with the help of a much-improved data set and a little creativity, George Airy was able to obtain values that we can consider accurate by modern standards (see Exercises). We now know the earth’s radius is about 6.36 megameters.

The linear least squares method for approximating solution to overdetermined linear systems has several advantages over other approaches like minimizing the total error or the maximum error. First, the square error forms a convex function, and thus, there is always a unique local minimum that is also the global minimum. This ensures reproducibility, and removes ambiguity. Second, the least-squares solution can be calculated directly, with greater accuracy and efficiency than other optimization methods that require iterative evaluation for convergence. Third, the method can be theoretically justified in cases when the errors in the observations are normally distributed.

But don’t mistake these advantages for “proof” that least squares methods and linear regression are universally the “right” approach to estimation problems. Least squares methods are a good general-purpose approach and often give answers very similar to alternatives. But for problems where the value of greater accuracy justifies the time and effort or where there are strong nonlinearities, we can get better results by modelling the actual error mechanisms involved and using alternative optimization algorithms. And least-squares will *always* gives you a solution, even if the question is nonsense! You should never accept it’s results without some sanity-checking. We’ll see more examples as things go along.

There is a second, parallel track in the story of the determination of the earth’s figure that we should atleast mention. While much work on measurement and fitting was going on, there was also substantial theoretical work under way, in an effort to explain the figure of the earth using the laws of nature. Newton’s work was the first step on this path, but was unsatisfactory to a number of attentive readers. Subsequent scholarly studies showed there was not a single equilibrium figure for a spinning blob of fluid with gravity, but a menagery of equilibria figures. The story unfolded over the next 200 hundred years of study, as new equilibria were discovered, and then shown to deform into some other new figure as the rate of spin of the blob increased. These analyses were some of the first examples of what we now call “bifurcation theory”. Many possibly familiar names, including Clairaut, Maclaurin, Simpson, Jacobi, Poincare, and Liapounov appear in the story, and the curious may find it rewarding to seek out.

(Gauss) Find the least-squares solution for \(x\), \(y\), and \(z\) of the following over-determined linear system. \[\begin{align*} x - y + 2 z =& 3, \\ 3x + 2y - 5 z =& 5, \\ 4x + y + 4 z =& 21, \\ -2x + 6y + 6 z =& 28. \end{align*}\]

The calculations Chauvenet actually performs in his book are a little more complicated than described above. Based on other information, some measurements were expected to be more accurate than others. Specifically, the measurement of the angle between Burden to Jescelyne is believed to be less accurate than the other three angles. It would be nice to be able to take this inaccuracy into account when calculating our least-squares solution. And there is indeed a way – the method of weighted linear least squares. We can weight the errors produced by each equation such that equations measured to greater accuracy are given larger weights and equations with less accuracy are given smaller weights.

Find the matrix form for the equations of the method of weighted linear least squares by determining the vector \(x_{\rm closest}\) that minimizes the total weighted square error \[E(x) = \sum_{i} W_{ii} ( (Ax)_i - b_i)^2,\] where \(W\) is a non-negative diagonal matrix.

Chauvenet weights the first 3 measurements as 3 times more accurate than the 4th, and the observation that angles must sum to 360 degrees with (say) 1,000 times more accurate. Use the method of weighted linear least equations obtained above to find the angles between stations to the nearest thousands of an arc-second.

Replicate the calculations in the published version of Chauvenet’s survey problem using the weighted normal equations. Explain any discrepancies with Chauvenet’s result.

In a 2009 article, Head and other authors use a fossil vertebrae to deduce that about 60 million years ago, there existed a neotropical snake that grew to 13 meters in length and weighted more than 1,000 kilograms (42 feet and more than 2,000 pounds). They called this snake

*Titanoboa cerrejonensis*. This discovery and others like it inspired the PBS Secrets of the Dead episode Graveyard of the giant beasts in 2016, also found on (youtube). Let’s see if we agree with the hype by looking through this scientific paper and checking their numbers.Fit a line through the data given below.

What were the two regression lines the team obtained for the relationship between between vertebra width and total body length using the extreme values of 60% and 65% for vertebral position?

Create a plot of width vs total body length using known 60% and 65% position data. Add to this plot the regression lines from parts (a) and (b). Make sure your axes scales are large enough to show the predicted body length of

*Titanoboa*, given that the discovered vertebra was 12 cm wide.Is your line consistent with the lines found by the team?

Do you think Head et al’s conclusion about body length are supported by the evidence, are contradicted by the evidence, or that the evidence is inconclusive? Defend your opinion.

[ * Data : hide , shown as table , shown as CSV shown as QR * ]

```
# VW 65% (mm), VW 60% (mm),TBL(mm),SVL(mm)
# These are the data column headings ..
#
# Extant snake data used to estimate Titanoboa's TBL and SVL
# VW = (V)ertabral (W)idth at a fixed precentage done spine
# TBL = (t)otal (b)ody (l)ength
# SVL = (s)nout (v)ent (l)ength
#
# @article{bib:Head2009,
# title={Giant boid snake from the Palaeocene neotropics reveals hotter past equatorial temperatures},
# volume={457},
# ISSN={1476-4687},
# url={http://dx.doi.org/10.1038/nature07671},
# DOI={10.1038/nature07671},
# number={7230},
# journal={Nature},
# publisher={Springer Nature},
# author={Head, Jason J. and Bloch, Jonathan I. and Hastings, Alexander K. and Bourque, Jason R. and Cadena, Edwin A. and Herrera, Fabiany A. and Polly, P. David and Jaramillo, Carlos A.},
# year={2009},
# month={Feb},
# pages={715-717},
# }
#
# Data extracted from Supplement, Table 2
#
6.530,6.380,1535,1423
19.04,19.66,2040,1900
21.42,23.50,2480,2390
12.64,12.92,1606,1355
27.08,27.69,3220,2970
28.01,28.20,3434,3129
10.96,11.22,867,803
11.80,12.20,1450,1216
10.54,10.79,1734,1360
17.97,17.61,2330,1960
11.27,11.54,1380,1195
10.22,10.65,1210,1070
11.58,11.66,1700,1450
10.85,11.20,1770,1385
15.08,15.09,2250,1950
10.66,10.80,1720,1490
17.26,17.10,2470,2110
23.99,24.32,3320,2910
17.24,17.67,2510,2190
23.12,24.33,2690,2310
15.69,16.15,1760,1610
```

(Hard) In 1807, Nathaniel Bowditch observed a comet in the New England sky and made a series of astronomical observations. He then attempted to calculate the orbit of the comet using the methods described by Laplace. In the process, he developed a system of 56 equations for 5 unknowns.

- Use linear least squares to estimate the values of these 5 parameters.
- Discuss the relationship between your estimates and Bowditch’s.
- Show that even though Bowditch’s estimates do not minimize the sum-of-squared errors, they do do a better job minimizing the sum-of-errors.

[ * Data : hide , shown as table , shown as CSV shown as QR * ]

```
# eq #, c_d, c_t, c_p, c_n, c_i, constant
# data from pages 7, 8, and 9 of Bowditch, 1807
# http://www.jstor.org/stable/25057874
# Special thanks to ZY and HP for transcription
#
1, -315, 2, 129, 239, -234, 38
2, -317, -1, 138, 242, -248, -88
3, -319, -4, 145, 244, -262, -98
4, -323, -8, 151, 246, -279, -84
5, -330, 2, 198, 249, -412, 21
6, -329, 14, 207, 246, -448, -75
7, -327, 17, 208, 242, -467, 16
8, -324, 26, 212, 240, -485, -164
9, -321, 31, 213, 235, -505, -116
10, -318, 40, 215, 231, -526, -71
11, -314, 48, 215, 225, -547, 86
12, -308, 57, 214, 218, -567, -278
13, -304, 67, 213, 211, -589, -158
14, -199, 202, 157, 108, -789, -413
15, -140, 262, 119, 59, -851, 385
16, -122, 280, 99, 37, -876, 120
17, -95, 305, 81, 16, -895, -304
18, -14, 378, 12, -55, -955, 1081
19, 47, 429, -40, -105, -987, 775
20, 187, 533, -165, -217,-1033, 263
21, 267, 588, -240, -280,-1045, 164
22, 306, 613, -282, -314,-1048, 757
23, 509, 726, -483, -464,-1020, 204
24, 542, 742, -520, -489,-1007, 238
25, 582, 761, -562, -517, -992, 614
26, 690, 804, -676, -587, -929, 47
27, 724, 813, -713, -609, -905, 1628
28, 944, 789,-1006, -716, -496, 69
29, 226, 957, -336, -214, 43, -180
30, 237, 950, -337, -212, 41, -235
31, 247, 945, -339, -211, 39, -301
32, 258, 933, -341, -209, 38, -143
33, 333, 865, -361, -208, 28, -262
34, 349, 849, -367, -210, 27, -197
35, 359, 839, -372, -211, 27, -386
36, 366, 831, -374, -212, 27, -180
37, 375, 824, -378, -213, 28, -98
38, 383, 814, -382, -215, 29, -127
39, 390, 804, -386, -217, 28, -92
40, 397, 794, -391, -220, 31, 74
41, 403, 784, -395, -223, 32, -108
42, 463, 702, -435, -241, 74, 244
43, 476, 671, -447, -245, 97, -294
44, 478, 659, -451, -246, 106, -197
45, 483, 651, -454, -246, 115, 413
46, 491, 617, -464, -246, 146, -803
47, 493, 593, -468, -242, 170, -200
48, 488, 541, -468, -229, 223, -177
49, 484, 516, -466, -220, 252, 287
50, 477, 500, -464, -215, 267, 424
51, 447, 428, -440, -176, 342, 565
52, 436, 412, -434, -167, 355, 101
53, 427, 397, -426, -157, 370, -143
54, 395, 351, -398, -123, 412, -589
55, 384, 338, -387, -111, 425,-1082
56, 206, 167, -219, 56, 539, -229
```

Laplace proposed that the equations for the data-fitting should be weighted in proportion to the total length of arc measured, reasoning that the larger the arc, the more accurate the estimate of the length of a single degree. If Laplace’s data is fit using weighted linear least squares (see exercise in previous lecture), what estimates do we get for \(m\) and \(c\)?

(From Sand, 1995) The following data set illustrates Bergmann’s rule that body size within a species should increase with lattitude. Using the linear least-square method, find a regression line through the data that is the best fit.

[ * Data : hide , shown as table , shown as CSV shown as QR * ]

```
# This data set illustrates Bergmann's rule of
# animal size increasing with lattitude with Swedish Moose
# https://en.wikipedia.org/wiki/Bergmann's_rule
#
# @article{bib:Sand1995,
# title={Geographical and latitudinal variation in growth
# patterns and adult body size of Swedish moose (Alces alces)},
# volume={102},
# ISSN={1432-1939},
# url={http://dx.doi.org/10.1007/BF00341355},
# DOI={10.1007/bf00341355},
# number={4},
# journal={Oecologia},
# publisher={Springer Nature},
# author={Sand, H�kan and Cederlund, G�ran and Danell, Kjell},
# year={1995},
# pages={433-442},
# }
#
# Data from tables 1 and 2
#
# Columns ...
# Lattitude (degrees N), mean male adult mass in herd (kg)
58.0,214.5
57.7,216.3
58.0,217.8
57.9,214.0
59.8,215.1
61.6,241.9
62.0,247.8
62.7,235.2
64.0,259.2
65.5,250.0
66.0,246.4
66.0,262.2
```

- (From Rosner, 2000) In newborn babies, blood-pressure depends on the weight of the baby as well as the age of the baby when the blood pressure is recorded. Below is a data set from 16 infants. Use linear least-squares to fit a model of the form \(b = c_a a + c_w w + c_0\) where \(a\) is the baby’s age, \(w\) is the baby’s weight, \(b\) is the baby’s blood pressure, and \(c_i\) are the model parameters to be estimated. Hint: the method for one independent variable works just the same with 2 independent variables, but with the matrix \(A\) having 3 columns instead of 2.

[ * Data : hide , shown as table , shown as CSV shown as QR * ]

```
#patient id, age (days), birthweight (oz), systolic blood pressure (mm Hg)
#Data from Rosner, Fundamentals of Biostatistics, page 468
01, 3, 135, 89
02, 4, 120, 90
03, 3, 100, 83
04, 2, 105, 77
05, 4, 130, 92
06, 5, 125, 98
07, 2, 125, 82
08, 3, 105, 85
09, 5, 120, 96
10, 4, 090, 95
11, 2, 120, 80
12, 3, 095, 79
13, 3, 120, 86
14, 4, 150, 97
15, 3, 160, 92
16, 3, 125, 88
```

The method of linear least squares can be applied to fitting certain polynomial curves to data. The parabola \(y = p_0 + p_1 x + p_2 x^2\) can be fit to data by letting the columns of the matrix \(A\) be the monomial powers \(x^0\), \(x^1\), and \(x^2\), with the coefficients \(p_i\) being our unknowns. Use the method of linear least squares to find the parabola that best fits the points \((-2,8)\), \((0,5)\), \((2, 3)\), \((4,4)\), and \((6,9)\).

Fit a circle to the following data set by applying the method of linear least squares to estimate the parameters \(a\), \(b\), and \(c\) in the circle equation \[x^2 + y^2 = a + b x + c y.\]

[ * Data : hide , shown as table , shown as CSV shown as QR * ]

```
4.699 -15.127
5.003 -13.942
5.191 -15.625
5.309 -15.477
5.367 -15.923
5.399 -14.886
5.461 -13.778
5.563 -13.871
5.858 -16.825
6.305 -12.915
6.435 -17.143
6.750 -17.045
6.789 -16.919
6.882 -13.101
6.977 -16.659
7.154 -13.088
7.228 -12.828
7.855 -12.871
8.072 -16.715
8.468 -14.796
8.527 -13.972
8.690 -14.195
8.835 -15.526
8.853 -13.606
8.898 -15.991
8.946 -15.823
9.021 -15.789
9.038 -16.106
9.051 -16.004
9.094 -15.117
```

George Airy (1826), like Legendre before him, recognized that the local approximation wasn’t a great match to the long transects contemporary surveyers were making in their efforts to improve accuracy. Instead of working directly with point estimates of curvature, Airy compared estimates of arc length between different latitudes. Airy’s data (given below) is of the form \((\phi, \psi, d)\), where \(\phi\) is the starting latitude, \(\psi\) is the ending latitude, and \(d\) is the distance measured between the two latitudes. Let’s try a version of his method and see what we get.

- Integrate our McLaurin series approximation \[\frac{ds}{d\lambda} \approx c + m \sin^{2}{\left (\lambda \right )},\] to get a formula for \(s(\lambda)\). Use trigonometry identities if necessary to express your formula in terms of \(\sin(2 \lambda)\).
- Use your formula for \(s(\lambda)\) to predict the arc measurements \(d(\phi,\psi)\). Your prediction should be linear in parametres \(c\) and \(m\), although it will
**not**be linear in \(\phi\) or \(\psi\). - Use linear least squares and Airy’s data to get estimates for the values of \(c\) and \(m\).
- Based on your estimates of \(c\) and \(m\), what’s the radius of the earth? How accurate is this estimate compared modern estimates?

[ * Data : hide , shown as table , shown as CSV shown as QR * ]

```
# lower latitude, upper latitude, measured distance
#
# These are the improved data used by Bowditch and Airy
# in their estimates of the figure of the earth.
# https://archive.org/stream/mcaniquecles02laplrich#page/452/mode/2up
#
# latitudes are in radians
# the measured distance between latitudes is in meters
#
-7.3313525e-04, 5.36780860e-02, 3.44747088e+05
1.42430454e-01, 3.15146346e-01, 1.09477454e+06
6.74841785e-01, 8.90744298e-01, 1.37446573e+06
8.83453039e-01, 9.33023396e-01, 3.15927029e+05
1.14362830e+00, 1.17193793e+00, 1.80813456e+05
```

(Medium) One of the sources of error in Airy’s approximation is the use of only the first two terms in the series approximation of \(ds/d\lambda\).

- Derive the 3rd term in the McLaurin series of \(ds/d\lambda\) for small \(\epsilon\).
- Integrate this improved series to find an improved approximation to \(s(\lambda)\).
- Using a modern approximation for the earth’s radius, plot the log of the sum of the squared errors between the predicted and observed values of \(d\) as a function of \(\epsilon\) over the range \([0,0.01]\).
- Which value of \(\epsilon\) minimizes this log sum of squared errors? (Note: because our formula for \(s(\lambda)\) is now a nonlinear function of \(\epsilon\), we can no longer use linear least squares to directly estimate its value.)