Solving Linear Equations (1)


Least Squares Analysis

In this article, we discuss a specific case of linear equations: the linear system

$$Ax = b$$

where $A \in R^{m \times n}$, $b \in R^{m}$, $m \ge n$, and $\text{rank}(A) = n$. In this scenario, there are at least as many equations as unknowns, so it is highly likely that $b$ is not in the range space of $A$, meaning the system has no exact solution. However, we can still obtain the least squares solution of this linear system, i.e., there exists an $x^{*}$ such that for all $x \in R^{n}$, we have

$$||Ax - b||^{2} \ge ||Ax^{*} - b||^{2}$$

This $x^{*}$ is called the least squares solution of the linear system. When $b$ lies in the range space of $A$, $x^{*}$ is simply an exact solution of the system. The least squares solution can be computed directly by the following formula:

x=(ATA)1ATbx^{*} = (A^{T}A)^{-1}A^{T}b

The proof is as follows. Construct the objective function

$$f(x) = ||Ax - b||^{2} \\ = (Ax - b)^{T}(Ax - b) \\ = \frac{1}{2}x^{T}(2A^{T}A)x - x^{T}(2A^{T}b) + b^{T}b$$

Clearly, $f$ is a quadratic function. Since $\text{rank}(A) = n$, the quadratic form $x^{T}A^{T}Ax$ is positive definite, so $f$ is strictly convex and its stationary point is the unique global minimum. By the first-order necessary condition for a local minimum, the minimizer satisfies

$$\nabla f(x) = 2A^{T}Ax - 2A^{T}b = 0$$

Since $A^{T}A$ is invertible, the unique solution of this equation is $x^{*} = (A^{T}A)^{-1}A^{T}b$, which is the least squares solution. The least squares method is a convenient algorithm for applications such as line fitting.
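As a minimal numerical sketch of the formula above (assuming `numpy` and randomly generated data), we can solve the normal equations directly and cross-check against numpy's built-in least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # m = 6 equations, n = 3 unknowns, full column rank
b = rng.standard_normal(6)

# x* = (A^T A)^{-1} A^T b, solved via the normal equations A^T A x = A^T b
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against numpy's built-in least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_star, x_ref))  # True
```

Note that solving `A.T @ A @ x = A.T @ b` is preferable to explicitly inverting $A^{T}A$; in practice `np.linalg.lstsq` (based on an orthogonal factorization) is numerically safer still.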

Recursive Least Squares Method

The previous section introduced the least squares method, which can be used for line fitting. If we later want to add a few more data sets to the fit, we can use the recursive least squares method, which makes incremental corrections based on the previous result: the least squares solution $x^{*}$ obtained from the last fit is updated to yield the least squares solution after the new data points are added.

Consider the problem of finding $x$ that minimizes $||A_{0}x - b^{(0)}||^{2}$; its solution is $x^{(0)} = G_{0}^{-1}A_{0}^{T}b^{(0)}$, where $G_{0} = A_{0}^{T}A_{0}$. If new data is added, represented by a matrix $A_{1}$ and a vector $b^{(1)}$, the problem becomes finding $x$ such that

$$\left|\left| \begin{bmatrix} A_{0}\\A_{1} \end{bmatrix}x - \begin{bmatrix} b^{(0)}\\b^{(1)} \end{bmatrix} \right|\right|^{2}$$

is minimized. The iterative formula is:

$$G_{1} = G_{0} + A_{1}^{T}A_{1} \\ x^{(1)} = x^{(0)} + G_{1}^{-1}A_{1}^{T}(b^{(1)} - A_{1}x^{(0)})$$

The proof is omitted here. In general, the recursion is:

$$G_{k+1} = G_{k} + A_{k+1}^{T}A_{k+1} \\ x^{(k+1)} = x^{(k)} + G_{k+1}^{-1}A_{k+1}^{T}(b^{(k+1)} - A_{k+1}x^{(k)})$$
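The recursion above can be sketched numerically (assuming `numpy` and randomly generated data blocks): starting from the batch solution on $A_{0}, b^{(0)}$, we apply the update for each new block and verify that the result matches the batch least squares solution over all the data.

```python
import numpy as np

rng = np.random.default_rng(1)
A0 = rng.standard_normal((5, 3))
b0 = rng.standard_normal(5)

G = A0.T @ A0                        # G_0 = A_0^T A_0
x = np.linalg.solve(G, A0.T @ b0)    # x^(0), initial least squares solution

# Three new data blocks of two rows each (hypothetical example data).
blocks = [(rng.standard_normal((2, 3)), rng.standard_normal(2)) for _ in range(3)]
for Ak, bk in blocks:
    G = G + Ak.T @ Ak                                 # G_{k+1} = G_k + A_{k+1}^T A_{k+1}
    x = x + np.linalg.solve(G, Ak.T @ (bk - Ak @ x))  # x^{(k+1)} update

# The recursive result should equal the batch solution on all stacked data.
A_all = np.vstack([A0] + [Ak for Ak, _ in blocks])
b_all = np.concatenate([b0] + [bk for _, bk in blocks])
x_batch, *_ = np.linalg.lstsq(A_all, b_all, rcond=None)
print(np.allclose(x, x_batch))  # True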

In later articles on solving linear equations, we will also introduce the minimum norm solution and general solutions, as well as pseudoinverses. To be continued…