Solving Systems of Linear Equations (1)

Least Squares Analysis

This article discusses solving one particular case of systems of linear equations, namely considering the system

Ax=bAx=b

where ARmn,bRm,mn,rank(A)=nA\in R^{m*n},b\in R^{m},m\ge n,rank(A)=n. In this case, the number of unknowns is smaller than the number of equations, so it is very likely that bb does not lie in the range space of AA, meaning the system has no solution. However, in this situation we can obtain the least squares solution of this linear system, that is, there exists xx^{*} such that for all xRnx\in R^{n} we have

Axb2Axb2||Ax-b||^{2}\ge ||Ax^{*}-b||^{2}

We then call xx^{*} the least squares solution of this linear system. When bb lies in the solution space of AA, xx^{*} is naturally the solution of the system. The least squares solution can be computed directly via the following formula:

x=(ATA)1ATbx^{*}=(A^{T}A)^{-1}A^{T}b

The proof is as follows. Construct the objective function

f(x)=Axb2=(Axb)T(Axb)=12xT(2ATA)xxT(2ATb)+bTbf(x)=||Ax-b||^{2}\\ =(Ax-b)^{T}(Ax-b)\\ =\frac{1}{2}x^{T}(2A^{T}A)x-x^{T}(2A^{T}b)+b^{T}b

Clearly the function ff is a quadratic form, and since rank(A)=nrank(A)=n, this quadratic form is positive definite. Using the first-order necessary condition for a local minimizer, we find that the minimizer satisfies

f(x)=2ATAx2ATb=0\nabla f(x)=2A^{T}Ax-2A^{T}b=0

The unique solution of this equation is x=(ATA)1ATbx^{*}=(A^{T}A)^{-1}A^{T}b, which is the least squares solution. The least squares method is a very convenient algorithm for applications such as line fitting.

Recursive Least Squares

The previous section introduced the least squares method, which we can use for line fitting. If we want to add several groups of data to the fitted data, we can do so via the recursive least squares method, that is, by making partial corrections based on the previous fitting result. In other words, we use the least squares solution xx^{*} obtained from the previous fit to derive the least squares solution xx^{*} after the data points have been added.

For a given optimization problem, in order to find a suitable xx that minimizes A0xb(0)2||A_{0}x-b^{(0)}||^{2}, the solution to this problem is known to be x(0)=G01A0Tb(0)x^{(0)}=G_{0}^{-1}A_{0}^{T}b^{(0)}, where G0=A0TA0G_{0}=A^{T}_{0}A_{0}. If new data is added, represented by the matrix A1A_{1} and the vector b1b_{1}, then the problem becomes finding xx that minimizes

[A0A1]x[b(0)b(1)]2|| \begin{bmatrix} A_{0}\\A_{1} \end{bmatrix}x-\begin{bmatrix} b^{(0)}\\b^{(1)} \end{bmatrix}||^{2}

Its iterative formula is:

G1=G0+A1TA1x(1)=x(0)+G11A1T(b(1)A1x(0))G_{1}=G_{0}+A_{1}^{T}A_{1}\\ x^{(1)}=x^{(0)}+G_{1}^{-1}A_{1}^{T}(b^{(1)}-A_{1}x^{(0)})

The proof is omitted for now. In the general case, the iterative formula is:

Gk+=Gk+Ak+1TAk+1x(k+1)=x(k)+Gk+11Ak+1T(b(k+1)Ak+1x(k))G_{k+}=G_{k}+A_{k+1}^{T}A_{k+1}\\ x^{(k+1)}=x^{(k)}+G_{k+1}^{-1}A_{k+1}^{T}(b^{(k+1)}-A_{k+1}x^{(k)})

For solving systems of linear equations, we will later introduce the minimum-norm solution and the general solution method, as well as the pseudoinverse and related topics. To be continued…