Solving Systems of Linear Equations (2)


Minimum Norm Solution of a System of Linear Equations

In the previous blog post, we discussed one scenario of a system of linear equations, where the number of unknowns is less than the number of equations, and introduced the least squares method. In this post, we will cover another scenario, where the number of equations is less than the number of unknowns. In this case, the system has infinitely many solutions, but only one solution is closest to the origin, which is the solution with the smallest norm, also known as the minimum norm solution of the system of linear equations.

Consider the system of linear equations \(Ax=b\), where \(A\in R^{m\times n}\), \(m\le n\). We aim to find its minimum norm solution, which is equivalent to solving the following optimization problem:

\[\begin{align*} \text{minimize}\quad & \|x\| \\ \text{subject to}\quad & Ax=b \end{align*}\]

This problem falls under equality-constrained optimization problems and can be solved using the method of Lagrange multipliers. However, here we introduce an alternative approach.
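For reference, here is a brief sketch of the Lagrange-multiplier route (it arrives at the same formula derived below). Form the Lagrangian of the equivalent problem of minimizing \(\|x\|^{2}\):

\[L(x,\lambda)=x^{T}x+\lambda^{T}(Ax-b)\]

Setting \(\nabla_{x}L=2x+A^{T}\lambda=0\) gives \(x=-\tfrac{1}{2}A^{T}\lambda\). Substituting into the constraint \(Ax=b\) yields \(\lambda=-2(AA^{T})^{-1}b\), and hence \(x=A^{T}(AA^{T})^{-1}b\).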

The conclusion is that the minimum norm solution of this system is \(x^{*}=A^{T}(AA^{T})^{-1}b\). Below is the proof:

\[\|x\|^{2}=\|(x-x^{*})+x^{*}\|^{2}=\|x-x^{*}\|^{2}+\|x^{*}\|^{2}+2x^{*T}(x-x^{*})\]

Substituting \(x^{*}=A^{T}(AA^{T})^{-1}b\) into the rightmost term, and noting that any feasible \(x\) satisfies \(Ax=b\), we get \(x^{*T}(x-x^{*})=b^{T}(AA^{T})^{-1}A(x-x^{*})=b^{T}(AA^{T})^{-1}(b-b)=0\). Thus \(\|x\|^{2}=\|x-x^{*}\|^{2}+\|x^{*}\|^{2}\). Since \(\|x^{*}\|^{2}\) is a constant, and \(\|x-x^{*}\|^{2}>0\) whenever \(x\neq x^{*}\), we conclude that \(x=x^{*}\) is the unique minimizer of this optimization problem, i.e., the minimum norm solution.
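As a quick numerical check (a minimal NumPy sketch; the matrix and right-hand side are arbitrary illustrative values), we can compute \(x^{*}=A^{T}(AA^{T})^{-1}b\) directly, confirm it satisfies \(Ax=b\), and compare it against the pseudoinverse solution, which is also the minimum norm solution:

```python
import numpy as np

# Underdetermined system: 2 equations, 4 unknowns (illustrative values)
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 3.0]])
b = np.array([1.0, 2.0])

# Minimum norm solution: x* = A^T (A A^T)^{-1} b
# (solve with A A^T rather than explicitly inverting it)
x_star = A.T @ np.linalg.solve(A @ A.T, b)

# Feasibility: A x* = b
print(np.allclose(A @ x_star, b))                   # True

# The Moore-Penrose pseudoinverse gives the same minimum norm solution
print(np.allclose(x_star, np.linalg.pinv(A) @ b))   # True
```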

Kaczmarz Algorithm

The Kaczmarz algorithm is an iterative method for computing the minimum norm solution that avoids forming \((AA^{T})^{-1}\), making the computation more efficient. Assuming \(A\in R^{m\times n}\) with rows \(a_{1}^{T},\dots,a_{m}^{T}\), the iterative formula is given directly as follows. During the first \(m\) iterations, i.e., \(k=0,1,\dots,m-1\), we have:

\[x^{(k+1)}=x^{(k)}+\mu\left(b_{k+1}-a_{k+1}^{T}x^{(k)}\right)\frac{a_{k+1}}{a_{k+1}^{T}a_{k+1}}\]

For the \((m+1)\)-th iteration, the algorithm reuses the first row of \(A\) and the first element of \(b\), i.e.,

\[x^{(m+1)}=x^{(m)}+\mu\left(b_{1}-a_{1}^{T}x^{(m)}\right)\frac{a_{1}}{a_{1}^{T}a_{1}}\]

Every \(m\) iterations, the process cycles back to the first row. The parameter \(\mu\) is a step-size parameter of the iterative scheme. It can be proven that in the Kaczmarz algorithm, if \(x^{(0)}=0\), then as \(k\to\infty\), \(x^{(k)}\to x^{*}=A^{T}(AA^{T})^{-1}b\). The detailed proof is omitted here.
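The iteration above can be sketched in a few lines of NumPy (a minimal illustration, assuming \(\mu=1\) and reusing the same small example system; the row index simply cycles as described). Starting from \(x^{(0)}=0\), the iterates approach the minimum norm solution:

```python
import numpy as np

def kaczmarz(A, b, mu=1.0, n_iters=2000):
    """Cyclic Kaczmarz iteration starting from x = 0."""
    m, n = A.shape
    x = np.zeros(n)
    for k in range(n_iters):
        i = k % m                      # cycle through rows: after row m, reuse row 1
        a = A[i]
        x = x + mu * (b[i] - a @ x) * a / (a @ a)
    return x

A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 3.0]])
b = np.array([1.0, 2.0])

x_k = kaczmarz(A, b)
x_star = A.T @ np.linalg.solve(A @ A.T, b)  # closed-form minimum norm solution
print(np.allclose(x_k, x_star))             # True: iterates converge to x*
```

Each step projects the current iterate toward the hyperplane \(a_{i}^{T}x=b_{i}\); since \(x^{(0)}=0\) and every update is a multiple of some row \(a_{i}\), all iterates stay in the row space of \(A\), which is why the limit is the minimum norm solution.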

Future articles will introduce general solutions for systems of linear equations, pseudoinverse, and related topics. To be continued…