Solving Optimization Problems with Equality Constraints

Basic Concepts

This article discusses optimization problems of the following form

minimizef(x)subject toh(x)=0\begin{align*} minimize\quad f(x)\\ subject\ to\quad h(x)=0 \end{align*}

where xRn,f:RnR,h:RnRm,h=[h1,...,hm]T,mnx\in R^{n},f:R^{n}\to R,h:R^{n}\to R^{m},h=[h_{1},...,h_{m}]^{T},m\le n, and we assume the function hh is continuously differentiable, i.e. hC1h\in C^{1}. Let us introduce a few basic concepts:

Regular point: For a point xx^{**} satisfying the constraints h1(x)=0,...,hm(x)=0h_{1}(x^{*})=0,...,h_{m}(x^{*})=0, if the gradient vectors h1(x),...,hm(x)\nabla h_{1}(x^{*}),...,\nabla h_{m}(x^{*}) are linearly independent, then xx^{*} is said to be a regular point of the constraints.

Tangent space: The tangent space at a point xx^{*} on the surface S=xRn:h(x)=0S={x\in R^{n}:h(x)=0} is the set T(x)={y:Dh(x)y=0}T(x^{*})=\{ y:Dh(x^{*})y=0\} . We can see that the tangent space T(x^{_}) is the null space of the matrix Dh(x^{_}), i.e. T(x^{_})=N(Dh(x^{_})).

Normal space: The normal space at a point xx^{*} on the surface S=xRn:h(x)=0S={x\in R^{n}:h(x)=0} is the set N(x)={xRn:x=Dh(x)Tz,zRm}N(x^{*})=\{ x\in R^{n}:x=Dh(x^{*})^{T}z,z\in R^{m}\} . We can see that the normal space N(x^{_}) is the null space of the matrix Dh(x^{_}), i.e. N(x^{_})=R(Dh(x^{_})^{T}).

The Lagrange Condition

First consider an optimization problem with only two decision variables and one equality constraint. Let h:R2Rh:R^{2}\to R be the constraint function. We know that at a point xx in the domain of the function, the gradient h(x)\nabla h(x) is orthogonal to the level set of h(x)h(x) passing through that point. Choose a point x=[x1,x1]Tx^{*}=[x^{*}_{1},x^{*}_{1}]^{T} such that h(x)=0h(x^{*})=0 and h(x)0\nabla h(x^{*})\neq 0. The level set passing through the point xx^{*} is the set {x:h(x)=0}\{ x:h(x)=0\}. We can parameterize it within a neighborhood of xx^{*} using a curve x(t)x(t), where x(t)x(t) is a continuously differentiable vector function h:RR2h:R\to R^{2}:

x(t)=[x1(t),x1(t)]T,t(a,b),x=x(t),x˙(t)0,t(a,b)\begin{align*} x(t)=[x_{1}(t),x_{1}(t)]^{T},t\in (a,b),x^{*}=x(t^{*}),\dot{x}(t^{*})\neq 0,t^{*}\in (a,b) \end{align*}

Next, we can show that h(x)\nabla h(x^{*}) is orthogonal to x˙(t)\dot{x}(t^{*}). Since hh is the constant 0 along the curve {x(t):t(a,b)}\{x(t):t\in (a,b)\}, i.e. for all t(a,b)t\in (a,b) we have

h(x(t))=0h(x(t))=0

therefore for any t(a,b)t\in(a,b) we have

ddth(x(t))=0\frac{d}{dt}h(x(t))=0

Using the chain rule we obtain

ddth(x(t))=h(x(t))Tx˙(t)=0\frac{d}{dt}h(x(t))=\nabla h(x(t))^{T}\dot{x}(t)=0

Therefore h(x)\nabla h(x^{*}) and x˙(t)\dot{x}(t^{*}) are orthogonal. When xx^{*} is a minimizer of f:RR2f:R\to R^{2} subject to h(x)=0h(x)=0, we can show that f(x)\nabla f(x^{*}) is orthogonal to x˙(t)\dot{x}(t^{*}). Construct the composite function of tt:

ϕ(t)=f(x(t))\phi(t)=f(x(t))

It attains its minimum when t=tt=t^{*}. By the first-order necessary condition for unconstrained extremum problems, we know that

dϕdt(t)=0\frac{d\phi}{dt}(t^{*})=0

Using the chain rule we obtain

ddtϕ(t)=f(x(t))Tx˙(t)=f(x)Tx˙(t)=0\frac{d}{dt}\phi(t^{*})=\nabla f(x(t^{*}))^{T}\dot{x}(t^{*})=\nabla f(x^{*})^{T}\dot{x}(t^{*})=0

Therefore f(x)\nabla f(x^{*}) and x˙(t)\dot{x}(t^{*}) are orthogonal. We have already shown above that h(x)\nabla h(x^{*}) is orthogonal to x˙(t)\dot{x}(t^{*}), so the vectors f(x)\nabla f(x^{*}) and h(x)\nabla h(x^{*}) are parallel, and we can derive the Lagrange theorem for this case:

Lagrange theorem for n=2, m=3: Let the point xx^{*} be a minimizer of the function f:R2Rf:R^{2}\to R subject to the constraint h(x)=0,h:R2Rh(x)=0,h:R^{2}\to R. Then f(x)\nabla f(x^{*}) and h(x)\nabla h(x^{*}) are parallel, i.e. if h(x)0\nabla h(x^{*})\neq 0, then there exists a scalar λ\lambda^{*} such that

f(x)+λh(x)=0\nabla f(x^{*})+\lambda^{*}\nabla h(x^{*})=0

where λ\lambda^{*} is the Lagrange multiplier. Generalizing this theorem to the general case, i.e. when f:RnR,h:RnRm,mnf:R^{n}\to R,h:R^{n}\to R^{m},m\le n, we obtain: Lagrange theorem: Let xx^{*} be a local minimizer (or maximizer) of f:RnRf:R^{n}\to R subject to the constraint h(x)=0,h:RnRm,mnh(x)=0,h:R^{n}\to R^{m},m\le n. If xx^{*} is a regular point, then there exists λRm\lambda^{*}\in R^{m} such that

Df(x)+λTDh(x)=0D f(x^{*})+\lambda^{*T}D h(x^{*})=0

Second-Order Conditions

Second-order necessary condition: Let xx^{*} be a local minimizer of f:RnRf:R^{n}\to R subject to the constraint h(x)=0,h:RnRm,mn,f,hC2h(x)=0,h:R^{n}\to R^{m},m\le n,f,h\in C^{2}. If xx^{*} is a regular point, then there exists λRm\lambda^{*}\in R^{m} such that

  1. Df(x)+λTDh(x)=0TD f(x^{*})+\lambda^{*T}D h(x^{*})=0^{T} 2. For all yT(x)y\in T(x^{*}), we have yTL(x,λ)y0y^{T}L(x^{*},\lambda^{*})y\ge 0

Second-order sufficient condition: Suppose the functions f,hC2f,h\in C^{2}. If there exist a point xRnx^{*}\in R^{n} and λRm\lambda^{*}\in R^{m} such that

  1. Df(x)+λTDh(x)=0TD f(x^{*})+\lambda^{*T}D h(x^{*})=0^{T} 2. For all yT(x)y\in T(x^{*}), we have yTL(x,λ)y>0y^{T}L(x^{*},\lambda^{*})y> 0

then xx^{*} is a strict local minimizer of ff subject to the constraint h(x)=0h(x)=0.

This article introduced the method of Lagrange multipliers under equality constraints. Later we will also cover the method of Lagrange multipliers under inequality constraints, the KKT conditions, and more. To be continued…