Solving Optimization Problems with Equality Constraints

Basic Concepts

This article discusses optimization problems of the following form

\begin{align*} \text{minimize}\quad & f(x)\\ \text{subject to}\quad & h(x)=0 \end{align*}

where $x\in R^{n},f:R^{n}\to R,h:R^{n}\to R^{m},h=[h_{1},...,h_{m}]^{T},m\le n$ , and we assume the function $h$ is continuously differentiable, i.e. $h\in C^{1}$ . Let us introduce a few basic concepts:

Regular point: For a point $x^{*}$ satisfying the constraints $h_{1}(x^{*})=0,...,h_{m}(x^{*})=0$, if the gradient vectors $\nabla h_{1}(x^{*}),...,\nabla h_{m}(x^{*})$ are linearly independent, then $x^{*}$ is said to be a regular point of the constraints.

Tangent space: The tangent space at a point $x^{*}$ on the surface $S=\{x\in R^{n}:h(x)=0\}$ is the set $T(x^{*})=\{ y:Dh(x^{*})y=0\}$. We can see that the tangent space $T(x^{*})$ is the null space of the matrix $Dh(x^{*})$, i.e. $T(x^{*})=N(Dh(x^{*}))$.

Normal space: The normal space at a point $x^{*}$ on the surface $S=\{x\in R^{n}:h(x)=0\}$ is the set $N(x^{*})=\{ x\in R^{n}:x=Dh(x^{*})^{T}z,z\in R^{m}\}$. We can see that the normal space $N(x^{*})$ is the range (column space) of the matrix $Dh(x^{*})^{T}$, i.e. $N(x^{*})=R(Dh(x^{*})^{T})$.
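To make these definitions concrete, here is a minimal sketch. It assumes a single constraint ($m=1$) given by the unit sphere in $R^{3}$, $h(x)=x_{1}^{2}+x_{2}^{2}+x_{3}^{2}-1$. This constraint, the point $x^{*}=[0,0,1]^{T}$, and the helper names are illustrative assumptions, not part of the discussion above. With one constraint, $x^{*}$ is regular exactly when $\nabla h(x^{*})\neq 0$, the normal space is the line spanned by $\nabla h(x^{*})$, and the tangent space is everything orthogonal to it.

```python
def grad_h(x):
    """Gradient of the assumed constraint h(x) = x1^2 + x2^2 + x3^2 - 1."""
    return [2.0 * xi for xi in x]


def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def split_tangent_normal(x_star, v):
    """Split v into a part in T(x*) and a part in N(x*) (single-constraint case)."""
    g = grad_h(x_star)
    gg = dot(g, g)
    if gg == 0.0:
        # With m = 1, regularity means the gradient is nonzero.
        raise ValueError("x_star is not a regular point")
    z = dot(g, v) / gg                      # coefficient z in normal = Dh(x*)^T z
    normal = [z * gi for gi in g]           # lies in R(Dh(x*)^T)
    tangent = [vi - ni for vi, ni in zip(v, normal)]  # satisfies Dh(x*) y = 0
    return tangent, normal


x_star = [0.0, 0.0, 1.0]   # a point on the sphere
v = [1.0, 2.0, 3.0]        # an arbitrary vector to decompose
tangent, normal = split_tangent_normal(x_star, v)
print(tangent)  # [1.0, 2.0, 0.0]
print(normal)   # [0.0, 0.0, 3.0]
```

At the north pole of the sphere the tangent space is the horizontal plane and the normal space is the vertical axis, which matches the printed decomposition.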

The Lagrange Condition

First consider an optimization problem with only two decision variables and one equality constraint. Let $h:R^{2}\to R$ be the constraint function. We know that at a point $x$ in the domain of the function, the gradient $\nabla h(x)$ is orthogonal to the level set of $h$ passing through that point. Choose a point $x^{*}=[x^{*}_{1},x^{*}_{2}]^{T}$ such that $h(x^{*})=0$ and $\nabla h(x^{*})\neq 0$. The level set passing through the point $x^{*}$ is the set $\{ x:h(x)=0\}$. We can parameterize it within a neighborhood of $x^{*}$ using a curve $x(t)$, where $x(t)$ is a continuously differentiable vector function $x:R\to R^{2}$:

\begin{align*} x(t)=[x_{1}(t),x_{2}(t)]^{T},\quad t\in (a,b),\quad x^{*}=x(t^{*}),\quad \dot{x}(t^{*})\neq 0,\quad t^{*}\in (a,b) \end{align*}
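As a concrete instance of such a parameterization, the sketch below assumes the constraint $h(x)=x_{1}^{2}+x_{2}^{2}-1$ (the unit circle) and the curve $x(t)=[\cos t,\sin t]^{T}$ on $(a,b)=(0,\pi)$. Both choices are illustrative assumptions. The code checks numerically that the curve stays on the level set, that its velocity never vanishes, and it also evaluates the inner product $\nabla h(x(t))^{T}\dot{x}(t)$ at sample points.

```python
import math


def h(x):
    """Assumed constraint function: h(x) = x1^2 + x2^2 - 1 (unit circle)."""
    return x[0] ** 2 + x[1] ** 2 - 1.0


def grad_h(x):
    """Gradient of the assumed constraint function."""
    return (2.0 * x[0], 2.0 * x[1])


def curve(t):
    """Assumed parameterization x(t) of the level set h(x) = 0."""
    return (math.cos(t), math.sin(t))


def curve_velocity(t):
    """Derivative of the parameterization with respect to t."""
    return (-math.sin(t), math.cos(t))


def inner_product(t):
    """Evaluate grad h(x(t))^T x_dot(t)."""
    g = grad_h(curve(t))
    v = curve_velocity(t)
    return g[0] * v[0] + g[1] * v[1]


# Sample points t = 0.1, 0.2, ..., 3.0, all inside (0, pi).
samples = [0.1 * k for k in range(1, 31)]

max_constraint_violation = max(abs(h(curve(t))) for t in samples)
min_speed = min(math.hypot(*curve_velocity(t)) for t in samples)
max_inner_product = max(abs(inner_product(t)) for t in samples)

print(max_constraint_violation < 1e-12)  # the curve stays on h(x) = 0
print(min_speed > 0.0)                   # the velocity never vanishes
print(max_inner_product < 1e-12)         # gradient and velocity are orthogonal
```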

Next, we can show that $\nabla h(x^{*})$ is orthogonal to $\dot{x}(t^{*})$. Since $h$ is identically zero along the curve $\{x(t):t\in (a,b)\}$, for all $t\in (a,b)$ we have

\begin{align*} h(x(t))=0 \end{align*}

Differentiating both sides with respect to $t$, for any $t\in(a,b)$ we have

\begin{align*} \frac{d}{dt}h(x(t))=0 \end{align*}

Using the chain rule we obtain

\begin{align*} \frac{d}{dt}h(x(t))=\nabla h(x(t))^{T}\dot{x}(t)=0 \end{align*}

Evaluating this identity at $t=t^{*}$ shows that $\nabla h(x^{*})$ and $\dot{x}(t^{*})$ are orthogonal. When $x^{*}$ is a minimizer of $f:R^{2}\to R$ subject to $h(x)=0$, we can show that $\nabla f(x^{*})$ is also orthogonal to $\dot{x}(t^{*})$. Construct the composite function of $t$:

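\begin{align*} \phi(t)=f(x(t)) \end{align*}

Because $x^{*}=x(t^{*})$ minimizes $f$ over the constraint set and every point $x(t)$ lies on that set, $\phi$ attains a minimum at $t=t^{*}$. By the first-order necessary condition for unconstrained extremum problems, we know that

\begin{align*} \frac{d\phi}{dt}(t^{*})=0 \end{align*}

Using the chain rule we obtain

\begin{align*} \frac{d\phi}{dt}(t^{*})=\nabla f(x(t^{*}))^{T}\dot{x}(t^{*})=\nabla f(x^{*})^{T}\dot{x}(t^{*})=0 \end{align*}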

Therefore $\nabla f(x^{*})$ and $\dot{x}(t^{*})$ are orthogonal. We have already shown above that $\nabla h(x^{*})$ is orthogonal to $\dot{x}(t^{*})$. Since both gradients lie in $R^{2}$ and are orthogonal to the same nonzero vector $\dot{x}(t^{*})$, the vectors $\nabla f(x^{*})$ and $\nabla h(x^{*})$ are parallel, and we can derive the Lagrange theorem for this case:

Lagrange theorem for n=2, m=1: Let the point $x^{*}$ be a minimizer of the function $f:R^{2}\to R$ subject to the constraint $h(x)=0,h:R^{2}\to R$. If $\nabla h(x^{*})\neq 0$, then $\nabla f(x^{*})$ and $\nabla h(x^{*})$ are parallel, i.e. there exists a scalar $\lambda^{*}$ such that

\begin{align*} \nabla f(x^{*})+\lambda^{*}\nabla h(x^{*})=0 \end{align*}

where $\lambda^{*}$ is the Lagrange multiplier. Generalizing this theorem to the general case, i.e. when $f:R^{n}\to R,h:R^{n}\to R^{m},m\le n$, we obtain:

Lagrange theorem: Let $x^{*}$ be a local minimizer (or maximizer) of $f:R^{n}\to R$ subject to the constraint $h(x)=0,h:R^{n}\to R^{m},m\le n$. If $x^{*}$ is a regular point, then there exists $\lambda^{*}\in R^{m}$ such that

\begin{align*} D f(x^{*})+\lambda^{*T}D h(x^{*})=0^{T} \end{align*}
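To see the Lagrange condition at work, consider a small example chosen here only for illustration: minimize $f(x)=x_{1}^{2}+2x_{2}^{2}$ subject to $h(x)=x_{1}+x_{2}-3=0$. Because $f$ is quadratic and $h$ is linear, the Lagrange condition together with the constraint forms a linear system in $(x_{1},x_{2},\lambda)$. The sketch below solves it with a small hand-written Gauss-Jordan routine using exact fractions. The routine `solve_linear_system` is a hypothetical helper written for this example, not a library function.

```python
from fractions import Fraction


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact rational arithmetic.

    Assumes the matrix is square and nonsingular (raises StopIteration otherwise).
    """
    n = len(b)
    # Build the augmented matrix [a | b] with Fraction entries.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a nonzero pivot in this column and swap it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot equals 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


# Unknowns are ordered as (x1, x2, lambda).
# Rows 1-2: Df(x) + lambda * Dh(x) = 0^T, i.e. 2*x1 + lambda = 0 and 4*x2 + lambda = 0.
# Row 3: the constraint x1 + x2 = 3.
a = [[2, 0, 1],
     [0, 4, 1],
     [1, 1, 0]]
b = [0, 0, 3]

x1, x2, lam = solve_linear_system(a, b)
print(x1, x2, lam)  # 2 1 -4

# Check the Lagrange condition at the solution: both components should be zero.
residual = (2 * x1 + lam, 4 * x2 + lam)
print(residual == (0, 0))  # True
```

At the computed point, $\nabla f(x^{*})=[4,4]^{T}$ and $\nabla h(x^{*})=[1,1]^{T}$ are parallel, with $\lambda^{*}=-4$. The Lagrange condition alone is only necessary, so a point found this way is a candidate that still has to be classified.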

Second-Order Conditions

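Second-order necessary condition: Let $x^{*}$ be a local minimizer of $f:R^{n}\to R$ subject to the constraint $h(x)=0,h:R^{n}\to R^{m},m\le n,f,h\in C^{2}$. If $x^{*}$ is a regular point, then there exists $\lambda^{*}\in R^{m}$ such that

1. $D f(x^{*})+\lambda^{*T}D h(x^{*})=0^{T}$
2. For all $y\in T(x^{*})$, we have $y^{T}L(x^{*},\lambda^{*})y\ge 0$

Here $L(x,\lambda)=F(x)+\lambda_{1}H_{1}(x)+\dots+\lambda_{m}H_{m}(x)$ is the Hessian with respect to $x$ of the Lagrangian function $l(x,\lambda)=f(x)+\lambda^{T}h(x)$, where $F$ is the Hessian of $f$ and $H_{k}$ is the Hessian of $h_{k}$.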

Second-order sufficient condition: Suppose the functions $f,h\in C^{2}$. If there exist a point $x^{*}\in R^{n}$ and $\lambda^{*}\in R^{m}$ such that

1. $D f(x^{*})+\lambda^{*T}D h(x^{*})=0^{T}$
2. For all $y\in T(x^{*})$, $y\neq 0$, we have $y^{T}L(x^{*},\lambda^{*})y> 0$

then $x^{*}$ is a strict local minimizer of $f$ subject to the constraint $h(x)=0$ .
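The role of the tangent space in the second-order conditions can be seen in a small example, assumed here purely for illustration: minimize $f(x)=-x_{1}x_{2}$ subject to $h(x)=x_{1}+x_{2}-2=0$. The candidate $x^{*}=[1,1]^{T}$ with $\lambda^{*}=1$ satisfies the Lagrange condition. The sketch below evaluates the quadratic form of $L(x^{*},\lambda^{*})$, taken as the Hessian of $f$ plus $\lambda^{*}$ times the Hessian of $h$, along a direction spanning $T(x^{*})$ and along a direction outside it. Since $n=2$ and $m=1$, the tangent space is one-dimensional, so checking a single spanning direction is enough.

```python
def quadratic_form(matrix, y):
    """Return y^T * matrix * y for a square matrix given as nested lists."""
    n = len(y)
    return sum(y[i] * matrix[i][j] * y[j] for i in range(n) for j in range(n))


# Assumed example problem: f(x) = -x1 * x2, h(x) = x1 + x2 - 2.
x_star = (1.0, 1.0)
lam_star = 1.0

# First-order (Lagrange) condition: Df(x*) + lambda* Dh(x*) = 0^T.
df = (-x_star[1], -x_star[0])   # gradient of f at x*
dh = (1.0, 1.0)                 # gradient of h (constant, since h is linear)
lagrange_residual = tuple(dfi + lam_star * dhi for dfi, dhi in zip(df, dh))

# Hessian of the Lagrangian: Hessian of f plus lambda* times Hessian of h.
hess_f = [[0.0, -1.0], [-1.0, 0.0]]
hess_h = [[0.0, 0.0], [0.0, 0.0]]   # zero because h is linear
lagr_hess = [[hess_f[i][j] + lam_star * hess_h[i][j] for j in range(2)]
             for i in range(2)]

tangent_dir = (1.0, -1.0)   # spans T(x*) = {y : y1 + y2 = 0}
normal_dir = (1.0, 1.0)     # not in T(x*)

print(lagrange_residual)                       # (0.0, 0.0)
print(quadratic_form(lagr_hess, tangent_dir))  # 2.0
print(quadratic_form(lagr_hess, normal_dir))   # -2.0
```

The matrix is indefinite on all of $R^{2}$, yet its quadratic form is positive on the tangent space, so the sufficient condition holds and $x^{*}$ is a strict local minimizer of this example problem. Only directions in $T(x^{*})$ matter.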

This article introduced the method of Lagrange multipliers under equality constraints. Later we will also cover the method of Lagrange multipliers under inequality constraints, the KKT conditions, and more. To be continued…

2018 · 05 · 18