Derivation of SVM (3)


In the previous article, we introduced the derivation of hard-margin SVM. This article continues with the mathematical derivation of soft-margin SVM, which allows some samples to be misclassified when the data are not linearly separable. Concretely, soft-margin SVM permits some samples to violate the constraint $y_{i}(w\cdot x_{i}+b)\ge 1$.

The optimization objective can be written as:

$$\min_{w,b}\quad \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{m}\mathrm{loss}\big(y_{i}(w\cdot x_{i}+b)\big)$$

Here, $C>0$ is a constant that controls how heavily constraint violations are penalized. For $\mathrm{loss}(\cdot)$ we can use the hinge loss, i.e., $\mathrm{loss}_{\mathrm{hinge}}(z)=\max(0,1-z)$.

Thus, the optimization objective can be rewritten as:

$$\min_{w,b}\quad \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{m}\max\big(0,\,1-y_{i}(w\cdot x_{i}+b)\big)$$
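As a sanity check, this unconstrained hinge-loss objective is easy to evaluate numerically. Below is a minimal sketch (assuming NumPy arrays `X` of shape `(m, d)`, labels `y` in $\{-1,+1\}$, and a penalty `C`; the function name `soft_margin_objective` is just for illustration):

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Evaluate (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b))."""
    margins = y * (X @ w + b)               # y_i (w . x_i + b) for every sample
    hinge = np.maximum(0.0, 1.0 - margins)  # per-sample hinge loss
    return 0.5 * np.dot(w, w) + C * hinge.sum()
```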

By introducing slack variables $\xi_{i}\ge 0$, the problem can be rewritten as:

$$\begin{aligned}
\min_{w,b,\xi}\quad & \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{m}\xi_{i}\\
\text{s.t.}\quad & y_{i}(w\cdot x_{i}+b)\ge 1-\xi_{i}\\
& \xi_{i}\ge 0,\quad i=1,2,\dots,m
\end{aligned}$$
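To see that this constrained form is equivalent to the hinge-loss objective above, note that for fixed $w$ and $b$ the optimal slack variable is the smallest value satisfying both constraints:

$$\xi_{i}^{*}=\max\big(0,\,1-y_{i}(w\cdot x_{i}+b)\big),$$

which is exactly the hinge loss of sample $i$; substituting $\xi_{i}^{*}$ back into the objective recovers the previous formulation.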

Similar to hard-margin SVM, the above problem is also a quadratic programming problem. It can be converted into a corresponding dual problem using Lagrangian duality and solved with the SMO algorithm. The Lagrangian function corresponding to the above problem is:

$$L(w,b,\alpha,\xi,\mu)=\frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{m}\xi_{i}+\sum_{i=1}^{m}\alpha_{i}\big(1-\xi_{i}-y_{i}(w\cdot x_{i}+b)\big)-\sum_{i=1}^{m}\mu_{i}\xi_{i}$$

where $\alpha_{i}\ge 0$ and $\mu_{i}\ge 0$ are the Lagrange multipliers.

Setting the partial derivatives of $L$ with respect to $w$, $b$, and $\xi_{i}$ to zero, we get:

$$w=\sum_{i=1}^{m}\alpha_{i}y_{i}x_{i},\qquad 0=\sum_{i=1}^{m}\alpha_{i}y_{i},\qquad C=\alpha_{i}+\mu_{i}$$
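For completeness, these conditions come from differentiating $L$ term by term:

$$\frac{\partial L}{\partial w}=w-\sum_{i=1}^{m}\alpha_{i}y_{i}x_{i}=0,\qquad
\frac{\partial L}{\partial b}=-\sum_{i=1}^{m}\alpha_{i}y_{i}=0,\qquad
\frac{\partial L}{\partial \xi_{i}}=C-\alpha_{i}-\mu_{i}=0.$$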

Substituting these back into $L(w,b,\alpha,\xi,\mu)$, the original problem can be transformed into the dual problem:

$$\begin{aligned}
\min_{\alpha}\quad & \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_{i}\alpha_{j}y_{i}y_{j}\,x_{i}\cdot x_{j}-\sum_{i=1}^{m}\alpha_{i}\\
\text{s.t.}\quad & \sum_{i=1}^{m}\alpha_{i}y_{i}=0\\
& 0\le \alpha_{i}\le C,\quad i=1,2,\dots,m
\end{aligned}$$
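One step worth making explicit: after substituting $C=\alpha_{i}+\mu_{i}$, every term involving $\xi_{i}$ cancels,

$$C\xi_{i}-\alpha_{i}\xi_{i}-\mu_{i}\xi_{i}=(C-\alpha_{i}-\mu_{i})\,\xi_{i}=0,$$

and because $\mu_{i}=C-\alpha_{i}$ must be nonnegative, $\mu_{i}\ge 0$ together with $\alpha_{i}\ge 0$ yields the box constraint $0\le \alpha_{i}\le C$.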

The only difference from hard-margin SVM is that the constraint $\alpha_{i}\ge 0$ becomes $0\le \alpha_{i}\le C$. The SMO algorithm mentioned earlier can still be used to solve this problem; the only change is that the clipping step must bound each $\alpha_{i}$ from both sides.
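As a rough illustration of that two-sided clipping, here is a minimal sketch of the standard Platt-style bounds used when updating a pair of multipliers (the function name `clip_alpha_j` and argument names are ours; `alpha_j_new` is the unclipped update of the second multiplier):

```python
def clip_alpha_j(alpha_j_new, alpha_i, alpha_j, y_i, y_j, C):
    """Clip the updated alpha_j into the box [L, H] implied by
    0 <= alpha <= C and the equality constraint sum_i alpha_i y_i = 0."""
    if y_i != y_j:
        L = max(0.0, alpha_j - alpha_i)
        H = min(C, C + alpha_j - alpha_i)
    else:
        L = max(0.0, alpha_i + alpha_j - C)
        H = min(C, alpha_i + alpha_j)
    return min(max(alpha_j_new, L), H)
```

In the hard-margin case only the lower bound matters; here the upper bound $C$ enters because of $\mu_{i}\ge 0$.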

Later, we will introduce the kernel trick and commonly used kernel functions. To be continued…