Deriving the SVM (3)
The previous posts covered the derivation of the hard-margin SVM. This post continues with the mathematical derivation of the soft-margin SVM, which allows some samples to be misclassified when the data is not linearly separable. The soft-margin SVM permits some samples to violate the constraint .
We can write the optimization objective as:
Here is a constant that measures how much constraint violation we tolerate, and the function can be the function, i.e. .
So the optimization objective can be written as:
Introducing “slack variables” , we can rewrite the expression above as
As with the hard-margin SVM, the problem above is a quadratic programming problem. We can first use Lagrangian duality to convert it into the corresponding dual problem, then solve it with the SMO algorithm. The Lagrangian for the problem above is:
Setting the partial derivatives of with respect to to yields
Substituting back into turns the original problem into the dual problem:
As we can see, the only difference from the hard-margin SVM is that becomes . It can likewise be solved conveniently with the SMO algorithm mentioned earlier; the only difference is that the clipping step needs to consider two directions.
In later posts I’ll also introduce the kernel trick for SVMs and commonly used kernels. To be continued…