Deriving the SVM (Part 2)
In the previous post (Part 1) we discussed the derivation of the hard-margin SVM and its dual form, whose dual problem can be simplified into the following form:
This problem can be regarded as a quadratic programming problem with as the optimization variable. There are many mature methods for solving quadratic programming problems, and for SVM optimization the most efficient one is the SMO (Sequential Minimal Optimization) algorithm.
The SMO Sequential Optimization Algorithm
The SMO sequential optimization algorithm first initializes all the variables in , for example by setting and then treats two of the components of as variables, for example (when selecting the two components , one usually first picks as the one that most severely violates the KKT conditions mentioned above, and then chooses as the second variable the corresponding to the that is farthest in margin from ), while keeping the remaining fixed. Then, by the constraint , we obtain . The above problem can thus be reduced to a quadratic programming problem in two variables (letting ):
In the above quadratic programming problem, since , we obtain . Substituting this constraint into yields a single-variable quadratic programming problem. If we set aside the inequality constraints for the moment, we can obtain a closed-form solution directly, without resorting to numerical methods, which greatly improves computational speed.
Letting , substituting into gives:
Setting directly, we obtain the closed-form solution for as , where and . The obtained here does not yet account for the inequality constraints . From together with , we can solve the inequalities to obtain the upper bound and lower bound of ; that is, after clipping we obtain the closed-form solution for as:
In addition, from we can obtain , which completes the update of one group of variables in the SMO algorithm. Repeating the process of variable selection, closed-form solving, and variable clipping until all variables of satisfy the KKT conditions from Part 1, we can then use the formulas for and given in Part 1 to obtain the trained hyperplane. This completes the mathematical derivation of the hard-margin SVM. Future posts will continue to introduce the derivation of the soft-margin SVM and the application of kernel methods. To be continued…