This problem can be viewed as a quadratic programming problem with $\alpha$ as the optimization variable. There are many mature solution methods for quadratic programming, and the SMO (Sequential Minimal Optimization) algorithm is one of the most efficient for SVM optimization.
The SMO (Sequential Minimal Optimization) Algorithm
The SMO algorithm first initializes all components of $\alpha$, for example $\alpha_1 = \alpha_2 = \cdots = \alpha_N = 0$. It then treats two components of $\alpha$ as variables, say $\alpha_1$ and $\alpha_2$ (when selecting the pair $\alpha_i, \alpha_j$, the component that most severely violates the KKT conditions mentioned earlier is typically chosen as $\alpha_i$, and $\alpha_j$ is then chosen as the component corresponding to the point $x_j$ farthest from the margin relative to $x_i$), while the remaining $\alpha_3, \alpha_4, \ldots, \alpha_N$ are held fixed. From the constraint $\sum_{i=1}^{N}\alpha_i y_i = 0$ we can derive $\alpha_1 = -y_1\sum_{i=2}^{N}\alpha_i y_i$. The problem thus reduces to a quadratic programming problem in two variables (let $K_{ij} = x_i \cdot x_j$):

$$\min_{\alpha_1,\alpha_2}\; W(\alpha_1,\alpha_2) = \frac{1}{2}K_{11}\alpha_1^2 + \frac{1}{2}K_{22}\alpha_2^2 + y_1 y_2 K_{12}\alpha_1\alpha_2 - (\alpha_1 + \alpha_2) + y_1\alpha_1\sum_{j=3}^{N}y_j\alpha_j K_{1j} + y_2\alpha_2\sum_{j=3}^{N}y_j\alpha_j K_{2j}$$

$$\text{s.t.}\quad \alpha_1 y_1 + \alpha_2 y_2 = -\sum_{j=3}^{N}\alpha_j y_j = \zeta,\qquad \alpha_1, \alpha_2 \ge 0$$
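To make the first-variable selection concrete, here is a minimal sketch of a hard-margin KKT check, assuming the conditions from article (1), namely $\alpha_i = 0 \Rightarrow y_i f(x_i) \ge 1$ and $\alpha_i > 0 \Rightarrow y_i f(x_i) = 1$. The function and variable names (`kkt_violation`, `select_first_variable`, the precomputed kernel matrix `K`) are illustrative, not part of the original derivation:

```python
import numpy as np

def kkt_violation(alpha, y, K, b, i, tol=1e-3):
    """How badly point i violates the hard-margin KKT conditions.

    f(x_i) = sum_j alpha_j * y_j * K_ij + b
    alpha_i == 0  requires  y_i * f(x_i) >= 1   (point on or outside the margin)
    alpha_i  > 0  requires  y_i * f(x_i) == 1   (support vector, exactly on the margin)
    Returns 0 when the conditions hold within tol, a positive violation otherwise.
    """
    margin = y[i] * (np.dot(alpha * y, K[:, i]) + b)
    if alpha[i] < tol:
        return max(0.0, 1.0 - margin)
    return abs(margin - 1.0)

def select_first_variable(alpha, y, K, b):
    """Pick the index whose KKT violation is largest."""
    violations = [kkt_violation(alpha, y, K, b, i) for i in range(len(alpha))]
    i = int(np.argmax(violations))
    return i, violations[i]
```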
In this quadratic programming problem, since $\alpha_1 y_1 + \alpha_2 y_2 = \zeta$, we have $\alpha_1 = (\zeta - y_2\alpha_2)y_1$. Substituting this constraint into $W(\alpha_1, \alpha_2)$ turns it into a single-variable quadratic programming problem. If we temporarily ignore the inequality constraints, we can obtain an analytical solution directly, without numerical computation, which speeds up the optimization considerably.
Let $v_i = \sum_{j=3}^{N}\alpha_j y_j K(x_i, x_j)$. Substituting $\alpha_1 = (\zeta - y_2\alpha_2)y_1$ into $W(\alpha_1, \alpha_2)$ gives:

$$W(\alpha_2) = \frac{1}{2}K_{11}(\zeta - \alpha_2 y_2)^2 + \frac{1}{2}K_{22}\alpha_2^2 + y_2 K_{12}(\zeta - \alpha_2 y_2)\alpha_2 - (\zeta - \alpha_2 y_2)y_1 - \alpha_2 + v_1(\zeta - \alpha_2 y_2) + y_2 v_2\alpha_2$$
By directly setting $\frac{\partial W}{\partial \alpha_2} = 0$, we obtain the analytical solution for $\alpha_2$ as $\hat{\alpha}_2 = \alpha_2 + \frac{y_2(E_1 - E_2)}{\eta}$, where $E_i = \sum_{j=1}^{N}\alpha_j y_j K_{ij} + b - y_i$ and $\eta = K_{11} + K_{22} - 2K_{12}$. This $\hat{\alpha}_2$ does not yet take the inequality constraints $\alpha_1, \alpha_2 \ge 0$ into account. From $\alpha_1 = (\zeta - y_2\alpha_2)y_1 \ge 0$ and $\alpha_2 \ge 0$, we can solve the inequalities to obtain an upper bound $H$ and a lower bound $L$ for $\alpha_2$. After clipping, the analytical solution for $\alpha_2$ is:
$$\alpha_2^{*} = \begin{cases} H, & \hat{\alpha}_2 > H \\ \hat{\alpha}_2, & L \le \hat{\alpha}_2 \le H \\ L, & \hat{\alpha}_2 < L \end{cases}$$
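Continuing the sketch above, one inner SMO step for a chosen pair $(\alpha_i, \alpha_j)$ could implement the analytical update and the clipping like this. The bounds `L` and `H` below are one way to derive them from $\alpha_i y_i + \alpha_j y_j = \zeta$ and $\alpha_i, \alpha_j \ge 0$, and `smo_step` is a hypothetical name:

```python
def smo_step(alpha, y, K, b, i, j):
    """One analytical update of the pair (alpha_i, alpha_j) for a hard-margin SVM.

    Here i plays the role of index 1 in the text and j the role of index 2.
    """
    # Prediction errors E_i = sum_k alpha_k * y_k * K_ki + b - y_i, as defined in the text
    E_i = np.dot(alpha * y, K[:, i]) + b - y[i]
    E_j = np.dot(alpha * y, K[:, j]) + b - y[j]

    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
    if eta <= 0:                                  # degenerate pair: skip the update
        return alpha

    zeta = alpha[i] * y[i] + alpha[j] * y[j]      # conserved by the equality constraint

    # Bounds from alpha_i >= 0 and alpha_j >= 0 (hard margin: no upper bound C)
    if y[i] == y[j]:
        L, H = 0.0, zeta * y[i]
    else:
        L, H = max(0.0, -zeta * y[i]), np.inf

    # Unclipped analytical solution, then clip to [L, H]
    alpha_j_new = alpha[j] + y[j] * (E_i - E_j) / eta
    alpha_j_new = min(max(alpha_j_new, L), H)

    # Recover alpha_i from the equality constraint
    alpha_i_new = (zeta - y[j] * alpha_j_new) * y[i]

    alpha = alpha.copy()
    alpha[i], alpha[j] = alpha_i_new, alpha_j_new
    return alpha
```

After each such step, $b$ can be refreshed from any point with $\alpha_k > 0$, since such a point lies exactly on the margin.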
Additionally, from $\alpha_1^{*} = (\zeta - y_2\alpha_2^{*})y_1$ we obtain $\alpha_1^{*}$. This completes one update of a pair of variables in the SMO algorithm. The process of variable selection, analytical solution, and clipping is repeated until every component of $\alpha$ satisfies the KKT conditions given in article (1). Then, using the formulas for $w$ and $b$ from article (1), we obtain the trained hyperplane, which completes the mathematical derivation of the hard-margin SVM. Future articles will continue with the derivation of the soft-margin SVM and the application of kernel methods. To be continued…
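As a final, purely illustrative sketch (not a production solver), the pieces above can be tied together in an outer loop that recovers $w$ and $b$ from the trained multipliers. Note that the second variable is chosen here by the common $|E_i - E_j|$ heuristic rather than the margin-distance rule described above, and all names (`train_hard_margin_svm`, `max_passes`) are hypothetical:

```python
def train_hard_margin_svm(X, y, max_passes=100, tol=1e-3):
    """Illustrative hard-margin SMO loop with a linear kernel."""
    N = X.shape[0]
    K = X @ X.T                                   # K_ij = x_i . x_j
    alpha = np.zeros(N)
    b = 0.0

    for _ in range(max_passes):
        i, violation = select_first_variable(alpha, y, K, b)
        if violation < tol:                       # every alpha_i satisfies the KKT conditions
            break
        # Second variable: the point whose prediction error differs most from E_i
        E = (alpha * y) @ K + b - y
        j = int(np.argmax(np.abs(E - E[i])))
        if j == i:
            break
        alpha = smo_step(alpha, y, K, b, i, j)
        # Refresh b from any support vector: a point with alpha_k > 0 lies exactly on the margin
        sv = np.flatnonzero(alpha > 1e-8)
        if sv.size > 0:
            k = sv[0]
            b = y[k] - (alpha * y) @ K[:, k]

    w = (alpha * y) @ X                           # w = sum_i alpha_i * y_i * x_i
    return w, b, alpha
```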