

Support Vector Machine 2

April 4, 2018

by Frank

In the previous article (1) we discussed the derivation of the hard-margin SVM and its dual form; the dual problem can be simplified to:

$$
\begin{align*}
\min_{\alpha}\quad &\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{i}\alpha_{j}y_{i}y_{j}\,x_{i}\cdot x_{j}-\sum_{i=1}^{N}\alpha_{i}\\
s.t.\quad &\sum_{i=1}^{N}\alpha_{i}y_{i}=0\\
&\alpha_{i}\ge 0,\quad i=1,2,...,N
\end{align*}
$$

This can be viewed as a quadratic programming problem with $\alpha$ as the optimization variable. Quadratic programs have many mature solvers, but for SVM optimization the most efficient approach is the SMO (Sequential Minimal Optimization) algorithm.
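For concreteness, the dual objective above can be evaluated numerically. The sketch below (with a hypothetical helper name `dual_objective`) computes it directly from the Gram matrix:

```python
import numpy as np

def dual_objective(alpha, X, y):
    """Hard-margin SVM dual objective:
    (1/2) * sum_ij alpha_i alpha_j y_i y_j (x_i . x_j) - sum_i alpha_i,
    minimized subject to sum_i alpha_i y_i = 0 and alpha_i >= 0."""
    K = X @ X.T                  # Gram matrix K_ij = x_i . x_j
    yy = np.outer(y, y)          # y_i * y_j
    aa = np.outer(alpha, alpha)  # alpha_i * alpha_j
    return 0.5 * np.sum(aa * yy * K) - np.sum(alpha)
```

For two separable 1-D points $x_1=1,\,y_1=1$ and $x_2=-1,\,y_2=-1$, the feasible choice $\alpha=(0.5,0.5)$ gives objective $-0.5$.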

The SMO Algorithm

SMO first initializes all components of $\alpha$, for example $\alpha_{1},\alpha_{2},...,\alpha_{N}=0$, and then treats two of the components as variables, say $\alpha_{1},\alpha_{2}$ (when choosing the pair $\alpha_{i},\alpha_{j}$, one usually first takes as $\alpha_{i}$ the multiplier that most severely violates the KKT conditions from the previous article, and then takes as the second variable the $\alpha_{j}$ corresponding to the $x_{j}$ farthest from $x_{i}$), while keeping the remaining $\alpha_{3},\alpha_{4},...,\alpha_{N}$ fixed. The constraint $\sum_{i=1}^{N}\alpha_{i}y_{i}=0$ then gives $\alpha_{1}=-y_{1}\sum_{i=2}^{N}\alpha_{i}y_{i}$, and the problem above reduces to a quadratic program in two variables (writing $K_{ij}=x_{i}\cdot x_{j}$):

$$
\begin{align*}
\min_{\alpha_{1},\alpha_{2}}\quad W(\alpha_{1},\alpha_{2})=&\frac{1}{2}K_{11}\alpha_{1}^{2}+\frac{1}{2}K_{22}\alpha_{2}^{2}+y_{1}y_{2}K_{12}\alpha_{1}\alpha_{2}\\
&-(\alpha_{1}+\alpha_{2})+y_{1}\alpha_{1}\sum_{i=3}^{N}y_{i}\alpha_{i}K_{i1}+y_{2}\alpha_{2}\sum_{i=3}^{N}y_{i}\alpha_{i}K_{i2}\\
s.t.\quad &\alpha_{1}y_{1}+\alpha_{2}y_{2}=-\sum_{i=3}^{N}y_{i}\alpha_{i}=\zeta\\
&\alpha_{1},\alpha_{2}\ge0
\end{align*}
$$

In this quadratic program, since $\alpha_{1}y_{1}+\alpha_{2}y_{2}=\zeta$, we obtain $\alpha_{1}=(\zeta-y_{2}\alpha_{2})y_{1}$ (using $y_{1}^{2}=1$). Substituting this constraint into $W(\alpha_{1},\alpha_{2})$ yields a single-variable quadratic program. If we ignore the inequality constraints for the moment, the minimizer has a closed-form solution, so no iterative numerical solver is needed, which greatly speeds up the computation.

Let $v_{i}=\sum_{j=3}^{N}\alpha_{j}y_{j}K(x_{i},x_{j})$. Substituting $\alpha_{1}=(\zeta-y_{2}\alpha_{2})y_{1}$ into $W(\alpha_{1},\alpha_{2})$ gives:

$$
W(\alpha_{2})=\frac{1}{2}K_{11}(\zeta-\alpha_{2}y_{2})^{2}+\frac{1}{2}K_{22}\alpha_{2}^{2}+y_{2}K_{12}(\zeta-\alpha_{2}y_{2})\alpha_{2}-(\zeta-\alpha_{2}y_{2})y_{1}-\alpha_{2}+v_{1}(\zeta-\alpha_{2}y_{2})+y_{2}v_{2}\alpha_{2}
$$

Setting $\frac{\partial W}{\partial\alpha_{2}}=0$ directly gives the closed-form solution $\hat\alpha_{2}=\alpha_{2}+\frac{y_{2}(E_{1}-E_{2})}{\eta}$, where $E_{i}=\sum_{j=1}^{N}\alpha_{j}y_{j}K_{ij}+b-y_{i}$ and $\eta=K_{11}+K_{22}-2K_{12}$. This $\hat\alpha_{2}$ does not yet account for the inequality constraints $\alpha_{1},\alpha_{2}\ge 0$. From $\alpha_{1}=(\zeta-y_{2}\alpha_{2})y_{1}\ge0$ and $\alpha_{2}\ge0$ we can solve the inequalities to obtain an upper bound $H$ and a lower bound $L$ on $\alpha_{2}$; clipping then gives the final solution for $\alpha_{2}$:

$$
\alpha_{2}^{*}=
\begin{cases}
H, & \hat\alpha_{2}>H\\
\hat\alpha_{2}, & L\le\hat\alpha_{2}\le H\\
L, & \hat\alpha_{2}<L
\end{cases}
$$
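The unclipped update and the clipping step can be sketched as follows for the hard-margin case ($\alpha\ge 0$, no upper box bound); `update_pair` is a hypothetical helper name, and the $[L,H]$ interval is derived from $\alpha_1=(\zeta-y_2\alpha_2)y_1\ge 0$ and $\alpha_2\ge 0$ as in the text:

```python
import numpy as np

def update_pair(i, j, alpha, b, K, y):
    """One SMO step for the pair (alpha_i, alpha_j): compute the unclipped
    analytic minimizer, then clip it to the feasible segment [L, H]."""
    f = (alpha * y) @ K + b                  # f(x_k) = sum_l alpha_l y_l K_lk + b
    E = f - y                                # E_k = f(x_k) - y_k
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
    a_j_hat = alpha[j] + y[j] * (E[i] - E[j]) / eta   # unclipped solution
    zeta = alpha[i] * y[i] + alpha[j] * y[j]
    # Feasible interval for alpha_j from alpha_i >= 0 and alpha_j >= 0:
    if y[i] == y[j]:                         # alpha_i + alpha_j is fixed
        L, H = 0.0, zeta * y[j]
    else:                                    # alpha_j - alpha_i is fixed
        L, H = max(0.0, zeta * y[j]), np.inf
    a_j_new = min(max(a_j_hat, L), H)        # clip to [L, H]
    a_i_new = (zeta - y[j] * a_j_new) * y[i]
    return a_i_new, a_j_new
```

On the two-point toy problem $x_1=1,\,y_1=1$, $x_2=-1,\,y_2=-1$ starting from $\alpha=(0,0)$, a single step already reaches the optimum $\alpha=(0.5,0.5)$.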

Then $\alpha_{1}^{*}=(\zeta-y_{2}\alpha_{2}^{*})y_{1}$ gives $\alpha_{1}^{*}$, which completes one pairwise update of the SMO algorithm. Repeat the cycle of variable selection, analytic solution, and clipping until all components of $\alpha$ satisfy the KKT conditions from article (1); the formulas for $w$ and $b$ in article (1) then yield the trained hyperplane. This completes the mathematical derivation of the hard-margin SVM; later articles will cover the soft-margin SVM and kernel methods. To be continued...
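The full loop above can be sketched in the style of the simplified SMO variant: a large box bound `C` stands in for the hard margin, and the second variable is chosen at random rather than by the maximum-$|E_i-E_j|$ heuristic described in the text. All names and parameter values here are illustrative assumptions, not the exact procedure from the article:

```python
import numpy as np

def smo_train(X, y, C=1e6, tol=1e-5, max_passes=50, seed=0):
    """Simplified SMO for a linear SVM; large C approximates the hard margin."""
    rng = np.random.default_rng(seed)
    N = len(y)
    K = X @ X.T
    alpha, b, passes = np.zeros(N), 0.0, 0
    while passes < max_passes:
        changed = 0
        for i in range(N):
            E_i = (alpha * y) @ K[:, i] + b - y[i]
            # Select alpha_i only if it violates the KKT conditions
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = rng.integers(N - 1)
                j = j if j < i else j + 1          # random j != i
                E_j = (alpha * y) @ K[:, j] + b - y[j]
                a_i_old, a_j_old = alpha[i], alpha[j]
                if y[i] != y[j]:                   # feasible segment for alpha_j
                    L, H = max(0.0, a_j_old - a_i_old), min(C, C + a_j_old - a_i_old)
                else:
                    L, H = max(0.0, a_i_old + a_j_old - C), min(C, a_i_old + a_j_old)
                eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
                if L == H or eta <= 0:
                    continue
                alpha[j] = np.clip(a_j_old + y[j] * (E_i - E_j) / eta, L, H)
                if abs(alpha[j] - a_j_old) < 1e-12:
                    continue
                alpha[i] = a_i_old + y[i] * y[j] * (a_j_old - alpha[j])
                # Recompute b from the two modified multipliers
                b1 = b - E_i - y[i] * (alpha[i] - a_i_old) * K[i, i] \
                     - y[j] * (alpha[j] - a_j_old) * K[i, j]
                b2 = b - E_j - y[i] * (alpha[i] - a_i_old) * K[i, j] \
                     - y[j] * (alpha[j] - a_j_old) * K[j, j]
                b = b1 if 0 < alpha[i] < C else b2 if 0 < alpha[j] < C else (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X                            # w = sum_i alpha_i y_i x_i
    return w, b, alpha
```

On a small separable dataset the trained hyperplane separates the two classes, and only the support vectors end up with nonzero $\alpha_i$.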

