http://www.ma-xy.com

第一章中级支持向量机 1

1.1 支持向量机最优化算法 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 SMO 算法 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 支持向量回归 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3 多分类支持向量机 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.4 最小二乘支持向量机 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.4.1

分类问题

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.4.2 回归问题 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.5 MATLAB 支持向量机示例 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.5.1 支持向量机示例 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.5.2 使用 Bayesian 优化方法优化 SVM . . . . . . . . . . . . . . . . . . . . . . 19

1.6 Python 支持向量机示例 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

1.7 libsvm 和 LSSVM 简介 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

http://www.ma-xy.com

第一章中级支持向量机

1.1 支持向量机最优化算法

SVM 最后要求解一个凸二次规划问题

min

Qα − e

α (1.1)

s.t.











α = 0

0 ⩽ α

⩽ c

i = 1, 2, . . . , n

而对于普通的二次规划问题，有许多算法可以考虑，如：起作用集法、路径跟踪法、Lembe

方法等，可以参考后面优化部分相应的章节，还可以参考更详实的优化算法书籍。求解上面的

SVM 二次规划模型，我们就可以对分类数据进行分类了，但是，由普通的二次规划方法得到的

SVM 只在小样本上好用，对大样本而言，这种 SVM 会变得异常慢，甚至没办法求解。这是因为

在求解上述 SVM 二次规划问题时，我们会产生一个 Q 矩阵，Q 矩阵的大小为 n × n，由样本量

n 决定。在求解二次规划过程中，我们会迭代 Q，同时也会对其进行许多操作。当 n 很大时，Q

相关的操作变慢，因而 SVM 不能很好的工作。所以，对于大样本而言，我们需要重新设计优化

算法，使 SVM 能够很好的训练。

SVM 对偶问题 (1.1) 的解仅依赖于支持向量对应的样本，因此，如果我们事先知道哪些向

量是支持向量，就可以保留其对应的样本。而从训练集中去掉非支持向量后，训练的分类超平面

不变。基于此，Boser 等首先提出块选算法，块选算法的基本思想是，取训练集合中任意一个子

集为工作集 B，对 B 求最优化问题，得到支持向量，并对此时集合 N = data − B 用判别函数，

将其中不满足优化条件的向量按照偏离程度排序作为候补工作集 C。剔除 B 中非支持向量样本，

并用 C 做补充。Osuna 提出分解算法，其思想和块选法相似，不过它将 B 大小固定不变，剔除

和补充后的 B 大小和之前是一样的，此方法避免了当支持向量数目很大时的求解困难。

SMO 求解算法由 Platt 提出，是 |B| = 2 时 Osuna 分解的特例，Keerthi 等指出 Platt 的

SMO 中工作集两变量的选择不能保证最大程度优化原目标，为此，给出一种选择最大冲突对的

计算公式。Lin 等对改进的 SMO 算法的收敛性进行了证明。Fan 在 Kerethi 的基础上提出用函

数逼近方法选择工作集 B，并证明其收敛性。LIBSVM 工具包就是基于 Fan-SMO 算法的。下

面，我们主要介绍 SMO 算法及其工作集 B 的选择策略 (Keerthi 和 Fan 的策略)

http://www.ma-xy.com

1.1 支持向量机最优化算法第一章中级支持向量机

1.1.1 SMO 算法

设 α

∗

是对偶问题 (1.1) 的最优解，则对 ∀0 < α

∗

< c, b

∗

= y

−



i=1

−



i=1

∗

k(x

, x

)，有

∀0 < α

∗

< c y





i=1

∗

k(x

, x

) + b

∗



= 1

∗

= c y





i=1

∗

k(x

, x

) + b

∗



⩽ 1

∗

= 0 y





i=1

∗

k(x

, x

) + b

∗



⩾ 1

SMO 的基本思想是：如果所有变量的解都满足 KKT 条件，那么就得到了 (KKT 条件是充

分必要条件)；否则，选择两个变量，固定其它变量，针对这两个变量构建二次规划问题，构建的

二次规划问题可以通过解析方法来求解，这样可以提高 SVM 的速度。子问题的最优解接近原问

题 (1.1) 的解，因为它们使目标函数变小。

不失为一般性，假设选择的变量为 α

, α

，固定其它 n − 2 个变量 α

(i = 3, 4, . . . , n)。由



i=1

= 0

⇒ α

= y

−



i=2

所以当 α

给定后，α

也就确定了。但上述过程存在一个问题，即如何选取 α

, α

(即 B)？我们

将 α

, α

带入到优化模型 (1.1) 中，有

min

W (α) =



i=1



j=1

k(x

, x

) −



i=1





i=1



j=1

k(x

, x

) +



i=1



j=3

k(x

, x

)



i=3



j=1

k(x

, x

) +



i=3



j=3

k(x

, x

)



−



i=1

−



i=3



+ α

+ 2y



j=3



j=3



i=3



i=3



− α

+ ψ

constant

+ y

+ α

− α

+ ψ

constant

s.t.











+ α

= −



i=3

= ξ

0 ⩽ α

⩽ c

http://www.ma-xy.com 2 http://www.ma-xy.com

http://www.ma-xy.com

第一章中级支持向量机 1.1 支持向量机最优化算法

其中：k

= k(x

, x

)，v



j=3

。

上述问题是一个二元二次规划问题，我们可以作图来分析。先画取值范围 (即 s.t.)，由约束

0 ⩽ α

⩽ c，我们有图 (1.1)

图 1.1: SMO 二元二次规划图 1

图

(

1.1)

给出了变量

, α

的取值范围，接着，我们约束

，此约束需要分情

况讨论：

(1) 当 y

, y

同号时，有 α

+ α

= ξ，那么我们就要讨论 ξ 和 c 的大小，当 ξ < c 时，α

, α

的直线就如图 (1.2)(a) 下直线所示；当 ξ > c 时，α

, α

的直线就如图 (1.2)(a) 上直线所示。由

此我们有 α

的取值

L = max(0, ξ − c)

H = min(c, ξ)

L ⩽ α

new

⩽ H

(2) 当 y

, y

异号时，有 α

− α

= ξ，我们仍然要讨论 ξ 和 c 的大小，当 ξ < c 时，α

, α

的直线就如图

(

1.2)(b) 上直线所示；当 ξ < c 时，α

, α

的直线就如图 (1.2)(b) 下直线所示。由

此我们有 α

的取值

L = max(0, ξ − c)

H = min(c, ξ + c)

L ⩽ α

new

⩽ H

图

1.2: SMO

二元二次规划图

http://www.ma-xy.com 3 http://www.ma-xy.com

http://www.ma-xy.com

1.1 支持向量机最优化算法第一章中级支持向量机

设 α

old

, α

old

为更新前的点，α

new

, α

new

为更新后的点。由 α

+ α

= ξ 得到

= (ξ − α

)/y

= (ξ − α

将目标 J 中的 α

用上式替换，有

min

(

) =

(ξ − α

)

+ y

(ξ − α

+ (ξ − α

+ α

− (ξ − α

− α

+ ψ

constant

(ξ − α

)

+ y

(ξ − α

)

+ (ξ − α

+ α

− (ξ − α

− α

+ ψ

constant

因为要求 min

，所以目标函数 W (α

) 对 α

求导，令导数为零以求取极值，有

∂W

∂α

= k

+ k

− 2k

− k

ξy

+ k

ξy

+ y

− 1 − v

+ v

= 0

由上式推得

+ k

− 2k

)α

= y

− y

+ ξk

− ξk

+ v

− v

)

将 ξ = α

old

+ α

old

和 v

, v

带入上式，有

+ k

− 2k

)α

= y



− y

+ (α

old

+ α

old

)(k

− k

)



j=3

k(x

, x

) −



j=3

k(x

, x

)



= y



− y

+ (α

old

+ α

old

)(k

− k

)





j=1

k(x

, x

) − α

− α



−





j=1

k(x

, x

) − α

− α



= y

− y

+ g(x

) − g(x

) + α

old

+ k

− 2k

)]

其中：g(x

) =



j=1

k(x

, x

), g(x

) =



j=1

k(x

, x

)。

令 E



i=1

k(x

, x

) + b − y

为估计值与真实值之差，η = k

+ k

− 2k

。上式两

边同除 η，有

= α

old

− E

)

http://www.ma-xy.com 4 http://www.ma-xy.com

http://www.ma-xy.com

第一章中级支持向量机 1.1 支持向量机最优化算法

由于 α

有取值范围 L ⩽ α

⩽ H，于是 α

为

new,clipped











H H ⩽ α

new

L < α

new

< H

L α

new

⩽ L

与此同时，α

new

= α

old

。令 s = y

，最终的 α

的更新公式为

new

= α

old

+ s(α

− α

new,clipped

)

至此，我们已经给出其他参数的更新公式。但上述求解过程有一个问题，即 η 的取值，如

果 η 为 0，则不可以将其作为分母。Platt 在其论文中讨论了 η 的特殊情况：如果核 K 不满足

Mercer 定理，那么目标函数可能变得非正，η 可能取负。即使 K 是有效核，如果训练集中出现

相同的 x，那么 η 有可能为 0。SMO 算法在 η 非正时仍有效：我们可以推导出

∂

∂α

= η，当

η < 0 时，W 没有极小值，最小值在边缘取得；当 η > 0 时，W 更为单调函数，最小值也在边

缘处取得。而 α

的边缘即为 L 和 H，这样，将 α

= L 和 α

= H 带入 W 中即可求得 W 最

小值，将 α

修正到目标函数 W (L), W (H) 较小的端点上，α

= L/H。具体计算公式为

= y

+ b) − α

k(x

, x

) − sα

k(x

, x

)

= y

+ b) − sα

k(x

, x

) − α

k(x

, x

)

= α

+ s(α

− L)

= α

+ s(α

− H)

= L

+ Lf

k(x

, x

) +

k(x

, x

) + sLL

k(x

, x

)

= H

+ Hf

k(x

, x

) +

k(x

, x

) + sHH

k(x

, x

)

下面，我们来更新阈值 b 和差值 E

。在每次完成 α

, α

的更新后，都要重新计算 b 和 E。

(1) 当 0 < α

new

< c 时，有



i=1

+ b = y

由此可得

new

= y

−



i=3

− α

new

− α

new

我们知道 E



j=1

k(x

, x

) + b − y

，由此，我们得到



i=3

+ α

old

+ α

old

+ b

old

− y

将 E

带入到上面 b

的更新公式，有

new

= −E

+ α

old

+ α

old

+ b

old

− α

new

− α

new

= −E

− y

(α

new

− α

old

) + y

(α

new

− α

old

) + b

old

http://www.ma-xy.com 5 http://www.ma-xy.com

http://www.ma-xy.com

1.1 支持向量机最优化算法第一章中级支持向量机

同理，我们有 b

的更新公式

new

= −E

− y

(α

new

− α

old

) + y

(α

new

− α

old

) + b

old

(2) 如果 α

new

, α

new

同时满足 0 < α

new

< c，即两个 Langrange 乘子都在界内，则 b

new

。

(3) 如果 α

new

, α

new

是 0 或者 c，即两个 Langrange 乘子都在边界上，b

, b

以及它们之间

的都可以作为符合 KKT 条件的阈值，Platt 采用 b =

来处理。

值的更新要用到 b

new

，以及所有支持向量对应的 α

new



k(x

, x

) + b

new

− y

其中：s 为支持向量 x

的集合。

工作集 B 的确定

下面，我们来介绍工作集 B 的确定。Platt 的 SMO 算法每次修改 E

都涉及到 b，Keerthi

提出一种不考虑 b 的 KKT 条件判别法：设 F



j=1

k(x

, x

) − y

有

= E

− b

于是 KKT 条件变为

= 0 ⇒ y

+ b) ⩾ 0

0 < α

< c ⇒ y

+ b) = 0

= 0 ⇒ y

+ b) ⩽ 0

引入符号

= {i : 0 < α

< c} F

= −b

= {i : y

± 1 & α

= 0} F

⩾ −b

= {i : y

= −1 & α

= c} F

⩾ −b

= {i : y

= +1 & α

= c} F

⩽ −b

= {i : y

= −1 & α

> 0} F

⩽ −b

则 KKT 条件可表示为

i ∈ I

∪ I

⇒ F

⩾ −b

i ∈ I

∪ I

⇒ F

⩽ −b

http://www.ma-xy.com 6 http://www.ma-xy.com

http://www.ma-xy.com

第一章中级支持向量机 1.2 支持向量回归

当 KKT 条件满足时，∀i ∈ I

∪ I

, ∀j ∈ I

∪ I

，有 F

⩾ −b ⩾ F

。设

low

= max{−F

: i ∈ I

∪ I

}

= min{−F

: j ∈ I

∪ I

}

当 KKT 条件满足时，有 b

⩾ b

low

。这样就可以通过直接计算 b

⩾ b

low

，而不是每次先更新 b

值再来判断所得的解是否满足 KKT。Keerthi 工作集 B 的选取为

i = max ({−y

= 1, α

< c} ∪ {y

= −1, α

> 0})

= max (−F

= 1, α

< c} ∪ {y

= −1, α

> 0})

j = max ({y

= −1, α

< c} ∪ {−y

= 1, α

> 0})

= max (−F

= −1, α

< c} ∪ {y

= 1, α

> 0})

算法结束后，有 b =

low

。

上面介绍了

Keerthi

工作集

的选取方法，下面简单介绍

Fan

的方法。

For all

t, s

dene

= k

+ k

− 2k

= −y

∇f(α

)

+ y

∇f(α

)

> 0

¯a







> 0

τ otherwise

select

i ∈ max



−y

∇f(α

)

|t ∈ I

(α

)



j ∈ min



¯a

|i ∈ I

low

(α

), −y

∇f(α

) < −y

∇f(α

)



where

(α) = {t|α

< c, y

= 1 or α

> 0, y

= −1}

low

(α) = {t|α

< c, y

= −1 or α

> 0, y

= 1}

1.2 支持向量回归

前一部分我们建立了二分类支持向量机，二分类标签为 y = {−1, 1}。接下来我们讨论 y 是连

续型数值的情况，即回归问题。用于处理回归问题的支持向量机被称为支持向量回归 SVR。(此

处应该注意，我们仍然强调支持向量的概念，如果不存在支持向量，那么这种方法也就称不上支

持向量机了)。为了研究简便，我们考虑二元回归 y = f(x)，x, y ∈ R 的情况。我们来看一些可

能的回归问题，如图 (1.3) 所示

http://www.ma-xy.com 7 http://www.ma-xy.com

http://www.ma-xy.com

1.2 支持向量回归第一章中级支持向量机

图 1.3: 二元回归问题示意图

和前面的研究思路一样，我们仍从线性回归模型出发，最后利用核技术将其投影成非线性问

题。支持向量回归 SVR 问题如图 (1.4) 所示

图 1.4: 支持向量回归示意图

回忆一下，在前面的 2 分类问题中，我们的目标是：求 w, b，使得样本点到分割线/直线的

最小距离最大。下面，我们来考虑回归问题的目标：

1. 求 w, b，使得样本点到直线 f (x) 的总距离最小 (距离可以是平方距离等)；

2. 求 w, b，使得样本点到直线 f (x) 的最大距离最小 (所有样本点落在区域内，且区域最小)；

3. 求 w, b，使得样本点到直线 f (x) 的最小距离最大。

上面第一个目标/方案是不错的，而且非常好，很像最小二乘回归 (样本点到直线的总离差平

方最小)。当然也可以使用最小一乘回归和最大离差回归，因为其不具备支持向量，所以我们暂

时不做研究。下面，我们来描述一下第二个目标/方案：最大距离最小化，如图 (1.5) 所示。为了

方便，我们将 y 记为 x

http://www.ma-xy.com 8 http://www.ma-xy.com

http://www.ma-xy.com

第一章中级支持向量机 1.2 支持向量回归

图 1.5: 最大距离最小化示意图

首先，点到线的距离为

+ b|

||w||

= r

最大距离最小化的目标为

min

w,b

max



+ b|

||w||



记最大距离为

||w||

，重写上述目标有

min

w,b

||w||

s.t.







+ b| ⩽ d

i = 1, 2, . . . , n

由于距离是相对的，不妨设最大距离的 d = 1，而其样本点的距离 < 1，于是有

min

w,b

||w||

(1.2)

s.t.







+ b| ⩽ 1

i = 1, 2, . . . , n

也即 max

w,b

||w||

，即 min

w,b

−

||w||

。设置容错量 ξ，使得部分样本可以在距离外，即 |w

+ b| ⩽

1 + ξ

, ξ

⩾ 0。

ξ 的设置带来了损失，我们计算 w, b 时，ξ(“带”外的点) 无效。为此我们使损失最小，有

min

w,b



i=1

(1.3)

将上述目标 (1.2) 和目标 (1.3) 合并，并设置目标权重为 c，有

min

w,b,ξ

−

w + c



i=1

s.t.











+ b| ⩽ 1 + ξ

ξ ⩾ 0

i = 1, 2, . . . , n

http://www.ma-xy.com 9 http://www.ma-xy.com

http://www.ma-xy.com

1.2 支持向量回归第一章中级支持向量机

将上述模型中的不等式约束展开，有

min

w,b,ξ,ξ

∗

−

w + c



i=1

(ξ

+ ξ

∗

)

s.t.











+ b ⩽ 1 + ξ

+ b ⩾ −1 − ξ

∗

⩾ 0

∗

⩾ 0

i = 1, 2, . . . , n

至此，最大距离最小模型已经建成。我们来看一下这个模型是否可求，当 ξ

, ξ

∗

增大时，相

当于边界减小，w, b 也会增大，−w 会减小，即 ξ ↑, −w ↓，所以有最优 w, b 是二者取中。不过困

难的是，上述问题并非凸二次规划 (min x

Hx + cx,H 为半正定时，为凸规划)，因为这里的 H

为 −I(是负定的)，所以我们不能用 Langrange 对偶方法对问题进行求解。

下面，我们来考虑目标/方案 3：使样本点到直线 f(x) 的最小距离最大。如图 (1.6) 所示

图 1.6: 最小距离最大化示意图

如果直接求取最小距离最大化的话，会出现如上图 (1.6) 的情况：我们会让直线 f (x) 尽可

能的向远处走，甚至是无穷大。这样是不好的，我们希望将直线约束在一定的范围内 (数据点到

直线的距离是有限的)，比如 |y

− w

− b| < ϵ，换句话说，我们是将数据点放在一个“带”内，

这个“带”就是两个直线 |y

− w

− b| < ϵ 构成的。并且，这种最小距离最大并不好，毕竟我

们没有必要要求最小距离最大 (二分类支持向量机的目标)。

上面三个目标/方案都不是很好，下面，我们在二分类支持向量机的模型上进行思考

min

w,b,ξ

||w||

+ c



i=1

s.t.







φ(x

) + b) ⩾ 1 − ξ

⩾ 0, i = 1, 2, . . . , n

其实，上面这种模型可以有另一种解释：我们将 ξ

视为观测值 y

和理论值 w

φ(x

) + b 的误差，

则目标中的



i=1

部分是表示误差和最小；和回归模型相比，||w||

可视为正则项，||w|| 会使

http://www.ma-xy.com 10 http://www.ma-xy.com

http://www.ma-xy.com

第一章中级支持向量机 1.2 支持向量回归

得拟合函数更加平滑。这里的损失函数就是 Huber 损失，只有当 y

和 w

φ(x

) + b 超过一定界

限时 (y

φ(x

) + b) ⩾ 1 − ξ

)，才计算误差，否则误差为 0。

对于回归问题，我们仍将 ξ

视为 y

和 w

φ(x

) + b 的误差，我们要求误差总和最小。仍采

用 Huber 损失，Huber 损失的阈值设为 ε，有

− f(x

) ⩽ ε + ξ

f(x

) − y

⩽ ε + ξ

∗

这里的 ξ

, ξ

∗

为误差。建立支持向量回归模型

min

w,b,ξ,ξ

∗

J(w, b, ξ, ξ

∗

) = c



i=1

(ξ

+ ξ

∗

)

s.t.











− f(x

) ⩽ ε + ξ

f(x

) − y

⩽ ε + ξ

∗

, ξ

∗

⩾ 0

i = 1, 2, . . . , n

我们在上面的模型中添加正则项 ||w||，有

min

w,b,ξ,ξ

∗

J(w, b, ξ, ξ

∗

) = c



i=1

(ξ

+ ξ

∗

) +

||w||

s.t.











− f(x

) ⩽ ε + ξ

f(x

) − y

⩽ ε + ξ

∗

, ξ

∗

⩾ 0

i = 1, 2, . . . , n

上述问题是一个凸二次规划问题，引入 Langrange 函数

L =

||w|| + c



i=1

(ξ

+ ξ

∗

) −



i=1

[ξ

+ ε − y

+ f(x

)]

−



i=1

∗

[ξ

∗

+ ε − y

+ f(x

)] −



i=1

(ξ

+ ξ

∗

)

其中：α

, α

∗

, r

∗

为 Langrange 算子，α

∗

, r

∗

⩾ 0。求 L 对 w, b, ξ, ξ

∗

最小，对 α

, α

∗

, r

∗

最

大，推导 L 有

max

α,α

∗

W (α, α

∗

) =



i=1



j=1

(α

− α

∗

)(α

− α

∗

)k(x

, x

) +



i=1

(α

− α

∗

−



i=1

(α

+ α

∗

)ε

s.t.













i=1

(α

− α

∗

) = 0

0 ⩽ α

, α

∗

⩽ c

http://www.ma-xy.com 11 http://www.ma-xy.com

http://www.ma-xy.com

1.2 支持向量回归第一章中级支持向量机

将上式写为二次规划问题为

min

α,α

∗

(α − α

∗

)

Q(α − α

∗

) − ε



i=1

(α

+ α

∗

) +



i=1

(α

− α

∗

s.t.













i=1

(α

− α

∗

) = 0

0 ⩽ α

, α

∗

⩽ c

由 KKT 条件，在鞍点处

[

−

(

)] = 0

∗

[ε + ξ

∗

− y

+ f(x

)] = 0

= 0

∗

= 0

由上式可以推得 α

∗

= 0，所以 α

, α

∗

不同时为 0。同时，我们得到

(c − α

)ξ = 0

(c − α

∗

)ξ

∗

= 0

¬当 α

= c 或者 α

∗

= c 时，|f (x

)−y

| 可能大于 ε，与其对应的 x

称为边界支持向量 (Boundary

Support Vector)；当 α

∗

∈ (0, c) 时，|f (x

) − y

| = ε，即 ξ

∗

= 0 与其对应的向量 x

称为标

准支持向量 (Normol Support Vector)；®当 α

∗

, α

= 0 时，x

为非支持向量，它们对 w 没有影

响，因此 ε 越大，支持向量越多。BSV、NSV 如图 (1.7) 所示

图 1.7: 边界及标准支持向量示意图

对于 NSV，0 < α

< c(α

∗

= 0)，此时 ξ

= 0。由 KKT 可以求出 b

b = y

−



j=1

(α

− α

∗

− ε

= y

−



∈SV

(α

− α

∗

− ε

http://www.ma-xy.com 12 http://www.ma-xy.com

http://www.ma-xy.com

第一章中级支持向量机 1.3 多分类支持向量机

而对于 0 < α

∗

< c(α

= 0) 的 NSV，有

b = y

−



∈SV

(α

− α

∗

− ε

一般对所有标准支持向量分别计算 b，然后求平均值

b =

NSV









0<α



−



∈SV

(α

− α

∗

)k(x

, x

) − ε





0<α

∗



−



∈SV

(α

− α

∗

)k(x

, x

) − ε









关于 LSSVM 解的稀疏性：大部分 Langrange 乘子 α

, α

∗

都等于 0，样本数据只有少量支持

向量。这个性质是由 ε 不敏感性引起的。通过增加或减少 ε 值，可以控制支持向量的个数，也即

可控制 SVR 解的稀疏性。从解的稀疏性可以看出，其实很多样本是冗余的，在拟合过程中只有

那些支持向量起作用，由此可以引导出下面的最小二乘支持向量机。

LIBSVM 工具包还提供了 Scholkopf 提出的 ν-SVR 的求解。Scholkoph 引入新的参数 ν 来

反应超出 ε 带之外样本数据点和支持向量数。ν-SVR 模型如下

min

w,b,ξ,ξ

∗

,ε

||w|| + c(νε +



i=1

(ξ

, x

∗

))

s.t.











− w

φ(x

) − b ⩽ ε + ξ

φ(x

) + b − y

⩽ ε + ξ

∗

, ξ

∗

⩾ 0

i = 1, 2, . . . , n

其对偶模型为

min

α,α

∗

(α − α

∗

)

Q(α − α

∗

) + y

(α − α

∗

)

s.t.











(α − α

∗

) = 0

(α + α

∗

) ⩽ cν

0 ⩽ α

, α

∗

⩽ c/n

为简化，不等式 e

(α + α

∗

) ⩽ cν 用等式替代。在 LIBSVM 中，我们考虑 c ← c/l，所以有如下

等价问题：

min

α,α

∗

(α − α

∗

)

Q(α − α

∗

) + y

(α − α

∗

)

s.t.











(α − α

∗

) = 0

(α + α

∗

) ⩽ cnν

0 ⩽ α

, α

∗

⩽ c

1.3 多分类支持向量机

支持向量机虽然可以直接用于多分类问题，但是并不是很方便。对于多分类问题，我们可以

将其转化为多个二分类问题进行求解，基于这一思想，会产生许多方法，这里我们不做详细介绍。

http://www.ma-xy.com 13 http://www.ma-xy.com

http://www.ma-xy.com

1.4 最小二乘支持向量机第一章中级支持向量机

1.4 最小二乘支持向量机

无论是 SVR 还是 SVM，Vaonik 等提出的原始支持向量机都需要借一个带不等式约束的二

次规划问题。1999 年，基于等式约束和最小二乘损失函数，Suykens 和 Vandewalle 提出了求解

二分类问题的最小二乘支持向量机 LSSVM。LSSVM 与标准 SVM 相比，减少了一个调整参数，

减少了 n 个优化变量，从而简化计算，然而 LSSVM 没有保留解的稀疏性。改进的 LSSVM 有递

推 LSSVM、加权 LSSVM、多分辨 LSSVM、正则化 LSSVM。

1.4.1 分类问题

对二分类问题 {x

, y

}

k=1

, x

∈ R

, y

∈ {−1, 1}，1999 年 Suykens 给出的最小二乘支持向

量模型如下

min

w,b,ξ

J(w, b, ξ) =

w +



i=1

s.t. y

φ(x

) + b) = 1 − ξ

i = 1, 2, . . . , n

其中：ξ

是容错变量，γ 为目标权重，



i=1

为正则化项。

为求解上述优化问题，引入 Langrange 乘子 α，上述优化问题的 Langrange 函数为

L(w, b, ξ, α) = J(w, b, ξ) −



i=1

φ(x

) + b) − 1 + ξ

]

称 α

= 0 的样本点为支持向量。根据 KKT 条件可得到

∂L

∂w

= 0 ⇒ w =



i=1

φ(x

)

∂L

∂b

= 0 ⇒



i=1

= 0

∂L

∂ξ

= 0 ⇒ α

= γξ

i = 1, 2, . . . , n

∂L

∂α

= 0 ⇒ y

φ(x

) + b) − 1 + ξ

= 0 i = 1, 2, . . . , n

消元去掉 ξ

, w，可以得到下面的线性方程组







I 0 0 −Z

0 0 0 −y

0 0 cI −I

Z y I 0































http://www.ma-xy.com 14 http://www.ma-xy.com

http://www.ma-xy.com

第一章中级支持向量机 1.4 最小二乘支持向量机

最后写成矩阵的形式有



0 y

y ZZ

+ γ

−1









其中：Z = (φ(x

, . . . , φ(x

)

，y = (y

, . . . , y

)

，1

= (1, . . . , 1)

，α = (α

, . . . , α

)

。非

线性函数分类 ZZ

内积运算可用满足 Mercer 条件的核函数 k(x

, x

) 替代，令 Ω = ZZ

，则

Ω

= y

φ(x

)

φ(x

)

则上述方程可以表述为



0 y

y Ω + γ

−1









解上述方程得到 α, b 后，对于新的输入向量/样本 x，其分离可以根据下式进行判断

y(x) = sgn





i=1

k(x, x

) + b



1.4.2 回归问题

对于回归问题 {x

, y

}

i=1

, x

∈ R

, y

∈ R，在 Saunders, Gammerman 和 Vovk 所提出的岭

回归公式中，令 a =

，则最小二乘支持向量机的优化问题为

min

w,b,ξ

J(w, b, ξ) =

w +



i=1

s.t. y

(

) +

= 1

, . . . , n

上面优化问题的 Langrange 函数为

L(w, b, ξ, α) = J(w, b, ξ) −



i=1

φ(x

) + b + ξ

− y

)

相应的 KKT 条件为

∂L

∂w

= 0 ⇒ w =



i=1

φ(x

)

∂L

∂b

= 0 ⇒



i=1

= 0

∂L

∂ξ

= 0 ⇒ α

= γξ

i = 1, 2, . . . , n

∂L

∂α

= 0 ⇒ w

φ(x

) + b + ξ

− y

= 0 i = 1, 2, . . . , n

可以将其写为如下方程组的形式



0 1

Ω + γ

−1









http://www.ma-xy.com 15 http://www.ma-xy.com

http://www.ma-xy.com

1.5 MATLAB 支持向量机示例第一章中级支持向量机

1.5 MATLAB 支持向量机示例

1.5.1 支持向量机示例

MATLAB 支持二分类、多分类、单分类支持向量机以及支持向量回归。tcsvm 用于构建分

类支持向量机，trsvm 用于构建回归支持向量机，二者都支持 SMO, ISDA, or L1 soft-margin

minimization via quadratic programming, and supports many kernel functions,like ’rbf’,’linear’

and ’polynomial’, 并且支持自定义核。

1 %% 参考

2 % https : / / cn . mathworks . com/help / st a t s /support−vector−machines−for−binary−c l a s s i f i c a t i o n .

html

3 %% 二分类支持向量机 SVM

4 % 1 、构建数据

5 rng (1 ) ; % For r e p r o d u c i b i l i t y

6 r = s q rt ( rand (100 ,1) ) ;

7 t = 2* pi *rand (100 ,1 ) ;

8 data1 = [ r . * co s ( t ) , r . * s in ( t ) ] ;

9 r2 = s q rt (3* rand (1 0 0 , 1 ) +1) ;

10 t2 = 2* pi *rand (10 0 , 1 ) ;

11 data2 = [ r2 . * co s ( t2 ) , r2 .* s in ( t2 ) ] ;

12 X = [ data1 ; data2 ] ;

13 Y = ones (200 ,1 ) ;

14 Y( 1: 1 00 ) = −1;

16 % 2 、训练SVM

17 % 2.1 Train the SVM C l a s s i f i e r

18 SVM_twoclass = fitcsvm (X,Y, ’ KernelFunction ’ , ’ r bf ’ , . . .

19 ’ BoxConstraint ’ , Inf , ’ ClassNames ’ ,[ −1 ,1]) ;

21 d = 0 . 0 2 ;% 网格跨度

22 [ x1Grid , x2Grid ] = meshgrid ( min(X( : , 1 ) ) : d :max(X( : , 1 ) ) , . . .

23 min(X( : , 2 ) ) : d : max(X( : , 2 ) ) ) ;% x1Grid 是一个矩阵

24 xGrid = [ x1Grid ( : ) , x2Grid ( : ) ] ;

26 [~ , s co re s ] = p re di c t (SVM_twoclass , xGrid ) ;

27 % 2.2 使用自定义的核函数

28 % fun c tion G = mysigmoid (U,V)

29 % % Sigmoid k e rne l f u nctio n with s lop e gamma and i n t er c e p t c

30 % gamma = 1 ;

31 % c = −1;

32 % G = tanh (gamma*U*V’ + c ) ;

33 % end

34 SVM_twoclass_2 = fitcsvm (X,Y, ’ KernelFunction ’ , ’ mysigmoid ’ , ’ Standardize ’ , tr u e ) ;

35 [~ , s c or e s1 ] = p r ed ic t (SVM_twoclass_2 , xGrid) ;

37 % 3 、绘制数据和分类线

38 f i g u r e ;

39 h ( 1 : 2 ) = g s ca tt e r (X( : , 1 ) ,X( : , 2 ) ,Y, ’ rb ’ , ’ . ’ ) ;

40 hold on

41 ezp o lar (@(x) 1) ;% 单位圆：真实分类线

http://www.ma-xy.com 16 http://www.ma-xy.com

http://www.ma-xy.com

第一章中级支持向量机 1.5 MATLAB 支持向量机示例

42 h(3) = p l ot (X(SVM_twoclass . IsSupportVector ,1 ) , . . .

43 X(SVM_twoclass . IsSupportVector ,2 ) , ’ ko ’ ) ;% 支撑向量

44 contour ( x1Grid , x2Grid , reshape ( s c o r e s ( : , 2 ) , s i z e ( x1Grid ) ) , [ 0 0 ] , ’ : k ’ ) ;% 分类线

45 h(4) = p l ot (X(SVM_twoclass_2 . IsSupportVector , 1 ) , . . .

46 X(SVM_twoclass_2 . IsSupportVector , 2 ) , ’ ko ’ , ’ MarkerSize ’ ,10) ;

47 contour ( x1Grid , x2Grid , reshape ( sc o re s1 ( : , 2 ) , s i z e ( x1Grid ) ) , [ 0 0 ] , ’−−r ’ ) ;

48 t i t l e ( ’ S catt e r Diagram with the Decision Boundary ’ )

49 legend ({ ’−1 ’ , ’1 ’ , ’ Support Vectors ’ } , ’ Location ’ , ’ Best ’ ) ;

50 hold o f f

52 % 4 、分类错误率

53 CVMdl1 = c r os s v a l (SVM_twoclass_2) ;%10倍交叉验证

54 mis c las s 1 = k f oldLoss (CVMdl1) ;

56 %% 多分类支持向量机 SVMc

57 % 方法 1 ：

58 load f i s h e r i r i s

59 X = meas ( : , 3 : 4 ) ;

60 Y = s p e c i e s ;

61 f i g u r e

62 g sc at t e r (X( : , 1 ) ,X( : , 2 ) ,Y) ;

63 h = gca ;

64 li m s = [ h .XLim h .YLim ] ;

65 %

66 SVM_mulclass = c e l l (3 ,1) ;

67 c l a s s e s = unique (Y) ;

68 rng (1) ;

70 f o r j = 1 : numel ( c l a s s e s ) ;

71 indx = strcmp (Y, c l a s s e s ( j ) ) ; % Create binary c l a s s e s f o r each c l a s s i f i e r

72 SVM_mulclass{ j } = fitcsvm (X, indx , ’ ClassNames ’ , [ f a l s e true ] , ’ Standardize ’ , true , . . .

73 ’ KernelFunction ’ , ’ r bf ’ , ’ BoxConstraint ’ , 1) ;

74 end

76 d = 0 . 0 2 ;

77 [ x1Grid , x2Grid ] = meshgrid ( min(X( : , 1 ) ) : d :max(X( : , 1 ) ) , . . .

78 min(X( : , 2 ) ) : d : max(X( : , 2 ) ) ) ;

79 xGrid = [ x1Grid ( : ) , x2Grid ( : ) ] ;

80 N = s i z e ( xGrid , 1 ) ;

81 Sc ores = ze r os (N, numel ( c l a s s e s ) ) ;

83 f o r j = 1 : numel ( c l a s s e s ) ;

84 [ ~ , sc o re ] = pr ed i ct (SVM_mulclass{ j } , xGrid ) ;

85 S cores ( : , j ) = s c ore ( : , 2 ) ; % Second column co n t a i ns po s i t i v e−c l a s s s c o r e s

86 end

87 [ ~ , maxScore ] = max( Scores , [ ] , 2 ) ;

89 f i g u r e

90 h ( 1 : 3 ) = g s ca tt e r ( xGrid ( : , 1 ) , xGrid ( : , 2 ) , maxScore , . . .

91 [ 0 . 1 0 . 5 0 . 5 ; 0 .5 0.1 0 . 5 ; 0. 5 0 . 5 0 . 1 ] ) ;

92 hold on

93 h ( 4 : 6 ) = g s ca tt e r (X( : , 1 ) ,X( : , 2 ) ,Y) ;

94 t i t l e ( ’ {\ bf I r i s C l a s s i f i c a t i o n Regions } ’ ) ;

http://www.ma-xy.com 17 http://www.ma-xy.com

http://www.ma-xy.com

1.5 MATLAB 支持向量机示例第一章中级支持向量机

95 x l a b el ( ’ Peta l Length (cm) ’ ) ;

96 y l a b el ( ’ Peta l Width (cm) ’ ) ;

97 legend (h ,{ ’ s et o sa reg i o n ’ , ’ v e r s i c o l o r r e gion ’ , ’ v i r g i n i c a reg i on ’ , . . .

98 ’ observed se t osa ’ , ’ observed v e r s i c o l o r ’ , ’ observed v i r g i n i c a ’ } , . . .

99 ’ Location ’ , ’ Northwest ’ ) ;

100 ax is t i g h t

101 hold o f f

102

103 % 方法 2 ：使用ECOC多分类器（ECOC还可以使用其他的二分类器，比如 l o g i s t i c 回归）

104 % load f i s h e r i r i s

105 % X = meas ( : , 3 : 4 ) ;

106 % Y = s p e c i e s ;

107 rng (1) ; % For re p r o d u c i b i l i t y

108 t = templateSVM( ’ Standardize ’ ,1 , ’ KernelFunction ’ , ’ g aussian ’ ) ;

109 SVM_mulclass_ecoc = f i t c e c o c (X,Y, ’ Learners ’ , t , ’ F it Po st e r i or ’ , 1 , . . .

110 ’ ClassNames ’ ,{ ’ se t os a ’ , ’ v e r s i c o l o r ’ , ’ v i r g i n i c a ’ } , . . .

111 ’ Verbose ’ ,2 ) ;

112 [ l a be l , ~ , ~ , P os t er i or ] = resubPredict ( SVM_mulclass_ecoc , ’ Verbose ’ ,1) ;

113 SVM_mulclass_ecoc . BinaryLoss

114 idx = randsample ( s i z e (X, 1 ) , 10 ,1 ) ;

115 SVM_mulclass_ecoc . ClassNames

116 ta b le (Y( idx ) , l a b e l ( idx ) , P o st e ri o r ( idx , : ) , . . .

117 ’ VariableNames ’ ,{ ’ TrueLabel ’ , ’ PredLabel ’ , ’ P os te r io r ’ })

118 xMax = max(X) ;

119 xMin = min (X) ;

120 x1Pts = l i n s p ac e (xMin( 1 ) ,xMax(1) ) ;

121 x2Pts = l i n s p ac e (xMin( 2 ) ,xMax(2) ) ;

122 [ x1Grid , x2Grid ] = meshgrid ( x1Pts , x2Pts ) ;

123

124 [ ~ , ~ , ~ , Poster i orRegion ] = pr ed i ct ( SVM_mulclass_ecoc , [ x1Grid ( : ) , x2Grid ( : ) ] ) ;

125

126 f i g u r e ;

127 co n to u r f ( x1Grid , x2Grid , . . .

128 reshape (max( P osteriorRegion , [ ] , 2 ) , s i z e ( x1Grid , 1 ) , s i z e ( x1Grid , 2 ) ) ) ;

129 h = c o lor b ar ;

130 h . YLabel . S t r ing = ’Maximum p o s t e r i o r ’ ;

131 h . YLabel . FontSize = 15 ;

132 hold on

133 gh = g s ca tt e r (X( : , 1 ) ,X( : , 2 ) ,Y, ’ krk ’ , ’ *xd ’ , 8) ;

134 gh(2 ) . LineWidth = 2 ;

135 gh(3 ) . LineWidth = 2 ;

136 t i t l e ’ I r i s Pet al Measurements and Maximum P os te r io r ’ ;

137 ax is t i g h t

138 legend ( gh , ’ Location ’ , ’ NorthWest ’ )

139 hold o f f

140

141 %% 单分类支持向量机 SVC

142 load f i s h e r i r i s

143 X = meas ( : , 1 : 2 ) ;

144 y = ones ( s i z e (X, 1 ) , 1) ;

145

146 rng (1) ;

147 SVM_oneclass = f i tc s v m (X, y , ’ K e rnelSca le ’ , ’ auto ’ , ’ Stan dardize ’ , true , . . .

http://www.ma-xy.com 18 http://www.ma-xy.com

http://www.ma-xy.com

第一章中级支持向量机 1.5 MATLAB 支持向量机示例

148 ’ O utlie r Fract i o n ’ , 0 . 0 5 ) ;

149 svInd = SVM_oneclass . IsSupportVector ;

150 d = 0 . 0 2 ; % Mesh grid step s i z e

151 [X1,X2] = meshgrid (min(X( : , 1 ) ) : d :max(X( : , 1 ) ) , . . .

152 min(X( : , 2 ) ) : d : max(X( : , 2 ) ) ) ;

153 [ ~ , sc o re ] = pr ed i ct ( SVM_oneclass , [ X1( : ) ,X2 ( : ) ] ) ;

154 scoreG rid = reshape ( score , s i z e (X1, 1 ) , s i z e (X2, 2 ) ) ;

155

156 f i g u r e

157 pl o t (X( : , 1 ) ,X( : , 2 ) , ’k . ’ )

158 hold on

159 pl o t (X( svInd , 1 ) ,X( svInd , 2 ) , ’ ro ’ , ’ MarkerSize ’ ,10)

160 contour (X1,X2, scoreGrid )

161 co l o rba r ;

162 t i t l e ( ’ {\ bf I r i s O utli e r Detection via One−Cla ss SVM} ’ )

163 hold o f f

164

165 CVSVMModel = c r os s v a l (SVM_oneclass) ;

166 [ ~ , scorePred ] = k fo l dP r edi ct (CVSVMModel) ;

167 ou t li e rRa te = mean( scorePred <0)

168

169 %% 支持向量回归 SVR

170 load carsmall

171 rng ’ d ef au lt ’ % For r e p r o d u c i b i l i t y

172 X = [ Horsepower Weight ] ;

173 Y = MPG;

174 MdlGau = fi t r svm (X,Y, ’ Standardize ’ , true , ’KFold ’ ,5 , ’ KernelFunction ’ , ’ gau s sian ’ )

175 MdlGau . Trained

176 % Check the model f o r convergence .

177 MdlStd . ConvergenceInfo . Converged

178 % Compute the r e s u b s ti t u t i o n ( in−sample ) mean−squared e r ro r f o r the new model .

179 lS t d = resubLoss ( MdlStd)

180 mseGau = kfold L o s s (MdlGau)

181

1.5.2 使用 Bayesian 优化方法优化 SVM

下面这个例子是需要继续学习的。

1 %% 使用 Bayesian 优化方法优化交叉验证的SVM−−−−要进一步学习

2 % This example shows how to optimize an SVM c l a s s i f i c a t i o n .

3 % The c l a s s i f i c a t i o n works on lo c at i o n s o f p o ints from a Gaussian mixture model .

4 % In ”The Elements of S t a t i s t i c a l Learning ” , Hastie , Tibshirani , and Friedman (2009) ,

page 17 d e s c r ib es the model .

5 % The model b e g ins with ge nerating 10 base poin t s f o r a ”green ” c l a s s ,

6 % d i s t r i bu te d as 2−D independent normals with mean (1 , 0 ) and unit v a r i a n c e .

7 % I t a l s o genera t e s 10 base p o ints f o r a ” red ” cl as s ,

8 % d i s t r i bu te d as 2−D independent normals with mean (0 , 1 ) and unit v a r i a n c e .

9 % For each c l a s s ( green and red ) , generate 100 random p o i nts as f ol l o w s :

10 % 1 、 Choose a base point m o f the app r o p r i a t e c o l o r uni formly at random .

11 % 2 、 Generate an independent random po int with 2−D normal d i s t r i b u t i o n with mean m and

variance I /5 ,

http://www.ma-xy.com 19 http://www.ma-xy.com

http://www.ma-xy.com

1.5 MATLAB 支持向量机示例第一章中级支持向量机

12 % where I i s the 2−by−2 id e n t it y matrix .

13 % In t h i s example , use a varian c e I /50 to show the advantage of optimization more

c l e a r l y .

14 % After genera t i n g 100 green and 100 red points , c l a s s i f y them using fitcsvm .

15 % Then use bayesopt to optimize the parameters of the r e s u l t i n g SVM model with r es p ec t to

cr os s v al id at io n .

17 rng d e f a ul t

18 grnpop = mvnrnd ( [ 1 , 0 ] , eye ( 2 ) ,10) ;

19 redpop = mvnrnd ( [ 0 , 1 ] , eye ( 2 ) ,10) ;

20 pl o t ( grnpop ( : , 1 ) , grnpop ( : , 2 ) , ’ go ’ )

21 hold on

22 pl o t ( redpop ( : , 1 ) , redpop ( : , 2 ) , ’ ro ’ )

23 hold o f f

25 re dpts = ze r os (1 0 0 , 2 ) ; grnpts = r e d pts ;

26 f o r i = 1:100

27 grnpts ( i , : ) = mvnrnd( grnpop ( randi (10) , : ) , eye (2 ) *0.02 ) ;

28 r edpts ( i , : ) = mvnrnd( redpop ( randi (10) , : ) , eye (2) * 0 .02) ;

29 end

30 f i g u r e

31 pl o t ( grnpts ( : , 1 ) , grnpts ( : , 2 ) , ’ go ’ )

32 hold on

33 pl o t ( redp t s ( : , 1 ) , redpts ( : , 2 ) , ’ ro ’ )

34 hold o f f

35 cdata = [ grnpts ; redp t s ] ;

36 grp = ones ( 200 , 1 ) ;

37 % Green l a b e l 1 , red l a b e l −1

38 grp (101 : 2 00) = −1;

39 c = c v p a r t i t i o n (200 , ’KFold ’ , 1 0 ) ;

40 sigma = optimizabl e V a r i ab le ( ’ sigma ’ , [ 1 e −5,1e5 ] , ’ Transform ’ , ’ l og ’ ) ;

41 box = o p t i mi zableVariable ( ’box ’ , [ 1 e −5,1e5 ] , ’ Transform ’ , ’ l o g ’ ) ;

42 minfn = @( z ) k f o l dLoss ( fitcsvm ( cdata , grp , ’ CVPartition ’ , c , . . .

43 ’ KernelFunction ’ , ’ r bf ’ , ’ BoxConstraint ’ , z . box , . . .

44 ’ Kernel Scale ’ , z . sigma ) ) ;

45 r e s u l t s = bayesopt ( minfn , [ sigma , box ] , ’ Is O bj e ct i ve D et e rm i nis tic ’ , true , . . .

46 ’ AcquisitionFunctionName ’ , ’ expected−improvement−plus ’ )

47 % Use the r e s u l t s to t r a i n a new , optimized SVM c l a s s i f i e r .

48 z ( 1) = r e s u l t s . XAtMinObjective . sigma ;

49 z ( 2) = r e s u l t s . XAtMinObjective . box ;

50 SVM_oneclass = f i tc s v m ( cdata , grp , ’ KernelFunction ’ , ’ rb f ’ , . . .

51 ’ Kernel Scale ’ , z ( 1) , ’ BoxConstraint ’ , z ( 2 ) ) ;

53 % Plot the c l a s s i f i c a t i o n boundaries . To v i s u a l i z e the support v e c t o r c l a s s i f i e r , p r ed i ct

sc or es over a g rid .

55 d = 0 . 0 2 ;

56 [ x1Grid , x2Grid ] = meshgrid ( min( cdata ( : , 1 ) ) : d :max( cdata ( : , 1 ) ) , . . .

57 min( cdata ( : , 2 ) ) : d :max( cdata ( : , 2 ) ) ) ;

58 xGrid = [ x1Grid ( : ) , x2Grid ( : ) ] ;

59 [ ~ , s co re s ] = p re di c t ( SVM_oneclass , xGrid ) ;

61 h = nan( 3 ,1 ) ; % P r e a ll oc a t i o n

http://www.ma-xy.com 20 http://www.ma-xy.com

http://www.ma-xy.com

第一章中级支持向量机 1.5 MATLAB 支持向量机示例

62 f i g u r e ;

63 h ( 1 : 2 ) = g s ca tt e r ( cdata ( : , 1 ) , cdata ( : , 2 ) , grp , ’ rg ’ , ’+*’ ) ;

64 hold on

65 h(3) = p l ot ( cdata ( SVM_oneclass . IsSupportVector , 1 ) , . . .

66 cdata ( SVM_oneclass . IsSupportVector , 2 ) , ’ ko ’ ) ;

67 contour ( x1Grid , x2Grid , reshape ( s c o r e s ( : , 2 ) , s i z e ( x1Grid ) ) , [ 0 0 ] , ’ k ’ ) ;

68 legend (h ,{ ’−1 ’ , ’+1 ’ , ’ Support Vectors ’ } , ’ Location ’ , ’ Southeast ’ ) ;

69 ax is equal

70 hold o f f

71 % Generate and c l a s s i f y some new data points .

73 grnobj = g m d i s t r i b u t i o n ( grnpop , . 2 * eye (2 ) ) ;

74 r edobj = gmdistribution ( redpop , . 2 * eye ( 2 ) ) ;

76 newData = random ( grnobj ,1 0 ) ;

77 newData = [ newData ; random( redobj ,1 0 ) ] ;

78 grpData = ones ( 2 0 , 1) ;

79 grpData ( 11 : 20 ) = −1; % red = −1

81 v = p re d ic t ( SVM_oneclass , newData ) ;

83 g = nan ( 7 , 1 ) ;

84 f i g u r e ;

85 h ( 1 : 2 ) = g s ca tt e r ( cdata ( : , 1 ) , cdata ( : , 2 ) , grp , ’ rg ’ , ’+*’ ) ;

86 hold on

87 h ( 3 : 4 ) = g s ca tt e r ( newData ( : , 1 ) ,newData ( : , 2 ) ,v , ’mc ’ , ’ ** ’ ) ;

88 h(5) = p l ot ( cdata ( SVM_oneclass . IsSupportVector , 1 ) , . . .

89 cdata ( SVM_oneclass . IsSupportVector , 2 ) , ’ ko ’ ) ;

90 contour ( x1Grid , x2Grid , reshape ( s c o r e s ( : , 2 ) , s i z e ( x1Grid ) ) , [ 0 0 ] , ’ k ’ ) ;

91 legend (h ( 1 : 5 ) ,{ ’−1 ( t ra i n i n g ) ’ , ’+1 ( t r a i n i n g ) ’ , ’−1 ( c l a s s i f i e d ) ’ , . . .

92 ’+1 ( c l a s s i f i e d ) ’ , ’ Support Vectors ’ } , ’ Location ’ , ’ Southeast ’ ) ;

93 ax is equal

94 hold o f f

95 % See which new data points are c o r r e c t l y c l a s s i f i e d . C i r c l e the c o r r e c t l y c l a s s i f i e d

poin t s in red , and the i n c o r r e c t l y c l a s s i f i e d p o i nts i n black .

97 mydiff = ( v == grpData ) ; % C l a s s i f i e d c o r r e c t l y

98 f i g u r e ;

99 h ( 1 : 2 ) = g s ca tt e r ( cdata ( : , 1 ) , cdata ( : , 2 ) , grp , ’ rg ’ , ’+*’ ) ;

100 hold on

101 h ( 3 : 4 ) = g s ca tt e r ( newData ( : , 1 ) ,newData ( : , 2 ) ,v , ’mc ’ , ’ ** ’ ) ;

102 h(5) = p l ot ( cdata ( SVM_oneclass . IsSupportVector , 1 ) , . . .

103 cdata ( SVM_oneclass . IsSupportVector , 2 ) , ’ ko ’ ) ;

104 contour ( x1Grid , x2Grid , reshape ( s c o r e s ( : , 2 ) , s i z e ( x1Grid ) ) , [ 0 0 ] , ’ k ’ ) ;

105

106 f o r i i = mydiff % Plot red squares around c o r re ct pts

107 h (6) = p l ot ( newData ( i i , 1 ) , newData ( i i , 2 ) , ’ r s ’ , ’ MarkerSize ’ ,1 2 ) ;

108 end

109

110 f o r i i = not ( mydiff ) % Plot black squa r es around in c o r r e c t pts

111 h (7) = p l ot ( newData ( i i , 1 ) , newData ( i i , 2 ) , ’ ks ’ , ’ MarkerSize ’ ,1 2 ) ;

112 end

113 legend (h ,{ ’−1 ( t ra i n i n g ) ’ , ’+1 ( t r a i n i n g ) ’ , ’−1 ( c l a s s i f i e d ) ’ , . . .

http://www.ma-xy.com 21 http://www.ma-xy.com

http://www.ma-xy.com

1.6 PYTHON 支持向量机示例第一章中级支持向量机

114 ’+1 ( c l a s s i f i e d ) ’ , ’ Support Vectors ’ , ’ C orrectly C l a s s i f i e d ’ , . . .

115 ’ M i s c l a s s i f i e d ’ } , ’ Location ’ , ’ Southeast ’ ) ;

116 hold o f f

117 % Plot P os t er i or P r oba b ili t y Regions f o r SVM C l a s s i f i c a t i o n Models

118

1.6 Python 支持向量机示例

下面是 scikit-learn 工具包的 SVM 示例，此工具包仍然支持自定义核函数，并且提供了 SVC,

NuSVC(即上面提到的 ν-SVM) 以及 LinearSVC 三大类支持向量机。Internally, Scikit use libsvm

and liblinear to handle all computations. These libraries are wrapped using C and Cython.

1 # 1 、分类支持向量机

2 # SVC, NuSVC and LinearSVC are c l a s s e s capable o f performing multi−c l a s s c l a s s i f i c a t i o n

on a d a t a s e t .

3 from sk lea r n import svm

4 X = [ [ 0 , 0 ] , [ 1 , 1 ] ]

5 y = [ 0 , 1 ]

6 c l f = svm .SVC()

7 c l f . f i t (X, y)

8 SVC(C=1.0 , cache_size =200, class_weight=None , co e f 0 =0.0 ,

9 decision_function_shape=’ ovr ’ , degree =3, gamma=’ auto ’ , ker n el=’ rb f ’ ,

10 max_iter=−1, p r o b a b i l i t y=False , random_state=None , sh r inki n g=True ,

11 t o l =0.001 , verbose=Fal se )

12 c l f . pre di c t ( [ [ 2 . , 2 . ] ] )

13 c l f . support_vectors_ # get support v e cto r s

14 c l f . support_ # ge t i n d i c e s of support v ect o rs

15 c l f . n_support_ # get number of support v ect o rs f o r each c l a s s

17 # 2 、支持向量回归 SVR

18 import numpy as np

19 from sk lea r n . svm import SVR

20 # Generate sample data

21 X = np . sor t (5 * np . random . rand ( 4 0 , 1) , a x is =0)

22 y = np . si n (X) . ra v e l ( )

23 # Add no ise to ta rg e t s

24 y [ : : 5 ] += 3 * ( 0 . 5 − np . random . rand (8) )

25 # F it r e g r e s s i o n model

26 svr_rbf = SVR( ke r n el=’ rb f ’ , C=1e3 , gamma=0.1)

27 sv r _l i n = SVR( k e rne l=’ l i n e a r ’ , C=1e3 )

28 svr_poly = SVR( ke r nel=’ poly ’ , C=1e3 , degree =2)

29 y_rbf = svr_rbf . f i t (X, y ) . pre d ic t (X)

30 y_lin = svr_l i n . f i t (X, y) . p re di c t (X)

31 y_poly = svr_poly . f i t (X, y ) . pr ed i c t (X)

33 # 3 、单分类支持向量 SVC

34 import numpy as np

35 from sk lea r n import svm

36 xx , yy = np . meshgrid (np . l i ns pa ce (−5, 5 , 500) , np . l i n s pa ce (−5, 5 , 500) )

37 # Generate t r a i n data

http://www.ma-xy.com 22 http://www.ma-xy.com

http://www.ma-xy.com

第一章中级支持向量机 1.7 LIBSVM 和 LSSVM 简介

38 X = 0 .3 * np . random . randn (100 , 2)

39 X_train = np . r_ [X + 2 , X − 2 ]

40 # Generate some r e gu l ar novel obse r vati o n s

41 X = 0 .3 * np . random . randn (20 , 2)

42 X_test = np . r_ [X + 2 , X − 2 ]

43 # Generate some abnormal novel obse r v atio n s

44 X_outliers = np . random . uniform ( low=−4, high =4, s i z e =(20, 2) )

45 # f i t the model

46 c l f = svm . OneClassSVM(nu=0.1 , ker n el=” r b f ” , gamma=0.1)

47 c l f . f i t (X_train)

48 y_pred_train = c l f . p r ed ic t ( X_train)

49 y_pred_test = c l f . pre d ic t ( X_test)

50 y_pred_outliers = c l f . p r e di c t ( X_outliers )

51 n_error_train = y_pred_train [ y_pred_train == −1]. s i z e

52 n_error_test = y_pred_test [ y_pred_test == −1]. s i z e

53 n_ e r r o r _ o u t l i e rs = y_pred_outliers [ y_pred_outliers == 1 ] . s i z e

1.7 libsvm 和 LSSVM 简介

todo: 待补充。。。

http://www.ma-xy.com 23 http://www.ma-xy.com