rlm@71
|
1 #+TITLE: Notes on SVMs
|
rlm@71
|
2 #+AUTHOR:Dylan Holmes
|
rlm@71
|
3 #+SETUPFILE: ../../aurellem/org/setup.org
|
rlm@71
|
4 #+INCLUDE: ../../aurellem/org/level-0.org
|
rlm@71
|
5 #+MATHJAX: align:"left" mathml:t path:"http://www.aurellem.org/MathJax/MathJax.js"
|
rlm@71
|
6
|
rlm@71
|
7 * Question: How do you find $\vec{w}$, $b$, and the \alpha{}'s?
|
rlm@71
|
8
|
rlm@71
|
9 You should take a look at the class notes, for starters:
|
rlm@71
|
10 http://ai6034.mit.edu/fall12/images/SVM_and_Boosting.pdf
|
rlm@71
|
11
|
rlm@71
|
12 You can also take a look at @388 to see an example of (part of) a
|
rlm@71
|
13 worked problem.
|
rlm@71
|
14
|
rlm@71
|
15
|
rlm@71
|
16 When solving SVM problems, there are some useful equations to keep in
|
rlm@71
|
17 mind:
|
rlm@71
|
18
|
rlm@71
|
19 1. $\vec{w}\cdot \vec{x} + b = 0$ defines the boundary, and in
|
rlm@71
|
20 particular $\vec{w}\cdot \vec{x} + b \geq 0$ defines the positive
|
rlm@71
|
21 side of the boundary.
|
rlm@71
|
22 2. $\vec{w}\cdot \vec{x} + b = \pm 1$ defines the positive and
|
rlm@71
|
23 negative gutters.
|
rlm@71
|
24 3. The distance between the gutters (the width of the margin) is
|
rlm@71
|
25 $\text{margin-width} = \frac{2}{||\vec{w}||} =
|
rlm@71
|
26 \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$
|
rlm@71
|
27 4. $\sum_{\text{training data}} y_i \alpha_i =
|
rlm@71
|
28 0$
|
rlm@71
|
29 5. $\sum_{\text{training data}} y_i \alpha_i \vec{x}_i = \vec{w}$
|
rlm@71
|
30
|
rlm@71
|
31 The last two equations are the ones that can help you find the
|
rlm@71
|
32 $\alpha_i$. You may also find it useful to think of the $\alpha_i$
|
rlm@71
|
33 as measuring \ldquo{}supportiveness\rdquo{}. This means, for
|
rlm@71
|
34 example, that: - $\alpha_i$ is zero for non-support vectors,
|
rlm@71
|
35 i.e. training points that do not determine the boundary, and which
|
rlm@71
|
36 would not affect the placement of the boundary if deleted - When you
|
rlm@71
|
37 compare two separate SVM problems, where the first has support
|
rlm@71
|
38 vectors that are far from the boundary, and the second has support
|
rlm@71
|
39 vectors very close to the boundary, the latter support vectors have
|
rlm@71
|
40 comparatively higher $\alpha_i$ values.
|
rlm@71
|
41
|
rlm@71
|
42 ** Can \alpha{} be zero for support vectors?
|
rlm@71
|
43 No, \alpha{} is zero for a point if and only if it's not a support
|
rlm@71
|
44 vector. You can see this because in equation (5), just the points
|
rlm@71
|
45 with nonzero alphas contribute to $\vec{w}$.
|
rlm@71
|
46
|
rlm@71
|
47 ** Are all points on the gutter support vectors? How do you tell whether a point is a support vector?
|
rlm@71
|
48
|
rlm@71
|
49 Not all points on the gutter are support vectors. To tell if a
|
rlm@71
|
50 training point is a support vector, imagine what would happen if you
|
rlm@71
|
51 deleted it --- would the decision boundary be different? If it would,
|
rlm@71
|
52 the point is a support vector. If it wouldn't, the point isn't a
|
rlm@71
|
53 support vector.
|
rlm@71
|
54
|
rlm@71
|
55 * 2011 Q2 Part A
|
rlm@71
|
56
|
rlm@71
|
57 The equation for the line you drew in part A is $y = 0$, and points
|
rlm@71
|
58 will be classified as positive if $y\geq 0$.
|
rlm@71
|
59
|
rlm@71
|
60 # shouldn't points be classified as positive only if leq 1 + b dot w x?
|
rlm@71
|
61 Now in general, the equation for the decision boundary is
|
rlm@71
|
62 $\vec{w}\cdot \vec{x} + b = 0$, and points $\vec{x}$ are
|
rlm@71
|
63 classified as positive if $\vec{w}\cdot \vec{x} + b \geq 0$. To
|
rlm@71
|
64 solve for $\vec{w}$ and $b$, you just manipulate your inequation
|
rlm@71
|
65 for the boundary into this form:
|
rlm@71
|
66
|
rlm@71
|
67 $$y\geq 0$$
|
rlm@71
|
68 $$0x + 1y + 0 \geq 0$$
|
rlm@71
|
69 $$\begin{bmatrix}0&1\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 \geq 0$$
|
rlm@71
|
70
|
rlm@71
|
71 And so we have:
|
rlm@71
|
72
|
rlm@71
|
73 $$\underbrace{\begin{bmatrix}0&1\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$
|
rlm@71
|
74
|
rlm@71
|
75 ...which is /almost/ right. The trouble is that we can multiply this
|
rlm@71
|
76 inequation by any positive constant $c>0$ and it will still be true
|
rlm@71
|
77 --- so, the right answer might not be the $\vec{w}$ and $b$ we found
|
rlm@71
|
78 above, but instead a /multiple/ of them. To fix this, we multiply the
|
rlm@71
|
79 inequation by $c>0$, and solve for $c$:
|
rlm@71
|
80
|
rlm@71
|
81 $$c\cdot (0x + 1y + 0) \geq c\cdot0$$
|
rlm@71
|
82 $$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} +
|
rlm@71
|
83 0 \geq 0$$
|
rlm@71
|
84
|
rlm@71
|
85 One formula that you can use to solve for $c$ is the margin-width
|
rlm@71
|
86 formula; by examining the lines you drew, you see that the margin
|
rlm@71
|
87 width is *4*. Now, in general, the equation for margin width is
|
rlm@71
|
88
|
rlm@71
|
89 $$\text{margin-width} = \frac{2}{||\vec{w}||} = \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$$
|
rlm@71
|
90
|
rlm@71
|
91 So, for the second time, we use the technique of setting the quantity
|
rlm@71
|
92 you can see (margin width = 4) equal to the general formula (margin
|
rlm@71
|
93 width = 2/||w||) and solving for the parameters in the general
|
rlm@71
|
94 formula:
|
rlm@71
|
95
|
rlm@71
|
96 $$\frac{2}{||w||} = 4$$
|
rlm@71
|
97
|
rlm@71
|
98 $$\frac{2}{\sqrt{\vec{w}\cdot \vec{w}}} = 4$$
|
rlm@71
|
99
|
rlm@71
|
100 $$\frac{4}{\vec{w}\cdot \vec{w}} = 16$$ (squaring both sides)
|
rlm@71
|
101
|
rlm@71
|
102 $$\vec{w}\cdot \vec{w} = \frac{1}{4}$$
|
rlm@71
|
103
|
rlm@71
|
104 $$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}0\\c\end{bmatrix}
|
rlm@71
|
105 = \frac{1}{4}$$
|
rlm@71
|
106
|
rlm@71
|
107 (This uses the expression for $\vec{w}$ we got after
|
rlm@71
|
108 we multiplied by $c$ above.)
|
rlm@71
|
109
|
rlm@71
|
110 $$\begin{eqnarray*}
|
rlm@71
|
111 c^2 &=& \frac{1}{4}\\
|
rlm@71
|
112 c &=& \pm \frac{1}{2}\\
|
rlm@71
|
113 c &=& \frac{1}{2}\end{eqnarray*}$$...because we require $c>0$ so that it doesn't flip the inequality.
|
rlm@71
|
114
|
rlm@71
|
115
|
rlm@71
|
116 Returning to our inequality, we have
|
rlm@71
|
117
|
rlm@71
|
118 $$\begin{eqnarray*}\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\\
|
rlm@71
|
119 \begin{bmatrix}0&\frac{1}{2}\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\end{eqnarray*}$$
|
rlm@71
|
120
|
rlm@71
|
121 $$\underbrace{\begin{bmatrix}0&\frac{1}{2}\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$
|
rlm@71
|
122
|
rlm@71
|
123 And this is truly the right answer, now that we have solved for $c$.
|
rlm@71
|
124
|
rlm@71
|
125
|
rlm@71
|
126 * What's the use of gutters?
|
rlm@71
|
127 The gutters are the regions where you have no hard data supporting
|
rlm@71
|
128 your classification, but the decision boundary is, as its name
|
rlm@71
|
129 implies, the cutoff for making a decision between the two
|
rlm@71
|
130 classifications. As such, you would likely have lower confidence in
|
rlm@71
|
131 any decision within the gutters, but a classifier needs to make a
|
rlm@71
|
132 decision one way or the other for all possible test data (there's no
|
rlm@71
|
133 reason not to do so, as making no decision means you're automatically
|
rlm@71
|
134 wrong in all cases, whereas even a lousy classifier will get some of
|
rlm@71
|
135 them right).
|
rlm@71
|
136
|
rlm@71
|
137 ----
|
rlm@71
|
138
|
rlm@71
|
139 Here are two more ways to look at it. First of all, and simplest, SVMs
|
rlm@71
|
140 are supposed to find the boundary with the widest margin. The width of
|
rlm@71
|
141 the margin is just the perpendicular distance between the gutters, so
|
rlm@71
|
142 the gutters are important for that reason.
|
rlm@71
|
143
|
rlm@71
|
144 Second of all, if you attempt to define the SVM problem
|
rlm@71
|
145 geometrically, i.e. put it into a form that can be solved by a
|
rlm@71
|
146 computer, it looks something like the following.
|
rlm@71
|
147
|
rlm@71
|
148 #+BEGIN_QUOTE
|
rlm@71
|
149 Find a boundary --- which is defined by a normal vector
|
rlm@71
|
150 $\vec{w}$ and an offset $b$ --- such that:
|
rlm@71
|
151 1. The boundary \ldquo{} $\vec{w}\cdot \vec{x} + b = 0$ \rdquo{}
|
rlm@71
|
152 separates the positive and negative points
|
rlm@71
|
153 2. The width of the margin is as large as possible. Remember, the
|
rlm@71
|
154 margin width is twice the distance between the boundary and the
|
rlm@71
|
155 training point that is closest to the boundary; in other words,
|
rlm@71
|
156 $$\text{margin-width} = \min_{\text{training data}} 2 y_i
|
rlm@71
|
157 (\vec{w}\cdot \vec{x}_i + b)$$
|
rlm@71
|
158 #+END_QUOTE
|
rlm@71
|
159
|
rlm@71
|
160 Unfortunately, the problem as stated has no unique solution, since if
|
rlm@71
|
161 you find a satisfactory pair $\langle \vec{w}, b \rangle$ , then the
|
rlm@71
|
162 pair $\langle 3\vec{w}, 3b\rangle$ (for example) defines the same
|
rlm@71
|
163 boundary but has a larger margin. The problem is that any multiple of
|
rlm@71
|
164 a satisfactory pair yields another satisfactory pair. In essence,
|
rlm@71
|
165 we're just changing the units of measurement without materially
|
rlm@71
|
166 affecting the boundary.
|
rlm@71
|
167
|
rlm@71
|
168 We can /remove this additional degree of freedom by adding another
|
rlm@71
|
169 constraint/ to the problem which establishes a sense of scale. For
|
rlm@71
|
170 example, we could require $\vec{w}$ to be a /unit/ normal vector,
|
rlm@71
|
171 i.e. we could require that $||\vec{w}|| = 1$. This fixes the problem
|
rlm@71
|
172 and gives SVMs a unique solution.
|
rlm@71
|
173
|
rlm@71
|
174 In 6.034, and elsewhere, we use an alternate constraint; we impose the /gutter conditions/:
|
rlm@71
|
175
|
rlm@71
|
176 $\vec{w}\cdot \vec{x_+} + b = + 1$ for positive training points
|
rlm@71
|
177 $\vec{x_+}$.
|
rlm@71
|
178
|
rlm@71
|
179 $\vec{w}\cdot \vec{x_-} + b = -1$ for negative
|
rlm@71
|
180 training points $\vec{x_-}$.
|
rlm@71
|
181
|
rlm@71
|
182 which we often combine into a single inequality: $y_i (\vec{w}\cdot
|
rlm@71
|
183 \vec{x}_i + b) \geqslant 1$ for all training points $\langle
|
rlm@71
|
184 \vec{x}_i, y_i\rangle$.
|
rlm@71
|
185
|
rlm@71
|
186
|
rlm@71
|
187 This constraint is enough to eliminate the extra degree of freedom and give SVMs a unique solution. Moreover, you get a nice formula for the margin width by requiring that it holds:
|
rlm@71
|
188
|
rlm@71
|
189 $\text{margin-width} = \frac{2}{||\vec{w}||}$ because the gutter
|
rlm@71
|
190 conditions are enforced.
|
rlm@71
|
191
|
rlm@71
|
192
|
rlm@71
|
193 That's why gutters are important.
|