#+TITLE: Notes on SVMs
#+AUTHOR: Dylan Holmes
#+SETUPFILE: ../../aurellem/org/setup.org
#+INCLUDE: ../../aurellem/org/level-0.org
#+MATHJAX: align:"left" mathml:t path:"http://www.aurellem.org/MathJax/MathJax.js"

* Question: How do you find $\vec{w}$, $b$, and the \alpha{}'s?

You should take a look at the class notes, for starters:
http://ai6034.mit.edu/fall12/images/SVM_and_Boosting.pdf

You can also take a look at @388 to see an example of (part of) a
worked problem.

When solving SVM problems, there are some useful equations to keep in
mind:

1. $\vec{w}\cdot \vec{x} + b = 0$ defines the boundary, and in
   particular $\vec{w}\cdot \vec{x} + b \geq 0$ defines the positive
   side of the boundary.
2. $\vec{w}\cdot \vec{x} + b = \pm 1$ defines the positive and
   negative gutters.
3. The distance between the gutters (the width of the margin) is
   $\text{margin-width} = \frac{2}{||\vec{w}||} = \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$.
4. $\sum_{\text{training data}} y_i \alpha_i = 0$
5. $\sum_{\text{training data}} y_i \alpha_i \vec{x}_i = \vec{w}$

The last two equations are the ones that can help you find the
$\alpha_i$. You may also find it useful to think of the $\alpha_i$ as
measuring \ldquo{}supportiveness\rdquo{}. This means, for example,
that:

- $\alpha_i$ is zero for non-support vectors, i.e. training points
  that do not determine the boundary, and which would not affect the
  placement of the boundary if deleted.
- When you compare two separate SVM problems, where the first has
  support vectors far from the boundary and the second has support
  vectors very close to the boundary, the latter support vectors have
  comparatively higher $\alpha_i$ values.
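
To make these relationships concrete, here is a small numerical
check. It is only a sketch: the two-point training set, the boundary
$y = 0$, and the hand-solved \alpha{} values are hypothetical
illustrations, not data from the notes.

#+BEGIN_SRC python
# Check equations (3), (4), and (5) on a hypothetical two-point
# training set whose decision boundary is the line y = 0.
import numpy as np

x = np.array([[0.0,  2.0],   # positive support vector, on y = +2
              [0.0, -2.0]])  # negative support vector, on y = -2
y = np.array([1.0, -1.0])
alpha = np.array([1/8, 1/8])  # "supportiveness" values, solved by hand

# Equation (4): the y_i alpha_i sum to zero.
print(np.dot(y, alpha))            # -> 0.0

# Equation (5): the y_i alpha_i x_i sum to w.
w = (y * alpha) @ x
print(w)                           # -> [0.  0.5]

# Equation (3): the margin width 2/||w|| matches the distance
# between the gutters y = +2 and y = -2.
print(2 / np.linalg.norm(w))       # -> 4.0
#+END_SRC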

** Can \alpha{} be zero for support vectors?

No, \alpha{} is zero for a point if and only if it's not a support
vector. You can see this from equation (5): only the points with
nonzero $\alpha_i$ contribute to $\vec{w}$.

** Are all points on the gutter support vectors? How do you tell whether a point is a support vector?

Not all points on the gutter are support vectors. To tell if a
training point is a support vector, imagine what would happen if you
deleted it --- would the decision boundary be different? If it would,
the point is a support vector. If it wouldn't, the point isn't a
support vector.
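
This deletion test is easy to carry out numerically. Here is a
minimal sketch using scikit-learn (an assumption of this example; the
notes themselves use no code), with a large $C$ so that the
soft-margin solver approximates a hard-margin SVM:

#+BEGIN_SRC python
# The "delete it and see" test: fit a linear SVM with and without one
# point, then compare the resulting boundaries.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 2.0], [1.0, 3.0], [0.0, -2.0], [1.0, -3.0]])
y = np.array([1, 1, -1, -1])

def boundary(X, y):
    clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin
    return clf.coef_[0], clf.intercept_[0]

w_all, b_all = boundary(X, y)
# Delete the point (1, 3), which lies beyond the positive gutter:
w_del, b_del = boundary(np.delete(X, 1, axis=0), np.delete(y, 1))

# The boundary is numerically unchanged, so (1, 3) was not a
# support vector.
print(np.allclose(w_all, w_del) and np.isclose(b_all, b_del))  # -> True
#+END_SRC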

* 2011 Q2 Part A

The equation for the line you drew in part A is $y = 0$, and points
will be classified as positive if $y\geq 0$.

# shouldn't points be classified as positive only if leq 1 + b dot w x?

Now in general, the equation for the decision boundary is
$\vec{w}\cdot \vec{x} + b = 0$, and points $\vec{x}$ are classified
as positive if $\vec{w}\cdot \vec{x} + b \geq 0$. To solve for
$\vec{w}$ and $b$, you just manipulate your inequality for the
boundary into this form:

$$y\geq 0$$
$$0x + 1y + 0 \geq 0$$
$$\begin{bmatrix}0&1\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 \geq 0$$

And so we have:

$$\underbrace{\begin{bmatrix}0&1\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$

...which is /almost/ right. The trouble is that we can multiply this
inequality by any positive constant $c>0$ and it will still be true
--- so, the right answer might not be the $\vec{w}$ and $b$ we found
above, but instead a /multiple/ of them. To fix this, we multiply the
inequality by $c>0$, and solve for $c$:

$$c\cdot (0x + 1y + 0) \geq c\cdot 0$$
$$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 \geq 0$$

One formula that you can use to solve for $c$ is the margin-width
formula; by examining the lines you drew, you see that the margin
width is *4*. Now, in general, the equation for margin width is

$$\text{margin-width} = \frac{2}{||\vec{w}||} = \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$$

So, for the second time, we use the technique of setting the quantity
you can see (margin width = 4) equal to the general formula (margin
width = $2/||\vec{w}||$) and solving for the parameters in the general
formula:

$$\frac{2}{||\vec{w}||} = 4$$

$$\frac{2}{\sqrt{\vec{w}\cdot \vec{w}}} = 4$$

$$\frac{4}{\vec{w}\cdot \vec{w}} = 16 \qquad \text{(squaring both sides)}$$

$$\vec{w}\cdot \vec{w} = \frac{1}{4}$$

$$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}0\\c\end{bmatrix} = \frac{1}{4}$$

(This uses the expression for $\vec{w}$ we got after we multiplied by
$c$ above.)

$$\begin{eqnarray*}
c^2 &=& \frac{1}{4}\\
c &=& \pm \frac{1}{2}\\
c &=& \frac{1}{2}
\end{eqnarray*}$$

We take the positive root because we require $c>0$; a negative $c$
would flip the inequality.

Returning to our inequality, we have

$$\begin{eqnarray*}\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\\
\begin{bmatrix}0&\frac{1}{2}\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\end{eqnarray*}$$

$$\underbrace{\begin{bmatrix}0&\frac{1}{2}\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$

And this is truly the right answer, now that we have solved for $c$.
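
As a quick numerical sanity check (a sketch; the gutter points below
are hypothetical points read off the drawn lines at $y = \pm 2$):

#+BEGIN_SRC python
# Verify the solved parameters w = [0, 1/2], b = 0 against the
# quantities we could read off the plot.
import numpy as np

w = np.array([0.0, 0.5])
b = 0.0

# The margin-width formula reproduces the width we measured (4):
print(2 / np.linalg.norm(w))           # -> 4.0

# Points on the gutters score exactly +1 and -1:
print(w @ np.array([7.0, 2.0]) + b)    # -> 1.0  (positive gutter, y = +2)
print(w @ np.array([-3.0, -2.0]) + b)  # -> -1.0 (negative gutter, y = -2)
#+END_SRC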

* What's the use of gutters?

The gutters are the regions where you have no hard data supporting
your classification, while the decision boundary is, as its name
implies, the cutoff for making a decision between the two
classifications. As such, you would likely have lower confidence in
any decision within the gutters. Still, a classifier needs to make a
decision one way or the other for all possible test data: making no
decision means you're automatically wrong in all cases, whereas even
a lousy classifier will get some of them right.

-----

Here are two more ways to look at it. First, and simplest: SVMs are
supposed to find the boundary with the widest margin. The width of
the margin is just the perpendicular distance between the gutters, so
the gutters are important for that reason.

Second, if you attempt to define the SVM problem geometrically,
i.e. put it into a form that can be solved by a computer, it looks
something like the following.

#+BEGIN_QUOTE
Find a boundary --- which is defined by a normal vector $\vec{w}$ and
an offset $b$ --- such that:
1. The boundary \ldquo{} $\vec{w}\cdot \vec{x} + b = 0$ \rdquo{}
   separates the positive and negative points
2. The width of the margin is as large as possible. Remember, the
   margin width is twice the distance between the boundary and the
   training point that is closest to the boundary; in other words,
   $$\text{margin-width} = \min_{\text{training data}} 2 y_i
   (\vec{w}\cdot \vec{x}_i + b)$$
#+END_QUOTE

(Note that this formula measures distance using the raw score
$\vec{w}\cdot \vec{x}_i + b$ rather than true geometric distance;
that loophole is exactly the problem described next.)

Unfortunately, the problem as stated has no unique solution, since if
you find a satisfactory pair $\langle \vec{w}, b \rangle$, then the
pair $\langle 3\vec{w}, 3b\rangle$ (for example) defines the same
boundary but has a larger margin. The problem is that any multiple of
a satisfactory pair yields another satisfactory pair. In essence,
we're just changing the units of measurement without materially
affecting the boundary.
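
You can watch this loophole in action with a few lines of code (a
sketch; the training set and parameters are the hypothetical ones
used earlier):

#+BEGIN_SRC python
# Scaling (w, b) leaves the boundary fixed but inflates the naive
# margin formula  min over training data of  2 y_i (w . x_i + b).
import numpy as np

X = np.array([[0.0, 2.0], [0.0, -2.0]])
y = np.array([1.0, -1.0])

def naive_margin(w, b):
    return min(2 * yi * (w @ xi + b) for xi, yi in zip(X, y))

w, b = np.array([0.0, 0.5]), 0.0
print(naive_margin(w, b))          # -> 2.0
print(naive_margin(3 * w, 3 * b))  # -> 6.0: same boundary, "wider" margin
#+END_SRC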

We can /remove this additional degree of freedom by adding another
constraint/ to the problem which establishes a sense of scale. For
example, we could require $\vec{w}$ to be a /unit/ normal vector,
i.e. we could require that $||\vec{w}|| = 1$. This fixes the problem
and gives SVMs a unique solution.

In 6.034, and elsewhere, we use an alternate constraint; we impose
the /gutter conditions/:

$\vec{w}\cdot \vec{x_+} + b = +1$ for positive training points
$\vec{x_+}$ on the positive gutter.

$\vec{w}\cdot \vec{x_-} + b = -1$ for negative training points
$\vec{x_-}$ on the negative gutter.

We often combine these into a single inequality which holds for /all/
training points $\langle \vec{x}_i, y_i\rangle$: $y_i (\vec{w}\cdot
\vec{x}_i + b) \geq 1$, with equality exactly on the gutters.

This constraint is enough to eliminate the extra degree of freedom
and give SVMs a unique solution. Moreover, when the gutter conditions
hold, you get a nice formula for the margin width:

$$\text{margin-width} = \frac{2}{||\vec{w}||}$$
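
To see where this formula comes from, pick any point $\vec{x}_+$ on
the positive gutter and any point $\vec{x}_-$ on the negative
gutter. Subtracting the two gutter conditions gives $\vec{w}\cdot
(\vec{x}_+ - \vec{x}_-) = 2$, and projecting $\vec{x}_+ - \vec{x}_-$
onto the unit normal $\vec{w}/||\vec{w}||$ measures the perpendicular
distance between the gutters:

$$\text{margin-width} = \frac{\vec{w}}{||\vec{w}||}\cdot(\vec{x}_+ - \vec{x}_-) = \frac{2}{||\vec{w}||}$$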

That's why gutters are important.