Mercurial > thoughts
comparison org/svm.org @ 85:41741f76b567
merge, and an idea about the power of ice.
author | Robert McIntyre <rlm@mit.edu> |
---|---|
date | Thu, 16 Jan 2014 10:37:11 -0500 |
parents | dc521fea9219 |
children |
comparison
equal
deleted
inserted
replaced
84:dfcbbb3d4b9a | 85:41741f76b567 |
---|---|
1 #+TITLE: Notes on SVMs | |
2 #+AUTHOR:Dylan Holmes | |
3 #+SETUPFILE: ../../aurellem/org/setup.org | |
4 #+INCLUDE: ../../aurellem/org/level-0.org | |
5 #+MATHJAX: align:"left" mathml:t path:"http://www.aurellem.org/MathJax/MathJax.js" | |
6 | |
7 * Question: How do you find $\vec{w}$, $b$, and the \alpha{}'s? | |
8 | |
9 You should take a look at the class notes, for starters: | |
10 http://ai6034.mit.edu/fall12/images/SVM_and_Boosting.pdf | |
11 | |
12 You can also take a look at @388 to see an example of (part of) a | |
13 worked problem. | |
14 | |
15 | |
16 When solving SVM problems, there are some useful equations to keep in | |
17 mind: | |
18 | |
19 1. $\vec{w}\cdot \vec{x} + b = 0$ defines the boundary, and in | |
20 particular $\vec{w}\cdot \vec{x} + b \geq 0$ defines the positive | |
21 side of the boundary. | |
22 2. $\vec{w}\cdot \vec{x} + b = \pm 1$ defines the positive and | |
23 negative gutters. | |
24 3. The distance between the gutters (the width of the margin) is | |
25 $\text{margin-width} = \frac{2}{||\vec{w}||} = | |
26 \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$ | |
27 4. $\sum_{\text{training data}} y_i \alpha_i = | |
28 0$ | |
29 5. $\sum_{\text{training data}} y_i \alpha_i \vec{x}_i = \vec{w}$ | |
30 | |
31 The last two equations are the ones that can help you find the | |
32 $\alpha_i$. You may also find it useful to think of the $\alpha_i$ | |
33 as measuring \ldquo{}supportiveness\rdquo{}. This means, for | |
34 example, that: - $\alpha_i$ is zero for non-support vectors, | |
35 i.e. training points that do not determine the boundary, and which | |
36 would not affect the placement of the boundary if deleted - When you | |
37 compare two separate SVM problems, where the first has support | |
38 vectors that are far from the boundary, and the second has support | |
39 vectors very close to the boundary, the latter support vectors have | |
40 comparatively higher $\alpha_i$ values. | |
41 | |
42 ** Can \alpha{} be zero for support vectors? | |
43 No, \alpha{} is zero for a point if and only if it's not a support | |
44 vector. You can see this because in equation (5), just the points | |
45 with nonzero alphas contribute to $\vec{w}$. | |
46 | |
47 ** Are all points on the gutter support vectors? How do you tell whether a point is a support vector? | |
48 | |
49 Not all points on the gutter are support vectors. To tell if a | |
50 training point is a support vector, imagine what would happen if you | |
51 deleted it --- would the decision boundary be different? If it would, | |
52 the point is a support vector. If it wouldn't, the point isn't a | |
53 support vector. | |
54 | |
55 * 2011 Q2 Part A | |
56 | |
57 The equation for the line you drew in part A is $y = 0$, and points | |
58 will be classified as positive if $y\geq 0$. | |
59 | |
60 # shouldn't points be classified as positive only if leq 1 + b dot w x? | |
61 Now in general, the equation for the decision boundary is | |
62 $\vec{w}\cdot \vec{x} + b = 0$, and points $\vec{x}$ are | |
63 classified as positive if $\vec{w}\cdot \vec{x} + b \geq 0$. To | |
64 solve for $\vec{w}$ and $b$, you just manipulate your inequation | |
65 for the boundary into this form: | |
66 | |
67 $$y\geq 0$$ | |
68 $$0x + 1y + 0 \geq 0$$ | |
69 $$\begin{bmatrix}0&1\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 \geq 0$$ | |
70 | |
71 And so we have: | |
72 | |
73 $$\underbrace{\begin{bmatrix}0&1\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$ | |
74 | |
75 ...which is /almost/ right. The trouble is that we can multiply this | |
76 inequation by any positive constant $c>0$ and it will still be true | |
77 --- so, the right answer might not be the $\vec{w}$ and $b$ we found | |
78 above, but instead a /multiple/ of them. To fix this, we multiply the | |
79 inequation by $c>0$, and solve for $c$: | |
80 | |
81 $$c\cdot (0x + 1y + 0) \geq c\cdot0$$ | |
82 $$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + | |
83 0 \geq 0$$ | |
84 | |
85 One formula that you can use to solve for $c$ is the margin-width | |
86 formula; by examining the lines you drew, you see that the margin | |
87 width is *4*. Now, in general, the equation for margin width is | |
88 | |
89 $$\text{margin-width} = \frac{2}{||\vec{w}||} = \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$$ | |
90 | |
91 So, for the second time, we use the technique of setting the quantity | |
92 you can see (margin width = 4) equal to the general formula (margin | |
93 width = 2/||w||) and solving for the parameters in the general | |
94 formula: | |
95 | |
96 $$\frac{2}{||w||} = 4$$ | |
97 | |
98 $$\frac{2}{\sqrt{\vec{w}\cdot \vec{w}}} = 4$$ | |
99 | |
100 $$\frac{4}{\vec{w}\cdot \vec{w}} = 16$$ (squaring both sides) | |
101 | |
102 $$\vec{w}\cdot \vec{w} = \frac{1}{4}$$ | |
103 | |
104 $$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}0\\c\end{bmatrix} | |
105 = \frac{1}{4}$$ | |
106 | |
107 (This uses the expression for $\vec{w}$ we got after | |
108 we multiplied by $c$ above.) | |
109 | |
110 $$\begin{eqnarray*} | |
111 c^2 &=& \frac{1}{4}\\ | |
112 c &=& \pm \frac{1}{2}\\ | |
113 c &=& \frac{1}{2}\end{eqnarray*}$$...because we require $c>0$ so that it doesn't flip the inequality. | |
114 | |
115 | |
116 Returning to our inequality, we have | |
117 | |
118 $$\begin{eqnarray*}\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\\ | |
119 \begin{bmatrix}0&\frac{1}{2}\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\end{eqnarray*}$$ | |
120 | |
121 $$\underbrace{\begin{bmatrix}0&\frac{1}{2}\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$ | |
122 | |
123 And this is truly the right answer, now that we have solved for $c$. | |
124 | |
125 | |
126 * What's the use of gutters? | |
127 The gutters are the regions where you have no hard data supporting | |
128 your classification, but the decision boundary is, as its name | |
129 implies, the cutoff for making a decision between the two | |
130 classifications. As such, you would likely have lower confidence in | |
131 any decision within the gutters, but a classifier needs to make a | |
132 decision one way or the other for all possible test data (there's no | |
133 reason not to do so, as making no decision means you're automatically | |
134 wrong in all cases, whereas even a lousy classifier will get some of | |
135 them right). | |
136 | |
137 ---- | |
138 | |
139 Here are two more ways to look at it. First of all, and simplest, SVMs | |
140 are supposed to find the boundary with the widest margin. The width of | |
141 the margin is just the perpendicular distance between the gutters, so | |
142 the gutters are important for that reason. | |
143 | |
144 Second of all, if you attempt to define the SVM problem | |
145 geometrically, i.e. put it into a form that can be solved by a | |
146 computer, it looks something like the following. | |
147 | |
148 #+BEGIN_QUOTE | |
149 Find a boundary --- which is defined by a normal vector | |
150 $\vec{w}$ and an offset $b$ --- such that: | |
151 1. The boundary \ldquo{} $\vec{w}\cdot \vec{x} + b = 0$ \rdquo{} | |
152 separates the positive and negative points | |
153 2. The width of the margin is as large as possible. Remember, the | |
154 margin width is twice the distance between the boundary and the | |
155 training point that is closest to the boundary; in other words, | |
156 $$\text{margin-width} = \min_{\text{training data}} 2 y_i | |
157 (\vec{w}\cdot \vec{x}_i + b)$$ | |
158 #+END_QUOTE | |
159 | |
160 Unfortunately, the problem as stated has no unique solution, since if | |
161 you find a satisfactory pair $\langle \vec{w}, b \rangle$ , then the | |
162 pair $\langle 3\vec{w}, 3b\rangle$ (for example) defines the same | |
163 boundary but has a larger margin. The problem is that any multiple of | |
164 a satisfactory pair yields another satisfactory pair. In essence, | |
165 we're just changing the units of measurement without materially | |
166 affecting the boundary. | |
167 | |
168 We can /remove this additional degree of freedom by adding another | |
169 constraint/ to the problem which establishes a sense of scale. For | |
170 example, we could require $\vec{w}$ to be a /unit/ normal vector, | |
171 i.e. we could require that $||\vec{w}|| = 1$. This fixes the problem | |
172 and gives SVMs a unique solution. | |
173 | |
174 In 6.034, and elsewhere, we use an alternate constraint; we impose the /gutter conditions/: | |
175 | |
176 $\vec{w}\cdot \vec{x_+} + b = + 1$ for positive training points | |
177 $\vec{x_+}$. | |
178 | |
179 $\vec{w}\cdot \vec{x_-} + b = -1$ for negative | |
180 training points $\vec{x_-}$. | |
181 | |
182 which we often combine into a single inequality: $y_i (\vec{w}\cdot | |
183 \vec{x}_i + b) \geqslant 1$ for all training points $\langle | |
184 \vec{x}_i, y_i\rangle$. | |
185 | |
186 | |
187 This constraint is enough to eliminate the extra degree of freedom and give SVMs a unique solution. Moreover, you get a nice formula for the margin width by requiring that it holds: | |
188 | |
189 $\text{margin-width} = \frac{2}{||\vec{w}||}$ because the gutter | |
190 conditions are enforced. | |
191 | |
192 | |
193 That's why gutters are important. |