Mercurial > thoughts
view org/svm.org @ 103:6c504a6a4413
organize minsky material into its own section.
author | rlm |
---|---|
date | Tue, 03 Jun 2014 12:05:34 -0400 |
parents | dc521fea9219 |
children |
line wrap: on
line source
1 #+TITLE: Notes on SVMs2 #+AUTHOR:Dylan Holmes3 #+SETUPFILE: ../../aurellem/org/setup.org4 #+INCLUDE: ../../aurellem/org/level-0.org5 #+MATHJAX: align:"left" mathml:t path:"http://www.aurellem.org/MathJax/MathJax.js"7 * Question: How do you find $\vec{w}$, $b$, and the \alpha{}'s?9 You should take a look at the class notes, for starters:10 http://ai6034.mit.edu/fall12/images/SVM_and_Boosting.pdf12 You can also take a look at @388 to see an example of (part of) a13 worked problem.16 When solving SVM problems, there are some useful equations to keep in17 mind:19 1. $\vec{w}\cdot \vec{x} + b = 0$ defines the boundary, and in20 particular $\vec{w}\cdot \vec{x} + b \geq 0$ defines the positive21 side of the boundary.22 2. $\vec{w}\cdot \vec{x} + b = \pm 1$ defines the positive and23 negative gutters.24 3. The distance between the gutters (the width of the margin) is25 $\text{margin-width} = \frac{2}{||\vec{w}||} =26 \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$27 4. $\sum_{\text{training data}} y_i \alpha_i =28 0$29 5. $\sum_{\text{training data}} y_i \alpha_i \vec{x}_i = \vec{w}$31 The last two equations are the ones that can help you find the32 $\alpha_i$. You may also find it useful to think of the $\alpha_i$33 as measuring \ldquo{}supportiveness\rdquo{}. This means, for34 example, that: - $\alpha_i$ is zero for non-support vectors,35 i.e. training points that do not determine the boundary, and which36 would not affect the placement of the boundary if deleted - When you37 compare two separate SVM problems, where the first has support38 vectors that are far from the boundary, and the second has support39 vectors very close to the boundary, the latter support vectors have40 comparatively higher $\alpha_i$ values.42 ** Can \alpha{} be zero for support vectors?43 No, \alpha{} is zero for a point if and only if it's not a support44 vector. You can see this because in equation (5), just the points45 with nonzero alphas contribute to $\vec{w}$.47 ** Are all points on the gutter support vectors? How do you tell whether a point is a support vector?49 Not all points on the gutter are support vectors. To tell if a50 training point is a support vector, imagine what would happen if you51 deleted it --- would the decision boundary be different? If it would,52 the point is a support vector. If it wouldn't, the point isn't a53 support vector.55 * 2011 Q2 Part A57 The equation for the line you drew in part A is $y = 0$, and points58 will be classified as positive if $y\geq 0$.60 # shouldn't points be classified as positive only if leq 1 + b dot w x?61 Now in general, the equation for the decision boundary is62 $\vec{w}\cdot \vec{x} + b = 0$, and points $\vec{x}$ are63 classified as positive if $\vec{w}\cdot \vec{x} + b \geq 0$. To64 solve for $\vec{w}$ and $b$, you just manipulate your inequation65 for the boundary into this form:67 $$y\geq 0$$68 $$0x + 1y + 0 \geq 0$$69 $$\begin{bmatrix}0&1\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 \geq 0$$71 And so we have:73 $$\underbrace{\begin{bmatrix}0&1\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$75 ...which is /almost/ right. The trouble is that we can multiply this76 inequation by any positive constant $c>0$ and it will still be true77 --- so, the right answer might not be the $\vec{w}$ and $b$ we found78 above, but instead a /multiple/ of them. To fix this, we multiply the79 inequation by $c>0$, and solve for $c$:81 $$c\cdot (0x + 1y + 0) \geq c\cdot0$$82 $$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} +83 0 \geq 0$$85 One formula that you can use to solve for $c$ is the margin-width86 formula; by examining the lines you drew, you see that the margin87 width is *4*. Now, in general, the equation for margin width is89 $$\text{margin-width} = \frac{2}{||\vec{w}||} = \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$$91 So, for the second time, we use the technique of setting the quantity92 you can see (margin width = 4) equal to the general formula (margin93 width = 2/||w||) and solving for the parameters in the general94 formula:96 $$\frac{2}{||w||} = 4$$98 $$\frac{2}{\sqrt{\vec{w}\cdot \vec{w}}} = 4$$100 $$\frac{4}{\vec{w}\cdot \vec{w}} = 16$$ (squaring both sides)102 $$\vec{w}\cdot \vec{w} = \frac{1}{4}$$104 $$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}0\\c\end{bmatrix}105 = \frac{1}{4}$$107 (This uses the expression for $\vec{w}$ we got after108 we multiplied by $c$ above.)110 $$\begin{eqnarray*}111 c^2 &=& \frac{1}{4}\\112 c &=& \pm \frac{1}{2}\\113 c &=& \frac{1}{2}\end{eqnarray*}$$...because we require $c>0$ so that it doesn't flip the inequality.116 Returning to our inequality, we have118 $$\begin{eqnarray*}\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\\119 \begin{bmatrix}0&\frac{1}{2}\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\end{eqnarray*}$$121 $$\underbrace{\begin{bmatrix}0&\frac{1}{2}\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$123 And this is truly the right answer, now that we have solved for $c$.126 * What's the use of gutters?127 The gutters are the regions where you have no hard data supporting128 your classification, but the decision boundary is, as its name129 implies, the cutoff for making a decision between the two130 classifications. As such, you would likely have lower confidence in131 any decision within the gutters, but a classifier needs to make a132 decision one way or the other for all possible test data (there's no133 reason not to do so, as making no decision means you're automatically134 wrong in all cases, whereas even a lousy classifier will get some of135 them right).137 ----139 Here are two more ways to look at it. First of all, and simplest, SVMs140 are supposed to find the boundary with the widest margin. The width of141 the margin is just the perpendicular distance between the gutters, so142 the gutters are important for that reason.144 Second of all, if you attempt to define the SVM problem145 geometrically, i.e. put it into a form that can be solved by a146 computer, it looks something like the following.148 #+BEGIN_QUOTE149 Find a boundary --- which is defined by a normal vector150 $\vec{w}$ and an offset $b$ --- such that:151 1. The boundary \ldquo{} $\vec{w}\cdot \vec{x} + b = 0$ \rdquo{}152 separates the positive and negative points153 2. The width of the margin is as large as possible. Remember, the154 margin width is twice the distance between the boundary and the155 training point that is closest to the boundary; in other words,156 $$\text{margin-width} = \min_{\text{training data}} 2 y_i157 (\vec{w}\cdot \vec{x}_i + b)$$158 #+END_QUOTE160 Unfortunately, the problem as stated has no unique solution, since if161 you find a satisfactory pair $\langle \vec{w}, b \rangle$ , then the162 pair $\langle 3\vec{w}, 3b\rangle$ (for example) defines the same163 boundary but has a larger margin. The problem is that any multiple of164 a satisfactory pair yields another satisfactory pair. In essence,165 we're just changing the units of measurement without materially166 affecting the boundary.168 We can /remove this additional degree of freedom by adding another169 constraint/ to the problem which establishes a sense of scale. For170 example, we could require $\vec{w}$ to be a /unit/ normal vector,171 i.e. we could require that $||\vec{w}|| = 1$. This fixes the problem172 and gives SVMs a unique solution.174 In 6.034, and elsewhere, we use an alternate constraint; we impose the /gutter conditions/:176 $\vec{w}\cdot \vec{x_+} + b = + 1$ for positive training points177 $\vec{x_+}$.179 $\vec{w}\cdot \vec{x_-} + b = -1$ for negative180 training points $\vec{x_-}$.182 which we often combine into a single inequality: $y_i (\vec{w}\cdot183 \vec{x}_i + b) \geqslant 1$ for all training points $\langle184 \vec{x}_i, y_i\rangle$.187 This constraint is enough to eliminate the extra degree of freedom and give SVMs a unique solution. Moreover, you get a nice formula for the margin width by requiring that it holds:189 $\text{margin-width} = \frac{2}{||\vec{w}||}$ because the gutter190 conditions are enforced.193 That's why gutters are important.