#+TITLE: Notes on SVMs
#+AUTHOR: Dylan Holmes
#+SETUPFILE: ../../aurellem/org/setup.org
#+INCLUDE: ../../aurellem/org/level-0.org
#+MATHJAX: align:"left" mathml:t path:"http://www.aurellem.org/MathJax/MathJax.js"

* Question: How do you find $\vec{w}$, $b$, and the \alpha{}'s?

You should take a look at the class notes, for starters:
http://ai6034.mit.edu/fall12/images/SVM_and_Boosting.pdf

You can also take a look at @388 to see an example of (part of) a
worked problem.

When solving SVM problems, there are some useful equations to keep in
mind:

1. $\vec{w}\cdot \vec{x} + b = 0$ defines the boundary, and in
   particular $\vec{w}\cdot \vec{x} + b \geq 0$ defines the positive
   side of the boundary.
2. $\vec{w}\cdot \vec{x} + b = \pm 1$ defines the positive and
   negative gutters.
3. The distance between the gutters (the width of the margin) is
   $\text{margin-width} = \frac{2}{||\vec{w}||} = \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$
4. $\sum_{\text{training data}} y_i \alpha_i = 0$
5. $\sum_{\text{training data}} y_i \alpha_i \vec{x}_i = \vec{w}$

The last two equations are the ones that can help you find the
$\alpha_i$. You may also find it useful to think of the $\alpha_i$ as
measuring \ldquo{}supportiveness\rdquo{}. This means, for example,
that:

- $\alpha_i$ is zero for non-support vectors, i.e. training points
  that do not determine the boundary, and which would not affect the
  placement of the boundary if deleted.
- When you compare two separate SVM problems, where the first has
  support vectors that are far from the boundary, and the second has
  support vectors very close to the boundary, the latter support
  vectors have comparatively higher $\alpha_i$ values.
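As a quick numerical sanity check (not part of the original notes),
here is a sketch that verifies equations (3)--(5) and the gutter
conditions on a hypothetical toy problem with two support vectors and
hand-computed \alpha{} values:

#+BEGIN_SRC python
# Toy problem (hypothetical): one positive support vector at (0, 2),
# one negative support vector at (0, -2), alphas solved by hand.
import numpy as np

xs = np.array([[0.0, 2.0], [0.0, -2.0]])   # training points
ys = np.array([+1.0, -1.0])                # labels
alphas = np.array([1/8, 1/8])              # "supportiveness" values
b = 0.0

# Equation (4): the signed alphas sum to zero.
assert np.isclose(np.sum(ys * alphas), 0.0)

# Equation (5): w is the alpha-weighted sum of the training points.
w = np.sum((ys * alphas)[:, None] * xs, axis=0)
print(w)                         # [0.  0.5]

# Equation (3): margin width is 2/||w||.
print(2 / np.linalg.norm(w))     # 4.0

# Gutter conditions: y_i (w . x_i + b) = 1 for the support vectors.
print(ys * (xs @ w + b))         # [1. 1.]
#+END_SRC

This is the same boundary as in the 2011 Q2 Part A problem below, so
you can compare the numbers against that worked derivation.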
** Can \alpha{} be zero for support vectors?

No, \alpha{} is zero for a point if and only if it's not a support
vector. You can see this because in equation (5), only the points
with nonzero alphas contribute to $\vec{w}$.

** Are all points on the gutter support vectors? How do you tell whether a point is a support vector?

Not all points on the gutter are support vectors. To tell whether a
training point is a support vector, imagine what would happen if you
deleted it --- would the decision boundary be different? If it would,
the point is a support vector. If it wouldn't, the point isn't a
support vector.

* 2011 Q2 Part A

The equation for the line you drew in part A is $y = 0$, and points
will be classified as positive if $y \geq 0$.

# shouldn't points be classified as positive only if leq 1 + b dot w x?

Now in general, the equation for the decision boundary is
$\vec{w}\cdot \vec{x} + b = 0$, and points $\vec{x}$ are classified as
positive if $\vec{w}\cdot \vec{x} + b \geq 0$. To solve for $\vec{w}$
and $b$, you just manipulate your inequality for the boundary into
this form:

$$y \geq 0$$
$$0x + 1y + 0 \geq 0$$
$$\begin{bmatrix}0&1\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 \geq 0$$

And so we have:

$$\underbrace{\begin{bmatrix}0&1\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$

...which is /almost/ right. The trouble is that we can multiply this
inequality by any positive constant $c>0$ and it will still be true
--- so, the right answer might not be the $\vec{w}$ and $b$ we found
above, but instead a /multiple/ of them.
To fix this, we multiply the inequality by $c>0$ and solve for $c$:

$$c\cdot (0x + 1y + 0) \geq c\cdot 0$$
$$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 \geq 0$$

One formula that you can use to solve for $c$ is the margin-width
formula; by examining the lines you drew, you see that the margin
width is *4*. Now, in general, the equation for margin width is

$$\text{margin-width} = \frac{2}{||\vec{w}||} = \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$$

So, for the second time, we use the technique of setting the quantity
you can see (margin width = 4) equal to the general formula (margin
width = $2/||\vec{w}||$) and solving for the parameters in the general
formula:

$$\frac{2}{||\vec{w}||} = 4$$

$$\frac{2}{\sqrt{\vec{w}\cdot \vec{w}}} = 4$$

$$\frac{4}{\vec{w}\cdot \vec{w}} = 16 \qquad \text{(squaring both sides)}$$

$$\vec{w}\cdot \vec{w} = \frac{1}{4}$$

$$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}0\\c\end{bmatrix} = \frac{1}{4}$$

(This uses the expression for $\vec{w}$ we got after we multiplied by
$c$ above.)

$$\begin{eqnarray*}
c^2 &=& \frac{1}{4}\\
c &=& \pm \frac{1}{2}\\
c &=& \frac{1}{2}\end{eqnarray*}$$

...because we require $c>0$ so that it doesn't flip the inequality.
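You can double-check the arithmetic for $c$ numerically --- this is
just a sketch of the check, not part of the original solution. Only
the correct scale gives the margin width you measured off the picture:

#+BEGIN_SRC python
# With w = (0, c), the margin-width formula gives 2/||w|| = 2/c.
# Requiring a margin of 4 forces c = 1/2.
import numpy as np

for c in [0.25, 0.5, 1.0]:
    w = np.array([0.0, c])
    print(c, 2 / np.linalg.norm(w))
# 0.25 8.0
# 0.5 4.0
# 1.0 2.0
#+END_SRC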
Returning to our inequality, we have

$$\begin{eqnarray*}\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\\
\begin{bmatrix}0&\frac{1}{2}\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\end{eqnarray*}$$

$$\underbrace{\begin{bmatrix}0&\frac{1}{2}\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$

And this is truly the right answer, now that we have solved for $c$.

* What's the use of gutters?

The gutters are the regions where you have no hard data supporting
your classification, but the decision boundary is, as its name
implies, the cutoff for making a decision between the two
classifications. As such, you would likely have lower confidence in
any decision within the gutters, but a classifier needs to make a
decision one way or the other for all possible test data. (There's no
reason not to: making no decision means you're automatically wrong in
all cases, whereas even a lousy classifier will get some of them
right.)

-----

Here are two more ways to look at it. First of all, and simplest, SVMs
are supposed to find the boundary with the widest margin. The width of
the margin is just the perpendicular distance between the gutters, so
the gutters are important for that reason.

Second of all, if you attempt to define the SVM problem geometrically,
i.e. put it into a form that can be solved by a computer, it looks
something like the following.

#+BEGIN_QUOTE
Find a boundary --- which is defined by a normal vector $\vec{w}$ and
an offset $b$ --- such that:
1. The boundary \ldquo{}$\vec{w}\cdot \vec{x} + b = 0$\rdquo{}
   separates the positive and negative points.
2. The width of the margin is as large as possible. Remember, the
   margin width is twice the distance between the boundary and the
   training point that is closest to the boundary; in other words,
   $$\text{margin-width} = \min_{\text{training data}} 2 y_i (\vec{w}\cdot \vec{x}_i + b)$$
#+END_QUOTE

Unfortunately, the problem as stated has no unique solution: if you
find a satisfactory pair $\langle \vec{w}, b \rangle$, then the pair
$\langle 3\vec{w}, 3b\rangle$ (for example) defines the same boundary
but has a larger margin. The problem is that any multiple of a
satisfactory pair yields another satisfactory pair. In essence, we're
just changing the units of measurement without materially affecting
the boundary.

We can /remove this additional degree of freedom by adding another
constraint/ to the problem which establishes a sense of scale. For
example, we could require $\vec{w}$ to be a /unit/ normal vector,
i.e. we could require that $||\vec{w}|| = 1$. This fixes the problem
and gives SVMs a unique solution.

In 6.034, and elsewhere, we use an alternate constraint; we impose the
/gutter conditions/:

$\vec{w}\cdot \vec{x_+} + b = +1$ for positive training points
$\vec{x_+}$.

$\vec{w}\cdot \vec{x_-} + b = -1$ for negative training points
$\vec{x_-}$.

which we often combine into a single inequality: $y_i (\vec{w}\cdot
\vec{x}_i + b) \geq 1$ for all training points $\langle \vec{x}_i,
y_i\rangle$.

This constraint is enough to eliminate the extra degree of freedom and
give SVMs a unique solution.
Moreover, you get a nice formula for the margin width by requiring
that it holds:

$$\text{margin-width} = \frac{2}{||\vec{w}||}$$

because the gutter conditions are enforced.

That's why gutters are important.
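To see the scaling degeneracy and the role of the gutter conditions
concretely, here is a short sketch (the data points are made up for
illustration, reusing the Part A boundary): any positive multiple of
$\langle \vec{w}, b\rangle$ classifies every point identically, and
the gutter conditions are what pick out one particular scale.

#+BEGIN_SRC python
# Boundary from 2011 Q2 Part A: w = (0, 1/2), b = 0.
import numpy as np

w, b = np.array([0.0, 0.5]), 0.0
points = np.array([[1.0, 3.0], [-2.0, -1.0], [0.0, 2.0], [0.0, -2.0]])

def classify(w, b, xs):
    """Sign of w . x + b decides the class; scaling <w, b> by any
    positive constant cannot change the sign."""
    return np.sign(xs @ w + b)

# Every positive multiple of <w, b> defines the same classifier ...
for c in [1.0, 3.0, 10.0]:
    print(classify(c * w, c * b, points))   # same signs every time

# ... but only the original scale satisfies the gutter conditions
# w . x + b = +/-1 at the gutter points (0, 2) and (0, -2).
print(points[2:] @ w + b)                   # [ 1. -1.]
#+END_SRC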