# HG changeset patch # User Robert McIntyre # Date 1383773857 18000 # Node ID dc521fea9219f194779e886c1d5007459c029681 # Parent ed7d0ca99c5534d16127ece42aa49916b23fe51f add svn and time stamp idea. diff -r ed7d0ca99c55 -r dc521fea9219 org/ideas.org --- a/org/ideas.org Mon Oct 14 22:26:21 2013 -0400 +++ b/org/ideas.org Wed Nov 06 16:37:37 2013 -0500 @@ -31,6 +31,10 @@ getting credit. #+end_quote +- time verification :: some standard way to verify that some piece of + data was recorded at a specific time. Might involve a time + server, a key for each time period, something liek that. + - tamper proof gold bars :: [[http://www.tungsten-alloy.com/gold-plated-tungsten-alloy-bar.html][this site]] offers gold plated tungsten bars as "novelty" items. One reason to prefer coins is because they are much harder to counterfeit because there is less surface area diff -r ed7d0ca99c55 -r dc521fea9219 org/svm.org --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/org/svm.org Wed Nov 06 16:37:37 2013 -0500 @@ -0,0 +1,193 @@ +#+TITLE: Notes on SVMs +#+AUTHOR:Dylan Holmes +#+SETUPFILE: ../../aurellem/org/setup.org +#+INCLUDE: ../../aurellem/org/level-0.org +#+MATHJAX: align:"left" mathml:t path:"http://www.aurellem.org/MathJax/MathJax.js" + +* Question: How do you find $\vec{w}$, $b$, and the \alpha{}'s? + +You should take a look at the class notes, for starters: +http://ai6034.mit.edu/fall12/images/SVM_and_Boosting.pdf + +You can also take a look at @388 to see an example of (part of) a +worked problem. + + +When solving SVM problems, there are some useful equations to keep in +mind: + +1. $\vec{w}\cdot \vec{x} + b = 0$ defines the boundary, and in + particular $\vec{w}\cdot \vec{x} + b \geq 0$ defines the positive + side of the boundary. +2. $\vec{w}\cdot \vec{x} + b = \pm 1$ defines the positive and + negative gutters. +3. The distance between the gutters (the width of the margin) is + $\text{margin-width} = \frac{2}{||\vec{w}||} = + \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$ +4. $\sum_{\text{training data}} y_i \alpha_i = + 0$ +5. $\sum_{\text{training data}} y_i \alpha_i \vec{x}_i = \vec{w}$ + +The last two equations are the ones that can help you find the + $\alpha_i$. You may also find it useful to think of the $\alpha_i$ + as measuring \ldquo{}supportiveness\rdquo{}. This means, for + example, that: - $\alpha_i$ is zero for non-support vectors, + i.e. training points that do not determine the boundary, and which + would not affect the placement of the boundary if deleted - When you + compare two separate SVM problems, where the first has support + vectors that are far from the boundary, and the second has support + vectors very close to the boundary, the latter support vectors have + comparatively higher $\alpha_i$ values. + +** Can \alpha{} be zero for support vectors? +No, \alpha{} is zero for a point if and only if it's not a support +vector. You can see this because in equation (5), just the points +with nonzero alphas contribute to $\vec{w}$. + +** Are all points on the gutter support vectors? How do you tell whether a point is a support vector? + +Not all points on the gutter are support vectors. To tell if a +training point is a support vector, imagine what would happen if you +deleted it --- would the decision boundary be different? If it would, +the point is a support vector. If it wouldn't, the point isn't a +support vector. + +* 2011 Q2 Part A + +The equation for the line you drew in part A is $y = 0$, and points +will be classified as positive if $y\geq 0$. + +# shouldn't points be classified as positive only if leq 1 + b dot w x? +Now in general, the equation for the decision boundary is +$\vec{w}\cdot \vec{x} + b = 0$, and points $\vec{x}$ are +classified as positive if $\vec{w}\cdot \vec{x} + b \geq 0$. To +solve for $\vec{w}$ and $b$, you just manipulate your inequation +for the boundary into this form: + +$$y\geq 0$$ +$$0x + 1y + 0 \geq 0$$ +$$\begin{bmatrix}0&1\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 \geq 0$$ + +And so we have: + +$$\underbrace{\begin{bmatrix}0&1\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$ + +...which is /almost/ right. The trouble is that we can multiply this +inequation by any positive constant $c>0$ and it will still be true +--- so, the right answer might not be the $\vec{w}$ and $b$ we found +above, but instead a /multiple/ of them. To fix this, we multiply the +inequation by $c>0$, and solve for $c$: + +$$c\cdot (0x + 1y + 0) \geq c\cdot0$$ +$$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + +0 \geq 0$$ + +One formula that you can use to solve for $c$ is the margin-width +formula; by examining the lines you drew, you see that the margin +width is *4*. Now, in general, the equation for margin width is + +$$\text{margin-width} = \frac{2}{||\vec{w}||} = \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$$ + +So, for the second time, we use the technique of setting the quantity +you can see (margin width = 4) equal to the general formula (margin +width = 2/||w||) and solving for the parameters in the general +formula: + +$$\frac{2}{||w||} = 4$$ + +$$\frac{2}{\sqrt{\vec{w}\cdot \vec{w}}} = 4$$ + +$$\frac{4}{\vec{w}\cdot \vec{w}} = 16$$ (squaring both sides) + +$$\vec{w}\cdot \vec{w} = \frac{1}{4}$$ + +$$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}0\\c\end{bmatrix} += \frac{1}{4}$$ + +(This uses the expression for $\vec{w}$ we got after +we multiplied by $c$ above.) + +$$\begin{eqnarray*} +c^2 &=& \frac{1}{4}\\ +c &=& \pm \frac{1}{2}\\ +c &=& \frac{1}{2}\end{eqnarray*}$$...because we require $c>0$ so that it doesn't flip the inequality. + + +Returning to our inequality, we have + +$$\begin{eqnarray*}\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\\ +\begin{bmatrix}0&\frac{1}{2}\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\end{eqnarray*}$$ + +$$\underbrace{\begin{bmatrix}0&\frac{1}{2}\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$ + +And this is truly the right answer, now that we have solved for $c$. + + +* What's the use of gutters? +The gutters are the regions where you have no hard data supporting +your classification, but the decision boundary is, as its name +implies, the cutoff for making a decision between the two +classifications. As such, you would likely have lower confidence in +any decision within the gutters, but a classifier needs to make a +decision one way or the other for all possible test data (there's no +reason not to do so, as making no decision means you're automatically +wrong in all cases, whereas even a lousy classifier will get some of +them right). + +---- + +Here are two more ways to look at it. First of all, and simplest, SVMs +are supposed to find the boundary with the widest margin. The width of +the margin is just the perpendicular distance between the gutters, so +the gutters are important for that reason. + + Second of all, if you attempt to define the SVM problem + geometrically, i.e. put it into a form that can be solved by a + computer, it looks something like the following. + +#+BEGIN_QUOTE +Find a boundary --- which is defined by a normal vector +$\vec{w}$ and an offset $b$ --- such that: +1. The boundary \ldquo{} $\vec{w}\cdot \vec{x} + b = 0$ \rdquo{} + separates the positive and negative points +2. The width of the margin is as large as possible. Remember, the + margin width is twice the distance between the boundary and the + training point that is closest to the boundary; in other words, + $$\text{margin-width} = \min_{\text{training data}} 2 y_i + (\vec{w}\cdot \vec{x}_i + b)$$ +#+END_QUOTE + +Unfortunately, the problem as stated has no unique solution, since if +you find a satisfactory pair $\langle \vec{w}, b \rangle$ , then the +pair $\langle 3\vec{w}, 3b\rangle$ (for example) defines the same +boundary but has a larger margin. The problem is that any multiple of +a satisfactory pair yields another satisfactory pair. In essence, +we're just changing the units of measurement without materially +affecting the boundary. + +We can /remove this additional degree of freedom by adding another +constraint/ to the problem which establishes a sense of scale. For +example, we could require $\vec{w}$ to be a /unit/ normal vector, +i.e. we could require that $||\vec{w}|| = 1$. This fixes the problem +and gives SVMs a unique solution. + +In 6.034, and elsewhere, we use an alternate constraint; we impose the /gutter conditions/: + +$\vec{w}\cdot \vec{x_+} + b = + 1$ for positive training points +$\vec{x_+}$. + +$\vec{w}\cdot \vec{x_-} + b = -1$ for negative +training points $\vec{x_-}$. + +which we often combine into a single inequality: $y_i (\vec{w}\cdot +\vec{x}_i + b) \geqslant 1$ for all training points $\langle +\vec{x}_i, y_i\rangle$. + + +This constraint is enough to eliminate the extra degree of freedom and give SVMs a unique solution. Moreover, you get a nice formula for the margin width by requiring that it holds: + +$\text{margin-width} = \frac{2}{||\vec{w}||}$ because the gutter +conditions are enforced. + + +That's why gutters are important.