thoughts: org/svm.org comparison

comparison org/svm.org @ 71:dc521fea9219

add svn and time stamp idea.

author	Robert McIntyre <rlm@mit.edu>
date	Wed, 06 Nov 2013 16:37:37 -0500
parents
children

comparison

equal deleted inserted replaced

-:ed7d0ca99c55
+:dc521fea9219
+#+TITLE: Notes on SVMs
+#+AUTHOR:Dylan Holmes
+#+SETUPFILE: ../../aurellem/org/setup.org
+#+INCLUDE:   ../../aurellem/org/level-0.org
+#+MATHJAX: align:"left" mathml:t path:"http://www.aurellem.org/MathJax/MathJax.js"
+* Question: How do you find $\vec{w}$, $b$, and the \alpha{}'s?
+You should take a look at the class notes, for starters:
+http://ai6034.mit.edu/fall12/images/SVM_and_Boosting.pdf
+You can also take a look at @388 to see an example of (part of) a
+worked problem.
+When solving SVM problems, there are some useful equations to keep in
+mind:
+1. $\vec{w}\cdot \vec{x} + b = 0$ defines the boundary, and in
+particular $\vec{w}\cdot \vec{x} + b \geq 0$ defines the positive
+side of the boundary.
+2. $\vec{w}\cdot \vec{x} + b = \pm 1$ defines the positive and
+negative gutters.
+3. The distance between the gutters (the width of the margin) is
+$\text{margin-width} = \frac{2}{||\vec{w}||} =
+\frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$
+4. $\sum_{\text{training data}} y_i \alpha_i =
+0$
+5. $\sum_{\text{training data}} y_i \alpha_i \vec{x}_i = \vec{w}$
+The last two equations are the ones that can help you find the
+$\alpha_i$. You may also find it useful to think of the $\alpha_i$
+as measuring \ldquo{}supportiveness\rdquo{}. This means, for
+example, that: - $\alpha_i$ is zero for non-support vectors,
+i.e. training points that do not determine the boundary, and which
+would not affect the placement of the boundary if deleted - When you
+compare two separate SVM problems, where the first has support
+vectors that are far from the boundary, and the second has support
+vectors very close to the boundary, the latter support vectors have
+comparatively higher $\alpha_i$ values.
+** Can \alpha{} be zero for support vectors?
+No, \alpha{} is zero for a point if and only if it's not a support
+vector. You can see this because in equation (5), just the points
+with nonzero alphas contribute to $\vec{w}$.
+** Are all points on the gutter support vectors? How do you tell whether a point is a support vector?
+Not all points on the gutter are support vectors. To tell if a
+training point is a support vector, imagine what would happen if you
+deleted it --- would the decision boundary be different? If it would,
+the point is a support vector. If it wouldn't, the point isn't a
+support vector.
+* 2011 Q2 Part A
+The equation for the line you drew in part A is $y = 0$, and points
+will be classified as positive if $y\geq 0$.
+# shouldn't points be classified as positive only if leq 1 + b dot w x?
+Now in general, the equation for the decision boundary is
+$\vec{w}\cdot \vec{x} + b = 0$, and points $\vec{x}$ are
+classified as positive if $\vec{w}\cdot \vec{x} + b \geq 0$. To
+solve for $\vec{w}$ and $b$, you just manipulate your inequation
+for the boundary into this form:
+$$y\geq 0$$
+$$0x + 1y + 0 \geq 0$$
+$$\begin{bmatrix}0&amp;1\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 \geq 0$$
+And so we have:
+$$\underbrace{\begin{bmatrix}0&amp;1\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$
+...which is /almost/ right. The trouble is that we can multiply this
+inequation by any positive constant $c>0$ and it will still be true
+--- so, the right answer might not be the $\vec{w}$ and $b$ we found
+above, but instead a /multiple/ of them. To fix this, we multiply the
+inequation by $c>0$, and solve for $c$:
+$$c\cdot (0x + 1y + 0) \geq c\cdot0$$
+$$\begin{bmatrix}0&amp;c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} +
+0 \geq 0$$
+One formula that you can use to solve for $c$ is the margin-width
+formula; by examining the lines you drew, you see that the margin
+width is *4*. Now, in general, the equation for margin width is
+$$\text{margin-width} = \frac{2}{||\vec{w}||} = \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$$
+So, for the second time, we use the technique of setting the quantity
+you can see (margin width = 4) equal to the general formula (margin
+width = 2/||w||) and solving for the parameters in the general
+formula:
+$$\frac{2}{||w||} = 4$$
+$$\frac{2}{\sqrt{\vec{w}\cdot \vec{w}}} = 4$$
+$$\frac{4}{\vec{w}\cdot \vec{w}} = 16$$ (squaring both sides)
+$$\vec{w}\cdot \vec{w} = \frac{1}{4}$$
+$$\begin{bmatrix}0&amp;c\end{bmatrix}\cdot \begin{bmatrix}0\\c\end{bmatrix}
+= \frac{1}{4}$$
+(This uses the expression for $\vec{w}$ we got after
+we multiplied by $c$ above.)
+$$\begin{eqnarray*}
+c^2 &=& \frac{1}{4}\\
+c &=& \pm \frac{1}{2}\\
+c &=& \frac{1}{2}\end{eqnarray*}$$...because we require $c>0$ so that it doesn't flip the inequality.
+Returning to our inequality, we have
+$$\begin{eqnarray*}\begin{bmatrix}0&amp;c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\\
+\begin{bmatrix}0&amp;\frac{1}{2}\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\end{eqnarray*}$$
+$$\underbrace{\begin{bmatrix}0&amp;\frac{1}{2}\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$
+And this is truly the right answer, now that we have solved for $c$.
+* What's the use of gutters?
+The gutters are the regions where you have no hard data supporting
+your classification, but the decision boundary is, as its name
+implies, the cutoff for making a decision between the two
+classifications.  As such, you would likely have lower confidence in
+any decision within the gutters, but a classifier needs to make a
+decision one way or the other for all possible test data (there's no
+reason not to do so, as making no decision means you're automatically
+wrong in all cases, whereas even a lousy classifier will get some of
+them right).
+----
+Here are two more ways to look at it. First of all, and simplest, SVMs
+are supposed to find the boundary with the widest margin. The width of
+the margin is just the perpendicular distance between the gutters, so
+the gutters are important for that reason.
+Second of all, if you attempt to define the SVM problem
+geometrically, i.e. put it into a form that can be solved by a
+computer, it looks something like the following.
+#+BEGIN_QUOTE
+Find a boundary --- which is defined by a normal vector
+$\vec{w}$ and an offset $b$ --- such that:
+1. The boundary \ldquo{} $\vec{w}\cdot \vec{x} + b = 0$ \rdquo{}
+separates the positive and negative points
+2. The width of the margin is as large as possible. Remember, the
+margin width is twice the distance between the boundary and the
+training point that is closest to the boundary; in other words,
+$$\text{margin-width} = \min_{\text{training data}} 2 y_i
+(\vec{w}\cdot \vec{x}_i + b)$$
+#+END_QUOTE
+Unfortunately, the problem as stated has no unique solution, since if
+you find a satisfactory pair $\langle \vec{w}, b \rangle$ , then the
+pair $\langle 3\vec{w}, 3b\rangle$ (for example) defines the same
+boundary but has a larger margin. The problem is that any multiple of
+a satisfactory pair yields another satisfactory pair. In essence,
+we're just changing the units of measurement without materially
+affecting the boundary.
+We can /remove this additional degree of freedom by adding another
+constraint/ to the problem which establishes a sense of scale. For
+example, we could require $\vec{w}$ to be a /unit/ normal vector,
+i.e. we could require that $||\vec{w}|| = 1$. This fixes the problem
+and gives SVMs a unique solution.
+In 6.034, and elsewhere, we use an alternate constraint; we impose the /gutter conditions/:
+$\vec{w}\cdot \vec{x_+} + b = + 1$ for positive training points
+$\vec{x_+}$.
+$\vec{w}\cdot \vec{x_-} + b = -1$ for negative
+training points $\vec{x_-}$.
+which we often combine into a single inequality: $y_i (\vec{w}\cdot
+\vec{x}_i + b) \geqslant 1$ for all training points $\langle
+\vec{x}_i, y_i\rangle$.
+This constraint is enough to eliminate the extra degree of freedom and give SVMs a unique solution. Moreover, you get a nice formula for the margin width by requiring that it holds:
+$\text{margin-width} = \frac{2}{||\vec{w}||}$ because the gutter
+conditions are enforced.
+That's why gutters are important.

Mercurial > thoughts

comparison org/svm.org @ 71:dc521fea9219