Mercurial > thoughts

     1 #+TITLE: Notes on SVMs

     2 #+AUTHOR:Dylan Holmes

     3 #+SETUPFILE: ../../aurellem/org/setup.org

     4 #+INCLUDE:   ../../aurellem/org/level-0.org

     5 #+MATHJAX: align:"left" mathml:t path:"http://www.aurellem.org/MathJax/MathJax.js"

     6 

     7 * Question: How do you find $\vec{w}$, $b$, and the \alpha{}'s?

     8 

     9 You should take a look at the class notes, for starters:

    10 http://ai6034.mit.edu/fall12/images/SVM_and_Boosting.pdf

    11 

    12 You can also take a look at @388 to see an example of (part of) a

    13 worked problem.

    14 

    15 

    16 When solving SVM problems, there are some useful equations to keep in

    17 mind:

    18 

    19 1. $\vec{w}\cdot \vec{x} + b = 0$ defines the boundary, and in

    20    particular $\vec{w}\cdot \vec{x} + b \geq 0$ defines the positive

    21    side of the boundary.

    22 2. $\vec{w}\cdot \vec{x} + b = \pm 1$ defines the positive and

    23    negative gutters.  

    24 3. The distance between the gutters (the width of the margin) is

    25    $\text{margin-width} = \frac{2}{||\vec{w}||} =

    26    \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$ 

    27 4. $\sum_{\text{training data}} y_i \alpha_i =

    28    0$ 

    29 5. $\sum_{\text{training data}} y_i \alpha_i \vec{x}_i = \vec{w}$

    30 

    31 The last two equations are the ones that can help you find the

    32   $\alpha_i$. You may also find it useful to think of the $\alpha_i$

    33   as measuring \ldquo{}supportiveness\rdquo{}. This means, for

    34   example, that: - $\alpha_i$ is zero for non-support vectors,

    35   i.e. training points that do not determine the boundary, and which

    36   would not affect the placement of the boundary if deleted - When you

    37   compare two separate SVM problems, where the first has support

    38   vectors that are far from the boundary, and the second has support

    39   vectors very close to the boundary, the latter support vectors have

    40   comparatively higher $\alpha_i$ values.

    41 

    42 ** Can \alpha{} be zero for support vectors?

    43 No, \alpha{} is zero for a point if and only if it's not a support

    44 vector. You can see this because in equation (5), just the points

    45 with nonzero alphas contribute to $\vec{w}$.

    46 

    47 ** Are all points on the gutter support vectors? How do you tell whether a point is a support vector?

    48 

    49 Not all points on the gutter are support vectors. To tell if a

    50 training point is a support vector, imagine what would happen if you

    51 deleted it --- would the decision boundary be different? If it would,

    52 the point is a support vector. If it wouldn't, the point isn't a

    53 support vector.

    54 

    55 * 2011 Q2 Part A

    56 

    57 The equation for the line you drew in part A is $y = 0$, and points

    58 will be classified as positive if $y\geq 0$.

    59 

    60 # shouldn't points be classified as positive only if leq 1 + b dot w x? 

    61 Now in general, the equation for the decision boundary is

    62 $\vec{w}\cdot \vec{x} + b = 0$, and points $\vec{x}$ are

    63 classified as positive if $\vec{w}\cdot \vec{x} + b \geq 0$. To

    64 solve for $\vec{w}$ and $b$, you just manipulate your inequation

    65 for the boundary into this form:

    66 

    67 $$y\geq 0$$

    68 $$0x + 1y + 0 \geq 0$$

    69 $$\begin{bmatrix}0&amp;1\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 \geq 0$$

    70 

    71 And so we have:

    72 

    73 $$\underbrace{\begin{bmatrix}0&amp;1\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$

    74 

    75 ...which is /almost/ right. The trouble is that we can multiply this

    76 inequation by any positive constant $c>0$ and it will still be true

    77 --- so, the right answer might not be the $\vec{w}$ and $b$ we found

    78 above, but instead a /multiple/ of them. To fix this, we multiply the

    79 inequation by $c>0$, and solve for $c$:

    80 

    81 $$c\cdot (0x + 1y + 0) \geq c\cdot0$$

    82 $$\begin{bmatrix}0&amp;c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} +

    83 0 \geq 0$$

    84 

    85 One formula that you can use to solve for $c$ is the margin-width

    86 formula; by examining the lines you drew, you see that the margin

    87 width is *4*. Now, in general, the equation for margin width is

    88 

    89 $$\text{margin-width} = \frac{2}{||\vec{w}||} = \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$$

    90 

    91 So, for the second time, we use the technique of setting the quantity

    92 you can see (margin width = 4) equal to the general formula (margin

    93 width = 2/||w||) and solving for the parameters in the general

    94 formula:

    95 

    96 $$\frac{2}{||w||} = 4$$

    97 

    98 $$\frac{2}{\sqrt{\vec{w}\cdot \vec{w}}} = 4$$

    99 

   100 $$\frac{4}{\vec{w}\cdot \vec{w}} = 16$$ (squaring both sides)

   101 

   102 $$\vec{w}\cdot \vec{w} = \frac{1}{4}$$

   103 

   104 $$\begin{bmatrix}0&amp;c\end{bmatrix}\cdot \begin{bmatrix}0\\c\end{bmatrix}

   105 = \frac{1}{4}$$ 

   106 

   107 (This uses the expression for $\vec{w}$ we got after

   108 we multiplied by $c$ above.)

   109 

   110 $$\begin{eqnarray*}

   111 c^2 &=& \frac{1}{4}\\

   112 c &=& \pm \frac{1}{2}\\

   113 c &=& \frac{1}{2}\end{eqnarray*}$$...because we require $c>0$ so that it doesn't flip the inequality.

   114 

   115 

   116 Returning to our inequality, we have 

   117 

   118 $$\begin{eqnarray*}\begin{bmatrix}0&amp;c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\\

   119 \begin{bmatrix}0&amp;\frac{1}{2}\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\end{eqnarray*}$$

   120 

   121 $$\underbrace{\begin{bmatrix}0&amp;\frac{1}{2}\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$

   122 

   123 And this is truly the right answer, now that we have solved for $c$.

   124 

   125 

   126 * What's the use of gutters?

   127 The gutters are the regions where you have no hard data supporting

   128 your classification, but the decision boundary is, as its name

   129 implies, the cutoff for making a decision between the two

   130 classifications.  As such, you would likely have lower confidence in

   131 any decision within the gutters, but a classifier needs to make a

   132 decision one way or the other for all possible test data (there's no

   133 reason not to do so, as making no decision means you're automatically

   134 wrong in all cases, whereas even a lousy classifier will get some of

   135 them right).

   136 

   137 ----

   138 

   139 Here are two more ways to look at it. First of all, and simplest, SVMs

   140 are supposed to find the boundary with the widest margin. The width of

   141 the margin is just the perpendicular distance between the gutters, so

   142 the gutters are important for that reason.

   143 

   144  Second of all, if you attempt to define the SVM problem

   145  geometrically, i.e. put it into a form that can be solved by a

   146  computer, it looks something like the following.

   147 

   148 #+BEGIN_QUOTE

   149 Find a boundary --- which is defined by a normal vector

   150 $\vec{w}$ and an offset $b$ --- such that:

   151 1. The boundary \ldquo{} $\vec{w}\cdot \vec{x} + b = 0$ \rdquo{}

   152    separates the positive and negative points

   153 2. The width of the margin is as large as possible. Remember, the

   154    margin width is twice the distance between the boundary and the

   155    training point that is closest to the boundary; in other words,

   156    $$\text{margin-width} = \min_{\text{training data}} 2 y_i

   157    (\vec{w}\cdot \vec{x}_i + b)$$

   158 #+END_QUOTE

   159 

   160 Unfortunately, the problem as stated has no unique solution, since if

   161 you find a satisfactory pair $\langle \vec{w}, b \rangle$ , then the

   162 pair $\langle 3\vec{w}, 3b\rangle$ (for example) defines the same

   163 boundary but has a larger margin. The problem is that any multiple of

   164 a satisfactory pair yields another satisfactory pair. In essence,

   165 we're just changing the units of measurement without materially

   166 affecting the boundary.

   167 

   168 We can /remove this additional degree of freedom by adding another

   169 constraint/ to the problem which establishes a sense of scale. For

   170 example, we could require $\vec{w}$ to be a /unit/ normal vector,

   171 i.e. we could require that $||\vec{w}|| = 1$. This fixes the problem

   172 and gives SVMs a unique solution.

   173 

   174 In 6.034, and elsewhere, we use an alternate constraint; we impose the /gutter conditions/:  

   175 

   176 $\vec{w}\cdot \vec{x_+} + b = + 1$ for positive training points

   177 $\vec{x_+}$.  

   178 

   179 $\vec{w}\cdot \vec{x_-} + b = -1$ for negative

   180 training points $\vec{x_-}$.

   181 

   182 which we often combine into a single inequality: $y_i (\vec{w}\cdot

   183 \vec{x}_i + b) \geqslant 1$ for all training points $\langle

   184 \vec{x}_i, y_i\rangle$.

   185 

   186 

   187 This constraint is enough to eliminate the extra degree of freedom and give SVMs a unique solution. Moreover, you get a nice formula for the margin width by requiring that it holds:

   188 

   189 $\text{margin-width} = \frac{2}{||\vec{w}||}$ because the gutter

   190 conditions are enforced.

   191 

   192 

   193 That's why gutters are important.
author	rlm
date	Tue, 03 Jun 2014 15:33:21 -0400
parents	dc521fea9219
children