Mercurial > thoughts

changeset 71:dc521fea9219
add svn and time stamp idea.
author: Robert McIntyre <rlm@mit.edu>
date: Wed, 06 Nov 2013 16:37:37 -0500
parents: ed7d0ca99c55
children: ace7f097eb6c
files: org/ideas.org org/svm.org
diffstat: 2 files changed, 197 insertions(+), 0 deletions(-) [+]
[-]

org/ideas.org 4

org/svm.org 193 org/ideas.org 4 org/svm.org 193
org/ideas.org 4
org/svm.org 193
     1.1 --- a/org/ideas.org	Mon Oct 14 22:26:21 2013 -0400
     1.2 +++ b/org/ideas.org	Wed Nov 06 16:37:37 2013 -0500
     1.3 @@ -31,6 +31,10 @@
     1.4  getting credit.
     1.5  #+end_quote
     1.6  
     1.7 +- time verification :: some standard way to verify that some piece of
     1.8 +     data was recorded at a specific time. Might involve a time
     1.9 +     server, a key for each time period, something liek that.
    1.10 +
    1.11  - tamper proof gold bars :: [[http://www.tungsten-alloy.com/gold-plated-tungsten-alloy-bar.html][this site]] offers gold plated tungsten bars
    1.12       as "novelty" items. One reason to prefer coins is because they
    1.13       are much harder to counterfeit because there is less surface area

     2.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     2.2 +++ b/org/svm.org	Wed Nov 06 16:37:37 2013 -0500
     2.3 @@ -0,0 +1,193 @@
     2.4 +#+TITLE: Notes on SVMs
     2.5 +#+AUTHOR:Dylan Holmes
     2.6 +#+SETUPFILE: ../../aurellem/org/setup.org
     2.7 +#+INCLUDE:   ../../aurellem/org/level-0.org
     2.8 +#+MATHJAX: align:"left" mathml:t path:"http://www.aurellem.org/MathJax/MathJax.js"
     2.9 +
    2.10 +* Question: How do you find $\vec{w}$, $b$, and the \alpha{}'s?
    2.11 +
    2.12 +You should take a look at the class notes, for starters:
    2.13 +http://ai6034.mit.edu/fall12/images/SVM_and_Boosting.pdf
    2.14 +
    2.15 +You can also take a look at @388 to see an example of (part of) a
    2.16 +worked problem.
    2.17 +
    2.18 +
    2.19 +When solving SVM problems, there are some useful equations to keep in
    2.20 +mind:
    2.21 +
    2.22 +1. $\vec{w}\cdot \vec{x} + b = 0$ defines the boundary, and in
    2.23 +   particular $\vec{w}\cdot \vec{x} + b \geq 0$ defines the positive
    2.24 +   side of the boundary.
    2.25 +2. $\vec{w}\cdot \vec{x} + b = \pm 1$ defines the positive and
    2.26 +   negative gutters.  
    2.27 +3. The distance between the gutters (the width of the margin) is
    2.28 +   $\text{margin-width} = \frac{2}{||\vec{w}||} =
    2.29 +   \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$ 
    2.30 +4. $\sum_{\text{training data}} y_i \alpha_i =
    2.31 +   0$ 
    2.32 +5. $\sum_{\text{training data}} y_i \alpha_i \vec{x}_i = \vec{w}$
    2.33 +
    2.34 +The last two equations are the ones that can help you find the
    2.35 +  $\alpha_i$. You may also find it useful to think of the $\alpha_i$
    2.36 +  as measuring \ldquo{}supportiveness\rdquo{}. This means, for
    2.37 +  example, that: - $\alpha_i$ is zero for non-support vectors,
    2.38 +  i.e. training points that do not determine the boundary, and which
    2.39 +  would not affect the placement of the boundary if deleted - When you
    2.40 +  compare two separate SVM problems, where the first has support
    2.41 +  vectors that are far from the boundary, and the second has support
    2.42 +  vectors very close to the boundary, the latter support vectors have
    2.43 +  comparatively higher $\alpha_i$ values.
    2.44 +
    2.45 +** Can \alpha{} be zero for support vectors?
    2.46 +No, \alpha{} is zero for a point if and only if it's not a support
    2.47 +vector. You can see this because in equation (5), just the points
    2.48 +with nonzero alphas contribute to $\vec{w}$.
    2.49 +
    2.50 +** Are all points on the gutter support vectors? How do you tell whether a point is a support vector?
    2.51 +
    2.52 +Not all points on the gutter are support vectors. To tell if a
    2.53 +training point is a support vector, imagine what would happen if you
    2.54 +deleted it --- would the decision boundary be different? If it would,
    2.55 +the point is a support vector. If it wouldn't, the point isn't a
    2.56 +support vector.
    2.57 +
    2.58 +* 2011 Q2 Part A
    2.59 +
    2.60 +The equation for the line you drew in part A is $y = 0$, and points
    2.61 +will be classified as positive if $y\geq 0$.
    2.62 +
    2.63 +# shouldn't points be classified as positive only if leq 1 + b dot w x? 
    2.64 +Now in general, the equation for the decision boundary is
    2.65 +$\vec{w}\cdot \vec{x} + b = 0$, and points $\vec{x}$ are
    2.66 +classified as positive if $\vec{w}\cdot \vec{x} + b \geq 0$. To
    2.67 +solve for $\vec{w}$ and $b$, you just manipulate your inequation
    2.68 +for the boundary into this form:
    2.69 +
    2.70 +$$y\geq 0$$
    2.71 +$$0x + 1y + 0 \geq 0$$
    2.72 +$$\begin{bmatrix}0&amp;1\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 \geq 0$$
    2.73 +
    2.74 +And so we have:
    2.75 +
    2.76 +$$\underbrace{\begin{bmatrix}0&amp;1\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$
    2.77 +
    2.78 +...which is /almost/ right. The trouble is that we can multiply this
    2.79 +inequation by any positive constant $c>0$ and it will still be true
    2.80 +--- so, the right answer might not be the $\vec{w}$ and $b$ we found
    2.81 +above, but instead a /multiple/ of them. To fix this, we multiply the
    2.82 +inequation by $c>0$, and solve for $c$:
    2.83 +
    2.84 +$$c\cdot (0x + 1y + 0) \geq c\cdot0$$
    2.85 +$$\begin{bmatrix}0&amp;c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} +
    2.86 +0 \geq 0$$
    2.87 +
    2.88 +One formula that you can use to solve for $c$ is the margin-width
    2.89 +formula; by examining the lines you drew, you see that the margin
    2.90 +width is *4*. Now, in general, the equation for margin width is
    2.91 +
    2.92 +$$\text{margin-width} = \frac{2}{||\vec{w}||} = \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$$
    2.93 +
    2.94 +So, for the second time, we use the technique of setting the quantity
    2.95 +you can see (margin width = 4) equal to the general formula (margin
    2.96 +width = 2/||w||) and solving for the parameters in the general
    2.97 +formula:
    2.98 +
    2.99 +$$\frac{2}{||w||} = 4$$
   2.100 +
   2.101 +$$\frac{2}{\sqrt{\vec{w}\cdot \vec{w}}} = 4$$
   2.102 +
   2.103 +$$\frac{4}{\vec{w}\cdot \vec{w}} = 16$$ (squaring both sides)
   2.104 +
   2.105 +$$\vec{w}\cdot \vec{w} = \frac{1}{4}$$
   2.106 +
   2.107 +$$\begin{bmatrix}0&amp;c\end{bmatrix}\cdot \begin{bmatrix}0\\c\end{bmatrix}
   2.108 += \frac{1}{4}$$ 
   2.109 +
   2.110 +(This uses the expression for $\vec{w}$ we got after
   2.111 +we multiplied by $c$ above.)
   2.112 +
   2.113 +$$\begin{eqnarray*}
   2.114 +c^2 &=& \frac{1}{4}\\
   2.115 +c &=& \pm \frac{1}{2}\\
   2.116 +c &=& \frac{1}{2}\end{eqnarray*}$$...because we require $c>0$ so that it doesn't flip the inequality.
   2.117 +
   2.118 +
   2.119 +Returning to our inequality, we have 
   2.120 +
   2.121 +$$\begin{eqnarray*}\begin{bmatrix}0&amp;c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\\
   2.122 +\begin{bmatrix}0&amp;\frac{1}{2}\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\end{eqnarray*}$$
   2.123 +
   2.124 +$$\underbrace{\begin{bmatrix}0&amp;\frac{1}{2}\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$
   2.125 +
   2.126 +And this is truly the right answer, now that we have solved for $c$.
   2.127 +
   2.128 +
   2.129 +* What's the use of gutters?
   2.130 +The gutters are the regions where you have no hard data supporting
   2.131 +your classification, but the decision boundary is, as its name
   2.132 +implies, the cutoff for making a decision between the two
   2.133 +classifications.  As such, you would likely have lower confidence in
   2.134 +any decision within the gutters, but a classifier needs to make a
   2.135 +decision one way or the other for all possible test data (there's no
   2.136 +reason not to do so, as making no decision means you're automatically
   2.137 +wrong in all cases, whereas even a lousy classifier will get some of
   2.138 +them right).
   2.139 +
   2.140 +----
   2.141 +
   2.142 +Here are two more ways to look at it. First of all, and simplest, SVMs
   2.143 +are supposed to find the boundary with the widest margin. The width of
   2.144 +the margin is just the perpendicular distance between the gutters, so
   2.145 +the gutters are important for that reason.
   2.146 +
   2.147 + Second of all, if you attempt to define the SVM problem
   2.148 + geometrically, i.e. put it into a form that can be solved by a
   2.149 + computer, it looks something like the following.
   2.150 +
   2.151 +#+BEGIN_QUOTE
   2.152 +Find a boundary --- which is defined by a normal vector
   2.153 +$\vec{w}$ and an offset $b$ --- such that:
   2.154 +1. The boundary \ldquo{} $\vec{w}\cdot \vec{x} + b = 0$ \rdquo{}
   2.155 +   separates the positive and negative points
   2.156 +2. The width of the margin is as large as possible. Remember, the
   2.157 +   margin width is twice the distance between the boundary and the
   2.158 +   training point that is closest to the boundary; in other words,
   2.159 +   $$\text{margin-width} = \min_{\text{training data}} 2 y_i
   2.160 +   (\vec{w}\cdot \vec{x}_i + b)$$
   2.161 +#+END_QUOTE
   2.162 +
   2.163 +Unfortunately, the problem as stated has no unique solution, since if
   2.164 +you find a satisfactory pair $\langle \vec{w}, b \rangle$ , then the
   2.165 +pair $\langle 3\vec{w}, 3b\rangle$ (for example) defines the same
   2.166 +boundary but has a larger margin. The problem is that any multiple of
   2.167 +a satisfactory pair yields another satisfactory pair. In essence,
   2.168 +we're just changing the units of measurement without materially
   2.169 +affecting the boundary.
   2.170 +
   2.171 +We can /remove this additional degree of freedom by adding another
   2.172 +constraint/ to the problem which establishes a sense of scale. For
   2.173 +example, we could require $\vec{w}$ to be a /unit/ normal vector,
   2.174 +i.e. we could require that $||\vec{w}|| = 1$. This fixes the problem
   2.175 +and gives SVMs a unique solution.
   2.176 +
   2.177 +In 6.034, and elsewhere, we use an alternate constraint; we impose the /gutter conditions/:  
   2.178 +
   2.179 +$\vec{w}\cdot \vec{x_+} + b = + 1$ for positive training points
   2.180 +$\vec{x_+}$.  
   2.181 +
   2.182 +$\vec{w}\cdot \vec{x_-} + b = -1$ for negative
   2.183 +training points $\vec{x_-}$.
   2.184 +
   2.185 +which we often combine into a single inequality: $y_i (\vec{w}\cdot
   2.186 +\vec{x}_i + b) \geqslant 1$ for all training points $\langle
   2.187 +\vec{x}_i, y_i\rangle$.
   2.188 +
   2.189 +
   2.190 +This constraint is enough to eliminate the extra degree of freedom and give SVMs a unique solution. Moreover, you get a nice formula for the margin width by requiring that it holds:
   2.191 +
   2.192 +$\text{margin-width} = \frac{2}{||\vec{w}||}$ because the gutter
   2.193 +conditions are enforced.
   2.194 +
   2.195 +
   2.196 +That's why gutters are important.