Mercurial > thoughts
diff org/svm.org @ 85:41741f76b567
merge, and an idea about the power of ice.
author | Robert McIntyre <rlm@mit.edu> |
---|---|
date | Thu, 16 Jan 2014 10:37:11 -0500 |
parents | dc521fea9219 |
children |
line wrap: on
line diff
1.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 1.2 +++ b/org/svm.org Thu Jan 16 10:37:11 2014 -0500 1.3 @@ -0,0 +1,193 @@ 1.4 +#+TITLE: Notes on SVMs 1.5 +#+AUTHOR:Dylan Holmes 1.6 +#+SETUPFILE: ../../aurellem/org/setup.org 1.7 +#+INCLUDE: ../../aurellem/org/level-0.org 1.8 +#+MATHJAX: align:"left" mathml:t path:"http://www.aurellem.org/MathJax/MathJax.js" 1.9 + 1.10 +* Question: How do you find $\vec{w}$, $b$, and the \alpha{}'s? 1.11 + 1.12 +You should take a look at the class notes, for starters: 1.13 +http://ai6034.mit.edu/fall12/images/SVM_and_Boosting.pdf 1.14 + 1.15 +You can also take a look at @388 to see an example of (part of) a 1.16 +worked problem. 1.17 + 1.18 + 1.19 +When solving SVM problems, there are some useful equations to keep in 1.20 +mind: 1.21 + 1.22 +1. $\vec{w}\cdot \vec{x} + b = 0$ defines the boundary, and in 1.23 + particular $\vec{w}\cdot \vec{x} + b \geq 0$ defines the positive 1.24 + side of the boundary. 1.25 +2. $\vec{w}\cdot \vec{x} + b = \pm 1$ defines the positive and 1.26 + negative gutters. 1.27 +3. The distance between the gutters (the width of the margin) is 1.28 + $\text{margin-width} = \frac{2}{||\vec{w}||} = 1.29 + \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$ 1.30 +4. $\sum_{\text{training data}} y_i \alpha_i = 1.31 + 0$ 1.32 +5. $\sum_{\text{training data}} y_i \alpha_i \vec{x}_i = \vec{w}$ 1.33 + 1.34 +The last two equations are the ones that can help you find the 1.35 + $\alpha_i$. You may also find it useful to think of the $\alpha_i$ 1.36 + as measuring \ldquo{}supportiveness\rdquo{}. This means, for 1.37 + example, that: - $\alpha_i$ is zero for non-support vectors, 1.38 + i.e. training points that do not determine the boundary, and which 1.39 + would not affect the placement of the boundary if deleted - When you 1.40 + compare two separate SVM problems, where the first has support 1.41 + vectors that are far from the boundary, and the second has support 1.42 + vectors very close to the boundary, the latter support vectors have 1.43 + comparatively higher $\alpha_i$ values. 1.44 + 1.45 +** Can \alpha{} be zero for support vectors? 1.46 +No, \alpha{} is zero for a point if and only if it's not a support 1.47 +vector. You can see this because in equation (5), just the points 1.48 +with nonzero alphas contribute to $\vec{w}$. 1.49 + 1.50 +** Are all points on the gutter support vectors? How do you tell whether a point is a support vector? 1.51 + 1.52 +Not all points on the gutter are support vectors. To tell if a 1.53 +training point is a support vector, imagine what would happen if you 1.54 +deleted it --- would the decision boundary be different? If it would, 1.55 +the point is a support vector. If it wouldn't, the point isn't a 1.56 +support vector. 1.57 + 1.58 +* 2011 Q2 Part A 1.59 + 1.60 +The equation for the line you drew in part A is $y = 0$, and points 1.61 +will be classified as positive if $y\geq 0$. 1.62 + 1.63 +# shouldn't points be classified as positive only if leq 1 + b dot w x? 1.64 +Now in general, the equation for the decision boundary is 1.65 +$\vec{w}\cdot \vec{x} + b = 0$, and points $\vec{x}$ are 1.66 +classified as positive if $\vec{w}\cdot \vec{x} + b \geq 0$. To 1.67 +solve for $\vec{w}$ and $b$, you just manipulate your inequation 1.68 +for the boundary into this form: 1.69 + 1.70 +$$y\geq 0$$ 1.71 +$$0x + 1y + 0 \geq 0$$ 1.72 +$$\begin{bmatrix}0&1\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 \geq 0$$ 1.73 + 1.74 +And so we have: 1.75 + 1.76 +$$\underbrace{\begin{bmatrix}0&1\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$ 1.77 + 1.78 +...which is /almost/ right. The trouble is that we can multiply this 1.79 +inequation by any positive constant $c>0$ and it will still be true 1.80 +--- so, the right answer might not be the $\vec{w}$ and $b$ we found 1.81 +above, but instead a /multiple/ of them. To fix this, we multiply the 1.82 +inequation by $c>0$, and solve for $c$: 1.83 + 1.84 +$$c\cdot (0x + 1y + 0) \geq c\cdot0$$ 1.85 +$$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 1.86 +0 \geq 0$$ 1.87 + 1.88 +One formula that you can use to solve for $c$ is the margin-width 1.89 +formula; by examining the lines you drew, you see that the margin 1.90 +width is *4*. Now, in general, the equation for margin width is 1.91 + 1.92 +$$\text{margin-width} = \frac{2}{||\vec{w}||} = \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$$ 1.93 + 1.94 +So, for the second time, we use the technique of setting the quantity 1.95 +you can see (margin width = 4) equal to the general formula (margin 1.96 +width = 2/||w||) and solving for the parameters in the general 1.97 +formula: 1.98 + 1.99 +$$\frac{2}{||w||} = 4$$ 1.100 + 1.101 +$$\frac{2}{\sqrt{\vec{w}\cdot \vec{w}}} = 4$$ 1.102 + 1.103 +$$\frac{4}{\vec{w}\cdot \vec{w}} = 16$$ (squaring both sides) 1.104 + 1.105 +$$\vec{w}\cdot \vec{w} = \frac{1}{4}$$ 1.106 + 1.107 +$$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}0\\c\end{bmatrix} 1.108 += \frac{1}{4}$$ 1.109 + 1.110 +(This uses the expression for $\vec{w}$ we got after 1.111 +we multiplied by $c$ above.) 1.112 + 1.113 +$$\begin{eqnarray*} 1.114 +c^2 &=& \frac{1}{4}\\ 1.115 +c &=& \pm \frac{1}{2}\\ 1.116 +c &=& \frac{1}{2}\end{eqnarray*}$$...because we require $c>0$ so that it doesn't flip the inequality. 1.117 + 1.118 + 1.119 +Returning to our inequality, we have 1.120 + 1.121 +$$\begin{eqnarray*}\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\\ 1.122 +\begin{bmatrix}0&\frac{1}{2}\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\end{eqnarray*}$$ 1.123 + 1.124 +$$\underbrace{\begin{bmatrix}0&\frac{1}{2}\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$ 1.125 + 1.126 +And this is truly the right answer, now that we have solved for $c$. 1.127 + 1.128 + 1.129 +* What's the use of gutters? 1.130 +The gutters are the regions where you have no hard data supporting 1.131 +your classification, but the decision boundary is, as its name 1.132 +implies, the cutoff for making a decision between the two 1.133 +classifications. As such, you would likely have lower confidence in 1.134 +any decision within the gutters, but a classifier needs to make a 1.135 +decision one way or the other for all possible test data (there's no 1.136 +reason not to do so, as making no decision means you're automatically 1.137 +wrong in all cases, whereas even a lousy classifier will get some of 1.138 +them right). 1.139 + 1.140 +---- 1.141 + 1.142 +Here are two more ways to look at it. First of all, and simplest, SVMs 1.143 +are supposed to find the boundary with the widest margin. The width of 1.144 +the margin is just the perpendicular distance between the gutters, so 1.145 +the gutters are important for that reason. 1.146 + 1.147 + Second of all, if you attempt to define the SVM problem 1.148 + geometrically, i.e. put it into a form that can be solved by a 1.149 + computer, it looks something like the following. 1.150 + 1.151 +#+BEGIN_QUOTE 1.152 +Find a boundary --- which is defined by a normal vector 1.153 +$\vec{w}$ and an offset $b$ --- such that: 1.154 +1. The boundary \ldquo{} $\vec{w}\cdot \vec{x} + b = 0$ \rdquo{} 1.155 + separates the positive and negative points 1.156 +2. The width of the margin is as large as possible. Remember, the 1.157 + margin width is twice the distance between the boundary and the 1.158 + training point that is closest to the boundary; in other words, 1.159 + $$\text{margin-width} = \min_{\text{training data}} 2 y_i 1.160 + (\vec{w}\cdot \vec{x}_i + b)$$ 1.161 +#+END_QUOTE 1.162 + 1.163 +Unfortunately, the problem as stated has no unique solution, since if 1.164 +you find a satisfactory pair $\langle \vec{w}, b \rangle$ , then the 1.165 +pair $\langle 3\vec{w}, 3b\rangle$ (for example) defines the same 1.166 +boundary but has a larger margin. The problem is that any multiple of 1.167 +a satisfactory pair yields another satisfactory pair. In essence, 1.168 +we're just changing the units of measurement without materially 1.169 +affecting the boundary. 1.170 + 1.171 +We can /remove this additional degree of freedom by adding another 1.172 +constraint/ to the problem which establishes a sense of scale. For 1.173 +example, we could require $\vec{w}$ to be a /unit/ normal vector, 1.174 +i.e. we could require that $||\vec{w}|| = 1$. This fixes the problem 1.175 +and gives SVMs a unique solution. 1.176 + 1.177 +In 6.034, and elsewhere, we use an alternate constraint; we impose the /gutter conditions/: 1.178 + 1.179 +$\vec{w}\cdot \vec{x_+} + b = + 1$ for positive training points 1.180 +$\vec{x_+}$. 1.181 + 1.182 +$\vec{w}\cdot \vec{x_-} + b = -1$ for negative 1.183 +training points $\vec{x_-}$. 1.184 + 1.185 +which we often combine into a single inequality: $y_i (\vec{w}\cdot 1.186 +\vec{x}_i + b) \geqslant 1$ for all training points $\langle 1.187 +\vec{x}_i, y_i\rangle$. 1.188 + 1.189 + 1.190 +This constraint is enough to eliminate the extra degree of freedom and give SVMs a unique solution. Moreover, you get a nice formula for the margin width by requiring that it holds: 1.191 + 1.192 +$\text{margin-width} = \frac{2}{||\vec{w}||}$ because the gutter 1.193 +conditions are enforced. 1.194 + 1.195 + 1.196 +That's why gutters are important.