thoughts: org/svm.org annotate

annotate org/svm.org @ 124:7415863b39f2

minor fix.

author	rlm
date	Tue, 03 Jun 2014 15:33:21 -0400
parents	dc521fea9219
children

rev	line source
rlm@71	1 #+TITLE: Notes on SVMs
rlm@71	2 #+AUTHOR:Dylan Holmes
rlm@71	3 #+SETUPFILE: ../../aurellem/org/setup.org
rlm@71	4 #+INCLUDE: ../../aurellem/org/level-0.org
rlm@71	5 #+MATHJAX: align:"left" mathml:t path:"http://www.aurellem.org/MathJax/MathJax.js"
rlm@71	6
rlm@71	7 * Question: How do you find $\vec{w}$, $b$, and the \alpha{}'s?
rlm@71	8
rlm@71	9 You should take a look at the class notes, for starters:
rlm@71	10 http://ai6034.mit.edu/fall12/images/SVM_and_Boosting.pdf
rlm@71	11
rlm@71	12 You can also take a look at @388 to see an example of (part of) a
rlm@71	13 worked problem.
rlm@71	14
rlm@71	15
rlm@71	16 When solving SVM problems, there are some useful equations to keep in
rlm@71	17 mind:
rlm@71	18
rlm@71	19 1. $\vec{w}\cdot \vec{x} + b = 0$ defines the boundary, and in
rlm@71	20 particular $\vec{w}\cdot \vec{x} + b \geq 0$ defines the positive
rlm@71	21 side of the boundary.
rlm@71	22 2. $\vec{w}\cdot \vec{x} + b = \pm 1$ defines the positive and
rlm@71	23 negative gutters.
rlm@71	24 3. The distance between the gutters (the width of the margin) is
rlm@71	25 $\text{margin-width} = \frac{2}{\|\|\vec{w}\|\|} =
rlm@71	26 \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$
rlm@71	27 4. $\sum_{\text{training data}} y_i \alpha_i =
rlm@71	28 0$
rlm@71	29 5. $\sum_{\text{training data}} y_i \alpha_i \vec{x}_i = \vec{w}$
rlm@71	30
rlm@71	31 The last two equations are the ones that can help you find the
rlm@71	32 $\alpha_i$. You may also find it useful to think of the $\alpha_i$
rlm@71	33 as measuring \ldquo{}supportiveness\rdquo{}. This means, for
rlm@71	34 example, that: - $\alpha_i$ is zero for non-support vectors,
rlm@71	35 i.e. training points that do not determine the boundary, and which
rlm@71	36 would not affect the placement of the boundary if deleted - When you
rlm@71	37 compare two separate SVM problems, where the first has support
rlm@71	38 vectors that are far from the boundary, and the second has support
rlm@71	39 vectors very close to the boundary, the latter support vectors have
rlm@71	40 comparatively higher $\alpha_i$ values.
rlm@71	41
rlm@71	42 ** Can \alpha{} be zero for support vectors?
rlm@71	43 No, \alpha{} is zero for a point if and only if it's not a support
rlm@71	44 vector. You can see this because in equation (5), just the points
rlm@71	45 with nonzero alphas contribute to $\vec{w}$.
rlm@71	46
rlm@71	47 ** Are all points on the gutter support vectors? How do you tell whether a point is a support vector?
rlm@71	48
rlm@71	49 Not all points on the gutter are support vectors. To tell if a
rlm@71	50 training point is a support vector, imagine what would happen if you
rlm@71	51 deleted it --- would the decision boundary be different? If it would,
rlm@71	52 the point is a support vector. If it wouldn't, the point isn't a
rlm@71	53 support vector.
rlm@71	54
rlm@71	55 * 2011 Q2 Part A
rlm@71	56
rlm@71	57 The equation for the line you drew in part A is $y = 0$, and points
rlm@71	58 will be classified as positive if $y\geq 0$.
rlm@71	59
rlm@71	60 # shouldn't points be classified as positive only if leq 1 + b dot w x?
rlm@71	61 Now in general, the equation for the decision boundary is
rlm@71	62 $\vec{w}\cdot \vec{x} + b = 0$, and points $\vec{x}$ are
rlm@71	63 classified as positive if $\vec{w}\cdot \vec{x} + b \geq 0$. To
rlm@71	64 solve for $\vec{w}$ and $b$, you just manipulate your inequation
rlm@71	65 for the boundary into this form:
rlm@71	66
rlm@71	67 $$y\geq 0$$
rlm@71	68 $$0x + 1y + 0 \geq 0$$
rlm@71	69 $$\begin{bmatrix}0&1\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 \geq 0$$
rlm@71	70
rlm@71	71 And so we have:
rlm@71	72
rlm@71	73 $$\underbrace{\begin{bmatrix}0&1\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$
rlm@71	74
rlm@71	75 ...which is /almost/ right. The trouble is that we can multiply this
rlm@71	76 inequation by any positive constant $c>0$ and it will still be true
rlm@71	77 --- so, the right answer might not be the $\vec{w}$ and $b$ we found
rlm@71	78 above, but instead a /multiple/ of them. To fix this, we multiply the
rlm@71	79 inequation by $c>0$, and solve for $c$:
rlm@71	80
rlm@71	81 $$c\cdot (0x + 1y + 0) \geq c\cdot0$$
rlm@71	82 $$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} +
rlm@71	83 0 \geq 0$$
rlm@71	84
rlm@71	85 One formula that you can use to solve for $c$ is the margin-width
rlm@71	86 formula; by examining the lines you drew, you see that the margin
rlm@71	87 width is 4. Now, in general, the equation for margin width is
rlm@71	88
rlm@71	89 $$\text{margin-width} = \frac{2}{\|\|\vec{w}\|\|} = \frac{2}{\sqrt{\vec{w}\cdot \vec{w}}}$$
rlm@71	90
rlm@71	91 So, for the second time, we use the technique of setting the quantity
rlm@71	92 you can see (margin width = 4) equal to the general formula (margin
rlm@71	93 width = 2/\|\|w\|\|) and solving for the parameters in the general
rlm@71	94 formula:
rlm@71	95
rlm@71	96 $$\frac{2}{\|\|w\|\|} = 4$$
rlm@71	97
rlm@71	98 $$\frac{2}{\sqrt{\vec{w}\cdot \vec{w}}} = 4$$
rlm@71	99
rlm@71	100 $$\frac{4}{\vec{w}\cdot \vec{w}} = 16$$ (squaring both sides)
rlm@71	101
rlm@71	102 $$\vec{w}\cdot \vec{w} = \frac{1}{4}$$
rlm@71	103
rlm@71	104 $$\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}0\\c\end{bmatrix}
rlm@71	105 = \frac{1}{4}$$
rlm@71	106
rlm@71	107 (This uses the expression for $\vec{w}$ we got after
rlm@71	108 we multiplied by $c$ above.)
rlm@71	109
rlm@71	110 $$\begin{eqnarray*}
rlm@71	111 c^2 &=& \frac{1}{4}\\
rlm@71	112 c &=& \pm \frac{1}{2}\\
rlm@71	113 c &=& \frac{1}{2}\end{eqnarray*}$$...because we require $c>0$ so that it doesn't flip the inequality.
rlm@71	114
rlm@71	115
rlm@71	116 Returning to our inequality, we have
rlm@71	117
rlm@71	118 $$\begin{eqnarray*}\begin{bmatrix}0&c\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\\
rlm@71	119 \begin{bmatrix}0&\frac{1}{2}\end{bmatrix}\cdot \begin{bmatrix}x\\y\end{bmatrix} + 0 &\geq& 0\end{eqnarray*}$$
rlm@71	120
rlm@71	121 $$\underbrace{\begin{bmatrix}0&\frac{1}{2}\end{bmatrix}}_{\vec{w}}\cdot \underbrace{\begin{bmatrix}x\\y\end{bmatrix}}_{\vec{x}} + \underbrace{0}_{b} \geq 0$$
rlm@71	122
rlm@71	123 And this is truly the right answer, now that we have solved for $c$.
rlm@71	124
rlm@71	125
rlm@71	126 * What's the use of gutters?
rlm@71	127 The gutters are the regions where you have no hard data supporting
rlm@71	128 your classification, but the decision boundary is, as its name
rlm@71	129 implies, the cutoff for making a decision between the two
rlm@71	130 classifications. As such, you would likely have lower confidence in
rlm@71	131 any decision within the gutters, but a classifier needs to make a
rlm@71	132 decision one way or the other for all possible test data (there's no
rlm@71	133 reason not to do so, as making no decision means you're automatically
rlm@71	134 wrong in all cases, whereas even a lousy classifier will get some of
rlm@71	135 them right).
rlm@71	136
rlm@71	137 ----
rlm@71	138
rlm@71	139 Here are two more ways to look at it. First of all, and simplest, SVMs
rlm@71	140 are supposed to find the boundary with the widest margin. The width of
rlm@71	141 the margin is just the perpendicular distance between the gutters, so
rlm@71	142 the gutters are important for that reason.
rlm@71	143
rlm@71	144 Second of all, if you attempt to define the SVM problem
rlm@71	145 geometrically, i.e. put it into a form that can be solved by a
rlm@71	146 computer, it looks something like the following.
rlm@71	147
rlm@71	148 #+BEGIN_QUOTE
rlm@71	149 Find a boundary --- which is defined by a normal vector
rlm@71	150 $\vec{w}$ and an offset $b$ --- such that:
rlm@71	151 1. The boundary \ldquo{} $\vec{w}\cdot \vec{x} + b = 0$ \rdquo{}
rlm@71	152 separates the positive and negative points
rlm@71	153 2. The width of the margin is as large as possible. Remember, the
rlm@71	154 margin width is twice the distance between the boundary and the
rlm@71	155 training point that is closest to the boundary; in other words,
rlm@71	156 $$\text{margin-width} = \min_{\text{training data}} 2 y_i
rlm@71	157 (\vec{w}\cdot \vec{x}_i + b)$$
rlm@71	158 #+END_QUOTE
rlm@71	159
rlm@71	160 Unfortunately, the problem as stated has no unique solution, since if
rlm@71	161 you find a satisfactory pair $\langle \vec{w}, b \rangle$ , then the
rlm@71	162 pair $\langle 3\vec{w}, 3b\rangle$ (for example) defines the same
rlm@71	163 boundary but has a larger margin. The problem is that any multiple of
rlm@71	164 a satisfactory pair yields another satisfactory pair. In essence,
rlm@71	165 we're just changing the units of measurement without materially
rlm@71	166 affecting the boundary.
rlm@71	167
rlm@71	168 We can /remove this additional degree of freedom by adding another
rlm@71	169 constraint/ to the problem which establishes a sense of scale. For
rlm@71	170 example, we could require $\vec{w}$ to be a /unit/ normal vector,
rlm@71	171 i.e. we could require that $\|\|\vec{w}\|\| = 1$. This fixes the problem
rlm@71	172 and gives SVMs a unique solution.
rlm@71	173
rlm@71	174 In 6.034, and elsewhere, we use an alternate constraint; we impose the /gutter conditions/:
rlm@71	175
rlm@71	176 $\vec{w}\cdot \vec{x_+} + b = + 1$ for positive training points
rlm@71	177 $\vec{x_+}$.
rlm@71	178
rlm@71	179 $\vec{w}\cdot \vec{x_-} + b = -1$ for negative
rlm@71	180 training points $\vec{x_-}$.
rlm@71	181
rlm@71	182 which we often combine into a single inequality: $y_i (\vec{w}\cdot
rlm@71	183 \vec{x}_i + b) \geqslant 1$ for all training points $\langle
rlm@71	184 \vec{x}_i, y_i\rangle$.
rlm@71	185
rlm@71	186
rlm@71	187 This constraint is enough to eliminate the extra degree of freedom and give SVMs a unique solution. Moreover, you get a nice formula for the margin width by requiring that it holds:
rlm@71	188
rlm@71	189 $\text{margin-width} = \frac{2}{\|\|\vec{w}\|\|}$ because the gutter
rlm@71	190 conditions are enforced.
rlm@71	191
rlm@71	192
rlm@71	193 That's why gutters are important.

Mercurial > thoughts

annotate org/svm.org @ 124:7415863b39f2