rlm@371
|
1 When I write my thesis, I want it to have links to every
|
rlm@371
|
2
|
rlm@371
|
3
|
rlm@371
|
4
|
rlm@369
|
5 * Object Recognition from Local Scale-Invariant Features, David G. Lowe
|
rlm@369
|
6
|
rlm@369
|
7 This is the famous SIFT paper that is mentioned everywhere.
|
rlm@369
|
8
|
rlm@369
|
9 This is a way to find objects in images given an image of that
|
rlm@369
|
10 object. It is moderately resistant to variations in the sample image
|
rlm@369
|
11 and the target image. Basically, this is a fancy way of picking out
|
rlm@369
|
12 a test pattern embedded in a larger pattern. It would fail to learn
|
rlm@369
|
13 anything resembling object categories, for instance. A useful concept
|
rlm@369
|
14 is the idea of storing the local scale and rotation of each feature
|
rlm@369
|
15 as it is extracted from the image, then checking to make sure that
|
rlm@369
|
16 proposed matches all more-or-less agree on shift, rotation, scale,
|
rlm@369
|
17 etc. Another good idea is to use points instead of edges, since
|
rlm@369
|
18 they seem more robust.
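
The agreement check on shift, rotation, and scale can be sketched as a coarse vote over the pose that each match implies. This is only an illustration under my own assumptions, not Lowe's implementation: the `Feature` record, the bin widths, and the function names are hypothetical, and the binning is far cruder than a real Hough transform (matches near a bin edge can be split).

```python
import math
from collections import defaultdict, namedtuple

# Hypothetical feature record: position, local scale, local orientation (radians).
Feature = namedtuple("Feature", "x y scale angle")

def implied_pose(model, image):
    """Shift, log2 scale ratio, and rotation implied by one model->image match."""
    rot = (image.angle - model.angle) % (2 * math.pi)
    s = image.scale / model.scale
    # Rotate and scale the model point, then see what shift is still needed.
    mx = s * (model.x * math.cos(rot) - model.y * math.sin(rot))
    my = s * (model.x * math.sin(rot) + model.y * math.cos(rot))
    return image.x - mx, image.y - my, math.log2(s), rot

def consistent_matches(matches, shift_bin=20.0, rot_bin=math.pi / 6):
    """Return the largest group of matches whose implied poses share one bin."""
    n_rot = round(2 * math.pi / rot_bin)  # number of rotation bins, for wraparound
    bins = defaultdict(list)
    for model, image in matches:
        dx, dy, log_s, rot = implied_pose(model, image)
        key = (round(dx / shift_bin), round(dy / shift_bin),
               round(log_s), round(rot / rot_bin) % n_rot)
        bins[key].append((model, image))
    return max(bins.values(), key=len) if bins else []
```

Matches that land outside the winning bin are treated as outliers and dropped; a real system would follow this with a proper geometric fit.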
|
rlm@369
|
19
|
rlm@369
|
20 ** References:
|
rlm@369
|
21 - Basri, Ronen, and David W. Jacobs, “Recognition using region
|
rlm@369
|
22 correspondences,” International Journal of Computer Vision, 25, 2
|
rlm@369
|
23 (1996), pp. 141–162.
|
rlm@369
|
24
|
rlm@369
|
25 - Edelman, Shimon, Nathan Intrator, and Tomaso Poggio, “Complex
|
rlm@369
|
26 cells and object recognition,” Unpublished Manuscript, preprint at
|
rlm@369
|
27 http://www.ai.mit.edu/edelman/mirror/nips97.ps.Z
|
rlm@369
|
28
|
rlm@369
|
29 - Lindeberg, Tony, “Detecting salient blob-like image structures
|
rlm@369
|
30 and their scales with a scale-space primal sketch: a method for
|
rlm@369
|
31 focus-of-attention,” International Journal of Computer Vision, 11, 3
|
rlm@369
|
32 (1993), pp. 283–318.
|
rlm@369
|
33
|
rlm@369
|
34 - Murase, Hiroshi, and Shree K. Nayar, “Visual learning and
|
rlm@369
|
35 recognition of 3-D objects from appearance,” International Journal
|
rlm@369
|
36 of Computer Vision, 14, 1 (1995), pp. 5–24.
|
rlm@369
|
37
|
rlm@369
|
38 - Ohba, Kohtaro, and Katsushi Ikeuchi, “Detectability, uniqueness,
|
rlm@369
|
39 and reliability of eigen windows for stable verification of
|
rlm@369
|
40 partially occluded objects,” IEEE Trans. on Pattern Analysis and
|
rlm@369
|
41 Machine Intelligence, 19, 9 (1997), pp. 1043–48.
|
rlm@369
|
42
|
rlm@369
|
43 - Zhang, Z., R. Deriche, O. Faugeras, Q.T. Luong, “A robust
|
rlm@369
|
44 technique for matching two uncalibrated images through the recovery
|
rlm@376
|
45 of the unknown epipolar geometry,” Artificial Intelligence, 78,
|
rlm@369
|
46 (1995), pp. 87-119.
|
rlm@369
|
47
|
rlm@369
|
48
|
rlm@369
|
49
|
rlm@369
|
50
|
rlm@376
|
51
|
rlm@371
|
52 * Alignment by Maximization of Mutual Information, Paul A. Viola
|
rlm@371
|
53
|
rlm@371
|
54 PhD Thesis recommended by Winston. Describes a system that is able
|
rlm@371
|
55 to align a 3D computer model of an object with an image of that
|
rlm@371
|
56 object.
|
rlm@371
|
57
|
rlm@371
|
58 - Pages 9-19 are a very adequate intro to the algorithm.
|
rlm@371
|
59
|
rlm@371
|
60 - Has a useful section on entropy and probability at the beginning
|
rlm@371
|
61 which is worth reading, especially the part about entropy.
|
rlm@371
|
62
|
rlm@371
|
63 - Differential entropy seems a bit odd -- you would think that it
|
rlm@371
|
64 should be the same as normal entropy for a discrete distribution
|
rlm@371
|
65 embedded in continuous space. How do you measure the entropy of a
|
rlm@376
|
66 half continuous, half discrete random variable? Perhaps the
|
rlm@376
|
67 problem is related to the delta function, and not the definition
|
rlm@376
|
68 of differential entropy?
|
rlm@371
|
69
|
rlm@371
|
70 - Expectation Maximization (Mixture of Gaussians cool stuff)
|
rlm@371
|
71 (Dempster 1977)
|
rlm@371
|
72
|
rlm@371
|
73 - Good introduction to Parzen Window Density Estimation. Parzen
|
rlm@371
|
74 density functions trade construction time for evaluation
|
rlm@376
|
75 time (pg. 41). They are a way to transform a sample into a
|
rlm@376
|
76 distribution. They don't work very well in higher dimensions due
|
rlm@376
|
77 to the thinning of sample points.
|
rlm@376
|
78
|
rlm@376
|
79 - Calculating the entropy of a Markov Model (or state machine,
|
rlm@376
|
80 program, etc) seems like it would be very hard, since each trial
|
rlm@376
|
81 would not be independent of the other trials. Yet, there are many
|
rlm@376
|
82 common sense models that do need to have state to accurately model
|
rlm@376
|
83 the world.
|
rlm@376
|
84
|
rlm@376
|
85 - "... there is no direct procedure for evaluating entropy from a
|
rlm@376
|
86 sample. A common approach is to model the density from the sample,
|
rlm@376
|
87 and then estimate the entropy from the density."
|
rlm@376
|
88
|
rlm@376
|
89 - pg. 55 he says that infinity minus infinity is zero lol.
|
rlm@376
|
90
|
rlm@376
|
91 - great idea on pg 62 about using random samples from images to
|
rlm@376
|
92 speed up computation.
|
rlm@376
|
93
|
rlm@376
|
94 - practical way of terminating a random search: "A better idea is to
|
rlm@376
|
95 reduce the learning rate until the parameters have a reasonable
|
rlm@376
|
96 variance and then take the average parameters."
|
rlm@376
|
97
|
rlm@376
|
98 - p. 65 bullshit hack to make his parzen window estimates work.
|
rlm@376
|
99
|
rlm@376
|
100 - this alignment only works if the initial pose is not very far
|
rlm@376
|
101 off.
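
The Parzen window notes and the quoted "model the density from the sample, then estimate the entropy from the density" recipe can be sketched in one dimension as follows. This is my own minimal illustration, not the thesis code: the Gaussian kernel, the fixed `sigma`, the function names, and the use of a second sample for evaluation are all assumptions.

```python
import math
import random

def parzen_density(sample, sigma):
    """Return p(x): the average of Gaussian kernels centred on each sample point.

    Construction is free (the sample is just stored); every evaluation of p
    costs one kernel per sample point.
    """
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))
    def p(x):
        return sum(norm * math.exp(-0.5 * ((x - s) / sigma) ** 2)
                   for s in sample) / len(sample)
    return p

def entropy_estimate(density_sample, eval_sample, sigma):
    """Estimate differential entropy (nats) as the mean of -log p over eval_sample."""
    p = parzen_density(density_sample, sigma)
    return -sum(math.log(p(x)) for x in eval_sample) / len(eval_sample)

def demo(n=500, sigma=0.3, seed=0):
    """Entropy estimate for a standard normal; the true value is 0.5*ln(2*pi*e)."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    return entropy_estimate(a, b, sigma)
```

For a standard normal the true differential entropy is about 1.419 nats, so `demo()` gives a rough sanity check on the estimator. The cost of `entropy_estimate` grows with the product of the two sample sizes, which is presumably why drawing small random samples (pg. 62) helps so much.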
|
rlm@376
|
102
|
rlm@371
|
103
|
rlm@371
|
104 Occlusion? Seems a bit holistic.
|
rlm@371
|
105
|
rlm@376
|
106 ** References
|
rlm@376
|
107 - "excellent" book on entropy (Cover & Thomas, 1991) [Elements of
|
rlm@376
|
108 Information Theory.]
|
rlm@376
|
109
|
rlm@376
|
110 - Canny, J. (1986). A Computational Approach to Edge Detection. IEEE
|
rlm@376
|
111 Transactions PAMI, PAMI-8(6):679-698
|
rlm@376
|
112
|
rlm@376
|
113 - Chin, R. and Dyer, C. (1986). Model-Based Recognition in Robot
|
rlm@376
|
114 Vision. Computing Surveys, 18:67-108.
|
rlm@376
|
115
|
rlm@376
|
116 - Grimson, W., Lozano-Perez, T., Wells, W., et al. (1994). An
|
rlm@376
|
117 Automatic Registration Method for Frameless Stereotaxy, Image
|
rlm@376
|
118 Guided Surgery, and Enhanced Reality Visualization. In Proceedings
|
rlm@376
|
119 of the Computer Society Conference on Computer Vision and Pattern
|
rlm@376
|
120 Recognition, Seattle, WA. IEEE.
|
rlm@376
|
121
|
rlm@376
|
122 - Hill, D. L., Studholme, C., and Hawkes, D. J. (1994). Voxel
|
rlm@376
|
123 Similarity Measures for Automated Image Registration. In
|
rlm@376
|
124 Proceedings of the Third Conference on Visualization in Biomedical
|
rlm@376
|
125 Computing, pages 205-216. SPIE.
|
rlm@376
|
126
|
rlm@376
|
127 - Kirkpatrick, S., Gelatt, C., and Vecchi, M. (1983). Optimization by Simulated
|
rlm@376
|
128 Annealing. Science, 220(4598):671-680.
|
rlm@376
|
129
|
rlm@376
|
130 - Jones, M. and Poggio, T. (1995). Model-based matching of line
|
rlm@376
|
131 drawings by linear combinations of prototypes. Proceedings of the
|
rlm@376
|
132 International Conference on Computer Vision
|
rlm@376
|
133
|
rlm@376
|
134 - Ljung, L. and Soderstrom, T. (1983). Theory and Practice of
|
rlm@376
|
135 Recursive Identification. MIT Press.
|
rlm@376
|
136
|
rlm@376
|
137 - Shannon, C. E. (1948). A mathematical theory of communication. Bell
|
rlm@376
|
138 Systems Technical Journal, 27:379-423 and 623-656.
|
rlm@376
|
139
|
rlm@376
|
140 - Shashua, A. (1992). Geometry and Photometry in 3D Visual
|
rlm@376
|
141 Recognition. PhD thesis, M.I.T Artificial Intelligence Laboratory,
|
rlm@376
|
142 AI-TR-1401.
|
rlm@376
|
143
|
rlm@376
|
144 - Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling,
|
rlm@376
|
145 W. T. (1992). Numerical Recipes in C: The Art of Scientific
|
rlm@376
|
146 Computing. Cambridge University Press, Cambridge, England, second
|
rlm@376
|
147 edition.
|
rlm@376
|
148
|
rlm@376
|
149 * Semi-Automated Dialogue Act Classification for Situated Social Agents in Games, Deb Roy
|
rlm@376
|
150
|
rlm@376
|
151 Interesting attempt to learn "social scripts" related to restaurant
|
rlm@376
|
152 behaviour. The authors do this by creating a game which implements a
|
rlm@376
|
153 virtual restaurant, and recording actual human players as they
|
rlm@376
|
154 interact with the game. They learn scripts from annotated
|
rlm@376
|
155 interactions and then use those scripts to label other
|
rlm@376
|
156 interactions. They don't get very good results, but their
|
rlm@376
|
157 methodology of creating a virtual world and recording
|
rlm@376
|
158 low-dimensional actions is interesting.
|
rlm@376
|
159
|
rlm@376
|
160 - Torque 2D/3D looks like an interesting game engine.
|
rlm@376
|
161
|
rlm@376
|
162
|
rlm@376
|
163 * Face Recognition by Humans: Nineteen Results all Computer Vision Researchers should know, Sinha
|
rlm@376
|
164
|
rlm@376
|
165 This is a summary of a lot of bio experiments on human face
|
rlm@376
|
166 recognition.
|
rlm@376
|
167
|
rlm@376
|
168 - They assert again that the internal gradients/structures of a face
|
rlm@376
|
169 are more important than the edges.
|
rlm@376
|
170
|
rlm@376
|
171 - It's amazing to me that it takes about 10 years after birth for a
|
rlm@376
|
172 human to get advanced adult-like face detection. They progress from
|
rlm@376
|
173 feature-based processing to a holistic approach during this
|
rlm@376
|
174 time.
|
rlm@376
|
175
|
rlm@376
|
176 - Finally, color is a very important cue for identifying faces.
|
rlm@371
|
177
|
rlm@371
|
178 ** References
|
rlm@376
|
179 - A. Freire, K. Lee, and L. A. Symons, “The face-inversion effect as
|
rlm@376
|
180 a deficit in the encoding of configural information: Direct
|
rlm@376
|
181 evidence,” Perception, vol. 29, no. 2, pp. 159–170, 2000.
|
rlm@376
|
182 - M. B. Lewis, “Thatcher’s children: Development and the Thatcher
|
rlm@376
|
183 illusion,” Perception, vol. 32, pp. 1415–21, 2003.
|
rlm@376
|
184 - E. McKone and N. Kanwisher, “Does the human brain process objects
|
rlm@376
|
185 of expertise like faces? A review of the evidence,” in From Monkey
|
rlm@376
|
186 Brain to Human Brain, S. Dehaene, J. R. Duhamel, M. Hauser, and
|
rlm@376
|
187 G. Rizzolatti, Eds. Cambridge, MA: MIT Press, 2005.
|
rlm@376
|
188
|
rlm@376
|
189
|
rlm@376
|
190
|
rlm@376
|
191
|
rlm@376
|
192 heee~eeyyyy kids, time to get eagle'd!!!!
|
rlm@376
|
193
|
rlm@376
|
194
|
rlm@376
|
195
|
rlm@376
|
196
|
rlm@376
|
197
|
rlm@376
|
198 * Ullman
|
rlm@376
|
199
|
rlm@376
|
200 Actual code reuse!
|
rlm@376
|
201
|
rlm@376
|
202 precision = fraction of retrieved instances that are relevant
|
rlm@376
|
203 (true-positives/(true-positives+false-positives))
|
rlm@376
|
204
|
rlm@376
|
205 recall = fraction of relevant instances that are retrieved
|
rlm@376
|
206 (true-positives/total-in-class)
|
rlm@376
|
207
|
rlm@376
|
208 cross-validation = train on one part of the data and test on a held-out part, to detect
|
rlm@376
|
209 overfitting.
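
The three definitions above, written out as a small sketch. The function names and the boolean-list input format are my own choices for illustration; "total-in-class" is computed as true positives plus false negatives.

```python
def precision_recall(predicted, actual):
    """predicted/actual: parallel lists of booleans (retrieved?, relevant?)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # retrieved and relevant
    fp = sum(1 for p, a in pairs if p and not a)    # retrieved but not relevant
    fn = sum(1 for p, a in pairs if a and not p)    # relevant but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) pairs; each index is held out exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

A model that scores well on its training indices but poorly on the held-out ones is overfitting; averaging the held-out score over all k folds gives a less noisy estimate.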
|
rlm@376
|
210
|
rlm@377
|
211 nifty, relevant, realistic ideas
|
rlm@377
|
212 He doesn't confine himself to implausible assumptions
|
rlm@376
|
213
|
rlm@378
|
214 ** Our Reading
|
rlm@378
|
215 *** 2002 Visual features of intermediate complexity and their use in classification
|
rlm@376
|
216
|
rlm@378
|
217
|
rlm@376
|
218
|
rlm@376
|
219
|
rlm@376
|
220 ** Getting around the dumb "fixed training set" methods
|
rlm@376
|
221
|
rlm@376
|
222 *** 2006 Learning to classify by ongoing feature selection
|
rlm@376
|
223
|
rlm@376
|
224 Brings in the most informative features of a class, based on
|
rlm@376
|
225 mutual information between that feature and all the examples
|
rlm@376
|
226 encountered so far. To bound the running time, he uses only a
|
rlm@376
|
227 fixed number of the most recent examples. He uses a replacement
|
rlm@376
|
228 strategy to tell whether a new feature is better than one of the
|
rlm@376
|
229 current features.
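
A rough sketch of the idea as I understand it, not the paper's algorithm: features are scored by mutual information with the class label over a bounded window of recent examples, and a candidate replaces the weakest active feature only if it scores higher. The `OnlineSelector` class, its method names, and the discrete-valued features are all assumptions made for illustration.

```python
import math
from collections import Counter, deque

def mutual_information(pairs):
    """I(F; C) in bits for a list of (feature_value, class_label) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    f_marg = Counter(f for f, _ in pairs)
    c_marg = Counter(c for _, c in pairs)
    return sum((cnt / n) * math.log2(cnt * n / (f_marg[f] * c_marg[c]))
               for (f, c), cnt in joint.items())

class OnlineSelector:
    def __init__(self, n_features, window):
        self.n_features = n_features
        self.active = {}                     # name -> feature function
        self.recent = deque(maxlen=window)   # only the newest examples are kept

    def observe(self, example, label):
        self.recent.append((example, label))

    def score(self, feature):
        return mutual_information([(feature(x), c) for x, c in self.recent])

    def propose(self, name, feature):
        """Add the feature if there is room, else replace the weakest one if better."""
        if len(self.active) < self.n_features:
            self.active[name] = feature
            return True
        worst = min(self.active, key=lambda k: self.score(self.active[k]))
        if self.score(feature) > self.score(self.active[worst]):
            del self.active[worst]
            self.active[name] = feature
            return True
        return False
```

The `deque(maxlen=window)` is what bounds the running time: each score touches at most `window` examples no matter how long the system has been running.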
|
rlm@376
|
230
|
rlm@376
|
231 *** 2009 Learning model complexity in an online environment
|
rlm@376
|
232
|
rlm@376
|
233 Sort of like the hierarchical Bayesian models of Tenenbaum, this
|
rlm@376
|
234 system makes the model more and more complicated as it gets more
|
rlm@376
|
235 and more training data. It does this by using two systems in
|
rlm@376
|
236 parallel, and whenever the more complex one seems to be
|
rlm@376
|
237 needed by the data, the less complex one is thrown out, and an
|
rlm@376
|
238 even more complex model is initialized in its place.
|
rlm@376
|
239
|
rlm@376
|
240 He uses an SVM with polynomial kernels of varying complexity. He
|
rlm@376
|
241 gets good performance on a handwriting classification task using a large
|
rlm@376
|
242 range of training samples, since his model changes complexity
|
rlm@376
|
243 depending on the number of training samples. The simpler models do
|
rlm@376
|
244 better with few training points, and the more complex ones do
|
rlm@376
|
245 better with many training points.
|
rlm@376
|
246
|
rlm@377
|
247 The final model had intermediate complexity between published
|
rlm@377
|
248 extremes.
|
rlm@377
|
249
|
rlm@376
|
250 The more complex models must be able to be initialized efficiently
|
rlm@376
|
251 from the less complex models which they replace!
|
rlm@376
|
252
|
rlm@376
|
253
|
rlm@376
|
254 ** Non Parametric Models
|
rlm@376
|
255
|
rlm@377
|
256 *** 2010 The chains model for detecting parts by their context
|
rlm@376
|
257
|
rlm@376
|
258 Like the constellation method for rigid objects, but extended to
|
rlm@376
|
259 non-rigid objects as well.
|
rlm@376
|
260
|
rlm@376
|
261 Allows you to build a hand detector from a face detector. This is
|
rlm@376
|
262 useful because hands might be only a few pixels, and very
|
rlm@376
|
263 ambiguous in an image, but if you are expecting them at the end of
|
rlm@376
|
264 an arm, then they become easier to find.
|
rlm@376
|
265
|
rlm@377
|
266 They make chains by using spatial proximity of features. That way,
|
rlm@377
|
267 a hand can be identified by chaining back from the head. If there
|
rlm@377
|
268 is a good chain to the head, then it is more likely that there is
|
rlm@377
|
269 a hand than if there isn't. Since there is some give in the
|
rlm@377
|
270 proximity detection, the system can accommodate new poses that it
|
rlm@377
|
271 has never seen before.
|
rlm@377
|
272
|
rlm@377
|
273 Does not use any motion information.
|
rlm@377
|
274
|
rlm@377
|
275 *** 2005 A Hierarchical Non-Parametric Method for Capturing Non-Rigid Deformations
|
rlm@377
|
276
|
rlm@377
|
277 (relative dynamic programming [RDP])
|
rlm@377
|
278
|
rlm@377
|
279 Goal is to match images, as in SIFT, but this time the images can
|
rlm@377
|
280 be subject to non rigid transformations. They do this by finding
|
rlm@377
|
281 small patches that look the same, then building up bigger
|
rlm@377
|
282 patches. They get a tree of patches that describes each image, and
|
rlm@377
|
283 find the edit distance between each tree. Editing operations
|
rlm@377
|
284 involve a coherent shift of features, so they can accommodate local
|
rlm@377
|
285 shifts of patches in any direction. They get some cool results
|
rlm@377
|
286 over just straight correlation. Basically, they made an image
|
rlm@377
|
287 comparator that is resistant to multiple independent deformations.
|
rlm@377
|
288
|
rlm@377
|
289 !important small regions are treated the same as nonimportant
|
rlm@377
|
290 small regions
|
rlm@377
|
291
|
rlm@377
|
292 !no conception of shape
|
rlm@377
|
293
|
rlm@377
|
294 quote:
|
rlm@377
|
295 The dynamic programming procedure looks for an optimal
|
rlm@377
|
296 transformation that aligns the patches of both images. This
|
rlm@377
|
297 transformation is not a global transformation, but a composition
|
rlm@377
|
298 of many local transformations of sub-patches at various sizes,
|
rlm@377
|
299 performed one on top of the other.
|
rlm@377
|
300
|
rlm@377
|
301 *** 2006 Satellite Features for the Classification of Visually Similar Classes
|
rlm@377
|
302
|
rlm@377
|
303 Finds features that can distinguish subclasses of a class, by
|
rlm@377
|
304 first finding a rigid set of anchor features that are common to
|
rlm@377
|
305 both subclasses, then finding distinguishing features relative to
|
rlm@377
|
306 those anchor features. They keep things rigid because the satellite
|
rlm@377
|
307 features don't have much information in and of themselves, and are
|
rlm@377
|
308 only informative relative to other features.
|
rlm@377
|
309
|
rlm@377
|
310 *** 2005 Learning a novel class from a single example by cross-generalization.
|
rlm@377
|
311
|
rlm@377
|
312 Lets you use a vast visual experience to generate a classifier
|
rlm@377
|
313 for a novel class by generating synthetic examples by replacing
|
rlm@377
|
314 features from the single example with features from similar
|
rlm@377
|
315 classes.
|
rlm@377
|
316
|
rlm@377
|
317 quote: feature F is likely to be useful for class C if a similar
|
rlm@377
|
318 feature F proved effective for a similar class C in the past.
|
rlm@377
|
319
|
rlm@377
|
320 Allows you to transfer the "gestalt" of a similar class to a new
|
rlm@377
|
321 class, by adapting all the features of the learned class that have
|
rlm@378
|
322 correspondence to the new class.
|
rlm@378
|
323
|
rlm@378
|
324 *** 2007 Semantic Hierarchies for Recognizing Objects and Parts
|
rlm@378
|
325
|
rlm@378
|
326 Better learning of complex objects like faces by learning each
|
rlm@378
|
327 piece (like nose, mouth, eye, etc) separately, then making sure
|
rlm@378
|
328 that the features are in plausible positions. |