view org/literature-review.org @ 551:d304b2ea7c58

some changes from winston.
author Robert McIntyre <rlm@mit.edu>
date Fri, 02 May 2014 13:40:47 -0400
parents 8e62bf52be59
When I write my thesis, I want it to have links to every
* Object Recognition from Local Scale-Invariant Features, David G. Lowe

This is the famous SIFT paper that is mentioned everywhere.

This is a way to find objects in images given an image of that
object. It is moderately resistant to variations between the sample
image and the target image. Basically, this is a fancy way of picking
out a test pattern embedded in a larger pattern. It would fail to
learn anything resembling object categories, for instance. A useful
concept is the idea of storing the local scale and rotation of each
feature as it is extracted from the image, then checking that the
proposed matches all more-or-less agree on shift, rotation, scale,
etc. Another good idea is to use points instead of edges, since they
seem more robust.
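
The agreement check described above can be sketched as a coarse vote over (scale, rotation, shift) bins. This is only a minimal illustration, not Lowe's implementation: the function name, the tuple layout =(x, y, scale, angle)=, and all bin sizes are assumptions made up for the example, and hard binning like this ignores the neighbouring-bin voting a real system would want.

```python
import math
from collections import defaultdict


def consistent_matches(matches, scale_base=2.0, angle_bin_deg=30.0, shift_bin=32.0):
    """Return the largest group of matches that agree on one similarity transform.

    Each match is a pair (model_feature, image_feature); each feature is a
    tuple (x, y, scale, angle_in_degrees). Bin sizes are illustrative guesses.
    """
    n_angle_bins = round(360.0 / angle_bin_deg)
    votes = defaultdict(list)
    for model, image in matches:
        mx, my, m_scale, m_angle = model
        ix, iy, i_scale, i_angle = image
        scale = i_scale / m_scale
        rotation = math.radians(i_angle - m_angle)
        # Translation implied by this single match: where the model origin
        # would land in the image under the match's own scale and rotation.
        tx = ix - scale * (math.cos(rotation) * mx - math.sin(rotation) * my)
        ty = iy - scale * (math.sin(rotation) * mx + math.cos(rotation) * my)
        key = (
            round(math.log(scale, scale_base)),
            round(((i_angle - m_angle) % 360.0) / angle_bin_deg) % n_angle_bins,
            round(tx / shift_bin),
            round(ty / shift_bin),
        )
        votes[key].append((model, image))
    # The most popular bin holds the matches that agree with each other.
    return max(votes.values(), key=len) if votes else []
```

Matches that land in the winning bin are kept; everything else is treated as an outlier.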
** References:
- Basri, Ronen, and David W. Jacobs, “Recognition using region
  correspondences,” International Journal of Computer Vision, 25, 2
  (1996), pp. 141–162.

- Edelman, Shimon, Nathan Intrator, and Tomaso Poggio, “Complex
  cells and object recognition,” Unpublished Manuscript, preprint at
  http://www.ai.mit.edu/edelman/mirror/nips97.ps.Z

- Lindeberg, Tony, “Detecting salient blob-like image structures
  and their scales with a scale-space primal sketch: a method for
  focus-of-attention,” International Journal of Computer Vision, 11, 3
  (1993), pp. 283–318.

- Murase, Hiroshi, and Shree K. Nayar, “Visual learning and
  recognition of 3-D objects from appearance,” International Journal
  of Computer Vision, 14, 1 (1995), pp. 5–24.

- Ohba, Kohtaro, and Katsushi Ikeuchi, “Detectability, uniqueness,
  and reliability of eigen windows for stable verification of
  partially occluded objects,” IEEE Trans. on Pattern Analysis and
  Machine Intelligence, 19, 9 (1997), pp. 1043–48.

- Zhang, Z., R. Deriche, O. Faugeras, Q.T. Luong, “A robust
  technique for matching two uncalibrated images through the recovery
  of the unknown epipolar geometry,” Artificial Intelligence, 78,
  (1995), pp. 87–119.
* Alignment by Maximization of Mutual Information, Paul A. Viola

PhD Thesis recommended by Winston. Describes a system that is able
to align a 3D computer model of an object with an image of that
object.
- Pages 9-19 are a very adequate intro to the algorithm.

- Has a useful section on entropy and probability at the beginning
  which is worth reading, especially the part about entropy.

- Differential entropy seems a bit odd -- you would think that it
  should be the same as normal entropy for a discrete distribution
  embedded in continuous space. How do you measure the entropy of a
  half continuous, half discrete random variable? Perhaps the
  problem is related to the delta function, and not the definition
  of differential entropy?

- Expectation Maximization (cool Mixture of Gaussians stuff)
  (Dempster 1977)

- Good introduction to Parzen Window Density Estimation. Parzen
  density functions trade construction time for evaluation time
  (pg. 41): they are cheap to construct but expensive to evaluate.
  They are a way to transform a sample into a distribution. They
  don't work very well in higher dimensions due to the thinning of
  sample points.
- Calculating the entropy of a Markov Model (or state machine,
  program, etc.) seems like it would be very hard, since each trial
  would not be independent of the other trials. Yet there are many
  common sense models that do need to have state to accurately model
  the world.

- "... there is no direct procedure for evaluating entropy from a
  sample. A common approach is to model the density from the sample,
  and then estimate the entropy from the density."

- pg. 55 he says that infinity minus infinity is zero lol.

- Great idea on pg. 62 about using random samples from images to
  speed up computation.

- Practical way of terminating a random search: "A better idea is to
  reduce the learning rate until the parameters have a reasonable
  variance and then take the average parameters."

- pg. 65 bullshit hack to make his Parzen window estimates work.

- This alignment only works if the initial pose is not very far
  off.

Occlusion? Seems a bit holistic.
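
The "model the density from the sample, then estimate the entropy from the density" idea in the notes above can be sketched in one dimension with a Gaussian Parzen window. This is a rough sketch under assumptions of my own (1-D data, a fixed hand-picked window width =sigma=, hypothetical function names), not the procedure from the thesis. One sample builds the density and a second, separate sample is used to average the log density.

```python
import math


def parzen_density(x, sample, sigma):
    """Parzen estimate of p(x): the mean of Gaussian bumps centred on the sample.

    Nothing is fitted in advance (cheap construction), but every evaluation
    touches every sample point (expensive evaluation).
    """
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = sum(norm * math.exp(-0.5 * ((x - xi) / sigma) ** 2) for xi in sample)
    return total / len(sample)


def entropy_estimate(density_sample, eval_sample, sigma):
    """Estimate entropy (in nats) as the average of -log p over eval_sample."""
    log_sum = sum(math.log(parzen_density(y, density_sample, sigma))
                  for y in eval_sample)
    return -log_sum / len(eval_sample)
```

Using small random subsets for =density_sample= and =eval_sample= keeps the cost at (size of one) times (size of the other) evaluations, which is the same trick as sampling from the images to speed things up.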
** References
- "excellent" book on entropy (Cover & Thomas, 1991) [Elements of
  Information Theory.]

- Canny, J. (1986). A Computational Approach to Edge Detection. IEEE
  Transactions PAMI, PAMI-8(6):679-698.

- Chin, R. and Dyer, C. (1986). Model-Based Recognition in Robot
  Vision. Computing Surveys, 18:67-108.

- Grimson, W., Lozano-Perez, T., Wells, W., et al. (1994). An
  Automatic Registration Method for Frameless Stereotaxy, Image
  Guided Surgery, and Enhanced Reality Visualization. In Proceedings
  of the Computer Society Conference on Computer Vision and Pattern
  Recognition, Seattle, WA. IEEE.

- Hill, D. L., Studholme, C., and Hawkes, D. J. (1994). Voxel
  Similarity Measures for Automated Image Registration. In
  Proceedings of the Third Conference on Visualization in Biomedical
  Computing, pages 205-216. SPIE.

- Kirkpatrick, S., Gelatt, C., and Vecchi, M. (1983). Optimization by
  Simulated Annealing. Science, 220(4598):671-680.

- Jones, M. and Poggio, T. (1995). Model-based matching of line
  drawings by linear combinations of prototypes. Proceedings of the
  International Conference on Computer Vision.

- Ljung, L. and Soderstrom, T. (1983). Theory and Practice of
  Recursive Identification. MIT Press.

- Shannon, C. E. (1948). A mathematical theory of communication. Bell
  System Technical Journal, 27:379-423 and 623-656.

- Shashua, A. (1992). Geometry and Photometry in 3D Visual
  Recognition. PhD thesis, M.I.T. Artificial Intelligence Laboratory,
  AI-TR-1401.

- Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling,
  W. T. (1992). Numerical Recipes in C: The Art of Scientific
  Computing. Cambridge University Press, Cambridge, England, second
  edition.
* Semi-Automated Dialogue Act Classification for Situated Social Agents in Games, Deb Roy

Interesting attempt to learn "social scripts" related to restaurant
behaviour. The authors do this by creating a game which implements a
virtual restaurant, and recording actual human players as they
interact with the game. They learn scripts from annotated
interactions and then use those scripts to label other
interactions. They don't get very good results, but their
methodology of creating a virtual world and recording
low-dimensional actions is interesting.

- Torque 2D/3D looks like an interesting game engine.
* Face Recognition by Humans: Nineteen Results all Computer Vision Researchers should know, Sinha

This is a summary of a lot of bio experiments on human face
recognition.

- They assert again that the internal gradients/structures of a face
  are more important than the edges.

- It's amazing to me that it takes about 10 years after birth for a
  human to get advanced adult-like face detection. Children move
  from feature-based processing to a holistic approach during this
  time.

- Finally, color is a very important cue for identifying faces.
** References
- A. Freire, K. Lee, and L. A. Symons, “The face-inversion effect as
  a deficit in the encoding of configural information: Direct
  evidence,” Perception, vol. 29, no. 2, pp. 159–170, 2000.
- M. B. Lewis, “Thatcher’s children: Development and the Thatcher
  illusion,” Perception, vol. 32, pp. 1415–21, 2003.
- E. McKone and N. Kanwisher, “Does the human brain process objects
  of expertise like faces? A review of the evidence,” in From Monkey
  Brain to Human Brain, S. Dehaene, J. R. Duhamel, M. Hauser, and
  G. Rizzolatti, Eds. Cambridge, MA: MIT Press, 2005.
heee~eeyyyy kids, time to get eagle'd!!!!
* Ullman

Actual code reuse!

precision = fraction of retrieved instances that are relevant
(true-positives/(true-positives+false-positives))

recall = fraction of relevant instances that are retrieved
(true-positives/total-in-class)

cross-validation = split the data into several folds, then
repeatedly train on all but one fold and test on the held-out fold,
to detect overfitting.
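
The three definitions above are easy to pin down in code. A minimal sketch, with function names that are my own and with the fold assignment (every k-th index) chosen only for simplicity:

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two parallel lists of truthy/falsy labels."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    retrieved = true_pos + false_pos      # everything the classifier returned
    in_class = true_pos + false_neg       # everything that really is in the class
    precision = true_pos / retrieved if retrieved else 0.0
    recall = true_pos / in_class if in_class else 0.0
    return precision, recall


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for held_out in range(k):
        train = [j for i, fold in enumerate(folds) if i != held_out for j in fold]
        yield train, folds[held_out]
```

Each example lands in the test set exactly once, so the averaged test score says how well the model does on data it was not trained on.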
nifty, relevant, realistic ideas
He doesn't confine himself to implausible assumptions.

** Our Reading
*** 2002 Visual features of intermediate complexity and their use in classification

** Getting around the dumb "fixed training set" methods
*** 2006 Learning to classify by ongoing feature selection

Brings in the most informative features of a class, based on
mutual information between that feature and all the examples
encountered so far. To bound the running time, he uses only a
fixed number of the most recent examples. He uses a replacement
strategy to tell whether a new feature is better than one of the
current features.
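
A minimal sketch of the idea above, assuming binary features and binary class labels. The class and method names are hypothetical, and scoring each feature independently against the worst current feature is my simplification, not necessarily the replacement strategy used in the paper.

```python
import math
from collections import deque


def mutual_information(feature_values, labels):
    """Mutual information in bits between a binary feature and binary labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    mi = 0.0
    for f in (0, 1):
        for c in (0, 1):
            joint = sum(1 for fv, lv in zip(feature_values, labels)
                        if fv == f and lv == c) / n
            p_f = sum(1 for fv in feature_values if fv == f) / n
            p_c = sum(1 for lv in labels if lv == c) / n
            if joint > 0:
                mi += joint * math.log2(joint / (p_f * p_c))
    return mi


class OngoingSelector:
    """Keeps a fixed-size feature set, scored on a window of recent examples."""

    def __init__(self, detectors, window=100):
        self.detectors = list(detectors)     # functions: example -> 0 or 1
        self.recent = deque(maxlen=window)   # old examples fall off the end

    def observe(self, example, label):
        self.recent.append((example, label))

    def score(self, detector):
        values = [detector(example) for example, _ in self.recent]
        labels = [label for _, label in self.recent]
        return mutual_information(values, labels)

    def consider(self, candidate):
        """Swap the candidate in if it beats the weakest current feature."""
        worst = min(self.detectors, key=self.score)
        if self.score(candidate) > self.score(worst):
            self.detectors[self.detectors.index(worst)] = candidate
            return True
        return False
```

The =deque= with =maxlen= is what bounds the running time: scoring a feature never looks at more than =window= examples.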
*** 2009 Learning model complexity in an online environment

Sort of like the hierarchical Bayesian models of Tenenbaum, this
system makes the model more and more complicated as it gets more
and more training data. It does this by running two systems in
parallel; whenever the data seem to need the more complex one, the
less complex one is thrown out, and an even more complex model is
initialized in its place.

He uses an SVM with polynomial kernels of varying complexity. He
gets good performance on a handwriting classification task over a
large range of training set sizes, since his model changes
complexity depending on the number of training samples. The simpler
models do better with few training points, and the more complex
ones do better with many training points.

The final model had intermediate complexity between published
extremes.

The more complex models must be able to be initialized efficiently
from the less complex models which they replace!
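
The two-models-in-parallel scheme can be sketched with a toy stand-in for the SVM. Everything here is an assumption for illustration: =BinnedMean= (a regressor whose complexity is its number of bins) replaces the polynomial-kernel SVM, the promotion rule and =patience= threshold are made up, and the new complex model is simply refit from stored history, which dodges the efficient-initialization requirement noted above.

```python
class BinnedMean:
    """Toy regressor on [0, 1): complexity = number of equal-width bins."""

    def __init__(self, bins):
        self.bins = bins
        self.sums = [0.0] * bins
        self.counts = [0] * bins

    def _bin(self, x):
        return min(int(x * self.bins), self.bins - 1)

    def predict(self, x):
        b = self._bin(x)
        return self.sums[b] / self.counts[b] if self.counts[b] else 0.0

    def update(self, x, y):
        b = self._bin(x)
        self.sums[b] += y
        self.counts[b] += 1


class ComplexityLadder:
    """Runs a simple and a more complex model side by side."""

    def __init__(self, make_model, start=1, patience=20):
        self.make_model = make_model
        self.level = start
        self.simple = make_model(start)
        self.complex = make_model(2 * start)
        self.patience = patience
        self.history = []
        self._reset_scores()

    def _reset_scores(self):
        self.seen = 0
        self.simple_error = 0.0
        self.complex_error = 0.0

    def observe(self, x, y):
        # Score both models on the new point before training on it.
        self.simple_error += (self.simple.predict(x) - y) ** 2
        self.complex_error += (self.complex.predict(x) - y) ** 2
        self.seen += 1
        self.history.append((x, y))
        self.simple.update(x, y)
        self.complex.update(x, y)
        if self.seen >= self.patience and self.complex_error < self.simple_error:
            self._promote()

    def _promote(self):
        # Throw out the simple model; the complex one takes its place and
        # an even more complex model starts up alongside it.
        self.level *= 2
        self.simple = self.complex
        self.complex = self.make_model(2 * self.level)
        for x, y in self.history:
            self.complex.update(x, y)
        self._reset_scores()
```

Feeding it a stream of points, for example =ladder.observe(x, x)= for random =x= in [0, 1), should make =ladder.level= climb as more data arrives.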
** Non Parametric Models

*** 2010 The chains model for detecting parts by their context

Like the constellation method for rigid objects, but extended to
non-rigid objects as well.

Allows you to build a hand detector from a face detector. This is
useful because hands might be only a few pixels, and very
ambiguous in an image, but if you are expecting them at the end of
an arm, then they become easier to find.

They make chains by using spatial proximity of features. That way,
a hand can be identified by chaining back from the head. If there
is a good chain to the head, then it is more likely that there is
a hand than if there isn't. Since there is some give in the
proximity detection, the system can accommodate new poses that it
has never seen before.

Does not use any motion information.
*** 2005 A Hierarchical Non-Parametric Method for Capturing Non-Rigid Deformations

(relative dynamic programming [RDP])

Goal is to match images, as in SIFT, but this time the images can
be subject to non-rigid transformations. They do this by finding
small patches that look the same, then building up bigger
patches. They get a tree of patches that describes each image, and
find the edit distance between the two trees. Editing operations
involve a coherent shift of features, so they can accommodate local
shifts of patches in any direction. They get some cool results
over just straight correlation. Basically, they made an image
comparator that is resistant to multiple independent deformations.

! important small regions are treated the same as unimportant
small regions

! no conception of shape

quote:
The dynamic programming procedure looks for an optimal
transformation that aligns the patches of both images. This
transformation is not a global transformation, but a composition
of many local transformations of sub-patches at various sizes,
performed one on top of the other.
*** 2006 Satellite Features for the Classification of Visually Similar Classes

Finds features that can distinguish subclasses of a class, by
first finding a rigid set of anchor features that are common to
both subclasses, then finding distinguishing features relative to
those anchor features. They keep things rigid because the satellite
features don't have much information in and of themselves, and are
only informative relative to other features.
*** 2005 Learning a novel class from a single example by cross-generalization.

Lets you use a vast visual experience to generate a classifier
for a novel class by generating synthetic examples, replacing
features from the single example with features from similar
classes.

quote: feature F is likely to be useful for class C if a similar
feature F proved effective for a similar class C in the past.

Allows you to transfer the "gestalt" of a similar class to a new
class, by adapting all the features of the learned class that have
correspondence to the new class.
*** 2007 Semantic Hierarchies for Recognizing Objects and Parts

Better learning of complex objects like faces by learning each
piece (like nose, mouth, eye, etc.) separately, then making sure
that the features are in plausible positions.