annotate org/literature-review.org @ 470:3401053124b0

integrating vision into thesis.
author Robert McIntyre <rlm@mit.edu>
date Fri, 28 Mar 2014 17:10:43 -0400
parents 8e62bf52be59
children
rev   line source
rlm@371 1 When I write my thesis, I want it to have links to every
rlm@371 2
rlm@371 3
rlm@371 4
rlm@369 5 * Object Recognition from Local Scale-Invariant Features, David G. Lowe
rlm@369 6
rlm@369 7 This is the famous SIFT paper that is mentioned everywhere.
rlm@369 8
rlm@369 9 This is a way to find objects in images given an image of that
rlm@369 10 object. It is moderately resistant to variations in the sample image
rlm@369 11 and the target image. Basically, this is a fancy way of picking out
rlm@369 12 a test pattern embedded in a larger pattern. It would fail to learn
rlm@369 13 anything resembling object categories, for instance. A useful concept
rlm@369 14 is the idea of storing the local scale and rotation of each feature
rlm@369 15 as it is extracted from the image, then checking to make sure that
rlm@369 16 proposed matches all more-or-less agree on shift, rotation, scale,
rlm@369 17 etc. Another good idea is to use points instead of edges, since
rlm@369 18 they seem more robust.
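
The "proposed matches all more-or-less agree" check can be sketched as a coarse vote over the relative scale and rotation implied by each match. This is a minimal illustration, not Lowe's implementation: the match representation, the function names, and the bin widths (a factor of 2 in scale, 30 degrees in rotation) are assumptions made here, and the shift component is left out for brevity.

```python
import math
from collections import Counter

def pose_bin(match, scale_base=2.0, angle_bin_deg=30.0):
    """Coarse bin for the relative scale and rotation implied by one match.

    match is ((scale_a, angle_a), (scale_b, angle_b)): the local scale and
    orientation (degrees) stored with the feature in the sample image and
    in the target image. This layout is hypothetical.
    """
    (scale_a, angle_a), (scale_b, angle_b) = match
    scale_bin = round(math.log(scale_b / scale_a, scale_base))
    n_angle_bins = int(360 / angle_bin_deg)
    angle_bin = round(((angle_b - angle_a) % 360.0) / angle_bin_deg) % n_angle_bins
    return scale_bin, angle_bin

def consistent_matches(matches):
    """Keep only the matches that fall in the most-voted (scale, rotation) bin."""
    bins = [pose_bin(m) for m in matches]
    best_bin, _count = Counter(bins).most_common(1)[0]
    return [m for m, b in zip(matches, bins) if b == best_bin]

# Three matches roughly agree (about 2x scale, about 30 degrees); one does not.
matches = [
    ((1.0, 0.0), (2.0, 30.0)),
    ((2.0, 10.0), (4.1, 42.0)),
    ((1.5, 90.0), (3.0, 118.0)),
    ((1.0, 0.0), (1.0, 200.0)),
]
kept = consistent_matches(matches)
```

Hard bins can split matches that sit near a bin boundary, so a fuller version would also vote for neighbouring bins.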
rlm@369 19
rlm@369 20 ** References:
rlm@369 21 - Basri, Ronen, and David W. Jacobs, “Recognition using region
rlm@369 22 correspondences,” International Journal of Computer Vision, 25, 2
rlm@369 23 (1996), pp. 141–162.
rlm@369 24
rlm@369 25 - Edelman, Shimon, Nathan Intrator, and Tomaso Poggio, “Complex
rlm@369 26 cells and object recognition,” Unpublished Manuscript, preprint at
rlm@369 27 http://www.ai.mit.edu/edelman/mirror/nips97.ps.Z
rlm@369 28
rlm@369 29 - Lindeberg, Tony, “Detecting salient blob-like image structures
rlm@369 30 and their scales with a scale-space primal sketch: a method for
rlm@369 31 focus-of-attention,” International Journal of Computer Vision, 11, 3
rlm@369 32 (1993), pp. 283–318.
rlm@369 33
rlm@369 34 - Murase, Hiroshi, and Shree K. Nayar, “Visual learning and
rlm@369 35 recognition of 3-D objects from appearance,” International Journal
rlm@369 36 of Computer Vision, 14, 1 (1995), pp. 5–24.
rlm@369 37
rlm@369 38 - Ohba, Kohtaro, and Katsushi Ikeuchi, “Detectability, uniqueness,
rlm@369 39 and reliability of eigen windows for stable verification of
rlm@369 40 partially occluded objects,” IEEE Trans. on Pattern Analysis and
rlm@369 41 Machine Intelligence, 19, 9 (1997), pp. 1043–48.
rlm@369 42
rlm@369 43 - Zhang, Z., R. Deriche, O. Faugeras, Q.T. Luong, “A robust
rlm@369 44 technique for matching two uncalibrated images through the recovery
rlm@376 45 of the unknown epipolar geometry,” Artificial Intelligence, 78,
rlm@369 46 (1995), pp. 87-119.
rlm@369 47
rlm@369 48
rlm@369 49
rlm@369 50
rlm@376 51
rlm@371 52 * Alignment by Maximization of Mutual Information, Paul A. Viola
rlm@371 53
rlm@371 54 PhD Thesis recommended by Winston. Describes a system that is able
rlm@371 55 to align a 3D computer model of an object with an image of that
rlm@371 56 object.
rlm@371 57
rlm@371 58 - Pages 9-19 are a very adequate intro to the algorithm.
rlm@371 59
rlm@371 60 - Has a useful section on entropy and probability at the beginning
rlm@371 61 which is worth reading, especially the part about entropy.
rlm@371 62
rlm@371 63 - Differential entropy seems a bit odd -- you would think that it
rlm@371 64 should be the same as normal entropy for a discrete distribution
rlm@371 65 embedded in continuous space. How do you measure the entropy of a
rlm@376 66 half continuous, half discrete random variable? Perhaps the
rlm@376 67 problem is related to the delta function, and not the definition
rlm@376 68 of differential entropy?
rlm@371 69
rlm@371 70 - Expectation Maximization (Mixture of Gaussians cool stuff)
rlm@371 71 (Dempster 1977)
rlm@371 72
rlm@371 73 - Good introduction to Parzen Window Density Estimation. Parzen
rlm@371 74 density functions trade construction time for evaluation
rlm@376 75 time (pg. 41). They are a way to transform a sample into a
rlm@376 76 distribution. They don't work very well in higher dimensions due
rlm@376 77 to the thinning of sample points.
rlm@376 78
rlm@376 79 - Calculating the entropy of a Markov Model (or state machine,
rlm@376 80 program, etc) seems like it would be very hard, since each trial
rlm@376 81 would not be independent of the other trials. Yet, there are many
rlm@376 82 common sense models that do need to have state to accurately model
rlm@376 83 the world.
rlm@376 84
rlm@376 85 - "... there is no direct procedure for evaluating entropy from a
rlm@376 86 sample. A common approach is to model the density from the sample,
rlm@376 87 and then estimate the entropy from the density."
rlm@376 88
rlm@376 89 - pg. 55 he says that infinity minus infinity is zero lol.
rlm@376 90
rlm@376 91 - great idea on pg 62 about using random samples from images to
rlm@376 92 speed up computation.
rlm@376 93
rlm@376 94 - practical way of terminating a random search: "A better idea is to
rlm@376 95 reduce the learning rate until the parameters have a reasonable
rlm@376 96 variance and then take the average parameters."
rlm@376 97
rlm@376 98 - p. 65 bullshit hack to make his Parzen window estimates work.
rlm@376 99
rlm@376 100 - this alignment only works if the initial pose is not very far
rlm@376 101 off.
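
A rough sketch of the Parzen window idea and the quoted "model the density, then estimate the entropy" approach from the notes above. It assumes a 1-D sample and a Gaussian window of fixed width `sigma`; the function names and the split into two samples are choices made for this illustration, not code from the thesis.

```python
import math
import random

def parzen_density(x, sample, sigma=1.0):
    """Gaussian Parzen window estimate of the density at x.

    "Construction" is just storing the sample; every evaluation
    loops over all of it, which is the time trade-off noted above.
    """
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = sum(norm * math.exp(-0.5 * ((x - s) / sigma) ** 2) for s in sample)
    return total / len(sample)

def entropy_estimate(density_sample, eval_sample, sigma=1.0):
    """Entropy estimate in nats: average of -log p(x) over eval_sample,
    with p modelled by a Parzen window built from density_sample."""
    logs = [math.log(parzen_density(x, density_sample, sigma)) for x in eval_sample]
    return -sum(logs) / len(logs)

# Usage: a wider distribution should come out with a higher entropy estimate.
rng = random.Random(0)
narrow = [rng.gauss(0.0, 1.0) for _ in range(200)]
wide = [rng.gauss(0.0, 5.0) for _ in range(200)]
h_narrow = entropy_estimate(narrow[:100], narrow[100:], sigma=0.5)
h_wide = entropy_estimate(wide[:100], wide[100:], sigma=0.5)
```

The result depends a lot on `sigma`, and in higher dimensions the sample thins out so that most query points sit far from every window.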
rlm@376 102
rlm@371 103
rlm@371 104 Occlusion? Seems a bit holistic.
rlm@371 105
rlm@376 106 ** References
rlm@376 107 - "excellent" book on entropy (Cover & Thomas, 1991) [Elements of
rlm@376 108 Information Theory.]
rlm@376 109
rlm@376 110 - Canny, J. (1986). A Computational Approach to Edge Detection. IEEE
rlm@376 111 Transactions PAMI, PAMI-8(6):679-698.
rlm@376 112
rlm@376 113 - Chin, R. and Dyer, C. (1986). Model-Based Recognition in Robot
rlm@376 114 Vision. Computing Surveys, 18:67-108.
rlm@376 115
rlm@376 116 - Grimson, W., Lozano-Perez, T., Wells, W., et al. (1994). An
rlm@376 117 Automatic Registration Method for Frameless Stereotaxy, Image
rlm@376 118 Guided Surgery, and Enhanced Reality Visualization. In Proceedings
rlm@376 119 of the Computer Society Conference on Computer Vision and Pattern
rlm@376 120 Recognition, Seattle, WA. IEEE.
rlm@376 121
rlm@376 122 - Hill, D. L., Studholme, C., and Hawkes, D. J. (1994). Voxel
rlm@376 123 Similarity Measures for Automated Image Registration. In
rlm@376 124 Proceedings of the Third Conference on Visualization in Biomedical
rlm@376 125 Computing, pages 205-216. SPIE.
rlm@376 126
rlm@376 127 - Kirkpatrick, S., Gelatt, C., and Vecchi, M. (1983). Optimization by
rlm@376 128 Simulated Annealing. Science, 220(4598):671-680.
rlm@376 129
rlm@376 130 - Jones, M. and Poggio, T. (1995). Model-based matching of line
rlm@376 131 drawings by linear combinations of prototypes. Proceedings of the
rlm@376 132 International Conference on Computer Vision
rlm@376 133
rlm@376 134 - Ljung, L. and Soderstrom, T. (1983). Theory and Practice of
rlm@376 135 Recursive Identification. MIT Press.
rlm@376 136
rlm@376 137 - Shannon, C. E. (1948). A mathematical theory of communication. Bell
rlm@376 138 System Technical Journal, 27:379-423 and 623-656.
rlm@376 139
rlm@376 140 - Shashua, A. (1992). Geometry and Photometry in 3D Visual
rlm@376 141 Recognition. PhD thesis, M.I.T Artificial Intelligence Laboratory,
rlm@376 142 AI-TR-1401.
rlm@376 143
rlm@376 144 - Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling,
rlm@376 145 W. T. (1992). Numerical Recipes in C: The Art of Scientific
rlm@376 146 Computing. Cambridge University Press, Cambridge, England, second
rlm@376 147 edition.
rlm@376 148
rlm@376 149 * Semi-Automated Dialogue Act Classification for Situated Social Agents in Games, Deb Roy
rlm@376 150
rlm@376 151 Interesting attempt to learn "social scripts" related to restaurant
rlm@376 152 behaviour. The authors do this by creating a game which implements a
rlm@376 153 virtual restaurant, and recording actual human players as they
rlm@376 154 interact with the game. They learn scripts from annotated
rlm@376 155 interactions and then use those scripts to label other
rlm@376 156 interactions. They don't get very good results, but their
rlm@376 157 methodology of creating a virtual world and recording
rlm@376 158 low-dimensional actions is interesting.
rlm@376 159
rlm@376 160 - Torque 2D/3D looks like an interesting game engine.
rlm@376 161
rlm@376 162
rlm@376 163 * Face Recognition by Humans: Nineteen Results all Computer Vision Researchers should know, Sinha
rlm@376 164
rlm@376 165 This is a summary of a lot of bio experiments on human face
rlm@376 166 recognition.
rlm@376 167
rlm@376 168 - They assert again that the internal gradients/structures of a face
rlm@376 169 are more important than the edges.
rlm@376 170
rlm@376 171 - It's amazing to me that it takes about 10 years after birth for a
rlm@376 172 human to get advanced adult-like face detection. They go through
rlm@376 173 feature based processing to a holistic based approach during this
rlm@376 174 time.
rlm@376 175
rlm@376 176 - Finally, color is a very important cue for identifying faces.
rlm@371 177
rlm@371 178 ** References
rlm@376 179 - A. Freire, K. Lee, and L. A. Symons, “The face-inversion effect as
rlm@376 180 a deficit in the encoding of configural information: Direct
rlm@376 181 evidence,” Perception, vol. 29, no. 2, pp. 159–170, 2000.
rlm@376 182 - M. B. Lewis, “Thatcher’s children: Development and the Thatcher
rlm@376 183 illusion,” Perception, vol. 32, pp. 1415–21, 2003.
rlm@376 184 - E. McKone and N. Kanwisher, “Does the human brain process objects
rlm@376 185 of expertise like faces? A review of the evidence,” in From Monkey
rlm@376 186 Brain to Human Brain, S. Dehaene, J. R. Duhamel, M. Hauser, and
rlm@376 187 G. Rizzolatti, Eds. Cambridge, MA: MIT Press, 2005.
rlm@376 188
rlm@376 189
rlm@376 190
rlm@376 191
rlm@376 192 heee~eeyyyy kids, time to get eagle'd!!!!
rlm@376 193
rlm@376 194
rlm@376 195
rlm@376 196
rlm@376 197
rlm@376 198 * Ullman
rlm@376 199
rlm@376 200 Actual code reuse!
rlm@376 201
rlm@376 202 precision = fraction of retrieved instances that are relevant
rlm@376 203 (true-positives/(true-positives+false-positives))
rlm@376 204
rlm@376 205 recall = fraction of relevant instances that are retrieved
rlm@376 206 (true-positives/total-in-class)
rlm@376 207
rlm@376 208 cross-validation = hold out part of the data for testing, train on
rlm@376 209 the rest, and rotate the held-out part, to detect overfitting.
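
The definitions above can be written out directly. This is a small sketch: representing the retrieved and relevant items as sets of ids, and the interleaved k-fold split, are assumptions made for the illustration.

```python
def precision_recall(retrieved, relevant):
    """retrieved: set of item ids the system returned.
    relevant: set of item ids that are truly in the class."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) pairs; each item is held
    out for testing in exactly one of the k folds."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test

# 4 items retrieved, 2 of them relevant, 3 relevant items in total.
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 5})
folds = list(k_fold_splits(6, 3))
```

Here `p` is 2/4 and `r` is 2/3, since two of the four retrieved items are among the three relevant ones.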
rlm@376 210
rlm@377 211 nifty, relevant, realistic ideas
rlm@377 212 He doesn't confine himself to implausible assumptions
rlm@376 213
rlm@378 214 ** Our Reading
rlm@378 215 *** 2002 Visual features of intermediate complexity and their use in classification
rlm@376 216
rlm@378 217
rlm@376 218
rlm@376 219
rlm@376 220 ** Getting around the dumb "fixed training set" methods
rlm@376 221
rlm@376 222 *** 2006 Learning to classify by ongoing feature selection
rlm@376 223
rlm@376 224 Brings in the most informative features of a class, based on
rlm@376 225 mutual information between that feature and all the examples
rlm@376 226 encountered so far. To bound the running time, he uses only a
rlm@376 227 fixed number of the most recent examples. He uses a replacement
rlm@376 228 strategy to tell whether a new feature is better than one of the
rlm@376 229 current features.
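
A hedged sketch of the selection step described above: score each feature by its mutual information with the class labels over the recent examples, and swap a new feature in when it beats the weakest current one. The binary feature encoding, the "replace the weakest" rule, and all names are assumptions for illustration; the paper's actual replacement strategy may differ.

```python
import math
from collections import Counter

def mutual_information(feature_values, labels):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    f_counts = Counter(feature_values)
    l_counts = Counter(labels)
    mi = 0.0
    for (f, l), count in joint.items():
        p_joint = count / n
        # p(f, l) / (p(f) * p(l)) written in terms of raw counts
        mi += p_joint * math.log2(count * n / (f_counts[f] * l_counts[l]))
    return mi

def maybe_replace(current, name, values, labels):
    """current maps feature name -> its values on the recent examples.
    Swap in the candidate if it is more informative than the weakest
    current feature; return the name that was dropped, or None."""
    scores = {k: mutual_information(v, labels) for k, v in current.items()}
    weakest = min(scores, key=scores.get)
    if mutual_information(values, labels) > scores[weakest]:
        del current[weakest]
        current[name] = values
        return weakest
    return None

# Feature responses (1 = detected) over the four most recent examples.
labels = [1, 1, 0, 0]
current = {"noise": [1, 0, 1, 0], "corner": [1, 1, 1, 0]}
dropped = maybe_replace(current, "edge", [1, 1, 0, 0], labels)
```

Keeping only a fixed number of recent examples (for instance in a `collections.deque` with `maxlen`) is what bounds the cost of rescoring.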
rlm@376 230
rlm@376 231 *** 2009 Learning model complexity in an online environment
rlm@376 232
rlm@376 233 Sort of like the hierarchical Bayesian models of Tenenbaum, this
rlm@376 234 system makes the model more and more complicated as it gets more
rlm@376 235 and more training data. It does this by using two systems in
rlm@376 236 parallel and then whenever the more complex one seems to be
rlm@376 237 needed by the data, the less complex one is thrown out, and an
rlm@376 238 even more complex model is initialized in its place.
rlm@376 239
rlm@376 240 He uses an SVM with polynomial kernels of varying complexity. He
rlm@376 241 gets good performance on a handwriting classification task using a large
rlm@376 242 range of training samples, since his model changes complexity
rlm@376 243 depending on the number of training samples. The simpler models do
rlm@376 244 better with few training points, and the more complex ones do
rlm@376 245 better with many training points.
rlm@376 246
rlm@377 247 The final model had intermediate complexity between published
rlm@377 248 extremes.
rlm@377 249
rlm@376 250 The more complex models must be able to be initialized efficiently
rlm@376 251 from the less complex models which they replace!
rlm@376 252
rlm@376 253
rlm@376 254 ** Non Parametric Models
rlm@376 255
rlm@377 256 *** 2010 The chains model for detecting parts by their context
rlm@376 257
rlm@376 258 Like the constellation method for rigid objects, but extended to
rlm@376 259 non-rigid objects as well.
rlm@376 260
rlm@376 261 Allows you to build a hand detector from a face detector. This is
rlm@376 262 useful because hands might be only a few pixels, and very
rlm@376 263 ambiguous in an image, but if you are expecting them at the end of
rlm@376 264 an arm, then they become easier to find.
rlm@376 265
rlm@377 266 They make chains by using spatial proximity of features. That way,
rlm@377 267 a hand can be identified by chaining back from the head. If there
rlm@377 268 is a good chain to the head, then it is more likely that there is
rlm@377 269 a hand than if there isn't. Since there is some give in the
rlm@377 270 proximity detection, the system can accommodate new poses that it
rlm@377 271 has never seen before.
rlm@377 272
rlm@377 273 Does not use any motion information.
rlm@377 274
rlm@377 275 *** 2005 A Hierarchical Non-Parametric Method for Capturing Non-Rigid Deformations
rlm@377 276
rlm@377 277 (relative dynamic programming [RDP])
rlm@377 278
rlm@377 279 Goal is to match images, as in SIFT, but this time the images can
rlm@377 280 be subject to non-rigid transformations. They do this by finding
rlm@377 281 small patches that look the same, then building up bigger
rlm@377 282 patches. They get a tree of patches that describes each image, and
rlm@377 283 find the edit distance between each tree. Editing operations
rlm@377 284 involve a coherent shift of features, so they can accommodate local
rlm@377 285 shifts of patches in any direction. They get some cool results
rlm@377 286 over just straight correlation. Basically, they made an image
rlm@377 287 comparator that is resistant to multiple independent deformations.
rlm@377 288
rlm@377 289 !important small regions are treated the same as nonimportant
rlm@377 290 small regions
rlm@377 291
rlm@377 292 !no conception of shape
rlm@377 293
rlm@377 294 quote:
rlm@377 295 The dynamic programming procedure looks for an optimal
rlm@377 296 transformation that aligns the patches of both images. This
rlm@377 297 transformation is not a global transformation, but a composition
rlm@377 298 of many local transformations of sub-patches at various sizes,
rlm@377 299 performed one on top of the other.
rlm@377 300
rlm@377 301 *** 2006 Satellite Features for the Classification of Visually Similar Classes
rlm@377 302
rlm@377 303 Finds features that can distinguish subclasses of a class, by
rlm@377 304 first finding a rigid set of anghor features that are common to
rlm@377 305 both subclasses, then finding distinguishing features relative to
rlm@377 306 those anchor features. They keep things rigid because the satellite
rlm@377 307 features don't have much information in and of themselves, and are
rlm@377 308 only informative relative to other features.
rlm@377 309
rlm@377 310 *** 2005 Learning a novel class from a single example by cross-generalization.
rlm@377 311
rlm@377 312 Lets you use a vast visual experience to generate a classifier
rlm@377 313 for a novel class by generating synthetic examples by replacing
rlm@377 314 features from the single example with features from similar
rlm@377 315 classes.
rlm@377 316
rlm@377 317 quote: feature F is likely to be useful for class C if a similar
rlm@377 318 feature F proved effective for a similar class C in the past.
rlm@377 319
rlm@377 320 Allows you to transfer the "gestalt" of a similar class to a new
rlm@377 321 class, by adapting all the features of the learned class that have
rlm@378 322 correspondence to the new class.
rlm@378 323
rlm@378 324 *** 2007 Semantic Hierarchies for Recognizing Objects and Parts
rlm@378 325
rlm@378 326 Better learning of complex objects like faces by learning each
rlm@378 327 piece (like nose, mouth, eye, etc) separately, then making sure
rlm@378 328 that the features are in plausible positions.