annotate org/literature-review.org @ 376:057d47fc4789

reviewing ullman's stuff.
author Robert McIntyre <rlm@mit.edu>
date Thu, 11 Apr 2013 05:40:23 +0000
parents 9c37a55e1cd2
children 80cd096682b2
rev   line source
rlm@371 1 When I write my thesis, I want it to have links to every
rlm@371 2
rlm@371 3
rlm@371 4
rlm@369 5 * Object Recognition from Local Scale-Invariant Features, David G. Lowe
rlm@369 6
rlm@369 7 This is the famous SIFT paper that is mentioned everywhere.
rlm@369 8
rlm@369 9 This is a way to find objects in images given an image of that
rlm@369 10 object. It is moderately resistant to variations in the sample image
rlm@369 11 and the target image. Basically, this is a fancy way of picking out
rlm@369 12 a test pattern embedded in a larger pattern. It would fail to learn
rlm@369 13 anything resembling object categories, for instance. A useful concept
rlm@369 14 is the idea of storing the local scale and rotation of each feature
rlm@369 15 as it is extracted from the image, then checking to make sure that
rlm@369 16 proposed matches all more-or-less agree on shift, rotation, scale,
rlm@369 17 etc. Another good idea is to use points instead of edges, since
rlm@369 18 they seem more robust.
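
The agreement check described above can be sketched as a coarse vote over pose bins. This is only an illustration under my own assumptions, not the paper's code: the feature tuple layout `(x, y, scale, angle)`, the bin sizes, and the name `consistent_matches` are all made up here, and the shift is a plain coordinate difference rather than a predicted model position.

```python
import math
from collections import defaultdict

def consistent_matches(matches, shift_bin=32.0,
                       angle_bin=math.radians(30), scale_base=2.0):
    """Keep the proposed matches that fall in the most popular pose bin.

    Each match is (model_feature, image_feature); each feature is a
    hypothetical tuple (x, y, scale, angle_in_radians).
    """
    votes = defaultdict(list)
    for model, image in matches:
        mx, my, m_scale, m_angle = model
        ix, iy, i_scale, i_angle = image
        rotation = (i_angle - m_angle) % (2 * math.pi)
        log_scale = math.log(i_scale / m_scale, scale_base)
        key = (
            math.floor((ix - mx) / shift_bin),   # coarse x shift
            math.floor((iy - my) / shift_bin),   # coarse y shift
            round(log_scale),                    # scale change, in octaves
            math.floor(rotation / angle_bin),    # coarse rotation
        )
        votes[key].append((model, image))
    # The largest bin holds the matches that roughly agree on one pose.
    return max(votes.values(), key=len) if votes else []
```

Matches that sit near a bin edge can be split across neighbouring bins, so a more careful version would also vote into adjacent bins.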
rlm@369 19
rlm@369 20 ** References:
rlm@369 21 - Basri, Ronen, and David W. Jacobs, “Recognition using region
rlm@369 22 correspondences,” International Journal of Computer Vision, 25, 2
rlm@369 23 (1996), pp. 141–162.
rlm@369 24
rlm@369 25 - Edelman, Shimon, Nathan Intrator, and Tomaso Poggio, “Complex
rlm@369 26 cells and object recognition,” Unpublished Manuscript, preprint at
rlm@369 27 http://www.ai.mit.edu/edelman/mirror/nips97.ps.Z
rlm@369 28
rlm@369 29 - Lindeberg, Tony, “Detecting salient blob-like image structures
rlm@369 30 and their scales with a scale-space primal sketch: a method for
rlm@369 31 focus-of-attention,” International Journal of Computer Vision, 11, 3
rlm@369 32 (1993), pp. 283–318.
rlm@369 33
rlm@369 34 - Murase, Hiroshi, and Shree K. Nayar, “Visual learning and
rlm@369 35 recognition of 3-D objects from appearance,” International Journal
rlm@369 36 of Computer Vision, 14, 1 (1995), pp. 5–24.
rlm@369 37
rlm@369 38 - Ohba, Kohtaro, and Katsushi Ikeuchi, “Detectability, uniqueness,
rlm@369 39 and reliability of eigen windows for stable verification of
rlm@369 40 partially occluded objects,” IEEE Trans. on Pattern Analysis and
rlm@369 41 Machine Intelligence, 19, 9 (1997), pp. 1043–48.
rlm@369 42
rlm@369 43 - Zhang, Z., R. Deriche, O. Faugeras, Q.T. Luong, “A robust
rlm@369 44 technique for matching two uncalibrated images through the recovery
rlm@376 45 of the unknown epipolar geometry,” Artificial Intelligence, 78,
rlm@369 46 (1995), pp. 87-119.
rlm@369 47
rlm@369 48
rlm@369 49
rlm@369 50
rlm@376 51
rlm@371 52 * Alignment by Maximization of Mutual Information, Paul A. Viola
rlm@371 53
rlm@371 54 PhD Thesis recommended by Winston. Describes a system that is able
rlm@371 55 to align a 3D computer model of an object with an image of that
rlm@371 56 object.
rlm@371 57
rlm@371 58 - Pages 9-19 are a very adequate intro to the algorithm.
rlm@371 59
rlm@371 60 - Has a useful section on entropy and probability at the beginning
rlm@371 61 which is worth reading, especially the part about entropy.
rlm@371 62
rlm@371 63 - Differential entropy seems a bit odd -- you would think that it
rlm@371 64 should be the same as normal entropy for a discrete distribution
rlm@371 65 embedded in continuous space. How do you measure the entropy of a
rlm@376 66 half continuous, half discrete random variable? Perhaps the
rlm@376 67 problem is related to the delta function, and not the definition
rlm@376 68 of differential entropy?
rlm@371 69
rlm@371 70 - Expectation Maximization (Mixture of Gaussians cool stuff)
rlm@371 71 (Dempster 1977)
rlm@371 72
rlm@371 73 - Good introduction to Parzen Window Density Estimation. Parzen
rlm@371 74 density functions trade construction time for evaluation
rlm@376 75 time (pg. 41). They are a way to transform a sample into a
rlm@376 76 distribution. They don't work very well in higher dimensions due
rlm@376 77 to the thinning of sample points.
rlm@376 78
rlm@376 79 - Calculating the entropy of a Markov Model (or state machine,
rlm@376 80 program, etc) seems like it would be very hard, since each trial
rlm@376 81 would not be independent of the other trials. Yet, there are many
rlm@376 82 common sense models that do need to have state to accurately model
rlm@376 83 the world.
rlm@376 84
rlm@376 85 - "... there is no direct procedure for evaluating entropy from a
rlm@376 86 sample. A common approach is to model the density from the sample,
rlm@376 87 and then estimate the entropy from the density."
rlm@376 88
rlm@376 89 - pg. 55 he says that infinity minus infinity is zero lol.
rlm@376 90
rlm@376 91 - great idea on pg 62 about using random samples from images to
rlm@376 92 speed up computation.
rlm@376 93
rlm@376 94 - practical way of terminating a random search: "A better idea is to
rlm@376 95 reduce the learning rate until the parameters have a reasonable
rlm@376 96 variance and then take the average parameters."
rlm@376 97
rlm@376 98 - p. 65 bullshit hack to make his parzen window estimates work.
rlm@376 99
rlm@376 100 - this alignment only works if the initial pose is not very far
rlm@376 101 off.
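
A minimal sketch of the Parzen idea from the notes above: build a density from one sample, then estimate entropy by averaging -log p over a second sample. The Gaussian kernel, the fixed width `sigma`, the 1-D setting, and the function names are my assumptions for illustration, not details taken from the thesis.

```python
import math
import random

def parzen_density(sample, sigma):
    """Return p(x): the average of Gaussian bumps centred on the sample points."""
    norm = 1.0 / (len(sample) * sigma * math.sqrt(2 * math.pi))

    def p(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / sigma) ** 2)
                          for s in sample)
    return p

def entropy_estimate(density_sample, eval_sample, sigma):
    """Estimate differential entropy (nats) as the mean of -log p(x)."""
    p = parzen_density(density_sample, sigma)
    return -sum(math.log(p(x)) for x in eval_sample) / len(eval_sample)

if __name__ == "__main__":
    random.seed(0)
    a = [random.gauss(0.0, 1.0) for _ in range(500)]
    b = [random.gauss(0.0, 1.0) for _ in range(500)]
    # The true value for a unit Gaussian is 0.5 * ln(2 * pi * e).
    print(entropy_estimate(a, b, sigma=0.3))
```

"Construction" here is just storing the sample, while every evaluation of `p` loops over all of it, which is the trade-off noted above.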
rlm@376 102
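On the differential entropy question in the notes above, a standard relation (found in Cover & Thomas) may help. Here $X^\Delta$ is my own notation for $X$ quantized into bins of width $\Delta$, $p_i$ are the bin probabilities, and $f$ is the density of $X$.

```latex
\begin{align}
H(X^\Delta) &= -\sum_i p_i \log p_i, \\
h(X) &= -\int f(x) \log f(x)\, dx, \\
H(X^\Delta) + \log \Delta &\to h(X) \quad \text{as } \Delta \to 0.
\end{align}
```

So the discrete entropy of a finer and finer quantization grows like $-\log \Delta$, and differential entropy is what remains after that divergent term is removed. A point mass (delta function) has $h = -\infty$, which suggests the oddness does come from the delta function rather than from the definition.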
rlm@371 103
rlm@371 104 Occlusion? Seems a bit holistic.
rlm@371 105
rlm@376 106 ** References
rlm@376 107 - "excellent" book on entropy (Cover & Thomas, 1991) [Elements of
rlm@376 108 Information Theory.]
rlm@376 109
rlm@376 110 - Canny, J. (1986). A Computational Approach to Edge Detection. IEEE
rlm@376 111 Transactions PAMI, PAMI-8(6):679-698.
rlm@376 112
rlm@376 113 - Chin, R. and Dyer, C. (1986). Model-Based Recognition in Robot
rlm@376 114 Vision. Computing Surveys, 18:67-108.
rlm@376 115
rlm@376 116 - Grimson, W., Lozano-Perez, T., Wells, W., et al. (1994). An
rlm@376 117 Automatic Registration Method for Frameless Stereotaxy, Image
rlm@376 118 Guided Surgery, and Enhanced Reality Visualization. In Proceedings
rlm@376 119 of the Computer Society Conference on Computer Vision and Pattern
rlm@376 120 Recognition, Seattle, WA. IEEE.
rlm@376 121
rlm@376 122 - Hill, D. L., Studholme, C., and Hawkes, D. J. (1994). Voxel
rlm@376 123 Similarity Measures for Automated Image Registration. In
rlm@376 124 Proceedings of the Third Conference on Visualization in Biomedical
rlm@376 125 Computing, pages 205-216. SPIE.
rlm@376 126
rlm@376 127 - Kirkpatrick, S., Gelatt, C., and Vecchi, M. (1983). Optimization by
rlm@376 128 Simulated Annealing. Science, 220(4598):671-680.
rlm@376 129
rlm@376 130 - Jones, M. and Poggio, T. (1995). Model-based matching of line
rlm@376 131 drawings by linear combinations of prototypes. Proceedings of the
rlm@376 132 International Conference on Computer Vision
rlm@376 133
rlm@376 134 - Ljung, L. and Soderstrom, T. (1983). Theory and Practice of
rlm@376 135 Recursive Identification. MIT Press.
rlm@376 136
rlm@376 137 - Shannon, C. E. (1948). A mathematical theory of communication. Bell
rlm@376 138 System Technical Journal, 27:379-423 and 623-656.
rlm@376 139
rlm@376 140 - Shashua, A. (1992). Geometry and Photometry in 3D Visual
rlm@376 141 Recognition. PhD thesis, M.I.T Artificial Intelligence Laboratory,
rlm@376 142 AI-TR-1401.
rlm@376 143
rlm@376 144 - Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling,
rlm@376 145 W. T. (1992). Numerical Recipes in C: The Art of Scientific
rlm@376 146 Computing. Cambridge University Press, Cambridge, England, second
rlm@376 147 edition.
rlm@376 148
rlm@376 149 * Semi-Automated Dialogue Act Classification for Situated Social Agents in Games, Deb Roy
rlm@376 150
rlm@376 151 Interesting attempt to learn "social scripts" related to restaurant
rlm@376 152 behaviour. The authors do this by creating a game which implements a
rlm@376 153 virtual restaurant, and recording actual human players as they
rlm@376 154 interact with the game. They learn scripts from annotated
rlm@376 155 interactions and then use those scripts to label other
rlm@376 156 interactions. They don't get very good results, but their
rlm@376 157 methodology of creating a virtual world and recording
rlm@376 158 low-dimensional actions is interesting.
rlm@376 159
rlm@376 160 - Torque 2D/3D looks like an interesting game engine.
rlm@376 161
rlm@376 162
rlm@376 163 * Face Recognition by Humans: Nineteen Results all Computer Vision Researchers should know, Sinha
rlm@376 164
rlm@376 165 This is a summary of a lot of bio experiments on human face
rlm@376 166 recognition.
rlm@376 167
rlm@376 168 - They assert again that the internal gradients/structures of a face
rlm@376 169 are more important than the edges.
rlm@376 170
rlm@376 171 - It's amazing to me that it takes about 10 years after birth for a
rlm@376 172 human to get advanced adult-like face detection. They go through
rlm@376 173 feature based processing to a holistic based approach during this
rlm@376 174 time.
rlm@376 175
rlm@376 176 - Finally, color is a very important cue for identifying faces.
rlm@371 177
rlm@371 178 ** References
rlm@376 179 - A. Freire, K. Lee, and L. A. Symons, “The face-inversion effect as
rlm@376 180 a deficit in the encoding of configural information: Direct
rlm@376 181 evidence,” Perception, vol. 29, no. 2, pp. 159–170, 2000.
rlm@376 182 - M. B. Lewis, “Thatcher’s children: Development and the Thatcher
rlm@376 183 illusion,” Perception, vol. 32, pp. 1415–21, 2003.
rlm@376 184 - E. McKone and N. Kanwisher, “Does the human brain process objects
rlm@376 185 of expertise like faces? A review of the evidence,” in From Monkey
rlm@376 186 Brain to Human Brain, S. Dehaene, J. R. Duhamel, M. Hauser, and
rlm@376 187 G. Rizzolatti, Eds. Cambridge, MA: MIT Press, 2005.
rlm@376 188
rlm@376 189
rlm@376 190
rlm@376 191
rlm@376 192 heee~eeyyyy kids, time to get eagle'd!!!!
rlm@376 193
rlm@376 194
rlm@376 195
rlm@376 196
rlm@376 197
rlm@376 198 * Ullman
rlm@376 199
rlm@376 200 Actual code reuse!
rlm@376 201
rlm@376 202 precision = fraction of retrieved instances that are relevant
rlm@376 203 (true-positives/(true-positives+false-positives))
rlm@376 204
rlm@376 205 recall = fraction of relevant instances that are retrieved
rlm@376 206 (true-positives/total-in-class)
rlm@376 207
rlm@376 208 cross-validation = train the model on one part of the data and test
rlm@376 209 it on a held-out part, rotating the parts, to detect overfitting.
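
The three definitions above can be written out as a small sketch. The function names and the round-robin fold assignment are my own choices for illustration; k-fold splitting is only one common form of cross-validation.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two parallel lists of 0/1 labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # true positives
    fp = sum(1 for p, a in pairs if p and not a)    # false positives
    fn = sum(1 for p, a in pairs if not p and a)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0     # tp + fn = total in class
    return precision, recall

def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test
```

Each item lands in the test set of exactly one fold, so every example is used for testing once and for training k-1 times.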
rlm@376 210
rlm@376 211
rlm@376 212
rlm@376 213
rlm@376 214
rlm@376 215 ** Getting around the dumb "fixed training set" methods
rlm@376 216
rlm@376 217 *** 2006 Learning to classify by ongoing feature selection
rlm@376 218
rlm@376 219 Brings in the most informative features of a class, based on
rlm@376 220 mutual information between that feature and all the examples
rlm@376 221 encountered so far. To bound the running time, he uses only a
rlm@376 222 fixed number of the most recent examples. He uses a replacement
rlm@376 223 strategy to tell whether a new feature is better than one of the
rlm@376 224 current features.
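
A rough sketch of how such a scheme might look, assuming binary features scored by their mutual information with the class label over a sliding window. The class name `OngoingSelector`, the "replace the weakest feature" rule, and the window handling are my guesses at the general shape, not the paper's actual procedure.

```python
import math
from collections import Counter, deque

def mutual_information(pairs):
    """I(F; C) in bits for a list of (feature_value, class_label) pairs."""
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    f_counts = Counter(f for f, _ in pairs)
    c_counts = Counter(c for _, c in pairs)
    return sum((count / n) * math.log2(count * n / (f_counts[f] * c_counts[c]))
               for (f, c), count in joint.items())

class OngoingSelector:
    def __init__(self, features, max_features, window):
        self.features = dict(features)        # name -> function(example)
        self.max_features = max_features
        self.recent = deque(maxlen=window)    # only the newest examples are kept

    def observe(self, example, label):
        self.recent.append((example, label))

    def score(self, fn):
        return mutual_information([(fn(x), y) for x, y in self.recent])

    def offer(self, name, fn):
        """Accept fn if there is room or if it beats the weakest current feature."""
        if len(self.features) < self.max_features:
            self.features[name] = fn
            return True
        worst = min(self.features, key=lambda k: self.score(self.features[k]))
        if self.score(fn) > self.score(self.features[worst]):
            del self.features[worst]
            self.features[name] = fn
            return True
        return False
```

The `deque(maxlen=window)` is what bounds the running time: scoring a feature never touches more than `window` examples.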
rlm@376 225
rlm@376 226 *** 2009 Learning model complexity in an online environment
rlm@376 227
rlm@376 228 Sort of like the hierarchical Bayesian models of Tenenbaum, this
rlm@376 229 system makes the model more and more complicated as it gets more
rlm@376 230 and more training data. It does this by using two systems in
rlm@376 231 parallel and then whenever the more complex one seems to be
rlm@376 232 needed by the data, the less complex one is thrown out, and an
rlm@376 233 even more complex model is initialized in its place.
rlm@376 234
rlm@376 235 He uses an SVM with polynomial kernels of varying complexity. He
rlm@376 236 gets good performance on a handwriting classification task using a large
rlm@376 237 range of training samples, since his model changes complexity
rlm@376 238 depending on the number of training samples. The simpler models do
rlm@376 239 better with few training points, and the more complex ones do
rlm@376 240 better with many training points.
rlm@376 241
rlm@376 242 The more complex models must be able to be initialized efficiently
rlm@376 243 from the less complex models which they replace!
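
The two-models-in-parallel idea can be sketched with a much simpler model family than the paper's SVMs. Here I swap in piecewise-constant regressors on [0, 1) whose complexity is the number of bins, and I assume a switching rule (compare test-then-train error over a fixed window); both are my own stand-ins, chosen because a finer model can be initialized cheaply from a coarser one.

```python
class BinModel:
    """Piecewise-constant regressor on [0, 1) with 2**level equal bins."""

    def __init__(self, level, parent=None):
        self.level = level
        n = 2 ** level
        self.sums = [0.0] * n
        self.counts = [0] * n
        if parent is not None:
            # Cheap initialization: each bin starts from its parent bin's mean.
            for i in range(n):
                j = i * len(parent.sums) // n
                if parent.counts[j]:
                    self.sums[i] = parent.sums[j] / parent.counts[j]
                    self.counts[i] = 1

    def _bin(self, x):
        return min(int(x * len(self.sums)), len(self.sums) - 1)

    def predict(self, x):
        i = self._bin(x)
        return self.sums[i] / self.counts[i] if self.counts[i] else 0.0

    def update(self, x, y):
        i = self._bin(x)
        self.sums[i] += y
        self.counts[i] += 1

class GrowingLearner:
    """Runs a simple and a more complex model side by side."""

    def __init__(self, window=50):
        self.simple = BinModel(0)
        self.complex = BinModel(1, parent=self.simple)
        self.window = window
        self.errors = []   # (simple_error, complex_error) per example

    def observe(self, x, y):
        # Test-then-train: score both models before they see the example.
        self.errors.append((abs(self.simple.predict(x) - y),
                            abs(self.complex.predict(x) - y)))
        self.simple.update(x, y)
        self.complex.update(x, y)
        if len(self.errors) >= self.window:
            simple_err = sum(e[0] for e in self.errors)
            complex_err = sum(e[1] for e in self.errors)
            if complex_err < simple_err:
                # Throw out the simpler model and start an even finer one.
                self.simple = self.complex
                self.complex = BinModel(self.simple.level + 1,
                                        parent=self.simple)
            self.errors = []
```

On a step function with one jump at 0.5, this should grow from one bin to two and then stop, since four bins do no better than two.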
rlm@376 244
rlm@376 245
rlm@376 246 ** Non Parametric Models
rlm@376 247
rlm@376 248 *** Visual features of intermediate complexity and their use in classification
rlm@376 249
rlm@376 250 *** The chains model for detecting parts by their context
rlm@376 251
rlm@376 252 Like the constellation method for rigid objects, but extended to
rlm@376 253 non-rigid objects as well.
rlm@376 254
rlm@376 255 Allows you to build a hand detector from a face detector. This is
rlm@376 256 useful because hands might be only a few pixels, and very
rlm@376 257 ambiguous in an image, but if you are expecting them at the end of
rlm@376 258 an arm, then they become easier to find.
rlm@376 259
rlm@376 260