When I write my thesis, I want it to have links to every source I
reference.


* Object Recognition from Local Scale-Invariant Features, David G. Lowe

This is the famous SIFT paper that is mentioned everywhere.

This is a way to find objects in images given an image of that
object. It is moderately resistant to variations between the sample
image and the target image. Basically, this is a fancy way of picking
out a test pattern embedded in a larger pattern. It would fail to
learn anything resembling object categories, for instance. A useful
concept is the idea of storing the local scale and rotation of each
feature as it is extracted from the image, then checking that
proposed matches all more-or-less agree on shift, rotation, scale,
etc. Another good idea is to use points instead of edges, since they
seem more robust.
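
Below is a minimal sketch of this kind of matching using OpenCV's SIFT
implementation. The file names are placeholders, and the geometric
consistency step here uses RANSAC on a homography rather than Lowe's
Hough-style clustering over shift, rotation, and scale.

#+begin_src python
# Minimal SIFT-style matching with a geometric consistency check.
# Assumes opencv-python (cv2) and numpy; image file names are made up.
import cv2
import numpy as np

query = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)   # image of the object
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)    # larger image to search

sift = cv2.SIFT_create()
kp_q, desc_q = sift.detectAndCompute(query, None)
kp_s, desc_s = sift.detectAndCompute(scene, None)

# Ratio test: keep matches whose best descriptor distance is clearly
# smaller than the second best.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(desc_q, desc_s, k=2)
        if m.distance < 0.75 * n.distance]

# Geometric consistency: surviving matches must agree on a single
# transformation of the object into the scene.
if len(good) >= 4:
    src = np.float32([kp_q[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_s[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print("consistent matches:", int(mask.sum()), "of", len(good))
#+end_src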

** References:
- Basri, Ronen, and David W. Jacobs, “Recognition using region
  correspondences,” International Journal of Computer Vision, 25, 2
  (1996), pp. 141–162.

- Edelman, Shimon, Nathan Intrator, and Tomaso Poggio, “Complex
  cells and object recognition,” Unpublished Manuscript, preprint at
  http://www.ai.mit.edu/edelman/mirror/nips97.ps.Z

- Lindeberg, Tony, “Detecting salient blob-like image structures
  and their scales with a scale-space primal sketch: a method for
  focus-of-attention,” International Journal of Computer Vision, 11, 3
  (1993), pp. 283–318.

- Murase, Hiroshi, and Shree K. Nayar, “Visual learning and
  recognition of 3-D objects from appearance,” International Journal
  of Computer Vision, 14, 1 (1995), pp. 5–24.

- Ohba, Kohtaro, and Katsushi Ikeuchi, “Detectability, uniqueness,
  and reliability of eigen windows for stable verification of
  partially occluded objects,” IEEE Trans. on Pattern Analysis and
  Machine Intelligence, 19, 9 (1997), pp. 1043–1048.

- Zhang, Z., R. Deriche, O. Faugeras, and Q. T. Luong, “A robust
  technique for matching two uncalibrated images through the recovery
  of the unknown epipolar geometry,” Artificial Intelligence, 78
  (1995), pp. 87–119.


* Alignment by Maximization of Mutual Information, Paul A. Viola

PhD Thesis recommended by Winston. Describes a system that is able
to align a 3D computer model of an object with an image of that
object.
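
To make the objective concrete, here is a rough sketch of mutual
information between the intensities of a rendered model image and the
target image, estimated from a joint histogram. Viola actually
estimates the entropies with Parzen windows (EMMA); the histogram and
the =render= call mentioned below are only illustrative stand-ins.

#+begin_src python
# Mutual information between two intensity images, from a joint
# histogram. A stand-in for Viola's Parzen-window (EMMA) estimate.
import numpy as np

def mutual_information(u, v, bins=32):
    """MI (in nats) between two equal-sized intensity arrays."""
    joint, _, _ = np.histogram2d(u.ravel(), v.ravel(), bins=bins)
    p_uv = joint / joint.sum()
    p_u = p_uv.sum(axis=1, keepdims=True)   # marginal over u
    p_v = p_uv.sum(axis=0, keepdims=True)   # marginal over v
    nz = p_uv > 0
    return float(np.sum(p_uv[nz] * np.log(p_uv[nz] / (p_u @ p_v)[nz])))

# Alignment then amounts to searching over poses T for the one that
# maximizes mutual_information(render(model, T), image), where
# render() is a hypothetical function producing the model's predicted
# intensities under pose T.
#+end_src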

- Pages 9-19 are a very adequate intro to the algorithm.

- Has a useful section on entropy and probability at the beginning
  which is worth reading, especially the part about entropy.

- Differential entropy seems a bit odd -- you would think that it
  should be the same as normal entropy for a discrete distribution
  embedded in continuous space. How do you measure the entropy of a
  half continuous, half discrete random variable? Perhaps the
  problem is related to the delta function, and not the definition
  of differential entropy?

- Expectation Maximization (Mixture of Gaussians cool stuff)
  (Dempster 1977)

- Good introduction to Parzen Window Density Estimation. Parzen
  density functions trade construction time for evaluation time
  (pg. 41). They are a way to transform a sample into a
  distribution. They don't work very well in higher dimensions due
  to the thinning of sample points. (See the sketch after this
  list.)

- Calculating the entropy of a Markov Model (or state machine,
  program, etc.) seems like it would be very hard, since each trial
  would not be independent of the other trials. Yet, there are many
  common sense models that do need to have state to accurately model
  the world.

- "... there is no direct procedure for evaluating entropy from a
  sample. A common approach is to model the density from the sample,
  and then estimate the entropy from the density." (The sketch after
  this list does exactly that.)

- pg. 55 he says that infinity minus infinity is zero lol.

- great idea on pg 62 about using random samples from images to
  speed up computation.

- practical way of terminating a random search: "A better idea is to
  reduce the learning rate until the parameters have a reasonable
  variance and then take the average parameters."

- p. 65 bullshit hack to make his parzen window estimates work.

- this alignment only works if the initial pose is not very far
  off.

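Below is a minimal sketch of the two notes flagged above: a Parzen
window (Gaussian kernel) density estimate, and an entropy estimate
obtained by averaging -log of that density over a random subsample
(the same random-sampling trick for cutting computation). It is a 1-D
illustration with made-up kernel width and sizes, not Viola's actual
EMMA implementation.

#+begin_src python
# Parzen window density estimate and an entropy estimate derived from
# it. All widths and sample sizes here are illustrative.
import numpy as np

def parzen_density(x, sample, sigma=0.25):
    """Average of Gaussian kernels centered on the sample points."""
    diffs = (x[:, None] - sample[None, :]) / sigma
    kernels = np.exp(-0.5 * diffs**2) / (sigma * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)   # cheap to build, O(len(sample)) to evaluate

def entropy_estimate(sample, sigma=0.25, n_eval=200, seed=0):
    """H(X) ~ -E[log p(X)]: build the Parzen estimate on one random
    half of the sample, average -log p over a subset of the other."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(sample)
    build = shuffled[: len(shuffled) // 2]
    evaluate = shuffled[len(shuffled) // 2 :][:n_eval]
    p = parzen_density(evaluate, build, sigma)
    return float(-np.mean(np.log(p + 1e-12)))

sample = np.random.default_rng(1).normal(0.0, 1.0, 1000)
print(entropy_estimate(sample))   # ~1.42 nats for a unit Gaussian
#+end_src
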

Occlusion? Seems a bit holistic.

** References
- "excellent" book on entropy (Cover & Thomas, 1991) [Elements of
  Information Theory]

- Canny, J. (1986). A Computational Approach to Edge Detection. IEEE
  Transactions PAMI, PAMI-8(6):679-698.

- Chin, R. and Dyer, C. (1986). Model-Based Recognition in Robot
  Vision. Computing Surveys, 18:67-108.

- Grimson, W., Lozano-Perez, T., Wells, W., et al. (1994). An
  Automatic Registration Method for Frameless Stereotaxy, Image
  Guided Surgery, and Enhanced Reality Visualization. In Proceedings
  of the Computer Society Conference on Computer Vision and Pattern
  Recognition, Seattle, WA. IEEE.

- Hill, D. L., Studholme, C., and Hawkes, D. J. (1994). Voxel
  Similarity Measures for Automated Image Registration. In
  Proceedings of the Third Conference on Visualization in Biomedical
  Computing, pages 205-216. SPIE.

- Kirkpatrick, S., Gelatt, C., and Vecchi, M. (1983). Optimization by
  Simulated Annealing. Science, 220(4598):671-680.

- Jones, M. and Poggio, T. (1995). Model-based matching of line
  drawings by linear combinations of prototypes. Proceedings of the
  International Conference on Computer Vision.

- Ljung, L. and Soderstrom, T. (1983). Theory and Practice of
  Recursive Identification. MIT Press.

- Shannon, C. E. (1948). A mathematical theory of communication. Bell
  Systems Technical Journal, 27:379-423 and 623-656.

- Shashua, A. (1992). Geometry and Photometry in 3D Visual
  Recognition. PhD thesis, M.I.T Artificial Intelligence Laboratory,
  AI-TR-1401.

- Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling,
  W. T. (1992). Numerical Recipes in C: The Art of Scientific
  Computing. Cambridge University Press, Cambridge, England, second
  edition.

* Semi-Automated Dialogue Act Classification for Situated Social Agents in Games, Deb Roy

Interesting attempt to learn "social scripts" related to restaurant
behaviour. The authors do this by creating a game which implements a
virtual restaurant, and recording actual human players as they
interact with the game. They learn scripts from annotated
interactions and then use those scripts to label other
interactions. They don't get very good results, but their
methodology of creating a virtual world and recording
low-dimensional actions is interesting.

- Torque 2D/3D looks like an interesting game engine.


* Face Recognition by Humans: Nineteen Results all Computer Vision Researchers should know, Sinha

This is a summary of a lot of bio experiments on human face
recognition.

- They assert again that the internal gradients/structures of a face
  are more important than the edges.

- It's amazing to me that it takes about 10 years after birth for a
  human to get advanced adult-like face detection. They move from
  feature-based processing to a holistic approach during this time.

- Finally, color is a very important cue for identifying faces.

** References
- A. Freire, K. Lee, and L. A. Symons, “The face-inversion effect as
  a deficit in the encoding of configural information: Direct
  evidence,” Perception, vol. 29, no. 2, pp. 159–170, 2000.
- M. B. Lewis, “Thatcher’s children: Development and the Thatcher
  illusion,” Perception, vol. 32, pp. 1415–1421, 2003.
- E. McKone and N. Kanwisher, “Does the human brain process objects
  of expertise like faces? A review of the evidence,” in From Monkey
  Brain to Human Brain, S. Dehaene, J. R. Duhamel, M. Hauser, and
  G. Rizzolatti, Eds. Cambridge, MA: MIT Press, 2005.


* Ullman

Actual code reuse!

precision = fraction of retrieved instances that are relevant
(true-positives/(true-positives + false-positives))

recall = fraction of relevant instances that are retrieved
(true-positives/total-in-class)

cross-validation = repeatedly hold part of the data out for testing
and train on the rest, to detect and prevent overfitting.

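A tiny sketch spelling out the two ratios above on made-up counts:

#+begin_src python
# Precision and recall from hypothetical detector counts.
def precision(tp, fp):
    return tp / (tp + fp)   # fraction of retrieved instances that are relevant

def recall(tp, fn):
    return tp / (tp + fn)   # fraction of relevant instances that are retrieved

tp, fp, fn = 40, 10, 20     # made-up true/false positives and false negatives
print(precision(tp, fp))    # 0.8
print(recall(tp, fn))       # ~0.67 (40 of 60 relevant instances retrieved)
#+end_src
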
Nifty, relevant, realistic ideas. He doesn't confine himself to
implausible assumptions.

** Our Reading
*** 2002 Visual features of intermediate complexity and their use in classification



** Getting around the dumb "fixed training set" methods

*** 2006 Learning to classify by ongoing feature selection

Brings in the most informative features of a class, based on
mutual information between that feature and all the examples
encountered so far. To bound the running time, he uses only a
fixed number of the most recent examples. He uses a replacement
strategy to tell whether a new feature is better than one of the
current features.
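
A hedged sketch of that replacement strategy as I read it: score each
feature by its mutual information with the class label over a bounded
buffer of recent examples, and let a new candidate replace the weakest
current feature when it scores higher. The buffer size, the binary
"feature fired" representation, and all names here are assumptions.

#+begin_src python
# Ongoing feature selection over a bounded buffer of recent examples.
from collections import deque
import numpy as np

def mutual_info_binary(x, y):
    """Mutual information (nats) between two binary arrays."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

class OngoingFeatureSelector:
    def __init__(self, n_features=10, buffer_size=500):
        self.features = []                        # current feature detectors (callables)
        self.n_features = n_features
        self.buffer = deque(maxlen=buffer_size)   # most recent (image, label) pairs

    def observe(self, image, label):
        self.buffer.append((image, label))

    def consider(self, candidate):
        """Add the candidate feature, or swap it in for the weakest
        current feature if it is more informative on the buffer."""
        images, labels = zip(*self.buffer)
        labels = np.array(labels)

        def score(feature):
            fired = np.array([int(feature(im)) for im in images])
            return mutual_info_binary(fired, labels)

        if len(self.features) < self.n_features:
            self.features.append(candidate)
            return
        scores = [score(f) for f in self.features]
        weakest = int(np.argmin(scores))
        if score(candidate) > scores[weakest]:
            self.features[weakest] = candidate
#+end_src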

*** 2009 Learning model complexity in an online environment

Sort of like the hierarchical Bayesian models of Tenenbaum, this
system makes the model more and more complicated as it gets more
and more training data. It does this by using two systems in
parallel, and whenever the more complex one seems to be needed by
the data, the less complex one is thrown out and an even more
complex model is initialized in its place.

He uses an SVM with polynomial kernels of varying complexity. He
gets good performance on a handwriting classification task using a
large range of training-set sizes, since his model changes
complexity depending on the number of training samples. The
simpler models do better with few training points, and the more
complex ones do better with many training points.

The final model had intermediate complexity between published
extremes.

The more complex models must be able to be initialized efficiently
from the less complex models which they replace!
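
A minimal sketch of that parallel-models scheme, assuming scikit-learn
SVMs with polynomial kernel degree as the complexity knob. The
promotion test (held-out accuracy with a small margin) and all
thresholds are my own simplifications, not the paper's criterion; note
that promotion reuses the already-trained complex model, which is the
cheap-initialization requirement mentioned above.

#+begin_src python
# Two models of adjacent complexity trained in parallel; when the more
# complex one clearly wins on held-out data, it replaces the simpler
# one and an even more complex model is started in its place.
from sklearn.svm import SVC

class GrowingComplexityClassifier:
    def __init__(self, start_degree=1, margin=0.02):
        self.degree = start_degree
        self.margin = margin                      # how clearly the complex model must win
        self.simple = SVC(kernel="poly", degree=self.degree)
        self.complex = SVC(kernel="poly", degree=self.degree + 1)

    def update(self, X_train, y_train, X_val, y_val):
        self.simple.fit(X_train, y_train)
        self.complex.fit(X_train, y_train)
        simple_acc = self.simple.score(X_val, y_val)
        complex_acc = self.complex.score(X_val, y_val)
        if complex_acc > simple_acc + self.margin:
            # Promote: the trained complex model becomes the current
            # one, and a yet more complex model takes its place.
            self.degree += 1
            self.simple = self.complex
            self.complex = SVC(kernel="poly", degree=self.degree + 1)

    def predict(self, X):
        return self.simple.predict(X)
#+end_src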


** Non Parametric Models

*** 2010 The chains model for detecting parts by their context

Like the constellation method for rigid objects, but extended to
non-rigid objects as well.

Allows you to build a hand detector from a face detector. This is
useful because hands might be only a few pixels and very ambiguous
in an image, but if you are expecting them at the end of an arm,
then they become easier to find.

They make chains by using spatial proximity of features. That way,
a hand can be identified by chaining back from the head. If there
is a good chain to the head, then it is more likely that there is
a hand than if there isn't. Since there is some give in the
proximity detection, the system can accommodate new poses that it
has never seen before.

Does not use any motion information.
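
A toy sketch of that chaining idea: score a candidate part by the best
chain of spatially proximate features connecting it back to a reliable
anchor detection (the head). The feature scores, the proximity radius,
and the multiplicative scoring rule are all made up for illustration;
the actual chains model is considerably richer.

#+begin_src python
# Best-chain score from an anchor detection to a candidate part, where
# consecutive features in a chain must be spatially close. Assumes
# per-feature scores in (0, 1], so extending a chain never raises its
# score and a Dijkstra-style best-first search is valid.
import heapq
import numpy as np

def best_chain_score(points, scores, anchor_idx, target_idx, radius=40.0):
    points = np.asarray(points, dtype=float)
    best = np.zeros(len(points))
    best[anchor_idx] = scores[anchor_idx]
    heap = [(-best[anchor_idx], anchor_idx)]
    while heap:
        neg, i = heapq.heappop(heap)
        if -neg < best[i]:
            continue                              # stale heap entry
        for j in range(len(points)):
            if j == i or np.linalg.norm(points[i] - points[j]) > radius:
                continue
            chained = best[i] * scores[j]         # chain score so far times this feature
            if chained > best[j]:
                best[j] = chained
                heapq.heappush(heap, (-chained, j))
    return best[target_idx]
#+end_src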

*** 2005 A Hierarchical Non-Parametric Method for Capturing Non-Rigid Deformations

(relative dynamic programming [RDP])

Goal is to match images, as in SIFT, but this time the images can
be subject to non-rigid transformations. They do this by finding
small patches that look the same, then building up bigger
patches. They get a tree of patches that describes each image, and
find the edit distance between each tree. Editing operations
involve a coherent shift of features, so they can accommodate local
shifts of patches in any direction. They get some cool results
compared with straight correlation. Basically, they made an image
comparator that is resistant to multiple independent deformations.

!important small regions are treated the same as unimportant
small regions

!no conception of shape

quote:
The dynamic programming procedure looks for an optimal
transformation that aligns the patches of both images. This
transformation is not a global transformation, but a composition
of many local transformations of sub-patches at various sizes,
performed one on top of the other.
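
A simplified sketch of the idea in that quote: each patch of one image
may shift a little relative to its parent's placement in the other
image, and its sub-patches may shift a little more on top of that.
This is plain recursive block matching with an SSD cost, not the
authors' RDP algorithm, and the patch sizes and search radii are
arbitrary.

#+begin_src python
# Recursive block matching: a patch refines its parent's offset by a
# small local shift, then its four sub-patches refine further.
import numpy as np

def ssd(a, b):
    return float(np.sum((a.astype(float) - b.astype(float)) ** 2))

def match_patch(A, B, y, x, size, dy=0, dx=0, radius=2, min_size=8):
    """Cost of aligning A's size x size patch at (y, x) into B, given
    the parent's offset (dy, dx), allowing a further shift of up to
    `radius` pixels and recursing on the four sub-patches."""
    patch = A[y:y + size, x:x + size]
    best_cost, best_off = np.inf, (dy, dx)
    for ddy in range(-radius, radius + 1):
        for ddx in range(-radius, radius + 1):
            by, bx = y + dy + ddy, x + dx + ddx
            if 0 <= by and 0 <= bx and by + size <= B.shape[0] and bx + size <= B.shape[1]:
                cost = ssd(patch, B[by:by + size, bx:bx + size])
                if cost < best_cost:
                    best_cost, best_off = cost, (dy + ddy, dx + ddx)
    half = size // 2
    if half < min_size or not np.isfinite(best_cost):
        return best_cost
    # Children independently refine the alignment around this patch's offset.
    return sum(match_patch(A, B, y + oy, x + ox, half, *best_off, radius, min_size)
               for oy in (0, half) for ox in (0, half))
#+end_src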

*** 2006 Satellite Features for the Classification of Visually Similar Classes

Finds features that can distinguish subclasses of a class, by
first finding a rigid set of anchor features that are common to
both subclasses, then finding distinguishing features relative to
those anchor features. They keep things rigid because the satellite
features don't have much information in and of themselves, and are
only informative relative to other features.
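
A small sketch of that anchor-plus-satellite arrangement, using plain
template matching: locate the shared anchor feature, then measure a
subclass-specific satellite template at a fixed offset from it. The
templates, the offset, and the scoring are stand-ins, not the paper's
actual features.

#+begin_src python
# Score a satellite template at a fixed position relative to the best
# anchor match. Assumes grayscale uint8 images and opencv-python.
import cv2

def satellite_score(image, anchor_tpl, satellite_tpl, offset):
    """offset is (dy, dx) from the anchor's top-left corner."""
    res = cv2.matchTemplate(image, anchor_tpl, cv2.TM_CCOEFF_NORMED)
    _, _, _, (ax, ay) = cv2.minMaxLoc(res)           # best anchor location (x, y)
    y, x = ay + offset[0], ax + offset[1]
    h, w = satellite_tpl.shape[:2]
    if y < 0 or x < 0:
        return -1.0                                  # satellite would fall off the image
    window = image[y:y + h, x:x + w]
    if window.shape[:2] != (h, w):
        return -1.0
    return float(cv2.matchTemplate(window, satellite_tpl, cv2.TM_CCOEFF_NORMED)[0, 0])

# Distinguishing two similar subclasses then compares, for example,
# satellite_score(img, anchor, satellite_a, off) against
# satellite_score(img, anchor, satellite_b, off).
#+end_src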

*** 2005 Learning a novel class from a single example by cross-generalization.

Lets you use a vast visual experience to generate a classifier for
a novel class, by generating synthetic examples: features from the
single example are replaced with features from similar classes.

quote: feature F is likely to be useful for class C if a similar
feature F proved effective for a similar class C in the past.

Allows you to transfer the "gestalt" of a similar class to a new
class, by adapting all the features of the learned class that have
correspondence to the new class.

*** 2007 Semantic Hierarchies for Recognizing Objects and Parts

Better learning of complex objects like faces by learning each
piece (like nose, mouth, eye, etc.) separately, then making sure
that the features are in plausible positions.