When I write my thesis, I want it to have links to every

* Object Recognition from Local Scale-Invariant Features, David G. Lowe

  This is the famous SIFT paper that is mentioned everywhere.

  This is a way to find objects in images given an image of that
  object. It is moderately resistant to variations between the sample
  image and the target image. Basically, this is a fancy way of
  picking out a test pattern embedded in a larger pattern. It would
  fail to learn anything resembling object categories, for instance.
  A useful concept is the idea of storing the local scale and rotation
  of each feature as it is extracted from the image, then checking
  that proposed matches all more-or-less agree on shift, rotation,
  scale, etc. Another good idea is to use points instead of edges,
  since they seem more robust.

** References
   - Basri, Ronen, and David W. Jacobs, "Recognition using region
     correspondences," International Journal of Computer Vision, 25, 2
     (1996), pp. 141-162.
   - Edelman, Shimon, Nathan Intrator, and Tomaso Poggio, "Complex
     cells and object recognition," unpublished manuscript, preprint at
     http://www.ai.mit.edu/edelman/mirror/nips97.ps.Z
   - Lindeberg, Tony, "Detecting salient blob-like image structures
     and their scales with a scale-space primal sketch: a method for
     focus-of-attention," International Journal of Computer Vision,
     11, 3 (1993), pp. 283-318.
   - Murase, Hiroshi, and Shree K. Nayar, "Visual learning and
     recognition of 3-D objects from appearance," International
     Journal of Computer Vision, 14, 1 (1995), pp. 5-24.
   - Ohba, Kohtaro, and Katsushi Ikeuchi, "Detectability, uniqueness,
     and reliability of eigen windows for stable verification of
     partially occluded objects," IEEE Trans. on Pattern Analysis and
     Machine Intelligence, 19, 9 (1997), pp. 1043-1048.
   - Zhang, Z., R. Deriche, O. Faugeras, and Q. T. Luong, "A robust
     technique for matching two uncalibrated images through the
     recovery of the unknown epipolar geometry," Artificial
     Intelligence, 78 (1995), pp. 87-119.

* Alignment by Maximization of Mutual Information, Paul A. Viola

  PhD thesis recommended by Winston. Describes a system that is able
  to align a 3D computer model of an object with an image of that
  object.

  - Pages 9-19 are a very adequate intro to the algorithm.

  - Has a useful section on entropy and probability at the beginning
    which is worth reading, especially the part about entropy.

  - Differential entropy seems a bit odd -- you would think that it
    should be the same as normal entropy for a discrete distribution
    embedded in continuous space. How do you measure the entropy of a
    half-continuous, half-discrete random variable? Perhaps the
    problem is related to the delta function, and not the definition
    of differential entropy.

  - Expectation Maximization (mixture-of-Gaussians fitting)
    (Dempster 1977).

  - Good introduction to Parzen window density estimation. Parzen
    density functions trade construction time for evaluation time
    (pg. 41). They are a way to transform a sample into a
    distribution. They don't work very well in higher dimensions due
    to the thinning of sample points. (See the sketch after this
    list.)

  - Calculating the entropy of a Markov model (or state machine,
    program, etc.) seems like it would be very hard, since each trial
    would not be independent of the other trials. Yet there are many
    common-sense models that do need to have state to accurately
    model the world.

  - "... there is no direct procedure for evaluating entropy from a
    sample. A common approach is to model the density from the
    sample, and then estimate the entropy from the density."

  - pg. 55: he says that infinity minus infinity is zero, which is
    hard to take seriously.

  - Great idea on pg. 62 about using random samples from images to
    speed up computation.

  - Practical way of terminating a random search: "A better idea is
    to reduce the learning rate until the parameters have a
    reasonable variance and then take the average parameters."

  - p. 65: a dubious hack to make his Parzen window estimates work.

  - This alignment only works if the initial pose is not very far
    off.

  Occlusion? Seems a bit holistic.
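  Since the Parzen-window idea and the quoted "model the density from
  the sample, then estimate the entropy from the density" trick come up
  repeatedly, here is a minimal sketch of that technique in one
  dimension. This is my own toy illustration (Gaussian kernels,
  made-up data and bandwidth =sigma=), not Viola's EMMA implementation:

#+begin_src python
import numpy as np

def parzen_density(sample, sigma):
    """Density estimate built from `sample` with Gaussian kernels of
    width `sigma`.  Cheap to construct, expensive to evaluate: every
    query touches every sample point."""
    sample = np.asarray(sample, dtype=float)
    def p(x):
        return np.mean(np.exp(-0.5 * ((x - sample) / sigma) ** 2)
                       / (sigma * np.sqrt(2.0 * np.pi)))
    return p

def entropy_estimate(sample, sigma):
    """Estimate differential entropy H = -E[log p(X)] by building a
    Parzen density from half the sample and averaging -log p over
    the other half."""
    sample = np.asarray(sample, dtype=float)
    half = len(sample) // 2
    p = parzen_density(sample[:half], sigma)
    return -np.mean([np.log(p(x)) for x in sample[half:]])

rng = np.random.default_rng(0)
data = rng.normal(size=400)
print("estimated entropy:", entropy_estimate(data, sigma=0.25))
print("exact for N(0,1): ", 0.5 * np.log(2 * np.pi * np.e))
#+end_src

  The curse-of-dimensionality complaint in the notes shows up here
  directly: in higher dimensions the kernels around a fixed number of
  sample points cover almost none of the space, so p(x) is badly
  underestimated almost everywhere.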
** References
   - "Excellent" book on entropy: Cover, T. and Thomas, J. (1991).
     Elements of Information Theory.
   - Canny, J. (1986). A Computational Approach to Edge Detection.
     IEEE Transactions on PAMI, PAMI-8(6):679-698.
   - Chin, R. and Dyer, C. (1986). Model-Based Recognition in Robot
     Vision. Computing Surveys, 18:67-108.
   - Grimson, W., Lozano-Perez, T., Wells, W., et al. (1994). An
     Automatic Registration Method for Frameless Stereotaxy, Image
     Guided Surgery, and Enhanced Reality Visualization. In
     Proceedings of the Computer Society Conference on Computer
     Vision and Pattern Recognition, Seattle, WA. IEEE.
   - Hill, D. L., Studholme, C., and Hawkes, D. J. (1994). Voxel
     Similarity Measures for Automated Image Registration. In
     Proceedings of the Third Conference on Visualization in
     Biomedical Computing, pages 205-216. SPIE.
   - Kirkpatrick, S., Gelatt, C., and Vecchi, M. (1983). Optimization
     by Simulated Annealing. Science, 220(4598):671-680.
   - Jones, M. and Poggio, T. (1995). Model-based matching of line
     drawings by linear combinations of prototypes. In Proceedings of
     the International Conference on Computer Vision.
   - Ljung, L. and Soderstrom, T. (1983). Theory and Practice of
     Recursive Identification. MIT Press.
   - Shannon, C. E. (1948). A mathematical theory of communication.
     Bell System Technical Journal, 27:379-423 and 623-656.
   - Shashua, A. (1992). Geometry and Photometry in 3D Visual
     Recognition. PhD thesis, M.I.T. Artificial Intelligence
     Laboratory, AI-TR-1401.
   - Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling,
     W. T. (1992). Numerical Recipes in C: The Art of Scientific
     Computing. Cambridge University Press, Cambridge, England,
     second edition.

* Semi-Automated Dialogue Act Classification for Situated Social Agents in Games, Deb Roy

  Interesting attempt to learn "social scripts" related to restaurant
  behaviour. The authors do this by creating a game which implements
  a virtual restaurant and recording actual human players as they
  interact with the game. They learn scripts from annotated
  interactions and then use those scripts to label other
  interactions. They don't get very good results, but their
  methodology of creating a virtual world and recording
  low-dimensional actions is interesting.

  - Torque 2D/3D looks like an interesting game engine.

* Face Recognition by Humans: Nineteen Results all Computer Vision Researchers should know, Sinha

  This is a summary of a lot of bio experiments on human face
  recognition.

  - They assert again that the internal gradients/structures of a
    face are more important than the edges.

  - It's amazing to me that it takes about 10 years after birth for a
    human to reach advanced, adult-like face recognition. They move
    from feature-based processing to a holistic approach during this
    time.

  - Finally, color is a very important cue for identifying faces.

** References
   - A. Freire, K. Lee, and L. A. Symons, "The face-inversion effect
     as a deficit in the encoding of configural information: Direct
     evidence," Perception, vol. 29, no. 2, pp. 159-170, 2000.
   - M. B. Lewis, "Thatcher's children: Development and the Thatcher
     illusion," Perception, vol. 32, pp. 1415-1421, 2003.
   - E. McKone and N. Kanwisher, "Does the human brain process
     objects of expertise like faces? A review of the evidence," in
     From Monkey Brain to Human Brain, S. Dehaene, J. R. Duhamel,
     M. Hauser, and G. Rizzolatti, Eds. Cambridge, MA: MIT Press,
     2005.

* Ullman

  Actual code reuse!

  precision = fraction of retrieved instances that are relevant
  (true-positives / (true-positives + false-positives))

  recall = fraction of relevant instances that are retrieved
  (true-positives / total-in-class)

  cross-validation = train the model on one subset of the data and
  test it on another, rotating the split, to guard against
  overfitting.

  Nifty, relevant, realistic ideas; he doesn't confine himself to
  implausible assumptions.
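  Spelling out the precision/recall arithmetic defined above as a tiny
  helper (the counts in the example are invented):

#+begin_src python
def precision(tp, fp):
    """Fraction of retrieved instances that are relevant."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of relevant instances that are retrieved
    (fn = relevant instances the detector missed)."""
    return tp / (tp + fn)

# Invented example: 60 true instances in the data, the detector
# fires 50 times, 40 of those firings are correct.
tp, fp, fn = 40, 10, 20
print("precision =", precision(tp, fp))   # 0.8
print("recall    =", recall(tp, fn))      # ~0.67
#+end_src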
** Our Reading

*** 2002 Visual features of intermediate complexity and their use in classification

** Getting around the dumb "fixed training set" methods

*** 2006 Learning to classify by ongoing feature selection

    Brings in the most informative features of a class, based on
    mutual information between that feature and all the examples
    encountered so far. To bound the running time, he uses only a
    fixed number of the most recent examples. He uses a replacement
    strategy to tell whether a new feature is better than one of the
    current features. (A rough sketch of this bookkeeping follows.)
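    As I read it, the bookkeeping amounts to something like the sketch
    below: keep a bounded buffer of recent labelled examples, score
    each feature by its mutual information with the class label over
    that buffer, and let a candidate replace the weakest current
    feature when it scores higher. This is my own reconstruction for
    illustration, not Ullman's code; features are treated as simple
    presence/absence flags.

#+begin_src python
import math
from collections import deque

def mutual_information(xs, ys):
    """Plug-in discrete mutual information (in nats) between two
    equal-length sequences: I(X;Y) = sum p(x,y) log p(x,y)/(p(x)p(y))."""
    n = len(xs)
    joint, px, py = {}, {}, {}
    for x, y in zip(xs, ys):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        px[x] = px.get(x, 0) + 1
        py[y] = py.get(y, 0) + 1
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

class OngoingFeatureSelector:
    """Keep the `n_features` most informative features, judged only
    on the last `buffer_size` labelled examples seen so far."""
    def __init__(self, n_features=10, buffer_size=200):
        self.n_features = n_features
        self.examples = deque(maxlen=buffer_size)  # (features, label)
        self.features = []                         # current working set

    def observe(self, present_features, label, candidates):
        self.examples.append((set(present_features), label))
        labels = [lab for _, lab in self.examples]

        def score(f):
            fired = [1 if f in feats else 0 for feats, _ in self.examples]
            return mutual_information(fired, labels)

        # Replacement strategy: a candidate enters only if it beats
        # the weakest feature currently in the working set.
        for f in candidates:
            if f in self.features:
                continue
            if len(self.features) < self.n_features:
                self.features.append(f)
            else:
                worst = min(self.features, key=score)
                if score(f) > score(worst):
                    self.features[self.features.index(worst)] = f
#+end_src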
*** 2009 Learning model complexity in an online environment

    Somewhat like the hierarchical Bayesian models of Tenenbaum, this
    system makes the model more and more complicated as it gets more
    and more training data. It does this by running two systems in
    parallel; whenever the more complex one seems to be needed by the
    data, the less complex one is thrown out and an even more complex
    model is initialized in its place.

    He uses an SVM with polynomial kernels of varying complexity. He
    gets good performance on a handwriting classification task across
    a large range of training-set sizes, since his model changes
    complexity depending on the number of training samples. The
    simpler models do better with few training points, and the more
    complex ones do better with many training points.

    The final model had intermediate complexity between published
    extremes.

    The more complex models must be able to be initialized efficiently
    from the less complex models which they replace!
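    A rough sketch of how the parallel-model scheme could be wired up,
    with scikit-learn polynomial-kernel SVMs standing in for the
    paper's classifiers. The promotion test (the challenger must beat
    the current model on a window of recent data by some margin) and
    all the thresholds are my own invention for illustration; note
    that refitting the challenger from scratch ignores the
    efficient-initialization point above.

#+begin_src python
from sklearn.svm import SVC

class GrowingComplexityClassifier:
    """Train a 'current' model and a more complex challenger side by
    side; when the challenger clearly wins on recent data, promote it
    and start an even more complex challenger."""
    def __init__(self, start_degree=1, margin=0.02):
        self.degree = start_degree   # polynomial kernel degree
        self.margin = margin
        self.X, self.y = [], []
        self.model = None

    def _fit(self, degree):
        m = SVC(kernel="poly", degree=degree)
        m.fit(self.X, self.y)
        return m

    def update(self, X_new, y_new, X_recent, y_recent):
        """Absorb new training data, refit both models, and promote
        the challenger if it beats the current model on the recent
        window by more than `margin`."""
        self.X.extend(X_new)
        self.y.extend(y_new)
        current = self._fit(self.degree)
        challenger = self._fit(self.degree + 1)
        if (challenger.score(X_recent, y_recent)
                > current.score(X_recent, y_recent) + self.margin):
            self.degree += 1        # discard the simpler model...
            current = challenger    # ...the challenger takes over
        self.model = current
        return self.model
#+end_src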
** Non-Parametric Models

*** 2010 The chains model for detecting parts by their context

    Like the constellation method for rigid objects, but extended to
    non-rigid objects as well.

    Allows you to build a hand detector from a face detector. This is
    useful because hands might be only a few pixels and very ambiguous
    in an image, but if you are expecting them at the end of an arm,
    then they become easier to find.

    They make chains by using spatial proximity of features. That way,
    a hand can be identified by chaining back from the head. If there
    is a good chain to the head, then it is more likely that there is
    a hand than if there isn't. Since there is some give in the
    proximity detection, the system can accommodate new poses that it
    has never seen before.

    Does not use any motion information.

*** 2005 A Hierarchical Non-Parametric Method for Capturing Non-Rigid Deformations

    (relative dynamic programming [RDP])

    The goal is to match images, as in SIFT, but this time the images
    can be subject to non-rigid transformations. They do this by
    finding small patches that look the same, then building up bigger
    patches. They get a tree of patches that describes each image, and
    find the edit distance between each tree. Editing operations
    involve a coherent shift of features, so they can accommodate
    local shifts of patches in any direction. They get some cool
    results over just straight correlation. Basically, they made an
    image comparator that is resistant to multiple independent
    deformations.

    !important small regions are treated the same as unimportant
    small regions

    !no conception of shape

    quote:
    The dynamic programming procedure looks for an optimal
    transformation that aligns the patches of both images. This
    transformation is not a global transformation, but a composition
    of many local transformations of sub-patches at various sizes,
    performed one on top of the other.

*** 2006 Satellite Features for the Classification of Visually Similar Classes

    Finds features that can distinguish subclasses of a class, by
    first finding a rigid set of anchor features that are common to
    both subclasses, then finding distinguishing features relative to
    those anchors. They keep things rigid because the satellite
    features don't carry much information in and of themselves, and
    are only informative relative to other features.

*** 2005 Learning a novel class from a single example by cross-generalization

    Lets you use a vast visual experience to generate a classifier for
    a novel class, by generating synthetic examples in which features
    from the single example are replaced with features from similar
    classes.

    quote: feature F is likely to be useful for class C if a similar
    feature F' proved effective for a similar class C' in the past.

    Allows you to transfer the "gestalt" of a similar class to a new
    class, by adapting all the features of the learned class that have
    correspondence to the new class.

*** 2007 Semantic Hierarchies for Recognizing Objects and Parts

    Better learning of complex objects like faces by learning each
    piece (like nose, mouth, eye, etc.) separately, then making sure
    that the features are in plausible positions.
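    The "learn the pieces separately, then check that they sit in
    plausible positions" step can be made concrete with a toy
    star-model check. The part names, offsets, and tolerances below
    are invented for illustration; a real system would learn them from
    data.

#+begin_src python
# Toy plausibility check for a part-based face model: every part must
# be detected, and its offset from the face center must fall inside a
# learned (here: invented) range.  Coordinates are (x, y) pixels.
EXPECTED_OFFSETS = {            # part -> (mean_dx, mean_dy, tolerance)
    "left_eye":  (-15, -20, 8),
    "right_eye": ( 15, -20, 8),
    "nose":      (  0,   0, 6),
    "mouth":     (  0,  22, 9),
}

def plausible_face(face_center, part_detections):
    """`part_detections` maps part name -> detected (x, y), or None if
    that part detector did not fire.  True only if every part was
    found roughly where the face model expects it."""
    cx, cy = face_center
    for part, (dx, dy, tol) in EXPECTED_OFFSETS.items():
        hit = part_detections.get(part)
        if hit is None:
            return False
        x, y = hit
        if abs((x - cx) - dx) > tol or abs((y - cy) - dy) > tol:
            return False
    return True

# Parts found close to their expected offsets -> plausible face.
print(plausible_face((100, 100),
                     {"left_eye": (86, 79), "right_eye": (116, 81),
                      "nose": (101, 102), "mouth": (99, 121)}))
#+end_src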