Book Review from Kybernetes 32, no. 9/10, pp. 1557-8, 2003

2D Object Detection and Recognition: Models, Algorithms and Networks

by Yali Amit

MIT Press, Cambridge, Massachusetts, 2002, xv + 306 pp., ISBN 0-262-01194-8, hardback, 26.95

The treatment here is somewhat more general than the title suggests, since attention is not restricted to images that are intrinsically two-dimensional, but includes those arising as projections of objects or scenes in three dimensions. This is a very thorough and practical review of methods for analysis of two-dimensional images, and hence of most work on computer vision, apart from schemes in which the visual input is supplemented by distance estimation by laser or ultrasonic means. Stereoscopic depth perception is not treated either, though methods are described for location of image features in ways that would seem to offer an excellent basis for stereoscopy based on paired images.

Images that are intrinsically two-dimensional include printed or hand-formed characters, as well as brain scans and other medical body scans. The book gives special attention to brain scans.

There is an enormous literature on methods for the interpretation of visual images in terms of three-dimensional objects and scenes. Most of it assumes that objects are rigid, even though there are important recognition tasks for which this does not hold, an obvious example being face recognition. The present book is very much concerned with recognition where there may be elastic deformation. Living tissue, as in the brain, is notoriously subject to deformation as well as to variability from subject to subject, and the analysis of brain scans has to accept this complication. Similar considerations apply to recognition of hand-formed characters, which are elastically deformed versions of prototypes.

Detection and recognition when there may be elastic deformation can be treated using Bayesian statistical methods. Given an input, or "instantiation", it is usually possible to estimate the probability with which it could have come from each of a number of prototypes, and Bayes' rule then allows posterior probabilities to be assigned to these. This becomes impracticable when the number of prototypes is large, and more economical alternatives are described, some employing Dynamic Programming. The treatment is entirely practical and attention is paid to computational complexity.
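
As a rough illustration of the Bayesian step described above (not code from the book or its website), the following Python sketch assigns posterior probabilities to a small set of prototypes. The function name, the prototype labels and the log-likelihood values are all hypothetical; in practice the likelihoods would come from a deformation model fitted to the image.

```python
import math

def posterior_over_prototypes(log_likelihoods, priors):
    """Apply Bayes' rule: P(prototype | input) is proportional to
    P(input | prototype) * P(prototype).

    log_likelihoods: dict mapping prototype label -> log P(input | prototype)
    priors:          dict mapping prototype label -> prior probability
    """
    # Work in log space so that very small likelihoods do not underflow.
    log_joint = {k: log_likelihoods[k] + math.log(priors[k])
                 for k in log_likelihoods}
    peak = max(log_joint.values())
    unnormalised = {k: math.exp(v - peak) for k, v in log_joint.items()}
    total = sum(unnormalised.values())
    return {k: v / total for k, v in unnormalised.items()}

# Hypothetical values for three character prototypes with equal priors.
log_likelihoods = {"A": -12.0, "B": -15.0, "C": -20.0}
priors = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}

posterior = posterior_over_prototypes(log_likelihoods, priors)
best = max(posterior, key=posterior.get)
```

The cost of this direct approach grows with the number of prototypes, since every one must be scored against the input, which is the difficulty that motivates the more economical alternatives mentioned above.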

In reviews quoted on the back of the dust cover, the book is welcomed by leading workers in computer vision as presenting a novel synthesis of earlier strands of research, and as presenting efficient and well-motivated algorithms that have fundamental as well as practical implications for the study of the topic. This is a major contribution that will be a standard work of reference. Like other books from MIT Press it is well produced with useful illustrations and an extensive bibliography.

Computer programs implementing the methods can be downloaded from an associated website. They are in C++ and will compile under Linux. It is mentioned that familiarity with C++ is needed to understand the programs, and some familiarity with Unix will be needed to run them.

Like other studies in AI and Robotics, this work has implications for biological studies as well as for artefacts. In one of the chapters a possible neural network implementation is discussed in considerable detail, with the suggestion that its successive layers of processing may correspond to layers of the visual, or striate, cortex of the mammalian brain. The latter has been very extensively investigated following the pioneering work of Hubel and Wiesel. The visual system of humans and animals has a remarkable ability to remember a shape presented to it, and to recognise it subsequently irrespective, within limits, of size and position on the retina. Schemes involving exhaustive listing of all possible sizes and positions are clearly not feasible, but the method proposed here, based on detection of local features, offers a plausible economical solution in terms of a neural network with synaptic modification as postulated by Hebb. It undoubtedly merits consideration by neurophysiologists.

Alex M. Andrew
