Next: Maintaining Diversity Up: Pic1 Module Structure Previous: The search engine

The microfeature extractor

The modules above do not address what is perhaps the most difficult aspect of this project when it is applied to real images rather than constructed drawings: generating the vector associated with each image. Extracting features from an image into a vector value is orthogonal to Pic1, but it would be dishonest merely to assume it done. We contend that simple microfeature analysis will prove sufficient to be useful within some domains, such as eyeballs, while expert-guided analysis may be necessary for complex domains, such as insects. Pic1 remains useful on constructed domains even without this analysis. More sophisticated data modeling methods (e.g., Griffioen et al. 1993; Chang et al. 1988; Bach et al. 1993) could also be employed.

Initial work on generating microfeatures from simple attributes of pictures (e.g., overall brightness, color properties, number of bright or dim pixels, number of pixels with various HSV attributes, ratios of such simple metrics between different areas of an image) suggests that they show promise in partitioning a set of images that is not highly heterogeneous and whose similarities are not abstract properties.
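The kind of whole-image attributes mentioned above can be illustrated with a minimal sketch. The function and parameter names here are hypothetical, not Pic1's actual extractor; an image is assumed to be a 2D grid (list of rows) of (r, g, b) tuples with channel values 0 to 255.

```python
def brightness(pixel):
    """Average of the RGB channels as a crude brightness measure."""
    r, g, b = pixel
    return (r + g + b) / 3.0

def simple_microfeatures(image, bright_thresh=200, dim_thresh=50):
    """Return a few whole-image scalars: overall brightness and the
    fractions of bright and dim pixels (a partial signature vector)."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    levels = [brightness(p) for p in pixels]
    overall = sum(levels) / n
    n_bright = sum(1 for v in levels if v >= bright_thresh)
    n_dim = sum(1 for v in levels if v <= dim_thresh)
    return [overall, n_bright / n, n_dim / n]
```

Each such scalar contributes one coordinate to the image's feature vector; more coordinates come from repeating the same measures on sub-regions of the image.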

The goal of the microfeature extractor is to extract microfeatures and thus construct a feature space. Given an image, the extractor should extract enough information to let the search engine distinguish that image from all others. In other words, the extracted features can be represented as the image's own 'signature', a point distinct in the vector space, and the search engine can use this signature as a basis for converging on the user's intention.

Currently, we use over 8,500 microfeatures, generated mostly by generator programs. At least 6,000 of them can be used to distinguish any two images in our current collection of over 10,000 images. Because we are not trying to solve vision problems such as image recognition, the features we extract do not deal with abstract information about objects, such as shape or shading. In fact, each feature denotes only a fairly small amount of information; the search therefore relies mostly on the search engine to discover which features are important from the current user's point of view. Examples of these microfeatures are shown below:

Measure overall darkness, brightness, and color depth.

Measure the differences between the central part and the boundary of the image. (Differences can be measured on the average color depth for RGB, brightness, HSV, and so on.)

Measure the ratio of pixels in certain parts of the image that exceed some thresholds.

Divide the image into pieces, and measure the differences between the pieces.
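Assuming the image is given as a 2D grid of grayscale intensities (0 to 255), the region-based measures listed above might be sketched as follows; the function names and the exact region split are illustrative assumptions, not Pic1's actual code.

```python
def mean(vals):
    return sum(vals) / len(vals)

def center_vs_boundary(img):
    """Difference between the mean intensity of the central quarter
    of the image and that of the surrounding boundary region."""
    h, w = len(img), len(img[0])
    center, boundary = [], []
    for y in range(h):
        for x in range(w):
            if h // 4 <= y < 3 * h // 4 and w // 4 <= x < 3 * w // 4:
                center.append(img[y][x])
            else:
                boundary.append(img[y][x])
    return mean(center) - mean(boundary)

def threshold_ratio(img, thresh=128):
    """Fraction of pixels whose intensity exceeds a threshold."""
    pixels = [v for row in img for v in row]
    return sum(1 for v in pixels if v > thresh) / len(pixels)

def piece_differences(img, rows=2, cols=2):
    """Split the image into a grid of pieces and return the pairwise
    gaps between the pieces' mean intensities."""
    h, w = len(img), len(img[0])
    means = []
    for i in range(rows):
        for j in range(cols):
            block = [img[y][x]
                     for y in range(i * h // rows, (i + 1) * h // rows)
                     for x in range(j * w // cols, (j + 1) * w // cols)]
            means.append(mean(block))
    return [abs(a - b) for k, a in enumerate(means) for b in means[k + 1:]]
```

Applying the same measures per color channel, per region, and per threshold is what multiplies a handful of measures into thousands of microfeatures.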

Even with these rough, imprecise microfeatures, Pic1 still does a fairly good job of identifying images, needing about ten selection cycles to find most pre-picked images in a database of over 10,000. Pic1 can also often distinguish between images in a more homogeneous domain, faces for example, but does not do as well there. Such domains would likely require far more sophisticated image analyzers.
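The "generator programs" mentioned earlier can be pictured as enumerating combinations of a few base measures. The sketch below is an illustrative assumption about how such a generator might work (the region counts, channels, and thresholds are invented for the example, not the paper's actual parameters); each (region, channel, threshold) triple names one microfeature.

```python
from itertools import product

def make_feature_specs(n_rows=8, n_cols=8,
                       channels=("r", "g", "b", "h", "s", "v"),
                       thresholds=(32, 64, 96, 128, 160, 192, 224)):
    """Enumerate (region, channel, threshold) triples, one per
    microfeature: crossing a few base measures yields thousands."""
    regions = list(product(range(n_rows), range(n_cols)))
    return list(product(regions, channels, thresholds))

specs = make_feature_specs()
# 8*8 regions * 6 channels * 7 thresholds = 2688 feature specifications
```

This shows why the feature count can climb past 8,500 even though each individual feature carries only a small amount of information.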

Gregory J. E. Rawlins