My ongoing experiments on using open-source content-based image retrieval software for pornography detection have led me to outline a quick taxonomy of image retrieval approaches:
- Transmedia approaches are those in which the query is expressed in a non-image format:
- Images may be retrieved in terms of words occurring close to them in multimedia documents. The query is a set of keywords, and the text surrounding the image is searched for them. Importantly, the association between the text and the image is implicit (the text is not specifically its caption, etc.). This is the way Google Images works (plus using the text in the URL).
- Images may be retrieved according to their metadata. Metadata is all information attached to the picture, generally not another image (apart from thumbnails), and may include date, author, place (in fact, many pictures in Google's Picasa are geotagged), keywords, a textual description (caption), filename/path/URL, etc. The metadata are explicit, and most often stored in a database (although modern image formats support storing metadata within the image itself).
- Some metadata may be assigned automatically, as keywords are in applications like ALIPR. In such applications, a set of images is manually annotated with textual concepts, then processed to build a model of the concepts using image processing and machine learning. Incoming images are automatically tagged, and the tags or concepts are used for retrieval.
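The transmedia idea above can be sketched in a few lines: index each image by its surrounding text (or metadata keywords) and rank images by keyword overlap with the query. The corpus below is invented for illustration; real engines use inverted indexes and TF-IDF-style weighting rather than this naive scan.

```python
# Toy transmedia retrieval: images indexed by nearby text, queried by keywords.
# The corpus is a hypothetical example, not real data.
corpus = {
    "beach.jpg": "sunset over the beach with palm trees",
    "cat.jpg": "a cat sleeping on a warm windowsill",
    "city.jpg": "night skyline of the city with neon lights",
}

def retrieve(query_keywords, corpus):
    """Rank images by how many query keywords occur in their surrounding text."""
    query = set(w.lower() for w in query_keywords)
    scores = {}
    for image, text in corpus.items():
        scores[image] = len(query & set(text.lower().split()))
    # Keep only images matching at least one keyword, best match first.
    return sorted((img for img, s in scores.items() if s > 0),
                  key=lambda img: -scores[img])

print(retrieve(["beach", "sunset"], corpus))  # → ['beach.jpg']
```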
- Single media approaches, in which the query is an image itself:
- Light processing approaches make use of nearly trivial image properties, such as their size.
- Heavy processing approaches make use of deep image processing techniques involving the analysis of colors, shapes, textures, etc., and sound mathematical tools such as the wavelet transform. Lire and imgSeek (e.g. the Flickr Suggestions demo) are instances of this class.
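To make the single-media case concrete, here is a minimal sketch of one classic content-based technique: reduce each image to a quantized color histogram and rank the collection by histogram intersection against the query image. The "images" below are just invented lists of (r, g, b) pixels; systems like Lire or imgSeek combine far richer features (textures, shapes, wavelet signatures).

```python
# Toy content-based retrieval via color histograms (illustrative only).

def histogram(pixels, bins=4):
    """Quantize each RGB channel into `bins` levels and count pixels per bin."""
    hist = {}
    step = 256 // bins
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        hist[key] = hist.get(key, 0) + 1
    return hist

def intersection(h1, h2):
    """Histogram intersection: total overlap in shared bins (higher = more similar)."""
    return sum(min(count, h2.get(key, 0)) for key, count in h1.items())

# Tiny synthetic collection: a mostly red image and a mostly blue one.
red_img = [(250, 10, 10)] * 8
blue_img = [(10, 10, 250)] * 8
query = [(240, 20, 20)] * 8  # a reddish query image

collection = {"red.jpg": histogram(red_img), "blue.jpg": histogram(blue_img)}
q = histogram(query)
best = max(collection, key=lambda name: intersection(q, collection[name]))
print(best)  # → red.jpg
```

The design point: once every image is a fixed feature vector, retrieval becomes nearest-neighbor search, which is what lets these systems scale.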
This taxonomy is far from complete or accurate, and expresses my own view of the topic, so I would greatly appreciate comments and corrections.