Book Chapter: Multimedia Systems: Content Based Indexing and Retrieval

Contributed book chapter for IEEE handbook.
I will update the entire reference as soon as it is available.
ABSTRACT:
---------
Multimedia data, such as text, audio, images and video, is rapidly evolving as the main form for the creation, exchange, and storage of information in the modern era. This is primarily attributed to rapid advances in the three major technologies that determine its growth: VLSI technology, which is producing greater processing power; broadband networks (ISDN, ATM, etc.), which are providing much higher bandwidth for many practical applications; and multimedia compression standards (JPEG, H.263, MPEG, MP3, etc.), which enable efficient storage and communication. The combination of these three advances is spurring the creation and processing of increasingly high-volume multimedia data, along with its efficient compression and transmission over high-bandwidth networks. This trend towards the removal of any conceivable bottleneck in using multimedia, and its impact on a whole spectrum of users, from advanced research organizations to home users, has led to the explosive growth of visual information available in the form of digital libraries and online multimedia archives. According to a press release by Google Inc. in December 2001, the search engine offers access to over 3 billion web documents and its Image search comprises more than 330 million images. AltaVista has been serving around 25 million search queries per day in more than 25 languages, with its multimedia search featuring over 45 million images, videos and audio clips. This explosive growth of multimedia data accessible to users poses a whole new set of challenges relating to its storage and retrieval. The current technology of text-based indexing and retrieval implemented for relational databases does not provide practical solutions for this problem of managing huge multimedia repositories.
Most commercially available multimedia indexing and search systems index the media based on keyword annotations and use standard text-based indexing and retrieval mechanisms to store and retrieve multimedia data. This method of keyword-based indexing and retrieval has several limitations, especially in the context of multimedia databases. First, it is often difficult to describe the content of a multimedia object in human language, for example an image having complicated texture patterns. Second, manual annotation of text phrases for a large database is prohibitively laborious in terms of time and effort. Third, since users may have different interests in the same multimedia object, it is difficult to describe it with a complete set of keywords. Finally, even if all relevant object characteristics are annotated, difficulty may still arise due to the use of different indexing languages or vocabularies by different users. As recently as the 1990s, these major drawbacks of searching visual media based on textual annotations were recognized as unavoidable, which prompted a surge of interest in content-based solutions [16]. In content-based retrieval, manual annotation of visual media is avoided, and indexing and retrieval are instead performed on the basis of the media content itself. There have been extensive studies on the design of automatic content-based indexing and retrieval (CBIR) systems. For visual media, these contents may include color, shape, texture, motion, etc. For audio/speech data, contents may include phonemes, pitch, rhythm, cepstral coefficients, etc. Studies of human visual perception indicate that there exists a gradient of sophistication in human perception, ranging from seemingly primitive inferences of shapes, textures, colors, etc. to complex notions of structures such as chairs, buildings, affordances, and to cognitive processes such as recognition of emotions and feelings.
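To make the idea of indexing on media content concrete, the following is a minimal sketch of retrieval by color alone. It is an illustration under stated assumptions, not the method of any particular system reviewed in the chapter: images are assumed to be plain lists of (R, G, B) pixels, the feature is a coarsely quantized color histogram, and similarity is measured by histogram intersection. The function names (color_histogram, build_index, query) and the bin count are hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # assumed layout: (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized RGB histogram with bins_per_channel**3 bins."""
    if not pixels:
        raise ValueError("image has no pixels")
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def build_index(images: Dict[str, Sequence[Pixel]]) -> Dict[str, List[float]]:
    """Indexing step: compute and store one feature vector per image."""
    return {name: color_histogram(pixels) for name, pixels in images.items()}


def query(index: Dict[str, List[float]], example: Sequence[Pixel],
          top_k: int = 3) -> List[Tuple[str, float]]:
    """Query by example: rank indexed images by similarity to the example."""
    q = color_histogram(example)
    scored = [(name, histogram_intersection(q, h)) for name, h in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Tiny synthetic "images" standing in for real pixel data.
    images = {
        "sunset": [(250, 40, 30)] * 90 + [(20, 20, 20)] * 10,
        "forest": [(30, 200, 40)] * 100,
        "sea": [(20, 40, 220)] * 100,
    }
    index = build_index(images)
    for name, score in query(index, [(240, 50, 20)] * 100):
        print(f"{name}: {score:.2f}")
```

No keyword is attached to any image here; the ranking depends only on pixel content. A color histogram discards spatial layout, which is one reason such a feature is usually combined with shape, texture, or motion descriptors.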
Given the multidisciplinary nature of the techniques for modeling, indexing and retrieval of multimedia data, efforts from many different communities of engineering, computer science and psychology have merged in the advancement of CBIR systems. But the field is still in its infancy and calls for more coherent efforts to make practical CBIR systems a reality. In particular, robust techniques are needed to develop semantically rich models to represent data, computationally efficient methods to compress, index, retrieve, and browse the information, and semantic visual interfaces integrating the above components into viable multimedia systems. This chapter presents a review of state-of-the-art research in the area of multimedia systems. In Section 2, we present a review of storage and coding techniques for different media types. Section 3 studies fundamental issues related to the representation of multimedia data and discusses salient indexing and retrieval approaches introduced in the literature. For the sake of compactness and focus, in this chapter we review only CBIR techniques for visual data, i.e. for images and videos; for a review of systems for audio data, readers are referred to [28], [15].
Attachment:
EE-Handbook-Chapter-Indexing.pdf
Description: Adobe PDF document