Advisory Committee Chair
Chengcui Zhang
Advisory Committee Members
Barrett R Bryant
Yuhua Song
Alan Sprague
Robert W Thacker
Document Type
Dissertation
Date of Award
2008
Degree Name by School
Doctor of Philosophy (PhD), College of Arts and Sciences
Abstract
This dissertation proposes a human-centered retrieval framework that automatically retrieves multimedia data based on their semantic content. In particular, the framework queries and searches images and videos in a multimedia database according to their visual content. For computers to understand the semantic content of images and videos, human guidance is necessary: by incorporating the user's Relevance Feedback (RF) on the retrieval results into the learning and retrieval mechanism, the semantic gap between humans and computers can be gradually bridged. High-dimensional feature vectors of multimedia data can also cause a dramatic increase in computation time, a problem known as the "Curse of Dimensionality." To alleviate this problem, clustering algorithms are designed to reduce the search space for retrieval and thus the time complexity. In addition, to facilitate the query and retrieval of video data, a multimedia database model is designed around the spatiotemporal nature of video data.

The proposed framework is composed of three major components: Interactive Content-Based Image Retrieval (CBIR), Semantic Video Retrieval, and a Spatiotemporal Multimedia Database Model. The Interactive CBIR component maps the region-based image retrieval problem to a Multiple Instance Learning (MIL) problem; a distance-based clustering algorithm and a semantic-based clustering algorithm are designed to reduce the search space, and the component supports both short-term and long-term learning. The Semantic Video Retrieval component emphasizes the study of spatiotemporal characteristics of, and relations among, semantic objects in videos. Traffic incidents in transportation surveillance videos and abnormal human interactions in indoor surveillance videos are used as case studies, and a semantic event retrieval system for intelligent surveillance systems is designed and implemented. RF plays a key role in the retrieval process, and various spatiotemporal event models and learning mechanisms are designed and tested. Since surveillance video database retrieval is a focus of interest in this research, an efficient conceptual Spatiotemporal Multimedia Database model is also designed to facilitate the query of spatiotemporal events of interest to the user; a case study on the proposed database model is provided using transportation surveillance videos.

In brief, the human-centered multimedia retrieval system proposed in this research focuses on alleviating the two problems mentioned above: the "Semantic Gap" and the "Curse of Dimensionality." The Interactive Region-Based Image Retrieval component and the Semantic Video Retrieval component both explore the use of RF in the learning and retrieval phases to address the "Semantic Gap," and both integrate RF with MIL to ease the burden on users of providing feedback on the retrieval results. To alleviate the "Curse of Dimensionality," semantic clustering algorithms are designed and implemented that consider both the low-level features of multimedia data and high-level human perception. To facilitate query and retrieval, a spatiotemporal multimedia database model is proposed that provides an efficient indexing scheme. Experimental results for each component are presented, and comparisons with related work show the effectiveness of the proposed framework.
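To illustrate the MIL formulation of region-based retrieval described above, the following minimal Python sketch (not taken from the dissertation) treats each image as a bag of region feature vectors, uses positive/negative relevance feedback on whole images to refine a target concept point, and ranks images by the distance of their closest region to that target. The function names, the gradient-style update rule, and the learning rate are assumptions made for illustration only.

import numpy as np

def bag_distance(bag, target):
    # Distance from an image (a bag of region feature vectors) to the target
    # concept: the distance of its closest region. Under the MIL assumption,
    # one relevant region is enough to make the whole image relevant.
    return min(np.linalg.norm(region - target) for region in bag)

def update_target(target, feedback_bags, labels, lr=0.1):
    # One relevance-feedback round: move the target toward the closest region
    # of each positively labeled image and away from the closest region of
    # each negatively labeled one. (A gradient-style heuristic; the
    # dissertation's actual MIL learner may differ.)
    for bag, label in zip(feedback_bags, labels):
        closest = min(bag, key=lambda r: np.linalg.norm(r - target))
        step = lr * (closest - target)
        target = target + step if label == 1 else target - step
    return target

def retrieve(database, target, k=10):
    # Rank all images by bag-to-target distance and return the top-k image ids.
    ranked = sorted(database, key=lambda image_id: bag_distance(database[image_id], target))
    return ranked[:k]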
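The search-space reduction idea behind the clustering component can be sketched as follows, assuming a simple k-means partition of the low-level feature space; the dissertation's distance-based and semantic-based clustering algorithms are more elaborate, so this is only an illustration of the general indexing strategy, with hypothetical parameter names.

import numpy as np
from sklearn.cluster import KMeans

def build_index(features, n_clusters=32, seed=0):
    # Offline step: partition the feature vectors into clusters so that a
    # query only needs to scan a few of them instead of the whole database.
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)

def search(km, features, query, n_probe=3, k=10):
    # Online step: probe only the n_probe clusters whose centroids are closest
    # to the query, then rank the candidate vectors in those clusters exactly.
    centroid_dist = np.linalg.norm(km.cluster_centers_ - query, axis=1)
    probe = np.argsort(centroid_dist)[:n_probe]
    candidates = np.where(np.isin(km.labels_, probe))[0]
    dists = np.linalg.norm(features[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]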
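In the spirit of the surveillance case studies, a spatiotemporal event query can be pictured as a predicate over tracked object trajectories. The sketch below is hypothetical and not the dissertation's event model: it flags intervals during which two tracked objects (e.g., vehicles in a transportation surveillance video) stay within a distance threshold for a minimum number of frames; the class, thresholds, and coordinate convention are illustrative assumptions.

import math
from dataclasses import dataclass

@dataclass
class Observation:
    frame: int   # frame index (temporal dimension)
    x: float     # object centroid in image coordinates (spatial dimension)
    y: float

def close_events(track_a, track_b, max_dist=20.0, min_frames=30):
    # Scan track_a in temporal order and report intervals during which the two
    # objects stay within max_dist of each other for at least min_frames frames.
    pos_b = {o.frame: o for o in track_b}
    events, start, last = [], None, None
    for a in sorted(track_a, key=lambda o: o.frame):
        b = pos_b.get(a.frame)
        near = b is not None and math.hypot(a.x - b.x, a.y - b.y) <= max_dist
        if near:
            start = a.frame if start is None else start
            last = a.frame
        elif start is not None:
            if last - start + 1 >= min_frames:
                events.append((start, last))
            start, last = None, None
    if start is not None and last - start + 1 >= min_frames:
        events.append((start, last))
    return events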
Recommended Citation
Chen, Xin, "Human-Centered Semantic Retrieval In Multimedia Databases" (2008). All ETDs from UAB. 3681.
https://digitalcommons.library.uab.edu/etd-collection/3681