Open-Vocabulary Scene Graph Generation With Compositional Prompt-Tuning

Mirza Tanzim Sami, University of Alabama at Birmingham

Advisory Committee Chair

Da Yan

Advisory Committee Members

Chengcui Zhang

Tianyang Wang

Sidharth Kumar

Kai Zhao

Zhe Jiang

Document Type

Dissertation

Date of Award

2023

Degree Name by School

Doctor of Philosophy (PhD) College of Arts and Sciences

Abstract

Scene graphs are explicit data structures that express the semantics of an image. They are semantically richer than simple image encodings from Convolutional Neural Networks (CNNs) because objects (large or small) are represented as distinct nodes and the relationships between them are expressed as labeled edges in a graph structure. Thus, scene graphs can express not only the visual context of an image but also the relationship context between pairs of objects. This added context makes scene graphs well suited to several downstream vision-language (VL) tasks such as Visual Question Answering (VQA), image captioning, and text-to-image retrieval. Even though research on scene graph generation (SGG) models has surged in recent years, several severe drawbacks prevent scene graphs from being widely adopted for VL tasks: (1) the semantic gap between the vision and language modalities, which limits compositional performance; (2) the restriction of most SGG models to a closed set of object and relationship labels, which limits their real-life application; and (3) the quality of pairwise object relationship (edge) predictions, which suffers from a biased, long-tailed distribution. This work describes a novel Open-vocabulary Scene Graph Generation model with Compositional Prompt Tuning (OCoPro) that enforces compositionality through a Cross-modal Composition Refinement (CCR) module and uses compositional prompts to efficiently predict open-vocabulary object and relationship labels that are not limited to the closed-set training data. This significantly improves the quality and robustness of scene graphs in real-world applications. Empirical results show that OCoPro outperforms state-of-the-art scene graph generation models under different SGG evaluation protocols.
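To make the "objects as nodes, relationships as labeled edges" description concrete, the following is a minimal sketch of a scene graph as a plain Python data structure. The class name `SceneGraph`, its methods, and the example labels are hypothetical illustrations chosen for this sketch; they are not taken from the OCoPro implementation, which predicts these labels from images rather than building them by hand.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical minimal scene graph: objects are nodes, relationships are labeled directed edges."""

    # Node labels; a node's id is its index in this list.
    objects: list[str] = field(default_factory=list)
    # Labeled edges stored as (subject id, predicate label, object id).
    relations: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, label: str) -> int:
        """Add an object node and return its id."""
        self.objects.append(label)
        return len(self.objects) - 1

    def add_relation(self, subj: int, predicate: str, obj: int) -> None:
        """Add a labeled edge between two existing object nodes."""
        if not (0 <= subj < len(self.objects) and 0 <= obj < len(self.objects)):
            raise IndexError("relation endpoints must be existing object nodes")
        self.relations.append((subj, predicate, obj))

    def triplets(self) -> list[tuple[str, str, str]]:
        """Return edges as human-readable (subject, predicate, object) triplets."""
        return [(self.objects[s], p, self.objects[o]) for s, p, o in self.relations]


# Usage: a hand-built graph for an image of a man riding a horse while wearing a hat.
g = SceneGraph()
man = g.add_object("man")
horse = g.add_object("horse")
hat = g.add_object("hat")
g.add_relation(man, "riding", horse)
g.add_relation(man, "wearing", hat)
print(g.triplets())  # [('man', 'riding', 'horse'), ('man', 'wearing', 'hat')]
```

Storing edges as subject-predicate-object triplets mirrors how SGG outputs are usually evaluated; in a closed-vocabulary model the predicate strings would come from a fixed label set, whereas an open-vocabulary model such as the one described above is not limited to the labels seen in training.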

Recommended Citation

Sami, Mirza Tanzim, "Open-Vocabulary Scene Graph Generation With Compositional Prompt-Tuning" (2023). All ETDs from UAB. 3523.
https://digitalcommons.library.uab.edu/etd-collection/3523

Included in

Arts and Humanities Commons