All ETDs from UAB

Advisor(s)

Qing Tian

Committee Member(s)

Chengcui Zhang

Baocheng Geng

Tianyang Wang

Rachel J Smith

School

College of Arts and Sciences

Document Type

Dissertation

Department

Computer and Information Sciences

Date of Award

4-16-2025

Degree Name by School

Doctor of Philosophy (PhD), College of Arts and Sciences

Abstract

Object detection is a critical component of autonomous driving, requiring real-time, robust perception to ensure safety. However, state-of-the-art deep neural network object detectors typically incur high computational cost and memory footprint, hindering their deployment in resource-constrained environments such as self-driving vehicles. This dissertation addresses the need for efficient yet accurate detectors by leveraging knowledge distillation (KD), a model compression technique that transfers knowledge from a high-capacity teacher model to a lightweight student model. While KD has seen success in image classification, its application to object detection poses unique challenges due to multiple instances per image and complex output structures. To overcome these challenges, this dissertation presents five novel KD frameworks tailored for object detection. First, Adaptive Instance Distillation (AID) selectively weights the distillation of each object instance based on the teacher’s prediction confidence, enabling the student to focus on reliably learned knowledge. Second, Multi-Teacher AID (MAID) aggregates complementary knowledge from multiple teachers to provide richer supervision for the student. Third, Gradient-Guided KD (GKD) leverages teacher gradient information to prioritize critical features that most impact the detection loss, thereby guiding the student to imitate the most pertinent representations. Fourth, CLoCKDistill (Consistent Location-and-Context-aware KD) addresses transformer-based detectors (DETRs) by distilling global context from the teacher’s transformer encoder and aligning teacher–student attention on object locations for more effective knowledge transfer. Finally, ACAM-KD (Adaptive and Cooperative Attention Masking for KD) introduces an interactive distillation process, in which student–teacher feature maps are adaptively fused via cross-attention and dynamically masked to highlight important spatial and channel-wise information.
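The confidence-weighted instance distillation idea behind AID can be illustrated with a minimal sketch. This is not the dissertation's actual implementation; the function name, temperature value, and weighting scheme (using the teacher's maximum class probability as the per-instance weight) are illustrative assumptions. The sketch computes a temperature-softened KL divergence between teacher and student class predictions for each detected instance, then weights each instance's term by the teacher's confidence:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidence_weighted_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Illustrative sketch of confidence-weighted instance distillation.

    teacher_logits, student_logits: (N, C) arrays of per-instance class
    logits for N detected instances and C classes. Each instance's KL
    distillation term is weighted by the teacher's max class probability,
    so the student focuses on instances the teacher predicts confidently.
    """
    p_t = softmax(teacher_logits, temperature)   # teacher soft targets
    p_s = softmax(student_logits, temperature)   # student soft predictions
    # Per-instance KL(teacher || student), with epsilon for stability.
    kl = np.sum(p_t * (np.log(p_t + 1e-8) - np.log(p_s + 1e-8)), axis=-1)
    # Teacher confidence per instance (at temperature 1).
    w = softmax(teacher_logits, 1.0).max(axis=-1)
    return float(np.sum(w * kl) / np.sum(w))
```

When teacher and student agree, the loss is zero; instances the teacher is unsure about contribute proportionally less to the gradient signal the student receives.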
Extensive experiments on benchmark datasets (e.g., KITTI and COCO) demonstrate the efficacy of these approaches. The proposed techniques consistently boost detection accuracy (achieving up to 6% mAP gains) while substantially reducing model complexity. The resulting student models, including one-stage, two-stage, and transformer-based detectors, attain performance comparable to much larger teachers at a fraction of the computation, contributing to the development of scalable and deployable vision systems.

ProQuest Publication Number

31939597

ProQuest ID

3253956294
