Advisor(s)
Qing Tian
Committee Member(s)
Chengcui Zhang
Baocheng Geng
Tianyang Wang
Rachel J Smith
School
College of Arts and Sciences
Document Type
Dissertation
Department
Computer and Information Sciences
Date of Award
4-16-2025
Degree Name by School
Doctor of Philosophy (PhD), College of Arts and Sciences
Abstract
Object detection is a critical component of autonomous driving, requiring real-time, robust perception to ensure safety. However, state-of-the-art deep neural network object detectors typically incur high computational cost and memory footprint, hindering their deployment in resource-constrained environments such as self-driving vehicles. This dissertation addresses the need for efficient yet accurate detectors by leveraging knowledge distillation (KD), a model compression technique that transfers knowledge from a high-capacity teacher model to a lightweight student model. While KD has seen success in image classification, its application to object detection poses unique challenges because each image contains multiple object instances and the detection outputs have complex structure.

To overcome these challenges, this dissertation presents five novel KD frameworks tailored for object detection. First, Adaptive Instance Distillation (AID) selectively weights the distillation of each object instance based on the teacher’s prediction confidence, enabling the student to focus on reliably learned knowledge. Second, Multi-Teacher AID (MAID) aggregates complementary knowledge from multiple teachers to provide richer supervision for the student. Third, Gradient-Guided KD (GKD) leverages teacher gradient information to prioritize the critical features that most affect the detection loss, thereby guiding the student to imitate the most pertinent representations. Fourth, CLoCKDistill (Consistent Location-and-Context-aware KD) targets transformer-based detectors (DETRs) by distilling global context from the teacher’s transformer encoder and aligning teacher–student attention on object locations for more effective knowledge transfer. Finally, ACAM-KD (Adaptive and Cooperative Attention Masking for KD) introduces an interactive distillation process in which student–teacher feature maps are adaptively fused via cross-attention and dynamically masked to highlight important spatial and channel-wise information.
Extensive experiments on benchmark datasets (e.g., KITTI and COCO) demonstrate the efficacy of these approaches. The proposed techniques consistently boost detection accuracy (achieving up to 6% mAP gains) while substantially reducing model complexity. The resulting student models, including one-stage, two-stage, and transformer-based detectors, attain performance comparable to their much larger teachers at a fraction of the computational cost, contributing to the development of scalable and deployable vision systems.
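For context, the classical classification-form KD loss that these detection-oriented frameworks build upon can be sketched as below. This is a minimal illustration of temperature-softened teacher–student distillation, not code from the dissertation; the function names and the toy logits are ours.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: higher temperature yields softer targets."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    The T^2 factor keeps gradient magnitudes comparable across temperatures,
    as in classical classification distillation.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (temperature ** 2) * kl

# A student that matches the teacher exactly incurs zero distillation loss;
# any mismatch yields a positive penalty.
assert abs(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])) < 1e-12
assert kd_loss([0.0, 0.0, 3.0], [1.0, 2.0, 3.0]) > 0.0
```

The detection frameworks summarized above extend this idea by deciding *where* and *how strongly* to apply such a mimicry loss, e.g. per object instance (AID), per teacher (MAID), or per feature location (GKD, CLoCKDistill, ACAM-KD).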
ProQuest Publication Number
31939597
ProQuest ID
3253956294
Recommended Citation
Lan, Qizhen, "Knowledge Distillation for Efficient Object Detection: Toward Scalable and Deployable Vision Models" (2025). All ETDs from UAB. 7235.
https://digitalcommons.library.uab.edu/etd-collection/7235