Advisor(s)
Qing Tian
Committee Member(s)
Chengcui Zhang
Baocheng Geng
Tianyang Wang
Rachel J Smith
School
College of Arts and Sciences
Document Type
Dissertation
Department
Computer and Information Sciences
Date of Award
4-16-2025
Degree Name by School
Doctor of Philosophy (PhD), College of Arts and Sciences
Abstract
Object detection is a critical component of autonomous driving, requiring real-time, robust perception to ensure safety. However, state-of-the-art deep neural network object detectors typically incur high computational cost and memory footprint, hindering their deployment in resource-constrained environments such as self-driving vehicles. This dissertation addresses the need for efficient yet accurate detectors by leveraging knowledge distillation (KD), a model compression technique that transfers knowledge from a high-capacity teacher model to a lightweight student model. While KD has seen success in image classification, its application to object detection poses unique challenges because each image contains multiple object instances and the detection outputs have complex structure.

To overcome these challenges, this dissertation presents five novel KD frameworks tailored for object detection. First, Adaptive Instance Distillation (AID) selectively weights the distillation of each object instance based on the teacher’s prediction confidence, enabling the student to focus on reliably learned knowledge. Second, Multi-Teacher AID (MAID) aggregates complementary knowledge from multiple teachers to provide richer supervision for the student. Third, Gradient-Guided KD (GKD) leverages teacher gradient information to prioritize the critical features that most affect the detection loss, thereby guiding the student to imitate the most pertinent representations. Fourth, CLoCKDistill (Consistent Location-and-Context-aware KD) targets transformer-based detectors (DETRs) by distilling global context from the teacher’s transformer encoder and aligning teacher–student attention on object locations for more effective knowledge transfer. Finally, ACAM-KD (Adaptive and Cooperative Attention Masking for KD) introduces an interactive distillation process in which student–teacher feature maps are adaptively fused via cross-attention and dynamically masked to highlight important spatial and channel-wise information.
Extensive experiments on benchmark datasets (e.g., KITTI and COCO) demonstrate the efficacy of these approaches. The proposed techniques consistently boost detection accuracy (achieving up to 6% mAP gains) while substantially reducing model complexity. The resulting student models, including one-stage, two-stage, and transformer-based detectors, attain performance comparable to their much larger teachers at a fraction of the computational cost, contributing to the development of scalable and deployable vision systems.
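For context, the classical classification-form KD loss that these detection-oriented frameworks build upon can be sketched as below. This is a minimal illustration of temperature-softened teacher–student distillation, not code from the dissertation; the function names and the toy logits are ours.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: higher temperature yields softer targets."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    The T^2 factor keeps gradient magnitudes comparable across temperatures,
    as in classical classification distillation.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (temperature ** 2) * kl

# A student that matches the teacher exactly incurs zero distillation loss;
# any mismatch yields a positive penalty.
assert abs(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])) < 1e-12
assert kd_loss([0.0, 0.0, 3.0], [1.0, 2.0, 3.0]) > 0.0
```

The detection frameworks summarized above extend this idea by deciding *where* and *how strongly* to apply such a mimicry loss, e.g. per object instance (AID), per teacher (MAID), or per feature location (GKD, CLoCKDistill, ACAM-KD).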
ProQuest Publication Number
31939597
ProQuest ID
3253956294
Recommended Citation
Lan, Qizhen, "Knowledge Distillation for Efficient Object Detection: Toward Scalable and Deployable Vision Models" (2025). All ETDs from UAB. 7235.
https://digitalcommons.library.uab.edu/etd-collection/7235