Iterative Data Augmentation For Enhancing Deep Learning Performance With Limited Training Data

Avinash Singh, University of Alabama at Birmingham

Advisor(s)

Arie Nakhmani

Committee Member(s)

Dinh Nguyen
Leon Jololian
Nikolay Sirakov
Rachel Smith

Document Type

Dissertation

Date of Award

1-27-2026

Degree Name

Doctor of Philosophy (PhD)

School

School of Engineering

Department

Computer Engineering

Abstract

Deep learning models have demonstrated impressive performance across many domains; however, their effectiveness depends heavily on large, well-annotated datasets. In practice, training data are often limited, which leads to overfitting, poor generalization, and degraded robustness and performance. Moreover, conventional augmentation techniques are typically static, do not adapt during training, and can produce geometrically inconsistent or unrealistic mixed images. This dissertation addresses three major challenges in data augmentation and model generalization: (1) the scarcity of labeled data and limited dataset size, (2) the absence of adaptive mechanisms for dynamically adjusting learning parameters during training, and (3) the creation of unnatural or misaligned composite images during pixel-level mixing. To overcome these limitations, three novel augmentation frameworks are proposed: Iterative Data Distillation and Augmentation (IDDA), Adaptive Data Augmentation (ADA), and Feature-Aligned Mixing Enhancement (FAME). Each is designed to solve one of these problems, and together they advance data-efficient deep learning.

In the first part of the dissertation, IDDA introduces an iterative data expansion approach for improving performance when only limited training data are available. The method builds on image distillation, which draws on the principles of knowledge distillation, to construct compact, high-quality distilled datasets. The Kernel Inducing Points (KIP) technique is used to transform the chosen dataset into a distilled subset, which is then merged with the original dataset. This distillation step is repeated until model accuracy begins to saturate, progressively enriching the dataset while avoiding redundancy.
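
As a rough illustration of the IDDA outer loop described above, the following Python sketch repeatedly distills the current training set, merges the distilled samples back in, and stops once validation accuracy stops improving. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the function name `iterative_distill_augment`, the `distill` and `evaluate` callbacks (placeholders where KIP distillation and model training/validation would plug in), and the `min_gain` saturation threshold are all hypothetical, and distilling the merged set in each round (rather than the original set) is an assumption.

```python
from typing import Callable, List, Tuple

import numpy as np

Array = np.ndarray
# Hypothetical hooks: distill(X, y) stands in for KIP distillation and returns a
# small synthetic (X_d, y_d); evaluate(X, y) trains a model on (X, y) and
# returns its validation accuracy.
DistillFn = Callable[[Array, Array], Tuple[Array, Array]]
EvalFn = Callable[[Array, Array], float]


def iterative_distill_augment(
    X: Array,
    y: Array,
    distill: DistillFn,
    evaluate: EvalFn,
    max_rounds: int = 10,
    min_gain: float = 1e-3,
) -> Tuple[Array, Array, List[float]]:
    """Grow (X, y) with distilled samples until validation accuracy saturates.

    Assumptions: each round distills the current merged set, and a round whose
    accuracy gain is below `min_gain` counts as saturation and is discarded.
    Returns the final data and the accuracy of every accepted dataset.
    """
    best_acc = evaluate(X, y)
    history = [best_acc]
    for _ in range(max_rounds):
        X_d, y_d = distill(X, y)                   # compact distilled subset
        X_new = np.concatenate([X, X_d], axis=0)   # merge with current data
        y_new = np.concatenate([y, y_d], axis=0)
        acc = evaluate(X_new, y_new)
        if acc - best_acc < min_gain:              # accuracy has saturated
            break
        X, y, best_acc = X_new, y_new, acc         # accept the enlarged set
        history.append(acc)
    return X, y, history
```

With a toy `distill` that returns a few perturbed samples and an `evaluate` that rises with dataset size and then plateaus, the loop accepts rounds while accuracy climbs and returns the last set that improved it. Discarding the round that fails to improve is one reading of "until the model accuracy begins to saturate"; keeping that final round would be an equally plausible choice.
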
Experimental evaluations on Digit-MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 show that IDDA can expand dataset size up to fourfold and substantially improve classification accuracy. The iterative augmentation approach is especially effective for small datasets, improving generalization and reducing overfitting compared with traditional augmentation techniques.

In the second part of the dissertation, we present ADA, a dynamic, feedback-driven augmentation framework that addresses the limitations of static data augmentation by adapting learning parameters during training. ADA incorporates a novel Elliptical Cutout method that performs smooth, realistic occlusion by masking regions with soft elliptical boundaries, preserving spatial continuity and reducing information loss. Unlike conventional augmentation techniques that apply fixed transformations, ADA automatically modifies key training hyperparameters, such as batch size, dropout rate, and weight decay, in response to stagnation in validation accuracy: when validation accuracy fails to improve over multiple epochs, these parameters are adjusted to promote generalization and mitigate overfitting. Experimental validation on the CIFAR-10, Brain MRI, and Chest X-Ray datasets shows that ADA significantly improves classification accuracy and model robustness. The method achieves Area Under the Receiver Operating Characteristic Curve (AUC-ROC) values ranging from 98.19% to 99.85%, consistently outperforming conventional augmentation methods such as MixUp while remaining computationally efficient.

In the third part of the dissertation, FAME addresses the geometric distortion and ghosting artifacts produced by existing pixel-level mixing methods such as MixUp and CutMix. FAME introduces a geometry-aware, patch-wise mixing strategy that preserves object structure and visual realism in synthesized training samples.
The method uses keypoint detection and descriptor matching to identify corresponding regions between two images of the same class. For each reliable keypoint match, a local patch is extracted from the secondary image, geometrically aligned to the primary image through translation, rotation, and scale adjustments, and then smoothly blended using a Gaussian-weighted mask. This keeps the augmented images spatially and semantically consistent while retaining the primary image’s class label. FAME was evaluated on the CIFAR-10, CIFAR-100, and Brain Tumor MRI datasets using multiple pretrained architectures (VGG16, MobileNet, DenseNet121, and ResNet101) and consistently improved classification accuracy over state-of-the-art augmentation methods. The generated composite images look natural, show stronger structural coherence, and contain fewer artifacts, indicating the advantage of geometry-aligned feature mixing over pixel-level blending.

Together, the three proposed methods address the key challenges of limited data availability, non-adaptive learning, and geometric inconsistency in image synthesis. They advance data-efficient deep learning by enabling robust, scalable, and generalizable models that perform reliably even in data-limited environments.
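
The Elliptical Cutout used by ADA masks a region with a soft elliptical boundary instead of a hard-edged rectangle. A minimal NumPy sketch of that idea follows. The function name `elliptical_cutout`, the sigmoid edge profile, the `softness` parameter, the rotation `angle`, and the constant `fill` value are assumptions chosen for illustration; the abstract does not specify the exact mask profile.

```python
import numpy as np


def elliptical_cutout(
    image: np.ndarray,
    center: tuple,
    axes: tuple,
    angle: float = 0.0,
    softness: float = 0.1,
    fill: float = 0.0,
) -> np.ndarray:
    """Occlude a rotated ellipse whose edge fades out smoothly (illustrative sketch).

    image:    H x W or H x W x C array (returned as float64).
    center:   (row, col) of the ellipse centre.
    axes:     (a, b) semi-axes in pixels, along the rotated column and row directions.
    angle:    rotation of the ellipse in radians.
    softness: width of the sigmoid edge in normalised-radius units (assumed profile).
    fill:     value written inside the ellipse.
    """
    h, w = image.shape[:2]
    rows, cols = np.mgrid[0:h, 0:w]
    dy, dx = rows - center[0], cols - center[1]
    cos_t, sin_t = np.cos(angle), np.sin(angle)
    u = cos_t * dx + sin_t * dy          # coordinates in the ellipse's own frame
    v = -sin_t * dx + cos_t * dy
    r = np.sqrt((u / axes[0]) ** 2 + (v / axes[1]) ** 2)   # r < 1 inside the ellipse
    z = np.clip((r - 1.0) / softness, -50.0, 50.0)         # clip to avoid exp overflow
    mask = 1.0 / (1.0 + np.exp(z))       # ~1 inside, ~0 outside, smooth in between
    if image.ndim == 3:
        mask = mask[..., None]           # broadcast over channels
    return image.astype(np.float64) * (1.0 - mask) + fill * mask
```

In training, `center`, `axes`, and `angle` would typically be sampled at random for each image; a smaller `softness` approaches a hard-edged elliptical cutout, while a larger one spreads the occlusion more gradually into its surroundings.
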
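
ADA's feedback rule (adjust batch size, dropout rate, and weight decay when validation accuracy stops improving over multiple epochs) can be sketched as a small controller called once per epoch. This is a hypothetical sketch rather than the dissertation's code: the class name `AdaptiveHyperparams`, the `patience` and `min_delta` values, and the specific update directions and step sizes (halving the batch size, adding 0.1 to dropout, doubling weight decay, each within a bound) are assumptions; the abstract states only that these parameters are adjusted to promote generalization.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveHyperparams:
    """Plateau-triggered hyperparameter adjustment (illustrative sketch)."""
    batch_size: int = 128
    dropout: float = 0.2
    weight_decay: float = 1e-4
    patience: int = 3             # epochs without improvement before adjusting
    min_delta: float = 1e-3       # smallest accuracy gain counted as improvement
    min_batch_size: int = 16
    max_dropout: float = 0.5
    max_weight_decay: float = 1e-2

    def __post_init__(self) -> None:
        self.best_acc = float("-inf")
        self.stale_epochs = 0
        self.num_adjustments = 0

    def step(self, val_acc: float) -> bool:
        """Record one epoch's validation accuracy; return True if hyperparameters changed."""
        if val_acc > self.best_acc + self.min_delta:
            self.best_acc = val_acc
            self.stale_epochs = 0
            return False
        self.stale_epochs += 1
        if self.stale_epochs < self.patience:
            return False
        # Stagnation detected: move toward stronger regularisation (assumed directions).
        self.batch_size = max(self.min_batch_size, self.batch_size // 2)
        self.dropout = min(self.max_dropout, round(self.dropout + 0.1, 4))
        self.weight_decay = min(self.max_weight_decay, self.weight_decay * 2)
        self.stale_epochs = 0
        self.num_adjustments += 1
        return True
```

After `step` returns True, the training loop would apply the new values; in PyTorch, for example, by rebuilding the `DataLoader` with the new `batch_size`, setting `p` on each `nn.Dropout` module, and updating `weight_decay` in each entry of `optimizer.param_groups`.
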
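
For a single reliable keypoint match, FAME's align-and-blend step can be sketched as follows: pixels around the primary keypoint are mapped into the secondary image by the similarity transform implied by the two keypoints (translation, relative rotation, relative scale), and the sampled values are mixed in with a Gaussian weight. This is a simplified sketch, not the dissertation's implementation; the function name `blend_aligned_patch`, the `(row, col, angle, scale)` keypoint tuple, nearest-neighbour sampling, and the `radius`, `sigma`, and `alpha` parameters are assumptions. Keypoint detection and descriptor matching (for instance with an OpenCV detector such as ORB or SIFT and a brute-force matcher) are assumed to happen beforehand, with angles converted to radians in the same row/column frame used here.

```python
import numpy as np


def blend_aligned_patch(
    primary: np.ndarray,
    secondary: np.ndarray,
    kp1: tuple,
    kp2: tuple,
    radius: int = 8,
    sigma: float = 4.0,
    alpha: float = 0.5,
) -> np.ndarray:
    """Blend a similarity-aligned patch of `secondary` into `primary` around kp1.

    kp1, kp2: matched keypoints as (row, col, angle_rad, scale) in primary and
    secondary. An offset d = (dx, dy) from kp1 maps to kp2 + s * R(dtheta) d,
    with s = scale2 / scale1 and dtheta = angle2 - angle1. Values are sampled by
    nearest neighbour and mixed with weight alpha * exp(-|d|^2 / (2 sigma^2)).
    """
    out = primary.astype(np.float64).copy()
    h, w = primary.shape[:2]
    r1, c1, a1, s1 = kp1
    r2, c2, a2, s2 = kp2
    s, dth = s2 / s1, a2 - a1
    cos_t, sin_t = np.cos(dth), np.sin(dth)
    top, left = int(round(r1)) - radius, int(round(c1)) - radius
    for pr in range(max(0, top), min(h, top + 2 * radius + 1)):
        for pc in range(max(0, left), min(w, left + 2 * radius + 1)):
            dy, dx = pr - r1, pc - c1                      # offset from kp1
            qr = r2 + s * (sin_t * dx + cos_t * dy)        # rotate, scale, translate
            qc = c2 + s * (cos_t * dx - sin_t * dy)
            qi, qj = int(round(qr)), int(round(qc))        # nearest-neighbour sample
            if not (0 <= qi < secondary.shape[0] and 0 <= qj < secondary.shape[1]):
                continue
            wgt = alpha * np.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))
            out[pr, pc] = (1.0 - wgt) * out[pr, pc] + wgt * secondary[qi, qj]
    return out
```

Applying the function once per reliable match builds up the composite, and the result keeps the primary image’s label, as described in the abstract. Bilinear sampling would give smoother patches than the nearest-neighbour lookup used here for brevity.
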

Keywords

Data Augmentation; Deep Learning; Geometry-Aware Mixing; Image Classification; Iterative Data Distillation; Machine Learning

ProQuest Publication Number

32282722

ProQuest ID

https://proquest.com/docview/3297388634

ISBN

9798273349728

Recommended Citation

Singh, Avinash, "Iterative Data Augmentation For Enhancing Deep Learning Performance With Limited Training Data" (2026). ETDs from 2020-2029. 55.
https://digitalcommons.library.uab.edu/etd-2020s/55

Included in

Computer Engineering Commons