All ETDs from UAB

Advisory Committee Chair

Nengjun Yi

Advisory Committee Members

Casey Morrow

Charity Morgan

Tomi Akinyemiju

Xiangqin Cui

Document Type


Date of Award


Degree Name by School

Doctor of Philosophy (PhD) School of Public Health


This dissertation develops statistical methods in two areas: cancer survival prediction and microbiome data analysis. Heterogeneity in tumor characteristics, prognosis, and survival has been a persistent problem in cancer prediction and prognosis for many decades. One of the main shortcomings of past studies is the failure to incorporate prior biological information into the predictive model, despite strong evidence of the pathway-based genetic nature of cancer. In paper 1, to address this problem, we propose a two-stage approach that incorporates pathway information into prognostic modeling using large-scale gene expression data. In the first stage, we fit all predictors within each pathway using a penalized Cox model and a Bayesian hierarchical Cox model. In the second stage, we combine the leave-one-out cross-validated prognostic scores of all pathways obtained in the first stage as new predictors to build a super prediction model. The two-stage approach substantially improved predictive power compared with the gene-based approach. Advances in next-generation sequencing (NGS) technology enable researchers to collect large volumes of metagenomic sequencing data. These data provide valuable resources for investigating associations between the microbiome and host environmental/clinical factors. Microbiome count measurements have well-known properties, including varied total sequence reads across samples, over-dispersion, and zero-inflation. In addition, microbiome studies usually collect samples with hierarchical structures or longitudinal designs, which introduce correlation among the samples and thus further complicate the analysis and interpretation of microbiome data. In paper 2, we propose negative binomial mixed models (NBMMs) for detecting associations between the microbiome and host environmental/clinical factors in microbiome count data with clustered structures.
The proposed mixed-effects models account for correlation among the samples by incorporating random effects into the commonly used fixed-effects negative binomial model, and can efficiently handle over-dispersion and varying total reads. In paper 3, we propose zero-inflated Gaussian mixed models (ZIGMMs) for detecting associations between the microbiome and host environmental/clinical factors in longitudinal microbiome data. The proposed ZIGMMs incorporate random effects into the zero-inflated Gaussian model to account for the dynamic correlation among samples. Both mixed models are applicable to future research on predicting disease from microbiome data.
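
For orientation, a negative binomial mixed model of the kind described above can be written generically as follows. This is a sketch of the standard form, not necessarily the exact specification used in the dissertation. All symbols are assumptions introduced here: $y_{ij}$ is the count of a taxon in sample $j$ of cluster $i$, $T_{ij}$ the total sequence reads, $x_{ij}$ the host covariates, $z_{ij}$ the random-effect covariates, and $\theta$ the dispersion parameter.

```latex
\begin{align*}
y_{ij} \mid b_i &\sim \mathrm{NB}(\mu_{ij}, \theta), \\
\log \mu_{ij} &= \log T_{ij} + x_{ij}^{\top}\beta + z_{ij}^{\top} b_i, \\
b_i &\sim N(0, \Psi), \\
\operatorname{Var}(y_{ij} \mid b_i) &= \mu_{ij} + \mu_{ij}^{2}/\theta .
\end{align*}
```

In this form the offset $\log T_{ij}$ adjusts for varying total reads, $\theta$ captures over-dispersion, and the shared random effects $b_i$ induce correlation among samples from the same cluster.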

Included in

Public Health Commons