Statistical Methods in Cancer Survival Prediction and Microbiome Data Analysis

Xinyan Zhang

Advisor(s)

Nengjun Yi

Committee Member(s)

Casey Morrow

Charity Morgan

Tomi Akinyemiju

Xiangqin Cui

Document Type

Dissertation

Date of Award

2017

Degree Name by School

Doctor of Philosophy (PhD) School of Public Health

Abstract

This dissertation focused on developing statistical methods in two areas: cancer survival prediction and microbiome data analysis. Heterogeneity in tumor characteristics, prognosis, and survival has been a persistent problem in cancer prediction and prognosis for many decades. One of the main shortcomings of past studies is the failure to incorporate prior biological information into the predictive model, despite strong evidence for the pathway-based genetic nature of cancer. In paper 1, to address this problem, we propose a two-stage approach that incorporates pathway information into prognostic modeling using large-scale gene expression data. In the first stage, we fit all predictors within each pathway using a penalized Cox model and a Bayesian hierarchical Cox model. In the second stage, we use the leave-one-out cross-validated prognostic scores of all pathways obtained in the first stage as new predictors to build a super prediction model. The two-stage approach substantially improved prediction power compared with the gene-based approach.

Advances in next-generation sequencing (NGS) technology enable researchers to collect large volumes of metagenomic sequencing data. These data provide valuable resources for investigating associations between the microbiome and host environmental/clinical factors. In addition to the well-known properties of microbiome count measurements, including varying total sequence reads across samples, over-dispersion, and zero-inflation, microbiome studies usually collect samples with hierarchical structures or longitudinal designs, which introduce correlation among samples and further complicate the analysis and interpretation of microbiome data. In paper 2, we propose negative binomial mixed models (NBMMs) for detecting associations between the microbiome and host environmental/clinical factors in microbiome count data with clustered structures.
The proposed mixed-effects models account for correlation among samples by incorporating random effects into the commonly used fixed-effects negative binomial model, and they efficiently handle over-dispersion and varying total reads. In paper 3, we propose zero-inflated Gaussian mixed models (ZIGMMs) for detecting associations between the microbiome and host environmental/clinical factors in longitudinal microbiome data. The proposed ZIGMMs incorporate random effects into the zero-inflated Gaussian model to account for the dynamic correlation among samples. Both mixed models are applicable to future research on predicting disease from microbiome data.
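The two-stage pathway approach from paper 1 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: a ridge (L2)-penalized Cox model fitted by gradient ascent on the Breslow partial likelihood stands in for both the penalized and the Bayesian hierarchical Cox models, tied event times are ignored, and all helper names (`fit_ridge_cox`, `loo_scores`, `two_stage`, `c_index`, `simulate`) and the toy data are hypothetical.

```python
import math
import random


def fit_ridge_cox(X, time, event, lam=1.0, lr=0.5, iters=100):
    """Ridge-penalized Cox regression by gradient ascent on the
    Breslow partial log-likelihood (assumes no tied event times)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    order = sorted(range(n), key=lambda i: time[i])  # ascending times
    for _ in range(iters):
        w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [-lam * b for b in beta]  # gradient of -lam/2 * ||beta||^2
        s0, s1 = 0.0, [0.0] * p
        # Walk from the latest time backwards so s0/s1 hold risk-set sums.
        for i in reversed(order):
            s0 += w[i]
            for k in range(p):
                s1[k] += w[i] * X[i][k]
            if event[i]:
                for k in range(p):
                    grad[k] += X[i][k] - s1[k] / s0
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def linear_score(beta, row):
    return sum(b * x for b, x in zip(beta, row))


def loo_scores(X, time, event, lam=1.0):
    """Stage 1 for one pathway: leave-one-out cross-validated prognostic scores."""
    scores = []
    for i in range(len(X)):
        keep = [j for j in range(len(X)) if j != i]
        beta = fit_ridge_cox([X[j] for j in keep], [time[j] for j in keep],
                             [event[j] for j in keep], lam)
        scores.append(linear_score(beta, X[i]))  # score for the held-out sample
    return scores


def two_stage(pathways, time, event, lam=1.0):
    """Stage 2: use the per-pathway LOO scores as predictors of a super model."""
    names = sorted(pathways)
    cols = {name: loo_scores(pathways[name], time, event, lam) for name in names}
    Z = [[cols[name][i] for name in names] for i in range(len(time))]
    return names, Z, fit_ridge_cox(Z, time, event, lam)


def c_index(risk, time, event):
    """Harrell's concordance index (higher risk should mean an earlier event)."""
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def simulate(n=30, seed=1):
    """Hypothetical toy data: three 3-gene pathways; only pathway 'A' is prognostic."""
    rng = random.Random(seed)
    pathways = {name: [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
                for name in ("A", "B", "C")}
    time, event = [], []
    for i in range(n):
        t = rng.expovariate(math.exp(sum(pathways["A"][i])))  # event time
        c = rng.expovariate(0.3)  # independent censoring time
        time.append(min(t, c))
        event.append(1 if t <= c else 0)
    return pathways, time, event


if __name__ == "__main__":
    pathways, time, event = simulate()
    names, Z, beta = two_stage(pathways, time, event)
    risk = [linear_score(beta, row) for row in Z]
    print(dict(zip(names, (round(b, 3) for b in beta))))
    print("in-sample C-index:", round(c_index(risk, time, event), 3))
```

Because each stage-one score comes from a model that never saw the held-out sample, the super model's predictors are not fitted to the very outcomes they are used to predict. The C-index printed by the demo is computed in-sample on the stage-two fit, so it is optimistic and serves only as a sanity check on the toy data.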
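The two mixed models from papers 2 and 3 can be written out as below. This is a sketch of one standard formulation, and several details are assumptions not stated in the abstract: a log link with the log total read count $T_{ij}$ as an offset in the NBMM, a point mass at zero mixed with a Gaussian in the ZIGMM, and a logistic model for the zero probability $p_{ij}$. Here $i$ indexes subjects (clusters) and $j$ indexes samples within a subject.

```latex
% NBMM for the count y_{ij} of one taxon in sample j of subject i
\begin{aligned}
y_{ij} &\sim \mathrm{NB}(\mu_{ij}, \theta), \qquad
\operatorname{Var}(y_{ij}) = \mu_{ij} + \mu_{ij}^{2}/\theta, \\
\log \mu_{ij} &= \log T_{ij} + X_{ij}\beta + Z_{ij} b_i, \qquad
b_i \sim \mathcal{N}(0, \Psi).
\end{aligned}

% ZIGMM for a continuous (e.g. transformed) abundance y_{ij}
\begin{aligned}
y_{ij} &\sim p_{ij}\,\delta_0 + (1 - p_{ij})\,
\mathcal{N}\!\left(X_{ij}\beta + Z_{ij} b_i,\ \sigma^{2}\right), \\
\operatorname{logit}(p_{ij}) &= X_{ij}\alpha, \qquad
b_i \sim \mathcal{N}(0, \Psi).
\end{aligned}
```

In this notation $X_{ij}$ holds the host environmental/clinical covariates, $Z_{ij}$ is the random-effect design (for example a 1 for a subject-specific intercept), and $\theta$ controls over-dispersion. The offset $\log T_{ij}$ is what lets the NBMM handle varying total reads, while the shared $b_i$ induces the within-subject correlation that both models are designed to capture.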

ISBN

978-1-369-82626-5

Comments

etdadmin_upload_489936

Recommended Citation

Zhang, Xinyan, "Statistical Methods in Cancer Survival Prediction and Microbiome Data Analysis" (2017). All ETDs from UAB. 3444.
https://digitalcommons.library.uab.edu/etd-collection/3444

Included in

Public Health Commons
