Advisory Committee Chair
Hemant K Tiwari
Advisory Committee Members
Devin M Absher
Timothy M Beasley
Xiangqin Cui
Marguerite R Irvin
Degui Zhi
Document Type
Dissertation
Date of Award
2015
Degree Name by School
Doctor of Philosophy (PhD) School of Public Health
Abstract
The study of epigenetics involves the investigation of genomic elements that do not directly affect DNA sequence but are stable and preserved during cell division. One of the most commonly studied epigenetic elements is DNA methylation, which occurs when a methyl group is added to a cytosine residue in the DNA sequence. Advances in technology for quantifying DNA methylation, including bisulfite sequencing and bisulfite microarrays, have produced a large amount of available data. However, many questions remain about how to analyze these data statistically. This dissertation addresses some statistical concerns related to the analysis of DNA methylation data. First, we provide a new method for estimating the cell-type makeup of blood samples from DNA methylation data, including subtypes of T and B cells. Because DNA methylation varies markedly by cell type, knowing the cell-type makeup of a sample is necessary for interpreting the DNA methylation of that sample and for controlling confounding in association studies. Next, we examine the heritability of DNA methylation in CD4+ T cells in families, determining the number of CpG sites for which DNA methylation is highly heritable and assessing the association of these sites with genotype. We find that most highly heritable CpG methylation is associated with genotype at nearby SNPs. However, some highly heritable methylation is not strongly associated with genotype, and these sites have several features in common, most interestingly association with immune-related genes. Finally, we compare statistical approaches for association studies using DNA methylation data, including models that test a single CpG at a time and penalized regression models that include many CpGs in a single model. We confirm the results of previous studies, finding that classical linear regression models outperform robust linear regression models and beta regression models in single-CpG association tests.
Additionally, we find that penalized regression models, specifically elastic net models, outperform single-CpG linear regression models in terms of false positive rate, true positive rate, and computational efficiency in many contexts, especially for small sample sizes.
Recommended Citation
Jones, Lindsay Leigh Waite, "Statistical Methodology to Improve the Understanding of DNA Methylation Data" (2015). All ETDs from UAB. 2068.
https://digitalcommons.library.uab.edu/etd-collection/2068