All ETDs from UAB

Advisory Committee Chair

Hemant K Tiwari

Advisory Committee Members

Timothy M Beasley

Marguerite Ryan Irvin

Charles R Katholi

Nita A Limdi

Nengjun Yi

Document Type


Date of Award


Degree Name by School

Doctor of Philosophy (PhD) School of Public Health


Pharmacogenetics aims to improve individualized medicine by combining genetic information with clinical data when assigning drug treatments. Different diseases are often associated with different factors, each with its own distributional characteristics. For rare or complex diseases, these characteristics may not be well understood and standard analysis methods may not be appropriate. In psychiatric disorders, for example, multiple drug treatments need to be compared across small sample sizes with heterogeneous patient populations, making implementation of standard methods difficult. Simulating data that mirrors clinical data is important because it allows investigators to determine which method is best suited for data from their own patients and to perform power calculations that accurately account for a given correlation structure. In addition, investigators may wish to compare dosing algorithms on data that is similar to their own but varies across specified factors. The simulated datasets can contain multiple types of variables, with outcomes observed at a single time point or over multiple time points. This dissertation creates three new R packages for the simulation of data that resemble real clinical and genetic datasets (plasmodes). The SimMultiCorrData package generates correlated continuous variables (with normal or non-normal distributions, using either Fleishman's (1978) third-order or Headrick's (2002) fifth-order power method transformation) as well as binary, ordinal, Poisson, and Negative Binomial variables. The SimCorrMix package adds continuous mixture variables and zero-inflated Poisson and Negative Binomial variables. The SimRepeat package simulates correlated systems of statistical equations that represent repeated measures or clustered data. These systems either contain only continuous variables, extending Headrick and Beasley's (2004) techniques, or contain multiple types of variables, based on the hierarchical linear models approach.
All three packages offer two simulation pathways that provide greater accuracy under different parameter ranges. We use SimCorrMix to simulate psychiatric genetic data including single nucleotide polymorphism variants. We compare machine learning methods (elastic net penalized regression and multivariate adaptive regression splines) and Bayesian lasso regression for three different numbers of drug treatment groups, sample sizes per treatment group, and treatment effect sizes. We show that elastic net models outperform the other models in terms of predictive performance and estimation of drug treatment effects.
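
As a rough illustration of the third-order power method transformation named in the abstract, the sketch below applies a Fleishman-style cubic polynomial to standard normal draws. It is written in Python for brevity and is not the API of SimMultiCorrData, SimCorrMix, or SimRepeat. The function names (`fleishman_moments`, `fleishman_transform`, `simulate`) are hypothetical. The coefficients are illustrative values that approximately satisfy Fleishman's moment equations for skewness 2 and excess kurtosis 6, and are assumed here rather than taken from the dissertation.

```python
import random
import statistics


def fleishman_moments(b, c, d):
    """Left-hand sides of Fleishman's moment equations for
    Y = -c + b*Z + c*Z**2 + d*Z**3 with Z ~ N(0, 1).

    The first value is the variance of Y. The second and third equal the
    skewness and excess kurtosis of Y when the variance equals 1.
    """
    variance = b**2 + 6*b*d + 2*c**2 + 15*d**2
    skew = 2*c*(b**2 + 24*b*d + 105*d**2 + 2)
    kurt = 24*(b*d
               + c**2*(1 + b**2 + 28*b*d)
               + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2))
    return variance, skew, kurt


def fleishman_transform(z, b, c, d):
    """Apply the cubic polynomial to one standard normal value z.

    The constant term is -c so that Y has mean zero.
    """
    return -c + b*z + c*z**2 + d*z**3


def simulate(n, b, c, d, seed=None):
    """Draw n standard normal values and transform each one."""
    rng = random.Random(seed)
    return [fleishman_transform(rng.gauss(0.0, 1.0), b, c, d) for _ in range(n)]


# Illustrative coefficients (an assumption, not from the dissertation):
# they approximately solve the equations for skewness 2, excess kurtosis 6.
B, C, D = 0.8263, 0.3137, 0.0227

implied = fleishman_moments(B, C, D)
sample = simulate(20000, B, C, D, seed=1)

print("implied variance, skew, excess kurtosis:", implied)
print("sample mean:", statistics.fmean(sample))
print("sample variance:", statistics.variance(sample))
```

In practice the coefficients are obtained by solving the three moment equations numerically for a target skewness and kurtosis. That step is omitted here, and the sketch covers a single variable only, so it does not address the correlation structure between variables.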

Included in

Public Health Commons


