Advisory Committee Chair
Hemant K Tiwari
Advisory Committee Members
Timothy M Beasley
Marguerite Ryan Irvin
Charles R Katholi
Nita A Limdi
Nengjun Yi
Document Type
Dissertation
Date of Award
2018
Degree Name by School
Doctor of Philosophy (PhD) School of Public Health
Abstract
Pharmacogenetics aims to improve individualized medicine by combining genetic information with clinical data when assigning drug treatments. Different diseases are often associated with different factors, each with its own distributional characteristics. For rare or complex diseases, these characteristics may not be well understood and standard analysis methods may not be appropriate. In psychiatric disorders, for example, multiple drug treatments need to be compared across small sample sizes with heterogeneous patient populations, making implementation of standard methods difficult. Simulating data that mirrors clinical data is important because it allows investigators to determine which method is best suited for data from their own patients and to perform power calculations that accurately account for a given correlation structure. In addition, investigators may wish to compare dosing algorithms on data that is similar to their own but varies across specified factors. The simulated datasets can contain multiple types of variables, with outcomes observed at a single time point or over multiple time points. This dissertation creates three new R packages for the simulation of real clinical and genetic datasets (plasmodes). The SimMultiCorrData package generates correlated continuous (normal or non-normal distributions, using either Fleishman's (1978) third-order or Headrick's (2002) fifth-order power method transformation), binary, ordinal, Poisson, and Negative Binomial variables. The SimCorrMix package adds continuous mixture and zero-inflated Poisson and Negative Binomial variables. The SimRepeat package simulates correlated systems of statistical equations that represent repeated measures or clustered data. These systems either contain all continuous variables, extending Headrick and Beasley's (2004) techniques, or contain multiple types of variables, based on the hierarchical linear models approach.
All three packages offer two simulation pathways, each of which provides greater accuracy under different parameter ranges. We use SimCorrMix to simulate psychiatric genetic data, including single nucleotide polymorphism variants. We compare machine learning methods (elastic net penalized regression and multivariate adaptive regression splines) and Bayesian lasso regression across three different values each of the number of drug treatment groups, the sample size per treatment group, and the treatment effect size. We show that elastic net models outperform the other models in terms of predictive performance and estimation of drug treatment effects.
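For orientation, the power method transformations named in the abstract are usually written in the general polynomial form below. This is a sketch in standard notation, not an excerpt from the dissertation: the symbols Z, Y, and c_i are assumed here for illustration only.

```latex
% Assumed notation: Z is a standard normal variable, Y the simulated
% non-normal variable, and c_i are constants solved for numerically.
\begin{align*}
  \text{Third-order (Fleishman, 1978):} \quad
    Y &= c_0 + c_1 Z + c_2 Z^2 + c_3 Z^3, \\
  \text{Fifth-order (Headrick, 2002):} \quad
    Y &= \sum_{i=0}^{5} c_i Z^i,
  \qquad Z \sim N(0, 1).
\end{align*}
```

In the third-order case the constants are chosen so that Y has mean 0, variance 1, and the target skewness and kurtosis; the fifth-order case additionally matches the fifth and sixth standardized cumulants.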
Recommended Citation
Fialkowski, Allison Cynthia, "Simulation Of Correlated Variables, Mixture Distributions, And Repeated Measures And Comparison Of Pharmacogenetic Prediction With Machine Learning Methods" (2018). All ETDs from UAB. 1643.
https://digitalcommons.library.uab.edu/etd-collection/1643