Simulation Of Correlated Variables, Mixture Distributions, And Repeated Measures And Comparison Of Pharmacogenetic Prediction With Machine Learning Methods

Allison Cynthia Fialkowski

Advisor(s)

Hemant K Tiwari

Committee Member(s)

Timothy M Beasley

Marguerite Ryan Irvin

Charles R Katholi

Nita A Limdi

Nengjun Yi

Document Type

Dissertation

Date of Award

2018

Degree Name by School

Doctor of Philosophy (PhD) School of Public Health

Abstract

Pharmacogenetics aims to improve individualized medicine by combining genetic information with clinical data when assigning drug treatments. Different diseases are often associated with different factors, each with its own distributional characteristics. For rare or complex diseases, these characteristics may not be well understood and standard analysis methods may not be appropriate. In psychiatric disorders, for example, multiple drug treatments need to be compared across small sample sizes with heterogeneous patient populations, making implementation of standard methods difficult. Simulating data that mirrors clinical data is important because it allows investigators to determine which method is best suited for data from their own patients and to perform power calculations that accurately account for a given correlation structure. In addition, investigators may wish to compare dosing algorithms on data that is similar to their own but varies across specified factors. The simulated datasets can contain multiple types of variables with outcomes observed at a single time point or over multiple time points. This dissertation creates three new R packages for the simulation of real clinical and genetic datasets (plasmodes). The SimMultiCorrData package generates correlated continuous (normal or non-normal, using either Fleishman's (1978) third-order or Headrick's (2002) fifth-order power method transformation), binary, ordinal, Poisson, and Negative Binomial variables. The SimCorrMix package adds continuous mixture variables and zero-inflated Poisson and Negative Binomial variables. The SimRepeat package simulates correlated systems of statistical equations that represent repeated measures or clustered data. These systems either contain only continuous variables, extending Headrick and Beasley's (2004) techniques, or contain multiple types of variables, based on the hierarchical linear models approach.
All three packages offer two simulation pathways that provide greater accuracy under different parameter ranges. We use SimCorrMix to simulate psychiatric genetic data including single nucleotide polymorphism variants. We compare machine learning methods (elastic net penalized regression and multivariate adaptive regression splines) and Bayesian lasso regression for three different numbers of drug treatment groups, sample sizes per treatment group, and treatment effect sizes. We show that elastic net models outperform the other models in terms of predictive performance and estimation of drug treatment effects.
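As a rough illustration of the third-order power method transformation mentioned in the abstract, the sketch below builds a non-normal variable as a cubic polynomial of a standard normal draw, Y = a + bZ + cZ^2 + dZ^3 with a = -c. It is written in Python for illustration only and does not use or reproduce the SimMultiCorrData API. The function names and the coefficient values are assumptions made here; the coefficients were chosen because they approximately satisfy Fleishman's moment equations for skewness 2 and excess kurtosis 6, which the code checks rather than takes for granted.

```python
import random
import statistics


def fleishman_moments(b, c, d):
    """Evaluate Fleishman's moment equations for Y = -c + b*Z + c*Z**2 + d*Z**3.

    Returns (variance, skewness, excess kurtosis). The last two expressions
    are the standardized moments only when the variance equation equals 1.
    """
    variance = b**2 + 6*b*d + 2*c**2 + 15*d**2
    skew = 2*c*(b**2 + 24*b*d + 105*d**2 + 2)
    kurt = 24*(b*d + c**2*(1 + b**2 + 28*b*d)
               + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2))
    return variance, skew, kurt


def fleishman_sample(n, b, c, d, seed=0):
    """Draw n values of Y by transforming standard normal draws Z."""
    rng = random.Random(seed)
    a = -c  # the intercept -c gives Y a mean of zero
    draws = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        draws.append(a + b*z + c*z**2 + d*z**3)
    return draws


# Illustrative coefficients (an assumption of this sketch, not values taken
# from the dissertation): they target skewness 2 and excess kurtosis 6.
B, C, D = 0.8263, 0.3137, 0.0227

implied_var, implied_skew, implied_kurt = fleishman_moments(B, C, D)
print("implied variance, skewness, excess kurtosis:",
      round(implied_var, 3), round(implied_skew, 3), round(implied_kurt, 3))

ys = fleishman_sample(20000, B, C, D)
print("sample mean:", statistics.fmean(ys))
print("sample variance:", statistics.pvariance(ys))
```

In practice the coefficients are found by solving the three moment equations numerically for a target skewness and kurtosis, and correlated variables additionally require an intermediate correlation matrix for the underlying normal draws; both steps are omitted from this sketch.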

ISBN

978-0-438-34197-5

Comments

etdadmin_upload_594351

Recommended Citation

Fialkowski, Allison Cynthia, "Simulation Of Correlated Variables, Mixture Distributions, And Repeated Measures And Comparison Of Pharmacogenetic Prediction With Machine Learning Methods" (2018). All ETDs from UAB. 1643.
https://digitalcommons.library.uab.edu/etd-collection/1643

Included in

Public Health Commons
