Advisory Committee Chair
Hemant Tiwari
Advisory Committee Members
Zechen Chong
Robert R Kimberly
Elliot J Lefkowitz
Alexander F Rosenberg
Document Type
Dissertation
Date of Award
2022
Degree Name by School
Doctor of Philosophy (PhD) Heersink School of Medicine
Abstract
Structural variants (SVs) contribute to genomic diversity and play pathogenic roles in a wide range of genetic disorders. Accurate characterization of SVs is critical for genomic research and studies of disease mechanisms. The rapid development of Third-Generation Sequencing (TGS) technologies has greatly increased sequencing read length compared to Next-Generation Sequencing (NGS), bringing both great potential and new challenges to SV discovery through alignment-based and assembly-based approaches. To take full advantage of TGS data, I have developed a suite of bioinformatics tools focused on comprehensive characterization of SVs. For alignment-based SV discovery, I have developed DeBreak to identify SVs directly from long-read alignments. With its implemented density-based clustering algorithm and breakpoint refinement method, DeBreak can accurately identify SVs with precise breakpoint locations in both simulated and real datasets. When compared to the assembly-based SV callsets, DeBreak showed the highest consistency among the four tested alignment-based SV callers. For assembly-based SV discovery, I have developed Inspector to assess and improve the quality of whole-genome de novo assembly results. Inspector achieved the highest accuracy in reporting both small-scale and larger assembly errors among the three tested assembly evaluation tools on simulated datasets. When applied to the assemblies of a real human genome, Inspector revealed that both small-scale and structural assembly errors are enriched in repetitive regions for most assemblers. With its error correction module, Inspector reduced the number of assembly errors and improved the assembly quality after polishing with long reads. In addition, I have developed FusionSeeker to detect gene fusions caused by SVs from long-read cancer transcriptome sequencing data. 
FusionSeeker reports gene fusions in both exonic and intronic regions with high accuracy and can reconstruct fused transcript sequences in simulated and cancer cell line datasets. These tools will facilitate SV analysis of long-read sequencing data in the community.
Recommended Citation
Chen, Yu, "Comprehensive Characterization of Structural Variations Using Long-Read Sequencing Data" (2022). All ETDs from UAB. 198.
https://digitalcommons.library.uab.edu/etd-collection/198