Long-read RNA sequencing identifies region- and sex-specific C57BL/6J mouse brain mRNA isoform expression and usage

Emma F. Jones, University of Alabama at Birmingham
Timothy C. Howton, University of Alabama at Birmingham
Victoria L. Flanary, University of Alabama at Birmingham
Amanda D. Clark, University of Alabama at Birmingham
Brittany N. Lasseigne, University of Alabama at Birmingham

Author ORCID

Emma F. Jones 0000-0003-4244-1456

Timothy C. Howton 0000-0002-9423-0135

Victoria L. Flanary 0000-0003-4208-3695

Amanda D. Clark 0000-0002-1186-3114

Brittany N. Lasseigne 0000-0002-1642-8904

Publication Date

12-14-2023

Abstract

data_minus_bam.tar.gz contains all files from the data directory (except for bam outputs) associated with the 230227_EJ_MouseBrainIsoDiv GitHub project and includes the following:

- comparison_gene_lists/: The RData file in this directory contains all comparison gene lists for differential gene expression (DGE), differential transcript expression (DTE), and differential transcript usage (DTU), for import into the R environment to reproduce the analyses.

- all_comparison_gene_lists.Rdata

- cpm_out/: The RData file in this directory contains the processed counts per million (CPM) and formatted metadata for downstream analyses.

- cpm_counts_metadata.RData

- deseq2_data/: All files in this directory are Rds files containing DESeq2 results for the study design indicated in the file name. A file name that includes “gene” indicates a gene-level analysis, and “transcript” indicates a transcript-level analysis. A file name with two regions is a comparison between those two regions; a file name with one region denotes either “one vs all” or “male vs female”. Any file name that includes “sex” is a male vs female comparison in the indicated region(s).

- all_regions_sex_gene_results.Rds

- all_regions_sex_transcript_results.Rds

- cerebellum_cortex_results.Rds

- cerebellum_cortex_transcripts_results.Rds

- cerebellum_gene_results.Rds

- cerebellum_hippocampus_results.Rds

- cerebellum_hippocampus_transcripts_results.Rds

- cerebellum_sex_gene_results.Rds

- cerebellum_sex_transcript_results.Rds

- cerebellum_striatum_results.Rds

- cerebellum_striatum_transcripts_results.Rds

- cerebellum_transcript_results.Rds

- cortex_gene_results.Rds

- cortex_hippocampus_results.Rds

- cortex_hippocampus_transcripts_results.Rds

- cortex_sex_gene_results.Rds

- cortex_sex_transcript_results.Rds

- cortex_striatum_results.Rds

- cortex_striatum_transcripts_results.Rds

- cortex_transcript_results.Rds

- hippocampus_gene_results.Rds

- hippocampus_sex_gene_results.Rds

- hippocampus_sex_transcript_results.Rds

- hippocampus_striatum_transcripts_results.Rds

- hippocampus_transcript_results.Rds

- striatum_gene_results.Rds

- striatum_hippocampus_results.Rds

- striatum_hippocampus_transcripts_results.Rds

- striatum_sex_gene_results.Rds

- striatum_sex_transcript_results.Rds

- striatum_transcript_results.Rds

- gencode_annotations/: This directory contains the exact GENCODE genome and transcriptome annotations used for our analyses.

- GRCm39.primary_assembly.genome.fa

- GRCm39.primary_assembly.genome.fa.fai

- gencode.vM31.primary_assembly.annotation.gtf

- gffread/: This directory contains the generated FASTA files with exact isoform sequences for novel and annotated genes, which are required for creating IsoformSwitchAnalyzeR objects.

- isoform_sequences.fa

- isoform_sequences_linear.fa

- nextflow/: All files in the subdirectories of nextflow/ are direct outputs from the nf-core nanoseq pipeline. For specific information on nanoseq pipeline outputs, please refer to https://nf-co.re/nanoseq/3.1.0/docs/output

- bambu/

- counts_gene.txt

- counts_transcript.txt

- extended_annotations.gtf

- extended_annotations.gtf.idx

- versions.yml

- fastqc/

- There are 2 files for each of the 40 samples. Below is a representative example of the files expected for each sample:

- sample01_R1_1_fastqc.html

- sample01_R1_1_fastqc.zip

- minimap2/

- bam/ This directory has been removed to save space; please contact us for more information.

- bigBed/

- There is 1 file for each of the 40 samples. Below is a representative example of the file expected for each sample:

- sample01_R1.bigBed

- bigWig/

- There is 1 file for each of the 40 samples. Below is a representative example of the file expected for each sample:

- sample01_R1.bigWig

- genome/

- GRCm39.primary_assembly.genome.fa.mmi

- samtools_stats/

- There are 3 files for each of the 40 samples. Below is a representative example of the files expected for each sample:

- sample01_R1.sorted.bam.flagstat

- sample01_R1.sorted.bam.idxstats

- sample01_R1.sorted.bam.stats

- multiqc/

- multiqc_data/

- mqc_samtools-idxstats-mapped-reads-plot_Normalised_Counts.txt

- mqc_samtools-idxstats-mapped-reads-plot_Observed_over_Expected_Counts.txt

- mqc_samtools-idxstats-mapped-reads-plot_Raw_Counts.txt

- mqc_samtools-idxstats-xy-plot_1.txt

- mqc_samtools_alignment_plot_1.txt

- multiqc.log

- multiqc_data.json

- multiqc_general_stats.txt

- multiqc_samtools_flagstat.txt

- multiqc_samtools_idxstats.txt

- multiqc_samtools_stats.txt

- multiqc_sources.txt

- multiqc_plots/

- pdf/

- mqc_samtools-idxstats-mapped-reads-plot_Normalised_Counts.pdf

- mqc_samtools-idxstats-mapped-reads-plot_Observed_over_Expected_Counts.pdf

- mqc_samtools-idxstats-mapped-reads-plot_Raw_Counts.pdf

- mqc_samtools-idxstats-xy-plot_1.pdf

- mqc_samtools-idxstats-xy-plot_1_pc.pdf

- mqc_samtools_alignment_plot_1.pdf

- mqc_samtools_alignment_plot_1_pc.pdf

- png/

- *The same multiqc plots as the pdf directory, but in png format*

- svg/

- *The same multiqc plots as the pdf and png directories, but in svg format*

- multiqc_report.html

- versions.yml

- nanoplot/

- fastq/

- Contains 40 directories, one per sample, each containing 12 files produced by running NanoPlot within the nf-core nanoseq pipeline. Below is a representative example of one sample directory:

- sample01_R1/

- Dynamic_Histogram_Read_length.html

- HistogramReadlength.png

- LengthvsQualityScatterPlot_dot.png

- LengthvsQualityScatterPlot_kde.png

- LogTransformed_HistogramReadlength.png

- NanoPlot-report.html

- NanoPlot_20230413_1600.log

- NanoPlot_20230413_2047.log

- NanoStats.txt

- Weighted_HistogramReadlength.png

- Weighted_LogTransformed_HistogramReadlength.png

- Yield_By_Length.png

- pipeline_info/

- execution_report_2023-04-13_15-46-11.html

- execution_timeline_2023-04-13_15-46-11.html

- execution_trace_2023-04-13_10-59-24.txt

- execution_trace_2023-04-13_15-46-11.txt

- pipeline_dag_2023-04-13_15-46-11.svg

- samplesheet.valid.csv

- software_versions.yml

- switchlist_fasta/: This directory contains the generated FASTA files of amino acid (AA) and nucleotide (nt) sequences for the individual IsoformSwitchAnalyzeR objects, which are required for downstream analyses.

- cerebellum_AA.fasta

- cerebellum_nt.fasta

- cerebellum_sex_AA.fasta

- cerebellum_sex_nt.fasta

- cortex_AA.fasta

- cortex_nt.fasta

- cortex_sex_AA.fasta

- cortex_sex_nt.fasta

- hippocampus_AA.fasta

- hippocampus_nt.fasta

- region_region_AA.fasta

- region_region_nt.fasta

- striatum_AA.fasta

- striatum_nt.fasta

- striatum_sex_AA.fasta

- striatum_sex_nt.fasta

- switchlist_objects/: This directory contains intermediate and final IsoformSwitchAnalyzeR objects. “Region_all” in the file name denotes a list of four switchlists, each comparing a single brain region (cerebellum, cortex, hippocampus, striatum) to all others in aggregate. “Region_sex” in the file name denotes a list of four switchlists (cerebellum, cortex, hippocampus, striatum), each comparing across sexes (male and female). “Region_region” denotes a single switchlist that includes all pairwise region comparisons. “Sex” in the file name without “region” denotes a comparison across sexes with all regions in aggregate.

- de_added/: This directory contains final IsoformSwitchAnalyzeR objects with open reading frame and differential expression results incorporated.

- region_all_switchlist_list_orf_de.Rds

- region_region_orf_de.Rds

- region_sex_switchlist_list_orf_de.Rds

- orf_added/: This directory contains intermediate and final IsoformSwitchAnalyzeR objects with open reading frame information added.

- region_all_switchlist_list.Rds

- region_region_switchlist_analyzed.Rds

- region_sex_switchlist_list.Rds

- sex_switchlist_analyzed.Rds

- pfam_added/: This directory contains final IsoformSwitchAnalyzeR objects (including differential expression and open reading frame information) with added protein domain information. Please note that Pfam does not comprehensively identify all protein domains for every gene.

- region_all_list_orf_de_pfam.Rds

- region_region_orf_de_pfam.Rds

- region_sex_list_orf_de_pfam.Rds

- raw/: This directory contains the initial IsoformSwitchAnalyzeR objects, without additional information added.

- region_all_switchlist_list.Rds

- region_region_switchlist_analyzed.Rds

- region_sex_switchlist_list.Rds

- sex_switchlist.Rds
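The deseq2_data/ naming rules described above can be sketched in code. The following is a minimal, illustrative Python helper, not part of the project's own code: the function name `describe_deseq2_file` and the returned dictionary keys are assumptions made for this example. It applies only the rules stated in the directory description, so the analysis level is left unset when a file name contains neither “gene” nor “transcript”.

```python
from pathlib import Path

# Region names as they appear in the deseq2_data/ file names listed above.
REGIONS = ("cerebellum", "cortex", "hippocampus", "striatum")


def describe_deseq2_file(filename):
    """Interpret a deseq2_data/ file name using the stated naming rules.

    Hypothetical helper for illustration; not part of the project's code.
    """
    tokens = Path(filename).stem.split("_")

    # "all_regions" files cover every region; otherwise collect named regions.
    if tokens[:2] == ["all", "regions"]:
        regions = list(REGIONS)
    else:
        regions = [t for t in tokens if t in REGIONS]

    # The level is reported only when the name states it
    # ("transcripts" is treated the same as "transcript").
    if "gene" in tokens:
        level = "gene"
    elif any(t.startswith("transcript") for t in tokens):
        level = "transcript"
    else:
        level = None  # not stated in the file name

    # Any name containing "sex" is male vs female; otherwise the number of
    # regions decides between a pairwise and a one-vs-all comparison.
    if "sex" in tokens:
        comparison = "male vs female"
    elif len(regions) == 2:
        comparison = "region vs region"
    elif len(regions) == 1:
        comparison = "one vs all"
    else:
        comparison = None

    return {"regions": regions, "level": level, "comparison": comparison}


if __name__ == "__main__":
    for name in (
        "cerebellum_cortex_transcripts_results.Rds",
        "cortex_sex_gene_results.Rds",
        "striatum_transcript_results.Rds",
    ):
        print(name, describe_deseq2_file(name))
```

The helper only reads file names; the Rds files themselves would still need to be loaded in R.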

Keywords

long-read RNA sequencing, brain, sex, alternative splicing, gene expression, transcript usage, isoform usage

Related Items

Is supplement to: https://github.com/lasseignelab/230227_EJ_MouseBrainIsoDiv/tree/main
Is supplemented by: 10.5281/zenodo.10480924
Is supplement to: https://lasseignelab.shinyapps.io/mouse_brain_iso_div/
Is supplement to: 10.5281/zenodo.10481312

Repository

Zenodo

Distribution License

The MIT License

Access Instructions

This data is available under the MIT License.

Funder

Funder: National Human Genome Research Institute
Integrating multidimensional genomic data to discover clinically-relevant predictive models
R00HG009678
