Data for Long-read RNA sequencing identifies region- and sex-specific C57BL/6J mouse brain mRNA isoform expression and usage
Author ORCID
Emma F. Jones 0000-0003-4244-1456
Timothy C. Howton 0000-0002-9423-0135
Victoria L. Flanary 0000-0003-4208-3695
Amanda D. Clark 0000-0002-1186-3114
Brittany N. Lasseigne 0000-0002-1642-8904
Publication Date
12-14-2023
Abstract
data_minus_bam.tar.gz contains all files from the data directory (except for bam outputs) associated with the 230227_EJ_MouseBrainIsoDiv GitHub project and includes the following:
- comparison_gene_lists/: This directory contains an RData file with all comparison gene lists for differential gene expression (DGE), differential transcript expression (DTE), and differential transcript usage (DTU), for importing into the R environment and reproducing analyses.
- all_comparison_gene_lists.Rdata
- cpm_out/: This directory contains an RData file with the processed counts per million and formatted metadata for downstream analyses.
- cpm_counts_metadata.RData
- deseq2_data/: All files in this directory are Rds files containing DESeq2 results for the study design indicated in the file name. A file name that includes “gene” denotes a gene-level analysis, and “transcript” denotes a transcript-level analysis. A file name with two regions is a comparison between those two regions; a file name with one region denotes either “one vs all” or “male vs female”. Any file name that includes “sex” is male vs female in the indicated region(s).
- all_regions_sex_gene_results.Rds
- all_regions_sex_transcript_results.Rds
- cerebellum_cortex_results.Rds
- cerebellum_cortex_transcripts_results.Rds
- cerebellum_gene_results.Rds
- cerebellum_hippocampus_results.Rds
- cerebellum_hippocampus_transcripts_results.Rds
- cerebellum_sex_gene_results.Rds
- cerebellum_sex_transcript_results.Rds
- cerebellum_striatum_results.Rds
- cerebellum_striatum_transcripts_results.Rds
- cerebellum_transcript_results.Rds
- cortex_gene_results.Rds
- cortex_hippocampus_results.Rds
- cortex_hippocampus_transcripts_results.Rds
- cortex_sex_gene_results.Rds
- cortex_sex_transcript_results.Rds
- cortex_striatum_results.Rds
- cortex_striatum_transcripts_results.Rds
- cortex_transcript_results.Rds
- hippocampus_gene_results.Rds
- hippocampus_sex_gene_results.Rds
- hippocampus_sex_transcript_results.Rds
- hippocampus_striatum_transcripts_results.Rds
- hippocampus_transcript_results.Rds
- striatum_gene_results.Rds
- striatum_hippocampus_results.Rds
- striatum_hippocampus_transcripts_results.Rds
- striatum_sex_gene_results.Rds
- striatum_sex_transcript_results.Rds
- striatum_transcript_results.Rds
- gencode_annotations/: This directory contains the exact GENCODE genome and transcriptome annotations used for our analyses.
- GRCm39.primary_assembly.genome.fa
- GRCm39.primary_assembly.genome.fa.fai
- gencode.vM31.primary_assembly.annotation.gtf
- gffread/: This directory contains the generated fasta files with exact isoform sequences for novel and annotated genes required for creating isoformSwitchAnalyzeR objects.
- isoform_sequences.fa
- isoform_sequences_linear.fa
- nextflow/: All files in the subdirectories of this nextflow directory are direct outputs from the nf-core nanoseq pipeline. For specific information on nanoseq pipeline outputs, please refer to https://nf-co.re/nanoseq/3.1.0/docs/output
- bambu/
- counts_gene.txt
- counts_transcript.txt
- extended_annotations.gtf
- extended_annotations.gtf.idx
- versions.yml
- fastqc/
- There are 2 files for each of the 40 samples. Below is a representative example of the files expected for each sample:
- sample01_R1_1_fastqc.html
- sample01_R1_1_fastqc.zip
- minimap2/
- bam/: This directory has been removed to save space; please contact us for more information.
- bigBed/
- There is 1 file for each of the 40 samples. Below is a representative example of the file expected for each sample:
- sample01_R1.bigBed
- bigWig/
- There is 1 file for each of the 40 samples. Below is a representative example of the file expected for each sample:
- sample01_R1.bigWig
- genome/
- GRCm39.primary_assembly.genome.fa.mmi
- samtools_stats/
- There are 3 files for each of the 40 samples. Below is a representative example of the files expected for each sample:
- sample01_R1.sorted.bam.flagstat
- sample01_R1.sorted.bam.idxstats
- sample01_R1.sorted.bam.stats
- multiqc/
- multiqc_data/
- mqc_samtools-idxstats-mapped-reads-plot_Normalised_Counts.txt
- mqc_samtools-idxstats-mapped-reads-plot_Observed_over_Expected_Counts.txt
- mqc_samtools-idxstats-mapped-reads-plot_Raw_Counts.txt
- mqc_samtools-idxstats-xy-plot_1.txt
- mqc_samtools_alignment_plot_1.txt
- multiqc.log
- multiqc_data.json
- multiqc_general_stats.txt
- multiqc_samtools_flagstat.txt
- multiqc_samtools_idxstats.txt
- multiqc_samtools_stats.txt
- multiqc_sources.txt
- multiqc_plots/
- pdf/
- mqc_samtools-idxstats-mapped-reads-plot_Normalised_Counts.pdf
- mqc_samtools-idxstats-mapped-reads-plot_Observed_over_Expected_Counts.pdf
- mqc_samtools-idxstats-mapped-reads-plot_Raw_Counts.pdf
- mqc_samtools-idxstats-xy-plot_1.pdf
- mqc_samtools-idxstats-xy-plot_1_pc.pdf
- mqc_samtools_alignment_plot_1.pdf
- mqc_samtools_alignment_plot_1_pc.pdf
- png/
- *The same multiqc plots as the pdf directory, but in png format*
- svg/
- *The same multiqc plots as the pdf and png directories, but in svg format*
- multiqc_report.html
- versions.yml
- nanoplot/
- fastq/
- Contains 40 directories, one per sample, each containing 12 files obtained from running nanoplot with the nf-core nanoseq pipeline. Below is a representative example of one sample directory:
- sample01_R1/
- Dynamic_Histogram_Read_length.html
- HistogramReadlength.png
- LengthvsQualityScatterPlot_dot.png
- LengthvsQualityScatterPlot_kde.png
- LogTransformed_HistogramReadlength.png
- NanoPlot-report.html
- NanoPlot_20230413_1600.log
- NanoPlot_20230413_2047.log
- NanoStats.txt
- Weighted_HistogramReadlength.png
- Weighted_LogTransformed_HistogramReadlength.png
- Yield_By_Length.png
- pipeline_info/
- execution_report_2023-04-13_15-46-11.html
- execution_timeline_2023-04-13_15-46-11.html
- execution_trace_2023-04-13_10-59-24.txt
- execution_trace_2023-04-13_15-46-11.txt
- pipeline_dag_2023-04-13_15-46-11.svg
- samplesheet.valid.csv
- software_versions.yml
- switchlist_fasta/: This directory contains the generated fasta files for amino acids and nucleotides for individual isoformSwitchAnalyzeR objects required for downstream analyses.
- cerebellum_AA.fasta
- cerebellum_nt.fasta
- cerebellum_sex_AA.fasta
- cerebellum_sex_nt.fasta
- cortex_AA.fasta
- cortex_nt.fasta
- cortex_sex_AA.fasta
- cortex_sex_nt.fasta
- hippocampus_AA.fasta
- hippocampus_nt.fasta
- region_region_AA.fasta
- region_region_nt.fasta
- striatum_AA.fasta
- striatum_nt.fasta
- striatum_sex_AA.fasta
- striatum_sex_nt.fasta
- switchlist_objects/: This directory contains intermediate and final isoformSwitchAnalyzeR objects. “Region_all” in the file name denotes a list of four switchlists, each comparing a single brain region (cerebellum, cortex, hippocampus, striatum) to all others in aggregate. “Region_sex” in the file name denotes a list of four switchlists (cerebellum, cortex, hippocampus, striatum), each comparing across sexes (male and female). “Region_region” denotes a single switchlist that includes all pairwise region comparisons. “Sex” in the file name without “region” denotes a comparison across sexes with all regions in aggregate.
- de_added/: This directory contains final isoformSwitchAnalyzeR objects with open reading frame and differential expression results incorporated.
- region_all_switchlist_list_orf_de.Rds
- region_region_orf_de.Rds
- region_sex_switchlist_list_orf_de.Rds
- orf_added/: This directory contains intermediate and final isoformSwitchAnalyzeR objects with open reading frame information added.
- region_all_switchlist_list.Rds
- region_region_switchlist_analyzed.Rds
- region_sex_switchlist_list.Rds
- sex_switchlist_analyzed.Rds
- pfam_added/: This directory contains final isoformSwitchAnalyzeR objects (including de and orf information) with added protein domain information. Please note pfam does not comprehensively identify all protein domains for every gene.
- region_all_list_orf_de_pfam.Rds
- region_region_orf_de_pfam.Rds
- region_sex_list_orf_de_pfam.Rds
- raw/: This directory contains the initial isoformSwitchAnalyzeR objects, without additional information added.
- region_all_switchlist_list.Rds
- region_region_switchlist_analyzed.Rds
- region_sex_switchlist_list.Rds
- sex_switchlist.Rds
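The deseq2_data/ naming rules described above can be sketched as a small parser. This is a minimal illustration, not part of the dataset or its GitHub project: the function name `describe_deseq2_file` and the label "not stated in file name" are hypothetical, and the sketch only inspects file names, it does not open the Rds files. File names without “gene” or “transcript” are reported as having no stated level rather than guessing one.

```python
# Brain regions named in the dataset description.
REGIONS = ("cerebellum", "cortex", "hippocampus", "striatum")


def describe_deseq2_file(filename):
    """Infer (comparison, level) from a deseq2_data/ file name.

    Hypothetical helper: it applies only the naming rules stated in the
    description and never reads the file itself.
    """
    stem = filename.rsplit(".", 1)[0]          # drop the ".Rds" extension
    tokens = stem.split("_")
    regions = [t for t in tokens if t in REGIONS]

    # "sex" in the name: male vs female in the indicated region(s).
    if "sex" in tokens:
        scope = " and ".join(regions) if regions else "all regions"
        comparison = f"male vs female in {scope}"
    # Two regions: pairwise comparison between them.
    elif len(regions) == 2:
        comparison = f"{regions[0]} vs {regions[1]}"
    # One region without "sex": one vs all.
    elif len(regions) == 1:
        comparison = f"{regions[0]} vs all other regions"
    else:
        comparison = "unknown"

    # "transcript" or "transcripts" marks transcript level, "gene" gene level.
    if any(t.startswith("transcript") for t in tokens):
        level = "transcript"
    elif "gene" in tokens:
        level = "gene"
    else:
        level = "not stated in file name"
    return comparison, level


if __name__ == "__main__":
    for name in (
        "all_regions_sex_gene_results.Rds",
        "cerebellum_cortex_transcripts_results.Rds",
        "striatum_transcript_results.Rds",
    ):
        print(name, "->", describe_deseq2_file(name))
```

A helper like this could be used to index the directory before loading individual result files in R.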
Keywords
long-read RNA sequencing, brain, sex, alternative splicing, gene expression, transcript usage, isoform usage
Repository
Zenodo
Distribution License
Access Instructions and Link
This data is available under the MIT License
Funder
National Human Genome Research Institute
Integrating multidimensional genomic data to discover clinically-relevant predictive models
R00HG009678
Recommended Citation
Jones, Emma F.; Howton, Timothy C.; Flanary, Victoria L.; Clark, Amanda D.; and Lasseigne, Brittany, "Data for Long-read RNA sequencing identifies region- and sex-specific C57BL/6J mouse brain mRNA isoform expression and usage" (2023). UAB Research Data Catalog. 109.
https://digitalcommons.library.uab.edu/datasets/109