git clone https://github.com/isaacvock/Isoform_Analysis_Tutorial.git
cd Isoform_Analysis_Tutorial
Introduction to the EZbakR-suite isoform tutorial
Introduction
We recently put out a preprint introducing the first strategy for analyzing transcript isoform synthesis and degradation kinetics with NR-seq data from total RNA and the EZbakR-suite. To help others get this analysis strategy working in their own hands, I developed this tutorial to show, with real data, how to go from FASTQ files to volcano plots, using fastq2EZbakR to process the raw data and EZbakR to analyze the processed data. The data is hosted at https://github.com/isaacvock/Isoform_Analysis_Tutorial.
The dataset
The dataset is a heavily downsampled version of the main NR-seq dataset presented in the preprint. It includes:
- 3 replicates of DMSO-treated, s4U-fed data.
- 3 replicates of SMG1 inhibitor (SMG1i)-treated, s4U-fed data.
- 1 replicate of DMSO-treated, -s4U data.
- 1 replicate of SMG1i-treated, -s4U data.
The FASTQ files included in the tutorial repo are downsampled to include only 5% of reads from a particular region of chromosome 6 (bases 16,103,264 to 57,096,698). Given how deeply the original dataset was sequenced, this leaves around 500,000 reads per FASTQ file. This region was chosen because it includes the SRSF3 gene, which is used as an example locus throughout the paper.
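If you want to confirm the approximate read depth quoted above once you have the files, a minimal sketch like the following counts reads per file. It assumes standard four-line FASTQ records. The function name count_fastq_reads and the data/*.fastq.gz glob in the usage comment are illustrative assumptions, not part of the tutorial repo's documented layout.

```shell
# Count reads in a FASTQ file (plain text or gzip-compressed).
# Assumes standard 4-line FASTQ records (no multi-line sequences).
count_fastq_reads() {
    case "$1" in
        *.gz) n_lines=$(gzip -dc "$1" | wc -l) ;;   # decompress to stdout, count lines
        *)    n_lines=$(wc -l < "$1") ;;
    esac
    echo $(( n_lines / 4 ))                         # 4 lines per read
}

# Hypothetical usage; adjust the glob to wherever the FASTQ files live:
# for fq in data/*.fastq.gz; do
#     printf '%s\t%s\n' "$fq" "$(count_fastq_reads "$fq")"
# done
```

Each file should report a count on the order of 500,000 if the download is intact.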
SMG1 inhibition is designed to inhibit nonsense-mediated decay (NMD). The goal of a transcript isoform-level analysis of this data is thus to identify transcripts that are stabilized upon NMD inhibition, as these are likely targets of NMD. SRSF3 is a gene that produces a well-established NMD target (as well as a major isoform that is not degraded by NMD). If you do everything right, you should thus see the NMD-degraded SRSF3 isoform coming up as significantly stabilized, whereas the other isoform should appear largely unaffected by SMG1i treatment.
The output
By the end of this tutorial, you will have produced all of the output generated by fastq2EZbakR, as well as an EZbakRData object containing transcript isoform-level analyses. This tutorial also showcases several other semi-orthogonal analysis strategies that can provide useful, sub-isoform-level information (e.g., analyses of individual exon-exon junctions).
Further reading
More details about the two main tools used in this tutorial, fastq2EZbakR and EZbakR, can be found at their respective websites.
Tutorial setup
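To get started, make sure git is installed on your system, clone the data repository locally, and navigate into it with:
git clone https://github.com/isaacvock/Isoform_Analysis_Tutorial.git
cd Isoform_Analysis_Tutorial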
Inside of Isoform_Analysis_Tutorial, you will find a directory called data. Unzip the GTF and FASTA files in this folder:
gzip -d data/hg38_*
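As a quick sanity check after decompression, the sketch below reports whether any hg38_* file in a directory is still gzip-compressed. The helper name check_unzipped is an illustrative assumption, and the check looks only at file extensions, not at file contents.

```shell
# Report any hg38_* files in a directory that are still gzip-compressed.
# Returns non-zero if a compressed file remains or if no hg38_* files exist.
check_unzipped() {
    dir="$1"
    status=0
    found=0
    for f in "$dir"/hg38_*; do
        [ -e "$f" ] || continue              # glob matched nothing
        found=1
        case "$f" in
            *.gz) echo "still compressed: $f" >&2; status=1 ;;
            *)    echo "ready: $f" ;;
        esac
    done
    if [ "$found" -eq 0 ]; then
        echo "no hg38_* files found in $dir" >&2
        status=1
    fi
    return "$status"
}

# Usage, from inside Isoform_Analysis_Tutorial:
# check_unzipped data
```

Note that gzip -d replaces each .gz file with its decompressed version. If you would rather keep the compressed originals, recent versions of gzip accept -k alongside -d.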
Once you are ready, proceed to the fastq2EZbakR portion of this tutorial!