Introduction to the EZbakR-suite isoform tutorial

Introduction

We recently put out a preprint introducing the first strategy for analyzing transcript isoform synthesis and degradation kinetics from total RNA NR-seq data with the EZbakR-suite. To help others get this analysis strategy working in their hands, I developed this tutorial to show, with real data, how to go from FASTQ files to volcano plots, using fastq2EZbakR to process the raw data and EZbakR to analyze the processed data. The data is hosted at this repository.

The dataset

The dataset is a heavily downsampled version of the main NR-seq dataset presented in the preprint. It includes:

  1. 3 replicates of DMSO-treated, s4U-fed data.
  2. 3 replicates of s4U-fed data treated with an SMG1 inhibitor (SMG1i).
  3. 1 replicate of DMSO-treated, -s4U (no s4U) data.
  4. 1 replicate of SMG1i-treated, -s4U (no s4U) data.

The FASTQ files included in the tutorial repo are downsampled to include only 5% of the reads from one region of chromosome 6 (bases 16,103,264 to 57,096,698). Because the original dataset was sequenced so deeply, this still leaves around 500,000 reads per FASTQ file. This region was chosen because it includes the SRSF3 gene, which is used as an example locus throughout the paper.

The SMG1 inhibitor is used to inhibit nonsense-mediated mRNA decay (NMD). The goal of a transcript isoform-level analysis of this data is thus to identify transcripts that are stabilized upon NMD inhibition, as these are likely NMD targets. SRSF3 produces a well-established NMD target isoform, as well as a major isoform that is not degraded by NMD. If everything works, you should see the NMD-degraded SRSF3 isoform come up as significantly stabilized, whereas the major isoform should appear largely unaffected by SMG1i treatment.

The output

By the end of this tutorial, you will have produced all of the output generated by fastq2EZbakR, as well as an EZbakRData object containing transcript isoform-level analyses. This tutorial also showcases several other semi-orthogonal analysis strategies that can provide useful sub-isoform-level information (e.g., analyses of individual exon-exon junctions).

Further reading

More details about the two main tools used in this tutorial can be found at their respective websites:

  1. fastq2EZbakR
  2. EZbakR

Tutorial setup

To get started, make sure git is installed on your system, then clone the data repository locally and navigate into it with:

git clone https://github.com/isaacvock/Isoform_Analysis_Tutorial.git

cd Isoform_Analysis_Tutorial

Inside Isoform_Analysis_Tutorial, you will find a directory called data. Decompress the gzipped GTF and FASTA files in this folder:

gzip -d data/hg38_*
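Optionally, before moving on, you can sanity-check that the downsampled FASTQ files contain roughly the ~500,000 reads per file described in the dataset section. The snippet below is a minimal sketch and is not part of fastq2EZbakR: it assumes standard FASTQ records that span exactly 4 lines each, and files that are either plain text or gzip-compressed (ending in .gz). The function name count_fastq_reads is hypothetical, and because the exact location of the FASTQ files inside the repository isn't spelled out here, the script simply counts reads in whatever paths you pass to it.

```shell
#!/usr/bin/env bash
# Sketch: report the number of reads in each FASTQ file given as an argument.
# Assumes standard FASTQ, where every record spans exactly 4 lines.

# Hypothetical helper: print the read count for one FASTQ file.
count_fastq_reads() {
    local fq="$1"
    case "$fq" in
        *.gz)
            # Stream-decompress gzipped files without modifying them on disk
            gzip -cd -- "$fq" | awk 'END { print int(NR / 4) }'
            ;;
        *)
            # Plain-text FASTQ: count lines directly
            awk 'END { print int(NR / 4) }' "$fq"
            ;;
    esac
}

# Print "<file><TAB><read count>" for every path passed on the command line
for fq in "$@"; do
    printf '%s\t%s\n' "$fq" "$(count_fastq_reads "$fq")"
done
```

For example, if you save the script as count_fastq_reads.sh (a hypothetical name), you could run it like so, replacing path/to with the actual location of the FASTQ files:

bash count_fastq_reads.sh path/to/*.fastq.gz

Counting records with awk rather than wc -l means a final record without a trailing newline is still counted.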

Once you are ready, proceed to the fastq2EZbakR portion of this tutorial!