Analyze and Integrate Any Type of Nucleotide Recoding RNA-seq Data • EZbakR

EZbakR is a highly flexible tool for analyses of nucleotide recoding RNA-seq datasets (NR-seq; e.g., TimeLapse-seq, SLAM-seq, TUC-seq, etc.). See our paper for a discussion of the motivation behind EZbakR and its companion pipeline fastq2EZbakR, as well as validation of all of its novel functionality.

To install or update, run:

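```r
# Install the helper packages only if they are missing
if (!requireNamespace("roxygen2", quietly = TRUE))
    install.packages("roxygen2")
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")
# Install (or update to) the latest version of EZbakR from GitHub
remotes::install_github("isaacvock/EZbakR")
```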

At this point, changes will be made weekly, so updating frequently is highly recommended.

Documentation is here: https://isaacvock.github.io/EZbakR/

Vignettes

Currently, the following functionalities have dedicated vignettes:

Quickstart: Takes you through the standard workflow, similar to bakR’s one and only workflow.
Estimating fractions: Estimating the fraction of reads from each mutational population in your data. This is the nearly universal first step in all NR-seq analyses. This is done with EZbakR’s EstimateFractions() function.
Estimating kinetics: Estimating kinetic parameters of synthesis and degradation in a standard NR-seq experiment. For standard single-label NR-seq analyses, this is the next step in your analysis workflow after estimating the fraction of reads that are from labeled RNA. This is done with EZbakR’s EstimateKinetics() function.
Quality Control: Assessing the quality of your NR-seq data. This is done with EZbakR’s EZQC() function.
Comparative analyses: Fitting a flexible generalized linear model to your NR-seq data to perform comparative analyses of estimated kinetic parameters that complement differential expression analyses. This is done with EZbakR’s AverageAndRegularize() and CompareParameters() functions.
Dynamical systems modeling: For analyses of subcellular fractionation and/or pre-mRNA processing dynamics. This is done with EZbakR’s EZDynamics() function.
Navigating EZbakR output: Conveniently fetching data from EZbakR analyses. This is done with EZbakR’s EZget() function.
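The idea behind the first two steps (estimating fractions, then converting them to kinetic parameters) can be illustrated in base R without EZbakR itself. The following is a minimal sketch, not EZbakR’s implementation: it fits a two-component binomial mixture to simulated per-read T-to-C mutation counts, assuming the mutation rates in new (`pnew`) and old (`pold`) reads are known (in practice these are estimated from the data too). It then converts the estimated fraction new into a degradation rate constant using the standard steady-state relationship fn = 1 - exp(-kdeg * tl). All variable names and parameter values here are hypothetical.

```r
# Minimal two-component binomial mixture sketch (not EZbakR's code).
# Assumes pnew and pold are known; all values below are hypothetical.
set.seed(42)

n_reads <- 5000
true_fn <- 0.3    # true fraction of reads from labeled (new) RNA
pnew    <- 0.05   # assumed T-to-C mutation rate in new reads
pold    <- 0.002  # assumed background mutation rate in old reads

# Simulate reads: number of U positions and number of T-to-C mutations
nT     <- rpois(n_reads, lambda = 25)
is_new <- runif(n_reads) < true_fn
nmut   <- rbinom(n_reads, size = nT, prob = ifelse(is_new, pnew, pold))

# Log-likelihood of the fraction new given per-read mutation and U counts
mixture_loglik <- function(fn, nmut, nT, pnew, pold) {
  sum(log(fn * dbinom(nmut, nT, pnew) + (1 - fn) * dbinom(nmut, nT, pold)))
}

# Maximum likelihood estimate of the fraction new on (0, 1)
fit <- optimize(mixture_loglik, interval = c(1e-6, 1 - 1e-6),
                nmut = nmut, nT = nT, pnew = pnew, pold = pold,
                maximum = TRUE)
fn_hat <- fit$maximum

# Steady-state conversion to a degradation rate constant for label time tl
tl       <- 2  # hypothetical label time (e.g., hours)
kdeg_hat <- -log(1 - fn_hat) / tl
```

In real data, EZbakR fits this kind of model per feature and sample, and line 45’s hierarchical strategy relaxes the assumption of a single global mutation rate.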

Other implemented functionality that may be of interest includes:

Providing fractions or kinetic parameter estimates as input. The former works similarly to how it did in bakR, and is implemented via the EZbakRFractions() function. The latter is unique to EZbakR and is implemented via the EZbakRKinetics() function.
Simulating NR-seq data. There are a number of simulation functions implemented in EZbakR. EZSimulate() is a convenient wrapper to several of these.

Update (3/15/2025): Analyses of transcript isoforms

We recently (3/14/2025) released a preprint describing a method for analyzing the kinetics of individual transcript isoforms using short-read NR-seq data from total RNA. While this strategy is touched on briefly in one of the EZbakR vignettes (this one), I have also developed a full fastq-to-volcano-plot walkthrough that uses real, downsampled fastq files from that preprint, so you can see how every step of the fastq2EZbakR and EZbakR pipeline needs to be configured and run for these analyses. The tutorial is here, and the data used in that tutorial is here. Over the next couple of weeks I will be adding extra details and analyses to this tutorial, but in its current form (as of 3/15/2025), it covers all the basics of performing isoform-level analyses. It also serves as a hands-on tutorial for the entire EZbakR suite, so it is worth checking out and trying even if you aren’t interested in this particular analysis strategy.

What’s new?

EZbakR represents a complete rewrite of bakR. Improvements implemented in EZbakR include:

Modular function design that facilitates using EZbakR with any kind of NR-seq data, regardless of the experimental design or data details.
Extended mixture modeling capabilities, including:
- Support for multi-label analyses.
- Hierarchical mutation rate estimation strategy to allow for feature-specific mutation rates.
- More efficient and accurate uncertainty quantification.
Additional kinetic parameter estimation strategies:
- Non-steady-state analyses as introduced in Narain et al., 2021.
- Short-feed analyses that assume negligible degradation of existing RNA.
- Synthesis rate estimation is implemented as a part of all strategies.
Improved uncertainty propagation that achieves the performance of bakR’s slower implementations (Hybrid and MCMC) with a strategy as efficient as bakR’s most efficient implementation (MLE).
Removal of Stan dependencies. I love Stan, but having it as an R package dependency makes installation and maintenance more difficult.
Optional Apache Arrow backend to help with analyses of larger-than-RAM datasets.
Linear model-based averaging of replicate data to support more complex experimental designs and maximally flexible comparative analyses.
Greater flexibility in terms of the input data structure. Namely, multiple different features can be specified in your input cB table, and multiple different experimental details can be included in your input metadf table.
A novel transcript isoform deconvolution strategy that allows for isoform-specific kinetic parameter estimation.
Generalized linear dynamical systems modeling of NR-seq data. Supports analyses of subcellular fractionation NR-seq extensions, such as those described here, here, and here. Also supports analyses of pre-mRNA processing dynamics.
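The idea behind linear model-based averaging of replicates can be illustrated with base R’s lm(). This is a minimal sketch with made-up replicate values, not EZbakR’s AverageAndRegularize() implementation: modeling log-scale kdeg with one coefficient per condition recovers the replicate averages, while reference-level coding directly yields the log fold change in kdeg that a comparative analysis would test. The data frame and column names are hypothetical.

```r
# Hypothetical replicate-level log(kdeg) estimates for one feature,
# three replicates in each of two conditions (values are made up)
reps <- data.frame(
  condition = factor(rep(c("control", "treated"), each = 3)),
  log_kdeg  = c(-1.9, -2.1, -2.0, -1.4, -1.6, -1.5)
)

# No-intercept model: each coefficient is the replicate average
# of log(kdeg) in one condition
fit_means <- lm(log_kdeg ~ 0 + condition, data = reps)
coef(fit_means)

# Reference-level coding: the 'conditiontreated' coefficient is the
# log fold change in kdeg (treated vs. control) with its standard error
fit_diff <- lm(log_kdeg ~ condition, data = reps)
lfc    <- unname(coef(fit_diff)["conditiontreated"])
se_lfc <- summary(fit_diff)$coefficients["conditiontreated", "Std. Error"]
```

Because the averaging is expressed as a model formula, more complex designs (for example, an added batch or genotype factor) only require adding terms to the formula rather than writing new averaging code.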

In the near future, EZbakR will support the bakR functionality that it does not yet implement (namely, DissectMechanisms() and various visualization functions). There are also a number of exciting developments on the horizon, so stay tuned!

What is NR-seq?

NR-seq refers to a class of methods that combine RNA-seq, metabolic labeling, and unique metabolic label recoding chemistries. These methods were originally developed to dissect the kinetics of RNA synthesis and degradation. Excitingly, though, a treasure trove of extensions of the original methods has been created over the years. To date, nucleotide recoding has been combined with the likes of TT-seq, Start-seq, Ribo-seq, scRNA-seq (other examples of this here, here, and here), Perturb-seq, long-read sequencing, and subcellular fractionation. In addition, while the original methods used 4-thiouridine (s⁴U), the same chemistry has been found to work with 6-thioguanosine (s⁶G), opening the door to dual-labeling experimental designs (e.g., TILAC). EZbakR and its companion pipeline fastq2EZbakR aim to provide an integrated and flexible framework to support this exciting class of methods.
