Skip to contents

Uses the strategy described here, and similar to that originally presented in Berg et al. 2024.

Usage

CorrectDropout(
  obj,
  grouping_factors = NULL,
  features = NULL,
  populations = NULL,
  fraction_design = NULL,
  repeatID = NULL,
  exactMatch = TRUE,
  read_cutoff = 25,
  dropout_cutoff = 5
)

Arguments

obj

An EZbakRFractions object, which is an EZbakRData object on which you have run EstimateFractions().

grouping_factors

Which sample-detail columns in the metadf should be used to group -s4U samples by for calculating the average -s4U RPM? The default value of NULL will cause all sample-detail columns to be used.

features

Character vector of the set of features you want to stratify reads by and estimate proportions of each RNA population. The default of NULL will expect there to be only one fractions table in the EZbakRFractions object.

populations

Mutational populations that were analyzed to generate the fractions table to use. For example, this would be "TC" for a standard s4U-based nucleotide recoding experiment.

fraction_design

"Design matrix" specifying which RNA populations exist in your samples. By default, this will be created automatically and will assume that all combinations of the mutrate_populations you have requested to analyze are present in your data. If this is not the case for your data, then you will have to create one manually. See docs for EstimateFractions (run ?EstimateFractions()) for more details.

repeatID

If multiple fractions tables exist with the same metadata, then this is the numerical index by which they are distinguished.

exactMatch

If TRUE, then features must exactly match the features metadata for a given fractions table for it to be used. Means that you cannot specify a subset of features by default. Set this to FALSE if you would like to specify a feature subset.

read_cutoff

Minimum number of reads for a feature to be used to fit the dropout model.

dropout_cutoff

Maximum ratio of -s4U:+s4U RPMs for a feature to be used to fit the dropout model (i.e., simple outlier filtering cutoff).

Value

An EZbakRData object with the specified "fractions" table replaced with a dropout corrected table.

Details

Dropout is the disproportionate loss of labeled RNA/reads from said RNA described independently here and here. It can originate from a combination of bioinformatic (loss of high mutation content reads due to alignment problems), technical (loss of labeled RNA during RNA extraction), and biological (transcriptional shutoff in rare cases caused by metabolic label toxicity) sources. CorrectDropout() compares label-fed and label-free controls from the same experimental conditions to estimate and correct for this dropout. It assumes that there is a single number (referred to as the dropout rate, or pdo) which describes the rate at which labeled RNA is lost (relative to unlabeled RNA). pdo ranges from 0 (no dropout) to 1 (complete loss of all labeled RNA), and is thus interpreted as the percentage of labeled RNA/reads from labeled RNA disproportionately lost, relative to the equivalent unlabeled species.

Examples


# Simulate data to analyze
simdata <- EZSimulate(30)

# Create EZbakR input
ezbdo <- EZbakRData(simdata$cB, simdata$metadf)

# Estimate Fractions
ezbdo <- EstimateFractions(ezbdo)
#> Estimating mutation rates
#> Summarizing data for feature(s) of interest
#> Averaging out the nucleotide counts for improved efficiency
#> Estimating fractions
#> Processing output

# Correct for dropout
ezbdo <- CorrectDropout(ezbdo)
#> Estimated rates of dropout are:
#>    sample        pdo
#> 1 sample1 0.26821608
#> 2 sample2 0.01000000
#> 3 sample3 0.13272591
#> 4 sample4 0.01000000
#> 5 sample5 0.01000000
#> 6 sample6 0.06388398