EZget
Introduction
This vignette shows how to use the EZget() function provided by EZbakR. When your EZbakRData object contains multiple tables of a particular type, EZget() can greatly simplify extracting the table of interest. As part of this vignette, I will also describe how an EZbakRData object is organized.
EZbakRData objects
Let's first analyze some simulated data to generate an EZbakRData object whose contents we can explore:
library(EZbakR)

# Simulate data
simdata <- EZSimulate(nfeatures = 300, nreps = 2)
# Make initial EZbakRData object
ezbdo <- EZbakRData(simdata$cB, simdata$metadf)
# Estimate fractions twice, without overwriting the first analysis.
# The second run uses a different (hierarchical) model; see the EstimateFractions vignette for details.
ezbdo <- EstimateFractions(ezbdo)
#> Estimating mutation rates
#> Summarizing data for feature(s) of interest
#> Averaging out the nucleotide counts for improved efficiency
#> Estimating fractions
#> Processing output
ezbdo <- EstimateFractions(ezbdo, strategy = 'hierarchical', overwrite = FALSE)
#> Estimating mutation rates
#> Summarizing data for feature(s) of interest
#> Averaging out the nucleotide counts for improved efficiency
#> Estimating fractions
#> FITTING HIERARCHICAL TWO-COMPONENT MIXTURE MODEL:
#> Estimating distribution of feature-specific pnews
#> Estimating fractions with feature-specific pnews
#> Processing output
# Estimate kinetic parameters three times, using two different strategies
# (the default "standard" strategy and "shortfeed"). See the EstimateKinetics vignette for details.
ezbdo <- EstimateKinetics(ezbdo, repeatID = 1)
ezbdo <- EstimateKinetics(ezbdo, repeatID = 1, strategy = "shortfeed")
ezbdo <- EstimateKinetics(ezbdo, repeatID = 2, strategy = "shortfeed")
An EZbakRData object is a list that can contain the following items:
- cB: The cB table you provided upon object creation.
- metadf: The metadf table you provided upon object creation.
- fractions: List of fraction estimates generated by EstimateFractions().
- kinetics: List of kinetic parameter estimates generated by EstimateKinetics().
- averages: List of parameter replicate averages generated by AverageAndRegularize().
- comparisons: List of comparisons of parameter averages, generated by CompareParameters().
- readcounts: List of tables of read counts generated by various EZbakR functions.
- metadata: List with elements corresponding to the lists of tables described above. It describes various features of the tables so that they can be fetched with EZget().
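For intuition, the nested layout can be mimicked with plain R lists. The toy object below is not a real EZbakRData object: apart from the top-level element names listed above and the "feature" table name used later in this vignette, the columns, values, and metadata fields (features, populations, repeatID) are hypothetical placeholders.

```r
# Toy stand-in for an EZbakRData object, built from plain lists.
# Columns, values, and metadata fields are hypothetical placeholders.
toy <- list(
  cB = data.frame(sample = "sample1"),
  metadf = data.frame(sample = "sample1"),
  fractions = list(
    feature = data.frame(sample = "sample1", feature = "Gene1", fraction_highTC = 0.1)
  ),
  metadata = list(
    fractions = list(
      feature = list(features = "feature", populations = "TC", repeatID = 1)
    )
  )
)

names(toy)            # top-level elements
names(toy$fractions)  # tables stored under one type

# In this toy, each table has a metadata entry with the same name
identical(names(toy$fractions), names(toy$metadata$fractions))
```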
Since an EZbakRData object is a list, its elements can be accessed in a few ways:
# `$` notation:
ezbdo$fractions$feature
#> # A tibble: 1,800 × 6
#> sample feature fraction_highTC logit_fraction_highTC se_logit_fraction_hig…¹
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 sample1 Gene1 0.102 -2.17 0.0620
#> 2 sample1 Gene10 0.222 -1.25 0.111
#> 3 sample1 Gene100 0.188 -1.46 0.0719
#> 4 sample1 Gene101 0.184 -1.49 0.0338
#> 5 sample1 Gene102 0.141 -1.81 0.0926
#> 6 sample1 Gene103 0.116 -2.03 0.0871
#> 7 sample1 Gene104 0.152 -1.72 0.0398
#> 8 sample1 Gene105 0.0827 -2.41 0.0875
#> 9 sample1 Gene106 0.177 -1.54 0.0893
#> 10 sample1 Gene107 0.161 -1.65 0.0466
#> # ℹ 1,790 more rows
#> # ℹ abbreviated name: ¹se_logit_fraction_highTC
#> # ℹ 1 more variable: n <int>
# `[[]]` notation
ezbdo[['fractions']][['feature']]
#> # A tibble: 1,800 × 6
#> sample feature fraction_highTC logit_fraction_highTC se_logit_fraction_hig…¹
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 sample1 Gene1 0.102 -2.17 0.0620
#> 2 sample1 Gene10 0.222 -1.25 0.111
#> 3 sample1 Gene100 0.188 -1.46 0.0719
#> 4 sample1 Gene101 0.184 -1.49 0.0338
#> 5 sample1 Gene102 0.141 -1.81 0.0926
#> 6 sample1 Gene103 0.116 -2.03 0.0871
#> 7 sample1 Gene104 0.152 -1.72 0.0398
#> 8 sample1 Gene105 0.0827 -2.41 0.0875
#> 9 sample1 Gene106 0.177 -1.54 0.0893
#> 10 sample1 Gene107 0.161 -1.65 0.0466
#> # ℹ 1,790 more rows
#> # ℹ abbreviated name: ¹se_logit_fraction_highTC
#> # ℹ 1 more variable: n <int>
# `[[]]` notation with numeric indices
ezbdo[[4]][[1]]
#> # A tibble: 1,800 × 6
#> sample feature fraction_highTC logit_fraction_highTC se_logit_fraction_hig…¹
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 sample1 Gene1 0.102 -2.17 0.0620
#> 2 sample1 Gene10 0.222 -1.25 0.111
#> 3 sample1 Gene100 0.188 -1.46 0.0719
#> 4 sample1 Gene101 0.184 -1.49 0.0338
#> 5 sample1 Gene102 0.141 -1.81 0.0926
#> 6 sample1 Gene103 0.116 -2.03 0.0871
#> 7 sample1 Gene104 0.152 -1.72 0.0398
#> 8 sample1 Gene105 0.0827 -2.41 0.0875
#> 9 sample1 Gene106 0.177 -1.54 0.0893
#> 10 sample1 Gene107 0.161 -1.65 0.0466
#> # ℹ 1,790 more rows
#> # ℹ abbreviated name: ¹se_logit_fraction_highTC
#> # ℹ 1 more variable: n <int>
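All three calls return the same table, but the numeric version works only because fractions happens to be the fourth element of this particular object. The sketch below uses toy lists rather than real EZbakRData objects, and the "extra" element is hypothetical; it shows that name-based access still works when an additional element is present, whereas a fixed numeric index ends up pointing at something else.

```r
# Two toy lists holding the same "fractions" table; the second has a
# hypothetical "extra" element inserted before it.
fxn <- data.frame(sample = "sample1", feature = "Gene1", fraction_highTC = 0.1)
a <- list(cB = data.frame(), metadf = data.frame(),
          fractions = list(feature = fxn))
b <- list(cB = data.frame(), metadf = data.frame(), extra = list(),
          fractions = list(feature = fxn))

# Name-based access finds the table in both lists
identical(a$fractions$feature, b[["fractions"]][["feature"]])  # TRUE

# The same numeric index refers to different elements
names(a)[3]  # "fractions"
names(b)[3]  # "extra"
```

Name-based access (or EZget()) is therefore the more robust choice in scripts that are reused across analyses.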
Using EZget
EZget() provides an alternative strategy for getting a particular table. It has two required arguments:
- obj: The EZbakRData object you would like to get a table from.
- type: The type of table you are looking for. Options are “fractions”, “kinetics”, “readcounts”, “averages”, and “comparisons”, corresponding to the lists of tables described above.
Most of the remaining parameters are search criteria that you specify. The full list can be seen in the function documentation (?EZget). Most of these accept strings (or, for the features and populations arguments, vectors of strings) as input, and EZget() checks each table's metadata to see whether the provided string is contained in the respective metadata slot. For example, we can extract the kinetics table generated by the standard analysis like so:
kinetics <- EZget(ezbdo,
type = "kinetics",
kstrat = "standard")
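Conceptually, a search like this filters the tables of one type by their metadata. The base-R sketch below models that idea with a hypothetical helper, find_tables(), and made-up table names and metadata fields. Treating "contained in" as %in% membership is an assumption, and this is not EZget()'s actual implementation.

```r
# Hypothetical metadata for three kinetics tables (names and fields made up)
kin_meta <- list(
  kinetics_A = list(kstrat = "standard",  repeatID = 1),
  kinetics_B = list(kstrat = "shortfeed", repeatID = 1),
  kinetics_C = list(kstrat = "shortfeed", repeatID = 2)
)

# Return the names of tables whose metadata contain every supplied criterion
find_tables <- function(meta, ...) {
  criteria <- list(...)
  keep <- vapply(meta, function(m) {
    all(vapply(names(criteria), function(field) {
      isTRUE(all(criteria[[field]] %in% m[[field]]))
    }, logical(1)))
  }, logical(1))
  names(meta)[keep]
}

find_tables(kin_meta, kstrat = "standard")                # "kinetics_A"
# Two matches: EZget() with returnNameOnly = FALSE would error here
find_tables(kin_meta, kstrat = "shortfeed")
find_tables(kin_meta, kstrat = "shortfeed", repeatID = 2)  # "kinetics_C"
```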
In some cases, multiple tables with exactly the same metadata exist. For example, the metadata for fractions tables consist of:
- The feature columns by which reads were grouped. This is “feature” for both of our fractions tables.
- The mutational populations analyzed. This is “TC” for both of our fractions tables.
- The fraction_design table used. For both of our fractions tables, this is the standard fraction_design for an analysis of a single mutation type.
Since we set overwrite = FALSE in our second run of EstimateFractions(), both tables were saved. What distinguishes them is a final piece of metadata saved for all tables: repeatID. This is a numerical ID that distinguishes multiple instances of the same table. The ID is 1 for the first such table created, 2 for the second, and so on. Thus, the analysis with the standard mixture model has a repeatID of 1, and the analysis with the hierarchical mixture model has a repeatID of 2. We can therefore access the latter as follows:
h_fxn <- EZget(ezbdo,
type = 'fractions',
repeatID = 2)
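The numbering scheme can be sketched with a toy registry of metadata entries. The helper add_table() and the metadata fields are hypothetical; the sketch only models the overwrite = FALSE behavior described above, where a table whose metadata duplicate an existing entry is kept under the next repeatID rather than replacing it.

```r
# Hypothetical helper: append a table's metadata to a registry, assigning
# repeatID = (number of existing entries with identical metadata) + 1
add_table <- function(registry, meta) {
  n_same <- sum(vapply(registry, function(m) {
    identical(m[names(m) != "repeatID"], meta)
  }, logical(1)))
  c(registry, list(c(meta, list(repeatID = n_same + 1L))))
}

fxn_meta <- list(features = "feature", populations = "TC")
registry <- list()
registry <- add_table(registry, fxn_meta)  # first run
registry <- add_table(registry, fxn_meta)  # second run, identical metadata

vapply(registry, function(m) m$repeatID, integer(1))  # 1 2
```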
Three additional parameters tune EZget()'s behavior:
- returnNameOnly: If TRUE, only the names of the tables consistent with your search criteria are returned. In this case, a warning (but not an error) is thrown if more than one table passes your criteria. If returnNameOnly is FALSE, an error is thrown if more than one table matches your search criteria.
- exactMatch: The features and populations arguments are the only two arguments that can be vectors of strings. Setting exactMatch to TRUE requires the provided features and populations vectors to exactly match those in a table's metadata for that table to be returned. With the alternative (default) behavior, the provided feature(s) and population(s) only have to all be contained in a table's metadata.
- alwaysCheck: If only a single table of the relevant type is present in your EZbakRData object, EZget() automatically returns that table without checking whether the search criteria match. If you set alwaysCheck to TRUE, the table is searched for as normal and is only returned if its metadata match the search criteria.
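The three flags can be summarized with a simplified model. In the sketch below, matches() expresses the default rule as a subset check (all(query %in% meta)) and exactMatch = TRUE as set equality (setequal()), while toy_get() follows the control flow described above but returns table names in place of tables. The helpers, table names, and feature values are all hypothetical; this is an illustration under those assumptions, not EZget()'s source.

```r
# Default rule: every queried value must appear in the table's metadata.
# exactMatch = TRUE: query and metadata must hold the same set of values.
matches <- function(query, meta, exactMatch = FALSE) {
  if (exactMatch) setequal(query, meta) else all(query %in% meta)
}

# Hypothetical "features" metadata for two tables of the same type
meta <- list(
  by_gene      = "gene",
  by_gene_exon = c("gene", "exon")
)

# Simplified model of EZget()'s flags; returns table names, not tables
toy_get <- function(meta, features, exactMatch = FALSE,
                    returnNameOnly = FALSE, alwaysCheck = FALSE) {
  # A lone table is returned without checking, unless alwaysCheck = TRUE
  if (length(meta) == 1 && !alwaysCheck) return(names(meta))
  ok <- vapply(meta, function(m) matches(features, m, exactMatch), logical(1))
  hits <- names(meta)[ok]
  if (length(hits) > 1) {
    msg <- "More than one table matches the search criteria"
    if (returnNameOnly) warning(msg) else stop(msg)
  }
  hits
}

tryCatch(toy_get(meta, "gene"), error = conditionMessage)       # two matches: error
toy_get(meta, "gene", exactMatch = TRUE)                        # "by_gene"
toy_get(meta, c("gene", "exon"))                                # "by_gene_exon"
suppressWarnings(toy_get(meta, "gene", returnNameOnly = TRUE))  # both names
toy_get(meta["by_gene"], "exon")                                # "by_gene" (unchecked)
toy_get(meta["by_gene"], "exon", alwaysCheck = TRUE)            # character(0)
```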