Generalized dynamical systems modeling
EZDynamics.Rd
EZDynamics()
estimates parameters of a user-specified dynamical systems
model. The dynamical system model is specified through an adjacency matrix,
which is an NxN matrix described below (see graph
documentation). Modeling
can either be done for species all assayed in each sample, or species that
are assayed across a set of independent samples (e.g., subcellular fractionation
involves assaying different species in different samples).
Usage
EZDynamics(
obj,
graph,
sub_features,
grouping_features,
scale_factors = NULL,
sample_feature = NULL,
modeled_to_measured = NULL,
parameter_names = paste0("k", 1:max(graph)),
unassigned_name = "__no_feature",
type = "averages",
features = NULL,
populations = NULL,
fraction_design = NULL,
parameter = NULL,
repeatID = NULL,
exactMatch = TRUE,
feature_lengths = NULL,
overwrite = TRUE
)
Arguments
- obj
Currently must be an EZbakRData object on which
AverageAndRegularize
has been run. In the future, will also support (in case where all species are assayed in every sample) providing output of justEstimateFractions()
as input, acting as a generalization ofEstimateKinetics()
in that case.- graph
An NxN adjacency matrix, where N represents the number of species being modeled. One of these species must be called "0" and represent the "no RNA" species. This is the species from which some species are synthesized (e.g., 0 -> P, means premature RNA is synthesized from no RNA), and the species to which some species are degraded (e.g., M -> 0 means mature RNA is converted to "no RNA" via degradation). The rows and columns of this matrix must be the names of all modled species, and rownames(graph) == colnames(graph). Entry i,j of the matrix is either 0 if species i cannot be converted into species j under your model, and an integer from 1:npars (where npars = total number of parameters to be estimated) if it can.
For example, the model 0 -> P -> M -> 0 would have the
graph
:matrix(c(0, 1, 0, 0, 0, 2, 3, 0, 0), nrow = 3, ncol = 3, byrow = TRUE
.- sub_features
Which feature columns distinguish between the different measured species? Note, the measured species need not have the same name, and may not be directly equivalent to, the modeled species. The relationship between the modeled species in
graph
andsub_features
needs to be specified inmodeled_to_measured
if the names are not equivalent though.- grouping_features
Which features are the overarching feature assignments by which
sub_features
should be grouped? This will usually be the feature columns corresponding to full-gene assignments, as well as any higher order assignments (e.g., chromosome). Asub_feature
can be included ingrouping_features
if it never has the value ofunassigned_name
("__no_feature" by default). Only onesub_feature
should ever fulfill this criterion though.- scale_factors
Data frame mapping samples to factors by which to multiply read counts so as ensure proper normalization between different RNA populations. Only relevant if you are modeling relationships between distinct RNA populations, for example RNA from nuclear and cytoplasmic fractions. Will eventually be inferred automatically.
- sample_feature
If different samples involve assaying different species, this must be the name of the metadf column that distinguishes the different classes of samples. For example, if analyzing a subcellular fractionation dataset, you likely included a column in your metadf called "compartment". This would then be your
sample_feature
, assuming you ranAverageAndRegularize()
, with amean_formula
that included compartment as a term.- modeled_to_measured
If
sub_features
is not identical to the non "0" species names ingraph
, then you must specify the relationship betweensub_features
,sample_feature
(if specified), and the species ingraph
. This is done through a list of formulas, whose left side is asub_feature
and whose right side is a function of species ingraph
. Ifsample_feature
is not specified, thenmodeled_to_measured
should be a nameless list of formulae. Ifsample_feature
is specified, thenmodeled_to_measured
must be a named list of formulas where the names correspond to unique values ofsample_feature
. In this latter case, the elements, should be the mapping of measured features (sub_features
) to modeled species (names ingraph
that aren't "0").For example, if your model is 0 -> P -> M -> 0, where P stands for premature RNA and M stands for mature RNA, and you have a column called GF that corresponds to assignment anywhere in the gene, and XF that corresponds to assignment of exclusively exon mapping reads, then your
modeled_to_measured
should belist(GF ~ P + M, XF ~ M).
.As another example, if your model is 0 -> N -> C -> 0, where N stands for "nuclear RNA" and C stands for "cytoplasmic RNA", and your
sample_feature
takes on values of "nuclear", "cytoplasmic", and "total", and you have a singlesub_feature
called XF, then yourmodeled_to_measured
shouild belist(nuclear = GF ~ N, cytoplasmic = GF ~ C, total = GF ~ C + N)
. This is interpreted as meaning in groups of samples for whichsample_feature
== "nuclear", reads assigned to a GF are from the N species (nuclear RNA). Whensample_feature
== "cytoplasmic", reads assigned to GF correspond to the C species (cytoplasmic RNA). Whensample_feature
== "total", reads assigned to GF correspond to a combination of N and C (nuclear and cytoplasmic RNA).- parameter_names
Vector of names you would like to give to the estimated parameters. ith element should correspond to name of parameter given the ID i in
graph
. By default, this is just ki, where i is this numerical index.- unassigned_name
What value will a
sub_feature
column have if a read was not assigned to said feature? "__no_feature" by default.- type
What type of table would you like to use? Currently only supports "averages", but will support "fractions" in the near future.
- features
Character vector of the set of features you want to stratify reads by and estimate proportions of each RNA population. The default of "all" will use all feature columns in the
obj
's cB.- populations
Mutational populations that were analyzed to generate the fractions table to use. For example, this would be "TC" for a standard s4U-based nucleotide recoding experiment.
- fraction_design
"Design matrix" specifying which RNA populations exist in your samples. By default, this will be created automatically and will assume that all combinations of the
mutrate_populations
you have requested to analyze are present in your data. If this is not the case for your data, then you will have to create one manually. See docs forEstimateFractions
(run ?EstimateFractions()) for more details.- parameter
Parameter to average across replicates of a given condition. Has to be "logit_fraction_high
", where is the type of mutation modeled in EstimateFractions()
(e.g, TC) in this case.- repeatID
If multiple
fractions
tables exist with the same metadata, then this is the numerical index by which they are distinguished.- exactMatch
If TRUE, then
features
andpopulations
have to exactly match those for a given fractions table for that table to be used. Means that you can't specify a subset of features or populations by default, since this is TRUE by default.- feature_lengths
Table of effective lengths for each feature combination in your data. For example, if your analysis includes features named GF and XF, this should be a data frame with columns GF, XF, and length.
- overwrite
If TRUE and a fractions estimate output already exists that would possess the same metadata (features analyzed, populations analyzed, and fraction_design), then it will get overwritten with the new output. Else, it will be saved as a separate output with the same name + "_#" where "#" is a numerical ID to distinguish the similar outputs.
Details
When running AverageAndRegularize()
to produce input for EZDynamics()
, you
must set parameter
to "logit_fraction_highEstimateFractions()
, e.g., TC), and you must include the label time
in your regression formula. EZDynamics()
models the logit(fraction high <muttype),
and this will depend on the label time (longer label time = higher fraction), which
is why these two conditions must be met.