Simulate one replicate of multi-label NR-seq data
Generalizes SimulateOneRep() to simulate any combination of mutation types. Currently, no kinetic model is used to relate kinetic parameters to the fraction of reads belonging to each simulated mutational population. Instead, these fractions are drawn from a Dirichlet distribution with gene-specific parameters.
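The Dirichlet step can be illustrated with a minimal base-R sketch. It assumes, hypothetically, that each feature's alpha vector is drawn uniformly between alpha_min and alpha_max. The helper name simulate_fractions is illustrative only and is not an EZbakR function. It uses the standard construction of a Dirichlet draw: independent Gamma(alpha_j, 1) variables divided by their sum. The result has the same shape as fractions_matrix, with one row per feature, one column per population, and rows summing to 1.

```r
# Hypothetical helper (not part of EZbakR): draw one row of population
# fractions per feature from a Dirichlet distribution with feature-specific
# concentration parameters.
# ASSUMPTION: each alpha element is drawn uniformly in [alpha_min, alpha_max].
simulate_fractions <- function(nfeatures, npops, alpha_min = 3, alpha_max = 6) {
  fractions <- matrix(0, nrow = nfeatures, ncol = npops)
  for (i in seq_len(nfeatures)) {
    # Feature-specific Dirichlet parameters
    alpha <- stats::runif(npops, min = alpha_min, max = alpha_max)
    # A Dirichlet draw is a vector of independent Gamma(alpha_j, 1) draws
    # normalized so that the elements sum to 1
    g <- stats::rgamma(npops, shape = alpha, rate = 1)
    fractions[i, ] <- g / sum(g)
  }
  fractions
}

set.seed(42)
fr <- simulate_fractions(nfeatures = 5, npops = 4)
rowSums(fr)  # each row sums to 1
```

Drawing alpha separately for each feature is what makes the parameters gene-specific in this sketch. A single shared alpha would give every feature the same expected fractions.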
Usage
SimulateMultiLabel(
nfeatures,
populations = c("TC"),
fraction_design = create_fraction_design(populations),
fractions_matrix = NULL,
read_vect = NULL,
sample_name = "sampleA",
feature_prefix = "Gene",
kdeg_vect = NULL,
ksyn_vect = NULL,
logkdeg_mean = -1.9,
logkdeg_sd = 0.7,
logksyn_mean = 2.3,
logksyn_sd = 0.7,
phighs = stats::setNames(rep(0.05, times = length(populations)), populations),
plows = stats::setNames(rep(0.002, times = length(populations)), populations),
seqdepth = nfeatures * 2500,
readlength = 200,
alpha_min = 3,
alpha_max = 6,
Ucont = 0.25,
Acont = 0.25,
Gcont = 0.25,
Ccont = 0.25
)
Arguments
- nfeatures
Number of "features" (e.g., genes) to simulate data for
- populations
Character vector of the mutation types to simulate (e.g., "TC" for T-to-C mutations).
- fraction_design
Fraction design matrix, specifying which potential mutational populations should actually exist. See ?EstimateFractions for more details.
- fractions_matrix
Matrix of fractions of each mutational population to simulate. If not provided, this will be simulated. One row for each feature, one column for each mutational population, rows should sum to 1.
- read_vect
Vector of length =
nfeatures; specifies the number of reads to be simulated for each feature. If this is not provided, the number of reads simulated is equal toround(seqdepth * (ksyn_i/kdeg_i)/sum(ksyn/kdeg)). In other words, the normalized steady-state abundance of a feature is multiplied by the total number of reads to be simulated and rounded to the nearest integer.- sample_name
Character vector to assign to the sample column of the output simulated data table (the cB table).
- feature_prefix
The i-th feature is named paste0(feature_prefix, i). This name appears in the feature column of the output simulated data table.
- kdeg_vect
Vector of length = nfeatures; specifies the degradation rate constant to use for each feature's simulation. If this is not provided and fn_vect is, then kdeg_vect = -log(1 - fn_vect)/label_time. If neither kdeg_vect nor fn_vect is provided, each feature's kdeg_vect value is drawn from a log-normal distribution with meanlog = logkdeg_mean and sdlog = logkdeg_sd. kdeg_vect is only simulated when read_vect is also not provided, because it is then used to simulate read counts as described above.
- ksyn_vect
Vector of length = nfeatures; specifies the synthesis rate constant to use for each feature's simulation. If neither this nor read_vect is provided, each feature's ksyn_vect value is drawn from a log-normal distribution with meanlog = logksyn_mean and sdlog = logksyn_sd. ksyn values do not need to be simulated if read_vect is provided, as they only influence read counts.
- logkdeg_mean
If necessary, meanlog of a log-normal distribution from which kdegs are simulated
- logkdeg_sd
If necessary, sdlog of a log-normal distribution from which kdegs are simulated
- logksyn_mean
If necessary, meanlog of a log-normal distribution from which ksyns are simulated
- logksyn_sd
If necessary, sdlog of a log-normal distribution from which ksyns are simulated
- phighs
Named vector of per-nucleotide mutation probabilities in labeled reads, with one element for each mutation type in populations. Names should be the corresponding populations.
- plows
Named vector of per-nucleotide mutation probabilities in unlabeled reads, with one element for each mutation type in populations. Names should be the corresponding populations.
- seqdepth
Only relevant if read_vect is not provided; in that case, this is the total number of reads to simulate.
- readlength
Length of simulated reads. In this simple simulation, all reads are simulated as being exactly this length.
- alpha_min
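Minimum possible value of an element of the alpha (concentration) parameter vector of the Dirichlet distribution from which population fractions are drawn.
- alpha_max
Maximum possible value of an element of the alpha (concentration) parameter vector of the Dirichlet distribution from which population fractions are drawn.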
- Ucont
Probability that a nucleotide in a simulated read is a U.
- Acont
Probability that a nucleotide in a simulated read is an A.
- Gcont
Probability that a nucleotide in a simulated read is a G.
- Ccont
Probability that a nucleotide in a simulated read is a C.
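The default read-count rule described under read_vect can be sketched in base R. This mirrors the documented formula only and is not EZbakR's internal code. It draws kdeg and ksyn with stats::rlnorm() using the default logkdeg_mean, logkdeg_sd, logksyn_mean, and logksyn_sd values from the usage above.

```r
# Sketch of the documented default for read_vect: each feature's share of
# seqdepth is proportional to its steady-state abundance ksyn / kdeg.
nfeatures <- 3
seqdepth <- nfeatures * 2500

set.seed(1)
# kdeg and ksyn drawn from log-normal distributions using the default
# meanlog/sdlog values shown in the usage
kdeg_vect <- stats::rlnorm(nfeatures, meanlog = -1.9, sdlog = 0.7)
ksyn_vect <- stats::rlnorm(nfeatures, meanlog = 2.3, sdlog = 0.7)

# Normalized steady-state abundance, scaled by total reads and rounded
abundance <- ksyn_vect / kdeg_vect
read_vect <- round(seqdepth * abundance / sum(abundance))
read_vect
sum(read_vect)  # close to seqdepth; may differ slightly due to rounding
```

Because each feature's count is rounded independently, the total can differ from seqdepth by at most half a read per feature.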
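The following sketch shows one way phighs, plows, readlength, and Ucont could combine to generate mutation counts for a single "TC" population. It is a hypothetical illustration, not EZbakR's implementation. The helper simulate_tc_reads and its frac_labeled argument are made up. The binomial model for both the number of U positions per read and the number of mutations among them is an assumption.

```r
# Hypothetical sketch (not EZbakR's code): simulate T-to-C mutation counts
# for the reads of one feature.
# ASSUMPTIONS: the number of U positions per read is
# Binomial(readlength, Ucont), and the number of mutations is
# Binomial(number of U positions, p), with p = phigh for labeled reads
# and p = plow for unlabeled reads.
simulate_tc_reads <- function(nreads, frac_labeled,
                              readlength = 200, Ucont = 0.25,
                              phigh = 0.05, plow = 0.002) {
  # Assign each read to the labeled or unlabeled population
  labeled <- stats::runif(nreads) < frac_labeled
  # Number of U (T in the sequenced read) positions in each read
  nT <- stats::rbinom(nreads, size = readlength, prob = Ucont)
  # Per-read mutation probability depends on label status
  p <- ifelse(labeled, phigh, plow)
  TC <- stats::rbinom(nreads, size = nT, prob = p)
  data.frame(labeled = labeled, nT = nT, TC = TC)
}

set.seed(7)
reads <- simulate_tc_reads(nreads = 1000, frac_labeled = 0.3)
# Labeled reads should carry many more T-to-C mutations on average
tapply(reads$TC, reads$labeled, mean)
```

With the default values, a labeled read carries roughly 200 * 0.25 * 0.05 = 2.5 mutations on average, versus about 0.1 for an unlabeled read. That separation is what makes the two populations distinguishable.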
Examples
simdata <- SimulateMultiLabel(3)
#> Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if
#> `.name_repair` is omitted as of tibble 2.0.0.
#> ℹ Using compatibility `.name_repair`.
#> ℹ The deprecated feature was likely used in the EZbakR package.
#> Please report the issue to the authors.