Simulation of transcript isoform kinetic parameters.

SimulateIsoforms() performs a simple simulation of isoform-specific kinetic parameters to showcase and test EstimateIsoformFractions(). It assumes that a set of reads (a fraction of the total, set by the funique parameter) maps uniquely to a given isoform, while the rest are ambiguous among all isoforms of that gene. The mutational content of these reads is simulated as in SimulateOneRep().
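
The read-level model described above can be illustrated with a minimal base R sketch. This is not the package's implementation: it assumes binomial draws for the U content and the T-to-C mutation counts, which is one natural reading of the pnew, pold, and Ucont descriptions. The names nreads, fn, nT, TC, is_new, and is_unique are hypothetical and chosen only for illustration.

```r
set.seed(42)

# Hypothetical settings; the values mirror the documented defaults
nreads     <- 1000   # reads for one illustrative isoform (assumption)
readlength <- 200
Ucont      <- 0.25
pnew       <- 0.1
pold       <- 0.002
funique    <- 0.2
fn         <- 0.5    # assumed fraction of reads that come from new RNA

# Number of Us (sequenced as Ts) in each read
nT <- rbinom(nreads, size = readlength, prob = Ucont)

# Whether each read derives from new (labeled) RNA
is_new <- rbinom(nreads, size = 1, prob = fn) == 1

# T-to-C mutation counts; the per-T rate depends on new vs. old status
TC <- rbinom(nreads, size = nT, prob = ifelse(is_new, pnew, pold))

# Whether each read maps uniquely to its isoform or is ambiguous
is_unique <- rbinom(nreads, size = 1, prob = funique) == 1

reads <- data.frame(nT = nT, TC = TC, is_new = is_new, is_unique = is_unique)
head(reads)
```

In this sketch, roughly 20% of reads are flagged as uniquely mapping, and new reads carry far more T-to-C mutations than old reads because pnew is much larger than pold.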

Usage

SimulateIsoforms(
  nfeatures,
  nt = NULL,
  seqdepth = nfeatures * 2500,
  label_time = 4,
  sample_name = "sampleA",
  feature_prefix = "Gene",
  pnew = 0.1,
  pold = 0.002,
  funique = 0.2,
  readlength = 200,
  Ucont = 0.25,
  avg_numiso = 2,
  psynthdiff = 0.5,
  logkdeg_mean = -1.9,
  logkdeg_sd = 0.7,
  logksyn_mean = 2.3,
  logksyn_sd = 0.7
)

Arguments

nfeatures: Number of "features" to simulate data for. Each feature will have a simulated number of transcript isoforms.
nt: Optional vector giving the number of isoforms to simulate for each of the nfeatures features. The vector can either be of length 1, in which case that many isoforms will be simulated for every feature, or of length equal to nfeatures.
seqdepth: Total number of sequencing reads to simulate.
label_time: Length of s4U feed to simulate.
sample_name: Character vector to assign to sample column of output simulated data table (the cB table).
feature_prefix: Name given to the i-th feature is paste0(feature_prefix, i). Shows up in the feature column of the output simulated data table.
pnew: Probability that a T is mutated to a C if a read is new.
pold: Probability that a T is mutated to a C if a read is old.
funique: Fraction of reads that uniquely "map" to a single isoform.
readlength: Length of simulated reads. In this simple simulation, all reads are simulated as being exactly this length.
Ucont: Probability that a nucleotide in a simulated read is a U.
avg_numiso: Average number of isoforms for each feature. Feature-specific isoform counts are drawn from a Poisson distribution with this average. NOTE: to ensure that all features have multiple isoforms, the number of isoforms drawn from the Poisson distribution is incremented by 2. Thus, the actual average number of isoforms for each feature is avg_numiso + 2.
psynthdiff: Fraction of genes (between 0 and 1) for which all isoform abundance differences are synthesis driven. For the remaining genes, isoform abundance differences are driven by differences in isoform kdegs.
logkdeg_mean: meanlog parameter of the log-normal distribution from which kdegs are simulated.
logkdeg_sd: sdlog parameter of the log-normal distribution from which kdegs are simulated.
logksyn_mean: meanlog parameter of the log-normal distribution from which ksyns are simulated.
logksyn_sd: sdlog parameter of the log-normal distribution from which ksyns are simulated.
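
To make the roles of avg_numiso and the four log-normal arguments concrete, here is a minimal base R sketch using the documented defaults. It is an illustration, not the function's internal code. It draws kdeg and ksyn independently for every isoform, so it ignores the psynthdiff logic entirely. The relationships fraction_new = 1 - exp(-kdeg * label_time) and abundance = ksyn / kdeg are the usual steady-state assumptions for this kind of experiment and are not confirmed by this page. All object names are hypothetical.

```r
set.seed(1)

nfeatures  <- 30
avg_numiso <- 2
label_time <- 4

# Isoform counts: a Poisson draw incremented by 2, as described for avg_numiso
numiso <- rpois(nfeatures, lambda = avg_numiso) + 2

# One kdeg and one ksyn per isoform, drawn from log-normal distributions
n_iso <- sum(numiso)
kdeg  <- rlnorm(n_iso, meanlog = -1.9, sdlog = 0.7)
ksyn  <- rlnorm(n_iso, meanlog = 2.3, sdlog = 0.7)

# Assumed steady-state relationships (not taken from the package source)
fraction_new <- 1 - exp(-kdeg * label_time)
abundance    <- ksyn / kdeg

isoforms <- data.frame(
  feature      = rep(paste0("Gene", seq_len(nfeatures)), times = numiso),
  kdeg         = kdeg,
  ksyn         = ksyn,
  fraction_new = fraction_new,
  abundance    = abundance
)

head(isoforms)
mean(numiso)  # expected to be close to avg_numiso + 2
```

Because of the increment, every feature in this sketch has at least two isoforms, matching the note on avg_numiso.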

Examples

simdata <- SimulateIsoforms(30)