Correct for batch effects in miRNA and gene expression measurements

This function removes unwanted batch effects from miRNA and gene expression matrices. In particular, it fits a linear model to miRNA/gene expression levels and then removes the variability attributable to batch effects. A weighted surrogate variable analysis (WSVA) can also be included to remove the effects of surrogate variables. If batch effects are present, it is crucial to remove them with this function before moving on to correlation analysis.

Usage

batchCorrection(
  mirnaObj,
  assay,
  batch = NULL,
  batch2 = NULL,
  covariates = NULL,
  includeWsva = FALSE,
  n.sv = 1L,
  weight.by.sd = TRUE
)

Arguments

mirnaObj: A MirnaExperiment object containing miRNA and gene data
assay: The expression matrix to correct. It must be either genes or microRNA
batch: It must be the name of a variable present in the colData of a MirnaExperiment object (e.g. "disease"), or, alternatively, a character/factor object that defines batch memberships. See the details section for additional information
batch2: It must be the name of a variable present in the colData of a MirnaExperiment object (e.g. "disease"), or, alternatively, a character/factor object that defines a second series of batches whose effects are additive to those specified in batch. See the details section for additional information
covariates: Additional numeric covariates to correct for. It must be a character vector containing the names of numeric variables present in the colData of a MirnaExperiment object (e.g. c("age", "RIN", "quantity")), or, alternatively, a simple matrix object. See the details section for additional information
includeWsva: Logical, whether to correct for surrogate variables or not. Default is FALSE
n.sv: The number of surrogate variables to estimate
weight.by.sd: Logical, whether to specifically tune the surrogate variables to the more variable genes or not. Default is TRUE

Value

A MirnaExperiment object containing batch effect-corrected expression matrices.

Details

Batch effects are unwanted sources of technical variation that confound expression variability and limit downstream analyses. Since the reliability of the biological conclusions drawn from integrative miRNA-mRNA analyses depends on the association between miRNA and gene expression levels, it is pivotal to ensure that expression measurements are not affected by technical variation. Therefore, if batch effects are noticed in the data, the user should run this function before using the mirnaIntegration() function to perform a correlation analysis.

Usually, given a MirnaExperiment object, the user should specify:

the assay from which to remove batch effects (either genes or microRNA);
the batch variable, which defines the different batches;
the batch2 variable, which can be included to correct for a second series of batches whose effects are additive to those specified in batch;
the covariates variable, which allows correction for one or more continuous numeric effects.

In particular, batch and batch2 could be provided as the names of covariates included in the colData of a MirnaExperiment object. Alternatively, they can be character/factor objects that declare batch memberships. Similarly, covariates can be supplied as a vector containing the names of numeric variables listed in the colData of MirnaExperiment objects, or they can be provided as a simple matrix.
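To make the idea of linear-model batch removal concrete, the following base R sketch fits a batch factor and a numeric covariate to a toy matrix and subtracts their fitted contributions. It is only an illustration written in the spirit of limma::removeBatchEffect(), not the code that MIRit runs; the toy data and the names batchDesign, covDesign, unwanted and corrected are hypothetical.

```r
# Base R sketch of linear-model batch removal (illustrative, not MIRit code)
set.seed(1)

# Hypothetical toy data: 4 features x 6 samples, two batches of three samples
batch <- factor(rep(c("A", "B"), each = 3))
age <- c(34, 51, 47, 29, 62, 55)  # hypothetical numeric covariate
expr <- matrix(rnorm(24), nrow = 4,
               dimnames = list(paste0("feature", 1:4), paste0("s", 1:6)))
expr[, batch == "B"] <- expr[, batch == "B"] + 2  # simulated batch shift

# Design columns for the unwanted effects: sum-to-zero batch contrasts
# plus the mean-centred covariate
batchDesign <- model.matrix(
  ~batch, contrasts.arg = list(batch = "contr.sum")
)[, -1, drop = FALSE]
covDesign <- scale(cbind(age = age), center = TRUE, scale = FALSE)
unwanted <- cbind(batchDesign, covDesign)

# Fit intercept + unwanted effects for all features at once (samples in rows)
fit <- lm.fit(cbind(Intercept = 1, unwanted), t(expr))
beta <- fit$coefficients[-1, , drop = FALSE]

# Subtract the fitted batch and covariate contributions from each feature
corrected <- expr - t(unwanted %*% beta)

# After correction, the two batches share the same mean for every feature
rowMeans(corrected[, batch == "A"]) - rowMeans(corrected[, batch == "B"])
```

In this sketch a second batch factor would simply contribute further sum-to-zero columns to unwanted, which is what makes its effect additive to that of the first.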

Additionally, the influence of unknown sources of technical variation can be removed by including surrogate variables estimated through WSVA. To do so, we can set includeWsva to TRUE, and then we can specify the number of surrogate variables to use through the n.sv parameter. Further, the surrogate variables can be tuned to the more variable genes by setting weight.by.sd to TRUE.

Please note that we only recommend removing batch effects directly from expression measurements prior to correlation analysis. This function should not be used to remove batch effects before differential expression analysis: for that purpose, it is better to include the batch variables in the linear model. In this way, the residual degrees of freedom are not overestimated, so that the calculated standard errors, t-statistics and p-values are not overoptimistic.
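The degrees-of-freedom argument can be seen on a single simulated feature. The base R sketch below compares a model that keeps the batch variable with a model fitted to values from which the batch effect was stripped beforehand. The paired design, the variable names (patient, condition, fitFull, fitNaive) and the data are hypothetical and only serve as an illustration.

```r
set.seed(42)

# Hypothetical paired design: 4 patients (the batch), each measured in
# both conditions
patient <- factor(rep(paste0("p", 1:4), times = 2))
condition <- factor(rep(c("control", "disease"), each = 4))
y <- rnorm(8) + rep(c(0, 1, -1, 2), times = 2)  # one feature with patient shifts

# Recommended for differential expression: keep the batch variable in the model
fitFull <- lm(y ~ condition + patient)

# Discouraged: strip the patient effect first, then model condition alone
yCorrected <- residuals(lm(y ~ patient)) + mean(y)
fitNaive <- lm(yCorrected ~ condition)

# The naive fit does not know that 3 degrees of freedom were already spent
# on the patient effect
df.residual(fitFull)   # 3 (8 samples - 5 coefficients)
df.residual(fitNaive)  # 6 (8 samples - 2 coefficients)

# Same estimated condition effect, but a smaller standard error when naive
summary(fitFull)$coefficients["conditiondisease", ]
summary(fitNaive)$coefficients["conditiondisease", ]
```

In this balanced example both fits return the same condition estimate and the same residual sum of squares, so the smaller standard error of the naive fit comes entirely from dividing by too many residual degrees of freedom.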

Note

To estimate surrogate variables and to remove batch effects from expression data, MIRit uses the limma::wsva() and limma::removeBatchEffect() functions, respectively.

References

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Research, 43(7), e47. doi:10.1093/nar/gkv007.

Author

Jacopo Ronchi, jacopo.ronchi@unimib.it

Examples

# load example MirnaExperiment object
obj <- loadExamples()

# correct batch effects due to the patient from miRNA expression matrix
obj <- batchCorrection(obj, "microRNA", batch = "patient")