Integrate microRNA and gene expression — mirnaIntegration • MIRit

This function identifies microRNAs that are significantly associated or correlated with their targets. Because the main biological role of miRNAs is to negatively regulate gene expression post-transcriptionally, the expression of a microRNA should be negatively correlated with the expression of its targets. For matched-sample (paired) data, this function tests that assumption through a correlation analysis. For unpaired data, it offers several one-sided association tests to estimate whether targets of down-regulated miRNAs are enriched in up-regulated genes, and vice versa. Also for unpaired data, miRNA effects on target gene expression can be quantified through a fast approximation to rotation gene-set testing (the 'fry' method). Correlation analyses use Spearman's correlation by default, whereas association tests default to a one-sided Boschloo's exact test. See the details section for further information.

Usage

mirnaIntegration(
  mirnaObj,
  test = "auto",
  pCutoff = 0.05,
  pAdjustment = "fdr",
  corMethod = "spearman",
  corCutoff = 0.5,
  associationMethod = "boschloo",
  nuisanceParam = 100,
  BPPARAM = bpparam()
)

Arguments

mirnaObj: A MirnaExperiment object containing miRNA and gene data
test: The statistical test used to evaluate the association between miRNAs and genes. It must be one of: auto (default), to automatically determine the appropriate statistical test; correlation, to perform a correlation analysis; association, to perform a one-sided association test; fry, to perform the integrative analysis through rotation gene-set testing
pCutoff: The adjusted p-value cutoff to use for statistical significance. The default value is 0.05
pAdjustment: The p-value correction method for multiple testing. It must be one of: fdr (default), BH, none, holm, hochberg, hommel, bonferroni, BY
corMethod: The correlation method to be used for correlation analysis. It must be one of: spearman (default), pearson, kendall. See the details section for further information
corCutoff: The minimum (negative) value of the correlation coefficient required to consider a miRNA-target relationship meaningful. Default is 0.5
associationMethod: The statistical test used for evaluating the association between miRNAs and their targets for unpaired data. It must be one of boschloo (default), to perform a one-sided Boschloo's exact test; fisher-midp, to compute a one-sided Fisher's exact test with Lancaster's mid-p correction; fisher, to perform a one-sided Fisher's exact test
nuisanceParam: The number of nuisance parameter values considered for p-value calculation by the boschloo method. The higher this value, the more accurate the p-value estimate. Default is 100
BPPARAM: The desired parallel computing behavior. This parameter defaults to BiocParallel::bpparam(), but it can be changed. See BiocParallel::bpparam() for information on parallel computing in R
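
The method names accepted by pAdjustment coincide with those of base R's p.adjust(). As a minimal sketch, assuming the correction behaves like p.adjust() (an assumption, not something this page confirms), the following shows how adjusted p-values relate to pCutoff. The p-values and variable names are made up for illustration.

```r
# Hypothetical raw p-values for five miRNA-target pairs (illustrative only)
rawP <- c(0.001, 0.008, 0.02, 0.045, 0.3)

# In base R's p.adjust(), "fdr" is an alias of "BH" (Benjamini-Hochberg)
adjFdr <- p.adjust(rawP, method = "fdr")
adjBH <- p.adjust(rawP, method = "BH")

# Bonferroni is more conservative: its adjusted values are never smaller
adjBonf <- p.adjust(rawP, method = "bonferroni")

# Pairs that would pass an adjusted p-value cutoff of 0.05
pCutoff <- 0.05
significant <- adjFdr <= pCutoff
print(data.frame(rawP, adjFdr, adjBonf, significant))
```

Since fdr and BH name the same procedure, choosing one or the other should not change the results.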

Value

A MirnaExperiment object containing integration results. To access these results, the user can make use of the integration() function. For additional details on how to interpret the results of miRNA-gene integrative analysis, please see MirnaExperiment.

Details

As noted above, if miRNA and gene expression data derive from the same samples, a correlation analysis is used. The default method for evaluating these relationships is Spearman's correlation coefficient, because:

it does not need normally distributed data;
it does not assume linearity;
it is much more resistant to outliers than Pearson's correlation.

The user can also choose other correlation methods, such as Pearson's and Kendall's correlation. Note that NGS data may contain a certain number of ties in the expression values. The spearman method handles them by computing a tie-corrected version of Spearman's coefficient. Another rank correlation method suitable for tied data is Kendall's tau-b, available with kendall.
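
To illustrate how the three coefficients behave on tied data, here is a small sketch that uses base R's cor() rather than MIRit itself. The expression values are invented, and the statement that cor(method = "kendall") returns the tau-b variant reflects our understanding of base R, not something stated on this page.

```r
# Hypothetical expression values containing ties (illustrative only)
mirnaExpr <- c(5, 5, 8, 12, 12, 15, 20, 20)
geneExpr <- c(90, 85, 85, 60, 55, 55, 30, 10)

# The three coefficients selectable through corMethod
rhoSpearman <- cor(mirnaExpr, geneExpr, method = "spearman")
tauKendall <- cor(mirnaExpr, geneExpr, method = "kendall")
rPearson <- cor(mirnaExpr, geneExpr, method = "pearson")

# With ties, base R computes Spearman's rho as Pearson's correlation
# on average ranks, which is the tie-corrected form
rhoFromRanks <- cor(rank(mirnaExpr), rank(geneExpr))

print(c(spearman = rhoSpearman, kendall = tauKendall, pearson = rPearson))
```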

Regarding correlation direction, since miRNAs mainly act as negative regulators, only negatively correlated miRNA-target pairs are evaluated, and statistical significance is calculated through a one-tailed t-test.
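
The one-tailed test described above can be sketched with base R's cor.test(). This is not MIRit's internal code: the data are invented, and the use of exact = FALSE to request the asymptotic t approximation is our choice for the illustration.

```r
# Hypothetical paired expression values for one miRNA and one target
mirnaExpr <- c(2.1, 3.4, 4.0, 5.2, 6.8, 7.1, 8.3, 9.0)
targetExpr <- c(9.5, 8.7, 8.9, 7.2, 6.1, 6.4, 4.8, 4.0)

# alternative = "less" tests the one-sided hypothesis rho < 0;
# exact = FALSE requests the asymptotic t approximation
res <- cor.test(mirnaExpr, targetExpr, method = "spearman",
                alternative = "less", exact = FALSE)

# The same p-value computed by hand from the t statistic
n <- length(mirnaExpr)
rho <- unname(res$estimate)
tStat <- rho * sqrt((n - 2) / (1 - rho^2))
pManual <- pt(tStat, df = n - 2)

print(c(rho = rho, pValue = res$p.value, pManual = pManual))
```

Because only negative correlations are of interest, the lower tail of the t distribution is used, so a strongly positive coefficient would yield a p-value close to 1.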

Please note that if strong batch effects are present in the expression data, it is recommended to remove them through the batchCorrection() function implemented in MIRit.

If gene expression data and miRNA expression data derive from different samples (unpaired data), a correlation analysis cannot be performed. In these cases, one-sided association tests can be applied to evaluate whether targets of down-regulated miRNAs are statistically enriched in up-regulated genes and, conversely, whether targets of up-regulated miRNAs are statistically enriched in down-regulated genes. Fisher's exact test can be used to assess the statistical significance of this inverse association. Lancaster's mid-p adjustment can also be applied, since it has been shown to increase statistical power while retaining Type I error rates. However, Fisher's exact test is a conditional test that requires both the row and column sums of the contingency table to be fixed. This does not hold for genomic data, because different datasets are likely to yield different numbers of DEGs. Therefore, MIRit defaults to Boschloo's exact test, a variant of Barnard's exact test that is suitable when the group sizes of contingency tables are variable. Moreover, Boschloo's test can be shown to be uniformly more powerful than Fisher's exact test.
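
As a rough sketch of the association tests, the example below builds a hypothetical 2x2 table for a single down-regulated miRNA and computes a one-sided Fisher's exact test together with its mid-p version, using base R only. The counts and variable names are invented, and this is not MIRit's internal code. Boschloo's test is not shown because base R does not provide it.

```r
# Hypothetical 2x2 table for one down-regulated miRNA (illustrative counts)
tab <- matrix(
  c(30, 20, 170, 780),
  nrow = 2,
  dimnames = list(
    target = c("yes", "no"),
    upregulated = c("yes", "no")
  )
)

# One-sided Fisher's exact test: are targets enriched in up-regulated genes?
ft <- fisher.test(tab, alternative = "greater")
pFisher <- ft$p.value

# Hypergeometric quantities behind the test
x <- tab[1, 1]      # up-regulated targets
m <- sum(tab[, 1])  # all up-regulated genes
n <- sum(tab[, 2])  # all remaining genes
k <- sum(tab[1, ])  # all targets

# Upper-tail probability P(X >= x), which is Fisher's one-sided p-value
pUpper <- phyper(x - 1, m, n, k, lower.tail = FALSE)

# Lancaster's mid-p: P(X > x) plus half the probability of the observed table
pMid <- phyper(x, m, n, k, lower.tail = FALSE) + 0.5 * dhyper(x, m, n, k)

print(c(fisher = pFisher, midp = pMid))
```

The mid-p value is always smaller than the standard Fisher p-value, because it counts only half the probability of the observed table, which is the source of the gain in power mentioned above.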

Finally, for unpaired data, the effect of DE-miRNAs on the expression of target genes can be estimated through rotation gene-set tests. In particular, a fast approximation to rotation gene-set testing called fry, implemented in the limma package, can be used to statistically quantify the influence of miRNAs on the expression changes of their target genes.

To speed up the identification of anti-correlated or anti-associated miRNA-target pairs, this function implements parallel computation via the BiocParallel package. The parallelization behavior can be specified through the BPPARAM parameter.

References

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Research, 43(7), e47. doi:10.1093/nar/gkv007.

Di Wu and others, ROAST: rotation gene set tests for complex microarray experiments, Bioinformatics, Volume 26, Issue 17, September 2010, Pages 2176–2182, https://doi.org/10.1093/bioinformatics/btq401.

Routledge, R. D. (1994). Practicing Safe Statistics with the Mid-p. The Canadian Journal of Statistics / La Revue Canadienne de Statistique, 22(1), 103–110, https://doi.org/10.2307/3315826.

Boschloo R.D. (1970). "Raised Conditional Level of Significance for the 2x2-table when Testing the Equality of Two Probabilities". Statistica Neerlandica. 24: 1–35. doi:10.1111/j.1467-9574.1970.tb00104.x.

Author

Jacopo Ronchi, jacopo.ronchi@unimib.it

Examples

# load example MirnaExperiment object
obj <- loadExamples()

# perform integration analysis with default settings
obj <- mirnaIntegration(obj)
#> Since data derive from paired samples, a correlation test will be used.
#> Performing Spearman's correlation analysis...
#> A statistically significant correlation between 215 miRNA-target pairs was found!