Table 7 Methods for batch correction: ComBat, SVA (surrogate variable analysis)

From: Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

ComBat

 ComBat is a widely used batch correction method that has been shown to have generally good performance [47]. For each gene, this method estimates location and scale parameters for each batch separately. The data are then transformed using these parameter estimates so that the location and scale parameters are the same across batches. This method is robust to outliers even with small sample sizes and is thus especially well suited to HDD analysis. ComBat-Seq [48] is an extension specifically developed for count data using a negative binomial model, and it is compatible with differential expression algorithms that require counts. Figure 9 [49] shows an example of the effect of ComBat, comparing the results with and without this batch correction.
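
 The location and scale idea described above can be illustrated with a minimal Python sketch. It is not the ComBat algorithm itself: it omits the empirical Bayes shrinkage of the batch parameters and any adjustment for biological covariates, and simply aligns each batch to a common per-gene mean and standard deviation. The function name `location_scale_adjust` and the genes-by-samples layout are assumptions made for illustration, and each batch is assumed to contain at least two samples.

```python
import numpy as np

def location_scale_adjust(X, batches):
    """Simplified per-gene location/scale batch adjustment (illustrative only).

    X       : array of shape (n_genes, n_samples), e.g. log-expression values
    batches : array of length n_samples with one batch label per sample
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    labels = np.unique(batches)

    # Common location target: the per-gene mean over all samples.
    grand_mean = X.mean(axis=1, keepdims=True)

    # Common scale target: the per-gene pooled within-batch standard deviation.
    resid = X.copy()
    for b in labels:
        cols = batches == b
        resid[:, cols] -= X[:, cols].mean(axis=1, keepdims=True)
    dof = X.shape[1] - len(labels)
    pooled_sd = np.sqrt((resid ** 2).sum(axis=1, keepdims=True) / dof)

    adjusted = np.empty_like(X)
    for b in labels:
        cols = batches == b
        mu_b = X[:, cols].mean(axis=1, keepdims=True)          # batch location
        sd_b = X[:, cols].std(axis=1, ddof=1, keepdims=True)   # batch scale
        sd_b = np.where(sd_b > 0, sd_b, 1.0)                   # guard constant genes
        # Standardize within the batch, then map onto the common location and scale.
        adjusted[:, cols] = (X[:, cols] - mu_b) / sd_b * pooled_sd + grand_mean
    return adjusted
```

 After this transformation every batch has the same per-gene mean and standard deviation. With few samples per batch, the raw per-batch estimates used here are unstable, which is the situation the shrinkage step of the published method is designed to address.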

SVA (surrogate variable analysis)

 Variability in measurements may arise from unknown technical sources or from biological sources that are not expected or controllable, and it can affect the accuracy of statistical inference in genome-wide expression experiments. SVA [50] was developed to deal with the unmeasured factors that influence gene expression by introducing spurious signal or confounding biological signal. SVA identifies unobserved factors and constructs surrogate variables that can be used as covariates in subsequent analyses to improve the accuracy and reproducibility of the results. SVA was first developed for microarray data and later adapted for sequencing data [51].
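
 The core idea of constructing surrogate variables can be sketched in Python as follows. This is a simplified illustration rather than the published SVA procedure, which involves additional steps such as choosing the number of surrogate variables. Here that number is passed in by hand as `n_sv`, and the surrogate variables are taken as the leading singular vectors of the residuals left after regressing each gene on the known covariates. The function name `estimate_surrogate_variables` and the matrix layouts are assumptions made for illustration.

```python
import numpy as np

def estimate_surrogate_variables(X, design, n_sv):
    """Estimate surrogate variables from regression residuals (illustrative only).

    X      : array of shape (n_genes, n_samples) with expression values
    design : array of shape (n_samples, p) with known covariates,
             including an intercept column
    n_sv   : number of surrogate variables to return (chosen by the user here)
    """
    X = np.asarray(X, dtype=float)
    design = np.asarray(design, dtype=float)

    # Regress every gene on the known covariates by least squares.
    beta, *_ = np.linalg.lstsq(design, X.T, rcond=None)
    resid = X.T - design @ beta  # shape (n_samples, n_genes)

    # Dominant patterns shared across genes in the residuals are candidate
    # unmeasured factors; take the leading left singular vectors.
    U, _, _ = np.linalg.svd(resid, full_matrices=False)
    return U[:, :n_sv]
```

 The returned columns can be appended to the design matrix, for example with `np.column_stack([design, sv])`, so that subsequent per-gene models adjust for them as covariates.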