
Table 15 Methods for hypothesis testing for a single variable: t-test, permutation test

From: Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

t-test

 The t-test is a standard test for comparing the means of two groups on a continuous outcome (e.g., blood pressure or tumor size after therapy for a treatment and a control group, or expression values of a gene for two patient groups with different diseases). The null hypothesis is that the true difference between the group means is 0; the alternative hypothesis is that it is not 0 (two-sided testing). The t-statistic underlying the usual t-test is the ratio of the observed mean difference to its standard error, which is computed from a pooled estimate of the variance in both groups. The validity of a statistical test depends on assumptions that should be checked. For this t-test, the assumptions include independence of the observations, an approximately normal distribution of the variable in each group, and similar variance of the variable in both groups. t-tests tend to be sensitive to outliers; in such situations, alternative nonparametric tests may be preferred. Extensions include the Welch test, for when group variances are not assumed equal, and one-way ANOVA (analysis of variance), for when more than two groups are compared.
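
 The t-statistic described above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library; the function names `pooled_t_statistic` and `welch_t_statistic` are hypothetical and do not come from the source. The sketch computes the statistic only. A p-value would additionally require the t distribution (with n1 + n2 - 2 degrees of freedom in the pooled case), which statistical packages such as SciPy's `scipy.stats.ttest_ind` provide.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Two-sample t-statistic assuming similar variance in both groups."""
    n1, n2 = len(x), len(y)
    # Pooled variance: group sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    # Standard error of the difference between the two group means.
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / std_err


def welch_t_statistic(x, y):
    """Welch variant: each group contributes its own variance estimate."""
    std_err = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / std_err
```

 When both groups have the same size, the two statistics coincide; they differ once group sizes and group variances are both unequal, which is the situation the Welch test is meant for.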

Permutation test

 The idea behind a permutation test is to scramble the data to mimic a null hypothesis situation in which a variable is not associated with a particular outcome or phenotype. For the simple example of comparing the distribution of a variable between two phenotype classes, a permutation test would randomly scramble, or re-assign, the class labels across the collection of observations. For each data permutation, the test statistic is calculated and recorded. After this statistic has been calculated on many permuted versions of the data, a p-value can be computed as the proportion of permutations for which the calculated test statistic was as extreme as, or more extreme than, the test statistic calculated on the original data.
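
 The permutation procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the test statistic is taken to be the absolute difference in group means (the description above does not fix a particular statistic), and the function name `permutation_p_value`, the default number of permutations and the seed are hypothetical choices.

```python
import random
from statistics import mean


def permutation_p_value(x, y, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so that results are reproducible
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n1 = len(x)
    n_extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values is equivalent to re-assigning class labels.
        rng.shuffle(pooled)
        permuted = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if permuted >= observed:
            n_extreme += 1
    # Proportion of permutations at least as extreme as the original data.
    return n_extreme / n_permutations
```

 Because the permutations are sampled at random rather than enumerated, the result is a Monte Carlo estimate whose resolution is limited by `n_permutations`. A common variant adds 1 to both the numerator and the denominator so that the estimated p-value is never exactly 0.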