Multiple comparisons correction with Bonferroni and Benjamini-Hochberg procedures

If you set the type I error rate (the probability of rejecting a true null hypothesis) at 0.05, each test carries a 5% risk of rejecting a null hypothesis that is actually true. If you run 100 tests in which the null hypothesis is true, you should expect about 5 of them to come out significant purely by chance (false positives).

To address this problem, you need to correct your p-values (or, equivalently, your significance threshold) for the number of tests you have run.
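
As a rough illustration of this arithmetic, the sketch below computes the expected number of false positives and the probability of getting at least one false positive. It assumes that the 100 tests are independent and that every null hypothesis is true; the variable names are chosen for this example only.

```r
# Assumptions for this sketch: independent tests, all null hypotheses true
alpha <- 0.05   # type I error rate of each individual test
m <- 100        # number of tests

# Expected number of false positives among the m tests
expected_false_positives <- alpha * m

# Probability of at least one false positive (family-wise error rate)
fwer <- 1 - (1 - alpha)^m

expected_false_positives  # 5
fwer                      # about 0.994
```

Under these assumptions, at least one false positive is almost certain once 100 tests are run.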

The Bonferroni correction

The idea of the Bonferroni correction is to lower the significance threshold below 0.05 to compensate for the inflated type I error. You divide the critical value by the number of tests: with 25 tests, the critical value becomes 0.05 / 25 = 0.002.

# Labels for the 25 tests
variables <- paste("variable", 1:25)
# Unadjusted p-values, sorted in increasing order
p.values <- c(0.001,0.008,0.039,0.041,0.042,0.06,0.074,0.205,0.212,0.216,0.222,0.251,0.269,0.275,0.34,0.341,0.384,0.569,0.594,0.696,0.762,0.94,0.942,0.975,0.986)
table <- data.frame(variables, p.values)
# Bonferroni critical value: alpha divided by the number of tests
bonferroni.correction <- 0.05 / length(p.values)
table$bonferroni.correction <- bonferroni.correction
table

Before the Bonferroni correction, the p-values of variables 1 to 5 are below the critical value of 0.05. After the Bonferroni correction, this is the case only for variable 1.
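
The comparison above can be checked directly. The following is a minimal sketch that redefines the p-values so it runs on its own; the names alpha, significant_raw and significant_bonferroni are introduced for this example.

```r
p.values <- c(0.001,0.008,0.039,0.041,0.042,0.06,0.074,0.205,0.212,0.216,
              0.222,0.251,0.269,0.275,0.34,0.341,0.384,0.569,0.594,0.696,
              0.762,0.94,0.942,0.975,0.986)
alpha <- 0.05

# Indices of the tests below the uncorrected critical value
significant_raw <- which(p.values < alpha)

# Indices of the tests below the Bonferroni critical value (alpha / number of tests)
significant_bonferroni <- which(p.values < alpha / length(p.values))

significant_raw         # 1 2 3 4 5
significant_bonferroni  # 1
```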

Bonferroni-adjusted p-values can be obtained with the p.adjust function, which computes min(1, p-value * number of tests):

p.adjust(p.values, method="bonferroni")

##  [1] 0.025 0.200 0.975 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## [12] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## [23] 1.000 1.000 1.000
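
To make the formula concrete, the sketch below recomputes the Bonferroni-adjusted p-values by hand and compares them with the output of p.adjust. The name manual_bonferroni is introduced for this example.

```r
p.values <- c(0.001,0.008,0.039,0.041,0.042,0.06,0.074,0.205,0.212,0.216,
              0.222,0.251,0.269,0.275,0.34,0.341,0.384,0.569,0.594,0.696,
              0.762,0.94,0.942,0.975,0.986)

# Multiply each p-value by the number of tests and cap the result at 1
manual_bonferroni <- pmin(1, p.values * length(p.values))

# Compare with the built-in adjustment
all.equal(manual_bonferroni, p.adjust(p.values, method = "bonferroni"))
```

An adjusted p-value can then be compared directly with the usual 0.05 threshold instead of the corrected critical value.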

If you have a large number of tests, the Bonferroni correction becomes too restrictive: only very large differences in your sample will stand out, so you may produce many false negatives.
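
To give an idea of how quickly the threshold shrinks, this short sketch lists the Bonferroni critical value for a few numbers of tests. The numbers of tests are arbitrary values chosen for illustration.

```r
alpha <- 0.05
n_tests <- c(10, 100, 1000, 10000)   # illustrative numbers of tests

# Bonferroni critical value for each number of tests
thresholds <- alpha / n_tests

data.frame(n_tests, thresholds)
```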

The Benjamini-Hochberg correction

The Benjamini-Hochberg procedure controls the false discovery rate, that is, the expected proportion of false positives among the tests declared significant. Instead of applying the same correction to every comparison, it ranks the p-values from lowest to highest and assigns each one an index. Unlike the Bonferroni correction, the critical value then grows with the index, so the procedure is less conservative. You first choose the false discovery rate (Q) you are willing to accept (for example 5%). You then order the p-values and compute the Benjamini-Hochberg critical value for each one: (index / number of tests) * Q

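# Index of each p-value, from the lowest (1) to the highest
table$index <- rank(p.values)
# Benjamini-Hochberg critical value: (index / number of tests) * Q, with Q = 0.05
table$bh.correction <- table$index / length(table$index) * 0.05
table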

Starting from the highest p-value, we move down the list until we reach the first p-value that is less than or equal to its BH critical value. We stop there: that comparison and all those with smaller p-values are declared significant, even if some of them exceed their own critical value. In this case, only variable 1 (p = 0.001) is below its BH critical value (0.002).
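
The decision rule described above can be sketched as follows. This is one possible way to write it, not the only one; the names Q, m, sorted_p, critical, k and significant are introduced for this example.

```r
p.values <- c(0.001,0.008,0.039,0.041,0.042,0.06,0.074,0.205,0.212,0.216,
              0.222,0.251,0.269,0.275,0.34,0.341,0.384,0.569,0.594,0.696,
              0.762,0.94,0.942,0.975,0.986)
Q <- 0.05                  # accepted false discovery rate
m <- length(p.values)      # number of tests

sorted_p <- sort(p.values)           # p-values from lowest to highest
critical <- seq_len(m) / m * Q       # BH critical value for each index

# Largest index whose p-value is at or below its critical value (0 if none)
passing <- which(sorted_p <= critical)
k <- if (length(passing) > 0) max(passing) else 0

# Every test with a p-value up to the k-th smallest one is declared significant
significant <- if (k > 0) p.values <= sorted_p[k] else rep(FALSE, m)

k                   # 1
which(significant)  # 1
```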

Benjamini-Hochberg adjusted p-values can be obtained with the p.adjust function, which computes min(p-value * number of tests / index, adjusted p-value of the next higher rank):

p.adjust(p.values, method="BH")

##  [1] 0.0250000 0.1000000 0.2100000 0.2100000 0.2100000 0.2500000 0.2642857
##  [8] 0.4910714 0.4910714 0.4910714 0.4910714 0.4910714 0.4910714 0.4910714
## [15] 0.5328125 0.5328125 0.5647059 0.7815789 0.7815789 0.8700000 0.9071429
## [22] 0.9860000 0.9860000 0.9860000 0.9860000
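
As a check on the formula, the sketch below recomputes the Benjamini-Hochberg adjusted p-values by hand, working from the highest p-value down and keeping a running minimum. The names ord, index, adjusted_desc and manual_bh are introduced for this example.

```r
p.values <- c(0.001,0.008,0.039,0.041,0.042,0.06,0.074,0.205,0.212,0.216,
              0.222,0.251,0.269,0.275,0.34,0.341,0.384,0.569,0.594,0.696,
              0.762,0.94,0.942,0.975,0.986)
m <- length(p.values)

# Positions of the p-values from the highest to the lowest
ord <- order(p.values, decreasing = TRUE)
# Matching indices: the highest p-value has index m, the lowest has index 1
index <- m:1

# p-value * number of tests / index, then the running minimum from the top, capped at 1
adjusted_desc <- pmin(1, cummin(p.values[ord] * m / index))

# Put the adjusted values back in the original order of the tests
manual_bh <- numeric(m)
manual_bh[ord] <- adjusted_desc

all.equal(manual_bh, p.adjust(p.values, method = "BH"))
```

The running minimum is what makes variables 3 and 4 share the adjusted value of variable 5 (0.21): their own ratios are larger, so the smaller value from the next higher rank is used.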


Sunny Avry

June 21st, 2019