If you set a type I error rate (the probability of rejecting a true null hypothesis) of 0.05, you accept a 5% risk of rejecting your null hypothesis even though it is actually true. If you run 100 tests, about 5 of your significant results may therefore be type I errors (false positives).

To address this problem, you need to correct your p-values for the number of tests you have performed.
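This inflation of false positives is easy to see by simulation: run many t-tests on data where the null hypothesis is true by construction and count how many p-values fall below 0.05. This is a sketch; the sample sizes, seed, and number of simulations are arbitrary choices.

```r
# Simulate 1000 t-tests where the null hypothesis is TRUE:
# both groups are drawn from the same normal distribution.
set.seed(42)  # arbitrary seed, for reproducibility
n.tests <- 1000
p.values.null <- replicate(n.tests, t.test(rnorm(30), rnorm(30))$p.value)

# Roughly 5% of these true-null tests come out "significant" at 0.05.
false.positives <- sum(p.values.null < 0.05)
false.positives / n.tests  # close to 0.05
```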

### The Bonferroni correction

The idea of the Bonferroni correction is to set the p-value threshold below 0.05 to compensate for the inflated type I error. In this correction, you divide the critical value by the number of tests. With 25 tests, your critical value becomes 0.05 / 25 = 0.002.

```r
variables <- vector()
for (i in seq(1, 25)) {
  variables <- c(variables, paste("variable", i))
}
p.values <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205,
              0.212, 0.216, 0.222, 0.251, 0.269, 0.275, 0.34, 0.341,
              0.384, 0.569, 0.594, 0.696, 0.762, 0.94, 0.942, 0.975, 0.986)
table <- data.frame(variables, p.values)
bonferroni.correction <- 0.05 / length(p.values)
```
```r
table$bonferroni.correction <- bonferroni.correction
table
```

Before the Bonferroni correction, the p-values of variables 1 to 6 are below the critical value. After the correction, only variable 1 remains significant.

Bonferroni-adjusted p-values can be obtained through the p.adjust function, which computes min(1, p-value * number of tests):

```r
p.adjust(p.values, method = "bonferroni")
## [1] 0.025 0.200 0.975 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## [12] 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## [23] 1.000 1.000 1.000
```

With a large number of tests, the Bonferroni correction becomes too restrictive: only huge differences in your sample will stand out, so you can produce many false negatives.

### The Benjamini-Hochberg correction

The Benjamini-Hochberg procedure controls the false discovery rate (the expected proportion of false positives among the rejected hypotheses). Here we do not correct every comparison by the same amount; instead, each p-value is assigned an index (rank) from the lowest to the highest, and the correction depends on that rank. Unlike the Bonferroni correction, the critical value therefore gets bigger, and less conservative, with each comparison. You first choose the false discovery rate Q you are willing to accept (for example 5%). The procedure then consists in ordering the p-values and computing the Benjamini-Hochberg critical value for each one: (index / number of tests) * Q.

```r
table$index <- rank(p.values)
table$bh.correction <- table$index / length(table$index) * 0.05
table
```
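As a check on the min(1, p-value * number of tests) formula, the Bonferroni-adjusted values can be recomputed by hand and compared against p.adjust (a sketch using the same 25 p-values):

```r
# Bonferroni adjustment by hand: min(1, p-value * number of tests).
p.values <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205,
              0.212, 0.216, 0.222, 0.251, 0.269, 0.275, 0.34, 0.341,
              0.384, 0.569, 0.594, 0.696, 0.762, 0.94, 0.942, 0.975, 0.986)
manual.bonferroni <- pmin(1, p.values * length(p.values))

# Same result as the built-in adjustment.
all.equal(manual.bonferroni, p.adjust(p.values, method = "bonferroni"))
```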

Starting from the highest p-value, we move down the list until we find a p-value that is less than or equal to its BH critical value. At that point we stop: this comparison and all those with smaller p-values are declared significant. In this case, only variable 1 (p = 0.001) is below its BH critical value (0.002).
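The step-down decision described above can be written out explicitly (a sketch using the same 25 p-values; Q is the accepted false discovery rate):

```r
p.values <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205,
              0.212, 0.216, 0.222, 0.251, 0.269, 0.275, 0.34, 0.341,
              0.384, 0.569, 0.594, 0.696, 0.762, 0.94, 0.942, 0.975, 0.986)
Q <- 0.05
m <- length(p.values)
p.sorted <- sort(p.values)
bh.critical <- (1:m) / m * Q  # BH critical value for each rank

# Largest rank whose p-value is <= its critical value; that comparison
# and all those with smaller p-values are declared significant.
significant.ranks <- which(p.sorted <= bh.critical)
n.significant <- if (length(significant.ranks) == 0) 0 else max(significant.ranks)
p.sorted[seq_len(n.significant)]  # only 0.001 survives here
```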

Benjamini-Hochberg adjusted p-values can be obtained through the p.adjust function: min(p-value * number of tests / index, adjusted p-value at the next higher rank).

```r
p.adjust(p.values, method = "BH")
##  [1] 0.0250000 0.1000000 0.2100000 0.2100000 0.2100000 0.2500000 0.2642857
##  [8] 0.4910714 0.4910714 0.4910714 0.4910714 0.4910714 0.4910714 0.4910714
## [15] 0.5328125 0.5328125 0.5647059 0.7815789 0.7815789 0.8700000 0.9071429
## [22] 0.9860000 0.9860000 0.9860000 0.9860000
```
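These adjusted values can also be reproduced by hand: scale each ordered p-value by m / rank, then take a running minimum from the largest rank down so that no adjusted value exceeds the one at the next higher rank (a sketch of what p.adjust computes for method = "BH"):

```r
p.values <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205,
              0.212, 0.216, 0.222, 0.251, 0.269, 0.275, 0.34, 0.341,
              0.384, 0.569, 0.594, 0.696, 0.762, 0.94, 0.942, 0.975, 0.986)
m <- length(p.values)
o <- order(p.values, decreasing = TRUE)  # from highest to lowest p-value
ro <- order(o)                           # map back to the original order

# Scale by m / rank, then enforce monotonicity with a running minimum.
manual.bh <- pmin(1, cummin(m / (m:1) * p.values[o]))[ro]

all.equal(manual.bh, p.adjust(p.values, method = "BH"))
```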

### References

http://www.biostathandbook.com/multiplecomparisons.html

Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. Sage publications.