Behavioral Ecology Advance Access originally published online on September 1, 2004
Behavioral Ecology 2005 16(1):325; doi:10.1093/beheco/arh145
Forum
What hypothesis tests are not: a reply to Johnson
a SBS, University of Edinburgh, Ashworth Laboratories, King's Buildings, West Mains Road, Edinburgh, EH9 3JT, UK, and b Division of Environmental and Evolutionary Biology, IBLS, Graham Kerr Building, University of Glasgow, Glasgow, G12 8QQ, UK
Address correspondence to N. Colegrave. E-mail: n.colegrave{at}ed.ac.uk.
Received 28 May 2004; accepted 4 June 2004.
We are sorry that Johnson's pleasure at reading our recent paper (Colegrave and Ruxton, 2003) was so short-lived. Johnson (2005) is correct that the definition of the P value that we use in the paper is incorrect, and we are grateful to him for correcting this unfortunate slip of the pen. However, while Pr{hypothesis|data} may differ dramatically from Pr{data|hypothesis}, this has no effect whatsoever on the arguments that we were making. Thus, we would like to emphasise that the important take-home message of our paper remains unchanged: using confidence intervals to think about the range of effect sizes that are consistent with the data is more useful than thinking too much about P values and post hoc power analysis.
Johnson also correctly points out that considering only the range of the confidence interval, rather than both its position and its range, can lead to a misinterpretation of the likelihood of the real effect size being small or zero. Indeed, probably the best statistic to quote would be the estimated probability of the effect size being within some user-defined tolerance (d) of zero. For parametric tests this can be obtained very easily from the confidence interval. Assuming that the defined tolerance is less than the magnitude of the measured effect size (e), this probability is
[Equations (1)-(3): images not available in the source]
In the case of the example in our previous paper, this gives a probability of 0.09 that the actual effect size lies between −0.1 and +0.1 (i.e., d = 0.1), with confidence limits of (−0.07, 0.81) (Johnson, 2005: Figure 1). For the broader confidence limits of (−0.59, 1.33) (Johnson, 2005: Figure 2) this probability becomes 0.12. Thus, we concur with Johnson that the broader confidence interval actually gives more credence to the actual effect size being very small than the narrower interval does (although note that the likelihoods he calculates are for a one-tailed rather than a two-tailed hypothesis). The conclusions from Colegrave and Ruxton (2003) regarding the maximum effect sizes consistent with these confidence limits remain unchanged. We are grateful to Johnson for drawing our attention to this issue and hope that he will now feel that "...[his] efforts are both appreciated and contributing to the advance of science" (Johnson, 2005).
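The calculation described above can be illustrated with a short sketch. It is not necessarily the form given in equations (1)-(3); it assumes a symmetric interval built from a normal sampling distribution, so that the point estimate e is the interval midpoint and the standard error is the half-width divided by the normal critical value. The function name `prob_within_tolerance` and its parameters are hypothetical, chosen for illustration, and the two intervals used are (−0.07, 0.81) and (−0.59, 1.33), both centred on 0.37.

```python
from statistics import NormalDist


def prob_within_tolerance(lower, upper, d, level=0.95):
    """Estimate the probability that the effect size lies in (-d, +d).

    Assumption (not from the original text): the confidence interval
    (lower, upper) is symmetric and based on a normal distribution,
    so its midpoint is the estimate and its half-width is z * SE.
    """
    if not lower < upper:
        raise ValueError("lower must be less than upper")
    if d <= 0:
        raise ValueError("tolerance d must be positive")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")

    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)

    e = (lower + upper) / 2.0          # measured effect size
    se = (upper - lower) / (2.0 * z)   # implied standard error

    # Normal distribution centred on the estimate; integrate over (-d, d).
    dist = NormalDist(mu=e, sigma=se)
    return dist.cdf(d) - dist.cdf(-d)


if __name__ == "__main__":
    narrow = prob_within_tolerance(-0.07, 0.81, d=0.1)
    broad = prob_within_tolerance(-0.59, 1.33, d=0.1)
    print(f"narrow interval: {narrow:.3f}")
    print(f"broad interval:  {broad:.3f}")
```

Under these assumptions the sketch gives roughly 0.10 for the narrower interval and 0.12 for the broader one, close to the values quoted above and in the same order: the broader interval places more probability near zero.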
ACKNOWLEDGEMENTS
We thank Sean Nee for comments on this reply.
REFERENCES
Colegrave N, and Ruxton GD, 2003. Confidence intervals are a more useful complement to nonsignificant tests than are power calculations. Behav Ecol 14:446-447.
Johnson DH, 2005. What hypothesis tests are not: a response to Colegrave and Ruxton. Behav Ecol 16:204-205.