The APA Task Force on Statistical Inference (Wilkinson & TFSI, 1999) advocated the inclusion of effect sizes in journal articles as an important source of information. The fifth and current sixth editions of the APA publication manual (American Psychological Association, 2001, 2010) echoed the task force's recommendations and require effect sizes for publication in APA journals. However, Fidler et al. (2005) reported that only slight increases in reporting rates have been observed in popular journals. A recent analysis of power, confidence intervals, and effect size showed that effect sizes are the most prevalent addition to traditional null hypothesis test statistics (A. Fritz, Scherndl, & Kuhberger, 2013; see also C. Fritz, Morris, & Richler, 2012). A. Fritz et al. (2013) estimated that effect size reporting has increased to approximately 40% in clinical studies and nearly 70% in non-clinical studies (although these numbers may be inflated by statistics, such as r, that serve as both effect sizes and test statistics). Lastly, the push for effect sizes is not limited to psychology; education (Task Force on Reporting of Research Methods in AERA Publications, 2006) and medicine (International Committee of Medical Journal Editors, 2010) have shown similar movements.
It is possible that this lack of reporting stems from the perception that effect sizes are mysterious and difficult to select and calculate (Lipsey, 1990). To alleviate that mysteriousness, we refer readers to Kelley and Preacher (2012), who define effect size in terms of three ideas. First, the effect size dimension is the abstract, unit-free quality to be quantified, such as the variance attributable to an experimental manipulation. Second, the effect size measure (or effect size index) is the specific formula or equation used to operationalize that dimension. Third, the effect size value is the actual number obtained when the measure is applied to specific data. For example, if our effect size dimension is the standardized mean difference between two groups, our effect size measure would be Cohen's d, calculated by dividing that mean difference by the pooled standard deviation (Cohen, 1988; Hedges & Olkin, 1985).
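To make these three terms concrete, the measure in this example is the familiar formula (written here in our own notation, where n1 and n2 are the group sizes and s1 and s2 the group standard deviations):

\[
d = \frac{\bar{X}_1 - \bar{X}_2}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}},
\]

and the effect size value is simply the number this formula returns for a particular pair of samples.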
Given these repeated calls for effect sizes, why do researchers continue to omit these values from their journal articles? Effect size calculators do exist, both as web pages (e.g., Soper, 2013) and as macros for SPSS and SAS (Smithson, 2003; Wilson, 2010). However, the flexibility of these calculators, as well as the extent to which they explain their calculations, varies greatly, and the use of SPSS/SAS macros requires a knowledge set that an introductory statistics student or casual user may not have. SPSS and SAS will calculate partial eta squared with a few button clicks (although SPSS has mislabeled this statistic in the past; Pierce, Block, & Aguinis, 2004), as well as R squared for regression analyses. While ANOVA and regression are popular statistical analyses, calculating effect sizes for post hoc comparisons or other designs is not nearly so easy. Further complications arise if a researcher wishes to add confidence intervals for these effect sizes, which would be especially useful in the wake of psychology's "replicability crisis." The difficulty with confidence intervals for effect sizes is that the confidence limits are based on noncentral distributions. Although Hedges and Olkin (1985) used the normal distribution to calculate the confidence limits on d, more recent work has shown that the noncentral t distribution is more appropriate (Cumming & Finch, 2001; Kelley, 2007; Smithson, 2003). Calculating noncentral confidence limits is an iterative process that requires a working knowledge of calculus, which is not something generally expected of the average social science researcher.
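To illustrate why this step is the stumbling block, the sketch below implements the kind of iterative search described by Cumming and Finch (2001) for a confidence interval on d from an independent-samples t-test. It is a minimal Python/SciPy sketch, not MOTE's actual code; the function name, search bounds, and example values are our own.

```python
# Sketch: noncentral-t confidence interval for Cohen's d (independent groups).
from scipy.stats import nct
from scipy.optimize import brentq

def d_confidence_interval(t_obs, n1, n2, alpha=0.05):
    df = n1 + n2 - 2

    def ncp_with_cdf(p):
        # Iteratively find the noncentrality parameter whose distribution
        # places the observed t at cumulative probability p.
        return brentq(lambda ncp: nct.cdf(t_obs, df, ncp) - p,
                      t_obs - 50, t_obs + 50)

    ncp_lower = ncp_with_cdf(1 - alpha / 2)  # t_obs falls in the upper tail
    ncp_upper = ncp_with_cdf(alpha / 2)      # t_obs falls in the lower tail
    scale = (1 / n1 + 1 / n2) ** 0.5         # rescales the limits into the d metric
    return ncp_lower * scale, ncp_upper * scale

# Example: t(58) = 2.50 with 30 participants per group
print(d_confidence_interval(2.50, 30, 30))  # roughly (0.13, 1.16)
```

The confidence limits are the noncentrality parameters that place the observed t at the appropriate tail probabilities, rescaled into the d metric; the root search is the iterative step that a dedicated calculator hides from the user.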
Here we introduce MOTE (Measure of the Effect), a program much like G*Power (Faul, Erdfelder, Lang, & Buchner, 2007) that was developed to calculate effect sizes and their confidence intervals. A range of effect sizes is included, such as Cohen's d, omega, eta, r/R, phi, f, and odds ratios. In cases where formulas are not universally agreed upon, both versions of the calculation are provided. For example, Cohen's d for dependent t-tests is traditionally calculated by dividing the mean difference between time measurements by the standard deviation of the difference scores. However, as Cumming (2012) outlines, this effect size can be artificially inflated when the difference scores have little variability; he therefore recommends dividing the mean difference by the average standard deviation of the two time measurements, as shown below. Formulae are provided in the user's guide and directly in the program, so users will know, and can cite, how they calculated their effect sizes. This information also allows users to understand their effect size in terms of its dimension, index, and value, consistent with Kelley and Preacher (2012).
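The two versions of the dependent-samples d referred to above are, in our notation (where M_diff is the mean of the difference scores, s_diff its standard deviation, and s1 and s2 the standard deviations at the two time measurements):

\[
d = \frac{\bar{M}_{\text{diff}}}{s_{\text{diff}}} \;\; \text{(traditional)}
\qquad \text{versus} \qquad
d = \frac{\bar{M}_{\text{diff}}}{(s_1 + s_2)/2} \;\; \text{(Cumming, 2012)}.
\]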
One way to encourage a change in effect size reporting rates is to train the next generation of researchers to include these values as part of, or in lieu of, the traditional hypothesis test. However, as statistics teachers know, it can be difficult to get students to understand which test to select, much less which effect size corresponds to that test. Therefore, in MOTE a user can either select the specific effect size they wish to calculate (such as Cohen's d) or select the type of statistical test they ran (such as an independent t-test). This format is flexible to both the goals and the knowledge level of the user. Further, with the recent interest in replication and publication bias, both meta-analyses and effect size analyses are growing in prominence. Francis's (2012a, 2012b) recent publications in these areas show that effect sizes and their confidence intervals provide powerful insight into the potential file drawer behind a set of studies, as well as the ability to examine whether an effect is consistent with a particular size. We believe that this user-friendly calculator will benefit everyone from the general researcher to the statistics teacher.
References:
- American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
- American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
- Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York: Routledge.
- Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532-574. doi:10.1177/0013164401614002
- Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175-191.
- Fidler, F., Cumming, G., Thomason, N., Pannuzzo, D., Smith, J., Fyffe, P., Edmonds, H., Harrington, C., & Schmitt, R. (2005). Evaluating the effectiveness of editorial policy to improve statistical practice: The case of the Journal of Consulting and Clinical Psychology. Journal of Consulting and Clinical Psychology, 73, 136-143.
- Francis, G. (2012a). Publication bias and the failure of replication in experimental psychology. Psychonomic Bulletin & Review, 19, 975-991.
- Francis, G. (2012b). Too good to be true: Publication bias in two prominent studies from experimental psychology. Psychonomic Bulletin & Review, 19, 151-156.
- Fritz, A., Scherndl, T., & Kuhberger, A. (2013). A comprehensive review of reporting practices in psychological journals: Are effect sizes really enough? Theory & Psychology, 23, 98-122.
- Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141, 2-18.
- Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York, NY: Academic Press.
- International Committee of Medical Journal Editors. (2010). Uniform requirements for manuscripts submitted to biomedical journals: Writing and editing for biomedical publication. Retrieved from http://www.icmje.org/urm_full.pdf.
- Kelley, K. (2007). Confidence intervals for standardized effect sizes: Theory, application, and implementation. Journal of Statistical Software, 20, 1-24.
- Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17, 137-152.
- Lipsey, M. W. (1990). Design sensitivity: Statistical power for experimental research. Newbury Park, CA: Sage.
- Pierce, C. A., Block, R. A., & Aguinis, H. (2004). Cautionary note on reporting eta-squared values from multifactor ANOVA designs. Educational and Psychological Measurement, 64, 916-924.
- Smithson, M. (2003). Confidence intervals. Thousand Oaks, CA: Sage.
- Soper, D.S. (2013). Effect Size (Cohen's d) Calculator for a Student t-Test [Software]. Available from http://www.danielsoper.com/statcalc
- Task Force on Reporting of Research Methods in AERA Publications. (2006). Standards for reporting on empirical social science research in AERA publications. Washington, DC: American Educational Research Association.
- Wilkinson, L., & American Psychological Association Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604. doi: 10.1037/0003-066X.54.8.594
- Wilson, D. B. (2010). Meta-analysis macros for SAS, SPSS, and Stata. Retrieved from http://mason.gmu.edu/~dwilsonb/ma.html.