Effect Sizes for Research: A Broad Practical Approach


Pages 270 Page size 336 x 525.12 pts Year 2006


Effect Sizes for Research: A Broad Practical Approach

Robert J. Grissom

John J. Kim

San Francisco State University

2005

LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
Mahwah, New Jersey    London

Copyright © 2005 by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microform, retrieval system, or any other means, without prior written permission of the publisher.

Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, New Jersey 07430

Cover design by Sean Trane Sciarrone

Library of Congress Cataloging-in-Publication Data

Grissom, Robert J.
Effect sizes for research : a broad practical approach / Robert J. Grissom, John J. Kim.
p. cm.
Includes bibliographical references and index.
ISBN 0-8058-5014-7 (alk. paper)
1. Analysis of variance. 2. Effect sizes (Statistics) 3. Experimental design. I. Kim, John J. II. Title.
QA279.F75 2005
519.5'38-dc22
2004053284
CIP

Books published by Lawrence Erlbaum Associates are printed on acid-free paper, and their bindings are chosen for strength and durability.

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

This book is dedicated to those scholars, amply cited herein, who during the past three decades have worked diligently to develop and promote the use of effect sizes and robust statistical methods and to those who have constructively criticized such procedures.

Contents

Preface

1 Introduction
    Review of Simple Cases of Null-Hypothesis Significance Testing
    Statistically Signifying and Practical Significance
    Definition of Effect Size
    Controversy About Null-Hypothesis Significance Testing
    The Purpose of This Book and the Need for a Broad Approach
    Power Analysis
    Meta-Analysis
    Assumptions of Test Statistics and Effect Sizes
    Violation of Assumptions in Real Data
    Exploring the Data for a Possible Effect of a Treatment on Variability
    Worked Examples of Measures of Variability
    Questions

2 Confidence Intervals for Comparing the Averages of Two Groups
    Introduction
    Confidence Intervals for \(\mu_a - \mu_b\): Independent Groups
    Worked Example for Independent Groups
    Further Discussions and Methods
    Solutions to Violations of Assumptions: Welch's Approximate Method
    Worked Example of the Welch Method
    Yuen's Confidence Interval for the Difference Between Two Trimmed Means
    Other Methods for Independent Groups
    Dependent Groups
    Questions

3 The Standardized Difference Between Means
    Unfamiliar and Incomparable Scales
    Standardized Difference Between Means: Assuming Normality and a Control Group
    Equal or Unequal Variances
    Tentative Recommendations
    Additional Standardized-Difference Effect Sizes When There Are Outliers
    Technical Note 3.1: A Nonparametric Estimator of Standardized-Difference Effect Sizes
    Confidence Intervals for a Standardized-Difference Effect Size
    Confidence Intervals Using Noncentral Distributions
    The Counternull Effect Size
    Dependent Groups
    Questions

4 Correlational Effect Sizes for Comparing Two Groups
    The Point-Biserial Correlation
    Example of \(r_{pb}\)
    Confidence Intervals and Null-Counternull Intervals for \(r_{pop}\)
    Assumptions of \(r\) and \(r_{pb}\)
    Unequal Sample Sizes
    Unreliability
    Restricted Range
    Small, Medium, and Large Effect Size Values
    Binomial Effect Size Display
    Limitations of the BESD
    The Coefficient of Determination
    Questions

5 Effect Size Measures That Go Beyond Comparing Two Centers
    The Probability of Superiority: Independent Groups
    Example of the PS
    A Related Measure of Effect Size
    Assumptions
    The Common Language Effect Size Statistic
    Technical Note 5.1: The PS and its Estimators
    Introduction to Overlap
    The Dominance Measure
    Cohen's \(U_3\)
    Relationships Among Measures of Effect Size
    Application to Cultural Effect Size
    Technical Note 5.2: Estimating Effect Sizes Throughout a Distribution
    Hedges-Friedman Method
    Shift-Function Method
    Other Graphical Estimators of Effect Sizes
    Dependent Groups
    Questions

6 Effect Sizes for One-Way ANOVA Designs
    Introduction
    ANOVA Results for This Chapter
    A Standardized-Difference Measure of Overall Effect Size
    A Standardized Overall Effect Size Using All Means
    Strength of Association
    Eta Squared (\(\eta^2\))
    Epsilon Squared (\(\varepsilon^2\)) and Omega Squared (\(\omega^2\))
    Strength of Association for Specific Comparisons
    Evaluation of Criticisms of Estimators of Strength of Association
    Standardized-Difference Effect Sizes for Two of \(k\) Means at a Time
    Worked Examples
    Statistical Significance, Confidence Intervals, and Robustness
    Within-Groups Designs and Further Reading
    Questions

7 Effect Sizes for Factorial Designs
    Introduction
    Strength of Association: Proportion of Variance Explained
    Partial \(\omega^2\)
    Comparing Values of \(\omega^2\)
    Ratios of Estimates of Effect Size
    Designs and Results for This Chapter
    Manipulated Factors Only
    Manipulated Targeted Factor and Intrinsic Peripheral Factor
    Illustrative Worked Examples
    Comparisons of Levels of a Manipulated Factor at One Level of a Peripheral Factor
    Targeted Classificatory Factor and Extrinsic Peripheral Factor
    Classificatory Factors Only
    Statistical Inference and Further Reading
    Within-Groups Factorial Designs
    Additional Designs and Measures
    Limitations and Recommendations
    Questions

8 Effect Sizes for Categorical Variables
    Background Review
    Chi-Square Test and Phi
    Null-Counternull Interval for \(\mathrm{Phi}_{pop}\)
    The Difference Between Two Proportions
    Approximate Confidence Interval for \(P_1 - P_2\)
    Relative Risk and the Number Needed to Treat
    The Odds Ratio
    Construction of Confidence Intervals for \(OR_{pop}\)
    Tables Larger Than 2 × 2
    Odds Ratios for Large r × c Tables
    Multiway Tables
    Recommendations
    Questions

9 Effect Sizes for Ordinal Categorical Variables
    Introduction
    The Point-Biserial \(r\) Applied to Ordinal Categorical Data
    Confidence Interval and Null-Counternull Interval for \(r_{pop}\)
    Limitations of \(r_{pb}\) for Ordinal Categorical Data
    The Probability of Superiority Applied to Ordinal Data
    Worked Example of Estimating the PS From Ordinal Data
    The Dominance Measure and Somers' D
    Worked Example of the ds
    Generalized Odds Ratio
    Cumulative Odds Ratio
    The Phi Coefficient
    A Caution
    References for Further Discussion of Ordinal Categorical Methods
    Questions

References

Author Index

Subject Index


Preface

Emphasis on effect sizes is a rapidly rising tide as over 20 journals in various fields of research now require that authors of research reports provide estimates of effect size. For certain kinds of applied research it is no longer considered acceptable only to report that results were statistically significant. Statistically significant results indicate that a researcher has discovered evidence of a real difference between parameters or a real association between variables, but it is one of unknown size. Especially in applied research such statements often need to be supplemented with estimates of how different the average results for studied groups are or how strong the association between variables is. Those who apply research results often need to know more, for example, than that one therapy, one teaching method, one marketing campaign, or one medication appears to be better than another; they often need evidence of how much better it is (i.e., the estimated effect size). Chapter 1 provides a more detailed definition of effect size, discussion of those circumstances in which estimation of effect sizes is especially important, and discussion of why a variety of measures of effect sizes is needed.
The purpose of this book is to inform a broad readership (broad with respect to fields of research and extent of knowledge of general statistics) about a variety of measures and estimators of effect sizes for research, their proper applications and interpretations, and their limitations. There are several excellent books on the topic of effect sizes, but these books generally treat the topic in a different context and for a purpose that is different from that of this book. Some books discuss effect sizes in the context of preresearch analysis of statistical power for determining needed sample sizes for the planned research. This is not the purpose of this book, which focuses on analyzing postresearch results in terms of the size of the obtained effects. Some books discuss effect sizes in the context of meta-analysis, the quantitative synthesizing of results from an earlier set of underlying individual research studies. This also is not the purpose of this book, which focuses on the analysis of data from an individual piece of research (called primary research). Books on meta-analysis are also concerned with methods for approximating estimates of effect size indirectly from reported test statistics because raw data from the underlying primary research are rarely available to meta-analysts. However, this book is concerned with direct estimation of effect sizes by primary researchers, who can estimate effect sizes directly because they, unlike meta-analysts, have access to the raw data.

The book is subtitled A Broad Practical Approach in part because it deals with a broad variety of kinds of effect sizes for different types of variables, designs, circumstances, and purposes. Our approach encompasses detailed discussions of standardized differences between means (chaps. 3, 6, and 7), some of the correlational measures (chap. 4), strength of association (chaps. 6 and 7), confidence intervals (chap. 2 and thereafter), other common methods, and less-known measures such as stochastic superiority (chaps. 5 and 9). The book is broad also because, in the interest of fairness and completeness, we respectfully cite alternative viewpoints for cases in which experts disagree about the appropriate measure of effect size. Consistent with the modern trend toward more use of robust statistical methods, the book also pays much attention to the statistical assumptions of methods. Also consistent with the broad approach, there are more than 300 references.
Software for those calculations that would be laborious by hand is cited. The level and content of this book make it appropriate for use as a supplement for graduate courses in statistics in such fields as psychology, education, the social sciences, business, management, and medicine. The book is also appropriate for use as the text for a special-topics seminar or independent-reading course in those fields. In addition, because of its broad content and extensive references the book is intended to be a valuable source for professional researchers, graduate students who are analyzing data for a master's or doctoral thesis, or advanced undergraduates. Readers are expected to have knowledge of parametric statistics through factorial analysis of variance and some knowledge of chi-square analysis of contingency tables. Some knowledge of nonparametric analysis in the case of two independent groups (i.e., the Mann-Whitney \(U\) test or Wilcoxon \(W_m\) test) would be helpful, but not essential. Although the book is not introductory with regard to statistics in general, we assume that many readers have little or no prior knowledge of measures of effect size and their estimation.

We typically use standard notation. However, where we believe that it helps understanding, we adopt notation that is more memorable and consistent with the concept that underlies the notation.
Also, to assist readers who have only the minimum background in statistics, we define some basic statistical terms with which other readers will likely already be familiar. We request the forbearance of these more knowledgeable readers in this regard. Although readability was a major goal, so too was avoiding oversimplifying.

To restrain the length of the book we do not discuss multivariate cases (some references are provided); we do not present equations or discussions for all measures of effect size that are known to us. We present equations, worked examples, and discussions for estimators of many measures and provide references for others that are sufficiently discussed elsewhere. Our discussions of the presented measures are also intended to provide a basis for understanding and for the appropriate use of those other measures that are presented in the sources that we cite.

Criteria for deciding whether to include a particular measure of effect size included its conceptual and computational accessibility, both of which relate to the likelihood that the measure will find its way into common practice, which was another important criterion. However, we admit to some personal preferences and perhaps even fascination with some measures. Therefore, at times we violate our own criteria for inclusion. A few exotic measures are included.

Readers should be able to find in this book many kinds of effect sizes that they can knowledgeably apply to many of their data sets. We attempt to enhance the practicality of the book by the use of worked examples involving mostly real data, for which the book provides calculations of estimates of effect sizes that had not previously been made by the original researchers.

ACKNOWLEDGMENTS

We are grateful for many insightful recommendations made by the reviewers: Scott Maxwell, the University of Notre Dame; Allen Huffcutt, Bradley University; Shlomo S. Sawilowsky, Wayne State University; and Timothy Urdan, Santa Clara University. Failure to implement any of their recommendations correctly is our fault. We thank Ted Steiner for clarifying a solution for a problem with the relative risk as an effect size. We also thank Julie A. Gorecki for providing data, and for her assistance with wordprocessing and graphics. The authors gratefully acknowledge the generous, prompt, and very professional assistance of our editor, Debra Riegert, and our production editor, Sarah Wahlert.


Chapter 1

Introduction

REVIEW OF SIMPLE CASES OF NULL-HYPOTHESIS SIGNIFICANCE TESTING

Much applied research begins with a research hypothesis that states that there is a relationship between two variables or a difference between two parameters, often means. (In later chapters we consider research involving more than two variables.) One typical form of the research hypothesis is that there is a nonzero correlation between the two variables in the population. Often one variable is a categorical independent variable involving group membership (called a grouping variable), such as male-female or Treatment a versus Treatment b, and the other variable is a continuous dependent variable, such as score on an attitude scale or on a test of mental health or achievement. In this case of a grouping variable there are two customary forms of research hypotheses. The hypothesis might again be correlational, positing a nonzero point-biserial correlation between group membership and the dependent variable, as is discussed in chapter 4. More often in this case of a grouping variable the research hypothesis posits that there is a difference between means in the two populations.

Readers who are familiar with the general linear model should recognize the relationship between hypotheses that involve either correlation or the difference between means. However, the two kinds of hypotheses are not identical, and some researchers may prefer one or the other form of hypothesis. Although a researcher may prefer one approach, some readers of a research report may prefer the other. Therefore, researchers should consider reporting results from both approaches.

The usual statistical analysis of the results from the kinds of research at hand involves testing a null hypothesis (\(H_0\)) which conflicts with the research hypothesis either by positing that the correlation between the two variables is zero in the population or by positing that there is no difference between the means of the two populations. The \(t\) statistic is usually used to test the \(H_0\) against the research hypothesis. The significance level (\(p\)) that is attained by a test statistic such as \(t\) represents the probability ...
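The link between the correlational and the difference-between-means forms of the hypothesis can be illustrated numerically. The following is a minimal sketch, not taken from the book: the scores, the 0/1 group coding, and the function names are all hypothetical. It relies on the standard identity \(t = r\sqrt{df}/\sqrt{1 - r^2}\) with \(df = n - 2\), which holds when \(t\) is computed with a pooled variance estimate.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Independent-groups t statistic using a pooled variance estimate."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se


def point_biserial_r(group_a, group_b):
    """Pearson r between a 0/1 group code (a = 1, b = 0) and the scores."""
    codes = [1] * len(group_a) + [0] * len(group_b)
    scores = list(group_a) + list(group_b)
    mean_c, mean_s = statistics.mean(codes), statistics.mean(scores)
    sxy = sum((c - mean_c) * (s - mean_s) for c, s in zip(codes, scores))
    sxx = sum((c - mean_c) ** 2 for c in codes)
    syy = sum((s - mean_s) ** 2 for s in scores)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n_total):
    """Convert a point-biserial r to the equivalent t with df = n_total - 2."""
    return r * math.sqrt(n_total - 2) / math.sqrt(1 - r ** 2)


# Hypothetical scores for two treatment groups (illustration only).
treatment_a = [12, 15, 11, 18, 14]
treatment_b = [10, 9, 13, 8, 11]

t = pooled_t(treatment_a, treatment_b)
r = point_biserial_r(treatment_a, treatment_b)
print(f"t = {t:.4f}, r_pb = {r:.4f}, t recovered from r_pb = {t_from_r(r, 10):.4f}")
```

The two printed \(t\) values agree, which shows that the test of significance is the same under either form of the hypothesis even though \(r_{pb}\) and the difference between means describe the result on different scales.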


... \((\mu_a - \mu_b)\), or negative, \((\bar{Y}_a - \bar{Y}_b) < (\mu_a - \mu_b)\). The larger the sample sizes and the less variable the populations of raw scores, the smaller the absolute value of the margin of error will be. That is, as is reflected in Equation 2.1, the margin of error is a function of the standard error. In this case the standard error is the standard deviation of the distribution of differences between two populations' sample means.

Another factor that influences the amount of margin of error is the level of confidence that one wants to have in one's estimate of a range of values that is likely to contain \(\mu_a - \mu_b\). Although it might seem counterintuitive to some readers at first, we soon observe that the more confident one wants to be in this estimate, the greater the margin of error will have to be. For a very simple example, it is safe to say that we can be 100% confident that the difference in mean annual incomes of the population of high-school dropouts and the population of college graduates would be found within the interval between $0 and $1,000,000, but our 100% confidence in this estimate is of no benefit because it involves an unacceptably large margin of error (an insufficiently informative result). The actual difference between these two population means of annual income is obviously not near $0 or near $1,000,000. (For the purpose of this section, we used mean income as a dependent variable in our example despite the fact that income data are usually skewed and are typified by medians instead of means.)

A procedure that greatly decreases the margin of error without excessively reducing our level of confidence in the truth of our result would be useful. The tradition is to adopt what is called the 95% (or .95) confidence level that leads to an estimate of a range of values that has a .95 probability of containing the value of \(\mu_a - \mu_b\). When expressed as a decimal value (e.g., .95) the confidence level of an accurately calculated confidence interval is also called the probability coverage of a confidence interval. To the extent that a method for constructing a confidence interval is inaccurate, the actual probability coverage will depart from what it was intended to be and what it appears to be (e.g., depart from the nominal .95). Although 95% confidence may seem to some readers to be only slightly less confidence than 100% confidence, such a procedure typically results in a very much narrower, more informative interval than in our example that compared incomes.

For simplicity, the first procedure that we discuss assumes normality, homoscedasticity, and independent groups. The procedure is easily generalized to confidence levels other than the 95% level. First, we consider an additional assumption of random sampling and consider further the assumption of independent groups.
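The trade-off between confidence level and margin of error can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the book's procedure: it substitutes the standard normal critical value for the critical \(t\) (reasonable only for large samples), and the standard error of 2.0 is a hypothetical value.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-tailed critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


# Hypothetical standard error of the difference between two sample means.
standard_error = 2.0

for confidence in (0.80, 0.90, 0.95, 0.99, 0.999):
    margin = z_critical(confidence) * standard_error
    print(f"{confidence:.3f} confidence level: margin of error = {margin:.2f}")
```

The printed margins grow as the confidence level rises, and the critical value increases without bound as the level approaches 1.00, which is why a 100% interval is uninformative.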


In nonexperimental research we typically have to accept violation of the assumption of random sampling. Some finesse this problem by concluding that research results apply to theoretical populations from which our samples would have constituted a random sample. It can be argued that such a conclusion can be justified if the samples that were used seem to be reasonably representative of the kinds of people to whom we want to generalize the results. In the case of experimental research random assignment to treatments satisfies the assumption (in terms of the statistical validity of the results, if not necessarily in terms of the external validity of the results). We have more to say about the possible influence of sampling method on confidence intervals later.

Independent groups can be roughly defined for our purposes as groups within which no individual's score on the dependent variable is related to or predictable from the scores of any individual in another group. Groups are independent if the probability that an individual in a group will produce a certain score remains the same regardless of what score is produced by an individual in another group. Research with dependent groups requires methods for construction of confidence intervals that are different from methods used for research with independent groups, as we discuss in the last section of this chapter.
Assuming for simplicity for now that the assumptions of normality, homoscedasticity, and independence have been satisfied and that the usual (central) \(t\) distribution is applicable, it can be shown that for constructing a confidence interval for \(\mu_a - \mu_b\) the margin of error (ME) is given by

\[
ME = t^{*}\left[ s_p^2 \left( \frac{1}{n_a} + \frac{1}{n_b} \right) \right]^{1/2} \tag{2.1}
\]

after tt*isisthe the standard standard error errorof ofthe the difference difference The part of Equation 2.1 2. 1 after between two sample means. In addition addition to its role in confidence confidence inter intervals, the standard standard error is is used to indicate the precision with which a statistic estimating a parameter; the smaller the standard standard error the statistic is estimating greater the precision. construct a 95% 95%confidence confidence interval, interval, (t' When Equation 2.1 2 . 1 is used to construct is the absolute absolute value of t that a table of critical values of t indicates is re reat the .05 two-tailed quired to attain statistical statistical significance at two-tailed level (or one-tailed level) mat the 95% orany any other other level levelof ofconfi confi..025 025 one-tailed in a t test. For the 95% or 2 the pooled estimate of of the common variance of dence, ss �p is the the assumed common of 2 the two populations, acr. . Use for for the the degrees-of-freedom row of the the the two populations, degrees-of-freedom (df) (dj) row + nb nb - 22.. Because Because for now we are assuming assuming t table, ddff == n aa + 2 obtained by pooling the 02 is obtained homoscedasticity, the best estimate of a data from the two two samples to calculate the usual usual weighted average of data of two samples' estimates of 0 a22 to produce (weighting by sample sizes the two via the the separate sample's dfs dfs):) : via

$$s_p^{2} = \frac{(n_a - 1)s_a^{2} + (n_b - 1)s_b^{2}}{n_a + n_b - 2} \tag{2.2}$$

Because approximately 95% of the time when such confidence intervals are constructed, in the current case, the value of $\bar{Y}_a - \bar{Y}_b$ might be overestimating or underestimating $\mu_a - \mu_b$ by no more than $ME_{.95}$, one can say that approximately 95% of the time the following interval of values will contain the value of $\mu_a - \mu_b$:

$$(\bar{Y}_a - \bar{Y}_b) \pm ME_{.95} \tag{2.3}$$

The value $(\bar{Y}_a - \bar{Y}_b) - ME_{.95}$ is called the lower limit of the 95% confidence interval, and the value $(\bar{Y}_a - \bar{Y}_b) + ME_{.95}$ is called the upper limit of the 95% confidence interval. A confidence interval is (for our purpose) the interval of values between the lower limit and the upper limit. We often use CI for confidence interval, and to denote the 95% CI we use .95 CI or $CI_{.95}$.

Although confidence intervals for the difference between two averages are not effect sizes, they can provide useful information (but not always) about the magnitude of the results. For example, in our case of comparing two weight-reduction programs for diabetics, suppose that the lower and upper limits of the confidence interval for $\mu_a - \mu_b$, after 1 year in one or the other program, were 1 lb and 2 lb, respectively. A between-program difference in mean population weights (a constant, but an unknown one) that we are 95% confident would be found in the interval between 1 and 2 lb after 1 year in the programs would seem to indicate that there is likely little practical difference in the effectiveness of the two programs, one of which seems to be only negligibly better than the other at most.
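The calculation in Equations 2.1 to 2.3 can be sketched in a few lines of Python. This is only an illustrative sketch under the assumptions stated above (normality, homoscedasticity, independent groups); the function names and the scores are hypothetical, and the critical value $t^{*}$ is supplied by the user from a t table rather than computed.

```python
import math


def pooled_variance(sample_a, sample_b):
    """Equation 2.2: average of the two sample variances, weighted by their dfs."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    # Each sum of squared deviations equals (n - 1) * s^2 for that sample.
    ss_a = sum((y - mean_a) ** 2 for y in sample_a)
    ss_b = sum((y - mean_b) ** 2 for y in sample_b)
    return (ss_a + ss_b) / (n_a + n_b - 2)


def mean_difference_ci(sample_a, sample_b, t_star):
    """Equations 2.1 and 2.3: (lower, upper) limits for mu_a - mu_b."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = sum(sample_a) / n_a - sum(sample_b) / n_b
    s2_p = pooled_variance(sample_a, sample_b)
    margin = t_star * math.sqrt(s2_p * (1 / n_a + 1 / n_b))
    return diff - margin, diff + margin


# Hypothetical scores, not data from the text.
sample_a = [12, 15, 11, 14, 13]
sample_b = [10, 9, 12, 8, 11]
# t* for df = 5 + 5 - 2 = 8 at the .05 two-tailed level, read from a t table.
lower, upper = mean_difference_ci(sample_a, sample_b, t_star=2.306)
print(round(lower, 3), round(upper, 3))  # 0.694 5.306
```

Because the interval in this made-up example does not contain 0, a two-tailed t test at the .05 level would also be significant for these data.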
On the other hand, if the lower and upper limits were found to be, say, 20 and 30 lb, then one would be fairly confident that one has evidence (not proof) that the more effective program is substantially better.

Note in the two examples of outcomes that neither the interval from 1 to 2 nor from 20 to 30 contains the value 0 within it. It can be shown in the present case that if the 95% confidence interval does not contain the value 0 the results imply that a two-tailed t test of $H_0$: $\mu_a - \mu_b = 0$ would have produced a statistically significant t at the .05 significance level. If the interval does contain the value 0, say, for example, limits of -10 and +10, we would conclude that the difference between $\bar{Y}_a$ and $\bar{Y}_b$ is not significant at the two-tailed .05 level of significance. In general, if we were to adopt a significance level alpha, if the (1 - [...]

[...] The expression for the current effect size is $\Pr(Y_a > Y_b)$, where Pr stands for probability. This measure has no widely used name, although names have been given to its estimators (Grissom, 1994a, 1994b, 1996; Grissom & Kim, 2001; McGraw & Wong, 1992). In the just-cited references Grissom named an estimator of $\Pr(Y_a > Y_b)$ the probability of superiority (PS). In this book we will instead use probability of superiority to label $\Pr(Y_a > Y_b)$ itself (not an estimator of it), so that we now define it as follows:

$$PS = \Pr(Y_a > Y_b) \tag{5.1}$$

The PS measures the stochastic (i.e., probabilistic) superiority of one group's scores over another group's scores. Because the PS is a probability and probabilities range from 0 to 1, the PS ranges from 0 to 1. Therefore, the two most extreme results when comparing Populations a and b would be (a) PS = 0, in which every member of Population a is outscored by every member of Population b; and (b) PS = 1, in which every member of Population a outscores every member of Population b. The least extreme result (no effect of group membership one way or the other) would result in PS = .5, in which members of Populations a and b outscore each other equally often.

A proportion in a sample estimates a probability in a population. For example, if one counts, say, 52 heads results in a sample of 100 (random) tosses of a coin, the proportion of heads in that sample's results is 52/100 = .52, and the estimate of the probability of heads for a population of random tosses of that specific coin would be .52. Similarly, the PS can be estimated from the proportion of times that the $n_a$ participants in Sample a outscore the $n_b$ participants in Sample b in head-to-head comparisons of scores within all possible pairings of the score of a member of one sample with the score of a member of the other sample. The total number of possible such comparisons is given by the product of the two sample sizes, $n_a n_b$.
Therefore, if, say, $n_a = n_b = 10$ (but sample sizes do not have to be equal), and in 70 of the $n_a n_b = 100$ comparisons the score from the member of Sample a is greater than the score from the member of Sample b, then the estimate of PS is 70/100 = .70.

For a more detailed but simple example, suppose that Sample a has three members, Persons A, B, and C; and Sample b has three members, Persons D, E, and F. The $n_a n_b = 3 \times 3 = 9$ pairings to observe who has the higher score would be A versus D, A versus E, A versus F, B versus D, B versus E, B versus F, C versus D, C versus E, and C versus F. Suppose that in five of these nine pairings of scores the scores of Persons A, B, and C (Sample a) are greater than the scores of Persons D, E, and F (Sample b), and in the other four pairings Sample b wins. In this example the estimate of PS is 5/9 = .56. Of course, in actual research one would not want to base the estimate on such small samples.

The estimate of PS will be greater than .5 when members of Sample a outscore members of Sample b in more than one half of the pairings, and the estimate will be less than .5 when members of Sample a are outscored by members of Sample b in more than one half of the pairings. When there are ties the simplest solution is to allocate one half of the ties to each group. (There are other methods for handling ties; see Brunner & Munzel, 2000; Fay, 2003; Pratt & Gibbons, 1981; Randles, 2001; Rayner & Best, 2001; Sparks, 1967.) Thus, in this example if members of Sample a had outscored members of Sample b not five but four times in the nine pairings, with one tie, one half of the tie would be awarded as a superior outcome to each sample. Therefore, there would be 4.5 superior outcomes for each sample in the nine pairings of its members with the members of the other sample, and the estimate of PS would, therefore, be 4.5/9 = .5. A measure that is related to the PS but ignores ties (Cliff, 1993) is considered later in this chapter (in Equation 5.5).
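The pairwise counting described above, including the simple rule of giving one half of each tie to each group, might be sketched as follows. The function name `ps_estimate` and the scores are hypothetical; the scores were chosen only so that the counts match the three-person example (five wins in nine pairings, and then four wins with one tie).

```python
def ps_estimate(sample_a, sample_b):
    """Proportion of the n_a * n_b pairings in which the Sample a score is higher.

    Each tie contributes one half of a win, the simplest tie-handling rule
    mentioned in the text.
    """
    wins = 0.0
    for y_a in sample_a:
        for y_b in sample_b:
            if y_a > y_b:
                wins += 1.0
            elif y_a == y_b:
                wins += 0.5
    return wins / (len(sample_a) * len(sample_b))


# Persons A, B, C versus Persons D, E, F (made-up scores).
print(round(ps_estimate([6, 4, 0], [5, 3, 1]), 2))  # five wins in nine pairings: 0.56
print(ps_estimate([6, 3, 0], [5, 3, 1]))            # four wins and one tie: 0.5
```

The double loop makes the $n_a n_b$ comparisons explicit, which is fine for small samples; for large samples a rank-based calculation is usually preferred.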


The number of times that the scores from one specified sample are higher than the scores from the other sample with which they are paired (i.e., the numerator of the sample proportion that is used to estimate the PS) is called the U statistic (Mann & Whitney, 1947). Recalling that the total number of possible comparisons (pairings) is $n_a n_b$ and using $p_{a>b}$ to denote the sample proportion that estimates the PS, we can now define:

$$p_{a>b} = \frac{U}{n_a n_b} \tag{5.2}$$

In other words, in Equation 5.2 the numerator is the number of wins for a specified sample and the denominator is the number of opportunities to win in head-to-head comparisons of each of its members' scores with each of the scores of the other sample's members. The value of U can be calculated manually, but it can be laborious to do so except for very small samples. Although currently major statistical software packages do not calculate $p_{a>b}$, many do calculate the Mann-Whitney U statistic or the equivalent $W_m$ statistic. If the value of U is obtained through the use of software, one then divides this outputted U by $n_a n_b$ to find the estimator, $p_{a>b}$. If software provides the equivalent Wilcoxon (1945) $W_m$ rank-sum statistic instead of the U statistic, if there are no ties, find U by calculating $U = W_m - [n_s(n_s + 1)]/2$, where $n_s$ is the smaller sample size or, if sample sizes are equal, the size of one sample. Note that Equation 5.2 satisfies the general formula, which was presented in chapter 1, for the relationship between an estimate of effect size ($ES_{EST}$) and a test statistic (TS): $ES_{EST} = TS/[f(N)]$. In the case of Equation 5.2, $ES_{EST} = p_{a>b}$, TS = U, and $f(N) = n_a n_b$.

Researchers who focus on means and assume normality and homoscedasticity might prefer to use the t test to compare the means and use a standardized-difference effect size. Researchers who do not assume normality and who are interested in a measure of the extent to which the scores in one group are stochastically superior to those in another group will prefer to use the PS or a similar measure. Under homoscedasticity (in this case, equal variability of the overall ranks of the scores in each group) one may use the original Mann-Whitney U test to test $H_0$: PS = .5 against $H_{alt}$: PS $\neq$ .5. However, the ordinary U test that is usually provided by software is not robust against heteroscedasticity (Delaney & Vargha, 2002; B. P. Murphy, 1976; Pratt, 1964; Zimmerman & Zumbo, 1993). Further discussion of homoscedasticity and discussion of a researcher's choice between comparing means and using the PS is found in the forthcoming section on assumptions.

Consult Wilcox (1996, 1997), Vargha and Delaney (2000), and Delaney and Vargha (2002) for extensive discussions of robust methods for testing $H_0$: PS = .5. Wilcox (1996) presented a Minitab macro and S-PLUS software functions (Wilcox, 1997) for constructing a confidence interval for the PS based on Fligner and Policello's (1981) heteroscedasticity-adjusted U statistic, U', and on a method for constructing a confidence interval by Mee (1990) that appears to be fairly accurate. The Fligner-Policello U' test can be further improved by making a Welch-like adjustment to the degrees of freedom (cf. Delaney & Vargha, 2002). Refer to Vargha and Delaney (2000) for critiques of alternative methods for constructing a confidence interval for the PS, equations for manual calculation, and extension of the PS to comparisons of multiple groups.

Also refer to Brunner and Puri (2001) for extensions of the PS to multiple groups and to factorial designs. (Factorial designs are discussed in chap. 7 of this book.) Brunner and Munzel (2000) presented a further robust method that can be used to test the null hypothesis that PS = .5 and to provide an estimate of the PS and construct a confidence interval for it. This method is applicable when there are ties, heteroscedasticity, or both. Wilcox (2003) provided an accessible discussion of the Brunner-Munzel method and S-PLUS software functions for the calculations in the current case of only two groups and for extension to the case in which groups are taken two at a time from multiple groups. (Wilcox called the PS p or P, and Vargha and Delaney called it A.)
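The conversion from a rank sum to U and then to $p_{a>b}$ (Equation 5.2) that was described earlier in this section can be sketched as below. This is a sketch under two labeled assumptions that go beyond the text: that $W_m$ is the sum of the ranks (1 = lowest score) held by the sample of size $n_s$ in the combined data, and that the resulting U counts the wins of that same sample. The function names and scores are hypothetical, and the code applies only when there are no ties.

```python
def rank_sum(sample, other):
    """Sum of the ranks of `sample` within the combined, sorted scores (no ties)."""
    combined = sorted(list(sample) + list(other))
    if len(set(combined)) != len(combined):
        raise ValueError("tied scores: this simple conversion assumes no ties")
    rank_of = {score: rank for rank, score in enumerate(combined, start=1)}
    return sum(rank_of[score] for score in sample)


def u_from_rank_sum(w_m, n_s):
    """U = W_m - n_s(n_s + 1)/2, as given in the text for the no-ties case."""
    return w_m - n_s * (n_s + 1) / 2


def ps_from_u(u, n_a, n_b):
    """Equation 5.2: divide U by the number of pairings."""
    return u / (n_a * n_b)


# Made-up scores for two samples of three persons each.
sample_a, sample_b = [6, 4, 0], [5, 3, 1]
w_m = rank_sum(sample_a, sample_b)      # ranks 6 + 4 + 1 = 11
u = u_from_rank_sum(w_m, len(sample_a)) # 11 - 6 = 5 wins for Sample a
print(w_m, u, round(ps_from_u(u, 3, 3), 2))
```

With software output one would skip `rank_sum` and pass the reported rank sum or U directly, after checking which sample the package's statistic refers to.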

EXAMPLE OF THE PS

Recall from chapters 3 and 4 the example in which the scores of the mothers of schizophrenic children (Sample a) were compared to those of the mothers of normal children (Sample b). We observed from two different perspectives in those chapters that there is a moderately strong relationship between type of mother and the score on a measure of healthy parent-child relationship, as was indicated by the results $d = -.77$ and $r_{pb} = .40$. We now estimate the PS for the data of this example. Because $n_a = n_b = 20$ in this example, $n_a n_b = 20 \times 20 = 400$. Four hundred is too many pairings for manually calculating U conveniently and with confidence that the calculation will be error free. Therefore, we used software (many kinds of statistical software can do this) to find that U = 103. We can then calculate $p_{a>b} = U/n_a n_b = 103/400 = .26$. We thus estimate that in the populations there is only a .26 probability that a randomly sampled mother of a schizophrenic child will outscore a randomly sampled mother of a normal child.

Under the assumption of homoscedasticity one can test $H_0$: PS = .5 using the ordinary U test or equivalent $W_m$ test, one of which is often provided by statistical software packages. Because software reveals a statistically significant U at p < .05 for these data, one can conclude in this case that PS $\neq$ .5. Specifically, assuming homoscedasticity for the current example, we conclude that the population of schizophrenics' mothers is inferior (as defined by the PS) in its scoring when compared to the population of the normals' mothers (i.e., PS < .5). A researcher who does not assume homoscedasticity should choose to use one of the alternative methods that can be found in the sources that were cited in the previous section.

Note that we reported p < .05 for our result instead of reporting a specific value for p. There are two reasons why we did this. First, different statistics packages might output different results for the U test (Bergmann, Ludbrook, & Spooren, 2000). Second, we are not confident in specific outputted p values beyond the .05 level for the sample sizes in this example. We provide further discussion of these two issues in the remainder of this section.

As sample sizes increase, the sampling distributions of values of U or $W_m$ approach the normal curve. Therefore, some software that includes the Mann-Whitney U test or the equivalent Wilcoxon $W_m$ test, or some researchers who do the calculations for the test manually, may be basing the critical values needed for statistical significance on what is called a large-sample approximation of these critical values. Because some textbooks do not have tables of critical values for these two statistics or may have tables that lack critical values for the particular sample sizes or for the alpha levels of interest in a particular instance of research, recourse to the widely available table of the normal curve would be very convenient. Unfortunately, the literature is inconsistent in its recommendations about how large samples should be before the normal curve provides a satisfactory approximation to the sampling distributions of these statistics. However, computer simulations by Fahoome (2002) indicated that, if sample sizes are equal, each n = 15 is a satisfactory minimum when testing at the .05 alpha level and each n = 29 is a satisfactory minimum when testing at the .01 level. Also, Fay (2002) provided Fortran 90 programs for use by researchers who need exact critical values for $W_m$ for a wide range of sample sizes and for a wide range of alpha levels.

If sample sizes are sufficient for use of the normal curve for an approximate test and, assuming homoscedasticity, if there are no ties, then one may test the null hypothesis that PS = .5 by using Equation 5.3 to convert U to z:

$$z = \frac{U - n_a n_b/2}{\left[n_a n_b (n_a + n_b + 1)/12\right]^{1/2}} \tag{5.3}$$

Reject the null hypothesis at two-tailed level $\alpha$ if the value of $|z|$ exceeds $z_{\alpha/2}$ in a table of the normal curve. Applying the values from the example of the two groups of mothers, we find that $|z| = |103 - (20 \times 20)/2| / [(20)(20)(20 + 20 + 1)/12]^{1/2} = 2.624$. Inspecting a table of the normal curve we find that $|z| = 2.624$ is a statistically significant result at the p < .05 level, two-tailed. If there are ties, replace the denominator in Equation 5.3 with the tie-adjusted value that can be obtained from Equation 9.5 in chapter 9.
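The large-sample conversion of U to z in Equation 5.3 can be checked numerically with a short sketch. The function name `u_to_z` is hypothetical; the inputs are the values from the example of the two groups of mothers (U = 103, $n_a = n_b = 20$), and 1.96 is the usual two-tailed .05 critical value from a table of the normal curve. The sketch covers only the no-ties case.

```python
import math


def u_to_z(u, n_a, n_b):
    """Equation 5.3 (no ties): standardize U under H0: PS = .5."""
    expected_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (u - expected_u) / sd_u


z = u_to_z(103, 20, 20)
print(round(abs(z), 3))  # 2.624
print(abs(z) > 1.96)     # True: significant at the .05 level, two-tailed
```

The sign of z is negative here because U is below its expected value of 200, which agrees with the estimate $p_{a>b} = .26 < .5$.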

A RELATED MEASURE OF EFFECT SIZE

Because the maximum probability or proportion equals 1, the probabilities or proportions of occurrences of all of the possible outcomes of an event must sum to 1. For example, the probability that a toss of a coin will produce either a head or a tail equals 1/2 + 1/2 = 1. Therefore, if there are no ties or ties are allocated equally, then $p_{a>b} + p_{a<b} = 1$ [...] as an estimator (the inverse of the previous ratio), replace the word lower with the word higher in the preceding question. The two versions of the ratio are related to estimators of a generalized odds ratio, about which there is more discussion in chapter 9.

For an example consider again the data involving the two samples of mothers. Recall that in this example $p_{a>b} = .26$, so now we find that $p_{a<b} = 1 - p_{a>b} = 1 - .26 = .74$, and $p_{a<b}/p_{a>b} = .74/.26 = 2.8$. We are thus estimating that in the populations there would be 2.8 times more pairings in which the schizophrenics' mothers are outscored by the normals' mothers than in which the schizophrenics' mothers outscore the normals' mothers.
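The arithmetic of this ratio for the mothers example can be sketched as follows. The function name `outscored_ratio` is hypothetical, and the sketch assumes, as in the text, that there are no ties or that ties have been allocated equally, so that the two proportions sum to 1.

```python
def outscored_ratio(p_a_gt_b):
    """Ratio p(a<b) / p(a>b), assuming the two proportions sum to 1."""
    if not 0 < p_a_gt_b <= 1:
        raise ValueError("p_a_gt_b must be greater than 0 and at most 1")
    p_a_lt_b = 1 - p_a_gt_b
    return p_a_lt_b / p_a_gt_b


# p(a>b) = .26 in the example of the two samples of mothers.
print(round(outscored_ratio(0.26), 1))  # 2.8
```

The guard against a proportion of 0 is there because the ratio is undefined when Sample a never wins a pairing.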


ASSUMPTIONS

The original purpose of the U test was to test if scores in one population are stochastically larger (i.e., likely to be larger) than scores in another population, assuming that both populations have the same, but not necessarily normal, shape (Mann & Whitney, 1947). (This test was later observed to be equivalent to an earlier test by Wilcoxon, 1945, the $W_m$ rank-sum test.) In other words, the purpose of the U test was to test if the score at the ith percentile of Population a is larger than the score at [...]


Refer to Grissom and Kim (2001) for comparisons of the values of the $p_{a>b}$ estimates and the CL estimates applied to sets of real data and for the results of some computer simulations on the effect of heteroscedasticity on the two estimators. For further results of computer simulations of the robustness of various methods for testing $H_0$: PS = .5, consult Vargha and Delaney (2000) and Delaney and Vargha (2002). Refer to Dunlap (1999) for software to calculate the CL.

TECHNICAL NOTE 5.1: THE PS AND ITS ESTIMATORS

The PS measures the tendency of scores from Group a to outrank the scores from Group b across all pairings of the scores of the members of each group. Therefore, the PS is an ordinal measure of effect size, reflecting not the absolute magnitudes of the paired scores but the rank order of these paired scores. Although, outside of the physical sciences, one often treats scores as if they were on an interval scale, many of the measures of dependent variables are likely monotonically, but not necessarily linearly, related to the latent variables that they are measuring. In other words, the scores presumably increase and decrease along with the latent variables (i.e., they have the same rank order as the latent variables) but not necessarily to the same degree. Monotonic transformations of the data leave the ordinally oriented PS invariant. Therefore, different measures of the same dependent variable should leave the PS invariant. If a researcher is interested in the tendency of the scores in one group to outrank the scores in another group over all pairings of the two, then use of the PS is reasonable.

Theoretically, $p_{a>b}$ is a consistent and unbiased estimator of the PS, and it has the smallest sampling variance of any unbiased estimator of the PS. (A consistent estimator is one that converges randomly toward the parameter that it is estimating as sample sizes approach infinity.) Also, using $p_{a>b}$ to test $H_0$: PS = .5 against $H_{alt}$: PS $\neq$ .5, or against a one-tailed alternative, is a consistent test in the sense that the power of such a test approaches 1 as sample sizes approach infinity.

Some readers may question the statement that the CL assumes homoscedasticity because the variance of $(Y_a - Y_b)$ is $\sigma_a^{2} + \sigma_b^{2}$ regardless of the values of $\sigma_a^{2}$ and $\sigma_b^{2}$. However, it can be shown that the CL strictly estimates the PS only under normality and homoscedasticity and that it is not quite an unbiased estimator of the PS unless it is adjusted (Pratt & Gibbons, 1981). McGraw and Wong (1992), who named the CL, were correct in assuming homoscedasticity. For more discussions of the PS and its estimators consult Lehmann (1975), Laird and Mosteller (1990), Pratt and Gibbons (1981), and Vargha and Delaney (2000). Note that in these sources you will typically find the parameter symbolized in a manner similar to $\Pr(Y_a > Y_b)$ with no name attached to it.
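The invariance of the ordinal estimate under a monotonic (here, strictly increasing) transformation, as stated in this technical note, can be illustrated with a small sketch. The scores are made up, the helper names are hypothetical, and the natural logarithm is used only as one convenient increasing transformation of positive scores.

```python
import math


def ps_estimate(sample_a, sample_b):
    """Proportion of pairings won by Sample a, with half credit for each tie."""
    wins = sum(
        1.0 if y_a > y_b else 0.5 if y_a == y_b else 0.0
        for y_a in sample_a
        for y_b in sample_b
    )
    return wins / (len(sample_a) * len(sample_b))


def mean_difference(sample_a, sample_b):
    return sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b)


# Made-up positive scores and their logarithms.
a, b = [6.0, 4.0, 0.5], [5.0, 3.0, 1.0]
log_a = [math.log(y) for y in a]
log_b = [math.log(y) for y in b]

# The rank order of every pairing is unchanged, so the estimate is identical.
print(ps_estimate(a, b) == ps_estimate(log_a, log_b))  # True
# The difference between means, by contrast, depends on the scale of measurement.
print(mean_difference(a, b), mean_difference(log_a, log_b))
```

In this particular made-up data set the difference between means even changes sign after the transformation, whereas the estimate of the PS stays at 5/9.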

INTRODUCTION TO OVERLAP

Measures of effect size can be related to the relative positions of the distributions of Populations a and b. When there is no effect, $\Delta = 0$, $r_{pop} = 0$, and PS = .5. In this case, if assumptions are satisfied, Distributions a and b completely overlap. When there is a maximum effect, $\Delta$ is at its maximum negative or positive value for the data, $r_{pop} = +1$ or $-1$, and PS = 0 or 1 depending on whether it is Population b or Population a, respectively, that is superior in all of the comparisons within the paired scores. In this case of maximum effect there is no overlap of the two distributions; even the lowest score in the higher scoring group is higher than the highest score in the lower scoring group. Intermediate values of effect size result in less extreme amounts of overlap than in the two previous cases. Recall the example in chapter 3 in which Fig. 3.1 depicted the mean of the treated population's distribution shifting 1 $\sigma_y$ unit to the right of the mean of the control population's distribution when $\Delta = +1$.


THE DOMINANCE MEASURE

Cliff (1993) discussed a variation on the PS concept that avoids dealing with ties by considering only those pairings in which $Y_a > Y_b$ or $Y_b > Y_a$. We call this measure the dominance measure of effect size (DM) here because Cliff (1993) called its estimator the dominance statistic, which we denote by ds. This measure is defined as

$$DM = \Pr(Y_a > Y_b) - \Pr(Y_b > Y_a) \tag{5.5}$$

and its estimator, ds, is given by

$$ds = p_{a>b} - p_{b>a} \tag{5.6}$$

Here the p values are, as before, given by $U/n_a n_b$ for each group, except for including in each group's U only the number of wins in the $n_a n_b$ pairings of scores from Groups a and b, with no allocation of any ties. For example, suppose that $n_a = n_b = 10$, and of the $10 \times 10 = 100$ pairings Group a has the higher of the two paired scores 50 times, Group b has the higher score 40 times, and there are 10 ties within the paired scores. In this case, $p_{a>b} = 50/100 = .5$, $p_{b>a} = 40/100 = .4$; therefore, the estimate of the DM is .5 - .4 = +.1, suggesting a slight superiority of Group a. Because, as probabilities, both Pr values can range from 0 to 1, DM ranges from 0 - 1 = -1 to 1 - 0 = +1. When DM = -1 the populations' distributions do not overlap, with all of the scores from Group a being below all of the scores from Group b, and vice versa when DM = +1. For values of the DM between the two extremes of -1 and +1, there is intermediate overlap. When there is an equal number of wins for Groups a and b in their pairings, $p_{a>b} = p_{b>a} = .5$ and the estimate of the DM is .5 - .5 = 0. In this case there is no effect and complete overlap.
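The dominance statistic of Equation 5.6 can be sketched both from counts of wins and from raw scores. The function names and the raw scores are hypothetical; the counts are those of the example above (50 wins for Group a, 40 for Group b, 10 ties in 100 pairings). Note that, unlike the estimate of the PS, no part of a tie is credited to either group.

```python
def dominance_from_counts(wins_a, wins_b, n_pairs):
    """Equation 5.6: ds = p(a>b) - p(b>a); ties count for neither group."""
    return wins_a / n_pairs - wins_b / n_pairs


def dominance_statistic(sample_a, sample_b):
    """ds computed by comparing every Sample a score with every Sample b score."""
    wins_a = sum(1 for y_a in sample_a for y_b in sample_b if y_a > y_b)
    wins_b = sum(1 for y_a in sample_a for y_b in sample_b if y_b > y_a)
    return dominance_from_counts(wins_a, wins_b, len(sample_a) * len(sample_b))


# Counts from the example in the text.
print(round(dominance_from_counts(50, 40, 100), 2))  # 0.1

# Made-up scores: five wins for a, three for b, and one tie in nine pairings.
print(round(dominance_statistic([6, 4, 1], [5, 3, 1]), 2))  # 0.22
```

Swapping the two samples changes only the sign of ds, which is a quick check that both groups' wins are being counted symmetrically.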
Refer to Cliff (1993) for discussions of significance testing and construction of confidence intervals for the DM for the independent-groups and the dependent-groups cases, and for software to undertake the calculations. Also refer to Vargha and Delaney (2000) for further discussion. Wilcox (2003) provided S-PLUS software functions for Cliff's (1996) robust method for constructing a confidence interval for the DM for the case of only two groups and for the case of groups taken two at a time from multiple groups. Preliminary findings by Wilcox (2003) indicated that Cliff's (1993) method provides good control of Type I error even when there are many tied values, a situation that may be problematic for competing methods. Many ties are likely when there are relatively few possible values for the dependent variable, such as is the case for rating-scale data as discussed in chapter 9. An example of the DM is presented in chapter 9 along with more discussion.

COHEN'S U3

If assumptions of normality and homoscedasticity are satisfied and if populations are of equal size (as they always are in experimental research), one can estimate the percentage of nonoverlap of the distributions of Populations a and b. One of the methods uses as an estimate of nonoverlap the percentage of the members of the higher scoring sample who score above the median (which is the same as the mean when normality is satisfied) of the lower scoring sample. We observed with regard to Fig. 3.1 of chapter 3 that when Δ = +1, the mean of the higher scoring population lies 1 σ_Y unit above the mean of the lower scoring population. Because, under normality, 50% of the scores are at or below the mean and approximately 34% of the scores lie between the mean and 1 σ_Y unit above the mean (i.e., z = +1), when Δ = +1 we infer that approximately 50% + 34% = 84% of the scores of the superior group exceed the median of the comparison group. Cohen (1988) denoted this percentage as a measure of effect size, U3, to contrast it with his related measures, U1 and U2, which we do not discuss here. When there is no effect we have observed that Δ = 0, r_pop = 0, and the PS = .5, and now we note that U3 = 50%. In this case 50% of the scores from Population a are at or above the median of the scores from Population b, but, of course, so too are 50% of the scores from Population b at or above its median; there is complete overlap (0% nonoverlap). As Δ increases above 0, U3 approaches 100%. For example, if Δ = +3.4, then U3 > 99.95%, with nearly all of the scores from Population a being above the median of Population b.

In research that is intended to improve scores compared to a control, placebo, or standard-treatment group, a case of successful treatment is sometimes defined (but not always justifiably so) as any score that exceeds the median of the comparison group. Then, the percentage of the scores from the treated group that exceed the median score of the comparison group is called the success percentage of the treatment. When assumptions are satisfied the success percentage is, by definition, U3. For further discussions consult Lipsey (2000) and Lipsey and Wilson (2001). For a more complex but robust approach to an overlap measure of effect size that does not assume normality or homoscedasticity, refer to Hess, Olejnik, and Huberty (2001).

RELATIONSHIPS AMONG MEASURES OF EFFECT SIZE

Although Cohen's (1988) use of the letter U is apparently merely coincidental to the Mann-Whitney U statistic, when assumptions are met, there is a relationship between U3 and the PS. Indeed, many of the measures of effect size that are discussed in this book are related when assumptions are met. Numerous approximately equivalent values among many measures can be found by combining the information that is in tables presented by Rosenthal et al. (2000, pp. 16-21), Lipsey and Wilson (2001, p. 153), Cohen (1988, p. 22), and Grissom (1994a, p. 315). Table 5.1 presents an abbreviated set of approximate relationships among measures of effect size. The values in Table 5.1 are more accurate the more nearly normality, homoscedasticity, and equality of sample sizes are satisfied, and the larger the sample sizes.

TABLE 5.1
Approximate Relationships Among Some Measures of Effect Size

Δ       r_pop    PS      U3 (%)
0       .000     .500    50.0
.1      .050     .528    54.0
.2      .100     .556    57.9
.3      .148     .584    61.8
.4      .196     .611    65.5
.5      .243     .638    69.1
.6      .287     .664    72.6
.7      .330     .690    75.8
.8      .371     .714    78.8
.9      .410     .738    81.6
1.0     .447     .760    84.1
1.5     .600     .856    93.3
2.0     .707     .921    97.7
2.5     .781     .962    99.4
3.0     .832     .983    99.9
3.4     .862     .992    >99.95

In chapter 4 we discussed Cohen's (1988) admittedly rough criteria for small, medium, and large effect sizes in terms of values of Δ and values of r_pop. Due to the relationships among many measures of effect size, we can now also apply Cohen's criteria to the PS and U3. Categorized as small effect sizes (Δ ≤ .20, r_pop ≤ .10) would be PS ≤ .56 and U3 ≤ 57.9%. Medium values (Δ = .50, r_pop = .243) would be PS = .638 and U3 = 69.1%. Large values (Δ ≥ .8, r_pop ≥ .371) would be PS ≥ .714 and U3 ≥ 78.8%.
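The entries of Table 5.1 can be approximated with standard normal-theory conversions. The sketch below is an illustration under stated assumptions rather than the procedure used to build the table: it assumes normality, homoscedasticity, and equal group sizes, and it uses the textbook conversions r_pop = Δ/√(Δ² + 4), PS = Φ(Δ/√2), and U3 = Φ(Δ), where Φ is the standard normal cumulative distribution function. The function names `normal_cdf` and `related_measures` are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def related_measures(delta):
    """Return (r_pop, PS, U3 in percent) implied by a standardized difference.

    Assumes normality, homoscedasticity, and equal group sizes.
    """
    r_pop = delta / math.sqrt(delta ** 2 + 4.0)
    ps = normal_cdf(delta / math.sqrt(2.0))
    u3 = 100.0 * normal_cdf(delta)
    return r_pop, ps, u3


for delta in (0.0, 0.2, 0.5, 0.8, 1.0):
    r_pop, ps, u3 = related_measures(delta)
    print(f"delta={delta:.1f}  r_pop={r_pop:.3f}  PS={ps:.3f}  U3={u3:.1f}%")
```

Under these assumptions the printed values for Δ = .5, .8, and 1.0 agree with the corresponding rows of Table 5.1 to the precision shown there.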

APPLICATION TO CULTURAL EFFECT SIZE

Three of the measures of effect size that have been discussed thus far in this book have been applied to the comparison of two cultures (Matsumoto, Grissom, & Dinnel, 2001). Among many other differences between participants in the United States (n_US = 182) and in Japan (n_JP = 161) that had been reported in a previous study (Kleinknecht, Dinnel, Kleinknecht, Hiruma, & Hirada, 1997), the Japanese had statistically significantly higher mean scores than the US participants on a scale of Embarrassability, t(341) = 4.33, p < .001; a scale of Social Anxiety, t(341) = 2.96, p < .01; and a scale of Social Interaction Anxiety, t(341) = 3.713, p < .001. To demonstrate that statistically significant differences, or even so-called "highly" statistically significant differences, do not necessarily translate to very large, or even large, effects of culture (cultural effect size), Matsumoto et al. (2001) estimated a standardized-difference effect size (Hedges' g_pop of chap. 3), r_pop, and the PS for these results. The PS was estimated by p_a>b using Equation 5.2. Table 5.2 displays the results. Values of U3 are not included in Table 5.2 because U3 assumes populations of equal size, a condition that is not met by the United States and Japan.

TABLE 5.2
Cultural Effect Size Estimates When Comparing the United States and Japan

Scale                        Ȳ_US      Ȳ_JP      p level    g       r      p_a>b
Embarrassability             108.80    112.27    < .001     --      --     .46
Social anxiety               83.65     93.50     < .01      -.34    .17    .41
Social interaction anxiety   26.36     31.50     < .001     -.41    .20    .38

Note. Adapted from "Do between-culture differences really mean that people are different? A look at some measures of cultural effect size," by D. Matsumoto, R. J. Grissom, and D. L. Dinnel, 2001, Journal of Cross-Cultural Psychology, 32 (No. 4), 478-490, p. 486. Copyright © 2001 by Sage Publications. Adapted with permission of Sage Publications.

Observe that the values in the last column are all below .5, suggesting that the members of Group a (USA) would tend to be outscored by the members of Group b (Japan) in paired comparisons of members of the two groups. Recall that when the PS is based on Pr(Y_a > Y_b) instead of the equally applicable Pr(Y_b > Y_a), if the members of Group a tend to be outscored by the members of Group b, then the PS gets smaller as the effect gets larger. Thus, the greater the effect, the more the current value of this PS departs upward from .5 when Group a is superior and downward from .5 when Group b is superior. Observe in Table 5.2 that, although the estimates of effect size for the two anxiety scales are between Cohen's (1988) criteria for small and medium effect sizes, the large sample sizes (182 and 161) have elevated the cultural mean differences to what some would call highly or very highly statistically significant differences on the basis of the impressively small p values. Moreover, although the cultural difference for Embarrassability might be considered by some to be highly statistically significant, the effect sizes are only in the category of small effects. Thus, it is possible for a cultural (or gender) stereotype that is based on a statistically significant difference actually to translate to a small effect of culture (or gender). Even a somewhat valid (statistically) stereotype may actually not apply to a large percentage of the stereotyped group and, therefore, may not be of much practical use, such as in the training of diplomats. Worse, of course, some stereotypes can do much personal and social harm.
TECHNICAL NOTE 5.2: ESTIMATING EFFECT SIZES THROUGHOUT A DISTRIBUTION

Traditional measures of effect size might be insufficiently informative or even misleading when there is heteroscedasticity, nonhomomerity, or both. Nonhomomerity means inequality of shapes of the distributions. For example, suppose that a treatment causes some participants to score higher and some to score lower than they would have scored if they had been in the comparison group. In this case the treated group's variability will increase or decrease depending on whether it was the higher or lower scoring participants whose scores were increased or decreased by the treatment. However, although variability has been changed by the treatment in this example, the two groups' means and/or medians might remain nearly the same (which is possible but much less likely than the example that is presented in the next paragraph). In this case, if we estimate an effect size with Ȳ_a - Ȳ_b or Mdn_a - Mdn_b in the numerator, the estimate might be a value that is not far from zero although the treatment may have had a moderate or large effect on the tails even if there is not much of an effect on the center of the treated group's distribution. The effect on variability may have resulted from the treatment having "pulled" tails outward or having "pushed" tails inward.

In another case, the treatment may have an effect throughout a distribution, changing both the center and the tails of the treated group's distribution. In fact, it is common for the group with the higher mean also to have the greater variability. In this case, if we now consider a combined distribution that contains all of the scores of the treated and comparison groups, the proportions of the treated group's scores among the overall high scores and among the overall low scores can be different from what would be implied by an estimate of Δ or U3. Hedges and Nowell (1995) provided a specific example. In this example, if Δ = +.3, distributions are normal, and the variance of the treated population's scores is only 15% greater than the variance of the comparison population's scores, one would find approximately 2.5 times more treated participants' scores than comparison participants' scores in the top 5% of the combined distribution. For more discussion and examples consult Feingold (1992, 1995) and O'Brien (1988). Note that the kinds of results that have just been discussed can occur even under homoscedasticity if there is nonhomomerity. To deal with the possibility of treatment effects that are not restricted to the centers of distributions, other measures of effect size have been proposed, such as the measures that are briefly introduced in the next two sections.

Hedges-Friedman Method

Informative methods have been proposed for measuring effect size at places along a distribution in addition to its center. Such methods are necessarily more complex than the usual methods, so they have not been widely used. For example, Hedges and Friedman (1993), assuming normality, recommended the use of a standardized-difference effect size, Δ_α, at a portion of a tail beyond a fixed value, Y_α, in a distribution of the combined scores from Populations a and b. The subscript alpha indicates that Y_α is the score at the 100α percentile point of the combined distribution, and the value of alpha is chosen by the researcher according to which portion of the combined distribution is of interest. For example, if α = .25, then Y_α is the score that has 100(.25)% = 25% of the scores above it. One can then define

Δ_α = (μ_αa - μ_αb) / σ_α,     (5.7)

where μ_αa and μ_αb are the means of just those scores from Populations a and b, respectively, that are higher than Y_α, and σ_α is the standard deviation of those scores in the combined distribution that are higher than Y_α. Again, the value of Y_α is selected by the researcher as the score in the combined distribution that has c% of the scores above it. Computations of the estimates of the various values of Δ_α are repeated for those values of c that are of interest to the researcher. Extensive computational details can be found in the appendix of Hedges and Friedman (1993).
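To make the definition concrete, here is a rough sample analogue of the tail effect size. It is a simplified sketch and not the estimator developed in the appendix of Hedges and Friedman (1993): the function name `tail_effect_size`, the rule used to pick the cutoff score, the use of the sample (n - 1) standard deviation, and the example scores are all assumptions made for illustration.

```python
import statistics


def tail_effect_size(group_a, group_b, alpha):
    """Sample analogue of Equation 5.7 for the top 100*alpha% of the combined scores."""
    combined = sorted(list(group_a) + list(group_b))
    n_above = round(alpha * len(combined))
    if not 2 <= n_above < len(combined):
        raise ValueError("alpha leaves too few scores in the tail or no cutoff below it")
    # Assumed cutoff rule: the highest score that is NOT among the top n_above scores.
    cutoff = combined[-n_above - 1]
    tail_a = [y for y in group_a if y > cutoff]
    tail_b = [y for y in group_b if y > cutoff]
    if not tail_a or not tail_b:
        raise ValueError("each group needs at least one score above the cutoff")
    # Assumed choice: sample (n - 1) standard deviation of the combined tail scores.
    sd_tail = statistics.stdev(tail_a + tail_b)
    return (statistics.mean(tail_a) - statistics.mean(tail_b)) / sd_tail


# Hypothetical scores, chosen only for illustration.
treated = [10, 12, 14, 16, 18, 20, 22, 24]
comparison = [9, 11, 13, 15, 17, 19, 21, 23]
print(tail_effect_size(treated, comparison, alpha=0.25))
```

Repeating the call for several values of alpha gives a crude picture of how the standardized difference changes along the upper tail; with small samples the tail contains few scores, so such estimates are unstable.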

Shift-Function Method

Doksum (1977) presented a graphical method for comparing two groups not only at the centers of their distributions but, more informa-


> 2 proportions. (A general answer stated symbolically suffices.)
18. What influences the accuracy of the normal-approximation procedure for constructing a confidence interval for the difference between two probabilities?
19. Explain why the interpretation of a given difference between two probabilities depends on whether both probabilities are close to 1 or 0, or both are close to .5.
20. Define relative risk, and explain when it is most useful.
21. What might be a better name than relative risk when this measure is applied to a category that represents a successful outcome?
22. Discuss a limitation of relative risk as an effect size.
23. For which kinds of categorizing or assignment of participants is relative risk applicable?
24. Define prospective and retrospective research.
25. Define odds ratio in general terms.
26. Define odds ratio formally.
27. To which kinds of categorization or assignment of participants is an odds ratio applicable?
28. Calculate and interpret an odds ratio for the data in Table 8.1.
29. Construct and interpret a confidence interval for the population odds ratio for the data in Table 8.1.
30. Why is an empty cell problematic for a sample odds ratio?
31. How does one test the null hypothesis that the population odds ratio is equal to 1 against the alternate hypothesis that it is not equal to 1?
32. Construct a null-counternull interval for the population odds ratio for the data in Table 8.2.
33. In which circumstance would it not be surprising that a null-counternull interval is very wide?
34. Name two common measures of the overall association between row and column variables for tables larger than 2 x 2.
35. For which kind of sampling are the two measures in Question 34 applicable?
36. Two or more values of the CC should only be compared or averaged for tables that have what in common?
37. Two or more values of V should only be compared or averaged for tables that have what in common?
38. Why should a research report always present a contingency table on whose data an estimate of effect size is reported?
39. Define the NNT and discuss its meaning.
40. Discuss the problem of testing the significance of an estimate of NNT.

Chapter 9

Effect Sizes for Ordinal Categorical Variables

INTRODUCTION

Often one of the two categorical variables that are being related is an ordinal categorical variable, a set of categories that, unlike a nominal variable, has a meaningful order. Examples of ordinal categorical variables include the set of rating-scale categories Worse, Unimproved, Moderately Improved, and Much Improved; the set of attitudinal scale categories Strongly Agree, Agree, Disagree, and Strongly Disagree; the set of categories Applicant Accepted, Applicant on Waiting List, Applicant Rejected; and the scale from Introversion to Extroversion. The technical name for such ordinal categorical variables is ordered polytomy. The focus of this chapter is on some relatively simple methods for estimating an effect size in tables with two rows that represent two groups and three or more columns that represent ordinal categorical outcomes (2 x c tables). (The methods also apply to the case of two ordinal categorical outcomes. However, with fewer categories, the number of tied outcomes between the groups is more likely to increase, a matter that is discussed later in this chapter.) Table 9.1 provides an example with real data in which participants were randomly assigned to one or another treatment. Of course, the roles of the rows and columns can be reversed, so the methods also apply to comparable r x 2 tables. The clinical details do not concern us here, but we do observe that the Improved column reveals that neither Therapy 1 nor Therapy 2 appears to have been very successful. However, this result is perhaps less surprising when we note that the results were based on a 4-year follow-up study after therapy and the presenting problem (marital problems) was likely deteriorating just prior to the start of therapy. The data are from D. K. Snyder, Wills, and Grady-Fletcher (1991).

TABLE 9.1
Ordinal Categorical Outcomes of Two Psychotherapies

             1        2            3
             Worse    No Change    Improved    Total
Therapy 1    3        22           4           29
Therapy 2    12       13           1           26

Note. The data are from "Long-term effectiveness of behavioral versus insight-oriented marital therapy: A four-year follow-up study," by D. K. Snyder, R. M. Wills, and A. Grady-Fletcher, 1991, Journal of Consulting and Clinical Psychology, 59, p. 140. Copyright © 1999 by the American Psychological Association. Adapted with permission.

Gliner et al. (2002) provided reminders of two important points about the use of ordinal categorical scales. First, the number of categories to be used should be the greatest number of categories into which the participants can be reliably placed. Second, if the data are originally continuous it is generally not appropriate (due to a likely decrease in statistical power) to slice the continuous scores into ordinal categories. Note also that one should be very cautious about comparing effect sizes across studies that involve attitudinal scales. Such effect sizes can vary if there are differences in the number of items, number of categories of response, or the proportion of positively and negatively worded items across studies. Refer to Onwuegbuzie and Levin (2003) for further discussion.

The statistical significance of the association between the row and column variables, as well as the effect size that is used to measure the strength of that association, might vary depending on who is doing the categorizing. For example, there may not be high interobserver reliability in the categorization done by a patient, a close relative of the patient, or a professional observer of the patient. Therefore, a researcher should be appropriately cautious in interpreting the results. (Refer to Davidson, Rasmussen, Hackett, & Pitrosky, 2002, for an example of comparing effect sizes for patient-rated and observer-rated scales in generalized anxiety disorder.) A related concern has been raised about the use of a researcher's rating of the status of patients after treatment with a drug, even under double-blind conditions, in cases in which the researcher has a monetary relationship with the drug company. This and other possibly drug-favoring methodologies (Antonuccio, Danton, & McClanahan, 2003) might inflate the estimate of effect size.

Before discussing estimation of effect sizes for such data we briefly consider the related problem of testing the statistical significance of the association between the row and column variables. Suppose that the researcher's hypothesis is that one specified treatment is better than the other, that is, a specified ordering of the efficacies of the two treatments. Such a research hypothesis leads to a one-tailed test. Alternatively, suppose that the researcher's hypothesis is that one treatment or the other (unspecified) is better, that is, a prediction that there will be an unspecified ordering of the efficacies of the two treatments. This latter hypothesis leads to a two-tailed test. One or the other of these two ordinal hypotheses provides the alternative to the usual H0 that posits no association between the row and column variables. An ordinal hypothesis is a hypothesis that predicts not only a difference between the two treatments in the distribution of their scores in the outcome categories (columns in this example) but a superior outcome for one (specified or unspecified) of the two treatments. These two typical ordinal researchers' hypotheses are of interest in this chapter.

A χ² test is inappropriate to test the null hypotheses at hand because the value of χ² is insensitive to the ordinal nature of ordinal categorical variables. In this ordinal case a χ² test can only validly test a not very useful "nonordinal" researcher's hypothesis that the two groups are in some way not distributed the same in the various outcome categories (Grissom, 1994b). Also, recall from chapter 8 that the magnitude of χ² is not an estimator of effect size because it is very sensitive to sample size, not just to the strength of association between the variables. (The Kolmogorov-Smirnov two-sample test would be a better choice than the χ² test for testing H0 against a researcher's hypothesis of superiority of one treatment over another, but this test also has unacceptable shortcomings for this purpose; Grissom, 1994b.) Although there are other, more complex, approaches to data analysis for a 2 x c contingency table with an ordinal categorical outcome variable, in this chapter we consider those that involve relatively simple measures of effect size: the point-biserial correlation (perhaps the most problematic in this case), the probability of superiority, the dominance measure, the generalized odds ratio, and the cumulative odds ratio.
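The insensitivity of χ² to the order of the outcome categories can be checked directly. The sketch below is an illustration only: the helper name `pearson_chi_square` is hypothetical, and it computes the ordinary Pearson statistic by hand for the counts of Table 9.1 and again after the columns have been scrambled.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Table 9.1 with columns in their meaningful order: Worse, No Change, Improved.
ordered = [[3, 22, 4], [12, 13, 1]]
# The same counts with the columns scrambled: Improved, Worse, No Change.
scrambled = [[4, 3, 22], [1, 12, 13]]

print(pearson_chi_square(ordered))
print(math.isclose(pearson_chi_square(ordered), pearson_chi_square(scrambled)))
```

Because each cell contributes the same amount wherever its column is placed, the statistic carries no information about which therapy tends toward the better outcome categories.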

APPLIED TO ORDINAL ORDINAL CATEGORICAL CATEGORICALDATA DATA THE POINT-BISERIAL r APPLIED

Although we soon observe that there are limitations to this method (as is often true regarding measures of effect size), one might calculate a point-biserial correlation, $r_{pb}$ (see chap. 4), as perhaps the simplest estimate of an effect size for the case at hand. First, the c column category labels are replaced by ordered numerical values, such as 1, 2, ..., c. For the column categories in Table 9.1 one might use 1, 2, and 3, and call these the scores on a Y variable. Next, the labels for the row categories are replaced with numerical values, say, 1 and 2, and these are called the scores on an X variable. One then uses any statistical software to calculate the correlation coefficient, r, for the now numerical X and Y variables.

Software output yielded $r_{pb}$ = -.397 for the data in Table 9.1. When sample sizes are unequal, as they are in Table 9.1, one can correct for the attenuation of r that results from such inequality by using Equation 4.4 from chapter 4 for $r_c$, where c denotes corrected. Because sample sizes are reasonably large and not very different for the data in Table 9.1, we are not surprised to find that the correction makes little difference in this case; $r_c$ = -.398. The correlation is moderately large using Cohen's (1988) criteria for the relative sizes of correlations that were critiqued in chapter 4. Output also indicates that $r_{pb}$ = -.397 is statistically significantly different from 0 at the p < .002 level, two-tailed. Note that the negative correlation indicates that Therapy 1 is better than Therapy 2. One can now conclude, subject to the limitations that are discussed later, that Therapy 1 has a statistically significant and moderately strong superiority over Therapy 2.
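
The scoring-and-correlating procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the software that produced the reported output. It assumes the cell frequencies of Table 9.1 as they are used in the worked example later in this chapter (3, 22, and 4 for Therapy 1; 12, 13, and 1 for Therapy 2, in the order Worse, No Change, Improved). The function name `point_biserial` is hypothetical.

```python
from math import sqrt

# Cell frequencies of Table 9.1 as used in the chapter's worked example
# (columns: Worse, No Change, Improved). Treated here as an assumption.
therapy_1 = [3, 22, 4]
therapy_2 = [12, 13, 1]
column_scores = [1, 2, 3]  # Y scores that replace the column labels
row_scores = (1, 2)        # X scores that replace the row labels


def point_biserial(freq_1, freq_2, y_scores, x_scores=(1, 2)):
    """Pearson r between row scores (X) and column scores (Y), from cell counts."""
    cells = [(x_scores[0], y, f) for y, f in zip(y_scores, freq_1)]
    cells += [(x_scores[1], y, f) for y, f in zip(y_scores, freq_2)]
    n = sum(f for _, _, f in cells)
    mean_x = sum(x * f for x, _, f in cells) / n
    mean_y = sum(y * f for _, y, f in cells) / n
    # Frequency-weighted sums of squares and cross products.
    sxy = sum(f * (x - mean_x) * (y - mean_y) for x, y, f in cells)
    sxx = sum(f * (x - mean_x) ** 2 for x, _, f in cells)
    syy = sum(f * (y - mean_y) ** 2 for _, y, f in cells)
    return sxy / sqrt(sxx * syy)


r_pb = point_biserial(therapy_1, therapy_2, column_scores, row_scores)
print(round(r_pb, 3))  # -0.397
```

Working from cell frequencies avoids entering each of the 55 observations separately; the result agrees with the value of -.397 reported above.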

CONFIDENCE INTERVAL AND NULL-COUNTERNULL INTERVAL FOR $r_{pop}$

Recall from chapter 4 that construction of an accurate confidence interval for $r_{pop}$ can be complex and that there may be no entirely satisfactory method. Therefore, researchers who report a confidence interval for r should also include such a cautionary comment in their research reports. For more details consult Hedges and Olkin (1985), Smithson (2003), and Wilcox (1996, 1997, 2003). Refer to the section on confidence intervals and null-counternull intervals in chapter 4 for a brief discussion of the improved methods for construction of a confidence interval for $r_{pop}$ by Smithson (2003) and Wilcox (2003). As an alternative to a confidence interval one might be inclined to construct instead the simple null-counternull interval for $r_{pop}$ using Equation 4.2 from chapter 4. However, as pointed out in chapter 8, a null-counternull interval for an effect size is less useful (very wide) when the estimate of the effect size is already known to be large and statistically significant, which is the case for the data in Table 9.1.

Although, from chapter 8 we know in advance that the null-counternull interval will be wide when the null hypothesis is $H_0$: $r_{pop}$ = 0 and the obtained estimate of effect size is large, we proceed to use Equation 4.2 to construct a null-counternull interval for $r_{pop}$ for these data as an exercise. Because our null-hypothesized value of $r_{pop}$ is 0, the lower limit of the interval (null value) is 0. We apply the obtained r = -.397 to Equation 4.2, $r_{cn} = 2r / (1 + 3r^2)^{1/2}$, to find that the upper limit (counternull value) is $2(-.397) / [1 + 3(-.397)^2]^{1/2} = -.654$. Therefore, the interval runs from 0 to -.654.

LIMITATIONS OF $r_{pb}$ FOR ORDINAL CATEGORICAL DATA

For general discussion of limitations of $r_{pb}$ refer to the section Assumptions of r and $r_{pb}$ in chapter 4. The limitations may be especially troublesome in cases, such as the present one, in which there are very few values of the X and Y variables (two and three values, respectively). These data cause concerns such as the possibly inaccurate obtained p levels for the t test that is used to test for the statistical significance of $r_{pb}$. However, in this ordinal example there might be some favorable circumstances that possibly reduce the risk in using $r_{pb}$. First, sample sizes are reasonably large. Second, the obtained p level is well beyond the customary minimum criterion of .05. Also, some studies have indicated that statistical power and accurate p levels can be maintained for the t test even when the Y variable is dichotomous (resulting in a 2 × 2 table) if sample sizes are greater than 20 each, as they are in our example (D'Agostino, 1971; Lunney, 1970). A dichotomy is a much coarser grouping of categorical outcome than the polytomy of tables such as Table 9.1.

Regarding the t test of the statistical significance of $r_{pb}$, it has been reported that even when sample sizes are as small as five the p levels for the t test can be accurate when there are at least three ordinal categories (Bevan, Denton, & Meyers, 1974). Also, Nanna and Sawilowsky (1998) showed that the t test can be robust with respect to Type I error and can maintain power when applied to data from rating scales, but Maxwell and Delaney (1985) showed that, under heteroscedasticity and equality of means of populations, parametric methods applied to ordinal data might result in misleading conclusions. (However, in experimental research it might not be common to find that treatments change variances without changing means.) For references to many articles whose conclusions favor one or the other side of this longstanding controversy about the use of parametric methods for ordinal data, consult Nanna (2002) and Maxwell and Delaney (2004). Regarding the prospects for future development of a satisfactory method for constructing a confidence interval for the difference between the mean ratings of two groups, refer to Penfield (2003).

One might also be concerned about the arbitrary nature of our equal-interval scoring of the columns (1, 2, and 3) because other sets of three increasing numbers could have been used. Snedecor and Cochran (1989) and Moses (1986) reported that moderate differences among ordered, but not necessarily equally spaced, numerical scores that replace ordinal categories do not result in important differences in the value of t. However, Delaney and Vargha (2002) provided contrary results in which there was a statistically significant difference between the means for two treatments for problem drinking when the increasing levels of alcohol consumption were ordinally numerically scaled with equal spacing as 1 (abstinence), 2 (2 to 6 drinks per week), 3 (between 7 and 140 drinks per week), and 4 (more than 140 drinks per week), but there was not a statistically significant difference when the same four levels of drinking were scaled with slightly unequal spacing such as 0, 2, 3, 4. Consult Agresti (2002) for similar results that indicated the spacing is important and for further discussion of the choice of scores for the categories. For dependent variables for which there is no obvious choice of score spacing, such as the dependent variable in Table 9.1, Agresti (2002) acknowledged that equal spacing of scores is often a reasonable choice.

If, unbeknownst to the researcher, a continuous latent variable happens to underlie the scale, one would want the spacing of scores to be consistent with the differences between the underlying values. Agresti (2002) recommended the use of sensitivity analysis in which the results from two or three sensible scoring schemes are compared. One would hope that the results would not be very different. In any event, the results from each of the scoring schemes should be presented.

Some researchers will remain concerned about the validity of $r_{pb}$ and the accuracy of the p levels of the t test under the following combination of circumstances: Sample sizes are small, there are as few as three ordinal categories, there is possible skew or skew in different directions for the two groups, and there is possible heteroscedasticity. Because the lowest and/or highest extremes of the ordinal categories may not be as extreme as the actual most extreme standings of the participants with regard to the construct that underlies the rating scale, skew or differential skew may result. For example, suppose that there are respondents in one group who disagree extremely strongly with a presented attitudinal statement and respondents in the other group who agree extremely strongly with it. If the scale does not include these very extreme categories, the responses of the two groups will "bunch up" with those in the less extreme strongly disagree or strongly agree categories, respectively (which are floor and ceiling effects, as discussed in chap. 1). The consequence will be skew in different directions for the two groups as well as a restricted range of the dependent variable. Recall from chapter 4 that differential skew and restricted range of the measure of the dependent variable can be problematic for $r_{pb}$. Note that the issue of the Pearson $r_{pop}$ as a measure of only the linear component of a relationship between X and Y is not relevant here because the two values of the X variable do not represent a dichotomized continuous variable that might have a nonlinear relationship with the Y variable. Instead, the two values of the X variable represent a true dichotomy such as Therapy 1 and Therapy 2 or male and female.

Finally, Cliff (1993, 1996) argued that there is rarely empirical justification for treating the numbers that are assigned to ordinal categories as having other than ordinal properties. We now turn to a less problematic effect size for ordinal categorical data, a measure for which the categories need only be ordered and the issue of the spacing of numerical scores is irrelevant.
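
Returning to the sensitivity analysis that Agresti (2002) recommended, the idea can be sketched by recomputing $r_{pb}$ under a few scoring schemes and presenting all of the results side by side. The sketch below is only illustrative. The cell frequencies are those of Table 9.1 as used in the worked example later in this chapter, and the two unequally spaced schemes are hypothetical choices made here for illustration, not schemes proposed in the sources cited above.

```python
from math import sqrt

# Table 9.1 cell frequencies (Worse, No Change, Improved), taken from the
# chapter's worked example and treated here as an assumption.
therapy_1 = [3, 22, 4]
therapy_2 = [12, 13, 1]


def r_pb_for_scores(y_scores, freq_1=therapy_1, freq_2=therapy_2):
    """Pearson r between therapy (X = 1 or 2) and the scored outcome (Y)."""
    cells = [(1, y, f) for y, f in zip(y_scores, freq_1)]
    cells += [(2, y, f) for y, f in zip(y_scores, freq_2)]
    n = sum(f for _, _, f in cells)
    mean_x = sum(x * f for x, _, f in cells) / n
    mean_y = sum(y * f for _, y, f in cells) / n
    sxy = sum(f * (x - mean_x) * (y - mean_y) for x, y, f in cells)
    sxx = sum(f * (x - mean_x) ** 2 for x, _, f in cells)
    syy = sum(f * (y - mean_y) ** 2 for _, y, f in cells)
    return sxy / sqrt(sxx * syy)


# Hypothetical scoring schemes; only the first is used in the text.
schemes = {
    "equal spacing": (1, 2, 3),
    "wider upper gap": (1, 2, 4),
    "wider lower gap": (1, 3, 4),
}
results = {name: r_pb_for_scores(scores) for name, scores in schemes.items()}
for name, r in results.items():
    print(f"{name:16s} {schemes[name]}  r_pb = {r:.3f}")
```

Whatever schemes are chosen, the point of the exercise is to report every result rather than only the most favorable one.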

THE PROBABILITY OF SUPERIORITY APPLIED TO ORDINAL DATA

The part of the following material that is background information was explained in more detail in chapter 5, where the effect size called the probability of superiority was introduced in the context of a continuous Y variable. Recall that the probability of superiority, PS, was defined as the probability that a randomly sampled member of Population a will have a score ($Y_a$) that is higher than the score attained by a randomly sampled member of Population b ($Y_b$). Symbolically, PS = $Pr(Y_a > Y_b)$. In the case of Table 9.1 a represents Therapy 1 and b represents Therapy 2, so we now call these therapies Therapy a and Therapy b. The PS is estimated by $p_{a>b}$, which is the proportion of times that members of Sample a have a better outcome than members of Sample b when the outcome of each member of Sample a is compared to the outcome of each member of Sample b, one by one. In Table 9.1 we consider the outcome of No Change (Y = 2) to be better than the outcome Worse (Y = 1) and the outcome Improved (Y = 3) to be better than the outcome No Change. The number of times that the outcome for a member of Sample a is better than the outcome for the compared member of Sample b in all of these head-to-head comparisons is called the U statistic. (We soon consider the handling of tied scores.) The total number of such head-to-head comparisons is given by the product of the two sample sizes, $n_a$ and $n_b$. Therefore, an estimate of the PS is given by Equation 5.2 from chapter 5, $p_{a>b} = U / n_a n_b$. The PS and its $p_{a>b}$ estimator are not sensitive to the magnitudes of the scores that are being compared two at a time, but they are sensitive to which of the two scores is higher (better), that is, an ordering of the two scores. Therefore, the PS and $p_{a>b}$ are applicable to 2 × c tables in which the c categories are ordinal categorical. Note that numerous ties are likely when comparing two scores at a time when outcomes are categorical, even more so the smaller the effect size and the fewer the categories; consult Fay (2003). Therefore, we pay particular attention to ties in the following sections.
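
The one-by-one comparison of every member of Sample a with every member of Sample b can be made concrete with a brute-force Python sketch. It is illustrative only. It expands the cell frequencies of Table 9.1 (as they appear in the worked example later in this chapter) into individual scores and simply counts wins and ties, leaving the question of how to allocate ties to the sections that follow.

```python
from itertools import product

# Individual outcome scores (1 = Worse, 2 = No Change, 3 = Improved),
# expanded from the Table 9.1 cell frequencies used in the worked example.
scores_a = [1] * 3 + [2] * 22 + [3] * 4    # Therapy a, n_a = 29
scores_b = [1] * 12 + [2] * 13 + [3] * 1   # Therapy b, n_b = 26

wins_a = ties = wins_b = 0
for y_a, y_b in product(scores_a, scores_b):  # all n_a * n_b pairings
    if y_a > y_b:
        wins_a += 1
    elif y_a == y_b:
        ties += 1
    else:
        wins_b += 1

total = len(scores_a) * len(scores_b)
print(wins_a, ties, wins_b, total)  # 364 326 64 754
```

Only the ordering within each pair matters, so any increasing set of scores in place of 1, 2, 3 would give the same counts. The large share of ties (326 of 754 pairings) shows why their handling deserves attention.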

WORKED EXAMPLE OF ESTIMATING THE PS FROM ORDINAL DATA

Before discussing the use of software for the present task we describe manual calculation. (Although a standard statistical package might provide at least intermediate values for the calculations, we describe manual calculation here because it should provide readers with a better understanding of the concept of the PS when applied to ordinal categorical data. Also, manual calculation requires only cell frequencies, whereas calculation using standard software might require more laborious entry of each observation.) We estimate PS = $Pr(Y_a > Y_b)$ using $S_a$ to denote the number of times that a member of Sample a has an outcome that is superior to the outcome for the compared member of Sample b. We use T to denote the number of times that the two outcomes are tied. A tie occurs whenever the two participants who are being compared have outcomes that are in the same outcome category (same column of Table 9.1). The number of ties arising from each column of the table is the product of the two cell frequencies in the column. Using the simple tie-handling method that was recommended by Moses, Emerson, and Hosseini (1984) and also adopted by Delaney and Vargha (2002) we allocate ties equally to each group by counting each tie as one half of a win assigned to each of the two samples. (Consult Brunner & Munzel, 2000; Fay, 2003; Pratt & Gibbons, 1981; Randles, 2001; Rayner & Best, 2001; and Sparks, 1967, for further discussions of ties.) Therefore,

$$U = S_a + .5T. \tag{9.1}$$
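
Equation 9.1 and the rule that each column contributes the product of its two cell frequencies to T can be sketched directly from cell frequencies. The following Python sketch is illustrative only. The function name `ps_from_table` is hypothetical, and the frequencies are those of Table 9.1 as they are used in the manual calculation in this section.

```python
def ps_from_table(freq_a, freq_b):
    """Return (S_a, T, U, estimated PS) for a 2 x c table.

    freq_a and freq_b hold the cell frequencies of the two rows, with the
    columns ordered from the worst outcome to the best outcome.
    """
    s_a = 0
    for j, f_a in enumerate(freq_a):
        # Members of Sample a in column j beat every member of Sample b
        # who falls in a lower (worse) column.
        s_a += f_a * sum(freq_b[:j])
    t = sum(f_a * f_b for f_a, f_b in zip(freq_a, freq_b))  # ties per column
    u = s_a + 0.5 * t                                       # Equation 9.1
    return s_a, t, u, u / (sum(freq_a) * sum(freq_b))


# Table 9.1: Worse, No Change, Improved.
s_a, t, u, ps_hat = ps_from_table([3, 22, 4], [12, 13, 1])
print(s_a, t, u, round(ps_hat, 3))  # 364 326 527.0 0.699
```

Because each tie is counted as half a win for each sample, swapping the two rows gives the complementary estimate, 1 - .699 = .301.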

Calculating $S_a$ by beginning with the last column (Improved) of Table 9.1, observe that the outcomes of the four patients in the first row (now called Therapy a) are superior to those of 13 + 12 = 25 of the patients in row 2 (now called Therapy b). Therefore, thus far 4(13 + 12) = 100 pairings of patients have been found in which Therapy a had the superior outcome. Similarly, moving now to the middle column (No Change) of the table observe that the outcomes of 22 of the patients in Therapy a are superior to those of 12 of the patients in Therapy b. This latter result adds 22 × 12 = 264 to the previous subtotal of 100 pairings within which patients in Therapy a had the superior outcome. Therefore, $S_a$ = 100 + 264 = 364. The number of ties arising from columns 1, 2, and 3 is 3 × 12 = 36, 22 × 13 = 286, and 4 × 1 = 4, respectively, so T = 36 + 286 + 4 = 326. Thus, U = $S_a$ + .5T = 364 + .5(326) = 527. The number of head-to-head comparisons in which a patient in Therapy a had a better outcome than a patient in Therapy b, when one allocates ties equally, is 527. There were $n_a n_b$ = 29 × 26 = 754 total comparisons made. Therefore, the proportion of times that a patient in Therapy a had an outcome that was superior to the outcome of a compared patient in Therapy b, $p_{a>b}$, (with equal allocation of ties) is 527/754 = .699. We thus estimate that there is nearly a .7 probability that a randomly sampled patient from a population that receives Therapy a will outperform a randomly sampled patient from a population that receives Therapy b.

If type of therapy has no effect on outcome, PS = .5. Before citing methods that might be more robust we discuss traditional methods for testing $H_0$: PS = .5. As discussed in chapter 5, one might test $H_0$: PS = .5 against $H_{alt}$: PS ≠ .5 using the Mann-Whitney U test (perhaps more appropriately called, in terms of historical precedence, the Wilcoxon-Mann-Whitney test). However, as discussed in the section Assumptions in chapter 5, heteroscedasticity can result in a loss of power or inaccurate p levels and inaccurate confidence intervals for the PS (cf. Delaney & Vargha, 2000; Wilcox, 1996, 2001, 2003).

Only a minority of textbooks of statistics have a table of critical values of U for various combinations of sample sizes, $n_a$ and $n_b$. Also, books that do include such a table (or a table for the equivalent statistic, $W_m$, that is discussed shortly) may not include the same sample sizes that were used by the researcher. Therefore, we now consider the use of software to conduct a U test.

Programs of statistical software packages can be used to conduct a U test from ordinal categorical data if a data file is created in which the ordinal categories are replaced by a set of any increasing positive numbers, as we already did for the columns in Table 9.1. Available software may instead provide an equivalent test using Wilcoxon's $W_m$ statistic. Software may also be using an approximating normal distribution instead of the exact distribution of the $W_m$ statistic and use as the standard deviation of this distribution (the standard error) a standard deviation that has not been adjusted for ties. (We adjust for ties later in this section.) For the data in Table 9.1 such software yields $W_m$ = 878, p = .0114, two-tailed, using a normal approximation in which the standard error is not adjusted for ties, so the reported p level is not as accurate as it could be although the reported value of $W_m$ is correct. To derive an estimate of PS from this output $W_m$ is transformed to U using, as in chapter 5,

$$U = W_m - \frac{n_s(n_s + 1)}{2}, \tag{9.2}$$

where $n_s$ is the smaller of the two sample sizes or simply n if $n_a = n_b$. Applying the data in Table 9.1 to Equation 9.2, the obtained $W_m$ = 878 transforms to U = 878 - [26(26 + 1)]/2 = 527, which is the same value for U that we previously obtained using manual calculation.

When sample sizes are larger than those in a table of critical values of U a manually-calculated U test is often conducted using a normal approximation. (Unlike tables of critical values of U or of $W_m$, tables of the normal curve appear in all books on general statistics.) For ordinal categorical data there is an old three-part rule of thumb (possible modification of which we suggest later) that has been used to justify use of the version of the $W_m$ test or U test that uses the normal approximation. The rule consists of (Part 1) $n_a \geq 10$, (Part 2) $n_b \geq 10$, and (Part 3) no column total frequency > .5N, where N = $n_a + n_b$ (Emerson & Moses, 1985; Moses et al., 1984). According to this rule, if all of these three criteria are satisfied the following transformation of U to z is made, and the obtained z is referred to a table of the normal curve to see if it is at least as extreme as the critical value that is required for the adopted significance level (e.g., z = ±1.96 for the .05 level, two-tailed):

$$z_U = \frac{U - .5 n_a n_b}{s_U}, \tag{9.3}$$

where $s_U$ is the standard deviation of the distribution of U (standard error) and

$$s_U = \left[\frac{n_a n_b (n_a + n_b + 1)}{12}\right]^{1/2}. \tag{9.4}$$
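
The three-part rule of thumb and the unadjusted normal approximation of Equations 9.3 and 9.4 can be sketched as follows. This is an illustrative sketch under stated assumptions. The function names are hypothetical, Parts 1 and 2 are read as "at least 10", the forms of Equations 9.3 and 9.4 are taken from the numerical calculations shown in this section, and the two-tailed p level comes from the normal curve by way of the complementary error function rather than a printed table.

```python
from math import erfc, sqrt


def rule_of_thumb_ok(freq_a, freq_b):
    """Old three-part rule for using the normal approximation."""
    n_a, n_b = sum(freq_a), sum(freq_b)
    column_totals = [f_a + f_b for f_a, f_b in zip(freq_a, freq_b)]
    return (n_a >= 10                                      # Part 1
            and n_b >= 10                                  # Part 2
            and max(column_totals) <= 0.5 * (n_a + n_b))   # Part 3


def z_unadjusted(u, n_a, n_b):
    """z for U with a standard error that is not adjusted for ties."""
    s_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # Equation 9.4
    z = (u - 0.5 * n_a * n_b) / s_u               # Equation 9.3
    p_two_tailed = erfc(abs(z) / sqrt(2))         # normal-curve p level
    return s_u, z, p_two_tailed


# Table 9.1 (Worse, No Change, Improved) with U = 527 from Equation 9.1.
print(rule_of_thumb_ok([3, 22, 4], [12, 13, 1]))  # False: column 2 holds 35 of 55
s_u, z, p = z_unadjusted(527, 29, 26)
print(round(s_u, 3), round(z, 2), round(p, 4))
```

For Table 9.1 the unadjusted p level reproduces, to rounding, the software value of .0114 quoted earlier, and Part 3 of the rule fails because the middle column holds more than half of the patients.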

With regard to the minimum sample sizes that might justify use of the normal approximation, recall from chapter 5 that Fahoome (2002) found that the minimum equal sample sizes that would justify the use of the normal approximation for the $W_m$ test (equivalent to the U test), in terms of adequately controlling Type I error, were 15 for tests at the .05 level and 29 for tests at the .01 level. Therefore, until there is further evidence about minimum sample sizes for the case of using the normal approximation to test PS = .5 with ordinal categorical data, perhaps a better rule of thumb would be to substitute Fahoome's (2002) minimum sample sizes for those in Parts 1 and 2 in the previously described old rule.

A more accurate significance level can be attained by adjusting $s_U$ for ties. Such an adjustment might be especially beneficial if any column total contains more than one half of the total participants. This condition violates the criterion for Part 3 that was previously listed for justifying use of a normal approximation. (Because some software might not make this adjustment we demonstrate the manual adjustment.) Observe that column 2 of Table 9.1 contains 22 + 13 = 35 of the 29 + 26 = 55 total patients. Because 35/55 = .64, which is greater than the criterion maximum of .5, we use the adjusted $s_U$, denoted $s_{adj}$, in the denominator of $z_U$ for a more accurate test,

$$s_{adj} = s_U \left[1 - \frac{\sum (f_i^3 - f_i)}{N^3 - N}\right]^{1/2}, \tag{9.5}$$

where $f_i$ is a column total frequency. Beginning our calculation with Equation 9.4 for $s_U$ we find for the data in Table 9.1 that $s_U$ = [29(26)(29 + 26 + 1)/12]$^{1/2}$ = 59.318. Next, we calculate $f_i^3 - f_i$ for each of the columns 1, 2, and 3 in that order. These results are $15^3 - 15$ = 3,360, $35^3 - 35$ = 42,840, and $5^3 - 5$ = 120, respectively. Summing these last three values yields 3,360 + 42,840 + 120 = 46,320. Placing 46,320 into Equation 9.5 we have $s_{adj}$ = 59.318[1 - 46,320/($55^3$ - 55)]$^{1/2}$ = 50.385. From Equation 9.3 with $s_{adj}$ replacing $s_U$, we now have $z_U$ = [527 - .5(29)(26)] / 50.385 = 2.98. Inspection of a table of the normal curve reveals that a z that is equal to 2.98 is statistically significant beyond the .0028 level, two-tailed. There is thus support for a researcher's hypothesis that one of the therapies is better than the other, and we soon find that Therapy a is the better one.

Observe first that adjusting $s_U$ for ties results now in a different obtained significance level from the value of .0114 that was previously obtained, although both levels represent significance at p < .02. Because our estimate of $Pr(Y_a > Y_b)$, $p_{a>b}$, was .699, which is a value greater than the null-hypothesized value of .5, the therapy for which there is this just-reported statistically significant evidence of superiority is Therapy a.
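
The tie adjustment can be sketched in Python as a check on the manual arithmetic above. This sketch is illustrative only. The function name `z_tie_adjusted` is hypothetical, the form of Equation 9.5 is taken from the numerical substitution shown above, and the p level is computed from the normal curve with the complementary error function rather than read from a table, so small rounding differences from the hand calculation are to be expected.

```python
from math import erfc, sqrt


def z_tie_adjusted(u, freq_a, freq_b):
    """Return (s_adj, z, two-tailed p) using a tie-adjusted standard error."""
    n_a, n_b = sum(freq_a), sum(freq_b)
    n = n_a + n_b
    s_u = sqrt(n_a * n_b * (n + 1) / 12)                    # Equation 9.4
    column_totals = [f_a + f_b for f_a, f_b in zip(freq_a, freq_b)]
    tie_term = sum(f ** 3 - f for f in column_totals)       # sum of f^3 - f
    s_adj = s_u * sqrt(1 - tie_term / (n ** 3 - n))         # Equation 9.5
    z = (u - 0.5 * n_a * n_b) / s_adj                       # Equation 9.3
    return s_adj, z, erfc(abs(z) / sqrt(2))


# Table 9.1 (Worse, No Change, Improved) with U = 527.
s_adj, z_adj, p_adj = z_tie_adjusted(527, [3, 22, 4], [12, 13, 1])
print(round(s_adj, 2), round(z_adj, 2), round(p_adj, 4))
```

Run at full precision, the sketch gives an adjusted standard error of about 50.39 and z of about 2.98, in line with the values obtained by hand.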
Because U U is statistically statistically significant significant beyond the .0028 (approxi (approxitwo-tailed level, paa >>bb == .699 mately) two-tailed .699 isis statistically statistically significantly significantlygreater greater than .5 .5 beyond the .0028 two-tailed two-tailed level. naa and nnbb > 110 ties, as is true for the the 0 the presence of many ties, When both n data in T Table 9.1, result generally in the approxiapproxidata able 9 . 1 , has been reported to result

h

P

P

210 210

�

CHAPTER 9 9 CHAPTER

mate p level level being within within 50% of of the the exact p level (Emerson & & Moses, Moses, 1 985; but obtained p level, 1985; but also consult consult Fay, Fay, 2003) 2003).. In this example the the obtained .0028, is so far far from from the the usual criterion of .05 that perhaps one need not not very concerned about the exact exact p level attained attained by the results. How Howbe very ever, especially when more than half of the participants participants fall fall in one out outthan half and the the approximate approximate obtained p level is close close to .05, a come column and researcher might prefer report an an exact obtained p level as is discussed discussed prefer to report in the paragraph paragraph after after the next next one. Note that the and OR OR o of the DM and of the the next next three three sections) sections) is the PS (and the wo ordinal applicable to tables that have as few aastltwo ordinal outcome catego categories, although more ties are likely when when there are only two outcome outcome . 1 (chap. . 1 (chap. (chap. 8 categories. Tables 44.1 (chap. 4) and and 88.1 8)) provide examples be because Participant Participant Not Better after treatment treatment Participant Better versus Participant 4.1 after treat treat. 1 and Symptoms Remain versus Symptoms Gone after in Table 4 able 8. ment in T Table 8.11 represent in each case case an an ordering of outcomes outcomes.. One outcome is not not just different different from from the other, as would would be the case case for a scale, but but in each example one outcome can be considered nominal scale, considered to alternative outcome. As an exercise exercise the reader might be superior to its alternative . 1 to Equation . 1 to verify, apply the results of Table 88.1 Equation 99.1 verify, with with regard to the superiority Psychotherapy to Drug Therapy in that example, superiority of Psychotherapy that the PS P5 is estimated to be .649. 
An exact p level for U and, therefore, for testing $H_0$: PS = .5 against $H_{alt}$: PS ≠ .5, can be obtained using the statistical software packages StatXact, SPSS Exact, or SAS Version 9. (Refer to Posch, 2002, for a study of the power of exact [StatXact] versions of the $W_m$ test and competing tests applied to data from 2 × c tables.) Recall from chapter 5 that Fay (2002) provided a Fortran 90 program to produce exact critical values for the $W_m$ test over a wider range of sample sizes and alpha levels than can generally be found in published tables. For further discussions of the PS and U test in general review the sections The Probability of Superiority: Independent Groups and Assumptions in chapter 5. Consult Delaney and Vargha (2002) for discussion of robust methods for the current case of ordinal categorical dependent variables. However, such methods might inflate Type I error under some conditions of skew. Delaney and Vargha (2002) demonstrated that these methods might not perform well when extreme skew is combined with one or both sample sizes being at or below 10. Sample sizes between 20 and 30 might be satisfactory. Wilcox (2003) provided an S-PLUS function for the Brunner and Munzel (2000) method for testing $H_0$: PS = .5 and for constructing a confidence interval for the PS under conditions of heteroscedasticity, ties, or both.

Recall that from time to time in this book we recommended that researchers consider estimating and reporting more than one kind of effect size for a given set of data to gain different perspectives on the results. However, we also acknowledged a contrary opinion that holds that such reporting of estimates of multiple measures might only serve to confuse some readers. The example of estimation of the point-biserial $r_{pop}$ and the PS for data such as those in Table 9.1 is of interest in this regard. The estimate of the former was -.398 and the estimate of the latter was .699. A researcher who reports both of these values would be obliged not only to discuss the limitations of the point-biserial correlation in the case of ordinal data but also to make clear to readers the different meanings, but consistent message, of the two reported estimates of effect size. Both results support the superiority of Therapy a. The values -.398 and .699 for the two estimates both constitute estimates of moderately large effect sizes by Cohen's (1988) criteria that were discussed in chapters 4 and 5. Also, referring to the columns for $r_{pop}$, the PS, and Cohen's (1988) $U_3$ measure of overlap in Table 5.1 of chapter 5, observe that these two values for estimates of $r_{pop}$ and the PS both correspond to a value of $U_3$ that indicates that approximately three fourths of the members of the better performing group have outcomes that are above the median outcome of the poorer performing group. (Note that it is of no concern when interpreting the results or examining the rows closest to $r_{pop}$ = .398 in Table 5.1 that $r_{pb}$ was negative and the estimate of the PS was positive. Because it is a proportion the estimate of the PS cannot be negative, and a value over .5 indicates superiority for Group a. A negative value for $r_{pb}$ similarly indicates that Group 1 [same as Group a] tends to score higher than Group 2. The sign of $r_{pb}$ depends on which sample's data are arbitrarily placed in row 1 or row 2, as discussed in chap. 4.) Note that those who do not find the median to be meaningful in the case of ordinal data with few categories and many ties would not want to apply $U_3$ in such cases.

Note that in the case of ordinal categorical data, due to the limited number of possible outcomes (categories) there is no opportunity for the most extreme outcomes to be shifted up or down by a treatment to a more extreme value (for which there is no outcome category). The result would be a bunching of tallies in the existing most extreme category (skew; cf. Fay, 2003), obscuring the degree of shift in the underlying variable. Such bunching can cause an underestimation of the PS, because this bunching can increase ties in an existing extreme category when in fact some of these ties actually represent superior outcomes for members of one group regarding the underlying variable. Skew resulting from such bunching can also cause $r_{pb}$ to underestimate $r_{pop}$, as was discussed in the section Assumptions of r and $r_{pb}$ in chapter 4. Again, such problems can be reduced by the use of the maximum number of categories into which participants can reliably be placed and by the use of either the Brunner and Munzel (2000) tie-handling method or Cliff's (1996) method that is discussed in the next section.
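The effect of bunching on the estimate of the PS can be illustrated with a small, purely hypothetical example. In the sketch below the scores, the ceiling value, and the function name are all invented for illustration, and ties are assumed to be split evenly between the groups. Scores on an underlying variable are recorded on a scale whose highest category is 4, so every underlying score above 4 is tallied in that top category.

```python
def ps_from_scores(scores_a, scores_b):
    """Estimate the PS from raw scores, counting each tie as half a win."""
    wins = sum(1 for a in scores_a for b in scores_b if a > b)
    ties = sum(1 for a in scores_a for b in scores_b if a == b)
    return (wins + 0.5 * ties) / (len(scores_a) * len(scores_b))


# Hypothetical scores on the underlying variable (not data from the text).
latent_a = [3, 4, 5, 6]
latent_b = [2, 3, 4, 5]

# The observed scale has no category above 4, so higher scores bunch there.
CEILING = 4
observed_a = [min(score, CEILING) for score in latent_a]
observed_b = [min(score, CEILING) for score in latent_b]

print(ps_from_scores(latent_a, latent_b))      # 0.71875
print(ps_from_scores(observed_a, observed_b))  # 0.65625
```

In this invented example three pairings in which a member of Group a was actually superior become ties once the ceiling is imposed, which lowers the estimate from about .72 to about .66.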


THE DOMINANCE MEASURE AND SOMERS' D

Recall from the section The Dominance Measure in chapter 5 that Cliff (1993, 1996) discussed an effect size that is a variation on the PS concept, the dominance measure, DM = $\Pr(Y_a > Y_b) - \Pr(Y_b > Y_a)$. Cliff (1993, 1996) called the estimator of this effect size the dominance statistic, which we defined in Equation 5.6 as ds = $p_{a>b} - p_{b>a}$. When calculating the ds each p value is given by the sample's value of $U/n_a n_b$ with no allocation of ties, so each U is now given only by the S part of Equation 9.1. The denominator of each p value is still given by $n_a n_b$. Note again that many ties are likely in the case of ordinal categorical data, which is especially true with fewer ordinal outcomes. The application of the DM and the ds will be made clear in the worked example in the next section. Recall from chapter 5 that the ds and DM range from -1 to +1. When every member of Sample b has a better outcome than every member of Sample a, ds = -1. When every member of Sample a has an outcome that is better than the outcome of every member of Sample b, ds = +1. When there is an equal number of superior outcomes for each sample in the head-to-head pairings, ds = 0.

When ds = -1 or +1 there is no overlap between the two samples' distributions in the 2 × c table, and when ds = 0 there is complete overlap in the two samples' distributions. However, because estimators of the PS and the DM are sensitive to which outcome is better in each pairing, but not sensitive to how good the better outcome is, reporting an estimate of these two effect sizes is not very informative for ordinal categorical data unless the 2 × c (or r × 2) table is also presented.
For example, with regard to a table with the column categories of Table 9.1 (but not the data therein), if $p_{a>b}$ = 1 or ds = +1 (both indicating the most extreme possible superiority of Therapy a over Therapy b) the result could mean that (a) all members of Sample b were in the Worse column whereas all members of Sample a were in the No Change column, the Improved column, or in either the No Change or Improved columns; or (b) all members of Sample b were in the No Change column, whereas all members of Sample a were in the Improved column. Readers of a research report would want to know which of these four meaningfully different results underlying $p_{a>b}$ = 1 or ds = +1 had occurred. Similarly, when $p_{a>b}$ = .5 or ds = 0 (both indicating no superiority for either therapy), among other possible patterns of frequencies in the table the result could mean that all participants were in the Worse column, all were in the No Change column, or all were in the Improved column. One would certainly want to know whether such a $p_{a>b}$ or ds were indicating that both therapies were always possibly harmful (Worse column), always ineffective (No Change column), always effective (Improved column), or that there were some other pattern in the table.
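The point that very different tables can yield the same ds can be checked with a short sketch. The cell counts below are hypothetical (10 participants per group, chosen only for illustration), the columns are assumed to be ordered Worse, No Change, Improved, and the function name is illustrative.

```python
def dominance_statistic(row_a, row_b):
    """Return ds = p(a>b) - p(b>a) for an ordered 2 x c table, ties not allocated."""
    s_a = s_b = 0
    for i, count_a in enumerate(row_a):
        for j, count_b in enumerate(row_b):
            if i > j:
                s_a += count_a * count_b
            elif i < j:
                s_b += count_a * count_b
    return (s_a - s_b) / (sum(row_a) * sum(row_b))


# Hypothetical tables; each value is (Sample a row, Sample b row).
ds_plus_one = {
    "b all Worse, a all No Change": ([0, 10, 0], [10, 0, 0]),
    "b all Worse, a all Improved": ([0, 0, 10], [10, 0, 0]),
    "b all Worse, a No Change or Improved": ([0, 5, 5], [10, 0, 0]),
    "b all No Change, a all Improved": ([0, 0, 10], [0, 10, 0]),
}
ds_zero = {
    "everyone Worse": ([10, 0, 0], [10, 0, 0]),
    "everyone No Change": ([0, 10, 0], [0, 10, 0]),
    "everyone Improved": ([0, 0, 10], [0, 0, 10]),
}

for label, (row_a, row_b) in {**ds_plus_one, **ds_zero}.items():
    print(f"{label}: ds = {dominance_statistic(row_a, row_b):+.1f}")
```

All four tables in the first group give ds = +1 and all three in the second give ds = 0, which is why the statistic should be reported together with the table itself.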
Refer to Cliff (1993, 1996) for a discussion of significance testing for the ds and construction of confidence intervals for the DM for the independent-groups and the dependent-groups cases and for software to undertake the calculations. Wilcox (2003) provided an S-PLUS function for Cliff's (1993, 1996) method and, as noted in the discussion of the DM in chapter 5, reported tentative findings that this method controls Type I error well even when there are many ties. Consult Simonoff, Hochberg, and Reiser (1986), Vargha and Delaney (2000), and Delaney and Vargha (2002) for further discussions. The ds is also known as the version of Somers' D statistic (Agresti, 2002; Somers, 1962) that is applied to 2 × c tables with ordinal outcomes (Cliff, 1996). An exact p level for the statistical significance of Somers' D is provided by StatXact and SPSS Exact.

WORKED EXAMPLE OF THE ds

Calculating the ds with the data in Table 9.1 by starting with column 3, and not allocating ties, we note that (as already found in the previous section) Therapy a had $S_a$ = 364 superior outcomes in the 29 × 26 = 754 head-to-head comparisons. Therefore, $p_{a>b}$ = 364/754 = .4828. Starting again with column 3 we now find that 1 patient in Therapy b had a better outcome than 22 + 3 patients in Therapy a, so thus far there are 1(22 + 3) = 25 pairs of patients within which Therapy b had the superior outcome. Moving now to column 2 we find that 13 patients in Therapy b had an outcome that was superior to the outcome of 3 patients in Therapy a, adding 13 × 3 = 39 to the previous subtotal of 25 superior outcomes for Therapy b. Therefore, $p_{b>a}$ = (25 + 39)/754 = .0849. Thus, ds = $p_{a>b} - p_{b>a}$ = .4828 - .0849 = .398, another indication, now on a scale from -1 to +1, of the degree of superiority of Therapy a over Therapy b. Observe that one can check our calculation of 25 + 39 = 64 superior outcomes for Therapy b by noting that there were a total of 754 comparisons, resulting in 364 superior outcomes ($S_a$) for Therapy a and T = 326 ties; so there must be 754 - 364 - 326 = 64 comparisons in which Therapy b had the superior outcome. Note that it is a coincidence that the absolute values of the ds and the previously reported corrected $r_{pb}$ (i.e., $r_c$) for the data in Table 9.1 are the same, |.398|. The ds and $r_{pb}$ actually describe somewhat different characteristics of the data.
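The arithmetic of this worked example can be retraced with a brief sketch. The counts for Therapy a (3, 22, 4) and Therapy b (12, 13, 1), ordered Worse, No Change, Improved, are an assumption: only some of them are quoted in the text, and the rest are inferred from the reported totals of 364 superior outcomes, 326 ties, and 754 comparisons. The function name is illustrative.

```python
def pairwise_outcomes(row_a, row_b):
    """Count pairings won by a, won by b, and tied in an ordered 2 x c table."""
    s_a = s_b = ties = 0
    for i, count_a in enumerate(row_a):
        for j, count_b in enumerate(row_b):
            pairs = count_a * count_b
            if i > j:
                s_a += pairs
            elif i < j:
                s_b += pairs
            else:
                ties += pairs
    return s_a, s_b, ties


# Assumed (inferred) Table 9.1 counts: Worse, No Change, Improved.
therapy_a = [3, 22, 4]
therapy_b = [12, 13, 1]

s_a, s_b, ties = pairwise_outcomes(therapy_a, therapy_b)
n_pairs = sum(therapy_a) * sum(therapy_b)

p_a_over_b = s_a / n_pairs       # ties are not allocated
p_b_over_a = s_b / n_pairs
ds = p_a_over_b - p_b_over_a

print(s_a, s_b, ties, n_pairs)
print(round(p_a_over_b, 4), round(p_b_over_a, 4), round(ds, 3))
```

The check described in the text corresponds to confirming that the three counts sum to the number of comparisons, that is, 364 + 64 + 326 = 754.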


GENERALIZED ODDS RATIO

Recall from the A Related Effect Size section in chapter 5 the discussion of an estimator of an effect size that results from the ratio of the two p values, the generalized odds ratio. We now apply the generalized odds ratio to the data in Table 9.1 by using the same definitions of $p_{a>b} = U_a/n_a n_b$ and $p_{b>a} = U_b/n_a n_b$ that were used in the previous two sections; that is, we ignore ties in calculating the two U values but we use all $n_a n_b$ = 26 × 29 = 754 possible comparisons for the two denominators. Therefore, the generalized odds ratio estimate, $OR_g$, is given by

$$OR_g = \frac{p_{a>b}}{p_{b>a}} \qquad (9.6)$$


From the values that were calculated in the previous section we now find that $p_{a>b}/p_{b>a}$ = .4828/.0849 = 5.69. For these data the $OR_g$ provides the informative estimate that in the population there are 5.69 times more pairings in which patients in Therapy a have a better outcome than patients in Therapy b than pairings in which patients in Therapy b have a better outcome than patients in Therapy a. The estimated parameter, $OR_{g\,pop} = \Pr(Y_a > Y_b)/\Pr(Y_b > Y_a)$, measures how many times more pairings there are in which a member of Population a has an outcome that is better than the outcome for a member of Population b than vice versa. For more discussion of generalized odds ratios consult Agresti (1984).
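Because both p values share the denominator $n_a n_b$, the estimate reduces to the ratio of the two counts of superior outcomes. A minimal sketch, using the counts of 364 and 64 reported in the worked example (the function name and the handling of a zero denominator are illustrative choices, not part of the source):

```python
def generalized_odds_ratio(s_a, s_b, n_a, n_b):
    """Return OR_g = p(a>b) / p(b>a), with ties ignored in both numerators."""
    n_pairs = n_a * n_b
    p_a_over_b = s_a / n_pairs
    p_b_over_a = s_b / n_pairs
    if p_b_over_a == 0:
        # The ratio is undefined when group b never has the better outcome.
        raise ValueError("OR_g is undefined when p(b>a) is zero")
    return p_a_over_b / p_b_over_a


# Counts reported in the worked example of the ds.
or_g = generalized_odds_ratio(s_a=364, s_b=64, n_a=29, n_b=26)
print(round(or_g, 2))  # 5.69
```

Working from the unrounded counts gives 364/64 = 5.6875, which agrees with the 5.69 obtained in the text from the rounded p values.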


CUMULATIVE ODDS RATIO

Suppose that in a 2 × c table with ordinal categories, such as Table 9.2, one is interested in comparing the two groups with respect to their attaining at least some ordinal category. For example, with regard to the ordinal categories of the rating scale (Strongly Agree, Agree, Disagree, and Strongly Disagree), suppose that one wants to compare the college women and college men with regard to their attaining at least the Agree category. Attaining at least the Agree category means attaining the Strongly Agree category or the Agree category instead of the Strongly Disagree category or the Disagree category. Therefore, one's focus would be on the now combined Strongly Agree and Agree categories versus the now combined Strongly Disagree and Disagree categories. Thus, Table 9.2 is temporarily collapsed (reduced) to a 2 × 2 table for this purpose, rendering the odds ratio (OR) effect size for 2 × 2 tables of chapter 8 applicable to the analysis of the collapsed data.

A population OR that is based on combined categories is called a population cumulative odds ratio (population $OR_{cum}$). This effect size is a measure of how many times greater the odds are that a member of a certain group will fall into a certain set of categories (e.g., Agree and Strongly Agree) than the odds that a member of another group will fall into that set of categories. In our example we are calculating the ratio of (a) the odds that a woman Agrees or Strongly Agrees with the statement (instead of Disagreeing or Strongly Disagreeing with it) and (b) the odds that a man Agrees or Strongly Agrees with that statement (instead of Disagreeing or Strongly Disagreeing with it). The choice of which of the two or more categories to combine in a 2 × c ordinal categorical table should be made before the data are collected. Table 9.2 presents an example of an original complete table (before collapsing it into Table 9.3) using actual data, but the labels of the response categories have been changed somewhat. The nonstatistical details of the research do not concern us here.

TABLE 9.2
Gender Comparison With Regard to an Attitude Scale

          Strongly Agree    Agree    Disagree    Strongly Disagree
Women           62            18         2               0
Men             30            12         7               1

Collapsing Table 9.2 by combining columns 1 and 2 and by combining columns 3 and 4 produces Table 9.3. One finds $OR_{cum}$ by applying Equation 8.7 from chapter 8 to Table 9.3, $OR = f_{11} f_{22} / f_{12} f_{21}$. Observe in Table 9.3 that $f_{11}$ = 62 + 18 = 80, $f_{12}$ = 2 + 0 = 2, $f_{22}$ = 7 + 1 = 8, and $f_{21}$ = 30 + 12 = 42. As in chapter 8 we adjust each f value by adding .5 to it to improve the sample $OR_{cum}$ as an estimator of $OR_{pop}$. We then use in Equation 8.7 $f_{11}$ = 80.5, $f_{22}$ = 8.5, $f_{12}$ = 2.5, and $f_{21}$ = 42.5. Therefore, the adjusted $OR_{cum}$ is $OR_{adj}$ = 80.5(8.5)/2.5(42.5) = 6.44. We have just found from the sample $OR_{adj}$ that the odds that a woman will Agree or Strongly Agree with the statement are estimated to be more than six times greater than the odds that a man will do so. However, to avoid exaggerating the gender difference that was found by $OR_{adj}$ in these data, it is also important to note in Table 9.3 that a great majority of the men Agree or Strongly Agree with the statement (42/50 = 84%) but an even greater majority of the women Agree or Strongly Agree with it (80/82 = 97.6%).

Any of the measures of effect size that were discussed previously in this chapter are applicable to the data in Table 9.2, subject to the previously discussed limitations. With regard to Table 9.3, if the two population odds are equal, the population $OR_{pop}$ = 1. Recall from chapter 8 that a test of $H_0$: $OR_{pop}$ = 1 versus $H_{alt}$: $OR_{pop}$ ≠ 1 can be conducted using the usual $\chi^2$ test of association. If $\chi^2$ is significant at a certain p level, then $OR_{adj}$ is statistically significantly different from 1 at the same p level. The data of Table 9.3 yield $\chi^2$ = 11.85, p
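The collapsing of Table 9.2 and the .5 adjustment can be retraced with a short sketch. The counts are those of Table 9.2 as given in the text; the function name, its parameters, and the choice to expose the correction as an argument are illustrative assumptions.

```python
def cumulative_odds_ratio(row_1, row_2, n_upper, correction=0.5):
    """Collapse an ordered 2 x c table to 2 x 2 and return the odds ratio.

    The first n_upper columns are combined into one category and the
    remaining columns into the other; `correction` is added to each of
    the four collapsed frequencies before the ratio is formed.
    """
    f11 = sum(row_1[:n_upper]) + correction
    f12 = sum(row_1[n_upper:]) + correction
    f21 = sum(row_2[:n_upper]) + correction
    f22 = sum(row_2[n_upper:]) + correction
    return (f11 * f22) / (f12 * f21)


# Table 9.2: Strongly Agree, Agree, Disagree, Strongly Disagree.
women = [62, 18, 2, 0]
men = [30, 12, 7, 1]

or_adj = cumulative_odds_ratio(women, men, n_upper=2)
print(round(or_adj, 2))  # 6.44

# Proportions attaining at least the Agree category, for context.
print(round(sum(women[:2]) / sum(women), 3))  # 0.976
print(round(sum(men[:2]) / sum(men), 3))      # 0.84
```

Printing the two proportions alongside the odds ratio follows the caution in the text: an odds ratio above 6 coexists here with large majorities of both groups agreeing.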


Venables, W. (1975). Calculation of confidence intervals for noncentrality parameters. Journal of the Royal Statistical Society (Series B), 37, 406-412.

Walker, E. L. (1947). Factors in vernier acuity and distance discrimination. Unpublished doctoral dissertation, Stanford University, Stanford, CA.

Wampold, B. E., & Serlin, R. C. (2000). The consequences of ignoring a nested factor on measures of effect size in analysis of variance. Psychological Methods, 5, 425-433.

Watson, J. S. (1985). Volunteer and risk-taking groups are more homogeneous on measures of sensation seeking than control groups. Perceptual and Motor Skills, 61, 471-475.

Weisz, J. R., Weiss, B., Han, S. S., Granger, D. A., & Morton, T. (1995). Effects of psychotherapy with children and adolescents revisited: A meta-analysis of treatment outcome studies. Psychological Bulletin, 117, 450-468.

Welch, B. L. (1938). The significance of the difference between two means when the population variances are unequal. Biometrika, 29, 350-362.

Werner, M., Stabenau, J. B., & Pollin, W. (1970). TAT method for the differentiation of families of schizophrenics, delinquents, and normals. Journal of Abnormal Psychology, 75, 139-145.

Weston, T., & Hopkins, K. D. (1998). Testing for homogeneity of variance: An evaluation of current practice. Unpublished manuscript, University of Colorado at Boulder.

Wickens, T. D. (1989). Multiway contingency tables analysis for the social sciences. Mahwah, NJ: Lawrence Erlbaum Associates.

Wiener, R. L., & Gutek, B. (1997). Sexual harassment [Special issue]. Psychology, Public Policy, and Law, 5(3).

Wilcox, R. R. (1987). New designs in analysis of variance. Annual Review of Psychology, 38, 29-60.

Wilcox, R. R. (1995). Comparing two independent groups via multiple quantiles. The Statistician, 44, 91-99.

Wilcox, R. R. (1996). Statistics for the social sciences. San Diego, CA: Academic Press.

Wilcox, R. R. (1997). Introduction to robust estimation and hypothesis testing. San Diego, CA: Academic Press.

Wilcox, R. R. (2001). Fundamentals of modern statistical methods: Substantially improving power and accuracy. New York: Springer-Verlag.

Wilcox, R. R. (2002). Multiple comparisons among dependent groups based on a modified one-step M-estimator. Biometrical Journal, 44, 466-477.

Wilcox, R. R. (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press.

Wilcox, R. R., Charlin, V. L., & Thompson, K. L. (1986). New Monte Carlo results on the robustness of the ANOVA F, W, and F* statistics. Communications in Statistics-Simulation and Computation, 15, 933-943.

Wilcox, R. R., & Keselman, H. J. (2002a). Power analysis when comparing trimmed means. Journal of Modern Applied Statistical Methods, 1, 24-31.

Wilcox, R. R., & Keselman, H. J. (2002b). Within groups multiple comparisons based on robust measures of location. Journal of Modern Applied Statistical Methods, 1, 281-287.

Wilcox, R. R., & Keselman, H. J. (2003a). Modern robust data analysis: Measures of central tendency. Psychological Methods, 8, 254-274.

Wilcox, R. R., & Keselman, H. J. (2003b). Repeated measures one-way ANOVA based on a modified one-step M-estimator. British Journal of Mathematical and Statistical Psychology, 56, 1-13.

Wilcox, R. R., & Muska, J. (1999). Measuring effect size: A non-parametric analog of ω². British Journal of Mathematical and Statistical Psychology, 52, 93-110.

Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics, 1, 80-83.

Wilde, M. C., Boake, C., & Sherer, M. (1995). Do recognition-free recall discrepancies detect retrieval deficits in closed-head injury? An exploratory analysis with the California Verbal Learning Test. Journal of Clinical and Experimental Neuropsychology, 17, 849-855.

Wilk, M. B., & Gnanadesikan, R. (1968). Probability plotting methods for the analysis of data. Biometrika, 55, 1-17.

Wilkinson, L., & APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.

Wilson, D. B., & Lipsey, M. W. (2001). The role of method in treatment effectiveness: Evidence from meta-analysis. Psychological Methods, 6, 413-429.

Wright, S. T. (1946). Spacing of practice in verbal learning and the maturation hypothesis. Unpublished master's thesis, Stanford University, Stanford, CA.

Yuen, K. K. (1974). The two sample trimmed t for unequal population variances. Biometrika, 61, 165-170.

Zimmerman, D. M. (1996). Some properties of preliminary tests of equality of variances in the two-sample location problem. The Journal of General Psychology, 123, 217-231.

Zimmerman, D. M., & Zumbo, B. D. (1993). Rank transformations and the power of the Student t test and Welch t' test for non-normal populations with unequal variances. Canadian Journal of Experimental Psychology, 47, 523-539.

Author Index

A

Abelson, R. P., 84, 161, 219
Abu Libdeh, O., 122, 219
Adnan, R., 19, 233
Agresti, A., 179, 183, 184, 190, 192, 194, 196, 204, 205, 213, 214, 216, 217, 219
Ahadi, S., 95, 126, 219
Aiken, L. S., 74, 75, 82, 95, 176, 221
Algina, J., 68, 123, 124, 127, 133, 134, 135, 136, 139, 140, 143, 150, 155, 156, 159, 160, 161, 163, 164, 165, 166, 167, 219, 230
Altman, D. G., 31, 219
Anderson, S., 183, 225
Andrews, G., 58, 227
Antonuccio, D. O., 201, 219
Arvey, R. D., 122, 124, 143, 144, 229
Aspin, A. A., 33, 219
Auguinis, H., 82, 219

B

Barnett, V., 12, 219
Barnette, J., 54, 125, 126, 219
Baugh, F., 79, 220
Beal, S. L., 183, 220
Beatty, M. J., 91, 92, 220
Becker, B. J., 9, 221
Bedrick, E. J., 184, 220
Begg, C. B., 59, 220
Belsley, D. A., 75, 220
Bergin, A. E., 11, 228
Bergmann, R., 102, 105, 220
Bernhardson, C. S., 130, 220
Best, D. J., 99, 206, 231
Bevan, M. F., 204, 220
Bickel, P. J., 41, 220
Bird, K. D., 31, 133, 160, 161, 164, 220
Blair, R. C., 11, 232
Boake, C., 13, 236
Bond, C. F., 23, 130, 161, 220
Bonett, D. G., 34, 40, 131, 220
Borenstein, M., 24, 220
Bradeley, M. T., 59, 220
Brant, R., 12, 18, 220
Breaugh, J. A., 86, 93, 94, 185, 220
Brown, R. A., 13, 220
Brunner, E., 99, 101, 115, 134, 136, 162, 164, 206, 210, 211, 217, 220, 221
Bryant, T. N., 31, 219
Bryk, A. S., 11, 14, 221, 231
Bunner, J., 38, 221
Burgess, E. S., 13, 220

C

Callendar, J. C., 82, 221
Camp, C. C., 122, 124, 143, 144, 229
Campbell, J. M., 7, 221
Carling, K., 19, 221
Carroll, J. B., 74, 176, 221
Carroll, R. M., 122, 127, 142, 221


Castellan, N. J., 194, 233
Chacon-Moscoso, S., 173, 198, 232
Chalmers, T. C., 105, 198, 229
Chan, I. S. F., 179, 221
Chan, W., 82, 84, 221
Chan, W.-L., 82, 84, 221
Charlin, V. L., 14, 235
Chen, P. Y., 82, 221
Chernick, M. R., 43, 221
Chiang, J., 183, 226
Chuang-Stein, C., 179, 221
Cleveland, W. S., 113, 114, 221
Cliff, N., 99, 107, 108, 151, 205, 211, 212, 213, 217, 221
Cochran, W. G., 11, 30, 35, 38, 204, 233
Cohen, J., 7, 24, 28, 55, 74, 75, 82, 85, 86, 93, 95, 108, 109, 111, 119, 120, 121, 143, 160, 176, 183, 203, 211, 216, 221
Cohen, P., 74, 75, 82, 95, 176, 221
Cohn, L. D., 9, 221
Colditz, G. A., 105, 222
Cook, R. D., 75, 222
Cooper, H. M., 9, 86, 222
Corballis, M. C., 123, 124, 127, 160, 166, 234
Cornfield, J., 192, 222
Cortina, J. M., 134, 136, 139, 140, 149, 166, 222
Cribbie, R. A., 10, 12, 40, 130, 131, 132, 133, 222, 227, 228
Crits-Christoph, P., 141, 222
Crow, E. L., 90, 222
Cumming, G., 23, 24, 31, 62, 65, 68, 222, 223

D

D'Agostino, R. B., 28, 61, 204, 222, 229
Daly, F., 29, 43, 44, 225
Danton, W. G., 201, 219
Darlington, M. L., 114, 222
Davidson, J. R. T., 201, 222
Davies, L., 12, 222
Davis, D. E., 40, 225
Davison, A. C., 43, 222
Dayton, C. M., 130, 132, 135, 181, 222
De Carlo, L. T., 14, 222
Delaney, H. D., 100, 101, 104, 105, 106, 108, 115, 119, 120, 121, 130, 131, 134, 135, 136, 139, 140, 143, 160, 161, 162, 164, 165, 166, 167, 204, 206, 207, 210, 213, 217, 222, 229, 234
Denton, J. Q., 204, 220
DeShon, R. P., 68, 229
Diaconis, P., 43, 223
Diener, E., 95, 126, 219
Dinnel, D. L., 110, 227, 229
Dixon, W. J., 28, 223
Dodd, D. H., 135, 160, 165, 166, 223
Doksum, K. A., 112, 113, 223
Donahue, B., 12, 227
Dosser, D. A., 124, 127, 230
Dunlap, W. P., 106, 223
Dwyer, J. H., 160, 223

E

Edwards, A., 6, 185, 224
Efron, B., 43, 223
Emerson, J. D., 11, 206, 208, 210, 217, 223, 229
Emrich, C., 86, 229
Evans, D. M., 13, 220
Ezekiel, M., 121, 223

F

Fahoome, G., 102, 162, 208, 209, 223, 232
Fan, X., 5, 18, 223, 226
Fay, B. R., 99, 102, 206, 210, 211, 223
Feingold, A., 13, 112, 223
Feinstein, A. R., 32, 223
Fern, E. F., 127, 223
Feske, U., 13, 223
Fidler, F., 23, 24, 29, 31, 62, 122, 123, 160, 223
Finch, S., 23, 24, 31, 62, 65, 68, 222, 223
Findley, M., 86, 222
Fisher, R. A., 4, 224
Fleiss, J. L., 11, 61, 172, 173, 174, 175, 176, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 190, 191, 192, 195, 197, 216, 217, 224
Fligner, M. A., 101, 224
Folks, L., 28, 226
Fouladi, R. T., 65, 133, 160, 234
Fowler, R. L., 143, 224
Fox, J., 75, 224
Fradette, K., 11, 36, 39, 42, 227, 230
Frick, R. W., 28, 32, 224
Friedman, L., 112, 225
Frigge, M., 19, 224

G

Gallop, R., 141, 222
Gardner, M. J., 31, 219
Gart, J. J., 184, 192, 224
Gasko, M., 36, 231
Gather, U., 12, 222
Gibbons, J. D., 99, 106, 206, 217, 224, 231
Gigerenzer, G., 6, 185, 224
Gillett, R., 140, 143, 166, 167, 224
Glass, G. V., 8, 9, 50, 51, 74, 86, 140, 224, 233
Gleser, L. J., 43, 224
Gliner, J. A., 28, 195, 200, 224, 234
Gnanadesikan, R., 113, 236
Goldberg, K. M., 59, 224
Goldstein, A. J., 13, 223
Gollob, H. E., 28, 231
Goodman, L. A., 196, 224
Gorecki, J. A., 63, 225
Grady-Fletcher, A., 200, 201, 233
Granger, D. A., 13, 235
Greenberg, R. H., 149, 230
Grice, G. R., 167, 225
Grissom, R. J., 10, 13, 14, 31, 51, 52, 85, 86, 98, 105, 109, 110, 117, 133, 202, 225, 229
Gross, J. S., 13, 225
Grove, W. M., 57, 226
Gutek, B., 94, 235

H

Haase, R. F., 85, 225
Hackett, D., 201, 222
Haddock, C. K., 176, 190, 197, 225
Hampel, F. R., 41, 225
Han, S. S., 13, 235
Hand, D. J., 29, 43, 44, 115, 225
Harlow, L. L., 6, 24, 225
Harmon, R. J., 195, 200, 224
Harrell, F. E., 40, 225
Hauck, W. W., 183, 225
Haynes, R. B., 188, 232
Hays, W. L., 121, 122, 174, 194, 225
Hedges, L. V., 5, 9, 53, 54, 55, 58, 60, 62, 72, 112, 136, 203, 222, 225, 226
Hekmat, H., 13, 226
Henson, R. K., 5, 231
Herranz Tejedor, I., 179, 183, 229
Hess, B., 109, 226
Hildebrand, D. K., 217, 226
Hinkley, D. V., 43, 222
Hirada, N., 110, 227
Hiruma, N., 110, 227
Hoaglin, D. C., 18, 19, 41, 224, 226
Hochberg, Y., 213, 233
Hogarty, K. Y., 7, 55, 56, 226
Hopkins, K. D., 14, 74, 224, 235
Hosseini, H., 206, 208, 217, 229
Hou, C.-D., 183, 226
Howell, D. C., 13, 226
Howland, E. W., 123, 143, 165, 234
Hsu, J. C., 131, 226
Hsu, L. M., 88, 90, 226
Huberty, C. J., 6, 10, 12, 109, 226, 227
Hullett, C. R., 142, 143, 228
Hunter, J. E., 5, 9, 10, 24, 43, 53, 54, 70, 72, 76, 79, 80, 81, 82, 90, 94, 95, 104, 127, 136, 165, 226, 232
Huynh, C. L., 55, 56, 226
Hyndman, R. J., 18, 226

I

Iglewicz, B., 19, 59, 224
Ilies, R., 81, 232
Impara, J. C., 78, 231
Izenman, A. J., 114, 226

J

Jacoby, W. G., 12, 226
Jones, L. V., 6, 226
Joormann, J., 141, 233

K

Kempthorne, O., 28, 226
Kendall, P. C., 57, 226
Keppel, G., 14, 124, 127, 136, 143, 227
Keren, G., 143, 227
Keselman, H. J., 10, 11, 12, 36, 39, 40, 41, 42, 43, 68, 117, 121, 130, 131, 132, 133, 134, 135, 219, 222, 227, 228, 230, 235
Keselman, J. C., 12, 227
Kim, C., 179, 234
Kim, J. J., 98, 105, 133, 225
Kirk, R. E., 24, 227
Kleinknecht, E. E., 110, 227
Kleinknecht, R. A., 110, 227
Kline, R. B., 166, 227
Knapp, T. R., 5, 28, 32, 61, 227
Komrey, J. D., 7, 55, 56, 226
Kowalchuk, R. K., 12, 227
Kraemer, H. C., 7, 54, 56, 58, 227
Krueger, J., 6, 227
Kruskal, W. H., 105, 227
Kuh, E., 75, 220

L

Laing, J. D., 217, 226
Laird, N. M., 58, 106, 198, 227
Lambert, M. J., 11, 228
Lane, D. M., 86, 229
Laupacis, A., 188, 228
Lawson, S., 121, 126, 127, 233
Lax, D. A., 59, 228
Le, H., 81, 232
Leeman, J., 23, 24, 223
Lehmann, E. L., 41, 106, 220, 228
Levin, B., 61, 172, 173, 174, 175, 176, 178, 179, 180, 181, 182, 183, 186, 187, 188, 190, 191, 192, 195, 197, 216, 217, 224
Levin, J. R., 5, 7, 12, 28, 81, 125, 127, 141, 167, 201, 227, 228, 230, 231, 233
Levine, T. R., 142, 143, 228
Levy, P., 71, 91, 228
Lewis, C., 143, 227
Lewis, T., 12, 219
Liebetrau, A. M., 174, 194, 217, 228
Lipsey, M. W., 7, 9, 85, 86, 108, 109, 127, 167, 228, 236
Liu, Q., 217, 228
Lix, L. M., 12, 42, 227, 228
Lowman, R. L., 12, 227
Ludbrook, J., 102, 105, 220
Lunn, A. D., 29, 43, 44, 225
Lunneborg, C. E., 32, 41, 43, 228
Lunney, G. H., 204, 228

M

Machin, D., 31, 219
Mancini, G. B. J., 187, 228, 232
Mann, H. B., 100, 103, 105, 228
Mansmann, U., 179, 231
Marin-Martinez, F., 173, 198, 232
Markman, B. S., 33, 232
Markus, K. A., 6, 228
Marss-Garcia, A., 57, 226
Martell, R. F., 86, 229
Martin Andres, A., 179, 183, 229
Massey, F. J., 28, 223
Matsumoto, D., 110, 229
Maxwell, S. E., 119, 120, 121, 122, 124, 130, 131, 132, 134, 135, 136, 139, 140, 143, 144, 160, 161, 162, 164, 165, 166, 167, 204, 229
McClanahan, T. M., 201, 219
McConway, K. J., 29, 43, 44, 225
McGaw, B., 9, 50, 86, 140, 224
McGraw, K. O., 90, 98, 105, 106, 229
McKean, J. W., 40, 229
McLean, J. E., 54, 125, 126, 219
McNemar, Q., 75, 76, 117, 136, 229
Mee, R. W., 101, 229
Meeks, S. L., 28, 61, 229
Meyers, J. L., 204, 220
Micceri, T., 10, 229
Miller, D. T., 87, 231
Miller, I. W., 13, 220
Miller, J. N., 105, 222
Miller, L. C., 13, 230
Mohr, D. C., 11, 229
Monroe, K. B., 127, 223
Morgan, G. A., 28, 195, 200, 224, 234
Morris, S. B., 68, 229
Morton, T., 13, 235
Moses, L. E., 34, 204, 206, 208, 210, 217, 223, 229
Mosteller, F., 18, 41, 58, 105, 106, 198, 222, 226, 227, 229
Mueller, C. G., 11, 230
Mueller, T. I., 13, 220
Mulaik, S. A., 6, 24, 225
Munzel, U., 99, 101, 206, 210, 211, 220
Murphy, B. P., 100, 230
Murphy, K. R., 7, 230
Murray, L. W., 124, 127, 230
Muska, J., 57, 236
Myors, B., 7, 230

N

Nam, J., 184, 224
Nanna, M. J., 204, 230
Nath, S. R., 57, 226
Newcombe, R. G., 183, 230
Nickerson, R. S., 6, 230
Nordholm, L. A., 122, 127, 142, 221
Norusis, M. J., 11, 230
Nouri, H., 134, 136, 139, 140, 149, 166, 222, 230
Nowell, A., 112, 225

O

O'Brien, P. C., 112, 230
O'Grady, K. E., 95, 126, 127, 230
Olejnik, S., 12, 109, 123, 124, 127, 133, 135, 136, 139, 140, 143, 150, 155, 156, 159, 160, 161, 163, 164, 165, 166, 167, 226, 227, 230
Olkin, I., 5, 9, 53, 54, 55, 58, 60, 62, 72, 136, 203, 225, 226
Onwuegbuzie, A. J., 5, 28, 81, 125, 127, 167, 201, 230
Osburn, H. G., 82, 221
Ostrowski, E., 29, 43, 44, 225
Othman, A. R., 11, 36, 39, 42, 227, 230