The Practice of Social Research


The Practice of Social Research, Eleventh Edition
Earl Babbie

Acquisitions Editor: Chris Caldeira
Development Editor: Sherry Symington
Assistant Editor: Kristin Marrs
Technology Project Manager: Dee Dee Zobian
Marketing Manager: Michelle Williams
Marketing Communications Manager: Linda Yip
Project Manager, Editorial Production: Matt Ballantyne
Creative Director: Rob Hugel
Print Buyer: Becky Cross
Permissions Editor: Bob Kauser

Production Service: Greg Hubit Bookworks
Text Designer: Carolyn Deacy
Copy Editor: Molly D. Roth
Illustrator: Lotus Art
Cover Designer: Bill Stanton
Cover Image: PhotoAlto/SuperStock
Cover Printer: Phoenix Color Corp.
Compositor: G & S Typesetters, Inc.
Printer: RR Donnelley-Crawfordsville

Dedication Sheila Babbie

© 2007 Thomson Wadsworth, a part of The Thomson Corporation. Thomson, the Star logo, and Wadsworth are trademarks used herein under license.

ALL RIGHTS RESERVED. No part of this work covered by the copyright hereon may be reproduced or used in any form or by any means-graphic, electronic, or mechanical, including photocopying, recording, taping, web distribution, information storage and retrieval systems, or in any other manner-without the written permission of the publisher.

Printed in the United States of America
3 4 5 6 7   10 09 08 07

Library of Congress Control Number: 2005932100

Student Edition
ISBN-13: 978-0-495-09325-1
ISBN-10: 0-495-09325-4

International Student Edition
ISBN-13: 978-0-495-18738-7
ISBN-10: 0-495-18738-0

ExamView® and ExamView Pro® are registered trademarks of FSCreations, Inc. Windows is a registered trademark of the Microsoft Corporation used herein under license. Macintosh and Power Macintosh are registered trademarks of Apple Computer, Inc., used herein under license. © 2007 Thomson Learning, Inc. All Rights Reserved. Thomson Learning WebTutor™ is a trademark of Thomson Learning, Inc.

Thomson Higher Education 10 Davis Drive Belmont, CA 94002-3098 USA

For more information about our products, contact us at: Thomson Learning Academic Resource Center 1-800-423-0563

For permission to use material from this text or product, submit a request online. Any additional questions about permissions can be submitted by e-mail to [email protected]

Part 1 An Introduction to Inquiry 1
1 Human Inquiry and Science 2
2 Paradigms, Theory, and Social Research 30
3 The Ethics and Politics of Social Research 60

Part 2 The Structuring of Inquiry 85
4 Research Design 86
5 Conceptualization, Operationalization, and Measurement 120
6 Indexes, Scales, and Typologies 152
7 The Logic of Sampling 179

Part 3 Modes of Observation 219
8 Experiments 220
9 Survey Research 243
10 Qualitative Field Research 285
11 Unobtrusive Research 318
12 Evaluation Research 348

Part 4 Analysis of Data 375
13 Qualitative Data Analysis 377
14 Quantitative Data Analysis 404
15 The Elaboration Model 430
16 Statistical Analyses 449
17 Reading and Writing Social Research 488

Appendixes A1
A Using the Library A2
B GSS Household Enumeration Questionnaire A8
C Random Numbers A18
D Distribution of Chi Square A20
E Normal Curve Areas A22
F Estimated Sampling Error A23
G Twenty Questions a Journalist Should Ask about Poll Results A24

Preface xv Acknowledgments xxii

Part 1 An Introduction to Inquiry 1 Human Inquiry and Science 2

Paradigms, Theory, and Social Research 30

Introduction 3 Looking for Reality 4

Introduction 31 Some Social Science Paradigms 31

Ordinary Human Inquiry 4 Tradition 5 Authority 5 Errors in Inquiry, and Some Solutions 6 What's Really Real? 7

The Foundations of Social Science 10 Theory, Not Philosophy or Belief 10 Social Regularities 11 Aggregates, Not Individuals 13 A Variable Language 14

Some Dialectics of Social Research 19 Idiographic and Nomothetic Explanation 19 Inductive and Deductive Theory 22 Qualitative and Quantitative Data 23 Pure and Applied Research 25

The Ethics of Social Research 26 Voluntary Participation 26 No Harm to Subjects 27


Main Points 27 • Key Terms 28 • Review Questions and Exercises 28 • Additional Readings 29 • SPSS Exercises 29 • Online Study Resources 29

Macrotheory and Microtheory 33 Early Positivism 33 Social Darwinism 34 Conflict Paradigm 34 Symbolic Interactionism 35 Ethnomethodology 36 Structural Functionalism 37 Feminist Paradigms 37 Critical Race Theory 39 Rational Objectivity Reconsidered 40

Elements of Social Theory 43 Two Logical Systems Revisited 44 The Traditional Model of Science 44 Deductive and Inductive Reasoning: A Case Illustration 46 A Graphic Contrast 49

Deductive Theory Construction 51 Getting Started 51 Constructing Your Theory 52





An Example of Deductive Theory: Distributive Justice 52

Inductive Theory Construction 54

Part 2 The Structuring of Inquiry 85

An Example of Inductive Theory: Why Do People Smoke Marijuana? 54

The Links between Theory and Research 55

Research Design 86

Main Points 56 • Key Terms 57 • Review Questions and Exercises 58 • Additional Readings 58 • SPSS Exercises 59 • Online Study Resources 59

Introduction 87 Three Purposes of Research 87 Exploration 88 Description 89 Explanation 89

The Logic of Nomothetic Explanation 90

The Ethics and Politics of Social Research 60 Introduction 61 Ethical Issues in Social Research 62 Voluntary Participation 62 No Harm to the Participants 63 Anonymity and Confidentiality 64 Deception 67 Analysis and Reporting 68 Institutional Review Boards 69 Professional Codes of Ethics 71

Two Ethical Controversies 71 Trouble in the Tearoom 71 Observing Human Obedience 73

The Politics of Social Research 74 Objectivity and Ideology 75 Politics with a Little "p" 78 Politics in Perspective 79 Main Points 80 • Key Terms 81 • Review Questions and Exercises 81 • Additional Readings 82 • SPSS Exercises 82 • Online Study Resources 82

Criteria for Nomothetic Causality 90 False Criteria for Nomothetic Causality 91

Necessary and Sufficient Causes 93 Units of Analysis 94 Individuals 96 Groups 96 Organizations 97 Social Interactions 97 Social Artifacts 97 Units of Analysis in Review 99 Faulty Reasoning about Units of Analysis: The Ecological Fallacy and Reductionism 99

The Time Dimension 101 Cross-Sectional Studies 102 Longitudinal Studies 102 Approximating Longitudinal Studies 105 Examples of Research Strategies 107

How to Design a Research Project 107 Getting Started 109 Conceptualization 110 Choice of Research Method 110 Operationalization 111 Population and Sampling 111 Observations 111 Data Processing 111 Analysis 112 Application 112 Research Design in Review 112

The Research Proposal 113 Elements of a Research Proposal 114

Who Decides What's Valid? 148 Tension between Reliability and Validity 148

Main Points 115 • Key Terms 116 • Review Questions and Exercises 117 • Additional Readings 117 • SPSS Exercises 118 • Online Study Resources 118

Main Points 149 • Key Terms 150 • Review Questions and Exercises 150 • Additional Readings 150 • SPSS Exercises 151 • Online Study Resources 151

Conceptualization, Operationalization, and Measurement 120

Indexes, Scales, and Typologies 152

Introduction 121

Indexes versus Scales 153 Index Construction 156

Measuring Anything That Exists 121 Conceptions, Concepts, and Reality 122 Concepts as Constructs 123

Conceptualization 124 Indicators and Dimensions 125 The Interchangeability of Indicators 127 Real, Nominal, and Operational Definitions 128 Creating Conceptual Order 128 An Example of Conceptualization: The Concept of Anomie 130

Definitions in Descriptive and Explanatory Studies 132 Operationalization Choices 133 Range of Variation 133 Variations between the Extremes 135 A Note on Dimensions 135 Defining Variables and Attributes 136 Levels of Measurement 136 Single or Multiple Indicators 140 Some Illustrations of Operationalization Choices 141 Operationalization Goes On and On 142

Criteria of Measurement Quality 143 Precision and Accuracy 143 Reliability 143 Validity 146

Introduction 153

Item Selection 156 Examination of Empirical Relationships 157 Index Scoring 162 Handling Missing Data 163 Index Validation 165 The Status of Women: An Illustration of Index Construction 167

Scale Construction 168 Bogardus Social Distance Scale 168 Thurstone Scales 169 Likert Scaling 170 Semantic Differential 171 Guttman Scaling 172

Typologies 175 Main Points 176 • Key Terms 176 • Review Questions and Exercises 177 • Additional Readings 177 • SPSS Exercises 177 • Online Study Resources 177

The Logic of Sampling 179 Introduction 180 A Brief History of Sampling 181 President Alf Landon 181 President Thomas E. Dewey 182 Two Types of Sampling Methods 183




Nonprobability Sampling 183 Reliance on Available Subjects 183 Purposive or Judgmental Sampling 184 Snowball Sampling 184 Quota Sampling 185 Selecting Informants 186

The Theory and Logic of Probability Sampling 187 Conscious and Unconscious Sampling Bias 188 Representativeness and Probability of Selection 189 Random Selection 190 Probability Theory, Sampling Distributions, and Estimates of Sampling Error 191

Populations and Sampling Frames 199 Review of Populations and Sampling Frames 201

Types of Sampling Designs 202 Simple Random Sampling 202 Systematic Sampling 202 Stratified Sampling 205 Implicit Stratification in Systematic Sampling 207 Illustration: Sampling University Students 208

Multistage Cluster Sampling 208 Multistage Designs and Sampling Error 209 Stratification in Multistage Cluster Sampling 211 Probability Proportionate to Size (PPS) Sampling 211 Disproportionate Sampling and Weighting 213


Part 3 Modes of Observation 219 Experiments 220 Introduction 221 Topics Appropriate to Experiments 221 The Classical Experiment 222 Independent and Dependent Variables 222 Pretesting and Posttesting 222 Experimental and Control Groups 223 The Double-Blind Experiment 224

Selecting Subjects 225 Probability Sampling 225 Randomization 226 Matching 226 Matching or Randomization? 227

Variations on Experimental Design 228 Preexperimental Research Designs 228 Validity Issues in Experimental Research 230

An Illustration of Experimentation 234 Alternative Experimental Settings 237 Web-Based Experiments 237 "Natural" Experiments 237

Strengths and Weaknesses of the Experimental Method 239 Main Points 240 • Key Terms 241 • Review Questions and Exercises 241 • Additional Readings 241 • SPSS Exercises 241 • Online Study Resources 241

Probability Sampling in Review 215 Main Points 215 • Key Terms 216 • Review Questions and Exercises 216 • Additional Readings 217 • SPSS Exercises 217 • Online Study Resources 217

Survey Research 243 Introduction 244 Topics Appropriate for Survey Research 244 Guidelines for Asking Questions 245 Choose Appropriate Question Forms 246 Make Items Clear 247

Avoid Double-Barreled Questions 247 Respondents Must Be Competent to Answer 248 Respondents Must Be Willing to Answer 249 Questions Should Be Relevant 249 Short Items Are Best 249 Avoid Negative Items 250 Avoid Biased Items and Terms 250

Questionnaire Construction 251 General Questionnaire Format 252 Formats for Respondents 252 Contingency Questions 252 Matrix Questions 254 Ordering Items in a Questionnaire 255 Questionnaire Instructions 256 Pretesting the Questionnaire 257 A Composite Illustration 257

Self-Administered Questionnaires 257 Mail Distribution and Return 260 Monitoring Returns 261 Follow-up Mailings 261 Acceptable Response Rates 262 A Case Study 263

Interview Surveys 264 The Role of the Survey Interviewer 264 General Guidelines for Survey Interviewing 265 Coordination and Control 267

Telephone Surveys 269 Computer Assisted Telephone Interviewing (CATI) 270 Response Rates in Interview Surveys 271

New Technologies and Survey Research 272 Comparison of the Different Survey Methods 275

Qualitative Field Research 285 Introduction 286 Topics Appropriate to Field Research 286 Special Considerations in Qualitative Field Research 289 The Various Roles of the Observer 289 Relations to Subjects 291

Some Qualitative Field Research Paradigms 293 Naturalism 293 Ethnomethodology 294 Grounded Theory 296 Case Studies and the Extended Case Method 298 Institutional Ethnography 300 Participatory Action Research 301

Conducting Qualitative Field Research 303 Preparing for the Field 304 Qualitative Interviewing 305 Focus Groups 308 Recording Observations 309

Research Ethics in Qualitative Field Research 312 Strengths and Weaknesses of Qualitative Field Research 312 Validity 313 Reliability 314 Main Points 314 • Key Terms 315 • Review Questions and Exercises 315 • Additional Readings 315 • SPSS Exercises 316 • Online Study Resources 317

Strengths and Weaknesses of Survey Research 276 Secondary Analysis 277 Main Points 280 • Key Terms 281 • Review Questions and Exercises 282 • Additional Readings 282 • SPSS Exercises 283 • Online Study Resources 283


Unobtrusive Research 318 Introduction 319 Content Analysis 320



Topics Appropriate to Content Analysis 320 Sampling in Content Analysis 321 Coding in Content Analysis 325 An Illustration of Content Analysis 328 Strengths and Weaknesses of Content Analysis 330

Analyzing Existing Statistics 330 Durkheim's Study of Suicide 331 The Consequences of Globalization 332 Units of Analysis 333 Problems of Validity 333 Problems of Reliability 334 Sources of Existing Statistics 335


The Social Context 362 Logistical Problems 363 Some Ethical Issues 364 Use of Research Results 365

Social Indicators Research 369 The Death Penalty and Deterrence 370 Computer Simulation 371 Main Points 371 • Key Terms 372 • Review Questions and Exercises 372 • Additional Readings 372 • SPSS Exercises 373 • Online Study Resources 373

Part 4 Analysis of Data 375

Comparative and Historical Research 338 Examples of Comparative and Historical Research 338 Sources of Comparative and Historical Data 341 Analytical Techniques 342 Main Points 345 • Key Terms 345 • Review Questions and Exercises 345 • Additional Readings 346 • SPSS Exercises 346 • Online Study Resources 346

Qualitative Data Analysis 377 Introduction 378 Linking Theory and Analysis 378 Discovering Patterns 378 Grounded Theory Method 380 Semiotics 381 Conversation Analysis 383

Qualitative Data Processing 384

Evaluation Research 348 Introduction 349 Topics Appropriate to Evaluation Research 350 Formulating the Problem: Issues of Measurement 351 Specifying Outcomes 352 Measuring Experimental Contexts 353 Specifying Interventions 354 Specifying the Population 354 New versus Existing Measures 354 Operationalizing Success/Failure 355

Types of Evaluation Research Designs 356 Experimental Designs 356 Quasi-Experimental Designs 357 Qualitative Evaluations 361

Coding 384 Memoing 388 Concept Mapping 389

Computer Programs for Qualitative Data 390 Leviticus as Seen through NUD*IST 391 Using NVivo to Understand Women Film Directors, by Sandrine Zerbib 393

Quantitative Data Analysis 404 Introduction 405 Quantification of Data 405

Main Points 401 • Key Terms 401 • Review Questions and Exercises 401 • Additional Readings 402 • SPSS Exercises 402 • Online Study Resources 402

Elaboration and Ex Post Facto Hypothesizing 444 Main Points 445 • Key Terms 446 • Review Questions and Exercises 446 • Additional Readings 447 • SPSS Exercises 447 • Online Study Resources 447

Developing Code Categories 406 Codebook Construction 408 Data Entry 409

Univariate Analysis 409 Distributions 410 Central Tendency 411 Dispersion 414 Continuous and Discrete Variables 415 Detail versus Manageability 415

Subgroup Comparisons 416 "Collapsing" Response Categories 416 Handling "Don't Knows" 418 Numerical Descriptions in Qualitative Research 419

Bivariate Analysis 419 Percentaging a Table 421 Constructing and Reading Bivariate Tables 423

Introduction to Multivariate Analysis 424 Sociological Diagnostics 425 Main Points 427 • Key Terms 428 • Review Questions and Exercises 428 • Additional Readings 428 • SPSS Exercises 429 • Online Study Resources 429


The Qualitative Analysis of Quantitative Data 398


Statistical Analyses 449 Introduction 450 Descriptive Statistics 450 Data Reduction 450 Measures of Association 451 Regression Analysis 455

Inferential Statistics 459 Univariate Inferences 460 Tests of Statistical Significance 461 The Logic of Statistical Significance 462 Chi Square 466

Other Multivariate Techniques 470 Path Analysis 471 Time-Series Analysis 471 Factor Analysis 474 Analysis of Variance 476 Discriminant Analysis 478 Log-Linear Models 482 Geographic Information Systems (GIS) 483 Main Points 484 • Key Terms 486 • Review Questions and Exercises 486 • Additional Readings 486 • SPSS Exercises 487 • Online Study Resources 487

The Elaboration Model 430 Introduction 431 The Origins of the Elaboration Model 431 The Elaboration Paradigm 436 Replication 437 Explanation 437 Interpretation 439 Specification 439 Refinements to the Paradigm 442

Reading and Writing Social Research 488 Introduction 489 Organizing a Review of the Literature 489


Journals versus Books 490 Evaluation of Research Reports 491

Using the Internet Wisely 496 Some Useful Websites 496 Searching the Web 496 Evaluating the Quality of Internet Materials 498 Citing Internet Materials 502

Writing Social Research 503 Some Basic Considerations 503 Organization of the Report 505 Guidelines for Reporting Analyses 507 Going Public 508 Main Points 509 • Key Terms 510 • Review Questions and Exercises 510 • Additional Readings 511 • SPSS Exercises 511 • Online Study Resources 511

Appendixes A1 A Using the Library A2 B GSS Household Enumeration Questionnaire A8

C Random Numbers A18 D Distribution of Chi Square A20 E Normal Curve Areas A22 F Estimated Sampling Error A23 G Twenty Questions a Journalist Should Ask about Poll Results A24

Bibliography B1 Glossary G1 Index I1

A "few" years ago (I hate to tell you how many), I began teaching my first course in social research methods. The course focused specifically on survey research methods, and I had only six students in the class. As the semester progressed, I became more relaxed as a teacher. Before long, my students and I began meeting in my office, where I could grab and lend books from my own library as their relevance occurred to me during class meetings.

One nagging problem I faced then was the lack of a good textbook on survey research. The available books fell into one of two groups. Some books presented the theoretical logic of research methods in such abstract terms that I didn't think students would be able to apply any of the general principles to the practical world of "doing" research. The other books were just the opposite. Often termed "cookbooks," they presented detailed, step-by-step instructions on how to conduct a survey. Unfortunately, this approach only prepared students to conduct surveys very much like the one described by the authors. Neither the abstract nor the "cookbook" approach seemed truly useful to students or their instructors.

One day I found myself jotting down the table of contents for my ideal research methods textbook. It was organized around three goals:

1. Understanding the theoretical principles on which scientific research is based.

2. Seeing how those principles are reflected in the established techniques for doing research.

3. Being prepared to make appropriate compromises whenever field conditions do not permit the routine application of established techniques.

The next day, unexpectedly, Wadsworth called and asked me to write a methods text! Survey Research Methods was published in 1973. My editors and I immediately received some good news, some bad news, and some additional good news. The first good news was that all survey research instructors seemed to love the book, and it was being used in virtually every survey research course in the country. The bad news was that there weren't all that many survey research courses. The final good news, however, was that many instructors who taught general social research courses-covering survey research alongside other research methods-were inclined to use our book and supplement it with other books dealing with field research, experiments, and so on. While adjusting to our specialized book, however, many instructors suggested that Wadsworth have "that same guy" write a more general social research text.

The preface of the first edition of The Practice of Social Research (1975) acknowledged the assistance of a dozen social research instructors from California to Florida. The book was a collaboration in a very real sense, even though only my name was on the cover and I was ultimately responsible for it. The Practice of Social Research was an immediate success. Although it was initially written for sociology courses, subsequent editions have been increasingly used in fields such as Psychology, Public Administration, Urban Studies, Education, Communications, Social Sciences, and Political Science-in some 30 different disciplines, I'm told. Moreover, it's being used by teachers and researchers in numerous countries around the world, and in 2000 a Beijing publisher released a Chinese edition.

I've laid out this lengthy history of the book for a couple of reasons. First, when I was a student, I suppose I thought of textbooks the same way that I thought about government buildings: They were just there. I never really thought about them as being written by human beings. I certainly never thought about textbooks as evolving: being updated, getting better, having errors corrected. As a student, I would have been horrified by the thought that any of my textbooks might contain mistakes!

Second, pointing out the evolution of the book sets the stage for a preview of the changes that have gone into this 11th edition. As with previous revisions, several factors have prompted changes. For example, because social research technology and practices are continually changing, the book must be updated to remain current and useful. In my own teaching, I frequently find improved ways to present standard materials. Colleagues also often share their ideas for ways to teach specific topics. Some of these appear as boxed inserts in the book. Both students and instructors often suggest that various topics be reorganized, expanded, clarified, shrunk, or-gasp-deleted.

New to the 11th Edition

In an earlier edition of this book, I said, "Revising a textbook such as this is a humbling experience. No matter how good it seems to be, there is no end of ideas about how it could be improved." That observation still holds true. When we asked instructors what could be improved, they once again thought of things, and I've considered all their suggestions, followed many of them, and chosen to "think some more" about others. I've also received numerous comments and suggestions from students who have been assigned the book; many of the changes come from them. Here are some of the other changes in this edition, arranged by chapter:

Chapter 1, "Human Inquiry and Science" Birthrate data have been expanded and updated. In the interests of highlighting the international character of social research, I've added a report on a courageous Egyptian sociologist, Saad Ibrahim. There is also a discussion of Crystal Eastman, an applied researcher early in the twentieth century.

Chapter 2, "Paradigms, Theory, and Social Research" The discussions of feminist paradigms and of postmodernism have been expanded. There is a new section, "Critical Race Theory." The Sherif experiments are discussed in the section "Rational Objectivity Reconsidered."

Chapter 3, "The Ethics and Politics of Social Research" There are two new sections: "The Politics of Sexual Research" and "Politics and the Census." There is an expanded discussion of the Rik Scarce case, and participatory action research is introduced here, in the context of social action.

Chapter 4, "Research Design" I discuss the use of both qualitative and quantitative research, through an examination of terrorism studies. The importance of cohort studies is illustrated with an example in which the findings of simpler, cross-sectional studies are reversed. The discussion of units of analysis has been expanded. There is a new section on social interactions as a unit of analysis. The discussion of reductionism has been revised, and there is an expanded treatment of literature reviews.

Chapter 5, "Conceptualization, Operationalization, and Measurement" The discussion of reification has been expanded, and conceptualization has been illustrated with an analysis of the meaning of "genocide."

Chapter 6, "Indexes, Scales, and Typologies" I've expanded the discussion of the difference between indexes and scales. There's also a discussion of the "reverse Bogardus scale."

Chapter 7, "The Logic of Sampling" The chapter begins by looking at the polls from the 2004 presidential election. I've expanded the discussion of sampling error and replaced "Sampling Santa's Fans" with an example of sampling in Iran. There's also a discussion of weighting by political party in political polls.

Chapter 8, "Experiments" I've added a discussion of labeling theory in connection with an existing sample that illustrates it. There is also a new section on web-based experiments.

Chapter 9, "Survey Research" I've added some new examples and expanded the discussion of factors increasing and decreasing response rates. I've clarified that the rough guidelines regarding acceptable response rates are not based on theory; rather, they merely reflect my observation of professional norms. I've expanded the discussions of the General Social Survey and the "Analyze" program for analyzing GSS data online. I've expanded the discussion of online surveys, and there is a new discussion and citation for the secondary analysis of qualitative data.

Chapter 10, "Qualitative Field Research" There is a new example from John Lofland on the demolition of an old building. I've introduced the concepts of "etic" and "emic" approaches to field research. I've more clearly distinguished case studies from comparative case studies, and I've added discussions of virtual ethnography and autoethnography. Finally, there is a discussion of telephone and online focus groups.

Chapter 11, "Unobtrusive Research" I've added both qualitative and quantitative examples of content analysis and added a section on the consequences of globalization. There are directions for downloading the Statistical Abstract of the United States from the web. I've also added discussions of Stark's The Rise of Christianity and Deflem's book on international policing.


Chapter 12, "Evaluation Research" I've expanded the discussion of different types of evaluation.

Chapter 13, "Qualitative Data Analysis" I've added a discussion of axial coding and selective coding.

Chapter 14, "Quantitative Data Analysis" A new section, "Sociological Diagnostics," illustrates the power of social scientific analyses in addressing real social problems. I've updated Table 14-4 and expanded the explanation of it, as well as adding two new tables with discussions. I reformatted Figure 14-4 per reviewer suggestion.

Chapter 15, "The Elaboration Model" The logic of elaboration lays the groundwork for most multivariate analysis in the social sciences; I've expanded the discussion of that point so students will understand why the chapter is important to their training in social research.

Chapter 16, "Statistical Analyses" The major change in this chapter is the addition of sections on analysis of variance, discriminant analysis, log-linear models, and Geographic Information Systems. I've expanded the discussion of the difference between statistical and substantive significance. Finally, I've dropped the opening discussion of Mathematical Marvin and math avoidance to help make space for the expansion of techniques.

Chapter 17, "Reading and Writing Social Research" There is a new section on organizing a review of the literature. There's also another new section on presenting papers and publishing articles-activities that students are pursuing with increased frequency.

As always, I've updated materials throughout the book. As an instructor, I'm constantly searching for new and more effective ways of explaining social research to my own students; many of those new explanations take the form of diagrams. You'll find several new graphical illustrations in this edition. Once again, I've sought to replace aging research examples (except for the classics) with more recent ones. I've also dropped some sections that I don't think do much for students anymore.

There's one small change I'm especially pleased with. From the very first edition, I've tried to retain my sanity while writing glossary definitions by including some (arguably) funny ones. As I was doing this revision, I received suggested additions from two students, both outside the United States. I've included several of their definitions and will be open to more student submissions in the future.

As with each new edition, I would appreciate any comments you have about how the book can be improved. Its evolution over the past 30 years has reflected countless comments from students and others.

Pedagogical Features

Although students and instructors both have told me that the past editions of this book were effective tools for learning research methods, I have used this revision as an opportunity to review the book from a pedagogical standpoint, fine-tuning some elements, adding others. Here's the package we ended up with in the 11th edition.

Chapter Overview Each chapter is preceded with a pithy focus paragraph that highlights the principal content of the chapter.

Chapter Introduction Each chapter opens with an introduction that lays out the main ideas in that chapter and, importantly, relates them to the content of other chapters in the book.

Clear and provocative examples Students often tell me that the examples-real and hypothetical-have helped them grasp difficult and/or abstract ideas, and this edition has many new examples as well as some that have proven particularly valuable in earlier editions.

Graphics From the first time I took a course in research methods, most of the key concepts have made sense to me in graphical form. Whereas my task here has been to translate those mental pictures into words, I've also included some graphical illustrations in the book. Advances in computer graphics have helped me communicate to the Wadsworth artists what I see in my head and would like to share with students. I'm delighted with the new graphics in this edition.

Boxed examples and discussions Students tell me they like the boxed materials that highlight particular ideas and studies, as well as varying the format of the book. Beginning in the tenth edition, I've been using boxes that focus on the ways the mass media use and misuse social research.

Running glossary Key terms are highlighted in the text, and definitions for each term are listed at the bottom of the page. This will help students learn the definitions of these terms and locate them in each chapter to review them in context.

Main Points At the end of each chapter, a concise list of main points provides both a brief chapter summary and a useful review. The main points let students know exactly what ideas they should focus on in each chapter.

Key Terms A list of key terms follows the main points. These lists reinforce the students' acquisition of necessary vocabulary. The new vocabulary in these lists is defined in context in the chapters. The terms are boldfaced in the text, defined in the running glossary that appears at the bottom of the page throughout the text, and included in the glossary at the back of the book.

Review Questions and Exercises This review aid allows students to test their understanding of the chapter concepts and apply what they've learned.

Additional Readings In this section, I've included an annotated list of references that students can turn to if they would like to learn more on the topics discussed in the chapter.

SPSS Exercises and Online Study Resources This edition continues previous editions' movement into cyberspace. Students can use the annotated list of useful websites in this section, as well as other resources mentioned, to take their learning beyond the text and classroom.

Appendixes As in previous editions, a set of appendixes provides students with some research tools, such as a guide to the library, a table of random numbers, and so forth. There is an SPSS primer on the book's website along with primers for NVivo and Qualrus. Clear and accessible writing This is perhaps the most important "pedagogical aid" of all. I know that all authors strive to write texts that are clear and accessible, and I take some pride in the fact that this "feature" of the book has been one of its most highly praised attributes through its ten previous editions. It is the one thing students write most often about. For the 11th edition, the editors and I have taken special care to reexamine literally every line in the book, pruning, polishing, embellishing, and occasionally restructuring for a maximally "reader-friendly" text. Whether you're new to this book or intimately familiar with previous editions, I invite you to open to any chapter and evaluate the writing for yourself.


it heavily as a review of the text, and I count the exercises as half their grade in the course. In this edition, Ted and I have once again sorted through the exercises and added new ones we've created in our own teaching or heard about from colleagues. These include matching, multiple-choice, and open-ended discussion questions for each chapter, along with four to six exercises that use examples from everyday life to reinforce the material learned in the text. Also included are the answers to the matching and multiple-choice review questions, as well as a General Social Survey appendix, plus chapter objectives, chapter summaries, and key terms.

SPSS Student Version CD-ROM 14.0 (Windows only) Based on the professional version of one of the world's leading desktop statistical software packages, SPSS Student Version for Windows provides real-world software for students to do sociological data analysis, such as interpreting the GSS data sets found on the companion website.


SPSS Practice Workbook

The Practice of Social Research, 11th edition, is accompanied by a wide array of supplements prepared for both the instructor and student to create the best learning environment inside as well as outside the classroom. All the continuing supplements for The Practice of Social Research, 11th edition, have been thoroughly revised and updated, and several are new to this edition. I invite you to examine and take full advantage of the teaching and learning tools available to you.

This handy guide is coordinated with the text and SPSS CD-ROM 14.0 to help students learn basic navigation in SPSS, including how to enter their own data; create, save, and retrieve files; produce and interpret data summaries; and much more. Also included are SPSS practice exercises correlated with each chapter. The guide comes free when bundled with the text.

GSS Data Disk

For the Student Guided Activities for The Practice of Social Research, 11th Edition The student study guide and workbook Ted Wagenaar and I have prepared continues to be a mainstay of my own teaching. Students tell me they use

Over the years, the publisher and I have sought to provide up-to-date personal computer support for students and instructors. Because there are now many excellent programs for analyzing data, we've provided data to be used with them. With this edition, we've updated the data disk to include the 2004 GSS data.



Experiencing Social Research: An Introduction Using MicroCase, 2nd Edition This supplementary workbook and statistical package, written by David J. Ayers of Grove City College, includes short discussions, quizzes, and computerized exercises in which students will learn and apply key methodological concepts and skills by analyzing, and in some cases collecting and building, simple data files of real sociological data. Designed to accompany The Practice of Social Research, the workbook and statistical package take a step-by-step approach to show students how to do real sociological research, using the same data and techniques used by professional researchers, to reinforce, build on, and complement course materials.

Readings in Social Research, 2nd Edition The concepts and methodologies of social research come to life in this interesting collection of articles specifically designed to accompany The Practice of Social Research. Diane Kholos Wysocki includes an interdisciplinary range of readings from the fields of psychology, sociology, social work, criminal justice, and political science. The articles focus on the important methods and concepts typically covered in the social research course and provide an illustrative advantage. Organized by key concepts, each of the reader's 11 chapters begins with an introduction highlighting and explaining the research concept that each chapter's readings elucidate.

Researching Sociology on the Internet, 3rd Edition This guide is designed to help sociology students do research on the Internet. Part One contains general information necessary to get started and answers questions about security, the type of sociology material available on the Internet, which information is reliable and which sites are not, the best ways to find research, and the best links to take students where they want to go. Part Two looks at each main topic in sociology and refers students to sites where they can obtain the most enlightening research and information.


For the Instructor

Internet-Based Supplements

Instructor's Manual with Test Bank

SociologyNow™: Research Methods

This supplement offers the instructor brief chapter outlines, detailed chapter outlines, behavioral objectives, teaching suggestions and resources, InfoTrac College Edition exercises, Internet exercises, and possible study guide answers. In addition, for each chapter of the text, the Test Bank has 20-30 multiple-choice questions, 10-15 true-false questions, and 3-5 essay questions with answers and page references. All questions are labeled as new, modified, or pickup so instructors know if the question is new to this edition of the Test Bank, picked up but modified from the previous edition of the Test Bank, or picked up straight from the previous edition.

This feature empowers students with the first assessment-centered student tutorial system for Social Research/Research Methods. Seamlessly tied to the new edition, this interactive web-based learning tool helps students gauge their unique study needs with a "pretest" for each chapter to assess their understanding of the material. They are then given a personalized study plan that offers interactive, visual and audio resources to help them master the material. They can check their progress with an interactive posttest as well.

ExamView Computerized Testing for Macintosh and Windows This allows instructors to create, deliver, and customize printed and online tests and study guides. ExamView includes a Quick Test Wizard and an Online Test Wizard to guide instructors step-by-step through the process of creating tests. The test appears onscreen exactly as it will print or display online. Using ExamView's complete word-processing capabilities, instructors can enter an unlimited number of new questions or edit questions included with ExamView.

Multimedia Manager with Instructor's Resources: A Microsoft® PowerPoint® Tool This one-stop lecture and class preparation tool makes it easy to assemble, edit, publish, and present custom lectures for a course, using Microsoft PowerPoint. The Multimedia Manager brings together art (figures, tables, maps) from this text, preassembled Microsoft PowerPoint lecture slides, sociology-related videos, and video and animations from the web or your own materials-culminating in a powerful, personalized, media-enhanced presentation. The CD-ROM also contains a full Instructor's Manual, Test Bank, and other instructor resources.

WebTutor™ ToolBox on WebCT and Blackboard This web-based software for students and instructors takes a course beyond the classroom to an anywhere, anytime environment. Students gain access to the rich content from this book's companion websites. Available for WebCT and Blackboard only.

InfoTrac College Edition with InfoMarks™ Available as a free option with newly purchased texts, InfoTrac College Edition gives instructors and students four months of free access to an extensive online database of reliable, full-length articles (not just abstracts) from thousands of scholarly and popular publications going back as far as 22 years. Among the journals available are American Journal of Sociology, Social Forces, Social Research, and Sociology. InfoTrac College Edition now also comes with InfoMarks, a tool that allows you to save your search parameters, as well as save links to specific articles. (Available to North American college and university students only; journals are subject to change.)


Companion Website for The Practice of Social Research, 11th Edition The book's companion website (http://sociology includes chapter-specific resources for instructors and students. For instructors, the site offers a password-protected instructor's manual, Microsoft PowerPoint presentation slides, and more. For students, there is a multitude of text-specific study aids, including the following:

Tutorial practice quizzing that can be scored and emailed to the instructor


Web links


InfoTrac College Edition exercises




GSS data sets


Data analysis primers


MicroCase Online data exercises


Crossword puzzles

Thomson InSite for Writing and Research™ with Turnitin® Originality Checker InSite features a full suite of writing, peer review, online grading, and e-portfolio applications. It is an all-in-one tool that helps instructors manage the flow of papers electronically and allows students to submit papers and peer reviews online. Also included in the suite is Turnitin, an originality checker that offers a simple solution for instructors who want a strong deterrent against plagiarism, as well as encouragement for students to employ proper research techniques. Access is available for packaging with each copy of this book. For more information, visit

Acknowledgments
J. David Martin, Midwestern State University

Patrick A. Moore, University of Great Falls

It would be impossible to acknowledge adequately

all the people who have influenced this book. My earlier methods text, Survey Research Methods, was dedicated to Samuel Stouffer, Paul Lazarsfeld, and Charles Glock. I again acknowledge my debt to them. I also repeat my thanks to those colleagues acknowledged for their comments during the writing of the first, second, and third editions of this book. The present book still reflects their contributions. Many other colleagues helped me revise the book as well-including the amazing 110 instructors who took the time to respond to our electronic survey. Their feedback was invaluable. I also particularly want to thank the instructors who reviewed the manuscript of this edition and made helpful suggestions:

Melanie Arthur, Portland State University
Craig Forsyth, University of Louisiana at Lafayette
Robert Kleidman, Cleveland State University
Marci B. Littlefield, Indiana State University

Kimberly Dugan, Eastern Connecticut State University
Herman Gibson, Henderson State University
Ellen Goldring, Peabody College, Vanderbilt
Susan Gore, University of Massachusetts at Boston
Sarah Hurley, Arkansas State University
Jana L. Jasinski, University of Central Florida
Michael Kleiman, University of South Florida
Augustine Kposowa, University of California, Riverside
Patrick E. McManimon, Jr., William Paterson University
Jared Schultz, Texas Tech University Health Sciences Center
Thomas C. Wilson, Florida Atlantic University
Gary Wyatt, Emporia State University

I would also like to thank survey participants who took the time to provide valuable information on several features of the book:

Jeanne Mekolichick, Radford University

James T. Ault, III, Creighton University

Bruce H. Wade, Spelman College

Paul Calarco, SUNY at Albany

Also, I appreciate the insights and assistance of those who reviewed the previous edition:

Roy Childs, University of the Pacific
Liz Depoy, University of Maine
Pat Fisher, University of Tennessee

Victor Agadjanian, Arizona State University
Pat Christian, Canisius College
William T. Clute, University of Nebraska at Omaha
Marian A. O. Cohen, Framingham State College


Robert Gardner, Bowdoin College
Elizabeth Jones, California University of Pennsylvania
Barbara Keating, Minnesota State University, Mankato

I also wish to thank Anne Baird, Morehouse College; Rae Banks, Syracuse University; Roland Chilton, University of Massachusetts, Amherst; M. Richard Cramer, University of North Carolina, Chapel Hill; Joseph Fletcher, University of Toronto; Shaul Gabbay, University of Illinois, Chicago; Marcia Ghidina, University of North Carolina, Asheville; Roland Hawkes, Southern Illinois University; Jeffrey Jacques, Florida A&M University; Daniel J. Klenow, North Dakota State University; Wanda Kosinski, Ramapo College, New Jersey; Manfred Kuechler, CUNY Hunter College; Cecilia Menjivar, Arizona State University; Joan Morris, University of Central Florida; Alisa Potter, Concordia College; Zhenchao Qian, Arizona State University; Robert W. Reynolds, Weber State University; Laurie K. Scheuble, Doane College; Beth Anne Shelton, University of Texas, Arlington; Matthew Sloan, University of Wisconsin, Madison; Bernard Sorofman, University of Iowa; Ron Stewart; Randy Stoecker, University of Toledo; Theodore Wagenaar, Miami University, Ohio; Robert Wolf, Eastern Connecticut State University; and Jerome Wolfe, University of Miami. Over the years, I've become more and more impressed by the important role played by editors in books like this. Although an author's name appears on the book's spine, much of its backbone derives from the strength of its editors. Since 1973 I've worked with many sociology editors at Wadsworth, which has involved the kinds of adjustments you might need to make in successive marriages. As this book was gearing up for revision, the developmental editor, Sherry Symington, took editorial responsibility for this 11th edition, and she immediately showed herself to be in command of the process. This is a new partnership, and I'm thrilled by the prospect of working together with her in the future. I also look forward to working with my new acquisitions editor, Chris Caldeira, who came on board recently.
There are also others at Wadsworth whose talents have had an impact on this book. I would like


to acknowledge Wendy Gordon for her inspired marketing efforts, making sure everyone on the planet is aware of the book; Dee Dee Zobian for breaking new ground in publishing with her work on the website and other technology supplements; Elise Smith for managing the development of all of the useful print supplements to round out the teaching package; and Matt Ballantyne for shepherding the countless pieces and people required to turn a manuscript into a book. I also wish to thank Greg Hubit for managing all the critical production processes with great skill, and Carolyn Deacy for the creative new design for the book. Molly Roth is the standard by which copy editors should be judged, though that might set the standard too high. Molly and I have worked together on several books now, and she is simply the best. She successfully walks the thin line that separates a reluctance to say the author failed and a delight in saying it. I have never felt she let me get away with anything, nor have I felt anything but the highest support for my intention. Somehow, Molly can see what I'm trying to say and can often find ways of saying it more clearly and more powerfully. Ted Wagenaar has contributed extensively to this book. Ted and I coauthor the accompanying student study guide, Guided Activities for Practicing Social Research, but that's only the tip of the iceberg. Ted is a cherished colleague, welcome critic, good friend, and altogether decent human being. The 11th edition of the book benefited from the assistance of a young sociologist you'll see and hear more of in the future: Sandrine Zerbib, a first-rate methodologist and scholar, working both the qualitative and quantitative sides of the street. She is particularly sensitive to feminist perspectives, and her experiences as a woman add a new dimension to the sociomethodological concerns we share. Sandrine's efforts are most apparent in Chapters 10 and 13.
I've dedicated this book to my wife, Sheila, who has contributed greatly to its origin and evolution. Sheila and I first met when she was assigned



to assist me on a project I was supervising at UC Berkeley's Survey Research Center.* We've worked on numerous research projects during 40 years of marriage, and I suppose we'll do more in the future. My gratitude to Sheila, however, extends well beyond our research activities. She is a powerful partner in life. Her insight and support take me always to the horizon of my purpose and allow me to look beyond. There's no way to thank her adequately for that.

*I have always enjoyed saying Sheila married her boss, but 40 years later, it seems possible to me that I married mine.




Science is a familiar word; everyone uses it. Yet, images of science differ greatly. For some, science is mathematics; for others, it's white coats and laboratories. It's often confused with technology or equated with tough high school or college courses. Science is, of course, none of these things per se. It is difficult, however, to specify exactly what science is. Scientists themselves disagree on the proper definition. For the purposes of this book, we look at science as a method of inquiry-a way of learning and knowing things about the world around us. Contrasted with other ways of learning and knowing about the world, science has some special characteristics. It is a conscious, deliberate, and rigorous undertaking. Sometimes it uses statistical analyses, but often it does not. We'll examine these and other traits in this opening set of chapters. Dr. Benjamin Spock, the renowned author and pediatrician, began his books on child care by assuring new parents that they already know more about child care than they think they do. I want to begin this book on a

similar note. Before you've read very far, you will realize that you already know a great deal about the practice of social research. In fact, you've been conducting research all your life. From that perspective, the purpose of this book is to help you sharpen skills you already have and perhaps to show you some tricks that may not have occurred to you. Part 1 of this book lays the groundwork for the rest of the book by examining the fundamental characteristics and issues that make science different from other ways of knowing things. In Chapter 1, we'll begin with a look at native human inquiry, the sort of thing you've been doing all your life. In the course of that examination, we'll see some of the ways people go astray in trying to understand the world around them, and I'll summarize the primary characteristics of scientific inquiry that guard against those errors. Chapter 2 deals with social theories and the links between theory and research. We'll look at some of the


theoretical paradigms that shape the nature of inquiry and largely determine what scientists look for and how they interpret what they see. Whereas most of this book deals with the scientific concerns of social research, Chapter 3 introduces two other important concerns: the ethics and politics of research. Researchers are governed by a set of ethical constraints that reflect ideals and values aimed at helping, not harming, people. Social research is also shaped by the fact that it operates within the political codes and systems of the societies it seeks to study and understand. These two topics appear throughout the book as critical components of social research. The overall purpose of Part 1 is to construct a backdrop against which to view the specifics of research design and execution. After completing Part 1, you'll be ready to look at some of the more concrete aspects of social research.



Human Inquiry and Science

Introduction
Looking for Reality
Ordinary Human Inquiry
Tradition
Authority
Errors in Inquiry, and Some Solutions
What's Really Real?
The Foundations of Social Science
Theory, Not Philosophy or Belief
Social Regularities
Aggregates, Not Individuals
A Variable Language

Some Dialectics of Social Research
Idiographic and Nomothetic Explanation
Inductive and Deductive Theory
Qualitative and Quantitative Data
Pure and Applied Research
The Ethics of Social Research
Voluntary Participation
No Harm to Subjects

SociologyNow™: Research Methods Use this online tool to help you make the grade on your next exam. After reading this chapter, go to the "Online Study Resources" at the end of the chapter for instructions on how to benefit from SociologyNow: Research Methods.

Introduction This book is about knowing things-not so much what we know as how we know it. Let's start by examining a few things you probably know already. You know the world is round. You probably also know it's cold on the dark side of the moon, and you know people speak Chinese in China. You know that vitamin C can prevent colds and that unprotected sex can result in AIDS. How do you know? Unless you've been to the dark side of the moon lately or done experimental research on the virtues of vitamin C, you know these things because somebody told them to you, and you believed what you were told. You may have read in National Geographic that people speak Chinese languages in China, and that made sense to you, so you didn't question it. Perhaps your physics or astronomy instructor told you it was cold on the dark side of the moon, or maybe you heard it on National Public Radio (NPR). Some of the things you know seem absolutely obvious to you. If someone asked you how you know the world is round, you'd probably say, "Everybody knows that." There are a lot of things everybody knows. Of course, everyone used to "know" that the world was flat. Most of what you and I know is a matter of agreement and belief. Little of it is based on personal experience and discovery. A big part of growing up in any society, in fact, is the process of learning to accept what everybody around us "knows" is so. If you don't know those same things, you can't really be a part of the group. If you were to question seriously whether the world is really round, you'd quickly find yourself set apart from other people. You might be sent to live in a hospital with other people who question things like that. Although most of what we know is a matter of believing what we've been told, there is nothing wrong with us in that respect. It is simply the way human societies are structured, and it is a quite useful quality. The basis of knowledge is agreement. Because we can't learn all we need to know


by means of personal experience and discovery alone, things are set up so we can simply believe what others tell us. We know some things through tradition and some things from "experts." I'm not saying you shouldn't question this received knowledge; I'm just drawing your attention to the way you and society normally get along regarding what's so. There are other ways of knowing things, however. In contrast to knowing things through agreement, we can know them through direct experience-through observation. If you dive into a glacial stream flowing through the Canadian Rockies, you don't need anyone to tell you it's cold. The first time you stepped on a thorn, you knew it hurt before anyone told you. When our experience conflicts with what everyone else knows, though, there's a good chance we'll surrender our experience in favor of the agreement. Let's take an example. Imagine you've come to a party at my house. It's a high-class affair, and the drinks and food are excellent. In particular, you're taken by one of the appetizers I bring around on a tray: a breaded, deep-fried appetizer that's especially zesty. You have a couple-they're so delicious! You have more. Soon you're subtly moving around the room to be wherever I am when I arrive with a tray of these nibblies. Finally, you can't contain yourself any more. "What are they?" you ask. "How can I get the recipe?" And I let you in on the secret: "You've been eating breaded, deep-fried worms!" Your response is dramatic: Your stomach rebels, and you throw up all over the living-room rug. Argh! What a terrible thing to serve guests! The point of the story is that both of your feelings about the appetizer were quite real. Your initial liking for them, based on your own direct experience, was certainly real. But so was the feeling of disgust you had when you found out that you'd been eating worms.
It should be evident, however, that this feeling of disgust was strictly a product of the agreements you have with those around you that worms aren't fit to eat. That's an




agreement you entered into the first time your parents found you sitting in a pile of dirt with half of a wriggling worm dangling from your lips. When they pried your mouth open and reached down your throat in search of the other half of the worm, you learned that worms are not acceptable food in our society. Aside from these agreements, what's wrong with worms? They are probably high in protein and low in calories. Bite-sized and easily packaged, they are a distributor's dream. They are also a delicacy for some people who live in societies that lack our agreement that worms are disgusting. Some people might love the worms but be turned off by the deep-fried breading. Here's another question to consider: "Are worms 'really' good or 'really' bad to eat?" And here's a more interesting question: "How could you know which was really so?" This book is about answering the second kind of question. The rest of this chapter looks at how we know what is real. We'll begin by examining inquiry as a natural human activity, something we all have engaged in every day of our lives. We'll look at the source of everyday knowledge and at some kinds of errors we make in normal inquiry. We'll then examine what makes science-in particular, social science-different. After considering some of the underlying ideas of social research, we'll conclude with an initial consideration of issues in social research.

Looking for Reality Reality is a tricky business. You probably already suspect that some of the things you "know" may not be true, but how can you really know what's real? People have grappled with this question for thousands of years. One answer that has arisen out of that grappling is science, which offers an approach to both agreement reality and experiential reality. Scientists have certain criteria that must be met before they will accept the reality of something they have not personally experienced. In general, a scientific assertion must have both logical and empirical

support: It must make sense, and it must not contradict actual observation. Why do earthbound scientists accept the assertion that the dark side of the moon is cold? First, it makes sense, because the moon's surface heat comes from the sun's rays, and the dark side of the moon is dark because it's turned away from the sun. Second, scientific measurements made on the moon's dark side confirm this logical expectation. So, scientists accept the reality of things they don't personally experience-they accept an agreement reality-but they have special standards for doing so. More to the point of this book, however, science offers a special approach to the discovery of reality through personal experience. In other words, it offers a special approach to the business of inquiry. Epistemology is the science of knowing; methodology (a subfield of epistemology) might be called the science of finding out. This book presents and examines social science methodology, or how social scientists find out about human social life. Why do we need social science to discover the reality of social life? To find out, let's first consider what happens in ordinary, nonscientific inquiry.

Ordinary Human Inquiry Practically all people, and many other animals as well, exhibit a desire to predict their future circumstances. Humans seem predisposed to undertake this task by using causal and probabilistic reasoning. First, we generally recognize that future circumstances are somehow caused or conditioned by present ones. We learn that getting an education will affect how much money we earn later in life and that swimming beyond the reef may bring an unhappy encounter with a shark. Sharks, on the other hand-whether or not they reason the matter through-may learn that hanging around the reef often brings a happy encounter with unhappy swimmers. Second, we also learn that such patterns of cause and effect are probabilistic in nature. That is, the effects occur more often when the causes occur than when the causes are absent-but not always.

Thus, students learn that studying hard produces good grades in most instances, but not every time. We recognize the danger of swimming beyond the reef, without believing that every such swim will be fatal. As we'll see throughout the book, science makes these concepts of causality and probability more explicit and provides techniques for dealing with them more rigorously than casual human inquiry does. It sharpens the skills we already have by making us more conscious, rigorous, and explicit in our inquiries. In looking at ordinary human inquiry, we need to distinguish between prediction and understanding. Often, we can make predictions without understanding-perhaps you can predict rain when your trick knee aches. And often, even if we don't understand why, we're willing to act on the basis of a demonstrated predictive ability. A racetrack buff who discovers that the third-ranked horse in the third race of the day always seems to win will probably keep betting without knowing, or caring, why it works out that way. Of course, the drawback in predicting without understanding will be powerfully evident when one of the other horses wins and our buff loses a week's pay. Whatever the primitive drives or instincts that motivate human beings and other animals, satisfying these drives depends heavily on the ability to predict future circumstances. For people, however, the attempt to predict is often placed in a context of knowledge and understanding. If you can understand why things are related to each other, why certain regular patterns occur, you can predict better than if you simply observe and remember those patterns. Thus, human inquiry aims at answering both "what" and "why" questions, and we pursue these goals by observing and figuring out. As I suggested earlier in this chapter, our attempts to learn about the world are only partly linked to direct, personal inquiry or experience.
Another, much larger, part comes from the agreed-on knowledge that others give us, those things "everyone knows." This agreement reality both assists and hinders our attempts to find out for ourselves. To see how, consider two important sources of our secondhand knowledge-tradition and authority.


Tradition Each of us inherits a culture made up, in part, of firmly accepted knowledge about the workings of the world. We may learn from others that planting corn in the spring will garner the greatest assistance from the gods, that eating too much candy will decay our teeth, that the circumference of a circle is approximately twenty-two sevenths of its diameter, or that masturbation will blind us. We may test a few of these "truths" on our own, but we simply accept the great majority of them. These are the things that "everybody knows." Tradition, in this sense of the term, offers some clear advantages to human inquiry. By accepting what everybody knows, we avoid the overwhelming task of starting from scratch in our search for regularities and understanding. Knowledge is cumulative, and an inherited body of information and understanding is the jumping-off point for the development of more knowledge. We often speak of "standing on the shoulders of giants," that is, on those of previous generations. At the same time, tradition may hinder human inquiry. If we seek a fresh understanding of something everybody already understands and has always understood, we may be marked as fools for our efforts. More to the point, however, it rarely occurs to most of us to seek a different understanding of something we all "know" to be true.

Authority Despite the power of tradition, new knowledge appears every day. Quite aside from our own personal inquiries, we benefit throughout our lives from new discoveries and understandings produced by others. Often, acceptance of these new acquisitions depends on the status of the discoverer. You're more likely to believe that the common cold can be transmitted through kissing, for example, when you hear it from an epidemiologist than when you hear it from your uncle Pete (unless, of course, he's also an epidemiologist). Like tradition, authority can both assist and hinder human inquiry. We do well to trust the judgment of the person who has special training,


Looking for Reality

Chapter 1: Human Inquiry and Science

expertise, and credentials in a given matter, especially in the face of controversy. At the same time, inquiry can be greatly hindered by the legitimate authorities who err within their own province. Biologists, after all, make their mistakes in the field of biology. Moreover, biological knowledge changes over time. Inquiry is also hindered when we depend on the authority of experts speaking outside their realm of expertise. For example, consider the political or religious leader with no medical or biochemical expertise who declares that marijuana can fry your brain. The advertising industry plays heavily on this misuse of authority by, for example, having popular athletes discuss the nutritional value of breakfast cereals or having movie actors evaluate the performance of automobiles. Both tradition and authority, then, act as double-edged swords in the search for knowledge about the world. Simply put, they provide us with a starting point for our own inquiry, but they can lead us to start at the wrong point and push us off in the wrong direction.

Errors in Inquiry, and Some Solutions Quite aside from the potential dangers of tradition and authority, we often stumble and fall when we set out to learn for ourselves. Let's look at some of the common errors we make in our casual inquiries and at the ways science guards against those errors.

Inaccurate Observations Quite frequently, we make mistakes in our observations. For example, what was your methodology instructor wearing on the first day of class? If you have to guess, it's because most of our daily observations are casual and semiconscious. That's why we often disagree about what really happened. In contrast to casual human inquiry, scientific observation is a conscious activity. Simply making

replication Repeating a research study to test and either confirm or question the findings of an earlier study.

observation more deliberate helps reduce error. If you had to guess what your instructor was wearing on the first day of class, you'd probably make a mistake. If you had gone to the first class with a conscious plan to observe and record what your instructor was wearing, however, you'd be far more likely to be accurate. (You might also need a hobby.) In many cases, both simple and complex measurement devices help guard against inaccurate observations. Moreover, they add a degree of precision well beyond the capacity of the unassisted human senses. Suppose, for example, that you had taken color photographs of your instructor that day. (See earlier comment about needing a hobby.)

Overgeneralization When we look for patterns among the specific things we observe around us, we often assume that a few similar events provide evidence of a general pattern. That is, we overgeneralize on the basis of limited observations. (Think back to our now-broke racetrack buff.) Probably the tendency to overgeneralize peaks when the pressure to arrive at a general understanding is high. Yet it also occurs without such pressure. Whenever overgeneralization does occur, it can misdirect or impede inquiry. Imagine you are a reporter covering an animal-rights demonstration. You have orders to turn in your story in just two hours, and you need to know why people are demonstrating. Rushing to the scene, you start interviewing them, asking for their reasons. The first three demonstrators you interview give you essentially the same reason, so you simply assume that the other 3,000 are also there for that reason. Unfortunately, when your story appears, your editor gets scores of letters from protesters who were there for an entirely different reason. Scientists often guard against overgeneralization by committing themselves in advance to a sufficiently large and representative sample of observations. Another safeguard is provided by the replication of inquiry. Basically, replication means repeating a study and checking to see whether the same results are produced each time. Then, as a

further test, the study may be repeated again under slightly varied conditions.
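The danger of generalizing from three interviews rather than a larger sample can be sketched with a toy simulation. All numbers here are hypothetical and invented purely for illustration; the original text contains no code.

```python
import random

random.seed(1)

# Hypothetical demonstration crowd of 3,000: 60% attend for reason "A",
# 40% for reason "B". (Illustrative figures, not real data.)
crowd = ["A"] * 1800 + ["B"] * 1200
random.shuffle(crowd)

def observed_share(sample_size):
    """Estimate the share of reason 'A' from a random sample of the crowd."""
    sample = random.sample(crowd, sample_size)
    return sample.count("A") / sample_size

# Three interviews can badly mislead; three hundred rarely do.
print("n=3:  ", observed_share(3))
print("n=300:", observed_share(300))
```

With only three interviews the estimate can land anywhere from 0.0 to 1.0; with three hundred it seldom strays far from the true 60 percent, which is why researchers commit in advance to a sufficiently large sample.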

Selective Observation One danger of overgeneralization is that it can lead to selective observation. Once we have concluded that a particular pattern exists and have developed a general understanding of why it exists, we tend to focus on future events and situations that fit the pattern, and we tend to ignore those that do not. Racial and ethnic prejudices depend heavily on selective observation for their persistence. Sometimes a research design will specify in advance the number and kind of observations to be made, as a basis for reaching a conclusion. If we wanted to learn whether women were more likely than men to support freedom to choose an abortion, we would commit ourselves to making a specified number of observations on that question in a research project. We might select a thousand carefully chosen people to be interviewed on the issue. Alternately, when making direct observations of an event, such as attending the animal-rights demonstration, we might make a special effort to find "deviant cases"-precisely those who do not fit into the general pattern. Concluding that one youth became delinquent largely because of a lack of positive adult role models draws attention to the part role models play in keeping most youths on the straight and narrow. In this recollection of growing up in rural Vermont, Lewis Hill (2000: 35) presents another example of selective observation:

Haying began right after the Fourth of July. The farmers in our neighborhood believed that anyone who started earlier was sure to suffer all the storms of late June in addition to those following the holiday, which the old-timers said were caused by all the noise and smoke of gunpowder burning. My mother told me that my grandfather and other Civil War veterans claimed it always rained hard after a big battle. Things didn't always work out the way the older residents promised, of course, but everyone remembered only the times they did.


Illogical Reasoning There are other ways in which we often deal with observations that contradict our understanding of the way things are in daily life. Surely one of the most remarkable creations of the human mind is "the exception that proves the rule." That idea doesn't make any sense at all. An exception can draw attention to a rule or to a supposed rule, but in no system of logic can it prove the rule it contradicts. Even so, we often use this pithy saying to brush away contradictions with a simple stroke of illogic. What statisticians have called the gambler's fallacy is another illustration of illogic in day-to-day reasoning. Often we assume that a consistent run of either good or bad luck foreshadows its opposite. An evening of bad luck at poker may kindle the belief that a winning hand is just around the corner. Many a poker player has stayed in a game much too long because of that mistaken belief. Conversely, an extended period of good weather may lead you to worry that it is certain to rain on the weekend picnic. Although all of us sometimes fall into embarrassingly illogical reasoning, scientists try to avoid this pitfall by using systems of logic consciously and explicitly. We'll examine the logic of science in more depth in Chapter 2. For now, simply note that logical reasoning is a conscious activity for scientists and that other scientists are always around to keep them honest. Science, then, attempts to protect its inquiries from the common pitfalls of ordinary inquiry. Accurately observing and understanding reality is not an obvious or trivial matter. Indeed, it's more complicated than I've suggested.
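The gambler's fallacy can be checked directly by simulation. The sketch below is hypothetical and not part of the original text: it simulates a long run of fair coin flips and examines what actually follows each run of five straight heads.

```python
import random

random.seed(42)

# After five heads in a row, is a tail "due"? Look at what actually
# follows each such run in a long sequence of fair flips.
flips = [random.random() < 0.5 for _ in range(200_000)]  # True = heads

followers = []
for i in range(5, len(flips)):
    if all(flips[i - 5:i]):         # the previous five flips were all heads
        followers.append(flips[i])  # record the very next flip

heads_after_run = sum(followers) / len(followers)
print(f"P(heads | five heads in a row) is about {heads_after_run:.3f}")
```

The proportion comes out near one half: the coin has no memory, and a run of good or bad luck foreshadows nothing at all.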

What's Really Real? Philosophers sometimes use the phrase naive realism to describe the way most of us operate in our daily lives. When you sit at a table to write, you probably don't spend a lot of time thinking about whether the table is really made up of atoms, which in turn are mostly empty space. When you step into the street and see a city bus hurtling down on you, it's



not the best time to reflect on methods for testing whether the bus really exists. We all live with a view that what's real is pretty obvious-and that view usually gets us through the day. I don't want this book to interfere with your ability to deal with everyday life. I hope, however, that the preceding discussions have demonstrated that the nature of "reality" is perhaps more complex than we tend to assume in our everyday functioning. Here are three views on reality that will provide a philosophical backdrop for the discussions of science to follow. They are sometimes called premodern, modern, and postmodern views of reality (W. Anderson 1990).

The Premodern View This view of reality has guided most of human history. Our early ancestors all assumed that they saw things as they really were. In fact, this assumption was so fundamental that they didn't even see it as an assumption. No cavemom said to her cavekid, "Our tribe makes an assumption that evil spirits reside in the Old Twisted Tree." No, she said, "STAY OUT OF THAT TREE OR YOU'LL TURN INTO A TOAD!"

As humans evolved and became aware of their diversity, they came to recognize that others did not always share their views of things. Thus, they may have discovered that another tribe didn't buy the wicked tree thing; in fact, the second tribe felt that the spirits in the tree were holy and beneficial. The discovery of this diversity led members of the first tribe to conclude that "some tribes I could name are pretty stupid." For them, the tree was still wicked, and they expected that some misguided people would soon be moving to Toad City.

The Modern View What philosophers call the modern view accepts such diversity as legitimate, a philosophical "different strokes for different folks." As a modern thinker, you would say, "I regard the spirits in the tree as evil, but I know that others regard them as good. Neither of us is right or wrong. There are simply spirits in the tree. They are neither good nor evil, but different people have different ideas about them."

Adopting the modern view is easy for most of us. Some might regard a dandelion as a beautiful flower, whereas others see only an annoying weed. In the premodern view, a dandelion has to be either one or the other. If you think it is a weed, it is really a weed, though you may admit that some people have a warped sense of beauty. In the modern view, a dandelion is simply a dandelion. It is a plant with yellow petals and green leaves. The concepts "beautiful flower" and "annoying weed" are subjective points of view imposed on the plant by different people. Neither is a quality of the plant itself, just as "good" and "evil" were concepts imposed on the spirits in the tree in our example.

The Postmodern View Increasingly, philosophers speak of a postmodern view of reality. In this view, the spirits don't exist. Neither does the dandelion. All that's "real" are the images we get through our points of view. Put differently, there's nothing "out there"; it's all "in here." As Gertrude Stein said of the city of Oakland, "There's no there, there." No matter how bizarre the postmodern view may seem to you on first reflection, it has a certain ironic inevitability. Take a moment to notice the book you are reading; notice specifically what it looks like. Because you are reading these words, it probably looks something like Figure 1-1a. Does Figure 1-1a represent the way your book "really" looks? Or does it merely represent what the book looks like from your current point of view? Surely, Figures 1-1b, c, and d are equally valid representations. But these views of the book differ greatly from each other. Which is the "reality"?

FIGURE 1-1 A Book. All of these are the same book, but it looks different when viewed from different locations, perspectives, or "points of view."

As this example illustrates, there is no answer to the question "What does the book really look like?" All we can offer is the different ways it looks from different points of view. Thus, according to the postmodern view, there is no "book," only various images of it from different points of view. And all the different images are equally "true." Now let's apply this logic to a social situation. Imagine a husband and wife arguing. When she





looks over at her quarreling husband, Figure 1-2 is what the wife sees.

FIGURE 1-2 Wife's Point of View. There is no question in the wife's mind as to who is right and rational and who is out of control.

Take a minute to imagine what you would feel and think if you were the woman in this drawing. How would you explain later to your best friend what had happened? What solutions to the conflict would seem appropriate if you were this woman?


Of course, what the woman's husband sees is another matter altogether, as shown in Figure 1-3.

FIGURE 1-3 Husband's Point of View. The husband has a very different perception of the same set of events, of course.

Take a minute to imagine experiencing the situation from his point of view. What thoughts and feelings would you have? How would you tell your best friend what had happened? What solutions would seem appropriate for resolving the conflict? Now consider a third point of view. Suppose you are an outside observer, watching this interaction between a wife and husband. What would it look like to you now? Unfortunately, we can't easily portray the third point of view without knowing something about the personal feelings, beliefs, past experiences, and so forth that you would bring to your task as outside observer. (Though I call you an "outside" observer, you are, of course, observing from inside your own mental system.) To take an extreme example, if you were a confirmed male chauvinist, you'd probably see the fight pretty much the same way that the husband saw it. On the other hand, if you were committed to the view that men are generally unreasonable bums, you'd see things the way the wife saw them in the earlier picture. Imagine that instead you see two unreasonable people quarreling irrationally with each other.



Would you see them both as irresponsible jerks, equally responsible for the conflict? Or would you see them as two people facing a difficult human situation, each doing the best he or she can to resolve it? Imagine feeling compassion for them and noticing how each of them attempts to end the hostility, even though the gravity of the problem keeps them fighting. Notice how different these several views are. Which is a "true" picture of what is happening between the wife and the husband? You win the prize if you notice that the personal viewpoint you bring to the observational task will again color your perception of what is happening. The postmodern view represents a critical dilemma for scientists. Although their task is to observe and understand what is "really" happening, they are all human and, as such, bring along personal orientations that will color what they observe and how they explain it. There is ultimately no way people can totally step outside their humanness to see and understand the world as it "really" is-that is, independently of all human viewpoints. Whereas the modern view acknowledges the inevitability of human subjectivity, the postmodern view suggests there is actually no "objective" reality to be observed in the first place. There are only our several subjective views. You may want to ponder these three views of reality on your own for a while. We'll return to them in Chapter 2 when we focus on specific scientific paradigms. Ultimately, two points will emerge. First, established scientific procedures sometimes allow us to deal effectively with this dilemma-that is, we can study people and help them through their difficulties without being able to view "reality" directly. Second, different philosophical stances suggest a powerful range of possibilities for structuring our research.

theory A systematic explanation for the observations that relate to a particular aspect of life: juvenile delinquency, for example, or perhaps social stratification or political revolution.

The Foundations of Social Science

Let's turn now from general philosophical ideas to the foundations of social scientific approaches to understanding. A consideration of these underpinnings of social research will prepare the way for our exploration of specific research techniques.

Science is sometimes characterized as logico-empirical. This ungainly term carries an important message: As we noted earlier, the two pillars of science are logic and observation. That is, a scientific understanding of the world must both make sense and correspond to what we observe. Both elements are essential to science and relate to the three major aspects of the social scientific enterprise: theory, data collection, and data analysis. To oversimplify just a bit, scientific theory deals with the logical aspect of science, whereas data collection deals with the observational aspect. Data analysis looks for patterns in observations and, where appropriate, compares what is logically expected with what is actually observed. Although this book is primarily about data collection and data analysis-that is, how to conduct social research-the rest of Part 1 is devoted to the theoretical context of research. Parts 2 and 3 then focus on data collection, and Part 4 offers an introduction to the analysis of data. Underlying the concepts presented in the rest of the book are some fundamental ideas that distinguish social science-theory, data collection, and analysis-from other ways of looking at social phenomena. Let's consider these ideas.

Theory Not Philosophy or Belief Today, social theory has to do with what is, not with what should be. For many centuries, however, social theory did not distinguish between these two orientations. Social philosophers liberally mixed their observations of what happened around them, their speculations about why, and their ideas about how things ought to be. Although modern social researchers may do the same from time to time, as scientists they focus on how things actually are and why. This means that scientific theory-and, more broadly, science itself-cannot settle debates about values. Science cannot determine whether capitalism is better or worse than socialism. What it can do is determine how these systems perform in terms of some set of agreed-on criteria. For example, we could determine scientifically whether capitalism or socialism most supports human dignity and freedom only if we first agreed on some measurable definitions of dignity and freedom. Our conclusions would then be limited to the meanings specified in our definitions. They would have no general meaning beyond that. By the same token, if we could agree that suicide rates, say, or giving to charity were good measures of the quality of a religion, then we could determine scientifically whether Buddhism or Christianity is the better religion. Again, our conclusion would be inextricably tied to our chosen criteria. As a practical matter, people seldom agree on precise criteria for determining issues of value, so science is seldom useful in settling such debates. In fact, questions like these are so much a matter of opinion and belief that scientific inquiry is often viewed as a threat to what is "already known." We'll consider this issue in more detail in Chapter 12, when we look at evaluation research. As you'll see, researchers have become increasingly involved in studying social programs that reflect ideological points of view, such as affirmative action or welfare reform. One of the biggest problems they face is getting people to agree on criteria of success and failure. Yet such criteria are essential if social research is to tell us anything useful about matters of value. By analogy, a stopwatch cannot tell us if one sprinter is better than another unless we first agree that speed is the critical criterion. Social science, then, can help us know only what is and why. We can use it to determine what ought to be, but only when people agree on the criteria for deciding what outcomes are better than others-an agreement that seldom occurs. As I indicated earlier, even knowing "what is and why" is no simple task. Let's turn now to some of the fundamental ideas that underlie social science's efforts to describe and understand social reality.

Social Regularities In large part, social research aims to find patterns of regularity in social life. Although all the sciences share that aim, it sometimes imposes a barrier for people when they first approach social science. Certainly at first glance the subject matter of the physical sciences seems to be more governed by regularities than does that of the social sciences. A heavy object falls to earth every time we drop it, but a person may vote for a particular candidate in one election and against that same candidate in the next. Similarly, ice always melts when heated enough, but habitually honest people sometimes steal. Despite such examples, however, social affairs do exhibit a high degree of regularity that research can reveal and theory can explain. To begin with, the tremendous number of formal norms in society creates a considerable degree of regularity. For example, traffic laws in the United States induce the vast majority of people to drive on the right side of the street rather than the left. Registration requirements for voters lead to some predictable patterns in which classes of people vote in national elections. Labor laws create a high degree of uniformity in the minimum age of paid workers as well as the minimum amount they are paid. Such formal prescriptions regulate, or regularize, social behavior. Aside from formal prescriptions, we can observe other social norms that create more regularities. Among registered voters, Republicans are more likely than Democrats to vote for Republican candidates. University professors tend to earn more money than unskilled laborers do. Men tend to earn more than women. And so on. Three objections are sometimes raised in regard to such social regularities. First, some of the



regularities may seem trivial. For example, Republicans vote for Republicans; everyone knows that. Second, contradictory cases may be cited, indicating that the "regularity" isn't totally regular. Some laborers make more money than some professors do. Third, it may be argued that, unlike the heavy objects that cannot decide not to fall when dropped, the people involved in the regularity could upset the whole thing if they wanted to. Let's deal with each of these objections in turn.

The Charge of Triviality During World War II, Samuel Stouffer, one of the greatest social science researchers, organized a research branch in the U.S. Army to conduct studies in support of the war effort (Stouffer et al. 1949-1950). Many of the studies concerned the morale among soldiers. Stouffer and his colleagues found there was a great deal of "common wisdom" regarding the bases of military morale. Much of their research was devoted to testing these "obvious" truths. For example, people had long recognized that promotions affect morale in the military. When military personnel get promotions and the promotion system seems fair, morale rises. Moreover, it makes sense that people who are getting promoted will tend to think the system is fair, whereas those passed over will likely think the system is unfair. By extension, it seems sensible that soldiers in units with slow promotion rates will tend to think the system is unfair, and those in units with rapid promotions will think the system is fair. But was this the way they really felt? Stouffer and his colleagues focused their studies on two units: the Military Police (MPs), which had the slowest promotions in the Army, and the Army Air Corps (forerunner of the U.S. Air Force), which had the fastest promotions. It stood to reason that MPs would say the promotion system was unfair, and the air corpsmen would say it was fair. The studies, however, showed just the opposite. Notice the dilemma faced by a researcher in a situation such as this. On the one hand, the observations don't seem to make sense. On the other


hand, an explanation that makes obvious good sense isn't supported by the facts. A lesser person would have set the problem aside "for further study." Stouffer, however, looked for an explanation for his observations, and eventually he found it. Robert Merton and other sociologists at Columbia University had begun thinking and writing about something they called reference group theory. This theory says that people judge their lot in life less by objective conditions than by comparing themselves with others around them-their reference group. For example, if you lived among poor people, a salary of $50,000 a year would make you feel like a millionaire. But if you lived among people who earned $500,000 a year, that same $50,000 salary would make you feel impoverished. Stouffer applied this line of reasoning to the soldiers he had studied. Even if a particular MP had not been promoted for a long time, it was unlikely that he knew some less deserving person who had gotten promoted more quickly. Nobody got promoted in the MPs. Had he been in the Air Corps-even if he had gotten several promotions in rapid succession-he would probably have been able to point to someone less deserving who had gotten even faster promotions. An MP's reference group, then, was his fellow MPs, and the air corpsman compared himself with fellow corpsmen. Ultimately, then, Stouffer reached an understanding of soldiers' attitudes toward the promotion system that (1) made sense and (2) corresponded to the facts. This story shows that documenting the obvious is a valuable function of any science, physical or social. Charles Darwin coined the phrase fool's experiment to describe much of his own research-research in which he tested things that everyone else "already knew." As Darwin understood, the obvious all too often turns out to be wrong; thus, apparent triviality is not a legitimate objection to any scientific endeavor.

What about Exceptions? The objection that there are always exceptions to any social regularity does not mean that the regularity itself is unreal or unimportant. A particular woman may well earn more money than most men, but that provides small consolation to the majority of women, who earn less. The pattern still exists. Social regularities, in other words, are probabilistic patterns, and they are no less real simply because some cases don't fit the general pattern. This point applies in physical science as well as social science. Subatomic physics, for example, is a science of probabilities. In genetics, the mating of a blue-eyed person with a brown-eyed person will probably result in a brown-eyed offspring. The birth of a blue-eyed child does not destroy the observed regularity, because the geneticist states only that the brown-eyed offspring is more likely and, further, that brown-eyed offspring will be born in a certain percentage of the cases. The social scientist makes a similar, probabilistic prediction-that women overall are likely to earn less than men. Once a pattern like this is observed, the social scientist has grounds for asking why it exists.
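The idea that a probabilistic pattern survives its exceptions can be made concrete with a small simulation. The earnings figures below are invented purely for illustration (two overlapping bell curves with different means), not real data and not from the text.

```python
import random

random.seed(7)

# Hypothetical annual earnings in thousands of dollars: two overlapping
# distributions with different means. (Illustrative numbers only.)
men = [random.gauss(55, 15) for _ in range(10_000)]
women = [random.gauss(45, 15) for _ in range(10_000)]

mean_men = sum(men) / len(men)
mean_women = sum(women) / len(women)

# The regularity: women earn less on average ...
print(f"mean for men: {mean_men:.0f}; mean for women: {mean_women:.0f}")

# ... yet many individual women still out-earn the average man.
exceptions = sum(w > mean_men for w in women) / len(women)
print(f"share of women earning above the male mean: {exceptions:.0%}")
```

Roughly a quarter of the simulated women out-earn the average man, yet the aggregate gap remains: exceptions coexist with the pattern rather than destroying it.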

People Could Interfere Finally, the objection that the conscious will of the actors could upset observed social regularities does not pose a serious challenge to social science. This is true even though a parallel situation does not appear to exist in the physical sciences. (Presumably physical objects cannot violate the laws of physics, although the probabilistic nature of subatomic physics once led some observers to postulate that electrons had free will.) There is no denying that a religious, right-wing bigot could go to the polls and vote for an agnostic, left-wing African American if he wanted to upset political scientists studying the election. All voters in an election could suddenly switch to the underdog just to frustrate the pollsters. Similarly, workers could go to work early or stay home from work and thereby prevent the expected rush-hour traffic. But these things do not happen often enough to seriously threaten the observation of social regularities. Social regularities, then, do exist, and social scientists can detect them and observe their effects.


When these regularities change over time, social scientists can observe and explain those changes.

Aggregates, Not Individuals The regularities of social life that social scientists study generally reflect the collective behavior of many individuals. Although social scientists often study motivations that affect individuals, the individual as such is seldom the subject of social science. Instead, social scientists create theories about the nature of group, rather than individual, life. Similarly, the objects of their research are typically aggregates, or collections, rather than individuals. Sometimes the collective regularities are amazing. Consider the birthrate, for example. People have babies for any number of personal reasons. Some do it because their own parents want grandchildren. Some feel it's a way of completing their womanhood or manhood. Others want to hold their marriages together, enjoy the experience of raising children, perpetuate the family name, or achieve a kind of immortality. Still others have babies by accident. If you have fathered or given birth to a baby, you could probably tell a much more detailed, idiosyncratic story. Why did you have the baby when you did, rather than a year earlier or later? Maybe you lost your job and had to delay a year before you could afford to have the baby. Maybe you only felt the urge to become a parent after someone close to you had a baby. Everyone who had a baby last year had their own reasons for doing so. Yet, despite this vast diversity, and despite the idiosyncrasy of each individual's reasons, the overall birthrate in a society-the number of live births per 1,000 population-is remarkably consistent from year to year. See Table 1-1 for recent birthrates for the United States. If the U.S. birthrate were 15.9, 35.6, 7.8, 28.9, and 16.2 in five successive years, demographers would begin dropping like flies. As you can see, however, social life is far more orderly than that. Moreover, this regularity occurs without society-wide regulation. No one plans how many babies will be born or determines who will have them.




TABLE 1-1
Birthrates, United States: 1980-2002*

Year  Rate    Year  Rate
1980  15.9    1992  15.8
1981  15.8    1993  15.4
1982  15.9    1994  15.0
1983  15.6    1995  14.6
1984  15.6    1996  14.4
1985  15.8    1997  14.2
1986  15.6    1998  14.3
1987  15.7    1999  14.2
1988  16.0    2000  14.4
1989  16.4    2001  14.1
1990  16.7    2002  13.9
1991  16.2

*Live births per 1,000 population. Source: U.S. Bureau of the Census, Statistical Abstract of the United States (Washington, DC: U.S. Government Printing Office, 2005), Table 70, p. 60.
You do not need a permit to have a baby; in fact, many babies are conceived unexpectedly, and some are borne unwillingly. Social scientific theories, then, typically deal with aggregated, not individual, behavior. Their purpose is to explain why aggregate patterns of behavior are so regular even when the individuals participating in them may change over time. We could even say that social scientists don't seek to explain people at all. They try to understand the systems in which people operate, the systems that explain why people do what they do. The elements in such a system are not people but variables.

A Variable Language Our most natural attempts at understanding usually take place at the level of the concrete and idiosyncratic. That's just the way we think.

variables Logical groupings of attributes. The variable gender is made up of the attributes male and female.

Imagine that someone says to you, "Women ought to get back into the kitchen where they belong." You're likely to hear that comment in terms of what you know about the speaker. If it's your old uncle Harry, who is also strongly opposed to daylight saving time, zip codes, and personal computers, you're likely to think his latest pronouncement simply fits into his rather dated point of view about things in general. If, on the other hand, the statement is muttered by an incumbent politician trailing a female challenger in an election race, you'll probably explain his comment in a completely different way.

In both examples, you're trying to understand the behavior of a particular individual. Social research seeks insights into classes or types of individuals. Social researchers would want to find out about the kind of people who share that view of women's "proper" role. Do those people have other characteristics in common that may help explain their views?

Even when researchers focus their attention on a single case study-such as a community or a juvenile gang-their aim is to gain insights that would help people understand other communities and other juvenile gangs. Similarly, the attempt to fully understand one individual carries the broader purpose of understanding people or types of people in general.

When this venture into understanding and explanation ends, social researchers will be able to make sense out of more than one person. In understanding what makes a group of people hostile to women who are active outside the home, they gain insight into all the individuals who share that characteristic. This is possible because, in an important sense, they have not been studying antifeminists as much as they have been studying antifeminism. It might then turn out that Uncle Harry and the politician have more in common than first appeared. Antifeminism is spoken of as a variable because it varies. Some people display the attitude more than others do.
Social researchers are interested in understanding the system of variables that causes a particular attitude to be strong in one instance and weak in another.

The idea of a system composed of variables may seem rather strange, so let's look at an analogy. The subject of a physician's attention is the patient. If the patient is ill, the physician's purpose is to help the patient get well. By contrast, a medical researcher's subject matter is different-the variables that cause a disease, for example. The medical researcher may study the physician's patient, but for the researcher, that patient is relevant only as a carrier of the disease.

That is not to say that medical researchers don't care about real people. They certainly do. Their ultimate purpose in studying diseases is to protect people from them. But in their research, they are less interested in individual patients than they are in the patterns governing the appearance of the disease. In fact, when they can study a disease meaningfully without involving actual patients, they do so.

Social research, then, involves the study of variables and their relationships. Social theories are written in a language of variables, and people get involved only as the "carriers" of those variables. Variables, in turn, have what social researchers call attributes (or categories or values). Attributes are characteristics or qualities that describe an object-in this case, a person. Examples include female, Asian, alienated, conservative, dishonest, intelligent, and farmer. Anything you might say to describe yourself or someone else involves an attribute.

Variables, on the other hand, are logical groupings of attributes. Thus, for example, male and female are attributes, and sex or gender is the variable composed of those two attributes. The variable occupation is composed of attributes such as farmer, professor, and truck driver. Social class is a variable composed of a set of attributes such as upper class, middle class, and lower class. Sometimes it helps to think of attributes as the categories that make up a variable.
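The vocabulary of variables and attributes translates directly into code. A minimal sketch (the `VARIABLES` mapping and the `describe` helper are our own illustrative constructions, using the attribute examples from the text):

```python
# Each variable is a logical grouping of attributes (its categories).
VARIABLES = {
    "gender": {"male", "female"},
    "occupation": {"farmer", "professor", "truck driver"},
    "social class": {"upper class", "middle class", "lower class"},
}

def describe(person: dict) -> None:
    """Check that a person is described only by attributes of known variables."""
    for variable, attribute in person.items():
        if attribute not in VARIABLES.get(variable, set()):
            raise ValueError(f"{attribute!r} is not an attribute of {variable!r}")
        print(f"{variable}: {attribute}")

# A person is a "carrier" of attributes, one per variable.
describe({"gender": "female", "occupation": "farmer", "social class": "middle class"})
```

Note the asymmetry the text insists on: the variable (the dictionary key) is not itself an attribute, only a grouping of them.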
(See Figure 1-4 for a schematic review of what social scientists mean by variables and attributes.) The relationship between attributes and variables forms the heart of both description and explanation in science. For example, we might describe a college class in terms of the variable gender by reporting the observed frequencies of the attributes



FIGURE 1-4
Variables and Attributes. In social research and theory, both variables and attributes represent social concepts. Variables are sets of related attributes (categories, values).

Variable          Attributes
Age               Young, middle-aged, old
Gender            Female, male
Occupation        Plumber, lawyer, data-entry clerk . . .
Race/ethnicity    African American, Asian, Caucasian, Latino . . .
Social class      Upper, middle, lower . . .

male and female: "The class is 60 percent men and 40 percent women." An unemployment rate can be thought of as a description of the variable employment status of a labor force in terms of the attributes employed and unemployed. Even the report of family income for a city is a summary of attributes composing that variable: $3,124; $10,980; $35,000; and so forth.

Sometimes the meanings of the concepts that lie behind social science concepts are immediately clear. Other times they aren't. This point is discussed in "The Hardest Hit Was . . ."

The relationship between attributes and variables is more complicated in the case of explanation and gets to the heart of the variable language of scientific theory. Here's a simple example, involving two variables, education and prejudice. For the sake

attributes Characteristics of people or things.




a. The uneducated are more prejudiced than the educated.


In early 1982, a deadly storm ravaged the San Francisco Bay Area, leaving an aftermath of death, injury, and property damage. As the mass media sought to highlight the most tragic results of the storm, they sometimes focused on several people who were buried alive in a mud slide in Santa Cruz. Other times, they covered the plight of the 2,900 made homeless in Marin County. Implicitly, everyone wanted to know where the worst damage was done, but the answer was not clear.

Here are some data describing the results of the storm in two counties, Marin and Santa Cruz. Look over the comparisons and see if you can determine which county was "hardest hit." Certainly, in terms of the loss of life, Santa Cruz was the "hardest hit" of the two counties. Yet more than seven times as many people were injured in Marin as in Santa Cruz; certainly, Marin County was "hardest hit" in that regard. Or consider the number of homes destroyed (worse in Santa Cruz) or damaged (worse in Marin). It matters which you focus on. The same dilemma holds true for the value of the damage done: Should we pay more attention to private damage or public damage?

So which county was "hardest hit"? Ultimately, the question as posed has no answer. Although you and I both have images in our minds about communities that are "devastated" or communities that are only "lightly touched," these images are not precise enough to permit rigorous measurements.

                        Marin County      Santa Cruz County
People killed
People injured          400               50
People displaced
Homes destroyed
Homes damaged
Businesses destroyed
Businesses damaged
Private damages         $65.1 million     $50.0 million
Public damages          $15.0 million     $56.5 million

b. There is no apparent relationship between education and prejudice.

The question can be answered only if we can specify what we mean by "hardest hit." If we measure it by death toll, then Santa Cruz was the hardest hit. If we choose to define the variable in terms of people injured and/or displaced, then Marin was the bigger disaster. The simple fact is that we cannot answer the question without specifying exactly what we mean by the term hardest hit. This is a fundamental requirement that will arise again and again as we attempt to measure social science variables.

Data source: San Francisco Chronicle, January 13, 1982, p. 16.

of simplicity, let's assume that the variable education has only two attributes: educated and uneducated. Similarly, let's give the variable prejudice two attributes: prejudiced and unprejudiced. Now let's suppose that 90 percent of the uneducated are prejudiced, and the other 10 percent are unprejudiced. And let's suppose that 30 percent of the educated people are prejudiced, and the other 70 percent are unprejudiced. This is illustrated graphically in Figure 1-5a.

Figure 1-5a illustrates a relationship or association between the variables education and prejudice. This relationship can be seen in terms of the pairings of attributes on the two variables. There are two predominant pairings: (1) those who are educated and unprejudiced and (2) those who

are uneducated and prejudiced. Here are two other useful ways of viewing that relationship.

First, let's suppose that we play a game in which we bet on your ability to guess whether a person is prejudiced or unprejudiced. I'll pick the people one at a time (not telling you which ones I've picked), and you have to guess whether each person is prejudiced. We'll do it for all 20 people in Figure 1-5a. Your best strategy in this case would be to guess prejudiced each time, because 12 out of the 20 are categorized that way. Thus, you'll get 12 right and 8 wrong, for a net success of 4.

Now let's suppose that when I pick a person from the figure, I tell you whether the person is educated or uneducated. Your best strategy now would be to guess prejudiced for each uneducated

FIGURE 1-5
Relationship between Two Variables (Two Possibilities). Variables such as education and prejudice and their attributes (educated/uneducated, prejudiced/unprejudiced) are the foundation for the examination of causal relationships in social research.

person and unprejudiced for each educated person. If you followed that strategy, you'd get 16 right

and 4 wrong. Your improvement in guessing prejudice by knowing education is an illustration of what it means to say that the variables are related.

Second, by contrast, let's consider how the 20 people would be distributed if education and prejudice were unrelated to each other (Figure 1-5b). Notice that half the people are educated, and half are uneducated. Also notice that 12 of the 20 (60 percent) are prejudiced. If 6 of the 10 people in each group were prejudiced, we would conclude

that the two variables were unrelated to each other. Knowing a person's education would not be of any value to you in guessing whether that person was prejudiced.

We'll be looking at the nature of relationships between variables in some depth in Part 4. In particular, we'll explore some of the ways relationships can be discovered and interpreted in research analysis. For now, you need a general understanding of relationships in order to appreciate the logic of social scientific theories.

Theories describe the relationships we might logically expect between variables. Often, the



expectation involves the idea of causation. That is, a person's attributes on one variable are expected to cause, predispose, or encourage a particular attribute on another variable. In the example just illustrated, we might theorize that a person's being educated or uneducated causes a lesser or greater likelihood of that person seeming prejudiced.

As I'll discuss in more detail later in the book, education and prejudice in this example would be regarded as an independent variable and a dependent variable, respectively. These two concepts are implicit in causal, or deterministic, models. In this example, we assume that the likelihood of being prejudiced is determined or caused by something. In other words, prejudice depends on something else, and so it is called the "dependent" variable. What the dependent variable depends on is an independent variable, in this case, education. For the purposes of this study, education is an "independent" variable because it is independent of prejudice (that is, people's level of education is not caused by whether or not they are prejudiced).

Of course, variations in levels of education can, in turn, be found to depend on something else. People whose parents have a lot of education, for example, are more likely to get a lot of education than are people whose parents have little education. In this relationship, the subject's education is the dependent variable, and the parents' education is the independent variable. We can say the independent variable is the cause, the dependent variable the effect.

In our discussion of Figure 1-5, we looked at the distribution of the 20 people in terms of the two variables. In constructing a social scientific theory, we would derive an expectation regarding the

relationship between the two variables based on what we know about each. We know, for example, that education exposes people to a wide range of cultural variation and to diverse points of view-in short, it broadens their perspectives. Prejudice, on the other hand, represents a narrower perspective. Logically, then, we might expect education and prejudice to be somewhat incompatible. We might therefore arrive at an expectation that increasing education would reduce the occurrence of prejudice, an expectation that would be supported by our observations.

Because Figure 1-5 has illustrated two possibilities-that education reduces the likelihood of prejudice or that it has no effect-you might be interested in knowing what is actually the case. As one measure of prejudice, the 2002 General Social Survey (GSS) asked a national sample of adults in the United States why "(Negroes/Blacks/African-Americans) have worse jobs, income, and housing than white people." One of the possibilities offered respondents was "because most (Negroes/Blacks/African-Americans) have less in-born ability to learn." Only 12 percent of all the respondents said that was the reason for African Americans' disadvantaged status. Table 1-2 presents an analysis of those data, grouping respondents according to their levels of educational attainment. (See the box for more about the GSS.)

Notice that the theory has to do with the two variables education and prejudice, not with people as such. People are the carriers of those two variables, so the relationship between the variables


independent variable A variable with values that are not problematical in an analysis but are taken as simply given. An independent variable is presumed to cause or determine a dependent variable.

dependent variable A variable assumed to depend on or be caused by another (called the independent variable). If you find that income is partly a function of amount of formal education, income is being treated as a dependent variable.

TABLE 1-2
Education and Racial Prejudice

Level of Education                Percent saying African Americans have less in-born ability to learn

Less than high school graduate    26
High school graduate              10
Junior college                    15
Bachelor's degree                  6
Graduate degree                    3


The National Opinion Research Center (NORC) at the University of Chicago conducts a periodic national survey of American public opinion for the purpose of making such data available for analysis by the social research community. Many data examples in this book come from

can only be seen when we observe people. Ultimately, however, the theory uses a language of variables. It describes the associations that we might logically expect to exist between particular attributes of different variables. (You can do this data analysis for yourself with nothing more than a connection to the Internet. See "Analyzing Data Online.")
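The guessing game for Figure 1-5a can be replayed in code. This sketch builds the 20 hypothetical people from the percentages given earlier; the construction is our own, and the final ratio it computes is what statisticians call a proportional reduction in error:

```python
from collections import Counter

# Figure 1-5a's hypothetical 20 people: 10 educated (30% prejudiced),
# 10 uneducated (90% prejudiced).
people = ([("educated", "prejudiced")] * 3 + [("educated", "unprejudiced")] * 7 +
          [("uneducated", "prejudiced")] * 9 + [("uneducated", "unprejudiced")] * 1)

# Strategy 1: always guess the overall modal attribute ("prejudiced").
overall = Counter(attitude for _, attitude in people)
errors_without = len(people) - max(overall.values())

# Strategy 2: guess the modal attribute within each education group.
errors_with = 0
for group in ("educated", "uneducated"):
    attitudes = Counter(att for edu, att in people if edu == group)
    errors_with += sum(attitudes.values()) - max(attitudes.values())

print(errors_without, errors_with)  # 8 errors without knowing education, 4 with
print((errors_without - errors_with) / errors_without)  # 0.5: errors cut in half
```

Knowing the independent variable cuts the guessing errors in half, which is one concrete way of saying the two variables are related.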

Some Dialectics of Social Research

There is no one way to do social research. (If there were, this would be a much shorter book.) In fact, much of the power and potential of social research lies in the many valid approaches it comprises. Four broad and interrelated distinctions, however, underlie the variety of research approaches. Although one can see these distinctions as competing choices, a good social researcher learns each of the orientations they represent. This is what I mean by the "dialectics" of social research: There is a fruitful tension between the complementary concepts I'm about to describe.

Idiographic and Nomothetic Explanation All of us go through life explaining things. We do it every day. You explain why you did poorly or well on an exam, why your favorite team is winning or losing, why you may be having trouble getting good dates or a decent job. In our everyday explanations, we engage in two distinct forms of causal reasoning, though we do not ordinarily distinguish them.


that source. You can learn more about the GSS at the official website maintained by the University of Michigan: http://www.icpsr.umich.edu/GSS/.

Sometimes we attempt to explain a single situation in idiosyncratic detail. Thus, for example, you may have done poorly on an exam because (1) you forgot there was an exam that day, (2) it was in your worst subject, (3) a traffic jam made you late for class, (4) your roommate kept you up the night before the exam by playing loud music, (5) the police kept you until dawn demanding to know what you had done with your roommate's stereo-and what you had done with your roommate, for that matter-and (6) a wild band of coyotes ate your textbook. Given all these circumstances, it's no wonder you did poorly.

This type of causal reasoning is called an idiographic explanation. Idio- in this context means unique, separate, peculiar, or distinct, as in the word idiosyncrasy. When we have completed an idiographic explanation, we feel that we fully understand the causes of what happened in this particular instance. At the same time, the scope of our explanation is limited to the single case at hand. Although parts of the idiographic explanation might apply to other situations, our intention is to explain one case fully.

Now consider a different kind of explanation. (1) Every time you study with a group, you do better on the exam than if you had studied alone. (2) Your favorite team does better at home than on

idiographic An approach to explanation in which we seek to exhaust the idiosyncratic causes of a particular condition or event. Imagine trying to list all the reasons why you chose to attend your particular college. Given all those reasons, it's difficult to imagine your making any other choice.



You can test the relationship between prejudice and education for yourself if you have a connection to the Internet. The data just presented in the text are taken from the General Social Survey, found at http://www.icpsr.umich.edu/GSS/. Once you reach that location, you'll discover several buttons along the top of the page. Click the box labeled "Analyze." This will open up a data-analysis program created by the University of California, Berkeley. For present purposes, you should click the second radio button, for "Frequencies or crosstabulation," and then click "Start" near the bottom of the page.

This will take you to a page where you can specify the analysis you would like to see. To replicate the analysis presented in the text, you will use three variables in the GSS data set. RACDIF2 asks people whether they blame African Americans for their disadvantages in U.S. society. DEGREE gives us the respondent's educational level, and YEAR lets you specify which GSS survey you are interested in-2000 in this case. Enter this information in the online form as shown below. This specifies RACDIF2 as the dependent variable (Row) and DEGREE as the independent variable (Column). YEAR(2000) indicates that you want to analyze data only from the 2000 survey (Filter). Leaving the filter blank would cause the program to analyze the data from all the surveys done since 1972 that included the two variables under study. Click "Run the Table" at the bottom to be presented with a table containing the data shown in the text.

Once you've done that, you might want to do some exploration on your own. The "Subject" hyperlink will take you to a catalog of topics studied by the GSS over the years. You might want to examine other indicators of prejudice: RACDIF1, RACDIF3, and RACDIF4, for example. Or, EDUC will give you the number of years of education respondents report, rather than categorizing that variable as it's done in DEGREE. Or, you might want to look at the relationship between education and prejudice over time, changing the value of YEAR. The possibilities are almost endless, and you might have some fun.

With the increased interest in online data analysis, you may find this GSS site overused and slow to respond. In that case, you might look around for other analysis engines, such as the Cultural Policy and the Arts' CPANDA-FACTOID at httpJ/!codebookDB/sdalite.jsp?id=a00079. Or you may want to search the web for something like "Analyze 'General Social Survey.'" Because the web is an evolving resource, new tools will likely appear by the time this textbook reaches you.

SDA Tables Program (Selected Study: GSS 1972-2000 Cumulative Datafile)
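If the online analyzer is unavailable, the same kind of table can be computed from raw records. A sketch in plain Python (the respondents below are invented for illustration and are not actual GSS data; only the tabulation logic mirrors the crosstab described above):

```python
from collections import defaultdict

# Hypothetical respondents: (education level, said "less in-born ability")
respondents = [
    ("less than high school", True), ("less than high school", False),
    ("less than high school", False), ("high school graduate", False),
    ("high school graduate", False), ("high school graduate", True),
    ("bachelor's degree", False), ("bachelor's degree", False),
    ("graduate degree", False), ("graduate degree", False),
]

counts = defaultdict(lambda: [0, 0])  # level -> [number agreeing, total]
for level, agrees in respondents:
    counts[level][1] += 1
    if agrees:
        counts[level][0] += 1

# Column percentages: the dependent variable summarized within each
# category of the independent variable, as in Table 1-2.
for level, (agree, total) in counts.items():
    print(f"{level}: {100 * agree / total:.0f}% agree (n={total})")
```

The percentaging direction matters: we divide within each education category because education is the independent variable.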



Other things [being] the same, a person will prefer to steal from a fellow group member rather than from an outsider.

The preference to steal from a fellow group member is more pronounced in poor groups than in rich groups.

In the case of theft, informants arise only in cross-group theft, in which case they are members of the thief's group.

Persons who arrive a week late at summer camp or for freshman year of college are more likely to become friends of persons who play games of chance than of persons who play games of skill.


A society becomes more vulnerable to deficit spending as its wealth increases.


Societies in which population growth is welcomed must be societies in which the set of valued goods includes at least one quantity-good, such as wealth.

Jasso's theory leads to many other propositions, but this sampling should provide a good sense of where deductive theorizing can take you. To get a feeling for how she reasons her way to these propositions, let's look briefly at the logic involved in two of the propositions that relate to theft within and outside one's group.

Other things [being] the same, a person will prefer to steal from a fellow group member rather than from an outsider.

Beginning with the assumption that thieves want to maximize their relative wealth, ask yourself whether that goal would be best served by


stealing from those you compare yourself with or from outsiders. In each case, stealing will increase your Actual Holdings, but what about your Comparison Holdings? A moment's thought should suggest that stealing from people in your comparison group will lower their holdings, further increasing your relative wealth.

To simplify, imagine there are only two people in your comparison group: you and I. Suppose we each have $100. If you steal $50 from someone outside our group, you will have increased your relative wealth by 50 percent compared with me: $150 versus $100. But if you steal $50 from me, you will have increased your relative wealth 200 percent: $150 to my $50. Your goal is best served by stealing from within the comparison group.
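The arithmetic in that paragraph is easy to check. A minimal sketch (the $100 holdings and $50 theft are the text's numbers; measuring relative wealth as a simple ratio of holdings is our assumption):

```python
def relative_wealth(yours: float, mine: float) -> float:
    # Your standing relative to the one other member of your comparison group.
    return yours / mine

start = relative_wealth(100, 100)             # we each hold $100
outside = relative_wealth(100 + 50, 100)      # steal $50 from an outsider
within = relative_wealth(100 + 50, 100 - 50)  # steal $50 from me

print(start, outside, within)  # 1.0, then 1.5 (a 50% gain), then 3.0 (a 200% gain)
```

The within-group theft pays twice: your holdings rise and your comparison point falls.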

In the case of theft, informants arise only in cross-group theft, in which case they are members of the thief's group.

Can you see why it would make sense for informants (1) to arise only in the case of cross-group theft and (2) to come from the thief's comparison group? This proposition again depends on the fundamental assumption that everyone wants to increase his or her relative standing. Suppose you and I are in the same comparison group, but this time the group contains additional people. If you steal from someone else within our comparison group, my relative standing in the group does not change. Although your wealth has increased, the average wealth in the group remains the same (because someone else's wealth has decreased by the same amount). So my relative standing remains the same. I have no incentive to inform on you.

If you steal from someone outside our comparison group, however, your nefarious income increases the total wealth in our group. Now my own wealth relative to that total is diminished. Because my relative wealth has suffered, I'm more likely to inform on you in order to bring an end to your stealing. Hence, informants arise only in cross-group theft. This last deduction also begins to explain why these informants come from the thief's own comparison group. We've just seen how your theft


The Links between Theory and Research

Chapter 2: Paradigms, Theory, and Social Research

decreased my relative standing. How about members of the other group (other than the individual you stole from)? Each of them actually profits from the theft, because you have reduced the total with which they compare themselves. Hence, they have no reason to inform on you. Thus, the theory of distributive justice predicts that informants arise from the thief's own comparison group.

This brief peek into Jasso's derivations should give you some sense of the enterprise of deductive theory. Of course, the theory guarantees none of the given predictions. The role of research is to test each of them to determine whether what makes sense (logic) actually occurs in practice (observation).
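The informant argument can also be checked numerically. A sketch under one reasonable reading of "relative standing" (my wealth divided by the group average; the three-person group and $100 holdings are invented for illustration):

```python
def my_standing(my_wealth: float, group_wealths: list[float]) -> float:
    # Relative standing: my wealth compared with the group average.
    return my_wealth / (sum(group_wealths) / len(group_wealths))

# Comparison group of three, $100 each: you, me, and one other member.
before = my_standing(100, [100, 100, 100])

# Within-group theft: you take $50 from the other member; the group total
# is unchanged, so my standing is unchanged and I have no reason to inform.
within = my_standing(100, [150, 100, 50])

# Cross-group theft: your extra $50 comes from outside; the group total
# grows, my standing falls below 1.0, and I now have a grievance.
cross = my_standing(100, [150, 100, 100])

print(before, within, cross)
```

Only the cross-group theft changes my standing, which is exactly the asymmetry the proposition predicts.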

Inductive Theory Construction

As we have seen, quite often social scientists begin constructing a theory through the inductive method by first observing aspects of social life and then seeking to discover patterns that may point to relatively universal principles. Barney Glaser and Anselm Strauss (1967) coined the term grounded theory in reference to this method.

Field research-the direct observation of events in progress-is frequently used to develop theories through observation. In a long and rich tradition, anthropologists have used this method to good advantage. Among modern social scientists, no one has been more adept at seeing the patterns of human behavior through observation than Erving Goffman:

A game such as chess generates a habitable universe for those who can follow it, a plane of being, a cast of characters with a seemingly unlimited number of different situations and acts through which to realize their natures and destinies. Yet much of this is reducible to a small set of interdependent rules and practices. If the meaningfulness of everyday activity is similarly dependent on a closed, finite set of rules, then explication of them would give one a powerful means of analyzing social life. (1974: 5)

In a variety of research efforts, Goffman uncovered the rules of such diverse behaviors as living in a mental institution (1961) and managing the "spoiled identity" of being disfigured (1963). In each case, Goffman observed the phenomenon in depth and teased out the rules governing behavior. Goffman's research provides an excellent example of qualitative field research as a source of grounded theory. Our earlier discussion of the Comfort Hypothesis and church involvement shows that qualitative field research is not the only method of observation appropriate to the development of inductive theory. Here's another detailed example to illustrate further the construction of inductive theory using quantitative methods.

An Example of Inductive Theory: Why Do People Smoke Marijuana?

During the 1960s and 1970s, marijuana use on U.S. college campuses was a subject of considerable discussion in the popular press. Some people were troubled by marijuana's popularity; others welcomed it. What interests us here is why some students smoked marijuana and others didn't. A survey of students at the University of Hawaii by David Takeuchi (1974) provided the data to answer that question.

At the time of the study, a huge number of explanations were being offered for drug use. People who opposed drug use, for example, often suggested that marijuana smokers were academic failures trying to avoid the rigors of college life. Those in favor of marijuana, on the other hand, often spoke of the search for new values: Marijuana smokers, they said, were people who had seen through the hypocrisy of middle-class values.

Takeuchi's analysis of the data gathered from University of Hawaii students, however, did not support any of the explanations being offered. Those who reported smoking marijuana had essentially the same academic records as those who didn't smoke it, and both groups were equally involved in traditional "school spirit" activities. Both groups seemed to feel equally well integrated into campus life.

There were other differences between the groups, however:

1. Women were less likely than men to smoke marijuana.
2. Asian students (a large proportion of the student body) were less likely to smoke marijuana than non-Asians were.
3. Students living at home were less likely to smoke marijuana than those living in apartments were.

As in the case of religiosity, the three variables independently affected the likelihood of a student's smoking marijuana. About 10 percent of the Asian women living at home had smoked marijuana, in contrast to about 80 percent of the non-Asian men living in apartments. And, as in the religiosity study, the researchers discovered a powerful pattern of drug use before they had an explanation for that pattern.

In this instance, the explanation took a peculiar turn. Instead of explaining why some students smoked marijuana, the researchers explained why some didn't. Assuming that all students had some motivation for trying drugs, the researchers suggested that students differed in the degree of "social constraints" preventing them from following through on that motivation.

U.S. society is, on the whole, more permissive with men than with women when it comes to deviant behavior. Consider, for example, a group of men getting drunk and boisterous. We tend to dismiss such behavior with references to "camaraderie" and "having a good time," whereas a group of women behaving similarly would probably be regarded with great disapproval. We have an idiom, "Boys will be boys," but no comparable idiom for girls. The researchers reasoned, therefore, that women would have more to lose by smoking marijuana than men would. In other words, being female provided a constraint against smoking marijuana.

Students living at home had obvious constraints against smoking marijuana, compared with students living on their own. Quite aside from differences in opportunity, those living at home were


seen as being more dependent on their parents-hence more vulnerable to additional punishment for breaking the law. Finally, the Asian subculture in Hawaii has traditionally placed a higher premium on obedience to the law than other subcultures have, so Asian students would have more to lose if they were caught violating the law by smoking marijuana.

Overall, then, a "social constraints" theory was offered as the explanation for observed differences in the likelihood of smoking marijuana. The more constraints a student had, the less likely he or she would be to smoke marijuana. It bears repeating that the researchers had no thoughts about such a theory when their research began. The theory came from an examination of the data.

The Links between Theory and Research

Throughout this chapter, we have seen various aspects of the links between theory and research in social scientific inquiry. In the deductive model, research is used to test theories. In the inductive model, theories are developed from the analysis of research data. This final section looks more closely into the ways theory and research are related in actual social scientific inquiry.

Whereas we have discussed two idealized logical models for linking theory and research, social scientific inquiries have developed a great many variations on these themes. Sometimes theoretical issues are introduced merely as a background for empirical analyses. Other studies cite selected empirical data to bolster theoretical arguments. In neither case is there really an interaction between theory and research for the purpose of developing new explanations. Some studies make no use of theory at all, aiming specifically, for example, at an ethnographic description of a particular social situation, such as an anthropological account of food and dress in a particular society.

As you read social research reports, however, you'll often find that the authors are conscious of the implications of their research for social theories


Chapter 2: Paradigms, Theory, and Social Research

and vice versa. Here are a few examples to illustrate this point.

When W. Lawrence Neuman (1998) set out to examine the problem of monopolies (the "trust problem") in U.S. history, he saw the relevance of theories about how social movements transform society ("state transformation"). He became convinced, however, that existing theories were inadequate for the task before him:

State transformation theory links social movements to state policy formation processes by focussing on the role of cultural meaning in organized political struggles. Despite a resemblance among concepts and concerns, constructionist ideas found in the social problems, social movements, and symbolic politics literatures have not been incorporated into the theory. In this paper, I draw on these three literatures to enhance state transformation theory. (Neuman 1998: 315)

Having thus modified state transformation theory, Neuman had a theoretical tool that could guide his inquiry and analysis into the political maneuverings related to monopolies beginning in the 1880s and continuing until World War I. Thus, theory served as a resource for research and at the same time was modified by it.

In a somewhat similar study, Alemseghed Kebede and J. David Knottnerus (1998) set out to investigate the rise of Rastafarianism in the Caribbean. However, they felt that recent theories on social movements had become too positivistic in focusing on the mobilization of resources. Resource mobilization theory, they felt,

downplays the motivation, perceptions, and behavior of movement participants ... and concentrates instead on the whys and hows of mobilization. Typically theoretical and research problems include: How do emerging movement organizations seek to mobilize and routinize the flow of resources and how does the existing political apparatus affect the organization of resources? (1998: 500)

To study Rastafarianism more appropriately, the researchers felt the need to include several

concepts from contemporary social psychology. In particular, they sought models to use in dealing with problems of meaning and collective thought.

Frederika Schmitt and Patricia Martin (1999) were particularly interested in discovering what made for successful rape crisis centers and how they dealt with the organizational and political environments within which they operated. The researchers found theoretical constructs appropriate to their inquiry:

This case study of unobtrusive mobilizing by Southern California Rape Crisis Center uses archival, observational, and interview data to explore how a feminist organization worked to change police, schools, prosecutor, and some state and national organizations from 1974 to 1994. Mansbridge's concept of street theory and Katzenstein's concepts of unobtrusive mobilization and discursive politics guide the analysis. (1999: 364)

In summary, there is no simple recipe for conducting social science research. It is far more open-ended than the traditional view of science suggests. Ultimately, science depends on two categories of activity: logic and observation. As you'll see throughout this book, they can be fit together in many patterns.

MAIN POINTS

Introduction
• Theories function in three ways in research: (1) helping to avoid flukes, (2) making sense of observed patterns, and (3) shaping and directing research efforts.

Some Social Science Paradigms
• Social scientists use a variety of paradigms to organize how they understand and inquire into social life.
• A distinction between types of theories that cuts across various paradigms is macrotheory (theories about large-scale features of society) versus microtheory (theories about smaller units or features of society).
• The positivistic paradigm assumes that we can scientifically discover the rules governing social life.
• The Social Darwinist paradigm sees a progressive evolution in social life.
• The conflict paradigm focuses on the attempt of individuals and groups to dominate others and to avoid being dominated.
• The symbolic interactionist paradigm examines how shared meanings and social patterns develop in the course of social interactions.
• Ethnomethodology focuses on the ways people make sense out of social life in the process of living it, as though each were a researcher engaged in an inquiry.
• The structural functionalist (or social systems) paradigm seeks to discover what functions the many elements of society perform for the whole system.
• Feminist paradigms, in addition to drawing attention to the oppression of women in most societies, highlight how previous images of social reality have often come from and reinforced the experiences of men.
• Like feminist paradigms, critical race theory both examines the disadvantaged position of a social group (African Americans) and offers a different vantage point from which to view and understand society.
• Some contemporary theorists and researchers have challenged the long-standing belief in an objective reality that abides by rational rules. They point out that it is possible to agree on an "intersubjective" reality.

Elements of Social Theory
• The elements of social theory include observations, facts, and laws (which relate to the reality being observed), as well as concepts, variables, axioms or postulates, propositions, and hypotheses (which are logical building blocks of the theory itself).

Two Logical Systems Revisited
• In the traditional image of science, scientists proceed from theory to operationalization to observation. But this image is not an accurate picture of how scientific research is actually done.
• Social scientific theory and research are linked through the two logical methods of deduction (the derivation of expectations and hypotheses from theories) and induction (the development of generalizations from specific observations).
• In practice, science is a process involving an alternation of deduction and induction.

Deductive Theory Construction
• Guillermina Jasso's theory of distributive justice illustrates how formal reasoning can lead to a variety of theoretical expectations that can be tested by observation.

Inductive Theory Construction
• David Takeuchi's study of factors influencing marijuana smoking among University of Hawaii students illustrates how collecting observations can lead to generalizations and an explanatory theory.

The Links between Theory and Research
• In practice, there are many possible links between theory and research and many ways of going about social inquiry.

KEY TERMS

The following terms are defined in context in the chapter and at the bottom of the page where the term is introduced, as well as in the comprehensive glossary at the back of the book.

hypothesis
interest convergence
macrotheory
microtheory
null hypothesis
operational definition
operationalization
paradigm

REVIEW QUESTIONS AND EXERCISES

1. Consider the possible relationship between education and prejudice that was mentioned in Chapter 1. Describe how you might examine that relationship through (a) deductive and (b) inductive methods.
2. Review the relationships between theory and research discussed in this chapter. Select a research article from an academic journal and classify the relationship between theory and research you find there.

3. Using one of the many search engines (such as Google, Excite, HotBot, Ask Jeeves, LookSmart, Lycos, Netscape, Web Crawler, or Yahoo), find information on the web concerning at least three of the following paradigms. Give the web locations and report on the theorists discussed in connection with the discussions you found.

conflict theory
critical race theory
exchange theory
ethnomethodology
feminism
functionalism
interactionism
positivism
postmodernism

4. Using InfoTrac College Edition (Article A67051613) or the library, locate Judith A. Howard (2000), "Social Psychology of Identities," Annual Review of Sociology 26: 367-93. What paradigm does she find most useful for the study of social identities? Explain why she feels that it is the appropriate paradigm. Do you agree? Why or why not?

ADDITIONAL READINGS

Chafetz, Janet. 1978. A Primer on the Construction and Testing of Theories in Sociology. Itasca, IL: Peacock. In one of the few books on theory construction written expressly for undergraduates, Chafetz provides a rudimentary understanding of the philosophy of science through simple language and everyday examples. She describes the nature of explanation, the role of assumptions and concepts, and the building and testing of theories.

Delgado, Richard, and Jean Stefancic. 2001. Critical Race Theory: An Introduction. New York: New York University Press. This is a good introduction to this alternative paradigm for viewing racial and ethnic issues, presenting key concepts and findings.

Denzin, Norman K., and Yvonna S. Lincoln. 1994. Handbook of Qualitative Research. Newbury Park, CA: Sage. Various authors discuss the process of qualitative research from the perspective of various paradigms, showing how they influence the nature of inquiry. The editors also critique positivism from a postmodern perspective.

DeVault, Marjorie L. 1999. Liberating Method: Feminism and Social Research. Philadelphia: Temple University Press. This book elaborates on some of the methods associated with the feminist paradigm and is committed to both rigorous inquiry and the use of social research to combat oppression.

Harvey, David. 1990. The Condition of Postmodernity: An Enquiry into the Origins of Cultural Change. Cambridge, MA: Blackwell. Here's a wide-ranging analysis of the history and meaning of postmodernism, linking political and historical factors to experiences of time and space.

Kuhn, Thomas. 1970. The Structure of Scientific Revolutions. Chicago: University of Chicago Press. In this exciting and innovative recasting of the nature of scientific development, Kuhn disputes the notion of gradual change and modification in science, arguing instead that established paradigms tend to persist until the weight of contradictory evidence brings about their rejection and replacement by new paradigms. This short book is at once stimulating and informative.

Lofland, John, and Lyn H. Lofland. 1995. Analyzing Social Settings: A Guide to Qualitative Observation and Analysis. 3rd ed. Belmont, CA: Wadsworth. An excellent text on how to conduct qualitative inquiry with an eye toward discovering the rules of social life. Includes a critique of postmodernism.

McGrane, Bernard. 1994. The Un-TV and the 10 mph Car: Experiments in Personal Freedom and Everyday Life. Fort Bragg, CA: Small Press. Some excellent and imaginative examples of an ethnomethodological approach to society and to the craft of sociology. The book is useful for both students and faculty.

Reinharz, Shulamit. 1992. Feminist Methods in Social Research. New York: Oxford University Press. This book explores several social research techniques (such as interviewing, experiments, and content analysis) from a feminist perspective.

Ritzer, George. 1988. Sociological Theory. New York: Knopf. This is an excellent overview of the major theoretical traditions in sociology.

Rosenau, Pauline Marie. 1992. Post-Modernism and the Social Sciences: Insights, Inroads, and Intrusions. Princeton, NJ: Princeton University Press. Regarded as a modern classic, this book examines some of the main variations on postmodernism and shows how they have impacted different realms of society.

Turner, Jonathan H., ed. 1989. Theory Building in Sociology: Assessing Theoretical Cumulation. Newbury Park, CA: Sage. This collection of essays on sociological theory construction focuses specifically on the question posed by Turner's introductory chapter, "Can Sociology Be a Cumulative Science?"

Turner, Stephen Park, and Jonathan H. Turner. 1990. The Impossible Science: An Institutional Analysis of American Sociology. Newbury Park, CA: Sage. Two authors bring two very different points of view to the history of U.S. sociologists' attempt to establish a science of society.

SPSS EXERCISES

See the booklet that accompanies your text for exercises using SPSS (Statistical Package for the Social Sciences). There are exercises offered for each chapter, and you'll also find a detailed primer on using SPSS.

Online Study Resources

SociologyNow: Research Methods

1. Before you do your final review of the chapter, take the SociologyNow: Research Methods diagnostic quiz to help identify the areas on which you should concentrate. You'll find information on this online tool, as well as instructions on how to access all of its great resources, in the front of the book.

2. As you review, take advantage of the SociologyNow: Research Methods customized study plan, based on your quiz results. Use this study plan with its interactive exercises and other resources to master the material.

3. When you're finished with your review, take the posttest to confirm that you're ready to move on to the next chapter.

WEBSITE FOR THE PRACTICE OF SOCIAL RESEARCH 11TH EDITION

Go to your book's website at http://sociology for tools to aid you in studying for your exams. You'll find Tutorial Quizzes with feedback, Internet Exercises, Flashcards, and Chapter Tutorials, as well as Extended Projects, InfoTrac College Edition search terms, Social Research in Cyberspace, GSS Data, Web Links, and primers for using various data-analysis software such as SPSS and NVivo.

WEB LINKS FOR THIS CHAPTER

Please realize that the Internet is an evolving entity, subject to change. Nevertheless, these few websites should be fairly stable. Also, check your book's website for even more Web Links.

Dead Sociologists' Homepage
WWW Virtual Library: Sociology
Sociological Theory and Theorists
SociologyOnline Gallery



Chapter 3

The Ethics and Politics of Social Research

Introduction

Ethical Issues in Social Research
  Voluntary Participation
  No Harm to the Participants
  Anonymity and Confidentiality
  Deception
  Analysis and Reporting
  Institutional Review Boards
  Professional Codes of Ethics

Two Ethical Controversies
  Trouble in the Tearoom
  Observing Human Obedience

The Politics of Social Research
  Objectivity and Ideology
  Politics with a Little "p"
  Politics in Perspective

SociologyNow: Research Methods

Use this online tool to help you make the grade on your next exam. After reading this chapter, go to the "Online Study Resources" at the end of the chapter for instructions on how to benefit from SociologyNow: Research Methods.

My purpose in this book is to present a realistic and useful introduction to doing social research. For this introduction to be fully realistic, it must include four main constraints on research projects: scientific, administrative, ethical, and political.

Most of the book focuses on scientific and administrative constraints. We'll see that the logic of science suggests certain research procedures, but we'll also see that some scientifically "perfect" study designs are not administratively feasible, because they would be too expensive or take too long to execute. Throughout the book, therefore, we'll deal with workable compromises.

Before we get to the scientific and administrative constraints on research, it's useful to explore the two other important considerations in doing research in the real world, ethics and politics, which this chapter covers. Just as certain procedures are too impractical to use, others are either ethically prohibitive or politically difficult or impossible. Here's a story to illustrate what I mean.

Several years ago, I was invited to sit in on a planning session to design a study of legal education in California. The joint project was to be conducted by a university research center and the state bar association. The purpose of the project was to improve legal education by learning which aspects of the law school experience were related to success on the bar exam. Essentially, the plan was to prepare a questionnaire that would get detailed information about the law school experiences of individuals. People would be required to answer the questionnaire when they took the bar exam. By analyzing how people with different kinds of law school experiences did on the bar exam, we could find out what sorts of things worked and what didn't. The findings of the research could be made available to law schools, and ultimately legal education could be improved.
The exciting thing about collaborating with the bar association was that all the normally irritating logistical hassles would be handled. There would be no problem getting permission to administer questionnaires in conjunction with the exam, for example, and the problem of nonresponse could be eliminated altogether.

I left the meeting excited about the prospects for the study. When I told a colleague about it, I glowed about the absolute handling of the nonresponse problem. Her immediate comment turned everything around completely: "That's unethical. There's no law requiring the questionnaire, and participation in research has to be voluntary."

The study wasn't done. In retelling this story, I can easily see that requiring participation would have been inappropriate. You may have seen this even before I told you about my colleague's comment. I still feel a little embarrassed over the matter, but I have a specific purpose in telling this story about myself.

All of us consider ourselves ethical, not perfect perhaps, but as ethical as anyone else and perhaps more so than most. The problem in social research, as probably in life, is that ethical considerations are not always apparent to us. As a result, we often plunge into things without seeing ethical issues that may be apparent to others and may even be obvious to us when pointed out. When I reported back to the others in the planning group, for example, no one disagreed with the inappropriateness of requiring participation. Everyone was a bit embarrassed about not having seen it.

Any of us can immediately see that a study requiring small children to be tortured is unethical. I know you'd speak out immediately if I suggested that we interview people about their sex lives and then publish what they said in the local newspaper. But, as ethical as you are, you'll totally miss the ethical issues in some other situations; we all do.

The first half of this chapter deals with the ethics of social research. In part, it presents some of the broadly agreed-on norms describing what's ethical in research and what's not.
More important than simply knowing the guidelines, however, is becoming sensitized to the ethical component in research so that you'll look for it whenever you plan a study. Even when the ethical aspects of a situation are debatable, you should know that there's
something to argue about. It's worth noting in this context that many professions operate under ethical constraints and that these constraints differ from one profession to another. Thus, priests, physicians, lawyers, reporters, and television producers operate under different ethical constraints. In this chapter, we'll look only at the ethical principles that govern social research.

Political considerations in research are also subtle, ambiguous, and arguable. Notice that the law school example involves politics as well as ethics. Although social researchers have an ethical norm that participation in research should be voluntary, this norm clearly grows out of U.S. political norms protecting civil liberties. In some nations, the proposed study would have been considered quite ethical.

In the second half of this chapter, we'll look at social research projects that were crushed or nearly crushed by political considerations. As with ethical concerns, there is often no "correct" take on a given situation. People of goodwill disagree. I won't try to give you a party line about what is and is not politically acceptable. As with ethics, the point is to become sensitive to the political dimension of social research.

Ethical Issues in Social Research

In most dictionaries and in common usage, ethics is typically associated with morality, and both words concern matters of right and wrong. But what is right and what wrong? What is the source of the distinction? For individuals the sources vary. They may be religions, political ideologies, or the pragmatic observation of what seems to work and what doesn't.

Webster's New World Dictionary is typical among dictionaries in defining ethical as "conforming to the standards of conduct of a given profession or group." Although this definition may frustrate those in search of moral absolutes, what we regard as morality and ethics in day-to-day life is a matter of agreement among members of a group. And, not

surprisingly, different groups have agreed on different codes of conduct. Part of living successfully in a particular society is knowing what that society considers ethical and unethical. The same holds true for the social research community.

Anyone involved in social scientific research, then, needs to be aware of the general agreements shared by researchers about what is proper and improper in the conduct of scientific inquiry. This section summarizes some of the most important ethical agreements that prevail in social research.

Voluntary Participation

Often, though not always, social research represents an intrusion into people's lives. The interviewer's knock on the door or the arrival of a questionnaire in the mail signals the beginning of an activity that the respondent has not requested and that may require significant time and energy. Participation in a social experiment disrupts the subject's regular activities.

Social research, moreover, often requires that people reveal personal information about themselves, information that may be unknown to their friends and associates. And social research often requires that such information be revealed to strangers. Other professionals, such as physicians and lawyers, also ask for such information. Their requests may be justified, however, by their aims: They need the information in order to serve the personal interests of the respondent. Social researchers can seldom make this claim. Like medical scientists, they can only argue that the research effort may ultimately help all humanity.

A major tenet of medical research ethics is that experimental participation must be voluntary. The same norm applies to social research. No one should be forced to participate. This norm is far easier to accept in theory than to apply in practice, however.

Again, medical research provides a useful parallel. Many experimental drugs used to be tested on prisoners. In the most rigorously ethical cases, the prisoners were told the nature and the possible dangers of the experiment, they were told that participation was completely voluntary, and they were

further instructed that they could expect no special rewards, such as early parole, for participation. Even under these conditions, it was often clear that volunteers were motivated by the belief that they would personally benefit from their cooperation.

When the instructor in an introductory sociology class asks students to fill out a questionnaire that he or she hopes to analyze and publish, students should always be told that their participation in the survey is completely voluntary. Even so, most students will fear that nonparticipation will somehow affect their grade. The instructor should therefore be especially sensitive to such implications and make special provisions to eliminate them. For example, the instructor could insure anonymity by leaving the room while the questionnaires are being completed. Or, students could be asked to return the questionnaires by mail or to drop them in a box near the door just before the next course meeting.

This norm of voluntary participation, though, goes directly against several scientific concerns. In the most general terms, the scientific goal of generalizability is threatened if experimental subjects or survey respondents are all the kind of people who willingly participate in such things. Because this orientation probably reflects more general personality traits, the results of the research might not be generalizable to all people. Most clearly, in the case of a descriptive survey, a researcher cannot generalize the sample survey findings to an entire population unless a substantial majority of the scientifically selected sample actually participates: the willing respondents and the somewhat unwilling.

As you'll see in Chapter 10, field research has its own ethical dilemmas in this regard. Very often the researcher cannot even reveal that a study is being done, for fear that that revelation might significantly affect the social processes being studied.
Clearly, the subjects of study in such cases are not given the opportunity to volunteer or refuse to participate.

Though the norm of voluntary participation is important, it is often impossible to follow. In cases where you feel ultimately justified in violating it, observing the other ethical norms of scientific research, such as bringing no harm to the people under study, becomes all the more important.

No Harm to the Participants

Social research should never injure the people being studied, regardless of whether they volunteer for the study. Perhaps the clearest instance of this norm in practice concerns the revealing of information that would embarrass subjects or endanger their home lives, friendships, jobs, and so forth. We'll discuss this aspect of the norm more fully in a moment.

Because subjects can be harmed psychologically in the course of a social research study, the researcher must look for the subtlest dangers and guard against them. Quite often, research subjects are asked to reveal deviant behavior, attitudes they feel are unpopular, or personal characteristics that may seem demeaning, such as low income, the receipt of welfare payments, and the like. Revealing such information usually makes subjects feel at least uncomfortable.

Social research projects may also force participants to face aspects of themselves that they don't normally consider. This can happen even when the information is not revealed directly to the researcher. In retrospect, a certain past behavior may appear unjust or immoral. The project, then, can cause continuing personal agony for the subject. If the study concerns codes of ethical conduct, for example, the subject may begin questioning his or her own morality, and that personal concern may last long after the research has been completed and reported. For instance, probing questions can injure a fragile self-esteem.

It should be apparent from these observations that just about any research you might conduct runs the risk of injuring other people in some way. It isn't possible to insure against all these possible injuries, but some study designs make such injuries more likely than others do.
If a particular research procedure seems likely to produce unpleasant effects for subjects, asking survey respondents to report deviant behavior, for example, the researcher should have the firmest of scientific grounds for doing it. If your research design is essential and also likely to be unpleasant for subjects, you'll find yourself in an ethical netherworld and may go through some personal agonizing. Although
agonizing has little value in itself, it may be a healthy sign that you've become sensitive to the problem.

Increasingly, the ethical norms of voluntary participation and no harm to participants have become formalized in the concept of informed consent. This norm means that subjects must base their voluntary participation in research projects on a full understanding of the possible risks involved. In a medical experiment, for example, prospective subjects are presented with a discussion of the experiment and all the possible risks to themselves. They are required to sign a statement indicating that they are aware of the risks and that they choose to participate anyway. Although the value of such a procedure is obvious when subjects will be injected with drugs designed to produce physical effects, for example, it's hardly appropriate when a participant observer rushes to the scene of urban rioting to study deviant behavior. Whereas the researcher in this latter case must still bring no harm to those observed, gaining informed consent is not the means to achieving that end.

Although the fact often goes unrecognized, another possible source of harm to subjects lies in the analysis and reporting of data. Every now and then, research subjects read the books published about the studies they participated in. Reasonably sophisticated subjects can locate themselves in the various indexes and tables. Having done so, they may find themselves characterized, though not identified by name, as bigoted, unpatriotic, irreligious, and so forth. At the very least, such characterizations are likely to trouble them and threaten their self-images. Yet the whole purpose of the research project may be to explain why some people are prejudiced and others are not.

informed consent  A norm in which subjects base their voluntary participation in research projects on a full understanding of the possible risks involved.

anonymity  Anonymity is guaranteed in a research project when neither the researchers nor the readers of the findings can identify a given response with a given respondent.


In one survey of churchwomen (Babbie 1967), ministers in a sample of churches were asked to distribute questionnaires to a specified sample of members, collect them, and return them to the research office. One of these ministers read through the questionnaires from his sample before returning them, and then he delivered a hellfire and brimstone sermon to his congregation, saying that many of them were atheists and were going to hell. Even though he could not identify the people who gave particular responses, many respondents certainly endured personal harm from his tirade.

Like voluntary participation, avoiding harm to people is easy in theory but often difficult in practice. Sensitivity to the issue and experience with its applications, however, should improve the researcher's tact in delicate areas of research.

In recent years, social researchers have been gaining support for abiding by this norm. Federal and other funding agencies typically require an independent evaluation of the treatment of human subjects for research proposals, and most universities now have human-subject committees to serve this evaluative function. Although sometimes troublesome and inappropriately applied, such requirements not only guard against unethical research but also can reveal ethical issues overlooked by even the most scrupulous researchers.

Anonymity and Confidentiality The clearest concern in the protection of the subjects' interests and well-being is the protection of their identity, especially in survey research. If revealing their survey responses would injure them in any way, adherence to this norm becomes all the more important. Two techniques, anonymity and confidentiality, assist researchers in this regard, although people often confuse the two.

Anonymity A research project guarantees anonymity when the researcher, not just the people who read about the research, cannot identify a given response with a given respondent. This implies that a typical interview survey respondent can never be considered anonymous, because an interviewer collects the information from an identifiable respondent. An example of anonymity is a mail survey in which no identification numbers are put on the questionnaires before their return to the research office.

As we'll see in Chapter 9 (on survey research), assuring anonymity makes keeping track of who has or hasn't returned the questionnaires difficult. Despite this problem, paying the necessary price is advisable in certain situations. For example, in one study of drug use among university students, I decided that I specifically did not want to know the identity of respondents. I felt that honestly assuring anonymity would increase the likelihood and accuracy of responses. Also, I did not want to be in the position of being asked by authorities for the names of drug offenders. In the few instances in which respondents volunteered their names, such information was immediately obliterated on the questionnaires.

Confidentiality A research project guarantees confidentiality when the researcher can identify a given person's responses but essentially promises not to do so publicly. In an interview survey, for example, the researcher could make public the income reported by a given respondent, but the respondent is assured that this will not be done. Whenever a research project is confidential rather than anonymous, it is the researcher's responsibility to make that fact clear to the respondent. Moreover, researchers should never use the term anonymous to mean confidential. With few exceptions (such as surveys of public figures who agree to have their responses published), the information respondents give must at least be kept confidential. This is not always an easy norm to follow, because, for example, the courts have not recognized social research data as the kind of "privileged communication" priests and attorneys have.

This unprotected guarantee of confidentiality produced a near disaster in 1991. Two years earlier, the Exxon Valdez supertanker had run aground near the port of Valdez in Alaska, spilling ten million gallons of oil into the bay. The economic and environmental damage was widely reported. The media paid less attention to the psychological and sociological damage suffered by residents of the area. There were anecdotal reports of increased alcoholism, family violence, and other secondary consequences of the disruptions caused by the oil spill. Eventually, 22 communities in Prince William Sound and the Gulf of Alaska sued Exxon for the economic, social, and psychological damages suffered by their residents.

To determine the amount of damage done, the communities commissioned a San Diego research firm to undertake a household survey asking residents very personal questions about increased problems in their families. The sample of residents were asked to reveal painful and embarrassing information, under the guarantee of absolute confidentiality. Ultimately, the results of the survey confirmed that a variety of personal and family problems had increased substantially following the oil spill.

When Exxon learned that survey data would be presented to document the suffering, they took an unusual step: They asked the court to subpoena the survey questionnaires. The court granted the request and ordered the researchers to turn over the questionnaires, with all identifying information. It appeared that Exxon's intention was to call survey respondents to the stand and cross-examine them regarding answers they had given to interviewers under the guarantee of confidentiality. Moreover, many of the respondents were Native Americans, whose cultural norms made such public revelations all the more painful.

Happily, the Exxon Valdez case was settled before the court decided whether it would force survey respondents to testify in open court. Unhappily, the potential for disaster remains. For more information on this ecological disaster, see Picou, Gill, and Cohen (1999).

confidentiality A research project guarantees confidentiality when the researcher can identify a given person's responses but promises not to do so publicly.


Chapter 3: The Ethics and Politics of Social Research

The seriousness of this issue is not limited to established research firms. Rik Scarce was a graduate student at Washington State University when he undertook participant observation among animal rights activists. In 1990 he published a book based on his research: Ecowarriors: Understanding the Radical Environmental Movement. In 1993, Scarce was called before a grand jury and asked to identify the activists he had studied. In keeping with the norm of confidentiality, the young researcher refused to answer the grand jury's questions and spent 159 days in the Spokane County jail. He reports:

Although I answered many of the prosecutor's questions, on 32 occasions I refused to answer, saying, "Your question calls for information that I have only by virtue of a confidential disclosure given to me in the course of my research activities. I cannot answer the question without actually breaching a confidential communication. Consequently, I decline to answer the question under my ethical obligations as a member of the American Sociological Association and pursuant to any privilege that may extend to journalists, researchers, and writers under the First Amendment." (Scarce 1999: 982)

At the time of his grand jury appearance and his incarceration, Scarce felt that the American Sociological Association (ASA) code of ethics strongly supported his ethical stand, and the ASA filed a friend-of-the-court brief on his behalf. In 1997, the ASA revised its code and, while still upholding the norm of confidentiality, warned researchers to inform themselves regarding laws and rules that may limit their ability to promise confidentiality to research subjects.

You can use several techniques to guard against such dangers and ensure better performance on the guarantee of confidentiality. To begin, interviewers and others with access to respondent identifications should be trained in their ethical responsibilities. Beyond training, the most fundamental technique is to remove identifying information as soon as it's no longer necessary. In a survey, for example, all names and addresses should be removed from questionnaires and replaced by identification numbers. An identification file should be created that links numbers to names to permit the later correction of missing or contradictory information, but this file should not be available except for legitimate purposes. Similarly, in an interview survey you may need to identify respondents initially so that you can recontact them to verify that the interview was conducted and perhaps to get information that was missing in the original interview. As soon as you've verified an interview and assured yourself that you don't need any further information from the respondent, however, you can safely remove all identifying information from the interview booklet. Often, interview booklets are printed so that the first page contains all the identifiers; it can be torn off once the respondent's identification is no longer needed.

J. Steven Picou (1996a, 1996b) points out that even removing identifiers from data files does not always sufficiently protect respondent confidentiality, a lesson he learned during nearly a year in federal court. A careful examination of all the responses of a particular respondent sometimes allows others to deduce that person's identity. Imagine, for example, that someone said he or she was a former employee of a particular company. Knowing the person's gender, age, ethnicity, and other characteristics could make it possible for the company to identify that person.

Even if you intend to remove all identifying information, suppose you have not yet done so. What do you do when the police or a judge orders you to provide the responses given by your research subjects?
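The identifier-removal procedure described above can be sketched in code. The following is a minimal illustration only, not taken from the text: the field names and sample records are hypothetical, and a real project would store the linkage file on separate, access-controlled media and destroy it as soon as verification and recontact are complete.

```python
# Hypothetical identifying fields; a real survey's fields will differ.
IDENTIFYING_FIELDS = ["name", "address"]

def deidentify(records):
    """Split raw survey records into (1) de-identified records keyed by an
    arbitrary identification number and (2) a separate linkage table mapping
    those numbers back to names and addresses. Only the linkage table can
    reconnect a response to a respondent, so it must be stored separately
    and made available only for legitimate purposes."""
    deidentified, linkage = [], []
    for id_number, record in enumerate(records, start=1):
        # Keep the identifiers only in the linkage table.
        linkage.append({"id": id_number,
                        **{f: record[f] for f in IDENTIFYING_FIELDS}})
        # The working data file keeps everything else, plus the ID number.
        cleaned = {k: v for k, v in record.items()
                   if k not in IDENTIFYING_FIELDS}
        cleaned["id"] = id_number
        deidentified.append(cleaned)
    return deidentified, linkage

# Hypothetical sample data.
records = [
    {"name": "A. Respondent", "address": "12 Elm St", "income": "40000"},
    {"name": "B. Respondent", "address": "9 Oak Ave", "income": "55000"},
]
data, linkage = deidentify(records)
print(data[0])  # no name or address remains, only an ID number
```

As Picou's experience shows, this mechanical step is necessary but not sufficient: combinations of remaining fields can still identify a respondent, so the analyst must also consider what the cleaned records reveal in combination.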
In this, as in all other aspects of research ethics, professional researchers avoid settling for mere rote compliance with established ethical rules. Rather, they continually ask what actions would be most appropriate in protecting the interests of those being studied. Here's the way Penny Becker (1998: 452) addressed the issue of confidentiality in connection with a qualitative research project studying religious life in a community:

Following the lead of several recent studies, I identify the real name of the community, Oak Park, rather than reducing the complexity of the community's history to a few underlying dimensions or creating an "insider/outsider" dynamic where some small group of fellow researchers knows the community's real name and the rest of the world is kept in the dark. ... In all cases individual identities are disguised, except for Jack Finney, the Lutheran pastor, who gave permission to be identified. "City Baptist" is a pseudonym used at the request of the church's leadership. The leaders of Good Shepherd Lutheran Church (GSLC) gave permission to use the church's real name.

Deception We've seen that the handling of subjects' identities is an important ethical consideration. Handling your own identity as a researcher can also be tricky. Sometimes it's useful and even necessary to identify yourself as a researcher to those you want to study. You'd have to be an experienced con artist to get people to participate in a laboratory experiment or complete a lengthy questionnaire without letting on that you were conducting research. Even when you must conceal your research identity, you need to consider the following. Because deceiving people is unethical, deception within social research needs to be justified by compelling scientific or administrative concerns. Even then, the justification will be arguable.

Sometimes researchers admit that they're doing research but fudge about why they're doing it or for whom. Suppose you've been asked by a public welfare agency to conduct a study of living standards among aid recipients. Even if the agency is looking for ways of improving conditions, the recipient-subjects are likely to fear a witch hunt for "cheaters." They might be tempted, therefore, to give answers that make them seem more destitute than they really are. Unless they provide truthful answers, however, the study will not produce accurate data that will contribute to an improvement of living conditions. What do you do? One solution would be to tell subjects that you're conducting the study as part of a university research program, concealing your affiliation with the welfare agency. Although doing that improves the scientific quality of the study, it raises serious ethical questions.

Lying about research purposes is common in laboratory experiments. Although it's difficult to conceal that you're conducting research, it's usually simple, and sometimes appropriate, to conceal your purpose. Many experiments in social psychology, for example, test the extent to which subjects will abandon the evidence of their own observations in favor of the views expressed by others. Recall Figure 2-1 (p. 40), which shows the stimulus from the classic Asch experiment, frequently replicated by psychology classes, in which subjects are shown three lines of differing lengths (A, B, and C) and asked to compare them with a fourth line (X). Subjects are then asked, "Which of the first three lines is the same length as the fourth?"

You'd probably find it a fairly simple task to identify "B" as the correct answer. Your job would be complicated, however, by the fact that several other "subjects" sitting beside you all agree that A is the same length as X! In reality, of course, the others in the experiment are all confederates of the researcher, told to agree on the wrong answer. As we saw in Chapter 2, the purpose of the experiment is to see whether you'd give up your own judgment in favor of the group agreement. I think you can see that conformity is a useful phenomenon to study and understand, and it couldn't be studied experimentally without deceiving the subjects. We'll examine a similar situation in the discussion of a famous experiment by Stanley Milgram later in this chapter. The question is, how do we get around the ethical issue that deception is necessary for an experiment to work?

One appropriate solution researchers have found is to debrief subjects following an experiment. Debriefing entails interviews to discover any problems generated by the research experience so that those problems can be corrected. Even

debriefing Interviewing subjects to learn about their experience of participation in the project. Especially important if there's a possibility that they have been damaged by that participation.



Department of Sociology, Illinois State University

When studying any form of human behavior, ethical concerns are paramount. This statement may be even truer for studies of human sexuality because of the topic's highly personal, salient, and perhaps threatening nature. Concern has been expressed by the public and by legislators about human sexuality research. Three commonly discussed ethical criteria have been related specifically to research in the area of human sexuality.

Informed Consent This criterion emphasizes the importance of both accurately informing your subject or respondent as to the nature of the research and obtaining his or her verbal or written consent to participate. Coercion is not to be used to force participation, and subjects may terminate their involvement in the research at any time. There are many possible violations of this standard. Misrepresentation or deception may be used when describing an embarrassing or personal topic of study, because the researchers fear high rates of refusal or false data. Covert research, such as some observational studies, also violates the informed consent standard since subjects are unaware that they are being studied. Informed consent may create special problems with certain populations. For example, studies of the sexuality of children are limited by the concern that children may be cognitively and emotionally unable to give informed consent. Although there can be problems such as those discussed, most research is clearly voluntary, with informed consent from those participating.

Right to Privacy Given the highly personal nature of sexuality and society's tremendous concern with social control of sexuality, the right to privacy is a very important ethical concern for research in this area. Individuals may risk losing their jobs, having family difficulties, or

though subjects can't be told the true purpose of the study prior to their participation in it, there's usually no reason they can't know afterward. Telling them the truth afterward may make up for having to lie to them at the outset. This must be done with care, however, making sure the subjects aren't left with bad feelings or doubts about themselves based on their performance in the experiment. If this seems complicated, it's simply the price we pay for using other people's lives as the subject matter for our research.


being ostracized by peers if certain facets of their sexual lives are revealed. This is especially true for individuals involved in sexual behavior categorized as deviant (such as transvestism). Violations of right to privacy occur when researchers identify members of certain groups they have studied, release or share an individual's data or responses, or covertly observe sexual behavior. In most cases, right to privacy is easily maintained by the researchers. In survey research, self-administered questionnaires can be anonymous and interviews can be kept confidential. In case and observational studies, the identity of the person or group studied can be disguised in any publications. In most research methods, analysis and reporting of data should be at the group or aggregate level.

Protection from Harm Harm may include emotional or psychological distress, as well as physical harm. Potential for harm varies by research method; it is more likely in experimental studies where the researcher manipulates or does something to the subject than in observational or survey research. Emotional distress, however, is a possibility in all studies of human sexuality. Respondents may be asked questions that elicit anxiety, dredge up unpleasant memories, or cause them to evaluate themselves critically. Researchers can reduce the potential for such distress during a study by using anonymous, self-administered questionnaires or well-trained interviewers and by wording sensitive questions carefully.

All three of these ethical criteria are quite subjective. Violations are sometimes justified by arguing that risks to subjects are outweighed by benefits to society. The issue here, of course, is who makes that critical decision. Usually, such decisions are made by the researcher and often a screening committee that deals with ethical concerns. Most creative researchers have been able to follow all three ethical guidelines and still do important research.

As a social researcher, then, you have many ethical obligations to the subjects in your studies. "Ethical Issues in Research on Human Sexuality" illustrates some of the ethical questions involved in a specific research area.

Analysis and Reporting In addition to their ethical obligations to subjects, researchers have ethical obligations to their colleagues in the scientific community. These obligations

concern the analysis of data and the way the results are reported. In any rigorous study, the researcher should be more familiar than anyone else with the study's technical limitations and failures. Researchers have an obligation to make such shortcomings known to their readers, even if admitting qualifications and mistakes makes them feel foolish.

Negative findings, for example, should be reported if they are at all related to the analysis. There is an unfortunate myth in scientific reporting that only positive discoveries are worth reporting (journal editors are sometimes guilty of believing this as well). In science, however, it's often as important to know that two variables are not related as to know that they are.

Similarly, researchers must avoid the temptation to save face by describing their findings as the product of a carefully preplanned analytical strategy when that is not the case. Many findings arrive unexpectedly, even though they may seem obvious in retrospect. So an interesting relationship was uncovered by accident; so what? Embroidering such situations with descriptions of fictitious hypotheses is dishonest. It also does a disservice to less experienced researchers by leading them into thinking that all scientific inquiry is rigorously preplanned and organized.

In general, science progresses through honesty and openness; ego defenses and deception retard it. Researchers can best serve their peers, and scientific discovery as a whole, by telling the truth about all the pitfalls and problems they've experienced in a particular line of inquiry. Perhaps they'll save others from the same problems.

Institutional Review Boards The issue of research ethics in studies involving humans is now also governed by federal law. Any agency (such as a university or a hospital) wishing to receive federal research support must establish an Institutional Review Board (IRB), a panel of faculty (and possibly others) who review all research proposals involving human subjects so that they can guarantee that the subjects' rights and interests will be protected. Although the law applies specifically to federally funded research, many universities apply the same standards and procedures to all research, including that funded by nonfederal sources and even research done at no cost, such as student projects.

The chief responsibility of an IRB is to ensure that the risks faced by human participants in research are minimal. In some cases, the IRB may ask the researcher to revise the study design; in others, the IRB may refuse to approve a study. Where some minimal risks are deemed unavoidable, researchers are required to prepare an "informed consent" form that describes those risks clearly. Subjects may participate in the study only after they have read the statement and signed it as an indication that they know the risks and voluntarily accept them.

Much of the impetus for establishing IRBs had to do with medical experimentation on humans, and many social research study designs are generally regarded as exempt from IRB review. An example is an anonymous survey sent to a large sample of respondents. The guideline to be followed by IRBs, as contained in the Federal Exemption Categories (45 CFR 46.101[b]), exempts a variety of research situations:

(1) Research conducted in established or commonly accepted educational settings, involving normal educational practices, such as (i) research on regular and special education instructional strategies, or (ii) research on the effectiveness of or the comparison among instructional techniques, curricula, or classroom management methods.
(2) Research involving the use of educational tests (cognitive, diagnostic, aptitude, achievement), survey procedures, interview procedures or observation of public behavior, unless: (i) information obtained is recorded in such a manner that human subjects can be identified, directly or through identifiers linked to the subjects; and (ii) any disclosure of the human subjects' responses outside the research could reasonably place the subjects at risk of criminal or civil liability or be damaging to the subjects' financial standing, employability, or reputation.




(3) Research involving the use of educational tests (cognitive, diagnostic, aptitude, achievement), survey procedures, interview procedures, or observation of public behavior that is not exempt under paragraph (b)(2) of this section, if: (i) the human subjects are elected or appointed public officials or candidates for public office; or (ii) Federal statute(s) require(s) without exception that the confidentiality of the personally identifiable information will be maintained throughout the research and thereafter.

(4) Research involving the collection or study of existing data, documents, records, pathological specimens, or diagnostic specimens, if these sources are publicly available or if the information is recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects.

(5) Research and demonstration projects which are conducted by or subject to the approval of Department or Agency heads, and which are designed to study, evaluate, or otherwise examine: (i) public benefit or service programs; (ii) procedures for obtaining benefits or services under those programs; (iii) possible changes in or alternatives to those programs or procedures; or (iv) possible changes in methods or levels of payment for benefits or services under those programs.

(6) Taste and food quality evaluation and consumer acceptance studies, (i) if wholesome foods without additives are consumed or (ii) if a food is consumed that contains a food ingredient at or below the level and for a use found to be safe, or agricultural chemical or environmental contaminant at or below the level found to be safe, by the Food and Drug Administration or approved by the Environmental Protection Agency or the Food Safety and Inspection Service of the U.S. Department of Agriculture.

Paragraph (2) of the excerpt exempts much of the social research described in this book. Nonetheless, universities sometimes apply the law's provisions inappropriately. As chair of a university IRB, for example, I was once asked to review the letter of informed consent that was to be sent to medical insurance companies, requesting their agreement to participate in a survey that would ask which medical treatments were covered under their programs. Clearly the humans involved were not at risk in the sense anticipated by the law. In a case like that, the appropriate technique for gaining informed consent is to mail the questionnaire. If a company returns it, they've consented. If they don't, they haven't.

Other IRBs have suggested that researchers need to obtain permission before observing participants in public gatherings and events, before conducting surveys on the most mundane matters, and so forth. Christopher Shea (2000) has chronicled several such questionable applications of the law while supporting the ethical logic that originally prompted the law. Don't think that these critiques of IRBs minimize the importance of protecting human subjects. Indeed, some universities exceed the federal requirements in reasonable and responsible ways: requiring IRB review of nonfederally funded projects, for example.

Research ethics is an ever-evolving subject, because new research techniques often require revisiting old concerns. Thus, for example, the increased use of public databases for secondary research has caused some IRBs to worry whether they need to reexamine such projects as the General Social Survey every time a researcher proposes to use those data. (Most have decided this is unnecessary; see Skedsvold 2002 for a discussion of issues relating to public databases.) Similarly, the prospects for research of and through the Internet have raised ethical concerns. In November 1999, the American Association for the Advancement of Science sponsored a workshop on this subject.
The overall conclusion of the report produced by the workshop summarizes some of the primary concerns already examined in this chapter: The current ethical and legal framework for protecting human subjects rests on the principles of autonomy, beneficence, and justice.

The first principle, autonomy, requires that subjects be treated with respect as autonomous agents and affirms that those persons with diminished autonomy are entitled to special protection. In practice, this principle is reflected in the process of informed consent, in which the risks and benefits of the research are disclosed to the subject. The second principle, beneficence, involves maximizing possible benefits and good for the subject, while minimizing the amount of possible harm and risks resulting from the research. Since the fruits of knowledge can come at a cost to those participating in research, the last principle, justice, seeks a fair distribution of the burdens and benefits associated with research, so that certain individuals or groups do not bear disproportionate risks while others reap the benefits. (Frankel and Siang 1999: 2-3)

Professional Codes of Ethics Ethical issues in social research are both important and ambiguous. For this reason, most of the professional associations of social researchers have created and published formal codes of conduct describing what is considered acceptable and unacceptable professional behavior. As one example, Figure 3-1 presents the code of conduct of the American Association for Public Opinion Research (AAPOR), an interdisciplinary research association in the social sciences. Most professional associations have such codes of ethics. See, for example, the American Sociological Association, the American Psychological Association, the American Political Science Association, and so forth. You can find many of these on the associations' websites. In addition, the Association of Internet Researchers (AoIR) has a code of ethics accessible online.

Two Ethical Controversies As you may already have guessed, the adoption and publication of professional codes of conduct have not totally resolved the issue of research ethics. Social researchers still disagree on some general principles, and those who agree in principle often debate specifics. This section briefly describes two research projects that have provoked ethical controversy and discussion. The first project studied homosexual behavior in public restrooms, and the second examined obedience in a laboratory setting.

Trouble in the Tearoom As a graduate student, Laud Humphreys became interested in the study of homosexual behavior. He developed a special interest in the casual and fleeting same-sex acts engaged in by some male nonhomosexuals. In particular, his research interest focused on homosexual acts between strangers meeting in the public restrooms in parks, called "tearooms" among homosexuals. The result was the publication in 1970 of Tearoom Trade.

What particularly interested Humphreys about the tearoom activity was that the participants seemed otherwise to live conventional lives as "family men" and accepted members of the community. They did nothing else that might qualify them as homosexuals. Thus, it was important to them that they remain anonymous in their tearoom visits. How would you study something like that?

Humphreys decided to take advantage of the social structure of the situation. Typically, the tearoom encounter involved three people: the two men actually engaging in the sexual act and a lookout, called the "watchqueen." Humphreys began showing up at public restrooms, offering to serve as watchqueen whenever it seemed appropriate. Because the watchqueen's payoff was the chance to watch the action, Humphreys was able to conduct field observations as he would in a study of political rallies or jaywalking behavior at intersections.

To round out his understanding of the tearoom trade, Humphreys needed to know something more about the people who participated. Because the men probably would not have been thrilled about being interviewed, Humphreys developed a different solution. Whenever possible, he noted the license numbers of participants' cars and tracked down their names and addresses through the police. Humphreys then visited the men at their




We, the members of the American Association for Public Opinion Research, subscribe to the principles expressed in the following code. Our goal is to support sound practice in the profession of public opinion research. (By public opinion research we mean studies in which the principal source of information about individual beliefs, preferences, and behavior is a report given by the individual himself or herself.) We pledge ourselves to maintain high standards of scientific competence and integrity in our work, and in our relations both with our clients and with the general public. We further pledge ourselves to reject all tasks or assignments which would be inconsistent with the principles of this code. THE CODE

I. Principles of Professional Practice in the Conduct of Our Work A. We shall exercise due care in gathering and processing data, taking all reasonable steps to assure the accuracy of results. B. We shall exercise due care in the development of research designs and in the analysis of data. 1. We shall employ only research tools and methods of analysis which, in our professional judgment, are well suited to the research problem at hand. 2. We shall not select research tools and methods of analysis because of their special capacity to yield a desired conclusion. 3. We shall not knowingly make interpretations of research results, nor shall we tacitly permit interpretations, which are inconsistent with the data available. 4. We shall not knowingly imply that interpretations should be accorded greater confidence than the data actually warrant. C. We shall describe our findings and methods accurately and in appropriate detail in all research reports.

II. Principles of Professional Responsibility in Our Dealings with People A. The Public: 1. We shall cooperate with legally authorized representatives of the public by describing the methods used in our studies. 2. We shall maintain the right to approve the release of our findings whether or not ascribed to us. When misinterpretation appears, we shall publicly disclose what is required to correct it, notwithstanding our obligation for client confidentiality in all other respects. B. Clients or Sponsors: 1. We shall hold confidential all information obtained about the client's general business affairs and about the findings of research conducted for the client, except when the dissemination of such information is expressly authorized. 2. We shall be mindful of the limitations of our techniques and facilities and shall accept only those research assignments which can be accomplished within these limitations. C. The Profession: 1. We shall not cite our membership in the Association as evidence of professional competence, since the association does not so certify any persons or organizations. 2. We recognize our responsibility to contribute to the science of public opinion research and to disseminate as freely as possible the ideas and findings which emerge from our research. D. The Respondent: 1. We shall not lie to survey respondents or use practices and methods which abuse, coerce, or humiliate them. 2. We shall protect the anonymity of every respondent, unless the respondent waives such anonymity for specified uses. In addition, we shall hold as privileged and confidential all information which tends to identify the respondent.

FIGURE 3-1 Code of Conduct of the American Association for Public Opinion Research Source: American Association for Public Opinion Research, By-Laws (May 1977). Used by permission. The code of conduct is currently under revision; you can download a copy of the proposed changes at http://www.aapor.org/?page=survey_methods/standards_and_best_practices/code_for_professional_ethics_and_practices.

homes, disguising himself enough to avoid recognition, and announced that he was conducting a survey. In that fashion, he collected the personal information he couldn't get in the restrooms. As you can imagine, Humphreys' research provoked considerable controversy both inside and outside the social scientific community. Some critics charged Humphreys with a gross invasion of privacy in the name of science. What men did in public restrooms was their own business. Others were mostly concerned about the deceit involved: Humphreys had lied to the participants by leading them to believe he was only a voyeur-participant. Even people who felt that the tearoom participants were fair game for observation because they used a public facility protested the follow-up survey. They felt it was unethical for Humphreys to trace the participants to their homes and to interview them under false pretenses. Still others justified Humphreys' research. The topic, they said, was worth study. It couldn't be studied any other way, and they regarded the deceit as essentially harmless, noting that Humphreys was careful not to harm his subjects by disclosing their tearoom activities. The tearoom trade controversy has never been resolved. It's still debated, and it probably always will be, because it stirs emotions and involves ethical issues people disagree about. What do you think? Was Humphreys ethical in doing what he did? Are there parts of the research that you believe were acceptable and other parts that were not? (See the discussion by Joan Sieber online at http://www.missouri.edu/~philwb/Laud.html for more on the political and ethical context of the "tearoom" research.)

Observing Human Obedience

The second illustration differs from the first in many ways. Whereas Humphreys' study involved participant observation, this study took place in the laboratory. Humphreys' study was sociological, this one psychological. And whereas Humphreys examined behavior considered by many to be deviant, the researcher in this study examined obedience and conformity.

One of the more unsettling clichés to come out of World War II was the German soldier's common excuse for atrocities: "I was only following orders." From the point of view that gave rise to this comment, any behavior-no matter how reprehensible-could be justified if someone else could be assigned responsibility for it. If a superior officer ordered a soldier to kill a baby, the fact of the order supposedly exempted the soldier from personal responsibility for the action. Although the military tribunals that tried the war crime cases did not accept this excuse, social researchers and others have recognized the extent to which this point of view pervades social life. People often seem willing to do things they know would be considered wrong, if they can claim that some higher authority ordered them to do it. Such was the pattern of justification in the My Lai tragedy of Vietnam, when U.S. soldiers killed more than 300 unarmed civilians-some of them young children-simply because their village, My Lai, was believed to be a Viet Cong stronghold. This sort of justification appears less dramatically in day-to-day civilian life. Few would disagree that this reliance on authority exists, yet Stanley Milgram's study (1963, 1965) of the topic provoked considerable controversy. To observe people's willingness to harm others when following orders, Milgram brought 40 adult men from many different walks of life into a laboratory setting designed to create the phenomenon under study. If you had been a subject in the experiment, you would have had something like the following experience. You've been informed that you and another subject are about to participate in a learning experiment. Through a draw of lots, you're assigned the job of "teacher" and your fellow subject the job of "pupil." The "pupil" is led into another room and strapped into a chair; an electrode is attached to his wrist. As the teacher,
you're seated in front of an impressive electrical control panel covered with dials, gauges, and switches. You notice that each switch has a label giving a different number of volts, ranging from 15 to 315. The switches have other labels, too, some with the ominous phrases


Chapter 3: The Ethics and Politics of Social Research

"Extreme-Intensity Shock," "Danger-Severe Shock," and "XXX." The experiment runs like this. You read a list of word pairs to the learner and then test his ability to match them up. Because you can't see him, a light on your control panel indicates his answer. Whenever the learner makes a mistake, you're instructed by the experimenter to throw one of the switches-beginning with the mildest-and administer a shock to your pupil. Through an open door between the two rooms, you hear your pupil's response to the shock. Then you read another list of word pairs and test him again. As the experiment progresses, you administer ever more intense shocks, until your pupil screams for mercy and begs for the experiment to end. You're instructed to administer the next shock anyway. After a while, your pupil begins kicking the wall between the two rooms and continues to scream. The implacable experimenter tells you to give the next shock. Finally, you read a list and ask for the pupil's answer-but there is no reply whatever, only silence from the other room. The experimenter informs you that no answer is considered an error and instructs you to administer the next higher shock. This continues up to the "XXX" shock at the end of the series. What do you suppose you really would have done when the pupil first began screaming? When he began kicking on the wall? Or when he became totally silent and gave no indication of life? You'd refuse to continue giving shocks, right? And surely the same would be true of most people. So we might think-but Milgram found otherwise. Of the first 40 adult men Milgram tested, nobody refused to continue administering the shocks until they heard the pupil begin kicking the wall between the two rooms. Of the 40, 5 did so then. Two-thirds of the subjects, 26 of the 40, continued doing as they were told through the entire series, up to and including the administration of the highest shock.
As you've probably guessed, the shocks were phony, and the "pupil" was a confederate of the experimenter. Only the "teacher" was a real subject in the experiment. As a subject, you wouldn't actually have been hurting another person, but you would have been led to think you were. The experiment


was designed to test your willingness to follow orders to the point of presumably killing someone. Milgram's experiments have been criticized both methodologically and ethically. On the ethical side, critics have particularly cited the effects of the experiment on the subjects. Many seemed to have experienced personally about as much pain as they thought they were administering to someone else. They pleaded with the experimenter to let them stop giving the shocks. They became extremely upset and nervous. Some had uncontrollable seizures. How do you feel about this research? Do you think the topic was important enough to justify such measures? Would debriefing the subjects be sufficient to ameliorate any possible harm? Can you think of other ways the researcher might have examined obedience? There is a wealth of discussion regarding the Milgram experiments on the web. Search for "Milgram experiments," "human obedience experiments," or "Stanley Milgram."

The Politics of Social Research

As I indicated earlier, both ethics and politics hinge on ideological points of view. What is unacceptable from one point of view will be acceptable from another. Although political and ethical issues are often closely intertwined, I want to distinguish between them in two ways. First, the ethics of social research deals mostly with the methods employed; political issues tend to center on the substance and use of research. Thus, for example, some critics raise ethical objections to the Milgram experiments, saying that the methods harm the subjects. A political objection would be that obedience is not a suitable topic for study, either because (1) we should not tinker with people's willingness to follow orders from higher authority or (2), from the opposite political point of view, because the results of the research could be used to make people more obedient. The second distinction between the ethical and political aspects of social research is that there are no formal codes of accepted political conduct. Although some ethical norms have political aspects-for example, specific guidelines for not

harming subjects clearly relate to Western ideas about the protection of civil liberties-no one has developed a set of political norms that all social researchers accept. The only partial exception to the lack of political norms is the generally accepted view that a researcher's personal political orientation should not interfere with or unduly influence his or her scientific research. It would be considered improper for a researcher to use shoddy techniques or to distort or lie about his or her research as a way of furthering the researcher's political views. As you can imagine, however, studies are often enough attacked for allegedly violating this norm.

Objectivity and Ideology

In Chapter 1, I suggested that social research can never be totally objective, because researchers are human and therefore necessarily subjective. As a collective enterprise, science achieves the equivalent of objectivity through intersubjectivity. That is, different scientists, having different subjective views, can and should arrive at the same results when they employ accepted research techniques. Essentially, this will happen to the extent that each can set personal values and views aside for the duration of the research. The classic statement on objectivity and neutrality in social science is Max Weber's lecture "Science as a Vocation" ([1925] 1946). In this talk, Weber coined the phrase value-free sociology and urged that sociology, like other sciences, needed to be unencumbered by personal values if it was to make a special contribution to society. Liberals and conservatives alike could recognize the "facts" of social science, regardless of how those facts accorded with their personal politics. Most social researchers have agreed with this abstract ideal, but not all. Marxist and neo-Marxist scholars, for example, have argued that social science and social action cannot and should not be separated. Explanations of the status quo in society, they contend, shade subtly into defenses of that same status quo. Simple explanations of the social functions of, say, discrimination can easily become justifications for its continuance. By the same token, merely studying society and its ills without a


commitment to making society more humane has been called irresponsible. In Chapter 10, we'll examine participatory action research, which is explicitly committed to using social research for purposes designed and valued by the subjects of the research. Thus, for example, researchers committed to improving the working conditions for workers at a factory would ask the workers to define the outcomes they would like to see and to have a hand in conducting social research relevant to achieving the desired ends. The role of the researchers is to ensure that the workers have access to professional research methods. Quite aside from abstract disagreements about whether social science can or should be value-free, many have argued about whether particular research undertakings are value-free or whether they represent an intrusion of the researcher's own political values. Typically, researchers have denied such intrusion, and their denials have then been challenged. Let's look at some examples of the controversies this issue has produced.

Social Research and Race

Nowhere have social research and politics been more controversially intertwined than in the area of racial relations. Social researchers studied the topic for a long time, and the products of the social research have often found their way into practical politics. A few brief references should illustrate the point. In 1896, when the U.S. Supreme Court established the principle of "separate but equal" as a means of reconciling the Fourteenth Amendment's guarantee of equality to African Americans with the norms of segregation, it neither asked for nor cited social research. Nonetheless, it is widely believed that the Court was influenced by the writings of William Graham Sumner, a leading social scientist of his era. Sumner was noted for his view that the mores and folkways of a society were relatively impervious to legislation and social planning. His view has often been paraphrased as "stateways do not make folkways." Thus, the Court ruled that it could not accept the assumption that "social prejudices may be overcome by legislation" and denied the wisdom of "laws which conflict with the



general sentiment of the community" (Blaustein and Zangrando 1970: 308). As many a politician has said, "You can't legislate morality." When the doctrine of "separate but equal" was overturned in 1954 (Brown v. Board of Education), the new Supreme Court decision was based in part on the conclusion that segregation had a detrimental effect on African American children. In drawing that conclusion, the Court cited several sociological and psychological research reports (Blaustein and Zangrando 1970). For the most part, social researchers in this century have supported the cause of African American equality in the United States, and their convictions often have been the impetus for their research. Moreover, they've hoped that their research will lead to social change. There is no doubt, for example, that Gunnar Myrdal's classic two-volume study (1944) of race relations in the United States had a significant impact on the topic of his research. Myrdal amassed a great deal of data to show that the position of African Americans directly contradicted U.S. values of social and political equality. Further, Myrdal did not attempt to hide his own point of view in the matter. (You can pursue Myrdal's landmark research further online by searching for "Gunnar Myrdal" or "An American Dilemma.") Many social researchers have become directly involved in the civil rights movement, some more radically than others. Given the broad support for ideals of equality, research conclusions supporting the cause of equality draw little or no criticism. To recognize how solid the general social science position is in this matter, we need only examine a few research projects that have produced conclusions disagreeing with the predominant ideological position. Most social researchers have, overtly at least, supported the end of school segregation.
Thus, an immediate and heated controversy arose in 1966 when James Coleman, a respected sociologist, published the results of a major national study of race and education. Contrary to general agreement, Coleman found little difference in academic performance between African American students attending integrated schools and those attending segregated ones. Indeed, such obvious things as

libraries, laboratory facilities, and high expenditures per student made little difference. Instead, Coleman reported that family and neighborhood factors had the most influence on academic achievement. Coleman's findings were not well received by many of the social researchers who had been active in the civil rights movement. Some scholars criticized Coleman's work on methodological grounds, but many others objected hotly on the grounds that the findings would have segregationist political consequences. The controversy that raged around the Coleman report was reminiscent of that provoked a year earlier by Daniel Moynihan (1965) in his critical analysis of the African American family in the United States. Another example of political controversy surrounding social research in connection with race concerns IQ scores. In 1969, Arthur Jensen, a psychologist at the University of California, Berkeley, was asked to prepare an article for the Harvard Educational Review examining the data on racial differences in IQ test results (Jensen 1969). In the article, Jensen concluded that genetic differences between African Americans and whites accounted for the lower average IQ scores of African Americans. Jensen became so identified with that position that he appeared on college campuses across the country discussing it. Jensen's research has been attacked on numerous methodological bases. Critics charged that much of the data on which Jensen's conclusion was based were inadequate and sloppy-there are many IQ tests, some worse than others. Similarly, it was argued that Jensen had not taken social-environmental factors sufficiently into account. Other social researchers raised still other methodological objections. Beyond the scientific critique, however, many condemned Jensen as a racist. Hostile crowds booed him, drowning out his public presentations.
Ironically, Jensen's reception by several university audiences did not differ significantly from the reception received by abolitionists a century before, when the prevailing opinion favored leaving the institution of slavery intact. Many social researchers limited their objections to the Moynihan, Coleman, and Jensen research to

scientific, methodological grounds. The political firestorms ignited by these studies, however, point out how ideology often shows up in matters of social research. Although the abstract model of science is divorced from ideology, the practice of science is not. To examine a more recent version of the controversy surrounding race and achievement, search the web for differing points of view concerning "The Bell Curve"-sparked by a book with that title by Richard J. Herrnstein and Charles Murray.

The Politics of Sexual Research

As I indicated earlier, Laud Humphreys' study of the tearoom trade raised ethical issues that researchers still discuss and debate. At the same time, it seems clear that much of the furor raised by the research was related to the subject matter itself. As I have written elsewhere, Laud Humphreys didn't just study S-E-X but observed and discussed homosexuality. And it wasn't even the caring-and-committed-relationships-between-two-people-who-just-happen-to-be-of-the-same-sex homosexuality but tawdry encounters between strangers in public toilets. Only adding the sacrifice of Christian babies could have made this more inflammatory for the great majority of Americans in 1970. (Babbie 2004: 12)

Whereas Humphreys' research topic proved unusually provocative for many, much tamer sexuality research has also engendered outcries of public horror. During the 1940s and 1950s, the biologist Alfred Kinsey and his colleagues published landmark studies of the sexual practices of American men (1948) and women (1953). Kinsey's extensive interviewing allowed him to report on the frequency of sexual activity, premarital and extramarital sex, homosexual behavior, and so forth. His studies produced public outrage and efforts to close his research institute at Indiana University. Although today most people no longer get worked up about the Kinsey reports, Americans


tend to remain touchy about research on sex. In 1987, the National Institutes of Health (NIH), charged with finding ways to combat the AIDS epidemic, found they needed hard data on contemporary sexual practices if they were to design effective anti-AIDS programs. Their request for research proposals resulted in a sophisticated study design by Edward O. Laumann and colleagues. The proposed study focused on the different patterns of sexual activity characterizing different periods of life, and it received rave reviews from the NIH and their consultants. Enter Senator Jesse Helms (R-North Carolina) and Congressman William Dannemeyer (R-California). In 1989, having learned of the Laumann study, Helms and Dannemeyer began a campaign to block the study and shift the same amount of money to a teen celibacy program. Anne Fausto-Sterling, a biologist, sought to understand the opposition to the Laumann study. The surveys, Helms argued, are not really intended "to stop the spread of AIDS. The real purpose is to compile supposedly scientific facts to support the left-wing liberal argument that homosexuality is a normal, acceptable lifestyle. ... As long as I am able to stand on the floor of the U.S. Senate," he added, "I am never going to yield to that sort of thing, because it is not just another life-style; it is sodomy." (Fausto-Sterling 1992)

Helms won a 66-34 vote in favor of his amendment in the U.S. Senate. Although the House of Representatives rejected the amendment, and it was dropped in conference committee, government funding for the study was put on hold. Laumann and his colleagues then turned to the private sector and obtained funding, albeit for a smaller study, from private foundations. Their research results were published in 1994 as The Social Organization of Sexuality.

Politics and the Census

There is probably a political dimension to every attempt to study human social behavior. Consider the matter of the U.S. decennial census, mandated



by the Constitution. The original purpose was to discover the population sizes of the various states to determine their proper representation in the House of Representatives. Whereas each state gets two Senators, large states get more Representatives than small ones do. So what could be simpler? Just count the number of people in each state. From the beginning, there was nothing simple about counting heads in a dispersed, national population like the United States. Even the definition of a "person" was anything but straightforward. A slave, for example, counted as only three-fifths of a person for purposes of the census. This decreased the representation of the slave-holding Southern states, though counting slaves as whole people might have raised the dangerously radical idea that they should be allowed to vote. Further, the logistical problems of counting people who reside in suburban tract houses, urban apartments, college dorms, military barracks, farms, cabins in the woods, and illegal housing units, as well as counting those who have no place to live, have always presented a daunting task. It's the sort of challenge social researchers tackle with relish. However, the difficulty of finding the hard-to-reach and the techniques created for doing so cannot escape the political net. Kenneth Prewitt, who directed the Census Bureau from 1998 to 2001, describes some of the political aspects of counting heads: Between 1910 and 1920, there was a massive wartime population movement from the rural, Southern states to industrial Northern cities. In 1920, for the first time in American history, the census included more city dwellers than rural residents. An urban America was something new and disturbing, especially to those who held to the Jeffersonian belief that independent farmers best protected democracy. Among those of this persuasion were rural, conservative congressmen in the South and West.
They saw that reapportionment would shift power to factory-based unions and politically radical immigrants concentrated in Northeastern cities. Conservatives in Congress blocked reapportionment, complaining among other things that because


January 1 was then census day, transient agricultural workers were "incorrectly" counted in cities rather than on the farms to which they would return in time for spring planting. (Census day was later shifted to April 1, where it has remained.) The arguments dragged out for a decade, and Congress was not reapportioned until after the next census. (Prewitt 2003)

In more recent years, concern for undercounting the urban poor has become a political issue. The big cities, which have the most to lose from the undercounting, typically vote Democratic rather than Republican, so you can probably guess which party supports efforts to improve the counting and which party is less enthusiastic. By the same token, when social scientists have argued in favor of replacing the attempt at a total enumeration of the population with modern survey sampling methods (see Chapter 7 for more on sampling), they have enjoyed more support from Democrats, who would stand to gain from such a methodological shift, than from Republicans, who would stand to lose. Rather than suggesting Democrats support science more than Republicans do, this situation offers another example of how the political context in which we live and conduct social research often affects that research.

Politics with a Little "p"

Social research is often confounded by political ideologies, but the "politics" of social research runs far deeper still. Social research in relation to contested social issues simply cannot remain antiseptically objective-particularly when differing ideologies are pitted against each other in a field of social science data. The same is true when research is invoked in disputes between people with conflicting interests. For instance, social researchers who have served as "expert witnesses" in court would probably agree that the scientific ideal of a "search for truth" seems hopelessly naive in a trial or lawsuit. Although expert witnesses technically do not represent either side in court, they are, nonetheless, engaged by only one side to appear, and their testimony tends

to support the side of the party who pays for their time. This doesn't necessarily mean that these witnesses will lie on behalf of their patrons, but the contenders in a lawsuit are understandably more likely to pay for expert testimony that supports their case than for testimony that attacks it. Thus, as an expert witness, you appear in court only because your presumably scientific and honest judgment happens to coincide with the interests of the party paying you to testify. Once you arrive in court and swear to tell the truth, the whole truth, and nothing but the truth, however, you find yourself in a world foreign to the ideals of objective contemplation. Suddenly, the norms are those of winning and losing. As an expert witness, of course, all you have to lose is your respectability (and perhaps the chance to earn fees as an expert witness in the future). Still, such stakes are high enough to create discomfort for most social researchers. I recall one case in federal court when I was testifying on behalf of some civil service workers who had had their cost-of-living allowance (COLA) cut on the basis of research I thought was rather shoddy. I was engaged to conduct more "scientific" research that would demonstrate the injustice worked against the civil servants (Babbie 1982: 232-43). I took the stand, feeling pretty much like a respected professor and textbook author. In short order, however, I found I had moved from the academy to the hockey rink. Tests of statistical significance and sampling error were suddenly less relevant than a slap shot. At one point, an attorney from Washington lured me into casually agreeing that I was familiar with a certain professional journal. Unfortunately, the journal did not exist. I was mortified and suddenly found myself shifting domains. Without really thinking about it, I now was less committed to being a friendly Mr. Chips and more aligned with ninja-professor. 
I would not be fully satisfied until I, in turn, could mortify the attorney, which I succeeded in doing. Even though the civil servants got their cost-of-living allowance back, I have to admit I was also concerned with how I looked in front of the courtroom assemblage. I tell you this anecdote to illustrate the personal "politics" of human interactions involving presumably scientific and objective research. We need to realize that as human beings


social researchers are going to act like human beings, and we must take this into account in assessing their findings. This recognition does not invalidate their research or provide an excuse for rejecting findings we happen to dislike, but it does need to be taken into account.

Politics in Perspective

Although the ethical and the political dimensions of research are in principle distinct, they do intersect. Whenever politicians or the public feel that social research is violating ethical or moral standards, they'll be quick to respond with remedies of their own. Moreover, the standards they defend may not be those of the research community. Even when researchers support the goals of measures directed at the way research is done, the means specified by regulations or legislation can hamstring research. Legislators show special concern for research on children. Although the social research norms discussed in this chapter would guard against bringing any physical or emotional harm to children, some of the restrictive legislation introduced from time to time borders on the actions of one particular western city, which shall remain nameless. In response to concerns that a public school teacher had been playing New Age music in class and encouraging students to meditate, the city council passed legislation stating that no teacher could do anything that would "affect the minds of students"! I hope you take away four main lessons from this discussion. First, science is not untouched by politics. The intrusion of politics and related ideologies is not unique to social research; the natural sciences have experienced and continue to experience similar situations. But social science, in particular, is a part of social life. Social researchers study things that matter to people-things they have firm, personal feelings about and things that affect their lives. Moreover, researchers are human beings, and their feelings often show through in their professional lives. To think otherwise would be naive. Second, science does proceed in the midst of political controversy and hostility. Even when researchers get angry and call each other names, or

80 " Chapter 3: The Ethics and Politics of Sodal Research when the research community comes under attack from the outside, scientific inquiry persists" Studies are done, reports are published, and new things are learned" In short, ideological disputes do not bring science to a halt, but they do make it more challenging-and exciting" Third, an awareness of ideological considerations enriches the study and practice of social research methods. Many of the established characteristics of science, such as intersubjectivity, function to cancel out or hold in check our human shortcomings, especially those we are unaware of. Otherwise, we might look into the world and never see anything but a reflection of our personal biases and beliefs. Finally, whereas researchers should not let their own values interfere with the quality and honesty of their research, this does not mean that researchers cannot or should not participate in public debates and express both their scientific expertise and personal values. You can do scientifically excellent research on racial prejudice, all the while being opposed to prejudice and saying so. Some would argue that social scientists, because of their scientific expertise in the workings of society, have an obligation to speak out, rather than leaving that role to politicians, journalists, and talk-show hosts. Herbert Gans (2002) writes of the need for "public sociologists": A public sociologist is a public intellectual who applies sociological ideas and findings to social (defined broadly) issues about which sociology (also defined broaclly) has something to say Public intellectuals comment on whatever issues show up on the public agenda; public sociologists do so only on issues to which they can apply their sociological insights and findings.

Introduction

In addition to technical, scientific considerations, social research projects are likely to be shaped by administrative, ethical, and political considerations.

Ethical Issues in Social Research

What is ethical and unethical in research is ultimately a matter of what a community of people agree is right and wrong.

Researchers agree that participation in research should normally be voluntary. This norm, however, can conflict with the scientific need for generalizability.

Researchers agree that research should not harm those who participate in it, unless they give their informed consent, thereby willingly and knowingly accepting the risks of harm.

Whereas anonymity refers to the situation in which even the researcher cannot identify specific information with the individuals it describes, confidentiality refers to the situation in which the researcher promises to keep information about subjects private. The most straightforward way to ensure confidentiality is to destroy identifying information as soon as it's no longer needed.

Many research designs involve a greater or lesser degree of deception of subjects. Because deceiving people violates common standards of ethical behavior, deception in research requires a strong justification-and even then the justification may be challenged.

Social researchers have ethical obligations to the community of researchers as well as to subjects. These obligations include reporting results fully and accurately as well as disclosing errors, limitations, and other shortcomings in the research. Professional associations in several disciplines publish codes of ethics to guide researchers. These codes are necessary and helpful, but they do not resolve all ethical questions.

Two Ethical Controversies

Laud Humphreys' study of "tearoom" encounters and Stanley Milgram's study of obedience raise ethical issues that are debated to this day.

The Politics of Social Research

Social research inevitably has a political and ideological dimension. Although science is neutral on political matters, scientists are not. Moreover, much social research inevitably involves the political beliefs of people outside the research community.

Although most researchers agree that political orientation should not unduly influence research, in practice separating politics and ideology from the conduct of research can be quite difficult. Some researchers maintain that research can and should be an instrument of social action and change. More subtly, a shared ideology can affect the way other researchers receive one's research.

Even though the norms of science cannot force individual researchers to give up their personal values, the intersubjective character of science provides a guard against scientific findings being the product of bias only.

KEY TERMS

The following terms are defined in context in the chapter and at the bottom of the page where the term is introduced, as well as in the comprehensive glossary at the back of the book.

anonymity, confidentiality, debriefing, informed consent

REVIEW QUESTIONS AND EXERCISES

1. Consider the following real and hypothetical research situations. What is the ethical component in each example? How do you feel about it? Do you think the procedures described are ultimately acceptable or unacceptable? You might find it useful to discuss some of these situations with classmates.

a. A psychology instructor asks students in an introductory psychology class to complete questionnaires that the instructor will analyze and use in preparing a journal article for publication.

b. After a field study of deviant behavior during a riot, law enforcement officials demand that the researcher identify those people who were observed looting. Rather than risk arrest as an accomplice after the fact, the researcher complies.

c. After completing the final draft of a book reporting a research project, the researcher-author discovers that 25 of the 2,000 survey interviews were falsified by interviewers. To protect the bulk of the research, the author leaves out this information and publishes the book.

d. Researchers obtain a list of right-wing radicals they wish to study. They contact the radicals with the explanation that each has been selected "at random" from among the general population to take a sampling of "public opinion."

e. A college instructor who wants to test the effect of unfair berating administers an hour exam to both sections of a specific course. The overall performance of the two sections is essentially the same. The grades of one section are artificially lowered, however, and the instructor berates the students for performing so badly. The instructor then administers the same final exam to both sections and discovers that the performance of the unfairly berated section is worse. The hypothesis is confirmed, and the research report is published.

f. In a study of sexual behavior, the investigator wants to overcome subjects' reluctance to report what they might regard as shameful behavior. To get past their reluctance, subjects are asked, "Everyone masturbates now and then; about how much do you masturbate?"

g. A researcher studying dorm life on campus discovers that 60 percent of the residents regularly violate restrictions on alcohol consumption. Publication of this finding would probably create a furor in the campus community. Because no extensive analysis of alcohol use is planned, the researcher decides to keep this finding quiet.

h. To test the extent to which people may try to save face by expressing attitudes on matters they are wholly uninformed about, the researcher asks for their attitudes regarding a fictitious issue.

i. A research questionnaire is circulated among students as part of their university registration packet. Although students are not told they must complete the questionnaire, the hope is that they will believe they must-thus ensuring a higher completion rate.

j. A researcher pretends to join a radical political group in order to study it and is successfully accepted as a member of the inner planning circle. What should the researcher do if the group makes plans for the following?
(1) A peaceful, though illegal, demonstration
(2) The bombing of a public building during a time it is sure to be unoccupied
(3) The assassination of a public official

2. Review the discussion of the Milgram experiment on obedience. How would you design a study to accomplish the same purpose while avoiding the ethical criticisms leveled at Milgram? Would your design be equally valid? Would it have the same effect?

3. Suppose a researcher who is personally in favor of small families-as a response to the problem of overpopulation-wants to conduct a survey to determine why some people want many children and others don't. What personal-involvement problems would the researcher face, and how could she or he avoid them? What ethical issues should the researcher take into account in designing the survey?

4. Using InfoTrac College Edition, search for "informed consent" and then narrow your search to "research." Skim the resulting articles and begin to identify groups of people for whom informed consent may be problematic-people who may not be able to give it. Suggest some ways in which the problem might be overcome.

ADDITIONAL READINGS

Hamnett, Michael E., Douglas J. Porter, Amarjit Singh, and Krishna Kumar. 1984. Ethics, Politics, and International Social Science Research. Honolulu: University of Hawaii Press. Discussions of research ethics typically focus on the interests of the individual participants in research projects, but this book raises the level of the discussion to include the rights of whole societies.

Homan, Roger. 1991. The Ethics of Social Research. London: Longman. A thoughtful analysis of the ethical issues of social science research, by a practicing British social researcher.

Lee, Raymond. 1993. Doing Research on Sensitive Topics. Newbury Park, CA: Sage. This book examines the conflicts between scientific research needs and the rights of the people involved-with guidelines for dealing with such conflicts.

Sweet, Stephen. 1999. "Using a Mock Institutional Review Board to Teach Ethics in Sociological Research." Teaching Sociology 27 (January): 55-59. Though written for professors, this article provides some research examples that challenge your ethical instincts.

SPSS EXERCISES

See the booklet that accompanies your text for exercises using SPSS (Statistical Package for the Social Sciences). There are exercises offered for each chapter, and you'll also find a detailed primer on using SPSS.

Online Study Resources

SociologyNow: Research Methods

1. Before you do your final review of the chapter, take the SociologyNow: Research Methods diagnostic quiz to help identify the areas on which you should concentrate. You'll find information on this online tool, as well as instructions on how to access all of its great resources, in the front of the book.

2. As you review, take advantage of the SociologyNow: Research Methods customized study plan, based on your quiz results. Use this study plan with its interactive exercises and other resources to master the material.

3. When you're finished with your review, take the posttest to confirm that you're ready to move on to the next chapter.

WEBSITE FOR THE PRACTICE OF SOCIAL RESEARCH 11TH EDITION

Go to your book's website at http://sociology.wadsworth.com/babbie_practice11e for tools to aid you in studying for your exams. You'll find Tutorial Quizzes with feedback, Internet Exercises, Flashcards, and

Chapter Tutorials, as well as Extended Projects, InfoTrac College Edition search terms, Social Research in Cyberspace, GSS Data, Web Links, and primers for using various data-analysis software such as SPSS and NVivo.

WEB LINKS FOR THIS CHAPTER

Please realize that the Internet is an evolving entity, subject to change. Nevertheless, these few websites should be fairly stable. Also, check your book's website for even more Web Links.

American Sociological Association, Code of Ethics
http://www.asanet.org/page.ww?section=Ethics&name=Ethics
Most professional associations have codes of ethics intended to guide the activities of their members. This one is a good illustration of the genre.


Department of Health and Human Services, Protection of Human Subjects (45 CFR Part 46)
http://www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.htm
Here is the primary federal regulation governing the treatment of human subjects and providing a basis for the actions of institutional review boards.

Research Council of Norway, Guidelines for Research Ethics in the Social Sciences, Law and the Humanities (NESH)
This report, by the National Committee for Research Ethics in the Social Sciences and the Humanities, provides an in-depth examination of the topic.


Posing problems properly is often more difficult than answering them. Indeed, a properly phrased question often seems to answer itself. You may have discovered the answer to a question just in the process of making the question clear to someone else.

Part 2 deals with what should be observed; that is, Part 2 considers the posing of proper scientific questions, the structuring of inquiry. Part 3 will describe some of the specific methods of social scientific observation.

Chapter 4 addresses the beginnings of research. It examines some of the purposes of inquiry, units of analysis, and the reasons scientists get involved in research projects.

Chapter 5 deals with the specification of what it is you want to measure-the processes of conceptualization and operationalization. It looks at some of the terms that you and I use quite casually in everyday life-prejudice, liberalism, happiness, and so forth-and shows how essential it is to clarify what we really mean by such terms when we do research. This process of clarification is called conceptualization. Once we clarify what we mean by certain terms, we can then measure the referents of those terms. The process of devising steps or operations for measuring what we want to study is called operationalization. Chapter 5 deals with the topic of operationalization in general, paying special attention to the framing of questions for interviews and questionnaires.

To complete the introduction to measurement, Chapter 6 breaks with the chronological discussion of how research is conducted. In this chapter, we'll examine techniques for measuring variables in quantitative research through the combination of several indicators: indexes, scales, and typologies. As an example, we might ask survey respondents five different questions about their attitudes toward gender equality and then combine the answers to all five questions into a

composite measure of gender-based egalitarianism. Although such composite measures are constructed during the analysis of data (see Part 4), the raw materials for them must be provided for in the design and execution of data collection.

Finally, we'll look at how social researchers select people or things for observation. Chapter 7, on sampling, addresses the fundamental scientific issue of generalizability. As you'll see, we can select a few people or things for observation and then apply what we observe to a much larger group. For example, by surveying 2,000 U.S. citizens about whom they favor for president of the United States, we can accurately predict how tens of millions will vote. In this chapter, we'll examine techniques that increase the generalizability of what we observe.

What you learn in Part 2 will bring you to the verge of making controlled social scientific observations. Part 3 will then show you how to take that next step.
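The composite-measure idea described above (combining answers to five gender-equality items into a single score) can be sketched in a few lines of code. This is a minimal illustration only; the 1-5 agreement coding and the averaging-and-rescaling rule are assumptions for the sketch, not taken from any actual survey instrument.

```python
# Hypothetical sketch: combining five Likert-type survey items into a
# composite measure of gender-based egalitarianism. The 1-5 coding and
# the 0-100 rescaling are illustrative assumptions.

def egalitarianism_index(responses):
    """Average five items coded 1 (disagree) to 5 (agree) into a 0-100 score."""
    if len(responses) != 5:
        raise ValueError("expected answers to all five items")
    if not all(1 <= r <= 5 for r in responses):
        raise ValueError("items are coded 1 (disagree) to 5 (agree)")
    mean = sum(responses) / 5
    # Rescale the 1-5 mean onto a 0-100 composite score.
    return round((mean - 1) / 4 * 100)

# One respondent who mostly agrees with egalitarian statements:
print(egalitarianism_index([5, 4, 5, 4, 3]))  # → 80
```

The point of the sketch is simply that a composite measure reduces several indicators to one number per respondent, which can then be analyzed like any other variable.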



Research Design

Introduction
Three Purposes of Research
  Exploration
  Description
  Explanation
The Logic of Nomothetic Explanation
  Criteria for Nomothetic Causality
  False Criteria for Nomothetic Causality
  Necessary and Sufficient Causes
Units of Analysis
  Individuals
  Groups
  Organizations
  Social Interactions
  Social Artifacts
  Units of Analysis in Review
  Faulty Reasoning about Units of Analysis: The Ecological Fallacy and Reductionism
The Time Dimension
  Cross-Sectional Studies
  Longitudinal Studies
  Approximating Longitudinal Studies
  Examples of Research Strategies
How to Design a Research Project
  Getting Started
  Conceptualization
  Choice of Research Method
  Operationalization
  Population and Sampling
  Observations
  Data Processing
  Analysis
  Application
  Research Design in Review
The Research Proposal
  Elements of a Research Proposal

SociologyNow: Research Methods

Use this online tool to help you make the grade on your next exam. After reading this chapter, go to the "Online Study Resources" at the end of the chapter for instructions on how to benefit from SociologyNow: Research Methods.

Science is an enterprise dedicated to "finding out." No matter what you want to find out, though, there will likely be a great many ways of doing it. That's true in life generally. Suppose, for example, that you want to find out whether a particular automobile-say, the new Burpo-Blasto-would be a good car for you. You could, of course, buy one and find out that way. Or you could talk to a lot of B-B owners or to people who considered buying one but didn't. You might check the classified ads to see if there are a lot of B-Bs being sold cheap. You could read a consumer magazine evaluation of Burpo-Blastos.

A similar situation occurs in scientific inquiry. Ultimately, scientific inquiry comes down to making observations and interpreting what you've observed, the subjects of Parts 3 and 4 of this book. Before you can observe and analyze, however, you need a plan. You need to determine what you're going to observe and analyze: why and how. That's what research design is all about.

Although the details vary according to what you wish to study, you face two major tasks in any research design. First, you must specify as clearly as possible what you want to find out. Second, you must determine the best way to do it. Interestingly, if you can handle the first consideration fully, you'll probably handle the second in the same process. As mathematicians say, a properly framed question contains the answer.

Let's say you're interested in conducting social research on terrorism. When Jeffrey Ross (2004) addressed this issue, he found the existing studies used a variety of qualitative and quantitative approaches. Qualitative researchers, for example, generated original data through

Autobiographies
Incident Reports and Accounts
Hostages' Experiences with Terrorists
Firsthand Accounts of Implementing Policies

Ross goes on to discuss some of the secondary materials used by qualitative researchers: "biographies


of terrorists, case studies of terrorist organizations, case studies on types of terrorism, case studies on particular terrorist incidents, and case studies of terrorism in selected regions and countries" (2004: 27). Quantitative researchers, on the other hand, addressed terrorism in a variety of ways, including analyses of media coverage, statistical modeling of terrorist events, and the use of various databases relevant to the topic. As you'll see in this chapter, any research topic can be approached from many different directions.

This chapter provides a general introduction to research design, whereas the other chapters in Part 2 elaborate on specific aspects of it. In practice, all aspects of research design are interrelated. As you read through Part 2, the interrelationships among parts will become clearer.

We'll start by briefly examining the main purposes of social research. Then, we'll consider units of analysis-the what or whom you want to study. Next we'll consider ways of handling time in social research, or how to study a moving target that changes over time. With these ideas in hand, we'll turn to how to design a research project. This overview of the research process serves two purposes: Besides describing how you might go about designing a study, it provides a map of the remainder of this book. Finally, we'll look at the elements of research proposals. Often, the actual conduct of research needs to be preceded by a detailing of your intentions-to obtain funding for a major project or perhaps to get your instructor's approval for a class project. You'll see that the research proposal provides an excellent opportunity for you to consider all aspects of your research in advance.

Three Purposes of Research

Social research can serve many purposes. Three of the most common and useful purposes are exploration, description, and explanation. Although a given study can have more than one of these purposes-and most do-examining them


separately is useful because each has different implications for other aspects of research design.

Exploration

Much of social research is conducted to explore a topic, that is, to start to familiarize a researcher with that topic. This approach typically occurs when a researcher examines a new interest or when the subject of study itself is relatively new.

As an example, let's suppose that widespread taxpayer dissatisfaction with the government erupts into a taxpayers' revolt. People begin refusing to pay their taxes, and they organize themselves around that issue. You might like to learn more about the movement: How widespread is it? What levels and degrees of support are there within the community? How is the movement organized? What kinds of people are active in it? An exploratory study could help you find at least approximate answers to some of these questions. You might check figures with tax-collecting officials, collect and study the literature of the movement, attend meetings, and interview leaders.

Exploratory studies are also appropriate for more persistent phenomena. Suppose you're unhappy with your college's graduation requirements and want to help change them. You might study the history of such requirements at the college and meet with college officials to learn the reasons for the current standards. You could talk to several students to get a rough idea of their sentiments on the subject. Though this last activity would not necessarily yield an accurate picture of student opinion, it could suggest what the results of a more extensive study might be.

Sometimes exploratory research is pursued through the use of focus groups, or guided small-group discussions. This technique is frequently used in market research; we'll examine it further in Chapter 10.

Exploratory studies are most typically done for three purposes: (1) to satisfy the researcher's curiosity and desire for better understanding, (2) to test the feasibility of undertaking a more extensive study, and (3) to develop the methods to be employed in any subsequent study.

A while back, for example, I became aware of the growing popularity of something called "channeling," in which a person known as a channel or medium enters a trance state and begins speaking with a voice that claims it originates outside the channel. Some of the voices say they come from a spirit world of the dead, some say they are from other planets, and still others say they exist in dimensions of reality difficult to explain in ordinary human terms. The channeled voices, often referred to as entities, sometimes use the metaphor of radio or television for the phenomenon they represent. "When you watch the news," one told me in the course of an interview, "you don't believe Dan Rather is really inside the television set. The same is true of me. I use this medium's body the way Dan Rather uses your television set."

The idea of channeling interested me from several perspectives, not the least of which was the methodological question of how to study scientifically something that violates so much of what we take for granted, including scientific staples such as space, time, causation, and individuality. Lacking any rigorous theory or precise expectations, I merely set out to learn more. Using some of the techniques of qualitative field research discussed in Chapter 10, I began amassing information and forming categories for making sense of what I observed. I read books and articles about the phenomenon and talked to people who had attended channeling sessions. I then attended channeling sessions myself, observing those who attended as well as the channel and entity.

Next, I conducted personal interviews with numerous channels and entities. In most interviews, I began by asking the human channels questions about how they first began channeling, what it was like, and why they continued, as well as standard biographical questions. The channel would then go into a trance, whereby the interview continued with the entity speaking. "Who are you?" I might ask. "Where do you come from?" "Why are you doing this?" "How can I tell if you are real or a fake?" Although I went into these interview sessions with several questions prepared in advance, each of the interviews followed

whatever course seemed appropriate in light of the answers given.

This example of exploration illustrates where social research often begins. Whereas researchers working from deductive theories have the key variables laid out in advance, one of my first tasks was to identify some of the possibly relevant variables. For example, I noted a channel's gender, age, education, religious background, regional origins, and previous participation in things metaphysical. I chose most of these variables because they commonly affect behavior. I also noted differences in the circumstances of channeling sessions. Some channels said they must go into deep trances, some use light trances, and others remain conscious. Most sit down while channeling, but others stand and walk about. Some channels operate under pretty ordinary conditions; others seem to require metaphysical props such as dim lights, incense, and chanting. Many of these differences became apparent to me only in the course of my initial observations.

Regarding the entities, I have been interested in classifying where they say they come from. Over the course of my interviews, I've developed a set of questions about specific aspects of "reality," attempting to classify the answers they give. Similarly, I ask each to speak about future events. Over the course of this research, my examination of specific topics has become increasingly focused as I've identified variables that seem worth pursuing: gender, education, and religion, for example. Note, however, that I began with a reasonably blank slate.

Exploratory studies are quite valuable in social scientific research. They're essential whenever a researcher is breaking new ground, and they almost always yield new insights into a topic for research. Exploratory studies are also a source of grounded theory, as discussed in Chapter 1.

The chief shortcoming of exploratory studies is that they seldom provide satisfactory answers to research questions, though they can hint at the answers and can suggest which research methods could provide definitive ones. The reason exploratory studies are seldom definitive in themselves has to do with representativeness; that is, the


people you study in your exploratory research may not be typical of the larger population that interests you. Once you understand representativeness, you'll be able to know whether a given exploratory study actually answered its research problem or only pointed the way toward an answer. (Representativeness is discussed at length in Chapter 7.)

Description

A major purpose of many social scientific studies is to describe situations and events. The researcher observes and then describes what was observed. Because scientific observation is careful and deliberate, however, scientific descriptions are typically more accurate and precise than casual ones are.

The U.S. Census is an excellent example of descriptive social research. The goal of the census is to describe accurately and precisely a wide variety of characteristics of the U.S. population, as well as the populations of smaller areas such as states and counties. Other examples of descriptive studies are the computation of age-gender profiles of populations done by demographers, the computation of crime rates for different cities, and a product-marketing survey that describes the people who use, or would use, a particular product. A researcher who carefully chronicles the events that take place on a labor union picket line has, or at least serves, a descriptive purpose. A researcher who computes and reports the number of times individual legislators voted for or against organized labor also fulfills a descriptive purpose.

Many qualitative studies aim primarily at description. An anthropological ethnography, for example, may try to detail the particular culture of some preliterate society. At the same time, such studies are seldom limited to a merely descriptive purpose. Researchers usually go on to examine why the observed patterns exist and what they imply.
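Descriptive statistics like the city crime rates mentioned above typically standardize raw counts by population size so that places of different sizes can be compared. A minimal sketch of that computation follows; the city names and figures are invented for illustration, not real data.

```python
# Hypothetical sketch: expressing a city's crime count as a rate
# per 100,000 residents. All figures here are invented.

def crime_rate_per_100k(incidents, population):
    """Standardize a raw incident count by population size."""
    return round(incidents / population * 100_000, 1)

cities = {
    "City A": (1_200, 250_000),
    "City B": (1_200, 800_000),
}

# The same raw count yields very different rates: 480.0 for City A
# versus 150.0 for City B.
for name, (incidents, population) in cities.items():
    print(name, crime_rate_per_100k(incidents, population))
```

The design point is that comparing the raw counts alone would make the two cities look identical; the standardized rate is what makes the description comparable across units.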

Explanation

The third general purpose of social scientific research is to explain things. Descriptive studies answer questions of what, where, when, and



how; explanatory questions, of why. So when William Sanders (1994) set about describing the varieties of gang violence, he also wanted to reconstruct the process that brought about violent episodes among the gangs of different ethnic groups. Reporting the voting intentions of an electorate is descriptive, but reporting why some people plan to vote for Candidate A and others for Candidate B is explanatory. Identifying variables that explain why some cities have higher crime rates than others involves explanation. A researcher who sets out to know why an antiabortion demonstration ended in a violent confrontation with police, as opposed to simply describing what happened, has an explanatory purpose.

Let's look at a specific case. What factors do you suppose might shape people's attitudes toward the legalization of marijuana? To answer this, you might first consider whether men and women differ in their opinions. An explanatory analysis of the 2002 General Social Survey (GSS) data indicates that 38 percent of men and 30 percent of women said marijuana should be legalized. What about political orientation? The GSS data show that 55 percent of liberals said marijuana should be legalized, compared with 29 percent of moderates and 27 percent of conservatives. Further, 41 percent of Democrats, compared with 34 percent of Independents and 28 percent of Republicans, supported legalization. Given these statistics, you might begin to develop an explanation for attitudes toward marijuana legalization. Further study of gender and political orientation might then lead to a deeper explanation of these attitudes.
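The kind of subgroup comparison just described (percentage supporting legalization within each political orientation) is straightforward to compute. A minimal sketch follows; the six respondent records are invented for illustration and are not GSS data.

```python
# Hypothetical sketch: percent supporting legalization within each
# political-orientation subgroup. The records below are invented.

def percent_support(respondents, group):
    """Percent answering 'legal' among respondents in one subgroup."""
    members = [r for r in respondents if r["politics"] == group]
    support = sum(1 for r in members if r["marijuana"] == "legal")
    return round(100 * support / len(members))

respondents = [
    {"politics": "liberal", "marijuana": "legal"},
    {"politics": "liberal", "marijuana": "not legal"},
    {"politics": "conservative", "marijuana": "legal"},
    {"politics": "conservative", "marijuana": "not legal"},
    {"politics": "conservative", "marijuana": "not legal"},
    {"politics": "conservative", "marijuana": "not legal"},
]

print(percent_support(respondents, "liberal"))       # → 50
print(percent_support(respondents, "conservative"))  # → 25
```

A difference between the subgroup percentages is exactly the kind of observed relationship that an explanatory analysis then tries to account for.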

correlation An empirical relationship between two variables such that (1) changes in one are associated with changes in the other or (2) particular attributes of one variable are associated with particular attributes of the other. Correlation in and of itself does not constitute a causal relationship between the two variables, but it is one criterion of causality.
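The caution in the definition above (correlation alone does not establish causation) can be illustrated with a small simulation. In the sketch below, a third variable, temperature, drives both ice-cream sales and drownings, so the two outcomes correlate strongly even though neither causes the other. The scenario and all coefficients are assumptions for illustration, not real data.

```python
# Simulated spurious relationship: temperature is a common cause of
# two outcomes that do not affect each other. All numbers invented.
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)
temperature = [random.uniform(5, 35) for _ in range(200)]  # daily temps
# Each outcome depends on temperature plus its own independent noise:
ice_cream = [2.0 * t + random.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + random.gauss(0, 2) for t in temperature]

# The two outcomes correlate strongly, yet the only causal link runs
# through the third variable, temperature.
print(round(pearson(ice_cream, drownings), 2))
```

Running this prints a correlation well above zero, which is exactly why correlation is only one criterion of causality: the relationship disappears once the common cause is taken into account.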

The Logic of Nomothetic Explanation

The logic of Nomothetic Explanation The preceding examination of what factors might cause attitudes about legalizing marijuana illustrates nomothetic explanation, as discussed in Chapter L Recall that in this model, we try to find a few factors (independent variables) that can account for many of the variations in a given phenomenon, This explanatory model stands in contrast to the idiographic model, in which we seek a complete, in-depth understanding of a single case, In our example, an idiographic approach would suggest all the reasons that one person was opposed to legalization-involving what her parents, teachers, and clergy told her about it; any bad experiences experimenting vvith it; and so forth, When we understand something idiographically, we feel we really understand it. When we know all the reasons why someone opposed legalizing marijuana, we couldn't imagine that person having any other attitude, In contrast, a nomothetic approach might suggest that overall political orientations account for much of the difference of opinion about legalizing marijuana, Because this model is inherently probabilistic, it is more open than the idiographic model to misunderstanding and misinterpretation, Let's examine what social researchers mean when they say one variable (nomothetically) causes another. Then, we'll look at what they don't mean,

Criteria for Nomothetic Causality

There are three main criteria for nomothetic causal relationships in social research: (1) the variables must be correlated, (2) the cause takes place before the effect, and (3) the variables are nonspurious.

Correlation Unless some actual relationship, or correlation, is found between two variables, we can't say that a causal relationship exists. Our analysis of GSS data suggested that political orientation was a cause of attitudes about legalizing marijuana. Had the same percentage of liberals and conservatives supported legalization, we could hardly say that political orientations caused the attitude. Though this criterion is obvious, it emphasizes the need to base social research assertions on actual observations rather than assumptions.

Time Order Next, we can't say a causal relationship exists unless the cause precedes the effect in time. Notice that it makes more sense to say that most children's religious affiliations are caused by those of their parents than to say that parents' affiliations are caused by those of their children, even though it would be possible for you to change your religion and for your parents to follow suit. Remember, nomothetic explanation deals with "most cases" but not all. In our marijuana example, it would make sense to say that gender causes, to some extent, attitudes toward legalization, whereas it would make no sense to say that opinions about marijuana determine a person's gender. Notice, however, that the time order connecting political orientations and attitudes about legalization is less clear, though we sometimes reason that general orientations cause specific opinions. And sometimes our analyses involve two or more independent variables that were established at the same time: looking at the effects of gender and race on voting behavior, for example. As we'll see in the next chapter, the issue of time order can be a complex matter.

Nonspuriousness The third requirement for a causal relationship is that the effect cannot be explained in terms of some third variable. For example, there is a correlation between ice-cream sales and deaths due to drowning: the more ice cream sold, the more drownings, and vice versa. There is, however, no direct link between ice cream and drowning. The third variable at work here is season or temperature. Most drowning deaths occur during summer, the peak period for ice-cream sales. Here are a couple of other examples of spurious relationships, or ones that aren't genuine. There is a negative relationship between the number of mules and the number of Ph.D.'s in towns


and cities: the more mules, the fewer Ph.D.'s, and vice versa. Perhaps you can think of another variable that would explain this apparent relationship. The answer is rural versus urban settings. There are more mules (and fewer Ph.D.'s) in rural areas, whereas the opposite is true in cities. Or, consider the positive correlation between shoe size and math ability among schoolchildren. Here, the third variable that explains the puzzling relationship is age. Older children have bigger feet and more highly developed math skills, on average, than younger children do. See Figure 4-1 for an illustration of this spurious relationship. Observed associations are indicated with thin arrows; causal relationships with thick ones. Notice that observed associations go in both directions. That is, as one variable occurs or changes, so does the other. The list goes on. Areas with many storks have high birth rates. Those with few storks have low birth rates. Do storks really deliver babies? Birth rates are higher in the country than in the city; more storks live in the country than the city. The third variable here is urban/rural areas. Finally, the more fire trucks that put out a fire, the more damage to the structure. Can you guess what the third variable is? In this case, it's the size of the fire. Thus, when social researchers say there is a causal relationship between, say, education and racial tolerance, they mean (1) there is a statistical correlation between the two variables, (2) a person's educational level occurred before their current level of tolerance or prejudice, and (3) there is no third variable that can explain away the observed correlation as spurious.
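A small simulation can show how a third variable manufactures a spurious correlation, in the spirit of the shoe-size/math-skill example. The coefficients and noise levels below are arbitrary choices made for the sketch, not estimates from any real data.

```python
import random

random.seed(42)

# Age (the "third variable") drives both measures for 500 simulated children.
ages = [random.uniform(6, 12) for _ in range(500)]
shoe_size = [0.8 * a + random.gauss(0, 0.5) for a in ages]   # grows with age
math_score = [10 * a + random.gauss(0, 5) for a in ages]     # also grows with age

def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Strong observed correlation, even though neither variable causes the other.
r_overall = corr(shoe_size, math_score)

# Holding age roughly constant (children aged 8 to 9 only) shrinks the
# correlation sharply, which is what marks the relationship as spurious.
band = [i for i, a in enumerate(ages) if 8 <= a < 9]
r_same_age = corr([shoe_size[i] for i in band], [math_score[i] for i in band])
```

Controlling for the third variable and watching the association shrink is the logic behind the nonspuriousness criterion.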

False Criteria for Nomothetic Causality

Because notions of cause and effect are well entrenched in everyday language and logic, it's important to specify some of the things social

spurious relationship A coincidental statistical correlation between two variables, shown to be caused by some third variable.



Chapter 4: Research Design


FIGURE 4-1 An Example of a Spurious Causal Relationship. Finding an empirical correlation between two variables does not necessarily establish a causal relationship. Sometimes the observed correlation is the incidental result of other causal relationships involving other variables. The figure's three panels show (1) the observed positive (direct) correlation: bigger shoe size is associated with greater math skill, and vice versa; (2) the spurious causal relationships: neither shoe size nor math skill is a cause of the other; and (3) the actual causal relationships: the underlying variable of age causes both bigger shoe size and greater math skill, thus explaining the observed correlation.

researchers do not mean when they speak of causal relationships. When they say that one variable causes another, they do not necessarily mean to suggest complete causation, to account for exceptional cases, or to claim that the causation exists in a majority of cases.

Complete Causation

Whereas an idiographic explanation of causation is relatively complete, a nomothetic explanation is probabilistic and usually incomplete. As we've seen, social researchers may say that political orientations cause attitudes toward legalizing marijuana even though not all liberals approve nor all conservatives disapprove. Thus, we say that political orientation is one of the causes of the attitude, but not the only one.

Exceptional Cases

In nomothetic explanations, exceptions do not disprove a causal relationship. For example, it is consistently found that women are more religious than men in the United States. Thus, gender may be a cause of religiosity, even if your uncle is a religious zealot or you know a woman who is an avowed atheist. Those exceptional cases do not disprove the overall causal pattern.

Majority of Cases

Causal relationships can be true even if they don't apply in a majority of cases. For example, we say that children who are not supervised after school are more likely to become delinquent than those who are supervised; hence, lack of supervision is a cause of delinquency. This causal relationship holds true even if only a small percentage of those not supervised become delinquent. As long as they are more likely than those who are supervised to be delinquent, we say there is a causal relationship. The social scientific view of causation may vary from what you are accustomed to, because people commonly use the term cause to mean something that completely causes another thing. The somewhat different standard used by social researchers can be seen more clearly in terms of necessary and sufficient causes.

FIGURE 4-2 Necessary Cause. Being female is a necessary cause of pregnancy; that is, you can't get pregnant unless you are female.

Necessary and Sufficient Causes

A necessary cause represents a condition that must be present for the effect to follow. For example, it is necessary for you to take college courses in order to get a degree. Take away the courses, and the degree never follows. However, simply taking the courses is not a sufficient cause of getting a degree. You need to take the right ones and pass them. Similarly, being female is a necessary condition of becoming pregnant, but it is not a sufficient cause. Otherwise, all women would get pregnant. Figure 4-2 illustrates this relationship between the variables of gender and pregnancy as a matrix showing the possible outcomes of combining these variables. A sufficient cause, on the other hand, represents a condition that, if it is present, guarantees the effect in question. This is not to say that a sufficient cause is the only possible cause of a particular effect. For example, skipping an exam in this course would be a sufficient cause for failing it, though students could fail it other ways as well. Thus, a cause can be sufficient, but not necessary. Figure 4-3 illustrates the relationship between taking or not taking the exam and either passing or failing it. The discovery of a cause that is both necessary and sufficient is, of course, the most satisfying outcome in research. If juvenile delinquency were the effect under examination, it would be nice to discover a single condition that (1) must be present for delinquency to develop and (2) always results in delinquency. In such a case, you would surely feel that you knew precisely what caused juvenile delinquency. Unfortunately, we never discover single causes that are absolutely necessary and absolutely sufficient when analyzing the nomothetic relationships among variables. It is not uncommon, however, to find causal factors that are either 100 percent necessary (you must be female to become pregnant) or 100 percent sufficient (skipping an exam will inevitably cause you to fail it). In the idiographic analysis of single cases, you may reach a depth of explanation from which it is




FIGURE 4-3 Sufficient Cause. Not taking the exam is a sufficient cause of failing it, even though there are other ways of failing (such as answering randomly).

reasonable to assume that things could not have turned out differently, suggesting you have determined the sufficient causes for a particular result. (Anyone with all the same details of your genetic inheritance, upbringing, and subsequent experiences would have ended up going to college.) At the same time, there could always be other causal paths to the same result. Thus, the idiographic causes are sufficient but not necessary.
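The necessary/sufficient distinction can be sketched as simple checks over observed (cause, effect) pairs. The record lists below are toy data invented to mirror the chapter's two examples, not observations from any study.

```python
def is_necessary(records):
    """The effect never occurs without the cause."""
    return all(cause for cause, effect in records if effect)

def is_sufficient(records):
    """Whenever the cause is present, the effect follows."""
    return all(effect for cause, effect in records if cause)

# Being female is necessary for pregnancy but not sufficient:
# each pair is (is_female, is_pregnant).
pregnancy = [(True, True), (True, False), (False, False)]

# Skipping the exam is sufficient for failing but not necessary,
# since answering randomly can also fail you: (skipped_exam, failed).
exam = [(True, True), (False, True), (False, False)]
```

Running both checks on each list reproduces the chapter's conclusions: necessity and sufficiency are independent properties, and a cause can have either one without the other.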

Units of Analysis

In social research, there is virtually no limit to what or whom can be studied, or the units of analysis. This topic is relevant to all forms of social research, although its implications are clearest in the case of nomothetic, quantitative studies. The idea of units of analysis may seem slippery at first, because research, especially nomothetic research, often studies large collections of people or things, or aggregates. It's important to distinguish between the unit of analysis and the aggregates that we generalize about.

units of analysis The what or whom being studied. In social science research, the most typical units of analysis are individual people.

For instance, a researcher may study a class of people, such as Democrats, college undergraduates, African American women under 30, or some other collection. But if the researcher is interested in exploring, describing, or explaining how different groups of individuals behave as individuals, the unit of analysis is the individual, not the group. This is true even though the researcher uses the information about individuals to generalize about aggregates of individuals, as in saying that more Democrats than Republicans favor legalizing marijuana. Think of it this way: Having an attitude about marijuana is something that can only be an attribute of an individual, not a group; that is, there is no one group "mind" that can have an attitude. So even when we generalize about Democrats, we're generalizing about an attribute they possess as individuals. In contrast, we may sometimes want to study groups, considered as individual "actors" or entities that have attributes as groups. For instance, we might want to compare the characteristics of different types of street gangs. In that case our unit of analysis would be gangs (not members of gangs), and we might proceed to make generalizations about different types of gangs. For example, we might conclude that male gangs are more violent than female gangs. Each gang (unit of analysis) would be described in terms of two variables: (1) What sex are the members? and (2) How violent are its activities? So we might study 52 gangs, reporting that 40 were male and 12 were female, and so forth. The "gang" would be the unit of analysis, even though some of the characteristics were drawn from the components (members) of the gangs.
Social researchers tend to choose individual people as their units of analysis. You may note the characteristics of individual people: gender, age, region of birth, attitudes, and so forth. You can then combine these descriptions to provide a composite picture of the group the individuals represent, whether a street-corner gang or a whole society. For example, you may note the age and gender of each student enrolled in Political Science 110 and then characterize the group of students as

being 53 percent men and 47 percent women and as having a mean age of 18.6 years. Although the final description would be of the class as a whole, the description is based on characteristics that members of the class have as individuals. The same distinction between units of analysis and aggregates occurs in explanatory studies. Suppose you wished to discover whether students with good study habits received better grades in Political Science 110 than students with poor study habits did. You would operationalize the variable study habits and measure this variable, perhaps in terms of hours of study per week. You might then aggregate students with good study habits and those with poor study habits and see which group received the best grades in the course. The purpose of the study would be to explain why some groups of students do better in the course than others do, but the unit of analysis is still individual students. Units of analysis in a study are usually also the units of observation. Thus, to study success in a political science course, we would observe individual students. Sometimes, however, we "observe" our units of analysis indirectly. For example, suppose we want to find out whether disagreements about the death penalty tend to cause divorce. In this case, we might "observe" individual husbands and wives by asking them about their attitudes about capital punishment, in order to distinguish couples who agree and disagree on this issue. In this case, our units of observation are individual wives and husbands, but our units of analysis (the things we want to study) are couples. Units of analysis, then, are those things we examine in order to create summary descriptions of all such units and to explain differences among them.
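The Political Science 110 example can be sketched in a few lines: individuals are the units of analysis, and the class-level figures are aggregates computed from individual attributes. The roster below is made up for illustration; only the aggregation logic matters.

```python
# A hypothetical class roster; each record describes one individual.
students = [
    {"gender": "M", "age": 19}, {"gender": "F", "age": 18},
    {"gender": "M", "age": 18}, {"gender": "F", "age": 19},
    {"gender": "M", "age": 20}, {"gender": "F", "age": 18},
]

# Class-level figures are attributes of the aggregate, but each one is
# computed from attributes the students hold as individuals.
pct_men = 100 * sum(s["gender"] == "M" for s in students) / len(students)
mean_age = sum(s["age"] for s in students) / len(students)
```

However the summary is phrased ("the class is half men, with a mean age of about 18.7"), the unit of analysis remains the individual student.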
In most research projects, the unit of analysis will probably be clear to you. When the unit of analysis is not clear, however, it's essential to determine what it is; otherwise, you cannot determine what observations are to be made about whom or what. Some studies try to describe or explain more than one unit of analysis. In these cases, the researcher must anticipate what conclusions she or he wishes to draw with regard to which units of


analysis. For example, we may want to discover what kinds of college students (individuals) are most successful in their careers; we may also want to learn what kinds of colleges (organizations) produce the most successful graduates. Here's an example that illustrates the complexity of units of analysis. Murder is a fairly personal matter: One individual kills another individual. However, when Charis Kubrin and Ronald Weitzer (2003: 157) ask, "Why do these neighborhoods generate high homicide rates?" the unit of analysis in that phrase is neighborhood. You can probably imagine some kinds of neighborhoods (e.g., poor, urban) that would have high homicide rates and some (e.g., wealthy, suburban) that would have low rates. In this particular conversation, the unit of analysis (neighborhood) would be categorized in terms of variables such as economic level, locale, and homicide rate. In their analysis, however, Kubrin and Weitzer were also interested in different types of homicide: in particular, those that occurred in retaliation for some earlier event, such as an assault or insult. Can you identify the unit of analysis common to all of the following excerpts?

1. The sample of killings . . .
2. The coding instrument includes over 80 items related to the homicide.
3. Of the 2,161 homicides that occurred from 1985 [to] 1995 . . .
4. Of those with an identified motive, 19.5 percent (n = 337) are retaliatory. (Kubrin and Weitzer 2003: 163)

In each of these excerpts, the unit of analysis is homicide (also called killing or murder). Sometimes you can identify the unit of analysis in the description of the sampling methods, as in the first excerpt. A discussion of classification methods might also identify the unit of analysis, as in the second excerpt (80 ways to code the homicides). Often, numerical summaries point the way: 2,161 homicides; 19.5 percent (of the homicides). With a little practice you'll be able to identify the units of analysis in most social research reports, even when more than one is used in a given analysis.



To explore this topic in more depth, let's consider several common units of analysis in social research.

Individuals

As mentioned, individual human beings are perhaps the most typical units of analysis for social research. Social researchers tend to describe and explain social groups and interactions by aggregating and manipulating the descriptions of individuals. Any type of individual may be the unit of analysis for social research. This point is more important than it may seem at first. The norm of generalized understanding in social research should suggest that scientific findings are most valuable when they apply to all kinds of people. In practice, however, social researchers seldom study all kinds of people. At the very least, their studies are typically limited to the people living in a single country, though some comparative studies stretch across national boundaries. Often, though, studies are quite circumscribed. Examples of classes of individuals that might be chosen for study include students, gays and lesbians, auto workers, voters, single parents, and faculty members. Note that each of these terms implies some population of individuals. Descriptive studies with individuals as their units of analysis typically aim to describe the population that comprises those individuals, whereas explanatory studies aim to discover the social dynamics operating within that population. As the units of analysis, individuals may be characterized in terms of their membership in social groupings. Thus, an individual may be described as belonging to a rich family or to a poor one, or a person may be described as having a college-educated mother or not. We might examine in a research project whether people with college-educated mothers are more likely to attend college than are those with non-college-educated mothers, or whether high school graduates in rich families are more likely than those in poor families to attend college. In each case, the unit of analysis, the "thing" whose characteristics we are seeking to describe or explain, is the individual. We then


aggregate these individuals and make generalizations about the population they belong to.

Groups

Social groups can also be units of analysis in social research. That is, we may be interested in characteristics that belong to one group, considered as a single entity. If you were to study the members of a criminal gang to learn about criminals, the individual (criminal) would be the unit of analysis; but if you studied all the gangs in a city to learn the differences, say, between big gangs and small ones, between "uptown" and "downtown" gangs, and so forth, you would be interested in gangs rather than their individual members. In this case, the unit of analysis would be the gang, a social group. Here's another example. Suppose you were interested in the question of access to computers in different segments of society. You might describe families in terms of total annual income and according to whether or not they had computers. You could then aggregate families and describe the mean income of families and the percentage with computers. You would then be in a position to determine whether families with higher incomes were more likely to have computers than were those with lower incomes. In this case, the unit of analysis would be families. As with other units of analysis, we can derive the characteristics of social groups from those of their individual members. Thus, we might describe a family in terms of the age, race, or education of its head. In a descriptive study, we might find the percentage of all families that have a college-educated head of family. In an explanatory study, we might determine whether such families have, on average, more or fewer children than do families headed by people who have not graduated from college. In each of these examples, the family is the unit of analysis. In contrast, had we asked whether college-educated individuals have more or fewer children than do their less-educated counterparts, then the individual would have been the unit of analysis.
Other units of analysis at the group level could be friendship cliques, married couples, census

blocks, cities, or geographic regions. As with individuals, each of these terms implies some population. Street gangs implies some population that includes all street gangs, perhaps in a given city. You might then describe this population by generalizing from your findings about individual gangs. For instance, you might describe the geographic distribution of gangs throughout a city. In an explanatory study of street gangs, you might discover whether large gangs are more likely than small ones to engage in intergang warfare. Thus, you would arrive at conclusions about the population of gangs by using individual groups as your unit of analysis.

Organizations

Formal social organizations may also be the units of analysis in social research. For example, a researcher might study corporations, by which he or she implies a population of all corporations. Individual corporations might be characterized in terms of their number of employees, net annual profits, gross assets, number of defense contracts, percentage of employees from racial or ethnic minority groups, and so forth. We might determine whether large corporations hire a larger or smaller percentage of minority group employees than do small corporations. Other examples of formal social organizations suitable as units of analysis include church congregations, colleges, army divisions, academic departments, and supermarkets. Figure 4-4 provides a graphic illustration of some different units of analysis and the statements that might be made about them.

Social Interactions

Sometimes social interactions are the relevant units of analysis. Instead of individual humans, you can study what goes on between them: telephone calls, kisses, dancing, arguments, fistfights, e-mail exchanges, chat-room discussions, and so forth. As you saw in Chapter 2, social interaction is the basis for one of the primary theoretical paradigms in the social sciences, and the number of units of analysis that social interactions provide is nearly infinite.


Even though individuals are usually the actors in social interactions, there is a difference between (1) comparing the kinds of people who subscribe to different Internet service providers (individuals being the unit of analysis) and (2) comparing the length of chat-room discussions on those same ISPs (the discussion being the unit of analysis).

Social Artifacts

Another unit of analysis is the social artifact, or any product of social beings or their behavior. One class of artifacts includes concrete objects such as books, poems, paintings, automobiles, buildings, songs, pottery, jokes, student excuses for missing exams, and scientific discoveries. For example, Lenore Weitzman and her associates (1972) were interested in learning how gender roles are taught. They chose children's picture books as their unit of analysis. Specifically, they examined books that had received the Caldecott Medal. Their results were as follows: We found that females were underrepresented in the titles, central roles, pictures, and stories of every sample of books we examined. Most children's books are about boys, men, male animals, and deal exclusively with male adventures. Most pictures show men singly or in groups. Even when women can be found in the books, they often play insignificant roles, remaining both inconspicuous and nameless. (Weitzman et al. 1972: 1128)

In a more recent study, Roger Clark, Rachel Lennon, and Leana Morris (1993) concluded that male and female characters are now portrayed less stereotypically than before, observing a clear progress toward portraying men and women in nontraditional roles. However, they did not find total equality between the sexes. As this example suggests, just as people or social groups imply populations, each social object

social artifact Any product of social beings or their behavior. Can be a unit of analysis.


FIGURE 4-4 Illustrations of Units of Analysis. Units of analysis in social research can be individuals, groups, or even nonhuman entities. Sample statements from the figure: with individuals as the unit of analysis, "60% of the sample are women; 10% of the sample are wearing an eye patch; 10% of the sample have pigtails." With families as the unit of analysis, "20% of the families have a single parent; 50% of the families have two children; 20% of the families have no children; the mean number of children per family is 1.3." With households as the unit of analysis, "20% of the households are occupied by more than one family; 30% of the households have holes in their roofs; 10% of the households are occupied by aliens." Notice also that 33% of the families live in multiple-family households, with family as the unit of analysis.

implies a set of all objects of the same class: all books, all novels, all biographies, all introductory sociology textbooks, all cookbooks, all press conferences. In a study using books as the units of analysis, an individual book might be characterized by its size, weight, length, price, content, number of pictures, number sold, or description of the author. Then the population of all books or of a particular kind of book could be analyzed for the purpose of description or explanation: what kinds of books sell best and why, for example. Similarly, a social researcher could analyze whether paintings by Russian, Chinese, or U.S. artists showed the greatest degree of working-class consciousness, taking paintings as the units of analysis and describing each, in part, by the nationality of its creator. Or you might examine a newspaper's editorials regarding a local university, for the purpose of describing, or perhaps explaining, changes in the newspaper's editorial position on the university over time. In this example, individual editorials would be the units of analysis. Social interactions form another class of social artifacts suitable for social research. For example, we might characterize weddings as racially or religiously mixed or not, as religious or secular in ceremony, as resulting in divorce or not, or by descriptions of one or both of the marriage partners (such as "previously married," "Oakland Raider fan," "wanted by the FBI"). When a researcher reports that weddings between partners of different religions are more likely to be performed by secular authorities than those between partners of the same religion are, the weddings are the units of analysis, not the individuals involved. Other social interactions that might be units of analysis are friendship choices, court cases, traffic accidents, divorces, fistfights, ship launchings, airline hijackings, race riots, final exams, student demonstrations, and congressional hearings.
Congressional hearings, for instance, could be characterized by whether or not they occurred during an election campaign, whether the committee chairs were running for a higher office, whether they had received campaign contributions from interested parties, and so on. Notice that even if we characterized and compared the hearings in terms of the committee chairs, the hearings themselves-not


the individual chairpersons-would be our units of analysis.

Units of Analysis in Review

The examples in this section should suggest the nearly infinite variety of possible units of analysis in social research. Although individual human beings are typical objects of study, many research questions can be answered more appropriately through the examination of other units of analysis. Indeed, social researchers can study just about anything that bears on social life. Moreover, the types of units of analysis named in this section do not begin to exhaust the possibilities. Morris Rosenberg (1968: 234-48), for example, speaks of individual, group, organizational, institutional, spatial, cultural, and societal units of analysis. John and Lyn Lofland (1995: 103-13) speak of practices, episodes, encounters, roles, relationships, groups, organizations, settlements, social worlds, lifestyles, and subcultures as suitable units of study. The important thing here is to grasp the logic of units of analysis. Once you do, the possibilities for fruitful research are limited only by your imagination. Categorizing possible units of analysis might make the concept seem more complicated than it needs to be. What you call a given unit of analysis, whether a group, a formal organization, or a social artifact, is irrelevant. The key is to be clear about what your unit of analysis is. When you embark on a research project, you must decide whether you're studying marriages or marriage partners, crimes or criminals, corporations or corporate executives. Otherwise, you run the risk of drawing invalid conclusions because your assertions about one unit of analysis are actually based on the examination of another. We'll see an example of this issue in the next section as we look at the ecological fallacy.

Faulty Reasoning about Units of Analysis: The Ecological Fallacy and Reductionism

At this point, it's appropriate to introduce two types of faulty reasoning that you should be aware of: the ecological fallacy and reductionism. Each represents a potential pitfall regarding units of analysis, and either can occur in doing research and drawing conclusions from the results.

The Ecological Fallacy

In this context, "ecological" refers to groups or sets or systems: something larger than individuals. The ecological fallacy is the assumption that something learned about an ecological unit says something about the individuals making up that unit. Let's consider a hypothetical illustration of this fallacy. Suppose we're interested in learning something about the nature of electoral support received by a female political candidate in a recent citywide election. Let's assume we have the vote tally for each precinct so we can tell which precincts gave her the greatest support and which the least. Assume also that we have census data describing some characteristics of these precincts. Our analysis of such data might show that precincts with relatively young voters gave the female candidate a greater proportion of their votes than did precincts with older voters. We might be tempted to conclude from these findings that younger voters are more likely to vote for female candidates than older voters are; in other words, that age affects support for the woman. In reaching such a conclusion, we run the risk of committing the ecological fallacy, because it may have been the older voters in those "young" precincts who voted for the woman. Our problem is that we have examined precincts as our units of analysis but wish to draw conclusions about voters. The same problem would arise if we discovered that crime rates were higher in cities having large African American populations than in those with few African Americans. We would not know if the crimes were actually committed by African Americans. Or if we found suicide rates higher in Protestant countries than in Catholic ones, we still could

ecological fallacy Erroneously drawing conclusions about individuals solely from the observation of groups.


not know for sure that more Protestants than Catholics committed suicide.

In spite of these hazards, social researchers often have little choice but to address a particular research question through an ecological analysis. Perhaps the most appropriate data are simply not available. For example, the precinct vote tallies and the precinct characteristics mentioned in our initial example may be easy to obtain, but we may not have the resources to conduct a postelection survey of individual voters. In such cases, we may reach a tentative conclusion, recognizing and noting the risk of an ecological fallacy.

Although you should be careful not to commit the ecological fallacy, don't let these warnings lead you into committing what we might call the individualistic fallacy. Some people who approach social research for the first time have trouble reconciling general patterns of attitudes and actions with individual exceptions. But generalizations and probabilistic statements are not invalidated by individual exceptions. Your knowing a rich Democrat, for example, doesn't deny the fact that most rich people vote Republican, as a general pattern. Similarly, if you know someone who has gotten rich without any formal education, that doesn't deny the general pattern of higher education relating to higher income.

The ecological fallacy deals with something else altogether: confusing units of analysis in such a way that we draw conclusions about individuals solely from the observation of groups. Although the patterns observed between variables at the level of groups may be genuine, the danger lies in reasoning from the observed attributes of groups to the attributes of the individuals who made up those groups, even though we have not actually observed individuals.
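The precinct illustration can be made concrete with a small numerical sketch. All of the figures below are invented for this sketch, not drawn from any real election. In every precinct, the older voters are the candidate's stronger supporters; yet the precincts with more young voters still show the highest overall support, which is exactly the aggregate pattern that tempts the ecological fallacy.

```python
# Hypothetical precinct data (all numbers invented).  In EVERY precinct,
# support among older voters exceeds support among younger voters, yet the
# precinct-level totals still make "young" precincts look most supportive.
precincts = [
    # share of voters under 40, support rate among young, support rate among old
    {"young_share": 0.8, "young_support": 0.55, "old_support": 0.90},
    {"young_share": 0.5, "young_support": 0.40, "old_support": 0.70},
    {"young_share": 0.2, "young_support": 0.30, "old_support": 0.55},
]

for p in precincts:
    # Overall precinct support is the age-weighted average of the two rates.
    total = (p["young_share"] * p["young_support"]
             + (1 - p["young_share"]) * p["old_support"])
    print(f"young share {p['young_share']:.0%}: precinct support {total:.1%}")
```

Running this shows precinct support falling as the share of young voters falls (62%, 55%, 50%), even though the individual-level pattern runs the other way. Inferring "young voters supported her more" from the precinct totals would be the ecological fallacy.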

Reductionism

A second type of potentially faulty reasoning related to units of analysis is reductionism. Reductionism involves attempts to explain a particular phenomenon in terms of limited and/or lower-order concepts. The reductionist explanation is not altogether wrong; it is simply too limited. Thus, you

might attempt to predict this year's winners and losers in the National Basketball Association by focusing on the abilities of the individual players on each team. This is certainly not stupid or irrelevant, but the success or failure of teams involves more than just the individuals in them; it involves coaching, teamwork strategies, finances, facilities, fan loyalty, and so forth. To understand why some teams do better than others, you would make "team" the unit of analysis, and the quality of players would be one variable you would probably want to use in describing and classifying the teams.

Further, different academic disciplines approach the same phenomenon quite differently. Sociologists tend to consider sociological variables (such as values, norms, and roles), economists ponder economic variables (such as supply and demand and marginal value), and psychologists examine psychological variables (such as personality types and traumas). Explaining all or most human behavior in terms of economic factors is called economic reductionism, explaining it in terms of psychological factors is called psychological reductionism, and so forth. Notice how this issue relates to the discussion of theoretical paradigms in Chapter 2.

For many social scientists, the field of sociobiology is a prime example of reductionism, suggesting that all social phenomena can be explained in terms of biological factors. Thus, for example, Edward O. Wilson (1975) sought to explain altruistic behavior in human beings in terms of genetic makeup. In his neo-Darwinian view, Wilson suggests that humans have evolved in such a way that individuals sometimes need to sacrifice themselves for the benefit of the whole species. Some people might explain such sacrifice in terms of ideals or warm feelings between humans. However, genes are the essential unit in Wilson's paradigm, producing his famous dictum that human beings are "only DNA's way of making more DNA."
Reductionism of any type tends to suggest that particular units of analysis or variables are more relevant than others. Suppose we ask what caused the American Revolution. Was it a shared commitment to the value of individual liberty? The economic plight of the colonies in relation to Britain? The megalomania of the founders? As soon as we inquire about the single cause, we run the risk of reductionism.

If we were to regard shared values as the cause of the American Revolution, our unit of analysis would be the individual colonist. An economist, though, might choose the 13 colonies as units of analysis and examine the economic organizations and conditions of each. A psychologist might choose individual leaders as the units of analysis for purposes of examining their personalities. Of course, there's nothing wrong in choosing these units of analysis as part of an explanation of the American Revolution, but I think you can see how each alone would not produce a complete answer.

Like the ecological fallacy, reductionism can occur when we use inappropriate units of analysis. The appropriate unit of analysis for a given research question, however, is not always clear. Social researchers, especially across disciplinary boundaries, often debate this issue.

The Time Dimension

So far in this chapter, we've regarded research design as a process for deciding what aspects we'll observe, of whom, and for what purpose. Now we must consider a set of time-related options that cuts across each of these earlier considerations. We can choose to make observations more or less at one time or over a long period.

Time plays many roles in the design and execution of research, quite aside from the time it takes to do research. Earlier we noted that the time sequence of events and situations is critical to

reductionism A fault of some researchers: a strict limitation (reduction) of the kinds of concepts to be considered relevant to the phenomenon under study.

sociobiology A paradigm based in the view that social behavior can be explained solely in terms of genetic characteristics and behavior.




determining causation (a point we'll return to in Part 4). Time also affects the generalizability of research findings. Do the descriptions and explanations resulting from a particular study accurately represent the situation of ten years ago, ten years from now, or only the present? Researchers have two principal options available to deal with the issue of time in the design of their research: cross-sectional studies and longitudinal studies.

Cross-Sectional Studies

A cross-sectional study involves observations of a sample, or cross section, of a population or phenomenon that are made at one point in time. Exploratory and descriptive studies are often cross-sectional. A single U.S. Census, for instance, is a study aimed at describing the U.S. population at a given time.

Many explanatory studies are also cross-sectional. A researcher conducting a large-scale national survey to examine the sources of racial and religious prejudice would, in all likelihood, be dealing with a single time frame, taking a snapshot, so to speak, of the sources of prejudice at a particular point in history.

Explanatory cross-sectional studies have an inherent problem. Although their conclusions are based on observations made at only one time, typically they aim at understanding causal processes that occur over time. This problem is somewhat akin to that of determining the speed of a moving object on the basis of a high-speed, still photograph that freezes the movement of the object.

Yanjie Bian, for example, conducted a survey of workers in Tianjin, China, for the purpose of studying stratification in contemporary, urban Chinese society. In undertaking the survey in 1988, however, he was conscious of the important changes

cross-sectional study A study based on observations representing a single point in time.

longitudinal study A study design involving the collection of data at different points in time.

brought about by a series of national campaigns, such as the Great Proletarian Cultural Revolution, dating from the Chinese Revolution in 1949 (which brought the Chinese Communists into power) and continuing into the present. These campaigns altered political atmospheres and affected people's work and nonwork activities.

Because of these campaigns, it is difficult to draw conclusions from a cross-sectional social survey, such as the one presented in this book, about general patterns of Chinese workplaces and their effects on workers. Such conclusions may be limited to one period of time and are subject to further tests based on data collected at other times. (1994: 19)

The problem of generalizations about social life from a "snapshot" is one this book repeatedly addresses. One solution is suggested by Bian's final comment-about data collected "at other times": Social research often involves revisiting phenomena and building on the results of earlier research.

Longitudinal Studies

In contrast to cross-sectional studies, a longitudinal study is designed to permit observations of the same phenomenon over an extended period. For example, a researcher can participate in and observe the activities of a UFO cult from its inception to its demise. Other longitudinal studies use records or artifacts to study changes over time. In analyses of newspaper editorials or Supreme Court decisions over time, for example, the studies are longitudinal whether the researcher's actual observations and analyses were made at one time or over the course of the actual events under study.

Many field research projects, involving direct observation and perhaps in-depth interviews, are naturally longitudinal. Thus, for example, when Ramona Asher and Gary Fine (1991) studied the life experiences of the wives of alcoholic men, they were in a position to examine the evolution of troubled marital relationships over time, sometimes

even including the reactions of the subjects to the research itself.

In the classic study When Prophecy Fails (1956), Leon Festinger, Henry Riecken, and Stanley Schachter were specifically interested in learning what happened to a flying saucer cult when their predictions of an alien encounter failed to come true. Would the cult members close down the group, or would they become all the more committed to their beliefs? A longitudinal study was required to provide an answer. (They redoubled their efforts to get new members.)

Longitudinal designs can be more difficult to carry out in quantitative studies such as large-scale surveys. Nonetheless, they are often the best way to study changes over time. There are three special types of longitudinal studies that you should know about: trend studies, cohort studies, and panel studies.

Trend Studies

A trend study is a type of longitudinal study that examines changes within a population over time. A simple example is a comparison of U.S. Censuses over a period of decades, showing shifts in the makeup of the national population.

A similar use of archival data was made by Michael Delli Carpini and Scott Keeter (1991), who wanted to know whether contemporary U.S. citizens were better or more poorly informed about politics than citizens of an earlier generation were. To find out, they compared the results of several Gallup Polls conducted during the 1940s and 1950s with a 1989 survey that asked several of the same questions tapping political knowledge.

Overall, the analysis suggested that contemporary citizens were slightly better informed than earlier generations were. In 1989, 74 percent of the sample could name the vice president of the United States, compared with 67 percent in 1952. Substantially higher percentages of people in 1989 than in 1947 could explain presidential vetoes and congressional overrides of vetoes. On the other hand, more of the 1947 sample could identify their U.S. representative (38 percent) than the 1989 sample (29 percent) could.


An in-depth analysis, however, indicates that the slight increase in political knowledge resulted from the fact that the people in the 1989 sample were more highly educated than those from earlier samples were. When educational levels were taken into account, the researchers concluded that political knowledge has actually declined within specific educational groups.

Cohort Studies

In a cohort study, a researcher examines specific subpopulations, or cohorts, as they change over time. Typically, a cohort is an age group, such as people born during the 1950s, but it can also be some other time grouping, such as people born during the Vietnam War, people who got married in 1994, and so forth.

An example of a cohort study would be a series of national surveys, conducted perhaps every 20 years, to study the attitudes of the cohort born during World War II toward U.S. involvement in global affairs. A sample of people 15-20 years old might be surveyed in 1960, another sample of those 35-40 years old in 1980, and another sample of those 55-60 years old in 2000. Although the specific set of people studied in each survey would differ, each sample would represent the cohort born between 1940 and 1945.

James Davis (1992) turned to a cohort analysis in an attempt to understand shifting political orientations during the 1970s and 1980s in the United States. Overall, he found a liberal trend on issues

trend study A type of longitudinal study in which a given characteristic of some population is monitored over time. An example would be the series of Gallup Polls showing the electorate's preferences for political candidates over the course of a campaign, even though different samples were interviewed at each point.

cohort study A study in which some specific subpopulation, or cohort, is studied over time, although data may be collected from different members in each set of observations. For example, a study of the occupational history of the class of 1970 in which questionnaires were sent every five years would be a cohort study.





TABLE 4-1 ... and Political Liberalism
Percent who would let the Communist speak, by survey dates of cohort: 1972 to 1974, 1977 to 1980, 1982 to 1984, 1987 to 1989. [Percentage entries not recoverable in this copy.]
such as race, gender, religion, politics, crime, and free speech. But did this trend represent people in general getting a bit more liberal, or did it merely reflect liberal younger generations replacing the conservative older ones?

To answer this question, Davis examined national surveys (from the General Social Survey, of which he is a founder) conducted in four time periods, five years apart. In each survey, he grouped the respondents into age groups, also five years apart. This strategy allowed him to compare different age groups at any given point in time as well as follow the political development of each age group over time.

One of the questions he examined was whether a person who admitted to being a Communist should be allowed to speak in the respondents' communities. Consistently, the younger respondents in each time period were more willing to let the Communist speak than the older ones were. Among those aged 20-40 in the first set of surveys, for example, 72 percent took this liberal position, contrasted with 27 percent among respondents 80 and older. What Davis found when he examined the youngest cohort over time is shown in Table 4-1. This pattern of a slight, conservative shift in the 1970s, followed by a liberal rebound in the 1980s, typifies the several cohorts Davis analyzed (J. Davis 1992: 269).

In another study, Eric Plutzer and Michael Berkman (2005) used a cohort design to completely reverse a prior conclusion regarding aging and support for education. Logically, as people grow well beyond the child-rearing years, we might expect

panel study A type of longitudinal study in which data are collected from the same set of people (the sample or panel) at several points in time.

them to reduce their commitment to educational funding. Moreover, cross-sectional data support that expectation. The researchers present several data sets showing those over 65 voicing less support for education funding than those under 65 did.

Such simplistic analyses, however, leave out an important variable: increasing support for educational funding in U.S. society over time in general. The researchers add to this the concept of "generational replacement," meaning that the older respondents in a survey grew up during a time when there was less support for education in general, whereas the younger respondents grew up during a time of greater overall support.

A cohort analysis allowed the researchers to determine what happened to the attitudes of specific cohorts over time. Here, for example, are the percentages of Americans born during the 1940s who felt educational spending was too low, when members of that cohort were interviewed over time (Plutzer and Berkman 2005: 76):

Year Interviewed    Percent Who Say Educational Spending Is Too Low
1970s               58
1980s               66
1990s               74
2000s               79

As these data indicate, those who were born during the 1940s have steadily increased their support for educational funding as they have passed through and beyond the child-rearing years.
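The mechanics of a cohort analysis like this can be sketched in a few lines. The survey records below are invented for illustration (the field names `wave`, `birth_year`, and `supports_funding` are assumptions of this sketch, not from any real data set): grouping respondents by birth decade lets us follow the 1940s birth cohort across waves even though different individuals are interviewed each time.

```python
# Invented survey records: one respondent per record, drawn from two waves.
records = [
    {"wave": 1970, "birth_year": 1944, "supports_funding": True},
    {"wave": 1970, "birth_year": 1948, "supports_funding": False},
    {"wave": 1990, "birth_year": 1941, "supports_funding": True},
    {"wave": 1990, "birth_year": 1946, "supports_funding": True},
]

def cohort_support(records, wave, decade_start):
    """Share of a birth-decade cohort, in a given wave, who support funding."""
    cohort = [r for r in records
              if r["wave"] == wave
              and decade_start <= r["birth_year"] < decade_start + 10]
    return sum(r["supports_funding"] for r in cohort) / len(cohort)

print(cohort_support(records, 1970, 1940))  # 0.5
print(cohort_support(records, 1990, 1940))  # 1.0
```

The point of the design is visible even in this toy version: the same birth cohort is compared across waves, but the individual respondents differ from wave to wave.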

Panel Studies

Though similar to trend and cohort studies, a panel study examines the same set of people each


time. For example, we could interview the same sample of voters every month during an election campaign, asking for whom they intended to vote. Though such a study would allow us to analyze overall trends in voter preferences for different candidates, it would also show the precise patterns of persistence and change in intentions. For example, a trend study that showed that Candidates A and B each had exactly half of the voters on September 1 and on October 1 as well could indicate that none of the electorate had changed voting plans, that all of the voters had changed their intentions, or something in between. A panel study would eliminate this confusion by showing what kinds of voters switched from A to B and what kinds switched from B to A, as well as other facts.

Joseph Veroff, Shirley Hatchett, and Elizabeth Douvan (1992) wanted to learn about marital adjustment among newlyweds, specifically regarding differences between white and African American couples. To get subjects for study, they selected a sample of couples who applied for marriage licenses in Wayne County, Michigan, April through June 1986. Concerned about the possible impact their research might have on the couples' marital adjustment, the researchers divided their sample in half at random: an experimental group and a control group (concepts we'll explore further in Chapter 8). Couples in the former group were intensively interviewed over a four-year period, whereas the latter group was contacted only briefly each year.

By studying the same couples over time, the researchers could follow the specific problems that arose and the way the couples dealt with them. As a by-product of their research, they found that those studied the most intensely seemed to achieve a somewhat better marital adjustment. The researchers felt that the interviews could have forced couples to discuss matters they might have otherwise buried.
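The candidate A/candidate B scenario above can be made concrete with a small sketch (the ten voters and their answers are invented for illustration). The marginal totals are identical in both waves, so a trend study would report "no change," while pairing each respondent's two answers, as a panel study does, reveals that everyone switched.

```python
# Two waves of answers from the SAME ten voters (invented data).
wave1 = ["A"] * 5 + ["B"] * 5
wave2 = ["B"] * 5 + ["A"] * 5   # second interview of the same ten people

# Trend-study view: only the marginal totals, identical in both waves.
print("wave 1 totals:", wave1.count("A"), "A /", wave1.count("B"), "B")
print("wave 2 totals:", wave2.count("A"), "A /", wave2.count("B"), "B")

# Panel-study view: pair each respondent's two answers.
switched = sum(1 for a, b in zip(wave1, wave2) if a != b)
print("voters who switched:", switched)  # 10 of 10, invisible in the totals
```

This is the extreme case the text describes: totals of five and five at both time points are equally consistent with complete stability and with complete turnover, and only the panel design can tell the two apart.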

Comparing the Three Types of Longitudinal Studies

To reinforce the distinctions among trend, cohort, and panel studies, let's contrast the three study designs in terms of the same variable: religious affiliation. A trend study might look at shifts in U.S. religious affiliations over time, as the Gallup Poll does on a regular basis. A cohort study might follow shifts in religious affiliations among "the Depression generation," specifically, say, people who were 20 to 30 years old in 1932. We could study a sample of people 30-40 years old in 1942, a new sample of people aged 40-50 in 1952, and so forth. A panel study could start with a sample of the whole population or of some special subset and study those specific individuals over time. Notice that only the panel study would give a full picture of the shifts among the various categories of affiliations, including "none." Cohort and trend studies would uncover only net changes.

Longitudinal studies have an obvious advantage over cross-sectional ones in providing information describing processes over time. But this advantage often comes at a heavy cost in both time and money, especially in a large-scale survey. Observations may have to be made at the time events are occurring, and the method of observation may require many research workers.

Panel studies, which offer the most comprehensive data on changes over time, face a special problem: panel attrition. Some of the respondents studied in the first wave of the survey might not participate in later waves. (This is comparable to the problem of experimental mortality discussed in Chapter 8.) The danger is that those who drop out of the study may be atypical, thereby distorting the results of the study. Thus, when Carol Aneshensel and her colleagues conducted a panel study of adolescent girls (comparing Latinas and non-Latinas), they looked for and found differences in characteristics of survey dropouts among Latinas born in the United States and those born in Mexico. These differences needed to be taken into account to avoid misleading conclusions about differences between Latinas and non-Latinas (Aneshensel et al. 1989).

Approximating Longitudinal Studies

Longitudinal studies do not always provide a feasible or practical means of studying processes that take place over time. Fortunately, researchers often can draw approximate conclusions about such



processes even when only cross-sectional data are available. Here are some ways to do that.

Sometimes cross-sectional data imply processes over time on the basis of simple logic. For example, in the study of student drug use conducted at the University of Hawaii (Chapter 2), students were asked to report whether they had ever tried each of several illegal drugs. The study found that some students had tried both marijuana and LSD, some had tried only one, and others had tried neither. Because these data were collected at one time, and because some students presumably would experiment with drugs later on, it would appear that such a study could not tell whether students were more likely to try marijuana or LSD first.

A closer examination of the data showed, however, that although some students reported having tried marijuana but not LSD, there were no students in the study who had tried only LSD. From this finding it was inferred, as common sense suggested, that marijuana use preceded LSD use. If the process of drug experimentation occurred in the opposite time order, then a study at a given time should have found some students who had tried LSD but not marijuana, and it should have found no students who had tried only marijuana.

Researchers can also make logical inferences whenever the time order of variables is clear. If we discovered in a cross-sectional study of college students that those educated in private high schools received better college grades than those educated in public high schools did, we would conclude that the type of high school attended affected college grades, not the other way around. Thus, even though we made our observations at only one time, we would feel justified in drawing conclusions about processes taking place across time.

Very often, age differences discovered in a cross-sectional study form the basis for inferring processes across time.
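The time-order inference in the Hawaii drug example boils down to a simple check on one-shot data. The responses below are invented for this sketch; the logic is the one the text describes: if use ran marijuana first, a cross-sectional survey should find "marijuana only" respondents but no "LSD only" respondents.

```python
# Invented one-shot survey responses (each dict is one student).
responses = [
    {"marijuana": True,  "lsd": True},
    {"marijuana": True,  "lsd": False},
    {"marijuana": False, "lsd": False},
    {"marijuana": True,  "lsd": False},
]

# The two categories that carry the time-order information.
lsd_only = [r for r in responses if r["lsd"] and not r["marijuana"]]
marijuana_only = [r for r in responses if r["marijuana"] and not r["lsd"]]

if not lsd_only and marijuana_only:
    print("pattern consistent with marijuana use preceding LSD use")
```

The inference rests on the absence of one cell ("LSD only") combined with the presence of its mirror image, which is exactly what the original researchers observed.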
Suppose you're interested in the pattern of worsening health over the course of the typical life cycle. You might study the results of annual checkups in a large hospital. You could group health records according to the ages of those examined and rate each age group in terms of several health conditions: sight, hearing, blood


pressure, and so forth. By reading across the age-group ratings for each health condition, you would have something approximating the health history of individuals. Thus, you might conclude that the average person develops vision problems before hearing problems. You would need to be cautious in this assumption, however, because the differences might reflect societywide trends. Perhaps improved hearing examinations instituted in the schools had affected only the young people in your study.

Asking people to recall their pasts is another common way of approximating observations over time. Researchers use that method when they ask people where they were born or when they graduated from high school or whom they voted for in 1988. Qualitative researchers often conduct in-depth "life history" interviews. For example, C. Lynn Carr (1998) used this technique in a study of "tomboyism." Her respondents, aged 25-40, were asked to reconstruct aspects of their lives from childhood on, including experiences of identifying themselves as tomboys.

The danger in this technique is evident. Sometimes people have faulty memories; sometimes they lie. When people are asked in postelection polls whom they voted for, the results inevitably show more people voting for the winner than actually did so on election day. As part of a series of in-depth interviews, such a report can be validated in the context of other reported details; however, results based on a single question in a survey must be regarded with caution.

This discussion of the ways that time figures into social research suggests several questions you should confront in your own research projects. In designing any study, be sure to look at both the explicit and implicit assumptions you're making about time. Are you interested in describing some process that occurs over time, or are you simply going to describe what exists now?
If you want to describe a process occurring over time, will you be able to make observations at different points in the process, or will you have to approximate such observations by drawing logical inferences from what you can observe now? If you opt for a longitudinal design, which method best serves your research purposes?

Examples of Research Strategies

As the preceding discussions have implied, social research follows many paths. The following short excerpts further illustrate this point. As you read each excerpt, note both the content of each study and the method used to study the chosen topic. Does the study seem to be exploring, describing, or explaining (or some combination of these)? What are the sources of data in each study? Can you identify the unit of analysis? Is the dimension of time relevant? If so, how will it be handled?



Using interview and observational field data, I demonstrate how a system of temporary employment in a participative workplace both exploited and shaped entry-level workers' aspirations and occupational goals. (V. Smith 1998: 411)


I collected data [on White Separatist rhetoric] from several media of public discourse, including periodicals, books, pamphlets, transcripts from radio and television talk shows, and newspaper and magazine accounts. (Berbrier 1998: 435)







This case study of unobtrusive mobilizing by Southern California Rape Crisis Center uses archival, observational, and interview data to explore how a feminist organization worked to change police, schools, prosecutors, and some state and national organizations from 1974 to 1994. (Schmitt and Martin 1999: 364)

Using life history narratives, the present study investigates processes of agency and consciousness among 14 women who identified themselves as tomboys. (Carr 1998: 528)

By drawing on interviews with activists in the former Estonian Soviet Socialist Republic, we specify the conditions by which accommodative and oppositional subcultures exist and are successfully transformed into social movements. (Johnston and Snow 1998: 473)

This paper presents the results of an ethnographic study of an AIDS service organization located in a small city. It is based on a combination of participant observation, interviews with participants, and review of organizational records. (Kilburn 1998: 89)

Using interviews obtained during fieldwork in Palestine in 1992, 1993, and 1994, and employing historical and archival records, I argue that Palestinian feminist discourses were shaped and influenced by the sociopolitical context in which Palestinian women acted and with which they interacted. (Abdulhadi 1998: 649)

This article reports on women's experiences of breastfeeding in public as revealed through in-depth interviews with 51 women. (Stearns 1999: 308)


In the analysis that follows, racial and gender inequality in employment and retirement will be analyzed, using a national sample of persons who began receiving Social Security Old Age benefits in 1980-81. (Hogan and Perrucci 1998: 528)


Drawing from interviews with female crack dealers, this paper explores the techniques they use to avoid arrest. (Jacobs and Miller 1998: 550)

How to Design a Research Project

You've now seen some of the options available to social researchers in designing projects. I know there are a lot of components, and the relationships among them may not be totally clear, so here's a way of pulling them together. Let's assume you were to undertake research. Where would you start? Then, where would you go?

Although research design occurs at the beginning of a research project, it involves all the steps of the subsequent project. This discussion, then, provides both guidance on how to start a research project and an overview of the topics that follow in later chapters of this book.

Figure 4-5 presents a schematic view of the social research process. I present this view reluctantly, because it may suggest more of a step-by-step order to research than actual practice bears out. Nonetheless, this idealized overview of the process



FIGURE 4-5 The Research Process. Here are some of the key elements that we'll be examining throughout this book: the pieces that make up the whole of social research. (The diagram links interests, ideas, and theories to choices among methods, including survey research, field research, content analysis, existing data research, comparative research, and evaluation research, and to decisions about whom to observe and draw conclusions about, how to measure the variables under study, and how to transform the data collected into a form appropriate to manipulation and analysis.)

provides a context for the specific details of particular components of social research. Essentially, it is another and more detailed picture of the scientific process presented in Chapter 2.

At the top of the diagram are interests, ideas, and theories, the possible beginning points for a line of research. The letters (A, B, X, Y, and so forth) represent variables or concepts such as prejudice or alienation. Thus, you might have a general interest in finding out what causes some people to be more prejudiced than others, or you might want to know some of the consequences of alienation. Alternatively, your inquiry might begin with a specific idea about the way things are. For example, you might have the idea that working on an assembly line causes alienation. The question marks in the diagram indicate that you aren't sure things are the way you suspect they are; that's why you're doing the research.

Notice that a theory is represented as a set of complex relationships among several variables. The double arrows between "interest," "idea," and "theory" suggest that there is often a movement back and forth across these several possible beginnings. An initial interest may lead to the formulation of an idea, which may be fit into a larger theory, and the theory may produce new ideas and create new interests. Any or all of these three may suggest the need for empirical research. The purpose of such research can be to explore an interest, test a specific idea, or validate a complex theory. Whatever the purpose, the researcher needs to make a variety of decisions, as indicated in the remainder of the diagram.

To make this discussion more concrete, let's take a specific research example. Suppose you're concerned with the issue of abortion and have a special interest in learning why some college students support abortion rights and others oppose them.
Going a step further, let's say you've formed the impression that students in the humanities and social sciences seem generally more inclined to support the idea of abortion rights than those in the natural sciences do. (That kind of thinking often leads people to design and conduct social research.)


So, where do you start? You have an idea you want to pursue, one that involves abortion attitudes and choice of college major. In terms of the options we've discussed in this chapter, you probably have both descriptive and explanatory interests, but you might decide you only want to explore the issue. You might wonder what sorts of attitudes students with different majors have about abortion (exploratory), what percentage of the student body supports a woman's right to an abortion (descriptive), or what causes some to support it and others to oppose it (explanatory). The units of analysis in this case would be individuals: college students. But we're jumping the gun. As you can see, even before we've "started," we've started. The reciprocal processes described in Figure 4-5 begin even before you've made a commitment to a project. Let's look more formally at the various steps, then, keeping this reciprocal motion in mind.

Getting Started

At the outset of your project, then, your aim would probably be exploratory. At this point, you might choose among several possible activities in pursuing your interest in student attitudes about abortion rights. To begin with, you might want to read something about the issue. If you have a hunch that attitudes are somehow related to college major, you might find out what other researchers may have written about that. Appendix A of this book will help you make use of your college library. In addition, you would probably talk to some people who support abortion rights and some who don't. You might attend meetings of abortion-related groups. All these activities could help prepare you to handle the various decisions of research design we're about to examine.

Before designing your study, you must define the purpose of your project. What kind of study will you undertake-exploratory, descriptive, explanatory? Do you plan to write a research paper to satisfy a course or thesis requirement? Is your purpose to gain information that will support you in arguing for or against abortion rights? Do you want to write an article for the campus newspaper or an


academic journal? In reviewing the previous research literature regarding abortion rights, you should note the design decisions other researchers have made, always asking whether the same decisions would satisfy your purpose.

Usually, your purpose for undertaking research can be expressed as a report. A good first step in designing your project is to outline such a report (see Chapter 17 for help on this). Although your final report may not look much like your initial image of it, this exercise will help you figure out which research designs are most appropriate. During this step, clearly describe the kinds of statements you want to make when the research is complete. Here are some examples of such statements: "Students frequently mentioned abortion rights in the context of discussing social issues that concerned them personally." "X percent of State U students favor a woman's right to choose an abortion." "Engineers are (more/less) likely than sociologists to favor abortion rights."

Conceptualization

Once you have a well-defined purpose and a clear description of the kinds of outcomes you want to achieve, you can proceed to the next step in the design of your study-conceptualization. We often talk pretty casually about social science concepts such as prejudice, alienation, religiosity, and liberalism, but it's necessary to clarify what we mean by these concepts in order to draw meaningful conclusions about them. Chapter 5 examines this process of conceptualization in depth. For now, let's see what it might involve in the case of our hypothetical example.

If you're going to study how college students feel about abortion and why, the first thing you'll have to specify is what you mean by "the right to an abortion." Because support for abortion probably varies according to the circumstances, you'll want to pay attention to the different conditions under which people might approve or disapprove of abortion: for example, when the woman's life is in danger, in the case of rape or incest, or simply as a matter of personal choice.

Similarly, you'll need to specify exact meanings for all the other concepts you plan to study. If you want to study the relationship of opinion about abortion to college major, you'll have to decide whether you want to consider only officially declared majors or to include students' intentions as well. What will you do with those who have no major? In surveys and experiments, you need to specify such concepts in advance. In less tightly structured research, such as open-ended interviews, an important part of the research may involve the discovery of different dimensions, aspects, or nuances of concepts. In such cases, the research itself may uncover and report aspects of social life that were not evident at the outset of the project.

Choice of Research Method

As we'll discuss in Part 3, each research method has its strengths and weaknesses, and certain concepts are more appropriately studied through some methods than through others. In our study of attitudes toward abortion rights, a survey might be the most appropriate method: either interviewing students or asking them to fill out a questionnaire. Surveys are particularly well suited to the study of public opinion.

This is not to say that you couldn't make good use of the other methods presented in Part 3. For example, you might use the method of content analysis to examine letters to the editor and analyze the different images of abortion that letter writers have. Field research would provide an avenue to understanding how people interact with one another regarding the issue of abortion, how they discuss it, and how they change their minds. Other research methods introduced in Part 3 could also be used in studying this topic. Usually, the best study design uses more than one research method, taking advantage of their different strengths. If you look back at the brief examples of actual studies at the end of the preceding section, you'll see several instances where the researchers used many methods in a single study.

Operationalization

Once you've specified the concepts to be studied and chosen a research method, the next step is operationalization, or deciding on your measurement techniques (discussed further in Chapters 5 and 6). The meaning of variables in a study is determined in part by how they are measured. Part of the task here is deciding how the desired data will be collected: direct observation, review of official documents, a questionnaire, or some other technique.

If you decided to use a survey to study attitudes toward abortion rights, part of operationalization is determining the wording of questionnaire items. For example, you might operationalize your main variable by asking respondents whether they would approve of a woman's right to have an abortion under each of the conditions you've conceptualized: in the case of rape or incest, if her life were threatened by the pregnancy, and so forth. You'd design the questionnaire so that it asked respondents to express approval or disapproval for each situation. Similarly, you would specify exactly how respondents would indicate their college major, as well as what choices to provide those who have not declared a major.
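To make this step concrete, the conceptualized conditions and their planned response codes can be written down before any data are collected. Here is a minimal sketch in Python; the item wordings, variable names, and numeric codes are hypothetical illustrations, not taken from an actual questionnaire:

```python
# Hypothetical operationalization of "support for abortion rights":
# one approve/disapprove item per conceptualized circumstance.
ABORTION_ITEMS = {
    "rape_incest": "Should a woman be able to obtain a legal abortion "
                   "if the pregnancy resulted from rape or incest?",
    "life_danger": "...if the pregnancy seriously endangered her life?",
    "personal_choice": "...for any reason, as a matter of personal choice?",
}

# Response codes are decided in advance, as structured surveys require.
RESPONSE_CODES = {1: "Approve", 2: "Disapprove", 8: "Don't know", 9: "No answer"}

def is_valid_response(code):
    """A recorded answer is usable only if it matches a planned code."""
    return code in RESPONSE_CODES

print(is_valid_response(1))   # True
print(is_valid_response(3))   # False
```

Writing the items down this way forces the decisions the text describes: which circumstances count, and exactly how approval and disapproval will be recorded.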


Population and Sampling

In addition to refining concepts and measurements, you must decide whom or what to study. The population for a study is that group (usually of people) about whom we want to draw conclusions. We're almost never able to study all the members of the population that interests us, however, and we can never make every possible observation of them. In every case, then, we select a sample from among the data that might be collected and studied. The sampling of information, of course, occurs in everyday life and often produces biased observations. (Recall the discussion of "selective observation" in Chapter 1.) Social researchers are more deliberate in their sampling of what will be observed. Chapter 7 describes methods for selecting samples that adequately reflect the whole population that interests us.

Notice in Figure 4-5 that decisions about population and sampling are related to decisions about the research method to be used. Whereas probability sampling techniques would be relevant to a large-scale survey or a content analysis, a field researcher might need to select only those informants who will yield a balanced picture of the situation under study, and an experimenter might assign subjects to experimental and control groups in a manner that creates comparability.

In your hypothetical study of abortion attitudes, the relevant population would be the student population of your college. As you'll discover in Chapter 7, however, selecting a sample will require you to get more specific than that. Will you include part-time as well as full-time students? Only degree candidates or everyone? International students as well as U.S. citizens? Undergraduates, graduate students, or both? There are many such questions, each of which must be answered in terms of your research purpose. If your purpose is to predict how students would vote in a local referendum on abortion, you might want to limit your population to those eligible and likely to vote.

Observations

Having decided what to study among whom by what method, you're now ready to make observations-to collect empirical data. The chapters of Part 3, which describe the various research methods, give the different observation techniques appropriate to each. To conduct a survey on abortion, you might want to print questionnaires and mail them to a sample selected from the student body. Alternatively, you could arrange to have a team of interviewers conduct the survey over the telephone. The relative advantages and disadvantages of these and other possibilities are discussed in Chapter 9.
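The selection step described above, drawing a probability sample from a defined student population, can be sketched with Python's standard library. The roster and sample size here are invented for illustration; a real study would draw the roster from the registrar's records after settling the population questions just raised:

```python
import random

# Invented roster standing in for the registrar's list, compiled after
# deciding who belongs to the population (full-time vs. part-time,
# undergraduates vs. graduates, and so on).
population = [f"student_{i:04d}" for i in range(5000)]

random.seed(42)  # fixed seed so the sketch is reproducible
sample = random.sample(population, k=400)  # simple random sample, n = 400

print(len(sample))        # 400
print(len(set(sample)))   # 400: sampling is without replacement
```

The point of the sketch is the logic, not the code: every member of the defined population has an equal chance of selection, which is what makes the sample's description of the population defensible.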

Data Processing

Depending on the research method chosen, you'll have amassed a volume of observations in a form that probably isn't immediately interpretable. If



you've spent a month observing a street-corner gang firsthand, you'll now have enough field notes to fill a book. In a historical study of ethnic diversity at your school, you may have amassed volumes of official documents, interviews with administrators and others, and so forth. Chapters 13 and 14 describe some of the ways social scientific data are processed or transformed for qualitative or quantitative analysis. In the case of a survey, the "raw" observations are typically in the form of questionnaires with boxes checked, answers written in spaces, and the like. The data-processing phase of a survey typically involves the classification (coding) of written-in answers and the transfer of all information to a computer.
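In the survey case, the coding of written-in answers described above amounts to mapping free-text responses onto predefined numeric categories. This sketch uses invented categories and answers; a real codebook would be worked out during conceptualization:

```python
# Hypothetical coding scheme for a written-in "college major" answer.
MAJOR_CODES = {
    "sociology": 1, "psychology": 1, "history": 1,   # humanities/social sciences
    "biology": 2, "physics": 2, "chemistry": 2,       # natural sciences
}
UNDECLARED = 0  # how to treat non-majors was decided at conceptualization

def code_major(raw_answer):
    """Classify a written-in answer; unrecognized answers get the default
    code and would be flagged for a human coder's review in practice."""
    return MAJOR_CODES.get(raw_answer.strip().lower(), UNDECLARED)

raw = ["Sociology", "  PHYSICS ", "undeclared"]
print([code_major(a) for a in raw])  # [1, 2, 0]
```

Once answers are coded this way, the questionnaires' checked boxes and written-in words become data a computer can tabulate.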

Analysis

Once the collected data are in a suitable form, you're ready to interpret them for the purpose of drawing conclusions that reflect the interests, ideas, and theories that initiated the inquiry. Chapters 13 and 14 describe a few of the many options available to you in analyzing data. In Figure 4-5, notice that the results of your analyses feed back into your initial interests, ideas, and theories. Often this feedback represents the beginning of another cycle of inquiry.

In the survey of student attitudes about abortion rights, the analysis phase would pursue both descriptive and explanatory aims. You might begin by calculating the percentages of students who favored or opposed each of the several different versions of abortion rights. Taken together, these several percentages would provide a good picture of student opinion on the issue.

Moving beyond simple description, you might describe the opinions of subsets of the student body, such as different college majors. Provided that your design called for gathering other information about respondents, you could also look at men versus women; freshmen, sophomores, juniors, seniors, and graduate students; or other categories that you've included. The description of subgroups could then lead you into an explanatory analysis.
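The two moves described above, an overall percentage first and then the same percentage within subgroups, can be sketched with a handful of invented responses; the data and group labels are hypothetical:

```python
from collections import Counter

# Invented coded responses: (major_group, approves_of_personal_choice)
responses = [
    ("social_science", True), ("social_science", True),
    ("social_science", False), ("natural_science", True),
    ("natural_science", False), ("natural_science", False),
]

# Descriptive aim: the overall percentage approving.
overall = 100 * sum(ok for _, ok in responses) / len(responses)
print(f"{overall:.0f}% approve overall")   # 50% approve overall

# First step toward explanation: approval broken down by subgroup.
totals, approvals = Counter(), Counter()
for group, ok in responses:
    totals[group] += 1
    approvals[group] += ok
for group in sorted(totals):
    pct = 100 * approvals[group] / totals[group]
    print(f"{group}: {pct:.0f}% approve")
```

A difference between the subgroup percentages is exactly the kind of observation that would lead you from description into an explanatory analysis of why the groups differ.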

Application

The final stage of the research process involves the uses made of the research you've conducted and the conclusions you've reached. To start, you'll probably want to communicate your findings so that others will know what you've learned. It may be appropriate to prepare-and even publish-a written report. Perhaps you'll make oral presentations, such as papers delivered to professional and scientific meetings. Other students would also be interested in hearing what you've learned about them.

You may want to go beyond simply reporting what you've learned to discussing the implications of your findings. Do they say anything about actions that might be taken in support of policy goals? Both the proponents and the opponents of abortion rights would be interested.

Finally, be sure to consider what your research suggests in regard to further research on your subject. What mistakes should be corrected in future studies? What avenues-opened up slightly in your study-should be pursued further?

Research Design in Review

As this overview shows, research design involves a set of decisions regarding what topic is to be studied among what population with what research methods for what purpose. Although you'll want to consider many ways of studying a subject-and use your imagination as well as your knowledge of a variety of methods-research design is the process of focusing your perspective for the purposes of a particular study.

If you're doing a research project for one of your courses, many aspects of research design may be specified for you in advance, including the method (such as an experiment) or the topic (as in a course on a particular subject, such as prejudice). The following summary assumes that you're free to choose both your topic and your research strategy. In designing a research project, you'll find it useful to begin by assessing three things: your interests, your abilities, and the available resources.

Each of these considerations will suggest a large number of possible studies. Simulate the beginning of a somewhat conventional research project: Ask yourself what you're interested in understanding. Surely you have several questions about social behavior and attitudes. Why are some people politically liberal and others politically conservative? Why are some people more religious than others? Why do people join militia groups? Do colleges and universities still discriminate against minority faculty members? Why would a woman stay in an abusive relationship? Spend some time thinking about the kinds of questions that interest and concern you.

Once you have a few questions you'd be interested in answering for yourself, think about the kind of information needed to answer them. What units of analysis would provide the most relevant information: college students, voters, cities, or corporations? This question will probably be inseparable in your thoughts from the question of research topics. Then ask which aspects of the units of analysis would provide the information you need in order to answer your research question.

Once you have some ideas about the kind of information relevant to your purpose, ask yourself how you might go about getting that information. Are the relevant data likely to be already available somewhere (say, in a government publication), or would you have to collect them yourself? If you think you would have to collect them, how would you go about doing it? Would you need to survey a large number of people or interview a few people in depth? Could you learn what you need to know by attending meetings of certain groups? Could you glean the data you need from books in the library?

As you answer these questions, you'll find yourself well into the process of research design. Keep in mind your own research abilities and the resources available to you.
There's little point in designing a perfect study that you can't actually carry out. You may want to try a research method you haven't used before so you can learn from it, but be careful not to put yourself at too great a disadvantage.


Once you have a general idea of what you want to study and how, carefully review previous research in journals and books to see how other researchers have addressed the topic and what they have learned about it. Your review of the literature may lead you to revise your research design: Perhaps you'll decide to use a previous researcher's method or even replicate an earlier study. A standard procedure in the physical sciences, the independent replication of research projects is just as important in the social sciences, although social researchers tend to overlook that. Or, you might want to go beyond replication and study some aspect of the topic that you feel previous researchers have overlooked.

Here's another approach you might take. Suppose a topic has been studied previously using field research methods. Can you design an experiment that would test the findings those earlier researchers produced? Or, can you think of existing statistics that could be used to test their conclusions? Did a mass survey yield results that you'd like to explore in greater detail through on-the-spot observations and in-depth interviews? The use of several different research methods to test the same finding is sometimes called triangulation, and you should always keep it in mind as a valuable research strategy. Because each research method has particular strengths and weaknesses, there is always a danger that research findings will reflect, at least in part, the method of inquiry. In the best of all worlds, your own research design should bring more than one research method to bear on the topic.

The Research Proposal

Quite often, in the design of a research project, you'll have to lay out the details of your plan for someone else's review and/or approval. In the case of a course project, for example, your instructor might very well want to see a "proposal" before you set off to work. Later in your career, if you wanted to undertake a major project, you might need to obtain funding from a foundation or government agency, which would most definitely want a



detailed proposal that describes how you would spend their money. You may respond to a Request for Proposals (RFP), which both public and private agencies often circulate in search of someone to do research for them. This chapter concludes with a brief discussion of how you might prepare a research proposal. This will give you one more overview of the whole research process that the rest of this book details.

Elements of a Research Proposal

Although some funding agencies (or your instructor, for that matter) may have specific requirements for the elements or structure of a research proposal, here are some basic elements you should include.

Problem or Objective

What exactly do you want to study? Why is it worth studying? Does the proposed study have practical significance? Does it contribute to the construction of social theories?

Literature Review

What have others said about this topic? What theories address it and what do they say? What previous research exists? Are there consistent findings, or do past studies disagree? Are there flaws in the body of existing research that you think you can remedy? Chapter 17 has a lengthier discussion of this topic. You'll find that reading social science research reports requires special skills. If you need to undertake a review of the literature at this point in your course, you may want to skip ahead to Chapter 17. It will familiarize you with the different types of research literature, how to find what you want, and how to read it. There is a special discussion of how to use electronic resources online and how to avoid being misled by information on the Internet.

In part, your review of the literature will be shaped by the data-collection method(s) you intend to use in your study. Reviewing the designs of previous studies using that same technique can give you a head start in planning your own study.

At the same time, you should focus your search on your research topic, regardless of the methods other researchers have used. So, if you're planning field research on, say, interracial marriages, you might gain some useful insights from the findings of surveys on the topic; further, past field research on interracial marriages could be invaluable in your designing a survey on the topic.

Because the literature review will appear early in your research proposal, you should write it with an eye to introducing the reader to the topic you will address, laying out in a logical manner what has already been learned on the topic by past researchers, then leading up to the holes or loose ends in our knowledge of the topic, which you propose to remedy. Or, a little differently, your review of the literature may point to inconsistencies or disagreements to be found among the existing research findings. In that case, your proposed research will aim to resolve the ambiguities that plague us. I don't know about you, but I'm already excited about the research you're proposing to undertake.

Subjects for Study

Whom or what will you study in order to collect data? Identify the subjects in general, theoretical terms; in specific, more concrete terms, identify who is available for study and how you'll reach them. Will it be appropriate to select a sample? If so, how will you do that? If there is any possibility that your research will affect those you study, how will you insure that the research does not harm them?

Beyond these general questions, the specific research method you'll use will further specify the matter. If you're planning to undertake an experiment, a survey, or field research, for example, the techniques for subject selection will vary quite a bit. Happily, Chapter 7 of this book discusses sampling techniques for both qualitative and quantitative studies.

Measurement

What are the key variables in your study? How will you define and measure them? Do your definitions and measurement methods duplicate or differ from

those of previous research on this topic? If you have already developed your measurement device (a questionnaire, for example) or will be using something previously developed by others, it might be appropriate to include a copy in an appendix to your proposal.

Data-Collection Methods

How will you actually collect the data for your study? Will you conduct an experiment or a survey? Will you undertake field research or will you focus on the reanalysis of statistics already created by others? Perhaps you'll use more than one method.


Analysis

Indicate the kind of analysis you plan to conduct. Spell out the purpose and logic of your analysis. Are you interested in precise description? Do you intend to explain why things are the way they are? Do you plan to account for variations in some quality: for example, why some students are more liberal than others? What possible explanatory variables will your analysis consider, and how will you know if you've explained variations adequately?

Schedule

It's often appropriate to provide a schedule for the various stages of research. Even if you don't do this for the proposal, do it for yourself. Unless you have a timeline for accomplishing the several stages of research and keeping track of how you're doing, you may end up in trouble.

Budget

When you ask someone to cover the costs of your research, you need to provide a budget that specifies where the money will go. Large, expensive projects include budgetary categories such as personnel, equipment, supplies, telephones, and postage. Even for a project you'll pay for yourself, it's a good idea to spend some time anticipating expenses: office supplies, photocopying, CD-ROMs, telephone calls, transportation, and so on.

As you can see, if you're interested in conducting a social research project, it's a good idea to prepare a research proposal for your own purposes, even if you aren't required to do so by your instructor or a funding agency. If you're going to invest your time and energy in such a project, you should do what you can to insure a return on that investment.

Now that you've had a broad overview of social research, let's move on to the remaining chapters in this book and learn exactly how to design and execute each specific step. If you've found a research topic that really interests you, you'll want to keep it in mind as you see how you might go about studying it.


Main Points

Introduction

• Any research design requires researchers to specify as clearly as possible what they want to find out and then determine the best way to do it.

Three Purposes of Research

• The principal purposes of social research include exploration, description, and explanation. Research studies often combine more than one purpose.

• Exploration is the attempt to develop an initial, rough understanding of some phenomenon.

• Description is the precise measurement and reporting of the characteristics of some population or phenomenon under study.

• Explanation is the discovery and reporting of relationships among different aspects of the phenomenon under study. Whereas descriptive studies answer the question "What's so?" explanatory ones tend to answer the question "Why?"

The Logic of Nomothetic Explanation

• Both idiographic and nomothetic models of explanation rest on the idea of causation.



The idiographic model aims at a complete understanding of a particular phenomenon, using all relevant causal factorso The nomothetic model aims at a general understanding-not necessarily complete-of a class of phenomena, using a small number of relevant causal factorso @

There are three basic criteria for establishing causation in nomothetic analyses: (1) The variables must be empirically associated, or correlated, (2) the causal variable must occur earlier in time than the variable it is said to affect, and (3) the observed effect cannot be explained as the effect of a different variable.

Additional Reading @



Mere association, or correlation, does not in itself establish causationo A spurious causal relationship is an association that in reality is caused by one or more other variables. @

Units of Analysis @


Units of analysis are the people or things whose characteristics social researchers observe, describe, and explaino Typically, the unit of analysis in social research is the individual person, but it may also be a social group, a formal organization, a social interaction, a social artifact, or some other phenomenon such as a lifestyle or a type of social interaction. The ecological fallacy involves conclusions dravvn from the analysis of groups (e.go, corporations) that are then assumed to apply to individuals (e.g., the employees of corporations).


Reductionism is the attempt to understand a complex phenomenon in terms of a narrow set of concepts, such as attempting to explain the American Revolution solely in terms of economics (or political idealism or psychology).

Research design starts with an initial interest, idea, or theoretical ex-pectation and proceeds through a series of interrelated steps to narrow the focus of the study so that concepts, methods, and procedures are well definedo A good research plan accounts for all these steps in advance. At the outset, a researcher specifies the meaning of the concepts or variables to be studied (conceptualization), chooses a research method or methods (e.go, experiments versus surveys), and specifies the population to be studied and, if applicable, how it ,viII be sampledo To operationalize the concepts to be studied, the researcher states precisely how variables in the study will be measuredo Research then proceeds through observation, data processing, analysis, and application, such as reporting the results and assessing their implicationso

A research proposal provides a preview of why a study will be undertaken and how it vvill be conducted. A research project is often required to get permission or necessary resourceso Even when not required, a proposal is a useful device for planning.

The Time Dimension @

Research into processes that occur over time presents social challenges that can be addressed through cross-sectional studies or longitudinal studies.

is introduced, as well as in the comprehensive glossary at the back of the booko cohort study correlation cross-sectional study ecological fallacy longitudinal study panel study

10 One example in this chapter suggested that political orientations cause attitudes toward legalizing marijuana. Can you make an argument that the time order is just the opposite of what was assumed?

The following terms are defined in context in the chapter and at the bottom of the page where the term

The analysis of community opposition to group homes for the mentally handicapped 00. indicates that deteriorating neighborhoods are most likely to organize in opposition, but that upper-middle class neighborhoods are most likely to enjoy private access to local officialso (Graham and Hogan 1990: 513)


Some analysts during the 1960s predicted that the rise of economic ambition and political militancy among blacks would foster discontent with the "otherworldly" black mainline churcheso (Ellison and Sherkat 1990: 551)

IL This analysis explores whether propositions and empirical findings of contemporary theories of organizations directly apply to both private product producing organizations (PPOs) and public human service organizations (PSOs)o (Schiflett and ley 1990: 569)

2. Here are some examples of real research topics. For each one, can you name the unit of analysis? (The answers are at the end of this chapteL) a.

Women watch TV more than men because they are likely to work fewer hours outside the home than meno 0 0 Black people watch an average of approximately three-quarters of an hour more television per day than white people. (Hughes 1980: 290)


Of the 130 incorporated UoSo cities with more than 100,000 inhabitants in 1960, 126 had at least two short-term nonproprietary general hospitals accredited by the American Hospital Associationo (Turk 1980: 317)


The early TM [transcendental meditation] organizations were small and informal. The Los Angeles group, begun in June 1959, met at a member's house where, incidentally, Maharishi was living. (Johnston 1980: 337)

d.. However, it appears that the nursing staffs exercise strong influence over 0 0 0a decision to change the nursing care systemo . 00 Conversely, among those decisions dominated by the administration and the medical staffs 00 0 (Comstock 1980: 77) e..


reductionism social artifact sociobiology spurious relationship trend study units of analysis



The Research Proposal @


In longitudinal studies, observations are made at many tin1es. Such observations may be made of samples drawn from general populations (trend studies), samples drawn from more specific subpopulations (cohort studies), or the same sample of people each time (panel studies)o

How to Design a Research Project

Necessary and Sufficient Causes @

Cross-sectional studies are based on observations made at one tin1eo Although such studies are limited by this characteristic, researchers can sometimes make inferences about processes that occur over timeo

Though 667,000 out of 2 million farmers in the United States are women, women historically have not been viewed as farmers, but rather, as the farmer's wifeo (Votaw 1979: 8)




This paper examines variations in job title structures across work roles. Analyzing 3,173 job titles in the California civil service system in 1985, we investigate how and why lines of work vary in the proliferation of job categories that differentiate ranks, functions, or particular organizational locations. (Strang and Baron 1990: 479)

3. Review the logic of spuriousness. Can you think up an example where an observed relationship between two variables could actually be explained away by a third variable?

4. Using InfoTrac College Edition or printed journals in the library, locate a research project involving a panel study. Describe the nature of the study design and its primary findings.


Bart, Pauline, and Linda Frankel. 1986. The Student Sociologist's Handbook. Morristown, NJ: General Learning Press. A handy little reference book to help you get started on a research project. Written from the standpoint of a student term paper, this volume offers a particularly good guide to the periodical literature of the social sciences that's available in a good library.

Casley, D. J., and D. A. Lury. 1987. Data Collection in Developing Countries. Oxford: Clarendon Press.



Chapter 4: Research Design

This book discusses the special problems of research in the developing world.

Cooper, Harris M. 1989. Integrating Research: A Guide for Literature Reviews. Newbury Park, CA: Sage. The author leads you through each step in the literature review process.

Hunt, Morton. 1985. Profiles of Social Research: The Scientific Study of Human Interactions. New York: Basic Books. An engaging and informative series of project biographies: James Coleman's study of segregated schools is presented, as well as several other major projects that illustrate the elements of social research in actual practice.

Iversen, Gudmund R. 1991. Contextual Analysis. Newbury Park, CA: Sage. Contextual analysis examines the impact of socioenvironmental factors on individual behavior. Durkheim's study of suicide offers a good example of this, identifying social contexts that affect the likelihood of self-destruction.

Maxwell, Joseph A. 1996. Qualitative Research Design: An Interactive Approach. Newbury Park, CA: Sage. Maxwell covers many of the same topics that this chapter does but with attention devoted specifically to qualitative research projects.

Menard, Scott. 1991. Longitudinal Research. Newbury Park, CA: Sage. Beginning by explaining why researchers conduct longitudinal research, the author goes on to detail a variety of study designs as well as suggestions for the analysis of longitudinal data.

Miller, Delbert. 1991. Handbook of Research Design and Social Measurement. Newbury Park, CA: Sage. A useful reference book for introducing or reviewing numerous issues involved in design and measurement. In addition, the book contains a wealth of practical information relating to foundations, journals, and professional associations.


See the booklet that accompanies your text for exercises using SPSS (Statistical Package for the Social Sciences). There are exercises offered for each chapter, and you'll also find a detailed primer on using SPSS.

Online Study Resources

SociologyNow™: Research Methods



Before you do your final review of the chapter, take the SociologyNow: Research Methods diagnostic quiz to help identify the areas on which you should concentrate. You'll find information on this online tool, as well as instructions on how to access all of its great resources, in the front of the book. As you review, take advantage of the SociologyNow: Research Methods customized study plan, based on your quiz results. Use this study plan with its interactive exercises and other resources to master the material. When you're finished with your review, take the posttest to confirm that you're ready to move on to the next chapter.

WEBSITE FOR THE PRACTICE OF SOCIAL RESEARCH, 11TH EDITION
Go to your book's website at http://sociology for tools to aid you in studying for your exams. You'll find Tutorial Quizzes with feedback, Internet Exercises, Flashcards, and Chapter Tutorials, as well as Extended Projects, InfoTrac College Edition search terms, Social Research in Cyberspace, GSS Data, Web Links, and primers for using various data-analysis software such as SPSS and NVivo.

WEB LINKS FOR THIS CHAPTER
Please realize that the Internet is an evolving entity, subject to change. Nevertheless, these few websites should be fairly stable. Also, check your book's website for even more Web Links.

The Internet Public Library, Social Sciences Resources
http://www.ipl.org/ref/RR/static/soc00.00.00.html
This site, along with its numerous hotlinks, provides a broad view of the kinds of research topics explored by social researchers in many disciplines.

University of Calgary, Beginner's Guide to the Research Proposal
res_prop.htm

As the name suggests, this site will walk you through the process of preparing a research proposal.


Anthony W. Heath, "The Proposal in Qualitative Research"
This piece, reprinted from The Qualitative Report 3 (no. 1, March 1997), provides another guide to proposal writing, this time specifically for qualitative research projects.

Transcendental meditation organizations (groups)
Nursing staffs (groups)
Farmers (individuals)
Neighborhoods (groups)
Blacks (individuals)
h. Service and production organizations (formal organizations)
i. Men and women, black and white people (individuals)
Incorporated U.S. cities (groups)
Job titles (artifacts)

Measuring Anything That Exists

Conceptualization, Operationalization, and Measurement

Introduction

This chapter and the next deal with how researchers move from a general idea about what they want to study to effective and well-defined measurements in the real world. This chapter discusses the interrelated processes of conceptualization, operationalization, and measurement. Chapter 6 builds on this foundation to discuss types of measurements that are more complex.

We begin this chapter by confronting the hidden concern people sometimes have about whether it's truly possible to measure the stuff of life: love, hate, prejudice, religiosity, radicalism, alienation. The answer is yes, but it will take a few pages to see how. Once we establish that researchers can measure anything that exists, we'll turn to the steps involved in doing just that.

Introduction
Measuring Anything That Exists
Conceptions, Concepts, and Reality
Concepts as Constructs
Conceptualization
Indicators and Dimensions
The Interchangeability of Indicators
Real, Nominal, and Operational Definitions
Creating Conceptual Order
An Example of Conceptualization: The Concept of Anomie
Definitions in Descriptive and Explanatory Studies
Operationalization Choices
Range of Variation
Variations between the Extremes
A Note on Dimensions
Defining Variables and Attributes
Levels of Measurement
Single or Multiple Indicators
Some Illustrations of Operationalization Choices
Operationalization Goes On and On
Criteria of Measurement Quality
Precision and Accuracy
Reliability
Validity
Who Decides What's Valid?
Tension between Reliability and Validity

SociologyNow™: Research Methods

Use this online tool to help you make the grade on your next exam. After reading this chapter, go to the "Online Study Resources" at the end of the chapter for instructions on how to benefit from SociologyNow: Research Methods.

Earlier in this book, I said that one of the two pillars of science is observation. Because this word can suggest a casual, passive activity, scientists often use the term measurement instead, meaning careful, deliberate observations of the real world for the purpose of describing objects and events in terms of the attributes composing a variable.

You may have some reservations about the ability of science to measure the really important aspects of human social existence. If you've read research reports dealing with something like liberalism or religion or prejudice, you may have been dissatisfied with the way the researchers measured whatever they were studying. You may have felt that they were too superficial, that they missed the aspects that really matter most. Maybe they measured religiosity as the number of times a person went to religious services, or maybe they measured liberalism by how people voted in a single election. Your dissatisfaction would surely have increased if


you had found yourself being misclassified by the measurement system. Your feeling of dissatisfaction reflects an important fact about social research: Most of the variables we want to study don't actually exist in the way that rocks exist. Indeed, they are made up. Moreover, they seldom have a single, unambiguous meaning.

To see what I mean, suppose we want to study political party affiliation. To measure this variable, we might consult the list of registered voters to note whether the people we were studying were registered as Democrats or Republicans and take that as a measure of their party affiliation. But we could also simply ask someone what party they identify with and take their response as our measure. Notice that these two different measurement possibilities reflect somewhat different definitions of "political party affiliation." They might even produce different results: Someone may have registered as a Democrat years ago but gravitated more and more toward a Republican philosophy over time. Or someone who is registered with neither political party may, when asked, say she is affiliated with the one she feels the most kinship with.

Similar points apply to religious affiliation. Sometimes this variable refers to official membership in a particular church, temple, mosque, and so forth; other times it simply means whatever religion, if any, you identify yourself with. Perhaps to you it means something else, such as attendance at religious services.

The truth is that neither "party affiliation" nor "religious affiliation" has any real meaning, if by "real" we mean corresponding to some objective aspect of reality. These variables do not exist in nature. They are merely terms we've made up and assigned specific meanings to for some purpose, such as doing social research.
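To make the point concrete, here is a minimal sketch of how the two operationalizations amount to two different measurement functions that can classify the very same person differently. All data and field names here are invented for illustration, not drawn from any real survey.

```python
# Hypothetical sketch: two operationalizations of "political party
# affiliation". The field names and the example person are invented.

def affiliation_by_registration(person):
    """Working definition 1: the party recorded on the voter rolls."""
    return person["registered_party"]

def affiliation_by_self_report(person):
    """Working definition 2: the party the person says they identify with."""
    return person["self_identified_party"]

# Someone who registered as a Democrat years ago but has since
# gravitated toward a Republican philosophy:
voter = {"registered_party": "Democrat",
         "self_identified_party": "Republican"}

print(affiliation_by_registration(voter))  # Democrat
print(affiliation_by_self_report(voter))   # Republican
```

Neither function is the "right" one; each simply implements a different working definition of the same made-up concept.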
But, you might object, "political affiliation" and "religious affiliation"-and a host of other things social researchers are interested in, such as prejudice or compassion-have some reality. After all, we make statements about them, such as "In


Chapter 5: Conceptualization, Operationalization, and Measurement

Happytown, 55 percent of the adults affiliate with the Republican Party, and 45 percent of them are Episcopalians. Overall, people in Happytown are low in prejudice and high in compassion." Even ordinary people, not just social researchers, have been known to make statements like that. If these things don't exist in reality, what is it that we're measuring and talking about?

What indeed? Let's take a closer look by considering a variable of interest to many social researchers (and many other people as well): prejudice.

Conceptions, Concepts, and Reality

As you and I wandered down the road of life, we observed a lot of things and knew they were real through our observations, and we heard reports from other people that seemed real. For example:

We personally heard people say nasty things about minority groups.


We heard people say that women were inferior to men.


We read about African Americans being lynched.


We read that women and minorities earned less for the same work.


We learned about "ethnic cleansing" and wars in which one ethnic group tried to eradicate another.

With additional experience, we noticed something more. People who participated in lynching were also quite likely to call African Americans ugly names. A lot of them, moreover, seemed to want women to "stay in their place." Eventually it dawned on us that these several tendencies often appeared together in the same people and also had something in common. At some point, someone had a bright idea: "Let's use the word prejudiced as a shorthand notation for people like that. We can use the term even if they don't do all those things, as long as they're pretty much like that." Being basically agreeable and interested in efficiency, we agreed to go along with the system. That's where "prejudice" came from. We never

observed it. We just agreed to use it as a shortcut, a name that represents a collection of apparently related phenomena that we've each observed in the course of life. In short, we made it up.

Here's another clue that prejudice isn't something that exists apart from our rough agreement to use the term in a certain way. Each of us develops our own mental image of what the set of real phenomena we've observed represents in general and what these phenomena have in common. When I say the word prejudice, it evokes a mental image in your mind, just as it evokes one in mine. It's as though file drawers in our minds contained thousands of sheets of paper, with each sheet of paper labeled in the upper right-hand corner. A sheet of paper in each of our minds has the term prejudice on it. On your sheet are all the things you've been told about prejudice and everything you've observed that seems to be an example of it. My sheet has what I've been told about it plus all the things I've observed that seem examples of it, and mine isn't the same as yours.

The technical term for those mental images, those sheets of paper in our mental file drawers, is conception. That is, I have a conception of prejudice, and so do you. We can't communicate these mental images directly, so we use the terms written in the upper right-hand corner of our own mental sheets of paper as a way of communicating about our conceptions and the things we observe that are related to those conceptions. These terms make it possible for us to communicate and eventually agree on what we specifically mean by those terms. In social research, the process of coming to an agreement about what terms mean is conceptualization, and the result is called a concept.

Let's take another example of a conception. Suppose that I'm going to meet someone named Pat, whom you already know. I ask you what Pat is like. Now suppose that you've seen Pat help lost children find their parents and put a tiny bird back in its nest. Pat got you to take turkeys to poor families on Thanksgiving and to visit a children's hospital on Christmas. You've seen Pat weep through a movie about a mother overcoming adversities to save and protect her child. As you search through your mental files, you may find all or most of those


phenomena recorded on a single sheet labeled "compassionate." You look over the other entries on the page, and you find they seem to provide an accurate description of Pat. So you say, "Pat is compassionate."

Now I leaf through my own mental file drawer until I find a sheet marked "compassionate." I then look over the things written on my sheet, and I say, "Oh, that's nice." I now feel I know what Pat is like, but my expectations reflect the entries on my file sheet, not yours. Later, when I meet Pat, I happen to find that my own experiences correspond to the entries I have on my "compassionate" file sheet, and I say that you sure were right.

But suppose my observations of Pat contradict the things I have on my file sheet. I tell you that I don't think Pat is very compassionate, and we begin to compare notes. You say, "I once saw Pat weep through a movie about a mother overcoming adversity to save and protect her child." I look at my "compassionate" sheet and can't find anything like that. Looking elsewhere in my file, I locate that sort of phenomenon on a sheet labeled "sentimental." I retort, "That's not compassion. That's just sentimentality." To further strengthen my case, I tell you that I saw Pat refuse to give money to an organization dedicated to saving whales from extinction. "That represents a lack of compassion," I argue. You search through your files and find saving the whales on two sheets, "environmental activism" and "cross-species dating," and you say so. Eventually, we set about comparing the entries we have on our respective sheets labeled "compassionate." We then discover that many of our mental images corresponding to that term differ.

In the big picture, language and communication work only to the extent that you and I have considerable overlap in the kinds of entries we have on our corresponding mental file sheets. The similarities we have on those sheets represent the agreements existing in our society.
As we grow up, we're told approximately the same thing when we're first introduced to a particular term. Dictionaries formalize the agreements our society has about such terms. Each of us, then, shapes his or her mental images to correspond with such agreements. But because all


of us have different experiences and observations, no two people end up with exactly the same set of entries on any sheet in their file systems. If we want to measure "prejudice" or "compassion," we must first stipulate what, exactly, counts as prejudice or compassion for our purposes.

Returning to the assertion made at the outset of this chapter, we can measure anything that's real. We can measure, for example, whether Pat actually puts the little bird back in its nest, visits the hospital on Christmas, weeps at the movie, or refuses to contribute to saving the whales. All of those behaviors exist, so we can measure them. But is Pat really compassionate? We can't answer that question; we can't measure compassion in any objective sense, because compassion doesn't exist in the way that those things I just described exist. Compassion exists only in the form of the agreements we have about how to use the term in communicating about things that are real.

Concepts as Constructs

If you recall the discussions of postmodernism in Chapter 1, you'll recognize that some people would object to the degree of "reality" I've allowed in the preceding comments. Did Pat "really" visit the hospital on Christmas? Does the hospital "really" exist? Does Christmas? Though we aren't going to be radically postmodern in this chapter, I think you'll recognize the importance of an intellectually tough view of what's real and what's not. (When the intellectual going gets tough, the tough become social scientists.)

In this context, Abraham Kaplan (1964) distinguishes three classes of things that scientists measure. The first class is direct observables: those things we can observe rather simply and directly, like the color of an apple or the check mark made in a questionnaire. The second class, indirect observables, require "relatively more subtle, complex, or indirect observations" (1964: 55). We note a person's check mark beside "female" in a questionnaire and have indirectly observed that person's gender. History books or minutes of corporate board meetings provide indirect observations of past social actions. Finally, the third class of observables consists of




constructs: theoretical creations that are based on observations but that cannot be observed directly or indirectly. A good example is intelligence quotient, or IQ. It is constructed mathematically from observations of the answers given to a large number of questions on an IQ test. No one can directly or indirectly observe IQ. It is no more a "real" characteristic of people than is compassion or prejudice. Kaplan (1964: 49) defines concept as a "family of conceptions." A concept is, as Kaplan notes, a construct, something we create. Concepts such as compassion and prejudice are constructs created from your conception of them, my conception of them, and the conceptions of all those who have ever used these terms. They cannot be observed directly or indirectly, because they don't exist. We made them up.

To summarize, concepts are constructs derived by mutual agreement from mental images (conceptions). Our conceptions summarize collections of seemingly related observations and experiences. Although the observations and experiences are real, at least subjectively, conceptions, and the concepts derived from them, are only mental creations. The terms associated with concepts are merely devices created for the purposes of filing and communication. A term such as prejudice is, objectively speaking, only a collection of letters. It has no intrinsic reality beyond that. It has only the meaning we agree to give it.

Usually, however, we fall into the trap of believing that terms for constructs do have intrinsic meaning, that they name real entities in the world. That danger seems to grow stronger when we begin to take terms seriously and attempt to use them precisely. Further, the danger is all the greater in the presence of experts who appear to know more than we do about what the terms really mean: It's easy to yield to authority in such a situation. Once we assume that terms like prejudice and compassion have real meanings, we begin the tortured task of discovering what those real meanings are and what constitutes a genuine measurement of them. Regarding constructs as real is called reification. The reification of concepts in day-to-day life is quite common.

In science, we want to be quite clear about what it is we are actually measuring, but this aim brings a pitfall with it. Settling on the "best" way of measuring a variable in a particular study may imply that we've discovered the "real" meaning of the concept involved. In fact, concepts have no real, true, or objective meanings, only those we agree are best for a particular purpose.

Does this discussion imply that compassion, prejudice, and similar constructs can't be measured? Interestingly, the answer is no. (And a good thing, too, or a lot of us social researcher types would be out of work.) I've said that we can measure anything that's real. Constructs aren't real in the way that trees are real, but they do have another important virtue: They are useful. That is, they help us organize, communicate about, and understand things that are real. They help us make predictions about real things. Some of those predictions even turn out to be true. Constructs can work this way because, although not real or observable in themselves, they have a definite relationship to things that are real and observable. The bridge from direct and indirect observables to useful constructs is the process called conceptualization.

conceptualization The mental process whereby fuzzy and imprecise notions (concepts) are made more specific and precise. So you want to study prejudice. What do you mean by "prejudice"? Are there different kinds of prejudice? What are they?

Conceptualization

As we've seen, day-to-day communication usually occurs through a system of vague and general agreements about the use of terms. Although you and I do not agree completely about the use of the term compassionate, I'm probably safe in assuming that Pat won't pull the wings off flies. A wide range of misunderstandings and conflict, from the interpersonal to the international, is the price we pay for our imprecision, but somehow we muddle through. Science, however, aims at more than muddling; it cannot operate in a context of such imprecision. The process through which we specify what we mean when we use particular terms in research is called conceptualization. Suppose we want to

find out, for example, whether women are more compassionate than men. I suspect many people assume this is the case, but it might be interesting to find out if it's really so. We can't meaningfully study the question, let alone agree on the answer, without some working agreements about the meaning of compassion. They are "working" agreements in the sense that they allow us to work on the question. We don't need to agree or even pretend to agree that a particular specification is ultimately the best one. Conceptualization, then, produces a specific, agreed-on meaning for a concept for the purposes of research. This process of specifying exact meaning involves describing the indicators we'll be using to measure our concept and the different aspects of the concept, called dimensions.

Indicators and Dimensions

Conceptualization gives definite meaning to a concept by specifying one or more indicators of what we have in mind. An indicator is a sign of the presence or absence of the concept we're studying. Here's an example. We might agree that visiting children's hospitals during Christmas and Hanukkah is an indicator of compassion. Putting little birds back in their nests might be agreed on as another indicator, and so forth.

If the unit of analysis for our study is the individual person, we can then observe the presence or absence of each indicator for each person under study. Going beyond that, we can add up the number of indicators of compassion observed for each individual. We might agree on ten specific indicators, for example, and find six present in our study of Pat, three for John, nine for Mary, and so forth.

Returning to our question about whether men or women are more compassionate, we might calculate that the women we studied displayed an average of 6.5 indicators of compassion, the men an average of 3.2. On the basis of our quantitative analysis of group difference, we might therefore conclude that women are, on the whole, more compassionate than men.

Usually, though, it's not that simple. Imagine you're interested in understanding a small fundamentalist religious cult, particularly their harsh views on various groups: gays, nonbelievers, feminists, and others. In fact, they suggest that anyone who refuses to join their group and abide by its teachings will "burn in hell." In the context of your interest in compassion, they don't seem to have much. And yet, the group's literature often speaks of their compassion for others. You want to explore this seeming paradox.

To pursue this research interest, you might arrange to interact with cult members, getting to know them and learning more about their views. You could tell them you were a social researcher interested in learning about their group, or perhaps you would just express an interest in learning more, without saying why. In the course of your conversations with group members and perhaps attendance at religious services, you would put yourself in situations where you could come to understand what the cult members mean by compassion. You might learn, for example, that members of the group were so deeply concerned about sinners burning in hell that they were willing to be aggressive, even violent, to make people change their sinful ways. Within their own paradigm, then, cult members would see beating up gays, prostitutes, and abortion doctors as acts of compassion.

Social researchers focus their attention on the meanings that the people under study give to words and actions. Doing so can often clarify the behaviors observed: At least now you understand how the cult can see violent acts as compassionate. On the other hand, paying attention to what words and actions mean to the people under study almost always complicates the concepts researchers are interested in. (We'll return to this issue when we discuss the validity of measures, toward the end of this chapter.)

Whenever we take our concepts seriously and set about specifying what we mean by them, we

indicator An observation that we choose to consider as a reflection of a variable we wish to study. Thus, for example, attending religious services might be considered an indicator of religiosity.




discover disagreements and inconsistencies. Not only do you and I disagree, but each of us is likely to find a good deal of muddiness within our own mental images. If you take a moment to look at what you mean by compassion, you'll probably find that your image contains several kinds of compassion. That is, the entries on your mental file sheet can be combined into groups and subgroups, say, compassion toward friends, co-religionists, humans, and birds. You may also find several different strategies for making combinations. For example, you might group the entries into feelings and actions.

The technical term for such groupings is dimension: a specifiable aspect of a concept. For instance, we might speak of the "feeling dimension" of compassion and the "action dimension" of compassion. In a different grouping scheme, we might distinguish "compassion for humans" from "compassion for animals." Or we might see compassion as helping people have what we want for them versus what they want for themselves. Still differently, we might distinguish compassion as forgiveness from compassion as pity. Thus, we could subdivide compassion into several clearly defined dimensions. A complete conceptualization involves both specifying dimensions and identifying the various indicators for each.

Sometimes conceptualization aimed at identifying different dimensions of a variable leads to a different kind of distinction. We may conclude that we've been using the same word for meaningfully distinguishable concepts. In the following example, the researchers find (1) that "violence" is not a sufficient description of "genocide" and (2) that the concept "genocide" itself comprises several distinct phenomena. Let's look at the process they went through to come to this conclusion.

When Daniel Chirot and Jennifer Edwards attempted to define the concept of "genocide," they

dimension A specifiable aspect of a concept. "Religiosity," for example, might be specified in terms of a belief dimension, a ritual dimension, a devotional dimension, a knowledge dimension, and so forth.

found existing assumptions were not precise enough for their purposes: The United Nations originally defined it as an attempt to destroy "in whole or in part, a national, ethnic, racial, or religious group." If genocide is distinct from other types of violence, it requires its own unique explanation. (2003: 14)

Notice the final comment in this excerpt, as it provides an important insight into why researchers are so careful in specifying the concepts they study. If genocide, such as the Holocaust, were simply another example of violence, like assaults and homicides, then what we know about violence in general might explain genocide. If it differs from other forms of violence, then we may need a different explanation for it. So, the researchers began by suggesting that "genocide" was a concept distinct from "violence" for their purposes. Then, as Chirot and Edwards examined historical instances of genocide, they began concluding that the motivations for launching genocidal mayhem differed sufficiently to represent four distinct phenomena that were all called "genocide" (2003: 15-18).

1. Convenience: Sometimes the attempt to eradicate a group of people serves a function for the eradicators, such as Julius Caesar's attempt to eradicate tribes defeated in battle, fearing they would be difficult to rule. Or when gold was discovered on Cherokee land in the Southeastern United States in the early nineteenth century, the Cherokee were forcibly relocated to Oklahoma in an event known as the "Trail of Tears," which ultimately killed as many as half of those forced to leave.

2. Revenge: When the Chinese of Nanking bravely resisted the Japanese invaders in the early years of World War II, the conquerors felt they had been insulted by those they regarded as inferior beings. Tens of thousands were slaughtered in the "Rape of Nanking" in 1937-1938.

3. Fear: The ethnic cleansing that recently occurred in the former Yugoslavia was at least partly motivated by economic competition and worries that the growing Albanian population of Kosovo was gaining political strength through numbers. Similarly, the Hutu attempt to eradicate the Tutsis of Rwanda grew out of a fear that returning Tutsi refugees would seize control of the country. Often intergroup fears such as these grow out of long histories of atrocities, often inflicted in both directions.

4. Purification: The Nazi Holocaust, probably the most publicized case of genocide, was intended as a purification of the "Aryan race." While Jews were the main target, gypsies, homosexuals, and many other groups were also included. Other examples include the Indonesian witch-hunt against communists in 1965-1966 and the attempt to eradicate all non-Khmer Cambodians under Pol Pot in the 1970s.

No single theory of genocide could explain these various forms of mayhem. Indeed, this act of conceptualization suggests four distinct phenomena, each needing a different set of explanations. Specifying the different dimensions of a concept often paves the way for a more sophisticated understanding of what we're studying. We might observe, for example, that women are more compassionate in terms of feelings, and men more so in terms of actions, or vice versa. Whichever turned out to be the case, we would not be able to say whether men or women are really more compassionate. Our research would have shown that there is no single answer to the question. That alone represents an advance in our understanding of reality.

To get a better feel for concepts, variables, and indicators, go to the General Social Survey codebook and explore some of the ways the researchers have measured various concepts: http://www.icpsr

The Interchangeability of Indicators There is another way that the notion of indicators can help us in our attempts to understand reality by means of "unreal" constructs. Suppose, for the


moment, that you and I have compiled a list of 100 indicators of compassion and its various dimensions. Suppose further that we disagree widely on which indicators give the clearest evidence of compassion or its absence. If we pretty much agree on some indicators, we could focus our attention on those, and we would probably agree on the answer they provided. We would then be able to say that some people are more compassionate than others in some dimension. But suppose we don't really agree on any of the possible indicators. Surprisingly, we can still reach an agreement on whether men or women are the more compassionate. How we do that has to do with the interchangeability of indicators.

The logic works like this. If we disagree totally on the value of the indicators, one solution would be to study all of them. Suppose that women turn out to be more compassionate than men on all 100 indicators, on all the indicators you favor and on all of mine. Then we would be able to agree that women are more compassionate than men, even though we still disagree on exactly what compassion means in general. The interchangeability of indicators means that if several different indicators all represent, to some degree, the same concept, then all of them will behave the same way that the concept would behave if it were real and could be observed. Thus, given a basic agreement about what "compassion" is, if women are generally more compassionate than men, we should be able to observe that difference by using any reasonable measure of compassion. If, on the other hand, women are more compassionate than men on some indicators but not on others, we should see if the two sets of indicators represent different dimensions of compassion.

You have now seen the fundamental logic of conceptualization and measurement. The discussions that follow are mainly refinements and extensions of what you've just read.
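The logic of interchangeable indicators can be sketched in a few lines of code. This is a minimal illustration with simulated data: the indicator names, group sizes, and score distributions are all invented, and real research would use observed measurements rather than random draws.

```python
import random

random.seed(0)

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical scores on 10 compassion indicators for two groups of 50
# respondents each. The simulated "women" distribution is centered higher
# so that the indicators will tend to agree.
women = {f"indicator_{i}": [random.gauss(6.0, 1.0) for _ in range(50)]
         for i in range(10)}
men = {f"indicator_{i}": [random.gauss(5.0, 1.0) for _ in range(50)]
       for i in range(10)}

# For each indicator, ask: which group scores higher on average?
higher = {k: ("women" if mean(women[k]) > mean(men[k]) else "men")
          for k in women}

# If every indicator points the same way, the conclusion holds no matter
# which indicator a given researcher happens to favor.
if len(set(higher.values())) == 1:
    print("All indicators agree:", next(iter(higher.values())), "score higher.")
else:
    print("Indicators disagree; check whether they tap different dimensions.")
```

The point of the sketch is the final check: when all indicators behave the same way, researchers who disagree about which indicator is best can still agree on the substantive conclusion.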
Before turning to a technical elaboration of measurement, however, we need to fill out the picture of conceptualization by looking at some of the ways social researchers provide standards, consistency, and commonality for the meanings of terms.




Chapter 5: Conceptualization, Operationalization, and Measurement

Real, Nominal, and Operational Definitions

As we have seen, the design and execution of social research requires us to clear away the confusion over concepts and reality. To this end, logicians and scientists have found it useful to distinguish three kinds of definitions: real, nominal, and operational. The first of these reflects the reification of terms. As Carl Hempel cautions,

A "real" definition, according to traditional logic, is not a stipulation determining the meaning of some expression but a statement of the "essential nature" or the "essential attributes" of some entity. The notion of essential nature, however, is so vague as to render this characterization useless for the purposes of rigorous inquiry. (1952: 6)

In other words, trying to specify the "real" meaning of concepts only leads to a quagmire: It mistakes a construct for a real entity. The specification of concepts in scientific inquiry depends instead on nominal and operational definitions. A nominal definition is one that is simply assigned to a term without any claim that the definition represents a "real" entity. Nominal definitions are arbitrary (I could define compassion as "plucking feathers off helpless birds" if I wanted to), but they can be more or less useful. For most purposes, especially communication, that last definition of compassion would be pretty useless. Most nominal definitions represent some consensus, or convention, about how a particular term is to be used. An operational definition, as you may remember from an earlier chapter, specifies precisely how a concept will be measured, that is, the operations we'll perform. An operational definition is nominal rather than real, but it has the advantage of achieving maximum clarity about what a concept means in the context of a given study. In the midst of

specification: The process through which concepts are made more specific.

disagreement and confusion over what a term "really" means, we can specify a working definition for the purposes of an inquiry. Wishing to examine socioeconomic status (SES) in a study, for example, we may simply specify that we are going to treat SES as a combination of income and educational attainment. In this decision, we rule out other possible aspects of SES: occupational status, money in the bank, property, lineage, lifestyle, and so forth. Our findings will then be interesting to the extent that our definition of SES is useful for our purpose.

Creating Conceptual Order

The clarification of concepts is a continuing process in social research. Catherine Marshall and Gretchen Rossman (1995: 18) speak of a "conceptual funnel" through which a researcher's interest becomes increasingly focused. Thus, a general interest in social activism could narrow to "individuals who are committed to empowerment and social change" and further focus on discovering "what experiences shaped the development of fully committed social activists." This focusing process is inescapably linked to the language we use.

In some forms of qualitative research, the clarification of concepts is a key element in the collection of data. Suppose you were conducting interviews and observations in a radical political group devoted to combating oppression in U.S. society. Imagine how the meaning of oppression would shift as you delved more and more deeply into the members' experiences and worldviews. For example, you might start out thinking of oppression in physical and perhaps economic terms. The more you learned about the group, however, the more you might appreciate the possibility of psychological oppression.

The same point applies even to contexts where meanings might seem more fixed. In the analysis of textual materials, for example, social researchers sometimes speak of the "hermeneutic circle," a cyclical process of ever-deeper understanding.

The understanding of a text takes place through a process in which the meaning of the separate parts is determined by the global meaning of

the text as it is anticipated. The closer determination of the meaning of the separate parts may eventually change the originally anticipated meaning of the totality, which again influences the meaning of the separate parts, and so on. (Kvale 1996: 47)

Consider the concept "prejudice." Suppose you needed to write a definition of the term. You might start out thinking about racial/ethnic prejudice. At some point you would realize you should probably allow for gender prejudice, religious prejudice, antigay prejudice, and the like in your definition. Examining each of these specific types of prejudice would affect your overall understanding of the general concept. As your general understanding changed, however, you would likely see each of the individual forms somewhat differently.

The continual refinement of concepts occurs in all social research methods. Often you will find yourself refining the meaning of important concepts even as you write up your final report. Although conceptualization is a continuing process, it is vital to address it specifically at the beginning of any study design, especially rigorously structured research designs such as surveys and experiments. In a survey, for example, operationalization results in a commitment to a specific set of questionnaire items that will represent the concepts under study. Without that commitment, the study could not proceed.

Even in less-structured research methods, however, it's important to begin with an initial set of anticipated meanings that can be refined during data collection and interpretation. No one seriously believes we can observe life with no preconceptions; for this reason, scientific observers must be conscious of and explicit about these conceptual starting points.

Let's explore initial conceptualization the way it applies to structured inquiries such as surveys and experiments. Though specifying nominal definitions focuses our observational strategy, it does not allow us to observe. As a next step we must specify exactly what we are going to observe, how we will do it, and what interpretations we are going to


place on various possible observations. All these further specifications make up the operational definition of the concept. In the example of socioeconomic status, we might decide to ask survey respondents two questions, corresponding to the decision to measure SES in terms of income and educational attainment:

1. What was your total family income during the past 12 months?
2. What is the highest level of school you completed?

To organize our data, we'd probably want to specify a system for categorizing the answers people give us. For income, we might use categories such as "under $5,000," "$5,000 to $10,000," and so on. Educational attainment might be similarly grouped in categories: less than high school, high school, college, graduate degree. Finally, we would specify the way a person's responses to these two questions would be combined in creating a measure of SES. In this way we would create a working and workable definition of SES. Although others might disagree with our conceptualization and operationalization, the definition would have one essential scientific virtue: It would be absolutely specific and unambiguous. Even if someone disagreed with our definition, that person would have a good idea how to interpret our research results, because what we meant by SES, reflected in our analyses and conclusions, would be precise and clear.

Here is a diagram showing the progression of measurement steps from our vague sense of what a term means to specific measurements in a fully structured scientific study:

Conceptualization

↓
Nominal Definition

↓
Operational Definition

↓
Measurements in the Real World
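The two-question SES operationalization described above can be sketched in code. This is only a toy illustration: the cut-points, category labels, and the simple-sum combining rule are invented for the example, not taken from any official scheme.

```python
# Invented category boundaries and labels for the two survey questions.
INCOME_CUTS = [5_000, 10_000, 25_000, 50_000, 100_000]  # bracket upper bounds
EDUCATION_LEVELS = ["less than high school", "high school",
                    "college", "graduate degree"]

def income_category(dollars):
    """Code an income answer as an ordered bracket number (0..5)."""
    for i, cut in enumerate(INCOME_CUTS):
        if dollars < cut:
            return i
    return len(INCOME_CUTS)

def ses_score(dollars, education):
    """Combine the two coded answers into one SES measure (a simple sum,
    chosen arbitrarily here; a real study would justify its combining rule)."""
    return income_category(dollars) + EDUCATION_LEVELS.index(education)

print(ses_score(42_000, "college"))           # → 5
print(ses_score(120_000, "graduate degree"))  # → 8
```

Whatever one thinks of these particular choices, the definition has the virtue the text describes: given the code, anyone can see exactly how a respondent's answers become an SES score.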



An Example of Conceptualization: The Concept of Anomie

To bring this discussion of conceptualization in research together, let's look briefly at the history of a specific social scientific concept. Researchers studying urban riots are often interested in the part played by feelings of powerlessness. Social scientists sometimes use the word anomie in this context. This term was first introduced into social science by Emile Durkheim, the great French sociologist, in his classic 1897 study, Suicide. Using only government publications on suicide rates in different regions and countries, Durkheim produced a work of analytic genius. To determine the effects of religion on suicide, he compared the suicide rates of predominantly Protestant countries with those of predominantly Catholic ones, Protestant regions of Catholic countries with Catholic regions of Protestant countries, and so forth. To determine the possible effects of the weather, he compared suicide rates in northern and southern countries and regions, and he examined the different suicide rates across the months and seasons of the year. Thus, he could draw conclusions about a supremely individualistic and personal act without having any data about the individuals engaging in it.

At a more general level, Durkheim suggested that suicide also reflects the extent to which a society's agreements are clear and stable. Noting that times of social upheaval and change often present individuals with grave uncertainties about what is expected of them, Durkheim suggested that such uncertainties cause confusion, anxiety, and even self-destruction. To describe this societal condition of normlessness, Durkheim chose the term anomie. Durkheim did not make this word up. Used in both German and French, it literally meant "without law." The English term anomy had been used for at least three centuries before Durkheim to mean disregard for divine law. However, Durkheim created the social scientific concept of anomie.
In the years that have followed the publication of Suicide, social scientists have found anomie a useful concept, and many have expanded on


Durkheim's use. Robert Merton, in a classic article entitled "Social Structure and Anomie" (1938), concluded that anomie results from a disparity between the goals and means prescribed by a society. Monetary success, for example, is a widely shared goal in our society, yet not all individuals have the resources to achieve it through acceptable means. An emphasis on the goal itself, Merton suggested, produces normlessness, because those denied the traditional avenues to wealth go about getting it through illegitimate means. Merton's discussion, then, could be considered a further conceptualization of the concept of anomie.

Although Durkheim originally used the concept of anomie as a characteristic of societies, as did Merton after him, other social scientists have used it to describe individuals. To clarify this distinction, some scholars have chosen to use anomie in reference to its original, societal meaning and to use the term anomia in reference to the individual characteristic. In a given society, then, some individuals experience anomia, and others do not. Elwin Powell, writing 20 years after Merton, provided the following conceptualization of anomia (though using the term anomie) as a characteristic of individuals:

When the ends of action become contradictory, inaccessible or insignificant, a condition of anomie arises. Characterized by a general loss of orientation and accompanied by feelings of "emptiness" and apathy, anomie can be simply conceived as meaninglessness. (1958: 132)

Powell went on to suggest there were two distinct kinds of anomia and to examine how the two rose out of different occupational experiences to result at times in suicide. In his study, however, Powell did not measure anomia per se; he studied the relationship between suicide and occupation, making inferences about the two kinds of anomia. Thus, the study did not provide an operational definition of anomia, only a further conceptualization. Although many researchers have offered operational definitions of anomia, one name stands out

The Origins of Anomia

by Leo Srole

My career-long fixation on anomie began with reading Durkheim's Le Suicide as a Harvard undergraduate. Later, as a graduate student at Chicago, I studied under two Durkheimian anthropologists, William Lloyd Warner and Alfred Radcliffe-Brown. Radcliffe-Brown had carried on a lively correspondence with Durkheim, making me a collateral "descendant" of the great French sociologist. For me, the early impact of Durkheim's work on suicide was mixed but permanent. On the one hand, I had serious reservations about his strenuous, ingenious, and often awkward efforts to force the crude, bureaucratic records on suicide rates to fit with his unidirectional sociological determinism. On the other hand, I was moved by Durkheim's unswerving preoccupation with the moral force of the interpersonal ties that bind us to our time, place, and past, and also his insights about the lethal consequences that can follow from shrinkage and decay in those ties.

My interest in anomie received an eyewitness jolt at the finale of World War II, when I served with the United Nations Relief and Rehabilitation Administration, helping to rebuild a war-torn Europe. At the Nazi concentration camp of Dachau, I saw firsthand the depths of dehumanization that macrosocial forces, such as those that engaged Durkheim, could produce in individuals like Hitler, Eichmann, and the others serving their dictates at all levels in the Nazi death factories. Returning from my UNRRA post, I felt most urgently that the time was long overdue to come to an understanding of the dynamics

over all. Two years before Powell's article appeared, Leo Srole (1956) published a set of questionnaire items that he said provided a good measure of anomia as experienced by individuals. It consists of five statements that subjects were asked to agree or disagree with:

1. In spite of what some people say, the lot of the average man is getting worse.
2. It's hardly fair to bring children into the world with the way things look for the future.
3. Nowadays a person has to live pretty much for today and let tomorrow take care of itself.
4. These days a person doesn't really know who he can count on.


underlying disintegrated social bonds. We needed to work expeditiously, deemphasizing proliferation of macro-level theory in favor of a direct exploratory encounter with individuals, using state-of-the-art survey research methodology. Such research, I also felt, should focus on a broader spectrum of behavioral pathologies than suicide.

My initial investigations were a diverse effort. In 1950, for example, I was able to interview a sample of 401 bus riders in Springfield, Massachusetts. Four years later, the Midtown Manhattan Mental Health Study provided a much larger population reach. These and other field projects gave me scope to expand and refine my measurements of that quality in individuals which reflected the macro-social quality Durkheim had called anomie.

While I began by using Durkheim's term in my own work, I soon decided that it was necessary to limit the use of that concept to its macro-social meaning and to sharply segregate it from its individual manifestations. For the latter purpose, the cognate but hitherto obsolete Greek term, anomia, readily suggested itself. I first published the anomia construct in a 1956 article in the American Sociological Review, describing ways of operationalizing it and presenting the results of its initial field application research. By 1982, the Science Citation Index and Social Science Citation Index had listed some 400 publications in political science, psychology, social work, and sociology journals here and abroad that had cited use of that article's instruments or findings, warranting the American Institute for Scientific Information to designate it a "citation classic."

5. There's little use writing to public officials because they aren't really interested in the problems of the average man. (1956: 713)
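Scoring a scale like this can be sketched very simply. The sketch below uses the five Srole items quoted above and the common convention of counting agreements; treat the specific coding details (True/False responses, a 0-5 sum) as illustrative assumptions rather than Srole's published procedure.

```python
# The five Srole (1956) items, quoted from the text above.
SROLE_ITEMS = [
    "In spite of what some people say, the lot of the average man is getting worse.",
    "It's hardly fair to bring children into the world with the way things look for the future.",
    "Nowadays a person has to live pretty much for today and let tomorrow take care of itself.",
    "These days a person doesn't really know who he can count on.",
    "There's little use writing to public officials because they aren't really "
    "interested in the problems of the average man.",
]

def anomia_score(responses):
    """responses: one True (agree) / False (disagree) per item.
    Returns the count of agreements, 0 through 5."""
    if len(responses) != len(SROLE_ITEMS):
        raise ValueError("expected one response per item")
    return sum(responses)

print(anomia_score([True, False, True, True, False]))  # → 3
```

Because the scoring rule is this explicit, two researchers using the scale on different samples can compare their groups' anomia scores directly, which is exactly the advantage of standardized measures discussed below.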

In the half-century following its publication, the Srole scale has become a research staple for social scientists. You'll likely find this particular operationalization of anomia used in many of the research projects reported in academic journals. Srole touches on this in the accompanying box, "The Origins of Anomia," which he prepared for this book before his death. This abbreviated history of anomie and anomia as social scientific concepts illustrates several points.




First, it's a good example of the process through which general concepts become operationalized measurements. This is not to say that the issue of how to operationalize anomie/anomia has been resolved once and for all. Scholars will surely continue to reconceptualize and reoperationalize these concepts for years to come, continually seeking more-useful measures.

The Srole scale illustrates another important point. Letting conceptualization and operationalization be open-ended does not necessarily produce anarchy and chaos, as you might expect. Order often emerges. For one thing, although we could define anomia any way we chose (in terms of, say, shoe size), we're likely to define it in ways not too different from other people's mental images. If you were to use a really offbeat definition, people would probably ignore you.

A second source of order is that, as researchers discover the utility of a particular conceptualization and operationalization of a concept, they're likely to adopt it, which leads to standardized definitions of concepts. Besides the Srole scale, examples include IQ tests and a host of demographic and economic measures developed by the U.S. Census Bureau. Using such established measures has two advantages: They have been extensively pretested and debugged, and studies using the same scales can be compared. If you and I do separate studies of two different groups and use the Srole scale, we can compare our two groups on the basis of anomia.

Social scientists, then, can measure anything that's real; through conceptualization and operationalization, they can even do a pretty good job of measuring things that aren't. Granting that such concepts as socioeconomic status, prejudice, compassion, and anomia aren't ultimately real, social scientists can create order in handling them. It is an order based on utility, however, not on ultimate truth.

Definitions in Descriptive and Explanatory Studies

As you'll recall from Chapter 4, two general purposes of research are description and explanation. The distinction between them has important

implications for definition and measurement. If it seems that description is simpler than explanation, you may be surprised to learn that definitions are more problematic for descriptive research than for explanatory research. Before we turn to other aspects of measurement, you'll need a basic understanding of why this is so (we'll discuss this point more fully in Part 4).

It's easy to see the importance of clear and precise definitions for descriptive research. If we want to describe and report the unemployment rate in a city, our definition of being unemployed is obviously critical. That definition will depend on our definition of another term: the labor force. If it seems patently absurd to regard a three-year-old child as being unemployed, it is because such a child is not considered a member of the labor force. Thus, we might follow the U.S. Census Bureau's convention and exclude all people under 14 years of age from the labor force. This convention alone, however, would not give us a satisfactory definition, because it would count as unemployed such people as high school students, the retired, the disabled, and homemakers. We might follow the census convention further by defining the labor force as "all persons 14 years of age and over who are employed, looking for work, or waiting to be called back to a job from which they have been laid off or furloughed." If a student, homemaker, or retired person is not looking for work, such a person would not be included in the labor force. Unemployed people, then, would be those members of the labor force, as defined, who are not employed.

But what does "looking for work" mean? Must a person register with the state employment service or go from door to door asking for employment? Or would it be sufficient to want a job or be open to an offer of employment?
Conventionally, "looking for work" is defined operationally as saying yes in response to an interviewer's asking "Have you been looking for a job during the past seven days?" (Seven days is the period most often specified, but for some research purposes it might make more sense to shorten or lengthen it.) As you can see, the conclusion of a descriptive study about the unemployment rate depends directly on how each issue of definition is resolved.
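A toy calculation makes this dependence concrete. All the counts below are invented for illustration; the only point is that changing the "looking for work" window changes who counts as unemployed, and therefore changes the reported rate.

```python
# Invented counts for a hypothetical city.
employed = 9_000
looking_past_7_days = 300    # counted as unemployed under the 7-day rule
looking_past_30_days = 450   # a longer window catches more job seekers

def unemployment_rate(employed, unemployed):
    """Unemployed as a percentage of the labor force (employed + unemployed)."""
    return 100 * unemployed / (employed + unemployed)

rate_7 = unemployment_rate(employed, looking_past_7_days)
rate_30 = unemployment_rate(employed, looking_past_30_days)

print(round(rate_7, 2), round(rate_30, 2))  # → 3.23 4.76
```

Same city, same day, two defensible definitions, two different unemployment rates: that is the descriptive researcher's problem in miniature.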


Increasing the period during which people are counted as looking for work would add more unemployed people to the labor force as defined, thereby increasing the reported unemployment rate. If we follow another convention and speak of the civilian labor force and the civilian unemployment rate, we're excluding military personnel; that, too, increases the reported unemployment rate, because military personnel would be employed by definition. Thus, the descriptive statement that the unemployment rate in a city is 3 percent, or 9 percent, or whatever it might be, depends directly on the operational definitions used. This example is relatively clear because there are several accepted conventions relating to the labor force and unemployment.

Now, consider how difficult it would be to get agreement about the definitions you would need in order to say, "Forty-five percent of the students at this institution are politically conservative." Like the unemployment rate, this percentage would depend directly on the definition of what is being measured, in this case, political conservatism. A different definition might result in the conclusion "Five percent of the student body are politically conservative."

Ironically, definitions are less problematic in the case of explanatory research. Let's suppose we're interested in explaining political conservatism. Why are some people conservative and others not? More specifically, let's suppose we're interested in whether conservatism increases with age. What if you and I have 25 different operational definitions of conservative, and we can't agree on which definition is best? As we saw in the discussion of indicators, this is not necessarily an insurmountable obstacle to our research. Suppose we found old people to be more conservative than young people in terms of all 25 definitions. Clearly, the exact definition wouldn't matter much.
We would conclude that old people are generally more conservative than young people, even though we couldn't agree about exactly what conservative means. In practice, explanatory research seldom results in findings quite as unambiguous as this example suggests; nonetheless, the general pattern is quite common in actual research. There are consistent patterns of relationships in human social life that


result in consistent research findings. However, such consistency does not appear in a descriptive situation. Changing definitions almost inevitably results in different descriptive conclusions. "The Importance of Variable Names" explores this issue in connection with the variable citizen participation.

Operationalization Choices

In discussing conceptualization, I frequently have

referred to operationalization, for the two are intimately linked. To recap: Conceptualization is the refinement and specification of abstract concepts, and operationalization is the development of specific research procedures (operations) that will result in empirical observations representing those concepts in the real world. As with the methods of data collection, social researchers have a variety of choices when operationalizing a concept. Although the several choices are intimately interconnected, I've separated them for the sake of discussion. Realize, though, that operationalization does not proceed through a systematic checklist.

Range of Variation

In operationalizing any concept, researchers must be clear about the range of variation that interests them. The question is, to what extent are they willing to combine attributes in fairly gross categories? Let's suppose you want to measure people's incomes in a study by collecting the information from either records or interviews. The highest annual incomes people receive run into the millions of dollars, but not many people get that much. Unless you're studying the very rich, it probably won't add much to your study to keep track of extremely high categories. Depending on whom you study, you'll probably want to establish a highest income category with a much lower floor, maybe $100,000 or more. Although this decision will lead you to throw together people who earn a trillion dollars a year with paupers earning a mere $100,000, they'll survive it, and that mixing probably won't hurt your research any, either. The same decision faces you at




The Importance of Variable Names

by Patricia Fisher
Graduate School of Planning, University of Tennessee

Operationalization is one of those things that's easier said than done. It is quite simple to explain to someone the purpose and importance of operational definitions for variables, and even to describe how operationalization typically takes place. However, until you've tried to operationalize a rather complex variable, you may not appreciate some of the subtle difficulties involved. Of considerable importance to the operationalization effort is the particular name that you have chosen for a variable. Let's consider an example from the field of urban planning.

A variable of interest to planners is citizen participation. Planners are convinced that participation in the planning process by citizens is important to the success of plan implementation. Citizen participation is an aid to planners' understanding of the real and perceived needs of a community, and such involvement by citizens tends to enhance their cooperation with and support for planning efforts. Although many different conceptual definitions might be offered by different planners, there would be little misunderstanding over what is meant by citizen participation. The name of the variable seems adequate.

However, if we ask different planners to provide very simple operational measures for citizen participation, we are likely to find a variety among their responses that does generate confusion. One planner might keep a tally of attendance by private citizens at city commission and other local government meetings; another might maintain a record of the different topics addressed by private citizens at similar meetings; while a third might record the number of local government meeting attendees, letters, and phone calls received by the mayor and other public officials, and meetings held by special interest groups during a particular time period.

As skilled researchers, we can readily see that each planner would be measuring (in a very simplistic fashion) a different dimension of citizen participation: extent of citizen participation, issues prompting citizen participation, and form of citizen participation. Therefore, the original naming of our variable, citizen participation, which was quite satisfactory from a conceptual point of view, proved inadequate for purposes of operationalization.

The precise and exact naming of variables is important in research. It is both essential to and a result of good operationalization. Variable names quite often evolve from an iterative process of forming a conceptual definition, then an operational definition, then renaming the concept to better match what can or will be measured. This looping process continues (our example illustrates only one iteration), resulting in a gradual refinement of the variable name and its measurement until a reasonable fit is obtained. Sometimes the concept of the variable that you end up with is a bit different from the original one that you started with, but at least you are measuring what you are talking about, if only because you are talking about what you are measuring!

the other end of the income spectrum. In studies of the general U.S. population, a bottom category of $5,000 or less usually works fine.

In studies of attitudes and orientations, the question of range of variation has another dimension. Unless you're careful, you may end up measuring only half an attitude without really meaning to. Here's an example of what I mean. Suppose you're interested in people's attitudes toward expanding the use of nuclear power generators. You'd anticipate that some people consider nuclear power the greatest thing since the wheel, whereas other people have absolutely no interest in it. Given that anticipation, it would seem to make sense to ask people how much they favor expanding the use of nuclear energy and to give them answer categories ranging from "Favor it very much" to "Don't favor it at all."

This operationalization, however, conceals half the attitudinal spectrum regarding nuclear energy. Many people have feelings that go beyond simply not favoring it: They are, with greater or lesser degrees of intensity, actively opposed to it. In this instance, there is considerable variation on the left side of zero. Some oppose it a little, some quite a bit, and others a great deal. To measure the full range of variation, then, you'd want to operationalize attitudes toward nuclear energy with a range from favoring it very much, through no feelings one way or the other, to opposing it very much. This consideration applies to many of the variables social scientists study. Virtually any public


issue involves both support and opposition, each in varying degrees. Political orientations range from very liberal to very conservative, and depending on the people you're studying, you may want to allow for radicals on one or both ends. Similarly, people are not just more or less religious; some are positively antireligious.

The point is not that you must measure the full range of variation in every case. You should, however, consider whether you need to, given your particular research purpose. If the difference between not religious and antireligious isn't relevant to your research, forget it. Someone has defined pragmatism as "any difference that makes no difference is no difference." Be pragmatic.

Finally, decisions on the range of variation should be governed by the expected distribution of attributes among the subjects of the study. In a study of college professors' attitudes toward the value of higher education, you could probably stop at no value and not worry about those who might consider higher education dangerous to students' health. (If you were studying students, however ...)

Variations between the Extremes

Degree of precision is a second consideration in operationalizing variables. What it boils down to is how fine you will make distinctions among the various possible attributes composing a given variable. Does it matter for your purposes whether a person is 17 or 18 years old, or could you conduct your inquiry by throwing them together in a group labeled 10 to 19 years old? Don't answer too quickly. If you wanted to study rates of voter registration and participation, you'd definitely want to know whether the people you studied were old enough to vote. In general, if you're going to measure age, you must look at the purpose and procedures of your study and decide whether fine or gross differences in age are important to you. In a survey, you'll need to make these decisions in order to design an appropriate questionnaire. In the case of in-depth interviews, these decisions will condition the extent to which you probe for detail.

The same thing applies to other variables. If you measure political affiliation, will it matter to your inquiry whether a person is a conservative Democrat rather than a liberal Democrat, or will it be sufficient to know the party? In measuring religious affiliation, is it enough to know that a person is Protestant, or do you need to know the denomination? Do you simply need to know whether or not a person is married, or will it make a difference to know if he or she has never married or is separated, widowed, or divorced?

There is, of course, no general answer to such questions. The answers come out of the purpose of a given study, or why we are making a particular measurement. I can give you a useful guideline, though. Whenever you're not sure how much detail to pursue in a measurement, get too much rather than too little. When a subject in an in-depth interview volunteers that she is 37 years old, record "37" in your notes, not "in her thirties." When you're analyzing the data, you can always combine precise attributes into more general categories, but you can never separate any variations you lumped together during observation and measurement.

A Note on Dimensions

We've already discussed dimensions as a characteristic of concepts. When researchers get down to the business of creating operational measures of variables, they often discover-or worse, never notice-that they're not exactly clear about which dimensions of a variable they're really interested in. Here's an example. Let's suppose you're studying people's attitudes toward government, and you want to include an examination of how people feel about corruption. Here are just a few of the dimensions you might examine:

Do people think there is corruption in government?


How much corruption do they think there is?


How certain are they in their judgment of how much corruption there is?


How do they feel about corruption in government as a problem in society?


What do they think causes it?

Operationalization Choices

136 Chapter 5: Conceptualization, Operationalization, and Measurement

Do they think it's inevitable?

What do they feel should be done about it?

What are they willing to do personally to eliminate corruption in government?

How certain are they that they would be willing to do what they say they would do?

The list could go on and on-how people feel about corruption in government has many dimensions. It's essential to be clear about which ones are important in our inquiry; otherwise, you may measure how people feel about corruption when you really wanted to know how much they think there is, or vice versa.

Once you have determined how you're going to collect your data (for example, survey, field research) and have decided on the relevant range of variation, the degree of precision needed between the extremes of variation, and the specific dimensions of the variables that interest you, you may have another choice: a mathematical-logical one. That is, you may need to decide what level of measurement to use. To discuss this point, we need to take another look at attributes and their relationship to variables.

Defining Variables and Attributes

An attribute, you'll recall, is a characteristic or quality of something. Female is an example. So is old or student. Variables, on the other hand, are logical sets of attributes. Thus, gender is a variable composed of the attributes female and male.

The conceptualization and operationalization processes can be seen as the specification of variables and the attributes composing them. Thus, in the context of a study of unemployment, employment status is a variable having the attributes employed and unemployed; the list of attributes could

nominal measure A variable whose attributes have only the characteristics of exhaustiveness and mutual exclusiveness. In other words, a level of measurement describing a variable that has attributes that are merely different, as distinguished from ordinal, interval, or ratio measures. Gender is an example of a nominal measure.

also be expanded to include the other possibilities discussed earlier, such as homemaker. Every variable must have two important qualities. First, the attributes composing it should be exhaustive. For the variable to have any utility in research, we must be able to classify every observation in terms of one of the attributes composing the variable. We'll run into trouble if we conceptualize the variable political party affiliation in terms of the attributes Republican and Democrat, because some of the people we set out to study will identify with the Green Party, the Reform Party, or some other organization, and some (often a large percentage) will tell us they have no party affiliation. We could make the list of attributes exhaustive by adding other and no affiliation. Whatever we do, we must be able to classify every observation.

At the same time, attributes composing a variable must be mutually exclusive. Every observation must be able to be classified in terms of one and only one attribute. For example, we need to define employed and unemployed in such a way that nobody can be both at the same time. That means being able to classify the person who is working at a job but is also looking for work. (We might run across a fully employed mud wrestler who is looking for the glamour and excitement of being a social researcher.) In this case, we might define the attributes so that employed takes precedence over unemployed, and anyone working at a job is employed regardless of whether he or she is looking for something better.
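For readers who find a few lines of code clarifying, the two rules just described can be sketched as simple classification functions. This is only an illustrative sketch: the party labels, the "not in labor force" catch-all, and the precedence rule are my own assumptions for the example, not a standard coding scheme.

```python
def code_party(response):
    """Exhaustive coding: every response maps to exactly one attribute,
    thanks to the 'other' and 'no affiliation' catch-alls."""
    known = {"Republican", "Democrat", "Green", "Reform"}
    if response is None or response.strip() == "":
        return "no affiliation"
    return response if response in known else "other"

def code_employment(has_job, looking_for_work):
    """Mutually exclusive coding: 'employed' takes precedence, so a
    person working while job-hunting is classified only once."""
    if has_job:
        return "employed"
    return "unemployed" if looking_for_work else "not in labor force"
```

Because every possible response falls into exactly one category, every observation can be classified, and none can be classified twice.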

Levels of Measurement

Attributes operationalized as mutually exclusive and exhaustive may be related in other ways as well. For example, the attributes composing variables may represent different levels of measurement. In this section, we'll examine four levels of measurement: nominal, ordinal, interval, and ratio.

Nominal Measures

Variables whose attributes have only the characteristics of exhaustiveness and mutual exclusiveness are nominal measures. Examples include gender,

religious affiliation, political party affiliation, birthplace, college major, and hair color. Although the attributes composing each of these variables-as male and female compose the variable gender-are distinct from one another (and exhaust the possibilities of gender among people), they have no additional structures. Nominal measures merely offer names or labels for characteristics.

Imagine a group of people characterized in terms of one such variable and physically grouped by the applicable attributes. For example, say we've asked a large gathering of people to stand together in groups according to the states in which they were born: all those born in Vermont in one group, those born in California in another, and so forth. The variable is place of birth; the attributes are born in California, born in Vermont, and so on. All the people standing in a given group have at least one thing in common and differ from the people in all other groups in that same regard. Where the individual groups form, how close they are to one another, or how the groups are arranged in the room is irrelevant. All that matters is that all the members of a given group share the same state of birth and that each group has a different shared state of birth. All we can say about two people in terms of a nominal variable is that they are either the same or different.

Ordinal Measures

Variables with attributes we can logically rank-order are ordinal measures. The different attributes of ordinal variables represent relatively more or less of the variable. Variables of this type are social class, conservatism, alienation, prejudice, intellectual sophistication, and the like. In addition to saying whether two people are the same or different in terms of an ordinal variable, you can also say one is "more" than the other-that is, more conservative, more religious, older, and so forth.

In the physical sciences, hardness is the most frequently cited example of an ordinal measure. We may say that one material (for example, diamond) is harder than another (say, glass) if the former can scratch the latter and not vice versa. By attempting to scratch various materials with other materials, we might eventually be able to arrange


several materials in a row, ranging from the softest to the hardest. We could never say how hard a given material was in absolute terms; we could only say how hard in relative terms-which materials it is harder than and which softer than.

Let's pursue the earlier example of grouping the people at a social gathering. This time imagine that we ask all the people who have graduated from college to stand in one group, all those with only a high school diploma to stand in another group, and all those who have not graduated from high school to stand in a third group. This manner of grouping people satisfies the requirements for exhaustiveness and mutual exclusiveness discussed earlier. In addition, however, we might logically arrange the three groups in terms of the relative amount of formal education (the shared attribute) each had. We might arrange the three groups in a row, ranging from most to least formal education. This arrangement would provide a physical representation of an ordinal measure. If we knew which groups two individuals were in, we could determine that one had more, less, or the same formal education as the other.

Notice in this example that it is irrelevant how close or far apart the educational groups are from one another. The college and high school groups might be 5 feet apart, and the less-than-high-school group 500 feet farther down the line. These actual distances don't have any meaning. The high school group, however, should be between the less-than-high-school group and the college group, or else the rank order will be incorrect.

Interval Measures

For the attributes composing some variables, the actual distance separating those attributes does have meaning. Such variables are interval measures. For these, the logical distance between attributes can be expressed in meaningful standard intervals.

ordinal measure A level of measurement describing a variable with attributes we can rank-order along some dimension. An example is socioeconomic status as composed of the attributes high, medium, low.



For example, in the Fahrenheit temperature scale, the difference, or distance, between 80 degrees and 90 degrees is the same as that between 40 degrees and 50 degrees. However, 80 degrees Fahrenheit is not twice as hot as 40 degrees, because the zero point in the Fahrenheit scale is arbitrary; zero degrees does not really mean lack of heat. Similarly, minus 30 degrees on this scale doesn't represent 30 degrees less than no heat. (This is true for the Celsius scale as well. In contrast, the Kelvin scale is based on an absolute zero, which does mean a complete lack of heat.)

About the only interval measures commonly used in social scientific research are constructed measures such as standardized intelligence tests that have been more or less accepted. The interval separating IQ scores of 100 and 110 may be regarded as the same as the interval separating scores of 110 and 120 by virtue of the distribution of observed scores obtained by many thousands of people who have taken the tests over the years. But it would be incorrect to infer that someone with an IQ of 150 is 50 percent more intelligent than someone with an IQ of 100. (A person who received a score of 0 on a standard IQ test could not be regarded, strictly speaking, as having no intelligence, although we might feel he or she was unsuited to be a college professor or even a college student. But perhaps a dean ... ?)

When comparing two people in terms of an interval variable, we can say they are different from one another (nominal), and that one is more than

interval measure A level of measurement describing a variable whose attributes are rank-ordered and have equal distances between adjacent attributes. The Fahrenheit temperature scale is an example of this, because the distance between 17 and 18 is the same as that between 89 and 90.

ratio measure A level of measurement describing a variable with attributes that have all the qualities of nominal, ordinal, and interval measures and in addition are based on a "true zero" point. Age is an example of a ratio measure.


another (ordinal). In addition, we can say "how much" more.

Ratio Measures

Most of the social scientific variables meeting the minimum requirements for interval measures also meet the requirements for ratio measures. In ratio measures, the attributes composing a variable, besides having all the structural characteristics mentioned previously, are based on a true zero point. The Kelvin temperature scale is one such measure. Examples from social scientific research include age, length of residence in a given place, number of organizations belonged to, number of times attending religious services during a particular period of time, number of times married, and number of Arab friends.

Returning to the illustration of methodological party games, we might ask a gathering of people to group themselves by age. All the one-year-olds would stand (or sit or lie) together, the two-year-olds together, the three-year-olds, and so forth. The fact that members of a single group share the same age and that each different group has a different shared age satisfies the minimum requirements for a nominal measure. Arranging the several groups in a line from youngest to oldest meets the additional requirements of an ordinal measure and lets us determine if one person is older than, younger than, or the same age as another. If we space the groups equally far apart, we satisfy the additional requirements of an interval measure and can say how much older one person is than another. Finally, because one of the attributes included in age represents a true zero (babies carried by women about to give birth), the phalanx of hapless partygoers also meets the requirements of a ratio measure, permitting us to say that one person is twice as old as another. (Remember this in case you're asked about it in a workbook assignment.) Another example of a ratio measure is income, which extends from an absolute zero to approximately infinity, if you happen to be the founder of Microsoft.
Comparing two people in terms of a ratio variable, then, allows us to conclude (1) whether they



FIGURE 5-1 Levels of Measurement. Often you can choose among different levels of measurement-nominal, ordinal, interval, or ratio-carrying progressively more information.

are different (or the same), (2) whether one is more than the other, (3) how much they differ, and (4) what the ratio of one to another is. Figure 5-1 summarizes this discussion by presenting a graphic illustration of the four levels of measurement.
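The four kinds of conclusions just listed can be seen side by side in a minimal Python sketch; the two ages here are invented solely for illustration.

```python
# Made-up ages for two people; age is a ratio-level variable,
# so all four comparisons are meaningful.
a, b = 40, 20

same_or_different = (a == b)  # (1) nominal: same or different
one_is_more = (a > b)         # (2) ordinal: one is more than the other
how_much_more = a - b         # (3) interval: how much they differ
their_ratio = a / b           # (4) ratio: one is twice as old
```

With a nominal variable, only the first comparison would be legitimate; with an ordinal variable, the first two; with an interval variable, the first three.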

Implications of Levels of Measurement

Because it's unlikely that you'll undertake the physical grouping of people just described (try it

once, and you won't be invited to many parties), I should draw your attention to some of the practical implications of the differences that have been distinguished. These implications appear primarily in the analysis of data (discussed in Part 4), but you need to anticipate such implications when you're structuring any research project. Certain quantitative analysis techniques require variables that meet certain minimum levels of


measurement. To the extent that the variables to be examined in a research project are limited to a particular level of measurement-say, ordinal-you should plan your analytical techniques accordingly. More precisely, you should anticipate drawing research conclusions appropriate to the levels of measurement used in your variables. For example, you might reasonably plan to determine and report the mean age of a population under study (add up all the individual ages and divide by the number of people), but you should not plan to report the mean religious affiliation, because that is a nominal variable, and the mean requires ratio-level data. (You could report the modal-the most common-religious affiliation.)

At the same time, you can treat some variables as representing different levels of measurement. Ratio measures are the highest level, descending through interval and ordinal to nominal, the lowest level of measurement. A variable representing a higher level of measurement-say, ratio-can also be treated as representing a lower level of measurement-say, ordinal. Recall, for example, that age is a ratio measure. If you wished to examine only the relationship between age and some ordinal-level variable-say, self-perceived religiosity: high, medium, and low-you might choose to treat age as an ordinal-level variable as well. You might characterize the subjects of your study as being young, middle-aged, and old, specifying what age range composed each of these groupings. Finally, age might be used as a nominal-level variable for certain research purposes. People might be grouped as being born during the Depression or not. Another nominal measurement, based on birth date rather than just age, would be the grouping of people by astrological signs.

The level of measurement you'll seek, then, is determined by the analytical uses you've planned for a given variable, keeping in mind that some variables are inherently limited to a certain level.
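To make the mean-versus-mode point concrete, here is a small Python sketch; the ages and affiliations are invented data, used only to show which summary statistic suits which level of measurement.

```python
import statistics

# Invented data for illustration only.
ages = [23, 35, 35, 41, 58]                          # ratio-level
affiliations = ["Protestant", "Catholic",            # nominal-level
                "Protestant", "None", "Protestant"]

# The mean is meaningful for a ratio variable like age ...
mean_age = statistics.mean(ages)

# ... but for a nominal variable, report the mode (most common
# attribute) instead; a "mean religion" would be nonsense.
modal_affiliation = statistics.mode(affiliations)
```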
If a variable is to be used in a variety of ways, requiring different levels of measurement, the study should be designed to achieve the highest level required. For example, if the subjects in a study are asked their exact ages, they can later be organized into ordinal or nominal groupings.
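Collapsing an exact (ratio-level) age into ordinal or nominal groupings might be sketched as follows; the cutoff points are illustrative assumptions, not prescriptions from the text.

```python
def age_to_ordinal(age):
    """Collapse exact age into ordinal categories.
    The cutoffs (40 and 65) are arbitrary illustrative choices."""
    if age < 40:
        return "young"
    elif age < 65:
        return "middle-aged"
    return "old"

def age_range(age):
    """Collapse exact age into decade ranges like '30 to 39'
    (still ordinal, but coarser than the exact age)."""
    low = (age // 10) * 10
    return f"{low} to {low + 9}"
```

The reverse operation is impossible: from "30 to 39" alone you can never recover "37," which is the one-way street described above.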

Again, you need not necessarily measure variables at their highest level of measurement. If you're sure to have no need for ages of people at higher than the ordinal level of measurement, you may simply ask people to indicate their age range, such as 20 to 29, 30 to 39, and so forth. In a study of the wealth of corporations, rather than seek more precise information, you may use Dun & Bradstreet ratings to rank corporations. Whenever your research purposes are not altogether clear, however, seek the highest level of measurement possible. As we've discussed, although ratio measures can later be reduced to ordinal ones, you cannot convert an ordinal measure to a ratio one. More generally, you cannot convert a lower-level measure to a higher-level one. That is a one-way street worth remembering.

Typically a research project will tap variables at different levels of measurement. For example, William Bielby and Denise Bielby (1999) set out to examine the world of film and television, using a nomothetic, longitudinal approach (take a moment to remind yourself what that means). In what they referred to as the "culture industry," the authors found that reputation (an ordinal variable) is the best predictor of screenwriters' future productivity. More interestingly, they found that screenwriters who were represented by "core" (or elite) agencies were not only far more likely to find jobs (a nominal variable), but also jobs that paid more (a ratio variable). In other words, the researchers found that agencies' reputations (ordinal) were a key independent variable for predicting a screenwriter's career success. The researchers also found that being older (ratio), female (nominal), an ethnic minority (nominal), and having more years of experience (ratio) were disadvantageous for a writer's career. On the other hand, higher earnings from previous years (measured in ordinal categories) led to more success in the future.
In Bielby and Bielby's terms, "success breeds success" (1999: 80).

Single or Multiple Indicators

With so many alternatives for operationalizing social scientific variables, you may find yourself worrying about making the right choices. To

counter this feeling, let me add a momentary dash of certainty and stability. Many social research variables have fairly obvious, straightforward measures. No matter how you cut it, gender usually turns out to be a matter of male or female: a nominal-level variable that can be measured by a single observation-either by looking (well, not always) or by asking a question (usually). In a study involving the size of families, you'll want to think about adopted and foster children, as well as blended families, but it's usually pretty easy to find out how many children a family has. For most research purposes, the resident population of a country is the resident population of that country-you can look it up in an almanac and know the answer. A great many variables, then, have obvious single indicators. If you can get one piece of information, you have what you need.

Sometimes, however, there is no single indicator that will give you the measure of a variable you really want. As discussed earlier in this chapter, many concepts are subject to varying interpretations-each with several possible indicators. In these cases, you'll want to make several observations for a given variable. You can then combine the several pieces of information you've collected, creating a composite measurement of the variable in question. Chapter 6 is devoted to ways of doing that, so here let's just discuss one simple illustration.

Consider the concept "college performance." All of us have noticed that some students perform well in college courses and others don't. In studying these differences, we might ask what characteristics and experiences are related to high levels of performance (many researchers have done just that). How should we measure overall performance? Each grade in any single course is a potential indicator of college performance, but it also may not typify the student's general performance.
The solution to this problem is so firmly established that it is, of course, obvious: the grade point average (GPA). We assign numerical scores to each letter grade, total the points earned by a given student, and divide by the number of courses taken, thus obtaining a composite measure. (If the courses vary


in number of credits, we adjust the point values accordingly.) Creating such composite measures in social research is often appropriate.
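The credit-weighted GPA computation just described can be sketched in a few lines of Python; the point values follow the conventional A = 4.0 scheme, and the sample courses are invented for illustration.

```python
# Conventional letter-grade point values (an assumption of this sketch).
POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def gpa(courses):
    """Composite measure of college performance.
    courses: list of (letter_grade, credits) pairs.
    Each grade is weighted by its credits, then the total is
    divided by the total credits taken."""
    total_points = sum(POINTS[grade] * credits for grade, credits in courses)
    total_credits = sum(credits for _, credits in courses)
    return total_points / total_credits

# Example: an A and a B in two 3-credit courses -> 3.5
```

A single grade is one indicator; the GPA combines many indicators into one composite, which is exactly the move Chapter 6 generalizes.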

Some Illustrations of Operationalization Choices

To bring together all the operationalization choices available to the social researcher and to show the potential in those possibilities, let's look at some of the distinct ways you might address various research problems. The alternative ways of operationalizing the variables in each case should demonstrate the opportunities that social research can present to our ingenuity and imaginations. To simplify matters, I have not attempted to describe all the research conditions that would make one alternative superior to the others, though in a given situation they would not all be equally appropriate. Here are specific research questions, then, and some of the ways you could address them. We'll begin with an example discussed earlier in the chapter. It has the added advantage that one of the variables is straightforward to operationalize.

1. Are women more compassionate than men?

a. Select a group of subjects for study, with equal numbers of men and women. Present them with hypothetical situations that involve someone's being in trouble. Ask them what they would do if they were confronted with that situation. What would they do, for example, if they came across a small child who was lost and crying for his or her parents? Consider any answer that involves helping or comforting the child as an indicator of compassion. See whether men or women are more likely to indicate they would be compassionate.

b. Set up an experiment in which you pay a small child to pretend that he or she is lost. Put the child to work on a busy sidewalk and observe whether men or women are more likely to offer assistance. Also be sure to count the total number of men and



women who walk by, because there may be more of one than the other. If that's the case, simply calculate the percentage of men and the percentage of women who help.


c. Select a sample of people and do a survey in which you ask them what organizations they belong to. Calculate whether women or men are more likely to belong to those that seem to reflect compassionate feelings. To take account of men who belong to more organizations than do women in general-or vice versa-do this: For each person you study, calculate the percentage of his or her organizational memberships that reflect compassion. See if men or women have a higher average percentage.

2. Are sociology students or accounting students better informed about world affairs?

a. Prepare a short quiz on world affairs and arrange to administer it to the students in a sociology class and in an accounting class at a comparable level. If you want to compare sociology and accounting majors, be sure to ask students what they are majoring in.

b. Get the instructor of a course in world affairs to give you the average grades of sociology and accounting students in the course.


c. Take a petition to sociology and accounting classes that urges that "the United Nations headquarters be moved to New York City." Keep a count of how many in each class sign the petition and how many inform you that the UN headquarters is already located in New York City.

3. Do people consider New York or California the better place to live?

a. Consulting the Statistical Abstract of the United States or a similar publication, check the migration rates into and out of each state. See if you can find the numbers moving directly from New York to California and vice versa.

b. The national polling companies-Gallup, Harris, Roper, and so forth-often ask

people what they consider the best state to live in. Look up some recent results in the library or through your local newspaper.


c. Compare suicide rates in the two states.

4. Who are the most popular instructors on your campus, those in the social sciences, the natural sciences, or the humanities?

a. If your school has a provision for student evaluation of instructors, review some recent results and compute the average rating of each of the three groups.

b. Begin visiting the introductory courses given in each group of disciplines and measure the attendance rate of each class.


c. In December, select a group of faculty in

each of the three divisions and ask them to keep a record of the numbers of holiday greeting cards and presents they receive from admiring students. See who wins.

The point of these examples is not necessarily to suggest respectable research projects but to illustrate the many ways variables can be operationalized.

Operationalization Goes On and On

Although I've discussed conceptualization and operationalization as activities that precede data collection and analysis-for example, you must design questionnaire items before you send out a questionnaire-these two processes continue throughout any research project, even if the data have been collected in a structured mass survey. As we've seen, in less-structured methods such as field research, the identification and specification of relevant concepts is inseparable from the ongoing process of observation. As a researcher, always be open to reexamining your concepts and definitions. The ultimate purpose of social research is to clarify the nature of social life. The validity and utility of what you learn in this regard doesn't depend on when you first figured out how to look at things any more than it matters whether you got the idea from a learned textbook, a dream, or your brother-in-law.

Criteria of Measurement Quality

This chapter has come some distance. It began with the bald assertion that social scientists can measure anything that exists. Then we discovered that most of the things we might want to measure and study don't really exist. Next we learned that it's possible to measure them anyway. Now we conclude the chapter with a discussion of some of the yardsticks against which we judge our relative success or failure in measuring things-even things that don't exist.

Precision and Accuracy

To begin, measurements can be made with varying degrees of precision. As we saw in the discussion of operationalization, precision concerns the fineness of distinctions made between the attributes that compose a variable. The description of a woman as "43 years old" is more precise than "in her forties." Saying a street-corner gang was formed "in the summer of 1996" is more precise than saying "during the 1990s."

As a general rule, precise measurements are superior to imprecise ones, as common sense dictates. There are no conditions under which imprecise measurements are intrinsically superior to precise ones. Even so, exact precision is not always necessary or desirable. If knowing that a woman is in her forties satisfies your research requirements, then any additional effort invested in learning her precise age is wasted. The operationalization of concepts, then, must be guided partly by an understanding of the degree of precision required. If your needs are not clear, be more precise rather than less.

Don't confuse precision with accuracy, however. Describing someone as "born in New England" is less precise than "born in Stowe, Vermont"-but suppose the person in question was actually born in Boston. The less-precise description, in this instance, is more accurate, a better reflection of the real world.

Precision and accuracy are obviously important qualities in research measurement, and they


probably need no further explanation. When social scientists construct and evaluate measurements, however, they pay special attention to two technical considerations: reliability and validity.

Reliability In the abstract, reliability is a matter of whether a particular technique, applied repeatedly to the same object, yields the same result each time. Let's say you want to know how much I weigh. (No, I don't know why.) As one technique, say you ask two different people to estimate my weight. If the first person estimates 150 pounds and the other estimates 300, we have to conclude the technique of having people estimate my weight isn't very reliable. Suppose, as an alternative, that you use a bathroom scale as your measurement technique. I step on the scale twice, and you note the same result each time. The scale has presumably reported the same weight for me both times, indicating that the scale provides a more reliable technique for measuring a person's weight than asking people to estimate it does. Reliability, however, does not ensure accuracy any more than precision does. Suppose I've set my bathroom scale to shave five pounds off my weight just to make me feel better. Although you would (reliably) report the same weight for me each time, you would always be wrong. This new element, called bias, is discussed in Chapter 8. For now, just be warned that reliability does not ensure accuracy. Let's suppose we're interested in studying morale among factory workers in two different kinds

reliability That quality of measurement method that suggests that the same data would have been collected each time in repeated observations of the same phenomenon. In the context of a survey, we would expect that the question "Did you attend religious services last week?" would have higher reliability than the question "About how many times have you attended religious services in your life?" This is not to be confused with validity.


Chapter 5: Conceptualization, Operationalization, and Measurement

of factories. In one set of factories, workers have specialized jobs, reflecting an extreme division of labor. Each worker contributes a tiny part to the overall process performed on a long assembly line. In the other set of factories, each worker performs many tasks, and small teams of workers complete the whole process. How should we measure morale? Following one strategy, we could observe the workers in each factory, noticing such things as whether they joke with one another, whether they smile and laugh a lot, and so forth. We could ask them how they like their work and even ask them whether they think they would prefer their current arrangement or the other one being studied. By comparing what we observed in the different factories, we might reach a conclusion about which assembly process produces the higher morale. Notice that I've just described a qualitative measurement procedure. Now let's look at some reliability problems inherent in this method. First, how you and I are feeling when we do the observing will likely color what we see. We may misinterpret what we see. We may see workers kidding each other but think they're having an argument. We may catch them on an off day. If we were to observe the same group of workers several days in a row, we might arrive at different evaluations on each day. Further, even if several observers evaluated the same behavior, they might arrive at different conclusions about the workers' morale. Here's another strategy for assessing morale, a quantitative approach. Suppose we check the company records to see how many grievances have been filed with the union during some fixed period. Presumably this would be an indicator of morale: the more grievances, the lower the morale. This measurement strategy would appear to be more reliable: Counting up the grievances over and over, we should keep arriving at the same number.
If you find yourself thinking that the number of grievances doesn't necessarily measure morale, you're worrying about validity, not reliability. We'll discuss validity in a moment. The point for now is that the last method is more like my bathroom scale-it gives consistent results.

In social research, reliability problems crop up in many forms. Reliability is a concern every time a single observer is the source of data, because we have no certain guard against the impact of that observer's subjectivity. We can't tell for sure how much of what's reported originated in the situation observed and how much in the observer. Subjectivity is not only a problem with single observers, however. Survey researchers have known for a long time that different interviewers, because of their own attitudes and demeanors, get different answers from respondents. Or, if we were to conduct a study of newspapers' editorial positions on some public issue, we might create a team of coders to take on the job of reading hundreds of editorials and classifying them in terms of their position on the issue. Unfortunately, different coders will code the same editorial differently. Or we might want to classify a few hundred specific occupations in terms of some standard coding scheme, say a set of categories created by the Department of Labor or by the Census Bureau. You and I would not place all those occupations in the same categories. Each of these examples illustrates problems of reliability. Similar problems arise whenever we ask people to give us information about themselves. Sometimes we ask questions that people don't know the answers to: How many times have you been to religious services? Sometimes we ask people about things they consider totally irrelevant: Are you satisfied with China's current relationship with Albania? In such cases, people will answer differently at different times because they're making up answers as they go. Sometimes we explore issues so complicated that a person who had a clear opinion in the matter might arrive at a different interpretation of the question when asked a second time. So how do you create reliable measures?
If your research design calls for asking people for information, you can be careful to ask only about things the respondents are likely to know the answer to. Ask about things relevant to them, and be clear in what you're asking. Of course, these techniques don't solve every possible reliability problem. Fortunately, social researchers have developed


several techniques for cross-checking the reliability of the measures they devise.

Test-Retest Method Sometimes it's appropriate to make the same measurement more than once, a technique called the test-retest method. If you don't expect the sought-after information to change, then you should expect the same response both times. If answers vary, the measurement method may, to the extent of that variation, be unreliable. Here's an illustration. In their research on Health Hazard Appraisal (HHA), a part of preventive medicine, Jeffrey Sacks, W. Mark Krushat, and Jeffrey Newman (1980) wanted to determine the risks associated with various background and lifestyle factors, making it possible for physicians to counsel their patients appropriately. By knowing patients' life situations, physicians could advise them on their potential for survival and on how to improve it. This purpose, of course, depended heavily on the accuracy of the information gathered about each subject in the study. To test the reliability of their information, Sacks and his colleagues had all 207 subjects complete a baseline questionnaire that asked about their characteristics and behavior. Three months later, a follow-up questionnaire asked the same subjects for the same information, and the results of the two surveys were compared. Overall, only 15 percent of the subjects reported the same information in both studies. Sacks and his colleagues report the following: Almost 10 percent of subjects reported a different height at follow-up examination. Parental age was changed by over one in three subjects. One parent reportedly aged 20 chronologic years in three months. One in five ex-smokers and ex-drinkers have apparent difficulty in reliably recalling their previous consumption pattern. (1980: 730)

Some subjects erased all trace of previously reported heart murmur, diabetes, emphysema, arrest record, and thoughts of suicide. One subject's

mother, deceased in the first questionnaire, was apparently alive and well in time for the second. One subject had one ovary missing in the first study but present in the second. In another case, an ovary present in the first study was missing in the second study-and had been for ten years! One subject was reportedly 55 years old in the first study and 50 years old three months later. (You have to wonder whether the physician-counselors could ever have nearly the impact on their patients that their patients' memories did.) Thus, test-retest revealed that this data-collection method was not especially reliable.

Split-Half Method As a general rule, it's always good to make more than one measurement of any subtle or complex social concept, such as prejudice, alienation, or social class. This procedure lays the groundwork for another check on reliability. Let's say you've created a questionnaire that contains ten items you believe measure prejudice against women. Using the split-half technique, you would randomly assign those ten items to two sets of five. Each set should provide a good measure of prejudice against women, and the two sets should classify respondents the same way. If the two sets of items classify people differently, you most likely have a problem of reliability in your measure of the variable.
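In quantitative terms, the split-half check amounts to summing each respondent's answers on the two random halves and seeing whether the half-scores agree. Here is a minimal sketch under that interpretation (the function names, data, and 0/1 scoring scheme are hypothetical); a correlation near 1.0 suggests the two halves classify respondents consistently.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def split_half(responses, seed=0):
    """responses: one list of item scores per respondent.
    Randomly split the items into two halves and correlate
    each respondent's score on one half with the other."""
    items = list(range(len(responses[0])))
    random.Random(seed).shuffle(items)
    half = len(items) // 2
    score_a = [sum(r[i] for i in items[:half]) for r in responses]
    score_b = [sum(r[i] for i in items[half:]) for r in responses]
    return pearson(score_a, score_b)

# Hypothetical answers (1 = prejudiced response) to ten items.
data = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],   # consistently high
    [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # consistently low
]
print(round(split_half(data), 2))
```

Respondents who answer every item the same way produce identical half-scores and a correlation of 1.0, regardless of how the random split falls out.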

Using Established Measures Another way to help ensure reliability in getting information from people is to use measures that have proved their reliability in previous research. If you want to measure anomia, for example, you might want to follow Srole's lead. The heavy use of measures, though, does not guarantee their reliability. For example, the Scholastic Assessment Tests (SATs) and the Minnesota Multiphasic Personality Inventory (MMPI) have been accepted as established standards in their respective domains for decades. In recent years, though, they've needed fundamental overhauling to reflect changes in society, eliminating outdated topics and gender bias in wording.




Reliability of Research Workers As we've seen, it's also possible for measurement unreliability to be generated by research workers: interviewers and coders, for example. There are several ways to check on reliability in such cases. To guard against interviewer unreliability in surveys, for example, a supervisor will call a subsample of the respondents on the telephone and verify selected pieces of information. Replication works in other situations also. If you're worried that newspaper editorials or occupations may not be classified reliably, you could have each independently coded by several coders. Those cases that are classified inconsistently can then be evaluated more carefully and resolved. Finally, clarity, specificity, training, and practice can prevent a great deal of unreliability and grief. If you and I spent some time reaching a clear agreement on how to evaluate editorial positions on an issue-discussing various positions and reading through several together-we could probably do a good job of classifying them in the same way independently. The reliability of measurements is a fundamental issue in social research, and we'll return to it more than once in the chapters ahead. For now, however, let's recall that even total reliability doesn't ensure that our measures actually measure what we think they measure. Now let's plunge into the question of validity.

validity A term describing a measure that accurately reflects the concept it is intended to measure. For example, your IQ would seem a more valid measure of your intelligence than the number of hours you spend in the library would. Though the ultimate validity of a measure can never be proved, we may agree to its relative validity on the basis of face validity, criterion validity, content validity, construct validity, internal validation, and external validation. This must not be confused with reliability.

face validity That quality of an indicator that makes it seem a reasonable measure of some variable. That the frequency of attendance at religious services is some indication of a person's religiosity seems to make sense without a lot of explanation. It has face validity.
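When several coders classify the same editorials, their agreement can be quantified. One common approach, sketched below, computes raw agreement and then corrects it for the agreement expected by chance (Cohen's kappa). The coder data are invented for illustration, and this particular statistic is my addition rather than something the chapter prescribes.

```python
from collections import Counter

def cohen_kappa(coder1, coder2):
    """Chance-corrected agreement between two coders (Cohen's kappa).
    1.0 = perfect agreement; 0.0 = no better than chance."""
    n = len(coder1)
    observed = sum(a == b for a, b in zip(coder1, coder2)) / n
    c1, c2 = Counter(coder1), Counter(coder2)
    # Agreement expected if both coders assigned labels at random,
    # each with their observed label frequencies.
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical coders classifying eight editorials on an issue.
coder_a = ["pro", "pro", "anti", "neutral", "anti", "pro", "neutral", "anti"]
coder_b = ["pro", "anti", "anti", "neutral", "anti", "pro", "pro", "anti"]
print(round(cohen_kappa(coder_a, coder_b), 2))  # -> 0.61
```

Editorials on which the coders disagree can then be flagged for the more careful review and resolution the text describes.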

Validity In conventional usage, validity refers to the extent to which an empirical measure adequately reflects the real meaning of the concept under consideration. Whoops! I've already committed us to the view that concepts don't have real meanings. How can we ever say whether a particular measure adequately reflects the concept's meaning, then? Ultimately, of course, we can't. At the same time, as we've already seen, all of social life, including social research, operates on agreements about the terms we use and the concepts they represent. There are several criteria of success in making measurements that are appropriate to these agreed-on meanings of concepts. First, there's something called face validity. Particular empirical measures may or may not jibe with our common agreements and our individual mental images concerning a particular concept. For example, you and I might quarrel about whether counting the number of grievances filed with the union will adequately measure morale. Still, we'd surely agree that the number of grievances has something to do with morale. That is, the measure is valid "on its face," whether or not it's adequate. If I were to suggest that we measure morale by finding out how many books the workers took out of the library during their off-duty hours, you'd undoubtedly raise a more serious objection: That measure wouldn't have much face validity. Second, I've already pointed to many of the more formally established agreements that define some concepts. The Census Bureau, for example, has created operational definitions of such concepts as family, household, and employment status that seem to have a workable validity in most studies using these concepts. Three additional types of validity also specify particular ways of testing the validity of measures. The first, criterion-related validity, sometimes called predictive validity, is based on some external criterion.
For example, the validity of College Board exams is shown in their ability to predict students'

success in college. The validity of a written driver's test is determined, in this sense, by the relationship between the scores people get on the test and their subsequent driving records. In these examples, college success and driving ability are the criteria. To test your understanding of criterion-related validity, see whether you can think of behaviors that might be used to validate each of the following attitudes:

Is very religious
Supports equality of men and women
Supports far-right militia groups
Is concerned about the environment

Some possible validators would be, respectively, attends religious services, votes for women candidates, belongs to the NRA, and belongs to the Sierra Club. Sometimes it's difficult to find behavioral criteria that can be taken to validate measures as directly as in such examples. In those instances, however, we can often approximate such criteria by applying a different test. We can consider how the variable in question ought, theoretically, to relate to other variables. Construct validity is based on the logical relationships among variables. Suppose, for example, that you want to study the sources and consequences of marital satisfaction. As part of your research, you develop a measure of marital satisfaction, and you want to assess its validity. In addition to developing your measure, you'll have developed certain theoretical expectations about the way the variable marital satisfaction relates to other variables. For example, you might reasonably conclude that satisfied husbands and wives will be less likely than dissatisfied ones to cheat on their spouses. If your measure relates to marital fidelity in the expected fashion, that constitutes evidence of your measure's construct validity. If satisfied marriage partners are as likely to cheat on their spouses as are the dissatisfied ones, however, that would challenge the validity of your measure.
Tests of construct validity, then, can offer a weight of evidence that your measure either does


or doesn't tap the quality you want it to measure, without providing definitive proof. Although I have suggested that tests of construct validity are less compelling than those of criterion validity, there is room for disagreement about which kind of test a particular comparison variable (driving record, marital fidelity) represents in a given situation. It's less important to distinguish the two types of validity tests than to understand the logic of validation that they have in common: If we've succeeded in measuring some variable, then our measures should relate in some logical way to other measures. Finally, content validity refers to how much a measure covers the range of meanings included within a concept. For example, a test of mathematical ability cannot be limited to addition but also needs to cover subtraction, multiplication, division, and so forth. Or, if we're measuring prejudice, do our measurements reflect all types of prejudice, including prejudice against racial and ethnic groups, religious minorities, women, the elderly, and so on? Figure 5-2 presents a graphic portrayal of the difference between validity and reliability. If you think of measurement as analogous to repeatedly shooting at the bull's-eye on a target, you'll see that reliability looks like a "tight pattern," regardless of where the shots hit, because reliability is a function of consistency. Validity, on the other hand, is a function of shots being arranged around the bull's-eye. The failure of reliability in the figure is randomly distributed around the target; the failure of validity is systematically off the mark. Notice that neither an unreliable nor an invalid measure is likely to be very useful.
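The target analogy maps neatly onto two statistical ideas: failure of validity is systematic bias (the pattern's center misses the bull's-eye), while failure of reliability is random scatter (the pattern is loose). This hypothetical simulation, with parameters chosen only for illustration, makes the distinction concrete.

```python
import random

def shots(center, spread, n=1000, seed=42):
    """Simulate n shots at a target whose bull's-eye is at (0, 0).
    `center` models systematic bias (validity); `spread` models
    random scatter (reliability)."""
    rng = random.Random(seed)
    return [(rng.gauss(center[0], spread), rng.gauss(center[1], spread))
            for _ in range(n)]

def mean_point(points):
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

reliable_not_valid = shots(center=(3, 3), spread=0.2)  # tight, off-center
valid_not_reliable = shots(center=(0, 0), spread=2.0)  # centered, scattered
valid_and_reliable = shots(center=(0, 0), spread=0.2)  # tight and centered

for label, pts in [("reliable but not valid", reliable_not_valid),
                   ("valid but not reliable", valid_not_reliable),
                   ("valid and reliable", valid_and_reliable)]:
    x, y = mean_point(pts)
    print(f"{label}: pattern centered near ({x:.1f}, {y:.1f})")
```

The first pattern stays centered far from (0, 0) no matter how many shots are fired: consistency alone cannot rescue a biased measure, which is the figure's point.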

criterion-related validity The degree to which a measure relates to some external criterion. For example, the validity of College Board tests is shown in their ability to predict the college success of students. Also called predictive validity.

construct validity The degree to which a measure relates to other variables as expected within a system of theoretical relationships.

content validity The degree to which a measure covers the range of meanings included within a concept.



FIGURE 5-2 An Analogy to Validity and Reliability. A good measurement technique should be both valid (measuring what it is intended to measure) and reliable (yielding a given measurement dependably). The figure's three target patterns illustrate: reliable but not valid; valid but not reliable; valid and reliable.

Main Points

Who Decides What's Valid? Our discussion of validity began with a reminder that we depend on agreements to determine what's real, and we've just seen some of the ways social scientists can agree among themselves that they have made valid measurements. There is yet another way of looking at validity. Social researchers sometimes criticize themselves and one another for implicitly assuming they are somewhat superior to those they study. For example, researchers often seek to uncover motivations that the social actors themselves are unaware of. You think you bought that new Burpo-Blasto because of its high performance and good looks, but we know you're really trying to achieve a higher social status. This implicit sense of superiority would fit comfortably with a totally positivistic approach (the biologist feels superior to the frog on the lab table), but it clashes with the more humanistic and typically qualitative approach taken by many social scientists. We'll explore this issue more deeply in Chapter 10. In seeking to understand the way ordinary people make sense of their worlds, ethnomethodologists have urged all social scientists to pay more respect to these natural social processes of conceptualization and shared meaning. At the very least, behavior that may seem irrational from the scientist's paradigm may make logical sense when viewed through the actor's paradigm.

Ultimately, social researchers should look both to their colleagues and to their subjects as sources of agreement on the most useful meanings and measurements of the concepts they study. Sometimes one source will be more useful, sometimes the other. But neither one should be dismissed.

Tension between Reliability and Validity Clearly, we want our measures to be both reliable and valid. However, a tension often arises between the criteria of reliability and validity, forcing a trade-off between the two. Recall the example of measuring morale in different factories. The strategy of immersing yourself in the day-to-day routine of the assembly line, observing what goes on, and talking to the workers would seem to provide a more valid measure of morale than counting grievances would. It just seems obvious that we'd get a clearer sense of whether the morale was high or low using this first method. As I pointed out earlier, however, the counting strategy would be more reliable. This situation reflects a more general strain in research measurement. Most of the really interesting concepts we want to study have many subtle nuances, so specifying precisely what we mean by them is hard. Researchers sometimes speak of such concepts as having a "richness of meaning." Although scores of books and articles have been written on the topic

of anomie/anomia, for example, they still haven't exhausted its meaning. Very often, then, specifying reliable operational definitions and measurements seems to rob concepts of their richness of meaning. Positive morale is much more than a lack of grievances filed with the union; anomia is much more than what is measured by the five items created by Leo Srole. Yet the more variation and richness we allow for a concept, the more opportunity there is for disagreement on how it applies to a particular situation, thus reducing reliability. To some extent, this dilemma explains the persistence of two quite different approaches to social research: quantitative, nomothetic, structured techniques such as surveys and experiments on the one hand, and qualitative, idiographic methods such as field research and historical studies on the other. In the simplest generalization, the former methods tend to be more reliable, the latter more valid. By being forewarned, you'll be effectively forearmed against this persistent and inevitable dilemma. If there is no clear agreement on how to measure a concept, measure it several different ways. If the concept has several dimensions, measure them all. Above all, know that the concept does not have any meaning other than what you and I give it. The only justification for giving any concept a particular meaning is utility. Measure concepts in ways that help us understand the world around us.





Introduction

The interrelated processes of conceptualization, operationalization, and measurement allow researchers to move from a general idea about what they want to study to effective and well-defined measurements in the real world.

Measuring Anything That Exists

Conceptions are mental images we use as summary devices for bringing together observations and experiences that seem to have something in common. We use terms or labels to reference these conceptions.

Concepts are constructs; they represent the agreed-on meanings we assign to terms. Our concepts don't exist in the real world, so they can't be measured directly, but we can measure the things that our concepts summarize.

Conceptualization is the process of specifying observations and measurements that give concepts definite meaning for the purposes of a research study. Conceptualization includes specifying the indicators of a concept and describing its dimensions. Operational definitions specify how variables relevant to a concept will be measured.

Definitions in Descriptive and Explanatory Studies

Precise definitions are even more important in descriptive than in explanatory studies. The degree of precision needed varies with the type and purpose of a study.

Operationalization Choices

Operationalization is an extension of conceptualization that specifies the exact procedures that will be used to measure the attributes of variables.

Operationalization involves a series of interrelated choices: specifying the range of variation that is appropriate for the purposes of a study, determining how precisely to measure variables, accounting for relevant dimensions of variables, clearly defining the attributes of variables and their relationships, and deciding on an appropriate level of measurement.

Researchers must choose from four levels of measurement, which capture increasing amounts of information: nominal, ordinal, interval, and ratio. The most appropriate level depends on the purpose of the measurement.

A given variable can sometimes be measured at different levels. When in doubt, researchers should use the highest level of measurement appropriate to that variable so they can capture the greatest amount of information.

Operationalization begins in the design phase of a study and continues through all phases of the research project, including the analysis of data.


Criteria of Measurement Quality

Criteria of the quality of measures include precision, accuracy, reliability, and validity.


Whereas reliability means getting consistent results from the same measure, validity refers to getting results that accurately reflect the concept being measured.




Researchers can test or improve the reliability of measures through the test-retest method, the split-half method, the use of established measures, and the examination of work performed by research workers.

The yardsticks for assessing a measure's validity include face validity, criterion-related validity, construct validity, and content validity.

Creating specific, reliable measures often seems to diminish the richness of meaning our general concepts have. This problem is inevitable. The best solution is to use several different measures, tapping the different aspects of a concept.


The following terms are defined in context in the chapter and at the bottom of the page where the term is introduced, as well as in the comprehensive glossary at the back of the book.

conceptualization
construct validity
content validity
criterion-related validity
dimension
face validity
indicator
interval measure
nominal measure
ordinal measure
ratio measure
reliability
specification
validity


1. Pick a social science concept such as liberalism or alienation, then specify that concept so that it could be studied in a research project. Be sure to specify the indicators you'll use as well as the dimensions you wish to include in and exclude from your conceptualization.

2. What level of measurement-nominal, ordinal, interval, or ratio-describes each of the following variables?

a. Race (white, African American, Asian, and so on)

b. Order of finish in a race (first, second, third, and so on)

c. Number of children in families

d. Populations of nations

e. Attitudes toward nuclear energy (strongly approve, approve, disapprove, strongly disapprove)

f. Region of birth (Northeast, Midwest, and so on)

g. Political orientation (very liberal, somewhat liberal, somewhat conservative, very conservative)

3. Let's conceptualize the variable: prejudice. Using your favorite web browser, search for the term prejudice. After reviewing several of the websites resulting from your search, make a list of some different forms of prejudice that might be studied in an omnibus project dealing with that topic. 4.




Let's discover truth. In a good dictionary, look up truth and true, then copy out the definitions. Note the key terms used in those definitions (e.g., reality), look up the definitions of those terms, and copy out these definitions as well. Continue this process until no new terms appear. Comment on what you've learned from this exercise.

Lazarsfeld, Paul F., and Morris Rosenberg, eds. 1955. The Language of Social Research, Section I. New York: Free Press of Glencoe. An excellent and diverse classic collection of descriptions of specific measurements in past social research. These 14 articles present useful and readable accounts of actual measurement operations performed by social researchers, as well as more conceptual discussions of measurement in general.

Miller, Delbert. 1991. Handbook of Research Design and Social Measurement. Newbury Park, CA: Sage. A powerful reference work. This book, especially Part 6, cites and describes a wide variety of operational measures used in earlier social research. In several cases, the questionnaire formats used are presented. Though the quality of these illustrations is uneven, they provide excellent examples of possible variations.

Silverman, David. 1993. Interpreting Qualitative Data: Methods for Analyzing Talk, Text, and Interaction, Chapter 7. Newbury Park, CA: Sage. This chapter deals with the issues of validity and reliability specifically in regard to qualitative research.

U.S. Department of Health and Human Services. 1992. Survey Measurement of Drug Use. Washington, DC: U.S. Government Printing Office. An extensive review of techniques devised and used for measuring various kinds of drug use.


See the booklet that accompanies your text for exercises using SPSS (Statistical Package for the Social Sciences). There are exercises offered for each chapter, and you'll also find a detailed primer on using SPSS.


Bohrnstedt, George W. 1983. "Measurement." Pp. 70-121 in Handbook of Survey Research, edited by Peter H. Rossi, James D. Wright, and Andy B. Anderson. New York: Academic Press. This essay offers the logical and statistical grounding of reliability and validity in measurement.

Grimes, Michael D. 1991. Class in Twentieth-Century American Sociology: An Analysis of Theories and Measurement Strategies. New York: Praeger. This book provides an excellent, long-term view of conceptualization as the author examines a variety of theoretical views of social class and the measurement techniques appropriate to those theories.

Online Study Resources

SociologyNow: Research Methods

1. Before you do your final review of the chapter, take the SociologyNow: Research Methods diagnostic quiz to help identify the areas on which you should concentrate. You'll find information on this online tool, as well as instructions on how to access all of its great resources, in the front of the book.

2. As you review, take advantage of the SociologyNow: Research Methods customized study plan, based on your quiz results. Use this study plan with its interactive exercises and other resources to master the material.

3. When you're finished with your review, take the posttest to confirm that you're ready to move on to the next chapter.

WEBSITE FOR THE PRACTICE OF SOCIAL RESEARCH 11TH EDITION Go to your book's website at http://sociology.wadsworth.com/babbie_practice11e for tools to aid you in studying for your exams. You'll find Tutorial Quizzes with feedback, Internet Exercises, Flashcards, and Chapter Tutorials, as well as Extended Projects, InfoTrac College Edition search terms, Social Research in Cyberspace, GSS Data, Web Links, and primers for using various data-analysis software such as SPSS and NVivo.

WEB LINKS FOR THIS CHAPTER

Please realize that the Internet is an evolving entity, subject to change. Nevertheless, these few websites should be fairly stable. Also, check your book's website for even more Web Links. These websites, current at the time of this book's publication, provide opportunities to learn about conceptualization, operationalization, and measurement.

U.S. Census, Statistical Abstract of the United States
www/statistical-abstract-us.html
Here is just about everything you want to know about people in the United States: what they are like and what they do. It provides numerous examples of how characteristics and behaviors can be defined and measured.

University of Michigan, General Social Survey Codebook
http://www.icpsr.umich.edu/GSS/
This is a major social science resource. The GSS codebook identifies the numerous variables examined by the studies over time and gives the specific operationalization of those variables.

University of Colorado, Social Science Data Archives
These hotlinks to major social science data sets will give you many examples of variables defined and studied by researchers.


Indexes, Scales, and Typologies

Introduction
Indexes versus Scales
Index Construction
Item Selection
Examination of Empirical Relationships
Index Scoring
Handling Missing Data
Index Validation
The Status of Women: An Illustration of Index Construction

Scale Construction
Bogardus Social Distance Scale
Thurstone Scales
Likert Scaling
Semantic Differential
Guttman Scaling
Typologies

SociologyNow: Research Methods. Use this online tool to help you make the grade on your next exam. After reading this chapter, go to the "Online Study Resources" at the end of the chapter for instructions on how to benefit from SociologyNow: Research Methods.

Introduction

As we saw in Chapter 5, many social scientific concepts have complex and varied meanings. Making measurements that capture such concepts can be a challenge. Recall our discussion of content validity, which concerns whether we have captured all the different dimensions of a concept. To achieve broad coverage of the various dimensions of a concept, we usually need to make multiple observations pertaining to that concept. Thus, for example, Bruce Berg (1989: 21) advises in-depth interviewers to prepare essential questions, which are "geared toward eliciting specific, desired information." In addition, the researcher should prepare extra questions: "questions roughly equivalent to certain essential ones, but worded slightly differently."

Multiple indicators are used with quantitative data as well. Suppose you're designing a survey. Although you can sometimes construct a single questionnaire item that captures the variable of interest ("Gender: □ Male □ Female" is a simple example), other variables are less straightforward and may require you to use several questionnaire items to measure them adequately. Quantitative data analysts have developed specific techniques for combining indicators into a single measure.

This chapter discusses the construction of two types of composite measures of variables: indexes and scales. Although these measures can be used in any form of social research, they are most common in survey research and other quantitative methods. A short section at the end of this chapter considers typologies, which are relevant to both qualitative and quantitative research.

Composite measures are frequently used in quantitative research, for several reasons. First, social scientists often wish to study variables that have no clear and unambiguous single indicators. Single indicators do suffice for some variables, such as age. We can determine a survey respondent's age by simply asking, "How old are you?"
Similarly, we can determine a newspaper's circulation by merely looking at the figure the newspaper reports. In the case of


complex concepts, however, researchers can seldom develop single indicators before they actually do the research. This is especially true with regard to attitudes and orientations. Rarely can a survey researcher, for example, devise single questionnaire items that adequately tap respondents' degrees of prejudice, religiosity, political orientations, alienation, and the like. More likely, the researcher will devise several items, each of which provides some indication of the variables. Taken individually, each of these items is likely to prove invalid or unreliable for many respondents. A composite measure, however, can overcome this problem.

Second, researchers may wish to employ a rather refined ordinal measure of a particular variable (alienation, say), arranging cases in several ordinal categories, from very low to very high, for example. A single data item might not have enough categories to provide the desired range of variation. However, an index or scale formed from several items can provide the needed range.

Finally, indexes and scales are efficient devices for data analysis. If considering a single data item gives us only a rough indication of a given variable, considering several data items can give us a more comprehensive and more accurate indication. For example, a single newspaper editorial may give us some indication of the political orientations of that newspaper. Examining several editorials would probably give us a better assessment, but the manipulation of several data items simultaneously could be very complicated. Indexes and scales (especially scales) are efficient data-reduction devices: They allow us to summarize several indicators in a single numerical score, while sometimes nearly maintaining the specific details of all the individual indicators.

Indexes versus Scales

The terms index and scale are typically used imprecisely and interchangeably in the social research literature. The two types of measures do have some characteristics in common, but in this book we'll distinguish between the two. However, you should



be warned of a growing tendency in the literature to use the term scale to refer to both indexes and scales, as they are distinguished here.

First let's consider what they have in common. Both scales and indexes are ordinal measures of variables. Both rank-order the units of analysis in terms of specific variables such as religiosity, alienation, socioeconomic status, prejudice, or intellectual sophistication. A person's score on either a scale or an index of religiosity, for example, gives an indication of his or her relative religiosity vis-a-vis other people. Further, both scales and indexes are composite measures of variables, that is, measurements based on more than one data item. Thus, a survey respondent's score on an index or scale of religiosity is determined by the responses given to several questionnaire items, each of which provides some indication of religiosity. Similarly, a person's IQ score is based on answers to a large number of test questions. The political orientation of a newspaper might be represented by an index or scale score reflecting the newspaper's editorial policy on various political issues.

Despite these shared characteristics, it's useful to distinguish between indexes and scales. In this book, we'll distinguish them by the way scores are assigned in each. We construct an index simply by accumulating scores assigned to individual attributes. We might measure prejudice, for example, by adding up the number of prejudiced statements each respondent agreed with. We construct a scale, however, by assigning scores to patterns of responses, recognizing that some items reflect a relatively weak degree of the variable while others reflect something stronger. For example, agreeing that "Women are different from men" is, at best,

index  A type of composite measure that summarizes and rank-orders several specific observations and represents some more general dimension.

scale  A type of composite measure composed of several items that have a logical or empirical structure among them. Examples of scales include Bogardus social distance, Guttman, Likert, and Thurstone scales.

weak evidence of sexism compared with agreeing that "Women should not be allowed to vote." A scale takes advantage of differences in intensity among the attributes of the same variable to identify distinct patterns of response.

Let's consider this simple example of sexism a bit further. Imagine asking people to agree or disagree with the two statements just presented. Some might agree with both, and some might disagree with both. But suppose I told you someone agreed with one and disagreed with the other: Could you guess which statement they agreed with and which they did not? I'd guess the person in question agreed that women were different but disagreed that they should be prohibited from voting. On the other hand, I doubt that anyone would want to prohibit women from voting while asserting that there is no difference between men and women. That would make no sense.

Now consider this. The two responses we wanted from each person would technically yield four response patterns: agree/agree, agree/disagree, disagree/agree, and disagree/disagree. We've just seen, however, that only three of the four patterns make any sense or are likely to occur. Where indexes score people based on their responses, scales score people on the basis of response patterns: We determine what the logical response patterns are and score people in terms of the pattern their responses most closely resemble.

Figure 6-1 provides a graphic illustration of the difference between indexes and scales. Let's assume we want to develop a measure of political activism, distinguishing those people who are very active in political affairs, those who don't participate much at all, and those who are somewhere in between. The first part of Figure 6-1 illustrates the logic of indexes. The figure shows six different political actions. Although you and I might disagree on some specifics, I think we could agree that the six actions represent roughly the same degree of political activism.
Using these six items, we could construct an index of political activism by giving each person 1 point for each of the actions he or she has taken. If you wrote to a public official and signed a petition,


Index-Construction Logic. Here are several types of political actions people may have taken. By and large, the different actions represent similar degrees of political activism. To create an index of overall political activism, we might give people 1 point for each of the actions they've taken:

Wrote a letter to a public official
Signed a political petition
Gave money to a political cause
Gave money to a political candidate
Wrote a political letter to the editor
Persuaded someone to change her or his voting plans

Scale-Construction Logic. Here are some political actions that represent very different degrees of activism: e.g., running for office represents a higher degree of activism than simply voting does. It seems likely, moreover, that anyone who has taken one of the more demanding actions would have taken all the easier ones as well. To construct a scale of political activism, we might score people according to which of the following "ideal" patterns comes closest to describing them:

Ran for office
Worked on a political campaign
Contributed money to a political campaign
Voted

[The figure shows the ideal cumulative response patterns, each with its scale score.]

FIGURE 6-1 Indexes versus Scales. Both indexes and scales seek to measure variables such as political activism. Whereas indexes count the number of indicators of the variable, scales take account of the differing intensities of those indicators.

you'd get a total of 2 points. If I gave money to a candidate and persuaded someone to change her or his vote, I'd get the same score as you. Using this approach, we'd conclude that you and I had the same degree of political activism, even though we had taken different actions. The second part of Figure 6-1 describes the logic of scale construction. In this case, the actions clearly represent different degrees of political activism, ranging from simply voting to running for office. Moreover, it seems safe to assume a pattern of actions in this case. For example, all those who contributed money probably also voted. Those who

worked on a campaign probably also gave some money and voted. This suggests that most people will fall into only one of five idealized action patterns, represented by the illustrations at the bottom of the figure. The discussion of scales, later in this chapter, describes ways of identifying people with the type they most closely represent. As you might surmise, scales are generally superior to indexes, because scales take into consideration the intensity with which different items reflect the variable being measured. Also, as the example in Figure 6-1 shows, scale scores convey more information than index scores do. Again,


be aware that the term scale is commonly misused to refer to measures that are only indexes. Merely calling a measure a scale instead of an index doesn't make it better.

There are two other misconceptions about scaling that you should know about. First, whether the combination of several data items results in a scale almost always depends on the particular sample of observations under study. Certain items may form a scale within one sample but not within another. For this reason, do not assume that a given set of items is a scale simply because it has turned out that way in an earlier study. Second, the use of specific scaling techniques, such as Guttman scaling, to be discussed, does not ensure the creation of a scale. Rather, such techniques let us determine whether or not a set of items constitutes a scale.

An examination of actual social science research reports will show that researchers use indexes much more frequently than they do scales. Ironically, however, the methodological literature contains little if any discussion of index construction, whereas discussions of scale construction abound. There appear to be two reasons for this disparity. First, indexes are more frequently used because scales are often difficult or impossible to construct from the data at hand. Second, methods of index construction seem so obvious and straightforward that they aren't discussed much. Constructing indexes is not a simple undertaking, however. The general failure to develop index-construction techniques has resulted in many bad indexes in social research. With this in mind, I've devoted over half of this chapter to the methods of index construction. With a solid understanding of the logic of this activity, you'll be better equipped to try constructing scales. Indeed, a carefully constructed index may turn out to be a scale.
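The two scoring logics can be sketched in code. This is an illustrative sketch only: the item names, the responses, and the simple pattern-matching rule are my own assumptions, not taken from the text.

```python
# Illustrative sketch of index versus scale scoring, using the political
# activism items of Figure 6-1. All item names and responses are
# hypothetical; the pattern-matching rule is one simple possibility.

INDEX_ITEMS = ["wrote_official", "signed_petition", "gave_to_cause",
               "gave_to_candidate", "wrote_editor", "persuaded_voter"]

def index_score(responses):
    """Index logic: 1 point per action taken, regardless of which ones."""
    return sum(responses.get(item, 0) for item in INDEX_ITEMS)

# Scale items, ordered from least to most demanding.
SCALE_ITEMS = ["voted", "contributed", "worked_campaign", "ran_for_office"]

def scale_score(responses):
    """Scale logic: score people by the ideal cumulative pattern
    (1,1,...,0,0) their responses most closely resemble."""
    pattern = [responses.get(item, 0) for item in SCALE_ITEMS]
    best_score, fewest_errors = 0, len(SCALE_ITEMS) + 1
    for k in range(len(SCALE_ITEMS) + 1):
        ideal = [1] * k + [0] * (len(SCALE_ITEMS) - k)
        errors = sum(p != i for p, i in zip(pattern, ideal))
        if errors < fewest_errors:
            best_score, fewest_errors = k, errors
    return best_score

# Two people with different actions get the same index score:
print(index_score({"wrote_official": 1, "signed_petition": 1}))     # 2
print(index_score({"gave_to_candidate": 1, "persuaded_voter": 1}))  # 2

# Scale scores reflect the highest step reached in the cumulative pattern:
print(scale_score({"voted": 1, "contributed": 1}))  # 2
```

A respondent who breaks the cumulative assumption (say, worked on a campaign but never voted) is simply assigned the closest ideal pattern here; Guttman scaling, discussed later in the chapter, formalizes this idea.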

Index Construction Let's look now at four main steps in the construction of an index: selecting possible items, examining their empirical relationships, scoring the index,

and validating it. We'll conclude this discussion by examining the construction of an index that provided interesting findings about the status of women in different countries.

Item Selection The first step in creating an index is selecting items for a composite index, which is created to measure some variable.

Face Validity

The first criterion for selecting items to be included in an index is face validity (or logical validity). If you want to measure political conservatism, for example, each of your items should appear on its face to indicate conservatism (or its opposite, liberalism). Political party affiliation would be one such item. Another would be an item asking people to approve or disapprove of the views of a well-known conservative public figure. In constructing an index of religiosity, you might consider items such as attendance at religious services, acceptance of certain religious beliefs, and frequency of prayer; each of these appears to offer some indication of religiosity.

Unidimensionality

The methodological literature on conceptualization and measurement stresses the need for unidimensionality in scale and index construction. That is, a composite measure should represent only one dimension of a concept. Thus, items reflecting religious fundamentalism should not be included in a measure of political conservatism, even though the two variables might be empirically related to each other.

General or Specific

Although measures should tap the same dimension, the general dimension you're attempting to measure may have many nuances. In the example of religiosity, the indicators mentioned previously (ritual participation, belief, and so on) represent different types of religiosity. If you wished to focus on ritual participation in religion, you should choose items specifically indicating this type of religiosity: attendance at religious services and other rituals such as confession, bar mitzvah, bowing toward Mecca, and the like. If you wished to measure religiosity in a more general way, you would include a balanced set of items, representing each of the different types of religiosity. Ultimately, the nature of the items you include will determine how specifically or generally the variable is measured.

Variance

In selecting items for an index, you must also be concerned with the amount of variance they provide. If an item is intended to indicate political conservatism, for example, you should note what proportion of respondents would be identified as conservatives by that item. If a given item identified no one as a conservative or everyone as a conservative (for example, if nobody indicated approval of a radical-right political figure) that item would not be very useful in the construction of an index. To guarantee variance, you have two options. First, you may select several items the responses to which divide people about equally in terms of the variable, for example, about half conservative and half liberal. Although no single response would justify the characterization of a person as very conservative, a person who responded as a conservative on all items might be so characterized. The second option is to select items differing in variance. One item might identify about half the subjects as conservative, while another might identify few of the respondents as conservatives. Note that this second option is necessary for scaling, and it is reasonable for index construction as well.

Examination of Empirical Relationships

The second step in index construction is to examine the empirical relationships among the items being considered for inclusion. (See Chapter 14 for more.) An empirical relationship is established when respondents' answers to one question (in a questionnaire, for example) help us predict how they'll answer other questions. If two items are

empirically related to each other, we can reasonably argue that each reflects the same variable, and we may include them both in the same index. There are two types of possible relationships among items: bivariate and multivariate.

Bivariate Relationships

A bivariate relationship is, simply put, a relationship between two variables. Suppose we want to measure respondents' support for U.S. participation in the United Nations. One indicator of different levels of support might be the question "Do you feel the U.S. financial support of the UN is □ Too high □ About right □ Too low?" A second indicator of support for the United Nations might be the question "Should the United States contribute military personnel to UN peacekeeping actions? □ Strongly approve □ Mostly approve □ Mostly disapprove □ Strongly disapprove." Both of these questions, on their face, seem to reflect different degrees of support for the United Nations. Nonetheless, some people might feel the United States should give more money but not provide troops. Others might favor sending troops but cutting back on financial support.

If the two items both reflect degrees of the same thing, however, we should expect responses to the two items to generally correspond with each other. Specifically, those who approve of military support should be more likely to favor financial support than those who disapprove of military support would. Conversely, those who favor financial support should be more likely to favor military support than those disapproving of financial support would. If these expectations are met, we say there is a bivariate relationship between the two items.

Here's another example. Suppose we want to determine the degree to which respondents feel women have the right to an abortion. We might ask (1) "Do you feel a woman should have the right to an abortion when her pregnancy was the result of rape?" and (2) "Do you feel a woman should have the right to an abortion if continuing her pregnancy would seriously threaten her life?"



Granted, some respondents might agree with item (1) and disagree with item (2); others will do just the reverse. However, if both items tap into some general opinion people have about the issue of abortion, then the responses to these two items should be related to each other. Those who support the right to an abortion in the case of rape should be more likely to support it if the woman's life is threatened than those who disapproved of abortion in the case of rape would. This would be another example of a bivariate relationship. You should examine all the possible bivariate relationships among the several items being considered for inclusion in an index, in order to determine the relative strengths of relationships among the several pairs of items. Percentage tables, correlation coefficients (see Chapter 16), or both may be used for this purpose. How we evaluate the strength of the relationships, however, can be rather subtle. "'Cause' and 'Effect' Indicators" examines some of these subtleties.

"Cause" and "Effect" Indicators
by Kenneth Bollen, Department of Sociology, University of North Carolina, Chapel Hill

While it often makes sense to expect indicators of the same variable to be positively related to one another, as discussed in the text, this is not always the case. Indicators should be related to one another if they are essentially "effects" of a variable. For example, to measure self-esteem, we might ask a person to indicate whether he or she agrees or disagrees with the statements (1) "I am a good person" and (2) "I am happy with myself." A person with high self-esteem should agree with both statements, while one with low self-esteem would probably disagree with both. Since each indicator depends on or "reflects" self-esteem, we expect them to be positively correlated. More generally, indicators that depend on the same variable should be associated with one another if they are valid measures.

But this is not the case when the indicators are the "cause" rather than the "effect" of a variable. In this situation the indicators may correlate positively, negatively, or not at all. For example, we could use gender and race as indicators of the variable exposure to discrimination. Being nonwhite or female increases the likelihood of experiencing discrimination, so both are good indicators of the variable. But we would not expect the race and gender of individuals to be strongly associated. Or, we may measure social interaction with three indicators: time spent with friends, time spent with family, and time spent with coworkers. Though each indicator is valid, they need not be positively correlated. Time spent with friends, for instance, may be inversely related to time spent with family. Here, the three indicators "cause" the degree of social interaction. As a final example, exposure to stress may be measured by whether a person recently experienced divorce, death of a spouse, or loss of a job. Though any of these events may indicate stress, they need not correlate with one another.

In short, we expect an association between indicators that depend on or "reflect" a variable, that is, if they are the "effects" of the variable. But if the variable depends on the indicators (if the indicators are the "causes") those indicators may be either positively or negatively correlated, or even unrelated. Therefore, we should decide whether indicators are causes or effects of a variable before using their intercorrelations to assess their validity.

Be wary of items that are not related to one another empirically: It's unlikely that they measure the same variable. You should probably drop any item that is not related to several other items. At the same time, a very strong relationship between two items presents a different problem. If two items are perfectly related to each other, then only one needs to be included in the index; because it completely conveys the indications provided by the other, nothing more would be added by including the other item. (This problem will become even clearer in the next section.)

Here's an example to illustrate the testing of bivariate relationships in index construction. I once conducted a survey of medical school faculty members to find out about the consequences of a "scientific perspective" on the quality of patient care

provided by physicians. The primary intent was to determine whether scientifically inclined doctors treated patients more impersonally than other doctors did. The survey questionnaire offered several possible indicators of respondents' scientific perspectives. Of those, three items appeared to provide especially clear indications of whether the doctors were scientifically oriented:

1. As a medical school faculty member, in what capacity do you feel you can make your greatest teaching contribution: as a practicing physician or as a medical researcher?

2. As you continue to advance your own medical knowledge, would you say your ultimate medical interests lie primarily in the direction of total patient management or the understanding of basic mechanisms? [The purpose of this item was to distinguish those who were mostly interested in overall patient care from those mostly interested in biological processes.]

3. In the field of therapeutic research, are you generally more interested in articles reporting evaluations of the effectiveness of various treatments or articles exploring the basic rationale underlying the treatments? [Similarly, I wanted to distinguish those more interested in articles dealing with patient care from those more interested in biological processes.] (Babbie 1970: 27-31)

For each of these items, we might conclude that those respondents who chose the second answer are more scientifically oriented than respondents who chose the first answer. Though this comparative conclusion is reasonable, we should not be misled into thinking that respondents who chose the second answer to a given item are scientists in any absolute sense. They are simply more scientifically oriented than those who chose the first answer to the item.

To see this point more clearly, let's examine the distribution of responses to each item. From the first item (greatest teaching contribution) only about one-third of the respondents appeared scientifically oriented. That is, approximately one-third said they could make their greatest teaching


contribution as medical researchers. In response to the second item (ultimate medical interests) approximately two-thirds chose the scientific answer, saying they were more interested in learning about basic mechanisms than learning about total patient management. In response to the third item (reading preferences) about 80 percent chose the scientific answer.

These three questionnaire items can't tell us how many "scientists" there are in the sample, for none of them is related to a set of criteria for what constitutes being a scientist in any absolute sense. Using the items for this purpose would present us with the problem of three quite different estimates of how many scientists there were in the sample. However, these items do provide us with three independent indicators of respondents' relative inclinations toward science in medicine. Each item separates respondents into the more scientific and the less scientific. But each grouping of more or less scientific respondents will have a somewhat different membership from the others. Respondents who seem scientific in terms of one item will not seem scientific in terms of another. Nevertheless, to the extent that each item measures the same general dimension, we should find some correspondence among the several groupings. Respondents who appear scientific in terms of one item should be more likely to appear scientific in their response to another item than would those who appeared nonscientific in their response to the first. In other words, we should find an association or correlation between the responses given to two items.

Figure 6-2 shows the associations among the responses to the three items. Three bivariate tables are presented, showing the distribution of responses for each possible pairing of items. An examination of the three bivariate relationships presented in the figure supports the suggestion that the three items all measure the same variable: scientific orientation.
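The kind of percentage-table comparison Figure 6-2 presents can be sketched as follows. This is a hypothetical reconstruction: the cell counts are invented so that the two group percentages match the ones reported in the text.

```python
# Hypothetical sketch of a bivariate check like the first table in Figure 6-2:
# among faculty choosing each teaching role, what percent name "basic
# mechanisms" as their ultimate interest? Counts are invented, not the
# study's actual data.

# Each record: (teaching_contribution, ultimate_interest), 1 = "scientific"
records = [(1, 1)] * 87 + [(1, 0)] * 13 + [(0, 1)] * 51 + [(0, 0)] * 49

def pct_basic_mechanisms(records, role):
    """Percent choosing "basic mechanisms" among those choosing `role`."""
    group = [interest for r, interest in records if r == role]
    return 100 * sum(group) / len(group)

researchers = pct_basic_mechanisms(records, 1)  # 87 percent
physicians = pct_basic_mechanisms(records, 0)   # 51 percent

# The strength of the relationship, as a percentage point difference:
print(researchers - physicians)  # 36.0
```

A sizable percentage-point difference like this is what justifies treating the two items as indicators of the same variable.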
To see why this is so, let's begin by looking at the first bivariate relationship in the table. The table shows that faculty who responded that "researcher" was the role in which they could make their greatest teaching contribution were more likely to identify their ultimate medical interests as "basic mechanisms" (87 percent) than


were those who answered "physician" (51 percent). The fact that the "physicians" are about evenly split in their ultimate medical interests is irrelevant for our purposes. It is only relevant that they are less scientific in their medical interests than the "researchers." The strength of this relationship may be summarized as a 36 percentage point difference.

The same general conclusion applies to the other bivariate relationships. The strength of the relationship between reading preferences and ultimate medical interests may be summarized as a 38 percentage point difference, and the strength of the relationship between reading preferences and the two teaching roles as a 21 percentage point difference. In summary, then, each single item produces a different grouping of "scientific" and "nonscientific" respondents. However, the responses given to each of the items correspond, to a greater or lesser degree, to the responses given to each of the other items.

Initially, the three items were selected on the basis of face validity: each appeared to give some indication of faculty members' orientations to science. By examining the bivariate relationship between the pairs of items, we have found support for the expectation that they all measure basically the same thing. However, that support does not sufficiently justify including the items in a composite index. Before combining them in a single index, we need to examine the multivariate relationships among the several variables.

FIGURE 6-2 Bivariate Relationships among Scientific Orientation Items. If several indicators are measures of the same variable, then they should be empirically correlated with one another. [The figure presents three bivariate percentage tables, one for each pairing of the three items.]

Multivariate Relationships among Items

Figure 6-3 categorizes the sample respondents into four groups according to (1) their greatest teaching contribution and (2) their reading preferences. The numbers in parentheses indicate the number of respondents in each group. Thus, 66 of the faculty members who said they could best teach as physicians also said they preferred articles dealing with the effectiveness of treatments. For each of the four groups, the figure presents the percentage of those who say they are ultimately more interested in basic mechanisms. So, for example, of the 66 faculty mentioned, 27 percent are primarily interested in basic mechanisms. The arrangement of the four groups is based on a previously drawn conclusion regarding scientific orientations. The group in the upper left corner of the table is presumably the least scientifically oriented, based on greatest teaching contribution and reading preference. The group in the lower right corner is presumably the most scientifically oriented in terms of those items.
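The subgroup comparison Figure 6-3 reports can be sketched as follows. The cell counts here are invented (only the n of 66 appears in the text); they are chosen so that the cell percentages approximate the ones the figure reports.

```python
# Hypothetical sketch of the trivariate check in Figure 6-3: does the
# teaching-role / ultimate-interest relationship persist within each
# reading-preference group? Cell counts are invented to approximate the
# percentages in the text (27, 58, 58, and 89 percent).

# counts[(teaching, reading)] = (number interested in basic mechanisms, total)
counts = {
    ("physician", "effectiveness"): (18, 66),    # 27 percent
    ("researcher", "effectiveness"): (14, 24),   # 58 percent
    ("physician", "rationale"): (58, 100),       # 58 percent
    ("researcher", "rationale"): (89, 100),      # 89 percent
}

def pct(teaching, reading):
    """Percent interested in basic mechanisms within one cell."""
    interested, total = counts[(teaching, reading)]
    return 100 * interested / total

# Within each reading-preference group, the original relationship holds:
for reading in ("effectiveness", "rationale"):
    diff = pct("researcher", reading) - pct("physician", reading)
    print(f"{reading}: {diff:.0f} percentage point difference")
```

Because the roughly 31-point difference survives within both reading-preference groups, the bivariate relationship is not an artifact of the third item.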

FIGURE 6-3 Trivariate Relationships among Scientific Orientation Items. Indicators of the same variable should be correlated in a multivariate analysis as well as in bivariate analyses.

FIGURE 6-4 Hypothetical Trivariate Relationship among Scientific Orientation Items. This hypothetical relationship would suggest that not all three indicators would contribute effectively to a composite index.

Recall that expressing a primary interest in basic mechanisms was also taken as an indication of scientific orientation. As we should expect, then, those in the lower right corner are the most likely to give this response (89 percent), and those in the upper left corner are the least likely (27 percent). The respondents who gave mixed responses in terms of teaching contributions and reading preferences have an intermediate rank in their concern for basic mechanisms (58 percent in both cases). This table tells us many things. First, we may note that the original relationships between pairs of items are not significantly affected by the presence of a third item. Recall, for example, that the relationship between teaching contribution and ultimate medical interest was summarized as a 36 percentage point difference. Looking at Figure 6-3, we see that among only those respondents who are most interested in articles dealing with the effectiveness of treatments, the relationship between teaching contribution and ultimate medical interest is 31 percentage points (58 percent minus 27 percent: first row). The same is true among those most interested in articles dealing with the rationale for treatments (89 percent minus 58 percent: second row). The original relationship between teaching contribution and ultimate medical interest is essentially the same as in Figure 6-2, even among those respondents judged as scientific or nonscientific in terms of reading preferences. We can draw the same conclusion from the columns in Figure 6-3. Recall that the original

relationship between reading preferences and ultimate medical interests was summarized as a 38 percentage point difference. Looking only at the "physicians" in Figure 6-3, we see that the relationship between the other two items is now 31 percentage points. The same relationship is found among the "researchers" in the second column. The importance of these observations becomes clearer when we consider what might have happened. In Figure 6-4, hypothetical data tell a much different story than the actual data in Figure 6-3 do. As you can see, Figure 6-4 shows that the original relationship between teaching role and ultimate medical interest persists, even when reading preferences are introduced into the picture. In each row of the table, the "researchers" are more likely to express an interest in basic mechanisms than the "physicians" are. Looking down the columns, however, we note that there is no relationship between reading preferences and ultimate medical interest. If we know whether a respondent feels he or she can best teach as a physician or as a researcher, knowing the respondent's reading preference adds nothing to our evaluation of his or her scientific orientation. If something like Figure 6-4 resulted from the actual data, we would conclude that reading preference should not be included in the same index as teaching role, because it contributed nothing to the composite index.



This example used only three questionnaire items. If more were being considered, then more-complex multivariate tables would be in order, constructed of four, five, or more variables. The purpose of this step in index construction, again, is to discover the simultaneous interaction of the items in order to determine which should be included in the same index. These kinds of data analyses are easily accomplished using programs such as SPSS and MicroCase. They are usually referred to as cross-tabulations.
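As a rough sketch of what such a cross-tabulation involves, the following fragment computes the percentage choosing the "scientific" response within each combination of two other items. The respondent tuples here are invented for illustration; they are not the survey's actual data.

```python
from collections import defaultdict

# Hypothetical respondents: (teaching role, reading preference, ultimate interest).
respondents = [
    ("physician", "effectiveness", "basic mechanisms"),
    ("physician", "effectiveness", "patient management"),
    ("physician", "rationale", "basic mechanisms"),
    ("physician", "rationale", "patient management"),
    ("researcher", "effectiveness", "basic mechanisms"),
    ("researcher", "effectiveness", "patient management"),
    ("researcher", "rationale", "basic mechanisms"),
    ("researcher", "rationale", "basic mechanisms"),
]

def crosstab_percent(data):
    """Percent choosing 'basic mechanisms' within each (role, preference) cell."""
    totals = defaultdict(int)
    scientific = defaultdict(int)
    for role, pref, interest in data:
        totals[(role, pref)] += 1
        if interest == "basic mechanisms":
            scientific[(role, pref)] += 1
    return {cell: 100 * scientific[cell] / totals[cell] for cell in totals}

for cell, pct in sorted(crosstab_percent(respondents).items()):
    print(cell, f"{pct:.0f}%")
```

With real data, each cell percentage corresponds to one cell of a table like Figure 6-3.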

Index Scoring

When you've chosen the best items for your index, you next assign scores for particular responses, thereby creating a single composite measure out of the several items. There are two basic decisions to be made in this step.

First, you must decide the desirable range of the index scores. A primary advantage of an index over a single item is the range of gradations it offers in the measurement of a variable. As noted earlier, political conservatism might be measured from "very conservative" to "not at all conservative" or "very liberal." How far to the extremes, then, should the index extend? In this decision, the question of variance enters once more. Almost always, as the possible extremes of an index are extended, fewer cases are to be found at each end. The researcher who wishes to measure political conservatism to its greatest extreme (somewhere to the right of Attila the Hun, as the saying goes) may find there is almost no one in that category. At some point additional gradations do not add meaning to the results. The first decision, then, concerns the conflicting desire for (1) a range of measurement in the index and (2) an adequate number of cases at each point in the index. You'll be forced to reach some kind of compromise between these conflicting desires.

The second decision concerns the actual assignment of scores for each particular response. Basically, you must decide whether to give items in the index equal weight or different weights. Although there are no firm rules, I suggest, and practice tends to support this method, that items be weighted equally unless there are compelling reasons for differential weighting. That is, the burden of proof should be on differential weighting; equal weighting should be the norm. Of course, this decision must be related to the earlier issue regarding the balance of items chosen. If the index is to represent the composite of slightly different aspects of a given variable, then you should give each aspect the same weight. In some instances, however, you may feel that two items reflect essentially the same aspect, and the third reflects a different aspect. If you want to have both aspects equally represented by the index, you might give the different item a weight equal to the combination of the two similar ones. For instance, you could assign a maximum score of 2 to the different item and a maximum score of 1 to each of the similar ones. Although the rationale for scoring responses should take such concerns as these into account, typically researchers experiment with different scoring methods, examining the relative weights given to different aspects but at the same time worrying about the range and distribution of cases provided. Ultimately, the scoring method chosen will represent a compromise among these several demands. Of course, as in most research activities, such a decision is open to revision on the basis of later examinations. Validation of the index, to be discussed shortly, may lead the researcher to recycle his or her efforts toward constructing a completely different index.

In the example taken from the medical school faculty survey, I decided to weight the items equally, because I'd chosen them, in part, because they represent slightly different aspects of the overall variable scientific orientation. On each of the items, the respondents were given a score of 1 for choosing the "scientific" response to the item and a score of 0 for choosing the "nonscientific" response.
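This equal-weight scoring can be sketched in a few lines. The item names below are hypothetical stand-ins for the three questionnaire items, not the survey's actual variable names.

```python
# Equal-weight index scoring: each item is coded 1 for the "scientific"
# response and 0 for the "nonscientific" one, then the codes are summed.
ITEMS = ("teaching_role", "reading_preference", "ultimate_interest")

def scientific_orientation(responses):
    """Composite score: the number of 'scientific' responses (0 to 3)."""
    return sum(responses[item] for item in ITEMS)

respondent = {"teaching_role": 1, "reading_preference": 0, "ultimate_interest": 1}
print(scientific_orientation(respondent))  # 2
```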
Each respondent, then, could receive a score of 0, 1, 2, or 3. This scoring method provided what I considered a useful range of variation (four index categories) and also provided enough cases for analysis in each category. Here's a similar example of index scoring, from a study of work satisfaction. One of the key


variables was job-related depression, measured by an index composed of the following four items, which asked workers how they felt when thinking about themselves and their jobs:

"I feel downhearted and blue."


"I get tired for no reason."


"I find myself restless and can't keep still."


"I am more irritable than usual."

The researchers, Amy Wharton and James Baron, report, "Each of these items was coded: 4 = often, 3 = sometimes, 2 = rarely, 1 = never." They go on to explain how they measured another variable, job-related self-esteem: Job-related self-esteem was based on four items asking respondents how they saw themselves in their work: happy/sad; successful/not successful; important/not important; doing their best/not doing their best. Each item ranged from 1 to 7, where 1 indicates a self-perception of not being happy, successful, important, or doing one's best. (1987: 578)
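A minimal sketch of the coding Wharton and Baron describe, with the four-point item codes summed into a single depression score (the function name is ours, not theirs):

```python
# Coding scheme from the quoted passage: 4 = often ... 1 = never.
CODES = {"often": 4, "sometimes": 3, "rarely": 2, "never": 1}

def depression_score(answers):
    """Sum the coded items; four items yield scores from 4 to 16."""
    return sum(CODES[answer] for answer in answers)

print(depression_score(["often", "rarely", "never", "sometimes"]))  # 10
```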

As you look through the social research literature, you'll find numerous similar examples of cumulative indexes being used to measure variables. Sometimes the indexing procedures are controversial, as evidenced in "What Is the Best College in the United States?"

Handling Missing Data

Regardless of your data-collection method, you'll frequently face the problem of missing data. In a content analysis of the political orientations of newspapers, for example, you may discover that a particular newspaper has never taken an editorial position on one of the issues being studied. In an experimental design involving several retests of subjects over time, some subjects may be unable to participate in some of the sessions. In virtually every survey, some respondents fail to answer some questions (or choose a "don't know" response). Although missing data present problems at all stages of analysis, they're especially troublesome


in index construction. There are, however, several methods of dealing with these problems. First, if there are relatively few cases with missing data, you may decide to exclude them from the construction of the index and the analysis. (I did this in the medical school faculty example.) The primary concerns in this instance are whether the numbers available for analysis will remain sufficient and whether the exclusion will result in an unrepresentative sample whenever the index, excluding some of the respondents, is used in the analysis. The latter possibility can be examined through a comparison, on other relevant variables, of those who would be included and excluded from the index. Second, you may sometimes have grounds for treating missing data as one of the available responses. For example, if a questionnaire has asked respondents to indicate their participation in various activities by checking "yes" or "no" for each, many respondents may have checked some of the activities "yes" and left the remainder blank. In such a case, you might decide that a failure to answer meant "no," and score missing data in this case as though the respondents had checked the "no" space. Third, a careful analysis of missing data may yield an interpretation of their meaning. In constructing a measure of political conservatism, for example, you may discover that respondents who failed to answer a given question were generally as conservative on other items as those who gave the conservative answer. In another example, a recent study measuring religious beliefs found that people who answered "don't know" about a given belief were almost identical to the "disbelievers" in their answers about other beliefs. (Note: You should take these examples not as empirical guides in your own studies but only as suggestions of general ways to analyze your own data.) Whenever the analysis of missing data yields such interpretations, then, you may decide to score such cases accordingly.

There are many other ways of handling the problem of missing data. If an item has several possible values, you might assign the middle value to cases with missing data; for example, you could assign a 2 if the values are 0, 1, 2, 3, and 4. For a


What Is the Best College in the United States?

Each year the newsmagazine U.S. News and World Report issues a special report ranking the nation's colleges and universities. Their rankings reflect an index, created from several items: educational expenditures per student, graduation rates, selectivity (percentage accepted of those applying), average SAT scores of first-year students, and similar indicators of quality. Typically, Harvard is ranked the number one school in the nation, followed by Yale and Princeton. However, the 1999 "America's Best Colleges" issue shocked educators, prospective college students, and their parents. The California Institute of Technology had leaped from ninth place in 1998 to first place a year later. While Harvard, Yale, and Princeton still did well, they had been supplanted. What had happened at Caltech to produce such a remarkable surge in quality? The answer was to be found at U.S. News and World Report, not at Caltech. The newsmagazine changed the structure of the ranking index in 1999, which made a big difference in how schools fared. Bruce Gottlieb (1999) gives this example of how the altered scoring made a difference.


So, how did Caltech come out on top? Well, one variable in a school's ranking has long been educational expenditures per student, and Caltech has traditionally been tops in this category. But until this year, U.S. News considered only a school's ranking in this category (first, second, etc.) rather than how much it spent relative to other schools. It didn't matter whether Caltech beat Harvard by $1 or by $100,000. Two other schools that rose in their rankings this year were MIT (from fourth to third) and Johns Hopkins (from 14th to seventh). All three have high per-student expenditures, and all three are in the hard sciences. Universities are

continuous variable such as age, you could similarly assign the mean to cases with missing data (more on this in Chapter 14). Or, missing data can be supplied by assigning values at random. All of these are conservative solutions because they weaken the "purity" of your index and reduce the likelihood that it will relate to other variables in ways you may have hypothesized. If you're creating an index out of a large number of items, you can sometimes handle missing data by using proportions based on what is observed. Suppose your index is composed of six indicators, and

allowed to count their research budgets in their per-student expenditures, though students get no direct benefit from costly research their professors are doing outside of class. In its "best colleges" issue two years ago, U.S. News made precisely this point, saying it considered only the rank ordering of per-student expenditures, rather than the actual amounts, on the grounds that expenditures at institutions with large research programs and medical schools are substantially higher than those at the rest of the schools in the category. In other words, just two years ago, the magazine felt it unfair to give Caltech, MIT, and Johns Hopkins credit for having lots of fancy laboratories that don't actually improve undergraduate education. Gottlieb reviewed each of the changes in the index and then asked how 1998's ninth-ranked Caltech would have done had the revised indexing formula been in place a year earlier. His conclusion: Caltech would have been first in 1998 as well. In other words, the apparent improvement was solely a function of how the index was scored. Composite measures such as scales and indexes are valuable tools for understanding society. However, it's important that we know how those measures are constructed and what that construction implies. So, what's really the best college in the United States? It depends on how you define "best." There is no "really best," only the various social constructions we can create.

Sources: U.S. News and World Report, "America's Best Colleges," August 30, 1999; Bruce Gottlieb, "Cooking the School Books: How U.S. News Cheats in Picking Its 'Best American Colleges,'" Slate (http://slate.msn.com/default.aspx?id=34027).

you only have four observations for a particular subject. If the subject has earned 4 points out of a possible 4, you might assign an index score of 6; if the subject has 2 points (half the possible score on four items), you could assign a score of 3 (half the possible score on six observations). The choice of a particular method to be used depends so much on the research situation that I can't reasonably suggest a single "best" method or rank the several I've described. Excluding all cases with missing data can bias the representativeness of the findings, but including such cases by assigning

scores to missing data can influence the nature of the findings. The safest and best method is to construct the index using more than one of these methods and see whether you reach the same conclusions using each of the indexes. Understanding your data is the final goal of analysis anyway.
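The proportion-based strategy described above, scaling an observed subtotal up to the full number of index items, can be sketched as follows; a minimal illustration, not a prescription.

```python
# Prorate an index score when only some of the items were observed:
# scale the observed subtotal up to the full number of index items.
def prorated_score(points, items_observed, items_total):
    """Return the score the case would earn if all items had been observed."""
    return points / items_observed * items_total

print(prorated_score(4, 4, 6))  # 4 of 4 points on four items -> 6.0 on six
print(prorated_score(2, 4, 6))  # half the possible points -> 3.0
```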

Index Validation

Up to this point, we've discussed all the steps in the selection and scoring of items that result in an index purporting to measure some variable. If each of the preceding steps is carried out carefully, the likelihood of the index actually measuring the variable is enhanced. To demonstrate success, however, we must show that the index is valid. Following the basic logic of validation, we assume that the index provides a measure of some variable; that is, the scores on the index arrange cases in a rank order in terms of that variable. An index of political conservatism rank-orders people in terms of their relative conservatism. If the index does that successfully, then people scored as relatively conservative on the index should appear relatively conservative in all other indications of political orientation, such as their responses to other questionnaire items. There are several methods of validating an index.

Item Analysis

The first step in index validation is an internal validation called item analysis. In item analysis, you examine the extent to which the index is related to (or predicts responses to) the individual items it comprises. Here's an illustration of this step. In the index of scientific orientations among medical school faculty, index scores ranged from 0 (most interested in patient care) to 3 (most interested in research). Now let's consider one of the items in the index: whether respondents wanted to advance their own knowledge more with regard to total patient management or more in the area of basic mechanisms. The latter were treated as being more scientifically oriented than the former. The following empty table shows how we would examine the relationship between the index and the individual item.

Index of Scientific Orientations

                                             0      1      2      3
Percentage who said they were more
interested in basic mechanisms              __     __     __     __
If you take a minute to reflect on the table, you may see that we already know the numbers that go in two of the cells. To get a score of 3 on the index, respondents had to say "basic mechanisms" in response to this question and give the "scientific" answers to the other two items as well. Thus, 100 percent of the 3's on the index said "basic mechanisms." By the same token, all the 0's had to answer this item with "total patient management." Thus, 0 percent of those respondents said "basic mechanisms." Here's how the table looks with the information we already know.

Index of Scientific Orientations

                                             0      1      2      3
Percentage who said they were more
interested in basic mechanisms              0%     __     __    100%
If the individual item is a good reflection of the overall index, we should expect the 1's and 2's to fill in a progression between 0 percent and 100 percent. More of the 2's should choose "basic mechanisms" than the 1's. This result is not guaranteed by the way the index was constructed, however; it is an empirical question, one we answer in an item analysis. Here's how this particular item analysis turned out.

Index of Scientific Orientations

                                             0      1      2      3
Percentage who said they were more
interested in basic mechanisms              0%    16%    91%    100%
item analysis An assessment of whether each of the items included in a composite measure makes an independent contribution or merely duplicates the contribution of other items in the measure.



As you can see, in accord with our assumption that the 2's are more scientifically oriented than the 1's, we find that a higher percentage of the 2's (91 percent) say "basic mechanisms" than the 1's (16 percent). An item analysis of the other two components of the index yields similar results, as shown here.

Index of Scientific Orientations

                                             0      1      2      3
Percentage who said they could teach
best as medical researchers                 __     __     __     __
Percentage who said they preferred
reading about rationales                    __     __     __     __
Each of the items, then, seems an appropriate component in the index. Each seems to reflect the same quality that the index as a whole measures. In a complex index containing many items, this step provides a convenient test of the independent contribution of each item to the index. If a given item is found to be poorly related to the index, it may be assumed that other items in the index cancel out the contribution of that item, and it should be excluded from the index. If the item in question contributes nothing to the index's power, it should be excluded. Although item analysis is an important first test of an index's validity, it is not a sufficient test. If the index adequately measures a given variable, it should successfully predict other indications of that variable. To test this, we must turn to items not included in the index.
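An item analysis of this kind is easy to compute directly. The sketch below uses hypothetical index scores and item codes (1 = the "scientific" response), not the survey's actual data, to produce a percentage-by-score table like those above:

```python
# Item analysis: percent giving the "scientific" item response (coded 1)
# at each index score. The paired lists below are hypothetical data.
index_scores = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
item_codes   = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]  # 1 = "basic mechanisms"

def item_analysis(scores, codes):
    """Map each index score to the percent of cases coded 1 on the item."""
    result = {}
    for score in sorted(set(scores)):
        group = [c for s, c in zip(scores, codes) if s == score]
        result[score] = 100 * sum(group) / len(group)
    return result

print(item_analysis(index_scores, item_codes))
```

A well-behaved item shows a progression from 0 percent at the lowest score to 100 percent at the highest, as in the tables above.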

External Validation

People scored as politically conservative on an index should appear conservative by other measures as well, such as their responses to other items in a

external validation The process of testing the validity of a measure, such as an index or scale, by examining its relationship to other, presumed indicators of the same variable. If the index really measures prejudice, for example, it should correlate with other indicators of prejudice.


TABLE 6-1 Validation of Scientific Orientation Index

                                             Index of Scientific Orientation
                                             Low 0      1      2   High 3
Percent interested in attending
scientific lectures at the medical school      __      __     __     __
Percent who say faculty members should
have experience as medical researchers         __      __     __     __
Percent who would prefer faculty duties
involving research activities only             __      __     __     __
Percent who engaged in research during
the preceding academic year                    __      __     __     __
questionnaire. Of course, we're talking about relative conservatism, because we can't define conservatism in any absolute way. However, those respondents scored as the most conservative on the index should score as the most conservative in answering other questions. Those scored as the least conservative on the index should score as the least conservative on other items. Indeed, the ranking of groups of respondents on the index should predict the ranking of those groups in answering other questions dealing with political orientations. In our example of the scientific orientation index, several questions in the questionnaire offered the possibility of such external validation. Table 6-1 presents some of these items, which provide several lessons regarding index validation. First, we note that the index strongly predicts the responses to the validating items in the sense that the rank order of scientific responses among the four groups is the same as the rank order provided by the index itself. That is, the percentages reflect greater scientific orientation as you read across the rows of the table. At the same time, each item gives a different description of scientific orientations overall. For example, the last validating item indicates that the great majority of all faculty were engaged in


research during the preceding year. If this were the only indicator of scientific orientation, we would conclude that nearly all faculty were scientific. Nevertheless, those scored as more scientific on the index are more likely to have engaged in research than were those who were scored as relatively less scientific. The third validating item provides a different descriptive picture: Only a minority of the faculty overall say they would prefer duties limited exclusively to research. Nevertheless, the relative percentages giving this answer correspond to the scores assigned on the index.
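The essential check here, that the percentage giving the "scientific" response on a validating item rises with the index score, can be sketched as follows (the percentages are hypothetical):

```python
# External validation check: the percent giving the "scientific" response
# on a validating item should increase across index scores 0..3.
def rank_order_matches(validator_pcts):
    """True if the percentages are nondecreasing across index scores."""
    return all(a <= b for a, b in zip(validator_pcts, validator_pcts[1:]))

print(rank_order_matches([34, 42, 46, 65]))  # True: ranking matches the index
print(rank_order_matches([34, 46, 42, 65]))  # False: ranking is broken
```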

Bad Index versus Bad Validators

Nearly every index constructor must at some time face the apparent failure of external items to validate the index. If the internal item analysis shows inconsistent relationships between the items included in the index and the index itself, something is wrong with the index. But if the index fails to predict strongly the external validation items, the conclusion to be drawn is more ambiguous. In this situation we must choose between two possibilities: (1) the index does not adequately measure the variable in question, or (2) the validation items do not adequately measure the variable and thereby do not provide a sufficient test of the index.

Having worked long and conscientiously on the construction of an index, you'll likely find the second conclusion compelling. Typically, you'll feel you have included the best indicators of the variable in the index; the validating items are, therefore, second-rate indicators. Nevertheless, you should recognize that the index is purportedly a very powerful measure of the variable; thus, it should be somewhat related to any item that taps the variable even poorly. When external validation fails, you should reexamine the index before deciding that the validating items are insufficient. One way to do this is to examine the relationships between the validating items and the individual items included in the index. If you discover that some of the index items relate to the validators and others do not, you'll have improved your understanding of the index as it was initially constituted.

There's no cookbook solution to this problem; it is an agony serious researchers must learn to survive. Ultimately, the wisdom of your decision to accept an index will be determined by the usefulness of that index in your later analyses. Perhaps you'll initially decide that the index is a good one and that the validators are defective, but you'll later find that the variable in question (as measured by the index) is not related to other variables in the ways you expected. You may then have to compose a new index.

The Status of Women: An Illustration of Index Construction

For the most part, our discussion of index construction has focused on the specific context of survey research, but other types of research also lend themselves to this kind of composite measure. For example, when the United Nations (1995) set out to examine the status of women in the world, it chose to create two indexes, reflecting two different dimensions. The Gender-related Development Index (GDI) compared women to men in terms of three indicators: life expectancy, education, and income. These indicators are commonly used in monitoring the status of women in the world. The Scandinavian countries of Norway, Sweden, Finland, and Denmark ranked highest on this measure.

The second index, the Gender Empowerment Measure (GEM), aimed more at power issues and comprised three different indicators:

The proportion of parliamentary seats held by women

The proportion of administrative, managerial, professional, and technical positions held by women

A measure of access to jobs and wages

Once again, the Scandinavian countries ranked high but were joined by Canada, New Zealand, the Netherlands, the United States, and Austria. Having two different measures of gender equality rather than one allowed the researchers to make more sophisticated distinctions. For example, in several



countries, most notably Greece, France, and Japan, women fared relatively well on the GDI but quite poorly on the GEM. Thus, while women were doing fairly well in terms of income, education, and life expectancy, they were still denied access to power. And whereas the GDI scores were higher in the wealthier nations than in the poorer ones, GEM scores showed that women's empowerment depended less on national wealth, with many poor, developing countries outpacing some rich, industrial ones in regard to such empowerment. By examining several different dimensions of the variables involved in their study, the UN researchers also uncovered an aspect of women's earnings that generally goes unnoticed. Population Communications International (1996: 1) has summarized the finding nicely:

Every year, women make an invisible contribution of eleven trillion U.S. dollars to the global economy, the UNDP [United Nations Development Programme] report says, counting both unpaid work and the underpayment of women's work at prevailing market prices. This "underevaluation" of women's work not only undermines their purchasing power, says the 1995 HDR [Human Development Report], but also reduces their already low social status and affects their ability to own property and use credit. Mahbub ul Haq, the principal author of the report, says that "if women's work were accurately reflected in national statistics, it would shatter the myth that men are the main breadwinner of the world." The UNDP report finds that women work longer hours than men in almost every country, including both paid and unpaid duties. In developing countries, women do approximately 53% of all work and spend two-thirds of their work time on unremunerated activities. In industrialized countries, women do an average of 51% of the total work, and, like their counterparts in the developing world, perform about two-thirds of their total labor without pay.
Men in industrialized countries are compensated for two-thirds of their work. As you can see, indexes can be constructed from many different kinds of data for a variety of

purposes. Now we'll turn our attention from the construction of indexes to an examination of scaling techniques.

Scale Construction

Good indexes provide an ordinal ranking of cases on a given variable. All indexes are based on this kind of assumption: A senator who voted for seven conservative bills is considered to be more conservative than one who only voted for four of them. What an index may fail to take into account, however, is that not all indicators of a variable are equally important or equally strong. The first senator might have voted in favor of seven mildly conservative bills, whereas the second senator might have voted in favor of four extremely conservative bills. (The second senator might have considered the other seven bills too liberal and voted against them.) Scales offer more assurance of ordinality by tapping the intensity structures among the indicators. The several items going into a composite measure may have different intensities in terms of the variable. Many methods of scaling are available. We'll look at four scaling procedures to illustrate the variety of techniques available, along with a technique called the semantic differential. Although these examples focus on questionnaires, the logic of scaling, like that of indexing, applies to other research methods as well.

Bogardus Social Distance Scale

Let's suppose you're interested in the extent to which U.S. citizens are willing to associate with, say, sex offenders. You might ask the following questions:

1. Are you willing to permit sex offenders to live in your country?
2. Are you willing to permit sex offenders to live in your community?
3. Are you willing to permit sex offenders to live in your neighborhood?

4. Would you be willing to let a sex offender live next door to you?
5. Would you let your child marry a sex offender?

These questions increase in terms of the closeness of contact with sex offenders. Beginning with the original concern to measure willingness to associate with sex offenders, you have thus developed several questions indicating differing degrees of intensity on this variable. The kinds of items presented constitute a Bogardus social distance scale (created by Emory Bogardus). This scale is a measurement technique for determining the willingness of people to participate in social relations, of varying degrees of closeness, with other kinds of people. The clear differences of intensity suggest a structure among the items. Presumably, if a person is willing to accept a given kind of association, he or she would be willing to accept all those preceding it in the list, those with lesser intensities. For example, the person who is willing to permit sex offenders to live in the neighborhood will surely accept them in the community and the nation but may or may not be willing to accept them as next-door neighbors or relatives. This, then, is the logical structure of intensity inherent among the items. Empirically, one would expect to find the largest number of people accepting co-citizenship and the fewest accepting intermarriage. In this sense, we speak of "easy items" (for example, residence in the United States) and "hard items" (for example, intermarriage). More people agree to the easy items than to the hard ones. With some inevitable exceptions, logic demands that once a person has refused a relationship presented in the scale, he or she will also refuse all the harder ones that follow it. The Bogardus social distance scale illustrates the important economy of scaling as a data-reduction device. By knowing how many relationships with sex offenders a given respondent will accept, we know which relationships were accepted.
Thus, a single number can accurately summarize five or six data items without a loss of information.
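To make this data-reduction logic concrete, here is a minimal sketch; the item labels and the sample respondent are invented, and it assumes the responses follow the scalar structure described above:

```python
# Illustrative sketch of the Bogardus scale's data-reduction logic.
# Item labels and the sample respondent are invented; items run from
# the "easiest" relationship to the "hardest."
ITEMS = ["country", "community", "neighborhood", "next door", "marry into family"]

def scale_score(responses):
    """Summarize a scalar response pattern as a single number:
    the count of accepted relationships."""
    return sum(responses)

def reconstruct(score, n_items=len(ITEMS)):
    """Recover the full response pattern from the score alone,
    assuming the responses follow the scalar structure."""
    return [True] * score + [False] * (n_items - score)

answers = [True, True, True, False, False]  # accepts through "neighborhood"
score = scale_score(answers)                # a single summary number: 3
assert reconstruct(score) == answers        # no information lost
```

The single number works as a summary only to the extent that respondents actually conform to the scalar pattern; departures from it are taken up in the Guttman scaling discussion later in the chapter.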


Motoko Lee, Stephen Sapp, and Melvin Ray (1996) noticed an implicit element in the Bogardus social distance scale: It looks at social distance from the point of view of the majority group in a society. These researchers decided to turn the tables and create a "reverse social distance" scale: looking at social distance from the perspective of the minority group. Here's how they framed their questions (1996: 19):

Considering typical Caucasian Americans you have known, not any specific person nor the worst or the best, circle Y or N to express your opinion.

Y N 5. Do they mind your being a citizen in this country?
Y N 4. Do they mind your living in the same neighborhood?
Y N 3. Would they mind your living next to them?
Y N 2. Would they mind your becoming a close friend to them?
Y N 1. Would they mind your becoming their kin by marriage?

As with the original scale, the researchers found that knowing the number of items minority respondents agreed with also told the researchers which ones were agreed with: 98.9 percent of the time in this case.

Thurstone Scales

Often, the inherent structure of the Bogardus social distance scale is not appropriate to the variable being measured. Indeed, such a logical structure among several indicators is seldom apparent.

Bogardus social distance scale A measurement technique for determining the willingness of people to participate in social relations-of varying degrees of closeness-with other kinds of people. It is an especially efficient technique in that one can summarize several discrete answers without losing any of the original details of the data.


Chapter 6: Indexes, Scales, and Typologies

A Thurstone scale (created by Louis Thurstone) is an attempt to develop a format for generating groups of indicators of a variable that have at least an empirical structure among them. A group of judges is given perhaps a hundred items that are thought to be indicators of a given variable. Each judge is then asked to estimate how strong an indicator of the variable each item is, by assigning scores of perhaps 1 to 13. If the variable were prejudice, for example, the judges would be asked to assign the score of 1 to the very weakest indicators of prejudice, the score of 13 to the strongest indicators, and intermediate scores to those felt to be somewhere in between. Once the judges have completed this task, the researcher examines the scores assigned to each item by all the judges, then determines which items produced the greatest agreement among the judges. Those items on which the judges disagreed broadly would be rejected as ambiguous. Among those items producing general agreement in scoring, one or more would be selected to represent each scale score from 1 to 13. The items selected in this manner might then be included in a survey questionnaire. Respondents who appeared prejudiced on those items representing a strength of 5 would then be expected to appear prejudiced on those having lesser strengths, and if some of those respondents did not appear prejudiced on the items with a strength of 6, it would be expected that they would also not appear prejudiced on those with greater strengths. If the Thurstone scale items were adequately developed and scored, the economy and effectiveness of data reduction inherent in the Bogardus social distance scale would appear. A single score might be assigned to each respondent (the strength of the hardest item accepted), and that score would adequately represent the responses to several questionnaire items. And as is true of the Bogardus scale, a respondent who scored 6 might

Thurstone scale A type of composite measure, constructed in accord with the weights assigned by "judges" to various indicators of some variable.

Scale Construction

be regarded as more prejudiced than one who scored 5 or less. Thurstone scaling is not often used in research today, primarily because of the tremendous expenditure of energy and time required to have 10 to 15 judges score the items. Because the quality of their judgments would depend on their experience with the variable under consideration, they might need to be professional researchers. Moreover, the meanings conveyed by the several items indicating a given variable tend to change over time. Thus, an item having a given weight at one time might have quite a different weight later on. For a Thurstone scale to be effective, it would have to be periodically updated.
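The judging procedure described above can be sketched as follows. The ratings, the agreement cutoff, and the use of the standard deviation to measure "broad disagreement" are all invented for illustration; the original technique does not prescribe any particular statistic:

```python
import statistics

# Hypothetical judge ratings (1-13) for three candidate items, showing
# how ambiguous items might be screened out and scale values assigned.
ratings = {
    "item A": [2, 2, 3, 2, 3, 2],        # judges agree: weak indicator
    "item B": [12, 13, 12, 13, 12, 13],  # judges agree: strong indicator
    "item C": [1, 7, 13, 4, 10, 6],      # judges disagree: ambiguous
}

MAX_SPREAD = 2.0  # arbitrary cutoff for this sketch

selected = {}
for item, scores in ratings.items():
    # Keep only items on which the judges broadly agree...
    if statistics.stdev(scores) <= MAX_SPREAD:
        # ...and take the judges' median rating as the item's scale value.
        selected[item] = statistics.median(scores)

# item C is rejected as ambiguous; A and B keep median scale values 2 and 12.5
```

A real application would involve many more items and would then pick one or more retained items to represent each scale value from 1 to 13.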

Likert Scaling

You may sometimes hear people refer to a questionnaire item containing response categories such as "strongly agree," "agree," "disagree," and "strongly disagree" as a Likert scale. This is technically a misnomer, although Rensis Likert (pronounced "LICK-ert") did create this commonly used question format. The particular value of this format is the unambiguous ordinality of response categories. If respondents were permitted to volunteer or select such answers as "sort of agree," "pretty much agree," "really agree," and so forth, the researcher would find it impossible to judge the relative strength of agreement intended by the various respondents. The Likert format solves this problem. Likert had something more in mind, however. He created a method by which this question format could be used to determine the relative intensity of different items. As a simple example, suppose we wish to measure prejudice against women. To do this, we create a set of 20 statements, each of which reflects that prejudice. One of the items might be "Women can't drive as well as men." Another might be "Women shouldn't be allowed to vote." Likert's scaling technique would demonstrate the difference in intensity between these items as well as pegging the intensity of the other 18 statements. Let's suppose we ask a sample of people to agree or disagree with each of the 20 statements.

Simply giving one point for each of the indicators of prejudice against women would yield the possibility of index scores ranging from 0 to 20. A Likert scale goes one step beyond that and calculates the average index score for those agreeing with each of the individual statements. Let's say that all those who agreed that women are poorer drivers than men had an average index score of 1.5 (out of a possible 20). Those who agreed that women should be denied the right to vote might have an average index score of, say, 19.5, indicating the greater degree of prejudice reflected in that response. As a result of this item analysis, respondents could be rescored to form a scale: 1.5 points for agreeing that women are poorer drivers, 19.5 points for saying women shouldn't vote, and corresponding points for other responses, reflecting how those items related to the initial, simple index. If those who disagreed with the statement "I might vote for a woman for president" had an average index score of 15, then the scale would give 15 points to people disagreeing with that statement. In practice, Likert scaling is seldom used today. I don't know why; maybe it seems too complex. The item format devised by Likert, however, is one of the most commonly used formats in contemporary questionnaire design. Typically, it is now used in the creation of simple indexes. With, say, five response categories, scores of 0 to 4 or 1 to 5 might be assigned, taking the direction of the items into account (for example, assign a score of 5 to "strongly agree" for positive items and to "strongly disagree" for negative items). Each respondent would then be assigned an overall score representing the summation of the scores he or she received for responses to the individual items.
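Here is a minimal sketch of the item analysis just described, using a tiny invented data set of three items and three respondents rather than the chapter's 20 statements:

```python
# A minimal sketch of Likert's item analysis, with invented data.
# Each respondent is a list of 0/1 agreements with items ordered from
# mildest to most extreme.
respondents = [
    [1, 0, 0],   # agrees only with the mildest item
    [1, 1, 0],
    [1, 1, 1],   # agrees with everything, including the extreme item
]

# Step 1: the simple index score is the number of agreements.
index_scores = [sum(r) for r in respondents]

# Step 2: for each item, average the index scores of those who agreed
# with it; this average becomes the item's scale weight.
n_items = len(respondents[0])
weights = []
for j in range(n_items):
    agreed = [index_scores[i] for i, r in enumerate(respondents) if r[j] == 1]
    weights.append(sum(agreed) / len(agreed))

# Mild items attract low-scoring agreers; extreme items attract high scorers.
# weights -> [2.0, 2.5, 3.0]
```

Rescoring each respondent with these weights, instead of one point per agreement, is what turns the simple index into a Likert scale.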

Semantic Differential

Like the Likert format, the semantic differential asks respondents to a questionnaire to choose between two opposite positions by using qualifiers to bridge the distance between the two opposites. Here's how it works.


Suppose you're evaluating the effectiveness of a new music-appreciation lecture on subjects' appreciation of music. As a part of your study, you want to play some musical selections and have the subjects report their feelings about them. A good way to tap those feelings would be to use a semantic differential format. To begin, you must determine the dimensions along which subjects should judge each selection. Then you need to find two opposite terms, representing the polar extremes along each dimension. Let's suppose one dimension that interests you is simply whether subjects enjoyed the piece or not. Two opposite terms in this case could be "enjoyable" and "unenjoyable." Similarly, you might want to know whether they regarded the individual selections as "complex" or "simple," "harmonic" or "discordant," and so forth. Once you have determined the relevant dimensions and have found terms to represent the extremes of each, you might prepare a rating sheet each subject would complete for each piece of music. Figure 6-5 shows what it might look like. On each line of the rating sheet, the subject would indicate how he or she felt about the piece of music: whether it was enjoyable or unenjoyable, for example, and whether it was "somewhat" that way or "very much" so. To avoid creating a biased

Likert scale A type of composite measure developed by Rensis Likert in an attempt to improve the levels of measurement in social research through the use of standardized response categories in survey questionnaires to determine the relative intensity of different items. Likert items are those using such response categories as strongly agree, agree, disagree, and strongly disagree. Such items may be used in the construction of true Likert scales as well as other types of composite measures.

semantic differential A questionnaire format in which the respondent is asked to rate something in terms of two opposite adjectives (e.g., rate textbooks as "boring" or "exciting"), using qualifiers such as "very," "somewhat," "neither," "somewhat," and "very" to bridge the distance between the two opposites.





Enjoyable      O    O    O    O    O    Unenjoyable
Simple         O    O    O    O    O    Complex
Discordant     O    O    O    O    O    Harmonic
Traditional    O    O    O    O    O    Modern
FIGURE 6-5 Semantic Differential: Feelings about Musical Selections. The semantic differential asks respondents to describe something or


someone in terms of opposing adjectives.

pattern of responses to such items, it's a good idea to vary the placement of terms that are likely to be related to each other. Notice, for example, that "discordant" and "traditional" are on the left side of the sheet, with "harmonic" and "modern" on the right. Most likely, those selections scored as "discordant" would also be scored as "modern" as opposed to "traditional." Both the Likert and semantic differential formats have a greater rigor and structure than other question formats do. As I indicated earlier, these formats produce data suitable to both indexing and scaling.

Guttman Scaling

Researchers today often use the scale developed by Louis Guttman. Like Bogardus, Thurstone, and Likert scaling, Guttman scaling is based on the fact that some items under consideration may prove to be more-extreme indicators of the variable than others. Here's an example to illustrate this pattern. In the earlier example of measuring scientific orientation among medical school faculty members, you'll recall that a simple index was constructed. As it happens, however, the three items included in the index essentially form a Guttman scale. The construction of a Guttman scale begins with some of the same steps that initiate index

Guttman scale A type of composite measure used to summarize several discrete observations and to represent some more-general variable.

construction. You begin by examining the face validity of items available for analysis. Then, you examine the bivariate and perhaps multivariate relations among those items. In scale construction, however, you also look for relatively "hard" and "easy" indicators of the variable being examined. Earlier, when we talked about attitudes regarding a woman's right to have an abortion, we discussed several conditions that can affect people's opinions: whether the woman is married, whether her health is endangered, and so forth. These differing conditions provide an excellent illustration of Guttman scaling. Here are the percentages of the people in the 2000 GSS sample who supported a woman's right to an abortion, under three different conditions:

Woman's health is seriously endangered
Pregnant as a result of rape: 81%
Woman is not married
The different percentages supporting abortion under the three conditions suggest something about the different levels of support that each item indicates. For example, if someone supports abortion when the mother's life is seriously endangered, that's not a very strong indicator of general support for abortion, because almost everyone agreed with that. Supporting abortion for unmarried women seems a much stronger indicator of support for abortion in general-fewer than half the sample took that position. Guttman scaling is based on the idea that anyone who gives a strong indicator of some variable will also give the weaker indicators. In this case, we

TABLE 6-2 Scaling Support for Choice of Abortion

                 Woman's    Result     Woman        Number
                 Health     of Rape    Unmarried    of Cases
Scale Types        +          +          +             677
                   +          +          -             607
                   +          -          -             165
                   -          -          -             147
                                             Total = 1,596
Mixed Types        -          +          -              42
                   +          -          +               5
                   -          -          +               2
                   -          +          +               4
                                             Total =    53

+ = favors woman's right to choose; - = opposes woman's right to choose

would assume that anyone who supported abortion for unmarried women would also support it in the case of rape or of the woman's health being threatened. Table 6-2 tests this assumption by presenting the number of respondents who gave each of the possible response patterns. The first four response patterns in the table compose what we would call the scale types: those patterns that form a scalar structure. Following those respondents who supported abortion under all three conditions (line 1), we see (line 2) that those with only two pro-choice responses have chosen the two easier ones; those with only one such response (line 3) chose the easiest of the three (the woman's health being endangered). And finally, there are some respondents who opposed abortion in all three circumstances (line 4). The second part of the table presents those response patterns that violate the scalar structure of the items. The most radical departures from the scalar structure are the last two response patterns: those who accepted only the hardest item and those who rejected only the easiest one. The final column in the table indicates the number of survey respondents who gave each of the response patterns. The great majority (1,596, or


97 percent) fit into one of the scale types. The presence of mixed types, however, indicates that the items do not form a perfect Guttman scale. (It would be extremely rare for such data to form a Guttman scale perfectly.) Recall at this point that one of the chief functions of scaling is efficient data reduction. Scales provide a technique for presenting data in a summary form while maintaining as much of the original information as possible. When the scientific orientation items were formed into an index in our earlier discussion, respondents were given one point for each scientific response they gave. If these same three items were scored as a Guttman scale, some respondents would be assigned scale scores that would permit the most accurate reproduction of their original responses to all three items. In the present example of attitudes regarding abortion, respondents fitting into the scale types would receive the same scores as would be assigned in the construction of an index. Persons selecting all three pro-choice responses (+ + +) would still be scored 3, those who selected pro-choice responses to the two easier items and were opposed on the hardest item (+ + -) would be scored 2, and so on. For each of the four scale types we could predict accurately all the actual responses given by all the respondents based on their scores. The mixed types in the table present a problem, however. The first mixed type (- + -) was scored 1 on the index to indicate only one pro-choice response. But if 1 were assigned as a scale score, we would predict that the 42 respondents in this group had chosen only the easiest item (approving abortion when the woman's life was endangered), and we would be making two errors for each such respondent: thinking their response pattern was (+ - -) instead of (- + -). Scale scores are assigned, therefore, with the aim of minimizing the errors that would be made in reconstructing the original responses.
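The error-minimizing logic can be sketched as follows, using the chapter's first mixed type, (- + -); the helper names are invented for illustration:

```python
def ideal_pattern(score, n_items=3):
    # The response pattern a perfect scalar respondent with this score
    # would give (items ordered easiest -> hardest).
    return [1] * score + [0] * (n_items - score)

def errors(responses, score):
    # Errors made if we reconstruct this respondent from the score alone.
    return sum(a != b
               for a, b in zip(ideal_pattern(score, len(responses)), responses))

mixed = [0, 1, 0]          # the chapter's first mixed type, (- + -)
per_score = {s: errors(mixed, s) for s in range(4)}
# per_score -> {0: 1, 1: 2, 2: 1, 3: 2}
```

Scores 0 and 2 both reconstruct this pattern with a single error, the minimum possible for a mixed type; assigning the index score of 1 would produce two errors, which is why it is rejected. As the chapter notes, different scoring conventions break such ties differently.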
Table 6-3 illustrates the index and scale scores that would be assigned to each of the response patterns in our example. Note that one error is made for each respondent in the mixed types. This is the





TABLE 6-3 Index and Scale Scores

                 Response    Number      Index    Scale    Scale
                 Pattern     of Cases    Score    Score    Errors
Scale Types       + + +        677         3        3        0
                  + + -        607         2        2        0
                  + - -        165         1        1        0
                  - - -        147         0        0        0
Mixed Types       - + -         42         1        2        42
                  + - +          5         2        3         5
                  - - +          2         1        0         2
                  - + +          4         2        3         4

                                      Total scale errors =  53

Coefficient of reproducibility = 1 - (number of errors / number of guesses)
                               = 1 - 53/4,947
                               = 0.989 = 98.9%
(number of guesses = 1,649 respondents x 3 items = 4,947)

This table presents one common method for scoring mixed types, but you should be advised that other methods are also used.

minimum we can hope for in a mixed-type pattern. In the first mixed type, for example, we would erroneously predict a pro-choice response to the easiest item for each of the 42 respondents in this group, making a total of 42 errors. The extent to which a set of empirical responses forms a Guttman scale is determined by the accuracy with which the original responses can be reconstructed from the scale scores. For each of the 1,649 respondents in this example, we'll predict three questionnaire responses, for a total of 4,947 predictions. Table 6-3 indicates that we'll make 53 errors using the scale scores assigned. The percentage of correct predictions is called the coefficient of reproducibility: the percentage of original responses that could be reproduced by knowing the scale scores used to summarize them. In the present example, the coefficient of reproducibility is 4,894/4,947, or 98.9 percent. Except for the case of perfect (100 percent) reproducibility, there is no way of saying that a set of items does or does not form a Guttman scale

in any absolute sense. Virtually all sets of such items approximate a scale. As a general guideline, however, coefficients of 90 or 95 percent are the commonly used standards. If the observed reproducibility exceeds the level you've set, you'll probably decide to score and use the items as a scale. The decision concerning criteria in this regard is, of course, arbitrary. Moreover, a high degree of reproducibility does not ensure that the scale constructed in fact measures the concept under consideration. What it does is increase confidence that all the component items measure the same thing. Also, you should realize that a high coefficient of reproducibility is most likely when few items are involved. One concluding remark with regard to Guttman scaling: It's based on the structure observed among the actual data under examination. This is an important point that is often misunderstood. It does not make sense to say that a set of questionnaire items (perhaps developed and used by a previous researcher) constitutes a Guttman scale. Rather, we can say only that they form a scale within a given body of data being analyzed. Scalability, then, is a sample-dependent, empirical matter. Although a set of items may form a Guttman scale among one sample of survey respondents, for example, there is no guarantee that this set will form such a scale among another sample. In this sense, then, a set of questionnaire items in and of itself never forms a scale, but a set of empirical observations may. This concludes our discussion of indexing and scaling. Like indexes, scales are composite measures of a variable, typically broadening the meaning of the variable beyond what might be captured by a single indicator. Both scales and indexes seek to measure variables at the ordinal level of measurement. Unlike indexes, however, scales take advantage of any intensity structure that may be present among the individual indicators.
To the extent that such an intensity structure is found and the data from the people or other units of analysis comply with the logic of that intensity structure, we can have confidence that we have created an ordinal measure.
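As a closing illustration, the coefficient of reproducibility can be computed directly from the response-pattern counts used in the chapter's abortion-attitudes example:

```python
# The coefficient of reproducibility, computed from the figures in the
# chapter's abortion-attitudes example (items ordered easiest -> hardest:
# health endangered, rape, unmarried).
pattern_counts = {           # response pattern -> number of respondents
    "+++": 677, "++-": 607, "+--": 165, "---": 147,  # scale types
    "-+-": 42, "+-+": 5, "--+": 2, "-++": 4,         # mixed types
}
errors_per_case = {          # errors made reconstructing each pattern
    "+++": 0, "++-": 0, "+--": 0, "---": 0,
    "-+-": 1, "+-+": 1, "--+": 1, "-++": 1,
}

n_respondents = sum(pattern_counts.values())          # 1,649
n_items = 3
total_errors = sum(pattern_counts[p] * errors_per_case[p]
                   for p in pattern_counts)           # 53
guesses = n_respondents * n_items                     # 4,947 predictions

reproducibility = 1 - total_errors / guesses
# reproducibility -> about 0.989, i.e., 98.9 percent
```

With a commonly used standard of 90 or 95 percent, this 98.9 percent result would lead most researchers to score and use the three items as a scale.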

Typologies

We conclude this chapter with a short discussion of typology construction and analysis. Recall that indexes and scales are constructed to provide ordinal measures of given variables. We attempt to assign index or scale scores to cases in such a way as to indicate a rising degree of prejudice, religiosity, conservatism, and so forth. In such cases, we're dealing with single dimensions. Often, however, the researcher wishes to summarize the intersection of two or more variables, thereby creating a set of categories or types, a nominal variable, called a typology. You may, for example, wish to examine the political orientations of newspapers separately in terms of domestic issues and foreign policy. The fourfold presentation in Table 6-4 describes such a typology. Newspapers in cell A of the table are conservative on both foreign policy and domestic policy; those in cell D are liberal on both. Those in cells B and C are conservative on one and liberal on the other. Frequently, you arrive at a typology in the course of an attempt to construct an index or scale. The items that you felt represented a single variable appear to represent two. We might have been attempting to construct a single index of political orientations for newspapers but discovered, empirically, that foreign and domestic politics had to be kept separate. In any event, you should be warned against a difficulty inherent in typological analysis. Whenever the typology is used as the independent variable, there will probably be no problem. In the preceding example, you might compute the percentages of newspapers in each cell that normally endorse Democratic candidates; you could then easily examine the effects of both foreign and domestic policies on political endorsements. It's extremely difficult, however, to analyze a typology as a dependent variable. If you want to discover why newspapers fall into the different cells of the typology, you're in trouble.
That becomes apparent when we consider the ways you might construct and read your tables. Assume, for example, that you want to examine the effects of community



TABLE 6-4 A Political Typology of Newspapers

                        Foreign Policy
Domestic Policy    Conservative    Liberal
Conservative            A             B
Liberal                 C             D

size on political policies. With a single dimension, you could easily determine the percentages of rural and urban newspapers that were scored conservative and liberal on your index or scale. With a typology, however, you would have to present the distribution of the urban newspapers in your sample among types A, B, C, and D. Then you would repeat the procedure for the rural ones in the sample and compare the two distributions. Let's suppose that 80 percent of the rural newspapers are scored as type A (conservative on both dimensions), compared with 30 percent of the urban ones. Moreover, suppose that only 5 percent of the rural newspapers are scored as type B (conservative only on domestic issues), compared with 40 percent of the urban ones. It would be incorrect to conclude from an examination of type B that urban newspapers are more conservative on domestic issues than rural ones are, because 85 percent of the rural newspapers, compared with 70 percent of the urban ones, have this characteristic. The relative sparsity of rural newspapers in type B is due to their concentration in type A. It should be apparent that an interpretation of such data would be very difficult for anything other than description. In reality, you'd probably examine two such dimensions separately, especially if the dependent variable has more categories of responses than the given example does. Don't think that typologies should always be avoided in social research; often they provide the

typology The classification (typically nominal) of observations in terms of their attributes on two or more variables. The classification of newspapers as liberal-urban, liberal-rural, conservative-urban, or conservative-rural would be an example.


most appropriate device for understanding the data. To examine the pro-life orientation in depth, for example, you might create a typology involving both abortion and capital punishment. Libertarianism could be seen in terms of both economic and social permissiveness. You've now been warned, however, against the special difficulties involved in using typologies as dependent variables.

MAIN POINTS

Introduction
• Single indicators of variables seldom capture all the dimensions of a concept, have sufficiently clear validity to warrant their use, or permit the desired range of variation to allow ordinal rankings. Composite measures, such as scales and indexes, solve these problems by including several indicators of a variable in one summary measure.

Indexes versus Scales
• Although both indexes and scales are intended as ordinal measures of variables, scales typically satisfy this intention better than indexes do.
• Whereas indexes are based on the simple cumulation of indicators of a variable, scales take advantage of any logical or empirical intensity structures that exist among a variable's indicators.

Index Construction
• The principal steps in constructing an index include selecting possible items, examining their empirical relationships, scoring the index, and validating it.
• Criteria of item selection include face validity, unidimensionality, the degree of specificity with which a dimension is to be measured, and the amount of variance provided by the items.
• If different items are indeed indicators of the same variable, then they should be related empirically to one another. In constructing an index, the researcher needs to examine bivariate and multivariate relationships among the items.
• Index scoring involves deciding the desirable range of scores and determining whether items will have equal or different weights.
• There are various techniques that allow items to be used in an index in spite of missing data.
• Item analysis is a type of internal validation, based on the relationship between individual items in the composite measure and the measure itself. External validation refers to the relationships between the composite measure and other indicators of the variable-indicators not included in the measure.

Scale Construction
• Four types of scaling techniques are represented by the Bogardus social distance scale, a device for measuring the varying degrees to which a person would be willing to associate with a given class of people; Thurstone scaling, a technique that uses judges to determine the intensities of different indicators; Likert scaling, a measurement technique based on the use of standardized response categories; and Guttman scaling, a method of discovering and using the empirical intensity structure among several indicators of a given variable. Guttman scaling is probably the most popular scaling technique in social research today.
• The semantic differential is a question format that asks respondents to make ratings that lie between two extremes, such as "very positive" and "very negative."

Typologies
• A typology is a nominal composite measure often used in social research. Typologies may be used effectively as independent variables, but interpretation is difficult when they are used as dependent variables.

KEY TERMS The following terms are defined in context in the chapter and at the bottom of the page where the term is introduced, as well as in the comprehensive glossary at the back of the book.

Bogardus social distance scale
external validation
Guttman scale
index
item analysis
Likert scale
scale
semantic differential
Thurstone scale
typology

REVIEW QUESTIONS AND EXERCISES

1. In your own words, describe the difference between an index and a scale.
2. Suppose you wanted to create an index for rating the quality of colleges and universities. Name three data items that might be included in such an index.
3. Make up three questionnaire items that measure attitudes toward nuclear power and that would probably form a Guttman scale.
4. Construct a typology of pro-life attitudes as discussed in the chapter.
5. Economists often use indexes to measure economic variables, such as the cost of living. Go to the Bureau of Labor Statistics (http://www.bls.gov) and find the Consumer Price Index survey. What are some of the dimensions of living costs included in this measure?

ADDITIONAL READINGS

Anderson, Andy B., Alexander Basilevsky, and Derek P. J. Hum. 1983. "Measurement: Theory and Techniques." Pp. 231-87 in Handbook of Survey Research, edited by Peter H. Rossi, James D. Wright, and Andy B. Anderson. New York: Academic Press. The logic of measurement is analyzed in the context of composite measures.

Bobo, Lawrence, and Frederick C. Licari. 1989. "Education and Political Tolerance: Testing the Effects of Cognitive Sophistication and Target Group Affect." Public Opinion Quarterly 53: 285-308. The authors use a variety of techniques for determining how best to measure tolerance toward different groups in society.

Indrayan, A., M. J. Wysocki, A. Chawla, R. Kumar, and N. Singh. 1999. "Three-Decade Trend in Human Development Index in India and Its Major States." Social Indicators Research 46 (1): 91-120. The authors use several human development indexes to compare the status of different states in India.

Lazarsfeld, Paul, Ann Pasanella, and Morris Rosenberg, eds. 1972. Continuities in the Language of Social Research. New York: Free Press. See especially Section 1. An excellent collection of conceptual discussions and concrete illustrations. The construction of composite measures is presented within the more general area of conceptualization and measurement.

McIver, John P., and Edward G. Carmines. 1981. Unidimensional Scaling. Newbury Park, CA: Sage. Here's an excellent way to pursue Thurstone, Likert, and Guttman scaling in further depth.

Miller, Delbert. 1991. Handbook of Research Design and Social Measurement. Newbury Park, CA: Sage. An excellent compilation of frequently used and semistandardized scales. The many illustrations reported in Part 4 of the Miller book may be directly adaptable to studies or at least suggestive of modified measures. Studying the several illustrations, moreover, may also give you a better understanding of the logic of composite measures in general.

SPSS EXERCISES See the booklet that accompanies your text for exercises using SPSS (Statistical Package for the Social Sciences). There are exercises offered for each chapter, and you'll also find a detailed primer on using SPSS.

Online Study Resources

SociologyNow: Research Methods

1. Before you do your final review of the chapter, take the SociologyNow: Research Methods diagnostic quiz to help identify the areas on which you should concentrate. You'll find information on this online tool, as well as instructions on how to access all of its great resources, in the front of the book.
2. As you review, take advantage of the SociologyNow: Research Methods customized study plan, based on your quiz results. Use this study plan with its interactive exercises and other resources to master the material.

3. When you're finished with your review, take the posttest to confirm that you're ready to move on to the next chapter.

WEBSITE FOR THE PRACTICE OF SOCIAL RESEARCH 11TH EDITION Go to your book's website at http://sociology.wadsworth.com/babbie_practice11e for tools to aid you in studying for your exams. You'll find Tutorial Quizzes with feedback, Internet Exercises, Flashcards, and Chapter Tutorials, as well as Extended Projects, InfoTrac College Edition search terms, Social Research in Cyberspace, GSS Data, Web Links, and primers for using various data-analysis software such as SPSS and NVivo.

WEB LINKS FOR THIS CHAPTER Please realize that the Internet is an evolving entity, subject to change. Nevertheless, these few websites should be fairly stable. Also, check your book's website for even more Web Links. These websites, current at the time of this book's

publication, provide opportunities to learn about indexes, scales, and typologies.

Bureau of Labor Statistics, Measurement Issues in the Consumer Price Index http://www.bls.gov/cpi/cpigm697.htm The federal government's Consumer Price Index (CPI) is one of those composite measures that affects many people's lives-determining cost-of-living increases, for example. This site discusses some aspects of the measure.

The Logic of Sampling

Arizona State University, Reliability and Validity http://seamonkey.ed.asu.edu/~alex/teaching/assessment/reliability.html Here you'll find an extensive discussion of these two aspects of measurement quality.

Thomas O'Connor, "Scales and Indexes" This web page has an excellent discussion of scales and indexes in general, provides illustrative examples, and also gives hot links useful for pursuing the topic.

Introduction
A Brief History of Sampling
    President Alf Landon
    President Thomas E. Dewey
Two Types of Sampling Methods
Nonprobability Sampling
    Reliance on Available Subjects
    Purposive or Judgmental Sampling
    Snowball Sampling
    Quota Sampling
    Selecting Informants
The Theory and Logic of Probability Sampling
    Conscious and Unconscious Sampling Bias
    Representativeness and Probability of Selection
    Random Selection
    Probability Theory, Sampling Distributions, and Estimates of Sampling Error

Populations and Sampling Frames
    Review of Populations and Sampling Frames
Types of Sampling Designs
    Simple Random Sampling
    Systematic Sampling
    Stratified Sampling
    Implicit Stratification in Systematic Sampling
    Illustration: Sampling University Students
Multistage Cluster Sampling
    Multistage Designs and Sampling Error
    Stratification in Multistage Cluster Sampling
    Probability Proportionate to Size (PPS) Sampling
    Disproportionate Sampling and Weighting
Probability Sampling in Review

SociologyNow™: Research Methods Use this online tool to help you make the grade on your next exam. After reading this chapter, go to the "Online Study Resources" at the end of the chapter for instructions on how to benefit from SociologyNow: Research Methods.


Chapter 7: The Logic of Sampling




Introduction

One of the most visible uses of survey sampling lies in the political polling that is subsequently tested by election results. Whereas some people doubt the accuracy of sample surveys, others complain that political polls take all the suspense out of campaigns by foretelling the result. In recent presidential elections, however, the polls have not removed the suspense. Going into the 2004 presidential elections, pollsters generally agreed that the election was "too close to call," a repeat of their experience four years earlier. The Roper Center has compiled a list of polls conducted throughout the campaign; Table 7-1 reports those conducted during the few days preceding the election. Despite some variations, the overall picture they present is amazingly consistent and was played out in the election results.

Now, how many interviews do you suppose it took each of these pollsters to come within a couple of percentage points in estimating the behavior of more than 115 million voters? Often fewer than 2,000! In this chapter, we're going to find out how social researchers can pull off such wizardry. For another powerful illustration of the potency of sampling, look at this graphic portrayal of President George W. Bush's approval ratings prior to and following the September 11, 2001, terrorist attack on the U.S. (see Figure 7-1). The data reported by several different polling agencies describe the same pattern.

Political polling, like other forms of social research, rests on observations. But neither pollsters nor other social researchers can observe everything that might be relevant to their interests. A critical part of social research, then, is deciding what to observe and what not. If you want to study voters, for example, which voters should you study? The process of selecting observations is called sampling.
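How can 2,000 interviews pin down 115 million voters? The key fact, developed later in this chapter under sampling distributions, is that the sampling error of an estimated percentage depends on the sample size, not the population size. Here is a quick sketch using the standard formula for the margin of error of a proportion (the code and numbers are my illustration, not the text's):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for an estimated proportion p
    from a probability sample of size n (worst case is p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

# About 2,000 interviews yield roughly a +/- 2-point margin, whether the
# electorate numbers 115 million or 115 thousand.
print(round(100 * margin_of_error(2000), 1))  # 2.2
```

Quadrupling the sample only halves the margin, which is one reason pollsters rarely interview more than a few thousand people.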
Although sampling can mean any procedure for selecting units of observation-for example, interviewing every tenth passerby on a busy street-the key to generalizing from a sample to a


TABLE 7-1 Election Eve Polls Reporting Percent of Population Voting for U.S. Presidential Candidates, 2004

[Table not legible in this copy. Its rows listed individual polls (including Democracy Corps and GWU Battleground 2004), the dates they began (October 28-29), and the actual vote.]

Source: Poll data adapted from the Roper Center, Election 2004 (http://www). Accessed November 16, 2004. I've apportioned the undecided and other votes according to the percentages saying they were voting for Bush or Kerry.

larger population is probability sampling, which involves the important idea of random selection. Much of this chapter is devoted to the logic and skills of probability sampling. This topic is more rigorous and precise than some of the other topics in this book. Whereas social research as a whole is both art and science, sampling leans toward science. Although this subject is somewhat technical, the basic logic of sampling is not difficult to understand. In fact, the logical neatness of this topic can make it easier to comprehend than, say, conceptualization. Although probability sampling is central to social research today, we'll take some time to examine a variety of nonprobability methods as

[Figure 7-1 plotted approval ratings before and after the September 11th attack, as reported by nine polls: ABC/Post, Harris, Ipsos-Reid, Bloomberg, Fox, CNN/Time, Gallup, Zogby, and Newsweek.]

FIGURE 7-1 Bush Approval: Raw Poll Data. This graph demonstrates how independent polls produce the same picture of reality. This also shows the impact of a national crisis on the president's popularity: in this case, the September 11 terrorist attack and President George W. Bush's popularity. Source: Copyright © 2001, 2002 by ( reserved.

well. These methods have their own logic and can provide useful samples for social inquiry. Before we discuss the two major types of sampling, I'll introduce you to some basic ideas by way of a brief history of sampling. As you'll see, the pollsters who correctly predicted the election cliffhanger of 2000 did so in part because researchers had learned to avoid some pitfalls that earlier pollsters had fallen into.

A Brief History of Sampling

Sampling in social research has developed hand in hand with political polling. This is the case, no doubt, because political polling is one of the few

opportunities social researchers have to discover the accuracy of their estimates. On election day, they find out how well or how poorly they did.

President Alf Landon

President Alf Landon? Who's he? Did you sleep through an entire presidency in your U.S. history class? No-but Alf Landon would have been president if a famous poll conducted by the Literary Digest had proved to be accurate. The Literary Digest was a popular newsmagazine published between 1890 and 1938. In 1920, Digest editors mailed postcards to people in six states, asking them whom they were planning to vote for in the presidential campaign between Warren Harding and James


Chapter 7: The Logic of Sampling

Cox. Names were selected for the poll from telephone directories and automobile registration lists. Based on the postcards sent back, the Digest correctly predicted that Harding would be elected. In the elections that followed, the Literary Digest expanded the size of its poll and made correct predictions in 1924, 1928, and 1932. In 1936, the Digest conducted its most ambitious poll: Ten million ballots were sent to people listed in telephone directories and on lists of automobile owners. Over two million people responded, giving the Republican contender, Alf Landon, a stunning 57 to 43 percent landslide over the incumbent, President Franklin Roosevelt. The editors modestly cautioned: We make no claim to infallibility. We did not coin the phrase "uncanny accuracy" which has been so freely applied to our Polls. We know only too well the limitations of every straw vote, however enormous the sample gathered, however scientific the method. It would be a miracle if every State of the forty-eight behaved on Election Day exactly as forecast by the Poll. (Literary Digest 1936a: 6)

Two weeks later, the Digest editors knew the limitations of straw polls even better: The voters gave Roosevelt a second term in office by the largest landslide in history, with 61 percent of the vote. Landon won only 8 electoral votes to Roosevelt's 523. The editors were puzzled by their unfortunate turn of luck. A part of the problem surely lay in the 22 percent return rate garnered by the poll. The editors asked: Why did only one in five voters in Chicago to whom the Digest sent ballots take the trouble to reply? And why was there a preponderance of Republicans in the one-fifth that did reply? . . . We were getting better cooperation in what we have always regarded as a public service from Republicans than we were getting from Democrats. Do Republicans live nearer to mailboxes? Do Democrats generally disapprove of straw polls? (Literary Digest 1936b: 7)

Nonprobability Sampling

Actually, there was a better explanation-what is technically called the sampling frame used by the Digest. In this case the sampling frame consisted of telephone subscribers and automobile owners. In the context of 1936, this design selected a disproportionately wealthy sample of the voting population, especially coming on the tail end of the worst economic depression in the nation's history. The sample effectively excluded poor people, and the poor voted predominantly for Roosevelt's New Deal recovery program. The Digest's poll may or may not have correctly represented the voting intentions of telephone subscribers and automobile owners. Unfortunately for the editors, it decidedly did not represent the voting intentions of the population as a whole.

President Thomas E. Dewey

The 1936 election also saw the emergence of a young pollster whose name would become synonymous with public opinion. In contrast to the Literary Digest, George Gallup correctly predicted that Roosevelt would beat Landon. Gallup's success in 1936 hinged on his use of something called quota sampling, which we'll look at more closely later in the chapter. For now, it's enough to know that quota sampling is based on a knowledge of the characteristics of the population being sampled: what proportion are men, what proportion are women, what proportions are of various incomes, ages, and so on. Quota sampling selects people to match a set of these characteristics: the right number of poor, white, rural men; the right number of rich, African American, urban women; and so on. The quotas are based on those variables most relevant to the study. In the case of Gallup's poll, the sample selection was based on levels of income; the selection procedure ensured the right proportion of respondents at each income level. Gallup and his American Institute of Public Opinion used quota sampling to good effect in 1936, 1940, and 1944-correctly picking the presidential winner each of those years. Then, in 1948, Gallup and most political pollsters suffered the embarrassment of picking Governor Thomas Dewey of New York over the incumbent, President Harry

Truman. The pollsters' embarrassing miscue continued right up to election night. A famous photograph shows a jubilant Truman-whose followers' battle cry was "Give 'em hell, Harry!"-holding aloft a newspaper with the banner headline "Dewey Defeats Truman." Several factors accounted for the pollsters' failure in 1948. First, most pollsters stopped polling in early October despite a steady trend toward Truman during the campaign. In addition, many voters were undecided throughout the campaign, and these went disproportionately for Truman when they stepped into the voting booth. More important, Gallup's failure rested on the unrepresentativeness of his samples. Quota sampling-which had been effective in earlier years-was Gallup's undoing in 1948. This technique requires that the researcher know something about the total population (of voters in this instance). For national political polls, such information came primarily from census data. By 1948, however, World War II had produced a massive movement from the country to cities, radically changing the character of the U.S. population from what the 1940 census showed, and Gallup relied on 1940 census data. City dwellers, moreover, tended to vote Democratic; hence, the overrepresentation of rural voters in his poll had the effect of underestimating the number of Democratic votes.

Two Types of Sampling Methods

By 1948, some academic researchers had already been experimenting with a form of sampling based on probability theory. This technique involves the selection of a "random sample" from a list containing the names of everyone in the population being sampled. By and large, the probability sampling methods used in 1948 were far more accurate than quota sampling techniques. Today, probability sampling remains the primary method of selecting large, representative samples for social research, including national political polls. At the same time, probability sampling can be impossible or inappropriate in many research situations. Accordingly, before turning to the logic and techniques of probability sampling, we'll


first take a look at techniques for nonprobability sampling and how they're used in social research.

Nonprobability Sampling

Social research is often conducted in situations that do not permit the kinds of probability samples used in large-scale social surveys. Suppose you wanted to study homelessness: There is no list of all homeless individuals, nor are you likely to create such a list. Moreover, as you'll see, there are times when probability sampling wouldn't be appropriate even if it were possible. Many such situations call for nonprobability sampling. In this section, we'll examine four types of nonprobability sampling: reliance on available subjects, purposive or judgmental sampling, snowball sampling, and quota sampling. We'll conclude with a brief discussion of techniques for obtaining information about social groups through the use of informants.

Reliance on Available Subjects

Relying on available subjects, such as stopping people at a street corner or some other location, is an extremely risky sampling method; even so, it's used all too frequently. Clearly, this method does not permit any control over the representativeness of a sample. It's justified only if the researcher wants to study the characteristics of people passing the sampling point at specified times or if less risky sampling methods are not feasible. Even when this method is justified on grounds of feasibility, researchers must exercise great caution in generalizing from their data. Also, they should alert readers to the risks associated with this method. University researchers frequently conduct surveys among the students enrolled in large lecture

nonprobability sampling Any technique in which samples are selected in some way not suggested by probability theory. Examples include reliance on available subjects as well as purposive (judgmental), quota, and snowball sampling.


Chapter 7: The Logic of Sampling

classes. The ease and frugality of such a method explains its popularity, but it seldom produces data of any general value. It may be useful for pretesting a questionnaire, but such a sampling method should not be used for a study purportedly describing students as a whole. Consider this report on the sampling design in an examination of knowledge and opinions about nutrition and cancer among medical students and family physicians: The fourth-year medical students of the University of Minnesota Medical School in Minneapolis comprised the student population in this study. The physician population consisted of all physicians attending a "Family Practice Review and Update" course sponsored by the University of Minnesota Department of Continuing Medical Education. (Cooper-Stephenson and Theologides 1981: 472)

After all is said and done, what will the results of this study represent? They do not provide a meaningful comparison of medical students and family physicians in the United States or even in Minnesota. Who were the physicians who attended the course? We can guess that they were probably more concerned about their continuing education than other physicians were, but we can't say for sure. Although such studies can be the source of useful insights, we must take care not to overgeneralize from them.

Purposive or Judgmental Sampling

Sometimes it's appropriate to select a sample on the basis of knowledge of a population, its elements, and the purpose of the study. This type of sampling

purposive (judgmental) sampling A type of nonprobability sampling in which the units to be observed are selected on the basis of the researcher's judgment about which ones will be the most useful or representative.

snowball sampling A nonprobability sampling method often employed in field research whereby each person interviewed may be asked to suggest additional people for interviewing.


is called purposive or judgmental sampling. In the initial design of a questionnaire, for example, you might wish to select the widest variety of respondents to test the broad applicability of questions. Although the study findings would not represent any meaningful population, the test run might effectively uncover any peculiar defects in your questionnaire. This situation would be considered a pretest, however, rather than a final study. In some instances, you may wish to study a small subset of a larger population in which many members of the subset are easily identified, but the enumeration of them all would be nearly impossible. For example, you might want to study the leadership of a student protest movement; many of the leaders are easily visible, but it would not be feasible to define and sample all the leaders. In studying all or a sample of the most visible leaders, you may collect data sufficient for your purposes. Or let's say you want to compare left-wing and right-wing students. Because you may not be able to enumerate and sample from all such students, you might decide to sample the memberships of left- and right-leaning groups, such as the Green Party and the Young Americans for Freedom. Although such a sample design would not provide a good description of either left-wing or right-wing students as a whole, it might suffice for general comparative purposes. Field researchers are often particularly interested in studying deviant cases-cases that don't fit into fairly regular patterns of attitudes and behaviors-in order to improve their understanding of the more regular pattern. For example, you might gain important insights into the nature of school spirit, as exhibited at a pep rally, by interviewing people who did not appear to be caught up in the emotions of the crowd or by interviewing students who did not attend the rally at all. Selecting deviant cases for study is another example of purposive study.

Snowball Sampling

Another nonprobability sampling technique, which some consider to be a form of accidental sampling, is called snowball sampling. This procedure is appropriate when the members of a special

population are difficult to locate, such as homeless individuals, migrant workers, or undocumented immigrants. In snowball sampling, the researcher collects data on the few members of the target population he or she can locate, then asks those individuals to provide the information needed to locate other members of that population whom they happen to know. "Snowball" refers to the process of accumulation as each located subject suggests other subjects. Because this procedure also results in samples with questionable representativeness, it's used primarily for exploratory purposes. Suppose you wish to learn a community organization's pattern of recruitment over time. You might begin by interviewing fairly recent recruits, asking them who introduced them to the group. You might then interview the people named, asking them who introduced them to the group. You might then interview those people named, asking, in part, who introduced them. Or, in studying a loosely structured political group, you might ask one of the participants who he or she believes to be the most influential members of the group. You might interview those people and, in the course of the interviews, ask who they believe to be the most influential. In each of these examples, your sample would "snowball" as each of your interviewees suggested other people to interview.
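The wave-by-wave accumulation described above can be sketched in a few lines of Python. The referral network and the names in it are invented for illustration; in a real study, the referrals would come from the interviews themselves:

```python
# Hypothetical referral network: each person names the other members of the
# hard-to-reach population whom they happen to know.
referrals = {
    "Ana":  ["Ben", "Caro"],
    "Ben":  ["Dee"],
    "Caro": ["Ana", "Eli"],
    "Dee":  [],
    "Eli":  ["Ben"],
}

def snowball_sample(seeds, waves):
    """Accumulate a sample wave by wave, following each subject's referrals."""
    sample = list(seeds)
    current = list(seeds)
    for _ in range(waves):
        nxt = []
        for person in current:
            for contact in referrals.get(person, []):
                if contact not in sample:  # only add newly located members
                    sample.append(contact)
                    nxt.append(contact)
        current = nxt  # the next wave interviews only the new contacts
    return sample

print(snowball_sample(["Ana"], waves=2))  # ['Ana', 'Ben', 'Caro', 'Dee', 'Eli']
```

Notice that the final sample is shaped entirely by who the seed subjects happen to know, which is why the text treats snowball samples as exploratory rather than representative.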

Quota Sampling

Quota sampling is the method that helped George Gallup avoid disaster in 1936-and set up the disaster of 1948. Like probability sampling, quota sampling addresses the issue of representativeness, although the two methods approach the issue quite differently. Quota sampling begins with a matrix, or table, describing the characteristics of the target population. Depending on your research purposes, you may need to know what proportion of the population is male and what proportion female as well as what proportions of each gender fall into various age categories, educational levels, ethnic groups, and so forth. In establishing a national quota sample, you might need to know what


proportion of the national population is urban, eastern, male, under 25, white, working class, and the like, and all the possible combinations of these attributes. Once you've created such a matrix and assigned a relative proportion to each cell in the matrix, you proceed to collect data from people having all the characteristics of a given cell. You then assign to all the people in a given cell a weight appropriate to their portion of the total population. When all the sample elements are so weighted, the overall data should provide a reasonable representation of the total population. Although quota sampling resembles probability sampling, it has several inherent problems. First, the quota frame (the proportions that different cells represent) must be accurate, and it's often difficult to get up-to-date information for this purpose. The Gallup failure to predict Truman as the presidential victor in 1948 was due partly to this problem. Second, the selection of sample elements within a given cell may be biased even though its proportion of the population is accurately estimated. Instructed to interview five people who meet a given, complex set of characteristics, an interviewer may still avoid people living at the top of seven-story walkups, having particularly run-down homes, or owning vicious dogs. In recent years, attempts have been made to combine probability- and quota-sampling methods, but the effectiveness of this effort remains to be seen. At present, you would be advised to treat quota sampling warily if your purpose is statistical description. At the same time, the logic of quota sampling can sometimes be applied usefully to a field research project. In the study of a formal group, for example, you might wish to interview both leaders and nonleaders. In studying a student political organization, you might want to interview radical,

quota sampling A type of nonprobability sampling in which units are selected into a sample on the basis of prespecified characteristics, so that the total sample will have the same distribution of characteristics assumed to exist in the population being studied.


Chapter 7: The Logic of Sampling

moderate, and conservative members of that group. You may be able to achieve sufficient representativeness in such cases by using quota sampling to ensure that you interview both men and women, both younger and older people, and so forth.
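The cell-weighting step described earlier (assign everyone in a given cell a weight appropriate to that cell's portion of the total population) can be sketched as follows. The quota matrix here is hypothetical:

```python
# Hypothetical quota matrix: each cell's share of the target population,
# plus the number of respondents actually interviewed in that cell.
cells = {
    ("male", "urban"):   {"pop_share": 0.30, "respondents": 40},
    ("male", "rural"):   {"pop_share": 0.20, "respondents": 40},
    ("female", "urban"): {"pop_share": 0.35, "respondents": 40},
    ("female", "rural"): {"pop_share": 0.15, "respondents": 40},
}

def cell_weights(cells):
    """Weight each cell so its weighted share of the sample matches its
    share of the population (the weighting step described in the text)."""
    total = sum(c["respondents"] for c in cells.values())
    weights = {}
    for key, c in cells.items():
        sample_share = c["respondents"] / total
        weights[key] = c["pop_share"] / sample_share
    return weights

w = cell_weights(cells)
print(round(w[("male", "urban")], 2))    # 1.2: urban men under-interviewed
print(round(w[("female", "rural")], 2))  # 0.6: rural women over-interviewed
```

A weight above 1 inflates an underrepresented cell; a weight below 1 shrinks an overrepresented one. The sketch assumes the pop_share figures are accurate, which is exactly where Gallup's 1948 quota frame, built on 1940 census data, failed.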

Selecting Informants

When field research involves the researcher's attempt to understand some social setting-a juvenile gang or local neighborhood, for example-much of that understanding will come from a collaboration with some members of the group being studied. Whereas social researchers speak of respondents as people who provide information about themselves, allowing the researcher to construct a composite picture of the group those respondents represent, an informant is a member of the group who can talk directly about the group per se. Especially important to anthropologists, informants are important to other social researchers as well. If you wanted to learn about informal social networks in a local public housing project, for example, you would do well to locate individuals who could understand what you were looking for and help you find it. When Jeffrey Johnson (1990) set out to study a salmon-fishing community in North Carolina, he used several criteria to evaluate potential informants. Did their positions allow them to interact regularly with other members of the camp, for example, or were they isolated? (He found that the carpenter had a wider range of interactions than the boat captain did.) Was their information about the camp pretty much limited to their specific jobs, or did it cover many aspects of the operation? These and other criteria helped determine how useful the potential informants might be. Usually, you'll want to select informants somewhat typical of the groups you're studying. Otherwise, their observations and opinions may be

informant Someone who is well versed in the social phenomenon that you wish to study and who is willing to tell you what he or she knows about it. Not to be confused with a respondent.


misleading. Interviewing only physicians will not give you a well-rounded view of how a community medical clinic is working, for example. Along the same lines, an anthropologist who interviews only men in a society where women are sheltered from outsiders will get a biased view. Similarly, although informants fluent in English are convenient for English-speaking researchers from the United States, they do not typify the members of many societies and even many subgroups within English-speaking countries. Simply because they're the ones willing to work with outside investigators, informants will almost always be somewhat "marginal" or atypical within their group. Sometimes this is obvious. Other times, however, you'll learn about their marginality only in the course of your research. In Jeffrey Johnson's study, the county agent identified one fisherman who seemed squarely in the mainstream of the community. Moreover, he was cooperative and helpful to Johnson's research. The more Johnson worked with the fisherman, however, the more he found the man to be a marginal member of the fishing community. First, he was a Yankee in a southern town. Second, he had a pension from the Navy [so he was not seen as a "serious fisherman" by others in the community]. . . . Third, he was a major Republican activist in a mostly Democratic village. Finally, he kept his boat in an isolated anchorage, far from the community harbor. (1990: 56)

Informants' marginality may not only bias the view you get, but their marginal status may also limit their access (and hence yours) to the different sectors of the community you wish to study. These comments should give you some sense of the concerns involved in nonprobability sampling, typically used in qualitative research projects. I conclude with the following injunction: Your overall goal is to collect the richest possible data. Rich data mean, ideally, a wide and diverse range of information collected over a relatively prolonged period of time. Again, ideally, you achieve this through direct, face-to-face

contact with, and prolonged immersion in, some social location or circumstance.




(Lofland and Lofland 1995: 16)



In other words, nonprobability sampling does have its uses, particularly in qualitative research projects. But researchers must take care to acknowledge the limitations of nonprobability sampling, especially regarding accurate and precise representations of populations. This point will become clearer as we discuss the logic and techniques of probability sampling. As you can see, choosing and using informants can be a tricky business. To see some practical implications of doing so, you can visit the website of Canada's Community Adaptation and Sustainable Livelihoods (CASL) Program:! CASLGuide/KeyInformEx.htm.

The Theory and Logic of Probability Sampling

However appropriate to some research purposes, nonprobability sampling methods cannot guarantee that the sample we observed is representative of the whole population. When researchers want precise, statistical descriptions of large populations-for example, the percentage of the population who are unemployed, plan to vote for Candidate X, or feel a rape victim should have the right to an abortion-they turn to probability sampling. All large-scale surveys use probability-sampling methods. Although the application of probability sampling involves some sophisticated use of statistics, the basic logic of probability sampling is not difficult to understand. If all members of a population were identical in all respects-all demographic characteristics, attitudes, experiences, behaviors, and so on-there would be no need for careful sampling procedures. In this extreme case of perfect homogeneity, in fact, any single case would suffice as a sample to study characteristics of the whole population. In fact, of course, the human beings who compose any real population are quite heterogeneous,

FIGURE 7-2 A Population of 100 Folks. Typically, sampling aims to reflect the characteristics and dynamics of large populations. For the purpose of some simple illustrations, let's assume our total population only has 100 members. [The figure depicts the 100 members in four groups: white women, African American women, white men, and African American men.]

varying in many ways. Figure 7-2 offers a simplified illustration of a heterogeneous population: The 100 members of this small population differ by gender and race. We'll use this hypothetical micropopulation to illustrate various aspects of probability sampling. The fundamental idea behind probability sampling is this: To provide useful descriptions of the total population, a sample of individuals from a population must contain essentially the same variations that exist in the population. This isn't as simple as it might seem, however. Let's take a minute to look at some of the ways researchers might go astray. Then, we'll see how probability sampling provides an efficient method for selecting a sample that should adequately reflect variations that exist in the population.
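To make the fundamental idea concrete, here is a small simulation (my illustration, not the text's) of drawing equal-probability samples from a micropopulation like the one in Figure 7-2:

```python
import random

# Hypothetical micropopulation mirroring Figure 7-2: 100 people,
# 50 women and 50 men, 12 of them African American.
population = (
    [("woman", "African American")] * 6 + [("woman", "white")] * 44 +
    [("man", "African American")] * 6 + [("man", "white")] * 44
)

def sample_stats(n, trials, seed=0):
    """Average share of women across many equal-probability samples."""
    rng = random.Random(seed)  # seeded for a reproducible illustration
    shares = []
    for _ in range(trials):
        sample = rng.sample(population, n)  # every member equally likely
        shares.append(sum(1 for gender, _ in sample if gender == "woman") / n)
    return sum(shares) / trials

# Across repeated random samples, the share of women hovers near the
# population's true 50 percent: the sense in which random selection yields
# samples containing the same variations that exist in the population.
print(sample_stats(n=20, trials=1000))
```

No single sample of 20 will split exactly 10 and 10, but across repeated samples the average tracks the population's 50 percent, with no systematic tilt toward either group.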

probability sampling The general term for samples selected in accord with probability theory, typically involving some random-selection mechanism. Specific types of probability sampling include EPSEM, PPS, simple random sampling, and systematic sampling.




FIGURE 7-3 A Sample of Convenience: Easy, but Not Representative. Simply selecting and observing those people who are most readily at hand is perhaps the simplest method, but it's unlikely to provide a sample that accurately reflects the total population.

Conscious and Unconscious Sampling Bias

At first glance, it may look as though sampling is pretty straightforward. To select a sample of 100 university students, you might simply interview the first 100 students you find walking around campus. This kind of sampling method is often used by untrained researchers, but it runs a high risk of introducing biases into the samples. In connection with sampling, bias simply means that those selected are not typical or representative of the larger populations they have been chosen from. This kind of bias does not have to be intentional. In fact, it is virtually inevitable when you pick people by the seat of your pants. Figure 7-3 illustrates what can happen when researchers simply select people who are convenient for study. Although women are only 50 percent of our micropopulation, those closest to the researcher (in the lower right corner) happen to be 70 percent women; and although the population is 12 percent black, none was selected into the sample.

Beyond the risks inherent in simply studying people who are convenient, other problems can arise. To begin with, the researcher's personal leanings may affect the sample to the point where it does not truly represent the student population. Suppose you're a little intimidated by students who look particularly "cool," feeling they might ridicule your research effort. You might consciously or unconsciously avoid interviewing such people. Or, you might feel that the attitudes of "super-straight-looking" students would be irrelevant to your research purposes and so avoid interviewing them. Even if you sought to interview a "balanced" group of students, you wouldn't know the exact proportions of different types of students making up such a balance, and you wouldn't always be able to identify the different types just by watching them walk by. Even if you made a conscientious effort to interview, say, every tenth student entering the university library, you could not be sure of a representative sample, because different types of students visit the library with different frequencies.

Your sample would overrepresent students who visit the library more often than others do. Similarly, the "public opinion" call-in polls, in which radio stations or newspapers ask people to call specified telephone numbers to register their opinions, cannot be trusted to represent general populations. At the very least, not everyone in the population will even be aware of the poll. This problem also invalidates polls by magazines and newspapers that publish coupons for readers to complete and mail in. Even among those who are aware of such polls, not all will express an opinion, especially if doing so will cost them a stamp, an envelope, or a telephone charge. Similar considerations apply to polls taken over the Internet.

Ironically, the failure of such polls to represent all opinions equally was inadvertently acknowledged by Phillip Perinelli (1986), a staff manager of AT&T Communications' DIAL-IT 900 Service, which offers a call-in poll facility to organizations. Perinelli attempted to counter criticisms by saying, "The 50-cent charge assures that only interested parties respond and helps assure also that no individual 'stuffs' the ballot box." We cannot determine general public opinion while considering "only interested parties." This excludes those who don't care 50 cents' worth, as well as those who recognize that such polls are not valid. Both types of people may have opinions and may even vote on election day. Perinelli's assertion that the 50-cent charge will prevent ballot stuffing actually means that only those who can afford it will engage in ballot stuffing. The possibilities for inadvertent sampling bias are endless and not always obvious. Fortunately, many techniques can help us avoid bias.

Representativeness and Probability of Selection

Although the term representativeness has no precise, scientific meaning, it carries a commonsense meaning that makes it useful here. For our purpose, a sample is representative of the population from which it is selected if the aggregate characteristics of the sample closely approximate those same aggregate characteristics in the population. If, for example, the population contains


50 percent women, then a sample must contain "close to" 50 percent women to be representative. Later, we'll discuss "how close" in detail. Note that samples need not be representative in all respects; representativeness is limited to those characteristics that are relevant to the substantive interests of the study. However, you may not know in advance which characteristics are relevant. A basic principle of probability sampling is that a sample will be representative of the population from which it is selected if all members of the population have an equal chance of being selected in the sample. (We'll see shortly that the size of the sample selected also affects the degree of representativeness.) Samples that have this quality are often labeled EPSEM samples (EPSEM stands for "equal probability of selection method"). Later, we'll discuss variations of this principle, which forms the basis of probability sampling. Moving beyond this basic principle, we must realize that samples-even carefully selected EPSEM samples-seldom if ever perfectly represent the populations from which they are drawn. Nevertheless, probability sampling offers two special advantages. First, probability samples, although never perfectly representative, are typically more representative than other types of samples, because the biases previously discussed are avoided. In practice, a probability sample is more likely than a nonprobability sample to be representative of the population from which it is drawn. Second, and more important, probability theory permits us to estimate the accuracy or

representativeness That quality of a sample of having the same distribution of characteristics as the population from which it was selected. By implication, descriptions and explanations derived from an analysis of the sample may be assumed to represent similar ones in the population. Representativeness is enhanced by probability sampling and provides for generalizability and the use of inferential statistics. EPSEM (equal probability of selection method) A sample design in which each member of a population has the same chance of being selected into the sample.
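The EPSEM idea in the margin note can be checked by simulation. In this sketch (a hypothetical population of 100 and samples of 10; the sizes and seed are assumptions of the illustration), repeated simple random samples include each element in roughly n/N of the samples, which is exactly the "equal probability of selection" the definition describes.

```python
import random

# Estimate each element's inclusion probability under simple random
# sampling. Under EPSEM, every element should be included in roughly
# n/N of the samples -- here, 10/100 = 0.10.
def inclusion_frequencies(population_size, sample_size, trials, seed=0):
    rng = random.Random(seed)
    counts = [0] * population_size
    for _ in range(trials):
        for element in rng.sample(range(population_size), sample_size):
            counts[element] += 1
    return [count / trials for count in counts]

freqs = inclusion_frequencies(population_size=100, sample_size=10, trials=20_000)
print(min(freqs), max(freqs))  # both close to 0.10
```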


Chapter 7: The logic of Sampling

representativeness of the sample. Conceivably, an uninformed researcher might, through wholly haphazard means, select a sample that nearly perfectly represents the larger population. The odds are against doing so, however, and we would be unable to estimate the likelihood that he or she has achieved representativeness. The probability sampler, on the other hand, can provide an accurate estimate of success or failure. We'll shortly see exactly how this estimate can be achieved.

I've said that probability sampling ensures that samples are representative of the population we wish to study. As we'll see in a moment, probability sampling rests on the use of a random-selection procedure. To develop this idea, though, we need to give more-precise meaning to two important terms: element and population.*

An element is that unit about which information is collected and that provides the basis of analysis. Typically, in survey research, elements are people or certain types of people. However, other kinds of units can constitute the elements for social research: Families, social clubs, or corporations might be the elements of a study. In a given study, elements are often the same as units of analysis, though the former are used in sample selection and the latter in data analysis. Up to now we've used the term population to mean the group or collection that we're interested in generalizing about. More formally, a population is the theoretically specified aggregation of

*I would like to acknowledge a debt to Leslie Kish and his excellent textbook Survey Sampling. Although I've modified some of the conventions used by Kish, his presentation is easily the most important source of this discussion.

element That unit of which a population is composed and which is selected in a sample. Distinguished from units of analysis, which are used in data analysis.

population The theoretically specified aggregation of the elements in a study.

study population That aggregation of elements from which a sample is actually selected.

study elements. Whereas the vague term Americans might be the target for a study, the delineation of the population would include the definition of the element Americans (for example, citizenship, residence) and the time referent for the study (Americans as of when?). Translating the abstract "adult New Yorkers" into a workable population would require a specification of the age defining adult and the boundaries of New York. Specifying the term college student would include a consideration of full- and part-time students, degree candidates and nondegree candidates, undergraduate and graduate students, and so forth.

A study population is that aggregation of elements from which the sample is actually selected. As a practical matter, researchers are seldom in a position to guarantee that every element meeting the theoretical definitions laid down actually has a chance of being selected in the sample. Even where lists of elements exist for sampling purposes, the lists are usually somewhat incomplete. Some students are always inadvertently omitted from student rosters. Some telephone subscribers request that their names and numbers be unlisted.

Often, researchers decide to limit their study populations more severely than indicated in the preceding examples. National polling firms may limit their national samples to the 48 adjacent states, omitting Alaska and Hawaii for practical reasons. A researcher wishing to sample psychology professors may limit the study population to those in psychology departments, omitting those in other departments. Whenever the population under examination is altered in such fashions, you must make the revisions clear to your readers.

methods for estimating the degree of probable success. Random selection is the key to this process. In random selection, each element has an equal chance of selection independent of any other event in the selection process. Flipping a coin is the most frequently cited example: Provided that the coin is perfect (that is, not biased in terms of coming up heads or tails), the "selection" of a head or a tail is independent of previous selections of heads or tails. No matter how many heads turn up in a row, the chance that the next flip will produce "heads" is exactly 50-50. Rolling a perfect set of dice is another example.

Such images of random selection, although useful, seldom apply directly to sampling methods in social research. More typically, social researchers use tables of random numbers or computer programs that provide a random selection of sampling units. A sampling unit is that element or set of elements considered for selection in some stage of sampling. In Chapter 9, on survey research, we'll see how computers are used to select random telephone numbers for interviewing, a technique called random-digit dialing.

The reasons for using random selection methods are twofold. First, this procedure serves as a check on conscious or unconscious bias on the part of the researcher. The researcher who selects cases on an intuitive basis might very well select cases that would support his or her research expectations or hypotheses. Random selection erases this danger. More important, random selection offers access to the body of probability theory, which provides the basis for estimating the characteristics of the population as well as estimating the accuracy of samples. Let's now examine probability theory in greater detail.
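The independence of successive selections can be demonstrated by simulation. In the sketch below (the run length and seed are arbitrary choices of the illustration), even immediately after a run of three simulated "heads," the next flip still comes up heads about half the time.

```python
import random

# Simulate a long series of fair coin flips, then check the outcome that
# immediately follows each run of three heads. Independence says the
# streak tells us nothing: heads still follows about half the time.
def next_flip_after_run(trials, run_length=3, seed=7):
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(trials)]
    followers = [
        flips[i + run_length]
        for i in range(trials - run_length)
        if all(flips[i:i + run_length])  # positions preceded by a run of heads
    ]
    return sum(followers) / len(followers)

print(next_flip_after_run(100_000))  # near 0.5 despite the streak of heads
```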

samples and to analyze the results of their sampling statistically. More formally, probability theory provides the basis for estimating the parameters of a population. A parameter is the summary description of a given variable in a population. The mean income of all families in a city is a parameter; so is the age distribution of the city's population. When researchers generalize from a sample, they're using sample observations to estimate population parameters. Probability theory enables them both to make these estimates and to arrive at a judgment of how likely it is that the estimates will accurately represent the actual parameters in the population. For example, probability theory allows pollsters to infer from a sample of 2,000 voters how a population of 100 million voters is likely to vote-and to specify exactly the probable margin of error of the estimates.

Probability theory accomplishes these seemingly magical feats by way of the concept of sampling distributions. A single sample selected from a population will give an estimate of the population parameter. Other samples would give the same or slightly different estimates. Probability theory tells us about the distribution of estimates that would be produced by a large number of such samples. To see how this works, we'll look at two examples of sampling distributions, beginning with a simple example in which our population consists of just ten cases.
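The idea of a sampling distribution is easy to simulate. The sketch below uses a hypothetical population of 100 income values (an assumption of the illustration, not the ten-case example that follows): drawing many samples and computing each sample's mean shows the estimates clustering around the true parameter.

```python
import random
import statistics

# Draw many samples from one population and collect the sample means.
# The spread of those means around the true parameter is the sampling
# distribution that probability theory describes.
def sampling_distribution(population, sample_size, num_samples, seed=1):
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

population = list(range(100))  # hypothetical incomes 0..99; true mean is 49.5
means = sampling_distribution(population, sample_size=10, num_samples=5_000)

print("parameter:", statistics.mean(population))        # 49.5
print("mean of sample means:", statistics.mean(means))  # close to 49.5
print("spread of the estimates:", statistics.stdev(means))
```

Increasing `sample_size` shrinks the spread of the estimates, which is the intuition behind the margin-of-error calculations discussed later in the chapter.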

The Sampling Distribution of Ten Cases

Suppose there are ten people in a group, and each has a certain amount of money in his or her pocket. To simplify, let's assume that one person has no money, another has one dollar, another has two dollars, and so forth up to the person with nine dollars.

Random Selection

With these definitions in hand, we can define the ultimate purpose of sampling: to select a set of elements from a population in such a way that descriptions of those elements accurately portray the total population from which the elements are selected. Probability sampling enhances the likelihood of accomplishing this aim and also provides

Probability Theory, Sampling Distributions, and Estimates of Sampling Error

Probability theory is a branch of mathematics that provides the tools researchers need to devise sampling techniques that produce representative


random selection A sampling method in which each element has an equal chance of selection independent of any other event in the selection process.

sampling unit That element or set of elements considered for selection in some stage of sampling.

parameter The summary description of a given variable in a population.

(with replacement-that is, every sampling unit selected is "thrown back into the

pot" and could be selected again. Second, our discussion has greatly oversimplified the inferential jump from the distribution of several samples to the probable characteristics of one sample.

I offer these cautions to provide perspective on the uses of probability theory in sampling. Social researchers often appear to overestimate the precision of estimates produced by the use of probability theory. As I'll mention elsewhere in this chapter and throughout the book, variations in sampling techniques and nonsampling factors may further reduce the legitimacy of such estimates. For example, those selected in a sample who fail or refuse to participate further detract from the representativeness of the sample.

Nevertheless, the calculations discussed in this section can be extremely valuable to you in understanding and evaluating your data. Although the calculations do not provide as precise estimates as some researchers might assume, they can be quite valid for practical purposes. They are unquestionably more valid than estimates derived from less-rigorous sampling methods. Most important, being familiar with the basic logic underlying the calculations can help you react sensibly both to your own data and to those reported by others.

Populations and Sampling Frames

The preceding section introduced the theoretical model for social research sampling. Although as students, research consumers, and researchers we need to understand that theory, it's no less important to appreciate the less-than-perfect conditions that exist in the field. In this section we'll look at one aspect of field conditions that requires a compromise with idealized theoretical conditions and assumptions: the congruence of, or disparity between, populations and sampling frames. Simply put, a sampling frame is the list or quasi list of elements from which a probability sample is selected. If a sample of students is selected from a student roster, the roster is the sampling frame. If the primary sampling unit for a


complex population sample is the census block, the list of census blocks composes the sampling frame-in the form of a printed booklet, a magnetic tape file, or some other computerized record. Here are some reports of sampling frames appearing in research journals. In each example I've italicized the actual sampling frames.

The data for this research were obtained from a random sample of parents of children in the third grade in public and parochial schools in Yakima County, Washington. (Petersen and Maynard 1981: 92)

The sample at Time 1 consisted of 160 names drawn randomly from the telephone directory of Lubbock, Texas. (Tan 1980: 242)

The data reported in this paper ... were gathered from a probability sample of adults aged 18 and over residing in households in the 48 contiguous United States. Personal interviews with 1,914 respondents were conducted by the Survey Research Center of the University of Michigan during the fall of 1975. (Jackman and Senter 1980: 345)

Properly drawn samples provide information appropriate for describing the population of elements composing the sampling frame-nothing more. I emphasize this point in view of the all-too-common tendency for researchers to select samples from a given sampling frame and then make assertions about a population similar to, but not identical to, the population defined by the sampling frame. For example, take a look at this report, which discusses the drugs most frequently prescribed by U.S. physicians:

Information on prescription drug sales is not easy to obtain. But Rinaldo V. DeNuzzo, a

sampling frame That list or quasi list of units composing a population from which a sample is selected. If the sample is to be representative of the population, it is essential that the sampling frame include all (or nearly all) members of the population.



professor of pharmacy at the Albany College of Pharmacy, Union University, Albany, NY, has been tracking prescription drug sales for 15 years by polling nearby drugstores. He publishes the results in an industry trade magazine, MM&M. DeNuzzo's latest survey, covering 1980, is based on reports from 66 pharmacies in 48 communities in New York and New Jersey. Unless there is something peculiar about that part of the country, his findings can be taken as representative of what happens across the country. (Moskowitz 1981: 33)

What is striking in the excerpt is the casual comment about whether there is anything peculiar about New York and New Jersey. There is. The lifestyle in these two states hardly typifies the other 48. We cannot assume that residents of these large, urbanized, eastern seaboard states necessarily have the same drug-use patterns as residents of Mississippi, Nebraska, or Vermont. Does the survey even represent prescription patterns in New York and New Jersey? To determine that, we would have to know something about the way the 48 communities and the 66 pharmacies were selected. We should be wary in this regard, in view of the reference to "polling nearby drugstores." As we'll see, there are several methods for selecting samples that ensure representativeness, and unless they're used, we shouldn't generalize from the study findings.

A sampling frame, then, must be consonant with the population we wish to study. In the simplest sample design, the sampling frame is a list of the elements composing the study population. In practice, though, existing sampling frames often define the study population rather than the other way around. That is, we often begin with a population in mind for our study; then we search for possible sampling frames. Having examined and evaluated the frames available for our use, we decide which frame presents a study population most appropriate to our needs.

Studies of organizations are often the simplest from a sampling standpoint because organizations


typically have membership lists. In such cases, the list of members constitutes an excellent sampling frame. If a random sample is selected from a membership list, the data collected from that sample may be taken as representative of all members-if all members are included in the list. Populations that can be sampled from good organizational lists include elementary school, high school, and university students and faculty; church members; factory workers; fraternity or sorority members; members of social, service, or political clubs; and members of professional associations.

The preceding comments apply primarily to local organizations. Often, statewide or national organizations do not have a single membership list. There is, for example, no single list of Episcopalian church members. However, a slightly more complex sample design could take advantage of local church membership lists by first sampling churches and then subsampling the membership lists of those churches selected. (More about that later.)

Other lists of individuals may be especially relevant to the research needs of a particular study. Government agencies maintain lists of registered voters, for example, that might be used if you wanted to conduct a preelection poll or an in-depth examination of voting behavior-but you must ensure that the list is up to date. Similar lists contain the names of automobile owners, welfare recipients, taxpayers, business permit holders, licensed professionals, and so forth. Although it may be difficult to gain access to some of these lists, they provide excellent sampling frames for specialized research purposes.

Realizing that the sampling elements in a study need not be individual persons, we may note that lists of other types of elements also exist: universities, businesses of various types, cities, academic journals, newspapers, unions, political clubs, professional associations, and so forth.
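The church example above can be sketched as a two-stage selection. Everything in this snippet is hypothetical (the congregation names, the equal congregation sizes, and the sample sizes); note that with unequal congregation sizes this simple design would no longer give every member an equal chance of selection, which is one reason designs such as PPS exist.

```python
import random

# Two-stage sampling: select churches at random, then subsample members
# from each selected church's list. All data here are hypothetical; the
# equal congregation sizes keep every member's selection chance equal.
def two_stage_sample(congregations, n_churches, n_members_each, seed=3):
    rng = random.Random(seed)
    chosen = rng.sample(sorted(congregations), n_churches)   # stage one
    sample = []
    for church in chosen:                                    # stage two
        members = congregations[church]
        sample.extend(rng.sample(members, min(n_members_each, len(members))))
    return sample

congregations = {
    f"Church {i}": [f"member-{i}-{j}" for j in range(50)] for i in range(20)
}
sample = two_stage_sample(congregations, n_churches=5, n_members_each=10)
print(len(sample))  # 50
```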
Telephone directories are frequently used for "quick and dirty" public opinion polls. Undeniably they're easy and inexpensive to use-no doubt the reason for their popularity. And, if you want to make assertions about telephone subscribers, the

directory is a fairly good sampling frame. (Realize, of course, that a given directory will not include new subscribers or those who have requested unlisted numbers. Sampling is further complicated by the directories' inclusion of nonresidential listings.) Unfortunately, telephone directories are all too often used as a listing of a city's population or of its voters. Of the many defects in this reasoning, the chief one involves a bias, as we have seen. Poor people are less likely to have telephones; rich people may have more than one line. A telephone directory sample, therefore, is likely to have a middle- or upper-class bias.

The class bias inherent in telephone directory samples is often hidden. Preelection polls conducted in this fashion are sometimes quite accurate, perhaps because of the class bias evident in voting itself: Poor people are less likely to vote. Frequently, then, these two biases nearly coincide, so that the results of a telephone poll may come very close to the final election outcome. Unhappily, you never know for sure until after the election. And sometimes, as in the case of the 1936 Literary Digest poll, you may discover that the voters have not acted according to the expected class biases. The ultimate disadvantage of this method, then, is the researcher's inability to estimate the degree of error to be expected in the sample findings.

Street directories and tax maps are often used for easy samples of households, but they may also suffer from incompleteness and possible bias. For example, in strictly zoned urban regions, illegal housing units are unlikely to appear on official records. As a result, such units could not be selected, and sample findings could not be representative of those units, which are often poorer and more crowded than the average.

Though the preceding comments apply to the United States, the situation is quite different in some other countries.
In Japan, for example, the government maintains quite accurate population registration lists. Moreover, citizens are required by law to keep such information up to date, including changes in residence and births and deaths in the household. As a consequence, you can select simple random samples of the population more


easily in Japan than in the United States. Such a registration list in the United States would conflict directly with this country's norms regarding individual privacy.

Review of Populations and Sampling Frames

Because social research literature gives surprisingly little attention to the issues of populations and sampling frames, I've devoted special attention to them. Here is a summary of the main guidelines to remember:

1. Findings based on a sample can be taken as representing only the aggregation of elements that compose the sampling frame.

2. Often, sampling frames do not truly include all the elements their names might imply. Omissions are almost inevitable. Thus, a first concern of the researcher must be to assess the extent of the omissions and to correct them if possible. (Of course, the researcher may feel that he or she can safely ignore a small number of omissions that cannot easily be corrected.)

3. To be generalized even to the population composing the sampling frame, all elements must have equal representation in the frame. Typically, each element should appear only once. Elements that appear more than once will have a greater probability of selection, and the sample will, overall, overrepresent those elements.

Other, more practical matters relating to populations and sampling frames will be treated elsewhere in this book. For example, the form of the sampling frame (such as a list in a publication, a 3-by-5 card file, CD-ROMs, or magnetic tapes) can affect how easy it is to use. And ease of use may often take priority over scientific considerations: An "easier" list may be chosen over a "harder" one, even though the latter is more appropriate to the target population. We should not take a dogmatic position in this regard, but




every researcher should carefully weigh the relative advantages and disadvantages of such alternatives.
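Guideline 3 above can be enforced mechanically before sampling: remove duplicate listings so every element appears in the frame exactly once (otherwise a twice-listed element is twice as likely to be selected). A minimal sketch, with made-up names:

```python
import random

# Remove duplicate listings so each element appears in the frame once,
# preserving the original order of first appearance.
def dedupe_frame(frame):
    seen = set()
    unique = []
    for element in frame:
        if element not in seen:
            seen.add(element)
            unique.append(element)
    return unique

raw_frame = ["Adams", "Baker", "Baker", "Chen", "Davis", "Adams"]
clean_frame = dedupe_frame(raw_frame)
print(clean_frame)  # ['Adams', 'Baker', 'Chen', 'Davis']

sample = random.Random(9).sample(clean_frame, 2)  # now an equal-chance draw
```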

Types of Sampling Designs

Up to this point, we've focused on simple random sampling (SRS). Indeed, the body of statistics typically used by social researchers assumes such a sample. As you'll see shortly, however, you have several options in choosing your sampling method, and you'll seldom if ever choose simple random sampling. There are two reasons for this. First, with all but the simplest sampling frame, simple random sampling is not feasible. Second, and probably surprisingly, simple random sampling may not be the most accurate method available. Let's turn now to a discussion of simple random sampling and the other options available.

Simple Random Sampling

As noted, simple random sampling is the basic sampling method assumed in the statistical computations of social research. Because the mathematics of random sampling are especially complex, we'll detour around them in favor of describing the ways of employing this method in the field. Once a sampling frame has been properly established, to use simple random sampling the

simple random sampling A type of probability sampling in which the units composing a population are assigned numbers. A set of random numbers is then generated, and the units having those numbers are included in the sample.

systematic sampling A type of probability sampling in which every kth unit in a list is selected for inclusion in the sample-for example, every 25th student in the college directory of students. You compute k by dividing the size of the population by the desired sample size; k is called the sampling interval. Within certain constraints, systematic sampling is a functional equivalent of simple random sampling and usually easier to do. Typically, the first unit is selected at random.
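The margin definition of systematic sampling translates directly into code. The sketch below assumes, for simplicity, that the population size is an exact multiple of the sample size so that the sampling interval k is a whole number; real frames usually require rounding or a fractional interval.

```python
import random

# Systematic sampling with a random start: k = N / n is the sampling
# interval; pick a start within the first interval, then take every
# kth element from the frame.
def systematic_sample(frame, sample_size, seed=None):
    k = len(frame) // sample_size        # the sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)             # random start in the first interval
    return [frame[i] for i in range(start, len(frame), k)][:sample_size]

students = [f"student-{i}" for i in range(10_000)]
sample = systematic_sample(students, 1_000, seed=5)
print(len(sample))  # 1000
```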

researcher assigns a single number to each element in the list, not skipping any number in the process. A table of random numbers (Appendix C) is then used to select elements for the sample. "Using a Table of Random Numbers" explains its use. If your sampling frame is in a machine-readable form, such as CD-ROM or magnetic tape, a simple random sample can be selected automatically by computer. (In effect, the computer program numbers the elements in the sampling frame, generates its own series of random numbers, and prints out the list of elements selected.) Figure 7-11 offers a graphic illustration of simple random sampling. Note that the members of our hypothetical micropopulation have been numbered from 1 to 100. Moving to Appendix C, we decide to use the last two digits of the first column and to begin with the third number from the top. This yields person number 30 as the first one selected into the sample. Number 67 is next, and so forth. (Person 100 would have been selected if "00" had come up in the list.)
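The selection rule just described (two-digit random numbers, with "00" standing for person 100) can be sketched as follows; the seed and the use of Python's pseudorandom generator in place of the printed table are, of course, assumptions of the illustration.

```python
import random

# Mimic the selection rule in the text: people are numbered 1-100, and
# two-digit random draws pick them, with "00" standing for person 100.
# Already-selected numbers are simply skipped, as the text prescribes.
def select_srs(sample_size, seed=11):
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < sample_size:
        draw = rng.randrange(100)               # a two-digit number, 00-99
        chosen.add(100 if draw == 0 else draw)  # "00" selects person 100
    return sorted(chosen)

print(select_srs(10))  # ten distinct person numbers between 1 and 100
```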

In social research, it's often appropriate to select a set of random numbers from a table such as the one in Appendix C. Here's how to do that. Suppose you want to select a simple random sample of 100 people (or other units) out of a population totaling 980.



Systematic Sampling

Simple random sampling is seldom used in practice. As you'll see, it's not usually the most efficient method, and it can be laborious if done manually. Typically, simple random sampling requires a list of elements. When such a list is available, researchers usually employ systematic sampling instead.

In systematic sampling, every kth element in the total list is chosen (systematically) for inclusion in the sample. If the list contained 10,000 elements and you wanted a sample of 1,000, you would select every tenth element for your sample. To ensure against any possible human bias in using this method, you should select the first element at random. Thus, in the preceding example, you would begin by selecting a random number between one and ten. The element having that number is included in the sample, plus every tenth element following it. This method is technically referred to as a systematic sample with a random start. Two terms are frequently used in connection with systematic

To begin, number the members of the population, in this case from 1 to 980. Now the problem is to select 100 random numbers. Once you've done that, your sample will consist of the people having the numbers you've selected. (Note: It's not essential to actually number them, as long as you're sure of the total. If you have them in a list, for example, you can always count through the list after you've selected the numbers.)

The next step is to determine the number of digits you'll need in the random numbers you select. In our example, there are 980 members of the population, so you'll need three-digit numbers to give everyone a chance of selection. (If there were 11,825 members of the population, you'd need to select five-digit numbers.) Thus, we want to select 100 random numbers in the range from 001 to 980.

Now turn to the first page of Appendix C. Notice there are several rows and columns of five-digit numbers, and there are several pages. The table represents a series of random numbers in the range from 00001 to 99999. To use the table for your hypothetical sample, you have to answer these questions:

1. How will you create three-digit numbers out of five-digit numbers?
2. What pattern will you follow in moving through the table to select your numbers?
3. Where will you start?



Each of these questions has several satisfactory answers. The key is to create a plan and follow it. Here's an example.

To create three-digit numbers from five-digit numbers, let's agree to select five-digit numbers from the table but consider only the leftmost three digits in each case. If we picked the first number on the first page, 10480, we'd consider only the 104. (We could



agree to take the digits farthest to the right, 480, or the middle three digits, 048, and either of those plans would work.) The key is to make a plan and stick with it. For convenience, let's use the leftmost three digits. We can also choose to progress through the table any way we want: down the columns, up them, across to the right or to the left, or diagonally. Again, any of these plans will work just fine as long as we stick to it. For convenience, let's agree to move down the columns. When we get to the bottom of one column, we'll go to the top of the next. Now, where do we start? You can close your eyes and stick a pencil into the table and start wherever the pencil point lands. (I know it doesn't sound scientific, but it works.) Or, if you're afraid you'll hurt the book or miss it altogether, close your eyes and make up a column number and a row number. ("I'll pick the number in the fifth row of column 2.") Start with that number. Let's suppose we decide to start with the fifth number in column 2. If you look on the first page of Appendix C, you'll see that the starting number is 39975. We've selected 399 as our first random number, and we have 99 more to go. Moving down the second column, we select 069, 729, 919, 143, 368, 695, 409, 939, and so forth. At the bottom of column 2 (on the second page of this table), we select number 017 and continue to the top of column 3: 015, 255, and so on. See how easy it is? But trouble lies ahead. When we reach column 5, we're moving along, selecting 816, 309, 763, 078, 061, 277, 988. Wait a minute! There are only 980 students in the senior class. How can we pick number 988? The solution is simple: ignore it. Any time you come across a number that lies outside your range, skip it and continue on your way: 188, 174, and so forth. The same solution applies if the same number comes up more than once. If you select 399 again, for example, just ignore it the second time. That's it. You keep up the procedure until you've selected 100 random numbers.
Returning to your list, your sample consists of person number 399, person number 69, person number 729, and so forth.
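The whole procedure (numbering the population, drawing random numbers, and skipping out-of-range values and duplicates) can be sketched in Python. This is an illustrative sketch only; it uses Python's `random` module in place of the printed table in Appendix C, and the function name is invented:

```python
import random

def srs_from_random_numbers(pop_size, n, rng=random):
    """Mimic the random-number-table procedure: draw three-digit
    numbers, skip any outside 1..pop_size and any duplicates,
    until n distinct selections have been made."""
    chosen = []
    seen = set()
    while len(chosen) < n:
        num = rng.randint(1, 999)          # a three-digit random number
        if num > pop_size or num in seen:  # out of range or repeated: skip it
            continue
        seen.add(num)
        chosen.append(num)
    return chosen

sample = srs_from_random_numbers(980, 100)  # 100 of the 980 seniors
```

The skip rules match the text: numbers above 980 and repeats are simply ignored, and drawing continues until 100 distinct numbers are in hand.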










Types of Sampling Designs

Chapter 7: The Logic of Sampling






FIGURE 7-11 A Simple Random Sample. Having numbered everyone in the population, we can use a table of random numbers to select a representative sample from the overall population. Anyone whose number is chosen from the table is in the sample.

sampling. The sampling interval is the standard distance between elements selected in the sample: ten in the preceding sample. The sampling ratio is the proportion of elements in the population that are selected: 1/10 in the example.

sampling interval The standard distance between elements selected from a population for a sample.

sampling ratio The proportion of elements in the population that are selected to be in a sample.

sampling interval = population size / sample size

sampling ratio = sample size / population size
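Putting these definitions together, a systematic sample with a random start might be sketched in Python like this (an illustrative sketch, not from the text; the function name is invented):

```python
import random

def systematic_sample(elements, sample_size):
    """Systematic sample with a random start: take the start-th
    element, then every interval-th element after it."""
    interval = len(elements) // sample_size    # sampling interval
    start = random.randint(1, interval)        # random start in 1..interval
    return elements[start - 1::interval][:sample_size]

population = list(range(1, 10_001))            # 10,000 elements
sample = systematic_sample(population, 1_000)  # interval 10, ratio 1/10
```

With 10,000 elements and a desired sample of 1,000, the sampling interval is 10 and the sampling ratio is 1/10, matching the two formulas.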

In practice, systematic sampling is virtually identical to simple random sampling. If the list of elements is indeed randomized before sampling, one might argue that a systematic sample drawn from that list is in fact a simple random sample. By now, debates over the relative merits of simple random sampling and systematic sampling have been resolved largely in favor of the latter, simpler method. Empirically, the results are virtually identical. And, as you'll see in a later section, systematic sampling, in some instances, is slightly more accurate than simple random sampling. There is one danger involved in systematic sampling. The arrangement of elements in the list can make systematic sampling unwise. Such an arrangement is usually called periodicity. If the list of elements is arranged in a cyclical pattern that coincides with the sampling interval, a grossly biased sample may be drawn. Here are two examples that illustrate this danger. In a classic study of soldiers during World War II, the researchers selected a systematic sample from unit rosters. Every tenth soldier on the roster was selected for the study. The rosters, however, were arranged by table of organization: sergeants first, then corporals and privates, squad by squad. Each squad had ten members. As a result, every tenth person on the roster was a squad sergeant. The systematic sample selected contained only sergeants. It could, of course, have been the case that no sergeants were selected for the same reason. As another example, suppose we select a sample of apartments in an apartment building. If the sample is drawn from a list of apartments arranged in numerical order (for example, 101, 102, 103, 104, 201, 202, and so on), there is a danger of the sampling interval coinciding with the number of apartments on a floor or some multiple thereof. Then the samples might include only northwest-corner apartments or only apartments near the elevator.
If these types of apartments have some other particular characteristic in common (for example, higher rent), the sample will be biased. The same danger would appear in a systematic sample of houses in a subdivision arranged with the same number of houses on a block. In considering a systematic sample from a list, then, you should carefully examine the nature of


that list. If the elements are arranged in any particular order, you should figure out whether that order will bias the sample to be selected, then you should take steps to counteract any possible bias (for example, take a simple random sample from cyclical portions). Usually, however, systematic sampling is superior to simple random sampling, in convenience if nothing else. Problems in the ordering of elements in the sampling frame can usually be remedied quite easily.
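The sergeant example can be reproduced in a few lines. The sketch below (illustrative, not from the text) shows how a list whose period matches the sampling interval yields a sample of nothing but sergeants:

```python
# A hypothetical roster of 100 squads, 10 soldiers each,
# with the sergeant always listed first in the squad.
roster = []
for squad in range(100):
    roster.append("sergeant")
    roster.extend(["private"] * 9)

# A systematic sample with interval 10 that happens to start on a
# sergeant selects only sergeants: the list's periodicity coincides
# with the sampling interval and biases the sample.
sample = roster[0::10]
assert set(sample) == {"sergeant"}
```

Had the random start landed anywhere else in 1..10, the same design would have selected no sergeants at all, which is the mirror image of the same bias.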

Stratified Sampling

So far we've discussed two methods of sample selection from a list: random and systematic. Stratification is not an alternative to these methods; rather, it represents a possible modification of their use. Simple random sampling and systematic sampling both ensure a degree of representativeness and permit an estimate of the error present. Stratified sampling is a method for obtaining a greater degree of representativeness by decreasing the probable sampling error. To understand this method, we must return briefly to the basic theory of sampling distribution. Recall that sampling error is reduced by two factors in the sample design. First, a large sample produces a smaller sampling error than a small sample does. Second, a homogeneous population produces samples with smaller sampling errors than a heterogeneous population does. If 99 percent of the population agrees with a certain statement, it's extremely unlikely that any probability sample will greatly misrepresent the extent of agreement. If the population is split 50-50 on the statement, then the sampling error will be much greater.

stratification The grouping of the units composing a population into homogeneous groups (or strata) before sampling. This procedure, which may be used in conjunction with simple random, systematic, or cluster sampling, improves the representativeness of a sample, at least in terms of the stratification variables.



Stratified sampling is based on this second factor in sampling theory. Rather than selecting a sample from the total population at large, the researcher ensures that appropriate numbers of elements are drawn from homogeneous subsets of that population. To get a stratified sample of university students, for example, you would first organize your population by college class and then draw appropriate numbers of freshmen, sophomores, juniors, and seniors. In a nonstratified sample, representation by class would be subjected to the same sampling error as other variables would. In a sample stratified by class, the sampling error on this variable is reduced to zero. More-complex stratification methods are also possible. In addition to stratifying by class, you might also stratify by gender, by GPA, and so forth. In this fashion you might be able to ensure that your sample would contain the proper numbers of male sophomores with a 3.5 average, of female sophomores with a 4.0 average, and so forth. The ultimate function of stratification, then, is to organize the population into homogeneous subsets (with heterogeneity between subsets) and to select the appropriate number of elements from each. To the extent that the subsets are homogeneous on the stratification variables, they may be homogeneous on other variables as well. Because age is related to college class, a sample stratified by class will be more representative in terms of age as well, compared with an unstratified sample. Because occupational aspirations still seem to be related to gender, a sample stratified by gender will be more representative in terms of occupational aspirations. The choice of stratification variables typically depends on what variables are available. Gender can often be determined in a list of names. University lists are typically arranged by class. Lists of faculty members may indicate their departmental affiliation. Government agency files may be arranged by geographic region. Voter
registration lists are arranged according to precinct. In selecting stratification variables from among those available, however, you should be concerned primarily with those that are presumably related to variables you want to represent accurately. Because gender is related to many variables and is often


available for stratification, it's often used. Education is related to many variables, but it's often not available for stratification. Geographic location within a city, state, or nation is related to many things. Within a city, stratification by geographic location usually increases representativeness in social class, ethnic group, and so forth. Within a nation, it increases representativeness in a broad range of attitudes as well as in social class and ethnicity. When you're working with a simple list of all elements in the population, two methods of stratification predominate. In one method, you sort the population elements into discrete groups based on whatever stratification variables are being used. On the basis of the relative proportion of the population represented by a given group, you select (randomly or systematically) several elements from that group constituting the same proportion of your desired sample size. For example, if sophomore men with a 4.0 average compose 1 percent of the student population and you desire a sample of 1,000 students, you would select 10 sophomore men with a 4.0 average. The other method is to group students as described and then put those groups together in a continuous list, beginning with all freshman men with a 4.0 average and ending with all senior women with a 1.0 or below. You would then select a systematic sample, with a random start, from the entire list. Given the arrangement of the list, a systematic sample would select proper numbers (within an error range of 1 or 2) from each subgroup. (Note: A simple random sample drawn from such a composite list would cancel out the stratification.) Figure 7-12 offers a graphic illustration of stratified, systematic sampling. As you can see, we lined up our micropopulation according to gender and race. Then, beginning with a random start of "3," we've taken every tenth person thereafter: 3, 13, 23, . . . , 93.
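The first method described above, proportionate allocation from each group, can be sketched as follows. This is an illustrative sketch; the function and the strata are hypothetical:

```python
import random

def stratified_sample(strata, n, rng=random):
    """Proportionate stratified sampling: from each stratum, draw a
    simple random subsample sized by the stratum's share of the
    population.  `strata` maps stratum name -> list of members."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(n * len(members) / total)   # proportional allocation
        sample.extend(rng.sample(members, share))
    return sample

# A hypothetical student body of 1,000, organized by class.
strata = {"freshmen": list(range(300)), "sophomores": list(range(300, 550)),
          "juniors": list(range(550, 800)), "seniors": list(range(800, 1000))}
picked = stratified_sample(strata, 100)   # 30, 25, 25, 20 from the classes
```

Because each stratum contributes exactly its population share, the sampling error on the stratification variable itself is driven to zero, as the text notes.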
Stratified sampling ensures the proper representation of the stratification variables; this, in turn, enhances the representation of other variables related to them. Taken as a whole, then, a stratified sample is more likely than a simple random sample to be representative on several variables.

FIGURE 7-12 A Stratified, Systematic Sample with a Random Start. A stratified, systematic sample involves two stages. First, the members of the population are gathered into homogeneous strata; this simple example merely uses gender as a stratification variable, but more could be used. Then every kth (in this case, every 10th) person in the stratified arrangement is selected into the sample.

Although the simple random sample is still regarded as somewhat sacred, it should now be clear that you can often do better.

Implicit Stratification in Systematic Sampling

I mentioned that systematic sampling can, under certain conditions, be more accurate than simple random sampling. This is the case whenever the arrangement of the list creates an implicit stratification. As already noted, if a list of university students is arranged by class, then a systematic sample provides a stratification by class where a simple random sample would not.

In a study of students at the University of Hawaii, after stratification by school class, the students were arranged by their student identification numbers. These numbers, however, were their social security numbers. The first three digits of the social security number indicate the state in which the number was issued. As a result, within a class, students were arranged by the state in which they were issued a social security number, providing a rough stratification by geographic origin. An ordered list of elements, therefore, may be more useful to you than an unordered, randomized list. I've stressed this point in view of the unfortunate belief that lists should be randomized




before systematic sampling. Only if the arrangement presents the problems discussed earlier should the list be rearranged.

Illustration: Sampling University Students

Let's put these principles into practice by looking at an actual sampling design used to select a sample of university students. The purpose of the study was to survey, with a mail-out questionnaire, a representative cross section of students attending the main campus of the University of Hawaii. The following sections describe the steps and decisions involved in selecting that sample.

Study Population and Sampling Frame

The obvious sampling frame available for use in this sample selection was the computerized file maintained by the university administration. The tape contained students' names, local and permanent addresses, and social security numbers, as well as a variety of other information such as field of study, class, age, and gender. The computer database, however, contained files on all people who could, by any conceivable definition, be called students, many of whom seemed inappropriate for the purposes of the study. As a result, researchers needed to define the study population in a somewhat more restricted fashion. The final definition included those 15,225 day-program degree candidates who were registered for the fall semester on the Manoa campus of the university, including all colleges and departments, both undergraduate and graduate students, and both U.S. and foreign students. The computer program used for sampling, therefore, limited consideration to students fitting this definition.

Stratification

The sampling program also permitted stratification of students before sample selection. The researchers decided that stratification by college class would be sufficient, although the students might have been further stratified within class, if desired, by gender, college, major, and so forth.

Sample Selection

Once the students had been arranged by class, a systematic sample was selected across the entire rearranged list. The sample size for the study was initially set at 1,100. To achieve this sample, the sampling program was set for a 1/14 sampling ratio. The program generated a random number between 1 and 14; the student having that number and every fourteenth student thereafter was selected in the sample. Once the sample had been selected, the computer was instructed to print each student's name and mailing address on self-adhesive mailing labels. These labels were then simply transferred to envelopes for mailing the questionnaires.
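As a rough check of the numbers in this design (an illustrative sketch; the variable names are mine): dividing the 15,225 students by the desired 1,100 gives an interval of about 14, which is why a 1/14 ratio yields slightly fewer than 1,100 selections.

```python
import random

population = 15_225      # the defined study population
desired = 1_100          # target sample size
interval = round(population / desired)   # 14, i.e. a 1/14 sampling ratio
start = random.randint(1, interval)      # random start in 1..14
selected = list(range(start, population + 1, interval))
# len(selected) comes out near 1,088, a bit under the 1,100 target
```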

Sample Modification

This initial design of the sample had to be modified. Before the mailing of questionnaires, the researchers discovered that, because of unexpected expenses in the production of the questionnaires, they couldn't cover the costs of mailing to all 1,100 students. As a result, one-third of the mailing labels were systematically selected (with a random start) for exclusion from the sample. The final sample for the study was thereby reduced to 733 students. I mention this modification in order to illustrate the frequent need to alter a study plan in midstream. Because the excluded students were systematically omitted from the initial systematic sample, the remaining 733 students could still be taken as reasonably representing the study population. The reduction in sample size did, of course, increase the range of sampling error.

Multistage Cluster Sampling

The preceding sections have dealt with reasonably simple procedures for sampling from lists of elements. Such a situation is ideal. Unfortunately, however, much interesting social research requires the selection of samples from populations that cannot easily be listed for sampling purposes: the population of a city, state, or nation; all university students in the United States; and so forth. In such

cases, the sample design must be much more complex. Such a design typically involves the initial sampling of groups of elements (clusters) followed by the selection of elements within each of the selected clusters. Cluster sampling may be used when it's either impossible or impractical to compile an exhaustive list of the elements composing the target population, such as all church members in the United States. Often, however, the population elements are already grouped into subpopulations, and a list of those subpopulations either exists or can be created practically. For example, church members in the United States belong to discrete churches, which are either listed or could be. Following a cluster sample format, then, researchers could sample the list of churches in some manner (for example, a stratified, systematic sample). Next, they would obtain lists of members from each of the selected churches. Each of the lists would then be sampled, to provide samples of church members for study. (For an example, see Glock, Ringer, and Babbie 1967.) Another typical situation concerns sampling among population areas such as a city. Although there is no single list of a city's population, citizens reside on discrete city blocks or census blocks. Researchers can, therefore, select a sample of blocks initially, create a list of people living on each of the selected blocks, and take a subsample of the people on each block. In a more complex design, researchers might sample blocks, list the households on each selected block, sample the households, list the people residing in each household, and, finally, sample the people within each selected household. This multistage sample design leads ultimately to a selection of a sample of individuals but does not require the initial listing of all individuals in the city's population. Multistage cluster sampling, then, involves the repetition of two basic steps: listing and sampling.
The list of primary sampling units (churches, blocks) is compiled and, perhaps, stratified for sampling. Then a sample of those units is selected. The selected primary sampling units are then listed and perhaps stratified. The list of secondary sampling units is then sampled, and so forth. The listing of households on even the selected blocks is, of course, a labor-intensive and costly



activity, one of the elements making face-to-face household surveys quite expensive. Vincent Iannacchione, Jennifer Staab, and David Redden (2003) report some initial success using postal mailing lists for this purpose. Although the lists are not perfect, they may be close enough to warrant the significant savings in cost. Multistage cluster sampling makes possible those studies that would otherwise be impossible. Specific research circumstances often call for special designs, as "Sampling Iran" demonstrates.
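The listing-and-sampling repetition described above can be sketched as a two-stage cluster sample. This is an illustrative sketch (the function and data are hypothetical); in a real survey, the second-stage household lists would be compiled only for the blocks actually selected:

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng=random):
    """Two-stage cluster sample: first sample the clusters (e.g. blocks),
    then list and subsample the elements within each selected cluster."""
    picked_clusters = rng.sample(clusters, n_clusters)   # stage 1: clusters
    sample = []
    for cluster in picked_clusters:                      # stage 2: elements
        sample.extend(rng.sample(cluster, n_per_cluster))
    return sample

# 400 hypothetical blocks of 25 households each; take 5 per selected block.
blocks = [[(b, h) for h in range(25)] for b in range(400)]
households = two_stage_cluster_sample(blocks, 80, 5)     # 400 interviews
```

Each stage here is a simple random sample; as the later sections note, either stage could instead be stratified or systematic.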

Multistage Designs and Sampling Error

Although cluster sampling is highly efficient, the price of that efficiency is a less-accurate sample. A simple random sample drawn from a population list is subject to a single sampling error, but a two-stage cluster sample is subject to two sampling errors. First, the initial sample of clusters will represent the population of clusters only within a range of sampling error. Second, the sample of elements selected within a given cluster will represent all the elements in that cluster only within a range of sampling error. Thus, for example, a researcher runs a certain risk of selecting a sample of disproportionately wealthy city blocks, plus a sample of disproportionately wealthy households within those blocks. The best solution to this problem lies in the number of clusters selected initially and the number of elements within each cluster. Typically, researchers are restricted to a total sample size; for example, you may be limited to conducting 2,000 interviews in a city. Given this broad limitation, however, you have several options in designing your cluster sample. At the extremes you could choose one cluster and select 2,000 elements within that cluster, or you could select 2,000 clusters with one element selected within each. Of course, neither approach is advisable, but a broad

cluster sampling A multistage sampling in which natural groups (clusters) are sampled initially, with the members of each selected group being subsampled afterward. For example, you might select a sample of U.S. colleges and universities from a directory, get lists of the students at all the selected schools, then draw samples of students from each.



Whereas most of the examples given in this textbook are taken from its country of origin, the United States, the basic methods of sampling would apply in other national settings as well. At the same time, researchers may need to make modifications appropriate to local conditions. In selecting a national sample of Iran, for example, Abdollahyan and Azadarmaki (2000: 21) from the University of Tehran began by stratifying the nation on the basis of cultural differences, dividing the country into nine cultural zones as follows:

1. Tehran
2. Central region including Isfahan, Arak, Qum, Yazd and Kerman
3. The southern provinces including Hormozgan, Khuzistan, Bushehr and Fars
4. The marginal western region including Lorestan, Charmahal and Bakhtiari, Kogiluyeh and Eelam

range of choices lies between them. Fortunately, the logic of sampling distributions provides a general guideline for this task. Recall that sampling error is reduced by two factors: an increase in the sample size and increased homogeneity of the elements being sampled. These factors operate at each level of a multistage sample design. A sample of clusters will best represent all clusters if a large number are selected and if all clusters are very much alike. A sample of elements will best represent all elements in a given cluster if a large number are selected from the cluster and if all the elements in the cluster are very much alike. With a given total sample size, however, if the number of clusters is increased, the number of elements within a cluster must be decreased. In this respect the representativeness of the clusters is increased at the expense of more poorly representing the elements composing each cluster, or vice versa. Fortunately, homogeneity can be used to ease this dilemma. Typically, the elements composing a given natural cluster within a population are more homogeneous than all elements composing the total population are. The members of a given church are more alike than all church members


5. The western provinces including western and eastern Azarbaijan, Zanjan, Ghazvin and Ardebil
6. The eastern provinces including Khorasan and Semnan
7. The northern provinces including Gilan, Mazandran and Golestan
8. Systan
9. Kurdistan

Within each of these cultural areas, the researchers selected samples of census blocks and, on each selected block, a sample of households. Their sample design made provisions for getting the proper numbers of men and women as respondents within households and provisions for replacing those households where no one was at home.

Source: Hamid Abdollahyan and Taghi Azadarmaki, "Sampling Design in a Survey Research: The Sampling Practice in Iran," paper presented to the meetings of the American Sociological Association, August 12-16, 2000, Washington, DC.

are; the residents of a given city block are more alike than the residents of a whole city are. As a result, relatively few elements may be needed to represent a given natural cluster adequately, although a larger number of clusters may be needed to represent adequately the diversity found among the clusters. This fact is most clearly seen in the extreme case of very different clusters composed of identical elements within each. In such a situation, a large number of clusters would adequately represent all their members. Although this extreme situation never exists in reality, it's closer to the truth in most cases than its opposite: identical clusters composed of grossly divergent elements. The general guideline for cluster design, then, is to maximize the number of clusters selected while decreasing the number of elements within each cluster. However, this scientific guideline must be balanced against an administrative constraint. The efficiency of cluster sampling is based on the ability to minimize the listing of population elements. By initially selecting clusters, you need only list the elements composing the selected clusters, not all elements in the entire population. Increasing the number of clusters, however, goes directly against this efficiency factor. A small number of clusters

may be listed more quickly and more cheaply than a large number. (Remember that all the elements in a selected cluster must be listed even if only a few are to be chosen in the sample.) The final sample design will reflect these two constraints. In effect, you'll probably select as many clusters as you can afford. Lest this issue be left too open-ended at this point, here's one general guideline. Population researchers conventionally aim at the selection of 5 households per census block. If a total of 2,000 households are to be interviewed, you would aim at 400 blocks with 5 household interviews on each. Figure 7-13 presents a graphic overview of this process. Before we turn to other, more detailed procedures available to cluster sampling, let me reiterate that this method almost inevitably involves a loss of accuracy. The manner in which this appears, however, is somewhat complex. First, as noted earlier, a multistage sample design is subject to a sampling error at each stage. Because the sample size is necessarily smaller at each stage than the total sample size, the sampling error at each stage will be greater than would be the case for a single-stage random sample of elements. Second, sampling error is estimated on the basis of observed variance among the sample elements. When those elements are drawn from among relatively homogeneous clusters, the estimated sampling error will be too optimistic and must be corrected in the light of the cluster sample design.

Stratification in Multistage Cluster Sampling

Thus far, we've looked at cluster sampling as though a simple random sample were selected at each stage of the design. In fact, stratification techniques can be used to refine and improve the sample being selected. The basic options here are essentially the same as those in single-stage sampling from a list. In selecting a national sample of churches, for example, you might initially stratify your list of churches by denomination, geographic region, size, rural or urban location, and perhaps by some measure of social class.


Once the primary sampling units (churches, blocks) have been grouped according to the relevant, available stratification variables, either simple random or systematic sampling techniques can be used to select the sample. You might select a specified number of units from each group, or stratum, or you might arrange the stratified clusters in a continuous list and systematically sample that list. To the extent that clusters are combined into homogeneous strata, the sampling error at this stage will be reduced. The primary goal of stratification, as before, is homogeneity. There's no reason why stratification couldn't take place at each level of sampling. The elements listed within a selected cluster might be stratified before the next stage of sampling. Typically, however, this is not done. (Recall the assumption of relative homogeneity within clusters.)

Probability Proportionate to Size (PPS) Sampling

This section introduces you to a more sophisticated form of cluster sampling, one that is used in many large-scale survey-sampling projects. In the preceding discussion, I talked about selecting a random or systematic sample of clusters and then a random or systematic sample of elements within each cluster selected. Notice that this produces an overall sampling scheme in which every element in the whole population has the same probability of selection. Let's say we're selecting households within a city. If there are 1,000 city blocks and we initially select a sample of 100, that means that each block has a 100/1,000 or 0.1 chance of being selected. If we next select 1 household in 10 from those residing on the selected blocks, each household has a 0.1 chance of selection within its block. To calculate the overall probability of a household being selected, we simply multiply the probabilities at the individual steps in sampling. That is, each household has a 1/10 chance of its block being selected and a 1/10 chance of that specific household being selected if the block is one of those chosen. Each household, in this case, has a 1/10 × 1/10 = 1/100 chance of selection overall. Because each household would have the same chance of selection, the





[Figure 7-13 illustration: a city map on which selected blocks are shaded, beside a sample listing of 30 household addresses on Rosemary Ave., 4th St., Thyme Ave., and 5th St. Stage One: Identify blocks and select a sample (selected blocks are shaded). Stage Two: Go to each selected block and list all households in order (an example of one listed block is shown). Stage Three: For each list, select a sample of households. In this example, every sixth household has been selected, starting with #5, which was selected at random.]

FIGURE 7-13 Multistage Cluster Sampling. In multistage cluster sampling, we begin by selecting a sample of the clusters (in this case, city blocks). Then, we make a list of the elements (households, in this case) and select a sample of elements from each of the selected clusters.

sample so selected should be representative of all households in the city. There are dangers in this procedure, however. In particular, the variation in the size of blocks (measured in numbers of households) presents a problem. Let's suppose that half the city's population resides in 10 densely packed blocks filled with

high-rise apartment buildings, and suppose that the rest of the population lives in single-family dwellings spread out over the remaining 900 blocks. When we first select our sample of 1/10 of the blocks, it's quite possible that we'll miss all of the 10 densely packed high-rise blocks. No matter what happens in the second stage of sampling, our final

sample of households will be grossly unrepresentative of the city, comprising only single-family dwellings. Whenever the clusters sampled are of greatly differing sizes, it's appropriate to use a modified sampling design called PPS (probability proportionate to size). This design guards against the problem I've just described and still produces a final sample in which each element has the same chance of selection. As the name suggests, each cluster is given a chance of selection proportionate to its size. Thus, a city block with 200 households has twice the chance of selection as one with only 100 households. Within each cluster, however, a fixed number of elements is selected, say, 5 households per block. Notice how this procedure results in each household having the same probability of selection overall. Let's look at households of two different city blocks. Block A has 100 households; Block B has only 10. In PPS sampling, we would give Block A ten times as good a chance of being selected as Block B. So if, in the overall sample design, Block A has a 1/20 chance of being selected, that means Block B would only have a 1/200 chance. Notice that this means that all the households on Block A would have a 1/20 chance of having their block selected; Block B households have only a 1/200 chance. If Block A is selected and we're taking 5 households from each selected block, then the households on Block A have a 5/100 chance of being selected into the block's sample.
Because we can multiply probabilities in a case like this, we see that every household on Block A has an overall chance of selection equal to 1/20 × 5/100 = 5/2,000 = 1/400. If Block B happens to be selected, on the other hand, its households stand a much better chance of being among the 5 chosen there: 5/10. When this is combined with their relatively poorer chance of having their block selected in the first place, however, they end up with the same chance of selection as those on Block A: 1/200 × 5/10 = 5/2,000 = 1/400. Further refinements to this design make it a very efficient and effective method for selecting large cluster samples. For now, however, it's enough to understand the basic logic involved.
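As a quick check on the arithmetic above, the two-stage probabilities can be multiplied in code. The block sizes and selection chances are the hypothetical ones from the text; the function name is ours:

```python
def overall_probability(p_block, block_size, households_taken=5):
    """P(household selected) = P(its block is selected) * P(household | block selected)."""
    return p_block * (households_taken / block_size)

# Block A: 100 households, block selection chance 1/20.
p_a = overall_probability(1 / 20, 100)
# Block B: 10 households, block selection chance 1/200.
p_b = overall_probability(1 / 200, 10)

# Both work out to 1/400, matching the text's calculation.
```

The key property of PPS is visible in the function: a bigger block gets a proportionately larger `p_block` but a proportionately smaller within-block chance, so the product is constant across households.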


Disproportionate Sampling and Weighting Ultimately, a probability sample is representative of a population if all elements in the population have an equal chance of selection in that sample. Thus, in each of the preceding discussions, we've noted that the various sampling procedures result in an equal chance of selection, even though the ultimate selection probability is the product of several partial probabilities. More generally, however, a probability sample is one in which each population element has a known nonzero probability of selection, even though different elements may have different probabilities. If controlled probability sampling procedures have been used, any such sample may be representative of the population from which it is drawn if each sample element is assigned a weight equal to the inverse of its probability of selection. Thus, where all sample elements have had the same chance of selection, each is given the same weight: 1. This is called a self-weighting sample. Sometimes it's appropriate to give some cases more weight than others, a process called weighting. Disproportionate sampling and weighting come into play in two basic ways. First, you may sample subpopulations disproportionately to ensure sufficient numbers of cases from each for analysis. For example, a given city may have a suburban area containing one-fourth of its total population. Yet you might be especially interested in a detailed analysis of households in that area and may feel that one-fourth of this total sample size would be too few. As a result, you might decide to select the

PPS (probability proportionate to size) This refers to a type of multistage cluster sample in which clusters are selected, not with equal probabilities (see EPSEM) but with probabilities proportionate to their sizes, as measured by the number of units to be subsampled.

weighting Assigning different weights to cases that were selected into a sample with different probabilities of selection. In the simplest scenario, each case is given a weight equal to the inverse of its probability of selection. When all cases have the same chance of selection, no weighting is necessary.



Main Points

same number of households from the suburban area as from the remainder of the city. Households in the suburban area, then, are given a disproportionately better chance of selection than are those located elsewhere in the city. As long as you analyze the two area samples separately or comparatively, you need not worry about the differential sampling. If you want to combine the two samples to create a composite picture of the entire city, however, you must take the disproportionate sampling into account. If n is the number of households selected from each area, then the households in the suburban area had a chance of selection equal to n divided by one-fourth of the total city population. Because the total city population and the sample size are the same for both areas, the suburban-area households should be given a weight of ¼n, and the remaining households should be given a weight of ¾n. This weighting procedure could be simplified by merely giving a weight of 3 to each of the households selected outside the suburban area. Here's an example of the problems that can be created when disproportionate sampling is not accompanied by a weighting scheme. When the Harvard Business Review decided to survey its subscribers on the issue of sexual harassment at work, it seemed appropriate to oversample women because female subscribers were vastly outnumbered by male subscribers. Here's how G. C. Collins and Timothy Blodgett explained the matter: We also skewed the sample another way: to ensure a representative response from women, we mailed a questionnaire to virtually every female subscriber, for a male/female ratio of 68% to 32%. This bias resulted in a response of 52% male and 44% female (and 4% who gave no indication of gender), compared to HBR's U.S. subscriber proportion of 93% male and 7% female. (1981: 78)

Notice a couple of things in this excerpt. First, it would be nice to know a little more about what "virtually every female" means. Evidently, the authors of the study didn't send questionnaires to all female subscribers, but there's no indication of who was omitted and why. Second, they didn't use the

term representative with its normal social science usage. What they mean, of course, is that they wanted to get a substantial or "large enough" response from women, and oversampling is a perfectly acceptable way of accomplishing that. By sampling more women than a straightforward probability sample would have produced, the authors were able to "select" enough women (812) to compare with the men (960). Thus, when they report, for example, that 32 percent of the women and 66 percent of the men agree that "the amount of sexual harassment at work is greatly exaggerated," we know that the female response is based on a substantial number of cases. That's good. There are problems, however. To begin with, subscriber surveys are always problematic. In this case, the best the researchers can hope to talk about is "what subscribers to Harvard Business Review think." In a loose way, it might make sense to think of that population as representing the more sophisticated portion of corporate management. Unfortunately, the overall response rate was 25 percent. Although that's quite good for subscriber surveys, it's a low response rate in terms of generalizing from probability samples. Beyond that, however, the disproportionate sample design creates another problem. When the authors state that 73 percent of respondents favor company policies against harassment (Collins and Blodgett 1981: 78), that figure is undoubtedly too high, because the sample contains a disproportionately high percentage of women, who are more likely than men to favor such policies. And, when the researchers report that top managers are more likely to feel that claims of sexual harassment are exaggerated than are middle- and lower-level managers (1981: 81), that finding is also suspect. As the researchers report, women are disproportionately represented in lower management. That alone might account for the apparent differences among levels of management.
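To see how gender weighting would repair such an oversample, consider a sketch using the counts reported above (960 men, 812 women, against a subscriber population that is 93% male and 7% female). This is a generic inverse-probability calculation of our own, not the authors' procedure:

```python
respondents = {"male": 960, "female": 812}   # response counts reported in the text
pop_share = {"male": 0.93, "female": 0.07}   # HBR's subscriber proportions

total = sum(respondents.values())

# Weight each group by population share / sample share (inverse of relative
# selection probability).
weights = {g: pop_share[g] / (respondents[g] / total) for g in respondents}

def weighted_percent(pct_by_group):
    """Combine group-level percentages into one population-weighted figure."""
    num = sum(pct_by_group[g] * respondents[g] * weights[g] for g in respondents)
    den = sum(respondents[g] * weights[g] for g in respondents)
    return num / den

# 66% of men and 32% of women called harassment "greatly exaggerated";
# once weighted, the combined figure sits close to the male percentage.
combined = weighted_percent({"male": 66, "female": 32})
```

The weighted figure is simply 0.93 × 66 + 0.07 × 32 ≈ 63.6 percent, far nearer the male response than the unweighted sample average would be, because men dominate the subscriber population.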
In short, the failure to take account of the oversampling of women confounds all survey results that don't separate the findings by gender. The solution to this problem would have been to weight the responses by gender, as described earlier in this section. In the 2000 and 2004 election campaign polling, survey weighting became a controversial


topic, as some polling agencies weighted their results on the basis of party affiliation and other variables, whereas others did not. Weighting in this instance involved assumptions regarding the differential participation of Republicans and Democrats in opinion polls and on election day, plus a determination of how many Republicans and Democrats there were. This is likely to be a topic of debate among pollsters and politicians in the years to come. Alan Reifman has created a website devoted to a discussion of this topic (http://www.hs.ttu.edu/hdfs3390/weighting.htm).

predict an election but can't interview all voters. As we proceed through the book, we'll see in greater detail how social researchers have found ways to deal with this issue.


Social researchers must select observations that will allow them to generalize to people and events not observed. Often this involves sampling, a selection of people to observe.

Probability Sampling


Understanding the logic of sampling is essential to doing social research.

in Review

Much of this chapter has been devoted to the key sampling method used in controlled survey research: probability sampling. In each of the variations examined, we've seen that elements are chosen for study from a population on a basis of random selection with known nonzero probabilities. Depending on the field situation, probability sampling can be either very simple or extremely difficult, time-consuming, and expensive. Whatever the situation, however, it remains the most effective method for the selection of study elements. There are two reasons for this. First, probability sampling avoids researchers' conscious or unconscious biases in element selection. If all elements in the population have an equal (or unequal and subsequently weighted) chance of selection, there is an excellent chance that the sample so selected will closely represent the population of all elements. Second, probability sampling permits estimates of sampling error. Although no probability sample will be perfectly representative in all respects, controlled selection methods permit the researcher to estimate the degree of expected error. In this lengthy chapter, we've taken on a basic issue in much social research: selecting observations that will tell us something more general than the specifics we've actually observed. This issue confronts field researchers, who face more action and more actors than they can observe and record fully, as well as political pollsters who want to



A Brief History of Sampling

Sometimes you can and should select probability samples using precise statistical techniques, but other times nonprobability techniques are more appropriate.

Nonprobability Sampling

Nonprobability sampling techniques include relying on available subjects, purposive or judgmental sampling, snowball sampling, and quota sampling. In addition, researchers studying a social group may make use of informants. Each of these techniques has its uses, but none of them ensures that the resulting sample will be representative of the population being sampled.

The Theory and Logic of Probability Sampling

Probability sampling methods provide an excellent way of selecting representative samples from large, known populations. These methods counter the problems of conscious and unconscious sampling bias by giving each element in the population a known (nonzero) probability of selection.


The key to probability sampling is random selection.


The most carefully selected sample will never provide a perfect representation of the population from which it was selected. There will always be some degree of sampling error.



By predicting the distribution of samples with respect to the target parameter, probability sampling methods make it possible to estimate the amount of sampling error expected in a given sample. The expected error in a sample is expressed in terms of confidence levels and confidence intervals.

Online Study Resources

If the members of a population have unequal probabilities of selection into the sample, researchers must assign weights to the different observations made, in order to provide a representative picture of the total population. The weight assigned to a particular sample member should be the inverse of its probability of selection.

4. In Chapter 9, we'll discuss surveys conducted on the Internet. Can you anticipate possible problems concerning sampling frames, representativeness, and the like? Do you see any solutions? 5. Using InfoTrac College Edition, locate studies using (1) a quota sample, (2) a multistage cluster sample, and (3) a systematic sample. Write a brief description of each study.

Populations and Sampling Frames

A sampling frame is a list or quasi list of the members of a population. It is the resource used in the selection of a sample. A sample's representativeness depends directly on the extent to which a sampling frame contains all the members of the total population that the sample is intended to represent.

Types of Sampling Designs




Several sampling designs are available to researchers. Simple random sampling is logically the most fundamental technique in probability sampling, but it is seldom used in practice. Systematic sampling involves the selection of every kth member from a sampling frame. This method is more practical than simple random sampling; with a few exceptions, it is functionally equivalent. Stratification, the process of grouping the members of a population into relatively homogeneous strata before sampling, improves the representativeness of a sample by reducing the degree of sampling error.

Multistage Cluster Sampling


Multistage cluster sampling is a relatively complex sampling technique that frequently is used when a list of all the members of a population does not exist. Typically, researchers must balance the number of clusters and the size of each cluster to achieve a given sample size. Stratification can be used to reduce the sampling error involved in multistage cluster sampling. Probability proportionate to size (PPS) is a special, efficient method for multistage cluster sampling.


The following terms are defined in context in the chapter and at the bottom of the page where the term is introduced, as well as in the comprehensive glossary at the back of the book. cluster sampling confidence interval confidence level element EPSEM informant nonprobability sampling parameter population PPS probability sampling purposive (judgmental) sampling quota sampling random selection

representativeness sampling error sampling frame sampling interval sampling ratio sampling unit simple random sampling snowball sampling statistic stratification study population systematic sampling weighting


1. Review the discussion of the 1948 Gallup Poll that predicted that Thomas Dewey would defeat Harry Truman for president. What are some ways Gallup could have modified his quota sample design to avoid the error? 2. Using Appendix C of this book, select a simple random sample of 10 numbers in the range of 1 to 9,876. What is each step in the process? 3. What are the steps involved in selecting a multistage cluster sample of students taking first-year English in U.S. colleges and universities?

Frankfort-Nachmias, Chava, and Anna Leon-Guerrero. 2000. Social Statistics for a Diverse Society. 2nd ed. Thousand Oaks, CA: Pine Forge Press. See Chapter 11 especially. This statistics textbook covers many of the topics we've discussed in this chapter but in a more statistical context. It demonstrates the links between probability sampling and statistical analyses. Kalton, Graham. 1983. Introduction to Survey Sampling. Newbury Park, CA: Sage. Kalton goes into more of the mathematical details of sampling than the present chapter does, without attempting to be as definitive as Kish, described next. Kish, Leslie. 1965. Survey Sampling. New York: Wiley. Unquestionably the definitive work on sampling in social research. Kish's coverage ranges from the simplest matters to the most complex and mathematical, both highly theoretical and downright practical. Easily readable and difficult passages intermingle as Kish exhausts everything you could want or need to know about each aspect of sampling. Sudman, Seymour. 1983. "Applied Sampling." Pp. 145–94 in Handbook of Survey Research, edited by Peter H. Rossi, James D. Wright, and Andy B. Anderson. New York: Academic Press. An excellent, practical guide to survey sampling.


See the booklet that accompanies your text for exercises using SPSS (Statistical Package for the Social Sciences). There are exercises offered for each chapter, and you'll also find a detailed primer on using SPSS.


Online Study Resources SociologyNow: Research Methods 1. Before you do your final review of the chapter, take the SociologyNow: Research Methods diagnostic quiz to help identify the areas on which you should concentrate. You'll find information on this online tool, as well as instructions on how to access all of its great resources, in the front of the book. 2. As you review, take advantage of the SociologyNow: Research Methods customized study plan, based on your quiz results. Use this study plan with its interactive exercises and other resources to master the material. 3. When you're finished with your review, take the posttest to confirm that you're ready to move on to the next chapter.

WEBSITE FOR THE PRACTICE OF SOCIAL RESEARCH 11TH EDITION Go to your book's website at http://sociology for tools to aid you in studying for your exams. You'll find Tutorial Quizzes with feedback, Internet Exercises, Flashcards, and Chapter Tutorials, as well as Extended Projects, InfoTrac College Edition search terms, Social Research in Cyberspace, GSS Data, Web Links, and primers for using various data-analysis software such as SPSS and NVivo.

WEB LINKS FOR THIS CHAPTER Please realize that the Internet is an evolving entity, subject to change. Nevertheless, these few websites should be fairly stable. Also, check your book's website for even more Web Links. These websites, current at the time of this book's publication, provide opportunities to learn about sampling.

Bill Trochim, Probability Sampling http://www.socialresearchmethods.net/kb/sampprob.htm Survey Sampling, Inc., The Frame Bureau of Labor Statistics and Census Bureau, Sampling http://www.bls.census.gov/cps/bsampdes.htm

Having explored the structuring of inquiry in depth, we're now ready to dive into the various observational techniques available to social scientists. Experiments are usually thought of in connection with the physical sciences. In Chapter 8 we'll see how social scientists use experiments. This is the most rigorously controllable of the methods we'll examine. Understanding experiments is also a useful way to enhance your understanding of the general logic of social scientific research. Chapter 9 will describe survey research, one of the most popular methods in social science. This type of research involves collecting data by asking people questions, either in self-administered questionnaires or through interviews, which, in turn, can be conducted face-to-face or over the telephone. Chapter 10, on qualitative field research, examines perhaps the most natural form of data collection used by social scientists: the direct observation of social phenomena in natural settings. As you'll see, some researchers go beyond mere observation to participate in what they're studying, because they want a more intimate view and a fuller understanding of it.

Chapter 11 discusses three forms of unobtrusive data collection that take advantage of some of the data available all around us. For example, content analysis is a method of collecting social data through carefully specifying and counting social artifacts such as books, songs, speeches, and paintings. Without making any personal contact with people, you can use this method to examine a wide variety of social phenomena. The analysis of existing statistics offers another way of studying people without having to talk to them. Governments and a variety of private organizations regularly compile great masses of data, which you often can use with little or no modification to answer properly posed questions. Finally, historical documents are a valuable resource for social scientific analysis. Chapter 12, on evaluation research, looks at a rapidly growing subfield in social science, involving the application of experimental and quasi-experimental models to the testing of social interventions in real life. You might use evaluation research, for example, to test the effectiveness of a drug rehabilitation program or the efficiency of a new school cafeteria. In the same chapter,

we'll look briefly at social indicators as a way of assessing broader social processes. Before we turn to the actual descriptions of these research methods, two points should be made. First, you'll probably discover that you've been using these scientific methods casually in your daily life for as long as you can remember. You use some form of field research every day. You employ a crude form of content analysis every time you judge an author's motivation from her or his writings. You engage in at least casual experiments frequently. Part 3 will show you how to improve your use of these methods so as to avoid certain pitfalls. Second, none of the data-collection methods described in these chapters is appropriate to all research topics and situations. I give you some ideas, early in each chapter, of when a given method might be appropriate. Still, I could never anticipate all the research topics that may one day interest you. As a general guideline, you should always use a variety of techniques in the study of any topic. Because each method has its weaknesses, the use of several methods can help fill in any gaps; if the different, independent approaches to the topic all yield the same conclusion, you've achieved a form of replication.

Topics Appropriate to Experiments



Introduction Topics Appropriate to Experiments The Classical Experiment Independent and Dependent Variables Pretesting and Posttesting Experimental and Control Groups The Double-Blind Experiment Selecting Subjects Probability Sampling Randomization Matching Matching or Randomization?

Variations on Experimental Design Preexperimental Research Designs Validity Issues in Experimental Research An Illustration of Experimentation Alternative Experimental Settings Web-Based Experiments "Natural" Experiments Strengths and Weaknesses of the Experimental Method

SociologyNow: Research Methods Use this online tool to help you make the grade on your next exam. After reading this chapter, go to the "Online Study Resources" at the end of the chapter for instructions on how to benefit from SociologyNow: Research Methods.

This chapter addresses the research method most commonly associated with structured science in general: the experiment. Here we'll focus on the experiment as a mode of scientific observation in social research. At base, experiments involve (1) taking action and (2) observing the consequences of that action. Social researchers typically select a group of subjects, do something to them, and observe the effect of what was done. In this chapter, we'll examine the logic and some of the techniques of social scientific experiments. It's worth noting at the outset that we often use experiments in nonscientific inquiry. In preparing a stew, for example, we add salt, taste, add more salt, and taste again. In defusing a bomb, we clip the red wire, observe whether the bomb explodes, clip another, and ... We also experiment copiously in our attempts to develop generalized understandings about the world we live in. All skills are learned through experimentation: eating, walking, talking, riding a bicycle, swimming, and so forth. Through experimentation, students discover how much studying is required for academic success. Through experimentation, professors learn how much preparation is required for successful lectures. This chapter discusses how social researchers use experiments to develop generalized understandings. We'll see that, like other methods available to the social researcher, experimenting has its special strengths and weaknesses.

Topics Appropriate to Experiments Experiments are more appropriate for some topics and research purposes than others. Experiments are especially well suited to research projects involving relatively limited and well-defined concepts and propositions. In terms of the traditional image of science, discussed earlier in this book, the experimental model is especially appropriate for


hypothesis testing. Because experiments focus on determining causation, they're also better suited to explanatory than to descriptive purposes. Let's assume, for example, that we want to discover ways of reducing prejudice against African Americans. We hypothesize that learning about the contribution of African Americans to U.S. history will reduce prejudice, and we decide to test this hypothesis experimentally. To begin, we might test a group of experimental subjects to determine their levels of prejudice against African Americans. Next, we might show them a documentary film depicting the many important ways African Americans have contributed to the scientific, literary, political, and social development of the nation. Finally, we would measure our subjects' levels of prejudice against African Americans to determine whether the film has actually reduced prejudice. Experimentation has also been successful in the study of small-group interaction. Thus, we might bring together a small group of experimental subjects and assign them a task, such as making recommendations for popularizing car pools. We observe, then, how the group organizes itself and deals with the problem. Over the course of several such experiments, we might systematically vary the nature of the task or the rewards for handling the task successfully. By observing differences in the way groups organize themselves and operate under these varying conditions, we can learn a great deal about the nature of small-group interaction and the factors that influence it. For example, attorneys sometimes present evidence in different ways to different mock juries, to see which method is the most effective. We typically think of experiments as being conducted in laboratories. Indeed, most of the examples in this chapter involve such a setting. This need not be the case, however. Increasingly, social researchers are using the World Wide Web as a vehicle for conducting experiments.
Further, sometimes we can construct what are called natural experiments: "experiments" that occur in the regular course of social events. The latter portion of this chapter deals with such research.


Chapter 8: Experiments

The Classical Experiment In both the natural and the social sciences, the most conventional type of experiment involves three major pairs of components: (1) independent and dependent variables, (2) pretesting and posttesting, and (3) experimental and control groups. This section looks at each of these components and the way they're put together in the execution of the experiment.

Independent and Dependent Variables Essentially, an experiment examines the effect of an independent variable on a dependent variable. Typically, the independent variable takes the form of an experimental stimulus, which is either present or absent. That is, the stimulus is a dichotomous variable, having two attributes, present or not present. In this typical model, the experimenter compares what happens when the stimulus is present to what happens when it is not. In the example concerning prejudice against African Americans, prejudice is the dependent variable and exposure to African American history is the independent variable. The researcher's hypothesis suggests that prejudice depends, in part, on a lack of knowledge of African American history. The purpose of the experiment is to test the validity of this hypothesis by presenting some subjects with an appropriate stimulus, such as a documentary film. In other terms, the independent variable is the cause and the dependent variable is the effect. Thus, we might say that watching the film caused a change in prejudice or that reduced prejudice was an effect of watching the film. The independent and dependent variables appropriate to experimentation are nearly limitless. Moreover, a given variable might serve as an independent variable in one experiment and as a

pretesting The measurement of a dependent variable among subjects. posttesting The remeasurement of a dependent variable among subjects after they've been exllosed to an independent variable.

dependent variable in another. For example, prejudice is the dependent variable in our example, but it might be the independent variable in an experiment examining the effect of prejudice on voting behavior. To be used in an experiment, both independent and dependent variables must be operationally defined. Such operational definitions might involve a variety of observation methods. Responses to a questionnaire, for example, might be the basis for defining prejudice. Speaking to or ignoring African Americans, or agreeing or disagreeing with them, might be elements in the operational definition of interaction with African Americans in a small-group setting. Conventionally, in the experimental model, dependent and independent variables must be operationally defined before the experiment begins. However, as you'll see in connection with survey research and other methods, it's sometimes appropriate to make a wide variety of observations during data collection and then determine the most useful operational definitions of variables during later analyses. Ultimately, however, experimentation, like other quantitative methods, requires specific and standardized measurements and observations.

Pretesting and Posttesting

In the simplest experimental design, subjects are measured in terms of a dependent variable (pretesting), exposed to a stimulus representing an independent variable, and then remeasured in terms of the dependent variable (posttesting). Any differences between the first and last measurements on the dependent variable are then attributed to the independent variable. In the example of prejudice and exposure to African American history, we'd begin by pretesting the extent of prejudice among our experimental subjects. Using a questionnaire asking about attitudes toward African Americans, for example, we could measure both the extent of prejudice exhibited by each individual subject and the average prejudice level of the whole group. After exposing the subjects to the African American history film, we could administer the same questionnaire again. Responses given in this posttest would permit us to

measure the later extent of prejudice for each subject and the average prejudice level of the group as a whole. If we discovered a lower level of prejudice during the second administration of the questionnaire, we might conclude that the film had indeed reduced prejudice. In the experimental examination of attitudes such as prejudice, we face a special practical problem relating to validity. As you may already have imagined, the subjects might respond differently to the questionnaires the second time even if their attitudes remain unchanged. During the first administration of the questionnaire, the subjects might be unaware of its purpose. By the second measurement, they might have figured out that the researchers were interested in measuring their prejudice. Because no one wishes to seem prejudiced, the subjects might "clean up" their answers the second time around. Thus, the film would seem to have reduced prejudice although, in fact, it had not. This is an example of a more general problem that plagues many forms of social research: The very act of studying something may change it. The techniques for dealing with this problem in the context of experimentation will be discussed in various places throughout the chapter. The first technique involves the use of control groups.
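The simplest pretest-posttest comparison described above amounts to a small computation. The following Python sketch is purely illustrative: the prejudice scores and the scoring scale are hypothetical assumptions, not taken from the text.

```python
# A minimal sketch of the one-group pretest-posttest comparison.
# Scores are hypothetical; higher = more prejudiced.

def mean(scores):
    return sum(scores) / len(scores)

pretest = [7, 6, 8, 5, 9, 6]    # measured before showing the film
posttest = [5, 5, 6, 4, 7, 5]   # same subjects, after the film

change = mean(posttest) - mean(pretest)
print(f"Average change in prejudice score: {change:+.2f}")  # prints -1.50
```

A negative change looks as though the film reduced prejudice, but, as the text stresses, without a control group the drop could equally reflect testing effects rather than the stimulus itself.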

Experimental and Control Groups

Laboratory experiments seldom, if ever, involve only the observation of an experimental group to which a stimulus has been administered. In addition, the researchers also observe a control group, which does not receive the experimental stimulus. In the example of prejudice and African American history, we might examine two groups of subjects. To begin, we give each group a questionnaire designed to measure their prejudice against African Americans. Then we show the film only to the experimental group. Finally, we administer a posttest of prejudice to both groups. Figure 8-1 illustrates this basic experimental design. Using a control group allows the researcher to detect any effects of the experiment itself. If the posttest shows that the overall level of prejudice exhibited by the control group has dropped as

[Figure 8-1 diagram: Experimental group: measure dependent variable; administer experimental stimulus (film); remeasure dependent variable. Control group: measure dependent variable; remeasure dependent variable. Compare the two pretests (same?) and the two posttests (different?).]
FIGURE 8-1 Diagram of Basic Experimental Design. The fundamental purpose of an experiment is to isolate the possible effect of an independent variable (called the stimulus in experiments) on a dependent variable. Members of the experimental group(s) are exposed to the stimulus, while those in the control group(s) are not.

much as that of the experimental group, then the apparent reduction in prejudice must be a function of the experiment or of some external factor rather than a function of the film. If, on the other hand, prejudice is reduced only in the experimental group, this reduction would seem to be a consequence of exposure to the film, because that's the only difference between the two groups. Alternatively, if prejudice is reduced in both groups but to a greater degree in the experimental group than in the control group, that, too, would be grounds for assuming that the film reduced prejudice. The need for control groups in social research became clear in connection with a series of studies

experimental group In experimentation, a group of subjects to whom an experimental stimulus is administered.

control group In experimentation, a group of subjects to whom no experimental stimulus is administered and who should resemble the experimental group in all other respects. The comparison of the control group and the experimental group at the end of the experiment points to the effect of the experimental stimulus.


Chapter 8: Experiments

of employee satisfaction conducted by F. J. Roethlisberger and W. J. Dickson (1939) in the late 1920s and early 1930s. These two researchers were interested in discovering what changes in working conditions would improve employee satisfaction and productivity. To pursue this objective, they studied working conditions in the telephone "bank wiring room" of the Western Electric Works in the Chicago suburb of Hawthorne, Illinois. To the researchers' great satisfaction, they discovered that improving the working conditions increased satisfaction and productivity consistently. As the workroom was brightened up through better lighting, for example, productivity went up. When lighting was further improved, productivity went up again. To further substantiate their scientific conclusion, the researchers then dimmed the lights. Whoops-productivity improved again! At this point it became evident that the wiring-room workers were responding more to the attention given them by the researchers than to improved working conditions. As a result of this phenomenon, often called the Hawthorne effect, social researchers have become more sensitive to and cautious about the possible effects of experiments themselves. In the wiring-room study, the use of a proper control group-one that was studied intensively without any other changes in the working conditions-would have pointed to the presence of this effect. The need for control groups in experimentation has been nowhere more evident than in medical research. Time and again, patients who participate in medical experiments have appeared to improve, but it has been unclear how much of the improvement has come from the experimental treatment and how much from the experiment. In testing the effects of new drugs, then, medical researchers frequently administer a placebo-a "drug" with no relevant effect, such as sugar pills-to a control group. Thus, the control-group patients believe that

double-blind experiment An experimental design in which neither the subjects nor the experimenters know which is the experimental group and which is the control.


they, like the experimental group, are receiving an experimental drug. Often, they improve. If the new drug is effective, however, those receiving the actual drug will improve more than those receiving the placebo. In social scientific experiments, control groups guard against not only the effects of the experiments themselves but also the effects of any events outside the laboratory during the experiments. In the example of the study of prejudice, suppose that a popular African American leader is assassinated in the middle of, say, a weeklong experiment. Such an event may very well horrify the experimental subjects, requiring them to examine their own attitudes toward African Americans, with the result of reduced prejudice. Because such an effect should happen about equally for members of the control and experimental groups, a greater reduction of prejudice among the experimental group would, again, point to the impact of the experimental stimulus: the documentary film. Sometimes an experimental design requires more than one experimental or control group. In the case of the documentary film, for example, we might also want to examine the impact of reading a book on African American history. In that case, we might have one group see the film and read the book, another group only see the movie, still another group only read the book, and the control group do neither. With this kind of design, we could determine the impact of each stimulus separately, as well as their combined effect.

The Double-Blind Experiment

Like patients who improve when they merely think they're receiving a new drug, sometimes experimenters tend to prejudge results. In medical research, the experimenters may be more likely to "observe" improvements among patients receiving the experimental drug than among those receiving the placebo. (This would be most likely, perhaps, for the researcher who developed the drug.) A double-blind experiment eliminates this possibility, because in this design neither the subjects nor the experimenters know which is the experimental group and which is the control. In the

medical case, those researchers who were responsible for administering the drug and for noting improvements would not be told which subjects were receiving the drug and which the placebo. Conversely, the researcher who knew which subjects were in which group would not administer the experiment. In social scientific experiments, as in medical experiments, the danger of experimenter bias is further reduced to the extent that the operational definitions of the dependent variables are clear and precise. Thus, medical researchers would be less likely to unconsciously bias their reading of a patient's temperature than they would be to bias their assessment of how lethargic the patient was. For the same reason, the small-group researcher would be less likely to misperceive which subject spoke, or to whom he or she spoke, than whether the subject's comments sounded cooperative or competitive, a more subjective judgment that's difficult to define in precise behavioral terms. As I've indicated several times, seldom can we devise operational definitions and measurements that are wholly precise and unambiguous. This is another reason why it can be appropriate to employ a double-blind design in social research experiments.

Selecting Subjects

In Chapter 7 we discussed the logic of sampling, which involves selecting a sample that is representative of some population. Similar considerations apply to experiments. Because most social researchers work in colleges and universities, it seems likely that research laboratory experiments would be conducted with college undergraduates as subjects. Typically, the experimenter asks students enrolled in his or her classes to participate in experiments or advertises for subjects in a college newspaper. Subjects may or may not be paid for participating in such experiments (recall also from Chapter 3 the ethical issues involved in asking students to participate in such studies). In relation to the norm of generalizability in science, this tendency clearly represents a potential


defect in social research. Simply put, college undergraduates are not typical of the public at large. There is a danger, therefore, that we may learn much about the attitudes and actions of college undergraduates but not about social attitudes and actions in general. However, this potential defect is less significant in explanatory research than in descriptive research. True, having noted the level of prejudice among a group of college undergraduates in our pretesting, we would have little confidence that the same level existed among the public at large. On the other hand, if we found that a documentary film reduced whatever level of prejudice existed among those undergraduates, we would have more confidence-without being certain-that it would have a comparable effect in the community at large. Social processes and patterns of causal relationships appear to be more generalizable and more stable than specific characteristics such as an individual's level of prejudice. Aside from the question of generalizability, the cardinal rule of subject selection in experimentation concerns the comparability of experimental and control groups. Ideally, the control group represents what the experimental group would be like if it had not been exposed to the experimental stimulus. The logic of experiments requires, therefore, that experimental and control groups be as similar as possible. There are several ways to accomplish this.

Probability Sampling

The discussions of the logic and techniques of probability sampling in Chapter 7 provide one method for selecting two groups of people that are similar to each other. Beginning with a sampling frame composed of all the people in the population under study, the researcher might select two probability samples. If these samples each resemble the total population from which they're selected, they'll also resemble each other. Recall also, however, that the degree of resemblance (representativeness) achieved by probability sampling is largely a function of the sample size. As a general guideline, probability samples of less than 100 are not likely to be terribly representative, and



social scientific experiments seldom involve that many subjects in either experimental or control groups. As a result, then, probability sampling is seldom used in experiments to select subjects from a larger population. Researchers do, however, use the logic of random selection when they assign subjects to groups.

Randomization

Having recruited, by whatever means, a total group of subjects, the experimenter may randomly assign those subjects to either the experimental or the control group. The researcher might accomplish such randomization by numbering all of the subjects serially and selecting numbers by means of a random number table. Alternatively, the experimenter might assign the odd-numbered subjects to the experimental group and the even-numbered subjects to the control group. Let's return again to the basic concept of probability sampling. If we recruit 40 subjects all together, in response to a newspaper advertisement, for example, there's no reason to believe that the 40 subjects represent the entire population from which they've been drawn. Nor can we assume that the 20 subjects randomly assigned to the experimental group represent that larger population. We can have greater confidence, however, that the 20 subjects randomly assigned to the experimental group will be reasonably similar to the 20 assigned to the control group. Following the logic of our earlier discussions of sampling, we can see our 40 subjects as a population from which we select two probability samples, each consisting of half the population. Because each

randomization A technique for assigning experimental subjects to experimental and control groups randomly.

matching In connection with experiments, the procedure whereby pairs of subjects are matched on the basis of their similarities on one or more variables, and one member of the pair is assigned to the experimental group and the other to the control group.



sample reflects the characteristics of the total population, the two samples will mirror each other. As we saw in Chapter 7, our assumption of similarity in the two groups depends in part on the number of subjects involved. In the extreme case, if we recruited only two subjects and assigned, by the flip of a coin, one as the experimental subject and one as the control, there would be no reason to assume that the two subjects are similar to each other. With larger numbers of subjects, however, randomization makes good sense.
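Random assignment of a recruited pool is straightforward to carry out in code. This Python sketch is a hypothetical illustration, using the standard library's `random.Random.shuffle`; the subject labels are invented stand-ins for the 40 recruits in the newspaper-advertisement example.

```python
import random

def randomize(subjects, seed=None):
    """Randomly assign subjects to experimental and control groups
    of equal size by shuffling the pool and splitting it in half."""
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]  # (experimental, control)

# 40 hypothetical recruits, as in the newspaper-advertisement example
recruits = [f"subject_{i:02d}" for i in range(40)]
experimental, control = randomize(recruits, seed=1)
print(len(experimental), len(control))  # prints: 20 20
```

Note that, as the text argues, this shuffled split makes the two groups similar to each other without making either one representative of any larger population.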


Matching

Another way to achieve comparability between the experimental and control groups is through matching. This process is similar to the quota sampling methods discussed in Chapter 7. If 12 of our subjects are young white men, we might assign 6 of them at random to the experimental group and the other 6 to the control group. If 14 are middle-aged African American women, we might assign 7 to each group. We repeat this process for every relevant grouping of subjects. The overall matching process could be most efficiently achieved through the creation of a quota matrix constructed of all the most relevant characteristics. Figure 8-2 provides a simplified illustration of such a matrix. In this example, the experimenter has decided that the relevant characteristics are race, age, and gender. Ideally, the quota matrix is constructed to result in an even number of subjects in each cell of the matrix. Then, half the subjects in each cell go into the experimental group and half into the control group. Alternatively, we might recruit more subjects than our experimental design requires. We might then examine many characteristics of the large initial group of subjects. Whenever we discover a pair of quite similar subjects, we might assign one at random to the experimental group and the other to the control group. Potential subjects who are unlike anyone else in the initial group might be left out of the experiment altogether. Whatever method we employ, the desired result is the same. The overall average description of



FIGURE 8-2 Quota Matrix Illustration. Sometimes the experimental and control groups are created by finding pairs of matching subjects and assigning one to the experimental group and the other to the control group.

the experimental group should be the same as that of the control group. For example, on average both groups should have about the same ages, the same gender composition, the same racial composition, and so forth. This test of comparability should be used whether the two groups are created through probability sampling or through randomization. Thus far I've referred to the "relevant" variables without saying clearly what those variables are. Of course, these variables cannot be specified in any definite way, any more than I could specify in Chapter 7 which variables should be used in stratified sampling. Which variables are relevant ultimately depends on the nature and purpose of an experiment. As a general rule, however, the control and experimental groups should be comparable in terms of those variables that are most likely to be related to the dependent variable under study. In a study of prejudice, for example, the two groups should be alike in terms of education, ethnicity, and age, among other characteristics. In some cases, moreover, we may delay assigning subjects to experimental and control groups until we have initially measured the dependent variable. Thus, for example, we might administer a questionnaire measuring subjects' prejudice and then match the

experimental and control groups on this variable to assure ourselves that the two groups exhibit the same overall level of prejudice.
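The quota-matrix procedure described above can be sketched in code. This is a hypothetical illustration only: the subject records, attribute names, and cell sizes are invented, and setting aside the odd leftover subject in an unevenly sized cell is just one possible convention (the text suggests leaving unmatched subjects out).

```python
import random
from collections import defaultdict

def quota_match(subjects, keys, seed=None):
    """Group subjects into quota-matrix cells by the relevant
    characteristics, then randomly split each cell in half."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for s in subjects:
        cells[tuple(s[k] for k in keys)].append(s)
    experimental, control = [], []
    for members in cells.values():
        rng.shuffle(members)
        half = len(members) // 2
        experimental += members[:half]
        control += members[half:half * 2]  # odd leftovers are set aside
    return experimental, control

# Hypothetical subject records mirroring the text's example:
# 12 young white men and 14 middle-aged African American women.
subjects = (
    [{"race": "white", "age": "young", "gender": "man"}] * 12
    + [{"race": "black", "age": "middle", "gender": "woman"}] * 14
)
exp, ctl = quota_match(subjects, keys=("race", "age", "gender"), seed=1)
print(len(exp), len(ctl))  # prints: 13 13
```

Each cell splits 6/6 and 7/7, so both groups end up with the same composition on every matching characteristic, which is exactly the comparability test the text describes.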

Matching or Randomization?

When assigning subjects to the experimental and control groups, you should be aware of two arguments in favor of randomization over matching. First, you may not be in a position to know in advance which variables will be relevant for the matching process. Second, most of the statistics used to analyze the results of experiments assume randomization. Failure to design your experiment that way, then, makes your later use of those statistics less meaningful. On the other hand, randomization only makes sense if you have a fairly large pool of subjects, so that the laws of probability sampling apply. With only a few subjects, matching would be a better procedure. Sometimes researchers can combine matching and randomization. When conducting an experiment on the educational enrichment of young adolescents, for example, J. Milton Yinger and his colleagues (1977) needed to assign a large number of



students, aged 13 and 14, to several different experimental and control groups to ensure the comparability of students composing each of the groups. They achieved this goal by the following method. Beginning with a pool of subjects, the researchers first created strata of students nearly identical to one another in terms of some 15 variables. From each of the strata, students were randomly assigned to the different experimental and control groups. In this fashion, the researchers actually improved on conventional randomization. Essentially, they had used a stratified-sampling procedure (Chapter 7), except that they had employed far more stratification variables than are typically used in, say, survey sampling. Thus far I've described the classical experiment-the experimental design that best represents the logic of causal analysis in the laboratory. In practice, however, social researchers use a great variety of experimental designs. Let's look at some now.
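The stratify-then-randomize procedure attributed to Yinger and his colleagues can be sketched roughly as follows. Everything concrete here is a hypothetical stand-in: the stratum rule, the group names, and the counts are invented, and real strata would be built from many matching variables rather than a single label.

```python
import random
from collections import defaultdict

def stratified_assign(students, stratum_of, groups, seed=None):
    """Within each stratum of near-identical students, deal members
    out at random across the experimental and control groups."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in students:
        strata[stratum_of(s)].append(s)
    assignment = {g: [] for g in groups}
    for members in strata.values():
        rng.shuffle(members)
        for i, s in enumerate(members):
            assignment[groups[i % len(groups)]].append(s)
    return assignment

# Hypothetical: 30 students falling into 5 strata, dealt into 3 groups.
students = list(range(30))
out = stratified_assign(students, stratum_of=lambda s: s % 5,
                        groups=["exp_A", "exp_B", "control"], seed=1)
print({g: len(v) for g, v in out.items()})  # 10 students per group
```

Because each stratum is split evenly across the groups, the groups are balanced on the stratification variables by construction, while the within-stratum shuffle preserves the statistical benefits of random assignment.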

Variations on Experimental Design

Donald Campbell and Julian Stanley (1963), in a classic book on research design, describe some 16 different experimental and quasi-experimental designs. This section describes some of these variations to better show the potential for experimentation in social research.

Preexperimental Research Designs

To begin, Campbell and Stanley discuss three "preexperimental" designs, not to recommend them but because they're frequently used in less-than-professional research. These designs are called "preexperimental" to indicate that they do not meet the scientific standards of experimental designs. In the first such design-the one-shot case study-the researcher measures a single group of subjects on a dependent variable following the administration of some experimental stimulus. Suppose, for example, that we show the African American history

film, mentioned earlier, to a group of people and then administer a questionnaire that seems to measure prejudice against African Americans. Suppose further that the answers given to the questionnaire seem to represent a low level of prejudice. We might be tempted to conclude that the film reduced prejudice. Lacking a pretest, however, we can't be sure. Perhaps the questionnaire doesn't really represent a sensitive measure of prejudice, or perhaps the group we're studying was low in prejudice to begin with. In either case, the film might have made no difference, though our experimental results might have misled us into thinking it did. The second preexperimental design discussed by Campbell and Stanley adds a pretest for the experimental group but lacks a control group. This design-which the authors call the one-group pretest-posttest design-suffers from the possibility that some factor other than the independent variable might cause a change between the pretest and posttest results, such as the assassination of a respected African American leader. Thus, although we can see that prejudice has been reduced, we can't be sure that the film is what caused that reduction. To round out the possibilities for preexperimental designs, Campbell and Stanley point out that some research is based on experimental and control groups but has no pretests. They call this design the static-group comparison. For example, we might show the African American history film to one group and not to another and then measure prejudice in both groups. If the experimental group had less prejudice at the conclusion of the experiment, we might assume the film was responsible. But unless we had randomized our subjects, we would have no way of knowing that the two groups had the same degree of prejudice initially; perhaps the experimental group started out with less.
Figure 8-3 graphically illustrates these three preexperimental research designs by using a different research question: Does exercise cause weight reduction? To make the several designs clearer, the figure shows individuals rather than groups, but the same logic pertains to group comparisons. Let's review the three preexperimental designs in this new example.


[Figure 8-3 panels, each observed at Times 1, 2, and 3:
One-Shot Case Study: A man who exercises is observed to be in trim shape, judged against some intuitive standard of what constitutes a trim shape.
One-Group Pretest-Posttest Design: An overweight man who exercises is later observed to be in trim shape.
Static-Group Comparison: A man who exercises is observed to be in trim shape while one who doesn't is observed to be overweight.]

FIGURE 8-3 Three Preexperimental Research Designs. These preexperimental designs anticipate the logic of true experiments but leave themselves open to errors of interpretation. Can you see the errors that might be made in each of these designs? The various risks are solved by the addition of control groups, pretesting, and posttesting.

The one-shot case study represents a common form of logical reasoning in everyday life. Asked whether exercise causes weight reduction, we may bring to mind an example that would seem to support the proposition: someone who exercises and is thin. There are problems with this reasoning, however. Perhaps the person was thin long before

beginning to exercise. Or perhaps he became thin for some other reason, like eating less or getting sick. The observations shown in the diagram do not guard against these other possibilities. Moreover, the observation that the man in the diagram is in trim shape depends on our intuitive idea of what constitutes trim and overweight body



shapes. All told, this is very weak evidence for testing the relationship between exercise and weight loss. The one-group pretest-posttest design offers somewhat better evidence that exercise produces weight loss. Specifically, we've ruled out the possibility that the man was thin before beginning to exercise. However, we still have no assurance that his exercising is what caused him to lose weight. Finally, the static-group comparison eliminates the problem of our questionable definition of what constitutes trim or overweight body shapes. In this case, we can compare the shapes of the man who exercises and the one who does not. This design, however, reopens the possibility that the man who exercises was thin to begin with.

Validity Issues in Experimental Research

At this point I want to present in a more systematic way the factors that affect the validity of experimental research. First we'll look at what Campbell and Stanley call the sources of internal invalidity, reviewed and expanded in a follow-up book by Thomas Cook and Donald Campbell (1979). Then we'll consider the problem of generalizing experimental results to the "real" world, referred to as external invalidity. Having examined these, we'll be in a position to appreciate the advantages of some of the more sophisticated experimental and quasi-experimental designs social science researchers sometimes use.

Sources of Internal Invalidity

The problem of internal invalidity refers to the possibility that the conclusions drawn from experimental results may not accurately reflect what has

internal invalidity Refers to the possibility that the conclusions drawn from experimental results may not accurately reflect what went on in the experiment itself.


gone on in the experiment itself. The threat of internal invalidity is present whenever anything other than the experimental stimulus can affect the dependent variable. Campbell and Stanley (1963: 5-6) and Cook and Campbell (1979: 51-55) point to several sources of internal invalidity. Here are 12:

1. History. During the course of the experiment, historical events may occur that will confound the experimental results. The assassination of an African American leader during the course of an experiment on reducing anti-African American prejudice is one example; the arrest of an African American leader for some heinous crime, which might increase prejudice, is another.

2. Maturation. People are continually growing and changing, and such changes can affect the results of the experiment. In a long-term experiment, the fact that the subjects grow older (and wiser?) may have an effect. In shorter experiments, they may grow tired, sleepy, bored, or hungry, or change in other ways that affect their behavior in the experiment.

3. Testing. As we have seen, often the process of testing and retesting influences people's behavior, thereby confounding the experimental results. Suppose we administer a questionnaire to a group as a way of measuring their prejudice. Then we administer an experimental stimulus and remeasure their prejudice. By the time we conduct the posttest, the subjects will probably have become more sensitive to the issue of prejudice and will be more thoughtful in their answers. In fact, they may have figured out that we're trying to find out how prejudiced they are, and, because few people like to appear prejudiced, they may give answers that they think we want or that will make them look good.

4. Instrumentation. The process of measurement in pretesting and posttesting brings in some of the issues of conceptualization and operationalization discussed earlier in the book.
If we use different measures of the dependent variable in the pretest and posttest (say, different questionnaires about prejudice), how can we be sure they're comparable to each other? Perhaps prejudice will seem to decrease simply because the pretest measure was

more sensitive than the posttest measure. Or if the measurements are being made by the experimenters, their standards or their abilities may change over the course of the experiment.

5. Statistical regression. Sometimes it's appropriate to conduct experiments on subjects who start out with extreme scores on the dependent variable. If you were testing a new method for teaching math to hard-core failures in math, you'd want to conduct your experiment on people who previously had done extremely poorly in math. But consider for a minute what's likely to happen to the math achievement of such people over time without any experimental interference. They're starting out so low that they can only stay at the bottom or improve: They can't get worse. Even without any experimental stimulus, then, the group as a whole is likely to show some improvement over time. Referring to a regression to the mean, statisticians often point out that extremely tall people as a group are likely to have children shorter than themselves, and extremely short people as a group are likely to have children taller than themselves. There is a danger, then, that changes occurring by virtue of subjects starting out in extreme positions will be attributed erroneously to the effects of the experimental stimulus.

6. Selection biases. We discussed selection bias earlier when we examined different ways of selecting subjects for experiments and assigning them to experimental and control groups. Comparisons don't have any meaning unless the groups are comparable at the start of an experiment.

7. Experimental mortality. Although some social experiments could, I suppose, kill subjects, experimental mortality refers to a more general and less extreme problem. Often, experimental subjects will drop out of the experiment before it's completed, and this can affect statistical comparisons and conclusions.
In the classical experiment involving an experimental and a control group, each with a pretest and posttest, suppose that the bigots in the experimental group are so offended by the African American history film that they tell the experimenter to forget it, and they leave. Those subjects sticking around for the posttest will have been less


prejudiced to start with, so the group results will reflect a substantial "decrease" in prejudice.

8. Causal time order. Though rare in social research, ambiguity about the time order of the experimental stimulus and the dependent variable can arise. Whenever this occurs, the research conclusion that the stimulus caused the dependent variable can be challenged with the explanation that the "dependent" variable actually caused changes in the stimulus.

9. Diffusion or imitation of treatments. When experimental and control-group subjects can communicate with each other, experimental subjects may pass on some elements of the experimental stimulus to the control group. For example, suppose there's a lapse of time between our showing of the African American history film and the posttest administration of the questionnaire. Members of the experimental group might tell control-group subjects about the film. In that case, the control group becomes affected by the stimulus and is not a real control. Sometimes we speak of the control group as having been "contaminated."

10. Compensation. As you'll see in Chapter 12, in experiments in real-life situations-such as a special educational program-subjects in the control group are often deprived of something considered to be of value. In such cases, there may be pressures to offer some form of compensation. For example, hospital staff might feel sorry for control-group patients and give them extra "tender loving care." In such a situation, the control group is no longer a genuine control group.

11. Compensatory rivalry. In real-life experiments, the subjects deprived of the experimental stimulus may try to compensate for the missing stimulus by working harder. Suppose an experimental math program is the experimental stimulus; the control group may work harder than before on their math in an attempt to beat the "special" experimental subjects.

12. Demoralization.
On the other hand, feelings of deprivation within the control group may result in their giving up . In educational experiments, demoralized control-group subjects may stop studying, act up, or get angry.


Chapter 8: Experiments

FIGURE 8-4 The Classical Experiment: Using an African American History Film to Reduce Prejudice. This diagram illustrates the basic structure of the classical experiment as a vehicle for testing the impact of a film on prejudice. Notice how the control group, the pretesting, and the posttesting function. (Figure panels: an experimental group and a control group, each with a pretest and posttest; the stimulus is administered only to the experimental group.)

These, then, are some of the sources of internal invalidity in experiments. Aware of these, experimenters have devised designs aimed at handling them. The classical experiment, if coupled with proper subject selection and assignment, addresses each of these problems. Let's look again at that study design, presented graphically in Figure 8-4.

If we use the experimental design shown in Figure 8-4, we should expect two findings. For the experimental group, the level of prejudice measured in their posttest should be less than was found in their pretest. In addition, when the two posttests are compared, less prejudice should be found in the experimental group than in the control group.

This design also guards against the problem of history in that anything occurring outside the experiment that might affect the experimental group should also affect the control group. Consequently, there should still be a difference in the two posttest results. The same comparison guards against problems of maturation as long as the subjects have been randomly assigned to the two groups. Testing and instrumentation can't be problems, because both the experimental and control groups are subject to the same tests and experimenter effects. If the subjects have been assigned to the two groups randomly, statistical regression should affect both equally, even if people with extreme scores on prejudice are being studied. Selection bias is ruled out by the random assignment of subjects. Experimental mortality is more complicated to handle, but the data provided in this study design offer several ways to deal with it. Slight modifications to the design (administering a placebo, such as a film having nothing to do with African Americans, to the control group, for example) can make the problem even easier to manage.

The remaining five problems of internal invalidity are avoided through the careful administration of a controlled experimental design. The experimental design we've been discussing facilitates the clear specification of independent and dependent variables. Experimental and control subjects can be kept separate, reducing the possibility of diffusion or imitation of treatments. Administrative controls can avoid compensations given to the control group, and compensatory rivalry can be watched for and taken into account in evaluating the results of the experiment, as can the problem of demoralization.

Sources of External Invalidity

Internal invalidity accounts for only some of the complications faced by experimenters. In addition, there are problems of what Campbell and Stanley call external invalidity, which relates to the generalizability of experimental findings to the "real" world. Even if the results of an experiment provide an accurate gauge of what happened during that experiment, do they really tell us anything about life in the wilds of society?

Campbell and Stanley describe four forms of this problem; I'll present one as an illustration. The generalizability of experimental findings is jeopardized, as the authors point out, if there's an interaction between the testing situation and the experimental stimulus (1963: 18). Here's an example of what they mean. Staying with the study of prejudice and the African American history film, let's suppose that our experimental group, in the classical experiment, has less prejudice in its posttest than in its pretest and that its posttest shows less prejudice than that of the control group. We can be confident that the film actually reduced prejudice among our experimental subjects. But would it have the same effect if the film were shown in theaters or on television? We can't be sure, because the film might be effective only when people have been sensitized to the issue of prejudice, as the subjects may have been in taking the pretest. This is an example of interaction between the testing and the stimulus. The classical experimental design cannot control for that possibility. Fortunately, experimenters have devised other designs that can. The Solomon four-group design (D. Campbell and Stanley 1963: 24-25) addresses the problem of testing interaction with the stimulus. As the name suggests, it involves four groups of subjects, assigned randomly from a pool. Figure 8-5 presents this design graphically.
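The random assignment from a common pool that these designs rely on can be illustrated in a few lines of code. This is a minimal sketch of my own (the function name and group sizes are illustrative, not from the study): subjects are shuffled and then dealt out round-robin into the four groups of the Solomon design, so that the groups are statistically equivalent before the experiment begins.

```python
import random

# Illustrative sketch (not from the text): randomly assign a pool of
# subjects to the four groups of the Solomon four-group design.
def assign_to_groups(subjects, n_groups=4, seed=None):
    pool = list(subjects)
    random.Random(seed).shuffle(pool)
    # Deal the shuffled subjects out round-robin, like cards.
    return [pool[i::n_groups] for i in range(n_groups)]

groups = assign_to_groups(range(40), seed=42)
print([len(g) for g in groups])  # four groups of 10
```

Because assignment is random rather than based on any characteristic of the subjects, differences among the resulting groups are due to chance alone, which is what justifies the comparisons discussed below.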

FIGURE 8-5 The Solomon Four-Group Design. The classical experiment runs the risk that pretesting will have an effect on subjects, so the Solomon four-group design adds experimental and control groups that skip the pretest. (Figure panels, across time: Group 1 receives the pretest, the stimulus [the film], and the posttest; Group 2 receives the pretest and posttest with no stimulus; Group 3 receives the stimulus and a posttest only; Group 4 receives a posttest only. The figure also lists the four expected findings enumerated in the text.)

Notice that Groups 1 and 2 in Figure 8-5 compose the classical experiment, with Group 2 being the control group. Group 3 is administered the experimental stimulus without a pretest, and Group 4 is only posttested. This experimental design permits four meaningful comparisons, which are described in the figure. If the African American history film really reduces prejudice (unaccounted for by the problems of internal invalidity and unaccounted for by an interaction between the testing and the stimulus), we should expect four findings:

1. In Group 1, posttest prejudice should be less than pretest prejudice.
2. In Group 2, prejudice should be the same in the pretest and the posttest.
3. The Group 1 posttest should show less prejudice than the Group 2 posttest.
4. The Group 3 posttest should show less prejudice than the Group 4 posttest.

external invalidity Refers to the possibility that conclusions drawn from experimental results may not be generalizable to the "real" world.

Notice that finding (4) rules out any interaction between the testing and the stimulus. And remember that these comparisons are meaningful only if subjects have been assigned randomly to the different groups, thereby providing groups of equal prejudice initially, even though their preexperimental prejudice is only measured in Groups 1 and 2.

There is a side benefit to this research design, as the authors point out. Not only does the Solomon four-group design rule out interactions between testing and the stimulus, it also provides data for comparisons that will reveal how much of this interaction has occurred in a classical experiment. This knowledge allows a researcher to review and evaluate the value of any prior research that used the simpler design.

The last experimental design I'll mention here is what Campbell and Stanley (1963: 25-26) call the posttest-only control group design; it consists of the second half (Groups 3 and 4) of the Solomon design. As the authors argue persuasively, with proper randomization, only Groups 3 and 4 are needed for a true experiment that controls for the problems of internal invalidity as well as for the interaction between testing and stimulus. With randomized assignment to experimental and control groups (which distinguishes this design from the static-group comparison discussed earlier), the subjects will be initially comparable on the dependent variable, comparable enough to satisfy the conventional statistical tests used to evaluate the results, so it's not necessary to measure them. Indeed, Campbell and Stanley suggest that the only justification for pretesting in this situation is tradition. Experimenters have simply grown accustomed to pretesting and feel more secure with research designs that include it. Be clear, however, that this point applies only to experiments in which subjects have been assigned to experimental and control groups randomly, because that's what justifies the assumption that the groups are equivalent without having been measured to find out.

This discussion has introduced the intricacies of experimental design, its problems, and some solutions. There are, of course, a great many other experimental designs in use. Some involve more than one stimulus and combinations of stimuli. Others involve several tests of the dependent variable over time and the administration of the stimulus at different times for different groups. If you're interested in pursuing this topic, you might look at the Campbell and Stanley book.
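Before moving on, the four expected findings of the Solomon design can be expressed as a small check. This is a hypothetical sketch of my own, with illustrative function and parameter names: it assumes mean prejudice scores for each measurement, and a small tolerance stands in for the statistical tests a real analysis would use to judge finding 2.

```python
# Hypothetical sketch: checking the four expected findings of the
# Solomon four-group design, given mean prejudice scores for each
# measurement. All names are illustrative; `tol` crudely stands in
# for a statistical test of "no change" in finding 2.
def solomon_checks(g1_pre, g1_post, g2_pre, g2_post, g3_post, g4_post, tol=0.5):
    return {
        "1: Group 1 posttest < pretest": g1_post < g1_pre,
        "2: Group 2 roughly unchanged": abs(g2_post - g2_pre) <= tol,
        "3: Group 1 posttest < Group 2 posttest": g1_post < g2_post,
        "4: Group 3 posttest < Group 4 posttest": g3_post < g4_post,
    }

# A film that works, with no testing-stimulus interaction, passes all four:
print(solomon_checks(g1_pre=10, g1_post=6, g2_pre=10, g2_post=10,
                     g3_post=6, g4_post=10))
```

If finding 4 failed while findings 1 and 3 held, that pattern would point to exactly the testing-stimulus interaction the design was built to detect.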

An Illustration of Experimentation

Experiments have been used to study a wide variety of topics in the social sciences. Some experiments have been conducted within laboratory situations; others occur out in the "real world." The following discussion provides a glimpse of both. Let's begin with a "real world" example.

In George Bernard Shaw's well-loved play, Pygmalion (the basis of the long-running Broadway musical My Fair Lady), Eliza Doolittle speaks of the powers others have in determining our social identity. Here's how she distinguishes the way she's treated by her tutor, Professor Higgins, and by Higgins's friend, Colonel Pickering:

You see, really and truly, apart from the things anyone can pick up (the dressing and the proper way of speaking, and so on), the difference between a lady and a flower girl is not how she behaves, but how she's treated. I shall always be a flower girl to Professor Higgins, because he always treats me as a flower girl, and always will, but I know I can be a lady to you, because you always treat me as a lady, and always will. (Act V)

The sentiment Eliza expresses here is basic social science, addressed more formally by sociologists such as Charles Horton Cooley (the "looking-glass self") and George Herbert Mead ("the generalized other"). The basic point is that who we think we are (our self-concept) and how we behave are largely a function of how others see and treat us. Related to this, the way others perceive us is largely conditioned by expectations they have in advance. If they've been told we're stupid, for example, they're likely to see us that way, and we may come to see ourselves that way and actually act stupidly. "Labeling theory" addresses the phenomenon of people acting in accord with the ways that others perceive and label them. These theories have served as the premise for numerous movies, such as the 1983 film Trading Places, in which Eddie Murphy and Dan Aykroyd play a derelict converted into a stockbroker and vice versa.

The tendency to see in others what we've been led to expect takes its name from Shaw's play. Called the Pygmalion effect, it's nicely suited to controlled experiments. In one of the best-known experimental investigations of the Pygmalion effect, Robert Rosenthal and Lenore Jacobson (1968) administered what they called the "Harvard Test of Inflected Acquisition" to students in a West Coast school. Subsequently, they met with the students' teachers to present the results of the test. In particular, Rosenthal and Jacobson identified certain students as very likely to exhibit a sudden spurt in academic abilities during the coming year, based on the results of the test.

When IQ test scores were compared later, the researchers' predictions proved accurate. The students identified as "spurters" far exceeded their classmates during the following year, suggesting that the predictive test was a powerful one. In fact, the test was a hoax! The researchers had made their predictions randomly among both good and poor students. What they told the teachers did not really reflect students' test scores at all. The progress made by the "spurters" was simply a result of the teachers expecting the improvement and paying more attention to those students, encouraging them, and rewarding them for achievements. (Notice the similarity between this situation and the Hawthorne effect discussed earlier in this chapter.) The Rosenthal-Jacobson study attracted a great deal of popular as well as scientific attention.


Subsequent experiments have focused on specific aspects of what has become known as the attribution process, or the expectations communication model. This research, largely conducted by psychologists, parallels research primarily by sociologists, which takes a slightly different focus and is often gathered under the label expectations-states theory. Psychological studies focus on situations in which the expectations of a dominant individual affect the performance of subordinates, as in the case of a teacher and students, or a boss and employees. The sociological research has tended to focus more on the role of expectations among equals in small, task-oriented groups. In a jury, for example, how do jurors initially evaluate each other, and how do those initial assessments affect their later interactions? (You can learn more about this phenomenon, including attempts to find practical applications, by searching the web for "Pygmalion Effect.")

Here's an example of an experiment conducted to examine the way our perceptions of our abilities and those of others affect our willingness to accept the other person's ideas. Martha Foschi, G. Keith Warriner, and Stephen Hart (1985) were particularly interested in the role "standards" play in that respect:

In general terms, by "standards" we mean how well or how poorly a person has to perform in order for an ability to be attributed or denied him/her. In our view, standards are a key variable affecting how evaluations are processed and what expectations result. For example, depending on the standards used, the same level of success may be interpreted as a major accomplishment or dismissed as unimportant. (1985: 108-9)

To begin examining the role of standards, the researchers designed an experiment involving four experimental groups and a control group. Subjects were told that the experiment involved something called "pattern recognition ability," defined as an innate ability some people had and others didn't. The researchers said subjects would be working in pairs on pattern recognition problems. In fact, of course, there's no such thing as pattern recognition ability. The object of the experiment was to determine how information about this supposed ability affected subjects' subsequent behavior.

The first stage of the experiment was to "test" each subject's pattern recognition abilities. If you had been a subject in the experiment, you would have been shown a geometrical pattern for 8 seconds, followed by two more patterns, each of which was similar to but not the same as the first one. Your task would be to choose which of the subsequent set had a pattern closest to the first one you saw. You would be asked to do this 20 times, and a computer would print out your "score." Half the subjects would be told that they had gotten 14 correct; the other half would be told that they had gotten only 6 correct, regardless of which patterns they matched with which. Depending on the luck of the draw, you would think you had done either quite well or quite badly. Notice, however, that you wouldn't really have any standard for judging your performance; maybe getting 4 correct would be considered a great performance.

At the same time you were given your score, however, you would also be given your "partner's score," although both the "partners" and their "scores" were also computerized fictions. (Subjects were told they would be communicating with their partners via computer terminals but would not be allowed to see each other.) If you were assigned a score of 14, you would be told your partner had a score of 6; if you were assigned 6, you would be told your partner had 14. This procedure meant that you would enter the teamwork phase of the experiment believing either (1) you had done better than your partner or (2) you had done worse than your partner. This information constituted part of the "standard" you would be operating under in the experiment.
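The rigged feedback just described can be sketched in a few lines. This is a hypothetical illustration of the procedure, not the researchers' actual software (the function name is mine): whatever the subject actually answered, the computer reports a random score of 14 or 6 out of 20 and assigns the fictitious partner the complementary score.

```python
import random

# Hypothetical sketch of the rigged scoring described above: each subject
# is randomly told they scored 14 or 6 out of 20, regardless of their real
# answers, and the fictitious "partner" gets the complementary score.
def rigged_scores(rng=random):
    own = rng.choice([14, 6])
    partner = 6 if own == 14 else 14
    return own, partner

own_score, partner_score = rigged_scores()
```

The point of the trick is that every subject believes a clear performance gap exists between the "partners," while the direction of that gap is assigned purely at random.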
In addition, half of each group was told that a score of between 12 and 20 meant the subject definitely had pattern recognition ability; the other subjects were told that a score of 14 wasn't really high enough to prove anything definite. Thus, you would emerge from this with one of the following beliefs:

1. You are definitely better at pattern recognition than your partner.
2. You are possibly better than your partner.
3. You are possibly worse than your partner.
4. You are definitely worse than your partner.

The control group for this experiment was told nothing about their own abilities or their partners'. In other words, they had no expectations.

The final step in the experiment was to set the "teams" to work. As before, you and your partner would be given an initial pattern, followed by a comparison pair to choose from. When you entered your choice in this round, however, you would be told what your partner had answered; then you would be asked to choose again. In your final choice, you could either stick with your original choice or switch. The "partner's" choice was, of course, created by the computer, and as you can guess, there were often disagreements in the teams: 16 out of 20 times, in fact.

The dependent variable in this experiment was the extent to which subjects would switch their choices to match those of their partners. The researchers hypothesized that the definitely better group would switch least often, followed by the possibly better group, followed by the control group, followed by the possibly worse group, followed by the definitely worse group, which would switch most often. The number of times subjects in the five groups switched their answers follows. Realize that each had 16 opportunities to do so. These data indicate that each of the researchers' expectations was correct, with the exception of the comparison between the possibly worse and definitely worse groups. Although the latter group was in fact the more likely to switch, the difference was too small to be taken as a confirmation of the hypothesis. (Chapter 16 will discuss the statistical tests that let researchers make decisions like this.)

Mean Number of Switches

Definitely better    5.05
Possibly better      6.23
Control group        7.95
Possibly worse       9.23
Definitely worse     9.28

In more-detailed analyses, it was found that the same basic pattern held for both men and women, though it was somewhat clearer for women than for men. Here are the actual data:

Mean Number of Switches

                     Women     Men
Definitely better     4.50    5.66
Possibly better       6.34    6.10
Control group         7.68    8.34
Possibly worse        9.36    9.09
Definitely worse     10.00    8.70
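The researchers' ordering hypothesis can be checked directly against the overall means reported above. This small sketch simply restates those figures and computes the gap between each adjacent pair of groups; switching should increase at every step along the hypothesized ordering.

```python
# Check the hypothesized ordering against the overall mean switch counts:
# switching should rise monotonically from "definitely better" to
# "definitely worse." Figures are from the table above.
means = {
    "definitely better": 5.05,
    "possibly better": 6.23,
    "control group": 7.95,
    "possibly worse": 9.23,
    "definitely worse": 9.28,
}
values = list(means.values())
gaps = [round(b - a, 2) for a, b in zip(values, values[1:])]
print(gaps)  # [1.18, 1.72, 1.28, 0.05]
```

Every gap is positive, so the ordering holds throughout, but the final gap (0.05 switches) is the one the researchers judged too small to confirm the possibly-worse versus definitely-worse comparison.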

Because specific research efforts like this one sometimes seem extremely focused in their scope, you might wonder about their relevance to anything. As part of a larger research effort, however, studies like this one add concrete pieces to our understanding of more general social processes. It's worth taking a minute or so to consider some of the life situations where "expectation states" might have very real and important consequences. I've mentioned the case of jury deliberations. How about all forms of prejudice and discrimination? Or, consider how expectation states figure into job interviews or meeting your heartthrob's parents. If you think about it, you'll undoubtedly see other situations where these laboratory concepts apply in real life.

Alternative Experimental Settings

Although we tend to equate the terms experiment and laboratory experiment, many important social scientific experiments occur outside controlled settings, such as on the Internet or in "real life."

Web-Based Experiments

Increasingly, researchers are using the World Wide Web as a vehicle for conducting experiments. Because representative samples are not essential in most experiments, researchers can often use volunteers who respond to invitations online. Here are two sites you might visit to get a better idea of this form of experimentation.