The use of psychological testing for treatment planning and outcomes assessment


title: The Use of Psychological Testing for Treatment Planning and Outcome Assessment
author: Maruish, Mark E.
publisher: Lawrence Erlbaum Associates, Inc.
isbn10 | asin: 0805827617
print isbn13: 9780805827613
ebook isbn13: 9780585176666
language: English
subject: Psychological tests, Mental illness--Diagnosis, Psychiatry--Differential therapeutics, Mental illness--Treatment--Evaluation, Psychiatric rating scales, Outcome assessment (Medical care), Outcome assessment
publication date: 1999
lcc: RC473.P79U83 1999eb
ddc: 616.89/075


The Use of Psychological Testing for Treatment Planning and Outcomes Assessment
Second Edition
Edited by Mark E. Maruish
United Behavioral Health


Copyright © 1999 by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microfilm, retrieval system, or any other means, without prior written permission of the publisher.

Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, NJ 07430

Cover design by Kathryn Houghtaling Lacey

Library of Congress Cataloging-in-Publication Data
The use of psychological testing for treatment planning and outcomes assessment / edited by Mark E. Maruish. 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-8058-2761-7 (cloth: alk. paper)
1. Psychological tests. 2. Mental illness--Diagnosis. 3. Psychiatry--Differential therapeutics. 4. Mental illness--Treatment--Evaluation. 5. Psychiatric rating scales. 6. Outcome assessment (Medical care). 7. Outcome assessment.
I. Maruish, Mark E. (Mark Edward)
RC473.P79U83 1998
616.89'075--dc21
98-34838
CIP

Books published by Lawrence Erlbaum Associates are printed on acid-free paper, and their bindings are chosen for strength and durability.

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1


CONTRIBUTORS

Thomas M. Achenbach, University of Vermont
Grace G. Aikman, Texas A&M University
Ross B. Andelman, University of California San Francisco
Robert P. Archer, Eastern Virginia Medical School
C. Clifford Attkisson, University of California San Francisco
Cheryl A. Barber, University of Minnesota
Larry E. Beutler, University of California, Santa Barbara
Jacque Bieber, American Academy of Neurology
Gary M. Burlingame, Brigham Young University
James N. Butcher, University of Minnesota
Daniel Carpenter, Merit Behavioral Care Corporation and Cornell University College of Medicine
Dianne L. Chambless, University of North Carolina at Chapel Hill
James A. Ciarlo, University of Denver
James R. Clopton, Texas Tech University
C. Keith Conners, Duke University Medical Center
Melissa A. Culhane, McLean Hospital
Gayle A. Dakof, University of Miami School of Medicine
Allyson Ross Davies, Independent Health Care Consultant
Roger D. Davis, Institute for Advanced Studies in Personology and Psychopathology
Edwin de Beurs, University of North Carolina at Chapel Hill
Leonard R. Derogatis, Clinical Psychometric Research, Inc., and Loyola College of Maryland
Susan V. Eisen, McLean Hospital
William O. Faustman, Palo Alto Veterans Affairs Health Care System and Stanford University School of Medicine
Arthur E. Finch, Brigham Young University
Daniel Fisher, University of California, Santa Barbara
Raymond D. Fowler, American Psychological Association
Michael B. Frisch, Baylor University
Anthony B. Gerard, Western Psychological Services
Antonio Goncalves, University of Miami and Institute for Advanced Studies in Personology and Psychopathology
Ginger Goodrich, University of California, Santa Barbara
Roger L. Greene, Pacific Graduate School of Psychology
Thomas K. Greenfield, University of California San Francisco
Steven R. Hahn, Albert Einstein College of Medicine and Jacobi Medical Center
Nancy M. Hatcher, University of Georgia
George Henly, University of North Dakota
Kay Hodges, Eastern Michigan University
L. Michael Honaker, American Psychological Association
R. W. Kamphaus, University of Georgia
Joel Katz, University of Toronto
Randy Katz, University of Toronto
Kenneth A. Kobak, Dean Foundation
Mark Kosinski, The Health Institute at New England Medical Center and Quality Metric, Inc.


Maria Kovacs, University of Pittsburgh, School of Medicine
Kurt Kroenke, Regenstrief Institute for Health Care and Indiana University School of Medicine
Samuel E. Krug, MetriTech, Inc.
David Lachar, University of Texas-Houston Medical School
John M. Lambert, University of Utah
Michael J. Lambert, Brigham Young University
Jeanne M. Landgraf, HealthAct
William W. Latimer, University of Minnesota, Minneapolis
Larry L. Lynn, Clinical Psychometric Research, Inc., and Loyola College of Maryland
John S. March, Duke University Medical Center
Brian J. Marsh, University of South Florida
Mark E. Maruish, United Behavioral Health
Sarah E. Meagher, University of Miami and Institute for Advanced Studies in Personology and Psychopathology
Theodore Millon, Harvard University, University of Miami, and Institute for Advanced Studies in Personology and Psychopathology
Kevin Moreland, Fordham University
Leslie C. Morey, Vanderbilt University
Jack A. Naglieri, Ohio State University
Frederick L. Newman, Florida International University
John E. Overall, The University of Texas Health Science Center at Houston
Ashley E. Owen, University of South Florida
Carleton A. Palmer, University of North Carolina at Chapel Hill
James D.A. Parker, Trent University
Julia N. Perry, University of Minnesota
Steven I. Pfeiffer, Duke University
Cecil R. Reynolds, Texas A&M University
William M. Reynolds, University of British Columbia
Abram B. Rosenblatt, University of California San Francisco
Kathryn L. Savitz, Clinical Psychometric Research, Inc., and Loyola College of Maryland
Brian F. Shaw, University of Toronto
Gill Sitarenios, Multi-Health Systems, Inc.
Douglas K. Snyder, Texas A&M University
Charles D. Spielberger, University of South Florida
Robert L. Spitzer, New York State Psychiatric Institute and Columbia University
Randy D. Stinchfield, University of Minnesota, Minneapolis
Sumner J. Sydeman, University of South Florida
Manuel J. Tejeda, Gettysburg College
John E. Ware, Jr., Quality Metric, Inc., Health Assessment Lab, Tufts University School of Medicine, and Harvard School of Public Health
Irving B. Weiner, University of South Florida
M. Gawain Wells, Brigham Young University
Janet B.W. Williams, New York State Psychiatric Institute and Columbia University
Oliver B. Williams, University of California, Santa Barbara
Kimberly A. Wilson, University of North Carolina at Chapel Hill
Ken C. Winters, University of Minnesota, Minneapolis
Mark Woodward, University of Miami and Institute for Advanced Studies in Personology and Psychopathology
Jill M. Wroblewski, Strategic Advantage, Inc., Minneapolis, MN
Bonnie T. Zima, University of California Los Angeles


CONTENTS

Preface

Part I: General Considerations

1. Introduction
   Mark E. Maruish
2. Psychological Tests in Screening for Psychiatric Disorder
   Leonard R. Derogatis and Larry L. Lynn
3. Use of Psychological Tests/Instruments for Treatment Planning
   Larry E. Beutler, Ginger Goodrich, Daniel Fisher, and Oliver B. Williams
4. Use of Psychological Tests for Assessing Treatment Outcome
   Michael J. Lambert and John M. Lambert
5. Guidelines for Selecting Psychological Instruments for Treatment Planning and Outcome Assessment
   Frederick L. Newman, James A. Ciarlo, and Daniel Carpenter
6. Design and Implementation of an Outcomes Management System Within Inpatient and Outpatient Behavioral Health Settings
   Jacque Bieber, Jill M. Wroblewski, and Cheryl A. Barber
7. Progress and Outcome Assessment of Individual Patient Data: Selecting Single-Subject Design and Statistical Procedures
   Frederick L. Newman and Gayle A. Dakof
8. Selecting Statistical Procedures for Progress and Outcome Assessment: The Analysis of Group Data
   Frederick L. Newman and Manuel J. Tejeda

Part II: Child and Adolescent Assessment Instrumentation

9. Use of the Children's Depression Inventory
   Gill Sitarenios and Maria Kovacs
10. The Multidimensional Anxiety Scale for Children (MASC)
   John S. March and James D.A. Parker
11. Characteristics and Applications of the Revised Children's Manifest Anxiety Scale (RCMAS)
   Anthony B. Gerard and Cecil R. Reynolds
12. Overview of the Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A)
   Robert P. Archer
13. Studying Outcome in Adolescents: The Millon Adolescent Clinical Inventory and Millon Adolescent Personality Inventory
   Roger D. Davis, Mark Woodward, Antonio Goncalves, Sarah Meagher, and Theodore Millon
14. Personality Inventory for Children, Second Edition (PIC-2), Personality Inventory for Youth (PIY), and Student Behavior Survey (SBS)
   David Lachar
15. The Child Behavior Checklist and Related Instruments
   Thomas M. Achenbach
16. Conners Rating Scales-Revised
   C. Keith Conners
17. Youth Outcome Questionnaire (Y-OQ)
   M. Gawain Wells, Gary M. Burlingame, and Michael J. Lambert
18. Use of the Devereux Scales of Mental Disorders for Diagnosis, Treatment Planning, and Outcome Assessment
   Jack A. Naglieri and Steven I. Pfeiffer
19. Treatment Planning and Evaluation with the BASC: The Behavior Assessment System for Children
   R.W. Kamphaus, Cecil R. Reynolds, and Nancy M. Hatcher
20. Assessing Adolescent Drug Use with the Personal Experience Inventory
   Ken C. Winters, William W. Latimer, Randy D. Stinchfield, and George Henly
21. Child and Adolescent Functional Assessment Scale (CAFAS)
   Kay Hodges
22. The Child Health Questionnaire (CHQ): A Potential New Tool to Assess the Outcome of Psychosocial Treatment and Care
   Jeanne M. Landgraf

Part III: Adult Assessment Instrumentation

23. The SCL-90-R, Brief Symptom Inventory, and Matching Clinical Rating Scales
   Leonard R. Derogatis and Kathryn L. Savitz
24. Symptom Assessment-45 Questionnaire (SA-45)
   Mark E. Maruish
25. Behavior and Symptom Identification Scale (BASIS-32)
   Susan V. Eisen and Melissa A. Culhane
26. Brief Psychiatric Rating Scale
   William O. Faustman and John E. Overall
27. The Outcome Questionnaire
   Michael J. Lambert and Arthur E. Finch
28. Primary Care Evaluation of Mental Disorders (PRIME-MD)
   Steven R. Hahn, Kurt Kroenke, Janet B.W. Williams, and Robert L. Spitzer
29. Beck Depression Inventory and Hopelessness Scale
   Randy Katz, Joel Katz, and Brian F. Shaw
30. Hamilton Depression Inventory
   Kenneth A. Kobak and William M. Reynolds
31. Beck Anxiety Inventory
   Kimberly A. Wilson, Edwin de Beurs, Carleton A. Palmer, and Dianne L. Chambless
32. Measuring Anxiety and Anger with the State-Trait Anxiety Inventory (STAI) and the State-Trait Anger Expression Inventory (STAXI)
   Charles D. Spielberger, Sumner J. Sydeman, Ashley E. Owen, and Brian J. Marsh
33. Minnesota Multiphasic Personality Inventory-2 (MMPI-2)
   Roger L. Greene and James R. Clopton
34. Treatment Planning and Outcome in Adults: The Millon Clinical Multiaxial Inventory-III
   Roger D. Davis, Sarah E. Meagher, Antonio Goncalves, Mark Woodward, and Theodore Millon
35. Personality Assessment Inventory
   Leslie C. Morey
36. Rorschach Inkblot Method
   Irving B. Weiner
37. Butcher Treatment Planning Inventory (BTPI): An Objective Guide to Treatment Planning
   Julia N. Perry and James N. Butcher
38. Marital Satisfaction Inventory-Revised
   Douglas K. Snyder and Grace G. Aikman
39. The Adult Personality Inventory
   Samuel E. Krug
40. SF-36 Health Survey
   John E. Ware, Jr.
41. Katz Adjustment Scales
   James R. Clopton and Roger L. Greene
42. Quality of Life Assessment/Intervention and the Quality of Life Inventory™ (QOLI®)
   Michael B. Frisch
43. The UCSF Client Satisfaction Scales: I. The Client Satisfaction Questionnaire-8
   C. Clifford Attkisson and Thomas K. Greenfield
44. The UCSF Client Satisfaction Scales: II. The Service Satisfaction Scale-30
   Thomas K. Greenfield and C. Clifford Attkisson
45. The Consumer Satisfaction Survey
   Allyson Ross Davies, John E. Ware, Jr., and Mark Kosinski

Part IV: Future Directions

46. Quality of Life of Children: Toward Conceptual Clarity
   Ross B. Andelman, C. Clifford Attkisson, Bonnie T. Zima, and Abram B. Rosenblatt
47. Future Directions in the Use of Psychological Assessment for Treatment Planning and Outcome Assessment: Predictions and Recommendations
   Kevin Moreland, Raymond D. Fowler, and L. Michael Honaker

Author Index

Subject Index


For Abby, Katie, and Shelby


PREFACE

Over the past several years the American people have witnessed some rather dramatic changes in the state of their health care system. These changes, prompted by the out-of-control cost of health care services, have reshaped the way in which health care is delivered and paid for. The resulting developments have affected not only physical health care services but behavioral health care services as well.

The practice of test-based psychological assessment has not entered this new era unscathed. Limitations placed on total moneys allotted for psychological services have had an impact on the practice of psychological testing. However, for those skilled in its use, psychological testing's ability to help quickly identify psychological problems, plan and monitor treatment, and document treatment effectiveness presents many potentially rewarding opportunities during a time when health care organizations must (a) provide problem-focused, time-limited treatment; (b) demonstrate the effectiveness of treatment to payers and patients; and (c) implement quality improvement initiatives. With the opportunity at hand, it is now up to those with skill and training in psychological assessment to make the most of it as they contribute to (and benefit from) efforts to control health care costs.

However, the task may not be as simple as it would appear. Many psychologists and other professionals schooled and experienced in the use of psychological tests actually have had relatively limited training and experience in the full range of applications of testing to day-to-day clinical practice. For many, formal testing courses and practicum and internship experiences have focused primarily on the use of testing for symptom identification, personality description, and diagnostic purposes. Many of the more experienced professionals are likely to have only limited knowledge of how to use test results for planning, monitoring, and assessing the outcome of psychological interventions. Consequently, although the basic skills are there, many well-trained clinicians, and graduate students as well, need to develop or expand their testing knowledge and skills so as to be better able to apply them for such purposes.

This need served as the impetus for the development of the first edition of this book, and the development of this second edition attests to its continued presence. In both cases, it was decided that the most informative and useful approach would be one in which aspects of each of four broad topical areas were addressed separately. The first area has to do with general issues and recommendations to be considered in the use of psychological testing for treatment planning and outcome assessment in today's behavioral health care environment. The second and third areas deal with issues related to the use of specific psychological tests and scales for these same purposes. The fourth concerns the future of psychological testing, including new developments on the horizon.

Part I of this second edition represents an update and extension of the first part of the first edition. As in the first edition, it is devoted to general considerations that pertain to the need for and use of psychological testing for treatment planning and outcome assessment. The introductory chapter provides an overview of the status of the health care delivery system today and the ways in which testing can contribute to making that system more cost-effective. Two chapters are devoted to issues related to treatment planning, while four chapters focus on issues related to outcome assessment. The first of the two planning chapters deals with the use of psychological tests for screening purposes in various clinical settings. Screening can serve as the first step in the treatment planning process; for this reason, it is a topic that warrants the reader's attention. The second of these chapters presents a discussion of the research suggesting how testing may be used as a predictor of differential response to treatment and its outcome. Each of these chapters represents an updated version of the original work.

The four chapters on the use of testing for outcome assessment are complementary. The first provides an overview of the use of testing for outcome assessment purposes, discussing some of the history of outcome assessment, its current status, its measures and methods, individualizing outcome assessment, the distinction between clinically and statistically significant differences in outcome assessment, and some outcomes-related issues that merit further research. The next three chapters expand on the groundwork laid in this chapter. The first of these three presents an updated discussion of a set of specific guidelines that can be valuable to clinicians in their selection of psychological measures for assessing treatment outcomes. These same criteria also are generally applicable to the selection of instruments for treatment planning purposes. The other two chapters provide a discussion of statistical procedures and research design issues related to the measurement of treatment progress and outcomes with psychological tests. One is a new chapter that specifically addresses the analysis of individual patient data; the other is an update of the original chapter that deals with the analysis of group data. As before, these discussions are presented with full knowledge of the understandable distaste that many clinicians have for statistics. However, knowledge and skills in this area are particularly important and needed by clinicians wishing to establish and maintain an effective treatment evaluation process within their particular setting. Completing Part I is another new chapter. It presents considerations relevant to the design, implementation, and maintenance of outcomes management programs in behavioral health care settings.

Parts II and III address the use of specific psychological instruments for treatment planning and outcome assessment purposes. Part II deals with child and adolescent instruments, while Part III focuses on instruments that are intended exclusively or primarily for use with adult populations. The instruments considered as potential chapter topics were evaluated against several selection criteria, including the popularity of the instrument among clinicians; recognition of its psychometric integrity; the potential, in the case of recently released instruments, for the instrument to become widely accepted and used; the perceived usefulness of the instrument for treatment planning and outcome assessment purposes; and the availability of a recognized expert on the instrument (preferably its author) to contribute a chapter to this book. In the end, the instrument-specific chapters selected for inclusion were those judged most likely to be of the greatest interest and utility to the majority of the book's intended audience.

Each of the original chapters in the first edition had previously met these selection criteria; thus, Parts II and III consist of updated versions of the instrumentation chapters that appeared in the first edition. Both parts also contain several new chapters discussing instruments that were not included in the first edition for one reason or another (e.g., those that were not developed at the time, or those that only recently gained wide acceptance for outcome assessment purposes). Recognition of the potential utility of each of these instruments for treatment planning and/or evaluation served as a significant impetus for revising the original work.

A decision regarding the specific content of each of the chapters in Parts II and III was not easy to arrive at. However, in the end, the contributors were asked to address those issues and questions that are of the greatest concern or relevancy for practicing clinicians. Generally, these fall into three important areas: what the instrument does and how it was developed; how one should use this instrument for treatment planning and monitoring; and how one should use it to assess treatment outcomes. Guidelines were provided to assist the contributors in addressing each of these areas. Many of the contributors adhered strictly to these guidelines; others modified the contents of their chapter to reflect and emphasize what they judged to be important for the reader to know about the instrument when using it for planning, monitoring, and/or outcome assessment purposes.

Some may consider the chapters in Parts II and III to be the "meat" of this work because they provide how-to instructions for tools that are commonly found in the clinician's armamentarium. In fact, these chapters are no more or less important than those found in Part I. They are only extensions and are of limited value outside the context of those first several chapters.

Part IV presents a discussion of the future of psychological assessment. One chapter in this part was written to inform the reader of anticipated advances in the field of testing as well as anticipated legislative and medical care mandates that may affect the manner in which psychological testing will be used in years to come. The other chapter is devoted to reviewing research related to the conceptualization of quality of life (QOL) as it applies to children, and to how it has evolved over the years. The purpose of the authors' endeavor is to present a foundation for the future development of useful measures of child QOL, something that currently appears to be in short supply.

Like the first edition, this book is not intended to be a definitive summary and review. However, it is hoped that the reader will find its chapters useful in better understanding general and test-specific considerations and approaches related to treatment planning and outcome assessment, and in effectively applying them in his or her daily practice. It also is hoped that the book will stimulate further endeavors to investigate the application of psychological testing for these purposes.

Acknowledgments

The development of the second edition of this book was a significant undertaking, requiring the effort and support of a number of people. First and foremost are the contributors. Nearly all of the eminent contributors to the first edition were gracious enough to set aside time in their busy schedules to update and revise their initial work. In addition, I was extremely fortunate to be able to enlist an equally distinguished group of experts in the field of psychological testing and assessment to contribute new chapters dealing with instruments and topics that have gained attention from the professional community since the publication of the first edition. This project was successful only because of their commitment and willingness to share their knowledge, experience, and insights with this audience.

Several other parties deserve particular thanks for their contributions to this endeavor. I thank Magellan Health Services and Strategic Advantage, Inc., for their support during the course of this project. Nancee Meuser once again was kind enough to serve as the "editor's editor," reviewing, editing, and offering suggestions as to how to improve the chapters that I authored. And a special thanks goes to Larry Erlbaum of Lawrence Erlbaum Associates for his encouragement, counsel, and support.

Finally, I am grateful to those family members and friends who have been there for me during this project. Without their support, this book would not have been possible.

MARK E. MARUISH


PART I GENERAL CONSIDERATIONS


Chapter 1 Introduction Mark E. Maruish United Behavioral Health The cost of health care in the United States has reached astronomical heights. In 1995, approximately $1 trillion, or 14.9%, of the gross domestic product was spent on health care, and a 20% increase is expected by the year 2000 ("Future Targets," 1996). The costs of mental health problems and the need for behavioral health care services in the United States have risen over the past several years and are particularly disconcerting. A Substance Abuse and Mental Health Services Administration (SAMHSA) summary of various findings in the literature indicated that the U.S. bill for mental health disorders in 1990 was $148 billion (Rouse, 1995). This compares to the 1983 direct and indirect mental health costs of $73 billion reported by Harwood, Napolitano, and Kristiansen (cited in Kiesler & Morton, 1988). The high cost of treating behavioral health problems is not surprising given the prevalence of psychiatric and substance use disorders in this country. The Center for Disease Control and Prevention (1994) reported on the results of a survey of 45,000 randomly interviewed Americans regarding their quality of life. The survey found that one third of the respondents reported suffering from depression, stress, or emotional problems at least 1 day a month. Eleven percent of the sample reported having these problems more than 8 days a month. Preliminary estimates from SAMHSA's 1995 National Household Survey on Drug Abuse (SAMHSA, 1996) indicated that 12.8 million Americans 12 years and older, or 6.1% of the population, had used illicit drugs within a month of the survey interview. More alarming is the fact that illicit drug use was reported by 10.9% of the 12- to 17-yearold subsample. Also, it was estimated that 11 million Americans, or 5.5% of the population, could be classified as heavy drinkers, that is, they consumed five or more drinks on each of five different days during the same 1month time period. 
The American Psychological Association (APA, 1996) also reported statistics that bear attention. In sum: It is estimated that 15% to 18% of Americans suffer from a mental disorder. Fourteen million of these individuals are children. Approximately eight million Americans suffer from depression in any given month.

< previous page

page_1

next page >

< previous page

page_2

next page > Page 2

As many as 20% of Americans will suffer one or more major episodes of depression during their lifetime. An estimated 80% of elderly residents in Medicaid facilities were found to have moderate to intensive needs for mental health services. Moreover, information from various studies indicate that at least 25% of primary health care patients have a diagnosable behavioral disorder ("Leaders Predict," 1996). The Value of Behavioral Health Care Services The demand for behavioral health care services also is significant. In analyzing data from a 1987 national survey of 40,000 people in 16,000 households, Olfson and Pincus (1994a, 1994b) found that 3% of the population was seen for at least one psychotherapeutic session that year. Eighty-one percent of these sessions were visits to mental health professionals. The stigma attached to mental health problems and their treatment continues to lessen, so it might well be assumed that utilization of behavioral health care services has increased since the time of that survey. But what is the value of the services provided to those suffering from mental illness or substance abuse/dependency? Some might argue that the benefit is either minimal or too costly to achieve if significant effects are to be gained. This, however, is in the face of data that indicate otherwise. Numerous studies have demonstrated that treatment of mental health and substance abuse/dependency problems can result in substantial savings when viewed from a number of perspectives. This "cost offset" effect has been demonstrated most clearly in savings in medical care dollars. Given reports that 50% to 70% of typical primary care visits are for medical problems involving psychological factors, the value of medical cost offset is significant (American Psychological Association, 1996). 
Moreover, APA reported that 25% of patients seen by primary care physicians have a disabling psychological disorder, and that depression and anxiety rank among the top six conditions seen by family physicians. The following are just a few of the findings supporting the medical cost savings that can be achieved through the provision of behavioral health care treatment:

Patients with diagnosable behavioral disorders who are seen in primary care settings use two to four times as many medical resources as those patients without these disorders ("Leaders Predict," 1996b).

A study by Simon, VonKorff, and Barlow (1995) revealed that the annual health care costs of 6,000 primary care patients with identified depression were nearly twice those of the same number of primary care patients without depression ($4,246 vs. $2,371).

J. Johnson, Weissman, and Klerman (1992) reported that depressed patients make seven times as many visits to emergency rooms as do nondepressed patients.

Saravay, Pollack, Steinberg, Weinschel, and Habert (1996) found that cognitively impaired medical and surgical inpatients were rehospitalized twice as many times as cognitively unimpaired patients within a 6-month period. In the same study, depressed medical and surgical inpatients were found to have an average of approximately 12 days of rehospitalization over a 4-year follow-up period. During this same period, nondepressed inpatients averaged only 6 days of rehospitalization.

Demonstrating the potential for additional costs that can accrue from the presence of a behavioral health problem, the health care costs of families with an alcoholic member were found to be twice that of families without alcoholic members in a longitudinal study by Holder and Blose (1986). Sipkoff (1995) made several conclusions after reviewing several studies conducted between 1988 and 1994 and listed in the "Cost of Addictive and Mental Disorders and Effectiveness of Treatment" report published by SAMHSA. After meta-analyzing the cost offset effect, Sipkoff found that treatment for mental health problems results in about a 20% reduction in the overall cost of health care. The report also concluded that whereas alcoholics were found to spend twice as much on health care as those without abuse problems, one half of the cost of substance abuse treatment is offset within 1 year by subsequent reductions in the combined medical cost savings for patients and their families. Strain et al. (1991) found that screening a group of 452 elderly hip fracture patients for psychiatric disorders prior to surgery, and then providing mental health treatment to the 60% of the sample needing treatment, reduced total medical expenses by $270,000. The cost of the psychological/psychiatric services provided to this group was only $40,000. Simmons, Avant, Demski, and Parisher (1988) compared the average medical costs for chronic back pain patients at a multidimensional pain center (providing psychological and other types of intervention) incurred during the year prior to treatment to those costs incurred in the year following treatment. The pretreatment costs per patient were $13,284 whereas posttreatment costs were $5,596. 
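
The arithmetic behind two of the cost offset figures just cited can be made explicit. The following is only an illustrative sketch: the function names are hypothetical, and it assumes that the $270,000 reduction reported by Strain et al. (1991) is stated before subtracting the $40,000 cost of services, which the summary above does not specify.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop in cost from `before` to `after`."""
    return (before - after) / before * 100.0


def net_savings(gross_savings: float, intervention_cost: float) -> float:
    """Savings left after paying for the intervention (assumes gross input)."""
    return gross_savings - intervention_cost


# Simmons et al. (1988): per-patient medical costs, year before vs. year after.
back_pain_drop = percent_reduction(13_284, 5_596)

# Strain et al. (1991): reduction in medical expenses vs. cost of services.
# Assumption: the 270,000 figure does not already include the 40,000 cost.
hip_net = net_savings(270_000, 40_000)
hip_ratio = 270_000 / 40_000  # dollars of reduced expense per dollar spent

print(f"Back pain cost reduction: {back_pain_drop:.1f}%")
print(f"Hip fracture net savings: ${hip_net:,.0f}")
print(f"Hip fracture savings per treatment dollar: {hip_ratio:.2f}")
```

Under that assumption, each dollar spent on psychological/psychiatric services in the hip fracture sample corresponds to several dollars of reduced medical expense. If the published figure is already net of service costs, the subtraction step should be dropped.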
APA (1996) succinctly summarized what appear to be the prevalent findings of the medical cost offset literature:

Patients with mental disorders are heavy users of medical services, averaging twice as many visits to their primary care physicians as patients without mental disorders. When appropriate mental health services are made available, this heavy use of the system often decreases, resulting in overall health savings. Cost offset studies show a decrease in total health care costs following mental health interventions even when the cost of the intervention is included. In addition, cost offset increases over time, largely because . . . patients continue to decrease their overall use of the health care system, and don't require additional mental health services. (p. 2)

A more detailed discussion of various ways in which behavioral interventions can both maximize care to medical patients and achieve significant economic gains can be found in Friedman, Sobel, Myers, Caudill, and Benson (1995).

The dollar savings that result from medical cost offset are relatively obvious and easy to measure. However, the larger benefits to the community (financial and otherwise) that can also accrue from the treatment of mental health and substance abuse/dependency problems may not be as obvious. One area in which treatment can have a tremendous impact is in the workplace. For example, note the following facts compiled by APA (1996):

In 1985, behavioral health problems resulted in over $77 billion in lost income to Americans.

California's stress-related disability claims totaled $350 million in 1989.

In 1980, alcoholism resulted in over 500 million lost work days in the United States.

Major depression cost an estimated $23 billion in lost work days in 1990.

Individuals with major depression are three times more likely than nondepressed individuals to miss time from work and four times more likely to take disability days.

Seventy-seven percent of all subjects from 58 psychotherapy effectiveness studies focusing on the treatment of depression received significantly better work evaluations than depressed subjects who did not receive treatment.

Treatment resulted in a 150% increase in earned income for alcoholics and a 390% increase in income for drug abusers in one study of 742 substance abusers.

On another front, the former director of the Office of the National Drug Control Policy reported that for every dollar spent on drug treatment, the United States saves $7 in health care and criminal justice costs (Substance Abuse Funding News, 1995). Also, SAMHSA's summary of the literature on 1990 behavioral health care costs indicated that crime, criminal justice activities, and property loss associated with substance use and mental disorders resulted in a total of $67.8 billion spent or lost (Rouse, 1995).

Society's need for behavioral health care services provides an opportunity for psychologists and other trained behavioral health service providers to become part of the solution to a major health care problem that shows no indication of decline. Each of the helping professions has the potential to make a contribution to this solution. Especially important are those contributions that can be made by psychologists and others trained in the use of psychological tests.

For decades, psychologists and other behavioral health care providers have come to rely on psychological assessment as a standard tool to assist diagnostic and treatment planning activities. However, the care delivery system that has evolved within health care in general, and behavioral health care services in particular, has led to changes in how third-party payers, psychologists, and other service providers think about and/or use psychological assessment in day-to-day clinical practice.
Some question the value of psychological assessment in the current time-limited, capitated service delivery arena where the focus seemingly has changed from clinical priorities to fiscal priorities (Sederer, Dickey, & Hermann, 1996). Others argue that it is in just such an arena that the benefits of psychological assessment can be most fully realized and contribute significantly to the delivery of cost-effective treatment for behavioral health disorders (Maruish, 1994). Consequently, assessment could assist the health care industry in appropriately controlling or reducing the utilization and cost of health care over the long term. As Maruish (1990) observed nearly a decade ago:

Consider that the handwriting on the wall appears to be pointing to one scenario. With limited dollars available for treatment, the delivery of cost-efficient, effective treatment will be dependent on the ability to clearly identify the patient's problem(s). Based on this and other considerations, the most appropriate treatment modality . . . must then be determined. Finally, the organization will have to show that it has met the needs of each client. . . . It is in all of these functions (problem identification, triage/disposition, and outcome measurement) that psychological assessment can make a significant contribution to the success of the organization. (p. 5)

It is the latter side of the argument that is supported here, and that provides the basis for this and subsequent chapters within this volume. This introduction is intended to provide students and practitioners of psychology and other behavioral health care professions with an overview of how psychological assessment could, and should, be used in this era of managed behavioral health care to the ultimate benefit of patients, providers, and payers.
On a final introductory note, it is important to understand that the term psychological assessment, as it is used in this chapter, refers to the evaluation of a patient's mental health status using psychological tests or related instrumentation. This evaluation may be conducted with or without the benefit of patient or collateral interviews, review of medical or other records, and/or other sources of relevant information about the patient.

The Current Practice of Psychological Assessment in Behavioral Health Care Settings

For a number of decades, psychological assessment has been a valued and integral part of the services offered by psychologists and other mental health professionals trained in its use. However, its popularity has not been without its ups and downs. Megargee and Spielberger (1992) described a period of decreased interest in assessment that began in the 1960s. This decline was attributed to a number of factors, including a shift in focus to those aspects of treatment where assessment was thought to contribute little (e.g., a growing emphasis on behavior modification techniques, increased use of psychotropic medications, emphasis on the study of symptoms rather than personality syndromes and structures). But Megargee and Spielberger also noted a resurgence in the interest in assessment, including a new realization of how psychological assessment can assist in mental health care interventions today.

Where does psychological assessment currently fit into the daily scope of activities for practicing psychologists? The newsletter Psychotherapy Finances ("Fee, Practice and Managed Care," 1995) reported the results of a nationwide readership survey of 1,700 mental health providers. Sixty-seven percent of the psychologists participating in this survey reported providing psychological testing services. This represents about a 10% drop from the level indicated by a similar 1992 survey published in the same newsletter. Also of interest in the more recent survey is the percentage of professional counselors (39%), marriage and family counselors (16%), psychiatrists (21%), and social workers (13%) offering these same services.
In a 1995 survey by the American Psychological Association's Committee for the Advancement of Professional Practice (Phelps, 1996), 14,000 psychological practitioners responded to questions related to workplace settings, areas of practice concerns, and range of activities. The largest group of respondents (40.7%) consisted of practitioners whose primary work setting was independent practice. Other general work settings (government, medical, academic, group practice) were fairly equally represented by the remainder of the sample. The principal professional activity reported by the respondents was psychotherapy, with 43.9% of the sample acknowledging involvement in this service. Assessment, which was reported by 14% of the sample, was the second most prevalent activity.

Differences in the two samples utilized in the aforementioned surveys may account for inconsistencies in findings. Psychologists who are subscribers to Psychotherapy Finances may represent that subsample of the APA survey respondents who are more involved in the delivery of clinical services. Certainly the fact that only about 44% of the APA respondents offer psychotherapy services supports this hypothesis. Regardless of the two sets of findings, psychological assessment does not appear to be utilized as much as in the past, and it is not necessary to look hard to determine at least one reason why. One of the major changes that has come about in the U.S. health care system during the past several years has been the creation and proliferation of managed care organizations (MCOs). The most significant direct effects of managed care include reductions in the length and amount of service, reductions in accessibility to particular modalities (e.g., reduced number of outpatient visits per case), and profession-related changes in
the types of services managed by behavioral health care providers (Oss, 1996). Overall, the impact of managed behavioral health care on the services offered by psychologists and other behavioral health care providers has been tremendous. In the APA survey reported earlier (Phelps, 1996), approximately 79% of the respondents reported that managed care had at least some impact on their work. How has managed care negatively impacted the use of psychological assessment? It is not clear from the results of the APA or Psychotherapy Finances surveys, but perhaps others can offer some insight. Ficken (1995) commented on how the advent of managed care has limited the reimbursement for (and therefore the use of) psychological assessment. In general, he saw the primary reason for this as a financial one. In an era of capitated behavioral health care coverage, the amount of money available for behavioral health care treatment is limited. MCOs therefore require a demonstration that the amount of money spent for testing will result in a greater amount of treatment cost savings. This author is unaware of any published or unpublished research to date that can provide this demonstration. In addition, Ficken noted that much of the information obtained from psychological assessment is not relevant to the treatment of patients within a managed care environment. Understandably, MCOs are reluctant to pay for gathering such information. Werthman (1995) provided similar insights into this issue, noting that managed care . . . has caused [psychologists] to revisit the medical necessity and efficacy of their testing practices. Currently, the emphasis is on the use of highly targeted and focused psychological and neuropsychological testing to sharply define the "problems" to be treated, the degree of impairment, the level of care to be provided and the treatment plan to be implemented. 
The high specificity and "problem-solving" approach of such testing reflects MCOs' commitment to effecting therapeutic change, as opposed to obtaining a descriptive narrative with scores. In this context, testing is perceived as a strong tool for assisting the primary provider in more accurately determining patient "impairments" and how to "repair" them. (p. 15)

In general, Werthman (1995) viewed psychological assessment as being no different from other forms of patient care, thus making it subject to the same scrutiny, the same demands for demonstrating medical necessity and/or utility, and the same consequent limitations imposed by MCOs on other covered services.

The foregoing representations of the current state of psychological assessment in behavioral health care delivery could be viewed as an omen of worse things to come. But they are not. Rather, the limitations being imposed on psychological assessment and the demand for justification of its use in clinical practice represent part of the customers' dissatisfaction with the way things were done in the past. In general, the tightening of the purse strings is a positive move for both behavioral health care and the profession of psychology. It is a wake-up call to those who have contributed to the health care crisis by uncritically performing costly psychological assessments, being unaccountable to the payers and recipients of those services, and generally not performing those services in the most responsible, cost-effective way possible. Providers need to evaluate how they have used psychological assessments in the past and then determine the best way to use them in the future. As such, it is an opportunity for providers to reestablish the value of the contributions they can make to improve the quality of care delivery through their knowledge and skills in the area of psychological assessment.
The sections that follow convey one vision of the present and future opportunities for psychological assessment in behavioral health care and the means of best achieving them. In doing so, the context for the remaining chapters is established. The views advanced are based on a knowledge of and experience in current psychological assessment practices as well as directions provided by the current literature. Some practitioners will disagree with the view put forth here, given their own experience and thinking on the matters discussed. However, it is hoped that even though in disagreement, they will be challenged to defend their position to themselves and, as a result, further refine their thinking and approach to the use of assessments within their practice.

Psychological Assessment as a Treatment Adjunct: An Overview

Traditionally, the role of psychological assessment in therapeutic settings has been quite limited. Those who did not receive their clinical training within the past few years probably were taught that the value of psychological assessment is found only at the "front end" of treatment. That is, they likely were instructed in the power and utility of psychological assessment as a means of assisting in the identification of symptoms and their severity, personality characteristics, and other aspects of the individual (e.g., intelligence, vocational interests) that are important in understanding and describing the patient at a specific point in time. Based on these data and information obtained from patient and collateral interviews, medical records, and the individual's stated goals for treatment, a diagnostic impression was given and a treatment plan was formulated and placed in the patient's chart, hopefully to be reviewed at various points during the course of treatment. In some cases, the patient was assigned to another practitioner within the same organization or referred out, never to be seen or contacted again, much less to be reassessed by the person who performed the original assessment.
Fortunately, during the past few years, psychological assessment has come to be recognized for more than just its usefulness at the beginning of treatment. Consequently, its utility has been extended beyond being a mere tool for describing an individual's current state, to a means of facilitating the treatment and understanding of behavioral health care problems throughout and beyond the episode of care. Generally speaking, several psychological tests that are now commercially available can be employed as tools to assist in clinical decision making and outcomes assessment, and, more directly, as treatment techniques in and of themselves. Each of these uses contributes value to the therapeutic process.

Psychological Assessment for Clinical Decision Making

Traditionally, psychological assessment has been used to assist psychologists and other behavioral health care clinicians in making important clinical decisions. The types of decision making for which it has been used include those related to screening, treatment planning, and monitoring of treatment progress. Generally, screening may be undertaken to assist either in identifying the patient's need for a particular service or in determining the likely presence of a particular disorder or other behavioral/emotional problems. More often than not, a positive finding on screening leads to a more extensive evaluation of the patient in order to confirm with greater certainty the existence of the problem, or to further delineate the nature of the problem. The value of screening lies in the fact
that it permits the clinician to quickly and economically identify, with a fairly high degree of confidence (depending on the particular instrumentation used), those who are likely to need care or at least further evaluation.

In many instances, psychological assessment is performed in order to obtain information that is deemed useful in the development of a specific treatment plan. Typically, this type of information is not easily (if at all) accessible through other means or sources. When combined with other information about the patient, information obtained from a psychological assessment can aid in understanding the patient, identifying the most important problems and issues that need to be addressed, and formulating recommendations about the best means of addressing them.

Another way psychological assessment plays a valuable role in clinical decision making is in treatment monitoring. Repeated assessment of the patient at regular intervals during the treatment episode can provide the clinician with feedback regarding therapeutic progress. Based on the findings, the therapist will be encouraged to either continue with the original therapeutic approach or, in the case of no change or exacerbation of the problem, modify or abandon the approach in favor of an alternate one.

Psychological Assessment as a Treatment Technique

The degree to which the patient is involved in the assessment process is changing. One reason for this is the relatively recent revision of the ethical standards of the American Psychological Association (American Psychological Association, 1992). This revision includes a mandate for psychologists to provide feedback to clients whom they assess. According to Ethical Standard 2.09, "Psychologists ensure that an explanation of the results is provided using language that is reasonably understandable to the person assessed or to another legally authorized person on behalf of the client" (p. 8).
Finn and Tonsager (1992) offered other reasons for the recent interest in providing patients with assessment feedback. These include the recognition of patients' right to see their medical and psychiatric health care records, as well as clinically and research-based findings and impressions suggesting that "therapeutic assessment" (described later) facilitates patient care. Finn and Tonsager also referred to Finn and Butcher's (1991) summary of potential benefits that may accrue from providing feedback to patients about their results. The benefits cited include increased feelings of self-esteem and hope, reduced symptomatology and feelings of isolation, increased self-understanding and self-awareness, and increased motivation to seek or be more actively involved in their mental health treatment. In addition, Finn and Martin (1997) noted that the therapeutic assessment process provides a model for relationships that can result in increased mutual respect, can lead to increased feelings of mastery and control, and can decrease feelings of alienation.

Recently, empirical studies and other published works have addressed the therapeutic benefits that can be realized directly from discussing psychological assessment results with the patient. Although not the focus of this book, the use of psychological assessment as a treatment also bears mention here. Therapeutic use of assessment generally involves a presentation of assessment results (including assessment materials such as test protocols, profile forms, and other assessment summary materials) directly to the patient, an elicitation of the patient's reactions to them, and an in-depth discussion of the meaning of the results in terms of patient-defined assessment goals. In essence, assessment data can serve as a catalyst for the therapeutic
encounter via the objective feedback that is provided to the patient, the patient self-assessment that is stimulated, and the opportunity for patient and therapist to arrive at mutually agreed on therapeutic goals. The use of psychological assessment as a means of therapeutic intervention has received particular attention primarily through the work of Finn and his associates (Finn, 1996a, 1996b; Finn & Martin, 1997; Finn & Tonsager, 1992). In discussing what he termed "therapeutic assessment" using the MMPI-2, Finn (1996b) outlined a procedure with a goal to "gather accurate information about clients . . . and then use this information to help clients understand themselves and make positive changes in their lives" (p. 3). Elaborating on this procedure and extending it to the use of any test, Finn and Martin (1997) described therapeutic assessment as collaborative, interpersonal, focused, time limited, and flexible. It is . . . very interactive and requires the greatest of clinical skills in a challenging role for the clinician. It is unsurpassed in a respectfulness for clients: collaborating with them to address their concerns (around which the work revolves), acknowledging them as experts on themselves and recognizing their contributions as essential, and providing to them usable answers to their questions in a therapeutic manner. Simply stated, Finn and his colleagues' therapeutic assessment procedure may be considered an approach to the assessment of mental health patients in which the patient is not only the primary provider of information needed to answer questions, but also is actively involved in formulating the questions to be answered by the assessment. Feedback regarding the results of the assessment is provided to the patient and is considered a primary, if not the primary, element to the assessment process. Thus, the patient becomes a partner in the assessment process and, as a result, therapeutic and other benefits accrue. 
Finn's clinical and research work primarily has focused on therapeutic assessment techniques using the MMPI-2. However, it appears that the same techniques can be employed with other instruments or batteries of instruments that provide multidimensional information relevant to patients' concerns. Thus, the work of Finn and his colleagues can serve as a model for deriving direct therapeutic benefits from the psychological assessment experience using any of several commercially available and public domain instruments.

Psychological Assessment for Outcomes Assessment

Currently, one of the most common reasons for conducting psychological assessment in the United States is to assess the outcomes of behavioral health care treatment. It is difficult to open a trade paper or health care newsletter, or to attend a professional conference, without being presented with a discussion about either how to "do outcomes" or what the results of a certain facility's outcomes study have revealed. The interest in and focus on outcomes assessment most probably can be traced to the continuous quality improvement (CQI) movement that was initially implemented in business and industrial settings. The impetus for the movement was a desire to produce quality products in the most efficient manner, resulting in increased revenues and decreased costs.

In health care, outcomes assessment has multiple purposes, not the least of which is as a tool for marketing the organization's services. Related to this, those provider organizations vying for lucrative contracts from a third party frequently must present outcomes data demonstrating the effectiveness of their services. Equally important are data that demonstrate patient satisfaction. But perhaps the most important potential
use of outcomes data within provider organizations (although not always recognized as such) is the knowledge it can yield about what does and does not work. In this regard, outcomes data can serve as a means for ongoing program evaluation. It is the knowledge obtained from outcomes data that, if attended to and acted on, can lead to improvement in the services offered by the organization. When used in this manner, outcomes assessment can become an integral component of the organization's CQI initiative. More importantly, however, for individual patients, outcomes assessment provides a means of objectively measuring how much improvement they have made from the time of treatment initiation to the time of treatment termination, and in some cases extending to some time after termination. Feedback to this effect may serve to instill in patients greater self-confidence and self-esteem, and/or a more realistic view of where they are (from a psychological standpoint) at that point in time. It also may serve as an objective indicator to patients of the need for continued treatment. The purpose of the foregoing discussion was to present a broad overview of psychological assessment as a multipurpose behavioral health care tool. Depending on the individual clinician or provider organization, it may be employed for one or more of the purposes just described. The preceding overview should provide a context for a better understanding of the more in-depth and detailed discussion about each of these applications that follows. Before beginning this discussion, however, it is important to briefly review the types of instrumentation most likely to be used in therapeutic psychological assessment, as well as the significant considerations and issues related to the selection and use of this instrumentation. This should further facilitate an understanding of what is presented in the remainder of the chapter. 
General Considerations for the Selection and Use of Assessment Instrumentation

Major test publishers regularly release new instrumentation for facilitating and evaluating behavioral health care treatment. Thus, availability of instrumentation for these purposes is not an issue. However, selection of the appropriate instrument(s) for one or more of the purposes already described is a matter requiring careful consideration. Inattention to an instrument's intended use, its demonstrated psychometric characteristics, its limitations, and other aspects related to its practical application can result in misguided treatment and potentially harmful consequences for a patient.

Several types of instruments could be used for the general assessment purposes described earlier. For example, neuropsychological instruments might be used to assess memory deficits that could impact the clinician's decision to perform further testing, the goals established for treatment, and the approach to treatment selected. Tests designed to provide estimates of level of intelligence might be used for the same purposes. It is beyond the scope of this chapter (and this book) to address, even in the most general way, all of the types of tests, rating scales, and other instrumentation that might be employed in a therapeutic environment. Instead, the focus here is on general classes of instrumentation that have the greatest applicability in the service of patient screening as well as in the planning, monitoring, and evaluation of psychotherapeutic interventions. To a limited extent, specific examples of such instruments are presented. This is followed by a brief overview of criteria and considerations that will assist clinicians in
selecting the best instrumentation for their intended purposes. Newman and Ciarlo present a more detailed discussion of this topic in chapter 5.

Instrumentation for Behavioral Health Care Assessment

The instrumentation required for any assessment application will depend on the general purpose(s) for which the assessment is being conducted and the level of informational detail that is required for those purpose(s). Generally, the types of instrumentation that would serve the purpose of assessment may be classified into one of four general categories. As already mentioned, other types of instrumentation are frequently used in clinical settings for therapeutic purposes. However, this discussion is limited to those more commonly used for screening, treatment planning, treatment monitoring, and outcome assessment.

Psychological/Psychiatric Symptom Measures. Probably the most frequently used instruments for each of the four stated purposes are measures of psychopathological symptomatology. These are the types of instruments on which the majority of the clinician's psychological assessment training has likely been focused. These instruments were developed to assess the problems that typically prompt people to seek treatment. There are several subtypes of these measures of psychological/psychiatric symptomatology. The first is the comprehensive multidimensional measure, which is typically a lengthy, multiscale instrument that measures and provides a graphical profile of the patient on several psychopathological symptom domains (e.g., anxiety, depression) or disorders (e.g., schizophrenia, antisocial personality). Also, summary indices sometimes are available to provide a more global picture of individuals with regard to their psychological status or level of distress.
Probably the most widely used and/or recognized of these measures are the Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1951) and its restandardized revision, the MMPI-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989), the Millon Clinical Multiaxial Inventory-III (MCMI-III; Millon, 1994), and the Personality Assessment Inventory (PAI; Morey, 1991). Multiscale instruments of this type can serve a variety of purposes that facilitate therapeutic interventions. They may be used on initial contact with the patient to screen for the need for service and, at the same time, yield information useful for treatment planning. Indeed, some such instruments (e.g., the MMPI-2) may make available supplementary, content-related, and/or special scales that can assist in addressing specific treatment considerations (e.g., motivation for treatment). Other multiscale instruments might be useful in identifying specific problems that may be unrelated to the patient's chief complaints (e.g., low self-esteem). They also can be administered numerous times during the course of treatment to monitor the patient's progress toward achieving established goals and to assist in determining what adjustments (if any) must be made to the clinician's approach. In addition, use of such an instrument in a pre- and posttreatment fashion can provide information related to the outcomes of an individual patient's treatment. At the same time, data obtained in this fashion can be analyzed with the results of other patients to evaluate the effectiveness of an individual therapist, a particular therapeutic approach, or an organization. Abbreviated multidimensional measures are similar to the comprehensive multidimensional measure in many respects. First, by definition, they contain multiple scales for
measuring a variety of symptom domains and disorders. They also may allow for the derivation of an index of the patient's general level of psychopathology or distress. In addition, they may be used for screening, treatment planning and monitoring, and outcomes assessment purposes just like the comprehensive instruments. The distinguishing feature of the abbreviated instrument is its length. Again, by definition, these instruments are relatively short, easy to administer, and (usually) easy to score. Their brevity does not allow for an in-depth assessment of patients and their problems, but this is not what these instruments were designed to do.

Probably the most widely used of these brief instruments are Derogatis' family of symptom checklist instruments. These include the original Symptom Checklist-90 (SCL-90; Derogatis, Lipman, & Covi, 1973) and its revision, the SCL-90-R (Derogatis, 1983). Both of these instruments contain a checklist of 90 psychological symptoms, most of which score on the instruments' nine symptom scales. An even briefer version has been developed for each of these instruments. The first is the Brief Symptom Inventory (BSI; Derogatis, 1992), which was derived from the SCL-90-R. In a health care environment that is cost-conscious and unwilling to make too many demands on a patient's time, this 53-item instrument is gaining popularity over its longer 90-item parent instrument. Similarly, a brief form of the original SCL-90 has been developed. Titled the Symptom Assessment-45 Questionnaire (SA-45; Strategic Advantage, Inc., 1996), its development did not follow Derogatis' approach to the development of the BSI; instead, cluster analytic techniques were used to select five items each for assessing each of the nine symptom domains found on the three Derogatis checklists.
The major strength of the abbreviated multiscale instruments is their ability to broadly and very quickly survey several psychological symptoms domains and disorders relative to the patient. Their value is most clearly evident in settings where both the time and dollars available for assessment services are quite limited. These instruments provide a lot of information quickly. Also, they are much more likely to be completed by patients than their lengthier counterparts. This last point is particularly important if there is an interest in monitoring treatment or assessing outcomes, both of which require at least two or more assessments to obtain the necessary information. Measures of General Health Status and Role Functioning. During the past decade, there has been an increasing interest in the assessment of health status in physical and behavioral health care delivery systems. Initially, this interest was shown primarily within those organizations and settings focused on the treatment of physical diseases and disorders. In recent years, behavioral health care providers have recognized the value of assessing the patient's general level of health. It is important to recognize that the term health means more than just the absence of disease or debility. According to the World Health Organization (WHO; as cited in Stewart & Ware, 1992), it also implies a state of well-being throughout the individual's physical, psychological, and social spheres of existence. Dickey and Wagenaar (1996) noted how this concept of health recognizes the importance of eliciting the patient's point of view in assessing health status. They also pointed to similar conclusions reached by Jahoda (1958) specific to the area of mental health. Here, individuals' self-assessment relative to how they feel they should be is an important component of "mental health." Measures of health status and physical functioning can be classified into one of two groups: generic and condition specific. 
Probably the most widely used and respected generic health status measures are the 36-item Medical Outcomes Study Short Form Health Scale (SF-36; Ware & Sherbourne, 1992; Ware, Snow, Kosinski, & Gandek,
1994) and the 39-item Health Status Questionnaire 2.0 (HSQ; Health Outcomes Institute, 1993; Radosevich, Wetzler, & Wilson, 1994). Aside from minor variations in the scoring of one of the instruments' scales (i.e., Bodily Pain) and the HSQ's inclusion of three depression screening items, the two measures are essentially identical. Each assesses eight dimensions (four addressing mental health-related constructs and four addressing physical health-related constructs) that reflect the WHO concept of "health."

Role functioning has recently gained attention as an important variable to address in the course of assessing the impact of a physical or mental disorder on an individual's life functioning. How the person's ability to work, perform daily tasks, or interact with others is affected by the disorder is important to know in devising a treatment plan and monitoring progress over time. The SF-36 and HSQ both address these issues with scales designed for this purpose. In response to concerns that even these two relatively brief objective measures are too lengthy for regular administration in clinical and research settings, a 12-item, abbreviated version of each has been developed. The SF-12 (Ware, Kosinski, & Keller, 1995) was developed for use in large-scale, population-based research where the monitoring of health status at a broad level is all that is required. Also, a 12-item version of the HSQ, the HSQ-12 (Radosevich & Pruitt, 1996), was developed for similar uses. (It is interesting that despite being derived from essentially the same instrument, there is only 50% item overlap between the two abbreviated instruments.) Both instruments are relatively new, but the data supporting their use that have been gathered to date are promising.

Condition-specific health status and functioning measures have been utilized for a number of years. Most have been developed for use with physical rather than mental disorders, diseases, or conditions.
However, condition-specific measures of mental health status and functioning are beginning to appear. A major source of this type of instrument is the Minnesota-based Health Outcomes Institute (HOI), a successor to the health care think tank InterStudy. In addition to the HSQ and the HSQ-12, HOI serves as the distributor/clearinghouse for the condition-specific "Technology of Patient Experience (TyPE) specifications." The available TyPEs that would be most useful to behavioral health care practitioners currently include those developed by a team of researchers at the University of Arkansas Medical Center for use with depressive, phobic, and alcohol/substance use disorders. TyPEs for other specific psychological disorders are currently under development at the University of Arkansas for distribution through HOI.

Quality of Life Measures. Andrews, Peters, and Teesson (1994) indicated that most of the definitions of the "quality of life" (QOL) describe a multidimensional construct encompassing physical, affective, cognitive, social, and economic domains. Objective measures of QOL focus on environmental resources required to meet a person's needs and can be completed by someone other than the patient. The subjective measures of QOL assess patients' satisfaction with the various aspects of their life and thus must be completed by patients. Andrews et al. (1994) indicated other distinctions in the QOL arena. One has to do with the differences between QOL and health-related quality of life, or HRQL; and (similar to the case with health status measures) the other has to do with the distinction between generic and condition-specific measures of QOL. QOL measures differ from HRQL measures in that the former assess the whole "fabric of life," whereas the latter assess quality of life as it is affected by a disease or disorder, or by its treatment. Generic measures are designed to assess aspects of life that are generally relevant to most people;
condition-specific measures are focused on aspects of the lives of particular disease/disorder populations. However, as Andrews et al. pointed out, there tends to be a great deal of overlap between generic and condition-specific QOL measures.

Service Satisfaction Measures. With the expanding interest in assessing the outcomes of treatment for the patient, it is not surprising to see an accompanying interest in assessing patients' and (in some instances) their family's satisfaction with the services received. In fact, many professionals and organizations equate satisfaction with outcomes and frequently consider it the most important outcome. In a recent survey of 73 behavioral health care organizations, 71% of the respondents indicated that their outcomes studies included measures of patient satisfaction (Pallak, 1994). Although service satisfaction is frequently viewed as an outcome, it should not be classified as such. Rather, it should be considered a measure of the overall therapeutic process, encompassing patients' (and at times, others') view of how the service was delivered, the capabilities and attentiveness of the service provider, the benefits of the service (if any), and any number of other selected aspects of the service they received. Patient satisfaction surveys do not answer the question, "What was the result of the treatment rendered to the patient?"; they do answer the question, "How did the patient feel about the treatment he or she received?" Thus, they serve an important program evaluation/improvement function.

The number of questionnaires currently in use to measure patient satisfaction is countless. This reflects the attempts of individual health care organizations to develop customized measures that assess variables important to their particular needs, which in turn reflects a response to outside demands to "do something" to demonstrate the effectiveness of their services.
Often, this "something" has not been evaluated to determine its basic psychometric properties. As a result, there exist numerous survey options that one may choose from, but very few that actually have demonstrated their validity and reliability as measures of service satisfaction. Fortunately, there are a few instruments that have been investigated for their psychometric integrity. Probably the most widely used and researched patient satisfaction instrument designed for use in behavioral health care settings is the eight-item version of the Client Satisfaction Questionnaire (CSQ-8; Attkisson & Zwick, 1982; Nguyen, Attkisson, & Stenger, 1983). The CSQ-8 was derived from the original 31-item CSQ (Larsen, Attkisson, Hargreaves, & Nguyen, 1979), which also yielded two longer 18-item alternate forms, the CSQ-18A and CSQ-18B (LeVois, Nguyen, & Attkisson, 1981). The more recent work of Attkisson and his colleagues at the University of California at San Francisco is the Service Satisfaction Scale-30 (SSS-30; Greenfield & Attkisson, 1989). The SSS-30 is a 30-item, multifactorial scale that yields information about several aspects of satisfaction with the services received, such as perceived outcome and manner and skill of the clinician.

Guidelines for Instrument Selection

Regardless of the type of instrument being considered for use in the therapeutic environment, clinicians frequently must choose between many product offerings. But what are the general criteria for the selection of any assessment instrument? What should guide the clinician's selection of an instrument for a specific purpose? As part of their training, psychologists and other mental health professionals have been educated about the
psychometric properties important to consider when determining the appropriateness of an instrument for its intended use. However, this is just one of several considerations that should be taken into account in evaluating an instrument for specific therapeutic use.

Guidance regarding instrument selection has been offered by many experts. Probably the most thorough and clinically relevant guidelines for the selection of psychological assessment instruments come from the National Institute of Mental Health (NIMH) supported work of Ciarlo, Brown, Edwards, Kiresuk, and Newman (1986). Newman, Ciarlo, and Carpenter present an updated summary and synopsis of this NIMH work in chapter 5. The criteria they describe deal generally with applications, methods and procedures, psychometric features, as well as cost and utility considerations. Although these selection criteria were originally developed for use in evaluating instruments for outcomes assessment purposes, most also have relevance in the selection of instrumentation for screening, treatment planning, and treatment monitoring.

The work of Ciarlo and his colleagues provides more extensive instrument selection guidelines than most. Others who have addressed the issue have arrived at recommendations that serve to reinforce and/or complement those listed in the NIMH document. For example, Andrews' work in Australia has led to significant contributions to the body of outcomes assessment knowledge. As part of this, Andrews et al. (1994) identified six general "qualities of consumer outcome measures" that are generally in concordance with those from the NIMH study: applicability, acceptability, practicality, reliability, validity, and sensitivity to change.
Ficken (1995) indicated that instruments used for screening purposes should (a) possess high levels of sensitivity and specificity to diagnostic criteria from the Diagnostic and Statistical Manual of Mental Disorders (4th ed., DSM-IV; American Psychiatric Association, 1994) or the most up-to-date version of the International Classification of Diseases (ICD); (b) focus on hard-to-detect (in a single office visit) but treatable disorders associated with imminent harm to self or others, significant suffering, and a decrease in productivity; (c) require no more than 10 minutes to administer; and (d) have an administration protocol that easily integrates into the organization's work flow. Other sources (Burlingame, Lambert, Reisinger, Neff, & Mosier, 1995; Schlosser, 1995; Sederer et al., 1996) discuss this matter further.

Psychological Assessment as a Tool for Screening

Among the most significant ways in which psychological assessment can contribute to the development of an economic and efficient behavioral health care delivery system is through its ability to quickly identify individuals in need of mental health or substance use treatment services, and/or to determine the likelihood of the presence of a specific disorder or condition. The most important aspect of any screening procedure is the efficiency with which it can provide information useful to clinical decision making. In the field of psychology, the most efficient and thoroughly investigated screening procedures involve the use of psychological test instruments. The power or utility of a psychological screener lies in its ability to determine, with a high level of probability, whether respondents do or do not have a particular disorder or condition, or whether they are or are not a member of a group with clearly defined characteristics.
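
As a rough illustration of what "a high level of probability" means in practice, a screener's classification accuracy against a criterion diagnosis is commonly summarized from a 2 x 2 table as sensitivity, specificity, positive predictive power (PPP), and negative predictive power (NPP). The following is a minimal sketch, not a procedure taken from this chapter: the class name, the function name, and all patient counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Hypothetical 2x2 table of screener results vs. a criterion diagnosis."""
    true_pos: int   # screened positive, disorder present
    false_pos: int  # screened positive, disorder absent
    false_neg: int  # screened negative, disorder present
    true_neg: int   # screened negative, disorder absent


def classification_stats(c: ScreeningCounts) -> dict[str, float]:
    """Return the four classification efficiency statistics as proportions."""
    return {
        # Share of patients WITH the disorder whom the screener flags.
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        # Share of patients WITHOUT the disorder whom the screener clears.
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        # Share of positive screens that are correct.
        "ppp": c.true_pos / (c.true_pos + c.false_pos),
        # Share of negative screens that are correct.
        "npp": c.true_neg / (c.true_neg + c.false_neg),
    }


# Illustrative (made-up) sample: 200 patients, 40 of whom have the disorder.
counts = ScreeningCounts(true_pos=32, false_pos=16, false_neg=8, true_neg=144)
stats = classification_stats(counts)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With these invented counts the screener detects 80% of true cases and clears 90% of non-cases, yet only two thirds of its positive results are correct. This is why positive findings are normally followed by further assessment rather than treated as a diagnosis. The sketch assumes every cell combination is nonzero; a real implementation would need to guard against division by zero.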
The most commonly used screeners in daily clinical practice are those designed to identify some specific aspect of psychological functioning or disturbance, or to provide a broad overview of the respondent's point-in-time mental status. Examples of problem-specific
screeners include the Beck Depression Inventory (BDI; Beck, Rush, Shaw, & Emery, 1979) and State-Trait Anxiety Inventory (STAI; Spielberger, 1983). Examples of screeners for more generalized psychopathology or distress include the SA-45 and BSI.

Research-Based Use of Psychological Screeners

The establishment of a system for screening for a particular disorder or condition involves determining what it is clinicians want to screen in or screen out, the level of probability at which they feel comfortable making that decision, and how many misclassifications (or what percentage of errors) they are willing to tolerate. Once it is decided what particular disorder or condition will be screened, they then must evaluate the instrument's classification efficiency statistics (sensitivity, specificity, positive predictive power [PPP], and negative predictive power [NPP]) to determine if a given instrument is suitable for the intended purpose(s). These statistics and related issues are described in detail in chapter 2.

Implementation of Screeners into the Daily Work Flow of Service Delivery

The utility of a screening instrument is only as good as the degree to which it can be integrated into an organization's daily regimen of service delivery. This, in turn, depends on a number of factors. The first is the degree to which the administration and scoring of the screener is quick and easy, and the amount of time required to train the provider's staff to successfully incorporate the screener into the daily work flow. The second factor relates to the instrument's use. Generally, screeners are developed to assist in determining the likelihood that the patient does or does not have the specific condition or characteristic the instrument is designed to identify.
Use for any other purpose (e.g., assigning a diagnosis based solely on screener results, determining the likelihood of the presence of other characteristics) only serves to undermine the integrity of the instrument in the eyes of staff, payers, and other parties with a vested interest in the screening process. The third factor has to do with the ability of the provider to act on the information obtained from the screener. It must be clear how the clinician should proceed based on the information available. The final factor is staff acceptance and commitment to the screening process. This comes only with a clear understanding of the importance of the screening, the usefulness of the obtained information, and how the screening process is to be incorporated into the organization's daily work flow. Ficken (1995) provided an example of how screeners can be integrated into an assessment system designed to assist primary care physicians in identifying patients with psychiatric disorders. This system (which also allows for the incorporation of practice guidelines) seems to take into account the first three utility-related factors mentioned earlier. It begins with the administration of a screener that is highly sensitive and specific to DSM- or ICD-related disorders. These screeners should require no more than 10 minutes to complete, and "their administration must be integrated seamlessly into the standard clinical routine" (p. 13). Somewhat similar to the sequence described by Derogatis and DellaPietra (1994), positive findings would lead to a second level of testing. Here, another screener that meets the same requirements as those for the first
screener and also affirms or rules out a diagnosis would be administered. Positive findings would lead to additional assessment for treatment planning purposes. Consistent with standard practice, Ficken recommended confirmation of screener findings by a qualified psychologist or physician.

Psychological Assessment as a Tool for Treatment Planning

Problem identification through the use of screening instruments is only one way in which psychological assessment can facilitate the treatment of behavioral health problems. When employed by a trained clinician, psychological assessment also can provide information that can greatly facilitate and enhance the planning of a specific therapeutic intervention for the individual patient. It is through the implementation of a tailored treatment plan that the patient's chances of problem resolution are maximized.

The importance of treatment planning has received significant attention during recent years. The reasons for this recognition were summarized (Maruish, 1990) as follows: "Among important and interrelated reasons . . . [are] concerted efforts to make psychotherapy more efficient and cost effective, the growing influence of 'third parties' (insurance companies and the federal government) that are called upon to foot the bill for psychological as well as medical treatments, and society's disenchantment with open-ended forms of psychotherapy without clearly defined goals" (p. iii).

The role that psychological assessment can play in planning a course of treatment for behavioral health care problems is significant. Butcher (1990) indicated that information available from instruments such as the MMPI-2 not only can assist in identifying problems and in establishing communication with the patient, it also can help ensure that the plan for treatment is consistent with the patient's personality and external resources.
In addition, psychological assessment may reveal potential obstacles to therapy, areas of potential growth, and problems of which the patient may not be consciously aware. Moreover, both Butcher (1990) and Appelbaum (1990) viewed testing as a means of quickly obtaining a second opinion. Other benefits of the results of psychological assessment identified by Appelbaum include assistance in identifying patient strengths and weaknesses, identification of the complexity of the patient's personality, and establishment of a reference point during the therapeutic episode.

The type of treatment-relevant information that can be derived from patient assessment and the manner in which it is applied are quite varied, a fact that will become evident later. Regardless, Strupp (see Butcher, 1990) probably provided the best summary of the potential contribution of psychological assessment to treatment planning, stating that "careful assessment of patient's personality resources and liabilities is of inestimable importance. It will predictably save money and avoid misplaced therapeutic effort; it can also enhance the likelihood of favorable treatment outcomes for suitable patients" (pp. v-vi).

Assumptions About Treatment Planning

The introduction to this section presented a broad overview of ways in which psychological assessment can assist in the development and successful implementation of treatment plans for behavioral health care patients. These and other benefits are discussed
in greater detail later. However, it is important to first clarify what treatment planning is and some of the general, implicit assumptions that can typically be made about this important therapeutic activity. For the purpose of this discussion, the term treatment planning is defined as that part of a therapeutic episode in which a set of goals for an individual presenting with mental health or substance abuse problems is developed, and the specific means by which the therapist or other resources will assist the patient in achieving those goals is identified. The following are some general assumptions underlying the treatment planning process:

1. Patients are experiencing behavioral health problems that have been identified either by themselves or another party. Common external sources of problem identification include a spouse, parent, teacher, employer, and the legal system.
2. Patients experience some degree of internal and/or external motivation to eliminate or reduce the identified problems. An example of external motivation to change is the potential loss of a job or dissolution of a marriage if problems are not resolved to the satisfaction of the other party.
3. The goals of treatment are tied either directly or indirectly to the identified problems.
4. The goals of treatment have definable criteria for achievement, are indeed achievable by the patient, and are developed by the patient in collaboration with the clinician.
5. The prioritization of goals is reflected in the treatment plan.
6. Patients' progress toward the achievement of the treatment goals can be tracked and compared against an expected path of improvement in either a formal or informal manner. This expected path of improvement may be based on the clinician's experience or (ideally) on objective data gathered on similar patients.
7. Deviations from the expected path of improvement will lead to a modification in the treatment plan, followed by subsequent monitoring to determine the effectiveness of the alteration.

These assumptions should not be considered exhaustive, nor are they likely to reflect what actually occurs in all situations. For example, some patients seen for therapeutic services have no motivation to change. As may be seen in juvenile detention settings or in cases where children are brought to treatment by the parents, their participation in treatment is forced, and they may exert no effort to change. In the more extreme cases, they might in fact engage in intentional efforts to sabotage the therapeutic intervention. In other cases, it is likely that some clinicians continue to identify and prioritize treatment goals without the direct input of the patient. Regardless, the previous assumptions have a direct bearing on the manner in which psychological assessment can best serve treatment-planning efforts.

The Benefits of Psychological Assessment for Treatment Planning

As pointed out earlier, there are several ways in which psychological assessment can assist in the planning of treatment for behavioral health care patients. The more common and evident contributions can be organized into four general categories: problem identification, problem clarification, identification of important patient characteristics, and monitoring of treatment progress.

Problem Identification. Probably the most common use of psychological assessment in the service of treatment planning is for problem identification. Often, the use of psychological testing per se is not needed to identify what problems patients are experiencing.
They either will tell the clinician directly without questioning, or they will admit their problem(s) while questioned during a clinical interview. However, this is not always the case. The value of psychological testing becomes apparent in those cases where patients are hesitant or unable to identify the nature of their problems. With a motivated and engaged patient who responds openly and honestly to items on a well-validated and reliable test, the process of identifying what led the patient to seek treatment may be greatly facilitated. Cooperation shown during testing may be attributable to the nonthreatening nature of questions presented on paper or a computer monitor (as opposed to those posed by another human being); the subtle, indirect qualities of the questions themselves (compared to those asked by the clinician); or a combination of these reasons. In addition, the nature of some of the more commonly used psychological test instruments allows for the identification of secondary, but significant, problems that might otherwise be overlooked. Multidimensional inventories such as the MMPI-2 and the PAI are good examples of these types of instruments. Moreover, these instruments may be sensitive to other patient symptoms, traits, or characteristics that may exacerbate or otherwise contribute to the patient's problems. Note that the type of problem identification described here is different from that conducted during screening. Whereas screening is focused on determining the presence or absence of a single problem, problem identification generally takes a broader view and investigates the possibility of the presence of multiple problem areas. At the same time, there also is an attempt to determine problem severity and the extent to which the problem area(s) affect the patient's ability to function. Problem Clarification. Psychological testing can often assist in the clarification of a known problem. 
Through tests designed for use with populations presenting problems similar to those of the patient, aspects of identified problems can be elucidated. Information gained from these tests can improve the patient's and clinician's understanding of the problem, and lead to the development of a better treatment plan. The three most important types of information that can be gleaned for this purpose are the severity of the problem, the complexity of the problem, and the degree to which the problem impairs the patient's ability to function in one or more life roles.

The manner in which a patient is treated depends a great deal on the severity of the problem. In particular, problem severity plays a significant role in determining the setting in which the behavioral health care intervention is provided. Those patients whose problems are so severe that they are considered a danger to themselves or others are often best suited for inpatient treatment, at least until dangerousness is no longer an issue. Similarly, problem severity may be a primary criterion that signals the necessity of evaluation for a medication adjunct to treatment. Severity also may have a bearing on the type of psychotherapeutic approach taken by the clinician. For example, it may be more productive for the clinician to take a supportive role with severe cases; all things being equal, a more confrontive approach may be more appropriate with patients whose problems are mild to moderate in severity.

As alluded to earlier, the problems of patients seeking behavioral health care services are frequently multidimensional. Patient and environmental factors that play into the formation and maintenance of a psychological problem, along with the problem's relation with other conditions, all contribute to its complexity. Knowing the complexity
of the target problem is invaluable in developing an effective treatment plan. Again, multidimensional instruments or batteries of tests, each measuring specific aspects of psychological dysfunction, serve this purpose well. As with problem severity, knowledge of the complexity of a patient's problems can help the clinician and patient in many aspects of treatment planning, including determination of appropriate setting, therapeutic approach, need for medication, and other important decisions.

However, possibly of equal importance to the patient and other concerned parties (wife, employer, school, etc.) is the extent to which these problems affect patients' ability to function in their role as parent, child, employee, student, friend, and so on. Information gathered from the administration of measures designed to assess role functioning clarifies the impact of the patient's problems and serves to establish role-specific goals. It also can identify other parties that may serve as potential allies in the therapeutic process. In general, the most important role-functioning domains to assess are those related to work or school performance, interpersonal relationships, and activities of daily living (ADLs).

Identification of Important Patient Characteristics. The identification and clarification of the patient's problems is of key importance in planning a course of treatment. However, there are numerous other types of patient information not specific to the identified problem that can be useful in planning treatment and easily identified through the use of psychological assessment instruments. The vast majority of treatment plans are developed or modified with consideration to at least some of these nonpathological characteristics. The exceptions are generally found with clinicians or programs that take a "one-size-fits-all" approach to treatment.
Probably the most useful type of information not specific to the identified problem that can be gleaned from psychological assessment is the identification of patient characteristics that can serve as assets or areas of strength for patients in working to achieve their therapeutic goals. For example, Morey and Henry (1994) pointed to the utility of the PAI's Nonsupport scale in identifying whether patients perceive an adequate social support network, this being a predictor of positive therapeutic progress. Other examples include "normal" personality characteristics, such as those that can be obtained from Gough, McClosky, and Meehl's Dominance (1951) and Social Responsibility scales (1952) developed for use with the MMPI/MMPI-2. Greene (1991) indicated that those with high scores on the Dominance scale are described as "being able to take charge of responsibility for their lives. They are poised, self-assured, and confident of their own abilities" (p. 209). Gough and his colleagues interpreted high scores on the Social Responsibility scale as being indicative of individuals who, among other things, trust the world, are self-assured and poised, and believe that individuals must carry their share of duties. Thus, scores on these and similar types of scales may reveal important aspects of patient functioning that can be used to effect therapeutic change.

Similarly, knowledge of the patient's weaknesses or deficits also may impact the type of treatment plan. Greene and Clopton (1994) provided numerous types of deficit-relevant information from the MMPI-2 Content scales that have implications for treatment planning. For example, a clinically significant score (T > 64) on the Anger scale should lead clinicians to consider the inclusion of training in assertiveness and/or anger control as part of the patient's treatment.
On the other hand, uneasiness in social situations, as suggested by a significantly elevated score on either the Low Self-Esteem or Social Discomfort scale, suggests that a supportive approach to the intervention would be beneficial, at least initially.
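
The deficit-based examples above amount to a simple decision rule: a content-scale T-score above a clinical cutoff points to a treatment consideration. A minimal Python sketch of that rule follows. It uses the T > 64 cutoff cited above; the scale keys (`ANG`, `LSE`, `SOD`), the function name, and the sample scores are hypothetical labels chosen for illustration, and applying the same cutoff to the Low Self-Esteem and Social Discomfort scales is an assumption, because the text says only that those scores are "significantly elevated." It is not a scoring or interpretive procedure taken from any test manual.

```python
# Illustrative only: maps elevated content-scale T-scores to the treatment
# considerations mentioned in the text. Scale keys and the shared cutoff
# are assumptions made for this sketch.
CLINICAL_CUTOFF = 64  # "clinically significant score (T > 64)" per the text

CONSIDERATIONS = {
    "ANG": "consider assertiveness and/or anger-control training",
    "LSE": "consider a supportive approach, at least initially",
    "SOD": "consider a supportive approach, at least initially",
}


def treatment_considerations(t_scores, cutoff=CLINICAL_CUTOFF):
    """Return {scale: suggestion} for each known scale whose T-score exceeds cutoff."""
    return {
        scale: CONSIDERATIONS[scale]
        for scale, t in t_scores.items()
        if scale in CONSIDERATIONS and t > cutoff
    }


if __name__ == "__main__":
    sample = {"ANG": 70, "LSE": 58, "SOD": 66}  # hypothetical scores
    for scale, note in treatment_considerations(sample).items():
        print(f"{scale}: {note}")
```

Output of this kind would serve only as a prompt for clinical judgment; the sketch leaves the decision itself with the clinician.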

Moreover, use of specially designed scales and procedures can provide information related to the patient's ability to become engaged in the therapeutic process. For example, the Therapeutic Reactance Scale (Dowd, Milne, & Wise, 1991) and the MMPI-2 Negative Treatment Indicators Content Scale developed by Butcher and his colleagues (Butcher, Graham, Williams, & Ben-Porath, 1989) may be useful in determining whether the patient is likely to resist therapeutic intervention. Morey and Henry (1994) presented algorithms utilizing PAI T-scores that may be useful in making statements about the presence of characteristics that bode well for the therapeutic endeavor (e.g., sufficient distress to motivate engagement in treatment, the ability to form a therapeutic alliance). Other types of patient characteristics that can be identified through psychological assessment have implications for selecting the best therapeutic approach for a given patient and thus can contribute significantly to the treatment planning process. Moreland (1996) pointed out how psychological assessment can assist in determining if the patient deals with problems through internalizing or externalizing behaviors. He noted that all things being equal, internalizers would probably profit most from an insight-oriented approach rather than a behaviorally oriented approach. The reverse would be true for externalizers. And through their work over the years, Beutler and his colleagues (Beutler & Clarkin, 1990; Beutler, Wakefield, & R.E. Williams, 1994; Beutler & O.B. Williams, 1995) identified several patient characteristics that are important to matching patients and treatment approaches for maximized therapeutic effectiveness. These are addressed in detail in chapter 3. Monitoring of Progress Along the Path of Expected Improvement. Information from repeated testing during the treatment process can help the clinician to determine if the treatment plan is appropriate for the patient at a given point in time. 
Thus, many clinicians use psychological assessment to determine whether their patients are showing the expected improvement as treatment progresses. If not, adjustments can be made. These adjustments may reflect the need for more intensive or aggressive treatment (e.g., increased number of psychotherapeutic sessions each week, addition of a medication adjunct), less intensive treatment (e.g., reduction or discontinuation of medication, transfer from inpatient to outpatient care), or a different therapeutic approach (e.g., changing from humanistic therapy to cognitive-behavioral therapy). Regardless, any modifications require later reassessment of the patient to determine if the treatment revisions have impacted patient progress in the expected direction. This process may be repeated any number of times. These "in-treatment" reassessments also can provide information relevant to the decision of when to terminate treatment. The goal of monitoring is to determine whether treatment is "on track" with expected progress at a given point in time. When and how often clinicians might assess the patient is dependent on a few factors. The first is the instrumentation. Many instruments are designed to assess the patient's status at the time of testing. Items on these measures are generally worded in the present tense (e.g., "I feel tense and nervous," "I feel that my family loves and cares about me"). Changes from one day to the next on the construct(s) measured by these instruments should be reflected in the test results. Other instruments, however, ask the patient to indicate if a variable of interest has been present, or how much or to what extent it has occurred during a specific time period in the past. The items usually are asked in the context of something like "During the past month, how often have you . . . ?" or "During the past week, to what extent has . . . ?" 
Readministration of these interval-of-time-specific measures or subsets of items within them should be undertaken only after a period of time equivalent to or longer than the
time interval to be considered in responding to the items. For example, an instrument that asks the patient to consider the extent to which certain symptoms have been problematic "during the past 7 days" should not be readministered for at least 7 days. The responses from a readministration that occurs less than 7 days after the first administration would include patients' consideration of their status during the previously considered time period. This may make interpretation of the change of symptom status (if any) from the first to the second administration difficult, if not impossible. Methods to determine if clinically significant change has occurred from one point in time to another have been developed and can be used for treatment monitoring. These methods are discussed in the outcomes assessment section of this chapter and in chapter 7. However, another approach to monitoring therapeutic change, referred to as the glide-path approach, may be superior. The term glide path refers to the narrow path of descent that airplanes must follow when landing. Deviation from the flight glide path requires corrections in the plane's speed, altitude, and/or attitude in order to land safely. R.L. Kane (personal communication, July 22, 1996) indicated that just as pilots have the instrumentation to alert them about the plane's position on the glide path, the clinician may use assessment instruments to track how well the patient is following the "glide path of treatment." The glide path, in this case, represents expected improvement over time in one or more measurable areas of functioning (e.g., symptom severity, social role functioning, occupational performance). The expectations would be based on objective data obtained from similar patients at various points during their treatment and would allow for minor deviations from the path. The end of the glide path is one or more specific goals that are part of the treatment plan. 
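
The two monitoring rules described above, the minimum readministration interval and the glide-path comparison, can be sketched in code. The following Python sketch is illustrative only: the function names, the tolerance band, and all numeric values are hypothetical. The text does not specify how expected scores or allowable deviations would be derived, beyond saying that they come from objective data on similar patients.

```python
from datetime import date


def can_readminister(last_administered: date, today: date, recall_days: int) -> bool:
    """True once at least `recall_days` have passed since the last administration.

    `recall_days` is the time frame named in the item stem
    (e.g., 7 for "during the past 7 days").
    """
    return (today - last_administered).days >= recall_days


def glide_path_status(observed: float, expected: float, tolerance: float) -> str:
    """Compare an observed score with the expected score at the same point in treatment.

    Assumes lower scores mean improvement (e.g., symptom severity).
    `tolerance` is a hypothetical allowance for minor deviations from the path.
    """
    if observed > expected + tolerance:
        return "above path: improving more slowly than expected"
    if observed < expected - tolerance:
        return "below path: improving faster than expected"
    return "on path"


if __name__ == "__main__":
    # A 7-day recall instrument given on March 1 (hypothetical dates).
    print(can_readminister(date(2024, 3, 1), date(2024, 3, 5), recall_days=7))
    print(can_readminister(date(2024, 3, 1), date(2024, 3, 8), recall_days=7))

    # Hypothetical expected and observed severity scores by session number.
    expected_by_session = {1: 30.0, 4: 24.0, 8: 18.0}
    observed_by_session = {1: 31.0, 4: 29.5, 8: 17.0}
    for session, expected in expected_by_session.items():
        status = glide_path_status(observed_by_session[session], expected, tolerance=3.0)
        print(session, status)
```

A status other than "on path" corresponds to the situation in which the clinician would consider adjusting the treatment plan and then reassessing.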
Thus, "arrival" at the end of the glide path signifies the attainment of specific treatment goal(s).

Psychological Assessment as a Tool for Outcomes Management

The 1990s have witnessed accelerating growth in the level of interest and development of behavioral health care outcomes programs. Cagney and Woods (1994) attributed this to four major factors. First, behavioral health care purchasers are asking for information regarding the value of the services they buy. Second, there is an increasing number of purchasers who are requiring a demonstration of patient improvement and satisfaction. Third, MCOs need data demonstrating that their providers render efficient and effective services. And fourth, outcomes information will be needed for the "quality report cards" that MCOs anticipate they will be required to provide in the future. In short, fueled by soaring health care costs, there has been an increasing need for providers to demonstrate that what they do is effective. And all of this has occurred within the context of the CQI movement, in which there have been similar trends in the level of interest and growth. As noted previously, the interest in and necessity for outcomes measurement and accountability in this era of managed care provides a unique opportunity for psychologists to use their training and skills in assessment (Maruish, 1994). However, the extent to which psychologists and other trained professionals become key and successful contributors to an organization's outcomes initiative (whatever that might be) will depend on their understanding of what "outcomes" and their measurement and applications are all about.

What are Outcomes? Before discussing outcomes, it is important to have a clear understanding of what is meant by the term. Experience has shown that its meaning varies depending on the source. Donabedian (1985) identified three dimensions of quality of care. The first is structure. This refers to various aspects of the organization providing the care, including how the organization is "organized," the physical facilities and equipment, and the number and professional qualifications of its staff. Process refers to the specific types of services provided to a given patient (or group of patients) during a specific episode of care. These might include various tests and assessments (e.g., psychological tests, lab tests, magnetic resonance imaging), therapeutic interventions (e.g., group psychotherapy, medication), and discharge planning activities. Processes that address treatment complications (e.g., drug reactions) also are included here. Outcomes, on the other hand, refers to the results of the specific treatment that was rendered. As for the relation between these three facets of quality of care, Brook, McGlynn, and Cleary (1996) noted that "if quality-of-care criteria based on structural or process data are to be credible, it must be demonstrated that variations in the attribute they measure lead to differences in outcome. If outcome criteria are to be credible, it must be demonstrated that differences in outcome will result if the processes of care under the control of health professionals are altered" (p. 966). The outcomes, or results, of treatment should not imply a change in only a single aspect of functioning. Treatment may impact multiple facets of a patient's life. Stewart and Ware (1992) identified five broad aspects of general health status: physical health, mental health, social functioning, role functioning, and general health perception. 
Treatment may affect each of these aspects of health in different ways, depending on the disease or disorder being treated and the effectiveness of the treatment. Some specific aspects of functioning related to these five areas of general health status that are commonly measured include feelings of well-being, psychological symptom status, use of alcohol and other drugs, functioning on the job or at school, marital/family relationships, utilization of health care services, and ability to cope.

In considering the types of outcomes that might be assessed in behavioral health care settings, a substantial number of clinicians probably would identify symptomatic change in psychological status as being the most important. However important change in symptom status may have been in the past, psychologists and other behavioral health care providers have come to realize that changes in many other aspects of functioning identified by Stewart and Ware (1992) are equally important indicators of treatment effectiveness. As Sederer et al. (1996) noted,

Outcome for patients, families, employers, and payers is not simply confined to symptomatic change. Equally important to those affected by the care rendered is the patient's capacity to function within a family, community, or work environment or to exist independently, without undue burden to the family and social welfare system. Also important is the patient's ability to show improvement in any concurrent medical and psychiatric disorder. . . . Finally, not only do patients seek symptomatic improvement, but they want to experience a subjective sense of health and well being. (p. 2)

A much broader perspective is offered in Faulkner and Gray's The 1995 Behavioral Outcomes and Guidelines Sourcebook (Migdail, Youngs, & Bengen-Seltzer, 1995):

Outcomes measures are being redefined from a vague "is the patient doing better?" to more specific questions, such as, "Does treatment work in ways that are measurably valuable to the
patient in terms of daily functioning level and satisfaction, to the payer in terms of value for each dollar spent, to the managed care organization charged with administering the purchaser's dollars, and to the clinician charged with demonstrating value for hours spent?" (p. 1)

Thus, "outcomes" holds a different meaning for each of the different parties who have a stake in behavioral health care delivery; what is measured generally depends on the purpose(s) for which outcomes assessment is undertaken. As is shown here, these vary greatly.

Outcomes Assessment: Measurement, Monitoring, and Management

Just as it is important to be clear about what is meant by outcomes, it is equally important to clarify the three general purposes for which outcomes assessment may be employed. The first is outcomes measurement. This involves nothing more than pre- and posttreatment assessment of one or more variables to determine the amount of change that has occurred (if any) in these variables as a result of therapeutic intervention. A more useful approach is that of outcomes monitoring. This refers to "the use of periodic assessment of treatment outcomes to permit inferences about what has produced change" (Dorwart, 1996, p. 46). Like treatment progress monitoring used for treatment planning, outcomes monitoring involves the tracking of changes in the status of one or more outcomes variables at multiple points in time. Assuming a baseline assessment at the beginning of treatment, reassessment may occur one or more times during the course of treatment (e.g., weekly, monthly), at the time of termination, and/or during one or more periods of posttermination follow-up. Whereas treatment progress monitoring is used to determine deviation from the expected course of improvement, outcomes monitoring focuses on revealing aspects about the therapeutic process that seem to affect change. The third, and most useful, purpose of outcomes assessment is that of outcomes management.
Dorwart (1996) defined outcomes management as "the use of monitoring information in the management of patients to improve both the clinical and administrative processes for delivering care" (pp. 46-47). Whereas Dorwart appeared to view outcomes management as relevant to the individual patient, it is a means to improve the quality of services offered to the patient population(s) served by the provider, not to any one patient. Information gained through the assessment of patients can provide the organization with indications of what works best, with whom, and under what set of circumstances, thus helping to improve the quality of services for all patients. In essence, outcomes management can serve as a tool for those organizations with an interest in implementing a CQI initiative (discussed later).

The Benefits of Outcomes Assessment

The implementation of any type of outcomes assessment initiative within an organization does not come without effort from and cost to the organization. However, if implemented properly, all interested parties (patients, clinicians, provider organizations, payers, and the health care industry as a whole) should find the yield from the outlay of time and money to be substantial. Cagney and Woods (1994) identified several benefits to patients, including enhanced health and quality of life, improved health care quality, and effective use of the dollars paid into benefits plans. For providers, the outcomes data can result in
improved clinical skills, information related to the quality of care provided and to local practice standards, increased profitability, and decreased concerns over possible litigation. Outside of the clinical context, benefits also can accrue to payers and MCOs. Cagney and Woods (1994) saw the potential payer benefits as including healthier workers, improved health care quality, increased worker productivity, and reduced or contained health care costs. As for MCOs, the benefits include increased profits, information that can shape the practice patterns of their providers, and a decision-making process based on delivering quality care.

The Therapeutic Use of Outcomes Assessment

The foregoing overview provides the background necessary for discussing the use of outcomes data from psychological assessment in day-to-day clinical practice. Whereas the focus of the previous review was centered on both the individual patient and patient populations, it now will narrow to the use of outcomes assessment primarily in service to the individual patient. The reader interested in issues related to large, organization-wide outcomes studies conducted for outcomes management purposes (as defined earlier) is referred to chapter 8. The reader also is encouraged to seek other sources of information that specifically address that topic (see, e.g., Migdail et al., 1995; Newman, 1994). There is no one system or approach to the assessment of treatment outcomes for an individual patient that is appropriate for all providers of behavioral health care services. Because of the various types of outcomes of interest, the reasons for assessing them, and the manner in which they may impact decisions, any successful and useful outcomes assessment approach must be customized. Customization should reflect the needs of the primary beneficiary of the assessment information (i.e., patient, payer, or provider), with consideration to the secondary stakeholders in the therapeutic endeavor.
Ideally, the identified primary beneficiary would be the patient. Although this is not always the case, it appears that only rarely would the patient not benefit from involvement in the outcomes assessment process. Following are considerations and recommendations for the development and implementation of an outcomes assessment initiative by behavioral health care providers. Although space limitations do not allow a comprehensive review of all issues and solutions, the information that follows can be useful to psychologists and others with similar training who wish to incorporate outcomes assessment into their standard therapeutic routine.

Purpose of the Outcomes Assessment. There are numerous reasons for assessing outcomes. For example, in a recent survey of 73 behavioral health care organizations, the top five reasons (in descending order) identified by the participants as to why they had conducted an outcomes program were evaluation of outcomes for patients, evaluation of provider effectiveness, evaluation of integrated treatment programs, management of individual patients, and support of sales and marketing efforts (Pallak, 1994). However, from the clinician's standpoint, a couple of purposes are worth noting. In addition to monitoring the course of progress during treatment, clinicians may employ outcomes assessment to obtain a direct measure of how much patient improvement has occurred as the result of a completed course of treatment intervention. Here, the findings are of more benefit to the clinician than to patients themselves because a pre- and posttreatment approach to the assessment is utilized. The information will not lead to any change in the patient being assessed, but the feedback it provides to clinicians could assist them in the treatment of other patients in the future.

Another common reason for outcomes assessment is to demonstrate the patient's need for therapeutic services beyond that which is typically covered by the patient's health care benefits. When assessment is conducted for this reason, the patient and the clinician both may benefit from the outcomes data. However, the type of information that a third-party payer requires for authorization of extended benefits may not always be useful, relevant, or beneficial to the patient or the clinician. What to Measure. The specific aspects or dimensions of patient functioning that are measured as part of outcomes assessment will depend on the purpose for which the assessment is being conducted. As discussed earlier, probably the most frequently measured variable is that of symptomatology or psychological/mental health status. After all, disturbance or disruption in this dimension is probably the most common reason why people seek behavioral health care services in the first place. However, there are other reasons for seeking help. Common examples include difficulties in coping with various types of life transitions (e.g., a new job, a recent marriage or divorce, other changes in the work or home environment), an inability to deal with the behavior of others (e.g., spouse, children), or general dissatisfaction with life. Additional assessment of related variables therefore may be necessary, or may even take precedence over the assessment of symptoms or other indicators. Nevertheless, in the vast majority of the cases seen for behavioral health care services, the assessment of the patient's overall level of psychological distress or disturbance will yield the most singularly useful information. This is regardless of whether it is used for outcomes measurement, outcomes monitoring, outcomes management, or to meet the requirements of third-party payers. 
Indices such as the Positive Symptom Total (PST) or Global Severity Index (GSI), which are part of both the SA-45 and the BSI, can provide this type of information efficiently and economically. For some patients, measures of one or more specific psychological disorders or symptom clusters are at least as important, if not more important, than overall symptom or mental health status. Here, if interest is in only one disorder or symptom cluster (e.g., depression), clinicians may choose to measure only that particular set of symptoms using an instrument designed specifically for that purpose (e.g., use of the BDI with depressed patients). For those interested in assessing the outcomes of treatment relative to multiple psychological dimensions, the administration of more than one disorder-specific instrument or a single, multiscale instrument that assesses all or most of the dimensions of interest would be required. Again, instruments such as the SA-45 or the BSI can provide a quick, broad assessment of several symptom domains. Although much lengthier, other multiscale instruments, such as the MMPI-2 or the PAI, permit a more detailed assessment of multiple disorders or symptom domains using one inventory. In many cases, the assessment of mental health status alone is adequate for outcomes assessment purposes. There are other instances in which changes in psychological distress or disturbance either (a) provide only a partial indication of the degree to which therapeutic intervention has been successful and are not of interest to the patient or a third-party payer; (b) are unrelated to the reason why the patient sought services in the first place; or (c) are otherwise inadequate or unacceptable as measures of improvement in the patient's condition. One may find that for some patients, improved functioning on the job, at school, or with family or friends is much more relevant and important than symptom reduction. 
For other patients, improvement in their quality of life or sense of well-being is more meaningful.

It is not always a simple matter to determine exactly what should be measured. However, careful consideration of the following questions should greatly facilitate the decision:

1. Why did the patient seek services? People pursue treatment for many reasons. The patient's stated reason for seeking therapeutic assistance may be the first clue in determining what is important to measure.
2. What does the patient hope to gain from treatment? The patients' stated goals for the treatment they are about to receive may be a primary consideration in the selection of outcomes to assess.
3. What are the patient's criteria for successful treatment? The patient's goals for treatment may provide only a broad target for the therapeutic intervention. Having the patient identify exactly what would have to happen to consider treatment successful and no longer necessary would help in specifying the most important constructs and/or behaviors to assess.
4. What are the clinician's criteria for the successful completion of the current therapeutic episode? What patients identify as being important to accomplish during treatment might reflect a lack of insight into their problems, or it might be inconsistent with what an impartial observer would consider indicative of meaningful improvement. In such cases, it probably would be more appropriate for the clinician to determine what constitutes therapeutic success and the associated outcomes variables.
5. What are the criteria for the successful completion of the current therapeutic episode by significant third parties? From a strict therapeutic perspective, this should be given the least amount of consideration. From a more realistic perspective, the expectations and limitations that one or more third parties have for the treatment rendered cannot be overlooked. The expectations and limitations set by the patient's parents/guardian, significant other, health care plan, guidelines of the organization in which the clinician practices, and possibly other external forces may significantly play into the decision about when to terminate treatment.
6. What, if any, are the outcomes initiatives within the provider organization? One cannot ignore any outcomes programs that have been initiated by the organization in which the therapeutic services are delivered. Regardless of the problems and goals of the individual patient, organization-wide studies of treatment effectiveness may dictate the gathering of specific types of outcomes data from patients who have received services.

Note that the selection of the variables to be assessed may address more than one of the previous issues. Ideally, this is what should happen. However, clinicians need to ensure that the task of gathering outcomes data does not become too burdensome. As a general rule, the more outcomes data they attempt to gather from a given patient or collateral, the less likely it is that they will obtain any data at all. The key is to identify the point where the amount of data that can be obtained from a patient and/or collaterals and the ease with which it can be gathered are optimized.

How to Measure. Once the decision of what to measure has been made, clinicians must then decide how it should be measured. In many cases, the most important data will be that obtained directly from the patient using self-report instruments. Underlying this assertion is the assumption that valid and reliable instrumentation, appropriate to the needs of the patient, is available to the clinician; the patient can read at the level required by the instruments; and the patient is motivated to respond honestly to the questions asked. Barring one or more of these conditions, other options should be considered. Other types of data-gathering tools may be substituted for self-report measures.
Rating scales completed by the clinician or other members of the treatment staff may provide information that is as useful as that elicited directly from the patient. In those
cases in which the patient is severely disturbed, unable to give valid and reliable answers (e.g., younger children), unable to read, or otherwise an inappropriate candidate for a self-report measure, clinical rating scales can serve as a valuable substitute for gathering information about the patient. Related to these instruments are parent-completed inventories for child and adolescent patients. These are particularly useful in obtaining information about the child's or teen's behavior that might not otherwise be known.

Collateral rating instruments and parent report instruments can also be used to gather information in addition to that obtained from self-report measures. When used in this manner, these instruments provide a mechanism by which the clinician, other treatment staff, and/or parents, guardians, or other collaterals can contribute data to the outcomes assessment endeavor. This not only results in the clinician or provider organization having more information on which to evaluate the outcomes of therapeutic intervention, but also gives the clinician an opportunity to ensure that the perspectives of the treatment provider and/or relevant third parties are considered in this evaluation.

Another potential source of outcomes information is administrative data. In many of the larger provider organizations, this information can easily be retrieved through the organization's management information systems (MISs). Data related to the patient's diagnosis, dose and regimen of medication, physical findings, course of treatment, resource utilization, treatment costs, and other types of data typically stored in these systems can be useful in evaluating the outcomes of therapeutic intervention.

When to Measure. There are no hard and fast rules or widely accepted conventions related to when outcomes should be assessed. The common practice is to assess the patient at least at treatment initiation and termination/discharge.
Obviously, at the beginning of treatment, the clinician should obtain a baseline measure of whatever variables will be measured at termination. At minimum, this allows for "outcomes measurement" as previously described. As has also been discussed, additional assessment of the patient on the variables of interest can take place at other points in time, that is, at other times during treatment and on postdischarge follow-up.

Many would argue that postdischarge/posttermination follow-up assessment provides the best or most important indication of the outcomes of therapeutic intervention. Two types of comparisons may be made on follow-up. The first is a comparison of the patient's status on the variables of interest (either at the time of treatment initiation or at the time of discharge or termination) to that of the patient at the time of follow-up assessment. Either way, this follow-up data will provide an indication of the more lasting effects of the intervention. Generally, the variables of interest for this type of comparison include symptom presence and intensity, feeling of well-being, frequency of substance use, and social or role functioning.

The second type of posttreatment investigation involves comparing the frequency or severity of some aspect(s) of the patient's life circumstances, behavior, or functioning that occurred during an interval of time prior to treatment, to that which occurred during an equivalent period of time immediately preceding the postdischarge assessment. This approach is commonly used in determining the medical cost offset benefits of treatment. For example, the number of times a patient has been seen in an emergency room for psychiatric problems during the 3-month period preceding the initiation of outpatient treatment can be compared to the number of emergency room visits during the 3-month period preceding the postdischarge follow-up assessment.
Not only can this provide an indication of the degree to which treatment has helped patients deal with their problems, it also can demonstrate how much medical expenses have been reduced through the patients' decreased use of costly emergency room services.
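The equal-interval comparison just described can be sketched in a few lines of Python. This is an illustration only, not a procedure given in the text: the `IntervalCount` record, the `cost_offset` function, the follow-up count of 4 visits, and the per-visit cost of 1,200 are all hypothetical. The only idea taken from the discussion above is that the baseline and follow-up counts must cover reference intervals of the same length.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IntervalCount:
    """Events (e.g., emergency room visits) counted over a reference window."""
    events: int
    window_days: int


def cost_offset(baseline: IntervalCount, follow_up: IntervalCount,
                cost_per_event: float) -> Tuple[int, float]:
    """Return (reduction in events, estimated savings).

    The two windows must be the same length; otherwise the counts
    are not comparable and a ValueError is raised.
    """
    if baseline.window_days != follow_up.window_days:
        raise ValueError("baseline and follow-up windows must be equal in length")
    reduction = baseline.events - follow_up.events
    return reduction, reduction * cost_per_event


# Hypothetical figures: 10 visits in the 90 days before treatment,
# 4 visits in the 90 days before follow-up, 1,200 per visit.
pre_treatment = IntervalCount(events=10, window_days=90)
pre_follow_up = IntervalCount(events=4, window_days=90)
fewer_visits, savings = cost_offset(pre_treatment, pre_follow_up, 1200.0)
print(fewer_visits, savings)
```

A negative reduction would indicate more events at follow-up than at baseline; the sketch reports it as is rather than treating it as an error.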

In general, postdischarge outcomes assessment probably should take place no sooner than 1 month after treatment has ended. When feasible, waiting 3 to 6 months to assess the variables of interest is preferred. A longer interval between discharge and postdischarge follow-up should provide a more valid indication of the lasting effects of treatment. Assessments being conducted to determine the frequency with which some behavior or event occurs (as may be needed to determine cost offset benefits) should be administered only after a period at least as long as the reference time interval used in the baseline assessment has elapsed since treatment ended. For example, suppose that the patient reports 10 emergency room visits during the 3-month period prior to treatment. If a clinician wants to know whether the patient's emergency room visits have decreased after treatment, the assessment cannot take place any earlier than 3 months after treatment termination.

How to Analyze Outcomes Data. There are two general approaches to the analysis of treatment outcomes data. The first is by determining whether changes in patient scores on outcomes measures are statistically significant. The other is by establishing whether these changes are clinically significant. Use of standard tests of statistical significance is important in the analysis of group or population change data. This topic is addressed in chapter 8. Clinical significance is more relevant to change in the individual patient's scores. As this chapter's focus is on the individual patient, this section centers on matters related to determining clinically significant change as the result of treatment.

The issue of clinical significance has received a great deal of attention in psychotherapy research during the past several years. This is at least partially owing to the work of Jacobson and his colleagues (Jacobson, Follette, & Revenstorf, 1984, 1986; Jacobson & Truax, 1991) and others (e.g., Christensen & Mendoza, 1986; Speer, 1992; Wampold & Jenson, 1986).
Their work came at a time when researchers began to recognize that traditional statistical comparisons do not reveal a great deal about the efficacy of therapy. In discussing the topic, Jacobson and Truax broadly defined the clinical significance of treatment as "its ability to meet standards of efficacy set by consumers, clinicians, and researchers" (p. 12). Further, they noted that while there is little consensus in the field regarding what these standards should be, various criteria have been suggested:

a high percentage of clients improving . . .; a level of change that is recognizable by peers and significant others . . .; an elimination of the presenting problem . . .; normative levels of functioning at the end of therapy . . .; high end-state functioning at the end of therapy . . .; or changes that significantly reduce one's risk for various health problems. (p. 12)

From their perspective, Jacobson and his colleagues (Jacobson et al., 1984; Jacobson & Truax, 1991) felt that clinically significant change could be conceptualized in one of three ways. Thus, for clinically significant change to have occurred, the measured level of functioning following the therapeutic episode would either fall outside the range of the dysfunctional population by at least two standard deviations from the mean of that population, in the direction of functionality; fall within two standard deviations of the mean for the normal or functional population; or be closer to the mean of the functional population than to that of the dysfunctional population. Jacobson and Truax viewed the third option as being the least arbitrary, and they provided different recommendations for determining cutoffs for clinically significant change, depending on the availability of normative data. At the same time, these same investigators noted the importance of considering the change in the measured variables of interest from pre- to posttreatment in addition to
the patient's functional status at the end of therapy. To this end, Jacobson et al. (1984) proposed the concomitant use of a reliable change (RC) index to determine whether change is clinically significant. This index, modified on the recommendation of Christensen and Mendoza (1986), is nothing more than the difference between the pretest and posttest scores (pretest minus posttest) divided by the standard error of the difference of the two scores. The RC index and its use are discussed in detail in chapter 4.

Psychological Assessment as a Tool for Continuous Quality Improvement

Implementing a regimen of psychological testing for planning treatment and/or assessing its outcome has a place in all organizations where the delivery of cost-efficient, quality behavioral health care services is a primary goal. However, additional benefits can accrue from testing when it is incorporated within an ongoing program of service evaluation and continuous quality improvement (CQI).

Although espoused by Americans, the CQI philosophy was initially implemented by the Japanese in rebuilding their economy after World War II. Today, many U.S. organizations have sought to balance quality with cost by implementing CQI procedures. Simply put, CQI may be viewed as a process of continuously setting goals, measuring progress toward the achievement of those goals, and subsequently reevaluating them in light of the progress made.

Underlying the CQI process are a few simple assumptions. First, those organizations that can produce high quality products or services at the lowest possible cost have the best chance of surviving and prospering in today's competitive market. Second, it is less costly to prevent errors than to correct them, and the process of preventing errors is a continuous one. Third, it is assumed that the workers within the organization are motivated and empowered to improve the quality of their products or services based on the information they receive about their work.
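Returning for a moment to the reliable change (RC) index and the third Jacobson and Truax criterion described under How to Analyze Outcomes Data, the arithmetic can be sketched in Python as follows. The sketch makes several assumptions that are not stated in this chapter: the standard error of the difference is taken as the square root of twice the squared standard error of measurement, and the cutoff between populations is weighted by the two standard deviations, which is how these quantities are usually attributed to Jacobson and Truax (1991); an absolute RC above 1.96 is the conventional threshold for reliable change. All function names and all scores, standard deviations, and reliabilities are hypothetical, and lower scores are assumed to mean better functioning.

```python
import math


def standard_error_of_difference(sd_pre: float, reliability: float) -> float:
    """Assumed form: sqrt(2 * SE^2), with SE = sd_pre * sqrt(1 - reliability)."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)
    return math.sqrt(2.0 * se_measurement ** 2)


def reliable_change_index(pre: float, post: float,
                          sd_pre: float, reliability: float) -> float:
    """Pretest minus posttest, divided by the standard error of the difference."""
    return (pre - post) / standard_error_of_difference(sd_pre, reliability)


def cutoff_c(mean_functional: float, sd_functional: float,
             mean_dysfunctional: float, sd_dysfunctional: float) -> float:
    """Assumed cutoff: the point between the two population means,
    weighted by the standard deviations of the two populations."""
    return ((sd_functional * mean_dysfunctional
             + sd_dysfunctional * mean_functional)
            / (sd_functional + sd_dysfunctional))


# Hypothetical patient and norms on a symptom scale (lower = better).
rc = reliable_change_index(pre=30.0, post=15.0, sd_pre=10.0, reliability=0.84)
cutoff = cutoff_c(mean_functional=10.0, sd_functional=5.0,
                  mean_dysfunctional=30.0, sd_dysfunctional=10.0)
reliably_changed = abs(rc) > 1.96          # conventional threshold (assumption)
in_functional_range = 15.0 < cutoff        # posttest is on the functional side
print(round(rc, 2), round(cutoff, 2), reliably_changed, in_functional_range)
```

Under these assumptions a patient would be counted as showing clinically significant change only when both conditions hold: the change is larger than measurement error alone would explain, and the posttest score falls on the functional side of the cutoff.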
More information about CQI can be found in several sources (e.g., Berwick, 1989; Dertouzos, Lester, & Solow, 1989; Donabedian, 1980, 1982, 1985; P.L. Johnson, 1989; Scherkenback, 1987; Shewhart, 1939; Walton, 1986).

A continuous setting, measurement, and reevaluation of goals, characteristic of the CQI process, is being employed by many health care organizations as part of their efforts to survive in a competitive, changing market. At least in part, this move also reflects what InterStudy (a predecessor of the Health Outcomes Institute) described as a "shifting from concerns about managing costs in isolation to a more comprehensive view that supplements an understanding of costs with an understanding of the quality and value of care delivered" (1991, p. 1). InterStudy defined quality as a position or view that should lead all processes within a system. In the case of the health care system, the most crucial of these processes is that of patient care. InterStudy pointed out that with a CQI orientation, these processes must be well-defined, agreed on, and implemented unvaryingly when delivering care. They also should provide measurable results that will subsequently lead to conclusions about how the processes might be altered to improve the results of care. InterStudy considered CQI as implying "a system that articulates the connections between inputs and outputs, between processes and outcomes . . ., a way of organizing information in order to discover what works, and what doesn't" (p. 1).

In behavioral health care, as in other arenas of health care, CQI is concerned with the services delivered to customers. Here, the "customer" may include not only the patient being treated, but also the employer through whom the health care plan is offered and the third-party payer who selects or approves the service providers who can be accessed by individuals seeking care under the health care plan. It should be evident from the discussion presented throughout this chapter that psychological testing can help the provider focus on delivering the most efficient and effective treatment in order to satisfy the needs of all "customers." It thus can contribute greatly to the CQI effort.

Perhaps the most apparent way in which testing can augment the CQI process is through its contributions in the area of outcomes assessment. Through the repeated administration of tests to all patients at intake and later at one or more points during or after the treatment process, an organization can obtain a good sense of how effective individual clinicians, treatment programs/units, and/or the organization as a whole are in providing services to their patients. This testing might include the use of not only problem-oriented measures but also measures of patient satisfaction. Considered in light of other, nontest data, this may result in changes in service delivery goals such as the implementation of more effective problem identification and treatment planning procedures. Consider, for example, Newman's (1991) graphic demonstration of how data used to support treatment decisions can be extended to indicate how various levels of depression (as measured by the Beck Depression Inventory) may be best served by different types of treatment (e.g., inpatient vs. outpatient).

Future Directions

The ways in which psychologists and other behavioral health care clinicians conduct the types of psychological assessment described here have continued to undergo dramatic changes during the 1990s.
This should come as no surprise to anyone who spends a few minutes a day skimming the newspaper or watching the evening news. The health care revolution started gaining momentum at the beginning of the decade and has not slowed down since that time, and there are no indications that it will subside in the foreseeable future. From the beginning, there was no real reason to think that behavioral health care would be spared from the effects of the health care revolution, and there is no good reason why it should have been spared. The behavioral health care industry certainly has contributed its share of waste, inefficiency, and lack of accountability to the problems that led to the revolution. Now, like other areas of health care, it is forced to "clean up its act." Although some consumers of mental health or chemical dependency services have benefited from the revolution, others have not. Regardless, the way in which health care is delivered and financed has changed, and psychologists and other behavioral health care professionals must adapt to survive in the market.

Some of those involved in the delivery of psychological assessment services may wonder (with some fear and trepidation) where the revolution is leading the behavioral health care industry and, in particular, how their ability to practice will be affected. At the same time, others are eagerly awaiting the inevitable advances in technology and other resources that will come with the passage of time. What ultimately will occur is open to speculation. However, close observation of the practice of psychological assessment and the various industries that support it has led to a few predictions as to where
the field of psychological assessment is headed and the implications for patients, clinicians, and provider organizations.

What the Industry is Moving Away from

One way of discussing what the field is moving toward is to first talk about what it is moving away from. In the case of psychological assessment, two trends are becoming quite clear. First, starting from the early 1990s, the use of (and reimbursement for) psychological assessment has gradually been curtailed. In particular, this has been the case with regard to indiscriminate administration of lengthy and expensive psychological test batteries. Payers began to demand evidence that the knowledge gained from the administration of these instruments in fact contributes to the delivery of cost-effective, efficient care to patients. There seem to be no indications that this trend will stop.

Second, the form of assessment commonly used is moving away from lengthy, multidimensional objective instruments (e.g., MMPI) or time-consuming projective techniques (e.g., Rorschach) that previously represented the standard in practice. The type of assessment authorized now usually involves the use of brief, inexpensive, yet well-validated problem-oriented instruments. This reflects modern behavioral health care's time-limited, problem-oriented approach to treatment. Today, the clinician can no longer afford to spend a great deal of time in assessment when the patient is only allowed a limited number of payer-authorized sessions. Thus, brief instruments will become more commonly employed for problem identification, progress monitoring, and outcomes assessment in the foreseeable future.

Trends in Instrumentation

In addition to the move toward the use of brief, problem-oriented instruments, another trend in the selection of instrumentation is the increasing use of public domain tests, questionnaires, rating scales, and other measurement tools.
Typically, these "free-use" instruments were not developed with the same rigor that is applied by commercial test publishers in the development of psychometrically sound instruments. Consequently, they commonly lacked the validity and reliability data necessary to judge their psychometric integrity. Recently, however, there has been significant improvement in the quality and documentation of the public domain and other free-use tests that are available. Instruments such as the SF-36/SF-12 and HSQ/HSQ-12 health measures are good examples of such tools. These and instruments such as the Behavior and Symptom Identification Scale (BASIS-32; Eisen, Grob, & Klein, 1986) and the Outcome Questionnaire (OQ-45.1; Lambert, Lunnen, Umphress, Hansen, & Burlingame, 1994) have undergone psychometric scrutiny and have gained widespread acceptance as a result. Although copyrighted, these instruments may be used for a nominal one-time or annual licensing fee; thus, they generally are treated much like public domain assessment tools. In the future, other high quality, useful instruments will be made available for use at little or no cost.

As for the types of instrumentation that will be needed and developed, some changes can probably be expected. Accompanying the increasing focus on outcomes assessment is a recognition by payers and patients that positive change in several areas of functioning is at least as important as change in level of symptom severity when evaluating
treatment effectiveness. For example, employers are interested in patients' ability to resume the functions of their job, whereas family members likely are concerned with the patients' ability to resume their role as spouse or parent. Increasingly, measurement of the patient's functioning in areas other than psychological/mental status has come to be included as part of behavioral health care outcomes systems. Probably the most visible indication of this is the incorporation of the SF-36 or HSQ in various behavioral health care studies, and the fact that at least three major psychological test publishers now offer HSQ products in their clinical product catalogs. Other public domain and commercially available nonsymptom-oriented instruments, especially those emphasizing social and occupational role functioning, will likely appear in increasing numbers over the next several years.

Other types of instrumentation also will become prominent. These may well include measures of variables that support outcomes and other assessment initiatives undertaken by provider organizations. What one organization or provider believes is important, or what payers determine is important for reimbursement or other purposes, will dictate what is measured. Instrumentation also may include measures that will be useful in predicting outcomes for individuals seeking specific psychotherapeutic services from those organizations.

Conclusions

The health care revolution has brought mixed blessings to those in the behavioral health care professions. It has limited reimbursement for services rendered and has forced many to change the way they practice their profession. At the same time, it has led to revelations about the cost savings that can accrue from the treatment of mental health and substance use disorders. This has been the bright spot in an otherwise bleak picture for some behavioral health care professionals.
But, for psychologists and others trained in psychological assessment procedures, the picture appears to be somewhat different. They now have additional opportunities to contribute to the positive aspects of the revolution and to gain from the "new order" it has imposed. By virtue of their training and through the application of appropriate instrumentation, they are uniquely qualified to support or otherwise facilitate multiple aspects of the therapeutic process.

Earlier in this chapter, some of the types of psychological assessment instruments that are commonly used in therapeutic endeavors were identified. These included both brief and lengthy (multidimensional) symptom measures, as well as measures of general health status, quality of life, role functioning, and patient satisfaction. Also identified were different sets of general criteria that can be applied when selecting instruments for use in therapeutic settings. The main intent of this chapter, however, was to present an overview of the various ways in which psychological assessment can be used to facilitate the selection, implementation, and evaluation of appropriate therapeutic interventions in behavioral health care settings.

Generally, psychological assessment can assist the clinician in three important clinical activities: clinical decision making, treatment (when used as a specific therapeutic technique), and treatment outcomes evaluation. Regarding the first of these activities, three important clinical decision-making functions can be facilitated by psychological assessment: screening, treatment planning, and treatment monitoring. The first of these can be served by the use of brief instruments designed to identify, with a high degree of
certainty, the likely presence (or absence) of a particular condition or characteristic. Here, the diagnostic efficiency of the instruments used (as indicated by their positive and negative predictive powers, or PPPs and NPPs) is of great importance. Through their ability to identify and clarify problems, as well as other important treatment-relevant patient characteristics, psychological assessment instruments also can be of great assistance in planning treatment. In addition, treatment monitoring, or the periodic evaluation of the patient's progress during the course of treatment, can be served well by the application of psychological assessment instruments.

Second, assessment may be used as part of a therapeutic technique. In what Finn termed "therapeutic assessment," situations in which patients are evaluated via psychological testing are used as opportunities for the process itself to serve as a therapeutic intervention. This is accomplished through involving the patient as an active participant in the assessment process, not just as the object of the assessment.

Third, psychological assessment can be employed as the primary mechanism by which the outcomes or results of treatment can be measured. However, use of assessment for this purpose is not a cut-and-dried matter. Issues pertaining to what to measure, how to measure, and when to measure require considerable thought prior to undertaking a plan to assess outcomes. Guidelines for resolving these issues were presented, as was information on how to determine if the measured outcomes of treatment are indeed "significant." The role that outcomes assessment can have in an organization's CQI initiative also was discussed.

The final section of the chapter shared some thoughts about where psychological assessment is headed in the future. In general, what is foreseen is the appearance of more quality, affordable instrumentation designed to assess various aspects of a patient's functioning.
There is no doubt that the practice of psychological assessment has been dealt a blow in recent years. However, clinicians trained in the use of psychological tests and related instrumentation have the skills to take these powerful tools, apply them in ways that will benefit those suffering from mental health and substance abuse problems, and demonstrate their value to patients and payers. Only time will tell whether they will be successful in this demonstration. Meanwhile, the field will continue to make advancements that will facilitate and improve the quality of its work.

A Final Word

As suggested earlier, psychologists' training in psychological testing should provide them with an edge in surviving in the evolving revolution in mental health service delivery. Maximizing their ability to use the "tools of the trade" to facilitate problem identification, subsequent planning of appropriate treatment, and measuring and documenting the effectiveness of their efforts can only aid in clinicians' quest for optimal efficiency and quality in service. It is hoped that the information and guidance provided by the many distinguished contributors to this volume will assist practicing psychologists, psychologists-in-training, and other behavioral health care providers in maximizing the resources available to them and thus in prospering in the emerging new health care arena.

This is a time of uncertainty and perhaps some anxiety. It is also a time of great opportunity. How practitioners choose to face the current state of affairs is a matter of personal and professional choice.

Acknowledgment

This chapter is adapted from M.E. Maruish, "Therapeutic Assessment: Linking Assessment and Treatment," in M. Hersen & A. Bellack (Series Eds.) & C.R. Reynolds (Vol. Ed.), Comprehensive clinical psychology: Vol. 4. Assessment (in press), with permission from Elsevier Science Ltd., The Boulevard, Langford Lane, Kidlington OX5 1GB, UK.

References

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author. American Psychological Association. (1992). Ethical principles. Washington, DC: Author. American Psychological Association. (1996). The costs of failing to provide appropriate mental health care. Washington, DC: Author. Andrews, G., Peters, L., & Teesson, M. (1994). The measurement of consumer outcomes in mental health. Canberra, Australia: Australian Government Publishing Service. Appelbaum, S.A. (1990). The relationship between assessment and psychotherapy. Journal of Personality Assessment, 54, 791-801. Attkisson, C.C., & Zwick, R. (1982). The Client Satisfaction Questionnaire: Psychometric properties and correlations with service utilization and psychotherapy outcome. Evaluation and Program Planning, 6, 233-237. Beck, A.T., Rush, A.J., Shaw, B.F., & Emery, G. (1979). Cognitive therapy of depression. New York: Guilford. Berwick, D.M. (1989). Sounding board: Continuous improvement as an ideal in health care. New England Journal of Medicine, 320, 53-56. Beutler, L.E., & Clarkin, J. (1990). Systematic treatment selection: Toward targeted therapeutic interventions. New York: Brunner/Mazel. Beutler, L.E., Wakefield, P., & Williams, R.E. (1994). Use of psychological tests/instruments for treatment planning. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 55-74). Hillsdale, NJ: Lawrence Erlbaum Associates. Beutler, L.E., & Williams, O.B. (1995).
Computer applications for the selection of optimal psychosocial therapeutic interventions. Behavioral Healthcare Tomorrow, 4, 66-68. Brook, R.H., McGlynn, E.A., & Cleary, P.D. (1996). Quality of health care: Part 2: Measuring quality of care. New England Journal of Medicine, 335, 966-970. Burlingame, G.M., Lambert, M.J., Reisinger, C.W., Neff, W.M., & Mosier, J. (1995). Pragmatics of tracking mental health outcomes in a managed care setting. Journal of Mental Health Administration, 22, 226-236. Butcher, J.N. (1990). The MMPI-2 in psychological treatment. New York: Oxford University Press. Butcher, J.N., Dahlstrom, W.G., Graham, J.R., Tellegen, A.M., & Kaemmer, B. (1989). MMPI-2: Manual for administration and scoring. Minneapolis, MN: University of Minnesota Press. Butcher, J.N., Graham, J.R., Williams, C.L., & Ben-Porath, Y. (1989). Development and use of the MMPI-2 content scales. Minneapolis, MN: University of Minnesota Press. Cagney, T., & Woods, D.R. (1994). Why focus on outcomes data? Behavioral Healthcare Tomorrow, 3, 65-67. Centers for Disease Control and Prevention. (1994, May 27). Quality of life as a new public health measure: Behavioral risk factor surveillance system. Morbidity and Mortality Weekly Report, 43, 375-380. Christensen, L., & Mendoza, J.L. (1986). A method of assessing change in a single subject: An alteration of the RC index [Letter to the editor]. Behavior Therapy, 17, 305-308. Ciarlo, J.A., Brown, T.R., Edwards, D.W., Kiresuk, T.J., & Newman, F.L. (1986). Assessing mental health treatment outcomes measurement techniques (DHHS Publication No. ADM 86-1301). Washington, DC: U.S. Government Printing Office. Derogatis, L.R. (1983). SCL-90-R: Administration, scoring and procedures manual-II. Baltimore: Clinical Psychometric Research.
Derogatis, L.R. (1992). BSI: Administration, scoring and procedures manual-II. Baltimore: Clinical Psychometric Research. Derogatis, L.R., & DellaPietra, L. (1994). Psychological tests in screening for psychiatric disorder. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 22-54). Hillsdale, NJ: Lawrence Erlbaum Associates. Derogatis, L.R., Lipman, R.S., & Covi, L. (1973). SCL-90: An outpatient psychiatric rating scale: Preliminary report. Psychopharmacology Bulletin, 9, 13-27. Dertouzos, M.L., Lester, R.K., & Solow, R.M. (1989). Made in America: Regaining the productive edge. Cambridge, MA: MIT Press. Dickey, B., & Wagenaar, H. (1996). Evaluating health status. In L.I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 55-60). Baltimore: Williams & Wilkins. Donabedian, A. (1980). Explorations in quality assessment and monitoring: The definition of quality and approaches to its assessment (Vol. 1). Ann Arbor, MI: Health Administration Press. Donabedian, A. (1982). Explorations in quality assessment and monitoring: The criteria and standards of quality (Vol. 2). Ann Arbor, MI: Health Administration Press. Donabedian, A. (1985). Explorations in quality assessment and monitoring: The methods and findings in quality assessment: An illustrated analysis (Vol. 3). Ann Arbor, MI: Health Administration Press. Dorwart, R.A. (1996). Outcomes management strategies in mental health: Applications and implications for clinical practice. In L.I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 45-54). Baltimore: Williams & Wilkins. Dowd, E.T., Milne, C.R., & Wise, S.L. (1991). The Therapeutic Reactance Scale: A measure of psychological reactance. Journal of Counseling and Development, 69, 541-545. Eisen, S.V., Grob, M.C., & Klein, A.A. (1986). BASIS: The development of a self-report measure for psychiatric inpatient evaluation. The Psychiatric Hospital, 17, 165-171.
Fee, practice and managed care survey. (1995, January). Psychotherapy Finances, 21(1), Issue 249.
Ficken, J. (1995). New directions for psychological testing. Behavioral Health Management, 20, 12-14.
Finn, S.E. (1996a). Assessment feedback integrating MMPI-2 and Rorschach findings. Journal of Personality Assessment, 67, 543-557.
Finn, S.E. (1996b). Manual for using the MMPI-2 as a therapeutic intervention. Minneapolis, MN: University of Minnesota Press.
Finn, S.E., & Butcher, J.N. (1991). Clinical objective personality assessment. In M. Hersen, A.E. Kazdin, & A.S. Bellack (Eds.), The clinical psychology handbook (2nd ed., pp. 362-373). New York: Pergamon.
Finn, S.E., & Martin, H. (1997). Therapeutic assessment with the MMPI-2 in managed health care. In J.N. Butcher (Ed.), Personality assessment in managed care (pp. 131-152). Minneapolis: University of Minnesota Press.
Finn, S.E., & Tonsager, M.E. (1992). Therapeutic effects of providing MMPI-2 test feedback to college students awaiting therapy. Psychological Assessment, 4, 278-287.
Friedman, R., Sobel, D., Myers, P., Caudill, M., & Benson, H. (1995). Behavioral medicine, clinical health psychology, and cost offset. Health Psychology, 14, 509-518.
Future targets behavioral health field's quest for survival. (1996, April 8). Mental Health Weekly, pp. 1-2.
Gough, H.G., McClosky, H., & Meehl, P.E. (1951). A personality scale for dominance. Journal of Abnormal and Social Psychology, 46, 360-366.
Gough, H.G., McClosky, H., & Meehl, P.E. (1952). A personality scale for social responsibility. Journal of Abnormal and Social Psychology, 47, 73-80.
Greene, R.L. (1991). The MMPI-2/MMPI: An interpretive manual. Boston: Allyn & Bacon.
Greene, R.L., & Clopton, J.R. (1994). Minnesota Multiphasic Personality Inventory-2. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 137-159). Hillsdale, NJ: Lawrence Erlbaum Associates.
Greenfield, T.K., & Attkisson, C.C. (1989). Progress toward a multifactorial service satisfaction scale for evaluating primary care and mental health services. Evaluation and Program Planning, 12, 271-278.
Hathaway, S.R., & McKinley, J.C. (1951). MMPI manual. New York: The Psychological Corporation.
Health Outcomes Institute. (1993). Health Status Questionnaire 2.0 manual. Bloomington, MN: Author.
Holder, H.D., & Blose, J.O. (1986). Alcoholism treatment and total health care utilization and costs: A 4-year longitudinal analysis of federal employees. Journal of the American Medical Association, 256, 1456-1460.
InterStudy. (1991). Preface. The InterStudy Quality Edge, 1, 1-3.
Jahoda, M. (1958). Current concepts of mental health. New York: Basic Books.
Jacobson, N.S., Follette, W.C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336-352.
Jacobson, N.S., Follette, W.C., & Revenstorf, D. (1986). Toward a standard definition of clinically significant change [Letter to the editor]. Behavior Therapy, 17, 309-311.
Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
Johnson, J., Weissman, M., & Klerman, G.J. (1992). Service utilization and social morbidity associated with depressive symptoms in the community. Journal of the American Medical Association, 267, 1478-1483.
Johnson, P.L. (1989). Keeping score: Strategies and tactics for winning the quality war. New York: Harper & Row.
Kiesler, C.A., & Morton, T.L. (1988). Psychology and public policy in the "health care revolution." American Psychologist, 43, 993-1003.
Lambert, M.J., Lunnen, K., Umphress, V., Hansen, N.B., & Burlingame, G.M. (1994). Administration and scoring manual for the Outcome Questionnaire (OQ-45.1). Salt Lake City, UT: IHC Center for Behavioral Healthcare Efficacy.
Larsen, D.L., Attkisson, C.C., Hargreaves, W.A., & Nguyen, T.D. (1979). Assessment of client/patient satisfaction: Development of a general scale. Evaluation and Program Planning, 2, 197-207.
Leaders predict integration of MH, primary care by 2000. (1996, April 8). Mental Health Weekly, pp. 1, 6.
LeVois, M., Nguyen, T.D., & Attkisson, C.C. (1981). Artifact in client satisfaction assessment: Experience in community mental health settings. Evaluation and Program Planning, 4, 139-150.
Maruish, M. (1990). Psychological assessment: What will its role be in the future? Assessment Applications, Fall, 7-8.
Maruish, M.E. (1994). Introduction. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 3-21). Hillsdale, NJ: Lawrence Erlbaum Associates.
Megargee, E.I., & Spielberger, C.D. (1992). Reflections on 50 years of personality assessment and future directions for the field. In E.I. Megargee & C.D. Spielberger (Eds.), Personality assessment in America (pp. 170-190). Hillsdale, NJ: Lawrence Erlbaum Associates.
Migdail, K.J., Youngs, M.T., & Bengen-Seltzer, B. (Eds.). (1995). The 1995 behavioral outcomes and guidelines sourcebook. New York: Faulkner & Gray.
Millon, T. (1994). MCMI-III manual. Minneapolis, MN: National Computer Systems.
Moreland, K.L. (1996). How psychological testing can reinstate its value in an era of cost containment. Behavioral Healthcare Tomorrow, 5, 59-61.
Morey, L.C. (1991). The Personality Assessment Inventory professional manual. Odessa, FL: Psychological Assessment Resources.
Morey, L.C., & Henry, W. (1994). Personality Assessment Inventory. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 185-216). Hillsdale, NJ: Lawrence Erlbaum Associates.
Newman, F.L. (1991). Using assessment data to relate patient progress to reimbursement criteria. Assessment Applications, Summer, 4-5.
Newman, F.L. (1994). Selection of design and statistical procedures for progress and outcome assessment. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 111-134). Hillsdale, NJ: Lawrence Erlbaum Associates.
Nguyen, T.D., Attkisson, C.C., & Stegner, B.L. (1983). Assessment of patient satisfaction: Development and refinement of a service evaluation questionnaire. Evaluation and Program Planning, 6, 299-313.
Olfson, M., & Pincus, H.A. (1994a). Outpatient psychotherapy in the United States: I. Volume, costs, and user characteristics. American Journal of Psychiatry, 151, 1281-1288.
Olfson, M., & Pincus, H.A. (1994b). Outpatient psychotherapy in the United States: II. Patterns of utilization. American Journal of Psychiatry, 151, 1289.
Oss, M.E. (1996). Managed behavioral health care: A look at the numbers. Behavioral Health Management, 16, 16-17.
Pallak, M.S. (1994). National outcomes management survey: Summary report. Behavioral Healthcare Tomorrow, 3, 63-69.
Phelps, R. (1996, February). Preliminary practitioner survey results enhance APA's understanding of health care environment. Practitioner Focus, 9, 5.
Radosevich, D., & Pruitt, M. (1996). Twelve-item Health Status Questionnaire (HSQ-12) Version 2.0 user's guide. Bloomington, MN: Health Outcomes Institute.
Radosevich, D.M., Wetzler, H., & Wilson, S.M. (1994). Health Status Questionnaire (HSQ) 2.0: Scoring comparisons and reference data. Bloomington, MN: Health Outcomes Institute.
Rouse, B.A. (Ed.). (1995). Substance abuse and mental health statistics sourcebook (DHHS Publication No. SMA 95-3064). Washington, DC: Superintendent of Documents, U.S. Government Printing Office.
Saravay, S.M., Pollack, S., Steinberg, M.D., Weinschel, B., & Habert, M. (1996). Four-year follow-up of the influence of psychological comorbidity on medical rehospitalization. American Journal of Psychiatry, 153, 397-403.
Scherkenbach, W.W. (1987). The Deming route to quality and productivity: Road maps and roadblocks. Rockville, MD: Mercury Press/Fairchild Publications.
Schlosser, B. (1995). The ecology of assessment: A "patient-centric" perspective. Behavioral Healthcare Tomorrow, 4, 66-68.
Sederer, L.I., Dickey, B., & Hermann, R.C. (1996). The imperative of outcomes assessment in psychiatry. In L.I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 1-7). Baltimore: Williams & Wilkins.
Shewhart, W.A. (1939). Statistical methods from the viewpoint of quality control. Washington, DC: U.S. Department of Agriculture Graduate School.
Simmons, J.W., Avant, W.S., Demski, J., & Parisher, D. (1988). Determining successful pain clinic treatment through validation of cost effectiveness. Spine, 13, 34.
Simon, G.E., VonKorff, M., & Barlow, W. (1995). Health care costs of primary care patients with recognized depression. Archives of General Psychiatry, 52, 850-856.
Sipkoff, M.Z. (1995, August). Behavioral health treatment reduces medical costs: Treatment of mental disorders and substance abuse problems increase productivity in the workplace. Open Minds, 12.
Speer, D.C. (1992). Clinically significant change: Jacobson and Truax (1991) revisited. Journal of Consulting and Clinical Psychology, 60, 402-408.
Spielberger, C.D. (1983). Manual of the State-Trait Anxiety Inventory: STAI (Form Y). Palo Alto, CA: Consulting Psychologists Press.
Stewart, A.L., & Ware, J.E., Jr. (1992). Measuring functioning and well-being. Durham, NC: Duke University Press.
Strain, J.J., Lyons, J.S., Hammer, J.S., Fahs, M., Lebovits, A., Paddison, P.L., Snyder, S., Strauss, E., Burton, R., & Nuber, G. (1991). Cost offset from a psychiatric consultation-liaison intervention with elderly hip fracture patients. American Journal of Psychiatry, 148, 1044-1049.
Strategic Advantage, Inc. (1996). Symptom Assessment-45 Questionnaire manual. Minneapolis, MN: Author.
Substance Abuse Funding News. (1995, December 22). Brown resigns drug post. p. 7.
Substance Abuse and Mental Health Services Administration. (1996, August). Preliminary estimates from the 1995 National Household Survey on Drug Abuse (SAMHSA Advanced Rep. No. 18). Rockville, MD: Author.
Walton, M. (1986). The Deming management method. New York: Dodd Mead.
Wampold, B.E., & Jenson, W.R. (1986). Clinical significance revisited [Letter to the editor]. Behavior Therapy, 17, 302-305.
Ware, J.E., Kosinski, M., & Keller, S.D. (1995). SF-12: How to score the SF-12 physical and mental summary scales (2nd ed.). Boston: New England Medical Center, The Health Institute.
Ware, J.E., & Sherbourne, C.D. (1992). The MOS 36-Item Short Form Health Survey (SF-36): I. Conceptual framework and item selection. Medical Care, 30, 473-483.
Ware, J.E., Snow, K.K., Kosinski, M., & Gandek, B. (1993). SF-36 Health Survey manual and interpretation guide. Boston: New England Medical Center, The Health Institute.
Werthman, M.J. (1995). A managed care approach to psychological testing. Behavioral Health Management, 15, 15-17.

For Abby, Katie, and Shelby

Chapter 2
Psychological Tests in Screening for Psychiatric Disorder

Leonard R. Derogatis
Larry L. Lynn
Clinical Psychometric Research, Inc. and Loyola College of Maryland

From a historical perspective, psychology represents one of the pioneering disciplines in the development of screening models and selection algorithms. Early on, psychologists recognized one of the fundamental tenets of screening: Early detection can substantially improve the probability of delivering effective treatment. Psychologists were also among the first to formally describe the methodologies of establishing the reliability and validity of screening tests, and the differential implications of false positive and false negative errors in prediction. In addition, psychologists have argued consistently in favor of the inherent cost-efficiency of psychiatric screening programs, a critical issue with current health care costs approaching runaway proportions. Concerning the issue of cost, the evidence is extremely compelling that the implementation of screening paradigms for psychiatric disorder, particularly in primary care populations, can substantially reduce costs and measurably improve treatment effectiveness of primary medical disorders.

Psychiatric screening programs represent an effective response to the reality that the majority of individuals with psychiatric disorders in society (many estimate as high as 75%) are never seen by a mental health professional. The large majority of such cases in the health system are attended to by primary care physicians. So, in spite of possessing highly reliable, valid methods for the evaluation of psychological status, treatment planning and outcomes assessment essentially address just the "tip of the iceberg."
Assessment methods are irrelevant for a large majority of those who would derive benefit from their utilization, because the psychiatric disorders of these individuals are rarely recognized by the primary care professionals who provide their health care. This being the case, the routine screening for mental disorders in cohorts known to be at elevated risk (e.g., college students, patients with chronic medical illnesses, specific elderly populations) would identify a significantly larger proportion of covert psychological disorders, and do so at an earlier stage of their illnesses, thereby reducing the potential for cumulative morbidity.

Overview of Screening

The Concept of Screening

Screening has been defined traditionally as "the presumptive identification of unrecognized disease or defect by the application of tests, examinations or other procedures which can be applied rapidly to sort out apparently well persons who probably have a disease from those who probably do not" (Commission on Chronic Illness, 1957, p. 45). Screening is an operation conducted in an ostensibly well population in order to identify occult instances of the disease or disorder in question. Some authorities make a distinction between screening and case finding, which is specified as the ascertainment of disease in populations comprised of patients with other disorders. Under such a distinction, the detection of psychiatric disorders among medical patients would more precisely fit the criteria for case finding than screening. In actual implementation there appears to be little real difference between the two processes, so this chapter uses the term screening for both operations.

Regardless of its specific manifestation, the screening process represents a relatively unrefined sieve designed to segregate the cohort under assessment into "positives," who presumptively have the condition, and "negatives," who are ostensibly free of the disorder. Screening is not a diagnostic procedure per se. Rather, it represents a preliminary filtering operation that identifies those individuals with the highest probability of having the disorder in question for subsequent specific diagnostic evaluation. Individuals found negative by the screening process are usually not evaluated further.

The conceptual underpinning for screening rests on the premise that the early detection of unrecognized disease in apparently healthy individuals carries with it a measurable advantage in achieving effective treatment and/or cure of the condition. Although logical, this assumption is not always valid. In certain conditions, early detection does not measurably improve the capacity to alter morbidity or mortality, either because diagnostic procedures are unreliable, or because effective treatments for the condition are not yet available. In an attempt to facilitate a better appreciation of the particular health problems that lend themselves to effective screening systems, the World Health Organization (WHO) published guidelines for effective health screening programs (Wilson & Jungner, 1968). The following is a version of these criteria:

1. The condition should represent an important health problem that carries with it notable morbidity and mortality.
2. Screening programs must be cost-effective, that is, the incidence/significance of the disorder must be sufficient to justify the costs of screening.
3. Effective methods of treatment must be available for the disorder.
4. The test(s) for the disorder should be reliable and valid, so that detection errors (i.e., false positives or false negatives) are minimized.
5. The test(s) should have high cost-benefit, that is, the time, effort, and personal inconvenience to the patient associated with taking the test should be substantially outweighed by its potential benefits.
6. The condition should be characterized by an asymptomatic or benign period, during which detection will significantly reduce morbidity and/or mortality.
7. Treatment administered during the asymptomatic phase should demonstrate significantly greater efficacy than that dispensed during the symptomatic phase.

Some authorities are not convinced that psychiatric disorders, and the screening systems designed to detect them, conclusively meet all of the aforementioned criteria. For example, the efficacy of treatments for certain psychiatric conditions (e.g., schizophrenia) is arguable, and it has not been definitively demonstrated for some conditions that treatments initiated during asymptomatic phases (e.g., "maintenance" antidepressant treatment) are more efficacious than treatment initiated during acute episodes of manifest symptoms. Nevertheless, it is generally understood that psychiatric conditions and the screening paradigms designed to identify them do meet the WHO criteria in most instances, and the consistent implementation of screening systems can substantially improve the quality and cost-efficiency of health care.

The impetus for the development and routine implementation of effective psychiatric screening systems for medical and community cohorts arises not only from the increases in morbidity and mortality associated with undetected psychiatric disorders (Hawton, 1981; Kamerow, Pincus, & MacDonald, 1986; Regier, Robert, et al., 1988), but also from several additional factors. First, it is currently well established that between two thirds and three quarters of individuals with psychiatric disorders either go completely untreated or are treated by nonpsychiatric physicians; these individuals are never seen by a mental health care professional (B.P. Dohrenwend & B.S. Dohrenwend, 1982; Regier, Goldberg, & Taube, 1978; Weissman, Myers, & Thompson, 1981; Yopenic, Clark, & Aneshensel, 1983). Second, although there is a significant correlation, or comorbidity, between physical illness and psychiatric disorder (J.E. Barrett, J.A. Barrett, Oxman, & Gerber, 1988; Fulop & Strain, 1991; T.L. Rosenthal et al., 1992), the detection by primary care physicians of the most prevalent psychiatric disorders (i.e., anxiety and depressive disorders) is routinely poor (Linn & Yager, 1984; Nielson & Williams, 1980). It is not unusual to find recognition rates in medical cohorts falling below 50%. In addition, research has consistently demonstrated higher prevalence rates for psychiatric disorders among the medically ill (K.B. Wells, Golding, & Burnam, 1988), and has confirmed that high health care utilizers have elevated levels of psychological distress and psychiatric diagnoses (Katon et al., 1990). Moreover, costs for medical patients with comorbid psychiatric disorders can be dramatically higher than those of comparable patients free of psychological comorbidities (Allison et al., 1995).

Effective psychiatric screening programs designed with current methods would not only significantly reduce psychiatric and medical morbidity, but would almost certainly have a beneficial impact on health care costs (Allison et al., 1995). The number of unnecessary diagnostic tests would be diminished, lengths of stay and numbers of readmissions would be reduced (Saravay, Pollack, Steinberg, Weinschel, & Habert, 1996), and the demand for health care among the groups with the highest utilization rates would be decreased. More importantly, those individuals in the community whose disorders currently go undetected would be identified early, and treated before the pervasive morbidity associated with chronicity sets in.

The Epidemiologic Screening Model

Because most psychologists are not closely familiar with screening paradigms, the basic epidemiologic screening model is reviewed. Essentially, a cohort of individuals who are apparently well, or, in the instance of case finding, present with a condition distinct from the index disorder, are evaluated by a "test" to determine if they are at high risk for a particular disorder or disease.
As already outlined, the disorder must have sufficient incidence or consequence to be considered a serious public health problem, and must be characterized by a distinct early or asymptomatic phase during which detection will substantially improve the results of treatment. The screening test itself (e.g., pap smear, Western blot) should be both reliable (i.e., consistent in its performance from one administration to the next) and valid (i.e., be capable of identifying those with the index disorder, and eliminating individuals who do not have the condition).

In psychometric terms, this form of validity has been traditionally referred to as "predictive," or "criterion-oriented," validity. In epidemiologic models, the predictive validity of the test is apportioned into two distinct partitions: the degree to which the test correctly identifies those individuals who actually have the disorder, termed its sensitivity, and the extent to which those free of the condition are correctly identified as such, or its specificity. Correctly identified individuals with the index disorder are referred to as true positives, and those accurately identified as being free of the disorder are termed true negatives. Misidentifications of healthy individuals as affected are labeled false positives, and affected individuals missed by the test are referred to as false negatives. It should be noted that each type of prediction error carries with it a socially determined value or significance, termed its utility, and these utilities need not be equal. The basic fourfold epidemiologic table, as well as the algebraic definitions of each of these validity indices, are given in Table 2.1.

TABLE 2.1
Epidemiologic Screening Model

                          Actual
Screening Test       Cases      Noncases
Test Positive          a           b
Test Negative          c           d

Note. Sensitivity (Se) = a/(a + c); false negative rate (1 - Se) = c/(a + c); specificity (Sp) = d/(b + d); false positive rate (1 - Sp) = b/(b + d); positive predictive value (PPV) = a/(a + b); negative predictive value (NPV) = d/(c + d).
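The indices defined in the note to Table 2.1 can be computed directly from the four cell counts. The following is a minimal Python sketch offered only as an illustration; the class name FourfoldTable and the cell counts are hypothetical and do not come from any study cited in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourfoldTable:
    """Cell counts of the fourfold screening table, labeled as in Table 2.1."""
    a: int  # test positive, actual case (true positive)
    b: int  # test positive, actual noncase (false positive)
    c: int  # test negative, actual case (false negative)
    d: int  # test negative, actual noncase (true negative)

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def false_negative_rate(self) -> float:
        return self.c / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def false_positive_rate(self) -> float:
        return self.b / (self.b + self.d)

    def positive_predictive_value(self) -> float:
        return self.a / (self.a + self.b)

    def negative_predictive_value(self) -> float:
        return self.d / (self.c + self.d)


# Hypothetical cohort of 1,000 screened individuals, 100 of whom are actual cases.
table = FourfoldTable(a=80, b=90, c=20, d=810)

print(f"Se  = {table.sensitivity():.2f}")                # Se  = 0.80
print(f"Sp  = {table.specificity():.2f}")                # Sp  = 0.90
print(f"PPV = {table.positive_predictive_value():.2f}")  # PPV = 0.47
print(f"NPV = {table.negative_predictive_value():.2f}")  # NPV = 0.98
```

With these illustrative counts the test has a sensitivity of .80 and a specificity of .90, yet fewer than half of the individuals who screen positive are actual cases, because noncases greatly outnumber cases in the cohort.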
Sensitivity and specificity are a screening test's most fundamental validity indices; however, other parameters can markedly affect a test's performance. In particular, the prevalence, or base rate, of the disorder in the population under evaluation can have a powerful effect on the results of screening. Two other indicators of test performance, predictive value of a positive and predictive value of a negative, reflect the interactive effects of test validity and prevalence. These indices are also defined in Table 2.1, although their detailed discussion is postponed until a later section.

Screening Tests for Psychiatric Disorders

A History of Screening Measures

The predecessors of modern psychological screening instruments date back to the late 19th and early 20th centuries. Sir Francis Galton (1883) created the prototype psychological questionnaire as part of an exposition for the World Fair. The first self-report symptom inventory, the Personal Data Sheet, was developed by Woodworth (1918) as
part of the effort to screen American soldiers entering World War I for psychiatric disorders. At approximately the same time, the psychiatrist Adolph Meyer constructed the first psychiatric rating scale, the Phipps Behavior Chart at Johns Hopkins (Kempf, 1914-1915). Since these pioneering efforts, many hundreds of analogous tests and rating scales have been developed and published. A number have become well-validated and widely used. The current chapter briefly reviews a small number of these instruments in an effort to familiarize the reader with the nature of screening measures available. This is not the appropriate place for a comprehensive review of psychological screening tests (see Sederer & Dickey, 1996; Spilker, 1996; Zalaquett & Wood, 1997); rather, the goal is to introduce a number of instruments judged to be exemplary of their class.

General Psychometric Principles

Fundamental to a realistic appreciation of the psychometric basis for psychiatric screening is the realization that we are first and foremost involved in psychological measurement. Psychologists are rigorously schooled in the awareness that the principles underlying psychological assessment are no different from those that govern any other form of scientific measurement. However, a major distinction that characterizes psychological measurement resides in the object of measurement: It is usually a hypothetical construct. By contrast, measurement in the physical sciences usually involves tangible entities, which are measured via ratio scales with true zeros and equal intervals and ratios throughout the scale continuum (e.g., weight, distance, velocity). In quantifying hypothetical constructs (e.g., anxiety, depression, impulsivity), measurement occurs on ordinal-approaching-interval scales, which by their nature are less sophisticated, and have substantially larger errors of measurement (Luce & Narens, 1987). Psychological measurement is no less scientific due to this fact; however, it is less precise than measurement in the physical sciences.

Reliability. All scientific measurement is based on consistency or replicability; reliability concerns the degree of replicability inherent in measurement. To what degree would a symptom inventory provide the same results on readministration? To what extent do two clinicians agree on a psychiatric rating scale? Conceived differently, reliability can be thought of as the converse of measurement error. It represents that proportion of variation in measurement that is due to true variation in the attribute under study, as opposed to random or systematic error variance. Reliability can be conceptualized as the ratio of true score variation to the total measurement variance. It specifies the precision of measurement and thereby sets the theoretical limit of measurement validity.

Validity. Just as reliability indicates the consistency of measurement, validity reflects the essence of measurement: the degree to which an instrument measures what it is designed to measure. It specifies how well an instrument measures a given attribute or characteristic of interest. Establishing the validity of a screening instrument is more complex and programmatic than determining its reliability, and rests on more elaborate theory. Although the validation process involves many types of validity experiments, the most explicitly applicable to the screening process is predictive validity. Essentially, the predictive validity of an assessment device hinges on its degree of correlation with an external reference criterion, some sort of gold standard. In the case of screening tests, the external criterion usually takes the form of a comprehensive
laboratory and/or clinical diagnostic evaluation that definitively establishes the presence or absence of the index condition. Critical to a genuine appraisal of predictive validity is the realization that it is highly specific in nature. To say that a particular screening test is "valid" has little or no scientific meaning; tests are valid only for specific purposes. An explicit note of validation specificity is made here only because there appears to be some confusion on this issue relative to psychological tests. Psychological tests employed in screening for psychiatric disorder(s) must be validated specifically in terms of the diagnostic assignments they are designed to predict. A specific unidimensional test (e.g., a depression scale) should be validated in terms of its ability to accurately predict clinical depressions; it should be of little value in screening for other psychiatric disorders except by virtue of the high comorbidity of depression with numerous other conditions (Maser & Cloninger, 1990), and the pervasive nature of depressive symptoms in many medical illnesses. Generalizability. Like reliability and validity, generalizability is a fundamental psychometric characteristic of test instruments used in psychiatric screening paradigms. Many clinical conditions and manifestations are systematically altered as a function of parameters such as age, sex, race, and the presence or absence of a comorbid medical illness. When validity coefficients (i.e., sensitivity and specificity) for a particular test are established relative to a specific diagnostic condition, they may vary considerably if the demographic and health characteristics of the cohort on which they were established are altered significantly. To cite examples, it is well-established that men are more constrained than women in reporting emotional distress. 
Well-constructed tests measuring symptomatic distress develop distinct sets of norms for the two genders to deal with this effect (Nunnally, 1978). Another illustration resides in the change of the phenomenologic characteristics of depression across age: Depression in the very young tends toward less dramatic affective display, and progresses through the classic clinical delineations of young and middle adult years, to the geriatric depressions of the elderly, which are more likely to be characterized by dementialike cognitive dysfunctions. Any single test is unlikely to perform with the same degree of validity across shifts in relevant parameters; therefore, generalizability must be established empirically and cannot merely be assumed. It is interesting to note that in a recent treatise on modernizing the conceptualization of validity in psychological assessment, Messick (1995) integrated generalizability, along with external criterion validity, as one of six discernable aspects of construct validity. Self-Report Versus Clinical Judgment. Although advocates and adherents argue the differential merits of self-report versus clinician ratings, a great deal of evidence suggests that the two techniques have strengths and weaknesses of roughly the same magnitude. Neither approach can be said to function more effectively overall in screening for psychiatric disorder. Each screening situation must be assessed independently, and the circumstances of each must be objectively weighed to determine which instrument modality is best suited for any particular screening implementation. Traditionally, self-report inventories have been more frequently used as screening tests than clinical rating scales. This is probably so because the self-report modality of measurement has much to recommend it to the task of screening. Self-report measures tend to be brief, inexpensive, and are tolerated well by the individuals being screened. 
These features lend the important attributes of cost-efficiency and cost-benefit to self-report. Self-report scales are also transportable; they may be used in a variety of settings, and they minimize professional time and effort. In addition, their administration, scoring,


and evaluation require little or no professional input. Recently, such tests have been adapted for use on personal computers. Interactive computerized testing enables test administration, scoring, evaluation, and storage of results entirely by computer, reducing both professional and technical time. Finally, perhaps the greatest advantage of self-report resides in the fact that the test is being completed by the only individuals experiencing the phenomena: the respondents themselves. Clinicians, no matter how skilled or well-trained, can never know the actual experience of respondents; rather, they must be satisfied with an apparent, or deduced, representation of the phenomena. This last feature of self-report tests can also represent their greatest potential source of error, that is, patient bias in reporting. Because the test respondent is providing the test data, an opportunity exists to consciously or unconsciously distort the responses given. Although patient bias does represent a potential difficulty for self-report, empirical studies have indicated that such distortions represent a problem only in situations where there is obvious personal gain associated with response distortions. Otherwise, this problem usually does not represent a major source of bias (L.R. Derogatis, Lipman, Rickles, Uhlenhuth, & Covi, 1974a). There is also the possibility that response sets, such as acquiescence or attempts at impression management, may result in systematic response distortions, but such effects tend to add little error variance in most realistic clinical screening situations. Probably the greatest limitation of self-report arises from the inflexibility of the format: A line of questioning cannot be altered or modified depending on how the individual responds to previous questions.
In addition, only denotative responses can be appreciated; facial expressions, tones of voice, attitudes and postures, and cognitive/emotional status of respondents are not integral aspects of the test data. This inflexibility extends to the fact that respondents must also be literate in order to read the questions. The psychiatric rating scale or interview is a viable alternative to self-report instruments in designing a screening paradigm. The clinical rating scale introduces professional judgment into the screening process, and is inherently more flexible than self-report. The clinician has both the expertise and freedom to delve in more detail into any area of history, thought, or behavior that will deliver relevant information on the respondents' mental status. Clinicians also carry the capacity to clarify ambiguous answers and probe areas of apparent contradiction. In addition, because of their sophistication in psychopathology and human behavior, there is the theoretical possibility that more complex and sophisticated instrument design may be utilized in developing psychiatric rating scales. On the negative side, just as self-report is subject to patient bias, clinical rating scales are subject to equally powerful interviewer biases. Training sessions and videotaped interviews may be utilized in an attempt to reduce systematic errors of this type; however, interviewer bias can never be completely eliminated. Furthermore, the very fact that a professional clinician is required to make the ratings significantly increases the costs of screening. Lay interviewers have been trained to do such evaluations in some instances, but they are rarely as skilled as professionals, and the costs of their training and participation must be weighed into the equation as well. Finally, the more flexibility designed into the interview, the more time it is likely to take for the clinician to complete the ratings. 
At some point on this continuum, the "test" will no longer fit the format of a screening instrument, and it will begin to take on the characteristics of a comprehensive diagnostic interview. Both self-report and clinical interview modalities are designed to quantify the respondents' status in such a way as to facilitate a valid evaluation of their "caseness."


Both approaches lend themselves to actuarial quantitative methods, which allow for a normative framework to be established within which to evaluate individuals. Most importantly, both approaches work. It depends on the nature of the screening task, the resources at hand, and the experience of the clinicians or investigators involved to determine which method will work best in any particular situation.

Screening Tests

This section provides a brief synopsis of each of seven popular psychological tests and rating scales that are frequently employed as screening instruments. The assessment is not intended to be a comprehensive review, but rather to outline each measure and provide some information about its background and psychometric characteristics. In the case of commercially available tests (e.g., SCL-90-R, BSI, GHQ, BDI, BASIS-32), detailed discussions and comprehensive psychometric data are available from their published manuals. Scholarly reviews provide analogous information in the cases of the others (CES-D, HAS, HRDS). Five of the screening tests discussed here are self-report, and the remaining two are clinician rated. Table 2.2 provides a brief summary of instrument characteristics.

TABLE 2.2
Psychiatric Screening Tests in Common Use With Medical and Primary Care Populations

Instrument   Author/Date           Mode           Description                     Time         Application          Sensitivity/Specificity
SCL-90-R     Derogatis (1975)      Self           90 items, multidim.             15-20 min.   1, 2, 3, 4, 7        .73/.91
BSI          Derogatis (1975)      Self           53 items, multidim.             10-15 min.   1, 2, 3, 4, 7        .72/.90
GHQ          Goldberg (1972)       Self           60, 30, 12 items, multidim.     5-15 min.    1, 2, 3, 4, 5        .69-1.0/.75-.92
CES-D        Radloff (1977)        Self           20 items, unidim.               10 min.      1, 2, 3, 4           .83-.97/.61-.90
BDI          Beck (1961)           Self           21 items, unidim.               5-10 min.    2, 3, 4              .76-.92/.64-.80
BASIS-32     Eisen & Grob (1989)   Self           32 items, multidim.             5-20 min.    3, 5, 7              NA/NA
HAS          Hamilton (1959)       Clin. Rating   14 items, bidim.                20 min.      1, 2, 3, 4, 6        .91/.94
HRDS         Hamilton (1960)       Clin. Rating   21 items, unidim.               30+ min.     1, 2, 3, 4, 5, 7     .94-1.0/1.0

Note. 1 = Community adults, 2 = Community adolescents, 3 = Inpatient/outpatient, 4 = Medical patients, 5 = Elderly, 6 = Children, 7 = College students.

SCL-90-R/BSI. The Symptom Checklist-90-Revised (SCL-90-R; Derogatis, 1977, 1983, 1994) is a 90-item, multidimensional, self-report symptom inventory derived from the Hopkins Symptom Checklist (L.R. Derogatis, Lipman, Rickles, Uhlenhuth, & Covi, 1974b) that was first published in 1975. The inventory measures symptomatic distress in terms of nine primary dimensions and three global indices of distress. The dimensions include Somatization, Obsessive-compulsive, Interpersonal Sensitivity, Depression,


Anxiety, Hostility, Phobic Anxiety, Paranoid Ideation, and Psychoticism. Several matching clinical rating scales, such as the Derogatis Psychiatric Rating Scale and the SCL-90 Analogue Scale, which measure the same nine dimensions, are also available. Norms for the SCL-90-R have been developed for adult community nonpatients, psychiatric outpatients, psychiatric inpatients, and adolescent nonpatients. The Brief Symptom Inventory (BSI; L.R. Derogatis, 1993; L.R. Derogatis & Melisaratos, 1983; L.R. Derogatis & Spencer, 1982) is the brief form of the SCL-90-R. The BSI measures the same nine symptom dimensions and three global indices using only 53 items. Dimension scores on the BSI correlate highly with comparable SCL-90-R scores (L.R. Derogatis, 1993), and the brief form shares most psychometric characteristics of the longer scale. Most recently, an 18-item version of the BSI has been developed and normed (L.R. Derogatis, 1997). Both the SCL-90-R and the BSI have been used as outcome measures in an extensive array of research studies, among them a number of investigations focusing specifically on screening (L.R. Derogatis et al., 1983; Kuhn, Bell, Seligson, Laufer, & Lindner, 1988; Royse & Drude, 1984; Zabora, Smith-Wilson, Fetting, & Enterline, 1990). To date, the SCL-90-R has been utilized in over 1,000 published research studies, with over 500 available in a published bibliography (L.R. Derogatis, 1990). The BSI has also demonstrated sensitivity to psychological distress in numerous clinical and research contexts (Cochran & Hale, 1985; O'Hara, Ghonheim, Heinrich, Metha, & Wright, 1989; Piersma, Reaume, & Boes, 1994). Both the SCL-90-R and the BSI have been translated into 26 languages.

General Health Questionnaire (GHQ). The GHQ was originally developed as a 60-item, multidimensional, self-report symptom inventory by Goldberg (1972).
Subsequent to its publication (Goldberg & Hillier, 1979), four subscales were factor analytically derived: Somatic Symptoms, Anxiety and Insomnia, Social Dysfunction, and Severe Depression. The GHQ is one of the most widely used screening tests for psychiatric disorder internationally, its popularity arising in part from the fact that several brief forms are available (e.g., the GHQ-30 and GHQ-12). The more recent brief forms retain the basic four subscale format of the longer parent scale, but avoid including physical symptoms as indicators of distress (Malt, 1989). The GHQ has been validated for use in screening and outcome assessment in numerous populations, including the traumatically injured, cancer patients, geriatric populations, and many community samples (Goldberg & Williams, 1988).

Center for Epidemiologic Studies-Depression Scale (CES-D). The CES-D was developed by Radloff and her colleagues (1977). It is a brief, unidimensional, self-report depression scale composed of 20 items that assess the respondent's perceived mood and level of functioning within the past 7 days. Four fundamental dimensions (Depressed Affect, Positive Affect, Somatic Problems, and Interpersonal Problems) have been identified as basic to the CES-D. The CES-D also has a total aggregate score. The CES-D has been used effectively as a screening test with a number of community samples (Comstock & Helsing, 1976; Frerichs, Areshensel, & Clark, 1981; Radloff & Locke, 1985), as well as medical (Parikh, Eden, Price, & Robinson, 1988) and clinic populations (Roberts, Rhoades, & Vernon, 1990). Recently, Shrout and Yager (1989) demonstrated that the CES-D could be shortened to 5 items and still maintain adequate sensitivity and specificity, as long as prediction was limited to traditional two-class categorizations. Generally, an overall score of 16 has been used as a cutoff score for depression, with approximately 15% to 20% of community populations scoring ≥ 16.
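The two-class use of the CES-D cutoff described above can be sketched in a few lines of Python. This is a minimal illustration, not a published scoring program: the function names are hypothetical, and the scores and criterion diagnoses in the example are invented solely to show how sensitivity and specificity would be tallied at the cutoff of 16.

```python
from typing import Sequence, Tuple

CESD_CUTOFF = 16  # cutoff for the CES-D total score cited in the text


def screen_positive(total_score: int, cutoff: int = CESD_CUTOFF) -> bool:
    """Return True when a total score meets or exceeds the cutoff."""
    return total_score >= cutoff


def sensitivity_specificity(
    scores: Sequence[int],
    has_disorder: Sequence[bool],
    cutoff: int = CESD_CUTOFF,
) -> Tuple[float, float]:
    """Compare screen results with a criterion diagnosis (two-class case)."""
    if len(scores) != len(has_disorder):
        raise ValueError("scores and has_disorder must have the same length")
    tp = fn = tn = fp = 0
    for score, case in zip(scores, has_disorder):
        positive = screen_positive(score, cutoff)
        if case and positive:
            tp += 1  # true case correctly flagged
        elif case:
            fn += 1  # true case missed by the screen
        elif positive:
            fp += 1  # non-case flagged by the screen
        else:
            tn += 1  # non-case correctly passed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data, invented for illustration only.
example_scores = [4, 9, 16, 22, 31, 12, 18, 7, 25, 15]
example_cases = [False, False, False, True, True, False, True, False, True, True]
sens, spec = sensitivity_specificity(example_scores, example_cases)
```

Moving the `cutoff` argument up or down shows the usual trade-off: a lower cutoff flags more true cases but also more non-cases.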


Beck Depression Inventory (BDI). The BDI is a unidimensional, self-report, depression inventory employing 21 items to measure the severity of depression. Pessimism, guilt, depressed mood, self-deprecation, suicidal thoughts, insomnia, somatization, and loss of libido are some of the symptom areas covered by the BDI. The BDI was developed by Beck and his colleagues (A.T. Beck et al., 1961). A short (13-item) version of the BDI was introduced in 1972 (A.T. Beck & R.W. Beck, 1972), with additional psychometric evaluation accomplished subsequently (Reynolds & Gould, 1981). A revised version of the BDI has also been published (A.T. Beck & Steer, 1993). The BDI is characterized as being most appropriate for measuring severity of depression in patients who have been clinically diagnosed with depression. It has been utilized to assess depression worldwide with numerous community and clinical populations (Steer & A.T. Beck, 1996). Each of the items represents a characteristic symptom of depression on which respondents are to rate themselves on a 4-point (i.e., 0-3) scale. These scores are then summed to yield a total depression score. Beck's rationale for this system is that the frequency of depressive symptoms is distributed along a continuum, from "nondepressed" to "severely depressed." In addition, number of symptoms is viewed as correlating with intensity of distress and severity of depression. The BDI has been used as a screening device with renal dialysis patients, as well as with medical inpatients and outpatients (Craven, Rodin, & Littlefield, 1988). More recently, Whitaker et al. (1990) used the BDI with a group of 5,108 community adolescents and noted that it performed validly in screening for major depression in this previously undiagnosed population. 
In screening community populations, scores in the range of 17 to 20 are generally considered suggestive of dysphoria, whereas scores greater than 20 are felt to indicate the presence of clinical depression (Steer & A.T. Beck, 1996).

Behavior and Symptom Identification Scale (BASIS-32). The BASIS-32 was designed and developed by Eisen and Grob (1989) to evaluate the outcome of mental health interventions from the perspective of the patient. It is a 32-item, self-report inventory that assesses current "difficulty in the major symptom and functioning domains that lead to the need for inpatient psychiatric treatment" (Eisen, 1996, p. 65). A 1-week time window is typically utilized with the BASIS-32, and respondents rate the degree of difficulty they have been experiencing on each item via 5-point (0-4) scales. The BASIS-32 contains five cluster analysis-derived subscales: Relation to Self and Others, Daily Living and Role Functioning, Anxiety and Depression, Impulsive and Addictive Behavior, and Psychosis. The five domains are not consonant with diagnostic entities, but reflect problems and manifestations of mental illness that are central aspects of the majority of psychiatric illnesses. The BASIS-32 has been utilized primarily with psychiatric hospital inpatients, and to a lesser degree with psychiatric outpatients and day-hospital patients. Fundamental psychometric characteristics of the scale are quite respectable (Eisen, 1996), and its sensitivity to change has been demonstrated in large sample improvement profiles (Eisen & Dickey, 1996). Although developed and validated within psychiatric inpatient populations, research on the use of the BASIS-32 with outpatient cohorts is in progress, and may confirm its utility for psychiatric populations in general.
By integrating items reflecting problems in daily living and functional status with those representing formal psychiatric symptoms, the BASIS-32 could fulfill a long-standing need for a brief outcomes measure focused on psychosocial integration.
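The BDI scoring rule described earlier in this section (21 items, each rated 0-3 and summed, with community screening ranges of 17 to 20 and above 20) lends itself to a short sketch. The function names and the wording of the returned labels are hypothetical; the text gives no interpretation for totals below 17, so the sketch reports only that such totals fall below the cited ranges.

```python
from typing import Sequence

BDI_ITEM_COUNT = 21                 # number of items stated in the text
BDI_ITEM_MIN, BDI_ITEM_MAX = 0, 3   # 4-point rating scale stated in the text


def bdi_total(item_ratings: Sequence[int]) -> int:
    """Sum the 21 item ratings after checking count and range."""
    if len(item_ratings) != BDI_ITEM_COUNT:
        raise ValueError(f"expected {BDI_ITEM_COUNT} item ratings")
    for rating in item_ratings:
        if not BDI_ITEM_MIN <= rating <= BDI_ITEM_MAX:
            raise ValueError("each rating must be between 0 and 3")
    return sum(item_ratings)


def community_screen_band(total: int) -> str:
    """Map a total score to the community screening ranges cited in the text."""
    if total > 20:
        return "suggests clinical depression"
    if total >= 17:
        return "suggests dysphoria"
    # Label chosen for this sketch; the text gives no band below 17.
    return "below the screening ranges cited"
```

For example, `community_screen_band(bdi_total([1] * 21))` classifies a respondent who rates every item 1. A screening band of this kind is a prompt for further evaluation, not a diagnosis.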


Hamilton Anxiety Scale (HAS). The HAS is a 14-item clinician rating scale published in 1959 by Hamilton. Each item represents a clinical feature of anxiety, requiring the clinician to rate the client on a 5-point scale from (0) "not present" to (4) "very severe." The items reflect both somatic (e.g., cardiovascular, respiratory, gastrointestinal, and genitourinary), and psychic/cognitive (e.g., memory and concentration impairment) manifestations of anxiety. The HAS was designed to yield two separate subscores for "psychic anxiety" and "somatic anxiety." The HAS has been used with children as well as adults (Kane & Kendall, 1989), coronary artery bypass patients (Erikkson, 1988), general medical/surgical patients (Bech, Grosby, Husum, & Rafaelson, 1984), psychiatric outpatients (Riskind, Bech, Brown, & Steer, 1987), and many other groups. In addition to these applications, the HAS has become accepted as a standard outcome measure in clinical anxiolytic drug trials.

Hamilton Rating Scale for Depression (HRDS/HAM-D). The HRDS is similar to the HAS in that both provide quantitative assessments of the severity of a clinical disorder. The HRDS was developed in 1960 by Hamilton, and was later revised in 1967 (Hamilton, 1967). It consists of 21 items, each measuring a depressive symptom. Hamilton recommended using only 17 items when scoring because of the uncommon nature of the remaining items (e.g., depersonalization). Hedlund and Vieweg (1979) reviewed the psychometric and substantive properties of the HRDS in two dozen studies and gave it a very favorable evaluation. More recently, Bech (1987) completed a similar review and concluded that the HRDS is an extremely useful scale for measuring depression. A Structured Interview Guide for the HRDS is also available (Williams, 1988). It provides standardized instructions for administration, and has been shown to improve interrater reliability.
Like the HAS for anxiety, the HRDS has become a standard outcome measure in antidepressant drug trials.

Psychological Screening in Specific Settings

Community Settings

By far the most comprehensive data on the prevalence of psychiatric disorders in the community have been developed from the NIMH Epidemiologic Catchment Area (ECA) investigation, a study of psychiatric disorders in the community involving nearly 20,000 individuals. These results make explicit the fact that psychiatric disorders are highly prevalent in society. This is so regardless of whether we assess lifetime (Robins et al., 1984), 6-month (Myers et al., 1984; Blazer et al., 1984) or 1-month (Regier et al., 1988) prevalence estimates. Detailing the latter, the 1-month prevalence for any psychiatric disorder, across all demographic parameters, was 15.4%, which is similar to European and Australian estimates ranging from 9% to 16% (Regier et al., 1988). In terms of specific diagnoses, the overall rate for affective disorders was 5.1%, whereas that for anxiety disorders was 7.3% (Regier et al., 1988). Six-month prevalence estimates for affective disorders ranged from 4.6% to 6.5% across the five ECA sites (Myers et al., 1984), whereas 6-month estimates for anxiety disorders, recently updated by Weissman and Merikangas (1986), reveal rates for panic disorder ranging from 0.6% to 1.0%. Agoraphobia showed prevalences from 2.5% to 5.8% across the various ECA sites.
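Prevalence figures of this kind determine how a positive screening result should be read. The sketch below applies Bayes' rule to obtain positive and negative predictive values. Pairing the 15.4% 1-month prevalence with a sensitivity of .73 and a specificity of .91 (the SCL-90-R entry in Table 2.2, reading the printed specificity as .91) is an assumption made here for illustration, not a calculation reported by the authors, and the function name is hypothetical.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) for a screen applied at a given prevalence."""
    for name, value in (
        ("sensitivity", sensitivity),
        ("specificity", specificity),
        ("prevalence", prevalence),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Expected proportions of the screened population in each cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive values are undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative inputs only; see the assumptions stated above.
ppv, npv = predictive_values(sensitivity=0.73, specificity=0.91, prevalence=0.154)
```

Under these assumed inputs, roughly six in ten positive screens would correspond to true cases. Rerunning the function with the higher prevalences reported for medical settings shows how predictive value rises with the base rate.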


Clearly, these data demonstrate that psychiatric disorders are a persistent and demonstrable problem that affects substantial numbers of the community population. Unfortunately, there is no effective system for screening individuals in the community per se; no action can be taken until they seek medical advice or treatment for a disorder, and then formally enter the health care system. At that point, primary care "gatekeepers" have the first, and in most instances the only, opportunity to identify psychiatric morbidity.

Medical Settings

In medical populations, prevalence estimates of psychiatric disorder are substantially increased over community rates. This is particularly true of anxiety and depressive disorders, which by far account for the majority of psychiatric diagnoses assigned to medical patients (Barrett et al., 1988; Derogatis et al., 1983; Von Korff, Dworkin, & Kruger, 1988). In recent reviews of psychiatric prevalence in medical populations, J.E. Barrett et al. (1988) observed prevalence rates of 25% to 30%, and L.R. Derogatis and Wise (1989) reported prevalence estimates for a broad range of medical cohorts varying from 22% to 33%. Derogatis and Wise concluded that, "in general, it appears that up to one-third of medical inpatients reveal symptoms of depression, while 20 to 25% manifest more substantial depressive symptoms" (p. 101). Concerning anxiety, Kedward and Cooper (1966) observed a prevalence rate of 27% in their study of a London general practice, whereas Schulberg and his colleagues (1985) observed a combined rate of 8.5% for phobic and panic disorders among American primary care patients. In another contemporary review, Wise and Taylor (1990) concluded that from 5% to 20% of medical inpatients suffer the symptoms of anxiety, and 6% receive formal anxiety diagnoses.
They further determined that depressive phenomena are even more prevalent among medical patients, citing reported rates of depressive syndromes of 11% to 26% in inpatient samples. With such prevalence rates and the acknowledged escalations in morbidity and mortality associated with psychiatric disorders, there is little doubt that screening programs for psychiatric disorders in medical populations could achieve impressive utility. Potential therapeutic gains associated with psychiatric screening would be further enhanced and magnified by the fact that attendant related problems, such as substance abuse, inappropriate diagnostic tests, and high utilization of health care services, also would be minimized. Particularly dramatic gains could be realized in specialty areas where estimated prevalence rates are over 50% (e.g., HIV: Lyketsos, Hutton, Fishman, Schwarz & Trishman, 1996; or obesity/weight reduction: Goldstein, Goldsmith, Anger, & Leon, 1996). In general, early and accurate identification of occult mental disorders in individuals with primary medical conditions would lead to a significant improvement in their well-being. It would also help relieve the fiscal and logistic strain on the health care system.

Physician Recognition of Psychiatric Disorder. It is now well-established that in the United States, primary care physicians represent a de facto mental health care system (Burns & Burke, 1985; Regier et al., 1982; Regier et al., 1978). There is reliable evidence that one fifth to one third of the primary care population suffer from at least one psychiatric condition (typically an anxiety or depressive disorder; Derogatis & Wise, 1989), rendering the competence with which primary care physicians recognize psychiatric


disorders a critical issue. Unfortunately, current evidence suggests that only a fraction of prevalent psychiatric disorders are detected in primary care, a deficiency with considerable implications for both the physical and psychological health of patients (Seltzer, 1989). Also, because undetected psychiatric disorders are associated with increased morbidity and mortality, and enhanced use of health care facilities (Katon et al., 1990; Wells et al., 1988), the "costs" of failing to detect them are substantially magnified.

Unaided Physician Recognition. During the past decade, a substantial number of studies have been reported documenting both the magnitude and nature of the problem of undetected psychiatric disorder among primary care physicians (Davis, Nathan, Crough, & Bairnsfather, 1987; Jones, Badger, Ficken, Leepek, & Andersen, 1987; Kessler, Amick, & Thompson, 1985; Schulberg et al., 1985). The data from these studies establish rates of accurate physician diagnosis of psychiatric conditions, which range from a low of 8% (Linn & Yager, 1984) to a high of 53% observed by Shapiro et al. (1987) with an elderly cohort. More recently, in a study focused on primary care physicians, Yelin et al. (1996) observed that 44% of over 2,000 primary care patients who screened positive for clinical anxiety on the SCL-90-R (Derogatis, 1994) had been previously assigned a mental health diagnosis. Although an improvement, these data also underscore the fact that 56% of these patients' mental conditions went undiagnosed. Although the methodology and precision of studies of this phenomenon continue to improve (Anderson & Harthorn, 1989; Rand, Badger, & Coggins, 1988), rates of accurate physician diagnosis have remained for the most part unacceptably low. A summary of these investigations, along with their characteristics and accurate detection rates, appears in Table 2.3.

TABLE 2.3
Recent Research on Rates of Accurate Identification of Psychiatric Morbidity in Primary Care

Investigator                 Study Sample                                              Criteria   Correct Diagnosis (%)
Andersen & Harthorn (1989)   120 primary care physicians                               DKI        33% affective disorder
Davis et al. (1987)          377 family practice patients                              Zung SDS   15% mild symptoms; 30% severe symptoms
Jones et al. (1987)          20 family physicians/51 patients                          DIS        21%
Rand et al. (1988)           36 family practice residents/520 patients                 GHQ        16%
Kessler et al. (1985)        1,452 primary care patients                               GHQ        19.7%
Linn & Yager (1984)          150 patients in a general medical clinic                  Zung       8%
Schulberg et al. (1985)      294 primary care patients                                 DIS        44%
Shapiro et al. (1987)        1,242 patients at university internal medicine clinic     GHQ        53%
Zung et al. (1983)           41 family medicine patients                               Zung SDS   15%

Aided Physician Recognition. The data from the aforementioned investigations strongly suggest that proactive steps must be taken to facilitate the accurate recognition of psychiatric conditions among primary care doctors. This is particularly the case in light of contemporary changes in health care, which suggest that in the future nonpsychiatric physicians will be playing a greater rather than a lesser role in this regard. If


primary care physicians cannot correctly identify psychiatric conditions, then they can neither adequately treat them personally nor refer them to appropriate mental health professionals. Such a situation will ultimately serve to degrade the quality of the health care systems further, and help to deny effective treatment to those who in many ways need it most. There is some evidence that primary care physicians can accurately identify both the prevalence of psychiatric disorders and the nature of these conditions. They estimate prevalence to be between 20% and 25% in their patient populations, and perceive anxiety and depressive disorders to be the most prevalent conditions they encounter (Fauman, 1983; Orleans, George, Haupt, & Brodie, 1985). In an effort to identify and overcome the problems inherent in detecting psychiatric conditions in primary care, a number of investigators have studied the effects of introducing a diagnostic aid for primary care doctors, in the form of results from a psychological screening test. Although far from unanimous, the studies completed during the past decade have concluded that, in the appropriate situation, screening tests can significantly improve physician detection of psychiatric conditions. Linn and Yager (1984), using the Zung SDS, found an increase from 8% correct diagnosis to 25% in a cohort of 150 general medical patients. Similarly, Zung, Magill, Moore, and George (1983) reported an increase in correct identification rising from 15% to 68% in family medicine outpatients with depression. Likewise, Moore, Silimperi, and Bobula (1978) observed an increase in correct diagnostic identification from 22% to 56% working with family practice residents. Not all studies have shown such dramatic improvements in diagnostic accuracy, however. Hoeper, Nyczi, and Cleary (1979) found essentially no improvement in diagnosis associated with making GHQ results available to doctors, and Shapiro et al. 
(1987) reported only a 7% increase in accuracy when GHQ scores were made accessible. The question of aided recognition of psychiatric disorders is a complex one, with numerous patient and doctor variables playing an important role. Nonetheless, the results of the studies on aided recognition appear promising, and an excellent contemporary review of the issues involved has been written by Anderson and Harthorn (1990). Problems Unique to Psychiatric Disorders. The prototypic psychiatric disorder is a hypothetical construct, with few pathognomonic clinical or laboratory indicators and a pathophysiology and etiology that are only dimly discernible. For these reasons, singular problems arise in the detection of psychiatric disorders, particularly in medical patients. To begin with, the highly prevalent anxiety and depressive disorders have a multitude of somatic symptoms associated with them, which are difficult to differentiate from those arising from verifiable physical causes. Schurman, Kramer, and Mitchell (1985) indicated that 72% of visits to primary care doctors resulting in a psychiatric diagnosis presented with somatic symptoms as the primary complaint. Katon et al. (1990) and Bridges and Goldberg (1984) both indicated that presentation with somatic symptoms as primary complaints is a key reason for misdiagnosis of psychiatric disorders in primary care. In their study of high health care utilizers, Katon et al. (1990) reported that the high utilization group had elevated SCL-90-R scores of over three quarters of a standard deviation, not only on the Anxiety and Depression subscales, but on the Somatization subscale as well. A second problem, more specific to the chronically or terminally ill, has to do with the misperception of clinical depressions as demoralization reactions (Derogatis & Wise, 1989). Most serious chronic illnesses, and those that inevitably result in mortality, have


as a natural aspect of the illness a period of disaffection and demoralization. These negative affective responses are a natural reaction to the loss of vitality and well-being associated with being chronically ill and, where appropriate, the anticipated loss of life itself. Physicians familiar with caring for such patients (e.g., cancer, emphysema patients) frequently misperceive true clinical depressions (for which effective treatments are available) for reactive demoralized states that are a natural part of the illness. They then fail to initiate a therapeutic regimen on the grounds that such mood states are part of the primary medical condition. There is good evidence that such reactive states can be reliably distinguished from major clinical depressions (Snyder, Strain, & Wolf, 1990), and patients suffering such painful comorbid conditions are done a substantial disservice if physicians fail to diagnose adequately and treat their disorders. Although understandable, this composite of problems has a highly regressive impact on the overall health care system. As it is now structured, it is a system where the preponderance of psychiatric disorders are seen by primary care physicians, who undeniably leave a majority of these conditions undetected. Of the cases they do identify, only a small minority are ever referred to mental health specialists, even though such conditions are known to be of a chronic and recurrent nature and primary care physicians admit they feel less than fully competent to treat them. Undetected or improperly treated anxiety and depressive disorders are known to be disproportionately associated with substance abuse, alcoholism, excessive diagnostic tests, suicide, excessive utilization of the health care system, and spiraling health care costs. 
We must greatly facilitate the proficiency of our primary care physicians in the detection of psychiatric disorders if our hopes of ever developing an efficient, cost-effective system are to be anything but illusory.

Academic Settings

Recollections of college days usually bring to mind idyllic images of youthful abandon and the pursuit of personal growth and pleasure, unencumbered by the tedious demands and stresses of everyday adult life. Unfortunately, the realities of contemporary student life paint a different portrait. The period of modern undergraduate and graduate studies represents a phase in the life cycle of rapid change, high stress, and previously unparalleled demands on an individual's coping resources. In the light of this reality, it is not surprising that it also represents a phase of life associated with a high incidence of psychiatric morbidity. Numerous studies have reported prevalence rates of psychological disorders in university populations. Telch, Lucas, and Nelson (1989) investigated panic disorder in a sample of 2,375 college students and found that 12% reported at least one panic attack in their lifetime. Furthermore, 2.36% of the sample met DSM-III-R criteria for panic disorder. Craske and Krueger (1990) reported lifetime prevalence of nocturnal panic attacks in 5.1% of their 294 undergraduates. Prevalence of daytime panic attacks was also 5.1%, but only 50% of those reporting nocturnal panic also reported daytime panic. Disorders that are especially salient in college populations include addiction, eating disorders, and depression. West, Drummond, and Eames (1990) found that 25.6% of men and 14.5% of women in a sample of 270 college students reported drinking large quantities of alcohol weekly. This same sample indicated that 20% of men and 6% of women had damaged property after drinking in the past year. Seay and T. Beck (1984) administered the Michigan Alcohol Screening Test to 395 undergraduates and discovered

< previous page

page_55

next page >

< previous page

page_56

next page > Page 56

that 25% were problem drinkers and 7% were alcoholics. However, only 1% were aware they had a drinking problem. Eating disorders, especially bulimia, are relatively common in college populations, because the average age at onset is between adolescence and early adulthood (American Psychiatric Association, 1987). In a study of 1,040 college students, Striegel-Moore, Silberstein, Frensch, and Rodin (1989) found rates of bulimia of 3.8% for females and 0.2% for males. In a study of 69 college women, Schmidt and Telch (1990) reported the prevalence of personality disorders in three groups. In a group defined as bulimic, 61% of subjects met criteria for at least one personality disorder. In a group of "nonbulimic binge eaters," 13% met criteria for a disorder. And in the control group, only 4% of individuals met criteria for a personality disorder. Most bulimics (57%) who exhibited personality disorder met criteria for borderline personality disorder. V. Wells, Klerman, and Deykin (1987) reported that 33% of their sample of 424 adolescents met the standard criteria for depression using the CES-D. The rate fell to 16% when more stringent duration criteria were applied. Even more troubling are the results of a study by McDermott, Hawkins, Littlefield, and Murray (1989), which revealed that 65% of the 331 college women and 51% of the 241 college men they surveyed met criteria for depression using the CES-D. Furthermore, 10% of this sample reported contemplating self-injurious behavior during the previous week. Suicidal ideation was reported by 8%, and 1% said they thought about suicide "most or all of the time" during the past week. The results of these studies make it apparent that university students suffer from a considerable prevalence of psychiatric morbidity. A critical question then becomes, to what degree is this morbidity detected by university health centers? 
As in the community, university physicians carry much of the burden for detecting psychological disorders because of the nature of their contacts with students. These physicians invariably treat many patients who present with somatic complaints that are actually manifestations of an underlying psychological disorder. The relative homogeneity of the age of the student group does carry some advantages with it, but beyond that fact, university physicians are essentially in the same position relative to recognizing psychological disorders as their primary care counterparts. There is very little published data regarding the accuracy of university physicians' diagnoses; however, it is probably safe to assume that they are approximately as precise as their primary care colleagues. As reviewed earlier, primary care doctors show rates of accurate diagnoses ranging from 8% (Linn & Yager, 1984) to 53% (Shapiro et al., 1987), with a majority of the rates remaining below 33% (L.R. Derogatis, DellaPietra, & Kilroy, 1992). This is obviously an unsatisfactory level of detection, given the prevalence of disorder in college populations. Aided Recognition of Disorders in Academic Settings. The use of aided recognition screening paradigms appears to be a strategy that may improve the rate of detection of psychological disorders in this important population. As evidenced by analogous approaches in primary care, significant improvement in recognition rates can be developed from the implementation of such systems. The components required involve screening tests that are valid indicators of psychiatric morbidity in this age group, and a systematic and meaningful system of application within universities. Several screening instruments have been used successfully with adolescent and college student populations and have been shown to have adequate sensitivity and specificity. The BDI and the CES-D are two unidimensional measures that have been used with college populations. Whitaker et al. 
(1990) used the BDI with 5,108 adolescents and
found it to have moderate validity in this population. Schmidt and Telch (1990) also used the BDI to measure depression in college women with bulimia. McDermott et al. (1989) used the CES-D to investigate health-related practices and events, and depression, in college students. They judged the scale to be "practical and reliable." The General Behavior Inventory (Depue, Krauss, Spoot, & Arbisi, 1989) was used to detect unipolar and bipolar conditions in college students and was found to have adequate sensitivity (i.e., approximately .76) and high specificity (i.e., .99).

Two multidimensional inventories that have also proven useful with college populations are the GHQ and the SCL-90-R. The GHQ was used by Szulecka, Springett, and De Pauw (1986) to identify first-year undergraduates who might be good candidates for psychotherapy, whereas the SCL-90-R and its briefer counterpart, the BSI, have been used in a number of studies of distress in university students. For example, Benjamin, Kasniak, Sales, and Shanfield (1986) used the BSI to measure distress in law and medical students, and Johnson, Ellison, and Heikkinen (1989) employed the SCL-90-R to describe the type and severity of psychological symptoms in university students attending a counseling center.

The substantial prevalence of psychiatric disorders in university populations is well documented, and there are a variety of instruments currently available that can improve the rate of detection of psychological disorders in university students. It is now incumbent on academic decision makers to formally integrate such measures into university mental health evaluation systems.

Implementation of a Screening System. In implementing a university-based screening system, there are two broad approaches available. The first is to take a preventive posture. Szulecka et al. (1986) used this method with entering students at Nottingham University. At time of registration, students were given the GHQ.
Students scoring high (i.e., showing more distress) were then split into two groups: an intervention group (IG) and a control group (CG). There was also a matched group (MG) of students scoring low on the GHQ. The IG subjects were offered an interview to discuss their feelings about the GHQ and adjustment to college life. They were made aware of counseling and support services on campus. At the end of the year, all students again took the GHQ. Although many results failed to reach statistical significance, they supported a number of clinically relevant trends. Compared to the CG, the IG showed more improvement in GHQ scores at follow-up, made fewer consultations to physicians, had fewer withdrawals from the university, and had fewer students fail out of school. Student reaction was positive, with many students viewing the program as "evidence of care," which also "strengthened confidence in the Health Centre." The results imply that identifying vulnerable students on admission and offering help can have beneficial effects. Although the CG made more consults to physicians, they were less likely to return, showing that distressed students may not always follow through in seeking help. For this reason, an active outreach program seems essential. A second approach to the problem that integrates an active outreach component is described by L. Clark, Levine, and Kinney (1988-1989). This process is described with a specific focus on the prevention, identification, and treatment of bulimia; however, it reflects a generic set of procedures that can be applied to psychiatric disorders in general. Clark et al. addressed the problem from multiple sources. They enlisted faculty, staff, the library, campus media, counselors, physicians, and peers, thereby heightening awareness of the resources available for treating psychological problems. Examples of services that could be developed and brought to bear on the problem include courses, workshops,

and public lectures about psychological disorders (given by faculty), information sessions (given by counselors, ministers, residence hall coordinators), accessible reading materials and lists of people/organizations to contact (in the library), public service announcements (in all campus media), and peer support groups. Once students are made aware of the nature of psychological problems and the availability of a full range of services, counselors, physicians, and "recovered" patients can provide screening, therapy, medical evaluation and treatment, and support.

Obviously, these two approaches to implementing screening are not mutually exclusive, and combining certain aspects would probably lead to an even more effective process of identifying and treating psychological disorders on campus. Matriculating students could be assessed prior to admission, and those identified as "at risk" could be offered interventions. Those who "slip through" the screening process, or those who develop psychological problems after entering school, would hopefully seek help as a result of the campus's campaign for awareness. Such a comprehensive program would ensure that a large majority of students in need are reached.

Screening for Suicidal Behavior

"Suicidal behavior" is a phrase that strongly affects most physicians, psychologists, and other health care professionals. Suicide has always been a perplexing subject for members of the health care community because of its perceived unpredictability and its inherent life-threatening nature. Chiles and Strosahl (1995) defined suicidal behavior as a "broad spectrum of thoughts, communications, and acts . . . ranging from the least common, completed suicide . . . to the more frequent, suicidal communications . . . and the most frequent, suicidal ideation and verbalizations" (p. 51).
Chiles and Strosahl reported that the rate of suicide has remained stable in the United States for the past 20 years at approximately 12.7 deaths per 100,000, and ranks as the eighth leading cause of death in the general population. Suicide ranks as the third leading cause of death for individuals from 18 to 24 years old, and the suicide rate in the elderly (i.e., over 65) is approximately double the rate of that for the 18- to 24-year-old population. Suicidal behavior is generally broken down into three categories. The first type of suicidal behavior is suicidal ideation, or thoughts about suicide. Relatively little is known about the predictive value of suicidal ideation. The second type of suicidal behavior concerns suicide attempts, which tend to be more common in females and younger individuals. Chiles and Strosahl (1995) indicated that approximately 50% of those who attempt suicide have no formal mental health diagnosis. The last category of suicidal behavior is completed suicide, which is more common in males and older individuals, many of whom have formal psychiatric diagnoses. Evidence also suggests that whites and divorced or separated persons are at increased risk for suicide, as are individuals with diagnoses of depression, drug abuse, panic disorder, generalized anxiety disorder, phobias, posttraumatic stress disorder, obsessive compulsive disorder, somatoform, and dysthymic disorder (Lish et al., 1996). Other risk factors include loss of a spouse (increases risk for up to 4 years), unemployment, physical illness, bereavement, and physical abuse. Some personality traits associated with suicide include poor problem solving, dichotomous thinking, and feelings of helplessness (Chiles & Strosahl, 1995). Lish et al. (1996) noted that 82% of the people who commit suicide have visited primary care physicians within the past 6 months, 53% within 1 month, and 40% 1 week prior to the suicide. This situation makes it imperative that primary care physicians

are able to screen for and recognize the risk factors involved in suicidal behavior. Because medical illness is itself a risk factor for suicide, primary care physicians are more likely than mental health professionals such as psychologists or psychiatrists to see cases of suicidal behavior in their initial stages. Moreover, primary care physicians are usually not well trained in the identification of mental health disorders, so they are at increased risk for missing the risk factors associated with suicidal behavior.

A major problem affecting screening for suicidal behavior is the phenomenon of low base rates (see later). The problem of low base rates is related to the low prevalence of suicide in the general population to be screened. Chiles and Strosahl (1995) reported a lifetime prevalence of suicide between 1% and 12%, and Lish et al. (1996) reported a 7.5% prevalence of suicidal behavior in a VA hospital sample. As discussed later in the chapter, with prevalences this low, even the most valid screening tests will produce an unacceptably high number of false positives for every true positive identified.

One of the most common techniques used to estimate or predict suicidal behavior is the profile of risk factors. As already mentioned, age (younger and older are at higher risk for suicidal behavior) and race (whites, Hispanics, and Asians are two times more likely to attempt suicide than African Americans; Lish et al., 1996) have significant predictive value. In addition, those with a mental health diagnosis are 12 times more likely to attempt suicide, those who have had previous mental health treatment are 7 times more likely to attempt suicide, and those in poor physical health are 4 times more likely to attempt suicide.
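
The base-rate problem raised earlier in this section can be illustrated numerically. The following Python sketch is an illustration added here, not a procedure from the studies cited. It estimates how many false positives accompany each true positive at three prevalence figures quoted in this section: the annual rate of 12.7 suicides per 100,000, a 1% lifetime prevalence, and the 7.5% prevalence in the VA sample. The sensitivity and specificity of .90 are assumed values for a hypothetical screening test, and the function names are likewise hypothetical.

```python
def screening_yield(n_screened, prevalence, sensitivity, specificity):
    """Return expected (true positives, false positives) for a screened group."""
    cases = n_screened * prevalence
    non_cases = n_screened - cases
    true_positives = cases * sensitivity
    false_positives = non_cases * (1.0 - specificity)
    return true_positives, false_positives


def false_positives_per_true_positive(prevalence, sensitivity, specificity):
    """Return the expected number of false positives for each true positive."""
    true_positives, false_positives = screening_yield(
        1.0, prevalence, sensitivity, specificity
    )
    return false_positives / true_positives


# Assumed operating characteristics of a hypothetical screening test.
ASSUMED_SENSITIVITY = 0.90
ASSUMED_SPECIFICITY = 0.90

# Prevalence figures quoted in the text.
quoted_prevalences = {
    "12.7 per 100,000 (annual suicide rate)": 12.7 / 100_000,
    "1% (lower lifetime estimate)": 0.01,
    "7.5% (VA hospital sample)": 0.075,
}

for label, prevalence in quoted_prevalences.items():
    ratio = false_positives_per_true_positive(
        prevalence, ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY
    )
    print(f"{label}: about {ratio:.1f} false positives per true positive")
```

Under these assumed operating characteristics, the ratio of false to true positives grows rapidly as prevalence falls. Different assumed values would change the numbers but not the direction of the effect.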
Even with risk ratios of this size, Chiles and Strosahl (1995) noted that profiling is not sufficiently powerful to accurately predict suicide in individuals, but is better suited to documenting differential rates of occurrence of suicidal behavior across groups. Some key risk factors addressed as part of an overall evaluation of suicidal behavior include an individual's positive evaluation of suicidal behavior, low ability to tolerate emotional pain, high levels of hopelessness, a sense of inescapability, and low survival and coping beliefs. Although a clinical interview with a detailed history of treatment and previous suicidal behaviors appears to be the most effective predictor of current suicidal behavior, this process can often be very time consuming and is not cost-effective or practical for screening purposes.

Several brief instruments have been found to be useful in predicting suicidal behavior, including the Beck Hopelessness Scale (BHS; A.T. Beck, Kovacs, & Weissman, 1975) and the BDI (A.T. Beck & Steer, 1993). Westefeld and Liddel (1994) noted that the 21-item BDI may be particularly useful for screening for suicidal behavior in college students. L.R. Derogatis and M.F. Derogatis (1996) documented the utility of the SCL-90-R and the BSI in screening for suicidal behavior. A number of investigators have reported that the primary symptom dimensions and the global scores of the SCL-90-R/BSI are capable of discriminating suicidal behavior in individuals diagnosed with depression and panic disorder (Bulik, Carpenter, Kupfer, & Frank, 1990; Noyes, Christiansen, Clancy, Garvey, & Suelzer, 1991). Similarly, Swedo et al. (1991) found that all SCL-90-R subscales successfully distinguished suicide attempters from controls in an adolescent population, and the majority of subscales were effective in discriminating "attempters" from an intermediate at-risk group.
Adolescents and adults who attempted suicide tended to perceive themselves as more distressed and hopeless on the SCL-90-R than the at-risk group, a finding confirmed by Cohen, Test, and Brown (1990) using the BSI. Several other instruments have been used to screen for suicidal behavior, including the SCREENER (Lish et al., 1996) and the College Student Reasons for Living Inventory (Westefeld, Cardin, & Deaton, 1992). Lish et al. (1996) used the SCREENER to screen

for psychiatric disorders and to determine the possibility of suicidal behavior. The SCREENER screens for DSM-IV Axis I conditions and is available in 96-item and 44-item forms. It contains three questions that address death or suicide and one that addresses suicidal ideation directly. Lish et al. stated that clinicians should screen for substance abuse and anxiety disorders as well as major depression because these disorders all increase the risk of suicidal behavior in a primary care setting.

The College Student Reasons for Living Inventory (CSRLI) is a college student version of the 47-item Reasons for Living Inventory (Westefeld et al., 1992). The CSRLI produces a total score and six subscale scores that include Survival and Coping Beliefs, College and Future-related Concerns, Moral Objections, Responsibility to Friends and Family, Fear of Suicide, and Fear of Social Disapproval. Westefeld, Bandura, Kiel, and Scheel (1996) collected additional data on the CSRLI and found that college students who were at higher risk for suicidality endorsed fewer reasons for living. Westefeld et al. stated that the data support using the CSRLI as an effective screening tool in a college setting.

Two other screening methods for suicidal behavior are worth brief mention. The first technique is a computer interview for suicide risk prediction developed by Greist et al. (1973). This method used a computer-administered interview that consisted of both open-ended and multiple-choice questions. To determine suicidal risk, the program used a "branching" system of questions based on the answers previously given. The method was well received by the patients and predicted 70% of suicide attempts correctly, whereas clinicians predicted only 40% accurately. A final method to be mentioned, in contradistinction to the traditional profiling approach mentioned previously, is the "manifest predictors" method of Bjarnason and Thorlindsson (1994).
They suggested the use of "manifest predictors" as a complement to the more commonly used "latent" predictors (i.e., depression and hopelessness). The manifest predictors include school postures (two questions), leisure time (two questions about music and two about what one does with leisure time), peer and parent relationships (eight questions), consumption (five questions involving smoking, alcohol, caffeine, and skipping meals), and contact with suicidal behavior (three questions).

Screening for Cognitive Impairment

Screening for cognitive impairment, especially when dealing with geriatric populations, is extremely important because it is estimated that up to 70% of patients with an organic mental disorder (OMD) go undetected (Strain et al., 1988). Some OMDs are reversible if discovered early enough, so screening programs in high-risk populations can have very high utility. Even in conditions found to be irreversible, early detection and diagnosis can help in the development of a treatment plan and the education of family members.

Instruments with a General Versus Specific Focus. There are several instruments available that provide quick and efficient screening of cognitive functioning. Most of these address the general categories of cognitive functioning covered in the standard mental status examination, including attention, concentration, intelligence, judgment, learning ability, memory, orientation, perception, problem solving, psychomotor ability, reaction time, and social intactness (McDougall, 1990). However, not all instruments include items from all of the previous categories. These general instruments can be contrasted with another class of cognitive screening measures characterized by a more specific focus. For example, the Stroke Unit Mental Status Examination (SUMSE) was
designed specifically to identify cognitive deficits and plan rehabilitation programs for stroke patients (Hajek, Rutman, & Scher, 1989). Another example of a screening instrument with a specific focus is the Dementia of Alzheimer's Type Inventory (DAT), designed to distinguish Alzheimer's disease from other dementias (Cummings & Benson, 1986). Previously, measures with a specific focus tended to be less common, owing to their limited range of applicability. More recently, specific scales have become more popular as they have been used in conjunction with general measures.

Unlike other screening tests, the great majority of cognitive impairment scales are administered by an examiner. Of the instruments reviewed here, none is a self-report measure; there are no pencil-and-paper inventories that can be completed by the respondent alone. Instead, these screening measures are designed to be administered by a professional and require a combination of oral and written responses. Most of the tests are highly transportable, however, and can be administered by a wide variety of health care workers.

Following are nine cognitive impairment screening measures. This is not intended to be an exhaustive review, but rather to provide some data on the nature of each measure and its psychometric properties (see Table 2.4).

TABLE 2.4
Screening Instruments for Cognitive Impairment

Instrument  Author                     Description                                                                           Application  Sensitivity/Specificity
MMSE        M. Folstein et al. (1975)  11 items; designed to determine level of cognitive impairment                         1, 3, 4      .83/.89
CCSE        Jacobs et al. (1977)       30 items; designed to detect presence of organic mental disorder                      2, 3, 4, 5   .73/.90
SPMSQ       Pfeiffer (1975)            10 items; designed to detect presence of cognitive impairment                         1            .55-.88/.72-.96
HSCS        Faust & Fogel (1989)       15 items; designed to estimate presence, scope, and severity of cognitive impairment  2            .94/.92
MSQ         Kahn et al. (1960)         10 items; designed to quantify dementia                                               1, 4, 5, 6   .55-.96/NA

Note. 1 = community populations, 2 = cognitively intact, 3 = hospital inpatients, 4 = medical patients, 5 = geriatric, 6 = long-term care patients.

Mini-Mental State Examination (MMSE). The MMSE was developed by M. Folstein, S. Folstein, and McHugh (1975) to determine the level of cognitive impairment. It is an 11-item scale measuring six aspects of cognitive function: orientation, registration, attention and calculation, recall, language, and praxis. Scores can range from 0 to 30, with lower scores indicating greater impairment. The MMSE has proved successful at assessing levels of cognitive impairment in many populations, including community residents (Kramer, German, Anthony, Von Korf, & Skinner, 1985), hospital patients (Teri, Larson, & Reifler, 1988), residents of long-term care facilities (Lesher & Whelihan, 1986), and neurological patients (Dick et al., 1984). However, Escobar et al. (1986) suggested using another instrument with Spanish-speaking individuals, as the MMSE may overestimate dementia in this population. Roca et al. (1984) also recommended other instruments for patients with less than 8 years of schooling for similar reasons. In contrast, the MMSE may underestimate cognitive
impairment in psychiatric populations (Faustman, Moses, & Cernansky, 1990). The MMSE appears to have lower sensitivity with mildly impaired individuals who are more likely to be labeled as demented (Doyle, Dunn, Thadani, & Lenihan, 1986). As such, the MMSE is most useful for patients with moderate to moderately severe dementia. Fuhrer and Ritchie (1993) confirmed that the MMSE was more discriminating for moderate dementias as opposed to milder cases, but did not find a significant difference associated with education. The authors also noted that cutoff scores for the MMSE require adjustment when comparisons involve clinical samples with base rates higher than the 6% prevalence observed in the general population.

Cognitive Capacity Screening Examination (CCSE). The CCSE is a 30-item scale designed to detect diffuse organic disorders, especially delirium, in medical populations. The instrument was developed by Jacobs, Bernhard, Delgado, and Strain (1977) and is recommended if delirium is suspected. The items include questions of orientation, digit recall, serial sevens, verbal short-term memory, abstractions, and arithmetic, all of which are helpful in detecting delirium (Baker, 1989). The CCSE has been used with geriatric patients (McCartney & Palmateer, 1985), as well as hospitalized medical-surgical patients (Foreman, 1987). In a comparison study of several brief screening instruments, the CCSE was shown to be the most reliable and valid (Foreman, 1987). Like the MMSE, the CCSE is also influenced by the educational level of the subject. However, unlike the MMSE, the CCSE cannot differentiate levels of cognitive impairment or types of dementias, and is most appropriate for cognitively intact patients (Judd et al., 1986).

Short Portable Mental Status Questionnaire (SPMSQ). The SPMSQ (Pfeiffer, 1975) is a 10-item scale for use with community and/or institutional residents.
This scale is unique in that it has been used with rural and less-educated populations (Baker, 1989). The items assess orientation, and recent and remote memory; however, visuospatial skills are not tested. The SPMSQ is a reliable detector of organicity (Haglund & Schuckit, 1976), but it should not be used to predict the progression or course of the disorder (Berg, Edwards, Danziger, & Berg, 1987).

High Sensitivity Cognitive Screen (HSCS). This scale was designed to be as sensitive and comprehensive as lengthier instruments while still being clinically convenient. It was developed by Faust and Fogel (1989) for use with 16- to 65-year-old, native English-speaking subjects with at least an eighth-grade education who are free from gross cognitive dysfunction. The 15 items include reading, writing, immediate and delayed recall, and sentence construction tasks, among others. The HSCS has shown adequate reliability and validity and is best used to estimate presence, scope, and severity of cognitive impairment (Faust & Fogel, 1989). The HSCS cannot pinpoint specific areas of involvement and, as with most of these scales, should represent a first step toward cognitive evaluation, not a substitute for a standard neuropsychological assessment.

Mental Status Questionnaire (MSQ). The MSQ is a 10-item scale developed by Kahn, Goldfarb, Pollock, and Peck (1960). It has been used successfully with medical geriatric patients (LaRue, D'Elia, Clark, Spar, & Jarvik, 1986), community residents (Shore, Overman, & Wyatt, 1983), and long-term care patients (Fishback, 1977). Disadvantages of this measure include its sensitivity to education and ethnicity of the subject, its reduced sensitivity with mildly impaired individuals, and its omission of tests of retention, registration, and cognitive processing (Baker, 1989).
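
The dependence of screening performance on base rate, noted above in connection with MMSE cutoff scores, can be made explicit with Bayes' rule. The worked equation below is an illustrative sketch rather than part of the cited analyses. It takes the MMSE sensitivity (.83) and specificity (.89) from Table 2.4 and the 6% general-population prevalence cited by Fuhrer and Ritchie (1993), and it assumes that these figures can be applied jointly to the same population. The symbols Se, Sp, p, and PPV (positive predictive value) are notation introduced here, and the 18% base rate in the second line is chosen purely for illustration.

```latex
\mathrm{PPV} = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)}

p = .06:\quad \mathrm{PPV} = \frac{(.83)(.06)}{(.83)(.06) + (.11)(.94)}
                           = \frac{.0498}{.0498 + .1034} \approx .33

p = .18:\quad \mathrm{PPV} = \frac{(.83)(.18)}{(.83)(.18) + (.11)(.82)}
                           = \frac{.1494}{.1494 + .0902} \approx .62
```

Under these assumptions, roughly one in three positive screens would be a true case at a 6% base rate, compared with roughly three in five at 18%. This is one way to see why a cutoff tuned for one setting may need adjustment in another.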

Other Instruments. Three measures have been developed that are particularly appropriate for primary care use, because their main function is to simply detect or rule out the presence of dementia. FROMAJE (Libow, 1981) classifies individuals into normal, mild, moderate, and severe dementia groups and has been used successfully with long-term care patients (Rameizl, 1984). The Blessed Dementia Scale (Blessed, Tomlinson, & Roth, 1968) measures changes in activities and habits, personality, interests and drives, and is useful for determining presence of dementia, though not its progression. Finally, the Global Deterioration Scale (Reisberg, Ferris, deLeon, & Crook, 1982) can be used to distinguish between normal aging, age-associated memory impairment, and primary degenerative disorder (such as Alzheimer's disease). The GDS is useful for assessing the magnitude and progression of cognitive decline (Reisberg, 1984). Recently, other innovative approaches have been proposed. With specific reference to Alzheimer's disease, Steffens and his colleagues (1996) proposed using the Telephone Interview for Cognitive Status in conjunction with a videotaped mental status exam. These researchers believe that the use of a telephone-based methodology shows some promise, and may help with physician time constraints. Another innovative new scale is the Chula Mental Test (CMT) developed by Jitapunkul, Lailert, Worakul, Srikiatkhachorn, and Ebrahim (1996) for use with elderly respondents from underdeveloped countries. Jitapunkul et al. commented that most cognitive screening measures have been based on highly developed, Western notions of cognitive dysfunction. As such, they may not be culturally or linguistically relevant in other countries. The CMT is a 13-item scale that is less biased toward education and literacy, which aids in minimizing false positives when screening in underdeveloped countries. 
The CMT tests for remote memory, orientation, attention, language, abstract thinking, judgment, and general knowledge.

Two other measures of particular interest to the field of psychiatry are the Neurobehavioral Cognitive Status Examination (NCSE; Mitrushina, Abara, & Blumenfeld, 1995) and the Cognitive Levels Scale (C. Allen & R. Allen, 1987). The Cognitive Levels Scale is designed to measure cognitive impairment and social dysfunction in patients with mental disorders. Cognitive impairment is classified according to six levels (profoundly disabled to normal) and has implications for patients' functioning at home and at work. The NCSE samples 10 cognitive domains, which include orientation, attention, comprehension, repetition, naming, construction, memory, calculation, similarities, and judgment. The NCSE is capable of screening intact individuals as negative within approximately 5 minutes because, by design, it introduces a demanding item at the beginning of each substantive domain. The NCSE has been applied with neurological, medical, and psychiatric patients, and has been found capable of discriminating those patients with organic mental disorder from those free of the disorder. Although the NCSE has established a record of high sensitivity, low specificity has been characteristic of the test (Mitrushina et al., 1995).

Cognitive Screening in Geriatric Populations

As alluded to in the beginning of the chapter, an important consideration in any screening paradigm concerns the prevalence of the index disorder in the population under investigation. The prevalence of cognitive disorders is especially high in elderly populations. Fuhrer and Ritchie (1993) noted a 6% prevalence rate for dementia in the general patient population, which may rise to as high as 14% to 18% in the elderly (Jagger, Clarke, & Anderson, 1992). In studying delirium, Hart et al. (1995) indicated
a prevalence of 10% to 13% in the general patient population, which they estimated may rise to as high as 15% to 30% in elderly patients.

Screening the geriatric patient can often be a challenging enterprise for a number of diverse reasons. First, these patients often present with sensory, perceptual, and motor problems that seriously constrain the use of standardized tests. Poor vision, diminished hearing, and other physical handicaps can undermine the appropriateness of tests that are dependent on these skills. Similarly, required medications can cause drowsiness or reduced alertness, or in other ways interfere with optimal cognitive functioning. Illnesses such as heart disease and hypertension, common in the elderly, have also been shown to affect cognitive functioning (Libow, 1977). These limitations require screening instruments that are flexible enough to be adapted to the patient with handicaps or illnesses, and yet sufficiently standardized to allow normative comparisons.

Another difficulty with this population involves distinguishing cognitive impairment from aging-associated memory loss, and from characteristics of normal aging. This distinction requires a sensitive screening instrument, as the differences between these conditions are often subtle. Normal aging and dementia can be differentiated through their different effects on such functions as language, memory, perception, attention, information-processing speed, and intelligence (Bayles & Kaszniak, 1987). The Global Deterioration Scale is a screening test designed for this specific purpose. It has been shown to describe the magnitude of cognitive decline, and to predict functional ability (Reisberg, Ferris, deLeon, & Crook, 1988).

A final problem encountered when screening in geriatric populations is the comorbidity of depression. Depression is one of several disorders in the elderly that may imitate dementia, resulting in a syndrome known as "pseudodementia."
These patients have no discernible organic impairment, and the symptoms of dementia will usually remit when the underlying affective disorder is treated. Variability of task performance can distinguish these patients from truly demented patients, who tend to have an overall lowered performance level on all tasks (C. Wells, 1979). If depression is suspected, it should be the focus of a distinct diagnostic workup.

Recently, a number of new instruments have been developed that help address these problems. One of these, the Cognitive Test for Delirium (CTD; Hart et al., 1995), appears promising. The CTD is a 9-item, examiner-administered assessment that evaluates orientation, attention span, memory, comprehension, and vigilance. The CTD is completely nonverbal and requires only 10 to 15 minutes of administration time. Through the application of ROC analysis (discussed later), Hart et al. (1995) established an optimal cutoff score of less than 19 to discriminate delirium from other disorders. They also reported that the CTD correlates highly with the MMSE in delirium and dementia patients, and that it achieved a sensitivity of 100% and a specificity of 95% in an implementation with dementia in ICU patients.

Cognitive Screening Among Inpatient Medical Populations

When attempting to screen for cognitive impairment in medical populations, several of the limitations mentioned earlier as pertaining to geriatric populations will also apply, because the groups often overlap. Medical patients are often constrained by their illness and may not be able to respond to the test in the required manner. In addition, these patients are often bedridden, necessitating the use of a portable, bedside instrument. Perhaps the most demanding issue when evaluating this population is discriminating between the dementing patient and the patient with acute confusional states, or delirium.


This is particularly important not only because of the increased occurrence of delirium in medical patients, but also because, if left untreated, delirium can progress to an irreversible condition. Delirium can have multiple etiologies, such as drug intoxication, metabolic disorders, fever, cardiovascular disorders, or effects of anesthesia. The elderly and medical patients in general are both susceptible to misuse or overuse of prescription drugs, as well as metabolic or nutritional imbalances. Hypothyroidism, hyperparathyroidism, and diabetes are a few of the medical conditions often mistaken for dementia (Albert, 1981). In addition, cognitive impairment can also be caused by infections, such as pneumonia.

Fortunately, three cardinal characteristics enable practitioners to distinguish dementia from delirium. The first characteristic is the rate of onset of symptoms. Delirium is marked by acute or abrupt onset of symptoms, whereas dementia has a more gradual progression. A second characteristic is the impairment of attention. Delirious patients have special difficulty sustaining attention on tasks such as serial sevens and digit span. The third characteristic is nocturnal worsening, which is typical of delirium but not of dementia (Mesulam & Geschwind, 1976).

Cognitive Screening in Primary Care Settings

As already mentioned, many cases of cognitive impairment go undetected. This may be because the early stages of cognitive dysfunction are often quite subtle, and many of these cases first present to primary care physicians (Mungas, 1991), who tend to have their principal focus on other systems. Also, many are unfamiliar with the available procedures for detecting cognitive impairment, whereas others are reluctant to add a formal cognitive screening to their schedule of procedures.
Although brief, the 10 to 30 minutes required of most cognitive screening instruments remains a formidable requirement considering the fact that, on average, a family practice physician spends from 7 to 10 minutes with each patient. Because cognitive screening techniques are highly transportable and actuarial in nature, and may be administered by a broad range of health care professionals, the solution to introducing such screening in primary care may be to train nurses or physician's assistants to conduct screening. Such an approach would not add to the burden of physicians, and would at least effect an initiation of such programs so that their utility can be realistically evaluated.

Methodological Issues in Screening for Psychiatric Disorders

The Problem of Low Base Rates

Over 40 years ago, a paper appeared in the psychological literature (Meehl & Rosen, 1955) that sensitized psychologists to the dramatic impact of low base rates on the predictive validity of psychological tests. Meehl and Rosen demonstrated that attempts to predict rare attributes or events, even with highly valid tests, would result in substantially more misclassifications than correct classifications if the prevalence of the event was sufficiently low. Knowledge and understanding of this important but little known fact remained limited to a few specialists at that time. However, Vecchio (1966)


published a report in the medical literature dealing with essentially the same phenomenon. Because the substantive aspects of Vecchio's report dealt with screening tests in medicine, the information reached a much wider audience. As a result, knowledge of the special relation between low base rates and the predictive validity of screening tests has since become well-established.

To be precise, low prevalence does not equally affect all aspects of a test's validity; its impact is felt only in the validity partition that deals with correctly classifying positives, or "cases." Predictive validity concerning negatives, or "noncases," is minimally impaired because with extremely low prevalence, even a test with moderate validity will perform adequately. This relation is summarized in Table 2.5, which is a synopsis of data originally given by Vecchio (1966).

TABLE 2.5
Predictive Values of Positive and Negative Tests at Varying Prevalence (Base) Rates

Prevalence or      Predictive Value      Predictive Value
Base Rate (%)      of a Positive (%)     of a Negative (%)
      1                  16.1                  99.9
      2                  27.9                  99.9
      5                  50.0                  99.7
     10                  67.9                  99.4
     20                  82.6                  98.7
     50                  95.0                  95.0
     75                  98.3                  83.7
    100                 100

Note. Synopsis of data originally presented by Vecchio (1966). Sensitivity and specificity = 95%.

In the example developed by Vecchio, the sensitivity and specificity of the screening test are given as .95, values that do not represent realistic validity coefficients for a psychological screening test. Table 2.6 provides a more realistic example of the relation between prevalence and positive predictive value, based on a hypothetical cohort of N = 1,000, with validity coefficients (i.e., sensitivity and specificity) more consistent with those that might be genuinely anticipated for such tests. The data of Tables 2.5 and 2.6 make it clear that as prevalence drops below 10%, the predictive value of a positive declines precipitously.
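The relation between prevalence and predictive value follows from Bayes' theorem and can be checked numerically. The following is a minimal sketch in Python, not part of the original chapter; the function name `predictive_values` is a hypothetical helper introduced only for illustration. It takes sensitivity, specificity, and prevalence as proportions and returns the predictive values of a positive and of a negative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given base rate.

    All arguments are proportions between 0 and 1. This is an
    illustrative helper, not a function from any published package.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # predictive value of a positive
    npv = true_neg / (true_neg + false_neg)   # predictive value of a negative
    return ppv, npv


# Sensitivity = specificity = .95, the values assumed in Table 2.5.
for prevalence in (0.01, 0.02, 0.05, 0.10, 0.20, 0.50):
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence {prevalence:5.0%}: PPV = {ppv:6.1%}, NPV = {npv:6.1%}")

# Sensitivity = .80, specificity = .90, the values assumed in Table 2.6.
for prevalence in (0.30, 0.05, 0.01):
    ppv, _ = predictive_values(0.80, 0.90, prevalence)
    print(f"prevalence {prevalence:5.0%}: PPV = {ppv:6.1%}")
```

Because the function works with proportions rather than counts, the same call applies to a cohort of any size; only the prevalence, sensitivity, and specificity determine the predictive values.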
In the first example, when prevalence reaches 1%, the predictive value of a positive is only 16%, which means in practical terms that in such situations 5 out of 6 positives will be false positives. The predictive value of a negative remains extremely high throughout the range of base rates depicted, and is essentially unaffected by low prevalence situations. The example from Table 2.6 is more realistic in that the validity coefficients are more analogous to those commonly reported for psychological screening tests. In the screening situation depicted here, the predictive value of a positive drops from 77% when prevalence is 30% (e.g., the rate of psychiatric disorders among specialized medical patients) to 7.5% when prevalence falls to 1%. In the latter instance, 12 out of 13 positives would be false positives.

Sequential Screening: A Technique for Low Base Rates

Although screening for psychiatric disorders in general is not usually affected by problems of low base rates, there are specific mental health phenomena (e.g., suicide), and diagnostic categories (e.g., panic disorder) revealing prevalences that are quite low. In


addition, as Baldessarini, Finklestein, and Arana (1983) noted, the nature of the population being screened can markedly affect the quality of screening outcomes. A good example of this distinction is provided by the dexamethasone suppression test (DST) when used as a screen for major depressive disorder (MDD). The DST functions relatively effectively as a screen for MDD on inpatient affective disorders units where the prevalence of the disorder is quite high. In general medical practice, however, where the prevalence of MDD is estimated to be about 5%, the DST results in unacceptable rates of misclassification. The validity of the DST is insufficient to support effective screening performance in populations with low base rates of MDD.

TABLE 2.6
Relation of Prevalence (Base Rate) and Positive Predictive Value
Assumed Test Sensitivity = .80; Assumed Test Specificity = .90

Prevalence = .30
               Actual Disorder
Test           Pos.     Neg.     Total
  +            240       70       310
  -             60      630       690
Total          300      700
Pos. Predict. Val. = 240/310 = 77%

Prevalence = .05
               Actual Disorder
Test           Pos.     Neg.     Total
  +             40       95       135
  -             10      855       865
Total           50      950
Pos. Predict. Val. = 40/135 = 30%

Prevalence = .01
               Actual Disorder
Test           Pos.     Neg.     Total
  +              8       99       107
  -              2      891       893
Total           10      990
Pos. Predict. Val. = 8/107 = 7.5%

A method designed to help overcome low base rate problems is commonly referred to as sequential screening. In a sequential screening paradigm, there are two phases to screening and two screening tests. Phase I involves a less refined screen, whose primary purpose is to correctly identify individuals without the condition and eliminate them from consideration in Phase II evaluation. The initial screening also has the important effect of raising the prevalence of the index condition in the remaining cohort. In Phase II, a separate test of equal or superior sensitivity is then utilized.
Because the base rate of the index condition has been significantly raised by Phase I screening, the performance of the Phase II screen will involve much lower levels of false positive misclassification. A hypothetical example of sequential screening is given in Table 2.7.

Table 2.7 Hypothetical Example of Sequential Screening as a Strategy for Dealing with Low Base Rates
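The logic of a two-phase screen can also be traced numerically. The sketch below is an illustration only, not the authors' procedure: the function names are hypothetical, the cohort figures (N = 10,000, prevalence of 4%, sensitivity and specificity of .90 in both phases) are those of the chapter's hypothetical example, and the two tests are assumed to be independent, so that the Phase II test performs equally well among Phase I positives.

```python
def screen(cases, noncases, sensitivity, specificity):
    """Apply one screening phase to expected counts.

    Returns the expected numbers of true positives and false positives.
    """
    true_pos = cases * sensitivity
    false_pos = noncases * (1 - specificity)
    return true_pos, false_pos


def sequential_screen(n, prevalence, phases):
    """Run consecutive screens; only screen-positives enter the next phase.

    `phases` is a list of (sensitivity, specificity) pairs. The tests are
    assumed to be independent of one another (an assumption of this sketch).
    """
    cases = n * prevalence
    noncases = n - cases
    results = []
    for sensitivity, specificity in phases:
        base_rate = cases / (cases + noncases)   # prevalence entering this phase
        cases, noncases = screen(cases, noncases, sensitivity, specificity)
        results.append({
            "base_rate": base_rate,
            "positives": cases + noncases,
            "ppv": cases / (cases + noncases),
        })
    return results


for i, phase in enumerate(sequential_screen(10_000, 0.04, [(0.90, 0.90), (0.90, 0.90)]), 1):
    print(f"Phase {i}: base rate = {phase['base_rate']:.1%}, "
          f"positives = {phase['positives']:.0f}, PPV = {phase['ppv']:.1%}")
```

Under these assumptions, the positives from each phase form the cohort for the next, so the base rate entering Phase II equals the predictive value of a positive from Phase I. Note that each additional phase also loses some true cases, because sensitivity applies again to those who remain.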


In Phase I of the hypothetical screening, a highly valid instrument with sensitivity and specificity equal to .90 is used in a large population cohort (N = 10,000) with a prevalence of 4% for the index condition. Because of the low base rate, the predictive value of a positive is only 27.2%, meaning essentially that fewer than 1 out of every 3 positives will be true positives. The 1,320 individuals screened positive from the original cohort of 10,000 subsequently become the cohort for Phase II screening. With an equally valid, independent test (sensitivity and specificity = .90) and a base rate of 27.2%, the predictive value of a positive in Phase II rises to 77%, representing a substantial increase in the level of screening performance.

Sequential screening essentially zeros in on a high risk subgroup of the population of interest by virtue of a series of consecutive sieves. These have the effect of eliminating from consideration individuals with low likelihood of having the disorder, and simultaneously raising the base rate of the condition in the remaining sample. Sequential screening can become expensive because of the increased number of screening tests that must be administered. However, in certain situations where prevalence is low (e.g., HIV screening in the general population) and the validity of the screening test is already close to maximum, it may be the only method available to minimize errors in classification.

ROC Analysis

Although some screening tests operate in a qualitative fashion, depending on the presence or absence of a key indicator, psychological screening tests function, as do many others, along a quantitative continuum. The individual being screened must obtain a probability, or "score," above some criterion threshold, or "cutoff," to be considered a "positive," or a "case." The cutoff value is usually determined to be the value that will maximize correct classification and minimize misclassification relative to the index disorder.
If the consequences of one type of error are considered more costly than those of the other (i.e., the consequences have dramatically different utilities; e.g., false negative = missed fatal but potentially curable disease), the cutoff value will often be adjusted to take this differential utility into account. Although quantitative methods exist to estimate optimal threshold values (e.g., Weinstein et al., 1980), traditionally they have been selected by simple inspection of cutoff tables and their associated sensitivities and specificities.

The selection of a cutoff value automatically determines both the sensitivity and specificity of the test because it defines the rates of correct identification and misclassification. Actually, an entire distribution of cutoffs is possible, with corresponding sensitivities and specificities. Further, as touched on in the previous section, test performance (i.e., the errors associated with a particular cutoff value) is highly affected by the prevalence or base rate of the disorder under study. Viewed from this perspective, a test should not be characterized by a single sensitivity and specificity; rather, it should be perceived as possessing distributions of sensitivities and specificities associated with the distribution of possible threshold values and the distribution of prevalences.

Receiver Operating Characteristic (ROC) analysis is a method that enables the visualization of the entire distribution of sensitivity/specificity combinations for all possible cutoff values and prevalences. As such, it enables the selection of a criterion threshold based on substantially more information, and represents a much more sophisticated clinical decision process. ROC analysis was first developed by Swets (1964) in the context of signal detection paradigms in psychophysics. Subsequently, applications of the technique were developed in the areas of radiology and medical imaging (Hanley


& McNeil, 1982; Metz, 1978; Swets, 1979). Madri and Williams (1986) and Murphy et al. (1987) introduced and applied ROC analysis to the task of screening for psychiatric disorders. More recently, Somoza and his colleagues (Somoza, 1994, 1996; Somoza & Mossman, 1990a, 1990b, 1991; Somoza, Steer, A. T. Beck, & D. A. Clark, 1994) published an extensive series of in-depth reports integrating ROC analysis with information theory to optimize the performance of diagnostic and screening tests. In their informative series, these investigators reviewed the topics of construction of tests (Somoza & Mossman, 1990a, 1990b), the effects of prevalence (Mossman & Somoza, 1991), optimizing information yield (Somoza & Mossman, 1992a, 1992b), and maximizing expected utility (Mossman & Somoza, 1992), among others.

Typically, an ROC curve is developed by plotting corresponding values of a test's sensitivity (true positive rate) on the vertical axis, against the complement of its specificity (false positive rate) on the horizontal axis, for the entire range of possible cutting scores from lowest to highest (see Fig. 2.1). A number of computer programs (e.g., Somoza & Mossman, 1991) are available to generate and plot ROC curves. The ROC curve demonstrates the discriminative capacity of the test at each possible definition of threshold (cutoff score) for psychiatric disorder. If the discriminative capacity of the test is no better than chance, then the curve will follow a diagonal straight line from the origin of the graph (lower left) to its uppermost right corner. This line is termed the "line of no information." The ROC curve rises from the origin (point 0, 0) to its termination point (1, 1) on a plane as defined. To the extent that a test has discriminative ability, the curve will bow in a convex manner toward the upper left corner of the graph.
The greater the deviation toward the upper left corner, the greater discriminative ability the test has for the particular application at hand. An ROC summary statistic describing the discriminative capacity of a test is referred to as the "area under the curve" (AUC). The AUC may be thought of as an estimate of the probability that a randomly chosen positive (or "case") will demonstrate a higher score than a randomly chosen negative. When the ROC curve follows the line of no information, the AUC is .50. In the situation of theoretically optimal discrimination, the ROC curve would follow the ordinate of the graph from the origin to the upper left corner (sensitivity of 1 with a false positive rate of 0), and then move at right angles to point 1, 1. In this situation, the AUC would equal 1.0.

Fig. 2.1. ROC curves for two hypothetical psychiatric screening tests. From "Performance of Screening and Diagnostic Tests" by J. M. Murphy et al., 1987, Archives of General Psychiatry, 44, pp. 550-555. Copyright © 1987 by American Medical Association. Reprinted by permission.
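The construction of an ROC curve and its AUC can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions: the function names are hypothetical, the ten screening scores and case labels are invented for the example, and a person is treated as screening positive when the score is at or above the cutoff. The AUC is computed two ways, as the area under the plotted points (trapezoidal rule) and as the proportion of case/noncase pairs in which the case has the higher score, with ties counted as one half.

```python
def roc_points(scores, labels):
    """Return (false positive rate, sensitivity) pairs for every cutoff.

    labels: 1 = case, 0 = noncase. A score at or above the cutoff is
    treated as a positive screen (an assumption of this sketch).
    """
    n_cases = sum(labels)
    n_noncases = len(labels) - n_cases
    points = [(0.0, 0.0)]  # cutoff above the highest score: no one is positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_noncases, tp / n_cases))
    return points


def auc_trapezoid(points):
    """Area under the ROC points by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_pairs(scores, labels):
    """Proportion of case/noncase pairs in which the case scores higher."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    noncases = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in noncases)
    return wins / (len(cases) * len(noncases))


# Invented scores for ten screened individuals; 1 marks a true case.
scores = [3, 5, 6, 8, 9, 10, 12, 14, 15, 18]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]

for fpr, sens in roc_points(scores, labels):
    print(f"false positive rate = {fpr:.2f}, sensitivity = {sens:.2f}")
print("AUC (trapezoid):", round(auc_trapezoid(roc_points(scores, labels)), 3))
print("AUC (pairs):    ", round(auc_pairs(scores, labels), 3))
```

The two AUC calculations agree, which reflects the interpretation of the AUC as a probability rather than as a property of any single cutoff. Each printed point corresponds to one candidate cutoff, so the listing doubles as a cutoff table of the kind traditionally inspected when selecting a threshold.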


Although ROC analysis has been introduced to the area of screening for psychiatric disorders only within the past decade, investigators have found numerous applications for the technique. In addition to simply describing the distribution of validity coefficients for a single test, ROC analysis has been used to compare various screening tests (Somoza et al., 1994; Weinstein, Berwick, Goldman, Murphy, & Barsky, 1989), aid in the validation of new tests, compare different scoring methods for a particular test (Birtchnell, Evans, Deahl, & Master, 1989), contrast the screening performance of a test in different populations (Burnam, Wells, Leake, & Landsverk, 1988; Hughson, Cooper, McArdle, & Smith, 1988), and assist in validating a foreign language version of a standard test (Chong & Wilkinson, 1989). ROC analysis has also been effectively integrated with paradigms from information theory to maximize information yield in screening (Somoza & Mossman, 1992a, 1992b), and with decision-making models to optimize expected utilities of screening outcomes (Mossman & Somoza, 1992). Although ROC analysis does not represent a definitive solution for the complex problems of psychiatric screening, it does significantly increase the information available to the decision maker and provides a relatively precise and sophisticated method for making decisions.

Conclusions

Little doubt remains that psychiatric disorders meet the WHO criteria for conditions appropriate for the development of effective health screening programs (Wilson & Junger, 1968). The magnitude of the health problem they represent is extensive, and the morbidity, mortality, and costs associated with these conditions are imposing. Currently, there are valid, cost-efficient psychological tests to effectively identify these conditions in medical and community settings, and the efficacy of treatment regimens for most psychiatric conditions is constantly improving (Regier et al., 1988).
Although evidence concerning the incremental advantage of early detection remains somewhat equivocal, evidence is compelling that left to their natural courses, such conditions will result in chronic, compound morbidities of both a physical and psychological nature (L. R. Derogatis & Wise, 1989; Katon et al., 1990; Regier et al., 1988). As indicated earlier, it is of little ultimate consequence to develop effective systems of treatment planning and outcomes assessment if the majority of individuals who would benefit from their utilization are lost to the system. In large measure, this undesirable reality has to do with the fact that a substantial majority of patients with psychiatric conditions are never seen by mental health professionals, and up to 20% are never seen by any health care professional. The majority of individuals with psychiatric morbidity seen in the health care system are attended by primary care physicians who have been insufficiently trained to recognize or effectively treat these conditions. A substantial plurality of these cases go unrecognized, and of those in whom a correct diagnosis is made, only a minority are referred to mental health professionals. Typically, primary care physicians prefer to treat these cases personally, even though they do not feel confident in doing so. Primary care physicians are playing a more prominent role as "gatekeepers" in the health care system relative to psychiatric disorders and all evidence points toward their continuing in this role in the future. This being the case, it seems imperative that effective methods be developed and introduced to facilitate these professionals' diagnostic and treatment decisions concerning psychiatric disorders. Although biological markers may


ultimately bring enhanced refinement to the identification of psychiatric morbidity (Jefferson, 1988; Tollefson, 1990), such a reality remains futuristic. Available psychological screening techniques can deliver valid, cost-effective identification of these conditions now. Considering the cost-benefit and savings involved, such systems should be extensively implemented as soon as possible.

References

Albert, M. (1981). Geriatric neuropsychology. Journal of Consulting and Clinical Psychology, 49, 835-850. Allen, C., & Allen, R. (1987). Cognitive disabilities: Measuring the social consequences of mental disorders. Journal of Clinical Psychiatry, 48, 185-190. Allison, T. G., Williams, D. E., Miller, T. D., Patten, C. A., Bailey, K. R., Squires, R. W., & Gau, G. T. (1995). Medical and economic costs of psychologic distress in patients with coronary artery disease. Mayo Clinic Proceedings, 70, 734-742. American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd rev. ed.). Washington, DC: Author. Anderson, S. M., & Harthorn, B. H. (1989). The recognition, diagnosis, and treatment of mental disorders by primary care physicians. Medical Care, 27, 869-886. Anderson, S. M., & Harthorn, B. H. (1990). Changing the psychiatric knowledge of primary care physicians: The effects of a brief intervention on clinical diagnosis and treatment. General Hospital Psychiatry, 12, 177-190. Baker, F. (1989). Screening tests for cognitive impairment. Hospital and Community Psychiatry, 40, 339-340. Baldessarini, R. J., Finklestein, S., & Arana, G. W. (1983). The predictive power of diagnostic tests and the effect of prevalence of illness. Archives of General Psychiatry, 40, 569-573. Barrett, J. E., Barrett, J. A., Oxman, T. E., & Gerber, P. D. (1988). The prevalence of psychiatric disorders in a primary care practice. Archives of General Psychiatry, 45, 1100-1106. Bayles, K., & Kaszniak, A. (1987).
Communication and cognition in normal aging and dementia. Boston: Little, Brown. Bech, P. (1987). Observer rating scales of anxiety and depression with reference to DSM-III for clinical studies in psychosomatic medicine. Advances of Psychosomatic Medicine, 17, 55-70. Bech, P., Grosby, H., Husum, B., & Rafaelson, L. (1984). Generalized anxiety and depression measured by the Hamilton Anxiety Scale and the Melancholia Scale in patients before and after cardiac surgery. Psychopathology, 17, 253-263. Beck, A. T., & Beck, R. W. (1972). Screening depressed patients in family practice: A rapid technic. Postgraduate Medicine, 52, 81-85. Beck, A. T., Kovacs, M., & Weissman, A. (1975). Hopelessness and suicidal behavior: An overview. Journal of the American Medical Association, 234, 1146-1149. Beck, A. T., & Steer, R. A. (1993). Manual for the Beck Depression Inventory. San Antonio, TX: The Psychological Corporation. Beck, A. T., Ward, C., Mendelson, M., Mock, J. E., & Erbaugh, J. K. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 53-63. Benjamin, G., Kaszniak, A., Sales, B., & Shanfield, S. (1986). The role of legal education in producing psychological distress among law students and lawyers. American Bar Foundation Research Journal, 2, 225-252. Berg, G., Edwards, D., Danzinger, W., & Berg, L. (1987). Longitudinal change in three brief assessments of SDAT. Journal of the American Geriatrics Society, 35, 205-212. Bjarnason, T., & Thorlindsson, T. (1994). Manifest predictors of past suicide attempts in a population of Icelandic adolescents. Suicide and Life Threatening Behavior, 24, 350-357. Birtchnell, J., Evans, C., Deahl, M., & Master, N. (1989). The Depression Screening Instrument


(DSI): A device for the detection of depressive disorders in general practice. Journal of Affective Disorders, 16, 269-281. Blazer, D., George, G. K., Landerman, R., Pennybacker, M., Melville, M. L., Woodbury, M., Mantor, K. G., Jordan, K., & Locke, B. (1984). Psychiatric disorders: A rural/urban comparison. Archives of General Psychiatry, 41, 959-970. Blessed, G., Tomlinson, B., & Roth, M. (1968). The association between quantitative measures of dementia and of senile change in the cerebral gray matter of elderly. British Journal of Psychiatry, 114, 797-811. Bridges, K., & Goldberg, D. (1984). Psychiatric illness in inpatients with neurological disorders: Patient's view on discussions of emotional problems with neurologists. British Medical Journal, 289, 656-658. Bulik, C. M., Carpenter, L. L., Kupfer, D. J., & Frank, E. (1990). Features associated with suicide attempts in recurrent major depression. Journal of Affective Disorders, 18, 27-29. Burnam, M. A., Wells, K. B., Leake, B., & Landsverk, J. (1988). Development of a brief screening instrument for detecting depressive disorders. Medical Care, 26, 775-789. Burns, B. J., & Burke, J. D. (1985). Improving mental health practices in primary care. Public Health Reports, 100, 294-299. Chiles, J. A., & Strosahl, K. D. (1995). The suicidal patient: Principles of assessment, treatment, and case management (pp. 50-105). Washington, DC: American Psychiatric Press. Chong, M., & Wilkinson, G. (1989). Validation of 30- and 12-item versions of the Chinese Health Questionnaire (CHQ) in patients admitted for general health screening. Psychological Medicine, 19, 495-505. Clark, L., Levine, M., & Kinney, N. (1988-1989). A multifaceted and integrated approach to the prevention, identification, and treatment of bulimia on college campuses. Journal of College Student Psychotherapy, 3, 257-298. Cochran, C. D., & Hale, W. D. (1985). College students norms on the Brief Symptom Inventory. Journal of Clinical Psychology, 31, 176-184.
Cohen, L. J., Test, M. A., & Brown, R. L. (1990). Suicide and schizophrenia: Data from a prospective community treatment study. American Journal of Psychiatry, 47, 602-607. Commission on Chronic Illness. (1957). Chronic Illness in the United States. Vol. 1, Cambridge, Commonwealth Fund, Harvard University Press. Comstock, G. W., & Helsing, K. J. (1976). Symptoms of depression in two communities. Psychological Medicine, 6, 551-564. Craske, M., & Krueger, M. (1990). Prevalence of nocturnal panic in a college population. Journal of Anxiety Disorders, 4, 125-139. Craven, J. L., Rodin, G. M., & Littlefield, C. (1988). The Beck Depression Inventory as a screening device for major depression in renal dialysis patients. International Journal of Psychiatry in Medicine, 18, 365-374. Cummings, J., & Benson, F. (1986). Dementia of the Alzheimer type: An inventory of diagnostic clinical features. Journal of the American Geriatrics Society, 34, 12-19. Davis, T. C., Nathan, R. G., Crough, M. A., & Bairnsfather, L. E. (1987). Screening depression with a new tool: Back to basics with a new tool. Family Medicine, 19, 200-202. Depue, R., Krauss, S., Spoont, M., & Arbisi, P. (1989). General Behavior Inventory identification of unipolar and bipolar affective conditions in a nonclinical university population. Journal of Abnormal Psychology, 98, 117-126. Derogatis, L. R. (1975a). The Brief Symptom Inventory (BSI). Baltimore, MD: Clinical Psychometric Research. Derogatis, L. R. (1975b). The SCL-90-R. Baltimore, MD: Clinical Psychometric Research. Derogatis, L. R. (1977). SCL-90-R: Administration, scoring and procedures manual-I, Baltimore: Clinical Psychometric Research. Derogatis, L. R. (1983). SCL-90-R: Administration, scoring and procedures manual-II, Baltimore: Clinical Psychometric Research. Derogatis, L. R. (1990). SCL-90-R: A bibliography of research reports 1975-1990. Baltimore: Clinical Psychometric Research. Derogatis, L. R. (1993). 
BSI: Administration, scoring and procedures manual for the Brief Symptom Inventory (3rd ed.). Minneapolis, MN: National Computer Systems.

Derogatis, L. R. (1994). SCL-90-R: Administration, scoring and procedures manual (3rd ed.).


Minneapolis, MN: National Computer Systems. Derogatis, L. R. (1997). The Brief Symptom Inventory-18 (BSI-18). Minneapolis, MN: National Computer Systems. Derogatis, L., DellaPietra, L., & Kilroy, V. (1992). Screening for psychiatric disorder in medical populations. In M. Fava, G. Rosenbaum, & R. Birnbaum (Eds.), Research designs and methods in psychiatry (pp. 145-170). New York: Elsevier. Derogatis, L. R., & Derogatis, M. F. (1996). SCL-90-R and the BSI. In B. Spilker (Ed.), Quality of life and pharmacoeconomics (pp. 323-335). Philadelphia: Lippincott-Raven. Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974a). The Hopkins Symptom CheckList (HSCL): A self-report symptom inventory. Behavioral Science, 19, 1-15. Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974b). The Hopkins Symptom Checklist (HSCL). In P. Pinchot (Ed.), Psychological measurements in psychopharmacology (pp. 79-111). Basel: Karger. Derogatis, L. R., & Melisaratos, N. (1983). The Brief Symptom Inventory: An introductory report. Psychological Medicine, 13, 595-605. Derogatis, L. R., Morrow, G. R., Fetting, J., Penman, D., Piasetsky, S., Schmale, A. M., Henrichs, M., & Carnicke, C. L. M. (1983). The prevalence of psychiatric disorders among cancer patients. Journal of the American Medical Association, 249, 751-757. Derogatis, L. R., & Spencer, P. M. (1982). BSI administration and procedures manual-I. Baltimore: Clinical Psychometric Research. Derogatis, L. R., & Wise, T. N. (1989). Anxiety and depressive disorders in the medical patient. Washington, DC: American Psychiatric Press. Dick, J., Guiloff, R., Stewart, A., Blackstock, J., Bielawska, C., Paul, E., & Marsden, C. (1984). Mini-Mental State Examination in neurological patients. Journal of Neurology, Neurosurgery, and Psychiatry, 47, 496-499. Dohrenwend, B. P., & Dohrenwend, B. S. (1982). Perspectives on the past and future of psychiatric epidemiology.
American Journal of Public Health, 72, 1271-1279. Doyle, G., Dunn, S., Thadani, I., & Lenihan, P. (1986). Investigating tools to aid in restorative care for Alzheimer's patients. Journal of Gerontological Nursing, 12, 19-24. Eisen, S. V. (1996). Behavior and Symptom Identification Scale (BASIS-32). In L. I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 65-69). Baltimore: Williams & Wilkins. Eisen, S. V., & Dickey, B. (1996). Mental health outcome assessment: The new agenda. Psychotherapy, 33, 181-189. Eisen, S. V., & Grob, M. C. (1989). Substance abuse in an inpatient population. McLean Hospital Journal, 14, 1-22. Erikkson, J. (1988). Psychosomatic aspects of coronary artery bypass graft surgery: A prospective study of 101 male patients. Acta Psychiatrica Scandinavica, 77(Suppl. 340), 112. Escobar, J., Burnam, A., Karno, M., Forsythe, A., Landsverk, J., & Golding, J. (1986). Use of the Mini-Mental State Examination (MMSE) in a community population of mixed ethnicity: Cultural and linguistic artifacts. Journal of Nervous and Mental Disease, 174, 607-614. Fauman, M. A. (1983). Psychiatric components of medical and surgical practice: II. Referral and treatment of psychiatric disorders. American Journal of Psychiatry, 140, 760-763. Faust, D., & Fogel, B. (1989). The development and initial validation of a sensitive bedside cognitive screening test. Journal of Nervous and Mental Disease, 177, 25-31. Faustman, W., Moses, J., & Csernansky, J. (1990). Limitations of the Mini-Mental State Examination in predicting neuropsychological functioning in a psychiatric sample. Acta Psychiatrica Scandinavica, 81, 126-131. Fishback, D. (1977). Mental Status Questionnaire for Organic Brain Syndrome, with a New Visual Counting Test. Journal of the American Geriatrics Society, 35, 167-170. Folstein, M., Folstein, S., & McHugh, P. (1975). Mini-Mental State. Journal of Psychiatric Research, 12, 189-198. Foreman, M. (1987).
Reliability and validity of mental status questionnaires in elderly hospitalized patients. Nursing Research, 36, 216-220. Frerichs, R. R., Areshensel, C. S.,& Clark, V. A. (1981). Prevalence of depression in Los Angeles County.

American Journal of Epidemiology, 113, 691-699.

< previous page

page_73

next page >

< previous page

page_74

next page > Page 74

Fulop, G., & Strain, J. J. (1991). Diagnosis and treatment of psychaitric disorders in medically ill inpatients. Hospital and Community Psychiatry, 42, 389-394. Furher, R., & Ritchie, K. (1993). Re: C. Jagger et al.'s article "Screening for dementiaA comparison of two tests using Receiver Operating Characteristic (ROC) analysis" 7, 659-665. International Journal of Geriatric Psychiatry, 8, 867-868. Galton, F. (1883). Inquiries into human faculty and its development. New York: Macmillan. Goldberg, D. (1972). The detection of psychiatric illness by questionnaire. Oxford, England: Oxford University Press. Goldberg, D., & Hillier, V. F. (1979). A scaled version of the General Health Questionnaire. Psychological Medicine, 9, 139-145. Goldberg, D., & Williams, P. (1988). A user's guide to the General Health Questionnaire. Windsor: NferNelson. Goldstein, L. T., Goldsmith, S. J., Anger, K.,& Leon, A. C. (1996). Psychiatric symptoms in clients presenting for commercial weight reduction treatment. International Journal of Eating Disorders, 20, 191-197. Griest, J. H., Gustafson, D. H., Stauss, F. F., Rowse, G. L., Laughren, T. P., & Chiles, J. A. (1973). A computer interview for suicide-risk prediction. American Journal of Psychiatry, 130, 1327-1332. Haglund, R., & Schuckit, M. (1976). A clinical comparison of tests of organicity in elderly patients. Journal of Gerontology, 31, 654-659. Hajek, V., Rutman, D., & Scher, H. (1989). Brief assessment of cognitive impairment in patients with stroke. Archives of Physical Medicine and Rehabilitation, 70, 114-117. Hamilton, M. (1959). The assessment of anxiety states by rating. British Journal of Medical Psychology, 32, 5055. Hamilton, M. (1960). A rating scale for depression. Journal of Neurosurgery Psychiatry, 23, 50-55. Hamilton, M. (1967). Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6 278-296. Hanley, J. A., & McNeil, B. J. (1982). 
The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Diagnostic Radiography, 143, 29-36. Hart, R. P., Levenson, J. L., Sessler, C. N., Best, A. M., Schwartz, S. M., & Rutherford, L. E. (1995). Validation of a cognitive test for delerium in medical ICU patients. Psychosomatics, 37, 533-546. Hawton, K. (1981). The long term outcome of psychiatric morbidity detected in general medical patients. Journal of Psychosomatic Research, 25, 237-243. Hedlund, J. L., & Vieweg, M.D. (1979). The Hamilton Rating Scale for Depression: A comprehensive review. Journal of Operational Psychiatry, 10, 149-165. Hoeper, E. W., Nyczi, G. R., & Cleary, P. D. (1979). Estimated prevalence of RDC mental disorders in primary care. International Journal of Mental Health, 8, 6-15. Hughson, A.V.M., Cooper, A. F., McArdle, C. S., & Smith, D. C. (1988). Validity of the General Health Questionnaire and its subscales in patients receiving chemotherapy for early breast cancer. Journal of Psychosomatic Research, 32, 393-402. Jacobs, J., Berhard, M., Delgado, A., & Strain, J. (1977). Screening for organic mental syndromes in the medically ill. Annals of Internal Medicine, 86, 40-46. Jagger, C., Clarke, M., & Anderson, J. (1992). Screening for dementia: A comparison of two tests using Receiver Operating Characteristic (ROC) analysis. International Journal of Geriatric Psychiatry, 7, 659-665. Jefferson, J. W. (1988). Biologic systems and their relationship to anxiety. Psychiatric Clinics of North America, 11, 463-472. Jitapunkul, S., Lailert, C., Worakul, P, Srikiatkhachorn, A., & Ebrahim, S. (1996). Chula Mental Test: A screening test for elderly people in less developed countries. International Journal of Geriatric Psychiatry, 11, 715-720. Johnson, R., Ellison, R., & Heikkinen, C. (1989). Psychological symptoms of counseling center clients. Journal of Counseling Psychology, 36, 110-114. Jones, L. R., Badger, L. W., Ficken, R. P., Leepek, J. D., & Anderson, R. L. (1987). 
Inside the hidden mental health network: Examining mental health care delivery of primary care physicians. General Hospital Psychiatry, 9, 287-293.

Judd, B., Meyer, J., Rogers, R., Gandhi, S., Tanahashi, N., Mortel, K., & Tawaklna, T. (1986). Cognitive performance correlates with cerebrovascular impairments in multi-infarct

< previous page

page_74

next page >

< previous page

page_75

next page > Page 75

dementia. Journal of the American Geriatrics Society, 34, 355-360. Kahn, R., Goldfarb, A., Pollack, M., & Peck, A. (1960). Brief objective measures for the determination of mental status in the aged. American Journalof Psychiatry, 117, 326-328. Kamerow, D. B., Pincus, H. A., & MacDonald, D. I. (1986). Alcohol abuse, other drug abuse, and mental disorders in medical practice: Prevalence, cost, recognition, and treatment. Journal of the American Medical Association, 255, 2054-2057. Kane, M. T., & Kendall, P. C. (1989). Anxiety disorders in children: A multiple-baseline evaluation of a cognitive-behavioral treatment. Behavior Therapy, 20, 499-508. Katon, W., Von Korff, M., Lin, E., Lipscomb, P., Russo, J., Wagner, E., & Polk, E. (1990). Distressed high utilizers of medical care: DSM-III-R diagnoses and treatment needs. General Hospital Psychiatry, 12, 355-362. Kedward, H. B., & Cooper, B. (1966). Neurotic disorders in urban practice: A 3 year followup. Journal of College of General Practice, 12, 148-163. Kempf, E. J. (1914-1915). The behavior chart in mental diseases. American Journal of Insanity, 7, 761-772. Kessler, L. G., Amick, B. C., & Thompson, J. (1985). Factors influencing the diagnosis of mental disorder among primary care patients. Medical Care, 23, 50-62. Kramer, M., German, P., Anthony, J., Von Korff, M., & Skinner, E. (1985). Patterns of mental disorders among the elderly residents of eastern Baltimore. Journal of the American Geriatrics Society, 11, 236-245. Kuhn, W. F., Bell, R. A., Seligson, D., Laufer, S. T., & Lindner, J. E. (1988). The tip of the iceberg: Psychiatric consultations on an orthopedic service. International Journal of Psychiatry in Medicine, 18, 375-378. LaRue, A., D'Elia, L., Clark, E., Spar, J., & Jarvik, L. (1986). Clinical tests of memory in dementia, depression, and healthy aging. Psychology and Aging, 1, 69-77. Lesher, E., & Whelihan, W. (1986). Reliability of mental status instruments administered to nursing home residents. 
Journal of Consulting and Clinical Psychology, 54, 726-727. Libow, L. (1981). A rapidly administered, easily remembered mental status evaluation: FROMAJE. In L. S. Libow, & F. T. Sherman (Eds.), The core of geriatric medicine (pp. 85-91). St. Louis: C. V. Mosby. Libow, L. (1977). Senile dementia and pseudosenility: Clinical diagnosis. In C. Eisdorfer & R. Friedel (Eds.), Cognitive and emotional disturbance in the elderly. Chicago: Year Book Medical Publishing. Linn, L., & Yager, J. (1984). Recognition of depression and anxiety by primary care physicians. Psychosomatics, 25, 593-600. Lish, J. D., Zimmerman, M., Farber, N. J., Lush, D. T., Kuzma, M. A., & Plescia, G. (1996). Suicide screening in a primary care setting at a Veterans Affairs medical center. Psychosomatics, 37, 413-424. Luce, R. D., & Narens, L. (1987). Measurement scales on the continuum. Science, 236, 1527-1532. Lyketsos, C. G., Hutton, H., Fishman, M., Schwartz, J., & Trishman, G. J. (1996). Psychiatric morbidity on entry to an HIV primary care clinic. AIDS, 10, 1033-1039. Madri, J. J., & Williams, P. (1986). A comparison of validity of two psychiatric screening questionnaires. Journal of Chronic Disorders, 39, 371-378. Malt, U. F. (1989). The validity of the General Health Questionnaire in a sample of accidentally injured adults. Acta Psychiatrica Scaninavica, 80, 103-112. Maser, J. D., & Cloninger, C. R. (1990). Comorbidity of mood and anxiety disorders. Washington, DC: American Psychiatry Press. McCartney, J., & Palmateer, L. (1985). Assessment of cognitive deficit in geriatric patients: A study of physician behavior. Journal of the American Geriatrics Society, 33, 467-471. McDermott, R., Hawkins, W., Littlefield, E., & Murray, S. (1989). Health behavior correlates of depression among university students. Journal of American College Health, 38, 115-119. McDougall, G. (1990). A review of screening instruments for assessing cognition and mental status in older adults. Nurse Practitioner, 15, 18-28. 
Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194-216. Messick, S. (1995). Validity of psychological assessment: Validation of inferences from

< previous page

page_76

next page > Page 76

persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749. Mesulam, M., & Geschwind, N. (1976). Disordered mental status in the postoperative period. Urologic Clinics of North America, 3, 199-215. Metz, C. E. (1978). Basic principles of ROC analysis. Seminars in Nuclear Medicine, 8, 283-298. Mitrushina, M., Abara, J., & Blumenfeld, A. (1995). Cognitive screening of psychiatric patients. Journal of Psychiatric Research, 29, 13-22. Moore, J. T., Silimperi, D. R., & Bobula, J. A. (1978). Recognition of depression by family medicine residents: The impact of screening. Journal of Family Practice, 7, 509-513. Mossman, D., & Somoza, E. (1991). Neuropsychiatric decision making: The role of disorder prevalence in diagnostic testing. Journal of Neuropsychiatry and Clinical Neurosciences, 3, 84-88. Mossman, D., & Somoza, E. (1992). Balancing risks and benefits: Another approach to optimizing diagnostic tests. Journal of Neuropsychiatry and Clinical Neurosciences, 4, 331-335. Mungas, D. (1991). In-office mental status testing: A practical guide. Geriatrics, 46, 54-66. Murphy, J. M., Berwick, D. M., Weinstein, M. C., Borus, J. F., Budman, S. H., & Klerman, G. L. (1987). Performance of screening and diagnostic tests. Archives of General Psychiatry, 44, 550-555. Myers, J. K., Weissman, M. M., Tischler, G. L., Holzer, C. E., III, Leaf, P. J., Orvaschel, H., Anthony, J. C., Boyd, J. H., Burke, J. D., Kramer, M., & Stoltzman, R. (1984). Six-month prevalence of psychiatric disorders in three communities. Archives of General Psychiatry, 41, 959-970. Nielson, A. C., & Williams, T. A. (1980). Depression in ambulatory medical patients. Archives of General Psychiatry, 37, 999-1009. Noyes, R., Chrisiansen, J., Clancy, J., Garvey, M. J., & Suelzer, M. (1991). Predictors of serious suicide attempts among patients with panic disorder. Comprehensive Psychiatry, 32, 261-267. Nunnally, J. (1978). Psychometric theory. New York: McGraw-Hill. 
O'Hara, M. N., Ghonheim, M. M., Hinrich, J. V., Metha, M. P., & Wright, E. J. (1989). Psychological consequences of surgery. Psychosomatic Medicine, 51, 356-370. Orleans, C. T., George, L. K., Houpt, J. L., & Brodie, H. (1985). How primary care physicians treat psychiatric disorders: A national survey of family practitioners. American Journal of Psychiatry, 142, 52-57. Parikh, R. M., Eden, D. T., Price, T. R., & Robinson, R. G. (1988). The sensitivity and specificity of the center for epidemiologic studies depression scale in screening for post-stroke depression. International Journal of Psychiatry in Medicine, 18, 169-181. Pfeiffer, E. (1975). A short portable mental status questionnaire for the assessment of organic brain deficit in elderly patients. Journal of the American Geriatrics Society, 23, 433-441. Piersma, H. L., Reaume, W. M., & Boes, J. L. (1994). The Brief Symptom Inventory (BSI) as an outcome measure for adult psychiatric inpatients. Journal of Clinical Psychology, 50, 555-563. Radloff, L. S. (1977). The CES-D scale: A self report depression scale for research in the general population. Applied Psychological Measurement, 1, 385-401. Radloff, L. S., & Locke, B. Z. (1985). The community mental health assessment survey and the CES-D scale. In M. M. Weissman, J. K. Myers, & C. G. Ross (Eds.), Community survey of psychiatric disorder. New Brunswick: Rutgers University Press. Rameizl, P. (1984). A case for assessment technology in long-term care: The nursing perspective. Rehabilitation Nursing, 9, 29-31. Rand, E. H., Badger, L. W., & Coggins, D. R. (1988). Toward a resolution of contradictions: Utility of feedback from the GHQ. General Hospital Psychiatry, 10, 189-196. Regier, D. A., Boyd, J. H., Burke, J. D., Rae, D. S., Myers, J. K., Kramer, M., Robins, L. N., George L. K., Karno, M., & Locke, B. Z. (1988). One month prevalence of mental disorders in the United States. Archives of General Psychiatry, 45, 977-986. Regier, D. A., Goldberg, I. D., Burns, B. 
J., Hankin, J., Hoeper, E. W., & Nyez, G. R. (1982). Specialist/generalist division of responsibility for patients with mental disorders, Archives of General Psychiatry, 39, 219-224. Regier, D., Goldberg, I., & Taube, C. (1978). The defacto U.S. mental health services system:

< previous page

page_77

next page > Page 77

A public health perspective. Archives of General Psychaitry, 35, 685-693. Regier, D. A., Robert, M. A., Hirschfeld, Goodwin, F. K., Burke, J. D., Lazar, J. B., & Judd, L. L. (1988). The NIMH depression awareness, recognition, and treatment program: Structure, aims, and scientific basis. American Journal of Psychiatry, 145, 1351-1357. Reisberg, B. (1984). Stages of cognitive decline. American Journal of Nursing, 84, 225-228. Reisberg, B., Ferris, S., deLeon, M., & Crook, T. (1982). The Global Deterioration Scale for assessment of primary degenerative dementia. American Journal of Psychiatry, 139, 1136-1139. Reisberg, B., Ferris, S., deLeon, M., & Crook, T. (1988). Global Deterioration Scale (GDS). Psychopharmacology Bulletin, 24, 661-663. Reynolds, W. M., & Gould, J. W. (1981). A psychometric investigation of the standard and short form Beck Depression Inventory. Journal of Consulting Clinical Psychology, 49, 306-307. Riskind, J. H., Beck, A. T., Brown, G., & Steer, R. A. (1987). Taking the measure of anxiety and depression: Validity of the reconstructed Hamilton scales. Journal of Nervous and Mental Disorders, 175, 474-479. Roberts, R. E., Rhoades, H. M., & Vernon, S. W. (1990). Using the CES-D Scale to screen for depression and anxiety: Effects of language and ethnic status. Psychiatry Research, 31, 69-83. Robins, L. N., Helzer, J. E., Weissman, M. M., Orvaschel, H., Greenberg, E., Burke, J. D., & Regier, D. A. (1984). Lifetime prevalence of specific psychiatric disorders in three sites. Archives of General Psychiatry, 41, 949-958. Roca, P., Klein, L., Kirby, S., McArthur, J., Vogelsang, G., Folstein, M., & Smith, C. (1984). Recognition of dementia among medical patients. Archives of Internal Medicine, 144, 73-75. Rosenthal, T. L., Miller, S. T., Rosenthal, R. H., Sadish, W. R., Fogelman, B. S., & Dismuke, S. (1992). Assessing emotional interest at the internist's office. Behavioral Research and Therapy, 29, 249-252. Royce, D., & Drude, K. (1984). 
Screening drug abuse clients with the Brief Symptom Inventory. International Journal of Addiction, 19, 849-857. Saravay, S. M., Pollack, S., Steinberg, M. D., Weinschel, B., & Habert, B. A. (1996). Four-year follow-up of the influence of psychological comorbidity on medical rehospitalization. American Journal of Psychiatry, 153, 397-403. Schmidt, N., & Telch, M. (1990). Prevalence of personality disorders among bulimics, non-bulimic binge eaters, and normal controls. Journal of Psychopathology and Behavioral Assessment, 12, 160-185. Schulberg, H. C., Saul, M., McClelland, M., Ganguli, M., Christy, W., & Frank, R. (1985). Assessing depression in primary medical and psychiatric practices. Archives of General Psychiatry, 42, 1164-1170. Schurman, R. A., Kramer, P., & Mitchell, J. B. (1985). The Hidden Mental Heath Network. Archives of General Psychiatry, 42, 89-94. Seay, T., & Beck, T. (1984). Alcoholism among college students. Journal of College Student Personnel, 25, 9092. Sederer, L. I., & Dickey, B. (1996). Outcomes assessment in clinical practice. Baltimore: Williams & Wilkins. Seltzer, A. (1989). Prevalence, detection and referral of psychiatric morbidity in general medical patients. Journal of the Royal Society of Medicine, 82, 410-412. Shapiro, S., German, P., Skinner, E., Von Korff, M., Turner, R., Klein, L., Teitelbaum, M., Kramer, M., Burke, J., & Burns, B. (1987). An experiment to change detection and management of mental morbidity in primary care. Medical Care, 25, 327-339. Shore, D., Overman, C., & Wyatt, R. (1983). Improving accuracy in the diagnosis of Alzheimer's disease. Journal of Clinical Psychiatry, 44, 207-212. Shrout, P. E., & Yager, T. J. (1989). Reliability and validity of screening scales: Effect of reducing scale length. Journal of Clinical Epidemiology, 42, 69-78. Snyder, S., Strain, J. J., & Wolf, D. (1990). Differentiating major depression from adjustment disorder with depressed mood in the medical setting. 
General Hospital Psychiatry, 12, 159-165. Somoza, E. (1994). Classification of diagnostic tests. International Journal of Biomedical Computing, 37, 41-55. Somoza, E. (1996). Eccentric diagnostic tests: Redefining sensitivity and specificity. Medical Decision Making, 16, 15-23.

< previous page

page_78

next page > Page 78

Somoza, E., & Mossman, D. (1990a). Introduction to neuropsychiatric decision making: Binary diagnostic tests. Journal of Neuropsychiatry and Clinical Neurosciences, 2, 297-300. Somoza, E., & Mossman, D. (1990b). Optimizing REM latency as a diagnostic test for depression using ROC analysis and information theory. Biological Psychiatry, 27, 990-1006. Somoza, E., & Mossman, D. (1991). Biological markers and psychiatric diagnosis: Risk-benefit analysis using ROC analysis. Biological Psychiatry, 29, 811-826. Somoza, E., & Mossman, D. (1992a). Comparing and optimizing diagnostic tests: An information-theoretic approach. Medical Decision Making, 12, 179-188. Somoza, E., & Mossman, D. (1992b). Comparing diagnostic tests using information theory: The INFO-ROC technique. Journal of Neuropsychiatry and Clinical Neurosciences, 4, 214-219. Somoza, E., Steer, R. A., Beck, A. T., & Clark, D. A. (1994). Differentiating major depression and panic disorders by self-report and clinical rating scales: ROC analysis and information theory. Behavioral Research and Therapy, 32, 771-782. Spilker, B. (1996). Quality of life and pharmacoeconomics in clinical trials. Philadelphia: Lippincott-Raven. Steer, R. A., & Beck, A. T. (1996). Beck Depression Inventory. In L. I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 100-104). Baltimore: Williams & Wilkens. Steffens, D. C., Welsh, K. A., Burke, J. R., Helms, M. J., Folstein, M. F., Brandt, J., McDonald, W. M., & Breitner, J. C. (1996). Diagnosis of Alzheimer's disease in epidemiologic studies by staged review of clinical data. Neuropsychiatry, Neuropsychology and Behavioral Neurology, 2, 107-113. Strain, J. J., Fulop, G., Lebovits, A., Ginsberg, B., Robinson, M., Stern, A. Charap, P., & Gany, F. (1988). Screening devices for diminished cognitive capacity. General Hospital Psychiatry, 10, 16-23. Striegel-Moore, R., Silberstein, L., Frensch, P., & Rodin, J. (1989). 
A prospective study of disordered eating among college students. International Journal of Eating Disorders, 8, 499-509. Swedo, S. E., Rettew, D. C., Kuppenheimer, M., Lum, D., Dolan, S., & Goldberger, E. (1991). Can adolescent suicide attempters be distinguished from at-risk adolescents. Pediatrics, 88, 620-629. Swets, J. A. (1964). Signal detection and recognition by human observers. New York: Wiley. Swets, J. A. (1979). ROC analysis applied to the evaluation of medical imaging techniques. Investigatory Radiology, 14, 109-121. Szulecka, T., Springett, N., & De Pauw, K. (1986). Psychiatric morbidity in first-year undergraduates and the effect of brief psychotherapeutic interventionA pilot study. British Journal of Psychiatry, 149, 75-80. Telch, M., Lucas, J., & Nelson, P. (1989). Non-clinical panic in college students: An investigation of prevalence and symptomatology. Journal of Abnormal Psychology, 98, 300-306. Teri, L., Larson, E., & Reifler, B. (1988). Behavioral disturbance in dementia of the Alzheimer type. Journal of the American Geriatrics Society, 36, 1-6. Tollefson, G. D. (1990). Differentiating anxiety and depression. Psychiatric Medicine, 8, 27-39. Vecchio, T. J. (1966). Predictive value of a single diagnostic test in unselected populations. New England Journal of Medicine, 274, 1171. Von Korff, M., Dworkin, S. F., & Kruger, A. (1988). An epidemiologic comparison of pain complaints. Pain, 32, 173-183. Weinstein, M. C., Berwick, D. M., Goldman, P. A., Murphy, J. M., & Barsky, A. J. (1989). A comparison of three psychiatric screening tests using Receiver Operating Characteristic (ROC) analysis. Medical Care, 27, 593-607. Weinstein, M. C., Fineberg, H. V., Elstein, A. S., Frazier, H. S., Neuhauser, D., Neutra, R. R., & McNeil, B. J. (1980). Clinical decision analysis, Philadelphia: Saunders. Weissman, M. M., & Merikangas, K. R. (1986). The epidemiology of anxiety and panic disorder: An update. Journal of Clinical Psychiatry, 47, 11-17. Weissman, M. 
M., Myers, J. K., & Thompson, W. D. (1981). Depression and its treatment in a U.S. urban community. Archives of General Psychiatry, 38, 417-421. Wells, C. (1979). Pseudodementia. American Journal of Psychiatry, 136, 895-900. Wells, K. B., Golding, J. M., & Burnam, M. A. (1988). Psychiatric disorders in a sample of

< previous page

page_79

next page > Page 79

the general population with and without chronic medical conditions. American Journal of Psychiatry, 145, 976-981. Wells, V., Klerman, G., & Deykin, E. (1987). The prevalence of depressive symptoms in college students. Social Psychiatry, 22, 20-28. West, R., Drummond, C., & Eames, K. (1990). Alcohol consumption, problem drinking and anti-social behaviour in a sample of college students. British Journal of Addiction, 85, 479-486. Westefeld, J. S., Bandura, A., Kiel, J. T., & Scheel, K. (1996). The College Student Reason for Living Inventory: Additional psychometric data. Journal of College Student Development, 37, 348-350. Westefeld, J. S., Cardin, D., & Deaton, W. L. (1992). Development of the College Student Reasons for Living Inventory. Suicide and Life-Threatening Behavior, 22, 442-452. Westefeld, J. S., & Liddell, D. L. (1994). The Beck Depression Inventory and its relationship to college student suicide. Journal of College Student Development, 35, 145-146. Whitaker, A., Johnson, J., Shaffer, D., Rapoport, J., Kalikow, K., Walsh, B., Davies, M., Braiman, S., & Dolinsky, A. (1990). Uncommon trouble in young people: Prevalence estimates of selected psychiatric disorders in a non-referred adolescent population. Archives of General Psychiatry, 47, 487-496. Williams, J. B. (1988). A structured interview guide for the Hamilton Depression Rating Scale. Archives of General Psychiatry, 45, 742-747. Wilson, J. M., & Junger, F. (1968). Principles and practices of screening for diseases. (Public Health Papers No. 34). Geneva: WHO. Wise, M. G., & Taylor, S. E. (1990). Anxiety and mood disorders in mentally ill patients. Journal of Clinical Psychiatry, 51, 27-32. Woodworth, R. S. (1918). Personal Data Sheet. Chicago: Stoelting. Yelin, E., Mathias, S. D., Buesching, D. P., Rowland, C., Calucin, R. Q., & Fifer, S. (1996). The impact of the employment of an intervention to increase recognition of previously untreated anxiety among primary care physicians. 
Social Science in Medicine, 42, 1069-1075. Yopenic, P. A., Clark, C. A., & Aneshensel, C. S. (1983). Depression problem recognition and professional consultation. Journal of Nervous and Mental Disorders, 171, 15-23. Zabora, J. R., Smith-Wilson, R., Fetting, J. H., & Enterline, J. P. (1990). An efficient method for psychosocial screening of cancer patients. Psychosomatics, 31, 1992-1996. Zalaquett, C. P., & Wood, R. J. (1997). Evaluating stress: A book of resources. Lanham, Md: Scarecrow. Zung, W.K.W. (1965). A self-rating depression scale. Archives of General Psychiatry, 12, 63-70. Zung, W., Magill, M., Moore, J., & George, D. (1983). Recognition and treatment of depression in a family practice. Journal of Clinical Psychiatry, 44, 3-6.

For Abby, Katie, and Shelby

Chapter 3
Use of Psychological Tests/Instruments for Treatment Planning

Larry E. Beutler
Ginger Goodrich
Daniel Fisher
Oliver B. Williams
University of California, Santa Barbara

The descriptive diagnostic system of DSM-III, biological psychiatry, and managed health care have together brought about a decline in third-party support for the use of formal psychological tests as routine diagnostic procedures in mental health. Descriptive diagnosis and the symptom focus of biological treatments eliminated the need for complex tests to identify covert and highly abstract psychic processes, as had previously been required in the diagnosis of disorders such as schizophrenia and neurosis. It is paradoxical that the same psychiatric and managed care forces that initially voiced concerns about maintaining reliable and valid diagnostic data have consistently preferred subjective and unreliable unstructured clinical interviews for gathering these data over empirically established, reliable, and valid psychological tests. The virtual exclusion of formal tests from the list of routinely approved intake procedures underlines the signal failure of psychological assessment to establish that it makes a meaningful contribution to treatment planning.

To capitalize on the empirical advantages of psychological tests over unstandardized clinical methods, the nature and goals of the assessment process must change. The omnibus, broad-ranging instruments that have long served this tradition must give way to assessment procedures that are short, practical, and treatment centered.

A new patient presents the clinician with a number of important questions, and answering them is made easier by the use of reliable and empirically based assessment procedures: Is this condition treatable? Is psychotherapy or pharmacotherapy an appropriate treatment modality? What about family therapy? Should the treatment focus on the patient's symptoms, on broader symptoms of depression and anxiety, or even on the resolution of underlying, dynamic conflicts? Should the patient be hospitalized for further evaluation, or be seen by a neurologist or other medical specialist? What markers will tell practitioners when treatment can be safely terminated?

The immediate challenge to the clinician is to decide on the most productive intervention with which to commence treatment and engage the client. Simultaneously, the clinician must formulate a treatment plan that will be maximally effective in addressing the client's needs. In pursuing these objectives, it is implicitly acknowledged that treatments
that are effective for one client or problem may be ineffective for another. In recognition of this fact, health care researchers have attempted to develop guidelines that will assist clinicians by identifying both those treatments that have the highest likelihood of success and those that might be either inappropriate or minimally effective. The emerging field of Prescriptive Treatment Planning is devoted to the prescription of effective treatments and the proscription of ineffective ones (Beutler & Clarkin, 1990; Beutler & Harwood, 1995; Frances, Clarkin, & Perry, 1984).

Because of their psychometric qualities relative to unstructured interview methods, and because they are adaptable to complex statistical manipulations, psychological tests are ideal for the task of developing standardized procedures for differentially assigning or recommending psychosocial treatments (e.g., Beutler & Berren, 1995; Butcher, 1990; Graham, 1987, 1993; Groth-Marnat, 1997). However, most of the indicators and signs that clinicians employ in making differential mental health treatment decisions from psychological tests are based on clinical experience and conjecture rather than on empirical evidence of improved treatment efficacy or effectiveness. Accordingly, this chapter is devoted to providing the clinician with a representative overview of the research suggesting that test performance may predict both treatment outcome and, more importantly, a differential response to available treatments. It also reports an initial effort to develop a method that consolidates, in one measurement device, information that is currently available through a large battery of contemporary tests.

Predictive Dimensions in Differential Therapeutics

Psychological tests have traditionally been used to address questions within five domains: diagnosis, etiology or causes of behavior, prognosis and course, treatment planning, and functional impairment (Beutler & Rosner, 1995). Of these, as noted, questions of diagnosis and differential diagnosis have always been primary. Psychological tests like the Rorschach and Thematic Apperception Test (TAT) were developed and came to be relied on to uncover covert thought disorder, underlying dynamic conflicts, and pathological ideation and impulses associated with various diagnoses. These tests purported to reveal such hidden processes much more validly than unstructured interviews could, and diagnoses frequently came to depend on their ability to do so. With the advent of the third edition of the Diagnostic and Statistical Manual (DSM-III; APA, 1983), the diagnostic system became less reliant on evidence of covert, underlying processes, and the weaknesses of projective tests became apparent (Butcher, 1995; Groth-Marnat, 1997; Nezworski & Wood, 1995).

Even beyond the contributions of DSM-III, the seemingly unbridled expansion and growing complexity of the diagnostic system itself have raised concerns about the very process of constructing psychiatric diagnoses. Contemporary disorders and their criteria (DSM-IV; APA, 1994) represent a consensual opinion of a committee of psychiatric experts, the majority vote of whom determines whether a given pattern of symptoms should be accorded the status of a socially viable "syndrome" or "disorder." The committee's decisions to recognize a given cluster of symptoms as a diagnosable condition have traditionally been based on (a) the presence and frequency of the symptoms, (b) an analysis of the symptoms' social significance and interpersonal effects, and, where the empirical evidence has warranted, (c) the specificity of the symptomatic response to various classes of drugs.

However, the committees that have been responsible for the development of the various DSMs have largely ignored empirical information about patient characteristics and traits (e.g., coping styles, resistance, conflicts) that have been useful in the selection and use of various psychotherapeutic procedures. Consequently, even if a diagnosis is reliable (and there is still debate about this, e.g., Follette & Houts, 1996; Wells & Sturm, 1996), it provides little information on which to develop a differentially sensitive psychotherapeutic program. Although a patient with a diagnosis of major depression with vegetative signs can be expected to respond better to tricyclic antidepressants than to anxiolytics (e.g., Wells & Sturm, 1996), the diagnostic label does not allow a clinician to select among cognitive, interpersonal, or relationship psychotherapies. The symptoms that determine diagnosis are quite insensitive to the qualities and characteristics that determine how well prospective patients will respond to particular psychotherapies.

The treatments themselves are cross-cutting. Their use is not bound by or specific to patient diagnoses and, in fact, diagnoses may be poor indicators for their implementation. It is unlikely that a cognitive therapy can be constructed to be so specific that it would work well for those with depression but not for those with anxiety, personality disorders, minor depressions, and eating disorders. Cognitive therapy has been applied to all of these disorders, as well as to such widely different conditions as tics, sexual dysfunctions, sleep disorders, impulse control disorders, substance abuse disorders, and adjustment disorders; indeed, to virtually any condition in which thought and/or behavior is disrupted. Such theoretically diverse treatments as cognitive therapy, behavior therapy, psychodynamic therapy, and interpersonal therapy have all been advocated as treatments for the same multitude of diagnostic conditions.
This cross-diagnostic application of psychotherapies does not mean that specific treatment indicators are not available. Indeed, some such indicators are present, but these indicating conditions were ignored in constructing the diagnostic criteria. Most clinicians realize the weaknesses and limitations of diagnosis and develop a large and rich array of treatment possibilities as they seek and obtain extradiagnostic information. This information is consolidated into both a patient formulation and a treatment plan. However, the patient formulations as well as the resulting treatment plans vary widely from clinician to clinician, even within a given theoretical framework (e.g., Caspar, 1995; Horowitz et al., 1984; Luborsky, 1996; Masterson, Tolpin, & Sifneos, 1991; Vaillant, 1997). The uniqueness of the diverse formulations reflects the amalgamation and influence both of therapists' personal theories of psychopathology and of the variations that exist among different formal systems. The failure of clinicians to rely on empirically derived dimensions of personality and prediction in constructing their formulations of patients and treatments probably reflects the absence both of knowledge about how to define and use such empirical predictors and of discriminating measures that are simply administered and reliably capture some of the patient characteristics related to treatment outcomes. Many authors have attempted to define the extradiagnostic dimensions that may allow a clinician to predict the differential effects of applying different therapeutic procedures. Most of these efforts have provided guidelines for the application of different procedures within a single theoretical framework, and few have attempted to incorporate the breadth of interventions that characterize widely different theories.
For example, Hollon and Beck (1986) suggested the conditions under which cognitive therapy might be directed to schematic change versus changes in automatic thoughts, and Strupp and Binder (1984) introduced guidelines within which the psychodynamic therapist may differentially offer interpretations or support. However, because any single theoretical framework is less than comprehensive of the many foci, procedures, and strategies that are advocated by the available array of psychotherapies, these mono-theoretical guidelines are necessarily incomplete and weakened. Recognizing the limitations that exist when only procedures advocated by a single theory can be selected for use, in recent years there has emerged a strong movement toward "technical eclecticism" among practitioners and researchers alike (Norcross & Goldfried, 1992; Stricker & Gold, 1993). The several approaches that constitute this movement, although diverse in type, share the objective of developing guidelines for the selection of maximally effective interventions from the broadest possible array of proven procedures, regardless of the theories that gave rise to them. These guidelines specify the characteristics of patients and therapeutic situational demands that best fit one another by various theories of intervention. The various models differ in their level of technical specificity and in the nature of the constructs that they select as most important. For example, Lazarus (1981) developed one of the more widely recognized integrative models, Multi-Modal Therapy (MMT). MMT offers a general framework on which patient experience and problems are defined, and relates these general dimensions to the use of different models and techniques of treatment. Specifically, MMT provides a structured means for assessing the relative and absolute levels of problems in seven general domains of experience, the collection of which is described by the acronym BASIC ID (Behaviors, Affects, Sensory experiences, Imagery, Cognitions, Interpersonal relationships, and need for Drugs). The clinician observes the levels of disturbance in each domain and then determines their interrelations in the form of a firing or triggering order, describing the pattern of behavior that occurs when the problems arise.
Then, the model proposes classes of interventions that correspond with the dimensions of patient experience affected by the problem. Thus, experiential procedures may be used when sensory and affective experiences are disturbed, behavioral interventions may be used when behavioral symptoms are disruptive, cognitive change procedures may be used when dysfunctional thoughts are observed, and so forth. In contrast to the focus on problem activation that defines the integration of procedures within MMT, other approaches have emphasized other methods of defining how procedures are integrated. In some cases, this has resulted in a less specific relation being posited between patient dimensions and the nature of treatment techniques than that proposed by Lazarus. Some, for example, have identified stages of problem development or resolution as an organizing principle. These stage models vary from one to another by virtue of the degree of emphasis they place on patient variables (Prochaska, 1984) or intervening therapy goals (Beitman, 1987) as the stage indicators. Prochaska (1984), more specifically, identified broad classes of intervention that may be recommended as a function of the patient's stage of problem resolution. Thus, behavioral strategies are recommended when the patient is in a stage of active problem resolution, strategies that raise awareness are used when the patient is in a preconceptual (precontemplative) stage of problem resolution, insight strategies are used when one is in the process of problem contemplation and cognitive exploration, and so forth. Beitman (1987), on the other hand, applied the concept of stages to the organization of the course of psychotherapy rather than to the stage of problem resolution achieved by the patient. 
Accordingly, he emphasized that the early sessions should focus on relationship development, and then the therapist should proceed through helping the patient recognize patterns, changing these patterns, and preparing for termination. Beutler and Clarkin (1990) suggested a resolution of these viewpoints, offering the possibility that there may be interrelationships among patient and treatment stages. The resolution of such differences depends on success in developing psychological tests that can reliably identify patient and problem information that is directly usable in planning treatments.
Patient Predisposing Variables
Psychometrically stable measurements of treatment-relevant patient dimensions (i.e., predisposing variables) potentially could be used to identify markers for the application of different interventions. Unavoidably, however, the number of patient, treatment, and matching dimensions that have been correlated with treatment effects is virtually limitless (Beutler, 1991). In an effort to bring some order to the many variables and diverse hypotheses associated with the several models of differential treatment assignment and to place them in the perspective of empirical research, Beutler and Clarkin (1990) grouped patient characteristics presented by the different theories into a series of superordinate and subordinate categories. This classification included seven relatively specific classes of patient variables that are distinguished both by their susceptibility to measurement using established psychological tests and by their ability to predict differential responses to psychosocial treatment (Beutler & Hodgson, 1993; Gaw & Beutler, 1995). These categories included: Functional Impairment, Subjective Distress, Problem Complexity, Readiness for/Stage of Change, Potential to Resist Therapeutic Influences, Social Support, and Coping Styles. These "patient predisposing" dimensions will provide points of reference for organizing the topics of this chapter as the use of psychological tests for treatment planning is considered.
Table 3.1 summarizes some representative instruments that may be used for assessing these various dimensions.¹
TABLE 3.1 Representative Tests for Measuring Patient/Problem Dimensions
Test | Functional Impairment | Subjective Distress | Readiness for Change | Problem Complexity | Resistance Potential | Social Support | Coping Style
ADIS** X
BDI* X X
HRSD**
SCL-90-R* X X
STAI* X
Stages of Change* X
TRS* X
CCRT** X
MMPI* X X
CPI* X
FES* X
Note. ADIS = Anxiety Disorders Interview Schedule, BDI = Beck Depression Inventory, HRSD = Hamilton Rating Scale for Depression, SCL-90-R = Symptom Checklist-90-Revised, STAI = State-Trait Anxiety Inventory, TRS = Therapeutic Reactance Scale, CCRT = Core Conflictual Relationship Theme, MMPI = Minnesota Multiphasic Personality Inventory, CPI = California Psychological Inventory, FES = Family Environment Scale. * Self-report instruments. ** Observer report instruments.
¹ Because of limited space, this discussion is restricted to psychological interventions and initial predisposing variables. This chapter gives only cursory attention to how tests have been used for selecting medical/somatic interventions (e.g., hospitalization, ECT, medication), establishing DSM diagnoses, or making treatment alterations in midcourse. Likewise, the discussion does not include differential treatment planning for children and adolescents, explorations of the relation between treatment initiated changes and subsequent modifications of treatment plans, or of the relation between psychotherapy process events and subsequent outcomes. Refer to Beutler and Clarkin (1990) for a more extensive consideration of both patient and treatment variables within these latter classes of treatments.
Functional Impairment
Traditionally, literature on the role of problem severity in treatment success has confounded two aspects of patient functioning: level of impairment and subjective distress (e.g., Beutler, Wakefield, & Williams, 1994). Not surprisingly, therefore, research on this topic has produced mixed results. To clarify the differing roles of impairment and distress, this discussion follows Strupp, Horowitz, and Lambert (1997) and distinguishes between external ratings of social functioning (i.e., observed impairment) and self-reports of unhappiness and subjective distress. Level of patient impairment in social functioning is a variable of considerable importance to treatment studies. It is typically measured by the Global Assessment of Functioning (GAF) scale from the DSM-IV or by specific problem-centered measures, such as the Anxiety Disorders Interview Schedule (ADIS; DiNardo, O'Brien, Barlow, Waddell, & Blanchard, 1983) and the Hamilton Rating Scale for Depression (HRSD; Hamilton, 1967). Changes on these indices of impairment reflect treatment improvement, but initial level of functional impairment may also serve as an index that can be used for planning the intensity and modality of treatment. This conclusion derives from three lines of evidence. First, there is a negative relation between level of impairment and amount of treatment-related improvement, quite independently of the type of treatment. This finding has been obtained in such diverse conditions as bulimia (Fahy & Russell, 1993), obsessive-compulsive disorder (Keijsers, Hoogduin, & Schaap, 1994), major depression (Beutler, Kim, Davison, Karno, & Fisher, 1996), and substance abuse (McLellan, Woody, Luborsky, O'Brien, & Druley, 1983). Indeed, McLellan et al. (1983) determined that measures of functional impairment were the single best (negative) predictors of treatment outcome. Second, impairment level has implications for determining the intensity and length of treatment that will be helpful.
In spite of the poor prognosis, there is evidence that persistent treatment may eventually induce an effect among moderately impaired patients (Gaw & Beutler, 1995; Keijsers et al., 1994). Shapiro et al. (1994), for example, compared behavioral and psychodynamic-interpersonal therapies that were applied over a format of either 8 or 16 weeks' duration. The more intensive and lengthy treatment showed the most positive effects among those with high levels of impairment, regardless of the model or type of treatment implemented. Those with low levels of impairment were not benefitted by intensifying treatment. Third, functional impairment is an important variable in anticipating patient relapse. Maintenance of treatment effects appears to be negatively affected by the initial severity of patient impairment. Even the positive effects of intensive treatment among patients with high levels of impairment may be negated with time. Thus, the good results obtained by Shapiro et al. (1994), favoring more intensive treatment among more impaired individuals, largely disappeared after a year (Shapiro et al., 1995). Likewise, T. A. Brown and Barlow (1995) demonstrated both initial therapeutic gains among patients with panic disorder, and a negative relation between impairment and maintenance of benefit 2 years later. They concluded that even when treatment is able to induce an initial positive effect among those with high levels of initial impairment, these improvements are not well maintained when compared to patients who were less severely impaired. In all of these studies, unfortunately, even the most intensive treatment was short term and infrequent, compared to conventional standards for treating those with severe problems. The intensive treatment studied by Shapiro and his group ran just 16 sessions and certainly would not be considered "intense" by most practitioners. Studying a treatment whose length and frequency were more typical of the intensity applied to such problems in usual practice might have improved the results, both by facilitating the initial gains and by maintaining the effects of treatment longer than the short-term treatments studied in these investigations. Another important area of study is the exploration of whether impairment level can help one selectively apply different treatments. Some evidence, for example, suggests that level of impairment may differentially respond to different classes of antidepressant drugs (e.g., Ackerman, Greenland, Bystritsky, Morgenstern, & Katz, 1994). Of note in this line of investigation are studies that have attempted to support the conventional wisdom that level of functional impairment is a negative or contra-indicator for using psychosocial treatments and a positive indicator for applying psychopharmacological interventions.
The well-known National Institute of Mental Health (NIMH) collaborative study of depression (Elkin et al., 1989), for example, using composite measures of initial impairment level, determined that those with severe symptoms responded more rapidly to tricyclic antidepressants than to psychotherapy. However, this conclusion is not strongly supported by other lines of research. For example, several studies have used clinician ratings of "endogeneity" as a measure of impairment level, and have explored this measure as an index for predicting the differential efficacy of medical and psychosocial interventions among depressed patients. The studies indicate that pharmacotherapy achieves its greatest efficacy among patients with endogenous symptoms, as compared to those with less severe reactive depressions. Surprisingly, however, they have uniformly failed to find the hypothesized difference between various forms of cognitive therapy and pharmacotherapy among the most seriously symptomatic participants (see Simons & Thase, 1992). Other evidence confirms the surprising observation that psychosocial interventions are at least as effective as antidepressant and antianxiety medication among most nonsomatic patients (Antonuccio, Danton, & DeNelsky, 1995; Elkin et al., 1989; Nietzel, Russell, Hemmings, & Gretter, 1987; Robinson, Berman, & Neimeyer, 1990). In contrast, there is also promising evidence that functional impairment may be a mediator of differential effects attributed to various psychosocial models of treatment (e.g., Fremouw & Zitter, 1978; Joyce & Piper, 1996; McLellan et al., 1983; Shoham-Salomon & Rosenthal, 1987; Woody et al., 1984). Of special note, unimpaired object (interpersonal) relations (Joyce & Piper, 1996) and absence of comorbid personality disorders (Woody et al., 1984) have been found to enhance the power of dynamically oriented psychotherapy. 
The opposite relationships may also hold: poor interpersonal relationships and complex personality disorders may respond poorly to psychodynamic and insight treatments, compared to other psychosocial interventions. For example, Kadden, Cooney, Getter, and Litt (1989) found that patients who exhibited sociopathic personality patterns responded better to cognitive-behavioral coping skills training than to an insight-oriented, interpersonal, interactional group therapy. Likewise, in a study of acutely impaired, psychiatric inpatients, Beutler, M. Frank, Scheiber, Calvert, and Gaines (1984) concluded that in this population, experiential-expressive interventions are not as effective as interactive, process-oriented, or behaviorally oriented therapies. Patients treated with experiential-expressive therapies showed increased symptoms and deterioration at the end of treatment, whereas these negative effects were not found among those treated with the other interventions.
Subjective Distress
Patient distress is a cross-cutting, cross-diagnostic index of well-being. It is poorly correlated with external measures of impairment, and is a transitory or changeable symptom state (Strupp et al., 1997; Lambert, 1994). In clinical research, the Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961), the SCL-90-R (Derogatis, 1994), and the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970) have been most often used for assessing subjective distress. Interestingly, theoretical perspectives have emphasized the importance of distress as a motivating variable in keeping a patient engaged in treatment (J. D. Frank & J. B. Frank, 1991), as well as a measure of improvement. There is at least modest support for this latter proposition and, unlike the retarding effect of patient level of impairment, moderate amounts of subjective distress have generally been found to be a positive correlate of subsequent improvement (Lambert, 1994). Specifically, there is reasonably consistent evidence that psychosocial treatments achieve their greatest effects among those with relatively high initial levels of subjective distress (e.g., Klerman, 1986; Klerman, DiMascio, Weissman, Prusoff, & Paykel, 1974; Lambert & Bergin, 1983). These findings are especially strong among those with ambulatory depressions, general anxiety, and diffuse medical symptoms.
Using the BDI as a measure of distress, for example, Parker, Holmes, and Manicavasagar (1986) found that initial depression severity was positively correlated with treatment response among general medical patients. Likewise, Mohr et al. (1990) observed that the likelihood (though not the magnitude) of response to treatment was positively and linearly associated with general symptom severity on the SCL-90-R among patients with moderately severe depression. Even further, among patients with mild and moderate impairment levels, research evidence suggests that psychosocial interventions are as effective as antidepressant and antianxiety medications (Elkin et al., 1989; Nietzel et al., 1987; Robinson et al., 1990). These findings are not entirely uniform, however, and some evidence indicates that subjective distress may relate to outcome in a curvilinear fashion, particularly when personality disturbance or somatic symptoms are present. Hoencamp, Haffmans, Duivenvoorden, Knegtering, and Dijken (1994), for example, found that whereas a positive, linear relation existed between distress and improvement among depressed and anxious patient groups, a curvilinear relation characterized those who had a comorbid personality disorder (with the exception of obsessive-compulsive personality disorder) and those whose complaints were weighted heavily in the direction of somatic symptoms. At least among nonsomatic patients, subjective distress has been implicated in the prediction of differential responses to various forms of psychotherapeutic treatment. For example, in the NIMH collaborative study of moderate depression, subjective distress, as measured by the BDI, differentiated the efficacy of the psychotherapeutic treatments (Imber et al., 1990). Those patients with the most severe distress were most effectively treated by interpersonal psychotherapy, as compared to cognitive therapy. Beutler et al. (1996), employing a similar sample, demonstrated that level of subjective distress was positively related to the efficacy of self-directed, supportive forms of treatment, but was not substantially related to the effects of cognitive and experiential treatments. The consistency of the foregoing relation is mitigated by the pattern of response among patients for whom somatic symptoms are prominent. Blanchard, Schwarz, Neff, and Gerardi (1988) determined that subjective distress (measured by the STAI) was negatively, rather than positively, associated with improvement among patients with irritable bowel syndrome. Those patients whose subjective anxiety did not exceed moderate limits were most likely to benefit from behavioral and self-regulatory treatment. Similarly, using the BDI as a subjective measure of distress/depression, Jacob, Turner, Szekely, and Eidelman (1983) suggested that those with low distress levels were more likely than those with high levels to benefit from self-monitored relaxation as a treatment for headaches. Patients with moderate and high levels of subjective distress were most inconsistently benefitted by behavioral and psychotherapeutic treatments.
Readiness for Change
Prochaska and colleagues (1984; Prochaska & DiClemente, 1986; Prochaska, DiClemente, & Norcross, 1992) suggested that a patient's progress in treatment is a function of how well the intervention method used fits the patient's position along a progressive series of stages reflecting personal efforts to change. They identified five stages or phases through which a person progresses as they seek to change an aspect of their life. The Stages of Change Questionnaire (Prochaska, Velicer, DiClemente, & Fava, 1988) is designed to assess these stages of readiness and differential receptivity to different interventions.
The patient's stage of change is taken as an indication of the level of a patient's receptivity to different strategies of influence. It is thought that an individual normally proceeds sequentially through the stages, sometimes recycling several times, in the course of intentionally implementing change. These stages of readiness include: precontemplation, contemplation, preparation, action, and maintenance. Prochaska and his colleagues posed two hypotheses regarding the stage of readiness achieved by a patient: (a) More advanced stages of readiness are associated with a greater likelihood of improvement; and (b) the stage of readiness serves as an indicator for the use of specific therapeutic interventions. In support of the first of these propositions, they demonstrated that among patients seeking help to quit smoking, those who progressed to a higher stage of readiness during the early phase of treatment also doubled the likelihood of making improvement within the subsequent 6 months (Prochaska et al., 1992). Research support of the proposition that a patient's pretreatment stage of readiness predicts a differential response to specific interventions has been more difficult to obtain. Prochaska (1984) initially postulated that action-oriented (behavior) therapies were best suited to individuals who had achieved the preparation and action stages of readiness, but would be less suited to patients who were in the precontemplation or contemplation stages of readiness. In turn, consciousness-raising and motivation enhancement techniques (e.g., insight-oriented therapies) were posited to be most effective for patients in these early stages of readiness. Prochaska's proposals have stimulated a good deal of research on how people prepare themselves for making changes and how these processes may be used for treatment planning (e.g., O'Connor, Carbonari, & DiClemente, 1996; Prochaska, Rossi, & Wilcox, 1991). Findings provide modest support for the value of fitting some intervention strategies to the patient's stage of readiness. For example, Prochaska et al. (1988) successfully demonstrated that patients' stage of change contributed to the relative effectiveness of different treatments for reducing smoking. Project MATCH (Project MATCH Research Group, 1997) compared the patient's pretreatment readiness for change to the effectiveness of various types of intervention. The findings were only partially supportive of Prochaska's predictions: Patients who were identified as having little readiness for change (precontemplative and contemplative stages) responded better to procedures designed to enhance motivation and encourage contemplation than to the more action-oriented procedures of cognitive-behavioral therapy. Patients at the action stage, however, did not show the expected preference for cognitive therapy, and the significant fit obtained between stage and therapy strategy proved to be time dependent, emerging only during the last month of the follow-up.
Problem Complexity
In addition to the severity of symptomatic presentation and the stage of problem resolution achieved, problems also vary in their complexity (Beutler, 1983; Beutler & Clarkin, 1990). Complexity is indexed by the concomitant presence of personality disorders, by evidence of chronicity of major disorders, and by evidence that interpersonal and conflictual patterns recur in persistent and pervasive ways. Recurrent patterns that indicate complex, problematic behavior are thought to be evoked by the similarity of symbolic meanings given to evoking cues, rather than by obvious similarities in overt stimulus characteristics (Barber, 1989; Crits-Christoph & Demorest, 1991). Complex patterns are expressed in a similar way across a large number and variety of social systems, transcending specific events and situations.
The Minnesota Multiphasic Personality Inventory (MMPI; Butcher, 1990) and other omnibus personality measures have been used to assess the chronicity of problems associated with predicting the value of symptom-focused interventions. Knight-Law, Sugerman, and Pettinati (1988), for example, found that the effectiveness of behavioral-symptom focused interventions was highest among those patients whose MMPIs indicated that their problems were reactive and situational. Similar evidence that situation-specific problems are more responsive to behavioral treatments than chronic and recurrent ones has accrued among individuals with complex somatic symptoms (LaCroix, Clarke, Bock, & Doxey, 1986), those who abuse alcohol (Sheppard, Smith, & Rosenbaum, 1988), those with eating disorders (Edwin, Anderson, & Rosell, 1988), and patients with chronic back pain (Trief & Yuan, 1983). Although omnibus personality tests are useful in assessing chronicity, they are limited in identifying the significance of pervasive dynamic conflicts. Among the instruments that are designed to determine the presence and pervasiveness of interpersonal, conflictual themes, the Core Conflictual Relationship Theme (CCRT) method (Barber, 1989; Crits-Christoph & Demorest, 1988; Crits-Christoph, Demorest, & Connolly, 1990; Crits-Christoph, Luborsky, Dahl, Popp, Mellon, & Mark, 1988; Luborsky, 1996; Luborsky, Crits-Christoph, & Mellon, 1986) is probably the most promising. The CCRT is based either on clinician ratings or self-reports, and is designed to define patterns related to complex, dynamically oriented problems. The method identifies three sequential aspects of recurring interpersonal behaviors: the organizing wishes that motivate the interaction, the actions anticipated from others if these wishes are expressed, and the acts of self that either follow or prevent these acts of others. The pervasiveness of a given theme across a variety of interpersonal relationships can be viewed as an index of problem complexity. Treatments vary widely in the breadth of their objectives, ranging from symptomatic to thematic. These variations in breadth are reminiscent of corresponding variations in problem complexity, suggesting a link between the two. For example, psychosocial interventions, as a rule, are aimed at broader objectives than medical ones (DeRubeis et al., 1990; Simons, Garfield, & Murphy, 1984), and treatments that are oriented toward insight and awareness focus on broader themes than behavioral and cognitive ones (e.g., Caspar, 1995; Luborsky, 1996; Strupp & Binder, 1984). The obvious similarity between problem complexity and treatment focus suggests an optimal fit between problem complexity and the breadth of treatment focus applied. Thus, high problem complexity should favor psychosocial over pharmacological interventions, and systemic or dynamic treatments over symptom-focused procedures (Gaw & Beutler, 1995). In the absence of research directly on this hypothesis, evidence for its validity is, necessarily, indirect. One line of supportive research has revealed that recurrent themes are useful as guides for constructing dynamic interventions. Two findings are relevant, indicating that treatment outcome is enhanced as a function of the level of correspondence between the interpretation offered by the therapist and the dynamic theory directing therapy (Goldfried, 1991), and the level of correspondence between the interpretation offered and the most pervasive (independently determined) theme (Crits-Christoph, Cooper, & Luborsky, 1988). Collectively, these data suggest that amount of problem complexity is positively related to the effectiveness of broad-band, psychodynamic interventions.
A second line of investigation reveals that problem complexity is inversely related to the impact of narrow-band treatments. Using comorbidity (coexisting personality or somatic disorder) as an index of complexity, several studies (e.g., Fahy & Russell, 1993; Fairburn, Peveler, Jones, Hope, & Doll, 1993; Hoencamp et al., 1994) have found that, in treating patients with cognitive-behavioral therapy (a symptom-focused treatment), complexity was a negative indicator of improvement. Wilson (1996) conceded that cognitive-behavioral treatment has been observed to have poor effects on such patients, but he pointed out that such complexity is a negative prognostic factor for all interventions. Whereas this point is well taken and indicates the need to standardize the means of identifying problem complexity, the collection of results suggests a promising mediating role of problem complexity in predicting (or controlling) the benefit of treatments that vary in breadth of focal objectives. Reactant/Resistance Tendencies Several investigations have explored the predictive role of patient resistance to psychosocial interventions in selecting therapy procedures. These studies vary in the degree to which they consider resistance to be a statelike or traitlike variable (e.g., Beutler, Engle, et al., 1991; Miller, Benefield, & Tonigan, 1993). Both aspects of resistance must be considered, however. Traitlike resistance has been particularly promising as both an indicator of poor prognosis and as a mediator of differential treatment response (Arkowitz, 1991). For example, Khavin (1985) compared the psychological characteristics of 50 young adult


males who were being treated for stuttering with psychosocial interventions and concluded that resistance-prone patients did poorly in all forms of intervention. The Ego Strength (ES) subscale of the MMPI (Barron, 1953) and the Therapeutic Reactance scale (TRS; Dowd, Milne, & Wise, 1991) were designed specifically as traitlike measures of prognosis to be used to predict resistance to treatment. Although the success of single-scale MMPI indices has been mixed (Graham, 1987), the TRS is better founded in theory and holds considerable promise for predicting differential response to directive and nondirective therapies (e.g., Dowd, Wallbrown, Sanders, & Yesenosky, 1994; Horvath & Goheen, 1990; Hunsley, 1993; Tracey, Ellickson, & Sherry, 1989). Most of the studies of this proposition involve patients with transitory and acute conditions, however. Empirical validation of this test among representative clinical populations is still needed.

To broaden the applicability of the TRS and to define the correlates of resistance traits, Dowd and his colleagues (Dowd et al., 1994) inspected correlates of this test when compared to established measures of personality traits. For example, they regressed their own and a German-language measure of resistance traits (Fragebogen zur Messung der psychologischen Reactanz [Questionnaire for the Measurement of Psychological Reactance]; Merz, 1983) on scores from the California Psychological Inventory-Revised (CPI-R; Gough, 1987) scales. The results indicated that resistance-prone individuals were relatively less concerned about "impression management" and relatively more likely to resist rules and social norms than people who had low resistance potential. Moreover, high trait-reactant individuals preferred work settings that allow them to exercise personal freedom and initiative.
These findings suggest that highly resistant individuals would respond poorly to therapies that are highly therapist-controlled and directive (Beutler, 1983, 1991; Shoham-Salomon & Hannah, 1991). Several researchers have extended this hypothesis to postulate that high resistance-prone individuals would respond to paradoxical interventions that capitalize on their oppositional tendencies (e.g., Shoham-Salomon, Avner, & Neeman, 1989; Swoboda, Dowd, & Wise, 1990). Horvath and Goheen (1990) supported this hypothesis, finding that clients whose TRS scores indicated high levels of traitlike resistance responded well to a paradoxical intervention, maintaining their improvements beyond the period of active treatment. Less reactant clients exposed to the same treatment deteriorated after active treatment stopped. The reverse pattern was found among those treated with a nonparadoxical, stimulus control intervention.

A prospective test of the hypothesis that clients who varied on measures of resistance potential would respond in an opposite way to directive and nondirective therapies was undertaken by Beutler, Engle et al. (1991), using a combination of MMPI subscales as an index of resistance. They demonstrated that manualized therapies differing in level of therapist directiveness were differentially effective for reducing depressive symptoms. Among high resistance-prone depressed subjects, the nondirective therapy surpassed the directive ones in effecting change in depressive symptoms, but the reverse was true among low-resistance patients. This result was cross-validated at a 1-year follow-up of depression severity and relapse (Beutler, Machado, Engle, & Mohr, 1993), and also was extended to a cross-cultural sample of several alternative measures of resistance (Beutler, Mohr, Grawe, Engle, & MacDonald, 1991).
In a study of brief directive and confrontational motivational interviews in treatment of problem drinkers, Miller and his colleagues (Miller et al., 1993) concluded that client in-therapy resistance was associated with poor outcomes at the 1-year follow-up. However, failure to measure traitlike resistance potential makes it difficult to say whether the pattern could have been altered by adjusting the nature of the intervention.


Beutler, Sandowicz, Fisher, and Albanese (1996) reviewed a variety of measures of patient state and traitlike resistance, along with associated evidence of their value in making differential treatment decisions. They concluded that the evidence strongly supports the role of both resistance traits and states, as measured by a variety of instruments, as contra-indicators for the use of directive interventions among psychotherapy patients. They also concluded that the evidence supports the value of applying directive interventions to the treatment of low-resistance patients and self-directed procedures to the treatment of high-resistance patients.

Social Support

The level of social and interpersonal support from others has also been widely postulated as a predictor of therapeutic outcome and maintenance. Numerous studies have found that the presence of social support can improve outcomes in psychotherapy and decrease the likelihood of relapse (e.g., Sherbourne, Hays, & Wells, 1995; Vallejo, Gasto, Catalan, Bulbena, & Menchan, 1991). However, a close inspection of this literature suggests that some methods of measuring social support are better than others in making this prediction. Measures of social support either rely on external evidence of resource availability (objective support), such as proximity of family members, marriage, social network participation, and so on (e.g., Ellicott, Hammen, Gitlin, G. Brown, & Jamison, 1990), or on self-reports (e.g., R. H. Moos & B. S. Moos, 1986) by patients themselves (subjective support). These two methods of measurement play different roles in predicting response to treatment. For example, using the Family Environment Scale (FES; R. H. Moos & B. S. Moos, 1986), R. H.
Moos (1990) found that the proximal availability of at least one objectively identified confidant and family support member and the level of satisfaction derived from these relationships, each significantly and independently increased the likelihood of improvement among depressed patients. In comparing the predictive power of subjective and objective measures of support, Hooley and Teasdale (1989) found that the impact of subjective social support exceeded the impact of objective measures in the treatment of depressed patients. For example, the quality of the marital relationship, rather than its presence, predicted relapse rates. And, not all indices of marital quality are of equal importance. In fact, the level of perceived personal criticism from spouses accounted for more of the variance in relapse rates than did the presence of less personal marital conflict. The relative importance of subjective social support as a predictor of outcome also found support in a study by Hoencamp et al. (1994). Perceived lack of family support was found to have a significant negative association with treatment outcome with moderately depressed outpatients. This study made the surprising finding that the quality of perceived contact with children was negatively related to outcome, however. Those patients who reported having a poor relationship with their children improved more than those who sought and felt support from their children. This is an interesting finding but one whose interpretation is still uncertain. Clinicians should not prematurely reject the role of objective social support. Billings and R. H. Moos (1984) investigated the relation between the availability of social support networks and the chronicity and impairment associated with depression. 
Although both chronic and nonchronic patients reported having fewer available social resources than nondepressed controls, only among patients with nonchronic depression was the severity of the problem related to the availability of social resources. These findings raise some


interesting questions about the role of social support in the etiology of chronically depressed individuals, and about the possibility of a differential effect of activating support systems within the treatment of depression along the continuum of chronicity. Pursuing this point further, one of the most interesting aspects of social support may be its potential as an index for differential assignment of intensive and short-term treatments. Moos (1990), for example, found that social support availability was related to the optimal duration of treatment, the level serving as either an indicator or a contraindicator for the application of long-term treatment. Depressed patients who lacked social support continued to improve as a direct function of the number of weeks of treatment, whereas those patients who had satisfying support systems achieved an asymptotic level of benefit early in treatment and failed to benefit from continuing treatment. Interestingly, these latter patients were at risk for deterioration during long-term therapy.

Another line of research in the domain of social support suggests that the way patients use the resources available to them may also be important, independent of the availability of these resources. Longabaugh and his colleagues (Longabaugh, Beattie, Noel, Stout, & Malloy, 1993) compared conventional measures of objective or subjective social support to the patient's level of social investment, that is, the effort expended in maintaining involvement with others. Social investment was measured by assessing both the amount of time a person spent close to another person (an aspect of objective support), and the patient's subjective perception of the quality of that relationship. Social investment, thus, incorporated concepts relating both to objective and subjective social support. Longabaugh et al. compared the independent roles of social support and social investment both in prognosis and differential response to psychotherapies.
They found that both of these concepts predicted a differential response to relationship-enhancement and cognitive-behavioral therapies, but social investment had a more central and pervasive mediating role in this process. Among those who experienced little satisfying support from others, cognitive-behavioral therapy was more effective than relationship-enhancement therapy. This effect was not apparent among those who felt supported by others. However, these effects were partially ameliorated by the presence of high social investment. Among individuals who were judged to have high social investment, regardless of the level of available social support, relationship-enhancement therapy was more effective than cognitive therapy. A correspondent match between social investment and type of therapy also improved maintenance effects.

Coping Styles

People adopt characteristic ways of responding in times of distress. Coping styles embody a collection of both conscious and nonconscious behaviors that endure across situations and times (Butcher, 1990). These traitlike qualities go by various titles, but generally reflect qualities that vary from extroversion and impulsivity, on one hand, to self-constraint and emotional withdrawal, on the other (H. J. Eysenck & S. B. G. Eysenck, 1969). This dimension of external to internal patterns of behavior can be assessed with omnibus personality measures, including such instruments as the Eysenck Personality Inventory (EPI; H. Eysenck & S. B. G. Eysenck, 1964), the MMPI, the California Personality Inventory (CPI), the NEO Personality Inventory (NEO-PI; Costa & McCrae, 1985), and the Millon Clinical Multiaxial Inventory-III (MCMI-III; Millon, 1994).


The CPI and MMPI have been most often used in the study of differential response to psychotherapy. Research of this type suggests that the effects of behaviorally and insight-oriented psychotherapies are differentially moderated by patient coping style, ranging from externalized and impulsive to internalized and seclusive. For example, in a well-controlled study of interpersonal and behavioral therapies, Kadden, Cooney, Getter, and Litt (1989) determined that, among alcoholic subjects, high and low scores on the CPI Socialization subscale, a measure of sociopathic impulsivity, were predictive of the response to treatments that were based on behavioral and interpersonal insight models, respectively. Continued improvement over a 2-year follow-up period was also found to be greatest among compatibly matched client-therapy dyads (Cooney, Kadden, Litt, & Getter, 1991).

Other studies have also confirmed this relation and expanded the role of patient coping style as a predictor of differential response to various psychotherapies. For example, Beutler and his colleagues (e.g., Beutler, Engle et al., 1991) found that depressed patients who scored high on the MMPI externalization subscales responded better to cognitive-behavioral treatment than to insight-oriented therapies, and the reverse was found with patients who scored low on this dimension. Both Beutler and Mitchell (1981) and Calvert, Beutler, and Crago (1988) found a similar pattern among mixed psychiatric in- and outpatients using the MMPI. Similarly, Longabaugh et al. (1994) found that alcoholics who were characterized as being impulsive and aggressive (externalizing behaviors) drank less frequently and with less intensity after receiving cognitive-behavioral treatment than after receiving relationship-enhancement therapy. The reverse was found with alcoholic clients who did not have these traits.
Similarly, Barber and Muenz (1996) found cognitive therapy to be more effective than interpersonal therapy among patients who employed direct avoidance (externalization) as a coping mechanism, and interpersonal therapy was more effective among obsessively constricted (internalization) patients. Barber and Muenz noted the similarity to the findings of Beutler et al. and advanced an interpretation based on the theory of opposites: individuals respond to interventions that are counter to their behavior and thereby undermine their own customary styles. For avoidant clients, cognitive therapy pushes clients to confront anxiety-provoking situations through homework and specific instructions, whereas obsessive clients who tend toward rigidity and intellectualization are encouraged to depart from these defenses by advancing interpersonal, insight-oriented interpretations.

This latter interpretation has received some indirect support in studies of patient preferences for treatment type. Tasca and associates (Tasca, Russell, & Busby, 1994) found that externalizers preferred a process-oriented psychodynamic group over a structured activity-oriented group when allowed to make a choice, whereas internalizers preferred a cognitive-behavioral intervention. In each case, the therapy that is suggested as least effective by other research was preferred, perhaps because it posed less threat to the clients' normal defenses. Further research is called for on this interesting paradox, however.

Combinations of Matching Dimensions

Theoretical literature suggests that matching patients to treatments by utilizing a number of dimensions at once may enhance outcome more than by using any single dimension (Beutler & Clarkin, 1990). There is some evidence to support this suggestion, and this


literature suggests that various matching dimensions may add independent predictive power to treatment outcomes. The research program is designed to test the independent and collective contributions of several of the matching dimensions discussed in this chapter across samples of patients with depression and substance abuse. Both individual and joint effects of various matching dimensions have been revealed in an initial study that included co-habitating couples, one of whom was a problem drinker (Karno, 1997). The Couples Alcoholism Treatment (CAT) program employed an attribute x treatment interaction (ATI) design, manualized cognitive therapy (CT; Wakefield, Williams, Yost, & Patterson, 1996) and family systems therapy (FST; Rohrbaugh, Shoham, Spungen, & Steinglass, 1995), and it carefully monitored outcomes to test the mediating roles of patient variables on response to treatments. The separate and combined effects of matching four client characteristics to corresponding aspects of treatment were assessed: level of functional impairment corresponding with the number and frequency of treatment sessions, level of patient initial subjective distress corresponding with therapist focus on increasing or decreasing level of arousal, level of patient traitlike resistance corresponding to level of therapist directiveness, and patient coping style corresponding with the relative therapeutic focus on symptom change or insight. For the major analysis, the model of psychotherapy was ignored in favor of looking more specifically at actual in-therapy behaviors of therapists using the various models. It was reasoned from prior research that various models overlap in the particular procedures and therapeutic styles represented. Thus, the distinctiveness of the model was considered to be less sensitive to differences among therapist behaviors than direct observations (Beutler, Machado, & Neufeldt, 1994). 
Thus, all four matching variables were studied by measuring patient variables before treatment and directly observing the nature of therapy process. Patient variables were assessed by using standardized tests (e.g., MMPI, BSI, etc.), as suggested in the foregoing sections. Ratio scores reflecting the amount of a given therapeutic activity relative to a corresponding patient quality (e.g., amount of directiveness per patient nondefensiveness, amount of behavioral focus relative to patient externalization, amount of emotional focus relative to patient initial distress level, intensity of treatment relative to level of functional impairment, etc.) were used to assess the degree of correspondence between each patient and treatment dimension. Hierarchical linear modeling was used to assess the contributions of each patient and therapy dimension separately, and for the four matching dimensions. Effects were assessed and modeled over time (20 treatment sessions and 1-year follow-up). In order to ensure a wide distribution of therapeutic procedures, the two treatments were designed to differ in two dimensions. CT was symptom focused and FST was system focused; CT was therapist directed and FST was patient directed. By chance, they also differed in intensity or concentration (the number of weeks required to complete planned weekly and biweekly sessions), with the 20 sessions of FST taking longer to complete than the 20 sessions of CT. Individual therapists also differed, both within and between treatments, in their levels of directiveness, the application of insight-oriented procedures, and their success in raising patient emotions and arousal. The sample consisted of 62 male and 12 female alcoholics and their partners. The outcome measures reflected the substance abuse status and general psychiatric functioning of the identified alcoholic patients. 
All identified patients were alcohol dependent, most (85%) were European American, and they had been partnered for an average of 8.3 years. Nearly half of the patients used illicit drugs in addition to being dependent on alcohol.
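
The ratio scores described above can be illustrated with a short sketch. It is not the study's scoring procedure: the text does not report how the therapy and patient measures were scaled, so the function names, the `PAIRS` mapping, and every numeric value below are hypothetical. The sketch only shows the general idea of expressing the amount of a therapeutic activity relative to the corresponding patient quality, using the four pairings named in the text.

```python
# Illustrative sketch only: names and values are hypothetical, not taken from the study.

# Therapy dimension -> paired patient dimension, following the examples in the text.
PAIRS = {
    "directiveness": "nondefensiveness",
    "behavioral_focus": "externalization",
    "emotional_focus": "initial_distress",
    "intensity": "functional_impairment",
}


def match_ratio(therapy_amount: float, patient_level: float) -> float:
    """Return the amount of a therapy activity per unit of the paired patient quality."""
    if patient_level <= 0:
        raise ValueError("patient_level must be positive for a ratio score")
    return therapy_amount / patient_level


def match_ratios(therapy: dict, patient: dict) -> dict:
    """Compute one ratio score for each of the four matching dimensions."""
    return {
        f"{t_dim}/{p_dim}": match_ratio(therapy[t_dim], patient[p_dim])
        for t_dim, p_dim in PAIRS.items()
    }


# Hypothetical observed therapy process scores and pretreatment patient scores.
therapy = {
    "directiveness": 30.0,
    "behavioral_focus": 45.0,
    "emotional_focus": 20.0,
    "intensity": 50.0,
}
patient = {
    "nondefensiveness": 60.0,
    "externalization": 60.0,
    "initial_distress": 40.0,
    "functional_impairment": 50.0,
}

ratios = match_ratios(therapy, patient)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

In an analysis of the kind described, scores like these would be computed for each patient and then entered as predictors in the longitudinal model; that modeling step is not sketched here.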


The growth curve modeling procedure revealed a relatively steady decline of symptoms throughout treatment, independent of treatment type or level of fit between patient and treatment qualities. Estimated abstinence rates were quite low, but the mean rates were consistent with other research on substance abuse treatment (Bellack & Hersen, 1990). At termination, abstinence rates were 42.9% and 37.5% for clients who had received CT and FST, respectively. At follow-up, the rates were 39.3% and 29.7%. An even lower rate of change was noted on general symptoms, independent of alcoholic symptoms. These low average rates of change, along with wide variations of outcomes within treatments, were not unexpected and underline the need to match patients with those treatments for which they are best suited.

Analyses of the independent effects of patient and therapy variables revealed that level of patient distress and amount of patient impulsivity were inhibitors of treatment benefit, whereas treatment intensity and level of behavioral/symptom orientation both were associated with the level of improvement achieved. In all instances, there were wide differences from patient to patient, indicating the presence of patient variables that were selectively determining treatment response.

The analysis of the matching dimensions revealed that by 6 months posttreatment, three of the four matching dimensions studied were related to desirable changes in alcohol usage. Specifically:

1. The correspondent match of level of initial severity (functional impairment) and level of care (average time to complete 20 planned sessions) predicted improvement in substance abuse. Patients whose level of care corresponded with the amount of impairment in functioning (high functioning with low intensive therapies and low functioning with high intensive procedures) tended to show more alcohol-related improvements than those who did not correspond on this patient-treatment dimension.

2. The match between patient resistance and treatment directiveness predicted change in alcohol use. Resistant patients who received nondirective interventions and nonresistant patients who received directive interventions reduced consumption and abuse more than those patients who were mismatched to level of therapist directiveness.

3. When patients were separated into two groups (abstinent and nonabstinent), the relation between therapist activation of affect and patient initial distress emerged as a significant predictor. Patients with low levels of distress treated with emotional activating procedures and those whose high level of distress was treated with emotional reduction procedures were more likely to benefit than their mismatched counterparts.

Collectively, the matching dimensions alone accounted for 76% of the variance in alcohol-related changes. Therapist directiveness was also a positive predictor of treatment benefit, independent of its fit with patient qualities. Likewise, low levels of patient impairment were a predictor of positive change. By adding these two independent effects to the equation, it was possible to account for 82% of the variance in outcome, which is an astonishingly high rate of prediction.

In comparison to changes in alcohol abuse, general changes in psychiatric functioning were not as efficiently predicted. This is not to say that the patient, treatment, and matching variables were unimportant, however. One patient dimension (impulsivity), two treatment dimensions (arousal induction and symptom-focused interventions), and one matching dimension (distress × stress-reduction procedures) accounted for nearly 50% of the variance in outcomes. Improvement was significantly but negatively related to initial patient impulsivity/externalization, and positively to the use both of arousal induction procedures and symptom-focused interventions. At the same time, the level of initial patient distress and the corresponding amount of emphasis on procedures that reduced or raised distress predicted improved psychiatric functioning.


Taken collectively, the foregoing results provide support for the conclusion that patient distress and impairment, coping style, and resistance behaviors are implicated in predictable ways in the selection of psychotherapeutic strategies for treating either depression, substance abuse, or both. Patient coping style, therapist emotional focus, and treatment intensity and support are all qualities that can be identified as valuable regardless of the nature of patients and their problems. An ideal treatment directly focuses on particularly disruptive social symptoms, such as those involved in drug abuse or acting out behaviors; attempts to enhance emotional arousal and processing; adapts the level of emotional focus to the level of patient subjective distress; adapts the symptomatic versus insight/awareness focus of treatment to the level of patient externalization; and adapts the level of confrontation and directiveness to patient level of resistant traits.

Selection of Appropriate Instruments

The selection of appropriate instruments to measure the patient's presenting symptoms, personality traits, and transitional states is an important concern for the clinician. In each of the previous sections, representative instruments have been presented whose use has been supported empirically. The clinician must keep in mind several important considerations when selecting and using instruments for treatment planning purposes (see Cattell & Johnson, 1986; Goldstein & Hersen, 1990). First, the clinician must select instruments that together measure a variety of dimensions, such as those presented in Table 3.1, which have been selected because of their potential importance in making the required treatment decisions. Some instruments measure more than one dimension, but few measure all of the qualities recommended here.
Even if they did, one must also consider the advantages and costs of including multiple instruments, some of which reflect different perspectives. Both observer ratings and self-report measures should be used whenever possible in order to prevent the unchecked influence of a single perspective from biasing the results. Clinicians require focused tools that are specific to the task of assessing relevant patient qualities that can be used to guide treatment decisions. Hayes, Nelson, and Jarrett (1987) argued that "the role of clinical assessment in treatment utility has been buried by conceptual confusion, poorly articulated methods, and inappropriate linkage to structural psychometric criteria" (p. 973).

A clinician-based, research-informed method of identifying traits and relatively enduring states of patients is being developed that will allow clinicians to select psychotherapeutic strategies that best fit different patients. This effort has been based on the method of Systematic Treatment Selection (STS) originally outlined by Beutler and Clarkin (1990) and revised by Gaw and Beutler (1995) and by Beutler, Consoli, and Williams (1995). In its final form, the STS software program will help clinicians develop empirically based and validated treatment plans. (The STS Treatment Planning software will be distributed by New Standards, Inc.) Clinicians will enter patient information obtained through their usual assessment procedures using an interactive computer interface. The source of this information is designed to be flexible, relying on clinical observations as well as including the option of supplementing these observations with results from


standardized psychological tests. Output will include an (editable) intake report and a proposed treatment program that includes recommendations about level of care, treatment intensity, format and modality, medical considerations, risk assessment, and a variety of appropriate research-based treatment packages. It will also allow a clinician to project the course and length of treatment and, through subsequent patient monitoring, the projections will reveal the degree to which the patient's gains are within the limits expected within a given clinic. The program will flag nonresponders for further treatment refinement. Clinician profiling, therapist selection, and problem charting are additional features that will be present in the stand-alone version of the program.

Although standardized instruments are available for assessing the various patient dimensions identified by the STS model, these instruments frequently are unnecessarily long and provide a good deal of superfluous information. Hence, a single instrument whose subscales are designed to reveal treatment-relevant characteristics promises to be more time efficient than those conventionally used in patient diagnostic assessment. The concluding section of this chapter describes a few of the most promising dimensions in the STS model, and describes the development and psychometric properties of this assessment procedure, the STS Clinician Rating Form, as applied to the following dimensions that have been reviewed previously: Functional Impairment, Subjective Distress, Problem Complexity, Resistance Potential, Social Support, and Coping Style.

Methods

Patient Participants. Participants in this study (Fisher, Beutler, & Williams, in press) included both patients and clinicians. Two archival and one prospective patient samples were utilized in developing the measure.
Sample 1 was the main, prospective sample, with archival Samples 2 and 3 being used to increase sample size and generalization for the reliability and construct validity assessments of the STS Clinician Rating Form. The various intake procedures used in the separate samples were given to trained clinicians and provided the basis for the completion of the STS Clinician Rating Form (STS). The STS ratings allowed the discrepant information and tests derived from three different samples of patients to be translated into a common set of treatment-relevant dimensions.

Sample 1 consisted of ambulatory outpatients who presented with primary diagnoses other than substance abuse, had average intellectual ability, and were able to read at a sixth-grade level or above. Patients were diagnosed as having major depression (37%), dysthymia (37%), anxiety disorders (8%), or transient situational disturbances and personality disorders (18%). An initial sample of 48 individuals was screened and 46 elected to participate. The participants were largely Caucasian (84%) or Latino (11%), young adults (age = 34.55 years, SD = 11.71), and female (31 females, 15 males).

Sample 2 consisted of 105 individuals entering the CAT project, the previously discussed, federally funded study on the treatment of alcoholism (Beutler, Patterson, et al., 1993). The participants from this sample comprised those who initially underwent intake evaluation and were identified as having substance abuse or substance dependence diagnoses. The 90 male participants initially assessed for inclusion had an average age of 37.78 years (SD = 8.81), and the 15 females had an average age of 40.00 years (SD = 7.06). Eighty-two percent of the participants were Caucasian.


Sample 3 consisted of 63 individuals who were reliably diagnosed as having a major depressive disorder (Beutler, Engle et al., 1991). These individuals were recruited and treated as part of a federally funded, randomized clinical trial study of cognitive, experiential, and self-directed therapies. Referred individuals were screened by telephone and then assessed by an independent clinician and subjected to a variety of standardized interviews and tests to assure compliance with depressive diagnostic and severity criteria. There were 22 male and 41 female participants, ranging in age from 22 to 76 years. They averaged 48.77 (SD = 14.95) and 45.41 (SD = 45.41) years of age, respectively. The sample was predominantly Caucasian (92%).

Clinician Raters. Experienced (over 5 years) professional psychologists who were affiliated with the Psychotherapy Research Program at the University of California, Santa Barbara, were recruited and paid to complete the STS Clinician Rating Form on the three samples. They included three females and one male: One was Asian American and one was African American; two were self-employed in full-time clinical or consultative practice; one was a consultant to educational institutions; one was engaged in full-time clinical research; and one directed the outpatient clinic from which Sample 1 was obtained. Clinicians were trained to ensure a common level of familiarity with the various instruments and concepts that served as the basis for rating patient variables. Clinicians were first given a detailed description of the patient variables that were assessed in the STS Clinician Rating Form, and questions about the various constructs measured were discussed and answered. Readings were provided as necessary to supplement training.
Specific training on the STS Clinician Rating Form proceeded by having clinicians complete ratings on various (nonstudy) patients, drawing from intake interviews (videotapes), whatever psychological tests were available, and clinical notes. They discussed the ratings and then rerated the tapes until they were able to produce criterion levels of initial agreement (k > .70) on one of the samples. At various points, the clinician ratings were compared to expert-derived criteria to ensure the achievement of accuracy. Patient Variables. The reliability phase of the study was based on Sample 1 data and focused on clinician responses to the STS Clinician Rating Form. To extend generalization, the construct validity phase used the additional samples, to the degree that they included data that were translatable to the current study. No satisfactory, standardized measures of level of social support, complexity, and level of functional impairment were available in the various samples, so these constructs were not assessed except from the STS Clinician Rating Form. Thus, information is available on reliability and expert-criteria validity, but not for other aspects of validity for these dimensions. The patient variables included in the construct validity phase were subjective distress, coping style (internal and external dimensions), and trait resistance. These three dimensions were assessed in two ways. Criterion measures of the constructs from standardized, self-report tests were contrasted with ratings on the STS Clinician Rating Form as an evaluation of the construct validity of the latter instrument. Psychological Test Measures. The selection of criteria scores for the patient dimensions was constrained by the need to use instruments that were used in the various samples at the time of intake. 
Two of the three samples (Samples 1 and 2) included a variety of specific subscales from the MMPI-2, and these samples played a part in the discriminant validity phase of this study. Because of this same constraint, potentially important measures like the State-Trait Anxiety Inventory (STAI) were not available
as measures of subjective distress. The particular tests and scores used to operationalize the dimensions of subjective distress, coping style, and resistance were comprised of composite, self-report scores in order to obtain more stable and representative measures than could be provided by a single test or subtest score. Subjective distress was indexed by two different measures, reflecting respectively, statelike and traitlike qualities. The Pt subscale from the MMPI-2 (Butcher, 1990; Graham, 1993) was extracted as one measure of subjective distress. A second measure was based on the Beck Depression Inventory (Beck et al., 1961). Coping style was represented by two separate composite indicators, reflecting the separate dimensions of Externalization and Internalization. Both dimensions were extracted from the MMPI and MMPI-2 (Graham, 1993). These two dimensions of coping were used to separately assess the construct validity of comparable STS scales and then both were combined as a ratio to index a relative measure of Externalization as originally suggested by Welsh (1952). The mean elevations of four separate MMPI scales were used to index each dimension, based on prior research with this instrument (Beutler, Engle et al., 1991; Beutler & Mitchell, 1981; Calvert et al., 1988; Welsh, 1952). Externalization was indexed by a combination of Scales 3 (Hy), 4 (Pd), 6 (Pa), and 9 (Ma). Internalization was indexed by a combination of Scales 1 (Hs), 2 (D), 7 (Pt), and 0 (Si). A single measure of coping style was constructed, following the rationale of Welsh (1952) by constructing a ratio of Externalization to Internalization scores. Resistance traits have been the most difficult of the dimensions to measure (Beutler, Sandowicz et al., 1996). To reflect this complex dimension, a composite scale was constructed from the MMPI/MMPI-2. 
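The coping-style composites described above (the mean elevation of four scales per dimension, then a ratio of Externalization to Internalization) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the use of T scores as input, and the example profile are hypothetical and are not taken from the study.

```python
from statistics import mean

# Scale groupings as named in the text (following the rationale of Welsh, 1952).
EXTERNALIZING_SCALES = ("Hy", "Pd", "Pa", "Ma")  # Scales 3, 4, 6, 9
INTERNALIZING_SCALES = ("Hs", "D", "Pt", "Si")   # Scales 1, 2, 7, 0


def coping_style_indices(scores):
    """Return (externalization, internalization, ratio) for one profile.

    scores: dict mapping an MMPI scale abbreviation to a numeric score.
    Treating the inputs as T scores is an assumption made for illustration.
    """
    externalization = mean(scores[s] for s in EXTERNALIZING_SCALES)
    internalization = mean(scores[s] for s in INTERNALIZING_SCALES)
    # Ratio above 1 suggests a relatively externalizing style.
    return externalization, internalization, externalization / internalization


# Hypothetical profile, not data from the study.
profile = {"Hs": 55, "D": 70, "Pt": 65, "Si": 50,
           "Hy": 60, "Pd": 72, "Pa": 58, "Ma": 66}
ext, intr, ratio = coping_style_indices(profile)
```

Keeping the two means alongside the ratio mirrors the text, which uses the separate dimensions for scale-level validity checks and the ratio as a single relative index.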
Through consultation with experts in the field and reviews of prior research, scales were identified that conceptually reflected aspects of patient resistance. An intercorrelation of many of these scales, using the MMPI-2 normative sample, allowed for the identification of several that conceptually reflected a common dimension. These subscales were moderately correlated with the Therapeutic Reactance scale (TRS; Dowd et al., 1991), earning correlations ranging from .31 to .64 (M = .53; all ps < .05). The selected subscales were subjected to a factor analysis, using the combined samples, and a composite factor that included the following scales was obtained: Cn (Control), Do (Dominance, weighted inversely), and Pa (Paranoia). The mean of these scales was then used as a final measure of resistance traits in the analysis of construct validity. STS Clinician Rating Form. The STS Clinician Rating Form was developed in a series of steps, beginning with a compilation of items that appeared to relate to the various targeted dimensions. The initial pool of 260 items was reduced to 226 by visual inspection, eliminating obvious overlaps and duplications. Dimensions assessed by the rating form included the patient's degree of Functional Impairment, Depression, Primary Problem/Disorder, Area of Social Impairment (e.g., family, partner, work, legal), Level of Social Support, Self-reported Distress, Clinician-rated Distress, Self-esteem, Externalization, Internalization, and Traitlike Level of Resistance. The items on these scales require clinician ratings based on all information available to them. The STS was designed to allow the clinician to subjectively weight all the information available at the time of intake to make summary ratings on a common set of scales. 
For the current study, clinicians were provided with a videotape of an early interview with the patient, clinical notes from the intake clinician, and whatever intake tests were part of the protocol governing the collection of data for that sample. For
those portions of the criteria validity assessment that did not include relations with psychological tests, all three samples were used (N = 216). The STS items were presented in a checklist format and clinicians were required to provide a dichotomous rating (present-not present) on each of the 226 initial items. Procedures. The procedures for addressing the questions posed in this study are described in sequence. Interrater Reliability of STS. Interrater reliability assessment took place in two stages using Sample 1 data. In the first stage, raters were compared to a sample of cases on which a criteria-level standard of accuracy had been established. In the second stage, clinician raters were paired randomly with one another and interrater reliabilities were computed on all pairs. In both cases, clinicians were independently provided with the following information and asked to review it before being presented with the STS Clinician Rating Form to complete: intake notes by the original intake clinician, a history of the patient's problem, the intake psychological test data, and a videotape of the intake session. Both the overall reliability of the STS Clinician Rating Form and the reliability of the specific subscales were computed. Sample 1 cases were assigned to rater pairs to ensure that each clinician rated at least 10 cases and all possible pairs were represented. In addition to making independent ratings, the clinician pairs also met and made consensual ratings for each patient on the STS Clinician Rating Form items. Construct Validity. To assess discriminant and convergent validity, Samples 1 and 2 were collapsed. STS Clinician Rating Forms were completed on all patients after the clinician reviewed the intake materials described in the previous paragraphs. 
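The pairwise reliability procedure described above can be illustrated with a small agreement calculation on dichotomous (present/not present) checklist ratings. The sketch assumes that the concordance coefficient is Cohen's kappa, which the chapter does not state explicitly, and it simplifies the design by assuming every rater scored the same items. Function names and example ratings are hypothetical.

```python
from itertools import combinations


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters' dichotomous (0/1) ratings of the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    p_a = sum(ratings_a) / n  # proportion rated "present" by rater A
    p_b = sum(ratings_b) / n  # proportion rated "present" by rater B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is
        # undefined, so perfect agreement is reported by convention here.
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(ratings_by_rater):
    """Average each rater's kappa across pairings with every other rater."""
    per_rater = {rater: [] for rater in ratings_by_rater}
    for r1, r2 in combinations(ratings_by_rater, 2):
        k = cohen_kappa(ratings_by_rater[r1], ratings_by_rater[r2])
        per_rater[r1].append(k)
        per_rater[r2].append(k)
    return {rater: sum(ks) / len(ks) for rater, ks in per_rater.items()}


# Hypothetical ratings on eight checklist items (1 = present, 0 = not present).
ratings = {
    "rater_1": [1, 1, 0, 0, 1, 0, 1, 0],
    "rater_2": [1, 1, 0, 0, 1, 0, 0, 0],
    "rater_3": [1, 1, 0, 0, 1, 0, 1, 0],
}
kappa_12 = cohen_kappa(ratings["rater_1"], ratings["rater_2"])
rater_means = mean_pairwise_kappa(ratings)
```

In the study itself, different rater pairs scored different cases, so a full analysis would compute kappa per pair on the cases that pair shared before averaging.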
The material supplied to the raters who assessed patients in Sample 2 was similar to that used for rating patients in Sample 1, varying only in the specific psychological tests employed by the different research protocols. The intake material included a history of the patient and problem, a battery of intake tests used in the original research protocol, and a videotape of the patient's intake evaluation. The summary scores from the STS Clinician Rating Form (representing Subjective Distress, Internalization, Externalization, and Resistance Traits) were compared both against each other (discriminant validity) and against the same constructs derived from standardized self-report tests (convergent validity). For these analyses, the STS Clinician Rating Form scores from a primary rater for Sample 2 and the consensual ratings derived from Sample 1 were used. In Samples 2 and 3, individual clinicians rated each sample, with a randomly selected 20% of cases being double rated to check reliabilities.

Results

Reliability Assessment. To indicate level of reliability, three types of concordance agreements were calculated: overall agreement of each possible rater pair, agreement of each rater with the sum of all other raters, and specific levels of interrater agreement for each of the three dimensions of interest to this study. These same data were used to calculate the agreement-level-rater-level concordance estimates. The mean coefficient of agreement of each rater with every other rater served
this purpose. The reliabilities of the separate STS subscales were also separately computed to derive an index of scale reliabilities. Overall interrater agreement was computed on all subscales of the STS Clinician Rating Form using Sample 1 data. The calculations reflected the mean degree of agreement across all rater pairs. The mean interrater concordance (K) coefficients ranged from .79 (Functional Impairment) to .99 (Presence of Eating Disorder), with an average coefficient of concordance of .84. Rater-level agreement was computed for each of the five individual raters, averaging across their pairings with each of the other raters. Based on a sample of 15 pairings for each rater, the mean coefficients of concordance ranged from .80 to .89. Specific levels of agreement were assessed by comparing raters against one another, again relying on Sample 1 data. The mean levels of interrater agreement were: .79 (Functional Impairment), .82 (Subjective Distress), .83 (Complexity), .80 (Resistance), .75 (Social Support), .86 (Internalization), and .86 (Externalization). Construct Validity. Criterion validity of the entire STS Clinician Rating Form was assessed in Sample 1 by comparing ratings of clinicians to an "expert" standard. In addition, three other comparisons were initiated in order to determine the construct validity of four STS Clinician Rating Form dimensions (Subjective Distress, Externalization, Internalization, Resistance Traits): agreement with expert criteria ratings, convergent validity, discriminant validity, and concurrent validity. Expert Criteria Validity. Overall expert criteria agreement was calculated by averaging the concordance estimate of each rater with two randomly selected "expert-rated" cases from Sample 1. Twelve cases were used as criteria samples for checking rater accuracy. The criterion of accuracy was the consensual ratings from two expert raters who had the greatest familiarity with the cases. 
The mean, individual concordance estimates (K) with these criteria ranged from .69 to .80 (two raters). The overall mean concordance coefficient across the 260 STS items was .77, indicating a satisfactory level of criterion agreement across clinician raters. When these criteria ratings were applied to the seven focal scales, the following mean K values were obtained: .75 (Functional Impairment), .84 (Subjective Distress), .69 (Complexity), .83 (Resistance Potential), .76 (Social Support), .85 (Internalization), and .86 (Externalization). Convergent Validity. The test of convergent validity proceeded in two steps. The first step constituted a correlation between each item and a summary score for each dimension, drawing from all three samples. These data were used to reduce the items used in each scale. Items that did not correlate significantly (p ≤ .01) with the summary criteria were eliminated. Based on this step, the Subjective Distress subscale was reduced from 40 to 29 items; the Externalization subscale was reduced to 21 items; the Internalization subscale was reduced to 12 items; and the Resistance Traits subscale was reduced to 24 items. This refined item listing was used in all subsequent validational steps of the current project. The next step consisted of a comparison of the refined STS scales to the psychological test criteria. In this step, only Samples 1 and 2 were used because the four dimensions on which external validity was assessed were available. A series of Pearson product-moment correlations was computed between these refined STS dimensions and the independently derived criteria from standardized psychological tests of the same dimensions. The correlations are reported in Table 3.2.
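The item-reduction step described above (retain an item only if it correlates significantly with the summary score for its dimension) can be sketched as follows. This is an assumption-laden illustration, not the authors' procedure: the p-value uses Fisher's z transform, a large-sample approximation chosen so the example needs only the standard library, and all names and data are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # assumes neither sequence is constant


def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for r via Fisher's z transform."""
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation; atanh would be undefined
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def retain_items(item_ratings, summary_scores, alpha=0.01):
    """Return names of items whose correlation with the summary has p <= alpha."""
    n = len(summary_scores)
    kept = []
    for name, ratings in item_ratings.items():
        r = pearson_r(ratings, summary_scores)
        if approx_two_sided_p(r, n) <= alpha:
            kept.append(name)
    return kept


# Hypothetical data: 20 patients, two dichotomous items, one summary score.
summary = list(range(1, 21))
items = {
    "item_a": [0] * 10 + [1] * 10,  # tracks the summary score closely
    "item_b": [0, 1] * 10,          # essentially unrelated to the summary
}
kept = retain_items(items, summary)
```

With dichotomous items the Pearson coefficient is a point-biserial correlation; an exact test based on the t distribution would be preferable for small samples.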

TABLE 3.2
Intercorrelations of STS and Psychological Test Constructs

                            Test Dimensions
                            Subjective  Subjective  Externalization  Internalization  Resistance
                            Distress    Distress    (MMPI)           (MMPI)           Traits
STS Ratings                 (MMPI)      (BDI)                                         (MMPI)
Subjective Distress (STS)   .63**       .65**       .42**            .64**            .47**
                            n = 89      n = 90      n = 89           n = 89           n = 81
Externalization (STS)       -.07        -.06        .35**            -.18             .21
                            n = 89      n = 90      n = 89           n = 81           n = 81
Internalization (STS)       .34**       .36**       -.11             .42**            .20
                            n = 89      n = 90      n = 89           n = 89           n = 81
Resistance (STS)            .09         .07         .33*             .08              .43**
                            n = 89      n = 90      n = 89           n = 89           n = 81
*p < .01  **p < .001

The results of these analyses were mixed. The STS clinician rating of subjective distress correlated (p < .001) at the highest levels with the external criteria (r = .63 and .65 with Pt and BDI, respectively). The correlations of the other STS dimensions with relevant external self-report criteria were all significant, but weaker. They varied from .35 for the correspondence between STS and MMPI indices of Externalization to .43 for correspondent measures of resistance traits. As a further check of resistance traits, composite ratios of external to internal coping styles were constructed for both the STS and the MMPI. These two summary measures of relative coping style correlated at a level of .46 (p < .001). Discriminant Validity. Another aspect of construct validity is the determination of how the various subscales and measures relate to one another. Discriminant validity requires that the three constructs be relatively independent and also reveal a prescribed pattern of relation with one another. Specifically, it was expected that the two coping style dimensions (Internalization and Externalization) would be significantly and negatively correlated; Subjective Distress was expected to be moderately correlated with Internalization but not Externalization; and Resistance Traits were expected to be moderately correlated with Externalization but not with either Internalization or Subjective Distress. 
The expected relations were obtained as revealed in Table 3.3. Internalization and Externalization were negatively correlated at a moderate level (r = -.44); Subjective Distress was correlated with Internalization (r = .48) but not with Externalization (r = -.03); and Resistance Traits were highly correlated with Externalization (r = .70) but only modestly with the other dimensions (r = .21 and -.26). Thus, the pattern of intercorrelations supported the discriminant validity of the three dimensions (collapsing Internalization and Externalization). A second test of discriminant validity cross-matched the STS dimensions with the external, psychological tests. The same pattern of relations should be revealed as found when the STS dimensions are intercorrelated with one another. A look back to Table 3.2 reveals that these patterns were in evidence, though they are not as striking as the patterns based on the internal correlations of the STS dimensions. Specifically, (a) STS Internalization and Externalization were correlated in a negative direction (r = -.18) as
expected; (b) STS subjective distress was quite highly correlated with MMPI Internalization (r = .64), but it was also correlated in a positive direction with MMPI Externalization (r = .42); and (c) STS resistance traits were correlated with MMPI externality but not with either MMPI indicators of distress or internalization.

TABLE 3.3
Intercorrelation of STS Dimensions

Dimension             Internalization  Externalization  Resistance Traits
Subjective Distress   .48**            -.03             .21*
                      n = 93           n = 93           n = 93
Internalization                        -.44**           -.26
                                       n = 204          n = 204
Externalization                                         .70**
                                                        n = 141
*p < .01  **p < .001

Concurrent Validity. An additional test was undertaken to ensure the STS's construct validity using all three samples. The samples reflected major depression (Sample 3), mixed psychiatric patients (Sample 1), and alcoholics (Sample 2); thus, certain differences in levels of the four dimensions were expected across samples to assure concurrent validity: (a) the homogeneous depressed sample (Sample 3) would have higher subjective distress scores than the other two groups, with the alcoholic sample (Sample 2) having the lowest; (b) the alcoholic sample (Sample 2) should have the highest level of Externalization scores and the homogeneous depressed sample (Sample 3) should earn the highest Internalization scores; and (c) the alcoholic sample (Sample 2) should also be distinguished by the relatively high levels of Resistance Traits, compared to the other groups. Means and standard deviations are reported in Table 3.4. A series (four) of one-way analyses of variance that compared the three samples on the four dimensions demonstrated the expected relations. Analyses of variance revealed a significant sample effect for all variables: (a) Subjective Distress, F(2, 203) = 20.16, p < .001; (b) Internalization, F(2, 203) = 12.38, p < .001; (c) Externalization, F(2, 203) = 26.36, p < .001; and (d) Resistance Traits, F(2, 203) = 13.33, p < .001. 
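For readers who want to see how an F ratio of the kind reported above is formed, the following sketch computes a one-way analysis of variance for any number of groups. It is a generic textbook computation rather than the authors' analysis code, and the three small groups are hypothetical scores, not study data.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of lists of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three samples on one dimension.
sample_scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(sample_scores)
```

The degrees of freedom follow the same pattern as in the text: the number of groups minus one, and the total number of cases minus the number of groups.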
In all cases, a post hoc Tukey test revealed significant ps < .05, favoring the expected group differences. The foregoing results are promising and suggest the value of the STS treatment planning procedure and software. However, the predictive validity of the various patient dimensions is still under investigation, and at the time of this writing, a validation of the predictive algorithms for developing treatment plans is being completed.

TABLE 3.4
Means and Standard Deviations of Samples by STS Dimension
Variable Sample 1 Sample 2 Sample 3
9.17 (6.37) Subjective Distress 10.68 (4.60) 13.55 (4.34) 5.01 (4.02) 8.46 (4.73) Internalization 6.85 (5.13) 5.19 (5.30) Externalization 5.89 (5.11) 12.73 (8.60) 7.43 (6.07) Resistance Traits 10.76 (7.03) 14.34 (9.25)
Note. Because of nonavailability of all self-report measures, only Samples 1 and 2 were used in the computations for Subjective Distress.

By the time
of its release, a small normative database representing the three samples described in the foregoing and additional beta samples of specialized psychiatric patients will be available. This normative base of approximately 300 individuals will allow the initiation of clinical applications. The next step planned in the construction and validation of the STS is the development of an interactive database that will systematically modify the algorithms with each new patient entry to improve and refine the predictive efficiency of the program for each clinical setting. That is, with continued use in a given system, local normative data will supplement the general database as the source of predictive algorithms in order to allow increasing specificity in the predictions and treatment recommendations with each use. A feature will be included to allow the program to use local resources and norms to identify specific therapists whose success rates are highest for patients who represent different patterns of demography, functional impairment, distress, social support, coping style, and resistance potential.

Conclusions

Psychological tests have been used widely in the prediction of response to treatment. This chapter has summarized the status of research on some of the more promising of these dimensions and their associated measures. Seven dimensions appear to be promising for use in planning treatment: Functional Impairment, Subjective Distress, Readiness for (or Stage of) Change, Problem Complexity, Resistance Potential or Inclination, Social Support Level, and Coping Style. Some of these dimensions, such as social support and coping style, have individual components that appear promising as well. To summarize, the following conclusions appear to be justified:

1. Functional Impairment. Impairment level serves as an index of progress in treatment, as well as a predictor of outcome.
High impairment may also serve as a contraindicator for the use of insight- and relationship-oriented psychotherapies. Very impairing symptoms seem to indicate the value of pharmacological interventions or problem-oriented approaches, and mild to moderate levels of impairment may be conducive to or predictive of a positive response to a variety of psychotherapy models.
2. Subjective Distress. Subjective distress is directly related to improvement among nonsomatic depressed and anxious patients. Distress is more complexly related to improvement among those with somatic complaints. Distress level may be a particular marker for the differential application of self-directed and traditional therapies, and may indicate the need to use procedures that either lower or raise distress to enhance patient motivation. High distress serves as a positive marker for self-directed treatment among nonsomatic patients, whereas low distress may serve such a function among somatically disturbed patients.
3. Readiness for Change. Contemporary studies on patient stages of change are very promising. The higher the stage or readiness level of the patient, the more positive the outcomes of treatment. Some limited evidence suggests that those at the precontemplative (preconceptual) and contemplative (conceptual) stages of change may be particularly suited for interventions that raise consciousness and facilitate self-exploration.
4. Problem Complexity. Both symptom-focused psychopharmacological interventions and symptom-focused psychological interventions may be indicated most clearly among patients whose conditions are acute, or are relatively uncomplicated by concomitant personality disorder and interpersonal conflict or dynamic conflicts associated with symbolic internal conflicts. At the same time, comparisons across treatment models suggest that treatment efficacy is enhanced when the breadth of the interventions used corresponds with the complexity of the problem presented.
5. Resistance Potential. Traitlike resistance is a reasonably good predictor of the differential effects of directive and nondirective therapies. Therapist guidance, use of status, and control are
contraindicated among resistance-prone individuals. On the other hand, these patients respond quite well to paradoxical and nondirective interventions, whereas low resistance-prone patients respond well to directive interventions and therapist guidance. State reactions that suggest the presence of resistance also may be indicators for altering how material is presented within treatment sessions.
6. Social Support. Dimensions of objective social support, subjectively experienced support, and social investment have implications for treatment planning, even serving as indices and predictors of differential response to various psychosocial interventions. Some aspects of support even serve as contraindicators for long-term treatment. Patients with low levels of objective social support, or who feel unsupported by those around them, are candidates for long-term or intensive treatment. Their level of improvement corresponds with the intensity of treatment. However, those with good support systems do not respond well to intensive or long-term treatments. Their improvement appears to reach an asymptote and may even decline with continuing treatment. Level of social investment may outweigh actual support availability, however, and may be a mediator that increases the value of relationship-oriented psychotherapy over behavioral and symptomatically oriented treatments.
7. Coping Style. Patient level of impulsivity, or coping style, has consistently been found to be a differential predictor of the value of cognitive-behavioral and relationship, or insight-oriented, treatments. Comparatively, externalizing, impulsive patients respond better to behaviorally oriented therapies and constricted and introspective patients tend to respond better to insight- and relationship-oriented therapies.

The types of tests used to assess these various dimensions often have implications for assessment of outcome itself. 
Measures of Functional Impairment, Subjective Distress, and Problem Complexity may be especially valuable for assessing outcome or predicting prognosis to treatment generally. In contrast, both statelike measures of resistance and readiness for change, and trait measures of resistance potential, complexity, and coping styles appear promising for selecting treatments that vary in directiveness and insight foci, respectively. The measures used to assess all of these patient dimensions are often drawn from omnibus personality measures and, in the case of coping style measures, may reflect complex processes that encompass both unconscious and conscious experience. Taken together, the research reported in this brief and selective review suggests that various combinations of dimensions allow discrimination among treatment variables and may point to directions in which the development and applications of treatments may evolve in clinical practice. Accordingly, this chapter has reported the initial development of the STS, a clinician-based measure that promises to tap a multitude of relevant treatment planning dimensions that can be used to plan a treatment with the potential to enhance the efficiency of treatment. This procedure is currently being developed as a computer-based assessment tool to complement the use of standardized tests. The initial data presented here are promising but suggest that the information derived from clinicians may not be as sensitive as that based on patient self-reports. A combination of established tests and structured clinician judgments may be optimal for deriving the dimensions that can be used in treatment planning.

References

Ackerman, D. L., Greenland, S., Bystritsky, A., Morgenstern, H., & Katz, R. J. (1994). Predictors of treatment response in obsessive-compulsive disorder: Multivariate analyses from a multicenter trial of Clomipramine. Journal of Clinical Psychopharmacology, 14, 247-253.
American Psychiatric Association (1983). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.
American Psychiatric Association (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author. Antonuccio, D. O., Danton, W. G., & DeNelsky, G. Y. (1995). Psychotherapy versus medication for depression: Challenging the conventional wisdom with data. Professional Psychology: Research and Practice, 26, 574-585. Arkowitz, H. (1991, August). Psychotherapy integration: Bringing psychotherapy back to psychology. Paper presented at the annual meeting of the American Psychological Association, San Francisco, CA. Barber, J. P. (1989). The Central Relationship Questionnaire (version 1.0). Unpublished manuscript, University of Pennsylvania, School of Medicine, Philadelphia. Barber, J. P., & Muenz, L. R. (1996). The role of avoidance and obsessiveness in matching patients to cognitive and interpersonal psychotherapy: Empirical findings from the treatment for depression collaborative research program. Journal of Consulting and Clinical Psychology, 64, 951-958. Barron, F. (1953). An ego strength scale which predicts response to psychotherapy. Journal of Consulting Psychology, 17, 327-333. Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-569. Beitman, B. D. (1987). The structure of individual psychotherapy. New York: Guilford. Bellack, A. S., & Hersen, M. (Eds.). (1990). Handbook of comparative treatments for adult disorders. New York: Wiley. Beutler, L. E. (1983). Eclectic psychotherapy: A systematic approach. New York: Pergamon. Beutler, L. E. (1991). Have all won and must all have prizes? Revisiting Luborsky, et al.'s verdict. Journal of Consulting and Clinical Psychology, 59, 226-232. Beutler, L. E., & Berren, M. (1995). Integrative assessment of adult personality. New York: Guilford. Beutler, L. E., & Clarkin, J. (1990). Systematic treatment selection: Toward targeted therapeutic interventions. New York: Brunner/Mazel. Beutler, L. 
E., Consoli, A. J., & Williams, R. E. (1995). Integrative and eclectic therapies in practice. In B. Bongar & L. E. Beutler (Eds.), Comprehensive textbook of psychotherapy: Theory and practice (pp. 274-292). New York: Oxford University Press.
Beutler, L. E., Engle, D., Mohr, D., Daldrup, R. J., Bergan, J., Meredith, K., & Merry, W. (1991). Predictors of differential and self directed psychotherapeutic procedures. Journal of Consulting and Clinical Psychology, 59, 333-340.
Beutler, L. E., Frank, M., Scheiber, S. C., Calvert, S., & Gaines, J. (1984). Comparative effects of group psychotherapies in a short-term inpatient setting: An experience with deterioration effects. Psychiatry, 47, 66-76.
Beutler, L. E., & Harwood, T. M. (1995). Prescriptive psychotherapies. Applied and Preventive Psychology, 4, 89-100.
Beutler, L. E., & Hodgson, A. B. (1993). Prescriptive psychotherapy. In G. Stricker & J. R. Gold (Eds.), Comprehensive handbook of psychotherapy integration (pp. 151-163). New York: Plenum.
Beutler, L. E., Kim, E. J., Davison, E., Karno, M., & Fisher, D. (1996). Research contributions to improving managed health care outcomes. Psychotherapy, 33, 197-206.
Beutler, L. E., Machado, P.P.P., Engle, D., & Mohr, D. (1993). Differential patient X treatment maintenance of treatment effects among cognitive, experiential, and self-directed psychotherapies. Journal of Psychotherapy Integration, 3, 15-32.
Beutler, L. E., Machado, P.P.P., & Neufeldt, S. (1994). Therapist variables. In A. E. Bergin & S. L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 229-269). New York: Wiley.
Beutler, L. E., & Mitchell, R. (1981). Psychotherapy outcome in depressed and impulsive patients as a function of analytic and experiential treatment procedures. Psychiatry, 44, 297-306.
Beutler, L. E., Mohr, D. C., Grawe, K., Engle, D., & MacDonald, R. (1991). Looking for differential effects: Cross-cultural predictors of differential psychotherapy efficacy. Journal of Psychotherapy Integration, 1, 121-142.
Beutler, L. E., & Rosner, R. (1995). Introduction. In L. E. Beutler & M. Berren (Eds.), Integrative assessment of adult personality (pp. 1-24). New York: Guilford.
Beutler, L. E., Sandowicz, M., Fisher, D., & Albanese, A. L. (1996). Resistance in psychotherapy:
What can be concluded from empirical research? In Session: Psychotherapy in Practice, 2, 77-86.
Beutler, L. E., Wakefield, P., & Williams, R. E. (1994). Use of psychological tests/instruments for treatment planning. In M. E. Maruish (Ed.), Use of psychological testing for treatment planning and outcome assessment (pp. 55-74). Hillsdale, NJ: Lawrence Erlbaum Associates.
Billings, A. G., & Moos, R. H. (1984). Chronic and nonchronic unipolar depression: The differential role of environmental stressors and resources. Journal of Nervous and Mental Disease, 172, 65-75.
Blanchard, E. B., Schwarz, S. P., Neff, D. F., & Gerardi, M. A. (1988). Prediction of outcome from the self-regulatory treatment of irritable bowel syndrome. Behaviour Research and Therapy, 26, 187-190.
Brown, T. A., & Barlow, D. H. (1995). Long-term outcome in cognitive-behavioral treatment of panic disorder: Clinical predictors and alternative strategies for assessment. Journal of Consulting and Clinical Psychology, 63, 754-765.
Butcher, J. N. (1990). The MMPI-2 in psychological treatment. New York: Oxford University Press.
Butcher, J. N. (Ed.). (1995). Clinical personality assessment: Practical approaches. New York: Oxford University Press.
Calvert, S. J., Beutler, L. E., & Crago, M. (1988). Psychotherapy outcome as a function of therapist-patient matching on selected variables. Journal of Social and Clinical Psychology, 6, 104-117.
Caspar, F. (1995). Plan analysis: Toward optimizing psychotherapy. Seattle: Hogrefe & Huber.
Cattell, R. B., & Johnson, R. C. (Eds.). (1986). Functional psychological testing. New York: Brunner/Mazel.
Cooney, N. L., Kadden, R. M., Litt, M. D., & Getter, H. (1991). Matching alcoholics to coping skills or interactional therapies: Two-year follow-up results. Journal of Consulting and Clinical Psychology, 59, 598-601.
Costa, P. T., & McCrae, R. R. (1985). The NEO Personality Inventory manual. Odessa, FL: Psychological Assessment Resources.
Crits-Christoph, P., Cooper, A., & Luborsky, L. (1988). The accuracy of therapists' interpretations and the outcome of dynamic psychotherapy. Journal of Consulting and Clinical Psychology, 56, 490-495.
Crits-Christoph, P., & Demorest, A. (1988, June). The development of standard categories for the CCRT method. Paper presented at the Society for Psychotherapy Research, Santa Fe, NM.
Crits-Christoph, P., & Demorest, A. (1991). Quantitative assessment of relationship theme components. In M. J. Horowitz (Ed.), Person schemas and maladaptive interpersonal patterns (pp. 197-212). Chicago: University of Chicago Press.
Crits-Christoph, P., Demorest, A., & Connolly, M. B. (1990). Quantitative assessment of interpersonal themes over the course of psychotherapy. Psychotherapy, 27, 513-521.
Crits-Christoph, P., Luborsky, L., Dahl, L., Popp, C., Mellon, J., & Mark, D. (1988). Clinicians can agree in assessing relationship patterns in psychotherapy. Archives of General Psychiatry, 45, 1001-1004.
Derogatis, L. R. (1994). SCL-90: Administration, scoring and procedures manual (3rd ed.). Minneapolis, MN: National Computer Systems.
DeRubeis, R. J., Evans, M. D., Hollon, S. D., Garvey, M. J., Grove, W. M., & Tuason, V. B. (1990). How does cognitive therapy work? Cognitive change and symptom change in cognitive therapy and pharmacotherapy for depression. Journal of Consulting and Clinical Psychology, 58, 862-869.
DiNardo, P. A., O'Brien, G. T., Barlow, D. H., Waddell, M. T., & Blanchard, E. B. (1983). Reliability of DSM-III anxiety disorder categories using a new structured interview. Archives of General Psychiatry, 40, 1070-1075.
Dowd, E. T., Milne, C. R., & Wise, S. L. (1991). The Therapeutic Reactance Scale: A measure of psychological reactance. Journal of Counseling and Development, 69, 541-545.
Dowd, E. T., Wallbrown, F., Sanders, D., & Yesenosky, J. M. (1994). Psychological reactance and its relationship to normal personality variables. Cognitive Therapy and Research, 18, 601-612.
Edwin, D., Anderson, A. E., & Rosell, F. (1988). Outcome prediction of MMPI in subtypes of anorexia nervosa. Psychosomatics, 29, 273-282.
Elkin, I., Shea, T., Watkins, J. T., Imber, S. D., Sotsky, S. M., Collins, J. F., Glass, D. R., Pilkonis, P. A., Leber, W. R., Docherty, J. P., Feister, S. J., & Parloff, M. B. (1989). National Institute of Mental Health treatment of depression collaborative research program. Archives of General Psychiatry, 46, 971-982.
Ellicott, A., Hammen, C., Gitlin, M., Brown, G., & Jamison, K. (1990). Life events and the course of bipolar disorder. American Journal of Psychiatry, 147, 1194-1198.
Eysenck, H., & Eysenck, S.B.G. (1964). Manual of the Eysenck Personality Inventory. London: University of London Press.
Eysenck, H. J., & Eysenck, S.B.G. (1969). Personality structure and measurement. San Diego: Knapp.
Fahy, T. A., & Russell, G.F.M. (1993). Outcome and prognostic variables in bulimia nervosa. International Journal of Eating Disorders, 14, 135-145.
Fairburn, C. G., Peveler, R. C., Jones, R., Hope, R. A., & Doll, H. A. (1993). Predictors of 12-month outcome in bulimia nervosa and the influence of attitudes to shape and weight. Journal of Consulting and Clinical Psychology, 61, 696-698.
Fisher, D., Beutler, L. E., & Williams, O. B. (in press). STS Clinician Rating Form: Patient assessment and treatment planning. Journal of Clinical Psychology.
Follette, W. C., & Houts, A. C. (1996). Models of scientific progress and the role of theory in taxonomy development: A case study of the DSM. Journal of Consulting and Clinical Psychology, 64, 1120-1132.
Frances, A., Clarkin, J., & Perry, S. (1984). Differential therapeutics in psychiatry. New York: Brunner/Mazel.
Frank, J. D., & Frank, J. B. (1991). Persuasion and healing (3rd ed.). Baltimore: Johns Hopkins University Press.
Fremouw, W. J., & Zitter, R. E. (1978). A comparison of skills training and cognitive restructuring-relaxation for the treatment of speech anxiety. Behavior Therapy, 9, 248-259.
Gaw, K. F., & Beutler, L. E. (1995). Integrating treatment recommendations. In L. E. Beutler & M. Berren (Eds.), Integrative assessment of adult personality (pp. 280-319). New York: Guilford.
Goldfried, M. R. (1991). Research issues in psychotherapy integration. Journal of Psychotherapy Integration, 1, 5-25.
Goldstein, G., & Hersen, M. (Eds.). (1990). Handbook of psychological assessment (2nd ed.). New York: Pergamon.
Gough, H. G. (1987). California Psychological Inventory administrator's guide. Palo Alto, CA: Consulting Psychologists Press.
Graham, J. R. (1987). The MMPI: A practical guide (2nd ed.). New York: Oxford University Press.
Graham, J. R. (1993). The MMPI-2: Assessing personality and psychopathology. New York: Oxford University Press.
Groth-Marnat, G. (1997). Handbook of psychological assessment (3rd ed.). New York: Wiley.
Hamilton, M. (1967). Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6, 278-296.
Hayes, S. C., Nelson, R. O., & Jarrett, R. B. (1987). The treatment utility of assessment: A functional approach to evaluating assessment quality. American Psychologist, 42, 963-974.
Hoencamp, E., Haffmans, P. M. J., Duivenvoorden, H., Knegtering, H., & Dijken, W. A. (1994). Predictors of (non-) response in depressed outpatients treated with a three-phase sequential medication strategy. Journal of Affective Disorders, 31, 235-246.
Hollon, S. D., & Beck, A. T. (1986). Research on cognitive therapies. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 443-482). New York: Wiley.
Hooley, J. M., & Teasdale, J. D. (1989). Predictors of relapse in unipolar depressives: Expressed emotion, marital distress, and perceived criticism. Journal of Abnormal Psychology, 98, 229-235.
Horowitz, M., Marmar, C., Krupnick, J., Wilner, N., Kaltreider, N., & Wallerstein, R. (1984). Personality styles and brief psychotherapy. New York: Basic Books.
Horvath, A. O., & Goheen, M. D. (1990). Factors mediating the success of defiance- and compliance-based interventions. Journal of Counseling Psychology, 37, 363-371.
Hunsley, J. (1993). Treatment acceptability of symptom prescription techniques. Journal of Counseling Psychology, 40, 139-143.
Imber, S. D., Pilkonis, P. A., Sotsky, S. M., Elkin, I., Watkins, J. T., Collins, J. F., Shea,
M. T., Leber, W. R., & Glass, D. R. (1990). Mode-specific effects among three treatments for depression. Journal of Consulting and Clinical Psychology, 58, 352-359.
Jacob, R. G., Turner, S. M., Szekely, B. C., & Eidelman, B. H. (1983). Predicting outcome of relaxation therapy in headaches: The role of "depression." Behavior Therapy, 14, 457-465.
Joyce, A. S., & Piper, W. E. (1996). Interpretive work in short-term individual psychotherapy: An analysis using hierarchical linear modeling. Journal of Consulting and Clinical Psychology, 64, 505-512.
Kadden, R. M., Cooney, N. L., Getter, H., & Litt, M. D. (1989). Matching alcoholics to coping skills or interactional therapies: Post-treatment results. Journal of Consulting and Clinical Psychology, 57, 698-704.
Karno, M. (1997). Identifying patient attributes and elements of psychotherapy that impact the effectiveness of alcoholism treatment. Unpublished doctoral dissertation, University of California, Santa Barbara.
Keijsers, G.P.J., Hoogduin, C.A.L., & Schaap, C.P.D.R. (1994). Predictors of treatment outcome in the behavioural treatment of obsessive-compulsive disorder. British Journal of Psychiatry, 165, 781-786.
Khavin, A. B. (1985). Individual-psychological factors in prediction of help to stutterers. Voprosy Psikhologii, 2, 133-135.
Klerman, G. L. (1986). Drugs and psychotherapy. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 777-818). New York: Wiley.
Klerman, G. L., DiMascio, A., Weissman, M. M., Prusoff, B., & Paykel, E. S. (1974). Treatment of depression by drugs and psychotherapy. American Journal of Psychiatry, 131, 186-191.
Knight-Law, A., Sugerman, A. A., & Pettinati, H. M. (1988). An application of an MMPI classification system for predicting outcome in a small clinical sample of alcoholics. American Journal of Drug and Alcohol Abuse, 14, 325-334.
LaCroix, J. M., Clarke, M. A., Bock, J. C., & Doxey, N. C. (1986). Predictors of biofeedback and relaxation success in multiple-pain patients: Negative findings. International Journal of Rehabilitation Research, 9, 376-378.
Lambert, M. J. (1994). Use of psychological tests for outcome assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 75-97). Hillsdale, NJ: Lawrence Erlbaum Associates.
Lambert, M. J., & Bergin, A. E. (1983). Therapist characteristics and their contribution to psychotherapy outcome. In C. E. Walker (Ed.), The handbook of clinical psychology (Vol. 1, pp. 205-241). Homewood, IL: Dow Jones-Irwin.
Lazarus, A. A. (1981). The practice of multimodal therapy. New York: McGraw-Hill.
Longabaugh, R., Beattie, M., Noel, N., Stout, R., & Malloy, P. (1993). The effect of social investment on treatment outcome. Journal of Studies on Alcohol, 54, 465-478.
Longabaugh, R., Rubin, A., Malloy, P., Beattie, M., Clifford, P. R., & Noel, N. (1994). Drinking outcomes of alcohol abusers diagnosed as antisocial personality disorder. Alcoholism: Clinical and Experimental Research, 18, 778-785.
Luborsky, L. (1996). The symptom-context method. Washington, DC: American Psychological Association.
Luborsky, L., Crits-Christoph, P., & Mellon, J. (1986). The advent of objective measures of the transference concept. Journal of Consulting and Clinical Psychology, 54, 39-47.
Masterson, J. F., Tolpin, M., & Sifneos, P. E. (1991). Comparing psychoanalytic psychotherapies. New York: Brunner/Mazel.
McLellan, A. T., Woody, G. E., Luborsky, L., O'Brien, C. P., & Druley, K. A. (1983). Increased effectiveness of substance abuse treatment: A prospective study of patient-treatment "matching." Journal of Nervous and Mental Disease, 171, 597-605.
Merz, J. (1983). Fragebogen zur Messung der psychologischen Reaktanz [Questionnaire for the measurement of psychological reactance]. Diagnostica, 29, 75-82.
Miller, W. R., Benefield, G., & Tonigan, J. S. (1993). Enhancing motivation for change in problem drinking: A controlled comparison of two therapist styles. Journal of Consulting and Clinical Psychology, 61, 455-461.
Millon, T. (1994). Millon Clinical Multiaxial Inventory-III (MCMI-III) manual. Minneapolis, MN: National Computer Systems.
Mohr, D. C., Beutler, L. E., Engle, D., Shoham-Salomon, V., Bergan, J., Kaszniak, A. W., & Yost, E. (1990). Identification of patients at risk for non-response and negative outcome in psychotherapy. Journal of Consulting and Clinical Psychology, 58, 622-628.
Moos, R. H. (1990). Depressed outpatients' life contexts, amount of treatment and treatment outcome. Journal of Nervous and Mental Disease, 178, 105-112.
Moos, R. H., & Moos, B. S. (1986). Family Environment Scale manual (2nd ed.). Palo Alto, CA: Consulting Psychologists Press.
Nezworski, M. T., & Wood, J. M. (1995). Narcissism in the comprehensive system for the Rorschach. Clinical Psychology: Science and Practice, 2, 179-199.
Nietzel, M. T., Russell, R. L., Hemmings, K. A., & Gretter, M. L. (1987). Clinical significance of psychotherapy for unipolar depression: A meta-analytic approach to social comparison. Journal of Consulting and Clinical Psychology, 55, 156-161.
Norcross, J. C., & Goldfried, M. R. (Eds.). (1992). Handbook of psychotherapy integration. New York: Basic Books.
O'Connor, E. A., Carbonari, J. P., & DiClemente, C. C. (1996). Gender and smoking cessation: A factor structure comparison of processes of change. Journal of Consulting and Clinical Psychology, 64, 130-138.
Parker, G., Holmes, S., & Manicavasagar, V. (1986). Depression in general practice attenders: "Caseness," natural history and predictors of outcomes. Journal of Affective Disorders, 10, 27-35.
Prochaska, J. O. (1984). Systems of psychotherapy: A transtheoretical analysis (2nd ed.). Homewood, IL: Dorsey.
Prochaska, J. O., & DiClemente, C. C. (1986). The transtheoretical approach. In J. C. Norcross (Ed.), Handbook of eclectic psychotherapy (pp. 163-200). New York: Brunner/Mazel.
Prochaska, J. O., DiClemente, C. C., & Norcross, J. C. (1992). In search of how people change: Applications to addictive behaviors. American Psychologist, 47, 1102-1114.
Prochaska, J. O., Rossi, J. S., & Wilcox, N. S. (1991). Change processes and psychotherapy outcome in integrative case research. Journal of Psychotherapy Integration, 1, 103-120.
Prochaska, J. O., Velicer, W. F., DiClemente, C. C., & Fava, J. (1988). Measuring process of change: Applications to the cessation of smoking. Journal of Consulting and Clinical Psychology, 56, 520-528.
Project MATCH Research Group. (1997). Matching alcoholism treatments to client heterogeneity: Project MATCH posttreatment drinking outcomes. Journal of Studies on Alcohol, 58, 7-29.
Robinson, L. A., Berman, J. S., & Neimeyer, R. A. (1990). Psychotherapy for the treatment of depression: A comprehensive review of controlled outcome research. Psychological Bulletin, 108, 30-49.
Rohrbaugh, M., Shoham, V., Spungen, C., & Steinglass, P. (1995). Family systems therapy in practice: A systemic couples therapy for problem drinking. In B. Bongar & L. E. Beutler (Eds.), Comprehensive textbook of psychotherapy: Theory and practice (pp. 228-253). New York: Oxford University Press.
Shapiro, D. A., Barkham, M., Rees, A., Hardy, G. E., Reynolds, S., & Startup, M. (1994). Effects of treatment duration and severity of depression on the effectiveness of cognitive-behavioral and psychodynamic-interpersonal psychotherapy. Journal of Consulting and Clinical Psychology, 62, 522-534.
Shapiro, D. A., Rees, A., Barkham, M., Hardy, G., Reynolds, S., & Startup, M. (1995). Effects of treatment duration and severity of depression on the maintenance of gains after cognitive-behavioral and psychodynamic-interpersonal therapy. Journal of Consulting and Clinical Psychology, 63, 378-387.
Sheppard, D., Smith, G. T., & Rosenbaum, G. (1988). Use of MMPI subtypes in predicting completion of a residential alcoholism treatment program. Journal of Consulting and Clinical Psychology, 56, 590-596.
Sherbourne, C. D., Hays, R. D., & Wells, K. B. (1995). Personal and psychosocial risk factors for physical and mental health outcomes and course of depression among depressed patients. Journal of Consulting and Clinical Psychology, 63, 345-355.
Shoham-Salomon, V., Avner, R., & Neeman, K. (1989). "You are changed if you do and changed if you don't:" Mechanisms underlying paradoxical interventions. Journal of Consulting and Clinical Psychology, 57, 590-598.
Shoham-Salomon, V., & Hannah, M. T. (1991). Client-treatment interactions in the study of
differential change processes. Journal of Consulting and Clinical Psychology, 59, 217-225.
Shoham-Salomon, V., & Rosenthal, R. (1987). Paradoxical interventions: A meta-analysis. Journal of Consulting and Clinical Psychology, 55, 22-28.
Simons, A. D., Garfield, S. L., & Murphy, G. E. (1984). The process of change in cognitive therapy and pharmacotherapy for depression. Archives of General Psychiatry, 41, 45.
Simons, A. D., & Thase, M. E. (1992). Biological markers, treatment outcome, and 1-year follow-up in endogenous depression: Electroencephalographic sleep studies and response to cognitive therapy. Journal of Consulting and Clinical Psychology, 60, 392-401.
Spielberger, C. D., Gorsuch, R. L., & Lushene, R. E. (1970). The State-Trait Anxiety Inventory (STAI) test manual for form X. Palo Alto: Consulting Psychologists Press.
Stricker, G., & Gold, J. R. (Eds.). (1993). Comprehensive handbook of psychotherapy integration. New York: Plenum.
Strupp, H. H., & Binder, J. L. (1984). Psychotherapy in a new key. New York: Basic Books.
Strupp, H. H., Horowitz, L. M., & Lambert, M. J. (1997). Measuring patient changes in mood, anxiety, and personality disorders: Toward a core battery. Washington, DC: American Psychological Association.
Swoboda, J. S., Dowd, E. T., & Wise, S. L. (1990). Reframing and restraining directives in the treatment of clinical depression. Journal of Counseling Psychology, 37, 254-260.
Tasca, G. A., Russell, V., & Busby, K. (1994). Characteristics of patients who choose between two types of group psychotherapy. International Journal of Group Psychotherapy, 44, 499-508.
Tracey, T. J., Ellickson, J. L., & Sherry, P. (1989). Reactance in relation to different supervisory environments and counselor development. Journal of Counseling Psychology, 36, 336-344.
Trief, P. M., & Yuan, H. A. (1983). The use of the MMPI in a chronic back pain rehabilitation program. Journal of Clinical Psychology, 39, 46-53.
Vaillant, L. M. (1997). Changing character. New York: Basic Books.
Vallejo, J., Gasto, C., Catalan, R., Bulbena, A., & Menchon, J. M. (1991). Predictors of antidepressant treatment outcome in melancholia: Psychosocial, clinical, and biological indicators. Journal of Affective Disorders, 21, 151-162.
Wakefield, P. J., Williams, R. E., Yost, E. B., & Patterson, K. M. (1996). Couple therapy for alcoholism: A cognitive-behavioral treatment manual. New York: Guilford.
Wells, K. B., & Sturm, R. (1996). Informing the policy process: From efficacy to effectiveness data on pharmacotherapy. Journal of Consulting and Clinical Psychology, 64, 638-645.
Welsh, G. S. (1952). An anxiety index and an internalization ratio for the MMPI. Journal of Consulting Psychology, 16, 65-72.
Wilson, T. G. (1996). Treatment of bulimia nervosa: When CBT fails. Behaviour Research and Therapy, 34, 197-212.
Woody, G. E., McLellan, A. T., Luborsky, L., O'Brien, C. P., Blaine, J., Fox, S., Herman, I., & Beck, A. T. (1984). Severity of psychiatric symptoms as a predictor of benefits from psychotherapy: The Veterans Administration-Penn Study. American Journal of Psychiatry, 141, 1172-1177.

For Abby, Katie, and Shelby

Chapter 4
Use of Psychological Tests for Assessing Treatment Outcome

Michael J. Lambert
Brigham Young University

John M. Lambert
University of Utah

Outcome assessment is a branch of applied psychology that illuminates the strength of the effects of psychological interventions on patient functioning. In psychotherapy research, this assessment is done in the context of specific research designs aimed at answering specific questions of theoretical importance. The broad questions that are addressed in outcome assessment require an equally broad range of research designs to draw conclusions. Although assessment of outcome occurs only within the context of a particular research strategy (single case design, program evaluation, comparative outcome study, etc.), a limited set of procedures and principles guides the selection of outcome measures.

This chapter offers a brief history of outcome assessment, followed by guidelines for selecting outcome measures and recommendations for assessing outcome in psychotherapy. A great deal of psychotherapy research has been undertaken over the past 50 years, so much is known about outcome measurement. This knowledge can aid the interested practitioner in selecting and using outcome measures in this age of accountability.

An Overview of Outcome Assessment

The problems associated with assessing the changing psychological status of patients are, as Luborsky (1971) suggested, a "hardy perennial" in the field of psychotherapy. Historically, psychotherapists have devoted themselves to defining and perfecting treatments, not to assessing the consequences of these treatments systematically. Likewise, social or personality psychologists historically have developed assessment devices in contexts devoid of interest in personality change or symptomatic improvement. Personality psychologists have been more interested in static traits and stability than in change per se. Occasional exceptions to this trend can be found (e.g., Worchel & Byrne, 1964). But for the most part, little real effort has been expended on developing measures for the
purpose of measuring change. Outcome assessment is the neglected domain between these two fields of study.

Although measurement and quantification are central properties of empirical science, the earliest attempts at quantifying treatment gains lacked scientific rigor. The field gradually has moved from complete reliance on therapist ratings of gross and general improvement to the use of outcome indices of specific symptoms that are quantified from a variety of viewpoints, including the patient, outside observers, relatives, physiological indices, and environmental data such as employment records. The data generated from these viewpoints are always subject to the limitations inherent in the methodology; none is "objective" or most authoritative. Reliance on multiple viewpoints is an improvement over previous measurement methods, which were difficult to replicate because they lacked clear operational definitions and systematic means of data collection. Psychotherapy outcome assessment, with some recent notable exceptions such as the Consumer Reports satisfaction survey (Seligman, 1995), has moved from being based on simple posttherapy ratings to relying on complex and multifaceted assessments of change.

In the past, attempts at measuring change have reflected the fashionable theoretical positions of the day. Early studies relied on devices developed out of Freudian dynamic psychology. These devices (e.g., Rorschach and Thematic Apperception Test [TAT]) largely have been discarded as measures of outcome because of their poor psychometric qualities, reliance on inference, and the fact that they mainly reflected the interest of orientations that emphasized unconscious processes. Even if scoring systems such as Exner's for the Rorschach have overcome some of the psychometric problems associated with projective testing, these devices are not used in outcome studies because of practical constraints (they are time intensive). The use of these measures was followed by the use of devices consistent with client-centered theory (e.g., the Q-sort technique), behaviorism (behavioral monitoring), and more recently, cognitive theories with their emphasis on automatic thoughts. Although outcome assessment always will be guided by "in-vogue" theoretical positions, the field as a whole has moved a long way from its early theoretical foundations.

Nonetheless, there are important lessons to be learned from past attempts at measuring change. It would be unfortunate if nothing from the past were used to guide current attempts to measure patient gains. Contemporary research reflects some significant lessons from early scientific efforts. For example, the Clinical Research Branch of the National Institute of Mental Health (NIMH) sponsored an Outcome Measures Project in 1975 in order to better evaluate the effectiveness of different psychotherapies through the potential creation of a core battery of instruments to ease the comparison and integration of research findings (Waskow & Parloff, 1975). Since that time there have been many developments in the field, yet a core battery has not been attained. Recently, the core battery idea has been revived, and work has intensified to narrow and operationalize a core battery of tests and measures. In 1994 the American Psychological Association (APA) supported a conference at Vanderbilt University to discuss questions such as the following: Would a core battery be most useful as a small number of instruments to be used for all studies of outcome, or perhaps in the form of algorithms or flow diagrams that consider any aspect of a person's functioning essential and specific to a certain diagnostic category? What needs to be measured? What criteria should be used to evaluate outcome measures? What instruments should be used? These questions were discussed and proposals were made by different groups of experts working on separate diagnostic categories of anxiety
disorders, mood disorders, and personality disorders (Horowitz, Strupp, Lambert, & Elkin, 1997; see Strupp, Horowitz, & Lambert, 1997).

The history of assessing outcome suggests several guidelines for the use of tests in future research and practice. The most important of these guidelines show up in the current tendencies to clearly specify what is being measured, so that replication is possible; measure change from multiple perspectives; employ different types of rating scales and methods; employ symptom-based atheoretical measures; and examine, to some extent, patterns of change over time. These practices are an improvement over the past, and they are highlighted further in the following sections.

The Current State of Outcome Assessment: Diversity if Not Chaos

Common Measures of Outcome

All measures of outcome have weaknesses, yet using measures that have a history of frequent use provides advantages that are not available with new or infrequently used measures. Primary among these advantages is easy comparison across studies, which makes it possible to judge the degree to which patients begin therapy at equivalent levels of pathology and the degree to which they show comparable changes following treatment. Several surveys have summarized measures that occur frequently in studies of psychotherapy. Lambert (1983) reported that the following self-report scales were the most commonly used outcome measures in the Journal of Consulting and Clinical Psychology from 1976 to 1980: State-Trait Anxiety Inventory (STAI), Minnesota Multiphasic Personality Inventory (MMPI), Rotter Internal-External Locus of Control, S-R Inventory of Anxiousness, and the Beck Depression Inventory (BDI). In their review of 21 separate American journals published between 1983 and 1988, J. E. Froyd, Lambert, and J. D. Froyd (1996) summarized usage data from 334 outcome studies. The most frequently used self-report scales were the BDI, STAI, Symptom Checklist-90 (SCL-90), Locke-Wallace Marital Adjustment Inventory, and the MMPI. In another review of articles in the Journal of Consulting and Clinical Psychology (1986-1991), Lambert and McRoberts (1993) reviewed 116 studies of psychotherapy with adults. The frequency of outcome measures, categorized by source, is presented in Table 4.1. As can be seen, the measures employed continue to be similar within the category of self-report methodology. Clearly, the BDI, STAI, and SCL-90 remain the most popular measures used across a broad sampling of disorders. As one moves within a specific disorder, the listing of most frequently used scales can be expected to change, although it appears that the scales just mentioned, by virtue of their focus on anxiety or depression, remain relevant and popular.

Beyond self-report methodology, there is less consensus within categories of usage. The Hamilton Rating Scale for Depression (Hamilton, 1967) was used frequently in the studies reviewed by Lambert (1983) and J. E. Froyd et al. (1996). In the more recent survey, it remains relatively popular, either in the hands of the therapist or through the use of expert raters. The Locke-Wallace Marital Adjustment Inventory (Locke & Wallace, 1959) is the most frequently used specific scale employed with significant others,

TABLE 4.1
Commonly Used Inventories and Methods of Assessment

Self-Report (N = 384)                                  No. of Times Used   % of Total
  Beck Depression Inventory                                   40             10.4
  Experimenter-created scales or questionnaires               37              9.6
  Diary behavior and/or thoughts                              27              7.0
  State-Trait Anxiety Inventory                               14              3.6
  SCL-90-R                                                    12              3.1
  Minnesota Multiphasic Personality Inventory                  6              1.6
  Dysfunctional Attitude Scale                                 6              1.6
  Hassles Scale                                                5              1.3
  Schedule for Affective Disorders and Schizophrenia           5              1.3

Instrumental (N = 50)                                        No.               %
  Heart rate                                                   9               18
  Blood pressure                                               7               14
  Weight                                                       5               10
  Saliva composition                                           5               10
  CO level                                                     3                6
  Respiration rate                                             2                4

Significant Others (N = 15)                                  No.               %
  Information on specific behavior                             5               33
  Problem checklist by informant                               6               40
  Single use of measure of family functioning                  3               20
    (e.g., Family Life Symptom Checklist, Family Environment,
    Family Scale Adjustment Questionnaire)

Therapist (N = 66)                                           No.               %
  Interview-global or level of functioning ratings            35               53
  Hamilton Rating Scale for Depression*                       14               21

Trained Observer (N = 67)                                    No.               %
  Frequency of specific behavior                              13               19
  Rating of behavior or subject characteristics               27             40.3
  Interview of subject                                        12             17.9

Note. From Lambert and McRoberts (1993).
* This scale also was counted as a trained observer measure when it was administered by someone other than the therapist.
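
The percentages in Table 4.1 are each measure's share of the total uses within its source category. As a minimal sketch, the self-report column can be rechecked in Python. The pairing of counts with instrument names below assumes they follow the order in which they appear in the table, and the function names are hypothetical, introduced only for illustration.

```python
# Self-report usage counts as read from Table 4.1 (Lambert & McRoberts, 1993).
# ASSUMPTION: counts are paired with instrument names in the order both
# appear in the table; the original layout may have differed.
SELF_REPORT_TOTAL = 384  # N given in the table header

self_report_counts = {
    "Beck Depression Inventory": 40,
    "Experimenter-created scales or questionnaires": 37,
    "Diary behavior and/or thoughts": 27,
    "State-Trait Anxiety Inventory": 14,
    "SCL-90-R": 12,
    "Minnesota Multiphasic Personality Inventory": 6,
    "Dysfunctional Attitude Scale": 6,
    "Hassles Scale": 5,
    "Schedule for Affective Disorders and Schizophrenia": 5,
}


def percent_of_total(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def usage_shares(counts, total):
    """Map each measure name to its percentage share of all uses."""
    return {name: percent_of_total(n, total) for name, n in counts.items()}


shares = usage_shares(self_report_counts, SELF_REPORT_TOTAL)
listed_uses = sum(self_report_counts.values())

for name, share in shares.items():
    print(f"{name}: {share}%")
print(f"Named measures: {listed_uses} of {SELF_REPORT_TOTAL} uses")
```

Under this reading, the computed shares agree with the percentages printed in the table, and the nine named measures together account for fewer than half of the 384 self-report uses.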

but it is typically employed in marital therapies where both partners are the recipients of treatment. Despite the fact that some measures are used repeatedly, their frequency is still not high. For example, of the 384 uses of self-report scales reported in Table 4.1, the MMPI made up only 1.6% of the total. It is only a commonly used measure in the relative sense of the word. It is startling to discover the seemingly endless number of measures used to objectify outcome. J. E. Froyd et al. (1996) consulted journals that were representative of a broad range of therapy as practiced and reported in contemporary professional literature. A total of 1,430 outcome measures were identified and represented a wide variety of patient diagnoses, treatment modalities, and therapy types. Of this rather large number, 840 different measures were used just once. A second review looked at a more homogeneous set of studies in which data on agoraphobia outcome published during the 1980s (Ogles, Lambert, Weight, & Payne, 1990) located 106 studies using 98 unique outcome measures. This occurred in a well-defined, limited disorder treated with an equally narrow range of interventions, mainly behavioral and cognitive-behavioral therapies. Similar conclusions have been drawn by E. A. Wells, Hawkins, and Catalano (1988), who reported more than 25 ways to measure drug usage in addiction outcome research. The proliferation of outcome measures (a sizable portion of which were unstandardized scales) is overwhelming and could be expanded if consideration is given to the fact that some measures (e.g. HRSD) are actually not single scales, but scales with multiple variants. C. T. Grundy, Lunnen, Lambert, Ashton, and Tovey (1994), for example, found more than a dozen versions of the HRSD. 
Those who assess change have not agreed on a standard battery of tests and procedures even within homogeneous patient populations, but progress is being made toward this end, as mentioned previously (Strupp et al., 1997). The seeming disarray of instruments is partly a function of the complex and multifaceted nature of psychotherapy outcome, as reflected in the divergence in clients and their problems, treatments, and underlying assumptions and techniques, and in the multidimensionality of the change process itself. But it also represents the struggle (failure) of scientists and practitioners to agree on valued outcomes. Indeed, measuring the outcomes of psychotherapy promises to be a hardy perennial for years to come.

Change is Complex

Although most current outcome research studies focus on seemingly homogeneous samples of patients (e.g., unipolar depression, agoraphobia), it is clear that each patient is unique and brings unique problems to treatment. For example, although the major complaint of a person is summed up as "depression" and this person meets diagnostic criteria for major depression, this same patient can have serious interpersonal problems, somatic concerns, evidence of anxiety, financial difficulties, problems at work, problems parenting children, substance abuse, and so on. These diverse problems may be addressed in therapy, and proper assessment of outcome may require that changes in all these problems be measured. Obviously, this is a demanding task that cannot be accomplished fully in any particular study or by the practitioner. The complexity of human behavior and the complexity of theories and conceptions of human behavior invite incredible complexity in operationalizing the changes that occur as a result of psychotherapy. For example, Williams (1985) documented considerable evidence that, even within the seemingly limited diagnosis of agoraphobia, there is considerable diversity among
patients. He noted that there is considerable diversity in the situations that provoke panic across patients, including numerous phobias that often appear as simple phobias (e.g., fear of flying, heights). At the same time, the most frequent panic provoking situation (driving on freeways) was rated as "no problem" by nearly 30% of agoraphobics. The typical agoraphobic is severely handicapped in some situations, moderately handicapped in others, and not at all restricted in other situations. Furthermore, agoraphobics have many fears that are common to social phobia (e.g., fear of causing a public disturbance, being stared at) as a primary or secondary fear. They also have many somatic complaints for which they often and persistently seek medical consultation for a physical diagnosis even after agoraphobia is diagnosed. These fears overlap with both hypochondriacal and hysterical disorders. "The configuration of fears in agoraphobics is so highly idiosyncratic that it is substantially true that no two agoraphobics have exactly the same pattern of phobias, and that two people with virtually no overlapping areas of phobia disability can both be called agoraphobic" (Williams, 1985, p. 112). It is clear that however specific a diagnosis may seem, the term does not denote only a precise set of symptoms that occur independently of other symptoms. Thus, a single measure cannot hope to capture the complications of psychological functioning or adequately evaluate therapeutic change, because no single measure of disability can routinely capture the complexity of the individual patient. Given the great complexity in the persons to be treated, it can be suggested that researchers begin studying outcome by identifying major targets of treatment, while accepting that the resulting picture of change will be far from complete. Although this fact cannot be changed, its implications can be recognized. 
For example, those who produce as well as those who consume psychotherapy research (e.g., the insurance industry and government policymakers) need to show due modesty in the conclusions they draw from research. Given repeated failures to capture the complexity of patient functioning and change, it is little wonder that many practitioners are not avid consumers of psychotherapy research. Nevertheless, being critical of psychotherapy research does not offer a solution to the problem of complexity. One can deal with some of the complexities of outcome assessment by employing a conceptual scheme that helps organize issues and procedures.

Conceptualizing Measures and Methods

Various nonbinding conceptual schemes have been put forth in order to better organize the multidimensional chaos (see Lambert, Ogles, & Masters, 1992; see also Schulte, 1995). For example, McLellan and Durell (1996) suggested four areas for assessment of outcome: reduction of symptoms; improvement in health, personal, and social functioning; cost of care; and reduction in public health and safety threats. Docherty and Streeter (1996), on the other hand, suggested seven dimensions that must be considered in outcome assessment: symptomatology, social/interpersonal functioning, work functioning, satisfaction, treatment utilization, health status/global wellbeing, and health-related quality of life. The previous two examples take a rather broad view of the topic. The following is a conceptual scheme that concerns itself more narrowly with issues that surround the collection of outcome data in research that focuses on the individual patient rather than the health system or the administrative area.

The conceptual scheme offered purposely ignores psychotherapy theories and paradigms. Yet, theoretical concerns play a major role in determining what clinicians value as outcome appraisers (Cohen, 1980). Theory may play a prominent role in the choice of outcome assessment measures, especially for therapists who subscribe to a particular position. However, the use of instruments that are relevant within a theoretical system will not certify effectiveness or prove convincing to the parties interested in the effectiveness of differing forms of psychotherapy. Third-party reimbursers, clinicians, academicians, and the public at large will be influenced significantly if changes in clients can be shown through research to be practically important. Instruments associated with a variety of sources, numerous content areas and social levels, and varying time orientations will be most credible in demonstrating the value of psychotherapy. Table 4.2 presents an evolving and useful conceptual scheme that organizes several important dimensions in measuring outcome (Ogles, Lambert, & Masters, 1996). These dimensions are social level of outcome assessment, the methods (technology) used in outcome assessment, sources that generate outcome data, content of outcome measure, and time orientation. The first three areas are discussed fully. The last two dimensions can be briefly summed up. Time orientation is a dimension that describes the degree to which an instrument measures statelike, unstable constructs versus traitlike, stable constructs. The main question of research for this dimension is whether particular patient attributes remain consistent over time and whether change following treatment is characterized by different patterns (e.g., Howard, Lueger, Maling, & Martinovich, 1993, suggested that changes in patient morale occur early while changes in functioning take longer).
The content dimension considers behavioral, cognitive, and affective faculties (including physiology as a subset of behavior) to answer the question: What psychological area is being measured?

TABLE 4.2 Scheme for Organizing and Selecting Outcome Measures

Social Level      Source              Technology      Content       Time Orientation
Intrapersonal     Self                Global          Cognition     Trait
  1                 1                   1               1             1
  2                 2                   2               2             2
  *                 *                   *               *             *
Interpersonal     Therapist           Specific        Affect        State
  1                 1                   1               1             1
  2                 2                   2               2             2
  *                 *                   *               *             *
Social role       Trained observer    Observation     Behavior
  1                 1                   1               1
  2                 2                   2               2
  *                 *                   *               *
                  Relevant other      Status
                    1                   1
                    2                   2
                    *                   *
                  Institutional
                    1
                    2
                    *

Note. The numbers and asterisks below each area represent the idea that there can be subcategories such as kinds of intrapersonal events or kinds of interpersonal measures, etc.
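The scheme in Table 4.2 can be read as a five-field classification record for each outcome measure. The following is a minimal sketch of that reading, not a procedure described in the chapter: the class name `MeasureProfile`, the helper `uncovered`, and the codings assigned to the two example measures are illustrative assumptions. Only the category labels come from the table.

```python
from dataclasses import dataclass

# Category labels copied from Table 4.2.
SOCIAL_LEVELS = ("intrapersonal", "interpersonal", "social role")
SOURCES = ("self", "therapist", "trained observer", "relevant other", "institutional")
TECHNOLOGIES = ("global", "specific", "observation", "status")
CONTENTS = ("cognition", "affect", "behavior")
TIME_ORIENTATIONS = ("trait", "state")


@dataclass(frozen=True)
class MeasureProfile:
    """Position of one outcome measure on the five dimensions (hypothetical helper)."""
    name: str
    social_level: str
    source: str
    technology: str
    content: str
    time_orientation: str

    def __post_init__(self):
        # Reject labels that do not appear in Table 4.2.
        checks = (
            (self.social_level, SOCIAL_LEVELS),
            (self.source, SOURCES),
            (self.technology, TECHNOLOGIES),
            (self.content, CONTENTS),
            (self.time_orientation, TIME_ORIENTATIONS),
        )
        for value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{value!r} is not one of {allowed}")


def uncovered(profiles, dimension, allowed):
    """Return the categories of one dimension that no measure in the battery covers."""
    used = {getattr(p, dimension) for p in profiles}
    return [category for category in allowed if category not in used]


# Illustrative codings only; a real study would justify each assignment.
battery = [
    MeasureProfile("Beck Depression Inventory",
                   "intrapersonal", "self", "specific", "affect", "state"),
    MeasureProfile("Therapist global improvement rating",
                   "intrapersonal", "therapist", "global", "affect", "state"),
]

print(uncovered(battery, "source", SOURCES))
print(uncovered(battery, "social_level", SOCIAL_LEVELS))
```

Listing the uncovered categories makes it easy to see where a planned battery is narrow, for example when every measure sits at the intrapersonal level.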


Social Level of Outcomes Assessment

The first dimension listed in Table 4.2 is entitled "Social Level." Social level areas covered by outcome measures can be divided into intrapersonal, interpersonal, and social role performance. Thus, social level can be seen as a dimension that reflects the need to assess changes that occur within the client, in the client's intimate relationships, and, more broadly, in the client's participation in community and social roles. This dimension can be considered a continuum that represents the degree to which an instrument measures subjective discomfort, intrapsychic attributes, and bodily experiences, versus characteristics of the client's participation in the interpersonal world. It is a matter of intellectual curiosity and values, if not empirical importance, to know about the social level of the changes that are targeted and modified in treatment efforts. Empirically, the results of outcome studies are more impressive when social level is measured broadly, considering more than a single area, because interventions can have side effects as well as more and less extensive effects. Certainly, these social level areas reflect the values and interests of clients, mental health providers, third-party payers, government agencies, and society at large. In the J. E. Froyd et al. (1996) review, 74% of outcome measures focused on the intrapersonal social level, whereas 17% and 9% focused on interpersonal and social role performance, respectively. To date, changes in these last two areas have been underrepresented in outcome research. Several issues other than social level are represented in the conceptual scheme. Another dimension that is of central importance in outcome assessment is the source from which data are generated.

Change Should be Measured from Multiple Perspectives

In the ideal study of change, all the parties involved who have information about change might be represented.
This would include the client, therapist, relevant (significant) others, trained judges (or observers), and societal agencies that store information, such as employment and educational records. Unlike measurement in the physical sciences, measurement in psychotherapy is highly affected by the politics and biases of those providing the data. It is seldom possible to merely observe phenomena of interest without seeing them through some filtering lens. The source dimension is a sort of hierarchy, beginning with those most involved with therapy (client and therapist) and moving to the more remote (others and institutions). Yet, it is a fluid dimension and the ordering of the specific source influence could change in a particular setting or circumstance. In the aforementioned review of 116 outcome studies found in the Journal of Consulting and Clinical Psychology (JCCP) between 1986 and 1991, Lambert and McRoberts (1993) examined research practices related to assessment source. Specific outcome measures were classified into five source categories: self-report, trained observer, significant other, therapist, or instrumental (a category including societal records or instruments, such as physiological recording devices). Frequency data were then computed on the usage of specific instruments and instrument sources across studies. As may be expected, the most popular source for outcome data was the client. In fact, 25% of the studies used client self-report data as the sole source for evaluation. Of the studies that relied solely on this type of self-report scale, three fourths used more than a single self-report scale. The next most frequent procedure employed two data sources simultaneously (self-report and observer ratings). This combination occurred in 20% of the studies,
followed by self-report and therapist ratings at 15% and self-report and instrumental sources at 8%. A self-report scale was used alone or in combination in over 90% of the studies. Significant other ratings rarely were employed. They were utilized alone or in combination with some other data sources in about 9% of the studies reported. The therapist rated outcome alone or in combination with other measures in about 25% of the studies. Impressively, 30% of the studies used six or more instruments to reflect changes in patients. The most ambitious effort had a combination of 12 distinct measures to assess changes following psychotherapy. Clearly, one of the most important conclusions to be drawn from past psychotherapy outcome research is that the results of studies can be misunderstood easily and even misrepresented through failure to appreciate the effects that different perspectives can have in reflecting the degree of change that results from therapy. The necessary and, to some degree, common practice of applying multiple criterion measures in research studies (Lambert, 1983) has made it obvious that multiple measures from different sources do not yield unitary results. For example, in studies using multiple criterion measures, a specific treatment used to reduce seemingly simple fears may result in a decrease in behavioral avoidance of the feared object (provided by observers), but may not affect the self-reported level of discomfort associated with the feared object (Mylar & Clement, 1972; Ross & Proctor, 1973; Wilson & Thomas, 1973). Likewise, a physiological indicator of fear may show no change in response to a feared object as a result of treatment, whereas improvement in subjective self-report will be marked (Ogles et al., 1990). Relying on different sources of assessment can have an impact on conclusions (e.g., Glaister, 1982). 
In a review of the effects of relaxation training, Glaister found that relaxation, in contrast to other procedures (mainly exposure), had its principal impact on physiological indices of change. Indeed, it was superior to other treatments in 11 of 12 comparisons, whereas the other (exposure) conditions were superior in 28 of 38 comparisons using verbal reports of improvement by the patients. On behavioral measures (including assessor ratings), neither exposure nor relaxation appeared superior. Farrell, Curran, Zwick, and Monti (1983), although showing that raters can discriminate social skill deficits from anxiety level on the Simulated Social Skills Test, also found that there was poor correspondence between self-ratings and behavior ratings of these variables. This lack of convergence between measurement methods also was apparent when physiological measures were added (Monti, Wallender, Ahern, Abrams, & Munroe, 1983). Little convergent validity was found for measurement method. It appears that different measures of the same target problem often disagree (e.g., self-report of sexual arousal and physiological measures; Sabalis, 1983). This conclusion is supported further by factor analytic studies that have combined a variety of outcome measures. The main factors derived from some "older" studies that employed factor analytic data tend to be associated closely with the measurement method or the source of observation used in collecting data, rather than being identified by some theoretical or conceptual variable that would be expected to cut across techniques of measurement (Cartwright, Kirtner, & Fiske, 1963; Forsyth & Fairweather, 1961; Gibson, Snyder, & Ray, 1955; Shore, Massimo, & Ricks, 1965). A more recent example was reported by Pilkonis, Imber, Lewis, and Rubinsky (1984), who factor analyzed 15 scales representing a variety of traits and symptoms from the client, therapist, expert judges, and significant others.
These scales were reduced to three factors that most clearly represented the source of data, rather than the content of the scale. Beutler and Hamblin (1986) reported similar results.
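The pattern described above, in which scales group by who supplied the rating rather than by what was rated, can be illustrated with a small simulation. This is a sketch on simulated data under stated assumptions, not a reanalysis of any cited study: each score is built from a content factor, a source factor, and noise, and the weights (0.4 for content, 0.8 for source, 0.5 for noise) are arbitrary values chosen so that source variance dominates.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=2000, content_weight=0.4, source_weight=0.8, noise_weight=0.5, seed=0):
    """Simulate scores for every (content, source) pair; all weights are assumptions."""
    rng = random.Random(seed)
    contents = ("symptoms", "adjustment")
    sources = ("client", "therapist")
    scores = {(c, s): [] for c in contents for s in sources}
    for _ in range(n):
        # One latent value per content area and per rating source for this patient.
        content_factor = {c: rng.gauss(0, 1) for c in contents}
        source_factor = {s: rng.gauss(0, 1) for s in sources}
        for c in contents:
            for s in sources:
                scores[(c, s)].append(
                    content_weight * content_factor[c]
                    + source_weight * source_factor[s]
                    + noise_weight * rng.gauss(0, 1)
                )
    return scores


scores = simulate()
# Same source, different content: client symptoms vs. client adjustment.
same_source = pearson(scores[("symptoms", "client")], scores[("adjustment", "client")])
# Same content, different source: client symptoms vs. therapist symptoms.
same_content = pearson(scores[("symptoms", "client")], scores[("symptoms", "therapist")])
print(f"same source, different content: r = {same_source:.2f}")
print(f"same content, different source: r = {same_content:.2f}")
```

With these weights the population correlations are 0.64/1.05 (about .61) for scales sharing a source and 0.16/1.05 (about .15) for scales sharing content, so a factor analysis of such data would recover source factors first.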


However, few studies have recognized or adequately dealt with the complexities that result from divergence between sources, although creative efforts and some progress have been made. Berzins, Bednar, and Severy (1975) directly addressed the issue of consensus among criterion measures. They studied the relation among outcome measures in 79 client-therapist dyads using the MMPI, the Psychiatric Status Schedule, and the Current Adjustment Rating Scale. Sources of outcome measurement involved the client, therapist, and trained outside observers. Data from all three sources and a variety of outcome measures showed generally positive outcomes for the treated group as a whole at termination. There was the usual lack of consensus between criterion measures. However, the primary thesis of Berzins et al. (1975) was that problems of intersource consensus can be resolved through the application of alternatives to conventional methods of analysis. The principal components analysis showed four components: changes in patients' experienced distress as reported by clients on a variety of measures; changes in observable maladjustments as noted by psychometrist, client, and therapist (an instance of intersource agreement); changes in impulse expression (an instance of intersource disagreement between psychometrist and therapist); and changes in self-acceptance (another type of client-perceived change). The practical implication of these results is that a single criterion might suffice for measuring changes in one area of interest, such as maladjustment, whereas this practice would be misleading if a single criterion were employed in another area of functioning, such as impulse control. J. Mintz, Luborsky, and Christoph (1979) addressed the question of intersource consensus by analyzing data in two large uncontrolled studies of psychotherapy: the Penn Psychotherapy Project and the Chicago study, reported by Cartwright et al. (1963).
They reported that there was substantial agreement among the viewpoints of patient, therapist, and outside raters when outcome was defined broadly as posttherapy adjustment or overall benefit. They concluded that, contrary to common opinion, consensus measures of psychotherapy outcome could be defined meaningfully. Despite this consensus, they noted that "distinct viewpoints do exist" (p. 331). In fact, when considering the effect sizes reported by J. Mintz et al. (1979), it is clear that the amount of improvement varied across measures, ranging at a minimum from .52 to .93 on pre- to posttreatment change. The lowest effect size came from the MMPI Hypochondriasis scale and the highest from the Inventory of Social and Psychological Functioning, an observer rating of social adjustment. In addition, although correlations between viewpoints were statistically significant, they were often low. For example, in the Chicago data (N = 93), correlations between viewpoints on ratings of adjustment ranged from .39 to .59. The lack of consensus across sources of outcome evaluation, especially when each source presumably is assessing the same phenomenon, has been viewed as a threat to the validity of data. Indeed, it appears that outcome data provide evidence about changes made by the individual, as well as information about the differing value orientations and motivations of the individuals providing outcome data. This issue has been dealt with in several ways, ranging from discussion of "biasing motivations" and ways to minimize bias, to discussions of the value orientation of those involved (Docherty & Streeter, 1996; Strupp & Hadley, 1977). The consistency of findings illuminating factors associated with the source of ratings, rather than the content of patient problems, highlights the need to pay careful attention to divergence of changes that follow psychological interventions, and the way information from different perspectives is analyzed and reported in outcome studies.
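One way to see why correlations of .39 to .59 between viewpoints count as low is to convert them to shared variance, the square of the correlation. The short sketch below does that arithmetic for the two endpoints quoted from the Chicago data; the function name `shared_variance` is an illustrative choice, and the conversion is standard statistics rather than a step reported by the cited authors.

```python
def shared_variance(r):
    """Proportion of variance two sets of ratings share, given their Pearson correlation r."""
    return r ** 2


for r in (0.39, 0.59):
    print(f"r = {r:.2f} -> shared variance = {shared_variance(r):.0%}")
# prints:
# r = 0.39 -> shared variance = 15%
# r = 0.59 -> shared variance = 35%
```

Even at the upper end, roughly two thirds of the variance in one viewpoint's ratings of adjustment is not accounted for by another viewpoint's ratings.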
The fact that source factors have been replicated across a variety of scales, patient populations, and three or four decades also suggests that these findings are robust. It is clear that outcome studies need to collect outcome data
from a variety of sources. Finding ways to combine these data to estimate overall change remains a task for future research.

Technology of Change Measures

In addition to selecting different sources to reflect change, the technology used in devising scales can have an impact on the final index of change. Measures vary in the degree to which they show large versus small effects in studies of outcome. Smith, Glass, and Miller (1980) suggested that several factors associated with rating scales affect estimates of psychotherapy outcome. When summed, these factors were labeled "reactivity." The variables of importance were the degree to which a measure could be influenced by either the client or therapist, the similarity between therapy goals and the measure, and the degree of blinding in the assessment process. The correlation between these dimensions (ratings of reactivity) and effect size was .18. This was a statistically significant and substantial relation. The type of outcome measure was categorized. Those measures showing the highest effect sizes were ratings of fear and anxiety, vocational or personal development, emotional somatic complaints, and measures of global adjustment. Those with the smallest effect sizes were personality traits, life indicators of adjustment, work, and school achievement. Table 4.2 lists several different technologies (or procedures) that have been employed in outcome measurement. These include evaluation (global ratings including measures of client satisfaction), description (specific symptom indexes), observation (behavioral counts), and status (physiological and institutional measures). Unfortunately, these procedures for collecting outcome data on patient change vary simultaneously on several dimensions, making it difficult to isolate the aspect of the measurement method that may be most important. For example, J. Mintz, L. I.
Mintz, Arruda, and Hwang (1993), studying treatments of depression, showed that symptomatology regularly remitted before positive changes were found in social and work functioning. However, the extent to which content versus technology played a role in the difference is not clear. A broad dimension on which technologies vary appears to be a direct-indirect dimension. Here, the data are seen as possibly reflecting a bias determined by the propensity of subjects to produce effects consciously. Thus, global ratings of outcome and client satisfaction measures call for (either implicitly or explicitly) raters (usually clients) to directly evaluate outcome. Their attention is drawn to the question, "Did I get better in therapy?" In contrast, specific symptom indices focus the raters' attention (before and after treatment) on the status of specific symptoms and signs at the time the rating is made without explicit references to the outcome of therapy. Although there is still knowledge at posttesting that the therapy (or even the therapist) is being evaluated directly, the tendency to rate change is diminished compared with global ratings. Observer ratings in the form of behavioral counts can be even more objective if enough attention is devoted to the procedures that are used. Ideally, these observer ratings call for counting behaviors in real-life circumstances, in which the patients do not know they are being observed or have plenty to focus on besides the impression they are making on the observer. Physiological monitoring usually is not under the conscious control of the patient or, at the very least, presents a real and serious challenge to conscious distortion. Institutional measures, such as grade point average (GPA), are usually the culmination of a host of complex behaviors influenced by a wide variety of
factors, and usually this type of behavior is produced without reference to the research project. Therefore, this type of data may be the least reactive data that can be collected. Green, Gleser, Stone, and Siefert (1975) compared final status scores, pretreatment to posttreatment difference scores, and direct ratings of global improvement in 50 patients seen in brief crisis-oriented psychotherapy. The Hopkins Symptom Checklist was filled out by the patient, whereas a research psychiatrist rated the patient on the Psychiatric Evaluation Form and the Hamilton Depression Rating Scale. Ratings of global improvement were made by the patient and the therapist. Green et al. (1975) concluded that the type of rating scale used has a great deal to do with the percentage of patients considered improved, more so, in fact, than improvement per se. They also suggested that outcome scores have more to do with the finesse of rating scales than whether ratings are objective. Global improvement ratings by therapists and patients showed very high rates of improvement, with no patients claiming to do worse. When patients had to rate their symptoms more specifically, however, as with the Hopkins Symptom Checklist, they were likely to indicate actual intensification of some symptoms and to provide more conservative data than gross estimates of change (see Garfield, Prager, & Bergin, 1971). Ratings of client satisfaction are valuable as indices of outcome and often produce results correlated with other technologies and methods of assessing outcome (Berger, 1983). Waxman (1996), for example, showed that of 22 patients having the poorest treatment response (as measured by the Brief Symptom Index), 21 were also the most dissatisfied with treatment.
But, because satisfaction scales usually do not provide the kind of theoretically or practically important information desired in outcome research (e.g., the specific kind of symptoms or problems that are changing during therapy), they are not valued highly in theoretically oriented work. Nevertheless, they can provide important data about satisfaction and limited information about improvement, albeit softer data than those usually sought in formal research (Lambert, Christensen, & DeJulio, 1983). Although measures of satisfaction and gross ratings of outcome have been eschewed in efficacy research (e.g., clinical trials), they are frequently mentioned as important by those conducting research in managed care organizations (Gerarty, 1996). However, patient satisfaction is often independent of other clinical outcomes. It appears that staff (not therapist) capabilities, facility amenities, and, to a large degree, patient expectations account for a significant portion of the variance in patient satisfaction scores (Hall, Elliot, & Stiles, 1992; Hsieh & Kagle, 1991). Satisfaction scores also are highly sensitive to methodological circumstances. For example, phone follow-ups are overly sensitive to interviewer behavior, and they vary a great deal as a consequence of time from discharge, anonymity of feedback, face-to-face encounter with the therapist, and perceived purpose of the data gathering effort.

Outcome Assessment Must be Sensitive to Change

As the preceding discussion suggests, a central issue in outcome assessment is the degree to which different measures and measurement methods are likely to reflect changes that actually occur as a result of participation in therapy. For example, if the Beck Depression Inventory (a self-report instrument) is chosen as an outcome measure, will it reflect the same degree of change as the Hamilton Rating Scale for Depression (a clinician rating)? Will gross ratings of overall change provided by the patient show larger or smaller
amounts of improvement than a scale that measures change on specific symptoms? To what extent are the conclusions drawn in comparative outcome studies determined by the specific measures selected by researchers? Do the techniques of meta-analysis actually allow clinicians to summarize across the different outcome measures employed in different studies (essentially combining them), and thereby facilitate accurate conclusions about differential treatment effects? There is a growing body of evidence to suggest that there are reliable differences in the sensitivity of instruments to change. In fact, the differences between measures are not trivial, but large enough to raise questions about the interpretation of research studies. Two examples of such differences will make the importance of instrument selection clear. Table 4.3 presents data from the Ogles et al. (1990) review of agoraphobia outcome studies published in the 1980s. The effect sizes presented (based on pretest-posttest differences) show remarkable disparity in estimates of improvement as a function of the outcome instrument or method of measurement selected for a study. The two extremes based on measuring scales, Fear Survey Schedule (M = .99) and Phobic Anxiety and Avoidance Scale (M = 2.66), suggest different conclusions. The average patient taking the Fear Survey Schedule moved from the mean (50th percentile) of the pretest group to the 16th percentile after treatment. In contrast, the average patient being assessed with measures of Phobic Anxiety/Avoidance moved from the 50th percentile of the pretest group to the .00 percentile of the pretest group following treatment. Comparisons between the measures depicted in Table 4.3 are confounded somewhat by the fact that the data were aggregated across all studies that used either measure. But similar results can be found when only studies that give both measures to a patient sample are aggregated.
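The percentile statements above follow from reading an effect size as a shift in standard deviation units on the pretest distribution. The sketch below reproduces that conversion under two assumptions that the text does not state explicitly: pretest scores are normally distributed, and lower scores mean fewer symptoms. The function name `posttest_percentile` is illustrative.

```python
from statistics import NormalDist


def posttest_percentile(effect_size):
    """Percentile of the pretest distribution reached by the average treated patient.

    Assumes normally distributed pretest scores and a scale on which lower is better,
    so an improvement of `effect_size` standard deviations moves the patient downward.
    """
    return 100 * NormalDist().cdf(-effect_size)


for scale, es in (("Fear Survey Schedule", 0.99),
                  ("Phobic anxiety and avoidance", 2.66)):
    print(f"{scale}: ES = {es:.2f} -> {posttest_percentile(es):.1f}th percentile")
# prints:
# Fear Survey Schedule: ES = 0.99 -> 16.1th percentile
# Phobic anxiety and avoidance: ES = 2.66 -> 0.4th percentile
```

Under these assumptions the second value is about the 0.4th percentile, that is, a proportion of roughly .004, which rounds to the .00 reported in the text.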
Table 4.4 presents data from a comparison of three frequently employed measures of depression: the Beck Depression Inventory (BDI) and Zung Self-Rating Scale for Depression (ZSRS), self-report inventories; and the Hamilton Rating Scale for Depression (HRSD), an expert judge rating (Lambert, Hatch, Kingston, & Edwards, 1986). Meta-analytic results suggest that the most popular dependent measures used to assess depression following treatment provide reliably different pictures of change. It appears that the HRSD, as employed by trained professional interviewers, provides a significantly larger index of change than the BDI and ZSRS.

TABLE 4.3 Overall Effect Size (ES) Means and Standard Deviations by Scale

Scale                            N^a    M_ES    SD_ES
Phobic anxiety and avoidance     65     2.66    1.83
Global Assessment Scale          31     2.30    1.14
Self-Rating severity             52     2.12    1.55
Fear Questionnaire               56     1.93    1.30
Anxiety during BAT^b             48     1.36     .85
Behavioral Approach Test         54     1.15    1.07
Depression measures              60     1.11     .72
Fear Survey Schedule             26      .99     .47
Heart rate                       21      .44     .56

Note. Based on Ogles, Lambert, Weight, and Payne (1990).
a N = the number of treatments whose effects were measured by each scale.
b BAT = Behavioral Avoidance Test.

TABLE 4.4 Matched Pairs of Mean Effect Size (ES) Values

Scale Pair    N^a    M_ES^b            SD_ES        t
HRSD/ZSRS     17     0.94*/0.62*       0.61/0.30    1.88
BDI/HRSD      49     1.16**/1.57**     0.86/1.08    2.11
ZSRS/BDI      13     0.70/1.03         0.46/0.52    1.65

Note. HRSD = Hamilton Rating Scale for Depression, ZSRS = Zung Self-Rating Scale, BDI = Beck Depression Inventory. From Lambert, Hatch, Kingston, and Edwards (1986). Reprinted by permission of the American Psychological Association and authors.
a N = the number of treatments whose effects were measured by each pair of depression scales.
b Values derived from studies in which subjects' depression was measured on two scales at a time. Effect size represents within-study comparisons.
* p < .05 **p < .25

Because the amount of actual improvement that patients experience after treatment is never known, these findings are subject to several different interpretations. It may mean that the HRSD overestimates patient improvement, but it could be argued just as easily that the HRSD accurately reflects improvement and the BDI and ZSRS underestimate the amount of actual improvement. Both over- and underestimation also may be suggested, with true change falling somewhere in between the HRSD estimate and those provided by the BDI and ZSRS. However, there are reliable differences between measures and these differences need to be explored and understood. Do self-report scales generally produce smaller effects than expert judge ratings? Is the difference due to the fact that the content of the scales is not identical or that different sources are providing the data? Additional meta-analytic data suggest further differences between the size of treatment effects produced by different outcome measures (cf. Miller & Berman, 1983; Ogles et al., 1990; D. A. Shapiro & D. Shapiro, 1982; Smith et al., 1980).
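Table 4.4 compares two scales within the same treatments, which is the situation a paired-samples t statistic is designed for. The sketch below shows that computation on made-up numbers, assuming (the table does not say so explicitly) that the t column was obtained this way. The effect sizes in `hrsd` and `bdi` are hypothetical and are not the data behind the table.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(es_a, es_b):
    """Paired-samples t statistic and degrees of freedom for matched effect sizes."""
    if len(es_a) != len(es_b) or len(es_a) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    # Work with the within-treatment difference between the two scales.
    diffs = [a - b for a, b in zip(es_a, es_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical effect sizes for five treatments, each measured on both scales.
hrsd = [1.2, 0.9, 1.6, 1.1, 0.8]
bdi = [0.9, 0.8, 1.1, 1.0, 0.6]

t, df = paired_t(hrsd, bdi)
print(f"t({df}) = {t:.2f}")
```

Because each treatment contributes a difference score, features shared by both scales within a study (sample, therapist, setting) cancel out, and what remains is the disagreement between the instruments themselves.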
Abstracting from these and related studies, the following conclusions tentatively can be drawn: (a) Therapist and expert judge-based data, in which judges are aware of the treatment status of clients, produce larger effect sizes than self-report data, data produced by significant relevant others, institutional data, or instrumental data. (b) Gross ratings of change produce larger estimates of change than ratings on specific dimensions or symptoms. (c) Change measures based on the specific targets of therapy (such as individualized goals or anxiety-based measures taken in specific situations) produce larger effect sizes than more distal measures, including tests of personality. (d) Life adjustment measures that tap social role performance in the natural setting (e.g., GPA) produce smaller effect sizes than more laboratory-based measures. (e) Measures collected soon after therapy show larger effect sizes than measures collected at a later date. And (f) physiological measures such as heart rate usually show relatively small treatment effects. These tentative conclusions are worthy of continual exploration in future research. This research should replicate past research in at least one regard: instructing those who provide data (such as clients) to report their status or judgments as honestly as possible. Past research has been interested in discovering the truth about outcomes, not in providing inflated estimates of treatment effects. Many measures are very susceptible to the instructional set given to those who are providing the data. Further research is needed to clarify the various factors that inflate and deflate estimates of change. For now, however, it is clear that dependent measures are not equivalent in their tendencies to reflect change and that meta-analysis, because it typically is used to combine different measures, cannot overcome the differences between measures. The thoughtful researcher and consumer of research will give careful attention to the way in which technology, source, social level, time orientation, and content of measurement affect estimates of improvement and the meaning of the results of outcome studies. The effect size and practical importance of treatment effects remain highly dependent on which dependent measures are used to assess change.

Practical Advances that May Close the Gap Between Research and Practice

Outcome assessment, as part of an applied science of psychotherapy research, should have a direct impact on clinical practice. Two developments in outcome assessment may play an important role in bridging the gap between research and practice: individualizing outcome assessment and assessing clinically significant outcome.

Individualizing Outcome Assessment

As already pointed out, state-of-the-art assessment of outcome relies heavily on the application of atheoretical, monotrait, standardized scales applied with homogeneous patient samples. This practice can be contrasted with the practice of relying on a careful analysis of the unique goals of an individual patient. The possibility of tailoring change criteria to each individual in therapy was mentioned frequently in the 1970s and seemed to offer intriguing alternatives for resolving several recalcitrant dilemmas in measuring change. In the 1980s and early 1990s, there was a new surge of interest in making change measures more idiographic. This interest has been bolstered by the flux of general articles on qualitative research methods (e.g., Polkinghorne, 1991) and the desire to make psychotherapy research more responsive to the needs of the clinician and general clinical practice. Typical of these approaches is the case-formulation method advocated by Persons (1991).
She criticized outcome research for being incompatible with psychotherapy as it actually is practiced. Among her criticisms was the overreliance of research on standardized measures of outcome. She noted that even patients with a homogeneous disorder have a wide range of problems, including work problems, social isolation, financial stresses, medical problems, and tension in relationships with parents, spouse, children, or friends, to name a few. She argued that the typical standardized assessment procedure ignores most of these difficulties, whereas the therapist does not. She further noted that these assessment procedures in psychotherapy, as guided by theory, are idiographic and multifaceted, not standardized and limited to a single problem or related set of symptoms. Her suggestions for improving psychotherapy research called for individualization of outcome: Each patient will have a different set of problems assessed with a different set of measures. Her suggestions have not gone unchallenged (Garfield, 1991; Herbert & Mueser, 1991; Messer, 1991; Schacht, 1991; Silverman, 1991). But Persons (1991) was hardly the first to make such recommendations. For example, Strupp, Schacht, and Henry (1988) argued for the principle of problem-treatment-outcome congruence. Unfortunately, their proposals and Persons's have yet to face the foreboding task of


empirical application. Similar, if not more practical, approaches were undertaken in the 1970s and 1980s with mixed success. One method that has received widespread attention and use is Goal Attainment Scaling (GAS; Kiresuk & Sherman, 1968). GAS requires that a number of treatment goals be set up prior to intervention. These goals are formulated by an individual or a combination of clinicians, client, and/or a committee assigned to the task. For each goal specified, a scale with a graded series of likely outcomes, ranging from least to most favorable, is devised. These goals are formulated and specified with sufficient precision that an independent observer can determine the point at which the patient is functioning at a given time. The procedure also allows for transformation of the overall attainment of specific goals into a standard score. In using this method for the treatment of obesity, for example, one goal could be the specification and measurement of weight loss. A second goal could be reduction of depressive symptoms as measured by a single symptom scale, such as the BDI. Marital satisfaction could be assessed if the patient has serious marital problems. The particular scales and behaviors examined could be varied from patient to patient, and, of course, other specific types of diverse measures from additional points of view can be included. Several methodological issues need to be attended to while using GAS or similar methodology in controlled research (Cytrynbaum, Ginath, Birdwell, & Brandt, 1979). GAS has been applied within a variety of settings with varied success. Woodward, Santa-Barbara, Levin, and Epstein (1978) examined the role of GAS in studying family therapy outcome. Woodward et al. used content analysis to analyze the nature of the goals that were set and the kind of goals set by therapists of different types. In their study, which focused on termination and 6-month follow-up goals, 270 families were considered. 
This resulted in an analysis of 1,005 goals. Woodward et al., advocates of GAS, reported reliable ratings reflecting diverse changes in the families studied. They also noted that GAS correlated with other measures of outcome, and thus seemed to be valid. This study also suggests an advantage of GAS: It not only is applicable with individuals, but can be used to express change in larger systems. Thus, it has been recommended for use in marital and family therapy (Russell, Olson, Sprenkle, & Atilano, 1983). It continues to be applied with families as a way to express changes in the family as a whole, rather than limiting assessment to the identified patient (Fleuridas, Rosenthal, G. K. Leigh, & T. E. Leigh, 1990). More critical analyses show that GAS suffers from many of the same difficulties as other individualized goal-setting procedures. The correlations between goals seem to be around .65, raising the question of their independence. Goals judged either too easy or too hard to obtain are often included for analysis, but, most important, goal attainment is judged on a relative rather than an absolute basis so that behavior change is confounded with expectations as well as importance (Clark & Caudrey, 1986). Further, the choice and attainment of goals are related to client as well as therapist characteristics that affect goal setting as well as change rating. Calsyn and Davidson (1978) reviewed and assessed GAS as an evaluative procedure. They suggested that GAS has poor reliability because there is insufficient agreement between raters on the applicability of predefined content categories to particular patients. In addition, the interrater agreement for goal attainment ranged from r = .51 to .85, indicating variability between those making ratings (e.g., therapist, client, expert judge). In general, studies that have correlated GAS improvement ratings with other ratings of improvement, such as MMPI scores, client satisfaction, and therapist improvement


ratings, have failed to show substantial agreement and frequently coefficients have been below .30 (Fleuridas et al., 1990). In addition, Calsyn and Davidson pointed out that the use of GAS also frequently eliminates the use of statistical procedures, such as covariance, that could otherwise correct for sampling errors. Because of this problem, as well as the unknown effects of low reliability, it is suggested that if GAS is used it only should be used in conjunction with standard scales applied to all patients. Suggestions for the use of GAS in psychotherapy research have been made by Mintz and Kiesler (1982), and the interested researcher may wish to review their recommendations or the review by Calsyn and Davidson (1978). Since these reviews, Lewis, Spencer, Haas, and DiVittis (1987) described methods of data gathering and scale construction that they felt increase the reliability and validity of GAS. They applied GAS in conjunction with family-based interventions with inpatients. Specific procedures for goal creation and later evaluation increased reliability and validity, without reducing the advantages of individualized goals. Among the innovations they suggested was the use of GAS ratings only at follow-up, with evaluations of the pattern of adjustment built into goal expectations and evaluations. GAS still is being applied in a variety of settings such as inpatient and school (Maher & Barbrack, 1984), with a variety of patient groups and treatment methods, such as group therapy (Flowers & Booarem, 1990) or with the severely mentally retarded (Bailey & Simeonsson, 1988). However, examination of these studies reveals widespread modification in its use, so that it is misleading to consider it a single method: GAS is a variety of different methods for recording and evaluating client goal attainment. It is not possible to compare goal attainment scores accurately from one study to the next. 
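The standard-score transformation mentioned in the description of GAS can be sketched as follows. This is a minimal illustration, assuming the summary T-score formula usually attributed to Kiresuk and Sherman (1968), attainment levels coded from -2 (least favorable) to +2 (most favorable), and an assumed average intercorrelation among goal scores of .30. The function name, weights, and example scores are hypothetical and are not taken from any study cited here.

```python
from math import sqrt


def gas_t_score(scores, weights=None, rho=0.30):
    """Summary GAS T-score (mean 50, SD 10) for one client.

    scores  -- attainment level per goal, assumed coded -2..+2
    weights -- relative importance per goal (equal weights if omitted)
    rho     -- ASSUMED average intercorrelation among goal scores
    """
    if not scores:
        raise ValueError("at least one goal score is required")
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")

    weighted_sum = sum(w * x for w, x in zip(weights, scores))
    sum_w_squared = sum(w * w for w in weights)
    sum_w = sum(weights)
    # Denominator: SD of the weighted sum under the assumed intercorrelation.
    denom = sqrt((1 - rho) * sum_w_squared + rho * sum_w ** 2)
    return 50.0 + 10.0 * weighted_sum / denom


if __name__ == "__main__":
    # Hypothetical client: three equally weighted goals.
    print(round(gas_t_score([1, 0, -1]), 2))  # goals balance out
    print(round(gas_t_score([1, 1, 1]), 2))   # every goal one level above expected
```

Because the result depends on the assumed intercorrelation and on how goals are weighted, two studies that set these differently will produce T-scores that are not directly comparable, which is one concrete form of the comparability problem noted above.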
In addition to the previously stated problems, several other issues arise: individualized goals will not be much more than poorly defined subjective decisions by patient or clinician; units of change derived from individually tailored goals are unequal and therefore hardly comparable; different goals are differentially susceptible to psychotherapy influence; goals tend to change early in therapy (which requires revision of the GAS goals); and, because some therapies have a unitary goal or set of goals, the status of individually tailored goals is tenuous. Effective individualization of goals for the purpose of empirically assessing patient change remains an ideal, rather than a reality. The clinician may be better off having a wide range of standardized scales available. Among these scales would be those itemized in this book. At the very least, a clinician should have available a measure of depression (e.g., BDI), a measure of anxiety (e.g., STAI), and a measure of relationship (e.g., Marital Adjustment Inventory) for adult patients.

Clinical Versus Statistical Significance

Most psychotherapy research is aimed at questions of theoretical interest. Is dynamic therapy more effective than cognitive therapy? Is exposure in vivo necessary for fear reduction? These and a host of similar questions give rise to the research designs that have been used in outcome research. The data acquired in outcome studies are submitted to statistical tests of significance. Group means are compared, the within-group and between-group variability are considered, and the resulting numerical figure is compared with a preset critical value. When the magnitude of the distance between groups is sufficiently large, it is agreed that the results are not likely to be due to chance fluctuations in sampling; thus, statistical significance has been demonstrated. This is the


standard for most research and is an important part of the scientific process. However, a common criticism of outcome research is that the results of studies, because they typically are reported in terms of statistical significance, obscure both the clinical relevance of the findings and the impact of the treatment on specific individuals. Unfortunately, statistically significant improvements do not necessarily equal practically important improvements for the individual client. Therefore, statistically significant findings may be of limited practical value. This fact raises questions about the real contributions of empirical studies for the practice of psychotherapy. It is conceivable that, in a well-designed study, small differences between large groups after treatment could produce findings that reach statistical significance, whereas the real-life difference between patients receiving different treatments is trivial in terms of the reduction of painful symptoms. For example, a behavioral method of treatment for obesity may create a statistically significant difference between treated and untreated groups if all treated subjects lost 10 pounds and all untreated subjects lost 5 pounds. However, the clinical utility of an extra 5-pound weight loss is debatable, especially in the clinically obese patient. This dilemma goes to the core of outcome assessment: adequate definitions and quantification of improvement. Numerous attempts have been aimed at translating reports of treatment effects into metrics that reflect the importance of the changes that are made: (a) In the earliest studies of therapy outcome, patients were categorized "posttherapy" with gross ratings of "improved," "cured," and the like, implying meaningful change. The lack of precision in such ratings, however, resulted in their waning use (Lambert, 1983). 
(b) Those interested in operant conditioning and single-subject designs developed concepts such as "social validity" to describe practically important improvement (Kazdin, 1977; Wolf, 1978). However, decisions about the importance of change remained somewhat subjective and unquantified. (c) Some disorders easily lend themselves to analysis of important changes, because improvement can be defined as the absence of a behavior, such as cessation of drinking, smoking, or drug use. Unfortunately, most symptoms targeted in psychotherapy cannot be defined and measured so clearly, and even where the absence of a behavior can be easily quantified, there is a lack of consensus about the proper procedures. E. A. Wells et al. (1988), for example, reported identifying 25 different ways of estimating drug use cessation. There is growing recognition that the concept of clinical significance is important and that many different approaches can be used to operationalize it (Jacobson, 1988). Several related methods were compared in a special issue of Behavioral Assessment (e.g., Kendall & Grove, 1988). A discussion and illustration of some of these methods clarify their contemporary use. Jacobson, Follette, and Revenstorf (1984) brought clinical significance into prominence by proposing statistical methods that would illuminate the degree to which individual clients recovered at the end of therapy (see also Jacobson & Truax, 1991). Recovery was proposed to be a posttest score that was more likely to belong to the functional than the dysfunctional population of interest. Estimating clinical significance requires norms for the functional sample and presumes that certain assumptions about the test scores have been met. For change to be clinically significant, a patient must change enough so that one can be confident that the change exceeds measurement error (calculated by a statistic titled the reliable change index, or RCI).
When a patient moves from one distribution (dysfunctional) to another (functional), and the change reliably exceeds measurement error (the RCI is calculated by dividing the absolute magnitude of change by the standard error of measurement), change is viewed as clinically significant.
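The two criteria just described can be written compactly. What follows is a sketch in the notation of Jacobson and Truax (1991), not a set of formulas printed in this chapter. The symbols are labels introduced here for illustration: x_pre and x_post are a patient's scores, s_x is the standard deviation of the measure, r_xx is its reliability, and M and s with subscripts are the means and standard deviations of the dysfunctional and functional samples.

```latex
\begin{align*}
SE &= s_x \sqrt{1 - r_{xx}}
  && \text{(standard error of measurement)} \\
S_{\mathrm{diff}} &= \sqrt{2\,SE^{2}}
  && \text{(standard error of the difference)} \\
RC &= \frac{x_{\mathrm{post}} - x_{\mathrm{pre}}}{S_{\mathrm{diff}}},
  \qquad |RC| > 1.96 \;\Rightarrow\; \text{reliable change } (p < .05) \\
c &= \frac{s_{\mathrm{func}}\, M_{\mathrm{dysf}} + s_{\mathrm{dysf}}\, M_{\mathrm{func}}}
          {s_{\mathrm{func}} + s_{\mathrm{dysf}}}
  && \text{(cutoff between the two distributions)}
\end{align*}
```

The description above divides the change score by the standard error of measurement, as in the 1984 proposal. The 1991 formulation shown here divides by the standard error of the difference instead, so the two versions yield different thresholds for the same data.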


The patient is more likely functional than dysfunctional. This method provides a way to assess the reliability of an individual client's pre- and posttreatment change, and the practical meaningfulness of this change. Jacobson, Follette, Revenstorf, Baucom, et al. (1984) applied their criteria of clinical significance to studies of behavioral marital therapy. They were able to develop cutoff scores based on the normative data of functional and dysfunctional couples who had taken the Locke-Wallace Marital Adjustment Inventory. A growing number of studies have employed these techniques with various treatment samples with considerable success (Lacks & Powlishta, 1989; Mavissakalian, 1986; Perry, Shapiro, & Firth, 1986; Schmaling & Jacobson, 1987). Although this method has received favorable reviews (Ankuta & Abeles, 1993; Goldfried, Greenberg, & Marmar, 1990; Lambert & Hill, 1994; Lambert, Shapiro, & Bergin, 1986), there are several limitations that have gone unaddressed. Jacobson, Follette, and Revenstorf's (1984) proposal does not provide an operationalization of a comparative social standard. This failure to define normative samples results in three specific problems: an inability to identify and use relevant normative samples across studies, the restriction of the social validation methodology by the use of only one dysfunctional and one functional sample, and the lack of a procedure to determine the distinctness of samples. Although Jacobson, Follette, and Revenstorf (1984) proposed the use of a functional and a dysfunctional sample for "whatever variable is being used to measure the clinical problem" (p. 340), they did not specify what they meant by "functional" or provide a method for determining the social relevance of the samples. In a later article, Jacobson and Revenstorf (1988) revealed their awareness of this problem, yet still provided no suggestions as to how to arrive at this agreed-on use of samples.
The second problem, using only two normative samples to represent extremes (functional and dysfunctional), has produced a great deal of criticism. For example, Wampold and Jenson (1986) suggested that using two sample distributions relies on the assumption that the population forms a bimodal distribution. They concluded that this is seldom the case and that the methodology therefore has limited utility. Other writers (e.g., Hollon & Flick, 1988) pointed out that using only two poorly defined, extreme samples results in unstable cutoff points. Another criticism of using two extreme samples is the inability to identify individuals who may make clinically meaningful change but do not change enough to enter the functional sample's distribution. Jacobson and Revenstorf (1988) recognized this limitation and identified it as one of the most fundamental questions regarding the viability of their method. The final problem is the lack of a procedure to determine the distinctness of samples. This prevents identifying whether sample overlap exists to such an extent that comparisons between samples become meaningless. Although Jacobson, Follette, and Revenstorf (1984) cautioned that using two samples as social standards for comparisons is justified only when they form distinct distributions, they offered no method to operationalize distinctness. Given the importance of distinct samples, a method for determining distinctness needs to be developed. To further the objective of social validation and overcome some of its problems, Tingey, Lambert, Burlingame, and Hansen (1996a) proposed three guidelines focusing on the derivation of relevant social standards: guidelines for defining and identifying relevant normative samples, utilizing the social validation methodology more fully by employing multiple normative samples, and providing a procedure for determining the distinctness of these samples.


Identifying Relevant Normative Samples. In addition to matching clients' demographics, the proposed social validation methodology requires that clinicians identify normative samples that show a level of performance that is important or relevant to society. Two basic factors are involved in applying this guideline: choosing a specifying factor, a characteristic or symptom to be studied; and identifying an impact factor, a particular effect or impact the specifying factor has on society. Specifying and impact factors are essentially an extension of Kendall and Grove's (1988) classification of outcome assessment. The specifying factor is the characteristic or symptom being measured via an outcome instrument (e.g., depression, anxiety). To be useful, specifying factors must be clearly defined and exist in varying amounts or degrees across society. Although the specifying factor need not necessarily be observable, it must at least be conceptually sound and justifiable. The impact factor is how the specifying factor affects individuals and society: its impact on treatment utilization, job performance, and the like. The impact factor is generally defined by some behavior resulting from various levels of the specifying factor. The behavior must be considered important or relevant by some segment of society, and vary proportionally with different levels of the specifying factor. One way to determine the impact factor is to observe important behaviors that covary with different levels of the specifying factor. One relevant behavior might be involvement in treatment. It is reasonable to conclude that individuals involved in varying intensities of treatment (impact factor), such as inpatient, partial hospitalization, outpatient, or no treatment, would demonstrate different levels of pathological symptomology (specifying factor).
Thus, a reduction in a client's score from levels of symptomatic distress found in inpatient populations to that normally exhibited by outpatient populations would imply a significant positive change in impact. The value of these two guidelines is that they introduce the possibility of deriving multiple normative samples using an impact factor, and thereby anchor the assessment of clinically significant change to important social standards. Multiple Normative Samples. The use of multiple normative samples as standards requires an expanded definition of clinically significant change. Tingey et al. (1996a) further proposed that clinically significant change be defined as movement from one socially relevant sample into another based on the impact factor selected, rather than from a "dysfunctional" to a "functional" distribution as proposed by Jacobson and Revenstorf (1988). These multiple samples would be organized along a rational or empirical continuum representing low to high impact, and correspondingly, low to high levels of the specifying factor. Illustrative Example. Demonstrating the process for establishing a continuum will help clarify the aforementioned guidelines and extensions. The basic steps involve selecting a specifying factor that is defined by a reliable outcome instrument, identifying an impact factor (a behavior relevant to society that covaries with different levels of the specifying factor) and normative samples demonstrating different levels of this factor, determining the statistical distinctness of these socially relevant samples, calculating RCIs for all possible sample pairs, and calculating cutoff points between adjacent sample pairs along the continuum. In accord with the first step, the specifying factor selected was general symptoms of psychological distress as measured by the Symptom Checklist-90-Revised (SCL-90-R; Derogatis & Melisaratos, 1983). 
The SCL-90-R defines a clear, discrete specifying factor, global psychological distress, and measures it in a continuous fashion using a Likert scale with 90 items sampling a variety of problem areas. In addition, it is a


frequently used instrument in psychotherapy research (J. E. Froyd et al., 1996; Lambert et al., 1983). As such, it is well suited for developing a normative continuum. Next, a socially relevant impact factor that logically results from symptoms of psychological distress and covaries with them is treatment. It is reasonable to conclude that groups of clients involved in varying intensities of treatment (from none to inpatient treatment) with varying costs to society would show substantial differences on their SCL-90-R scores. Using this rationale, four socially relevant normative samples were identified: a sample of healthy persons who were carefully screened to exclude those in treatment (asymptomatic), an unselected sample (with regard to treatment participation) of community adults (mildly symptomatic; Derogatis & Melisaratos, 1983), people who were receiving outpatient group treatment (moderately symptomatic; Burlingame & Barlow, 1996), and a sample of persons receiving inpatient treatment (severely symptomatic; Derogatis & Melisaratos, 1983). As cited earlier, the last three samples were taken from existing literature. The second sample, mildly symptomatic, is so labeled because of the estimate that 20% of the general population experience significant levels of psychological distress (Saunders, Howard, & Newman, 1988), and because people with disorders who might have been undergoing treatment were not screened out. This community-based sample provides "normal" control statistics for the norms of the SCL-90-R. The first sample, asymptomatic, was collected by Tingey et al. (1996a) and requires more explanation. The asymptomatic sample consists of specially screened individuals who were not in any treatment. The screening occurred at several levels. Initially, a subject pool was derived by approaching 30 licensed psychologists from a Western state and asking them to nominate individuals from the community who they felt were psychologically healthy and high functioning.
Nominated individuals were then contacted by phone and, after agreeing to participate in the study, were given a screening interview. The interview excluded subjects if they were pregnant, currently taking any psychotropic medication, or had ever been diagnosed with a mental disorder. Subjects passing the phone screening were invited to a local mental health clinic, where they were further screened using a test battery including the Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961) and the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, Lushene, & Jacobs, 1983). These instruments were selected in an attempt to screen out subjects who were symptomatic for anxiety and depression, the two most common symptoms of psychopathology. Cutoff scores for the test battery were based on norms reported in the literature. For the BDI, subjects scoring nine or under were considered free of depression and were included in the asymptomatic sample (Beck, 1972; Beck & Beamesderfer, 1974). A score of 35 was selected as the cutoff for both the State and Trait sections of the STAI in that it corresponds to the mean score of working adult females (Spielberger et al., 1983). There were 82 subjects who passed all screening criteria. This sample had a mean age of 43.23 (SD = 13.89) and consisted of 46.9% males and 53.1% females. A more complete description of this sample and methodology of data collection can be found in Tingey (1989). Descriptive data for these four samples on the SCL-90-R Global Severity Index (GSI) are reported in Table 4.5. This index was chosen because it is the usual score reported in outcome studies to express patient improvement, and because it "represents the best single indicator of the current level or depth of the disorder" (Derogatis & Melisaratos, 1983, p. 11).
As postulated, the four samples indeed form a continuum of increasing symptomatology with a progressive increase in the means of adjacent samples as one ascends the continuum.


TABLE 4.5
Normative Continuum of SCL-90-R Global Severity Index (GSI) Raw Scores Across Samples

Sample                      N      SCL-90-R GSI Score, mean (SD)
Asymptomatic                82     0.19 (0.16)
Mildly Symptomatic          974    0.31 (0.31)
Moderately Symptomatic      97     0.79 (0.45)
Severely Symptomatic        313    1.30 (0.82)

Note. From Tingey, Lambert, Burlingame, & Hansen (1996a).

With the samples identified and the continuum formed, the third step was to determine the distinctness of adjacent samples using t and "d" tests. Each sample in the continuum met two criteria when compared with the adjacent sample. All t values were significant at an alpha of .05, and all "d" values surpassed a criterion of .5, signifying a moderate effect according to Cohen (1980). Once relevant normative samples were identified and their distinctness statistically verified, the last two steps of generating RCIs and cutoff points were completed. As these procedures are quite complex, the reader is referred to Tingey et al. (1996a) for further explanation. Jacobson and Revenstorf (1988) suggested establishing a confidence band based on the RCI around the cutoff to alleviate the error associated with a discrete cutting point. The cutoffs between adjacent samples define the point where it is statistically more likely for a score to be in one, as opposed to the other, overlapping distribution (Jacobson, Follette, & Revenstorf, 1984; Jacobson & Truax, 1991). The cutoff points plus the confidence bands around them indicate the boundary between two adjacent normative distributions that must be crossed from pre- to posttreatment in order for this difference to be considered clinically significant using the most stringent criterion (Jacobson & Revenstorf, 1988). Table 4.6 presents the different cutoff points and confidence bands for adjacent sample pairs. Figure 4.1 illustrates this material in graphic form.
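As a rough check on the cutoff step, the sketch below applies the two-distribution cutoff formula commonly attributed to Jacobson et al. (1984) to the means and standard deviations in Table 4.5. The formula is an assumption here, because the chapter refers readers to Tingey et al. (1996a) for the details. The band helper follows the half-RC rule given in the note to Table 4.6, using the reliable change value of .43 that the text quotes for the moderately/severely symptomatic pair. All function and variable names are hypothetical.

```python
# GSI means and SDs from Table 4.5, ordered from low to high symptom level.
SAMPLES = [
    ("asymptomatic", 0.19, 0.16),
    ("mildly symptomatic", 0.31, 0.31),
    ("moderately symptomatic", 0.79, 0.45),
    ("severely symptomatic", 1.30, 0.82),
]


def cutoff(mean_lo, sd_lo, mean_hi, sd_hi):
    """ASSUMED cutoff formula: each mean is weighted by the other sample's SD."""
    return (sd_lo * mean_hi + sd_hi * mean_lo) / (sd_lo + sd_hi)


def confidence_band(cut, reliable_change):
    """Band of half the reliable change value on either side of the cutoff."""
    half = reliable_change / 2
    return (cut - half, cut + half)


def adjacent_cutoffs(samples):
    """Cutoff for every pair of neighbouring samples along the continuum."""
    pairs = zip(samples, samples[1:])
    return [
        (lo_name, hi_name, cutoff(lo_m, lo_sd, hi_m, hi_sd))
        for (lo_name, lo_m, lo_sd), (hi_name, hi_m, hi_sd) in pairs
    ]


if __name__ == "__main__":
    for lo_name, hi_name, c in adjacent_cutoffs(SAMPLES):
        print(f"{lo_name} vs. {hi_name}: cutoff = {c:.2f}")
    band = confidence_band(cutoff(0.79, 0.45, 1.30, 0.82), 0.43)
    print(f"moderate/severe band: {band[0]:.2f}-{band[1]:.2f}")
```

Rounded to two decimals, the three cutoffs come out as 0.23, 0.51, and 0.97, and the moderate/severe band as 0.76-1.19, which agrees with the values reported in Table 4.6.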
In keeping with Jacobson and Revenstorf (1988), confidence bands have been included in both Table 4.6 and Fig. 4.1. However, Tingey et al. (1996a) did not propose that a client should be required to exceed both the RCI and the confidence band to be considered improved.

TABLE 4.6
Cutoff Points and Confidence Bands for the Continuum Sample Pairs

                      Asymptomatic vs.      Mildly Symptomatic vs.     Moderately Symptomatic vs.
                      Mildly Symptomatic    Moderately Symptomatic     Severely Symptomatic
Cutoff (a)            0.23                  0.51                       0.97
Confidence Band (b)   0.15-0.31             0.38-0.64                  0.76-1.19

Note. From Tingey, Lambert, Burlingame, and Hansen (1996a).
(a) Cutoff computed per Jacobson et al. (1984).
(b) Confidence Band = Cutoff ± 1/2 RC (Jacobson & Revenstorf, 1988).

Fig. 4.1. SCL-90-R GSI cutoffs and confidence bands for the four normative samples. From Tingey, Lambert, Burlingame, and Hansen (1996a).

With general normative standards identified and RCIs, cutoff points, and confidence bands calculated, clinician-based or research-based outcome data can be compared to them as a standard way of judging client change. Any set of outcome data that used the SCL-90-R can be compared to the normative standards identified here to assess their clinical significance.

To illustrate, selected data from a group therapy process and outcome study (Burlingame & Barlow, 1996) were applied. In this study, 97 outpatients underwent group treatment for 15 weeks. They were assessed with the SCL-90-R at pretest, after 8 weeks of group treatment, after termination (at 15 weeks), and 6 months following termination. For illustrative purposes, the course of therapy for four of the treated patients is shown in Fig. 4.2. These four patients were selected because each showed a different course in treatment.

Fig. 4.2. Illustration of clinically significant change patterns for four patients using multiple normative sample cutoffs and repeated measures. From Tingey, Lambert, Burlingame, and Hansen (1996a).

As can be seen in Fig. 4.2, two began treatment in the severely symptomatic range (c and d) and two began in the moderately symptomatic range (a and b). Patient d showed little improvement across the course of treatment but showed rather dramatic improvement when assessed at follow-up. At follow-up, d had crossed the cutoff separating severe and moderate distributions, and her change score exceeded the RCI of .42. Patient c, on the other hand, started within the severely symptomatic distribution and made continual progress. Patient c met the criteria for clinically significant change after 8 weeks of treatment. By termination, c had not passed the cutoff for entry into the asymptomatic distribution, but this criterion was reached during the follow-up period. Patient b began within the moderately disturbed sample distribution and had the possibility of meeting the criteria for either clinically significant deterioration or improvement. Patient b showed a deteriorating course 8 weeks into treatment, passing the cutoff (.97) as well as changing more than .43 on the GSI. At termination, b's status was less clear. His posttreatment score was still past the cutoff (.97), but his change score was now less than the .43 necessary to consider the change reliable. By the follow-up testing he again met the criteria for clinically significant change (deterioration).
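The two-part decision applied to each patient above (has the score crossed a cutoff, and is the change larger than the RCI?) can be sketched in a few lines. This is only an illustration under stated assumptions: the cutoffs are those of Table 4.6, the RCI is passed in by the caller because the text implies it differs by region of the continuum, a score exactly on a cutoff is arbitrarily assigned to the more symptomatic group, and the patient scores in the example are hypothetical rather than read from Fig. 4.2.

```python
from bisect import bisect_right

# Cutoffs from Table 4.6, ordered from least to most symptomatic.
CUTOFFS = [0.23, 0.51, 0.97]
LABELS = [
    "asymptomatic",
    "mildly symptomatic",
    "moderately symptomatic",
    "severely symptomatic",
]


def band(score):
    """Index of the normative distribution a GSI score falls into."""
    return bisect_right(CUTOFFS, score)


def classify_change(pre, post, rci):
    """Combine the cutoff criterion with the reliable-change criterion."""
    reliable = abs(post - pre) > rci   # change must exceed the RCI
    moved = band(post) - band(pre)     # negative = toward fewer symptoms
    if reliable and moved < 0:
        return "clinically significant improvement"
    if reliable and moved > 0:
        return "clinically significant deterioration"
    if reliable:
        return "reliable change without crossing a cutoff"
    return "no reliable change"


if __name__ == "__main__":
    # Hypothetical pre/post GSI scores, not taken from the study.
    print(LABELS[band(1.45)], "->", LABELS[band(0.80)])
    print(classify_change(1.45, 0.80, rci=0.43))
    print(classify_change(0.80, 1.05, rci=0.43))
```

The second call mirrors Patient b's situation at termination: the score lies past the .97 cutoff, but the change is smaller than .43, so it is not counted as reliable.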


Patient a's scores illustrate movement from the moderately symptomatic category into the asymptomatic group. The course of improvement appeared to be relatively fast with regard to meeting the criteria for clinically significant improvement (from moderately to mildly symptomatic). By follow-up, a had passed the cutoff and moved into the asymptomatic distribution but was not below the confidence band, making his status with regard to this distribution uncertain. On the basis of meeting the criteria for clinically significant change during the active phase of treatment, patients c, b, and a, along with patients who made similar changes, were studied intensively with regard to their participation in group process and their individual pretherapy characteristics. Had their scores been monitored during treatment, some attempt to prevent the deterioration of Patient b might have been possible. This possibility is enhanced by the development and use of cutoff scores and an estimate of reliable change.

The addition of a continuum in the assessment of reliable change and clinical significance adds a considerable amount of information and softens the criteria for achieving clinical change. Clients do not necessarily have to enter the most functional sample distribution to be considered clinically improved. This addresses a complaint Jacobson, Follette, and Revenstorf (1986) made about their method being too stringent and unable to identify subjects who "gain substantial benefits from psychotherapy . . . even though they remain somewhat dysfunctional" (p. 311). In practical applications this may be particularly useful because clinically significant improvement can be reached by movement from the distribution of patients who often need very restrictive and expensive inpatient settings into the distribution of people who require less expensive outpatient treatment.
At the other end of the continuum, employee assistance programs often treat patients who at the commencement of treatment are moderately symptomatic or even mildly symptomatic. The continuum makes it possible to track movement of their scores into the mildly or asymptomatic norms (i.e., even these patients can be identified as making clinically significant change). A continuum also introduces the likelihood of quantifying different degrees of clinical change and of detecting deterioration. With more than two normative samples, it now becomes possible for a client's score at posttreatment to have moved across more than one distribution in a positive direction, or to have moved in a negative direction into a less functional distribution. This valuable information could be particularly useful in differentiating the effects of different treatment types, or in evaluating the effectiveness of specific process variables (cf. deterioration rates in the NIMH collaborative depression treatment project; Ogles, Lambert, & Sawyer, 1995). Overall, a continuum appears to provide a more precise and relevant perspective on clinical change. One advantage of defining cutoff points that establish standards for identifying clinically significant change is in clinical practice. As managed health networks and government agencies insist on tracking patient outcome, it appears more likely that individual providers will search for a standard way of evaluating patient progress. Figure 4.2 provides an interesting and easy way for clinicians to use standard indices to identify patients' pretreatment status and track their progress over time while using one possible standard for improvement. Several limitations and needs for future research can be identified and require further attention. First, the Ns that make up normative samples should be substantial. Several samples rather than single samples of inpatients, outpatients, community, and asymptomatic individuals would be ideal. 
The popularity of the SCL-90-R and several other measures makes this feasible. Investigations of the BDI (Seggar & Lambert, 1994), STAI (Condon & Lambert, 1994), and the Achenbach Child Behavior Checklist (E.M. Grundy & Lambert, 1994) suggest that these scales have ample normative data. On the other hand, it is difficult to find scores for the untreated subjects on the Hamilton Rating Scale for Depression and, occasionally, scores on these scales do not discriminate between inpatients and outpatients (C.T. Grundy, Lambert, & E. Grundy, 1996). Large sample sizes would also help to ensure that the distributions approximate normality. Much of the work on clinical significance assumes normality of distributions, and the consequences of violating this assumption have not been investigated.

The size of the correlation coefficient selected to calculate the RCI has a large impact on estimates of reliable change. Rather than selecting a single reliability coefficient, it may be best to take the median figure from a number of studies of reliability. It can be argued that the coefficient should reflect stability over the course of treatment, a time period that could reach 20 weeks. It can also be argued that the coefficient should be based on 1 or 2 weeks, that is, on the time frame in which patients are being asked to rate their symptoms. Even shorter intervals for patient populations may be appropriate. However, 1-week test-retest data are generally not available, and an approximate substitute that approaches this very tight time reference will need to be identified. In this respect, it may be appropriate to use an alpha coefficient to represent such a short time frame. The coefficient should reflect measurement error for an individual's score at the time that individual is assessed rather than reflecting changes in the score that could be presumed to be due to treatment.

Another important issue in defining clinically significant change is the effect of having a limited range of scores such as those in the more functional distribution of the SCL-90-R and other specific tests that are used in outcome research.
Although the normative samples plotted in Fig. 4.1 meet the proposed criteria of distinctiveness, the sizable reduction in differences between means for adjacent normative samples (e.g., severe to moderate, moderate to mild, mild to asymptomatic) coupled with a parallel reduction in variance across samples provides ample evidence for a floor effect on the GSI. This phenomenon mandates the use of several RCI indices rather than one standard figure for the test (GSI) as a whole. Using one RCI computed from all the normative samples would result in an RCI that is too large to be useful at the lower end of the symptomatic (GSI) continuum. Most outcome measures were designed to study the presence rather than the absence of symptoms, suggesting the need to expand the capacity of measures to properly assess this end of the continuum.

A final issue of importance is the value and desirability of placing confidence bands around the cutoff scores. Using bands around a cutoff score increases the degree of certainty that patients are classified correctly. Although the idea of using confidence bands has appeal and data on confidence bands have been presented for illustrative purposes, the use of bands results in practical problems that argue against their value in research and clinical settings. One practical problem is that bands result in unclear classification of patient status prior to and following therapy. It is desirable to calculate the percentage of patients who are among the ranks of the disturbed sample both before and after treatment. The use of a band makes classification ambiguous and needlessly complex. At pretreatment, it may result in large numbers of patients who cannot be considered dysfunctional. At posttreatment, it results in patients having to meet three criteria for improvement instead of two. For many patients (those above the cutoff, but within the band) it will not be possible to meet all three criteria.
It is recommended that researchers use only the cutoff score and the RCI in presenting data about clinically significant change.
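The dependence of the RCI on the chosen reliability coefficient, raised earlier in this section, can be made concrete with a short sketch. It assumes the widely cited Jacobson and Truax (1991) formulation, in which the standard error of measurement is SD * sqrt(1 - r), the standard error of the difference is sqrt(2) times that value, and a change is treated as reliable when it exceeds 1.96 such units. The chapter itself does not spell the formula out, and the SD and reliability values used here are hypothetical.

```python
import math


def reliable_change_threshold(sd, reliability, z=1.96):
    """Smallest raw-score change treated as reliable.

    sd          -- standard deviation of the reference sample
    reliability -- reliability coefficient r (test-retest or alpha)
    z           -- critical value; 1.96 corresponds to p < .05, two-tailed
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    se_measurement = sd * math.sqrt(1.0 - reliability)
    se_difference = math.sqrt(2.0 * se_measurement ** 2)
    return z * se_difference


if __name__ == "__main__":
    sd = 0.45  # hypothetical sample SD on the GSI metric
    for r in (0.70, 0.80, 0.90):  # hypothetical reliability estimates
        threshold = reliable_change_threshold(sd, r)
        print(f"r = {r:.2f}: change must exceed {threshold:.2f}")
```

Because the threshold shrinks as r rises, a long-interval test-retest coefficient (typically lower) yields a more demanding RCI than an alpha coefficient, which is one reason the choice of coefficient matters so much.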


Similar and Related Procedures

Additional examples of estimating clinically significant change have been published in recent years. These methods have emphasized the use of normative comparison. Examples include the use of social drinking behaviors as criteria for outcome in the treatment of problem drinking, or the use of definitions of adequate sexual performance (e.g., ratio of orgasms to attempts at sex, or time to orgasm following penetration; Sabalis, 1983). These criteria are based on data about the normal functioning of individuals and can be applied easily and meaningfully with a number of disorders where normal or ideal functioning is readily apparent and easily measured (e.g., obesity, suicidal behavior).

But normative comparison also can be used to quantify clinical significance. This strategy involves comparing the behavior of clients before and after treatment to that of a sample of nondisturbed "normal" peers. This method has the advantage that comparisons can be based directly on the psychological tests commonly used to measure therapy outcome, if a standardization sample of nonpatients is also available. Usually the procedure involves comparing the end state functioning of treated clients to various control groups. Thus, standards of clinical improvement can be based on normative data and posttreatment status gathered through meta-analysis of multiple samples of patients, instead of the magnitude of change of specific individual patients.

For example, Trull, Nietzel, and Main (1988) reported a meta-analytical review of 19 studies of agoraphobia that used the Fear Questionnaire. Self-reported posttreatment adjustment of agoraphobics was compared with two normative samples. The normative samples were based on college students (at two universities) and a community sample drawn randomly from the phone directory. Both samples included subjects who had never received treatment for a phobic condition.
As might be expected, the community sample was more disturbed than the college sample, probably because agoraphobia prohibits or inhibits attendance in college classes. As a consequence, in this study estimates of clinically significant change via normative comparison turned out to be a function of which normative group was used for comparison. Agoraphobics, treated mainly with exposure, improved during treatment. The average agoraphobic started at the 99th percentile of the college norms and improved to the 98.7th percentile at the end of treatment. The average agoraphobic also started at the 97th percentile of the community norm and progressed to the 68th percentile at posttreatment and to the 65.5th percentile at follow-up. Using similar methodology Nietzel, Russell, Hemmings, and Gretter (1987) studied the clinical significance of psychotherapy for unipolar depression. They compared the posttherapy adjustment of depressed and nondepressed adults who took the BDI. In all, 28 published studies were used to calculate composite BDI norms; these were compared with outcomes from 31 outcome studies that yielded 60 effect sizes. Three normative groups could be identified: a nondistressed group; a general population group (consisting mostly of collegiate subjects); and a situationally distressed group (e.g., pregnant women), which turned out to be very similar to the general population samples. Comparisons contrasting the depressed patients with the normative samples suggested that the various treatments (all of which appeared similar in their effectiveness) produced clinically significant changes in relation to the general population. In fact, the average depressed patient moved from the 99th percentile of the general population norms to the 76th percentile of this reference sample. These gains were maintained at follow-up. In reference to the nondistressed group, the same improvements were much less remarkable. 
The average patient only moved from the 99th percentile to the 95th percentile.
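The percentile comparisons reported in these reviews can be illustrated with a brief sketch that locates a single posttreatment score within two different normative distributions. It assumes the norms are approximately normal, an assumption the chapter itself flags as untested, and every mean, SD, and score below is hypothetical rather than taken from the studies cited.

```python
from statistics import NormalDist


def percentile_in_norm(score, norm_mean, norm_sd):
    """Percentile rank of a score within a normal reference distribution."""
    return 100.0 * NormalDist(mu=norm_mean, sigma=norm_sd).cdf(score)


# Hypothetical normative groups: (mean, SD) on a symptom scale.
NORMS = {
    "general population": (8.0, 7.0),
    "nondistressed": (5.0, 4.0),
}

if __name__ == "__main__":
    post_score = 11.0  # hypothetical average posttreatment score
    for name, (mean, sd) in NORMS.items():
        pct = percentile_in_norm(post_score, mean, sd)
        print(f"{name}: {pct:.0f}th percentile")
```

The same score lands at a noticeably higher percentile against the nondistressed norms than against the general population norms, which is the pattern both reviews describe.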


Nietzel et al. concluded that clinically significant improvement depends on the nature of the normative sample. Obviously, selection of normative samples has a high impact on estimates of meaningful improvement. A recent study combining various methods of calculating clinical significance illustrates the potential of using more than one procedure. Scott and Stradling (1990) studied and contrasted the effects of cognitive therapy offered in either an individual or group format to patients who were depressed. They reported results from an analysis of BDI scores. Patients were assigned to either of the treatment groups or a wait-list control while still receiving their customary treatment from their general practitioner (which included tricyclic medication in about one half of the patients). Besides the usual group comparisons based on inferential statistics, Scott and Stradling reported clinically significant improvements as well. They reported the percentage of patients reaching various cutoff scores on the BDI. Using Kendall, Hollon, Beck, Hammen, and Ingram's (1987) criteria for nondepression, mild depression, moderate depression, and severe depression, Scott and Stradling were able to show obvious differences between wait list and psychotherapy outcome over the 12 weeks of treatment, and for 1-year follow-up. Scott and Stradling (1990) also applied the RCI as a primary criterion, showing that patient change was of a great enough magnitude so that patients could reasonably be considered to have left the ranks of the dysfunctional. In fact, using the RCI, they estimated that 100% of those in the group treatment and 84% of those in individual treatment manifest clinically significant improvement. Fifty-three percent on the wait list showed similar improvement. In addition, 5% of the wait list subjects deteriorated, whereas none of the treatment subjects did. Although there have been favorable reviews, numerous problems remain with this methodology. 
These include the complexities created by the fact that researchers use multiple outcome measures, each one possibly providing different information about the individual and the group as a whole. One measure may show clinically significant change for the group or a specific individual, whereas another does not. Other problems include the use of discrete cutoff points and their derivation; the problems that result from score distributions that are not normal; and the limitations of floor and ceiling effects in many of the most frequently used tests. This latter problem is especially serious, because many tests are weighted heavily toward pathology and not developed for use with people who represent the actualized end of the continuum of functioning. In some instances, it is this actualized end of the continuum that represents the patients' non-disturbed peers. Moreover, there is considerable controversy about procedural and statistical analyses (cf. Lacks & Powlishta, 1989) that have substantial impact on estimates of clinical significance. Some procedures provide more conservative criteria, whereas others are lenient. Thus, statistical methods do not eliminate the seemingly inevitable application of values to operationalizing outcome. These statistical methods make the judgments explicit and replicable, so that researchers can equate clinical significance across studies. Finally, the use of multiple normative samples requires an expanded definition of clinically significant change. These procedures have been criticized by Follette and Callaghan (1996), who considered an enlarged definition of clinical significance to obscure its underlying principle to measure change in a way that is meaningful to the client. Martinovich, Saunders, and Howard (1996) argued that the extensions advocated by Tingey et al. (1996a) compound the problems associated with clinical significance rather than solving them. 
They suggested several methods for solving some of the problems inherent in this methodology. Tingey, Lambert, Burlingame, and Hansen (1996b) responded to these and other criticisms. The interested reader will find the interchange quite helpful in evaluating the uses and problems with clinical significance.

The development of statistically defined clinically significant change, although not without controversy, should be applauded and encouraged. Clinicians now have, at their disposal, normative data that allow for estimating clinically significant improvement on a few important outcome measures (e.g., BDI, SCL-90-R, Locke-Wallace Marital Adjustment Inventory, Fear Questionnaire, Child Behavior Checklist, Hamilton Depression Rating Scale, Outcome Questionnaire, and Inventory for Interpersonal Problems; the interested reader can find these cutoffs in Ogles et al., 1996). Data on clinical significance may stimulate research applications in private practice settings, as well as improve the translation of research findings into clinician-friendly facts.

Issues in Need of Research

There is a long list of research topics that would be welcome in the struggle to make outcome assessment more useful to clinicians and to policymakers. Two topics, cost-effectiveness and provider profiling, are highlighted here.

Cost-Effective Care as an Outcome

Among those outcomes that can be monitored, the cost-effectiveness of treatments is an interesting but rarely studied phenomenon. In a society that is becoming more preoccupied with cost containment, the cost-effectiveness of treatment is an important outcome. The value of care is often defined as the trade-off between quality of care (or traditional clinical outcomes) in relation to dollars spent on care. To health plans and employers (if not patients), the value of care, or cost-effectiveness of care, should be as important as absolute costs for deciding on a treatment; after all, there is little point in spending money on something that provides no benefit just because it is cheap.
Cost-effectiveness data are particularly important when the effects of different treatments are equal, a state of affairs that is common in psychotherapy and pharmacotherapy (Lambert & Bergin, 1994). Perhaps the best example of research on this topic is the Rand Medical Outcome Study, which examined outcomes in depression (K.B. Wells, Strum, Sherbourne, & Meredith, 1996). This study examined systems of care, patient case mix, process of care, utilization, and clinical outcomes in an indirect structural analysis in order to develop models that inform policy and treatment decisions. Wells et al. found that treatment increases mental health services utilization and costs, regardless of provider specialty (general practitioner, psychiatrist, or other mental health specialist). The lowest cost, but also the worst outcomes for depression, were found in the general medical sector; the highest costs, but also the best outcomes, occurred in psychiatry. When cost-effectiveness ratios were calculated, the greatest "value" was to be found in the other mental health specialist provider group (psychologists, etc.). Wells et al. also estimated that quality improvement programs or decisions can make substantial improvements in cost-effectiveness or value. Without quality improvement that takes into account the cost-benefit ratio of different treatments, the current tendency to shift treatment toward general medical practitioners may continue because it reduces costs. Cost-benefit studies show that such decisions worsen patient functioning.

Although the results of the Rand study are highly complex (depending on type of depression, follow-up time period, etc.), they clearly provide a rich source of data. Using the data analytic procedures utilized by Wells and colleagues, it is possible to calculate the amount of money it costs to reduce a single symptom by a particular amount (e.g., what it costs to reduce the number of headaches a person has by one per week through the use of medication versus biofeedback). It also would be possible to estimate the cost of bringing a depressed patient into a normal state of functioning (and keeping the patient there). Moreover, it is possible to compare the costs associated with specific treatment strategies, that is, the cost-effectiveness of group versus individual cognitive behavior therapy.

The Rand study of depression was a large-scale, extensive effort costing approximately $4 million to complete (K.B. Wells et al., 1996). Numerous other studies have been conducted on a variety of other disorders, such as chronic pain (Texidor & Taylor, 1991) and psychosomatic disorders (Pautler, 1991), but none have reached the scope of the Rand study. The limited number of studies and their diversity make it difficult to identify the best methods of estimating costs. As in the area of clinical outcome measurement, there are few agreed-on methods of estimating treatment costs. The Rand study defined health as the number of serious functioning limitations. Costs were based on the "direct" costs of providing services to treat depression, and the value of care was estimated for the cost of reducing one or more functioning limitations.
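The idea of valuing care as the cost of reducing one functioning limitation can be written as a simple ratio. The sketch below is only an illustration of that arithmetic, not the Rand study's analytic model; the function name, the provider labels, and all dollar and outcome figures are hypothetical.

```python
def cost_per_limitation_reduced(direct_cost, limitations_before, limitations_after):
    """Direct treatment cost divided by the number of limitations removed."""
    reduction = limitations_before - limitations_after
    if reduction <= 0:
        # No improvement: the ratio is undefined rather than "free".
        raise ValueError("no reduction in functioning limitations")
    return direct_cost / reduction


if __name__ == "__main__":
    # Hypothetical average figures per patient for two provider groups.
    groups = {
        "provider group A": (600.0, 4.0, 3.5),   # cheaper, smaller gain
        "provider group B": (1200.0, 4.0, 2.0),  # costlier, larger gain
    }
    for name, (cost, before, after) in groups.items():
        ratio = cost_per_limitation_reduced(cost, before, after)
        print(f"{name}: ${ratio:,.0f} per limitation reduced")
```

In this made-up comparison the cheaper option costs more per unit of improvement, which is the kind of trade-off the text has in mind when it distinguishes value from absolute cost.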
Other researchers estimate cost as an average cost of providing treatment for an episode of illness per patient by adding up staff expenses (including benefits) and then dividing by the number of patients treated in a year (Melson, 1995). Researchers have also attempted to estimate social costs, such as those that arise from lost productivity, use of social services and the criminal justice system, and use of other health services. Cost-benefit analysis combined with estimates of outcome based on clinical significance could be usefully applied in the managed mental health setting to understand the consequences of rationing treatment. What is the "value" of fewer sessions versus more sessions on the long-term adjustment of patients? The complexity of these issues is beyond the scope of this chapter. Suffice it to say that estimates of the cost, cost-effectiveness, and medical cost offsets of psychotherapy are important topics in the assessment of psychotherapy outcome. McKenzie (1995), for example, argued that the relative equivalence of outcome in group and individual psychotherapy can be a powerful argument for the use of group therapy when the cost of delivering treatment and the cost-benefit of group therapy are considered. At the very least, this finding emphasizes the importance of research aimed at selecting patients who are most suitable for group treatment.

The Importance of Tracking Outcome for Quality Care: The Case for Provider Profiling

Considerable psychotherapy outcome research shows that a major contributor to patient improvement and deterioration is the individual therapist (Lambert & Bergin, 1994). Despite current emphasis on "empirically validated" treatments (Task Force, 1996), manual-based therapy (Lambert, 1998; Wilson, in press), and treatment guidelines that assume the curative power of therapy rests on treatment techniques, ample evidence suggests the importance of particular therapists for positive outcomes (Garfield, 1996; Lambert & Okiishi, 1997). One clear implication of this finding is that it is important to use psychological tests to track patient outcome (by provider) for the purpose of increasing the quality of services (Clement, 1996; Lambert & Brown, 1996). This type of research can be expected to directly modify the practices of clinicians, whereas research on specific disorders (clinical trials) will be much slower in having a real impact on clinical practice. It is important to use outcome measures that can provide clinicians with information about the effectiveness of their practice.

Figure 27.6 in chapter 27 (this volume) presents data on patients and their clinicians who work in different outpatient clinics. The data presented suggest clear differences in average pretherapy levels of disturbance and also in the average amount of change associated with particular therapists. These data alert the clinicians to the fact that they have different outcomes and suggest the need for particular clinicians to explore the reasons for poorer outcomes in their patients (relative to other clinicians). Figure 27.4 from chapter 27 (this volume) suggests the unusual rapidity of improvement in patients treated by therapist L.J. This provider profile, based on repeated measurement of patient progress, shows an unusual pattern of improvement and calls for exploration of the methods used by L.J. Without the use of a reliable tracking device along with criteria of successful outcome, it would be far more difficult to compare patient/clinician success rates. It is not difficult to see the advantages of this methodology for clinicians, health systems, and, most importantly, patients. Tracking patient outcome through the use of meaningful outcome measures can result in improved clinical decision making and quality of patient care.
Conclusions

The assessment of psychotherapy outcome is an important endeavor, impacting both the science and practice of mental health services. Outcome assessment is based on a rich tradition of research, and has shown steady improvement as a scientific endeavor over the last five decades. The phenomenon of measuring change and improvement is a fascinating, although presently chaotic, topic of scientific inquiry. It is hoped that it will continue to be a fruitful ground for collaborative efforts and important discoveries. This exciting area of inquiry is wide open to the energetic and gifted student. Important discoveries await the determined and patient researcher.

References

Ankuta, G.Y., & Abeles, N. (1993). Client satisfaction, clinical significance, and meaningful change in psychotherapy. Professional Psychology: Research and Practice, 24, 70-74.
Bailey, D.B., & Simeonsson, R.J. (1988). Investigation of use of goal attainment scaling to evaluate individual progress of clients with severe and profound mental retardation. Mental Retardation, 26, 289-295.
Beck, A.T. (1972). Depression: Causes and treatment. Philadelphia: University of Pennsylvania Press.
Beck, A.T., & Beamesderfer, A. (1974). Assessment of depression: The depression inventory. In P. Pichot (Ed.), Psychological measurements in psychopharmacology, modern problems in pharmacopsychiatry (Vol. 7, pp. 151-169). Basel, Switzerland: Karger.


Beck, A.T., Ward, C.H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.
Berger, M. (1983). Toward maximizing the utility of consumer satisfaction as an outcome measure. In M.J. Lambert, E.R. Christensen, & S.S. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 56-80). New York: Wiley.
Berzins, J.I., Bednar, R.L., & Severy, L.J. (1975). The problem of intersource consensus in measuring therapeutic outcomes: New data and multivariate perspectives. Journal of Abnormal Psychology, 84, 10-19.
Beutler, L.E., & Hamblin, D.L. (1986). Individualized outcome measures of internal change: Methodological considerations. Journal of Consulting and Clinical Psychology, 54, 48-53.
Burlingame, G.M., & Barlow, S.H. (1996). Outcome and process differences between professional and nonprofessional therapists in time-limited group psychotherapy. International Journal of Group Psychotherapy, 46, 455-478.
Calsyn, R.J., & Davidson, W.S. (1978). Do we really want a program evaluation strategy based on individualized goals? A critique of goal attainment scaling. Evaluation Studies: Review Annual, 1, 700-713.
Cartwright, D.S., Kirtner, W.L., & Fiske, D.W. (1963). Method factors in changes associated with psychotherapy. Journal of Abnormal and Social Psychology, 66, 164-175.
Clark, M.S., & Caudrey, D.J. (1986). Evaluation of rehabilitation services: The use of goal attainment scaling. International Rehabilitative Medicine, 5, 41-45.
Clement, P.W. (1996). Evaluation in private practice. Clinical Psychology: Science and Practice, 3, 146-159.
Cohen, L.H. (1980). Methodological prerequisites for psychotherapy outcome research. Knowledge: Creation, Diffusion, and Utilization, 2, 263-272.
Condon, K.M., & Lambert, M.J. (1994, June). Assessing clinical significance: Application to the State-Trait Anxiety Inventory. Paper presented at the annual meeting of the Society for Psychotherapy Research, York, England.
Cytrynbaum, S., Ginath, Y., Birdwell, T., & Brandt, L. (1979). Goal attainment scaling: A critical review. Evaluation Quarterly, 3, 5-40.
Derogatis, L.R., & Melisaratos, N. (1983). The Brief Symptom Inventory: An introductory report. Psychological Medicine, 13, 595-605.
Docherty, J.P., & Streeter, M.J. (1996). Measuring outcomes. In L.I. Sederer & B. Dickey (Eds.), Outcome assessment in clinical practice (pp. 8-18). Baltimore: Williams & Wilkins.
Farrell, A.D., Curran, J.P., Zwick, W.R., & Monti, P.M. (1983). Generalizability and discriminant validity of anxiety and social skills ratings in two populations. Behavioral Assessment, 6, 1-14.
Fleuridas, C., Rosenthal, D.M., Leigh, G.K., & Leigh, T.E. (1990). Family goal recording: An adaptation of goal attainment scaling for enhancing family therapy and assessment. Journal of Marital and Family Therapy, 16, 389-406.
Flowers, J.V., & Booarem, C.D. (1990). Four studies toward an empirical foundation for group therapy. Journal of Social Service Research, 13, 105-121.
Follette, W.C., & Callaghan, G.M. (1996). The importance of the principle of clinical significance - defining significant to whom and for what purpose: A response to Tingey, Lambert, Burlingame, and Hansen. Psychotherapy Research, 6, 133-143.
Forsyth, R.P., & Fairweather, G.W. (1961). Psychotherapeutic and other hospital treatment criteria: The dilemma. Journal of Abnormal and Social Psychology, 62, 598-604.
Froyd, J.E., Lambert, M.J., & Froyd, J.D. (1996). A review of practices of psychotherapy outcome measurement. Journal of Mental Health, 5, 11-15.
Garfield, S.L. (1996). Some problems associated with validated forms of psychotherapy. Clinical Psychology: Science and Practice, 3, 245-250.
Garfield, S.L. (1991). Psychotherapy models and outcome research. American Psychologist, 46, 1350-1351.
Garfield, S.L., Prager, R.A., & Bergin, A.E. (1971). Evaluation of outcome in psychotherapy. Journal of Consulting and Clinical Psychology, 37, 307-313.
Gerarty, R.D. (1996). The use of outcome assessment in managed care: Past, present, and future. In L.I. Sederer & B. Dickey (Eds.), Outcome assessment in clinical practice (pp. 129-138). Baltimore: Williams & Wilkins.
Gibson, R.L., Snyder, W.U., & Ray, W.S. (1955). A factors analysis of measures of change following client-centered psychotherapy. Journal of Counseling Psychology, 2, 83-90.
Glaister, B. (1982). Muscles relaxation training for fear reduction of patients with psychological problems: A review of controlled studies. Behavior Research and Therapy, 20, 493-504.
Goldfried, M.R., Greenberg, L.S., & Marmar, C. (1990). Individual psychotherapy: Process and outcome. Annual Review of Psychology, 41, 659-688.
Green, B.C., Gleser, G.C., Stone, W.N., & Siefert, R.F. (1975). Relationships among diverse measures of psychotherapy outcome. Journal of Consulting and Clinical Psychiatry, 43, 689-699.
Grundy, C.T., Lambert, M.J., & Grundy, E. (1996). Assessing clinical significance: Application to the Hamilton Rating Scale for Depression. Journal of Mental Health, 5, 25-33.
Grundy, C.T., Lunnen, K.M., Lambert, M.J., Ashton, J.E., & Tovey, D.R. (1994). The Hamilton Rating Scale for Depression: One scale or many? Clinical Psychology: Science and Practice, 1, 197-205.
Grundy, E.M., & Lambert, M.J. (1994, June). Assessing clinical significance: Application to the Child Behavior Checklist. Paper presented at the annual meeting of the Society for Psychotherapy Research, York, England.
Hall, M.C., Elliot, K.M., & Stiles, G.W. (1992). Hospital patient satisfaction: Correlates, dimensionality, and determinants. Journal of Hospital Marketing, 1, 77-90.
Hamilton, M. (1967). Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6, 278-296.
Herbert, L.D., & Mueser, K.T. (1991). The proof is in the pudding: A commentary on persons. American Psychologist, 46, 1347-1348.
Hollon, S.D., & Flick, S.N. (1988). On the meaning and methods of clinical significance. Behavior Assessment, 10, 197-206.
Horowitz, L.M., Strupp, H.H., Lambert, M.J., & Elkin, I.
(1997). Overview and summary of the core-battery conference. In H.H. Strupp, L.M. Horowitz, & M.J. Lambert (Eds.), Measuring patient changes in mood, anxiety, and personality disorders: Toward a core battery. (pp. 11-54). Washington, DC: American Psychological Association. Howard, K.I., Lueger, R.J., Maling, M.S., & Martinovich, Z. (1993). A phase model of psychotherapy outcome: Causal medication of change. Journal of Clinical and Consulting Psychology, 61, 678-685. Hsieh, M.O., & Kagle, J.D. (1991). Understanding patient satisfaction and dissatisfaction with health care. Health Social Work, 16, 281-290. Jacobson, N.S. (1988). Defining clinically significant change: An introduction. Behavioral Assessment, 10, 131132. Jacobson, N.S., Follette, W.C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336-352. Jacobson, N.S., Follette, W.C., & Revenstorf, D. (1986). Toward a standard definition of clinically significant change. Behavior Therapy, 17, 308-311. Jacobson, N.S., Follette, W.C., & Revenstorf, D., Baucom, D.H., Hahlweg, K., & Margolin, G. (1984). Variability in outcome and clinical significance of behavioral marital therapy: A reanalysis of outcome data. Journal of Consulting and Clinical Psychology, 52, 497-504. Jacobson, N.S., & Revenstorf, D. (1988). Statistics for assessing the clinical significance of psychotherapy techniques: Issues, problems, and new developments. Behavioral Assessment, 10, 133-145. Jacobson, N.S., & Truax, P. (1991). Clinical Significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19. Kazdin, A.E. (1977). Assessing the clinical or applied importance of behavior change through social validation. Behavior Modification, 1, 427-452. Kendall, P.C., Hollon, S., Beck, A.T., Hammen, C., & Ingram, R.E. (1987). 
Issues and recommendations regarding use of the Beck

< previous page

page_147

next page >

< previous page

page_148

next page > Page 148

Depression Inventory. Cognitive Therapy and Research, 11, 289-300. Kendall, P.C., & Grove, W.M. (1988). Normative comparisons in therapy outcome. Behavioral Assessment, 10, 147-158. Kiresuk, T.J., & Sherman, R.E. (1968). Goal attainment scaling: A general method for evaluating comprehensive community mental health programs. Community Mental Health Journal, 4, 443-453. Lacks, P., & Powlishta, K. (1989). Improvement following behavioral treatment for insomnia: Clinical significance, long-term maintenance, and predictors of outcome. Behavior Therapy, 20, 117-134. Lambert, M.J. (1983). Introduction to assessment of psychotherapy outcome: Historical perspective and current issues. In M.J. Lambert, E.R. Christensen, & S.S. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 3-32). New York: Wiley Interscience. Lambert, M.J. (1998). Manual-based treatment and clinical practice: Hangman of life or promising development? Clinical Psychology: Science and Practice, 5, 391-395. Lambert, M.J., & Bergin, A.E. (1994). The effectiveness of psychotherapy. In A.E. Bergin & S.L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 143-189). New York: Wiley. Lambert, M.J., & Brown, G.S. (1996). Data-based management for tracking outcome in private practice. Clinical Psychology: Science and Practice, 3, 172-178. Lambert, M.J., Christensen, E.R., & DeJulio, S.S. (Eds.). (1983). The assessment of psychotherapy outcome. New York: Wiley. Lambert, M.J., Hatch, D.R., Kingston, M.D., & Edwards, B.C. (1986). Zung, Beck, and Hamilton rating scales as measures of treatment outcome: A meta-analytic comparison. Journal of Consulting and Clinical Psychology, 54, 54-59. Lambert, M.J., & Hill, C.E. (1994). Assessing psychotherapy outcomes and processes. In A.E. Bergin & S.L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 72-113). New York: Wiley. Lambert, M.J., Ogles, B.M., & Masters, K.S. (1992). 
Choosing outcome assessment devices: An organizational and conceptual scheme. Journal of Counseling and Development, 70, 527-532. Lambert, M.J., & McRoberts, C.H. (1993, April). Outcome measurement in JCCP: 1986-1991. Paper presented at the meetings of the Western Psychological Association, Phoenix, AZ. Lambert, M.J., & Okiishi, J.C. (1997). The effects of the individual psychotherapist and implications for future research. Clinical Psychology: Science and Practice, 4, 66-75. Lambert, M.J., Shapiro, D.A., & Bergin, A.E. (1986). The effectiveness of psychotherapy. In S.L. Garfield & A.E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 157-212). New York: Wiley. Lewis, A.B., Spencer, J.H., Haas, G.L., & DiVittis, A. (1987). Goal attainment scaling: Relevance and replicability in follow-up of inpatients. Journal of Nervous and Mental Disease, 175, 408-418. Locke, H.J., & Wallace, K.M. (1959). Short-term marital adjustment and prediction tests: Their reliability and validity. Marriage and Family Living, 21, 251-255. Luborsky, L. (1971). Perennial mystery of poor agreement among criteria for psychotherapy outcome. Journal of Consulting and Clinical Psychology, 37, 316-319. Maher, C.A., & Barbrack, C.R. (1984). Evaluating the individual counseling of conduct problem adolescents: The goal attainment scaling method. Journal of School Psychology, 22, 285-297. Martinovich, Z., Saunders, S., & Howard, K.I. (1996). Some comments on "assessing clinical significance." Psychotherapy Research, 6, 124-132. Mavissakalian, M. (1986). Clinically significant improvement in agoraphobia research. Behavior Research and Therapy, 24, 369-370. McKenzie, K.R. (1995). Effective use of group therapy in managed care. Washington, DC: American Psychiatric Press. McLellan, A.T., & Durell, J. (1996). Outcome evaluation in psychiatric and substance abuse treatments: Concepts, rationale, and methods. In L.J. Sederer & B. 
Dickey (Eds.), Outcome assessment in clinical practice (pp. 34-44). Baltimore: Williams & Wilkins. Melson, S.J. (1995). Brief day treatment for nonpsychotic patients. In K.R. McKenzie (Ed.), Effective use of group therapy in managed care (pp. 113-128). Washington, DC: American Psychiatric Press.

< previous page

page_149

next page > Page 149

Messer, S.B. (1991). The case formulation approach: Issues of reliability and validity. American Psychologist, 46, 1348-1350. Miller, R.C., & Berman, J.S. (1983). The efficacy of cognitive behavior therapies: A quantitative review of the research evidence. Psychological Bulletin, 94, 39-53. Mintz, J., & Kiesler, D.J. (1982). Individualized measures of psychotherapy outcome. In P.C. Kendall & J.N. Butcher (Eds.), Handbook of research methods in clinical psychology (pp. 491-534). New York: Wiley. Mintz, J., Luborsky, L., & Christoph, P. (1979). Measuring the outcomes of psychotherapy: Findings of the Penn Psychotherapy Project. Journal of Consulting and Clinical Psychology, 47, 319-334. Mintz, J., Mintz, L.I., Arvuda, M.J., & Huang, S.S. (1993). Treatments of depression and the functional capacity to work. Archives of General Psychiatry, 49, 761-768. Monti, P.M., Wallander, J.L., Ahern, D.K., Abrams, D.B., & Munroe, S.M. (1983). Multi-modal measurement of anxiety and social skills in a behavioral role-play test. Generalizability and discriminant validity. Behavioral Assessment, 6, 15-25. Mylar, J.L., & Clement, P.W. (1972). Prediction and comparison of outcome in systematic desensitization and implosion. Behavior Research and Therapy, 10, 235-246. Nietzel, M.T., Russell, R.L., Hemmings, K.A., & Gretter, M.L. (1987). Clinical significance of psychotherapy for unipolar depression: A meta-analytic approach to social comparison. Journal of Consulting and Clinical Psychology, 55, 156-161. Ogles, B.M., Lambert, M.J., & Masters, K.S. (1996). Assessing outcome in clinical practice. Boston: Allyn & Bacon. Ogles, B.M., Lambert, M.J., & Sawyer, M.D. (1995). The clinical significance of the NIMH treatment of depression collaborative research program data. Journal of Consulting and Clinical Psychology, 63, 317-325. Ogles, B.M., Lambert, M.J., Weight, D.G., & Payne, I.R. (1990). Agoraphobia outcome measurement: A review and meta-analysis. 
Psychological Assessment: A Journal of Consulting and Clinical Psychology, 2, 317-325. Pautler, T. (1991). A cost effective mind-body approach to psychosomatic disorders. In K.N. Anchor (Ed.), The handbook of medical psychotherapy: Cost effective strategies in mental health (pp. 231-248). Toronto: Hogrefe & Huber. Perry, G., Shapiro, D.A., & Firth, J. (1986). The case of the anxious executive: A study from the research clinic. British Journal of Medical Psychology, 59, 221-233. Persons, J.B. (1991). Psychotherapy outcome studies do not accurately represent current models of psychotherapy: A proposed remedy. American Psychologist, 46, 99-106. Pilkonis, P.A., Imber, S.D., Lewis, P., & Rubinsky, P. (1984). A comparative outcome study of individual, group, and conjoint psychotherapy. Archives of General Psychiatry, 41, 431-437. Polkinghorne, D.E. (1991). Two conflicting calls for methodological reform. The Consulting Psychologist, 19, 103-114. Ross, S.M., & Proctor, S. (1973). Frequency and duration of hierarchy item exposure in a systematic desensitization analogue. Behavior Research and Therapy, 11, 303-312. Russell, C.S., Olson, D.H., Sprenkle, D.H., & Atilano, R.B. (1983). From family system to family system: Review of family therapy research. American Journal of Family Therapy, 11, 3-14. Sabalis, R.F. (1983). Assessing outcome in patients with sexual dysfunctions and sexual deviations. In M.J. Lambert, E.R. Christensen, & S.S. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 205-262). New York: Wiley. Saunders, S.M., Howard, K.I., & Newman, F. L. (1988). Evaluating the clinical significance of treatment effects: Norm and normality. Behavioral Assessment, 10, 207-218. Schacht, T.E. (1991). Formulation-based psychotherapy research: Some further considerations. American Psychologist, 46, 1346-1347. Schmaling, K.B., & Jacobson, N.S. (1987, November). 
The clinical significance of treat ment gains resulting from parent-training interventions for children with conduct problems: A reanalysis of outcome data. Paper presented at the annual meeting of the Association for the Advancement of Behavior Therapy, Boston. Schulte, D. (1995). How treatment success could be assessed. Psychotherapy Research, 5, 281-296.

< previous page

page_149

next page >

< previous page

page_150

next page > Page 150

Scott, M.J., & Stradling, S.G. (1990). Group cognitive therapy for depression produces clinically significant reliable change in community-based settings. Behavioral Psychotherapy, 18, 1-19. Seggar, L., & Lambert, M.J. (1994, June). Assessing clinical significance: Application to the Beck Depression Inventory. Paper presented at the annual meeting of the Society for Psychotherapy Research, York, England. Seligman, M.E.P. (1995). The effectiveness of psychotherapy: The Consumer Reports study. American Psychologist, 50, 965-974. Shapiro, D.A., & Shapiro, D. (1982). Meta-analysis of comparative therapy outcome studies: A replication and refinement. Psychological Bulletin, 92, 581-604. Shore, M.F., Massimo, J. L., & Ricks, D.F. (1965). A factor analytic study of psychotherapeutic change in delinquent boys. Journal of Clinical Psychology, 21, 208-212. Silverman, W.K. (1991). Person's description of psychotherapy outcome studies does not accurately represent psychotherapy outcome studies. American Psychologist, 46, 1351-1352. Smith, M.L., Glass, G.V., & Miller, T. (1980). The benefits of psychotherapy. Baltimore: Johns Hopkins University Press. Spielberger, C.D., Gorsuch, R.L., Lushene, P. R., & Jacobs, G.A. (1983). Manual for the State-Trait Anxiety Inventory (Form Y). Palo Alto, CA: Consulting Psychologists Press. Strupp, H.H., & Hadley, S.W. (1977). A tripartite model of mental health and therapeutic outcomes: With special reference to negative effects in psychotherapy. American Psychologist, 32, 187-196. Strupp, H.H., Schacht, T.E., & Henry, W.P. (1988). Problem-treatment-outcome congruence: A principle whose time has come. In H. Dahl, H. Kaechele, & H. Thomas (Eds.), Psychoanalytic process research strategies (pp. 1-14). Berlin: Springer. Strupp, H.H., Horowitz, L.M., & Lambert, M.J. (1997). Measuring patient changes in mood, anxiety, and personality disorders: Toward a core battery. Washington, DC: American Psychological Association. 
Task Force in Promotion and Dissemination of Psychological Procedures (1996). An update on empirically validated therapies. The Clinical Psychologist, 49, 5-22. Texidor, M. & Taylor, C. (1991). Chronic pain management: The interdisciplinary approach and cost effectiveness. In K.N. Anchor (Ed.), The handbook of medical psychotherapy: Cost effective strategies in mental health (pp. 89-100). Toronto: Hogrefe & Huber. Tingey, R.C. (1989). Assessing clinical significance: Extension in methods and application to the SCL-90 R. Dissertation Abstracts International, 50, 04B. Tingey, R.C., Lambert, M.J., Burlingame, G. M., & Hansen, N.B. (1996a). Assessing clinical significance: proposed extensions to method. Psychology Research, 6, 109-123. Tingey, R.C., Lambert, M.J., Burlingame, G. M., & Hansen, N.B. (1996b). Clinically significant change: Practical indicators for evaluating psychotherapy outcome. Psychotherapy Research, 6, 144-153. Trull, T.J., Nietzel, M.T., & Main, A. (1988). The use of meta-analysis to assess the clinical significance of behavior therapy for agoraphobia. Behavior Therapy, 19, 527-538. Wampold, B.E., & Jenson, W.R. (1986). Clinical significance revisited. Behavioral Therapy, 17, 302-305. Waskow, I.E., & Parloff, M.B. (1975). Psychotherapy change measures. Rockville, MD: National Institute of Mental Health. Waxman, H.M. (1996). Using outcomes assessment for quality improvement. In L.J. Sederer & B. Dickey (Eds.), Outcome assessment in clinical practice (pp. 25-33). Baltimore: Williams & Wilkins. Wells, E.A., Hawkins, J.D., & Catalano, R.F. (1988). Choosing drug use measures for treatment outcome studies: 1. The influence of measurement approach on treatment results. International Journal of Addictions, 23, 851-873. Wells, K.B., Strum, R., Sherbourne, C.D., & Meredith, L.A. (1996). Caring for depression: A Rand study. Cambridge, MA: Harvard University Press. Williams, S.L. (1985). On the nature and measurement of agoraphobia. 
Progress in Behavior Modification, 19, 109-144. Wilson, G.T. (in press). Manual-based treatment and clinical practice. Clinical Psychology: Science and Practice, 5. Wilson, G.T., & Thomas, M.G. (1973). Self versus drug-produced relaxation and the effects

< previous page

page_151

next page > Page 151

of instructional set in standardized systematic desensitization. Behavior Research and Therapy, 11, 279-288. Wolf, M.M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 203-214. Worchel, P., & Byrne, D. (Eds.) (1964). Personality change. New York: Wiley. Woodward, C.A., Santa-Barbara, J., Levin, S., & Epstein, N.B. (1978). The roles of goal attainment scaling in evaluating family therapy outcome. American Journal of Orthopsychiatry, 48, 464.

< previous page

page_151

next page >

< previous page

page_xi

next page > Page xi

For Abby, Katie, and Shelby


Chapter 5
Guidelines for Selecting Psychological Instruments for Treatment Planning and Outcome Assessment

Frederick L. Newman
Florida International University

James A. Ciarlo
University of Denver

Daniel Carpenter
Merit Behavioral Care Corporation
Cornell University College of Medicine

Envision the situation if oncological medicine were forced to base treatment decisions on just diagnosis and cost containment rather than on clinical status and outcome: "Mr. Smith, 90% of the tumor is now benign. Only 10% of the tumor remains malignant. Unfortunately you have used up the 20 sessions of radiation treatment allowed under your managed care plan for this year. Please come back next year when your insurance eligibility has been renewed" (Newman & Carpenter, 1997, p. 1040).

Few would argue against employing health status, biological, or behavioral criteria when setting eligibility and level of care requirements for oncological medicine (or for the delivery of a baby, or for most nonelective surgery). The same logical arguments can and should be offered to support the delivery of mental health services. There are psychological assessment techniques that can be used to provide valid evidence that such criteria have been met. But how does one decide which instrument is most suitable to the circumstances? This chapter provides guidelines that can be used to select one or more instruments that are most suitable to the population being served and the treatment goals of the service(s). The guidelines can also be used to evaluate the appropriateness of instruments that are currently in use or proposed.

One theme of the chapter is that the guidelines must be understood within the current demands on clinical practice and the delivery of mental health services. One contextual constraint is the effort to contain costs through managed care. The good news is that psychological assessment instruments are available that can be applied in managed care settings to determine eligibility and level of care (see reviews by Howard, Moras, Brill, Martinovich, & Lutz, 1996; Newman & Tejeda, 1996). Additional good news is that many employers and behavioral health insurers now appear to understand that simple cost containment for one episode of care could lead to greater long-term expense. This is particularly true for the growing number of behavioral managed care programs that serve persons with severe and persistent mental illnesses. These insurers
are seeking valid procedures to address three basic questions: Should the illness be treated? What interventions are needed and by whom, and where should they be delivered? What are the outcome criteria? Or, more generally, who is eligible for what level of care? Level of care can be described as the amount and type of clinical or support resources that ought to be allocated to achieve a satisfactory outcome.

Many researchers are actively involved in addressing these questions. The Internet bulletin boards serving mental health services researchers (e.g., OUTCMTEN) have active interchanges about which instruments are appropriate under what circumstances. When arguing for a particular level of care, practitioners find little evidence in the research literature to guide their decisions (Newman & Tejeda, 1996). Traditional clinical research designs fix the treatment dosage level and run a horse race between experimental and control conditions or among several alternative treatments to determine which achieves the best outcomes with that dosage. Yet, in practice, the clinician works with the patient to achieve an agreed-on level of functioning or reduction in symptom distress, or both. There is a need to modify research strategies on mental health services effectiveness so that it is possible to address such questions as: What type and amount of treatment will achieve a given behavioral criterion for XX% of the patients who meet the entry level of functioning?

To be effective (and cost-effective) in the selection of an instrument or instruments, an additional series of questions must systematically be addressed:

1. What psychological and community functioning domains do clinicians wish to assess for this patient or this group of patients?
2. What are the behaviors that clinicians expect to impact?
3. What clinical or program decisions will be supported by an assessment of the person's psychological state or functional status?
4. What is the most cost-effective means for performing these assessments?

Eleven guidelines are offered for instrument selection. The guidelines were originally developed by a panel of experts assembled by the National Institute of Mental Health (Ciarlo, Brown, Edwards, Kiresuk, & Newman, 1986)¹ and were more recently updated to consider the potential impact on managed care (Newman & Carpenter, 1997). This chapter provides additional updates to the guidelines in terms of two demands on the clinical community: managed care and consumer choice. These are not, and should not be, independent. The assessment techniques used by managed care to determine eligibility, level of care, progress, and outcome should also be used as part of a delivery system report card to inform consumers and other purchasers of mental health services (Dewan & Carpenter, 1997; Mulkern, Leff, Green, & Newman, 1995; Newman, DeLiberty, Hodges, McGrew, & Tejeda, 1997). Consumer groups are requesting that the report card go beyond satisfaction with the manner in which services are provided (although that is also important) to incorporate the quality and long-term effects of the services themselves. Thus, the proper selection of psychological assessment techniques is critical to managed care and to consumer choice.

The guidelines are summarized in Table 5.1 and are organized under five groupings: Applications of Measures, Methods and Procedures, Psychometric Features, Cost Considerations, and Utility Considerations. It should be obvious that the guidelines are not independent of each other. Yet, each focuses on unique concerns that will help readers consider the demands of their own situation, the literature, and the relation of that guideline to the other guidelines.

¹ Members of the expert panel were A. Broskowski, J.A. Ciarlo, G.B. Cox, H.H. Goldman, W.A. Hargreaves, I. Elkins, J. Mintz, F.L. Newman, and J.W. Zinober.

TABLE 5.1
Guidelines for the Development, Selection and/or Use of Progress-outcome Measures

Applications
1. Relevance to target group and independent of treatment provided, although sensitive to treatment-related changes.

Methods and procedures
2. Simple, teachable methods.
3. Use of measures with objective referents.
4. Use of multiple respondents.
5. More process-identifying outcome measures.

Psychometric features
6. Psychometric strength: Reliable, valid, sensitive to treatment-related change, and nonreactive.

Cost considerations
7. Low costs.

Utility considerations
8. Understanding by nonprofessional audiences.
9. Easy feedback and uncomplicated interpretation.
10. Useful in clinical services.
11. Compatibility with clinical theories and practices.

Guideline 1
Relevance to Target Group

What are the characteristics of a target group that require its own assessment approach?

An outcome measure or set of measures should be relevant and appropriate to the target group(s) whose treatment is being studied; that is, the most important and frequently observed symptoms, problems, goals, or other domains of change for the group(s) should be addressed by the measure(s). . . . Other factors being equal, use of a measure appropriate to a wider range of client groups is preferred. . . . Measures (should be) . . . independent of the type of treatment service provided are to be preferred. (Ciarlo et al., 1986, p. 26)

Common wisdom holds that treatment selection, and a person's probable response to treatment, should be based on both clinical and demographic characteristics (Beutler & Clarkin, 1990). A target group can be described as a cluster of persons with similar clinical-demographic characteristics who are expected to have a similar response to treatment. Beutler and Clarkin (1990) provided guidelines that follow from the clinical literature.
Combined use of needs assessment information from epidemiological surveys with expert panels has been employed in identifying target groups requiring similar systems of services for persons with a severe mental illness (Newman, Griffin, Black, & Page, 1989; Uehara, Smukler, & Newman, 1994). Another approach is to link the epidemiological data with historic levels of care or a combination of both (Uehara et al., 1994).

A second feature of a target group that must be considered is the set of personal characteristics known to influence how the information is collected. Differences in age, ethnicity (related to language and meaning), comorbidity with a physical illness or developmental disability, and past experiences can all influence the administration of a procedure. The instruments discussed in this text provide an excellent platform for selecting such measures. However, if the particular target group served by a given program or practice has qualities that differ markedly from those of the overall target group, then a more detailed review of the literature cited within the reference lists at the end of a chapter will be required. Texts such as this one, and Ciarlo et al. (1986), contain sections and reference lists that provide greater detail on the limits of the techniques and their potential use with other populations.

Guideline 2
Simple, Teachable Methods

Ciarlo et al. (1986) pointed out that this second guideline was readily agreed on by all of the panelists working on these guidelines, but the development of training manuals and methods for assuring the quality of instrument administration at that time was seen as weak. Since then, the development of computer-assisted administration of assessment techniques has enhanced the reliability and validity of implementation by standardizing the way in which queries are presented. The long-standing difficulty of bridging the ethnic and cultural differences between the clinician and the patient can also be eased by culturally sensitive selection of an assessment technique from a computerized menu of instruments. Even with the development of computer-assisted methods, the traditional guidelines for developing training materials and for controlling administrative quality must still be applied (see texts such as Nunnally & Bernstein, 1994; or Cronbach, 1970).

Self-report measures (e.g., SCL-90-R, BASIS-32, Beck Depression Inventory, MMPI-2) or measures completed by a significant other (e.g., the parents when using the Child Behavior Checklist; Achenbach & Edelbrock, 1983) that have survived scrutiny and are considered to have adequate psychometric quality usually have good instructions and administration manuals.
But if the recommended guidelines for administration are ignored, the effects on measure reliability and validity can be disastrous. For example, the instructions for most self-report instruments strongly recommend completion independent of guidance or advice from others, preferably in isolation. But this requirement has not always been adhered to adequately. It is possible that the use of computers to collect self-report information will also increase the fidelity of the data collection. This is one area where computer-assisted applications are particularly useful. Many people are accustomed to interacting with a machine that asks them questions, often of a quite personal nature (Locke et al., 1992; Navaline et al., 1994).

Measures completed by an independent clinical observer or by the treating clinician can be very useful, but often the instructions on the instrument's use, training, and quality control procedures are poorly developed. On the one hand, such measures seek to make use of the professional's trained observations. On the other hand, such scales tend to be more reactive to clinician judgment bias (Newman, 1983; Patterson & Sechrest, 1983). Procedures for surfacing judgment biases in a staff training format are discussed in Newman (1983) and detailed in Newman and Sorensen (1985).

When an assessment instrument is to be used as the basis for determining the level of need and reimbursement in a managed care environment, controls over training and use of a clinician rating assessment instrument are necessary to prevent improper use of the instrument. A good example of where such controls are currently employed is the Indiana Division of Mental Health's implementation of a managed care program, the Hoosier Assurance Plan (HAP). The HAP provides state funds to cover services to adults with a serious mental illness or a chronic addiction and to children and adolescents with a severe emotional disorder. Two key features of the plan are that the consumer is to have informed choice² of service provider, and that the level of reimbursement is determined by the level of need demonstrated by the array of factor scores on an assessment instrument (one instrument for children and adolescents, and another for adults). The assessment instruments employed by HAP underwent 2 years of extensive pilot testing to assure that the psychometric qualities met the standards set by the advisory committee (Newman et al., 1997).

To assure the integrity of the clinician's rating of the consumer, two controls have been implemented. First, all clinicians must be trained to an established criterion in the use of the instrument, with evidence of such achievement available for an audit. To support the local service program training efforts, each program is expected to have one or more clinical staff trained as trainers; the training-of-trainers program, funded by the state, includes training packets, training vignettes, and guidelines on how to conduct the staff training programs. Second, an independent audit team (staffed by registered nurses with specific training) conducts a review of a random sample of cases on site at each service provider. A report of the audits goes both to the service program and to the state office of mental health. At this time, the reports have led to changes in training and supervision practices at the local level, with no official actions by the state office of mental health; however, it is understood that repeated problems could lead to actions by that office.
Thus, both training in the use of the instrument and audits of the local program's application of training procedures and of the assessment instruments were seen as necessary controls. These controls were employed in the pilot work, and high levels of reliability and validity were observed (Newman et al., 1997).

Footnote 2: Informed choice is supported by an annual Hoosier Assurance Plan Provider Profile Report Card, which contains information on the performance of each service provider as measured by data from two sources. The first contains the results of telephone interviews of a random sample of consumers asking about the impact of the services on their functioning and quality of life. The sample is stratified by service program, and the interviews are conducted by the University of Indiana Survey Research Unit. The second source is the reported baselines and the 90-day changes in factor scores from the clinical assessments performed on all consumers covered by the Hoosier Assurance Plan (Indiana's managed care program). A copy of the Provider Profile Report Card, the adult instrument, and the training manual can be obtained from The Evaluation Center @ HSRI, 2336 Mass Avenue, Cambridge, MA 02140.

Guideline 3: Use of Measures with Objective Referents

An objective referent is one for which concrete examples are given for each level of a measure or at least at key points on the rating scale. A major asset of objective referents is the potential to develop reliable and usable norms for an instrument, which is particularly critical when applied to managed care eligibility and level of care decisions. One of the best examples of a scale with objective referents is the Child Adolescent Functional Assessment Scale (CAFAS; Hodges, 1996; Hodges & Gust, 1995). Examples of behaviors are provided at each of four levels of impairment for each of 35 categories of behavior. For example, under the most severe level of impairment for Unsafe/Potentially


Unsafe Behavior there are five examples of behaviors at this level, two of which are the following:

114: Dangerous behavior caused harm to household member.
117: Sexually assaulted/abused another household member, or attempted to (e.g., sibling).

Another approach involves the development of multiple items within a class of behaviors. The rater is provided one referent behavior in an item and then asked to identify either the behavior's frequency (e.g., x times in the last 24 hours, in the last week, or in the last 30 days), the similarity of the observed behavior to the referent behavior (e.g., most like to least like the referent behavior), or the intensity of the referent behavior (e.g., from not evident to mild to severe). Which approach is best suited to each particular program is an empirical issue (see Guidelines 6-11).

Clinicians often proclaim the attractiveness of instruments that are individualized to the client. The most attractive features of these instruments are that the measures can be linked more directly to the consumer's own behaviors and life situation, and that treatment selection, course, and outcome can be individualized to the consumer. In fact, a consistent finding in the literature is that when the client and the clinician have agreed on the problems and goals, there is a significant improvement in outcome (Mintz & Kiesler, 1981). Measures of this sort include target complaints (severity of), goal-attainment scaling, problem-oriented records, and global improvement ratings. The major problem with such measures involves the issue of generalizability. Specifically, is the change in the severity of one person's complaint-problem comparable to a like degree of change in another person's complaint-problem?
Although the issue of generalizability plagues all measures, without objective referents the distribution of outcomes becomes free-floating across settings or clinical groups (Cytrynbaum, Ginath, Birdwell, & Brandt, 1979), thereby limiting the utility of the measures. There are arguments on the other side of this issue, but mostly when data aggregation is involved. Several meta-analytic studies, where effect size is standardized, have been very informative without specifically identifying the behaviors that have been modified (e.g., Lipsey & Wilson, 1993). Howard, Kopta, Krause, and Orlinsky (1986) studied the relation of "dosage" (number of visits) to outcome across studies where the measure of outcome was simply whether improvement was observed. But data aggregation methods may be useful only for addressing research and policy questions; they do not satisfy the need of a clinician or a clinical service to communicate with a patient or an insurer about an individual patient's eligibility for care.

Local conditions, including statewide funding practices or community standards of "normal functioning," will transform the distributions of any measure, with or without objective referents (Newman, 1980). Thus, studies that identify local "norms" should become standard practice for any measure intended to set funding guidelines, to set standards for treatment review, or to conduct evaluation research (Newman, 1980; Newman, Kopta, McGovern, Howard, & McNeilly, 1988). Individualized problem identification and goal setting do have beneficial outcomes, so it may be possible to have the best of both worlds. This can be accomplished by using both an individualized instrument and an instrument with national norms that has objective referents. Sechrest (personal communication, November 1993) noted that using both individualized instruments and instruments with national norms can satisfy two demands.
One demand is that of identifying the individual characteristics of the patient in terms that are most useful in the local situation. The


other is to relate these characteristics to the patient's performance on a standardized instrument. In both cases, the other guidelines that support the reliable and valid application of either assessment technique also must be applied.

Guideline 4: Use of Multiple Respondents

A number of theorists and researchers have noted that measures from the principal stakeholders (client, therapist, significant other collateral, research evaluator) should be obtained because each views the process and outcomes of treatment differently (Ciarlo et al., 1986; Ellsworth, 1975; Lambert, Christensen, & DeJulio, 1983; Strupp & Hadley, 1977). The importance of this guideline varies by target group and by the clinical situation involved. For example, a second informant can be helpful when assessing behaviors that are socially undesirable or about which someone might generally be guarded, reticent, or simply unaware. In the assessment of children, Achenbach and Edelbrock (1983) considered the parents of psychologically troubled children as primary observers, whereas teachers are considered secondary. Similar issues are being addressed in the development of assessment scales for the elderly, where the adult children of the frail elderly would be considered major stakeholders whose assessments are considered primary over self-reports (R.A. Kane & R.L. Kane, 1981; Lawton & Teresi, 1994; Mangen & Peterson, 1984). A number of researchers have contrasted the views of the four major respondents: client, treating clinician, significant other, and independent clinical observer.
Turner, McGovern, and Sandrock (1982) found that there is a high level of agreement across different scales originally designed specifically for use by one of the respondent groups, as evidenced through high canonical correlations across respondents (e.g., the SCL-90-R for clients, the Colorado Clinical Rating Scale and the Global Assessment Scale for clinicians, and the Personal Adjustment and Role Skills scale for significant others) when instructions were modified to fit each of the respondents. High coefficients were obtained when observers described specific behaviors (where the scale had objective referents), and lower coefficients were obtained when observers described how another person felt (e.g., she/he felt "happy" or "sad"). The major advantages achieved by obtaining measures from multiple observers are: (a) Each observer's experiences result in a unique view of the client (although Turner et al., 1982, suggested that these views can be highly similar). (b) Concurrent validation of the client's behavioral status and changes can be obtained. (c) Responses are likely to be more honest if all of the respondents are aware that there are multiple respondents. And, (d) discrepancies between informants can enlighten the clinician to potential problem areas to be addressed in treatment (e.g., client: "I am sleeping ok!"; spouse: "He paces all night long!"). A major disadvantage of using multiple sources is higher costs, particularly in terms of the time and effort of data collection and analysis. There is also the added logistical problem of attempting to collect the functional status data from multiple respondents at the same time, such that the same states are being observed. The time and effort costs are becoming more manageable with the use of computer-assisted testing and scoring procedures; however, the additional costs of hardware and software must be considered.
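Advantage (d) above, using discrepancies between informants to point the clinician toward problem areas, can be illustrated with a minimal Python sketch. The item names, the 0 to 4 rating scale, and the two-point threshold are hypothetical choices made only for illustration; they are not taken from any of the instruments cited in this chapter.

```python
from typing import Dict, List


def flag_discrepancies(
    client: Dict[str, int],
    collateral: Dict[str, int],
    threshold: int = 2,
) -> List[str]:
    """Return items on which two informants' ratings differ by at least `threshold`.

    Assumption: both informants rate the same items on the same numeric
    scale (here, 0 = no problem ... 4 = severe problem).
    """
    shared_items = sorted(set(client) & set(collateral))
    return [
        item
        for item in shared_items
        if abs(client[item] - collateral[item]) >= threshold
    ]


# Hypothetical ratings echoing the sleep example in the text.
client_ratings = {"sleep": 1, "mood": 3, "work": 2}
spouse_ratings = {"sleep": 4, "mood": 3, "work": 3}

flagged = flag_discrepancies(client_ratings, spouse_ratings)
print(flagged)  # ['sleep']
```

A flagged item is only a prompt for clinical follow-up; the sketch says nothing about which informant is closer to the truth.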


Guideline 5: More Process-Identifying Outcome Measures

Measure(s) that provide information regarding the means or processes by which treatments may produce positive effects are preferred to those that do not (Ciarlo et al., 1986, p. 28).

The basic concept here is, at the least, controversial. On one side of the issue, Orlinsky and Howard (1986) argued that there ought to be a relation between process and outcomes. Behavioral and cognitive behavioral treatments employing self-management, homework assignments, and self-help group feedback often use measures with objective behavioral referents as both process and outcome measures. On the other side, Stiles and Shapiro (1995) argued that the most important interpersonal and relationship ingredients (processes) that occur during psychotherapy (and possibly during other psychosocial intervention sessions as well) are not expected to correlate with outcome. Adequate empirical support for either side of the argument is still lacking, and the different sides of the argument appear to be theory related (Newman, 1995).

It is probably best to consider this guideline in terms of measuring treatment progress or attainment of intermediate goals of the treatment plan. Steady progress toward these goals ought to be an integral part of the conversation between patient and clinician, and it is a key concern of the managed care companies. Howard et al. (1996) described how clinicians can map the individual client's progress on a standardized measure relative to the progress expected from a database that included a large sample of consumers who had similar clinical characteristics at the start of their therapeutic episode. (Details on this procedure are described later by Newman and Dakof in chap. 7.) The approach proposed by Howard et al. (1996) is particularly attractive to managed care companies because it is possible to describe empirically, and in a timely fashion, when a consumer's progress is or is not satisfactory.
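The general idea of comparing a client's observed scores with an expected course can be sketched in a few lines of Python. This is not the procedure of Howard et al. (1996): the expected values, the tolerance band, and the assumption that lower scores mean less distress are all hypothetical placeholders standing in for values that would come from a reference database.

```python
from typing import List


def progress_status(
    observed: List[float],
    expected: List[float],
    tolerance: float,
) -> List[str]:
    """Label each assessment point as 'on track' or 'review'.

    Assumptions (hypothetical): lower scores mean less distress, and a
    client is 'on track' while the observed score is no more than
    `tolerance` points above the expected score for that session.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [
        "on track" if obs <= exp + tolerance else "review"
        for obs, exp in zip(observed, expected)
    ]


# Hypothetical distress scores at four successive assessment points.
expected_course = [60.0, 55.0, 50.0, 46.0]
observed_course = [61.0, 58.0, 57.0, 56.0]

status = progress_status(observed_course, expected_course, tolerance=5.0)
print(status)  # ['on track', 'on track', 'review', 'review']
```

In this made-up example the client falls outside the band at the third assessment, which is the point at which a review of the treatment strategy would be prompted.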
If the observed progress is satisfactory, continue treatment; if it is not, it may be advisable to shift the treatment strategy. A strong argument can be made that behavioral "markers" of progress or risk level should be taken regularly during the course of treatment to decide if a change in therapeutic strategy should be considered. Examples of behavioral markers include signs or levels of suicidality, depression, anxiety, substance abuse, interpersonal functioning, or community functioning (Lambert & Lambert, chap. 5, this volume; Maruish, 1994). These "markers" do not necessarily describe the actual therapeutic process. Instead, they are global indicators describing whether the person is functioning adequately to consider continuing versus altering the planned treatment. Certainly, programs serving consumers with serious and persistent illnesses should adopt a strategy of regularly collecting such "progress" measures.

Guideline 6: Psychometric Strengths

The measure used should meet minimum criteria of psychometric adequacy, including: a) reliability (test-retest, internal consistency, or inter-rater agreement where appropriate); b) validity (content, concurrent, and construct validity); c) demonstrated sensitivity to treatment-related change; and d) freedom from response bias and non-reactivity (insensitivity) to extraneous situational factors that may exist (including physical settings, client expectation, staff behavior, and accountability pressures). The measure should be difficult to intentionally fake, either positively or negatively. (Ciarlo et al., 1986, p. 27)
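One of the reliability criteria named in the quotation, inter-rater agreement, can be made concrete with a short Python sketch. It uses Cohen's kappa, a standard chance-corrected agreement index for two raters assigning categories; the chapter does not prescribe this statistic, and the ratings shown are invented for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who each rate the same cases once."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical impairment ratings of six case vignettes by two clinicians.
clinician_1 = ["mild", "mild", "severe", "severe", "mild", "severe"]
clinician_2 = ["mild", "mild", "severe", "mild", "mild", "severe"]

kappa = cohen_kappa(clinician_1, clinician_2)
print(round(kappa, 3))  # 0.667
```

Here the clinicians agree on five of six vignettes (0.833), chance agreement is 0.5, and kappa is therefore about 0.67. Ratings on ordered or continuous scales would usually call for a different index, such as a weighted kappa or an intraclass correlation.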


Two issues are discussed under the topic of psychometric features. First, it is important to use measures of high psychometric quality. Second, the psychometric quality of the local application of an instrument is related to the quality of services.

Measures of high psychometric quality are important. On the surface, no one should argue to lower the standards for an instrument's psychometric qualities. Yet the more reactive, less psychometrically rigorous global measures (e.g., global improvement ratings, global level of functioning ratings) tend to be more popular with upper level decision makers (e.g., program managers, legislators). Although it is possible to exert reasonable control over the application of these measures to assure psychometric quality (Newman, 1980), if such control is not enforced, then psychometric quality suffers (Green, Nguyen, & Attkisson, 1979).

The psychometric quality of an instrument as implemented in a local program is related to the program's quality of care. This is a bold, double-edged assertion. On the one hand, the selection and use of an instrument of poor psychometric quality could depreciate the quality of care because it is likely that the wrong information about a person would be transmitted. On the other hand, it is also possible that a service providing poor quality care can depreciate the psychometric quality of the assessment techniques. If local data collection produces low reliability and validity estimates, and there exists evidence that the instrument has been demonstrated to have adequate reliability and validity in another context, then one of two possibilities needs to be considered. It is possible that lower levels of reliability and validity could be simply a matter of shoddy data collection and management.
But, as a later argument suggests, if the psychometric quality of an instrument is lower in its local application than in its national norms, then the quality and effectiveness of the clinical services may also need to be questioned.

There are three conditions that, when satisfied, increase both the quality of care and the psychometric quality of assessment data. First, a clinical service should have clearly defined the target groups it can treat (see Guideline 1). Second, the service should have clearly defined progress and outcome goals for each target group in terms that are observable and measurable. Third, the leadership and staff of a psychological service should have identified one or more instruments whose interpretative language is useful to support clinical communication about patient status relative to service goals. When one or more instruments are selected within the context of the first two assumptions, then the instrument can support reliable and useful communication, which in turn should promote a high quality of care.

To illustrate the relation between the quality of care and the quality of an instrument as implemented, consider the issue of psychometric reliability as it might relate to program quality. If reliability of communication is low (between the client and therapist, or among two or more treatment staff), then it is likely that there is inconsistent communication or understanding about the client's psychological and functional status, the intention of the service or treatment, and/or the client's progress-outcome. If there is inconsistent communication or understanding regarding these aspects between client and therapist or among clinical staff, then a poor outcome would be the most likely result (Mintz & Kiesler, 1981). How does the use of standardized measures fit into the picture of increasing the accuracy of clinical communication? There are two points to be made.
First, careful selection of the progress-outcome measures must be preceded by, and based on, a clear statement of a program's purpose and goals. Second, the language describing the functional domains (i.e., factor structure) covered by the instruments represents an agreed-on vocabulary for staff to use when communicating with and about clients. If


the language of communication is related to the language of the instruments, then any inconsistency in use of the instruments would reflect an inconsistency in the communication with clients and among staff when discussing clients.3 Another consideration concerns an instrument's validity in its local application. If locally established estimates of instrument validity among services or within a service deviate from established norms, then the service staff's concept of "normal" needs to be studied. Classical examples of such differences are those found in estimates of community functioning between inpatient and outpatient staff (Newman, Heverly, Rosen, Kopta, & Bedell, 1983). Kopta, Newman, McGovern, and Sandrock (1986) found that when there were multiple frames of reference among clinicians of different theoretical orientations, there were different syntheses of the clinical material within a session and different intervention strategies and treatment plans proposed. McGovern et al. found that differences in attributions of problem causality and treatment outcome responsibility were related to judgments regarding the clinicians' choices of treatment strategies (McGovern, Newman, & Kopta, 1986). These differences in frames of reference influence (i.e., probably reduce) the estimates of concurrent validity of measures in use as well as measure interrater reliability. However, reduced coefficients of reliability and validity are not as serious an issue as the potential negative impact on services when purpose, language, and meaning lack clarity among service staff. A two-part recommendation should be considered. First, the leadership of a service program should implement or refine operations that satisfy the three assumptions identified earlier: Obtain a clear target group definition by service staff, provide operational definitions of treatment goals-objectives, and work toward selection of instruments whose structure and language reflects the first two assumptions. 
The program should also incorporate staff supervision and development procedures that will identify when and how differences in frames of reference and language meaning are occurring. Such staff development exercises can serve to document the level of measure reliability, and they can be conducted at relatively low cost (see Newman & Sorensen, 1985). The exercises contrast staff assessments and treatment plans for the same set of patient profiles. The patient profiles could be presented via taped interviews or via written vignettes. Green and Gracely (1987) found that a two-page profile was as effective as a taped interview (and a lot less costly) when estimating interrater reliability. Methods for constructing such profiles and analyzing the results are described in Newman and Sorensen (1985) and in Heverly, Fitt, and Newman (1984). The data from these exercises can also be used to assess the degree to which the local use of the instruments matches the national norms.

Footnote 3: Work with several colleagues has focused on both the methods and results of studies identifying factors influencing differences in clinicians' perceptions. The theoretical arguments and historical research basis for this line of work are discussed in Newman (1983). The procedures for conducting these studies as staff development sessions are detailed in Newman and Sorensen (1985) and in Heverly et al. (1984). Examples of studies on factors influencing clinical assessment and treatment decisions include Heverly et al. (1984); Kopta, Newman, McGovern, and Angle (1989); Kopta et al. (1986); McGovern et al. (1986); Newman et al. (1983); and Newman et al. (1988).

Guideline 7: Low Measure Costs Relative to Its Uses

How much should be spent on collecting, editing, storing, processing, and analyzing progress-outcome information? The answer to this question must be considered in terms of the five important functions that the data support: screening-treatment planning,


quality assurance, program evaluation, cost containment (utilization review), and revenue generation. Given these functions, a better question might be: What is the investment needed to assure a positive return on these functions?

There are several pressures on mental health (and physical health) services indicating that an investment in the use of progress-assessment instruments can be cost beneficial. The first is the requirement for an initial assessment to justify entry into services and development of a treatment plan for reimbursable clients. For persons with a serious and persistent mental illness, funded placement (e.g., by Medicaid in most states) in extended community services (waivered services) requires a diagnostic and functional assessment. Most third-party payers will reimburse judicious use of such activities and the affiliated resource costs if it can be shown that the testing is a cost-effective means of making screening (utilization review) decisions. Justification for continued care is also required by both public and private third-party payers. Again, the cost of the assessment can often be underwritten by the cost containment-quality assurance agreement with the third-party payer.

The second pressure is the emerging litigious culture that requires increasing levels of accountability for treatment interventions. The legal profession is divided on this issue. One view says that the less hard data a service program has, the less liability it would have for its actions. The credo here appears to be, "Do not put anything in writing unless required to do so by an authority that will assume responsibility." The other view says that a service program increases its liability if it does not have any hard evidence to justify its actions. There is little doubt that the former view has been the more popular one until recently.
With increased legal actions by consumer groups on the "right to treatment," there is likely to be an increased need for data that can justify the types and levels of treatment provided. A parallel force is exerted by the increased budgetary constraints by both private and public sources of revenues for mental health services. Pressures to enforce application of cost containment-utilization review guidelines appear to be far stronger than pressures for assuring quality of care. Although the literature has indicated the efficacy of many mental health interventions, empirical literature supporting the cost-effectiveness and cost benefits of these services still lags (Newman & Howard, 1986; Yates & Newman, 1980). When Ciarlo assembled the panel of experts for NIMH, a cost estimate of 0.5% of an agency's total budget was considered to be a fair estimate of affordable costs for collecting and processing progress-outcome data. This was to include the costs of test materials, training of personnel, as well as collecting and processing the data. This estimate was made at a time when the public laws governing the disbursements of Federal Block Grant funds required that 5% of the agency's budget go toward evaluations of needs and program effectiveness. There have been three notable changes in service delivery that have occurred since the time when the panel of experts met. One is that the Health Care Financing Agency (HCFA) and other third-party payers now require an assessment procedure that will identify and deflect those who do not require care, or that will identify the level of care required for clients applying for service. They do offer limited reimbursement for such assessment activities. The second change focuses on the use of assertive case management or continuous treatment team approaches for persons with a serious and persistent mental illness or who are substance abusers. Here the client tracking procedures can be part of the reimbursed overhead costs. 
The third and perhaps the most powerful impetus for change is the new National Committee on Quality Assurance (NCQA) standards for accrediting managed behavioral


health organizations. The standards require more clinical and quality studies than in the past. Managed care contractors can expect to face rising expectations for outcome data collection and analysis, auditing of medical records, and adherence to practice guidelines.

Assessment and client-tracking procedures are logically compatible activities. The requirement for initial and updating assessments to justify levels of care can be integrated with the client-tracking system requirement for case management or treatment team approaches. If a cost-effective technique for integrating the assessment and the client-tracking procedures is instituted, then the costs for testing become part of the costs of coordinating and providing services. It is possible that if the costs considered here were restricted to just the costs of purchasing the instrument and the capacity to process the instrument's data (and not the professionals' time), then the costs might not exceed the 0.5% estimate. Proper cost-estimation studies need to be done to provide an empirical basis for identifying the appropriate levels of costs.

Guideline 8: Understanding by Nonprofessional Audiences

The scoring procedures and presentation of the results should be understandable to stakeholders at all levels. These stakeholders include consumers and their significant others, third-party payers, as well as administrative and legislative policymakers at the local, state, and federal levels.

Consumers

The analysis and interpretation of the results should be understandable at the individual consumer level. Two lines of reasoning support this claim. The first is the increased belief in and legal support for the consumer's right to know about the assessment's results and the associated selection of treatment and services. An understandable descriptive profile of the client can be used in a therapeutically positive fashion. Examples for the client or family member's consideration might include the following:

1. Does the assessment score(s) indicate my need for, progress in, or success with treatment, or the need for continued treatment?
2. Does a view of my assessment score(s) over time describe how I functioned in the past relative to how I am doing now?
3. Does the assessment score(s) help me communicate how I feel or function to those who are trying to serve, treat, or assist me (including my family)?
4. Does the assessment help me understand what I can expect in the future?

Third Parties

A second aspect of this guideline is the advantage of being able to aggregate understandable test results over groups of consumers in order to communicate evaluation research results to influential stakeholders (e.g., regulators, third-party payers, legislators, employers, citizens, or consumer groups). This includes needs assessment for program and


budget planning (Newman et al., 1989; Uehara et al., 1994). It also includes evaluating program effectiveness and/or cost-effectiveness among service alternatives for policy analysis and decisions (Newman & Howard, 1986; Yates, 1996; Yates & Newman, 1980). The budget planners and the policy decision makers require easily understood data. They are often reluctant to rely solely on expert opinion to interpret the data, with some even preferring to do it themselves. Examples of questions that the data should ideally be able to address include: Do the scores show whether clients have improved functioning to a level where they either require less restrictive care or no longer require care? Do the measures assess and describe consumers' functioning in socially significant areas, for example, independent living, vocational productivity, and appropriate interpersonal and community behaviors? Would the measures permit comparisons of relative program effectiveness among similar programs that serve similar clients?

In summary, it is important to ensure not only that the test results are understandable to those at the front-line level (consumers, their families, and service staff) but also that the aggregate data are understandable to budget planners and policymakers.

Guideline 9: Easy Feedback and Uncomplicated Interpretation

The discussion under Guideline 8 is also relevant here, but here the focus is on presentation. Do the instrument and its scoring procedures provide reports that are easily interpreted? Does the report stand on its own without further explanations or training? For example, complex "look-up" tables are less desirable than a graphic display describing the characteristics of a client or a group of clients relative to a recognizable norm. Computerized scoring and profile printouts in both narrative and graphic form are becoming more common, which is to be commended. This trend reiterates the importance of Guideline 9.
Another recent development is the trend to develop report cards on mental health services for use by consumers, their families, or funding agencies (Dewan & Carpenter, 1997). In the past, such report cards focused on the types of consumers served and their level of satisfaction with a service. Recently, there has been a movement to provide report cards that describe the impact of the services on the consumers' quality of life and functioning (DeLiberty, 1998; Mulkern, Leff, Green, & Newman, 1995). The use of report cards is sufficiently new that systematic research on the quality and impact of instruments when used as part of a report card has yet to be done. Nevertheless, report cards of HMOs in general health care are substantial enough to reach wide distribution in the popular press, as was the case with the Wall Street Journal during 1997.

There are two important cautionary notes about the relation between what is communicated by a report and the actual underlying variables captured by the scale. First, the language of the presentation should not be so "user friendly" that it misrepresents the data. The language used to label figures and tables must be carefully developed such that the validity of the instrument's underlying constructs is not violated. A related problem arises when it is assumed that the language used in the report matches the language used by patients or family members in their effort to understand and cope with their distress. For example, an elevated SCL-90 Depression subscale score might not match the patient's experience of elevated depression. It is important not to allow the language of test results to mask issues that are clinically important, as well as important to the patient.

Guideline 10: Useful in Clinical Services

The assessment instrument(s) used should support the clinical processes of a service with minimum interference. An important selection guideline is whether the instrument's language, scoring, and presentation of results support clinical decisions and communication. Those who need to communicate with each other include not only the clinical and service staff working with the client, but also the clients and their collateral-significant others. These clinically relevant questions might be considered when discussing the instrument's utility: Will the test results describe the likelihood that the client needs services and will respond appropriately to available services? Do the test results help in planning the array and levels of services, treatments, and intervention styles that might best meet service goals? Do the test results provide sufficient justification for the planned treatment to be reimbursed by third-party payers? Is the client responding to treatment as planned, and if not, what areas of functioning are or are not responding as expected?

An ideal instrument meeting this guideline would be sufficiently supportive of these processes such that the effort required to collect and process the data would not be seen as a burden. The logic here is complementary to Guideline 7, that is, the measure should have low costs relative to uses in screening-treatment planning, quality assurance, cost control, and revenue generation. Here, however, the emphasis is on utilization of the measure's results. The more the instrument is seen as supporting these functions, the less expensive and interfering the instrument will be perceived to be by clinical staff.
Guideline 11: Compatibility with Clinical Theories and Practices

An instrument that is compatible with a variety of clinical theories and practices should have wider interest and acceptance by a broad range of clinicians and stakeholders than one based on only one concept of treatment improvement. The former would provide a basis for evaluative research by contrasting the relative effectiveness of different treatment approaches or strategies.

How does one evaluate the level of compatibility? A first step is to inquire about the context in which the instrument was developed and the samples used in developing norms. For example, if the normative sample were clients on inpatient units, then it would probably be too limited because inpatient care is now seen as the most restrictive and infrequently used level of a continuum of care. The broader the initial sampling population used in the measure's development, the more generalizable the instrument. Ideally, there should be available norms for both clinical and nonclinical populations. For example, if an instrument is intended for a population with a chronic physical disability (e.g., wheelchair bound), then for sampling purposes the definition of a normal functioning population might change to persons with the chronic physical disability who function well in the community (Saunders, Howard, & Newman, 1988).

Another indicator of measure compatibility is whether there is evidence that its use in treatment/service planning and review matches the research results published in refereed journals. This is especially important when the data are used to contrast the outcomes of two or more therapeutic (or service) interventions. In reviewing this type of research, look first at the types of clients served, the setting, and the type of diagnoses
and problems treated. Also note the differences in standard deviations among the groups in this literature. Evidence of compatibility would be indicated by similar (homogeneous) variations among the treatment groups. Homogeneity would indicate that errors of measurement (and/or individual differences and/or item difficulty) were not biased by the therapeutic intervention that was employed. One note of caution is needed here: it is possible for a measure to have homogeneity of variance within and across treatment groups and still lack equal sensitivity to the respective treatment effects. If a measure is not sensitive to treatment effects, its use as a progress or outcome assessment instrument is invalid. Methods for assessing these features are discussed in Newman and Tejeda (chap. 8, this volume) and go beyond the purposes of this chapter.

Conclusions

These guidelines are designed to support the evaluation of an assessment instrument and are not presented as firm rules of conduct. Few, if any, instruments can fully meet all the guidelines. It is expected, however, that if the guidelines are used as a means of drawing together available information on an instrument, they will decrease the number of unexpected or unpleasant surprises in the adaptation (or adoption) and use of a measure.

The application of the 11 guidelines has its own costs. Although a master's degree level of psychometric training is sufficient background to assemble the basic information on an instrument's ability to meet these guidelines, a full explication of the guidelines requires broader input. Some of the guidelines require clinical supervisors and managers to review clinical standards, program procedures, and policies. Other guidelines will require an interchange among clinical supervisory and fiscal management personnel in areas where they had little prior experience (e.g., the costs and worth of testing).
Recent experience of involving consumers and families of consumers as participants in the selection or modification of instruments has been extremely useful (Mulkern, Leff, Green, & Newman, 1995; Newman et al., 1997). For example, Teague and colleagues (Teague, 1995; Teague, Drake, & Ackerson, 1995) found that self-management skills were as important an outcome concern as level of symptom distress across all subgroups of consumers, family members, and clinical service providers. This finding was confirmed in a study in Indiana, where the advisory panel that guided the development of the assessment instruments (which included both professionals and consumer representatives) insisted that both self-management and level of problem severity be considered together in the assessment of each problem area (Newman et al., 1997). Therefore, the ultimate benefits to clients and stakeholders of applying these guidelines are well worth the costs, although there are no controlled studies on whether timely feedback of consumer assessment information will positively influence consumer, clinician, managerial, or policy decision making.

References

Achenbach, T.M., & Edelbrock, C.S. (1983). Manual for the Child Behavior Checklist and Revised Behavior Profile. Burlington, VT: Dept. of Psychology, University of Vermont.
Beutler, L.E., & Clarkin, J.F. (1990). Systematic treatment selection: Toward targeted therapeutic interventions. New York: Brunner/Mazel.
Ciarlo, J.A., Brown, T.R., Edwards, D.W., Kiresuk, T.J., & Newman, F.L. (1986). Assessing mental health treatment outcome measurement techniques (DHHS Publication No. ADM 86-1301). Washington, DC: Superintendent of Documents, U.S. Government Printing Office.
Cronbach, L.J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.
Cytrynbaum, S., Ginath, T., Birdwell, J., & Brandt, L. (1979). Goal attainment scaling: A critical review. Evaluation Quarterly, 3, 5-40.
DeLiberty, R. (1998). Developing a public mental health report card: The Hoosier assurance plan provider profile report card. Managed Care Quarterly, 6, 1-7.
Dewan, N., & Carpenter, D. (1997). Value account of health care services. In L.J. Dickstein, M.B. Riba, & J.M. Oldham (Series Eds.), & J.F. Clarkin & J. Docherty (Vol. Eds.), Psychologic/Biologic Testing: Issues for Psychiatrists of the Annual Review of Psychiatry (Vol. 16, pp. 81-101).
Ellsworth, R.B. (1975). Consumer feedback in measuring the effectiveness of mental health programs. In M. Guttentag & E.L. Struening (Eds.), Handbook of evaluation research (Vol. 2, pp. 239-224). Beverly Hills, CA: Sage.
Green, R.S., & Gracely, E.J. (1987). Selecting a rating scale for evaluating services to the chronically mentally ill. Community Mental Health Journal, 23, 91-102.
Green, R.S., Nguyen, T.D., & Attkisson, C.C. (1979). Harnessing the reliability of outcome measures. Evaluation and Program Planning, 2, 137-142.
Heverly, M.A., Fitt, D.X., & Newman, F.L. (1984). Constructing case vignettes for evaluating clinical judgement. Evaluation and Program Planning, 7, 45-55.
Hodges, K. (1996). Child and Adolescent Functional Assessment Scale (CAFAS): Miniscale version. 2140 Old Earhart Road, Ann Arbor, MI 48105.
Hodges, K., & Gust, J. (1995). Measures of impairment for children and adolescents. Journal of Mental Health Administration, 22, 403-413.
Howard, K.I., Kopta, S.M., Krause, M.S., & Orlinsky, D.E. (1986). The dose-effect relationship in psychotherapy. American Psychologist, 41, 159-164.
Howard, K.I., Moras, K., Brill, P., Martinovich, Z., & Lutz, W. (1996). Evaluation of psychotherapy: Efficacy, effectiveness, and patient progress. American Psychologist, 51, 1059-1064.
Kane, R.A., & Kane, R.L. (1981). Assessing the elderly: A practical guide to measurement. Lexington, MA: Lexington Books.
Kopta, A.M., Newman, F.L., McGovern, M.P., & Angle, R.S. (1989). The relationship between years of psychotherapy experience and conceptualizations, interventions, and treatment plan costs. Professional Psychology, 29, 59-61.
Kopta, S.M., Newman, F.L., McGovern, M.P., & Sandrock, D. (1986). Psychotherapeutic orientations: A comparison of conceptualizations, interventions and recommendations for a treatment plan. Journal of Consulting and Clinical Psychology, 54, 369-374.
Lambert, M., Christensen, E., & DeJulio, R. (Eds.). (1983). The assessment of psychotherapy outcome. New York: Wiley.
Lawton, M.P., & Teresi, J.A. (Eds.). (1994). Annual review of gerontology and geriatrics: Focus on assessment techniques. New York: Springer.
Lipsey, M., & Wilson, D. (1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48, 1181-1209.
Locke, S.E., Kowaloff, H.B., Hoff, R.G., Safran, C., Popovsky, M.A., Cotton, D.J., Finklestein, D.M., Page, P.L., & Slack, W.V. (1992). Computer based interview for screening blood donors for risk of HIV transmission. Journal of the American Medical Association, 264, 1301-1305.
Mangen, D.J., & Peterson, W.A. (1984). Health, program evaluation, and demography. Minneapolis: University of Minnesota.
Maruish, M. (Ed.). (1994). Use of psychological testing for treatment planning and outcome assessment. Mahwah, NJ: Lawrence Erlbaum Associates.
McGovern, M.P., Newman, F.L., & Kopta, S.M. (1986). Meta-theoretical assumptions and psychotherapy orientation: Clinician attributions of patients' problem causality and
responsibility for treatment outcome. Journal of Consulting and Clinical Psychology, 54, 476-481.
Mintz, J., & Kiesler, D.J. (1981). Individualized measure of psychotherapy outcome. In P. Kendall & J.N. Butcher (Eds.), Handbook of research methods in clinical psychology (pp. 491-534). New York: Wiley.
Mulkern, V., Leff, S., Green, R.S., & Newman, F.L. (1995). Section II: Performance indicators for a consumer-oriented mental health report card: Literature review and analysis. In V. Mulkern (Ed.), Stakeholders perspectives on mental health performance indicators. Cambridge, MA: The Evaluation Center @ HSRI.
Navaline, H.A., Snider, E.C., Christopher, J.P., Tobin, D., Metzger, D., Alterman, A.I., & Woody, G.E. (1994). Preparation for AIDS Vaccine Trials: An automated version of the Risk Assessment Battery (RAB): Enhancing the assessment of risk behaviors. AIDS Research and Human Retroviruses, 10, S281-283.
Newman, F.L. (1995). Disabuse of the drug metaphor in psychotherapy: An editorial dilemma. Journal of Consulting and Clinical Psychology, 62, 940-941.
Newman, F.L. (1983). Therapists' evaluations of psychotherapy. In M. Lambert, E. Christensen, & R. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 497-534). New York: Wiley.
Newman, F.L. (1980). Global scales: Strengths, uses and problems of global scales as an evaluation instrument. Evaluation and Program Planning, 3, 257-268.
Newman, F.L., & Carpenter, D. (1997). In L.J. Dickstein, M.B. Riba, & J.M. Oldham (Series Eds.), & J.F. Clarkin & J. Docherty (Vol. Eds.), Psychologic/Biologic Testing: Issues for Psychiatrists of the Annual Review of Psychiatry (Vol. 16, pp. 59-79).
Newman, F.L., DeLiberty, R., Hodges, K., McGrew, J., & Tejeda, M.J. (1997). The Indiana Hoosier Assurance Plan packet: A technical report. Copies can be obtained from Cambridge, MA: Evaluation Center @ HSRI.
Newman, F.L., & Tejeda, M.J. (1996). The need for research designed to support decisions in the delivery of mental health services. American Psychologist, 51, 1040-1049.
Newman, F.L., Heverly, M.A., Rosen, M., Kopta, S.M., & Bedell, R. (1983). Influences on internal evaluation data dependability: Clinicians as a source of variance. In A.J. Love (Ed.), Developing effective internal evaluation: New directions for program evaluation (No. 20). San Francisco: Jossey-Bass.
Newman, F.L., Griffin, B.P., Black, R.W., & Page, S.E. (1989). Linking level of care to level of need: Assessing the need for mental health care for nursing home residents. American Psychologist, 44, 1315-1324.
Newman, F.L., Fitt, D., & Heverly, M.A. (1987). Influences of patient, service program and clinician characteristics on judgments of functioning and treatment recommendations. Evaluation and Program Planning, 10, 260-267.
Newman, F.L., Kopta, S.M., McGovern, M.P., Howard, H.I., & McNeilly, C. (1988). Evaluating the conceptualizations and treatment plans of interns and supervisors during a psychology internship. Journal of Consulting and Clinical Psychology, 56, 659-665.
Newman, F.L., & Howard, K.I. (1986). Therapeutic effort, outcome and policy. American Psychologist, 41, 181-187.
Newman, F.L., & Sorensen, J.E. (1985). Integrated clinical and fiscal management in mental health: A guidebook. Norwood, NJ: Ablex.
Nunnally, J.C., & Bernstein, I. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
Orlinsky, D.E., & Howard, K.I. (1986). Process and outcome in psychotherapy. In S.L. Garfield & A.E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 331-381). New York: Wiley.
Patterson, D.R., & Sechrest, L. (1983). Nonreactive measures in psychotherapy outcome research. Clinical Psychology Review, 3, 391-416.
Saunders, S.M., Howard, K.I., & Newman, F.L. (1988). Evaluating the clinical significance of treatment effects: Norms and normality. Behavioral Assessment, 10, 207-218.
Stiles, W.B., & Shapiro, D.A. (1995). Disabuse of the drug metaphor: Psychotherapy process-outcome correlations. Journal of Consulting and Clinical Psychology, 62, 942-948.
Strupp, H.H., & Hadley, S.W. (1977). A tripartite model of mental health and therapeutic outcome. American Psychologist, 32, 187-196.
Teague, G.B. (1995, May). Outcome assessments in New Hampshire. Presentation at the National Conference of Mental Health Statistics, Washington, DC.
Teague, G.B., Drake, R.E., & Ackerson, T.H. (1995). Evaluating use of continuous treatment teams for persons with mental illness and substance abuse. Psychiatric Services, 46, 689-695.
Turner, R.M., McGovern, M.P., & Sandrock, D. (1982). A multiple perspective analysis of schizophrenic symptomatology and community functioning. American Journal of Community Psychology, 11, 593-607.
Uehara, E., Smukler, M., & Newman, F.L. (1994). Linking resource use to consumer level of need in a local mental health system: Field test of the "LONCA" case mix method. Journal of Consulting and Clinical Psychology, 62, 695-709.
Yates, B.T. (1996). Analyzing costs, procedures, processes, and outcomes in human services. Thousand Oaks, CA: Sage.
Yates, B.T., & Newman, F.L. (1980). Findings of cost-effectiveness and cost-benefit analyses of psychotherapy. In G. VandenBos (Ed.), Psychotherapy: From practice to research to policy (pp. 163-185). Beverly Hills, CA: Sage.

Chapter 6
Design and Implementation of an Outcomes Management System Within Inpatient and Outpatient Behavioral Health Settings

Jacque Bieber
American Academy of Neurology

Jill M. Wroblewski
Strategic Advantage, Inc., Minneapolis, MN

Cheryl A. Barber
University of Minnesota

The increasing focus on measuring outcomes in behavioral health care is the result of increased pressure from accreditation bodies, managed care interests, consumers, and government bodies for behavioral health care providers to demonstrate results. This so-called move to measure has been driven by the need for quality and accountability in the health care industry in general. Dennis O'Leary, president of the Joint Commission on Accreditation of Healthcare Organizations (JCAHO), predicted that "a successful health care organization's future will increasingly depend on its ability to conduct measurement activities that affirm the quality and value of its services and provide direction for improvement efforts" (1997, p. i).

In behavioral health care, the various parties involved all have their own purposes for measuring outcomes. Patients and their families want to know if a particular provider will be able to help them feel better and experience a higher quality of life; in addition, measures make the sometimes intangible changes in a patient's life more tangible and comparable. Payers and managed care companies want to know whether they are getting results and value commensurate with the dollars spent. Regulatory bodies and accreditation agencies want to see evidence that quality is an integral component of how a given provider conducts business.

Meeting the Increasing Demand for Quality and Accountability

To meet these increasing demands for quality and accountability, behavioral health care providers need systems to monitor and report their own performance over time and to implement continuous efforts that will improve that performance.
Health care organizations, from providers to accreditation bodies to managed care companies, are expanding their quality efforts to include measurement. The lexicon refers to such efforts
as quality or performance measurement systems. Three dimensions (structure, process, and outcome) provide the basis for quality measurement (Donabedian, 1985). The movement in behavioral health care is to require measures in all three of these dimensions, providing a multidimensional assessment of quality. Given this framework, structure and process provide the context within which outcomes information becomes useful in the movement for continuous improvement/quality. Understanding the value of outcomes measures in the quality movement requires a definition of all three dimensions:

1. Structure refers to the characteristics of the environment in which care is delivered, whether at a community, institutional, or provider level. Is the environment able to support quality care by ensuring that the right mix of health professionals, settings, locations, and methods of payment is available to meet the cultural, safety, and convenience needs of a given population? How do the characteristics of the population served impact the outcomes (Quality Measurement Advisory Service, QMAS, 1996)?

2. Process refers to the discrete activities that contribute to patient care: what is done to, for, or by patients (Joint Commission on Accreditation of Healthcare Organizations, JCAHO, 1997a; Migdail, Youngs, & BengaenSeltzer, 1995). Process measures assess whether the admission, assessment, diagnosis, treatment, and disposition processes currently in place are provided skillfully and humanely to the people who need them and in ways that are responsive to people's preferences. Process measures also determine whether provisions exist to inform, involve, and support patients in the decision-making process (QMAS, 1997).

3. Within this context, outcomes can be defined as the intermediate or ultimate results of efforts to prevent, diagnose, and treat various health problems (McGlynn, 1996).
Although a consensual definition of outcomes does not exist, certain components of existing definitions are commonly agreed on. For instance, it is generally accepted that outcomes are results that may be favorable or adverse (e.g., an increase or decrease in symptoms, risk factors, or functionality) and can be applied to individuals, subsets of a population, or an entire population. Moreover, outcomes can cover various lengths of time, such as from intake to discharge, or from intake to 6 months after the date of admission (Donabedian, 1985; JCAHO, 1997a; Migdail et al., 1995; National Committee for Quality Assurance, NCQA, 1997a). Finally, outcomes measures provide an accurate, reliable, quantitative assessment of the results of the procedures and treatments rendered.

Outcomes Management Systems

An outcomes management system applies outcomes measures to the health care decision-making process. Even though the term outcomes is loosely used to apply to measures ranging from cost, recidivism rates, and length of stay to functionality, patient satisfaction, and symptom reduction (Davis & Fong, 1996), the overriding concern of organizations that measure outcomes is the need to demonstrate value. Namely, is the quality of the care worth the dollars invested? This is the resounding question asked by payers who are reluctant to include mental health in their benefits packages, as well as consumers who have lost faith in providers' ability to deliver the right treatment in the best way possible. Consumers are assessing health care using the standard of value-based purchasing, and providers are being required to implement credible quality and outcomes measures to demonstrate that prescribed care is necessary and effective (QMAS, 1997). Good measures of outcomes make it possible for consumers to compare the performance of various health plans and providers and for providers and payers to better understand what does and does not work.
However, it is only when empirical evidence is provided that it becomes possible to identify practices in need of change and make continuous improvement a reality.
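
As a concrete illustration of outcomes that span intake to discharge and that apply to individuals as well as to subsets of a population, the following is a minimal sketch rather than a description of any particular outcomes management system. The records, the program names, the scoring direction (higher scores meaning more severe symptoms), and the function name change_by_program are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (program, intake_score, discharge_score).
# Assumption: higher scores mean more severe symptoms.
records = [
    ("inpatient", 30, 18),
    ("inpatient", 26, 20),
    ("outpatient", 22, 15),
    ("outpatient", 18, 16),
    ("outpatient", 20, 11),
]


def change_by_program(rows):
    """Return the mean intake-to-discharge change for each program."""
    changes = defaultdict(list)
    for program, intake, discharge in rows:
        # A negative change is a symptom reduction, i.e. a favorable outcome here.
        changes[program].append(discharge - intake)
    return {program: mean(values) for program, values in changes.items()}


summary = change_by_program(records)
for program, avg_change in sorted(summary.items()):
    print(f"{program}: mean change {avg_change:+.1f}")
```

Each individual change score describes one patient; the per-program means are the aggregated form. Raw means of this kind ignore differences in patient mix, so they would not by themselves support a fair comparison between programs.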

Within the controlled environment, the structure and process practices can be tested and developed. On dissemination, they can be applied in a real-world setting that includes an outcomes management component. Practitioners and researchers alike can examine the results of the outcomes studies to modify the structure and process of treatment as they are applied in a real-world or research setting.

Benefits of Measuring Individual Patient Outcomes

Although the demand for mental health services to be empirically based and scientific means additional work for the providers of such services, these professionals also see the benefits of measuring both individual and aggregated patient outcomes. For instance, behavioral health care professionals may believe that psychotherapy is a powerful and useful service, but unless they can demonstrate this value to consumers and payers, continued financial and political support for this service, and perhaps others, will likely decrease. Thus, accountability can be viewed as a vital tool in promoting the behavioral health care profession's survival.

In order to be accountable, the full range of providers, from individual practitioners to full-continuum facilities, need to find new ways to incorporate measurement into their day-to-day operations. More specifically, providers need to demonstrate accountability at the beginning, middle, and end of each episode of care. On an individual basis, the therapist will need to be skilled at rapid assessment and intervention, two components of the brief and accountable therapy model required by most managed care companies (Johnson, 1995). Yet, with the increasing demand to deliver more for less, how will providers be able to incorporate outcomes measurement into their daily workflow? One way of managing the cost and time required to implement measurement on an individual patient basis is to use the same measurement tool for assessment and outcomes.
At intake, such a tool can be used to support triage decisions, speed up the authorization process, and focus therapeutic time and effort on a specific, limited area of change that the client and therapist will work to achieve (Johnson, 1995). Brief assessment tools that measure a patient's symptoms or level of functioning can quickly establish a therapeutically relevant focus for treatment planning and provide a baseline from which to monitor an individual patient's progress during and after treatment (Fischer & Corcoran, 1994a, 1994b).

As a client and health care professional act on a treatment plan, changes will occur in the client's symptoms and/or level of functioning. Outcomes measures make it possible to monitor these changes as they take place and to maintain or modify the treatment plan in midcourse, if necessary. Outcomes data also can support treatment termination. Sauber (1996) found that the use of concurrent outcomes measures allows clinicians to make sound judgments about each individual case by relating the information received from outcomes measures to the care process and to make any changes necessary.

Benefits of Aggregating Patient Outcomes Measures

When outcomes measures that assess individual patients' statuses pre- and posttreatment are aggregated, it is possible to demonstrate results for a specific program, level of care, clinician, or hospital. By aggregating the data across all of the patients who received a
specific treatment, the effectiveness of the treatment can be determined. Comparisons can be used to track trends over time, compare one entity with another, or serve as the basis for continuous improvement by helping identify which programs, clinicians, or facilities are producing the best results in terms of quality and cost. This information can then be included in published practice guidelines, making broad-based use of the best practices possible. Aggregated measures also support providers who need to meet regulatory requirements that call for performance measurement and continuous improvement efforts. Finally, outcomes measures that state results in a concise, user-friendly format serve as effective communication tools in discussions with referring sources and third-party payers.

Predictive Modeling Capabilities

Over the past 10 years, the aim of professionals involved in outcomes measurement has been to develop a system of prescriptive care that identifies practices associated with better outcomes for particular types of patients and bridges the gap between actual and optimal practice. The first step in determining what constitutes the optimal treatment is to determine what treatments are effective for specific patient groups. Although it is not exhaustive in scope, a list of such treatments for specific disorders has been developed by the Task Force on Psychological Interventions of the Division of Clinical Psychology (Division 12) of the American Psychological Association (APA). This task force set out to develop a list of effective psychotherapies, referred to as empirically validated therapies (EVTs), to be used to educate clinical psychologists, third-party payers, and the public (Chambless, 1995).
The information was not intended to serve as guidelines, standards, or official policy (Chambless et al., 1996); rather, its purpose was to highlight the fact that a variety of psychotherapies have been proven efficacious by research in the past 15 years (Chambless, 1995). The task force identified effective treatments for given disorders but did not identify those that are better within disorders or which patient characteristics will influence the choice of therapy. Although treatment matching (i.e., identification of the optimal treatment for an individual based on the patient characteristics) has not yet been realized, creating a list of EVTs is certainly a first step in this effort.

Findings related to treatment matching are still extremely limited. Beutler (1991), a leader in this field, noted that the lack of success may be due to failure to specify and measure meaningful populations and interventions. Continued scientific research and identification of reliable treatments by consumer interactions will provide the information needed for the development of empirically based treatment matching. As the development continues, treatment-matching characteristics should include those that predict the need for varying treatment approaches (as cited in Lyons, Howard, O'Mahoney, & Lish, 1997).

Patient characteristics also are used in outcomes management systems for the purpose of risk adjustment. Risk adjustment corrects results for differences in a population with respect to patient mix, allowing for more valid comparisons across groups. The first step in this process is to identify what patient characteristics make treatment outcomes more or less difficult to achieve, regardless of quality of treatment. Characteristics that should be considered include the severity, acuteness, and complexity of the disorder. In developing appropriate risk-adjustment models, patient groups that are typically less well served
or more difficult to treat effectively are often identified. This information may lead to valuable findings for research related to treatment matching (Lyons et al., 1997).

There is a good deal of skepticism related to treatment matching, which is not surprising, given the scientific and practical problems of reaching agreement even on what outcomes should be measured and how. Clinicians have traditionally regarded their work as part science and part art. They sometimes resent measurement, viewing it as an intrusion into their field as well as an attempt to quantify what cannot be quantified (Sauber, 1996). Until researchers can demonstrate how outcomes measures and treatment matching will produce results that are superior in terms of cost and treatment effectiveness, clinicians will not likely be convinced that predictive modeling is worth the time and effort required to bring it to fruition.

Common Reasons for Implementing an Outcomes System

Outcomes measures can help clinicians and researchers alike determine whether a given treatment worked, whether an individual patient improved, or whether a new type of psychopharmaceutical drug worked better than another (Sauber, 1996). The new market-driven demand for outcomes measures in behavioral health care has dramatically changed and broadened the focus of outcomes research, from measuring the effects of treatment in ideal settings (i.e., efficacy studies) to using outcomes to justify behavioral health care services for entire populations. Whereas efficacy studies assess outcomes of treatment delivered under ideal conditions, effectiveness studies assess outcomes of treatment delivered under ordinary circumstances, typical of everyday practice (Institute of Medicine, IOM, 1997; McGlynn, 1996). Because efficacy studies are costly, however, many treatment strategies have not been evaluated across the various diagnoses, severities, and patient characteristics that are present in the average practice.
Yet, with the advent of clinical management information systems, it should be possible to implement outcomes measures that can improve accountability and lead to an improved understanding of which practices are most effective in producing positive outcomes with different kinds of patients (Docherty & Streeter, 1996; Kane, Bartlett, & Potthoff, 1995). One of the current goals of the psychology industry is to integrate efficacy and effectiveness research (Fowler, 1996). The first step toward achieving this goal would be to continue the basic research into structure, process, and outcomes in a clinical trial setting, where a specific group of patients can be treated in strictly controlled settings. Doing so would help determine the treatment characteristics associated with particular outcomes for a given case mix. When such results are collected and applied to practice environments, where patient characteristics, treatment protocols, and practitioners vary in many areas, it is difficult to evaluate the impact of each structure and process component of treatment (McGlynn, 1996). In spite of the difficulty of applying the results of controlled studies in less than controllable environments, quality assessment systems must begin to incorporate measures of client outcomes in their measurement programs (IOM, 1997). Continuous improvement efforts are attempting to evaluate the reasons actual practices may fail to achieve the results obtained in clinical trials (i.e., assessing the gap between efficacy and effectiveness) and to look at ways to improve the delivery of care in real settings. The Committee on Quality Assurance and Accreditation Guidelines for


Managed Behavioral Healthcare contended that "outcomes research is vitally important to improve the base of evidence related to treatment effectiveness. Outcomes research is needed to provide explicit direction in identifying performance indicators associated with good outcomes for different patient characteristics, types of treatment programs, and types of managed care organizations" (IOM, 1997, p. 5). The committee also recommended that outcomes measures increasingly should be based on evidence from research. According to Migdail et al. (1995), "The best process indicators focus on processes that are linked to patient outcomes, meaning that a scientific basis exists for believing that the process, when provided effectively, will increase the probability of a desired outcome" (p. 483). In recent years, as outcomes measures have extended beyond clinical efficacy research to initiatives for creating a behavioral health care delivery system that is more efficient and effective, a number of approaches have been developed for various stakeholders in order to assess the quality of behavioral health care. These approaches include profiling, report cards, instrument panels, score cards, and benchmarking. (They are discussed in detail in the following sections.) Accreditation standards for behavioral health care also have been developed in recent years, providing a national "seal of approval" used by various stakeholders to assess the value of services provided. Outcomes measures can be used in the process of continuous quality improvement for behavioral health care organizations to assess and improve the quality of care. In addition, outcomes can be used internally for clinical decision making in behavioral health care. Clinicians and researchers must be able to demonstrate that the treatment provided to patients will lead to more successful outcomes. By tracking outcomes of patients throughout treatment, so-called real-time clinical decision making is possible. 
In this context, outcomes measures provide clinicians with an effective tool for evaluating and monitoring the progress of treatment for patients.

Generating Report Cards of Provider Performance

The restructuring of health care in the 1980s and early 1990s is responsible for bringing quality and accountability to the forefront in this industry. Today, controlling costs and improving the quality of care are the aims of health care reform. Purchasers have demanded ways to measure the effectiveness of these efforts. Rather than make selections based solely on the costs or sizes of networks, purchasers wanted ways to measure the value of the health care services purchased. In response to this demand, various health care and government organizations have gathered and published data that allow consumers, purchasers, and state agencies to compare the performances of competing health plans and provider services. Key stakeholders, such as consumers and purchasers, also have pushed the agenda toward presenting objective information on the quality of health plans and provider services. The documents that have resulted from these initiatives are known as report cards (Mental Health Statistics Improvement Program Task Force, MHSIP, 1996). They are brief, standardized forms of profiling used to compare providers, health care systems, and health care plans. Eisen and Dickey (1996) described profiling as an analytical method of comparing practice patterns, client outcomes, accessibility to care, appropriate utilization of services, and satisfaction with care of providers. Profiling is a comparison of two or more groups of clients, providers, payers, geographic regions, or systems of care. From such comparisons, standards and guidelines are developed that


are based on expert opinion, research, and normative data from the general population or population subgroups. Report cards provide comparative data for all types of performance measures. In behavioral health care report cards, those measures include access to mental health services, appropriateness of services provided, patient outcomes, consumer satisfaction, quality of care, and prevention. Information about these performance measures is used primarily by the following stakeholders in the behavioral health care arena:

Consumers: To compare and evaluate alternative behavioral health service options.
Advocacy groups: To promote better services.
Health care purchasers: To evaluate managed care organizations or provider systems.
Providers: To monitor the performance of their systems over time and implement continuous quality improvement by identifying areas that need change and by developing strategies to increase the quality of care.
Government mental health agencies: To monitor quality and desired outcomes across different provider systems.
Managed care organizations: To monitor quality and desired outcomes across different provider systems.
Public health officials: To assess whether the nation's health care goals are being met (Eisen & Dickey, 1996; MHSIP, 1996).

Prior to the mid-1990s, medical/surgical report cards included few performance indicators for behavioral health care (Panzarino, 1996). In spring 1995, the American Managed Behavioral Healthcare Association (AMBHA; 1995) addressed this void by becoming one of the first groups to initiate a report card in the behavioral health care field. The AMBHA's goal was to produce a set of nonfinancial indicators that would provide accountability in behavioral health care. Around the same time, the Center for Mental Health Statistics Improvement Program (MHSIP) was working on its own report card; in spring 1996, the Consumer-oriented Mental Health Report Card was released.
This initiative involved consumers in every stage of the development process, which distinguished it from earlier initiatives. In 1994, the General Accounting Office (GAO) reported in a summary of report card initiatives that "individual consumers have had minimal input into selecting report card indicators, and little is known about their needs or interests. . . . As a result, their needs may not be met" (as cited in MHSIP, 1996, p. 3). A collaborative was formed, which consisted of consumers and professionals from MHSIP of the Center for Mental Health Services. This group's mandate was to construct a report card that addressed the needs of mental health consumers. Adults with serious mental illnesses and children with serious emotional disturbances were targeted and included as active participants in the process. (Many other report card efforts had focused on the general population, so the needs of people with serious mental illnesses were not adequately addressed.) The product of this effort, the Consumer-oriented Mental Health Report Card, was designed to help mental health consumers, advocates, health care purchasers, providers, and state mental health agencies compare and evaluate mental health services based on concerns that are important to consumers (MHSIP, 1996).

Other national-level report cards that currently include or plan to include a behavioral health care component include the following:

1. National Report Card: The Impact of Managed Care on People with Severe Mental Illness: National Alliance for the Mentally Ill (NAMI)


2. Health Plan Employer Data and Information Set (HEDIS) and Quality COMPASS: National Committee for Quality Assurance (NCQA)
3. Foundation for Accountability (FACCT)
4. Consumer Assessment of Health Plans Study (CAHPS): Agency for Health Care Policy and Research (AHCPR)

At the state level, about half of all U.S. states have developed or currently are developing report cards or performance outcome measurement systems. At the county level, the National Association of County Behavioral Healthcare Directors (NACBHD) has contracted with the Evaluation Center at Health Services Research Institute to develop candidate indicators for county performance outcomes (National Technical Assistance Center, 1996). In addition, many managed care plans have developed their own report cards, as have some hospitals, health care provider groups, and business coalitions (Atlantic Information Services, AIS, 1996b).

Score Cards and Instrument Panels. Score cards and instrument panels comprise an additional category of aggregate measurement that has emerged in recent years to help health care delivery system administrators better manage both cost and quality. Instrument panels and score cards provide information at the systems level and key processes within the system, determined by a leadership group within a health care organization. A key feature of such measures is that they provide critical, real-time information to management, which supports action-oriented decision making. These decision-support tools provide performance rankings for overall health care systems or specific processes or outcomes, such as patient access, clinical processes, and customer satisfaction. These measures are put in place to help identify and manage variation in the system (Lyons et al., 1997).
Nelson, Batalden, Plume, Mihevc, and Swartz (1995) suggested that "clinical leaders can quickly scan the control chart instrument panel to check for special-cause variation, to identify desirable or undesirable trends and to observe the level of variation and average performance" (p. 159). For example, based on the results of a score card or instrument panel, a leadership team will want to investigate why little or no improvement occurred in areas that had been targeted for change. The group also will want to observe if any key measures show too much variation or are performing at unacceptable levels (Nelson et al., 1995). Nelson et al. (1995) differentiated report cards and instrument panels as different tools that are suitable for stakeholders with different needs. Table 6.1 presents a comparison of these tools based on four issues: main user, main use, time frame, and focus. In sum, purchasers use report cards to make judgments about whom to select as a provider, to evaluate past performance, and to promote accountability. Providers, on the other hand, use instrument panels to monitor and control critical processes, to correctly identify trends (i.e., positive or negative), and to promote better quality and value.

TABLE 6.1
A Comparison of Report Cards and Instrument Panels

Issue        Report Card               Instrument Panel
Main user    Purchasers                Providers
Main use     Judgment, accountability  Control, improvement
Time frame   Past                      Present, future
Focus        Outcomes, charges         Process, outcomes, costs
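The control chart scan that Nelson et al. (1995) describe can be illustrated with a small sketch. The source does not say which type of control chart is meant, so the code below assumes an individuals (XmR) chart, in which limits are set at the baseline mean plus or minus 2.66 times the average moving range. The function names and the sample figures (weekly days-to-first-appointment) are hypothetical and serve only as an illustration.

```python
from statistics import mean


def individuals_chart_limits(baseline):
    """Return (center, lower, upper) limits for an individuals (XmR) chart.

    Assumption: an XmR chart is used; the source names no chart type.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    center = mean(baseline)
    # Moving range = absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    # 2.66 is the standard XmR constant (3 / 1.128).
    spread = 2.66 * mean(moving_ranges)
    return center, center - spread, center + spread


def special_cause_points(baseline, new_values):
    """Return (index, value) pairs in new_values that fall outside the limits."""
    _, lower, upper = individuals_chart_limits(baseline)
    return [(i, v) for i, v in enumerate(new_values) if v < lower or v > upper]


# Hypothetical data: average days from referral to first appointment, by week.
baseline_weeks = [10, 12, 11, 13, 12, 11, 10, 12]
recent_weeks = [11, 16, 12, 7]

flagged = special_cause_points(baseline_weeks, recent_weeks)
print(flagged)
```

Points flagged in this way are candidates for the kind of follow-up the leadership team is described as undertaking; values inside the limits are treated as ordinary, common-cause variation.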


Benchmarking. The process of benchmarking involves finding the best performer in an industry for a particular measure or indicator (Nelson, 1996). Once the best case has been identified, the underlying processes responsible for generating consistently superior results can be examined and replicated, allowing application of the model in other like settings. In short, benchmarking is the hunt for the best process. That process is found when an organization's measured outcomes are consistently the best of the best. These best practices then can be incorporated into practice guidelines or implemented as individual clinical process improvements. The issues associated with successful benchmarking are numerous: Who should participate? What questions will efforts address? What tools will be used? How will information be disseminated? How will the results be used? (Gift & Mosel, 1997). The need to collect comparable data requires a collaborative approach. More and more providers are collaborating on projects, in which the member groups' data are pooled and each member then is able to benchmark its results against those from similar organizations (Trabin, 1997). As noted by Trabin and Kramer (1997):

Busy behavioral healthcare organizations with narrow profit margins do not have the wherewithal to create experimental or even quasi-experimental designed outcomes research. They need assistance in identifying the middle group between clinical trials research on one extreme and outcomes evaluation designs that are so poor as to not be worth doing on the other extreme. An approach being widely considered as an alternative to experimental design is benchmarking against the outcomes results of similar organizations that agree to use the same evaluation methodology. . . . The large outcome databases used by organizations that benchmark against each other will not only become repositories of valuable data, but will also be primary generators of new knowledge for our field. (p. 348)

A number of collaborative efforts are underway in the behavioral health care marketplace. The Outcomes Management Program (OMP), implemented by the University of Cincinnati Quality Center in collaboration with Mesa Mental Health Professionals in Albuquerque, New Mexico, and the Council of Behavioral Group Practices, currently uses its benchmark findings in the following ways (Zieman, Kramer, & Daniels, 1997): to comply with NCQA standards for MBHOs, to conduct quality liaison activities with payers, to develop protocol, to profile providers, and to develop networks. The Practice Research Network (PRN), established by the American Psychiatric Association (APA), states that its purposes are to help generate psychiatric research that supports clinical decisions in a timely fashion and to consider policy issues. Presently, the network has over 500 members, who have been randomly selected and/or systematically recruited (Zarin, Pincus, & West, 1997). The Pennsylvania Psychological Association has developed another network of practitioners with support from the American Psychological Association's Committee for the Advancement of Professional Practice and the Pennsylvania State University. This network of practitioners and researchers is interested in the application of outcomes and other specialized clinical research (Ragusea, 1997).

Assisting in Clinical Decision Making

Outcomes measures can assist in clinical decision making in a number of ways. Before examining those possibilities, however, it is important to understand the context in which they will occur. The realities of the current health care environment have specific


effects on clinical practice and decision making in behavioral health care. Consider the following:

Health care, the largest service industry in the United States, is becoming a market-driven industry, which is changing the power paradigm and decision-making process. Payers, providers, and patients each have power in the new paradigm.
Health care consumers want convenience, quality, information, control, and support when they purchase health care, and all for a good price.
It is difficult to measure the quality of health care, as the treatment applied often disappears at the point of delivery.
Managed care companies have succeeded in making brief episodes of care the norm.
Because all of health care is a young science (and thus has limited empirical underpinnings), "some of today's medical advice may be dogma rather than science, based on beliefs about the relationship between cause and effect rather than on facts" (Herzlinger, 1997, p. 72).
Medicine is in the process of industrializing how it does business. To be successful in this effort, health care will need accurate measures of the costs and benefits of various methods and will need to be able to compare outcomes measures, as well (Lyons et al., 1997).
Efficiency and effectiveness have become essential ingredients of successful health care organizations.

What do all these developments mean for the behavioral health care practitioner? Is there a panacea for practitioners who are willing to "grapple with the need to be customer driven?" (Chowanec, Neunaber, & Krajl, 1994, p. 47), that is, willing to see patients as consumers who are important partners in health care? What part can outcomes management play in helping practitioners address the transformation in behavioral health care that is currently underway? Assessment and outcomes measures can support brief therapy by providing a real-time tool to focus the efforts and energies of practitioners and consumer-patients on well-defined needs.
For instance, assessment data can be collected readily by having patients complete questionnaires, which can be scored within 5 to 15 minutes via computer or paper-and-pencil-based FAX-back systems. The resulting patient profile then is available to support treatment-planning and authorization decisions. A number of the primary principles of delivery that Cummings believed are necessary for brief, effective therapy are readily supported by assessment and outcomes measures. These measures can help focus the psychotherapy, engage patients as partners in treatment, and ensure that they receive appropriate amounts of therapy at given sessions within lifelong relationships. Outcomes measures can be used from episode to episode to track an individual consumer-patient's progress over a number of years (Cummings & Sayama, 1995). In addition, outcomes measures can serve as communication tools, a necessary ingredient in accountability. Outcomes measures provide a common way for professionals to talk to each other as well as those who review their work (Johnson, 1995). Outcomes measures can help the practitioner and consumer-patient measure progress toward a plan; modify the plan, as needed; support managed care authorizations for further care; and/or demonstrate that it is appropriate to terminate treatment. Outcomes measures can serve as the basic measurement tool for developing practice guidelines and continuous improvement. "The relative effectiveness of different approaches needs to be known, and routine monitoring of outcomes in a treatment setting or system of care can be a basis for continuous quality improvement within a treatment delivery organization" (IOM, 1997, p. 232).
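As a rough illustration of how repeated outcomes scores might feed the decisions described above (continue, modify the plan, or consider termination), the sketch below classifies the change between a patient's first and most recent score. Everything specific here is an assumption made for illustration: the function name, the convention that lower scores mean fewer symptoms, and the 5-point threshold for a meaningful change are hypothetical and are not taken from any instrument discussed in this chapter.

```python
def classify_change(scores, min_change=5.0, lower_is_better=True):
    """Classify change from the first to the latest score in a series.

    Assumptions (hypothetical): min_change is the smallest difference
    treated as meaningful, and lower scores indicate fewer symptoms.
    """
    if len(scores) < 2:
        raise ValueError("need a baseline score and at least one follow-up")
    change = scores[-1] - scores[0]
    if lower_is_better:
        # Flip the sign so that a positive value always means improvement.
        change = -change
    if change >= min_change:
        return "improved"
    if change <= -min_change:
        return "deteriorated"
    return "no reliable change"


# Hypothetical symptom scores collected at intake and at later sessions.
print(classify_change([30, 26, 22]))
print(classify_change([30, 31, 29]))
print(classify_change([30, 34, 37]))
```

In practice, the threshold would have to come from the psychometric properties of the instrument actually in use, and a classification of this kind would inform, not replace, clinical judgment.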


Although the use of outcomes measures in clinical practice is still in its infancy, a number of organizations (including Charter Behavioral Health System and Kaiser Permanente) have initiated efforts to implement outcomes management systems. Considerable research is still needed, however, before the measurement of outcomes will be an integral part of all behavioral health care operations. This was the conclusion stated by the Institute of Medicine's (IOM) Committee on Quality Assurance and Accreditation Guidelines for Managed Behavioral Health Care (1997) when it released its recommendations in the area of outcomes research. The committee recommended continued support and development of collaborative health services research on the effectiveness of different treatment strategies for a variety of practitioner types and for consumers with different needs.

Continuous Quality Improvement

Continuous quality improvement is a managerial approach in which the goal is to put processes and systems in place that will prevent problems from happening, which makes corrective efforts unnecessary. Contrast this approach with quality assurance efforts that involve inspection of people and products, often after the fact. Continuous quality improvement is the process of assessing an organization's key systems, processes, and outcomes in an effort to understand how functions, processes, and structures influence performance so that revisions can be made to improve performance. A continuous feedback loop provides the foundation for efforts at continuous quality improvement (Lyons et al., 1997). Efforts toward continuous quality improvement started in the 1920s, when statistical process control methods were introduced in the Bell Laboratories. W. Edwards Deming brought these quality measurement techniques to Japanese manufacturers, who applied them to production processes and created superior quality in Japanese products.
American manufacturers embraced these quality management principles in the late 1970s and early 1980s. The health care industry followed suit in the early 1980s, but improvements in the quality of care came slowly. It was found that the tools that could be easily transferred from a Japanese manufacturing plant to an American one could not be so easily transferred to health care. Health care professionals needed to develop new tools for continuous quality improvement and create methods of incorporating quality improvement into their workflow. In fact, the development of measurement tools and processes is an evolutionary process that is still in its early stages in the health care field. When continuous improvement processes are in place, data drive decision making by helping monitor critical processes and tracking the effects of changes in the delivery of service (IOM, 1997). In a health care setting, tracking outcomes must be logically linked with the processes and structures that exist in order to improve the quality or performance of an individual or organization. Lyons et al. (1997) stressed the need to reduce practice pattern variability as a key component of any continuous improvement effort: Taking an important lesson from the quality improvement initiatives in manufacturing, healthcare has learned that variation is the enemy of quality and that the practice of medicine must undergo an industrialization process. This process, now fully underway, promises to rationalize healthcare


delivery, provide accurate measures of the costs and benefits of interventions, and compare outcomes across providers (p. 12). A number of quality tools and methods have been introduced within the field of health care to manage variation, from practice guidelines to critical pathways. Nelson (1996) provided five methods to manage the structure and process of an organization, to provide feedback on the results of implementing the practices and interventions aimed at improving the process and structure of an organization, and then to insert that feedback and feed forward into the system (see Fig. 6.1). This last stage, feeding forward, gives the clinician specific, timely information about the patient's clinical status, functioning and general well-being, health habits, and ratings of care for the problems experienced to date. The clinician can then use this information to design the plan of care that best matches the patient's needs and also provides a qualitative health profile for a given patient population (Nelson, 1996). This set of quality improvement tools is representative of work underway throughout the health care industry to ensure standardization and support continuous quality improvement efforts. Continued efforts are needed as more and more accreditation agencies and purchasers are expecting health care organizations to have formal quality improvement programs in place. For example, the NCQA (1997c) described quality management and improvement as "an integrative process that links knowledge, structures, processes, and outcomes to enhance quality throughout an organization." According to the committee, a well-structured quality improvement program provides the framework within which an organization can assess and improve the quality of clinical care, clinical services, and member services. 
This structure includes a clear definition of the authority of the quality improvement program, its relationships to other organizational components, and its accountability to the highest level of governance within the organization. (pp. 41-42)

Improvements in patient care and outcomes are the desired end results of continuous quality improvement efforts.

Accreditation Requirements

In 1997, the NCQA released new standards for managed behavioral health care organizations and the JCAHO released a manual of standards for behavioral health care. Previously, no national measures had existed for assessing the quality and accountability of behavioral health care services. The release of the NCQA and JCAHO standards clearly demonstrates recognition of the importance of assuring that quality performance standards are developed and processed for positive behavioral health care outcomes. Accreditation tends to set the minimum standards and provide a national seal of approval that can be used to assess the value of services offered by providers and managed care organizations (Drissel & Taylor, 1997). Mathios (1997) asserted that "accrediting organizations raise the minimum standards under which treatment delivery and managed care organizations must function" (p. 75). In fact, accreditation is required by corporate and government purchasers who bid out major contracts for behavioral health care services. Recently, national organizations that accredit various health/social service providers have attempted to develop core competencies that are expected to be consistent across agencies. For example, the Commission for the Accreditation of Rehabilitation Facilities


Fig. 6.1. Basic framework of health care delivery and context of improvement models. Reprinted with permission from "Using Outcomes Measurement to Improve Quality and Value" (No. 71, p. 113) by E.C. Nelson, in D.M. Steinwachs, L.M. Flynn, G.S. Norquist, & E.A. Skinner (Eds.), Using Client Outcomes Information to Improve Mental Health and Substance Abuse Treatment: New Directions for Mental Health Services. San Francisco: Jossey-Bass. Copyright © 1996, Jossey-Bass Inc., Publishers. All rights reserved.


(CARF) and Council on Accreditation of Services for Families and Children, Inc. (COA) are in the process of developing cooperative arrangements with the JCAHO to recognize each other's accreditation status. Such arrangements will protect small providers against the increased costs of meeting multiple accreditation requirements.

Accreditation Bodies. Accreditation bodies also are placing more emphasis on performance and outcomes measures and less on standards that measure processes and structures. The goal of this trend is to encourage administrators and clinicians to review their organization's performance over time and identify opportunities for improvement. The following items describe the major accreditation agencies for behavioral health care in the United States:

Joint Commission on the Accreditation of Healthcare Organizations (JCAHO). The JCAHO is the largest standards-setting and accrediting body for health care in the nation. In 1997, it introduced ORYX, the JCAHO performance measurement system, which requires organizations to collect and submit data about the results of their care for the purpose of reviewing quality performance over time. The accreditation process will require performance measurement in the future (JCAHO, 1997b; Moore, 1997).

Commission for Accreditation of Rehabilitation Facilities (CARF). The CARF is a private, nonprofit, international accreditation commission that was established to improve community-based rehabilitation programs designed for people who are chronically and persistently mentally ill. It, too, is developing performance-based indicators as standards for services.

National Committee for Quality Assurance (NCQA). The NCQA accredits health maintenance organizations (HMOs), health plans, provider networks, managed care organizations (MCOs), and managed behavioral health care organizations (MBHOs). It focuses on performance and outcomes measurement at the administrative level.
The NCQA publishes the Health Plan Employer Data and Information Set (HEDIS), a type of report card that gives purchasers and consumers information based on standardized performance measures organized by domains in which to compare the performance of MCOs. The standards are based on a population-based continuous quality improvement management model that begins with needs assessment at the preventive level as well as the treatment level. Doing so helps promote population-based improvements. To date, most of HEDIS has focused on medical, rather than behavioral, care; only recently, HEDIS was expanded to include several behavioral measures. Although HEDIS data collection is not required for NCQA accreditation, most managed care organizations utilize HEDIS measures. Continuous quality improvement is a core principle of the NCQA's Behavioral Health Accreditation Standards (NCQA, 1997c).

Council on Accreditation of Services for Families and Children, Inc. (COA). The COA accredits high quality, often lower cost, less intensive providers of a full range of social service and behavioral health care programs for children and families. It accredits many services for which no other accreditor has standards (Mathios, 1997). The COA is attempting to apply more outcomes measures in its standards (Drissel & Taylor, 1997).

Council on Quality and Leadership in Support of People with Disabilities. The Council on Quality and Leadership in Support of People with Disabilities accredits organizations that provide services and support to this population. The council uses a unique accreditation process in which staff meet with individuals with disabilities to identify how they define the outcomes expected from the services and support provided them. The council then provides feedback and consultation to the organization in the form of steps toward quality improvement.
Accreditation is given to those organizations that demonstrate responsiveness to clients' priority outcomes, rather than compliance


with organizational processes. The purpose of this approach is to focus organizations on the design of best practices to optimize outcomes (Mathios, 1997).

Common Trends and Barriers to Accreditation Guidelines. Drissel and Taylor (1997) suggested a trend in which accrediting agencies review continuous quality improvement against two criteria: (a) "how individual components are involved in setting and monitoring the performance of their part of the overall system" and (b) "how information on performance is fed back to the organization to help maintain or improve the overall quality of service and operations" (p. 13). The IOM's Committee on Quality Assurance and Accreditation Guidelines for Managed Behavioral Health Care (1997) pointed to the burden of accreditation requirements that often overlap in systems performing multiple functions. Namely, the cost is prohibitive for organizations to obtain more than one type of accreditation to satisfy employers, states, and other stakeholders, which makes multiple accreditation unrealistic. The committee suggested that such issues call into question the utility and validity of accreditation. In sum, "the accreditation industry is faced with pressure to focus its standards on the relevant issues, collaborate with similar organizations, and consolidate the multitude of accreditation standards to reduce overlap and redundancy" (pp. 214-215).

Supporting Marketing Efforts

Using outcomes measures to support marketing efforts is becoming the norm, rather than the exception, in the behavioral health care field. In today's market-driven environment, consumer-patients are looking for ways to shop more effectively for their health care needs. Outcomes systems provide tools that can help consumers get the most for their money.
For instance, report card systems that compare one provider or health plan with another provide consumers with information about what types of results they may expect from a given provider or plan, which they can then use to comparison shop. Organizations that participate in report card systems have taken a large step toward demonstrating their accountability to consumers. When results are tracked over a period of time and needed improvements are made, this information can be communicated to the market. Providers that meet the performance measurement portion of accreditation and regulatory requirements can market this benefit to consumers in brochures, magazines, trade shows, and public forums. Certainly, individual organizations will have different opportunities for using outcomes measures to support their marketing efforts; nonetheless, the benefits of implementing outcomes management systems can and should be communicated to consumers. Questions Commonly Addressed by Outcomes Systems As noted, some accrediting organizations and benchmarking collaboratives are striving to implement outcomes systems that have common data elements and thus make reference comparisons possible. However, little standardization is in place on the dataelement level. Different systems continue to measure data in different ways. Most are, however, attempting to answer the same basic questions.

How Much Improvement Has Occurred as a Result of Treatment?

All stakeholders are interested in improving the conditions of patients: Payers want to know whether patients needed treatment and if treatment took place, how much change occurred. Patients, along with their families, friends, and employers, want to know the level of improvement. Clinicians want to monitor the level of improvement throughout treatment. Outcomes measures can be used to satisfy all these needs by monitoring patients throughout treatment to determine if improvement is taking place and if so, to what degree. Outcomes measures also can indicate whether patients are not improving or are deteriorating, or whether they have improved to the point where treatment is no longer needed. Clinicians can use all of this information to better plan and manage the treatment process. Patients and others interested in their well-being will also find this information useful, as will potential patients and payers who want to compare the results of one provider's treatment with those of another. Nelson (1996), a proponent of using outcomes measures to improve the value of mental health care, believed there is currently a high demand from society to prove that treatment is beneficial. Nelson pointed out that "because of the social prejudice surrounding mental illness, the need to demonstrate striking proof of treatment benefit is heightened" (p. 122).

What Type(s) of Patients Benefited Most from the Treatment, Organization, and/or Individual Provider? What Type(s) Benefited Least?

Smith (1996) stated that there are three objectives of patient outcomes assessment: to assess patient characteristics, to assess treatment processes, and to assess the effects of routine care in order to improve the outcomes produced by mental health and substance abuse treatment (pp. 59-60). In addition, Smith noted that outcomes assessment is being used to improve care by linking providers and patients with various treatment approaches and outcomes. Continued research is needed to determine the most effective means of matching patients with specific characteristics to those providers or modalities that will best serve their needs. Doing so continues to be a significant goal of the industry's measurement efforts. As discussed earlier, Nelson (1996) proposed five models for producing "better-value health care": guidelines, protocols, benchmarking, feedback/outcome measures, and feed forward (see Fig. 6.1). Yet, according to Nelson, "the gap between the potential gains and those realized to date is staggering" (p. 120). This statement applies to all the process improvement methods currently in place in the mental health field, particularly those that attempt to match patients who have specific characteristics and diagnoses with specific providers and treatments. What Nelson referred to as feed forward is the model that could bridge this gap. Although this is the least well known of the five models, "it has the potential to be the most powerful and helpful, because it can be used to improve care of individual patients immediately, as well as to create an information feedback loop to improve the care for future patients" (p. 119).

The IOM's Committee on Quality Assurance and Accreditation Guidelines for Managed Behavioral Health Care (1997) cited the need to identify which treatment modalities are used and to track how they impact outcomes for patients with various characteristics. More successful and cost-effective outcomes will be possible when research can help answer questions regarding which types of patients are best served with which treatment processes and protocols.

How Satisfied Are Patients with the Products and Services They Receive?

Measuring consumers' satisfaction with the products and services they receive has become commonplace for successful businesses. The notion that consumers have values and perspectives worthy of soliciting and acknowledging can be traced to the 1950s, when Drucker (1954) suggested to the business world that the customer was "the reason for being." The health care industry has only recently embraced the idea that consumer-patient satisfaction is a crucial factor in the perceived value of services offered and that the consumer-patient deserves to have a voice in health care decision making. Today, consumer-patients can make choices about their health care purchases and are demanding to become involved in the decision-making process. Thus, measuring how satisfied they are throughout the treatment process is an essential ingredient of health care in a competitive, industrialized delivery environment. Patients, along with their family and friends, who are satisfied with the quality of care they have received are likely to recommend the provider to others, as the occasion presents itself. They are also likely to return to the same provider, if the need arises. Perhaps just as importantly, satisfied patients are less likely to complain to others about the services they received. Surveys are often used to assess overall patient satisfaction. To be useful, a survey should cover a wide range of issues, including access to services, level of patient involvement in decision making, respect for individual and cultural differences, and the degree to which the care delivered was acceptable, met expectations, and produced desired results. Satisfaction surveys generally show that consumers have positive feelings about behavioral health care services (Elbeck & Fecteau, 1990; Polowczyk, Brutus, Orvieto, Vidal, & Cipriani, 1993). The main issue to consider when creating patient questionnaires is whether the information gathered will make it possible to modify services in ways that will improve customers' levels of satisfaction. It is not enough to find out whether people are satisfied or even with what issues they have complaints. Additional information about patients' expectations is needed to consider alternatives and thus improve the quality of care. Survey questions must be worded and ordered purposefully to elicit this type of information from patients (MacStravic, 1991).

What Is the Value of the Services Offered?

The perception of value is different for all of the major stakeholders in behavioral health care. For instance, does the consumer-patient value the changes that have occurred from receiving services? Do family members perceive positive changes have taken place? Do employers see more productive employees? Do patients at one clinic function better than those at another? Does the payer think the services provided were worth the investment?

TABLE 6.2
How Stakeholders Can Use Outcomes Measures

Users                Uses
Consumers            Track progress
                     Discuss options with clinician
                     Identify preferred provider
Payers/Purchasers    Assess effectiveness of provider to support referrals
                     Track progress
Clinicians           Support treatment decisions
                     Track progress
                     Assess effectiveness of methods
Management           Support staffing and resourcing decisions
                     Compare modalities
                     Assess effectiveness

Outcomes measures can encompass a number of domains. Nonetheless, the primary goal of treatment and thus the primary measure of outcome should be whether a reduction in symptoms, disability, risk factors, and/or social consequences of a given disorder occurred as a result of treatment. To what level have functionality and quality of life improved? Given stakeholders' diverse perspectives on the value of services provided, it follows that they will all have different uses for outcomes information (see Table 6.2). Thus, one of the major tasks of designing and implementing an outcomes system is to identify the needs and relative value that outcomes measures may have for various stakeholders.

How Much Is the Cost of Treatment Offset by Savings in Other Areas of Patients' Functioning?

Payers are primarily interested in how much care costs and the exact services provided for that cost. Obviously, then, they are interested in measures of controlling cost, as well. One promising measure has come from research regarding the use of psychotherapy to relieve patients' distress and thus their need for medical services. This concept is known as medical cost offset. Some of the original research in this area was conducted by Follette and Cummings as early as 1967 with Kaiser Permanente's department of psychotherapy. They found that the use of medical resources by persons with emotional distress was significantly higher than that by the average health plan user.
Follette and Cummings also found that there were significant declines in medical resource utilization rates for patients who received psychotherapy, as compared with those patients in the control group who did not receive psychotherapy (as cited in Cummings & Sayama, 1995). In 1981, Cummings and VandenBos summarized Kaiser Permanente's 20 years of research into this issue, concluding that the absence of psychotherapy can "leave patients with little alternative but to translate stress into physical symptoms that will command the attention of a physician" (as cited in Cummings & Sayama, 1995, p. 25). Other researchers have replicated these findings. The Dean Foundation for Health, Research, and Education examined the use of general medical services by depressed and nondepressed patients. They found that the general medical costs of patients with depression can be reduced if those patients are diagnosed and adequately treated

("Depression Treatment," 1997). In 1989, Fiedler and Wight reported that Medicaid patients who received mental health intervention in addition to being hospitalized for physical maladies saved a cumulative average of $1,500 over the following 2.5-year period (as cited in "Mental Health Benefit," 1993). And Holder and Blose reported the results of a 3-year study conducted by Aetna, which saw the average subject's health costs drop from $242 the year prior to receiving mental health treatment to $162 per year for the 2 years following such treatment (as cited in "Mental Health Benefit," 1993). The idea that providing behavioral health care treatment can decrease total health care costs, even when the cost of the behavioral health care intervention is included, is becoming widely accepted. Continued research into this important paradox in the health care industry should be a critical component of any outcomes management system.

Measuring Outcomes

There are three important questions to ask about measuring outcomes: What outcomes are of interest? Who will provide and use the information? And, when should information be gathered and disseminated? Each question is discussed briefly here and in more detail later. In deciding what outcomes are of interest, it is important to realize that the area of mental health and substance abuse covers a broad range of outcome domains. To reflect the effects of treatment in a comprehensive and valid way, a system should include a broad range of outcome dimensions (Docherty & Streeter, 1996). Domains beyond treatment effectiveness (e.g., consumer satisfaction) also must be considered. Answering the question of Who? requires determining the source or sources of the information. Depending on the domains of interest, collecting information from multiple sources may be necessary; for example, subjective quality of life measures can only be collected from individual patients. Moreover, collecting information from multiple sources will increase the depth of the system. If data on a given domain are available from multiple sources, collecting them will increase the robustness of the data by providing varying perspectives of that domain. Finally, determining when measures will be collected is equally important. The first decision related to the timing of measures is to determine at which time points (e.g., baseline, end of treatment, short-term follow-up, long-term follow-up) data will be collected. In order to have a complete spectrum of effect, baseline and multiple time points afterward must be measured. Some treatments have only long-term effects, whereas others may have short-term effects that diminish over time. Beyond determining time points, it must be decided when, relative to the beginning or end of treatment, these measurements will be taken. Answers to these three questions about outcomes will depend, in part, on the goals of the system, the population of interest, the resources available (both time and money), and the business flow. In addition, there are specific issues related to each area of what to measure, from whom to measure, and when to measure.

What to Measure

The domains measured as outcomes and reported in the professional literature highlight the multidimensionality of mental health and substance abuse treatment. It is well accepted that mental illness and substance abuse affect patients in many ways beyond

the symptoms of the primary illness. Although this may be true in most illnesses, it is especially true of mental illness and substance abuse. Thus, a true measure of the effectiveness of treatment includes assessment not only of the clinical status of the disorder but also daily functioning, quality of life, and interpersonal relationships. This emphasizes the need for measurement of multiple domains within the area of clinical outcomes. As the move to achieve standardized data continues, measurement will need to move from collection of satisfaction only to collection of more clinically related outcomes and collection of a broader range of outcome domains (IOM, 1997). Most commonly measured outcomes can be categorized into one of five broad domains: clinical status, functionality, health-related quality of life, satisfaction, and resource use/treatment utilization (Coughlin, Simon, Soto, & Youngs, 1997). Each of these domains is considered further later. Although such a discussion is outside the scope of outcomes measures, it is important to note here the need for additional basic data. These core data should include patient identifiers and descriptors. Regardless of what domains are collected, supplemental information should be collected on all patients in the system to serve the following four main goals: to describe the population, to identify subpopulations of interest, to provide data for predictive modeling, and to provide data for risk adjustment. If one of the goals of a system is to assess the impact of treatment process on outcomes, additional information related to treatment characteristics must also be included.

Clinical Status. Clinical status is a broad domain that includes not only the symptoms of the disorder, but also areas such as severity, stability, and complications. The most common and perhaps simplest measure of clinical status is that of symptoms, which may include evaluation with respect to frequency, severity, and duration (Coughlin et al., 1997). Measures of symptoms simply evaluate the events or attributes due to the disorder experienced by the patient, whereas other measures of clinical status evaluate the severity of the symptoms, stability of the patient's status, and any clinical complications that may arise. Severity is measured as the duration of symptoms produced by the illness and the corresponding impairment in life. The same level of severity does not imply the same level of symptoms, nor does the same level of symptoms imply the same level of severity (Coughlin et al., 1997). Change in symptom severity is often the most important and observable short-term or immediate measure of effectiveness in mental health treatment. In addition, measures of symptoms typically are objective in nature and can be collected from independent evaluators, often in a simple checklist format. Stability is a measure of the degree to which the patient's status will not change or fluctuate with respect to the disorder. Where severity and symptoms tend to be static measures (i.e., what is the clinical status at this moment), stability can be thought of as an expectation of severity or symptoms in the future. Complications consist of any undesirable results that occur while the patient is being treated. These results are not necessarily limited to those that occur directly from treatment or the treatment setting. A simple example would be the side effects experienced from taking a certain medication.

Functionality. Functionality can be thought of broadly as individuals' ability to maintain their independence and roles within life (Andrews, Peters, & Teesson, 1994). More specifically, being functional means being able to perform regular and expected tasks and responsibilities. Exactly what comprises regular and expected functions must

be defined. As with clinical status, the domain of functionality is multidimensional. In general, functionality can be divided into three areas: physical, social/interpersonal, and role (work or school). Physical functionality is defined as the ability to perform usual physical activities, such as climbing stairs. Clearly, when multiple age groups are combined, especially people from the geriatric population, the expectations of normal functioning vary. The geriatric individual may not be able to perform typical adult physical functioning due to complications related to age, not clinical condition. Therefore, both the population of interest and the normal expected functioning of that population must be considered when measuring physical functionality. Social/interpersonal functionality is defined as an individual's ability to interact and maintain relationships with others, including friends, family, and support networks (Coughlin et al., 1997). Unlike physical and role functioning, expected social and interpersonal functioning is more consistent across age groups; however, this area of functionality may vary across cultural groups. Role functionality is defined as people's ability to perform their primary role, typically that of student or employee. Role functioning is difficult to measure, because it has different meanings and priorities across and within age groups. For example, the primary role function of an adolescent is to attend school, whereas that for an adult is typically to be employed. It may be unclear whether the priority an adolescent assigns to attending school is commensurate with that which an adult assigns to being employed (e.g., the adult may be financially dependent on continuing that role). Measuring role functioning is further complicated by the relatively high rate of unemployment within the population of adults with mental health and substance abuse problems; thus, this role is applicable only to a subset of the adult population. The inclusion of a geriatric population further complicates the ability to measure role functioning, which is even less consistently defined within the elderly population. Expected functions related to physical, social/interpersonal, and role functionality clearly differ across age groups and populations, which may necessitate different tools for assessment. Differentiating patients' ability from what they actually do is not necessary when measuring functionality, as mental disorders affect both. Moreover, the outcome of interest is truly the level of functioning achieved (Andrews et al., 1994). Functionality is important not only to patients and the people around them but also to society as a whole. Increasing patients' abilities to care for themselves, physically and financially, and to interact with others decreases the responsibilities of caregivers and the community.

Quality of Life. As with functionality, quality of life is a multidimensional domain, encompassing the physical, cognitive, affective, social, and economic areas of a patient's life. When asked to list the factors that influence quality of life, patients identified physical and mental health, living environment, income, happiness, and meaningful work (Cook & Jonikas, 1996). Although quality of life overlaps with functionality (Coughlin et al., 1997) and clinical status, they differ in that quality of life measures typically emphasize the patient's subjective outlook and values as they relate to functioning and clinical or health status. Health-related quality of life measures evaluate the quality of life with specific regard for the disorder being treated. Some measures of quality of life emphasize the objective view of what resources are needed to achieve desired needs. Measures of burden experienced by the care provider, for example, can be included in the domain of quality of life (Andrews et al., 1994).

Measures of quality of life and general health concepts are not well understood, have poor specificity, and generally are not well accepted by the scientific community. Even so, quality of life measures are very important to the patient, because they represent individuals' beliefs about their own status (Dornelas, Correll, Lothstein, Wilber, & Goethe, 1996). Lansky, Butler, and Waller (1992) noted that although health status measures may have limited use as outcomes, they can be useful as predictive or risk-adjustment variables.

Satisfaction. Although the patient has always been viewed as providing important information related to mental health services, it was not until Attkisson and Zwick (1982) published the Client Satisfaction Questionnaire (CSQ) that satisfaction became an accepted outcomes domain. In recent years, satisfaction has become increasingly important in the health care marketplace, both to patients and payers (AIS, 1996a). Satisfaction is, by definition, subjective, depending on individual values and perspectives. Moreover, measures of satisfaction can only be provided by the patient. In general, though, the domain of satisfaction measures whether a patient's expectations were met (Lyons et al., 1997). This should include measures of whether patients received what they wanted, when they wanted it, and whether what was received met their expectations. Satisfaction tends to be an easy area in which to collect information, for two main reasons: (a) The goal of information collection is clear, and (b) some basic commonality exists in measuring satisfaction, regardless of consumer differences. For example, timeliness of service and courtesy of staff are important aspects of consumer satisfaction in all service-related areas, not just mental health. The dimensions of satisfaction as described by Lyons et al. (1997), based in part on the satisfaction measure developed by the Rand Corporation for the Medical Outcomes Study (Ware & Hays, 1988), are summarized in Table 6.3. These dimensions are consistent with those outlined by the Mental Health Statistics Improvement Program (MHSIP) of the Center for Mental Health Services. MHSIP (1996) identified the four areas of service of most importance to consumers: access, appropriateness, outcomes, and prevention. (Prevention, although important to consumers, is not readily measured as an outcome by consumers; thus, omission is acceptable.)

TABLE 6.3
Dimensions of Patient Satisfaction

Dimension                           Description
Technical quality                   Expertise of a service, including policies and procedures
Competence                          Skill, level of training, knowledge of provider
Interpersonal quality               Personable qualities of the people encountered; focus is on people, not professionals
Access                              Ease of obtaining desired services; more objective than other areas
Availability and choice of service  Choices available and level of involvement in choice; similar to access
Duration of care                    Duration of care was as expected; too little or too much can affect satisfaction
Benefit/value                       Gain from service, usually improvement in clinical outcomes; assess gain in conjunction with cost; data are limited

Some practitioners may argue that patients' satisfaction with their treatment is not important to its effectiveness. Yet, research findings suggest that satisfaction and effectiveness

are closely related. For instance, Pickett, Lyons, Polonus, Seymour, and Miller (1995) found satisfaction to be a strong correlate of perceived clinical benefit. Furthermore, Andrews et al. (1994) suggested that satisfaction may be an indicator of long-term outcomes, as satisfaction with services can influence care-seeking behavior and adherence to treatment regimens.

Resource Use/Treatment Utilization. The domain of utilization typically measures the amount of health care services, medical and behavioral, used for a particular episode of care, but it can also include measures of concurrent services and end-of-treatment level of care (Lyons et al., 1997). This domain tends to be most important to payers and providers in assessing costs, with respect to both time and money. When combined with other data, such as measures of change in clinical status, resource utilization measures provide the information necessary to begin assessment of efficiency and cost offset. A measure of the health care services involved within a single episode of care is one of the simplest types of utilization measures. Furthermore, data such as length of stay often are measured to properly analyze other outcomes. Failure to measure length of stay for a given episode of care makes it difficult, if not impossible, to interpret other outcomes (e.g., clinical improvement; Lyons et al., 1997). Although measuring utilization outside the current episode of care is desirable and may provide the most valuable information, the deterrents to doing so are numerous. For example, to properly measure rehospitalization following treatment requires collaborative efforts and combining data from numerous facilities and services. Thus, a number of issues will arise, including legal and ethical issues related to patient confidentiality and practical issues in combining multiple databases. In sum, the process of measuring utilization outside the current episode is time consuming, complicated, and expensive (Lyons et al., 1997).

From Whom to Measure

Clinical outcome data can be collected from three primary sources: the clinical staff, the patient, and collateral to the patient (i.e., sources connected to the patient). Each will provide important data and, at times, may be the only source of data in a given domain. Moreover, sources will provide a varying perspective and rate the outcome as it relates to them. For example, when rating response to treatment, the therapist will rate the degree of change due to treatment, patients will focus on their experienced change in subjective state, and patients' friends or relatives will rate the change as it directly affects them (Docherty & Streeter, 1996). Clearly, it would not be surprising for the ratings of response to treatment to vary among these three sources. Although these ratings may be contradictory at face value, when considered from the vantage point of the source of the information, each will provide valid and important information. In fact, when it is possible to obtain multiple sources of information, doing so is preferable to having just one source. Each source of information has strengths and weaknesses, which vary depending on the domain being measured (Docherty & Streeter, 1996). The following sections address each type of source in more detail.

The Clinical Staff. The clinical staff may include not only therapists but anyone who provides clinical services. This staff is considered the best source of information for objective clinical measures, such as symptoms or disease severity. Although the

clinician generally can be expected to be more objective than the patient, the clinician's objectivity in evaluation over the course of treatment may be questionable. Obtaining a second opinion from an external observer would theoretically eliminate this concern; however, in practice, obtaining such information is expensive and difficult. The one exception may be an independent evaluation from a case manager, if the usual business flow includes such an individual (Lyons et al., 1997). Clinical staff information is typically available in three forms: administrative records, including insurance records and encounter files; medical records, including clinical notes and laboratory tests; and specifically designed surveys, including research study forms and other documents developed to collect outcomes data (McGlynn, 1996). As outcomes measures continue to be required for accreditation purposes, information from clinical staff will be increasingly available. However, as is true whenever data are used for a purpose other than that for which they were collected, clinical data collected from sources other than surveys share a number of inherent problems when used in an outcome system: lack of standardization, inability to link across sources, uneven quality or detail, and lack of important elements (McGlynn, Damberg, Kerr, & Schenker, 1996). Although collecting data from clinical staff through surveys allows for customizing the information collected, doing so typically is more expensive, requires additional work by clinical staff, and may have a low response rate.

The Patient. It is generally agreed that patients are a reliable source of information for issues related to their symptoms, functioning, and satisfaction (Docherty & Streeter, 1996; Panzarino, 1996). Patients also can provide essential data about their thoughts, feelings, and behavior that are often unavailable from other sources. However, due to the nature of mental health disorders, collecting accurate information from patients is especially complicated in this area of health care. The results will depend on the severity and type of disorder; for instance, the ability of an individual with a severe mental health problem (e.g., psychosis) to provide valid information (especially at the initiation of treatment) may be questionable (McGlynn, 1996). An additional complication related to patient-reported data is bias due to false or incorrect reporting. False reporting occurs when a patient knowingly reports incorrect data. Incorrect reporting occurs when a patient does not provide correct data, perhaps due to an inaccurate memory (Azar, 1997) or misinterpretation of a question. To minimize such bias, extra care must be taken in collecting data from patients. Such care can be in the form of providing an appropriate setting for answering questions, giving clear instructions, or scheduling data collection at a time when the patient will be as able as possible to provide accurate information.

Collateral to the Patient. As noted earlier, collateral sources are those individuals who are connected to patients and have a stake in their treatment, such as relatives, spouses, friends, and employers. Because mental illness and substance abuse often affect the people close to patients (in addition to themselves), these people are able to provide additional information related to outcomes. Patients' family and friends often are most interested in the same things as the patients: their functional status, quality of life, and satisfaction. However, the collateral source's assessment of the patient's level of functioning may differ from the patient's own assessment (McGlynn, 1996). Actually, an evaluation of the patient's functionality with respect to employment or social functioning may be best conducted by someone other than the patient, such as an employer, family member, or friend. Such an individual will


be able to provide a more objective evaluation than will the patient (providing, of course, that the collateral source is close enough to the patient to provide valid information). Although collateral information brings an added dimension to the outcomes measured, obtaining consistent information from collateral sources is challenging. The first issue relates to the availability of data. Some patients may have no collateral sources available, whereas other patients may be unwilling to provide contact information for such outside sources. Even when information about collateral sources is provided, response rates from these individuals are typically lower than those obtained from patients. Collateral data also have inherent variability based on the relationship of the person to the patient. Just as a patient and a clinician will have different perspectives related to the same outcomes, so will a spouse and a sibling providing information about the same individual. Even the amount of time the collateral source spends with the patient will vary and thus affect the accuracy of information provided. Yet another complication related to collateral information is determining how to use contradicting data when quantitative information is collected from collateral sources (e.g., relapse rates in substance abuse patients). Making this determination is especially difficult when collateral data are not available for all patients. Even when all the complications are considered, collateral sources provide important data that should not be excluded. Adding basic information about the collateral source's formal and practical relationships to the patient will increase the value of the data collected. Although being able to collect collateral data is ideal, it is difficult in practice (Lyons et al., 1997). One of the most practical and useful means of collecting collateral information is through a satisfaction instrument. 
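One way to make the problem of contradictory reports concrete is to compare patient and collateral answers only for those patients who have both, and to report coverage alongside agreement. The following is a minimal sketch under stated assumptions: the record type, its field names (`patient_relapse`, `collateral_relapse`, `relationship`), and the function `summarize_agreement` are hypothetical and are not drawn from any instrument discussed in this chapter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelapseReport:
    """One patient's relapse status as reported by two sources (hypothetical layout)."""
    patient_id: str
    patient_relapse: Optional[bool]      # None means the patient did not respond
    collateral_relapse: Optional[bool]   # None means no collateral report exists
    relationship: Optional[str] = None   # e.g., "spouse", "sibling", "employer"


def summarize_agreement(reports):
    """Report how many patients have both sources and how often the sources agree."""
    both = [r for r in reports
            if r.patient_relapse is not None and r.collateral_relapse is not None]
    discordant = [r for r in both if r.patient_relapse != r.collateral_relapse]
    return {
        "n_patients": len(reports),
        "n_with_both_sources": len(both),
        # Share of patients for whom a comparison is possible at all.
        "coverage": len(both) / len(reports) if reports else 0.0,
        # Agreement is undefined when no patient has both sources.
        "agreement": (len(both) - len(discordant)) / len(both) if both else None,
        "discordant_ids": [r.patient_id for r in discordant],
    }


if __name__ == "__main__":
    sample = [
        RelapseReport("P1", False, False, "spouse"),
        RelapseReport("P2", False, True, "sibling"),
        RelapseReport("P3", True, None),
        RelapseReport("P4", True, True, "employer"),
    ]
    print(summarize_agreement(sample))
```

Keeping the `relationship` field with each record reflects the point made above that basic information about the collateral source increases the value of the data; the sketch does not attempt to decide which source is correct when the two disagree.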
When to Measure

Determining when to measure outcomes depends on the outcomes being measured, the intervention being examined, and the logical model of how one affects the other (McGlynn, 1996). In general, outcomes should be measured as early as possible, when it is meaningful and convenient, and on a fixed schedule. The use of multiple timepoints is desirable in theory but increases the burden on respondents, jeopardizes compliance rates, and complicates data management (Lyons et al., 1997). However, conducting measures at baseline and multiple measures following treatment is essential for demonstrating valid change and effectiveness.

Baseline. The timing of baseline assessments of patients with mental health and substance abuse problems is complicated. Ideally, baseline data should be collected prior to treatment (e.g., at the time of intake) for the purpose of establishing an accurate pretreatment status. But sometimes, data collection may have to be delayed until the patient is stable; in these cases, the immediate effect of treatment (i.e., stabilization) will not be represented in the patient data. Following a delay, the baseline assessment should be conducted as soon as the patient is stable enough to provide valid information. When assessing inpatients, allowing time for the baseline assessment is necessary; however, a maximum time from intake should be established (say, 24 to 48 hours from intake), after which data will not be used to assess pretreatment status.

End of Treatment (Short-Term Follow-up). End-of-treatment measures evaluate the immediate effects of treatment interventions. In inpatient settings, these assessments can be built into discharge practices and collected routinely. In other settings, however,


final treatments often are not planned, making end-of-treatment evaluations difficult. This is especially true of data collected from either the patient or a collateral source. Moreover, some outpatient treatments are long term, which may make it difficult to conduct end-of-treatment evaluations in time-limited studies (Lyons et al., 1997). Such problems can be avoided by scheduling assessments at regular times during treatment (e.g., monthly). Some domains, such as satisfaction, only need to be measured at the end of treatment; even so, there are still important issues related to the timing of such measures. Although collecting satisfaction results at the time of discharge from a facility or at a final treatment provides the highest response rate, the results tend to have an upward bias, indicating higher satisfaction than the true level due to lack of anonymity and fear of retaliation. To maximize the response rate and minimize bias, Dornelas et al. (1996) suggested mailing the patient a questionnaire 4 to 10 days after discharge.

Follow-up. As noted earlier, the ability to show an immediate effect of treatment is of great importance in assessment. In today's health care marketplace, however, many consumers and payers also require a demonstration of sustained effectiveness, as shown by follow-up data. Some domains related to effectiveness require long-term follow-up in order to show effects. For example, although interpersonal functionality may change with improvement in symptoms, functionality will continue to improve after symptom recovery. Demonstrating effectiveness in domains such as long-term functionality, future resource utilization, and relapse rates requires long-term follow-up assessment. Whereas collecting data for baseline (intake) and end-of-treatment (discharge) measures can be built into usual clinical practice and business flow, data collection for follow-up measures is inherently more complicated.
Namely, collecting follow-up data involves numerous logistical and clinical issues. For example, determining the optimal time to collect follow-up information is complicated. Although it is desirable to show the longest sustained effect possible with treatment, the greater the time between treatment and follow-up, the lower the expected response rate, the greater the chance of external factors affecting the outcome, and the lower the ability to attribute change to treatment. Thus, without a sufficient sample size, long-term follow-up (i.e., more than 6 months) may not be feasible, and determining the timing of the follow-up may be limited (Dornelas et al., 1996). The population under consideration also will affect the decision of when to follow up. Follow-up assessments on measures such as relapse, for instance, will not be meaningful if they are not timed appropriately for the population being studied. The nature of the population will affect the expected response rate, as well, and thus should be considered in determining the timing of the assessment (Dornelas et al., 1996). For instance, it can be costly or even impossible to locate a transient population for the purpose of conducting a 6-month follow-up; a 30- to 60-day follow-up may be appropriate for this population. However, a population from an employee assistance program (EAP) may be available for a 6-month follow-up.

As discussed earlier, most systems of outcome measures consider episode of care as the framework of when to measure outcomes. However, decisions related to the timing of outcome measures could be affected by a new concept, episode of illness, which may encompass many episodes of care and cross a number of levels of care and various providers. As continuum-of-care concepts and complete-care provider networks continue to grow, so will measurements related to episode of illness.
If possible, an outcomes system should allow for measurement of episode of care as well as progress throughout episode of illness (Docherty & Streeter, 1996).
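The timing rules discussed in this section can be gathered into a simple schedule. The sketch below is illustrative only. It assumes a 48-hour baseline cutoff (the upper end of the 24-to-48-hour example given above), the 4-to-10-day satisfaction mailing window suggested by Dornelas et al. (1996), and a follow-up interval supplied by the user according to the population; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed maximum delay between intake and a usable baseline assessment.
BASELINE_CUTOFF = timedelta(hours=48)
# Mailing window for the satisfaction questionnaire, in days after discharge.
SATISFACTION_WINDOW_DAYS = (4, 10)


def baseline_is_usable(intake, assessed):
    """Return True if the baseline was collected at or after intake and within the cutoff."""
    delay = assessed - intake
    return timedelta(0) <= delay <= BASELINE_CUTOFF


def assessment_schedule(discharge, followup_days):
    """Return target dates for satisfaction mailing and follow-up after discharge."""
    first, last = SATISFACTION_WINDOW_DAYS
    return {
        "satisfaction_mail_earliest": discharge + timedelta(days=first),
        "satisfaction_mail_latest": discharge + timedelta(days=last),
        "follow_up": discharge + timedelta(days=followup_days),
    }


if __name__ == "__main__":
    intake = datetime(2024, 3, 1, 9, 0)
    print(baseline_is_usable(intake, datetime(2024, 3, 2, 15, 0)))
    # A shorter interval (e.g., 30 days) might suit a transient population,
    # whereas an EAP population might support roughly 180 days.
    for name, when in assessment_schedule(datetime(2024, 3, 10), 30).items():
        print(name, when.date())
```

Passing the follow-up interval as a parameter, rather than fixing it, mirrors the point that the appropriate timing depends on the population being studied.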


Steps in Designing and Implementing an Outcomes System

Designing and implementing an outcomes system must begin with identifying its purpose and goals. Why is the system being implemented? is the question that needs to be asked, discussed, clarified, and answered. Consider that the nature and design of an outcomes system being implemented primarily to meet regulatory body requirements will be considerably different from one created to demonstrate accountability to multiple stakeholders. Moreover, it is possible for the stakeholders within one health care system to have different goals. Some may want to measure the outcomes of the tasks and processes performed; others may want to know consumer-patients' perceptions of the treatment delivered; and still others may want to know how often and how well the processes in place are performed.

Closely related to the question of why the system is being developed is the question of how the information will be used. More specifically, is the outcomes management system a subset of quality improvement efforts? Will reports be used to demonstrate accountability to the external marketplace or to meet regulatory requirements? Or, will information about reports be used to support internal continuous improvement efforts? Is a set of measures being established to further the scientific efforts of the professional community regarding the effectiveness of new treatments? (Solberg, Mosser, & McDonald, 1997). The answers to these questions are so tightly aligned with the question of purpose that it is difficult, if not impossible, to separate the two. The only way to approach these questions is to solicit input from those stakeholders who will be gathering the data and using the information once it has been collected and compiled. How does management want to use the information? What types of resistance are present in the areas where the data will be collected?
What purpose will outcomes measures have for the people responsible for data quality? Because different stakeholders likely will have different needs and the needs of these various stakeholders will need to be served by the same system, it is vital that a clear, written definition be produced in the form of a design document, identifying what each stakeholder expects to achieve. Those individuals who are designing the system need to identify the various stakeholders' needs, prioritize them, and then plan and communicate whose needs will be met, as well as how and when it can be done. This design document will serve as the blueprint for developing and implementing the outcomes management system; as such, it should be considered a working document that can be changed as the process unfolds. In finalizing the design document, it is important to consider whether individual stakeholders' needs are aligned with the overall vision of top management. If that vision is not clear or has not been adequately communicated, it will be necessary to resolve these issues. The overriding purpose of the outcomes system must be clearly stated before it can be determined what part each stakeholder will play in achieving that purpose. To meet accountability, research, and continuous improvement needs, it may be necessary to establish different measures and assign responsibility for them to different functions within the organization. A board of directors or steering committee might be put in place to coordinate and facilitate the efforts of these various functions, as required. The size and complexity of the organization will dictate whether such delineation of responsibility is needed. Likewise, how much the organization can successfully manage


and resource is a determining factor in making preliminary decisions about who will receive what information once the outcomes management system is in operation.

TABLE 6.4
Characteristics of Measurement for Improvement, Accountability, and Research

Who? Audience (customers)
  Improvement: Medical group; quality improvement team; providers and staff; administrators
  Accountability: Purchasers; payers; patients/members; medical groups
  Research: Science community; general public; users (clinicians)
Why? Purpose
  Improvement: Understanding of (a) process and (b) customers; motivation and focus; baseline; evaluation of changes
  Accountability: Comparison; basis for choice; reassurance; spur for change
  Research: New knowledge, without regard for its applicability
What? Scope
  Improvement: Specific to an individual medical site and process
  Accountability: Specific to an individual medical group and process
  Research: Universal (though often limited generalizability)
Measures
  Improvement: Few; easy to collect; approximate
  Accountability: Very few; complex collection; precise and valid
  Research: Many; complex collection; very precise and valid
Time period
  Improvement: Short, current
  Accountability: Long, past
  Research: Long, past
Confounders
  Improvement: Consider but rarely measure
  Accountability: Describe and try to measure
  Research: Measure or control
How? Measures
  Improvement: Internal and at least involved in the selection of measures
  Accountability: External
  Research: External and usually prefer to control both process and collection
Sample size
  Improvement: Small
  Accountability: Large
  Research: Large
Collection process
  Improvement: Simple and requires minimal time, cost, and expertise; usually repeated
  Accountability: Complex and requires moderate effort and cost
  Research: Extremely complex and expensive; may be planned for several repeats
Need for confidentiality
  Improvement: Very high (organization and people)
  Accountability: None for objects of comparison; the goal is exposure
  Research: High, especially for the individual subjects

Note. From "The Three Faces of Performance Measurement: Improvement, Accountability, and Research," by L. I. Solberg, G. Mosser, & S. McDonald, 1997, Journal of Quality Improvement, 23, p. 141. Oakbrook Terrace, IL: Joint Commission on Accreditation of Healthcare Organizations. Reprinted with permission.
Table 6.4 provides a summary of the types of purposes that can be met by various measures.

Measures of Quality Improvement, Accountability, and Research

Quality Improvement. Quality improvement measures focus on processes that consist of a complex series of linked steps. By examining processes in improving care, rather than people, fear of blame is removed; thus, everyone will be able to concentrate on improvement, rather than defensiveness. As summarized by Solberg et al. (1997), "The most powerful improvements usually come from an understanding of processes and


from efforts to systematize them. . . . A focus on process also forces one to pay more attention to the desires of the customer and to the use of a data-driven scientific approach to change rather than a reliance on hunches and tampering" (p. 138). Individual measures should be specific to the process being implemented and involve key steps in the process. The data gathered must be specific to a group, facility, or unit with an individual implementation scheme. Data from multiple groups or sites usually are not useful for improvement purposes, because the different entities have their own implementation needs.

Accountability. Accountability measures are useful for external groups, such as managed care companies, employers, and consumer groups. Yet, any one of these organizations also may be interested in how it compares with other similar organizations. Therefore, some overlap between accountability and quality improvement initiatives may be acceptable, because the comparison may initiate a move for internal improvement. Outcomes measures for accountability are not intended to be confidential and thus may produce fear and defensiveness. Fear that the information will be misused is common among staff in an organization. Moreover, staff may become defensive (trying to show that the information is wrong or that their group is different) when too much emphasis is placed on outcomes for accountability, particularly in the early stages of implementing an outcomes system. In contrast, measures for quality improvement purposes should not be released to the public, because doing so could be damaging to improvement efforts in the organization. (A reasonable exception to this policy is when some of the information provided by the quality improvement process is needed for external groups.)

Research. The purpose of outcomes measurement for research is to produce new knowledge of general value within the field.
Clinicians are trained in the research process and the application of information from research to a real-world setting. However, the target populations for such research would be limited compared to those used for measuring process improvement or accountability, as a more rigorous study design would need to be put in place. Outcomes measurement does not have the rigor of efficacy studies, and a clear understanding of which measures will be used for what (improvement, accountability, or research) is essential to success.

One way to gather information from various stakeholders is to involve them in focus groups. These groups can be used throughout the design and development process to examine the overall purpose of the system, to formulate specific questions that various stakeholders want answered by the outcomes system, or to plan the actual implementation (i.e., who does what, for whom, etc.). The following are some of the questions focus groups might address:

Who will be involved? Specifically, who will provide the data? Who will use the information? Who will be responsible for the results?
What key questions will be answered, and for what group of stakeholders?
What tools or measures will be used?
When will data be collected, and how will they be processed?
What is the best way to incorporate data collection into the current workflow?

Employees who are involved in decision making will have ownership of the process and be more likely to support the changes needed for the outcomes management system to be successful.


In designing an outcomes management system, decisions must be made to prioritize the measurement needs of various stakeholders. In doing so, two important (albeit obvious) limitations should be considered: "Not everything can be measured at the onset," and "A perfect solution is not possible" (Sauber, 1996, p. 13). Acknowledging these limitations will allow those involved to narrow the scope of the system, making it possible to achieve some of the goals. The most important point to keep in mind is to begin measurement with the most important and achievable goals. Implementing a partial system is better than not implementing any system at all. After the system is in place and users have begun to utilize it, incremental expansions and improvements can be made (Sauber, 1996). Moreover, implementing a partial system and building on it allows for improvement as the experience grows.

Creating a Work Plan

Once the goals of the system have been explicitly defined and information has been collected from various stakeholders regarding their needs and views, those individuals responsible for the outcomes management system must create a work plan or program. That plan should carefully delineate and define all the steps that will be taken to implement the system. In particular, the work plan should provide operational definitions of the three major steps in the outcomes management system: data collection, data processing/reporting, and data storage. It is impossible to define one area independent of the others, because all three are interrelated. Nonetheless, each will be discussed in a following section.

Data Collection. To collect outcomes data of high quality and quantity, a careful implementation strategy must be developed. The operational details of implementing an outcomes system should be well planned and discussed with all the staff involved in the process.
The first step in planning for data collection is to answer a few preliminary questions: What populations are served by the organization being measured, and which populations will be included in the measures? What services are offered to these various populations, and which ones do you want to measure? Once these preliminary decisions have been made, it will be possible to identify and define the domains that could be measured to achieve the stated purpose and goals of the outcomes management system. In selecting domains to measure, it is important to ensure that the information collected will be relevant and understandable to the various stakeholders. A consensus on which domains should be included probably will not be possible, due in part to the stakeholders' differing needs and goals.

Table 6.5 summarizes a number of questions that should be considered when designing the data collection process. After answering these questions, a workflow plan or diagram should be developed for the outcomes system. The following issues need to be analyzed: the flow of patients through the system, the staff's operational workflow and areas of responsibility, and interfaces with other departments and the flow of external resources (e.g., information processing, paper/information) to and from other departments. The workflow plan should identify each department that will be involved in the data collection process (see Fig. 6.2). And, within the plan, proper staff should be identified to coordinate the collection of staff and patient data.


TABLE 6.5
Questions to Consider in Designing the Data Collection Process

What measures will be collected?
What domains are required to meet the goals of the system?
What additional information is needed to support the system?
What additional domains may be of use or interest and easily built into the system?
How will these domains be measured?
From whom will data be collected?
Who can provide the data needed?
Who is the population of interest: all respondents or a subpopulation?
Will data be collected from a sample of the population or from all respondents within the population?
How will the data be collected, and who will be responsible for ensuring quality data?
Do these data exist in another system, or will they need to be collected from the source?
What type of instrument will be used to collect the information from the sources: self-report forms? interview format?
When will the data be collected: at intake? throughout the treatment process? at discharge? during follow-up?
At what times is it necessary to collect the data?
What logistical or attitudinal barriers are likely to impact the data collection process?

The success of any outcomes system is contingent on incorporating data collection into the day-to-day flow of operations, so performing a workflow analysis may be one of the most critical steps in the process. Within a single organization, it is possible that the workflow may vary from facility to facility and/or provider to provider. Therefore, multiple workflow processes may need to be examined and different implementation schemes developed to account for variances. Once the workflow diagram or diagrams have been completed, it will be possible to identify the best place to integrate data collection into the intake, treatment, and discharge processes. Large systems that are actively trying to improve their data collection processes should review the workflow plans of other facilities that yield data of good as well as poor quality and quantity.
The information garnered from such reviews can be used to implement the best practices.

Once the workflow analysis has been completed, it will be necessary to develop explicit definitions of all the data elements. Regardless of how obvious and intuitive a data element may seem, it still may be misinterpreted unless clearly defined. The following are examples of data element definitions:

Admission date: The date the patient was admitted to the health care organization as an inpatient or outpatient.
Discharge date: The date the patient was discharged from the health care organization as an inpatient or outpatient.
Date of birth: The month, day, and year the patient was born.
Diagnosis: The patient's primary ICD-9 code.
Discharge disposition: The code indicating patient status as of the ending service date of the period covered by the present episode of care.

Developing operational definitions of the data elements to be collected is important to assure consistency as well (Gift & Mosel, 1997).

Data Processing/Reporting. The workflow plan must incorporate the processing and reporting of outcomes data into the overall scheme of implementation. Data can be processed locally or at a database center for a multifacility organization. A neutral


Fig. 6.2. Adult psychiatric inpatient workflow diagram.


database center also can be used for a multifacility or multiorganizational outcomes collaborative. Again, the workflow of each department involved in processing and reporting data needs to be reviewed and a plan developed for each segment of the outcomes system. The goal should be achieving a smooth integration among collecting, processing, and reporting data.

Before data collection actually begins, a detailed plan of the data analysis and reporting strategy should be developed. The following questions will guide the report planning process:

What are the key questions to be answered?
How will the data be analyzed?
Who is the primary audience for the report?
How will the report be used?
In what format (i.e., charts, tables) will the results be presented and communicated?
How often will a new report be generated?

Determining the answers to such key questions and ensuring that valid analytic methods have been used are basic requirements of any report. Beyond these requirements, the most important element in creating a successful reporting system is to tailor the report to the audience. A report must be useful and valid from the user's perspective. Thus, the level of complexity and depth of information presented will depend on the audience and its abilities and needs. For example, clinicians, managers, and purchasers will want detailed information they can understand, trust, and act on: not just general statements but specific information (e.g., about how patients admitted on Saturday feel about their waiting time, information received about the diagnosis and treatment plan, etc.). However, the CEO of an institution may only want the "bottom line," or a general summary of how the institution is performing. Regardless of the audience, some basic guidelines should be followed in developing a valid and useful report. Whenever possible, comparative measures should be provided for all results.
The valid comparative group may be a predefined benchmark or goal, historic results, or similar respondents from another group, depending on the goals of the system. Creating a guide for the use and interpretation of the report is also recommended. This guide should be brief yet address both the strengths and limitations of using and interpreting the report. Data Storage. Designing an effective and useful database for an outcomes system usually requires consultation with specialists, as database design and administration is a specialized area within computer science. Nonetheless, several important issues should be considered when designing and developing a database for an outcomes system. An outcomes system database serves two primary goals: to accurately maintain all necessary information and to support analysis. The design of the database must balance these two goals. Thus, it should be developed after the data collection and reporting segments of the outcomes system have been planned. Database design should include input not only from the database experts but from the researchers who designed the data collection instruments as well as those who will analyze the data. Ideally, a database should be designed to allow flexibility with respect to questionnaire modifications. Thus, as the outcomes system grows and improves, the database will be enhanced such that all data can be maintained and accessed. At the onset, the individuals responsible for maintaining the database should be identified, including those responsible for quality assurance. A quality assurance plan for


the database should be developed while the database is in the design stage. This plan should identify how the database will be assessed for quality, how often it will be assessed, how the quality assessments will be documented, and what measures will be taken if the database fails any of the quality assessments. A database that assures reproducibility of reports in the future, even as the database grows, is advantageous in assessing the accuracy and quality of reports and analyses. Thus, it is important to consider the issue of reproducibility of reports when developing a data quality assurance plan.

When the outcomes system involves using information collected from another database, additional issues must be considered. For instance, if individual observations from two databases are to be linked, the two databases must be compatible. Furthermore, a method must be developed and tested to link individuals or individual stays using unique identifiers. The design of the outcomes database also may be restricted by the need to be compatible with other databases.

Avoiding Implementation Problems

The checklist in Table 6.6 should be considered to avoid problems and issues that may impede implementation of the outcomes system.

Patient-Related Issues

A number of patient-related issues need to be addressed when designing and implementing an outcomes system, including security and confidentiality of data and patient resistance to completing questionnaires. Security is defined as the means for preventing unauthorized access. Confidentiality is defined as restricting access to information only to those individuals with appropriate reasons to have such access (NCQA, 1997a, p. 41).

Security and Confidentiality. Maintaining the security and confidentiality of patient information needs to be considered in all applications within an outcomes system.
Some researchers fear that the existence of large databases that contain patient information from several organizations may allow for breaches in patient confidentiality. The NCQA (1997a) emphasized the necessity of making a clear commitment to protecting the confidentiality of patient information. Internal policies should be developed that create a balance between a patient's right to privacy and an organization's need to understand who is utilizing its services, under what conditions, and to what ends (Lyons et al., 1997).

Patient Resistance. Another patient-related issue is the reluctance some patients may have about providing information about their current condition. Lyons et al. (1997) discovered that some patients were afraid that if they reported they were doing well, their benefits would end. Other patients feared that if they reported they were not improving or were doing poorly, their treatment would end because it was not working. In order to improve the collection of data about patients, patients need to understand that the information they provide will be used to improve the quality of the services they receive. Careful explanation, in person and in writing, of how the information will be used is a crucial aspect of implementing an outcomes system.
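The data element definitions given earlier, together with the requirement to link individual stays across databases by unique identifiers, can be sketched in code. The following is a minimal illustration under stated assumptions, not a method prescribed by the sources cited in this chapter: the `EpisodeRecord` type, the `link_key` and `link_stays` functions, the dictionary layout of the outcome rows, and the use of a salted SHA-256 hash as a pseudonymous key are all hypothetical choices. A salted hash by itself should not be taken as sufficient protection of patient confidentiality.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EpisodeRecord:
    """One episode of care, using the example data elements defined in the text."""
    patient_key: str             # pseudonymous key, not the medical record number
    admission_date: date
    discharge_date: date
    date_of_birth: date
    diagnosis: str               # primary ICD-9 code, e.g., "296.32"
    discharge_disposition: str   # locally defined status code (assumed)

    def __post_init__(self):
        # Basic consistency checks implied by the operational definitions.
        if self.discharge_date < self.admission_date:
            raise ValueError("discharge date precedes admission date")
        if self.date_of_birth > self.admission_date:
            raise ValueError("date of birth follows admission date")


def link_key(medical_record_number: str, salt: str) -> str:
    """Derive a stable pseudonymous key so records can be linked without the raw number."""
    normalized = medical_record_number.strip().upper()
    return hashlib.sha256(f"{salt}:{normalized}".encode("utf-8")).hexdigest()


def link_stays(episodes, outcome_rows):
    """Match outcome rows to episodes on (patient_key, admission_date).

    Returns (linked, unlinked) so that unmatched rows can be reviewed
    as part of a quality assurance plan rather than silently dropped.
    """
    index = {(e.patient_key, e.admission_date): e for e in episodes}
    linked, unlinked = [], []
    for row in outcome_rows:
        key = (row["patient_key"], row["admission_date"])
        if key in index:
            linked.append((index[key], row))
        else:
            unlinked.append(row)
    return linked, unlinked
```

Linking on the pair of patient key and admission date reflects the distinction drawn above between linking individuals and linking individual stays; a system with several admissions per patient on one date would need a more specific stay identifier.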


TABLE 6.6
Checklist for Implementation of the Outcomes System

1. Planning Stages
A. Develop a well-organized, comprehensive, and sensible plan. Incomplete or poorly developed plans that are shown to staff/providers will generate unnecessary concern and resistance.
B. Involve staff in the design of the outcomes management system, and ask for their commitment to the new process. Receive feedback on the design of the outcomes system (from supporters as well as nonsupporters) to allow for fine-tuning. Demonstrate how the outcomes system can help meet regulations and requirements to continuously improve the quality of patient care.
C. Pilot test the measures and methods to fine-tune the implementation plan. For the pilot phase, enlist staff who are highly motivated as well as staff who are resistant to the new outcomes system. (By using staff who are resistant, potential shortcomings and problems can be addressed early.) Collect feedback from patients, staff, and managers concerning the pilot project.

2. Instrument Selection
A. Select patient-response instruments that take less than 20 minutes to complete, as longer instruments probably will meet with noncompliance and yield poor data. Select staff-response instruments that take only about 5 minutes per patient. Staff may be resistant if data collection and processing require more time than they feel is necessary.
B. Involve the staff who will be collecting and using information in building meaningful reports. Select and design reports at the beginning of the implementation stage. Although reports can be modified later, it is important to start with a few that will be widely accepted and used to support important initiatives.

3. Staff Training
A. Train staff regarding all aspects of the system that will affect their work, including data collection and reporting.
B. Develop a procedures manual. Include Quick Start Guides to provide easy and quick access to key tasks. Include a workflow diagram that incorporates data collection and reporting into daily operations.

4. Implementation Stage
A. Assign one person to be responsible for the quality and completeness of data collected in each organization. This person's duties should include the following:
  Facilitate the data collection process by providing tools and updates to those who need them.
  Ensure that data are complete before submitting them for processing.
  Identify and correct operational problems.
  Facilitate the timeliness of data collection and processing.
B. Encourage staff participation during data collection. Share results with staff so they are aware of the benefits of successfully implementing an outcomes system. Encourage staff to review and understand reports so they know how the results of their data collection efforts are being utilized. Share process improvements that occur as a result of data collection efforts.

Training

The fundamental question for the trainer concerns how value for performance improvement processes and measures can be created in a population that knows it must show favorable outcomes and patient satisfaction results to remain in business but has little experience and, in many cases, little desire to use the information to operate its organization.

Building value starts with communicating the purpose and anticipated benefits of implementing an outcomes management system to everyone who will be involved in it, from the initial design stage through the implementation and evaluation stages of the process. All of the interactions that take place early in implementation lay the groundwork for subsequent phases; thus, training starts early and can become part of the focus group data collection process conducted at the beginning of the design phase.

Adults like to know why changes are being made and how the changes will affect their jobs. It is only when individuals understand the benefits they will enjoy that true commitment will occur. The trainer should share some of those benefits with individuals (e.g., that the organization is implementing an outcomes system to compete in the current market, to reduce cost and/or increase revenue, and to provide the information essential to doing business in a managed care environment to the people who need it). It is important for the organization to communicate to its employees realistic expectations and goals, covering both the macro- and micropurposes of the change. Individuals will play important roles in the success of outcomes measurement, so they must understand what specific part they will play in implementing the system.

When conducting training, it is important to identify learners' current level of knowledge and begin teaching from that point forward. Consider what three to five things learners must know to be motivated and able to collect high-quality data. Include all employees who will be responsible for providing clinical data, implementing patient surveys, or managing medical records. Design training materials so that current staff can use them to train new staff when needed. Depending on the work environment, it might be possible to train in half-day sessions, via teleconferencing, or via telephone conferencing. The training should be timed so that data collection will begin within a week of the training. It is also important to have a support line in place to answer questions once the actual implementation begins.

Employees will probably resist changes if they feel threatened. Trainers should understand ways to minimize such resistance. One approach is to identify informal leaders who support the move to outcomes measurement and arrange to have them participate in training sessions. If employees are willing to treat the organization's outcomes measurement results with the same importance as its financial results, not only will the quality of patient care be improved but the "bottom line" will be as well.

Conclusions

The need for quality and accountability in behavioral health care drives outcomes management efforts. Taking measurement from the research or assessment environment to the brief therapy world of today's practitioner is a challenging task. With managed care becoming a reality of the behavioral health care industry, the idea of devoting the precious commodities of time and money to measurement can be overwhelming. Yet a paradox exists: with measurement and continuous improvement becoming an integral part of a market-driven health care industry, as evidenced by their inclusion in the accreditation process for payers and providers alike, it is not possible to compete in the behavioral health care arena without an outcomes management system. Thus, the challenge is not whether to design and implement an outcomes management system, but how to design one that is meaningful, cost-effective, and efficient to implement.

Chapter 7
Progress and Outcome Assessment of Individual Patient Data: Selecting Single-Subject Design and Statistical Procedures

Frederick L. Newman
Florida International University

Gayle A. Dakof
University of Miami School of Medicine

One need not apologize for an interest in exploring and making inferences about observations on the individual consumer. There has been a long, rich history of individual subject research in psychology, starting with the psychophysics and sensory psychology studies performed in Europe and the United States during the 1800s (Osgood, 1953; Stevens, 1951). The research methods, mostly relying on extensive within-person replication techniques, were sufficiently rigorous that much of the research withstood the tests of replication and generalizability over many individuals (Stevens, 1966). The psychophysics studies were also characterized by their focus on discovering what can be identified as trait, rather than state, characteristics of their subjects. State characteristics (i.e., characteristics distinguished by the environmental and modifiable character of the individual) became the domain of those interested in the areas of behavior change (e.g., behavior modification, learning theory, human judgment and decision making, health and clinical psychology, psychopharmacology, and some subsets of physiological psychology).

Interest in how behavioral patterns or state characteristics change over time is problematic for single-subject methods as practiced by the early psychophysics researchers. Simply stated, replications over time were expected to show change. Three traditions of single-subject clinical research have emerged:

1. The use of clinical case notes over the course of treatment: a procedure used by those who were developing or studying psychodynamic theories, particularly psychoanalytic theories.
2. The counting of discrete observable behavioral events over time and under different conditions controlled by the experimenter: a procedure employed by behavior analysts and therapeutic process researchers (many of whom would not wish to be identified as belonging to the same category as the behavior analysts, and vice versa).
3. The scores from a standardized instrument contrasted with established empirical norms: a procedure employed by neuropsychologists to understand the current intellectual or cognitive state of the individual, or by industrial-organizational researchers working on matching personal characteristics to a job.

The Use of Clinical Case Notes

Reporting of individual clinical case studies in the form of written narratives taken from the theorist's own clinical notes or those of a close colleague has been an integral part of clinical psychology's and psychiatry's historical development (e.g., as presented in the psychoanalytic and psychodynamic literature). Unfortunately, whether the evidence supports replication has been a subject of some controversy, calling into question the scientific credibility of those who have used this approach (e.g., the controversy about whether Freud and Jung reported the evidence in and from their case notes properly). The major difficulty in the early attempts to use clinical case studies as scientific evidence to construct or test theories was the lack of agreement on what methods would be necessary and sufficient to argue for the data's validity. Clinical case notes, in the narrative form, are open to too many "alternative explanations," including the frailty of human judgment in general (Meehl, 1954; Newman, 1983) and the conflict of interest inherent in having the clinician who is treating the person serve as both the observer and the synthesizer of the observations into a written narrative.

Observing and Counting Discrete Behaviors Over Time

Behavioral analysts have been employing single-subject techniques for some time (L.W. Craighead, W.E. Craighead, Kazdin, & Mahoney, 1994; Hersen & Barlow, 1976; Hersen, Michaelson, & Bellack, 1994; Kazdin, 1992). The tradition of those employing behavior analysis techniques is to select a discrete, readily observable behavior that can be tracked reliably over time and is sensitive to the environmental intervention under control of the researcher/clinician. A review of the behavior analysis literature did not find single-subject studies that employed the types of psychological measures that are the focus of this volume. Instead, the investigator selects a marker behavior that is the most suitable indicator of an underlying construct. In the case of phobic, avoidance, or even abusive behaviors, selection of a behavioral marker to track over the course of treatment is quite straightforward. When the type of distress or functioning is more generic (e.g., depression, or interpersonal attachment/commitment), the selection of one or more marker behaviors may be more problematic. Texts on behavioral analytic techniques (e.g., L.W. Craighead et al., 1994) caution the reader that the selection of the marker behavior or indicator must make sense both theoretically and empirically (for its psychometric qualities).

Contrasting Psychological Test Scores for an Individual on a Standardized Instrument with Published Norms

This approach represents the core of the tradition of what is popularly called psychological testing. There are several examples distributed through this text (e.g., the chapters on the MMPI-2, MMPI-A, Conners Rating Scale). A major function of this tradition is to compare the results of a psychological test for an individual with those of either a clinical or a nonclinical norm to justify the need for treatment to a third-party payer or an institution (e.g., a hospital or school). This approach is also used to justify continued treatment. Using the results of such tests to indicate when an individual's treatment should stop is not part of the tradition of psychological testing. Use of psychological testing in follow-up to treatment appears to be confined to treatment outcome studies where results on groups of consumers under one or more treatment conditions are compared.

One conclusion that could be drawn from the traditional uses of single-subject studies is that psychological testing of the sort covered in this text cannot be employed in single-subject research. That conclusion cannot yet be drawn, for two reasons. First, there is a great need for single-subject studies in developing new treatments (Greenberg & Newman, 1996). Second, there are important applications for tests such as these by the individual clinician (or a clinical team) in tracking the progress of an individual consumer/client and in justifying to a third-party payer the initiation, continuation, or shifting of treatment strategies (Newman & Tejeda, 1996).

Selection and Use of Psychological Tests in Single-Subject Clinical and Research Applications

Clinicians, supervisors, and mental health administrators arguably need to know whether the services they provide ameliorate consumers' symptoms and improve their capacity to manage their community functioning. Such knowledge should be useful in treatment planning and decision making. For instance, if the intervention yields little or no improvement where such improvement should be expected, given what is known in the literature, then the service provider(s) would want to consider a different intervention strategy. If the person is in individual therapy, the clinician might decide that progress could be achieved if the family is brought into the therapy process, or perhaps if therapeutic contact is made more frequent or changed to another mode, such as group or family therapy. If observed behavior patterns move sufficiently in a negative direction, then more intensive or restrictive treatment (e.g., initiating or increasing the dosage of a medication, the use of day treatment or inpatient treatment) might be justified. On the other hand, the charting of the person's functioning might lead the clinician to realize that there has been sufficient symptom reduction and improvement in the management of day-to-day functioning to warrant a discussion of termination from treatment.

Too frequently, however, the clinician has no method for systematically charting individual consumers' progress. Well-established psychological instruments, such as those reviewed in this book, are ideally suited to chart the progress of an individual, and hence assist service providers in clinical decision making (e.g., when to refer the person to a more or less intensive treatment, when to terminate treatment, when to refer to another type of intervention). The psychological tests discussed here are widely used, reliable, and valid measures with established norms. Moreover, most of these tests are easy to administer, score, and interpret. Thus, it appears to be a relatively straightforward procedure to administer and score a few carefully selected instruments repeatedly throughout treatment, plot the scores on a graph, and use this information to assess the consumer's progress. What follows are the basic guidelines for designing a procedure to systematically collect and analyze data to assess individual patient progress.
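As a rough illustration of the charting idea described above, the following Python sketch plots repeated scores for one consumer as a simple text chart and estimates the average change per administration with an ordinary least-squares slope. It is only a minimal sketch under stated assumptions: the function names, the score values, and the choice of a least-squares slope as the summary are hypothetical and are not taken from the chapter, and scores are assumed to be non-negative.

```python
from statistics import mean


def trend_per_administration(scores):
    """Least-squares slope of scores against administration index (0, 1, 2, ...).

    A negative slope means scores are falling across administrations.
    Whether falling scores indicate improvement depends on the instrument.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two administrations")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(scores)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def text_chart(scores, width=40):
    """Return one text bar per administration (assumes non-negative scores)."""
    top = max(scores) or 1  # avoid dividing by zero when every score is 0
    return [
        f"{i + 1:>2} | " + "#" * round(width * s / top) + f" {s}"
        for i, s in enumerate(scores)
    ]


if __name__ == "__main__":
    # Hypothetical scores on a symptom inventory across six administrations.
    hypothetical_scores = [28, 26, 27, 22, 19, 17]
    for line in text_chart(hypothetical_scores):
        print(line)
    slope = trend_per_administration(hypothetical_scores)
    print(f"average change per administration: {slope:.2f}")
```

A slope summarizes direction only; it does not by itself say whether the change is clinically meaningful, which is why the text goes on to discuss instrument selection and decision criteria.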

Selection of Instrumentation

As discussed earlier, the tests and measures reviewed in the current volume can be very useful to individual clinicians because they are widely used and have well-established psychometric properties and norms. But a more important consideration in single-subject research is that the selected test must demonstrate sensitivity to change. In psychotherapeutic work, especially in the early phase of treatment, important changes might be quite subtle. Marked changes in conduct disorder (see examples using the Child Behavior Checklist, or CBCL, by Achenbach; or the Child Adolescent Functional Assessment Scale, or CAFAS, by Hodges; or when using the Beck Depression Inventory) are not observable until substance use/abuse is curtailed and/or the family becomes engaged in the treatment (Liddle, Rowe, Dakof, & Lyke, 1998). Moreover, the clinician must make sure the selected test measures a behavior, attitude, or emotion expected to change as a result of the treatment. For example, in Multidimensional Family Therapy (Liddle & Saba, 1981) for adolescent drug abuse, the clinician targets parenting behaviors. Specifically, the therapists work to change how parents control (i.e., set limits and monitor), nurture (i.e., express love and concern), and grant autonomy to their acting-out adolescent. It is important, then, that the clinician measure these specific parenting behaviors, rather than overall family closeness, such as is measured by the Family Environment Scale (Moos, 1994).

The clinician must be careful to select a measure that is sensitive to the types of changes targeted in the intervention. For example, youth referred for substance abuse treatment frequently have behavior problems and symptoms that are comorbid with the drug use. One youth might be particularly aggressive and violent. In this case, one of the first treatment goals would be to reduce the violent and aggressive behaviors. Aggressive behavior should be assessed with this particular youth, using a measure such as Achenbach's Child Behavior Checklist (CBCL) or Youth Self-Report (YSR). Another drug-using youth might present with depression and aggression; in this case, the clinician would be wise to measure both aggression and depression. Thus, as in any research, the researcher (or in this case the clinician) must carefully select measures that assess the targets of the intervention (see chap. 5, by Newman, Ciarlo, & Carpenter, for a discussion of the guidelines for selecting a measure).

Selection of Criteria on Which to Base Clinical Decision Making

Targeting Clinical Decisions. The biggest challenge in what is proposed here is the absence of clear standards for decision making with regard to admissions, referral to more intense or less intense treatment settings, or termination of treatment. This is the start of an era in which neither a clinician, a group of clinicians, nor a service agency simply opens the doors of a service and provides treatment for all those who enter. Decisions must be made about what the needs are (usually requiring some epidemiological needs assessment estimates) and to what degree these needs are being met by existing service providers. The epidemiological technology of assessing mental health service needs should provide a profile of the major psychosocial characteristics of those who need a service. These characteristics typically include age, gender, socioeconomic status, major symptoms/diagnostic groups, community functioning level, and level of social support. Once a decision is made that there is a group of people who can be served, a set of measures and instruments needs to be selected to support four distinct sets of clinical decisions:

1. What are the psychosocial characteristics of those who should be admitted into the service? 2. What are the characteristics that indicate the need for referral to a more intensive-restrictive service? 3. What are the characteristics that indicate the need for referral to a less intensive-restrictive service? 4. What are the characteristics that indicate if services are no longer needed? To address each of these questions, operational definitions and measures are needed to estimate level of functioning, symptoms, socialization, and overall ability for an individual to manage their day-to-day affairs (Newman, Hunter, & Irving, 1987). Moreover, each of the last three questions also raises an issue of amount (dosage) and type of therapeutic effort, and amount of change in symptom distress, functioning, or selfmanagement achieved over time (Carter & Newman, 1976; Howard, Kopta, Krause, & Orlinsky, 1986; Howard, Moras, Brill, Martinovich, & Lutz, 1996; Newman & Howard, 1986; Newman & Sorensen, 1985; Newman & Tejeda, 1996; Yates, 1996; Yates & Newman, 1980). Figure 7.1 illustrates how an individual consumer may progress relative to the four aforementioned questions. This example uses technology that is currently available, with one limitation to be discussed later. The intent of Fig. 7.1 is to show that by employing existing technology, it is possible to describe the impact of two or more interventions in terms of its cost-effectiveness in achieving observable criteria. The example illustrated in Fig. 7.1 offers a graphic analysis in the form of a progress-outcome report for three hypothetical individuals. These three persons have been identified as having similar characteristics at the outset of treatment (i.e., all three have similar initial levels of functioning, and have service plans for a 26-week period, such that the expected cost of administering the treatment plan is the same). 
However, the individuals differ with regard to their performance over the course of treatment. Person 1 is expected to improve in overall functioning to a point where he or she can either be referred to a less intensive service or be terminated from services because his or her functioning and/or self-management is adequate to be independent of formal treatment supports. Person 2 has maintained functioning or level of self-management within the bounds described as acceptable for community functioning. Person 3 represents a person whose treatment goals were similar to those of Person 2, and is discussed later.

The vertical axis of Fig. 7.1 represents a person's overall ability to function in the community. It is understood that this global measure must be supported by a multidimensional view of the persons within each group (Newman & Ciarlo, 1994). The horizontal axis represents the cumulative costs of providing services from the beginning of this episode of treatment. The two horizontal dashed lines inside the box represent the behavioral criteria set a priori, that is, the lower and upper bounds of functioning that this mental health service is designed to serve. If consumers behave at a level below the lower dashed line, then they should be referred to a more intensive service. Likewise, if consumers behave at a level above the upper dashed line, then either services should be discontinued or a referral to a less intensive-expensive service should be considered. Finally, the vertical dashed line inside the box represents the cumulative costs of the planned services for the 26-week period. In this hypothetical case, all three groups have treatment plans with the same expected costs. The circled numbers within Fig. 7.1 track the average progress of consumers within each of the groups at successive 2-week intervals.
The vertical placement of each circled number represents the group's average level of functioning at that time, and the horizontal placement represents the group's average cost of services up to that point. The sequence of 13 circled numbers represents the average progress of a person at 2-week intervals over the 26 weeks of care.

Fig. 7.1. Hypothetical example of an analysis of changes in functioning and costs relative to managed care behavioral criteria for three consumers. Person 1 met the planned objective of improved functioning such that the person was able to terminate services within 6 months. Person 2 was able to maintain functioning over the same 6 months while using about the same amount of resources. Person 3 required additional resources to maintain adequate levels of community functioning. Adapted from "The Need for Research Designed to Support Decisions in the Delivery of Mental Health Services" by F.L. Newman & M.J. Tejeda, 1996, American Psychologist, 51. Copyright © 1996 by the American Psychological Association. Reprinted with permission.

For Person 1, with a 6-month objective of improvement above the upper dashed line, the objective is met. For Person 2, with the 6-month objective of maintaining functioning within the range between the two dashed lines, the objective also is met. Person 3 is an example of a client who exceeded the bounds of the intended service-treatment plan after 4 months, 2 months shy of the objective. Such persons were able to maintain their community functioning, but only with the commitment of additional resources. A continuous quality improvement program should focus a post hoc analysis on these consumers to determine whether they belong to another cost-homogeneous subgroup, whether the services were provided as intended, and whether modification of current practices is needed.
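The decision logic that Fig. 7.1 displays graphically can also be written out as a simple rule. The following is a minimal sketch, not a procedure given by the authors: the class name, function name, score scale, bounds, and cost figures are all hypothetical and serve only to show how one checkpoint might be compared with the a priori criteria (the dashed lines).

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    week: int               # weeks since the start of this episode of care
    functioning: float      # global functioning score (assumed: higher = better)
    cumulative_cost: float  # cost of services delivered so far


def review(cp: Checkpoint, lower: float, upper: float, planned_cost: float) -> str:
    """Compare one checkpoint with the a priori criteria (the dashed lines)."""
    if cp.functioning < lower:
        return "refer to a more intensive service"
    if cp.functioning > upper:
        return "discontinue or refer to a less intensive service"
    if cp.cumulative_cost > planned_cost:
        return "within bounds but over planned cost: review the plan"
    return "continue current plan"


# Hypothetical values only: bounds of 40 and 70, planned cost of 6000.
person_3 = Checkpoint(week=18, functioning=55.0, cumulative_cost=6400.0)
print(review(person_3, lower=40.0, upper=70.0, planned_cost=6000.0))
```

Checking functioning before cost reflects the ordering in the text, where a referral decision takes precedence and cost overruns within acceptable functioning prompt a post hoc review rather than an automatic change of service.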

A major difficulty in attempting to enact these recommendations is that of obtaining a believable database to determine the appropriate progress-outcome criteria or the expected social or treatment costs. Some have used expert panels to set the first draft of such criteria (e.g., Newman, Griffin, Black, & Page, 1989; Uehara, Smukler, & Newman, 1994). Howard et al. (1986) used baseline data from prior studies, along with measures taken on a nonpatient population. But both strategies have problems. The research literature is largely based on studies where the dosage was fixed within a study, thereby constricting inferences about what type and amount of effort can achieve specific behavioral criteria. Howard's early work on a dosage and phase model of psychotherapy was statistically confounded by combining the results of controlled studies where number of sessions was fixed, with data from naturalistic studies. Some of the naturalistic data came from persons who had no limit on sessions and others did have limits on the number of sessions imposed by third-party payers. Based on previous experiences, it would appear the dosage data as reported are probably trustworthy. However, there are no studies where behavioral criteria were set a priori or in which the type and amount of efforts were seen as the dependent variables in estimating the success or failure to achieve these criteria. Yet the logic of managed care and the logic of the National Institute of Mental Health (NIMH) practice guidelines would require that such study results be available to set criteria for using a particular intervention strategy or to set reimbursement standards. This void must be filled by data from well-designed efficacy and cost-effectiveness studies that can provide empirical support for setting behavioral outcome criteria for managed care programs. Without such data there is faint hope of changing the current practice of setting dosage guidelines independent of behavioral criteria. 
Patient Profiling. The technique developed by Howard et al. (1996) is sufficiently different from the example given in Fig. 7.1 to warrant a more detailed discussion. Howard et al. (1996) introduced the technique as a new paradigm of "patient-focused research." The paradigm addressed the question, "Is this patient's condition responding to the treatment that is being applied?" (p. 1060). The technique makes use of the progress and outcomes of a large database (N > 6,500) of adults who received outpatient psychotherapy and the dose-response curves that were obtained for those in the database. Based on the prior research reported by Howard's group, a predictive model of expected change based on seven intake characteristics can be generated (W. Lutz, personal communication, January 14, 1998). The seven intake variables are:
1. Level of Well-being, a subscale of the Mental Health Index (Howard, Lueger, Maling, & Martinovich, 1993)
2. Level of Functioning, a subscale of the Mental Health Index (Howard et al., 1993)
3. Symptom Severity, a subscale of the Mental Health Index (Howard et al., 1993)
4. Prior psychotherapy (none, 1-3 months, 4-6 months, 6-12 months, 12+ months)
5. Chronicity ("How long have you had this problem?")
6. Expectation ("How well do you expect to feel emotionally and psychologically after therapy is completed?")
7. Clinician's rating of the Global Assessment of Functioning (Axis V of DSM-IV)

Prior to the initial therapy session, the person is asked to complete the Mental Health Index, which includes items covering six of the seven variables (plus other areas of functioning). The values obtained on these six variables, plus the clinician's rating of the person's Global Assessment of Functioning (GAF), are entered as predictor variables into a growth curve modeling program (e.g., HLM; Bryk & Raudenbush, 1992) to obtain an estimate of the expected dose-response curve for that person.
Formally, this is called a Level 2 analysis because the predicted outcome for the individual uses the results of the subset of people, within the database of approximately 6,500 people, who have scores on the seven predictor variables similar to those given by the person under study. The HLM program also permits an estimate of what is described as a "failure boundary," based on the growth curve of those persons for whom the expected growth curve was either flat (i.e., a slope of zero) or negative. Once the expected dose-response curve for those who showed improvement and the failure boundary are estimated, it is just a matter of obtaining and plotting the scores a person obtains on the Mental Health Index after every fourth session. The plot of these data every four sessions represents the individual's growth curve, which can be contrasted with the expected growth curve and the "failure boundary" curve. Figures 7.2 and 7.3 provide examples of outcome relative to the expected and "failure" growth curves for a hypothetical person undergoing psychotherapy treatment.

The patient profiling method, though elegant, would appear to be beyond the resources of most clinicians working in private individual or group practices, small public clinics, or hospital settings without access to a large database from which to estimate the growth curves. Moreover, most professionals currently working in the real world of clinical service delivery were not trained to use the latest statistical packages involving growth curve analysis. In spite of the limitations already mentioned (the need for measures with appropriate norms, standards on which to base clinical decisions, and adequate criteria for clinical decision making based on a large sample size), we still recommend that measures of patient progress be collected carefully over the course of treatment. Graphing of these measures will then assist in clinical decision making.
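Once an expected curve and a failure boundary are in hand, the comparison described above reduces to checking each observed score against two reference values. The sketch below assumes, purely for illustration, a log-linear curve of the form intercept + slope × ln(session + 1); the text says only that a growth curve program such as HLM supplies the curves, so the functional form, the function names, and every parameter value here are hypothetical.

```python
import math


def curve_value(session, intercept, slope):
    """Assumed log-linear growth curve: intercept + slope * ln(session + 1)."""
    return intercept + slope * math.log(session + 1)


def flag_progress(observed, expected_params, failure_params):
    """Label each (session, score) pair relative to the two reference curves."""
    results = []
    for session, score in observed:
        expected = curve_value(session, *expected_params)
        failure = curve_value(session, *failure_params)
        if score >= expected:
            status = "at or above expected"
        elif score > failure:
            status = "below expected"
        else:
            status = "at or below failure boundary"
        results.append((session, status))
    return results


# Hypothetical Mental Health Index scores taken at intake and every fourth session.
observed = [(0, 40.0), (4, 50.0), (8, 41.0), (12, 38.0)]
# A flat failure boundary (slope of zero) mirrors the "flat or negative" description.
for session, status in flag_progress(observed, (40.0, 5.0), (40.0, 0.0)):
    print(session, status)
```

A clinician without access to a normative database could still plot the observed scores; what this sketch adds is only the labeling step, which presupposes that curve parameters have been estimated elsewhere.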

Fig. 7.2. Course of Mental Health Index for a person for whom treatment progress and outcome had a good prognosis and the observed progress and outcome was better than expected. From "Evaluation of Psychotherapy: Efficacy, Effectiveness, and Patient Progress" by Howard et al., 1996, American Psychologist, 51, p. 1062. Copyright © 1996 by the American Psychological Association. Reprinted with permission.

Fig. 7.3. Course of Mental Health Index for a person for whom treatment progress and outcome had a poor prognosis and the observed progress and outcome was even worse than expected. From "Evaluation of Psychotherapy: Efficacy, Effectiveness, and Patient Progress" by Howard et al., 1996, American Psychologist, 51, p. 1062. Copyright © 1996 by the American Psychological Association. Reprinted with permission.

Clinical Significance as an Outcome Criterion. Still another approach is to identify progress or outcome on a standardized measure relative to a criterion of clinical significance (Jacobson & Truax, 1991). This can be done when the assessment instrument has been administered to both a clinical and a nonclinical group. A clinically significant change in behavior occurs when a person's score on the psychological test differs to a statistically significant degree from the pretest average of a clinical group and is more like the distribution of the nonclinical group than that of the clinical group. When there are no norms on a nonclinical group, the convention recommended by Jacobson and Truax is to say that clinically significant change has occurred if the difference score from pretreatment to posttreatment is statistically significant and equal to or greater than two standard deviations in the direction of improved functioning. Howard and his colleagues (1996) recommended that change equal to or greater than 1.8 standard deviations in a positive direction would indicate a score that is more likely to be represented by the nonclinical group than by the clinical group prior to treatment.

Practical Considerations: What Data to Collect and When? There is a diversity of opinion among clinicians about whether, how often, and when (prior to or following a therapy session) to collect client self-report and/or clinician data on a standardized instrument. Having gone this far into the current chapter, it is probably safe to assume that readers hold some interest in collecting intake and outcome information from the consumer. From experience, a key issue underlying the question of "whether" to collect data about the person's status using a standardized instrument is whether the participating clinician feels that collecting such information is useful to therapy. To state the obvious: Those who do not see any reason or use for such information in their treatment strategy do not willingly collect such data, and those who find such information useful to their therapeutic intervention do routinely collect such data. Those who do find such information useful tend to describe their theoretical orientation as behavioral or cognitive-behavioral, although there are many exceptions among colleagues within all theoretical orientations. As Beutler (1991) pointed out, most practicing clinicians make use of a number of different techniques, behaving quite eclectically in their actual practice, even when there is a claim that one theoretical orientation guides the core of their treatment strategy. A recent Internet discussion (W. Lutz, personal communication, January 14, 1998) indicated that the diversity described earlier exists even among those who say they have a major professional interest in outcome evaluation. On the basis of this nonscientific review of the issues and the demands of the graphic and statistical analytic methods proposed, the following guidelines are offered:

1. Collect data at intake prior to the initial interview to obtain a baseline measure.
2. Select a battery of instruments for the intake assessment, but only select easily completed instruments for progress evaluation.
a. The selection of the battery of instruments used at intake should be useful for treatment planning and for identifying those intake variables (and circumstances) that would help identify the subset of persons for whom normative data (on either clinical or nonclinical populations, or both) exist. Thus, a key concern in selecting these instruments is whether such normative data are available.
b. The measure administered during and following treatment must cover the domain(s) that are the foci of treatment. Because the focus of treatment may not be obvious prior to the first clinical interview, the selection and administration will probably follow the intake interview. This requires that the clinician have available a collection of instruments ready for such selection.
3. The person needs to be told in advance that time will be spent before and after the initial interview in performing the tasks required by the instruments.
4. The recommended frequency of administering the instrument should vary in accord with how the instrument is used and what kind of information the clinician (researcher) is seeking. Howard recommended every fourth session as ideal for estimating change relative to the dose-response curves his group has been studying. Others (e.g., Richard Hunter in Newman, Hunter, & Irving, 1987) have argued for the use of such an instrument at every session at which the person is capable of completing such a form. Cognitive-behavioral therapists have indicated that having the person complete the instrument prior to each session provides information that could guide the direction that therapy might take on that day. Because it is seldom clear when therapy will end, do not expect to easily obtain information from the last session. Thus, a strategy of routinely collecting information on a standardized instrument (e.g., with each session or with every fourth session) is reasonable.
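Scores gathered on such a routine schedule can be screened against the clinical significance criterion discussed earlier. The sketch below uses the formulas as they are commonly stated for the Jacobson and Truax (1991) approach: a reliable change index based on the standard error of the difference, a cutoff two standard deviations from the clinical mean when no nonclinical norms exist, and a weighted midpoint when both sets of norms exist. These formulas go beyond what the text spells out, and the function names, the 1.96 threshold, and all numeric values are assumptions made for illustration.

```python
import math


def reliable_change_index(pre, post, sd_pre, reliability):
    """Change score divided by the standard error of the difference."""
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0 * se_measurement ** 2)
    return (post - pre) / s_diff


def cutoff_without_norms(mean_clinical, sd_clinical, higher_is_better=True):
    """Two standard deviations from the clinical mean, toward better functioning."""
    shift = 2.0 * sd_clinical
    return mean_clinical + shift if higher_is_better else mean_clinical - shift


def cutoff_with_norms(mean_clinical, sd_clinical, mean_nonclinical, sd_nonclinical):
    """Point between the two distributions, weighted by their standard deviations."""
    return (sd_clinical * mean_nonclinical + sd_nonclinical * mean_clinical) / (
        sd_clinical + sd_nonclinical
    )


def clinically_significant(pre, post, sd_pre, reliability, cutoff, higher_is_better=True):
    """True when the change is reliable (|RCI| > 1.96) and the cutoff is crossed."""
    rci = reliable_change_index(pre, post, sd_pre, reliability)
    if higher_is_better:
        return rci > 1.96 and post >= cutoff
    return rci < -1.96 and post <= cutoff


# Hypothetical norms: clinical mean 40 (SD 10), nonclinical mean 60 (SD 10).
cutoff = cutoff_with_norms(40.0, 10.0, 60.0, 10.0)
print(clinically_significant(pre=40.0, post=55.0, sd_pre=10.0, reliability=0.84, cutoff=cutoff))
```

Requiring both conditions keeps the two ideas in the text separate: the change must be larger than measurement error alone would produce, and the posttreatment score must land in the range more typical of the nonclinical group.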
The development of new statistical routines that describe the trajectory of change over time, but do not require equal intervals between times of observation, will allow assessment periods to be scheduled in a way that is tailored more specifically to the treatment strategy. Suppose a therapist subscribed to the phase model recommended by Howard et al. (1993) and wanted to schedule assessment periods according to what might be expected to be critical times for change within specific domains of functioning. For example, one strategy might start with a schedule of such assessments every session for the first 4 weeks during the remoralization phase, during the remediation (of symptoms) phase, and then once each 8 to 12 weeks during the rehabilitation (habit modification) phase. The application of patient profiling recommended by Howard et al. (1996) can readily handle the fitting of these data to the normative data to produce an expected growth curve, even though the intervals between data collection occasions vary.

Treatment Innovation

This chapter has focused on how to use psychological tests to track patient progress and inform clinical decision making. The proposed procedures also can facilitate treatment improvement. If clinicians track individual patient progress over time in the ways discussed, they can use the data collected to compare successful outcomes with unsuccessful outcomes and to identify patterns that might lead to treatment innovations. Although there is a substantial body of research indicating the benefits of specific psychotherapeutic interventions (Shadish et al., 1997), few would deny that there is still considerable room for improvement. For example, a review of interventions with children and adolescents revealed that only 5% to 30% of youth participating in such treatments evidence clinically significant change (Kazdin, 1987). The situation with child as well as adult treatment has fueled a movement toward the development of treatment innovations (Beutler & Clarkin, 1990; Onken, Blaine, & Boren, 1993). Kazdin (1994) identified seven steps to developing effective treatments: conceptualization of the dysfunction, research on processes related to the dysfunction, conceptualization of the treatment, specification of the treatment, tests of treatment process, tests of treatment outcome, and tests of the boundary conditions and moderators.
Repeated administration of the tests reviewed in the current volume, then, can guide treatment innovation and development.

Conclusions

It is somewhat amazing that the technology of single-subject research has not advanced further given its long history in the behavioral sciences. But then, the elegant examples and graphic models provided by early psychophysics and behavioral researchers set a baseline that has served us well. The key requirements of any research paradigm rest on the care with which the researcher provides adequate operational definitions of exogenous and endogenous variables, along with the care taken in collecting and presenting the data. These same requirements are particularly important in single-subject research and for the presentation of psychological testing data on a single individual at one point in time and over time. Thus, the traditional presentation of a profile on an individual relative to empirical norms, which has been the standard in the reporting of psychological tests, is still a worthy tool. The traditional staple of behavior analysts, plotting behaviors over time, has been augmented by plotting the results of a psychological test on an individual over time relative to the changes expected in a norm treatment group as recommended by Howard et al. (1996) and shown in Figs. 7.2 and 7.3, or relative to a standard as recommended by Newman and Tejeda (1996) and shown in Fig. 7.1. The introduction of the logic of "clinically significant change" by Jacobson and Truax (1991) identified another benchmark for both group and single-subject research that should influence the interpretation of single-subject research and clinical data. The critical question regarding the individual person becomes: Are the changes in behavior, as represented by psychological testing over time, clinically significant? But what is the standard that distinguishes the boundary between a clinically significant and a nonsignificant outcome? Empirical norms, as exemplified in Figs. 7.2 and 7.3, should be the first choice. At a minimum, an operationally defined standard based on a consensus of experts (Newman, Hunter, & Irving, 1987; Uehara et al., 1994) could also be considered, but only as a temporary standard while an effort is made to collect normative data to set the standard.

A basic standard set by the traditions of psychophysics and behavioral analysis is that of simplicity in the graphic display of the results. The integration of graphics with readily available spreadsheets and statistical packages on the PC has made the technology of simple graphic displays available to everyone. An expanded use of this technology in both clinical practice and clinical research should become so common in the near future that a chapter such as this will need to take the discussion to a new plane. Hopefully, this will mean exploring other applications of graphic analysis at that time, such as statistical quality control theory (Green, in press).

Acknowledgments

Thanks are owed to our thoughtful and patient colleagues for their recommendations and comments on this chapter: Michael Dow, Howard Liddle, and Manuel J. Tejeda.

References

Beutler, L.E. (1991). Have all won and must all have prizes? Revisiting Luborsky et al.'s verdict. Journal of Consulting and Clinical Psychology, 59, 226-232.
Beutler, L.E., & Clarkin, J.F. (1990). Systematic treatment selection: Toward targeted therapeutic interventions. New York: Brunner/Mazel.
Bryk, A.S., & Raudenbush, S.W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.
Carter, D.E., & Newman, F.L. (1976). A client oriented system of mental health service delivery and program management: A work-book and guide (Series FN No. 4, DHEW Pub. No. ADM 76-307). Rockville, MD: Mental Health Service System Reports.
Craighead, L.W., Craighead, W.E., Kazdin, A.E., & Mahoney, M.J. (1994). Cognitive and behavioral interventions: An empirical approach to mental health problems. Boston: Allyn & Bacon.
Green, R.S. (in press). The application of statistical process control to manage global client outcomes in behavioral healthcare. Evaluation and Program Planning.
Greenberg, L., & Newman, F.L. (1996). An approach to psychotherapy process research: Introduction to the special series. Journal of Consulting and Clinical Psychology, 64, 435-438.
Hersen, M., & Barlow, D.H. (1976). Single case experimental designs: Strategies for studying behavioral change. New York: Pergamon Press.
Hersen, M., Michaelson, L., & Bellack, A.S. (1994). Issues in psychotherapy research. New York: Plenum.
Howard, K.I., Kopta, S.M., Krause, M.S., & Orlinsky, D.E. (1986). The dose-effect relationship in psychotherapy. American Psychologist, 41, 159-164.
Howard, K.I., Lueger, R.J., Maling, M.S., & Martinovich, Z. (1993). The attrition dilemma: Toward a new strategy for psychotherapy research. Journal of Consulting and Clinical Psychology, 54, 106-110.
Howard, K.I., Moras, K., Brill, P.L., Martinovich, Z., & Lutz, W. (1996). Evaluation of psychotherapy: Efficacy, effectiveness, and patient progress. American Psychologist, 51(10), 1059-1064.
Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
Kazdin, A.E. (1987). Comparative outcome studies of psychotherapy: Methodological issues and strategies. Journal of Consulting and Clinical Psychology, 54, 95-105.
Kazdin, A.E. (1992). Research design in clinical psychology. Needham Heights, MA: Allyn & Bacon.
Kazdin, A.E. (1994). Methodology, design and evaluation in psychotherapy research. In A.E. Bergin & S.L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 543-594). New York: Wiley.
Liddle, H.A., Rowe, C.L., Dakof, G.A., & Lyke, J. (1998). Translating parenting research into clinical interventions for families of adolescents. Clinical Child Psychology and Psychiatry, 3, 419-443.
Liddle, H.A., & Saba, G. (1981). Systemic chic: Family therapy's new wave. Journal of Strategic and Systemic Therapies, 1, 36-69.
Meehl, P.E. (1954). Clinical versus statistical prediction. Minneapolis: University of Minnesota.
Moos, R.H. (1994). Editorial: Treated or untreated, an addiction is not an island unto itself. Addiction, 89, 507-509.
Newman, F.L. (1983). Level of functioning scales: Their use in clinical practice. In P.A. Keller & L.G. Ritt (Eds.), Innovations in clinical practice: A source book. Sarasota, FL: Professional Resource Exchange.
Newman, F.L., Griffin, B.P., Black, R.W., & Page, S.E. (1989). Linking level of care to level of need: Assessing the need for mental health care for nursing home residents. American Psychologist, 44, 1315-1324.
Newman, F.L., & Howard, K.I. (1986). Therapeutic effort, outcome and policy. American Psychologist, 41, 181-187.
Newman, F.L., Hunter, R.H., & Irving, D. (1987). Simple measures of progress and outcome in the evaluation of mental health services. Evaluation and Program Planning, 10, 209-218.
Newman, F.L., & Sorensen, J.L. (1985). Integrated clinical and fiscal management in mental health. Norwood, NJ: Ablex.
Newman, F.L., & Tejeda, M.J. (1996). The need for research designed to support decisions in the delivery of mental health services. American Psychologist, 51, 1040-1049.
Onken, L.S., Blaine, J.D., & Boren, J.J. (1993). Behavioral treatments for drug abuse and dependence (NIDA Research Monograph No. 137). Rockville, MD: U.S. Department of Health and Human Services.
Osgood, C.E. (1953). Method and theory in experimental psychology. New York: Oxford University Press.
Shadish, W.R., Matt, G.E., Navarro, A.M., Siegle, G., Crits-Christoph, P., Hazelrigg, M.D., Jorm, A.F., Lyons, L.C., Nietzel, M.T., Prout, H.T., Robinson, L., Smith, M.L., Svartberg, M., & Weiss, B. (1997). Evidence that therapy works in clinically representative conditions. Journal of Consulting and Clinical Psychology, 65(3), 355-365.
Stevens, S.S. (1951). Handbook of experimental psychology. New York: Wiley.
Stevens, S.S. (1966). Metric for the social consensus. Science, 151, 530-541.
Uehara, E.S., Smukler, M., & Newman, F.L. (1994). Linking resources use to consumer level of need: Field test of the "LONCA" method. Journal of Consulting and Clinical Psychology, 62, 695-709.
Yates, B.T. (1996). Analyzing costs, procedures, processes, and outcomes in human services. Applied Social Research Series (Vol. 42). Thousand Oaks, CA: Sage.
Yates, B.T., & Newman, F.L. (1980). Findings of cost-effectiveness and cost-benefit analyses of psychotherapy. In G. VandenBos (Ed.), Psychotherapy: From practice to research to policy (pp. 163-185). Beverly Hills, CA: Sage.

For Abby, Katie, and Shelby

Chapter 8
Selecting Statistical Procedures for Progress and Outcome Assessment: The Analysis of Group Data

Frederick L. Newman
Florida International University

Manuel J. Tejeda
Gettysburg College

The selection of appropriate statistical procedures in the analysis of psychological test results must be driven by the clinical and decision/administrative environments in which the procedures are used. This chapter provides recommendations and guidelines for selecting statistical procedures useful in two such environments: screening and treatment planning, and progress and outcome assessment. Each environment has unique demands warranting different, though not necessarily independent, statistical approaches. Concurrently, there must be a common concern regarding a measure's psychometric qualities and its relation with outcome. For screening and treatment planning, there is greater concern with predicting the effectiveness of outcome and the concomitant costs of resources to be consumed in treatment. For progress and outcome applications, there is the additional requirement of sensitivity to the rate and direction of change relative to treatment goals. The discussion in this section focuses on issues of analysis that should be addressed and on guidelines for evaluating and selecting statistical procedures within each application.

The chapter is organized such that the more commonly used statistical models (e.g., traditional regression or analysis of variance) are discussed under a number of different clinical topics (issues or questions). In each instance, the exact form of the analysis takes on the format best suited to the clinical issue under discussion. Moreover, the analysis is usually contrasted with alternative approaches regarding assumptions, interpretations, and practicality. It should be noted that the best approach depends on the specific clinical issue under investigation.
A major theme of this chapter recommends that the clinical question should drive the selection of the analytic approach. Approach to Presenting the Statistical Material The logic of presentation is to first discuss a specific clinical or mental health service issue and then to recommend one or more statistical procedures that can address the issue. The language of the mathematical expression underlying the statistical procedure

< previous page

page_225

next page >

< previous page

page_226

next page > Page 226

serves to bridge the clinical issue with the statistical procedure. The expressions are presented here for three reasons. First, the clinical focus of the discussion is designed to help readers understand the logical link between the clinical issue and the statistical procedure. Second, the discussion is designed to provide readers with a sufficient understanding of the statistical logic and vocabulary to read and use a statistical computer package manual and related texts, or to converse with their resident statistician. Third, the discussion should help readers understand where and how the link between the clinical issue and the statistical procedure is strong and where it is weak. References and computational details, along with examples of how the technique is used in clinical research applications, are provided. Discussion on selecting statistical procedures will follow from two baselines. One is a formal conceptualization of measurement: What does the instrument seek to measure and what are the potential sources of error in the measurement? The second baseline is the clinical or service management question that is being asked of the measure. What follows is a definition of the general model and notation used throughout the chapter. Note that this model and notation is introduced for descriptive purposes. It is not the only model, or necessarily the best model for all situations. It is, however, a model that lends itself to a discussion of treatment planning and outcome assessment in mental health services. Collins and Horn (1991) provided a good review of alternative models. Suppose that at a specific time (t) researchers are interested in obtaining a measure Yijkt that proposes to describe a particular domain of human functioning (d) on the ith individual who belongs to a particular target group (bk) and is receiving a specific treatment (aj). 
The measurement model describes the influence of the person belonging to the kth target group and being under jth treatment at time t, on the observed behavior (Yijkt) of the domain called d, as follows:
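In the notation just introduced, one way to write the model is sketched below. The interaction term and the subscript pattern are taken from the surrounding text; the treatment main effect $ad_{ijt}$ and the subscripts on the error term are assumptions made by analogy with the other terms.

```latex
\[
  Y_{ijkt} = bd_{ikt} + ad_{ijt} + abd_{ijkt} + e_{ijkt}
\]
```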

The term abdijkt is the interaction of the jth treatment and the kth target group that influences the functional domain (d) for the ith subject at the time t, when the measure was taken. As a final note, Yijkt must contain the characteristics of the traditional operational definition. The measurement of Yijkt is bound by the same issues regarding accurate and valid measurement as other variables in the equation, such as subject characteristics and treatment assignment. There are two features of this model that are different from that which is offered in standard texts. First, the time that the measure is obtained (t) is included as a subscript in each term, and as such appears to be a constant. Time, of course, is a constant but t is included here as a reminder that all measures, particularly clinical measures of functional status, are time dependent. Such measures reflect states rather than traits (Kazdin, 1986). A more formal statement of the model could have treated time as an additional element of the model, thereby adding complexity to the presentation. Another tactic could have left t out completely. However, the temporal nature of the measure of functional status is important in most clinical service applications. Thus, a simple subscript t is used to indicate the temporal status of the measurement model. The clinical and statistical issues involved in measuring changes in functional status over time (progress and outcome) are discussed later, when the focus is on the impact of treatment and for whom the treatment works best. Second, this measurement model adds the parameter d to represent a particular domain of functioning (behavior). As with the use of the time element in the expression, d, a specific functional domain, is added for emphasis. Each of the elements in the


expression (treatment or target population characteristic in this case) should be seen as interacting with the measure of the individual on a specific functional domain. The inclusion of d is a reminder that the model may not hold if the observation made (Yijkt) does not actually represent the behavioral domain of interest. Prior to treatment the model reduces to:

\[ Y_{ikt} = bd_{ikt} + e_{ikt} \]

where the term bdikt is the true value of the domain for the ith person belonging to the bk target population, at a given time (t). The last term, eikt, is an error term for the ith person that combines potential differences (error) due to at least three (potentially interacting) features: (a) item (measure) difficulty (bit) at that time; (b) imprecise measurement at that time (mit); and (c) individual differences introduced by that person (the individual's state) at that time (dit). The three potentially confounding components of the error term are discussed later. As additional target population characteristics are considered, the potential for each to interact with the other terms will only add to the model's complexity. Because most scoring procedures attempt to derive a composite score for a set of factors, they add sources of variation such as client characteristics, which typically compound error (i.e., combined with b, m, and d). Because these sources of error are nonsystematic, they cannot simply be subtracted from aggregate scores. The number of sources of variance expands with the addition of just one additional client characteristic (G1). This would add two fixed interactions with treatment effects (aGjl and abGjkl) and the potential for up to seven random interactions with the three confounded components of error (item difficulty b, error of measurement m, and individual differences d).

Despite what appears to be intractable complexity, there are a number of measures available that have demonstrated sufficient psychometric quality, with sufficiently strong treatment effects and client effects, but small random error effects. These instruments may be applied to a fairly wide range of client characteristics. This is consistent with the outcome assessment recommendation of an NIMH expert panel (see Newman, Ciarlo, & Carpenter, 1998) that an ideal measure should be able to serve a wide range of client groups.
The wide applicability of these measures also decreases the costs of providing different measures for each of the client groups and increases the measure's utility (e.g., in planning or evaluating a program of services).

Measurement and Statistics

Despite advances in methodology and analysis, measurement remains the foundation of statistical inference. The validity of the measures dictates the extent and nature of inferences that can be made as a result of the analysis. Thus, the importance of measurement issues should not be underestimated when considering the presentation of statistical material. Fluency with measurement issues will affect both the design of studies and the statistics subsequently used to analyze their data. The interaction between design, measurement, and statistics is often overlooked when planning a study. Briefly, consider the impact of an invalid or unavailable measure on design and statistics. Clearly, a study could not be designed if measures were unavailable. Likewise, interpretation of results would be impossible in the presence of an invalid instrument.


In terms of the organization of this chapter, screening is synonymous with measurement, and treatment planning is based on valid measurement. Moreover, progress and outcome analyses as discussed in the subsequent section are impossible without valid and reliable measures. Thus, the next sections present a careful review of the foundations of measurement as well as new advances in the field of psychometrics, linking these points to the discussion of statistical analysis.

Screening and Treatment Planning

Primary Objectives

There are two primary objectives. The first is to provide reliable and valid evidence as to the client's appropriateness and eligibility for a treatment. If the client is appropriate and eligible, then the second objective is to obtain evidence as to which treatments or services would best help the client progress toward the outcome goals. At a minimum, statistical procedures required to assess reliability and validity must be evaluated in the context of these two objectives.

Scale Reliability and Consistency of Clinical Communication

Although a scale may have a published history of reliability, it is sometimes useful to determine if the local applications of the scale are reliable and, if not, to identify the factors contributing to reduced reliability. Note that a number of forms of reliability exist. The current discussion refers to estimates of internal consistency, particularly what is commonly called Cronbach's alpha. However, other forms of reliability provide valuable information. The coefficient of concordance (sometimes referred to as parallel forms) provides information about how sections of a measure relate to one another and whether measurement is consistent over sections measuring the same construct. Similarly, the coefficient of stability (sometimes referred to as test-retest) provides information about change in measurement over time when no intervention has occurred.
Such forms of reliability are not discussed here, but nevertheless add to the overall psychometric evaluation of an instrument. By conducting studies to assess the scale's reliability, it is possible to investigate the consistency of staff communication about the client. The concern for staff communication emanates from the need for staff to maintain a consistent frame of reference when discussing a client's strengths, problems, treatment goals, and progress. Moreover, if done properly, it will be possible to identify the factors depreciating the reliability of staff communication, thereby determining the need for staff development and future training. The general concept of reliability can be described as a proportion. The proportion's numerator describes the amount of variability due to true measurement of individuals' functional characteristics, bd. The denominator describes the referent, or baseline variability. The referent is the variability due to the functional characteristics plus the variability due to extraneous factors, e (e.g., those factors that reduce the consistency with which the instrument is employed). Essentially, the variance that the items have


in common is divided by the total variance of the items in the scale. The proportion is expressed as follows:

\[ \text{Reliability} = \frac{\sigma^2_{bd}}{\sigma^2_{bd} + \sigma^2_{e}} \]
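As a concrete instance of this proportion, Cronbach's alpha (the internal consistency estimate referred to above) can be computed directly from item-level scores. The sketch below is a minimal illustration rather than the chapter's own procedure: the function name `cronbach_alpha` and the small data set are hypothetical, and sample variances (n - 1 denominator) are assumed.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-person item-score lists."""
    k = len(scores[0])  # number of items in the scale
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two persons")
    # Sample variance of each item across persons
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summed scale score across persons
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 5 persons (rows) by 4 items (columns)
ratings = [
    [2, 3, 3, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

Applying the same function to subsets of clients (for example, grouped by level of global functioning) gives one alpha per subset for comparison.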

Reliability increases directly with variation due to differences in the functional characteristics in the target population, σ²bd, and decreases with variability due to extraneous factors, σ²e. Internal consistency considers the variability among items, Ip, used to estimate the functional characteristic, d. The expected value of a set of unbiased items, Ip, equals the expected value of d for the ith person in the bk target group. As the correlation, rId, increases, the items are said to have increased internal consistency with each other and with their estimation of d and therefore to have increased reliability. This can be seen in the following expression:

where (1 - r²Id) is the proportion of the total variance that is not described by the relation between I and d across individuals. Earlier in the chapter, "item difficulty" was identified as a potential source of error variance in the basic measurement equation. Item difficulty in the present context could be described as the extent to which the internal consistency of items varies among persons functioning at different levels within the target population at time t, bkt. Reliability-internal consistency estimates could be used to estimate item difficulty effects. This could be done by obtaining the reliability-internal consistency estimates for persons functioning at the lowest, middle, and highest third of the distribution on a global measure of functioning (e.g., using the current DSM-IV Axis V). If the proportions do not differ significantly, then item difficulty, as defined here, is not a significant source of error variance. (It must be noted that in other environments, e.g., educational performance testing and personnel selection, having item difficulty as a significant source of variance is considered to be desirable.) Procedures for estimating internal reliability are part of most statistical packages (e.g., SPSS, SYSTAT-TESTAT, SAS) as either a stand-alone computer program or part of a factor analysis program.

Interrater Reliability and Clinical Communication

The second major concern regarding reliability in the screening and treatment planning processes is interrater reliability. This is particularly relevant where raters are members of a clinical team. If their judgments vary, then treatment goals and treatment actions will differ in ways not necessarily related to client needs. For interrater reliability the descriptive expression is:
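One form consistent with the description that follows is sketched below, offered as an illustration rather than as the chapter's exact expression. It places the rater-by-domain interaction in the denominator alongside the other extraneous variance; the use of $\sigma^2$ for variance and the exact composition of the denominator are assumptions.

```latex
\[
  \text{Interrater reliability} =
    \frac{\sigma^2_{bd}}
         {\sigma^2_{bd} + \sigma^2_{(\text{rater} \times bd)} + \sigma^2_{e}}
\]
```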

The magnitude of the interaction, σ²(rater × bd), decreases as agreement among raters increases. Thus, interrater reliability increases as this interaction decreases. Four major


features of an instrument are said to increase interrater reliability. The first is to increase the internal consistency of the items, particularly by anchoring the items to objective referents that have the same meaning to all team members. The second is to develop the instrument's instructions so as to minimize the influence of inappropriate differences among raters. Unfortunately, the only means of uncovering "inappropriate" rater behaviors is through the experience of pilot testing the instrument in a variety of situations. Thus, it is also important to look at the testing history of the instrument. The third is training and retraining to correct for initial rater differences and the "drift" in the individual clinician's frame of reference that can occur over time. Fourth and last, it is critical to anchor observation in behaviors as much as possible to reduce inference. The greater the inference required, the greater the likelihood of error being introduced. High-inference coding procedures generally produce lower reliability estimates than low-inference coding procedures because perception varies by individual rater as well as by the intentions of those being rated. To maximize interrater reliability, training manuals with example cases are very important. As described in Newman and Sorensen (1985), it is possible to fold some of the training and retraining activities into treatment team meetings and case conferences. Discussions of methods of assessing factors that may be influencing interrater reliability are presented in Newman (1983) and in Newman and Sorensen (1985). The major computer packages provide programs that permit partitioning the sources of variance (univariate or multivariate) due to differences among raters when multiple-rater data are entered for the same individuals.

Measurement Model Testing

Measurement modeling currently represents one of the forefronts of psychometrics in terms of construct validation.
This section provides an introductory review of measurement modeling and testing. Remember that the purpose here is to detail the statistical process in an effort to refresh the reader in the analytic method, or to provide enough foundation that a conversation with a resident statistician is possible during analyses. Measurement modeling is a fundamental step to be conducted prior to data analysis with regard to research questions. Psychometric examination remains a universal first step in data analysis. Measurement modeling is conducted via the more general technique of structural equation modeling. A number of software packages for structural equation modeling are available, including the more common LISREL (developed by Joreskog & Sorbom), EQS (developed by Bentler), and AMOS (developed by Arbuckle). Because these packages are under constant review and revision, it is best to seek the latest information about each. There are also shareware programs available on the Internet. Each program handles data slightly differently and differential constraints on variance are placed by each program. Users of these programs are urged to familiarize themselves with the subtleties of the program they use in order to understand their results. Measurement modeling initially involves identifying the factor structure of each of several instruments and subsequently defining relations among instruments based on similar constructs. Consider a 10-item instrument composed of two, 5-item scales that measure M and N. In principal axis factor analysis, the 10 items would form a linear composite that may or may not reflect the two constructs, M and N. Factor analysis


partitions variance and creates factors mathematically. Conversely, measurement modeling imposes a structure on the data, forcing the program to consider only the solution of two factors composed of the designated 5-item sets. (The terms latent variable, construct, and factor are often used interchangeably, with latent variable used more often in structural equation modeling.) The formal mathematical statement that represents how well a set of items measures a construct plus error is

\[ X = \Lambda_x \xi + \delta \]

where X represents a vector of observed responses of items, Λx represents a vector of factor loadings imposed by the user that essentially suggests the amount or contribution of each item to the vector of factors, ξ, and δ represents a vector of disturbance terms, or error. This equation can easily be expanded to include various measures at the item level. Because it involves vectors, the equation represents an unlimited set of items relating to an unlimited set of constructs. In measurement modeling, it is desirable to determine how constructs relate to one another; thus any number of constructs can be introduced in order to examine their interrelations. For example, another measure with two subscales, D and E, could be added to M and N. All items could then be included in the analysis and a four-factor model of M, N, D, and E could be tested. Or, if M and D supposedly measure the same construct, it would be possible to test a three-factor model with the items that loaded on M and D loading on a single factor. The power of measurement modeling is to define the interrelations of the data based on constructs prior to intervention and to create latent variables based on diverse measures that might also include diverse methods. In the previous example, M could be self-report and D could be clinician ratings.

In examining factor structures via measurement modeling, there is also interest in assessing how well the a priori structure is reflected in the data. That assessment is termed "fit." A large number of fit indices have been developed, each of which is designed to reflect the amount by which a model reduces the covariance among items in a set of observed (or item-level) data by describing the results in terms of the factors or constructs instead of describing the results in terms of the individual items. Thus, a fit index of .90 often suggests that a hypothesized model has reduced covariance by approximately 90%.
In recent years, fit indices have been developed to allow for model comparisons and assessment of parsimony, as well as covariance reduction. As with structural equation modeling in general, a full discussion of fit indices is impossible here. In general, however, the comparative fit index (CFI; Bentler, 1990) and the nonnormed fit index (NNFI; Bentler & Bonett, 1980) represent two important indices in the assessment of most measurement models, with values approaching 1.00 representing the best fit. What does the testing of a measurement model mean in terms of construct validation? Measurement modeling provides a method of testing whether various sources of data (e.g., self-report and observer) are related to one another, as well as whether various constructs are independent or interrelated. In a nutshell, the measurement model represents the measures and the interrelations of the measures employed in any study. A measurement model supported by a fit index exceeding .90 provides evidence that multiple measures of the same construct, whether they differ by source or by method, are related to one another. Thus, measures that meet a conservative criterion such as .90 provide evidence of construct validity.
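To make the two indices concrete, the sketch below computes them from chi-square statistics using their standard published definitions. It assumes that a structural equation modeling program has already supplied the chi-square and degrees of freedom for the hypothesized model and for the null (independence) baseline model; the function names and the numeric values are hypothetical.

```python
def comparative_fit_index(chi2_model, df_model, chi2_null, df_null):
    """CFI: 1 minus the ratio of model to baseline noncentrality."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model, 0.0)
    if d_null == 0.0:
        return 1.0  # neither model shows any misfit
    return 1.0 - d_model / d_null


def nonnormed_fit_index(chi2_model, df_model, chi2_null, df_null):
    """NNFI: compares chi-square/df ratios; may fall outside 0 to 1."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


# Hypothetical output for a two-factor model of a 10-item instrument
cfi = comparative_fit_index(85.0, 34, 900.0, 45)
nnfi = nonnormed_fit_index(85.0, 34, 900.0, 45)
print(round(cfi, 3), round(nnfi, 3))
```

With these illustrative values both indices exceed .90, so the hypothesized two-factor structure would meet the criterion discussed above.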


Other Forms of Scale Validation

It is often inexpensive to collect some additional data to estimate the instrument's concurrent and construct validity as a screening instrument. This can be done by first identifying variables that ought to be related to (i.e., predict) the instrument's factor scores (scores derived from the factor analysis). Then a set of multivariate regression or analysis of variance equations is developed to estimate whether relations that ought to exist do so. Variables often used in such validation analyses for populations of persons with a severe mental illness are: prior psychiatric history (e.g., hospitalizations, number of episodes per year), major diagnosis (e.g., schizophrenic versus nonschizophrenic, with or without a dual diagnosis of substance abuse), employment (school attendance) history, scores on known measures (e.g., Beck Depression Inventory, SCL-90-R, State-Trait Anxiety Inventory, MMPI-2, current Global Assessment of Functioning), and social support (yes-no, number of contacts per week/month, or number of housing moves over the last 6 or 12 months). In the present context, these variables are referred to as predictor variables, that is, variables that the literature indicates should predict differences in scores on the instrument for persons in the target population. The selection of the predictor variables must be tailored specifically to the target population and the types of service screening decisions that need to be made. Once selected, the use of multivariate analysis of variance or regression analysis requires the user to identify these variables in a prediction equation. For the present case, the instrument's factor scores should be listed on the left side of the expression and the predictor variables listed as a sum on the right:
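Laid out in the model-statement form that many statistical packages accept, the prediction equation might look like the sketch below. The labels $X_1, \ldots, X_q$ for the q predictor variables are assumed here for illustration; $Y_1, \ldots, Y_p$ are the instrument's p factor scores.

```latex
\[
  Y_1, Y_2, \ldots, Y_p = X_1 + X_2 + \cdots + X_q
\]
```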

Note that in the true multivariate case, each of the predictor variables is correlated or regressed on the family of measures Y1,Y2, . . . , Yp via a statistic called a canonical correlation, that is, a correlation between an independent variable and a set of dependent variables. The canonical correlation has an interesting clinical interpretation. First, assuming that the set of dependent measures in the multivariate analysis represents a profile of one or more behavioral domains of clinical interest, then the canonical correlation represents a description of the strength of the relation of that variable with the clinical profiles represented by the set of dependent measures. That knowledge could lead to developing hypotheses about how the manipulation or control of that predictor might influence outcome as represented by that set of dependent measures. If the predictor is a potential moderator variable, then the interpretation might be one of trying to determine for whom an intervention (treatment) works best. Some sets of predictor variables can be considered as main effects and some are best considered as a first-order interaction. For example, DSM-III-R Axis V (Global Assessment of Functioning) at admission is often a significant effect when considered as a first-order interaction with either current social support or prior psychiatric history on multiple factored scales for the seriously mentally ill (Newman, DeLiberty, Hodges, & McGrew, 1997; Newman, Griffin, Black, & Page, 1989; Newman, Tippett, & Johnson, 1992). Thus, the expression for this example would be:
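Taking prior psychiatric history as the second predictor (current social support could be substituted), one way to lay out this expression is sketched below. The term labels are assumptions chosen for readability; the first two terms are main effects and the third is the first-order interaction.

```latex
\[
  Y_1, Y_2, \ldots, Y_p =
    \text{Axis V} + \text{Prior history}
    + (\text{Axis V} \times \text{Prior history})
\]
```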

To do this for a community-based sample, ordinal classes of Axis V scores (1-35, 35-50, 51-65, 65+) and prior hospitalization (none last 12 months, once, two plus) were created


that would produce a sufficient number of subjects within each combination (cell) of the interaction (Newman et al., 1992; Newman et al., 1997). For the mixed nursing home and community-based sample, the ordinal classes on Axis V were more detailed at the lower end (1-25, 26-40, 41-60, 61+; Newman et al., 1989).

One Application of Reliable Instruments: Discriminant Analysis

The focus here is on how well the instrument's scoring procedures, often factor scores, will lead to correctly placing a client into an appropriate service modality or program. The key questions here are: How does one define a correct placement? What is meant by an appropriate service modality or program? There have been two general approaches to answering these questions. One is to contrast the recommendations or predictions made by the instrument with the placement recommendations of an experienced group of clinicians. Suppose researchers wish to evaluate a scale with p factors in terms of how likely it is to correctly place a person in the appropriate service modality. For a scale that discriminates well, a discriminant score, Di, can be estimated for an individual from the scale's p factor scores, Xj (j = 1, 2, . . . , p), where the value of Di implies a recommendation of the most appropriate service modality placement. The computation of Di for the ith person is the sum of each of that person's factor scores, Xij, weighted by the factor's coefficient, bj. Thus,

\[ D_i = \sum_{j=1}^{p} b_j X_{ij} \]

To determine which service modality group assignment, Gk, is most appropriate, a Bayesian rule is applied that gives the probability of each group assignment for a given value of Di. In a discriminant analysis, the probability for each group assignment, Gk, is computed for each value of Di, and the group with the highest probability, relative to the other possible group assignments, is labeled the "appropriate" service modality. The specific expression of Bayes' rule for estimating the probability of an assignment to the Gk service modality group for the ith person is:

\[ P(G_k \mid D_i) = \frac{P(D_i \mid G_k)\,P(G_k)}{\sum_{l} P(D_i \mid G_l)\,P(G_l)} \]
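The two steps just described, computing a weighted sum of factor scores and then applying Bayes' rule across the candidate groups, can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular statistical package: the likelihood of a discriminant score within each group is assumed to be normal with a common standard deviation, and the coefficients, group names, group means, and prior probabilities are all hypothetical.

```python
import math


def discriminant_score(factor_scores, coefficients):
    """D_i: the person's factor scores weighted by the coefficients b_j."""
    return sum(b * x for b, x in zip(coefficients, factor_scores))


def normal_density(x, mean, sd):
    """Assumed form of P(D_i | G_k): a normal density within each group."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probabilities(d_i, group_means, sd, priors):
    """Bayes' rule: P(G_k | D_i) for every candidate group."""
    joint = {g: normal_density(d_i, group_means[g], sd) * priors[g]
             for g in group_means}
    total = sum(joint.values())
    return {g: p / total for g, p in joint.items()}


# Hypothetical three-factor scale and three service modalities
coefficients = [0.5, 0.3, 0.2]
factor_scores = [4.0, 2.0, 5.0]
group_means = {"outpatient": 1.5, "day treatment": 3.0, "residential": 4.5}
priors = {"outpatient": 0.5, "day treatment": 0.3, "residential": 0.2}

d_i = discriminant_score(factor_scores, coefficients)
posteriors = posterior_probabilities(d_i, group_means, sd=1.0, priors=priors)
recommended = max(posteriors, key=posteriors.get)
print(round(d_i, 2), recommended)
```

The group with the largest posterior probability is the one that would be labeled the "appropriate" service modality for this person.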

The second approach to validating the use of the instrument's factor structure (and the discriminant function analysis results) in service modality placement is to estimate the relations between the test scores, the program of services, and client outcome. If the relation between service program and outcome is improved by knowing the screening test results, then the instrument can be viewed as beneficial in the screening and treatment planning process. Collecting data for this approach takes considerable time. Thus, the discriminant analysis is best applied first. However, because successful outcome is considered to be the gold standard, the second approach should be planned and conducted as a long-term evaluation of a screening instrument's worth.

Linking Levels of Need to Care: Cluster Analysis

Another approach can be employed when using a multidimensional instrument to recommend a more complex array of treatments and services (e.g., services for persons with a serious and persistent mental illness). If an instrument has an internally consistent


factor structure, then a cluster analysis technique can be employed to identify a mix of consumers with similar factor scores who are likely to have similar treatment and service resource needs (Newman et al., 1989; Uehara, Smukler, & Newman, 1994). The approach uses one or more panels of experienced clinicians in a structured group decision process to identify a treatment plan for a person who is described as having a given level of problem severity or functioning within one of the factors. For example, the panel might be given the following description of a person with a moderate level of depression:

Signs of Depression: insomnia-hypersomnia, low energy-fatigue; decreased productivity, concentration; loss of interest in usual activities and in sex; tearfulness and brooding.

Moderate: The signs are less severe than intense (the previous task considered an intense level of depression) and will often, after a few days or weeks, shift to either periods of normal behavior or moderate manic levels. Because the severity and duration of the signs are less than severe, the person is generally more cooperative with therapeutic efforts, but will seldom seek out the assistance himself.

The panel is provided a list of available services and the professional disciplines of those who could provide each service. Using nominal group procedures, the panel is then asked to indicate which services would be provided for the next 90 days, by whom, how often, and with what duration (e.g., 1 hour per day, week, month, or for 90 days). In the study by Newman et al. (1989) of nursing home and community care in Utah, 31 services were available.
The panels developed an array of services for each of three problem intensities within each of 11 factors: psychotic signs, confusion, depression with suicide ideation, agitated-disruptive behavior, inappropriate social behavior, dangerousness-victimization, personal appearance, community functioning capability, social-interpersonal adaptability, activities of daily living, and medical-physical. This exercise led to 33 treatment service plans. The final step in linking level of need to level of care is to cluster together those treatment service plans that require similar resources (mostly professional personnel) for similar amounts of time. To perform a clustering of similar treatment plans within the total of 33 treatment plans, a common measure of therapeutic effort was developed based on the costs of each unit of service recommended in each of the 33 treatment plans. The employment costs of the professional(s) in a given service were estimated by the usual accounting procedures based on the salaries of who did what, with what resources, how frequently, and for what duration. To complete the cluster analysis, a matrix of service costs for each of the 33 combinations of factor intensity by service was set up. The matrix is shown in Table 8.1. The statistical cluster analysis used here sorts through the columns of the matrix and clusters those columns that have the smallest adjacent cell differences (distances). The various statistical software packages permit the user to employ any one of a number of rules for assessing the distances between the cell entries in adjacent columns. The Utah study employed a Euclidean measure of distance contrasting differences between the costs of the service of adjacent cells in a pair of columns, Distance = $\sqrt{\sum_{i}(x_{i} - y_{i})^{2}}$, where $x_i$ and $y_i$ are the costs of service i in the two columns being compared and i = 1, 2, . . . , 31 services. It was found that the 33 columns of 31 service-cost cell entries formed six clusters.
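The clustering step can be illustrated with a small agglomerative procedure that repeatedly merges the two groups of treatment plans whose service-cost profiles are closest in Euclidean distance. This is a sketch only: the Utah study's linkage rule is not stated here, so average linkage is an assumption, and the cost figures (six plans by three services rather than 33 by 31) are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two plans' service-cost profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, plans):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(plans[i], plans[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_plans(plans, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(plans))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], plans)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical costs: one row per treatment plan, one entry per service
plans = [
    [100, 0, 20],
    [110, 5, 25],
    [0, 200, 50],
    [10, 210, 40],
    [300, 300, 0],
    [290, 310, 10],
]
groups = cluster_plans(plans, 3)
print(groups)
```

Each resulting group lists the indices of treatment plans that call for similar amounts of the same services.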
In other words, there were six patterns (clusters) of services that could be considered together as program modalities to provide coverage for all of the consumers sampled. Stated another way, the services listed under a cluster had similar staffing and scheduling requirements for consumers with a given


TABLE 8.1
Schemata of the Data Matrix Used in the Cluster Analysis

             Factor 1                     Factor 2                     Factor 3
             Minimal  Moderate  Severe    Minimal  Moderate  Severe    Minimal  Moderate  Severe
Service 1    $xx      $xx       $xx       $xx      $xx       $xx       $xx      $xx       $xx
Service 2    $xx      $xx       $xx       $xx      $xx       $xx       $xx      $xx       $xx
   |          |        |         |         |        |         |         |        |         |
Service 31   $xx      $xx       $xx       $xx      $xx       $xx       $xx      $xx       $xx


set of characteristics. Most important, the consumer characteristics could be described by their scores on the multifactored scale.

Closing Thoughts on Evaluating Measures

The well-known influences of initial impressions, when documented by a psychological assessment technique, can be a wonderful asset or an expensive liability. Unless the consumer chooses to terminate treatment (and many do), the initial course of treatment typically persists without major modification for months and often years. Using an assessment technique to screen and to plan treatment is a social commitment by the clinician(s) and those who pay for the service. The strengths, problems, and goals identified with the support of the assessment should guide treatment. Reliability and validity studies are conducted on the instrument so that there are hard data to support the decision to influence the life of another person, the client, and to use the clinical and economic resources needed in doing so. A theme that runs through the first part of this chapter is that there are methods to determine if an instrument is effectively doing the job. Even if there are studies in the published literature that demonstrate the instrument's reliability and validity, it is desirable to perform studies on its local application. An additional theme is that some of these methods can be used as the basis for staff training and development as well as an empirical basis for utilization and quality assurance review. Thus, the statistical procedures described here can be useful in the internal evaluation of a program's quality as well as in studies on the instrument's application.

It might be informative to consider a brief example of an internal evaluation study on the consistency of clinical communication in a program serving persons with a severe mental illness. Consistent communication across members of a treatment service team is considered to be important in supporting this population of consumers.
Suppose that it is found that the interrater reliability among members of the service team is low on an instrument that has been demonstrated to be reliable and valid for this population under well-controlled conditions. In this case, it might be suspected that members of the service staff have different frames of reference when describing a consumer's functioning. If such differences (inconsistencies) exist and go undetected, then the team members will treat the consumer differently based on their different frames of reference. Because consistency is vital to successful outcome in mental health service planning and delivery, inconsistent communication will probably lead to a breakdown in the quality of care.

Progress and Outcome

The section begins with a description of four assumptions about defining treatment progress and outcome goals that need to be addressed prior to selecting a statistical procedure for assessing questions regarding client progress and outcome. The remainder of the chapter focuses on a sequence of six clinical service questions that set the stage for selecting statistical questions. Recommendations for selecting a statistical procedure are given under each of the following six questions:

1. Did change occur, by how much, and was it sustained?
2. For whom did it work best?
3. What was the nature of the change?
4. What effort (dose) was expended?
5. Did the state(s) or level(s) of functioning stabilize?
6. What occurred during the process?

Specifying Treatment Service Goals

The selection of a statistical approach to describe treatment progress or outcome should depend on the anticipated goal(s) of treatment. There are four assumptions that must be explicit when specifying a treatment goal and selecting a statistical approach. The first is that the consumers (clients, patients) are initially at a clinically unsatisfactory psychological state and/or level of functioning, and that there is a reasonable probability of change, or at least stabilization, through a given therapeutic intervention. Second, an agreed-on satisfactory psychological state or level of functioning, observably different from the initial state, can be defined for that individual or for that clinical population. Third, it is assumed that a measure, scale, or instrument is available that can reliably and validly describe the status of the person in that target population at any designated time. The fourth assumption is that the instrument's score(s) describing an individual at an "unsatisfactory" state is reliably different from the score(s) describing that same individual at a "satisfactory" state, and the scores are not limited by "ceiling" and "floor" effects.

If each of the assumptions is met, then specifying treatment goals and selecting the statistical approach to estimate the relative effectiveness of client progress or outcome as described in the specific goals can proceed. To support this approach, the remainder of this chapter organizes the discussion of the statistical procedures around the six generic questions regarding the achievement of specific treatment goal(s) for specific therapeutic intervention(s). The six questions and related statistical approaches are ordered from a macrolevel to a microlevel of investigation.
This is done to give a context for determining what should be studied and how the data should be analyzed. A statistical procedure must fit the context of the question being asked. The statistical literature has a rich history of controversy and discussion on which aspects of change should be investigated and how each aspect should be analyzed (Collins & Horn, 1991; Cronbach & Furby, 1970; Francis, Fletcher, Stuebing, Davidson, & Thompson, 1991; Lord, 1963; Rogosa, Brandt, & Zimowski, 1982; Rogosa & Willett, 1983, 1985; Willett, 1988; Willett & Sayer, 1994; Zimmerman & Williams, 1982a, 1982b). Historical controversies can be avoided by carefully articulating the research or evaluation question. The criterion for a well-developed question is that it frames the appropriate unit and level of analysis relevant to the question.

To use the following section, the reader should first formulate an initial draft of the question(s) and the unit of analysis. Next, the investigator should check that the four assumptions have been made explicit, modifying the question(s) if necessary. Then, investigators can match their research question(s) to those given next and consider the recommendations offered. Although there is no "perfect" method, the discussion should provide readers with guidelines for identifying the best method for their own situation.

Question 1: Did Change Occur, by How Much, and Was It Sustained?

Did specific domains of consumer functioning or states change by the end of therapy? If so, by how much? Were the changes sustained over time? These are often considered to be the first questions that those developing a therapeutic innovation seek to address:
Does the therapy make a difference? There are two general approaches that have been employed when addressing this question. One is to investigate the magnitude of the difference between the pretreatment and the posttreatment scores on the selected measure across subjects. The second is to contrast the trends on the status measures taken on subjects over time (pre-, during, post-, and follow-up to treatment). Each approach has its strengths and limitations, but both have the potential to provide a gross estimate of whether the therapeutic intervention makes a difference.

Difference Scores. The difference score, $D_i$, is often considered the most basic unit of analysis, with $D_i = X_{i1} - X_{i2}$, where $X_{i1}$ and $X_{i2}$ are the observations recorded at Time 1 (usually prior to treatment) and at Time 2 (usually at the end of treatment) for the ith person. The mean values of $D_i$ can be contrasted between groups or within a single group against an expected outcome of no difference (D = 0.0, or an equal number of positive and negative values of D). There are two issues to be addressed here. One is to decide whether to use $D_i$ as the basic unit of analysis. The second is the research design used to address the question.

There is an extensive and controversial literature on whether to use $D_i$. Discussion has focused on two features of $D_i$: the reliability of the difference score is inversely related to the correlation between the pre- and posttreatment measures, and the difference score may be correlated with initial (pretreatment) status. The potential bias introduced by these two features led Cronbach and Furby (1970) to recommend that the difference score not be used at all. They recommended that researchers instead concentrate on between-group comparisons of posttreatment outcome measures.
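As a rough illustration of the difference-score approach, the sketch below computes each person's difference score (pre minus post, as defined above), a paired t statistic for the hypothesis that the mean difference is zero, and the counts of positive and negative differences. The client scores and all function names are hypothetical; the sketch covers only a single-group contrast, not a between-group design.

```python
import math
from statistics import mean, stdev


def difference_scores(pre, post):
    """Return D_i = X_i1 - X_i2 (pre minus post) for each person."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    return [x1 - x2 for x1, x2 in zip(pre, post)]


def paired_t(diffs):
    """t statistic for H0: mean(D) = 0, with len(diffs) - 1 degrees of freedom."""
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


def sign_counts(diffs):
    """Counts of positive and negative differences (zero differences are ignored)."""
    pos = sum(1 for d in diffs if d > 0)
    neg = sum(1 for d in diffs if d < 0)
    return pos, neg


# Hypothetical symptom scores (higher = more distress) for six clients.
pre = [30, 28, 35, 22, 27, 31]
post = [24, 25, 30, 23, 20, 26]

d = difference_scores(pre, post)
t_stat = paired_t(d)          # compare against a t distribution with 5 df
pos, neg = sign_counts(d)     # "equal number of positive and negative" check
```

With these made-up scores, five of the six differences are positive, which is the pattern the "equal number of positive and negative values" null hypothesis would be tested against.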
Others have argued for the use of alternatives such as a residualized gain score, where the difference score is adjusted for initial, pretest differences (Webster & Bereiter, 1963).

There is, however, an opposing point of view. Rogosa et al. (1982), Rogosa and Willett (1983, 1985), Willett (1988, 1989), and Zimmerman and Williams (1982a) collectively developed arguments with sufficient empirical support to conclude that difference scores were being damned for the wrong reasons. They also provided strong evidence that some of the most popular solutions (e.g., the residual gain score) have side effects worse than the problems they sought to solve.

The potential inverse relation between the reliability of D and the correlation between pre- and posttreatment scores is not necessarily a problem. The difference score is an unbiased estimator of change (a process), and scores at each of the two times (pre- and post-) are estimators of status (not a process) at those respective times. When there is low reliability in D, it is to be interpreted as no consistent change. But as Zimmerman and Williams (1982a) showed, it is possible for the reliability of the difference score to exceed the reliability of the pretreatment score or the reliability of the posttreatment score. When this occurs, it is still valid to conclude that there is reliable change for persons even though unreliable measures were obtained at Time 1 and at Time 2. Although there is a problem of measurement error at each of these times, it can still be concluded that there is a consistent change (a reliable process) among the inconsistent measures of status at each time.

The second problem regarding $D_i$ pertains to the correlation between the magnitude of a difference score and the value of the pretreatment score.
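The reliability argument can be made concrete with the classical test theory formula for the reliability of a difference score. The sketch below assumes uncorrelated measurement errors at the two occasions; the function name and the numeric inputs are illustrative assumptions, not values from the studies cited.

```python
def difference_score_reliability(sd1, sd2, rel1, rel2, r12):
    """Classical-test-theory reliability of D = X1 - X2.

    sd1, sd2   : observed standard deviations at Time 1 and Time 2
    rel1, rel2 : reliabilities of the Time 1 and Time 2 scores
    r12        : observed correlation between Time 1 and Time 2 scores
    Assumes measurement errors at the two times are uncorrelated.
    """
    covariance_term = 2 * r12 * sd1 * sd2
    true_variance = sd1 ** 2 * rel1 + sd2 ** 2 * rel2 - covariance_term
    observed_variance = sd1 ** 2 + sd2 ** 2 - covariance_term
    if observed_variance <= 0:
        raise ValueError("difference scores have no observed variance")
    return true_variance / observed_variance


# Equal variances and a high pre-post correlation: D is much less reliable
# than either score, the case that motivated the early criticism.
low = difference_score_reliability(sd1=1, sd2=1, rel1=0.8, rel2=0.8, r12=0.7)

# Unequal variances and a low pre-post correlation: the reliability of D
# exceeds the reliability of the pretreatment score (0.6).
high = difference_score_reliability(sd1=1, sd2=2, rel1=0.6, rel2=0.8, r12=0.2)
```

In the first case the reliability of D is about .33; in the second it is about .71, above the pretreatment reliability of .60, which is the kind of outcome Zimmerman and Williams (1982a) described.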
It is intuitively obvious that a person with a low pretreatment score, indicative of more severe psychological maladjustment, would appear to have a greater probability of obtaining a higher score on the second occasion (posttreatment). Despite this obvious relation, Rogosa and Willett (1983) showed that there is a "negative bias" in the estimate of the correlation
between the initial score and the difference score. Negative bias must not be misinterpreted as a negative correlation (Francis et al., 1991). Rogosa and Willett (1983) amply demonstrated that a raw difference score is not the best statistic for estimating the correlation between initial status and change.

A number of texts recommend using a residual gain score to estimate the differences. The residual gain score is calculated by adjusting the difference score by the correlation between initial level and either the posttest score or the difference score. However, Rogosa and Willett (1985) and Francis et al. (1991) argued that the residual gain score is a poor choice and should be avoided. There are two critical flaws with using the residual gain score. First, when used in the context of clinical practice, it describes a state of affairs that does not exist in reality (adjusting subjects to be at the same initial level, which is seldom, if ever, true). Second, it adjusts a measure of change (a process measure) with a status measure (pretreatment scores). The resulting statistic is no longer an unbiased measure of the change process because it was adjusted by a measure of status, which contains its own unique sources of error.

Should a difference score be used? The answer is yes, if the research question is simply stated (e.g., "Is there change related to treatment?"). Unfortunately, the issues related to a treatment intervention are often more complex. At a minimum, the investigator typically questions the treatment's differential effects with regard to one or more consumer characteristics over the course of treatment. Recently, investigators have become interested in whether the change is sustained after treatment has formally stopped. Some of these concerns can be assessed at the level of the first question (Did change occur, by how much, and was it sustained?).
In these instances, the issues in addressing these refined questions pertain to design: What groups need to be contrasted? When does one need to sample consumer behaviors? In other instances, the refined question requires going to another level of focus.

Comparing Trends in Status Measures Over Time. For those question refinements that can still be articulated as "Did change occur, by how much, and was it sustained?," issues of design and sampling time frames need to be identified. One design issue that can be easily dealt with is whether a Solomon four-group design is required in the evaluation of a therapeutic intervention. The Solomon four-group design attempts to control for or estimate the effects of carryover from pretest to posttest. Within both the treatment group and the control group, half of the subjects are randomly assigned to a "no pretreatment test" condition and the other half to a condition with both pre- and posttreatment tests.

The carryover effects of testing that are estimated via a Solomon four-group design may not be a factor in most treatment research. Most consumers who enter therapy are not naive as to what their problems are, nor are they naive as to the general purpose of the intervention. This is particularly common in the experience of those working with persons with a severe mental illness or with those who are substance abusers. Moreover, work by Saunders (1991) has clearly shown that the majority of those who have entered psychotherapy and go beyond two sessions had prior experience with at least the intake process for psychotherapy. Given this, serious researchers studying a particular therapy or therapies should, if the opportunity presents itself, run a pilot study with a Solomon four-group design to assure themselves that such carryover effects are not a significant source of variance.

There are two remaining design issues: What groups ought to be sampled? When does one collect data?
The quick answer to the first is to sample those groups that satisfy the question and eliminate alternative explanations. As discussed earlier, it is
best to partition consumers by levels of any characteristic that is expected to modify the impact of the treatment. Enough has been written about the difficulty of interpreting single-group results that most researchers will understand the need to develop either a "waiting-list" control or a "treatment as usual" control to contrast with an innovative treatment.

There is also a quick answer to the second question: If possible, collect data two or more times during treatment and two or more times after treatment. An optimal minimum is to collect data at four times: pretreatment, halfway through treatment, posttreatment, and at least once in follow-up (e.g., 6 months following treatment termination). In this case, the inclusion of a fifth time point (say, 1 year following treatment) allows for the testing of the stability of posttreatment effects.

A typical design here would be a mixed between-group repeated measures design with two between-group variables (treatment variable, $a_j$, and an initial consumer characteristic identified as a "moderator" variable, $b_k$) and one within-group (time) variable of two, three, or four levels. Here, an analysis of variance or multivariate analysis of variance of the linear and quadratic trends can describe between-group differences in the direction and rates of change over time. Between-group contrasts of the linear trends within each group offer a test of whether the direction and magnitude of change vary as a function of group. Between-group contrasts of the quadratic trends within each group would describe whether there are significant differences among groups in how their initial changes were modified over time. For the analysis of functional status over the three times from pretreatment through midtreatment to immediately posttreatment, the between-group contrast of quadratic trends would describe change over the course of treatment.
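The linear and quadratic trends mentioned above can be illustrated with orthogonal polynomial contrast scores. The sketch below assumes three equally spaced occasions (pre-, mid-, and posttreatment) and uses hypothetical functioning scores; it computes each person's contrast scores and the group means that a trend analysis would compare, and it stops short of the significance test itself.

```python
from statistics import mean

# Orthogonal polynomial weights for three equally spaced time points.
LINEAR = (-1, 0, 1)      # overall direction and amount of change
QUADRATIC = (1, -2, 1)   # bend in the trajectory (early vs. late change)


def contrast_score(scores, weights):
    """Weighted sum of one person's repeated scores."""
    if len(scores) != len(weights):
        raise ValueError("one weight is needed per time point")
    return sum(w * x for w, x in zip(weights, scores))


def group_trends(group):
    """Mean linear and mean quadratic contrast score for a group."""
    linear = [contrast_score(person, LINEAR) for person in group]
    quadratic = [contrast_score(person, QUADRATIC) for person in group]
    return mean(linear), mean(quadratic)


# Hypothetical functioning scores at pre-, mid-, and posttreatment.
innovative = [(40, 52, 55), (38, 50, 54), (42, 55, 57)]
usual = [(41, 44, 47), (39, 43, 46), (40, 42, 46)]

innovative_linear, innovative_quadratic = group_trends(innovative)
usual_linear, usual_quadratic = group_trends(usual)
```

In this made-up example the innovative group shows a larger mean linear contrast (more overall change) and a clearly negative mean quadratic contrast (most of the gain arrives by midtreatment), whereas the usual-treatment group changes at a roughly steady rate.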
When considering the changes between pretreatment and follow-up, evidence of regressive or sustained trends could be tested. As was true of other forms of univariate and multivariate analysis of variance, the standard statistical packages have programs that can perform these analyses. One problem with these forms of analysis is that they do not tolerate missing data. They discard all subjects with data missing at any one point in time. But if the research questions are at the macrolevel of group trends and effects, then these designs will be adequate. From experience, most investigators using these designs are initially satisfied, but then wish to understand some of the differences among subjects within the groups. Here, the analysis of variance methods are limited and the more microlevel questions discussed later are found to be more satisfying.

In recent years, random regression models have increasingly dominated the way in which change is described and analyzed. Random regression models are a set of general statistical models that allow change to be modeled over time such that individual trajectories are compared to one another, and then groups of trajectories are compared for statistical differences. The most common random regression model seen in the literature is the Hierarchical Linear Model (HLM; Bryk & Raudenbush, 1987). HLM has found success in the outcome literature because of its ability to model individual and group trajectories and relate these to important clinical questions.

Hierarchical Linear Models are so named because of the hierarchical structure of the data required for analyses. In the present scope of mental health outcome, consider one typical hierarchical nesting structure occurring in these studies. Specifically, repeated measures of a symptom are nested within individuals, and individuals are nested within groups (such as treatment vs. no treatment, or innovative treatment vs. treatment as usual).
The discussion of HLM begins here in a limited scope by examining the question of whether change occurred. HLM is discussed later using its full application in the section, "Question 3: What was the nature of the change?"

HLM provides for statistical analyses very similar to repeated measures analysis of variance (ANOVA). However, HLM moves beyond ANOVA-based models by relieving the necessary restrictions these models place on data. ANOVA-based models require that time points be unvarying when data are collected. Therefore, data collected from subjects at 6 months postadmission result in a hard and fast rule of a 6-month interval that must be applied throughout the study. However, psychotherapy, as an example, rarely follows clean delineation of time. Termination from psychotherapy varies and thus is difficult to characterize as simply X months after admission. For the sake of analysis, the involvement of data collected at termination in a series of data collection time points is not immediately possible because of the ANOVA restrictions. Another situation where HLM provides advantages over ANOVA-based techniques involves missing data. Missing observations have serious impact on ANOVA-based techniques. The result of a single missing observation is the listwise (complete) deletion of the case. Radical loss of power can result when observations are missing at a single time point, even if data at subsequent time points are present. Because HLM extrapolates trends based on available observations, missing data does not result in a harsh elimination of observations. Rather, whatever data is available is used in the estimation of the group's change. Thus, data points are recovered in HLM when cases are lost due to any form of attrition and power retained. A broader discussion of rates of change is forthcoming in Question 3; however, several recommendations and considerations about HLM and ANOVA-based techniques are important to consider at this stage of discussion. First, HLM offers the advantage of allowing for time points that vary across subjects. This is only relevant when more than two time points have justifiable differences in terms of time. 
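The points about unequal timing and missing observations can be illustrated with a deliberately simplified two-stage sketch: fit an ordinary least-squares slope to whatever observations each person has, then count how many people a fixed-schedule, complete-case analysis would keep. This is not HLM itself, which estimates individual and group parameters jointly and weights each person's contribution by its precision; the records, schedule, and function names are hypothetical.

```python
def ols_slope(times, scores):
    """Least-squares rate of change for one person (score units per month)."""
    n = len(times)
    if n < 2:
        raise ValueError("at least two observations are needed for a slope")
    t_bar = sum(times) / n
    y_bar = sum(scores) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("observations must be at different times")
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, scores))
    return sxy / sxx


# Hypothetical records: (months since admission, symptom score).
records = {
    "A": [(0, 30), (3, 26), (6, 22)],   # observed on the planned schedule
    "B": [(0, 28), (2, 27), (7, 21)],   # observed off schedule
    "C": [(0, 33), (6, 27)],            # missed the midtreatment occasion
    "D": [(0, 25)],                     # intake only; no slope can be estimated
}
SCHEDULE = (0, 3, 6)

# A fixed-occasion, complete-case analysis keeps only people observed
# at every scheduled time.
complete_cases = [
    pid for pid, obs in records.items()
    if tuple(t for t, _ in obs) == SCHEDULE
]

# A trajectory-based analysis can use everyone with two or more observations.
slopes = {
    pid: ols_slope([t for t, _ in obs], [y for _, y in obs])
    for pid, obs in records.items()
    if len(obs) >= 2
}
```

Here only person A survives listwise deletion, while A, B, and C all contribute a rate of change. Whether such recovered data are trustworthy still depends on why the observations are missing, which this sketch does not address.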
With only two, or even three, time points, however, the advantages of HLM are not so clear that one can defensibly conclude that either approach is better. Beyond the fact that they have served as a historical bedrock, ANOVA-based techniques are rudimentary in describing whether change occurred (the point of Question 1). HLM, too, will describe whether two groups differed significantly from one another over two time points. The advantages of HLM are not immediately apparent until three or more time points are included in the analyses. Then, HLM can answer important questions about trajectories and the nature of change that can only be described in a very rudimentary fashion with ANOVA-based techniques.

Question 2: For Whom Did It Work Best?

What were the characteristics that differentiated those who did from those who did not achieve a satisfactory state or level? There are two levels of discussion required of this question. The first is a review of the historical issues and findings of clinical investigations on the interactions between consumer characteristics and treatment. This concludes with recommendations focusing on how a treatment's theory can be used to identify design variables and classes of consumer variables that may interact with and impact on the outcome of treatment. The second level of discussion focuses on analytic methods that follow from how the design is developed. The study's design, of course, would follow from its theoretical rationale.

When investigating the relations between initial status and change, the research question should be refined to focus on the consumer characteristic that predicts different rates of change. The refined question considers the characteristic related to initial status
as a moderator variable (i.e., that client attribute that alters the effects of the treatment variable). In other words, individuals in one category of a moderator variable experience a different outcome than individuals in other categories of the moderator variable. The logic here focuses on the interaction between the moderator variable and the treatment variable. For the simplest case, the values of $D_{ijk}$ would be predicted by the consumer's initial level on the characteristic, $b_k$, which is expected to moderate the potential impact of the treatment, $a_j$; this influence will be observed as an interaction effect, $ab_{jk}$. This conceptualization results in an expression that can then be evaluated by either a regression or analysis of variance using the standard statistical computer packages:
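The displayed expression is not reproduced in this copy. A plausible form, assuming the usual two-factor linear model and using the symbols defined in the preceding paragraph (the grand mean $\mu$ and the error term $e_{ijk}$ are added here as assumptions), is:

```latex
D_{ijk} = \mu + a_j + b_k + ab_{jk} + e_{ijk}
```

Under this reading, a moderator effect appears as a nonzero $ab_{jk}$ term: the effect of treatment level $j$ depends on the consumer's level $k$ of the characteristic.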

There is a strong logical argument for incorporating a well-conceived consumer characteristic as a moderator variable in most therapeutic intervention studies. The field has long claimed that individual therapeutic approaches are not all things to all people. But it is also becoming apparent that this question needs to be refined. A consumer characteristic moderator variable is just that: A variable that potentially moderates the degree to which the therapeutic intervention will have impact because of a characteristic brought into therapy by the consumer. It also could be argued that if the theoretical construct underlying the therapeutic intervention is adequately developed, then the moderator variable(s) should be easily identified. It is also possible that the variable might be a mediating variable rather than a moderating variable. Here a mediating variable is one whose refinement, presence, or absence in the design is required in order for the therapeutic intervention to be observed. Shadish and Sweeney (1991) showed that effect sizes in psychotherapy studies were directly related to moderator variables such as the outcome measure selected, standardization of the therapeutic interventions, and setting in which study is conducted. By incorporating an appropriate consumer characteristic as a moderator variable in the research design, the investigator will be able to obtain estimates of relations that can then be employed in a testable structural equation describing how the various variables come together to produce a therapeutic outcome. Client-Treatment Interaction. The client-treatment interaction literature is mixed. Significant interactions of client characteristics and treatment approaches, particularly within the psychotherapy interventions, are typically not found (Garfield, 1986; Shadish & Sweeney, 1992; Shoham-Salomon, 1991). 
A similar situation exists in educational research regarding student aptitude by instructional technique interactions (Cronbach & Snow, 1977; Snow, 1991). This has fascinated and puzzled investigators for some time and a number of logical explanations have been offered, often at odds with each other. Smith and Sechrest (1991) questioned whether such interactions exist. Beutler (1991) argued that there are a number of significant examples with important theoretical impact, albeit few, to recommend continuing development of research techniques that can surface the nature of these interactions. Across critics and defenders of investigating the client characteristic by treatment interaction, there appears to be agreement on the issues that need to be addressed if such research is to be done. All agree that better theory development is needed, where the investigator ought to articulate the answers to several questions: Which consumer characteristics will moderate differential outcome and why? Which behavioral progress or outcome measures are sufficiently sensitive to detect the interaction effects? What courses or rates of change are expected among the client-by-treatment groups and why?

The studies that are the exceptions (i.e., showing significant interactions) selected dependent measures that were directly related to the theory being tested. The dependent measures were of two types. In one type, the investigators selected measures that described client behaviors closely linked to the theory underlying the intervention (e.g., Shoham-Salomon, Avner, & Neeman, 1989; Shadish & Sweeney, 1991). The second type of study employed measures of effort as the unit of analysis. Howard, Kopta, Krause, and Orlinsky (1986) showed that persons treated for anxiety had different dose-effect curves than those treated for depression. Turner, Newman, and Foa (1983) contrasted the costs of follow-up treatment for persons who had successfully completed a flooding treatment for their obsessive-compulsive behaviors, but who had different styles of conceptualizing information. The two conceptualization styles contrasted were those that used a one-dimensional cognitive style expected of persons with obsessive-compulsive behaviors, and those with a "normal" three-dimensional cognitive style. Although all subjects achieved the experimental criteria of successful outcome in terms of their obsessive-compulsive behaviors (e.g., excessive hand-washing) as Beutler (1991) predicted, the amount and costs of follow-up psychotherapy over the next 12 months differed between groups. Ironically, most of the subjects with a one-dimensional cognitive style sought out additional psychotherapy care focusing on general anxiety over the next 12 months. Those who exhibited the three-dimensional cognitive style typically avoided follow-up mental health (psychotherapy) care. A 12-month follow-up showed that those who engaged in follow-up psychotherapy had lower levels of general anxiety. Newman, Heverly, Rosen, Kopta, and Bedell (1983) analyzed intake, termination, and service cost data on 949 clients in New Jersey community mental health programs.
Clients with unstable employment, a history of aggressive behaviors, and an unwillingness to be in treatment (as perceived by the intake clinician) had statistically higher costs of services during the period from intake to discharge.

A major problem in testing for interaction effects is that such tests, under traditional analysis of variance and regression techniques, have low power (McClelland & Judd, 1993; Jaccard & Wan, 1995). Through the use of simulation studies, these investigators have provided ample demonstration that tests for interactions and moderator effects typically have low power, even when difficulties due to measurement reliability are minimal. When measurement reliability is a problem, the power problem worsens. But there are some alternative techniques emerging. Jaccard and Wan (1995) found that structural equation approaches are more powerful in detecting such interactions. Even more exciting are the findings by Willett and Sayer (1994), who showed that structural equation modeling can also be employed to test differences in patterns of change among different subgroups within and between treatment groups. Willett and Sayer ended with the strong recommendation that "we can measure 'change' - and we should" (p. 379).

Several authors (Bryk & Raudenbush, 1987; Lyons & Howard, 1991; Willett & Sayer, 1994) have described procedures that can be called on to identify when a source of error variance may be masking an interaction of a moderator and a treatment variable that should be investigated in follow-up research. However, as indicated earlier, one alternative explanation for large error variance is measurement (un)reliability (Lyons & Howard, 1991).

In explaining the lack of significant interaction effects, Beutler (1991) also noted that most consumers who are involved in research are typically motivated to achieve a satisfactory psychological or functioning status. The same could be assumed of the
treating clinician. Although the beginning and the end points of the treatment may look similar for consumers with different characteristics, the processes for getting to the end point may be different. Beutler recommended that investigators consider obtaining more data to describe the progress during treatment. The form of analysis should focus on the course and rate of change during the treatment time frame. If this argument is convincing, then the reader should also consider the analyses discussed under the next question, "What is the rate of change?" In fact, the examples discussed in that section did show rates of change that were related to client characteristics (Francis et al., 1991; Willett, Ayoub, & Robinson, 1991). The next several sections explore the various alternatives for addressing the question, For Whom?

Regression or Analysis of Variance of "For Whom?"

A number of authors have argued that it is best to use a traditional approach to describe and test for the nature of an interaction (Cronbach, 1987; Rosnow & Rosenthal, 1989). The expression recommended by Rosnow and Rosenthal (1989) to describe an interaction score, $AB_{jk}$, for the jkth cell, influenced jointly by the jth treatment level and the kth level of client characteristic, is:
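The displayed expression is not reproduced in this copy. A minimal reconstruction, assuming the standard two-way layout (the bar and dot notation for cell, marginal, and grand means is an assumption, not necessarily the authors' notation), is the cell mean with both main effects removed:

```latex
AB_{jk} = \bar{X}_{jk} - \bar{X}_{j\cdot} - \bar{X}_{\cdot k} + \bar{X}_{\cdot\cdot}
```

Here $\bar{X}_{jk}$ is the mean of the jkth cell, $\bar{X}_{j\cdot}$ is the mean for treatment level $j$, $\bar{X}_{\cdot k}$ is the mean for level $k$ of the client characteristic, and $\bar{X}_{\cdot\cdot}$ is the grand mean.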

An analysis of the interaction's significance is simply that of creating an F test of the ratio
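The displayed ratio is likewise not reproduced in this copy. For a balanced design with $J$ treatment levels, $K$ levels of the client characteristic, and $n$ persons per cell (all three symbols are assumptions introduced here), the conventional form is:

```latex
F = \frac{MS_{AB}}{MS_{\text{within}}},
\qquad
MS_{AB} = \frac{n \sum_{j} \sum_{k} AB_{jk}^{2}}{(J-1)(K-1)},
\qquad
MS_{\text{within}} = \frac{\sum_{j} \sum_{k} \sum_{i} \left( X_{ijk} - \bar{X}_{jk} \right)^{2}}{JK(n-1)}
```

The ratio is referred to an F distribution with $(J-1)(K-1)$ and $JK(n-1)$ degrees of freedom.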

The numerator represents an estimate of the interaction after adjusting for the main effects of treatment and client characteristics. The denominator is an estimate of the error of prediction that is typical (averaged) across combinations of row and column effects. When the ratio is much larger than 1.00, it can be assumed that the differences among the cell means, after adjusting for main effects, are greater than differences due to measurement error. Thus, the F ratio provides an estimate of whether treatment effects differ in some systematic way over levels of the client characteristic.

There is one worrisome assumption in the analysis of interaction effects. It is technically identified as the assumption of independence. This assumption holds that there is no correlation between error of measurement within the cells and any of the three between-group sources of variance: levels of treatment effect, levels of client characteristic, and combinations of treatment by client characteristic. Why is this considered to be important? First, when the assumption is violated, most tests for a significant interaction effect (e.g., the F test given earlier) are not valid, because both the numerator and the denominator terms are influenced in different ways by the relations between random error and one or more of the independent variables. When this occurs, the statistical tests of the hypotheses will be either positively or negatively biased, depending on the nature of the relation.

A second issue regarding the independence assumption is that the research question needs to be reconsidered, given that the existence of such correlations indicates that the interaction is not one of simple mean differences. If one or more correlations are significant, then the mean effects are likely to be confounded with one or more of three sources of random error: interactions with item difficulty, error of measurement, or temporal effects.
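The numerator and denominator just described can be computed directly for a small balanced design. The sketch below assumes equal cell sizes and uses hypothetical change scores; the function name and data layout are illustrative, and a statistical package would normally be used in practice.

```python
from statistics import mean


def interaction_f(cells):
    """Interaction residuals and F ratio for a balanced two-way design.

    cells[j][k] is the list of scores for treatment level j and client
    characteristic level k; every cell must hold the same number of scores.
    """
    n_rows, n_cols = len(cells), len(cells[0])
    n = len(cells[0][0])
    if any(len(row) != n_cols for row in cells):
        raise ValueError("every treatment level needs the same number of cells")
    if n < 2 or any(len(cell) != n for row in cells for cell in row):
        raise ValueError("cells must be balanced with at least two scores each")

    cell_mean = [[mean(cell) for cell in row] for row in cells]
    row_mean = [mean(row) for row in cell_mean]
    col_mean = [mean([cell_mean[j][k] for j in range(n_rows)]) for k in range(n_cols)]
    grand = mean(row_mean)

    # Interaction residual: cell mean with both main effects removed.
    ab = [[cell_mean[j][k] - row_mean[j] - col_mean[k] + grand
           for k in range(n_cols)] for j in range(n_rows)]

    ss_ab = n * sum(v ** 2 for row in ab for v in row)
    ss_within = sum((x - cell_mean[j][k]) ** 2
                    for j in range(n_rows) for k in range(n_cols)
                    for x in cells[j][k])
    ms_ab = ss_ab / ((n_rows - 1) * (n_cols - 1))
    ms_within = ss_within / (n_rows * n_cols * (n - 1))
    return ab, ms_ab / ms_within


# Hypothetical change scores: rows are treatments, columns are low and high
# levels of a client characteristic, three clients per cell.
cells = [
    [[10, 12, 14], [20, 22, 24]],   # innovative treatment
    [[11, 13, 15], [13, 15, 17]],   # treatment as usual
]
residuals, f_ratio = interaction_f(cells)
```

With these made-up scores the interaction residuals are plus or minus 2 in every cell and the F ratio is 12.0 on 1 and 8 degrees of freedom. The computation says nothing about whether the independence assumption holds; that has to be examined separately.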
Myers and Well (1991) recommended a rough rule of thumb to determine when to be concerned about the existence of heterogeneity of variance that may be related to such correlations (i.e., heteroscedasticity). They suggested creating a ratio of the largest
to the smallest within-cell variance (what is sometimes called the F-MAX test), and if the ratio is greater than 4:1 for equal n designs and 2:1 for unequal n designs, then there is reason for concern. Here Myers and Well joined Kenny (1979) in advising a direct analysis of scatter plots and the correlations between individual observations and group means. Specifically, consider calculating each of three correlations: X_ijk with A_j treatment groups across levels of B_k, X_ijk with B_k client characteristics over levels of A_j, and X_ijk with all levels of AB_jk. If the magnitude of a correlation is equal to or greater than .2, then look at the covariance structure of the treatment and client characteristics with the measures of change.

Hoyle (R. Hoyle, University of Kentucky, personal communication, November 1993) pointed out that an additional problem in regression with interaction terms is multicollinearity. Frequently, the interaction term will be correlated .90 or more with one of the main effect terms. A strategy recommended by Hoyle is to center (i.e., subtract the mean from each subject's score) the two main effect terms before creating the interaction term.

In summary, analysis of variance and regression models are direct methods of testing for an interaction effect. The scarcity of significant treatment by client characteristic interactions suggests that either the theories relating treatment to client characteristic are fuzzy, or the measurement technique selected is inappropriate, or the analysis of variance model (with easily violated assumptions) is inadequate. Another possibility is that Smith and Sechrest (1991) were correct when they asked whether there are any significant sources of interaction variance to detect.

Structural Equations.
Structural equation modeling (SEM) represents a method that can describe the magnitude of the relations among independent variables as antecedent to outcome (i.e., they are either causal to, or mediators of, outcome). Consider the hypothetical example shown in Fig. 8.1 where there are two paths to outcome (effect) from the treatment variable (cause). One path goes directly from the treatment variable to the outcome measure. The strength of this relation is described by the parameter (a). When estimated, the value of (a) describes the amount of change in outcome that can be predicted by one unit of change in treatment. When treatment is a dichotomous variable (experimental versus control), then the value (a) would be the coefficient that best predicts effect size in a point-biserial regression equation. The coefficient represents the mean difference between groups on the dependent variable. When the treatment variable is ordinal or quantitative (e.g., dosage), then the value of (a) could be the coefficient in a regular regression equation. At this point, note that analysis of variance models, and particularly regression models, can be thought of as elementary structural equation models.

The second path to outcome in Fig. 8.1 includes the client characteristic. This route to outcome has two coefficients, (b) and (c). In structural equation analysis, one will often develop a picture of the alternative paths to be considered, and then develop and contrast structural equations that predict the relations for each path. In a formal structural equation analysis, one could contrast the strength of each of the outcome prediction equations developed to represent each of the paths to the outcome variable(s). The critical issue is whether considering the client characteristic enhances the prediction of outcome over the more direct path as described by (a).
If the combined relationships of (b) and (c) have greater predictive value of outcome than (a) alone, then the addition of a client characteristic will improve the prediction of outcome for a given treatment level.
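
The contrast between the direct path and the route that includes the client characteristic can be illustrated with ordinary regression, which the text treats as an elementary structural equation model. The sketch below is not a full SEM analysis. It assumes made-up data and hypothetical variable names (`treatment`, `susceptibility`, `outcome`), fits outcome on treatment alone and then on treatment plus the client characteristic, and compares the proportion of variance explained.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(predictors, y):
    """Least-squares fit of y on an intercept plus the predictor columns; returns R^2."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * v for bj, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data, invented for illustration only.
treatment = [0, 0, 0, 0, 1, 1, 1, 1]          # control vs. hypnotherapy
susceptibility = [1, 2, 3, 4, 1, 2, 3, 4]      # client characteristic
outcome = [1.1, 1.9, 3.2, 3.8, 3.0, 4.1, 4.9, 6.0]

r2_direct = r_squared([treatment], outcome)                 # treatment only
r2_full = r_squared([treatment, susceptibility], outcome)   # plus client characteristic
gain = r2_full - r2_direct                                  # added predictive value
```

A large `gain` in this toy setting corresponds to the case the text describes, where adding the client characteristic improves the prediction of outcome. A formal analysis would also need a test of that gain and a model for the relation between treatment and the client characteristic.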

Fig. 8.1. Two paths that could be used to describe relations between treatment and outcome. A hypothetical example of hypnotherapy focusing on classroom anxiety, with the client characteristic of susceptibility to hypnosis, illustrates the causal model.

According to Kenny (1979, p. 44), there are five basic steps in creating a structural model to perform a path analysis:
1. From theory, draw a set of structural equations that describes relations among the independent variables (called exogenous variables) and the effect (dependent or endogenous variables). This step is no different from what has been emphasized throughout this chapter. The prediction of an influential client characteristic must be based on strong theoretical assumptions of how the client characteristic will modify the effects of the treatment on a given set of observable behaviors.
2. Choose a measurement and design model. The measures must be directly related to those that the treatment is supposed to influence, and the incorporation of these measures into the design model must also be consistent with the theory of the intervention. Here the reader is asked to recall the four assumptions regarding selection of measures for an outcome analysis described at the beginning of the chapter. The design model (true or quasi-experiment) needs to be one that is sufficiently powerful to detect potential differences. Researchers need to identify not only groups that will eliminate alternative hypotheses, but also levels of the client characteristics that offer the potential to show differences.
3. Respecify the structural model to conform to design and measurement specifications. As is true in any test of a model, the reality of collecting data in the real world takes hold. This step is stated formally to remind the researcher that an error of the third kind can be easily committed: to test a prediction with the wrong statistical model.
4. Check that there are sufficient numbers of predicted relations (each called an identification) to test the model. One of the dangers of using structural modeling is that a model can be created that has more unknowns than can be estimated. The major restriction here is the degrees of freedom in the covariance matrix, where the number of predicted relations cannot exceed the
number of terms identified to covary with each other. It is possible that a study will not produce sufficient information to reach a conclusion about the best model.
5. Estimate the covariances between the measured variables, and from the covariances, estimate the parameters of the model and test hypotheses. This is the bottom line where researchers can decide whether the data support a conclusion that by considering the client characteristic they enhance the prediction of the effect of treatment on outcome.

There are two major limitations of structural modeling. First, the investigator must be able to develop very specific predictions. The discipline of developing such specific predictions has not been a common practice in clinical research. Thus, its use here will require the researcher to be more explicit in what is being predicted and why. The other is that there must be sufficient data to provide stable estimates of correlations (covariances). This often requires sample sizes in the hundreds rather than those frequently obtained in treatment outcome studies. The reader is referred to the classic texts written on structural modeling (Bentler, 1989; Bollen, 1989; Hayduk, 1987; Jöreskog & Sörbom, 1988; Kenny, 1979). Those not familiar with the matrix algebra might find the classic texts by Kenny (1979) or Hayduk (1987) to be the best starting point.

With new advances in technology, structural equation modeling programs such as LISREL, AMOS, and EQS are becoming increasingly accessible. That accessibility has been translated into increasing ease of use via graphical interfaces that allow for drawings rather than programming code to create models (as in EQS and AMOS). In the history of SEM programs, early versions of LISREL, for example, required the user to learn an arcane programming code consisting of the names of the matrices, numerical values, and few descriptors (Jöreskog & Sörbom, 1988, 1993).
Although LISREL continues to read this code, the move toward graphical interfaces is part of the package that was available at the time of this writing.

There is a troubling side to this ease of use. SEM is more than a simple test of parameters generated from covariance matrices resulting in a set of fit indices. SEM requires that the user develop a theoretical rationale, based on support from the empirical and theoretical literature, for the model being tested. Previously, the development of the model required substantial effort on the part of the investigator as computer programming code was generated to represent the predicted relation. At present, that has been replaced by graphical interfaces that allow multiple models to be generated in a single sitting of about an hour. The rapidity at which models can be developed places demands on the investigator such that each model developed must be given careful thought, with support from the literature. This is, lamentably, not done as often as it should be (Scandura & Tejeda, 1997).

The latest versions of SEM software permit automatic modification of the model under test such that a reasonable post hoc alternative can be considered. When this happens and the user pursues the model that results from the automatic modification, the confirmatory logic of SEM is lost and the analysis becomes exploratory. It is critical to note that unlike traditional hypothesis testing, SEM provides the opportunity of confirming and falsifying theoretical models. Thus, the construction of models based on automatic modifications obviates the important duty of falsifying existing theories in an effort to improve understanding of social, psychological, and behavioral processes: The use of automatic modification is simply pure mathematical speculation and solely exploratory. The reader may wonder whether these rantings reflect real concerns.
Figure 8.2 provides real data suggesting that the latent variable of bigotry is positively related to the latent variable of violent intentions, and the latent variable of self-esteem is negatively related to the latent variable of violent intentions. Bigotry, self-esteem, and violent intentions each have two indicators depicted by squares (the convention in SEM). Without entering into detail, the model fits the data quite well, with most fit indices exceeding .95. Figure 8.3 provides the results of automatic modifications that have resulted in fit indices exceeding .99; however, the resulting diagram now includes two new factors and a host of new paths. Whereas Fig. 8.2 provides an interpretable model, Fig. 8.3 presents a near saturated model of relations that fails to be parsimonious. Do "Something" and "Something Else" really add to the understanding of bigotry, self-esteem, and violent intentions? Although this example represents an extreme in the use of automatic modifications, more conservative uses of automatic modification can result in equal gibberish.

Fig. 8.2. A structural equation model describing the relation of bigotry and self-esteem to violent intentions.

Therefore, given that technological changes are inevitable, what recommendations can be made for the use of SEM knowing the concerns just mentioned? First, nothing replaces good theory and careful reflection. The cornerstone of structural equation models must remain careful assertions derived from theory and empirical findings. Second, it is profoundly helpful to construct a graphical representation of any structural equation including the measurement model. By producing a graphic, hypothesized relations become very clear. This is exceptionally helpful if the model will be run by another party, such as a statistician. Rival models should be depicted graphically as well. Third, negative findings should result in reflection as well as publication. Findings that fail to support theory form the basis of the theory's refinement. This refinement should not be driven by mathematics, but rather by careful thought.

Question 3 What Was the Nature of the Change?
There are several variations on this theme: What were the rate and character of the change? What did or did not change? Are the rates of change consistent within a treatment group? Are the rates of change different among treatment groups? How quickly do individuals or groups of subjects achieve a satisfactory level and then stabilize at that level?

Fig. 8.3. A modified structural equation model recommended by an automatic modification.

The character of the predicted changes is limited only by the constraints of the investigator's questions and the study's design. For most studies, a simple linear change is predicted: What are the rates of change over time? For others, both a linear and a quadratic function are of interest. For example, the investigator could ask whether, in addition to potential differences in the rate of change over time, there are differences in how soon the predicted behaviors plateau (i.e., reach an asymptotic level). It is also possible that an investigator will predict three types of change functions over time: first an initial increase, then a plateau, and then a continued increase. This is what is called a cubic function. There are two prominent changes in direction. Consider this example: After an initial performance increase due to symptom relief, performance either plateaus or has a slight decline when the client discovers the "tougher problems" (e.g., that they must take some degree of responsibility for managing the factors influencing the problem). Further improvement can only occur when a strategy to deal with the "tougher problems" becomes evident.

As the research question moves from linear to quadratic to cubic predictions, the demands on the study's design increase from at least two waves of data collection (i.e., at two different time points), to three and four waves of data collection. As is discussed later, most methodologists argue that at least three waves of data collection (pre-, during-, and posttreatment) are better than two. But many practitioners might argue that taking repeated measures on a client is intrusive on the clinical process. Another
strategy might be to collect pre-, post-, and 6- or 12-month follow-up data. But collecting follow-up data is costly and often results in more missing data.

There are two general approaches recommended to investigate the nature of change. Each approach has a different emphasis on the character of change, and its own assets and liabilities. The traditional approach is a repeated measures analysis of variance (univariate or multivariate), where between-group linear or curvilinear trends are contrasted among treatment groups (Myers & Well, 1991; O'Brien & Kaiser, 1985). A more recent development is growth curve analysis. This technique can be employed to describe changes in performance (for individuals and for groups) over time as part of an ongoing process (Bryk & Raudenbush, 1987; Francis et al., 1991; Rogosa & Willett, 1985; Willett et al., 1991).

The analysis of variance model uses the semantics of, and restricts inferences to, differences among group trends over time relative to random variances of the trends among subjects within the groups. The semantics of growth curve analysis emphasize change as a process. The unit of analysis is the measure of change in the target behaviors over time for each subject. Growth curve analysis can develop hypotheses and describe results in terms of the process of change in individuals' behavior over time, as well as contrast observed change processes among treatment groups.

The starting point for growth curve analysis is to select a growth model (or models) that can be used to estimate the measure of the change process for each individual. Most applications have employed a simple linear or quadratic expression to describe the change process, but almost any mathematical expression can be used to describe the change process (e.g., linear, quadratic, cubic, exponential, Markovian). The major restriction is that there needs to be a theoretical basis for the model.
For the purposes of exposition, discussion will center on predictions of linear change, using an example of treatment of families at risk of maladaptive parenting, child abuse, or neglect (Willett et al., 1991). These investigators were interested in within-family growth of family functioning (Ayoub & Jacewitz, 1982) over the course of treatment as it is related to entry level of family violence/maltreatment. The basic linear model used to describe growth in family functioning (FF) for the ith family over time, t, in months was:
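
A linear growth model consistent with the description that follows can be written as below. Only the slope term is named in the surrounding text; the intercept \pi_{0i} and the residual \varepsilon_{it} are notational assumptions of this sketch.

```latex
FF_{it} = \pi_{0i} + \pi_{1i}\, t + \varepsilon_{it} \tag{14}
```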

The term π_1i is the slope of Equation 14 and is the unit of measure of the analysis. When this term is positive, change is in a positive direction, representing increasing levels of family functioning over the months of treatment. The estimation of π_1i is obtained by using an ordinary least squares procedure, where the monthly level of family functioning is entered to estimate the slope of the best-fitting straight line. This is easily estimated with any of the standard computer packages (e.g., MINITAB, SAS, SPSS, or SYSTAT). Stable estimates of the slope parameter can be found when measures are taken at three or more different times. Two-point estimations (e.g., pre- and posttreatment) typically do not have sufficient stability. Moreover, as Beutler (1991) suggested, the course and rate of progress (change) during the time period between the pre- and posttreatment could be where differences due to client characteristics are detected. This was the case in the two studies described next. Once the estimate of the change parameter is obtained (π_1i in this case), it can be entered into its own prediction equation to fit the study design. The study conducted by Willett et al. (1991) focused on describing growth rates as they related to entry levels of family functioning (FF), violence/maltreatment (VM), and number of distressed
parenting problems (DD). The between-family linear regression model used to describe the predictive relation was:
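
A between-family model of this kind, with each family's growth rate regressed on the three entry-level measures named above, can be sketched as follows. The coefficient names \beta_0 through \beta_3 and the residual u_i are assumptions of this sketch, not the investigators' notation.

```latex
\pi_{1i} = \beta_0 + \beta_1\, FF_i + \beta_2\, VM_i + \beta_3\, DD_i + u_i
```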

The investigators found each of these factors to be significant contributors to rates of change in family functioning. For example, the growth rates of families with four or more parenting problems were slower, requiring more treatment to achieve a satisfactory level of family functioning. Although the example used here focused on a measure of linear change for one treatment intervention, it should be obvious that other measures of change can be estimated as well. For example, Francis et al. (1991) employed a quadratic function predicting the effect of treatment on the level of visual motor impairment (Yit) at time t, for children with head injuries:
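
A quadratic growth model of the kind described extends the linear form with a squared time term. The intercept and residual symbols are assumptions of this sketch; the slope and quadratic coefficients are the two terms the text discusses next.

```latex
Y_{it} = \pi_{0i} + \pi_{1i}\, t + \pi_{2i}\, t^{2} + \varepsilon_{it}
```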

Each subject's slope coefficient, π_1i, and quadratic coefficient, π_2i, were entered into a regression equation with three patient characteristic variables as predictors: age at onset of injury, initial severity of injury, and evidence of pupil contraction impairment. Age and initial severity were significant predictors of both the slope and quadratic coefficients. These investigators used a hierarchical linear model (HLM) developed by Bryk and Raudenbush (1987) and a software package for HLM (Bryk, Raudenbush, Seltzer, & Congdon, 1986) to test the predictors of the rate coefficients. This software package also provides estimates of the proportion of "true" to "total" variance in rate measures accounted for by the model. In this case, the models developed by Francis et al. (1991) accounted for 79.4% of the variance among the subjects' rates of change.

There are several additional advantages of growth curve analysis when contrasted with the traditional trend analysis of variance models. First, the number of repeated measures per subject does not have to be equal, nor does the interval between measures need to be the same. All that is required is a sufficient number of repeated measures to estimate the coefficients of change specified in the prediction model. This is not to say that the investigator does not need to be concerned about how many measures are to be taken or when in the course of treatment the observations are made. Both concerns, along with the structure of the prediction model itself, will influence the precision of the rate measures. If the number and spacing of observations are too haphazard, the precision will deteriorate sufficiently to prevent any significant findings from being detected. However, no data are lost due to minor variations in the number of observations or due to slight variations in spacing between observations.
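
The first stage of a growth curve analysis, estimating one rate of change per subject by ordinary least squares, can be sketched as follows. The family labels, observation months, and scores are invented for illustration, and the sketch covers only the linear case. It shows that subjects with different numbers of observations and uneven spacing can each still contribute a slope.

```python
def ols_slope(times, scores):
    """Least-squares slope of scores on times for one subject.

    Requires at least two distinct time points; three or more are
    needed before the estimate can be considered stable.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, scores))
    return sxy / sxx


# Hypothetical families: (months of observation, functioning scores).
families = {
    "family_A": ([0, 1, 2, 3, 4], [10, 12, 14, 16, 18]),  # five monthly waves
    "family_B": ([0, 2, 5], [20, 21, 22.5]),              # three waves, uneven spacing
    "family_C": ([0, 1, 3, 6], [15, 15, 14, 12]),         # four waves, declining
}

# One rate-of-change estimate per family; these become the unit of analysis.
slopes = {name: ols_slope(t, y) for name, (t, y) in families.items()}
```

In a second stage, the values in `slopes` would serve as the dependent variable in a between-subject regression on client characteristics. Unlike the HLM approach described in the text, this simple two-stage sketch does not weight each slope by its precision.
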
The HLM computer program does adjust for the degree of precision by considering the number and spacing of observations when estimating each rate measure and when conducting the test of the prediction equation. Thus, the investigator can inspect the degree to which variation in numbers of observations and spacing has detracted from the estimation of the model's fit to the data.

In summary, each of the two approaches has its uses. The best one is that which best fits the question raised by the investigator. The investigator, however, ought to experiment with the logic of both forms of analysis (differences in trend vs. mean differences in rates). The logic of repeated measures trend analysis of variance focuses on the average between-group differences in outcome over fixed intervals of time. If measures are taken at more than two intervals, then trend analysis can also test for between-group differences in trends; however, all subjects must have an equal number of observations taken at equal intervals. Subjects with missing data are discarded from
the entire analysis of trends. The growth curve analytical technique changes the focus (and unit) of analysis from the magnitude of a behavioral measure to its direction and rate of change. Growth curve analysis does not require that all subjects have the same number of observations or that the spacing between observations be exactly the same, although excessive variation in either will deteriorate the precision and therefore the power to detect significant differences. The technique of growth curve analysis is still too new to determine whether actual studies will be as productive as early billing promises. But if Beutler (1991) is correct, then these forms of analyses may bring to the surface the elusive treatment-by-client characteristic interactions.

Question 4 What Effort (Dosage) Was Expended?

What was the amount of time or effort expended for a consumer to achieve a "satisfactory" psychological state or level of functioning? Although the issues underlying this question have been proclaimed for some time (Carter & Newman, 1975; Fishman, 1975; Yates, 1980; Yates & Newman, 1980), it was not until the late 1980s that this question began to be recognized as part of a fundamental issue of outcome research (Howard et al., 1986; Newman & Howard, 1986). Recent interest appears to be centered on the economic concern regarding the worth of the investment in mental health care, rather than a scientific concern on how much is enough. The text by Yates (1996) is a wonderful exception. Yates provided a clear description of the steps needed for the researcher to understand how to analyze the costs incurred in the therapeutic and material efforts included in the procedures, processes, and outcomes of clinical and human services. Measures of effort are often easy to develop and are readily available to the researcher if there is a plan to collect the effort data.
There are three major classes of effort measures that have served as either predictor or dependent variables in psychotherapy and mental health services research. They are dosage (i.e., the number of therapeutic events provided over the period of a clinical service episode), the level of treatment restrictiveness (i.e., the use of environmental manipulations to control the person's behavior during the clinical service episode), and the cumulative costs of the resources invested in treatment (i.e., the type of staff, staff time, and material resources consumed during the clinical service episode; Newman & Howard, 1986). Dosage is the measure most frequently employed when only a single modality is considered (e.g., number of days of inpatient or nursing home treatment). Restrictiveness measures can be developed at a sophisticated or a simple level. Hargreaves and his colleagues (Hargreaves, Gaynor, Ransohoff, & Attkisson, 1984; Ransohoff, Zackary, Gaynor, & Hargreaves, 1982) had panels of clinical experts, employing a magnitude estimation technique, scale the levels of restrictiveness for interventions designed to serve the seriously mentally ill. Newman et al. (1983) used a simpler approach to quantifying level of restrictiveness by giving a value of 1 to an outpatient visit, 2 to day treatment, and 3 to inpatient care in the treatment plans proposed by 174 clinicians for a standardized set of 18 cases. To create a dependent measure that combines dosage with restrictiveness of effort, these dosage and restrictiveness scores were cross-multiplied. Significant relations were found between this dependent measure and three predictor variables: levels of functioning at intake, level of social support, and level of cooperativeness at the start of treatment.
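
The simpler weighting scheme attributed to Newman et al. (1983) can be sketched in a few lines. The weights (1, 2, 3) come from the text. Reading "cross-multiplied" as weighting each service's dosage by its restrictiveness and summing over services is an assumption of this sketch, and the treatment plan shown is hypothetical.

```python
# Restrictiveness weights as given in the text.
RESTRICTIVENESS = {"outpatient": 1, "day_treatment": 2, "inpatient": 3}


def effort_score(plan):
    """Sum of dosage x restrictiveness weight over the services in a plan.

    `plan` maps a service type to the number of units (visits or days).
    """
    return sum(units * RESTRICTIVENESS[service] for service, units in plan.items())


# Hypothetical treatment plan proposed for one case.
plan = {"outpatient": 12, "day_treatment": 5, "inpatient": 2}
score = effort_score(plan)  # 12*1 + 5*2 + 2*3
```

A score of this kind could then serve as the dependent measure regressed on intake characteristics, as the text describes.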

A measure of the costs of resources consumed combines the concepts of dosage and restrictiveness because the costs of staff time and the resources used to exert environmental control during the clinical service episode are summed to calculate the costs. However, the concept of employing costs as an empirical measure of therapeutic effort is still sufficiently new to the field that there appear to be some misconceptions that inhibit its use in research. Newman and Howard (1986) described three popular incorrect perceptions:
1. Confusion of costs with revenues. Revenues are the monies that come to the service from many different sources (payment of fees charged, grants, gifts, interest on cash in the bank).
2. Confusion of costs with fees charged. Fees charged may or may not cover costs of services provided. Profits accrue when fees collected are greater than costs, and deficits accrue when they are less. If all clients have similar diagnoses and problems, if they receive the same type and amount of treatment, and if the fees charged equal the costs of the resources used, then and only then do costs equal fees. However, for mental health programs and for private practice, the costs of the clinical efforts vary across consumers and therapeutic goals.
3. Confusion of costs and fees in private practice. This is being recognized as a myth by more and more private practice clinicians. Unfortunately, this myth is being perpetuated by third-party payer reimbursement practices where a single reimbursement rate is being set for a broad spectrum of diagnoses, independent of clients' levels of psychosocial functioning, social circumstances, and therapeutic goals. It is intuitively obvious to most private practice clinicians that not all clients require the same levels of care to achieve a satisfactory psychological or functioning level.
It is also obvious that it is more profitable to restrict a practice to those consumers who can be profitably treated within the limits set by reimbursable fees rather than by the treatment goals achieved. Unprofitable clients might, unfortunately, be referred elsewhere.

There are three statistical approaches that can be usefully applied to measures of effort: probit analysis, focusing on the cumulative proportion of the sample that has achieved a criterion of success (or failure) at each level (dose) of the intervention's events; log-linear analysis, using a multidimensional test of independence of two or more variables (each having two or more levels) in predicting two or more classes of outcomes; and univariate and multivariate regression and variance analysis, focusing on the unique characteristics and limitations of applying these traditional approaches to analyzing effort data.

The shape of the distributions of measures of effort is of some concern. They are typically positively skewed, with as many as 3% to 10% of the subjects having effort measures 3+ standard deviations above the median. The first two approaches are less affected by the precise shape of the distribution of the measure of effort than the last approach. The focus of questions addressed by probit and log-linear analyses is on the relative frequency of observations that fall within given classes or ranges of outcome. The distribution of the measures of effort is more important for univariate and multivariate regression and variance analyses that have the usual assumptions of parametric statistics (e.g., normality, and independence between within-group error variance and group assignment). The difficulty of analyzing extremely skewed distributions is typically dealt with by one of two methods. One is to drop the "outliers," that is, those subjects with extreme scores (e.g., the top 10% or 20%) from the analysis.
Another approach is to transform the values to produce a more normal appearing distribution. The arcsine and the log transformations are two popular transformations. These approaches may have negative consequences of either dropping data that should be considered or transforming the conceptual base to mean something other than it was originally conceptualized to mean.
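
The trade-off between the two remedies can be seen in a small sketch. The session counts below are invented to mimic a positively skewed effort measure, and the moment-based skewness coefficient is one common choice among several. The log transformation reduces skew while keeping every case, whereas trimming reduces skew only by discarding the highest-effort client.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance to the 3/2 power."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical number of sessions per client (one very heavy user).
sessions = [1, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20, 60]

skew_raw = skewness(sessions)                           # strongly positive
skew_log = skewness([math.log(s) for s in sessions])    # all 12 cases retained
trimmed = sorted(sessions)[:-1]                         # drop the most extreme case
skew_trimmed = skewness(trimmed)                        # 11 cases retained
```

Either remedy makes the distribution look more symmetric, but the trimmed analysis no longer describes the client who consumed the most effort, and the log analysis describes log-sessions rather than sessions.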

Investigators often will invoke either of these approaches to deal with the statistical issues without considering the clinical implications. Although probit and log-linear analyses do not require throwing away data, they do use a log transformation of the data during the analytic process. For the examples reviewed here, the probit analyses do appear to have preserved the conceptual basis of the studies as they were designed by the investigators. It also can be argued that both probit and log-linear analyses have their own set of negatives. The principal negative aspect is that relatively large samples are required to assure that observed differences in relative frequencies are stable. With these notes of caution, the conceptual basis and applications of each of the three approaches are described.

Probit Analysis. The basic unit of probit analysis is the proportion of subjects within a specific group to achieve a satisfactory level of psychological or social functioning after a given dosage of treatment has been provided. Howard et al. (1986) described this relationship as "the amount of treatment (dose) needed to achieve a specific percentage of patient improvement (effect)" (p. 160). A probit model is created that uses the observed proportions of subjects in the jth group to achieve a satisfactory outcome at the ith dosage level,
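
One common way to write such a model is shown below. It assumes that the probit (inverse standard normal) of the proportion is linear in the logarithm of dose, in keeping with the log transformation mentioned above. The symbols P_{ij} (proportion in group j reaching criterion at dose level i) and X_i (the dose at level i) are assumptions of this sketch; A_j and B_j are the group-specific coefficients named in the text.

```latex
\Phi^{-1}\!\left(P_{ij}\right) = A_j + B_j \ln\!\left(X_i\right)
```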

To estimate the values of Aj and Bj in the model for the jth group, the observed values (proportions of subjects to achieve a satisfactory criterion at each dosage) are entered into a maximum likelihood procedure. Once estimated for a group, the model generates a function describing the expected relation between dose and the proportion achieving a satisfactory outcome. The probit analysis provided in most statistical packages will generate a set of probit values and the estimated standardized proportions of persons expected to achieve a measured satisfactory state or level of functioning for each successive dose level, along with the 95% confidence intervals about each probit value within a given group. Thus, for each group, the analysis provides dosage-by-success-rate functions, along with a 95% confidence interval about each function. The extent of nonoverlap between the 95% interval envelopes for two or more groups will describe the statistical significance of between-group differences. Howard et al. (1986) found significantly different dose-effect functions for three diagnostic groups receiving psychotherapy: depression, anxiety, and borderline psychotic. There are three major limitations of probit analysis in addressing the question, "How much is enough?" One is that relatively large samples are needed to refine the dosage levels, probably more than 50 subjects per treatment group. The second is that only between-group main effects can be tested. However, it is possible to contrast the overlap in the 95% confidence interval across dose levels (or "confidence interval envelope") among any two or more groups. This results in a test of simple effects among groups in a design with two or more between-group variables; therefore, experimentwise error rates are an important concern and should be controlled (e.g., by employing a Bonferroni correction of Type I error rates per comparison).
The third limitation is that probit analysis is only applicable to measures of frequency such as the dose-effect relations. It cannot be applied to measures of intensity such as the restrictiveness or the cumulative cost classes of effort measures. The next two sets of approaches can be used for all three classes of effort measures.
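The maximum likelihood step for a single group can be sketched as follows. This is a minimal illustration, not the procedure of any particular statistical package. It assumes the dose-effect model takes the form probit(P) = A + B ln(dose), fits A and B by Fisher scoring, and omits the confidence intervals a full package would report. The session counts and improvement counts are hypothetical, and the function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def fit_probit(doses, n_treated, n_improved, iterations=25):
    """Estimate (A, B) in probit(P) = A + B * ln(dose) by Fisher scoring.

    doses      -- dose level (e.g., number of sessions) for each row
    n_treated  -- number of subjects observed at each dose level
    n_improved -- number meeting the criterion at each dose level
    """
    z = [math.log(d) for d in doses]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for zi, n, r in zip(z, n_treated, n_improved):
            eta = a + b * zi
            p = min(max(STD_NORMAL.cdf(eta), 1e-10), 1 - 1e-10)
            dens = STD_NORMAL.pdf(eta)
            w = dens / (p * (1 - p))
            resid = r - n * p
            # Score vector (first derivatives of the log-likelihood).
            u0 += w * resid
            u1 += w * resid * zi
            # Expected (Fisher) information matrix.
            info = n * dens * w
            i00 += info
            i01 += info * zi
            i11 += info * zi * zi
        det = i00 * i11 - i01 * i01
        a += (i11 * u0 - i01 * u1) / det
        b += (i00 * u1 - i01 * u0) / det
    return a, b


def predicted_proportion(a, b, dose):
    """Expected proportion meeting the criterion at a given dose."""
    return STD_NORMAL.cdf(a + b * math.log(dose))


def dose_for_proportion(a, b, proportion):
    """Dose at which the fitted model expects the given proportion to improve."""
    return math.exp((STD_NORMAL.inv_cdf(proportion) - a) / b)


# Hypothetical data for one diagnostic group: 60 clients assessed at each dose.
sessions = [1, 2, 4, 8, 13, 26, 52]
treated = [60] * len(sessions)
improved = [9, 18, 29, 41, 47, 54, 58]

a_hat, b_hat = fit_probit(sessions, treated, improved)
print(f"A = {a_hat:.3f}, B = {b_hat:.3f}")
print(f"sessions for 50% improved: {dose_for_proportion(a_hat, b_hat, 0.5):.1f}")
```

Fitting the same function separately for each diagnostic or treatment group gives one dose-effect curve per group. Comparing those curves still requires the confidence envelopes and the error-rate control described above.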

Log-Linear Analysis of Effort Measures. The basic unit of measure when applying a log-linear analytic approach is the rank order of the magnitude of the effort measure when considering all subjects across all groups. For example, if there are 320 subjects in four groups, 80 per group, then the rank order values on the effort measure can vary from 1 to 320. Once subjects receive their rank order score, any between-group rank order (nonparametric) statistical approach can be applied. Here, the log-linear analysis is considered because of its ability to handle higher level designs, with at least one treatment and one consumer characteristic variable. Table 8.2 provides an example of the general form of an analysis that can be addressed by log-linear analysis. Consider that researchers are working with persons who have a serious and persistent mental illness and who are entering a community support program. At admission, they are first evaluated for their levels of interpersonal (including communication) skills, along with other characteristics. Half of those who score at a low level of interpersonal skills (Client Group B-1) are randomly assigned to a program that focuses on social and community functioning skills, where the treatment team works out of an office adjacent to a consumer-run drop-in center (Group A-1, B-1). The remaining half of the consumers with low interpersonal skills are assigned to a program whose treatment team interventions focus on symptom control with a case manager who works out of a community mental health center (Group A-2, B-1). The same random assignment to Groups A-1 and A-2 is made for those clients scoring at the moderate to high levels of interpersonal skills (i.e., assigned to Groups A-1, B-2 and A-2, B-2, respectively). Thus, in this example there are two treatment groups and two levels of a client characteristic (i.e., the moderator variable of entry level of interpersonal skills at low or moderate-high).
The client characteristic is expected to interact with the effects of the therapeutic intervention.

TABLE 8.2 The Frequencies of Subjects in Each Group for the Successive Quartiles When Ranked by the "Cumulative Costs of the Clinical Service Episode"

| Treatment Group | Client Group | Q-1: Lowest Cost, 1st to 25th percentile | Q-2: 26th to 50th percentile | Q-3: 51st to 75th percentile | Q-4: Highest Cost, 76th to 100th percentile |
| --- | --- | --- | --- | --- | --- |
| A-1 Social Rehabilitation Treatment Team | B-1, Low Interpersonal Skills | f[111] | f[112] | f[113] | f[114] |
| A-1 Social Rehabilitation Treatment Team | B-2, Moderate to High Interpersonal Skills | f[121] | f[122] | f[123] | f[124] |
| A-2 Symptom Control, Case Manager at CMHC | B-1, Low Interpersonal Skills | f[211] | f[212] | f[213] | f[214] |
| A-2 Symptom Control, Case Manager at CMHC | B-2, Moderate to High Interpersonal Skills | f[221] | f[222] | f[223] | f[224] |
| Sum of Columns Equals: | | 25% of Total | 25% of Total | 25% of Total | 25% of Total |

Note: The cell frequencies are described as f(ijk), for the ith quartile, in the jth treatment group and the kth level of the client characteristic. In the cell labels of the table body, the three digits appear in the order treatment group, client group, quartile.
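The construction of a table like Table 8.2 from raw cost data can be sketched as follows. The costs are simulated and purely hypothetical, the group labels follow the example in the text, and `quartile_table` is an illustrative helper rather than a function from any statistical package.

```python
import random
from collections import Counter


def quartile_table(records):
    """Count subjects per (treatment, client_level, quartile) cell.

    records -- list of (treatment, client_level, cost) tuples.
    Subjects are ranked on cost across ALL groups, then split into four
    equal-sized quartiles; ties are broken by the order of the input.
    """
    ranked = sorted(records, key=lambda rec: rec[2])
    n = len(ranked)
    table = Counter()
    for rank, (treatment, client_level, _cost) in enumerate(ranked):
        quartile = rank * 4 // n + 1  # 1 = lowest-cost quarter, 4 = highest
        table[(treatment, client_level, quartile)] += 1
    return table


# Simulated 6-month costs: 80 consumers in each of the four groups (hypothetical).
rng = random.Random(0)
records = []
for treatment in ("A-1", "A-2"):
    for client_level in ("B-1", "B-2"):
        for _ in range(80):
            cost = rng.lognormvariate(7.0, 0.8)  # positively skewed, like real cost data
            records.append((treatment, client_level, cost))

table = quartile_table(records)
for treatment in ("A-1", "A-2"):
    for client_level in ("B-1", "B-2"):
        row = [table[(treatment, client_level, q)] for q in (1, 2, 3, 4)]
        print(treatment, client_level, row)
```

Because the ranking is done over the pooled sample, each quartile column holds exactly 25% of the subjects regardless of how skewed the raw costs are. This is why the approach is relatively insensitive to the shape of the cost distribution.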

The cumulative costs of treatment for each of the 320 consumers can be calculated for the first 6 months of treatment. These include the costs of personnel time and the materials consumed by agency personnel while serving the consumers over their first 6 months in the community support program. If cumulative costs of serving these consumers over the 6-month period were independent of both treatment and interpersonal communication skills, then it could be expected that the 80 subjects within each of the four groups would be evenly distributed across the cells of Table 8.2 (i.e., 20 subjects per cell). Because the columns represent the four respective quartiles, the columns will always sum to 25% of the sample. Under the null hypothesis for all effects, there would be 20 consumers in each cell, indicating that the distribution of cumulative service costs is independent of both treatment and client characteristic. The most cost-efficient group would be the group-row with the largest observed cell frequencies in the lower quartile cells (Q-1 and Q-2) and the smallest observed cell frequencies in the higher quartile cells (Q-3 and Q-4). Although the logic here is that of a chi-square test of independence (testing whether row assignments are independent of column outcomes), the multidimensional classification (treatment group-by-client characteristic) nullifies the simple test of independence provided by the ordinary chi-square test. Log-linear analysis can provide a test of a cell frequency's independence of the association with the combinations of column and multiple row classifications that define the cell. As with the classical test of independence (chi-square), observed cell values are contrasted with expected cell values. Given that the cells are embedded in a multivariate classification scheme (three or more classes), the expected cell values need to be adjusted for main and first-order interaction effects.
The mathematical technique employed is to model each cell frequency using a natural log transformation of the observed frequencies. This permits the development of additive rather than multiplicative models to describe the relations among classifications. Considering the example, there is interest in testing the independence of each of the two main effects (treatment type and consumer characteristic) and the interaction with quartile ranking of cumulative service costs. What is being assessed is the likelihood that the magnitude of service costs (level of Qi) is associated with type of treatment (Aj), or client characteristic (Bk), or both. The natural log of the expected cell frequency, f, for the ijkth cell is, in its saturated form,

$$\ln f_{ijk} = \mu + \omega_{Q_i} + \omega_{A_j} + \omega_{B_k} + \omega_{QA_{ij}} + \omega_{QB_{ik}} + \omega_{AB_{jk}} + \omega_{QAB_{ijk}}$$

where µ is the average of all of the natural log frequencies within the table. Each of the omega terms is a parameter estimated for one of the marginal effects. Each of the marginal effects is obtained in a fashion similar to a univariate analysis of variance. This can be seen, by example, in computing the parameter for Aj, $\omega_{A_j} = (\mu_j - \mu)$, where µj is the average of the natural log of the cell frequencies contained within Aj. The test statistic for the interaction of costs by treatment (Q by A) would be derived from the ordinary chi-square test for independence,

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

but employing the natural log values, an alternative and widely used statistic called the Likelihood Ratio, L2, is computed as

$$L^2 = 2 \sum f_o \ln\left(\frac{f_o}{f_e}\right)$$

where $f_o$ and $f_e$ are the observed and expected frequencies of a cell, and the sums run over the cells of the Q-by-A table.
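The two statistics can be compared numerically on a small example. The sketch below uses their standard textbook definitions, chi-square = Σ (observed − expected)² / expected and L2 = 2 Σ observed × ln(observed / expected), with expected frequencies taken from the row and column totals of a two-way table. The frequencies are hypothetical, chosen so that each quartile column holds 80 of 320 subjects.

```python
import math


def expected_counts(observed):
    """Expected frequencies under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def pearson_chi_square(observed):
    """Ordinary chi-square statistic for a two-way table."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


def likelihood_ratio(observed):
    """Likelihood ratio statistic L2; empty cells contribute zero."""
    expected = expected_counts(observed)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(observed, expected)
                   for o, e in zip(obs_row, exp_row)
                   if o > 0)


# Hypothetical treatment-by-cost-quartile table (rows A-1, A-2; columns Q-1..Q-4).
q_by_a = [
    [55, 45, 35, 25],  # A-1: more consumers in the low-cost quartiles
    [25, 35, 45, 55],  # A-2: more consumers in the high-cost quartiles
]

chi_sq = pearson_chi_square(q_by_a)
l2 = likelihood_ratio(q_by_a)
df = (len(q_by_a) - 1) * (len(q_by_a[0]) - 1)
print(f"chi-square = {chi_sq:.2f}, L2 = {l2:.2f}, df = {df}")
```

With samples of this size the two statistics are close, and both are referred to the chi-square distribution with the same degrees of freedom. This sketch covers only the two-way Q-by-A case; the three-way design in the example requires a log-linear routine from a statistical package.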

Most statistical packages require that the user identify a design describing the interactions of interest and set a hierarchical order to the effects of interest. For example, the highest order interaction is Q-by-A-by-B, but there are only two other terms of interest: Q × A and Q × B. As is true of any hierarchical model, if the full (second-order) interaction of Q × A × B is significant, then follow-up tests must be contingency tables investigating simple main effects and the interactions. Because the follow-up tests could inflate Type I error, two precautions are recommended. First, plan the follow-up tests in advance, restricting the number of planned comparisons to the degrees of freedom available (three in the example used here). Second, adjust the Type I error level of the follow-up tests to be more conservative using the Bonferroni correction (e.g., dividing the Type I error rate used to test the interaction by the number of degrees of freedom: .05 ÷ 3 = .017). The two major limitations of a log-linear approach are that relatively large samples are required to obtain stable results, and that the analysis leads to conclusions regarding treatment cost-efficiency and not cost-effectiveness or benefit. The issue of sample size can sometimes be handled by careful consideration of expected outcomes. As with most chi-square techniques, expected cell frequencies should be greater than or equal to five for the highest order of classification (the ijkth cell in the current example). It is possible to establish a model that excludes certain cells from the analysis, provided that there is a logically defensible reason why those cells ought to contain a count approaching zero. This could happen when contrasting a very inexpensive procedure with a very expensive procedure, where the investigator is interested in the middle-level cost values and the interactions with two or more predictor variables.
In this case, it is quite possible for the cells in the lowest quartile for the very expensive procedure to have expected frequencies under five. Most computer packages permit the user to specify the cell structure and the model to be tested. If there is a good rationale for zero frequency cells, then this option should be used in analyzing the data. The issue that this design attends only to cost-efficiency and not to cost-effectiveness is best handled by treating this analysis as part of a larger analysis in which tests of treatment effectiveness are considered alongside the test of treatment costs, as illustrated by Fishman (1975). He used a strategy that considers the results of a treatment effectiveness study along with a cost-efficiency analysis (see Table 8.3). Fishman recommended a two-dimensional array contrasting the results of the cost study (the columns in Table 8.3) with the results of the effectiveness study (the rows in Table 8.3). For seven of the nine combinations of dual outcome cost analyses, an investigator or policymaker would be able to decide on the most cost-effective choice. The need to consider an outcome (effectiveness) study along with a cost-efficiency study also applies when performing the variance and regression analyses of costs (see the discussion that follows).

Multivariate and Univariate Regression and Analysis of Variance of Effort Measures. There are some interesting possibilities when considering regression or variance analysis with cumulative costs, dosage, or restrictiveness measures. One possibility is to use an effort measure alongside progress or outcome measures in a multivariate regression or variance analysis. This would address the research issue of whether the independent (treatment or consumer characteristic) variables produce differences in client outcome profiles alongside differences in the cumulative costs of serving them.
It is obvious that when consumers improve quickly, less long-term effort is needed; and when they are slow to react to treatment and slow to change, more or extended effort is required. Thus, it is defensible to include an effort measure along with the progress or outcome measure in the prediction equation that defines the regression or the variance analysis.
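The joint cost-by-outcome decision rule attributed to Fishman (1975) and summarized in Table 8.3 can be expressed as a small function. This is a sketch of the logic of the table as read here, not Fishman's own formulation. The argument names and the string labels are hypothetical conventions introduced for the example.

```python
def choose(better_outcome, lower_cost):
    """Return the decision for one cell of the cost-by-outcome array.

    better_outcome -- "A", "B", or "equal": which treatment had the better outcome
    lower_cost     -- "A", "B", or "equal": which treatment had the lower cumulative cost
    """
    if better_outcome == "equal" and lower_cost == "equal":
        return "Choose either"
    if better_outcome == "equal":
        return f"Choose {lower_cost}"      # same outcome: take the cheaper one
    if lower_cost == "equal" or lower_cost == better_outcome:
        return f"Choose {better_outcome}"  # better outcome at no extra cost
    return "No decision"                   # better outcome costs more: a value judgment


# Enumerate all nine combinations of outcome result and cost result.
levels = ("A", "equal", "B")
grid = {(outcome, cost): choose(outcome, cost) for outcome in levels for cost in levels}
decisive = [cell for cell, decision in grid.items() if decision != "No decision"]
print(f"{len(decisive)} of {len(grid)} combinations yield a decision")
```

The two undecided cells are those in which the more effective treatment is also the more expensive one. Statistics alone cannot settle those cases, and the policymaker has to weigh the added benefit against the added cost.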

TABLE 8.3 Decisions Possible When Cumulative Costs and Outcome Effectiveness Are Jointly Compared

| Outcome | Cumulative Episode Costs: A < B | Cumulative Episode Costs: A = B | Cumulative Episode Costs: A > B |
| --- | --- | --- | --- |
| A better than B | Choose A | Choose A | No decision |
| A equal to B | Choose A | Choose either | Choose B |
| B better than A | No decision | Choose B | Choose B |

Another possibility is to investigate the covariance structures of effort with consumer characteristics as they relate to outcome in a multiple regression analysis. Here the interaction (covariance) of the level of effort with the level of client characteristic is treated as a predictor variable, and one or more client progress or outcome behavior measures serve as the dependent variable. The specific test is whether the regression slopes on one predictor dimension (e.g., dosage) differ across levels of the other dimension (e.g., initial severity of disorder prior to treatment).

Finally, a major cautionary note regarding positively skewed distributions needs to be restated here. Dropping the data for the outlying top 5% to 20% has been used in many diagnostic-related grouping (DRG) studies and is accepted in some quarters. Others have felt that eliminating data should be avoided and that a transformation that will approximate normality should be used instead. Some outliers are so extreme that even accepted transformations (e.g., arcsine, log, or natural log) do not sufficiently modify the distribution to be acceptably normal. In these cases, it is often necessary to drop the drastically extreme cases and do a normalizing transformation as well.

Question 5 Did the Person's State(s) or Levels of Functioning Stabilize?

Although traditional research and statistical methods are designed to test for differences in behaviors or rates rather than for stability, it is possible to evaluate a prediction regarding stabilizing functional behaviors.
The key is to carefully develop questions that follow the logic of the treatment goal of stability and to identify and collect the data needed for the corresponding dependent measures. The investigative methods and statistical analysis will follow from well-formulated questions. The following is a presentation of examples of several lines of questions.

Contrasting Measures of Variations in Behaviors for Specific Periods of Time. Researchers often ask themselves, "Did fluctuations per unit of time (day, week, month, year) in psychological state or functioning change (decrease) as a result of the intervention? If so, what was the duration, amount, or rate of this change in variation from an unstable to a stable state within and across groups?" Here, trend or growth curve analysis can be applied, depending on the specification of the dependent measure. Some examples are: Count the number of fluctuations in a target behavior of a given magnitude per unit time; measure the duration of time the person remains within a given range of functioning; or estimate the rate of change from an unstable state to a stable state following a crisis.
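Two of the dependent measures just listed can be computed directly from a series of repeated assessments. The sketch below is illustrative only. The weekly functioning scores, the fluctuation threshold of 10 points, and the "stable" range of 45 to 55 are hypothetical values, and the function names are not drawn from any instrument or package.

```python
def count_fluctuations(scores, magnitude):
    """Number of changes between consecutive assessments of at least `magnitude` points."""
    return sum(1 for prev, cur in zip(scores, scores[1:])
               if abs(cur - prev) >= magnitude)


def longest_run_in_range(scores, low, high):
    """Longest run of consecutive assessments that stay within [low, high]."""
    best = run = 0
    for score in scores:
        run = run + 1 if low <= score <= high else 0
        best = max(best, run)
    return best


# Hypothetical weekly functioning scores for one person over 12 weeks.
weekly = [42, 55, 38, 60, 47, 52, 50, 51, 49, 50, 52, 51]

early = count_fluctuations(weekly[:6], magnitude=10)   # weeks 1-6
late = count_fluctuations(weekly[6:], magnitude=10)    # weeks 7-12
stable_weeks = longest_run_in_range(weekly, low=45, high=55)
print(f"large fluctuations: weeks 1-6 = {early}, weeks 7-12 = {late}")
print(f"longest stable run: {stable_weeks} weeks")
```

Measures such as these turn the goal of "stable functioning" into a quantity that is predicted to decrease (fluctuations) or increase (time in range). The prediction can therefore be tested as a difference rather than by accepting a null hypothesis.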

Contrasting Differences in Odds or Probabilities. Results may also generate another question (stated in its formal null form), "What are the odds during a given unit of time for no housing or civil crisis to occur for individuals within and across groups?" Here logit, probit, or log-linear analytical approaches could be employed to contrast the outcomes of two or more treatment and/or client groups. As is true of all procedures that analyze relative frequencies or probabilities within and across categories, the definitions of categories or baseline conditions are very important. How does one define a housing or a civil crisis? Two criteria should be reviewed. First, the scheme for categorization must identify individuals as belonging to one and only one group, and as being in one and only one outcome category. Formally, this is the criterion that all events be mutually exclusive and exhaustive. The second criterion is to assure that assumptions of independence or dependence are logically defensible in terms of the clinical theory.

Still another line of questioning would follow traditional statistical tests where changes in the magnitude of the behaviors can be contrasted between groups over time: "What is the number of productive employment hours (independent of wages earned) by an individual?" Here univariate or multivariate analysis of variance or regression analysis can be readily applied. If sample size and precision permit, then structural equation modeling can also be applied. Moreover, the logic of the question could be easily modified to apply growth curve analysis of the rates of change in these magnitude measures.

Thus, for Question 5, the issue is not so much one of what statistics to use, but one of carefully formulating a question (prediction) that can be analyzed. Logical traps that would require acceptance of the null hypothesis to demonstrate the worth of an intervention must be avoided.
Predictions of stabilized functioning can be, and should be, tested if they are to be considered as reasonable treatment or service goals. There is a balance required. On the one hand, researchers do not want to compromise a clinical theory by fashioning a testable question. On the other hand, a good clinical theory ought to be testable. When confronting treatment goals of "stable functioning," most researchers have to learn to revise the way they formulate a testable question.

Question 6 What Occurred During the Process of Treatment?

What characteristics changed during the process of treatment? At what stages in the process did the change occur? How did the changes relate to final outcome characteristics?

Historical Notes on Process Research and Its Measures. To date, most "process" research has been conducted within the context of specific forms of psychotherapy: individual psychotherapy (Orlinsky & Howard, 1986), group psychotherapy (Kaul & Bednar, 1986), and family therapy (Gurman, Kniskern, & Pinsof, 1986). No controlled research has been published outside of the psychotherapy literature. For example, none was found for the treatment team or case management or psychosocial approaches used in treating persons with a serious mental illness. The focus of the psychotherapy process research has, for the most part, been on the various aspects of the relation between the therapist and the client or the client's social system (e.g., family). The process measures employed are seldom standardized in the same fashion as the outcome measures discussed (for the most part) in this text. Instead, the process measures focus on the observable behaviors taken from videotapes or transcripts, or from reports by the client or therapist about what occurred during or between the therapeutic interactions. Although reliability studies are frequently reported on the process measures, the basis for establishing validity of these measures is not clearly understood. Some session report techniques have been extensively studied (e.g., the Orlinsky & Howard, "Therapy Session Report," 1975). However, the majority of the techniques reported in the literature were specifically designed for a particular study. Some of the more popular instruments used in recent years are the Therapy Session Report (Orlinsky & Howard, 1975), the Vanderbilt Negative Indicators Scale (Sachs, 1983), the Structural Analysis of Social Behavior (Benjamin, Foster, Roberto, & Estroff, 1986), the Helping Alliance Scale (Luborsky, Crits-Christoph, Alexander, Margolis, & Cohen, 1983), and the Working Alliance Scale (Horvath & Greenberg, 1988). Others have developed systematic taxonomies for evaluating the content or tone of therapy sessions (Elliott, 1985; Stiles & Shapiro, 1994). Although none have norms that can be applied in the same fashion as those used with traditional psychological testing, each has a record of several published studies showing some degree of significant discriminative validity.

Should process be related to outcome? Orlinsky and Howard (1986) and Silberschatz (1994) argued strongly that process ought to be related to outcome. Stiles and Shapiro (1994) argued that a true process measure should not be correlated with outcome. It will be left to the reader to decide which side of the argument to take, or to join those who still see it as an issue to be empirically settled (Newman, 1994). Orlinsky and Howard (1986) offered a fruitful "generic model" that outlines the process-outcome research literature. The outline presented in the generic model, and the literature cited, is recommended as a good starting point when designing a study on process-outcome relationships.
The generic model has five interrelated components describing the therapeutic process:

(1) The therapeutic contract is the purpose, format, terms and limits of the therapeutic enterprise. (2) Therapeutic interventions comprises the 'business' of helping carried on under the terms of the therapeutic contract. (3) The therapeutic bond is an aspect of the relationship that develops between the participants as they perform their respective parts in the therapeutic interventions. (4) Patient's self-relatedness refers to the patient's ability to absorb the impact of therapeutic interventions and their therapeutic bond. (5) Therapeutic realizations, such as insight, catharsis, discriminative learning, and so on, occur within the session and presumably are productive of changes in the patient's life or personality. (Orlinsky & Howard, 1986, pp. 312-313)

The research covered within each of the five areas is well documented in Orlinsky and Howard (1986). This research literature, for the most part, used the statistical procedures already covered in this chapter. Thus, no further discussion is needed beyond the recommendation that the generic model be used to identify an area of interest in process research and then to review the studies cited as a guide for designs and statistical models. Having said this, consider one additional approach that was not covered by the literature in Orlinsky and Howard (1986), but nevertheless shows significant promise.

Interpersonal Interaction Analysis in Therapy. Most psychosocial therapies involve the exchange of information and feelings among those involved. The interpersonal interaction process in therapy is not haphazard when done by a professional, but rather is goal directed. It should follow a predictable pattern over therapy sessions, with understandable variations as a function of the content of the material covered within a therapy session and/or the stage of the treatment.

The content of a therapy session that is to be analyzed is what is said, by whom, and in what context (i.e., within the context of what was said before). During a session, people (clients or therapists) can react to what others said or to their own line of thought from one utterance to another. Consider a second example, as shown in Table 8.4, that was offered by Canfield, Walker, and Brown (1991) as they attempted to describe the interpersonal and the intrapersonal interactions that can take place during a therapy session. The investigators classified all of the words in an utterance by the client and by the therapist according to whether each word in the utterance represented the degree of positive or negative valence (from +4 to -4) on each of three dimensions: emotional, cognitive, and contract. Table 8.4 describes the results of this analysis in one therapy session by giving the correlations between pairs of classes of utterances during the verbal interactions between the client and therapist, as well as the correlations of pairs of intrapersonal utterances. The data in Table 8.4 are presented as three correlation matrices. The 36 (6 × 6) entries on the lower left represent the contingent relationships between the client and the therapist. According to Canfield et al. (1991),

the correlation of .40 between positive emotion and positive cognition in the client's intrapersonal correlation matrix indicates that if a client utterance was rated high in positive emotion, it was likely also to be rated high in positive cognition. Furthermore, the correlation of 0.46 in the therapist's intrapersonal correlation matrix between negative emotion and negative contract indicates that when the therapist utterance was rated high in negative emotion, it was likely to be rated high in contract as well. (p. 62)

The major asset of developing the assessment of within-session interactions in this manner is that once the correlation matrix has been developed, all of the procedures of regression, variance, or path analysis can be applied. Consider some examples: Will the matrices be the same over different stages of the treatment process? Will sessions rated "rough" have a different correlation matrix than "information" sessions? What is the content of sessions that contained a "critical incident" that changed the focus of therapy? It is recommended that the reader focus on the methods rather than on the specific content of the analysis. Other investigators may choose to use different systems for classifying utterances within therapy sessions. The form of the analysis could still be applied to other classification systems. Although the amount of effort required to do these forms of analyses is high, they offer great potential for understanding the basic ingredients of the therapeutic process and their interactions. When, in the not-too-distant future, the spoken word can be inexpensively digitized onto computer files for analysis, the applications of these procedures should become as common as outcome studies.

Conclusions

The application of statistical methods to the analysis of data collected to address clinical issues has been traditionally awkward. Statistics are typically taught with examples from the literature. This chapter reversed the order: The clinical issue, along with its theoretical basis and clinical empirical findings, was presented first. Using this base, the chapter explored the relative merits of various statistical methods that could be applied to data collected to address each clinical issue. The expectation is that if clinical researchers use this approach, they may have a better chance of generating studies, and analyses of the data, that maintain the integrity of the study's clinical issues.

TABLE 8.4 Interpersonal Interaction-contingency Matrix (Lower Left 6 × 6 Entries); Intrapersonal Correlation Matrix for the Client (Upper Triangular Array); and Intrapersonal Correlation Matrix for the Therapist (Lower Triangular Array)

Columns 1 to 6 are client utterances and columns 7 to 12 are therapist utterances. Within each speaker the columns are Emotion (+, -), Cognition (+, -), and Contract (+, -).

| Variable | 1(+) | 2(-) | 3(+) | 4(-) | 5(+) | 6(-) | 7(+) | 8(-) | 9(+) | 10(-) | 11(+) | 12(-) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Client 1. + Emotion | | | | | | | | | | | | |
| Client 2. - Emotion | .39 | | | | | | | | | | | |
| Client 3. + Cognition | .40 | .17 | | | | | | | | | | |
| Client 4. - Cognition | .06 | .07 | .08 | | | | | | | | | |
| Client 5. + Contract | .29 | .36 | .29 | .23 | | | | | | | | |
| Client 6. - Contract | .25 | .41 | .14 | .18 | .44 | | | | | | | |
| Therapist 1. + Emotion | .48 | .23 | .15 | .12 | .12 | .01 | | | | | | |
| Therapist 2. - Emotion | .09 | .01 | .14 | .10 | .02 | .09 | .30 | | | | | |
| Therapist 3. + Cognition | .03 | .17 | .10 | .04 | .34 | .05 | .19 | .28 | | | | |
| Therapist 4. - Cognition | .10 | .02 | .14 | .07 | .07 | .18 | .23 | .11 | .06 | | | |
| Therapist 5. + Contract | .40 | .17 | .20 | .11 | .12 | .22 | .33 | .44 | .13 | .28 | | |
| Therapist 6. - Contract | .05 | .08 | .29 | .10 | .00 | .07 | .17 | .46 | .20 | .12 | .37 | |
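The intrapersonal part of a matrix like Table 8.4 can be built from utterance-level ratings with an ordinary Pearson correlation. The sketch below is not Canfield et al.'s (1991) procedure. The ratings are hypothetical, the category names are invented for the example, and only three of the six client categories are shown.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length rating lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Correlate every pair of rating categories; returns {(row, col): r}."""
    names = list(columns)
    return {(row, col): pearson_r(columns[row], columns[col])
            for row in names for col in names}


# Hypothetical ratings (0 to 4) of eight consecutive client utterances.
client_ratings = {
    "pos_emotion":   [3, 1, 0, 2, 4, 1, 0, 3],
    "pos_cognition": [2, 1, 1, 3, 3, 0, 1, 2],
    "neg_contract":  [0, 2, 3, 1, 0, 2, 1, 0],
}

matrix = correlation_matrix(client_ratings)
for (row, col), r in matrix.items():
    if row < col:  # print each off-diagonal pair once
        print(f"{row} x {col}: r = {r:.2f}")
```

The interpersonal (contingency) block would be built the same way, pairing each client utterance with the therapist utterance that follows it. Once the matrices exist for different sessions or stages of treatment, they can be compared with the regression, variance, or path procedures described in the text.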

Acknowledgments

Thanks are owed to our thoughtful and patient colleagues, Siobhan A. Morse, for her recommendations and comments on drafts of this chapter, and José Szapocznik and colleagues at the Center for Family Studies, for their support and comments.

References

Ayoub, C., & Jacewitz, J. (1982). Families at risk of poor parenting: A descriptive study of 60 at risk families in a model prevention program. Child Abuse and Neglect, 6, 413-422.
Benjamin, L., Foster, S., Roberto, L., & Estroff, S. (1986). Breaking the family code: Analysis of videotapes of family interactions by structural analysis of social behavior (SASB). In L. Greenberg & W. Pinsof (Eds.), The psychotherapeutic process: A research handbook (pp. 391-438). New York: Guilford.
Bentler, P.M. (1989). EQS structural equations program manual. Los Angeles: BMDP Statistical Software.
Bentler, P.M. (1990). Comparative fit indexes in structural equation modeling. Psychological Bulletin, 88, 588-606.
Bentler, P., & Bonett, D. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Beutler, L.E. (1991). Have all won and must all have prizes? Revisiting Luborsky et al.'s verdict. Journal of Consulting and Clinical Psychology, 59, 226-232.
Bollen, K.A. (1989). Structural equations with latent variables. New York: Wiley.
Bryk, A.S., & Raudenbush, S.W. (1987). Application of hierarchical linear models to assessing change. Psychological Bulletin, 101, 147-158.
Bryk, A.S., Raudenbush, S.W., Seltzer, M., & Congdon, R.J. (1986). An introduction to HLM: Computer program and user's guide. Chicago: University of Chicago.
Canfield, M.L., Walker, W.R., & Brown, L.G. (1991). Contingency interaction analysis in psychotherapy. Journal of Consulting and Clinical Psychology, 59, 58-66.
Carter, D.E., & Newman, F.L. (1975). A client oriented system of mental health service delivery and program management: A work-book and guide (Series FN No. 4, DHHS Publication No. 80-307). Rockville, MD: Mental Health Service System Reports.
Collins, L.M., & Horn, J.L. (Eds.). (1991). Best methods of the analysis of change: Recent advances, unanswered questions, future directions. Washington, DC: American Psychological Association.
Cronbach, L.J. (1987). Statistical tests for moderator variables: Flaws in analyses recently proposed. Psychological Bulletin, 102, 414-417.
Cronbach, L.J., & Furby, L. (1970). How we should measure "change" - or should we? Psychological Bulletin, 74, 68-80.
Cronbach, L.J., & Snow, R.E. (1977). Aptitudes and instructional methods: A handbook for research on interactions. New York: Irvington.
Elliott, R. (1985). Helpful and non-helpful events in brief counseling interviews: An empirical taxonomy. Journal of Counseling Psychology, 32, 307-321.
Fishman, D.B. (1975). Development of a generic cost-effectiveness methodology for evaluating patient services of a community mental health center. In J. Zusman & C.R. Wurster (Eds.), Evaluation in alcohol, drug abuse, and mental health service programs (pp. 139-159). Lexington, MA: Heath.
Francis, D.J., Fletcher, J.M., Stuebing, K.K., Davidson, K.C., & Thompson, N.M. (1991). Analysis of change: Modeling individual growth. Journal of Consulting and Clinical Psychology, 59, 27-37.
Garfield, S.L. (1986). Research on client variables in psychotherapy. In S.L. Garfield & A.E. Bergin (Eds.), Handbook of psychotherapy and behavior change (pp. 503-543). New York: Wiley.
Gurman, A.S., Kniskern, D.P., & Pinsof, W.M. (1986). Research on marital and family therapies. In S.L. Garfield & A.E. Bergin (Eds.), Handbook of psychotherapy and behavior change (pp. 525-564). New York: Wiley.
Hargreaves, W.A., Gaynor, J., Ransohoff, R., & Attkisson, C.C. (1984). Restrictiveness of care among the severely mentally disabled. Hospital and Community Psychiatry, 35, 706-709.
Hayduk, L.A. (1987). Structural equation modeling with LISREL: Essentials and advances. Baltimore, MD: Johns Hopkins University Press.
Horvath, A.O., & Greenberg, L. (1986). The development of the Working Alliance Inventory. In L. Greenberg & W.M. Pinsof (Eds.), The psychotherapeutic process (pp. 529-556). New York: Guilford.
Howard, K.I., Kopta, S.M., Krause, M.S., & Orlinsky, D.E. (1986). The dose-effect relationship in psychotherapy. American Psychologist, 41, 159-164.
Jaccard, J., & Wan, C.K. (1995). Measurement error in the analysis of interaction effects between continuous predictors using multiple regression: Multiple indicator and structural equation approaches. Psychological Bulletin, 117, 348-357.
Jöreskog, K.G., & Sörbom, D. (1988). LISREL VII. Chicago: SPSS.
Jöreskog, K.G., & Sörbom, D. (1993). LISREL 8: Structural equation modeling with simple command language. Chicago: Scientific Software.
Kaul, T.J., & Bednar, R.L. (1986). Research on group and related therapies. In S.L. Garfield & A.E. Bergin (Eds.), Handbook of psychotherapy and behavior change (pp. 671-714). New York: Wiley.
Kazdin, A.E. (1986). The evaluation of psychotherapy: Research design and methodology. In S.L. Garfield & A.E. Bergin (Eds.), Handbook of psychotherapy and behavior change (pp. 23-68). New York: Wiley.
Kenny, D.A. (1979). Correlation and causation. New York: Wiley.
Luborsky, L., Crits-Christoph, P., Alexander, L., Margolis, M., & Cohen, M. (1983). Two helping alliance methods for predicting outcomes of psychotherapy: A counting signs versus a global rating method. Journal of Nervous and Mental Disease, 171, 480-492.
Lord, F.M. (1963). Elementary models for measuring change. In C.W.
Harris (Ed.), Problems in measuring change (pp. 21-38). Madison: University of Wisconsin Press. Lyons, J.S., & Howard, K.I. (1991). Main effects analysis in clinical research: Statistical guidelines for disaggregating treatment groups. Journal of Consulting and Clinical Psychology, 59, 745-748. McClelland, G.H., & Judd, C.M. (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114, 376-390. Myers, J.L., & Well, A.D. (1991). Research design and Statistical analysis. New York: Harper-Collins. Newman, F.L. (1983). Therapists' evaluations of psychotherapy. In M. Lambert, E. Christensen, & R. DeJulio (Eds.), The Assessment of psychotherapy outcome (pp. 497-534). New York: Wiley. Newman, F.L. (1994). When is observing non-significance enough? (Introduction to Special Feature). Journal of Consulting and Clinical Psychology, 62, 941. Newman, F.L., Ciarlo, J.A., Carpenter, D. (1998). Guidlines for selecting psychological instruments for treatment planning and outcome. In M. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates. Newman, F., DeLiberty, R., Hodges, K., & McGrew, J. (1997, June). Hoosier assurance plan: Linking level of care to level of need. Paper presented at the National Conference for Mental Health Statistics, Washington, DC. Newman, F.L., Griffin, B.P., Black, R.W., & Page, S.E. (1989). Linking level of care to level of need: Assessing the need for mental health care for nursing home residents. American Psychologist, 44, 1315-1324. Newman, F.L., Heverly, M.A., Rosen, M., Kopta, S.M., & Bedell, R. (1983). Influences on internal evaluation data dependability: Clinicians as a source of variance. In A.J. Love (Ed.), Developing effective internal evaluation: New directions for program evaluation (No. 20, pp. 61-69). San Francisco: Jossey-Bass. Newman, F.L., & Howard, K.I. (1986). 
Therapeutic effort, treatment outcome, and national health policy. Journal of Consulting and Clinical Psychology. 41, 181-187. Newman, F.L., & Sorensen, J.E. (1985). Integrated clinical and fiscal management in

< previous page

page_264

next page >

< previous page

page_265

next page > Page 265

mental health: A guidebook. Norwood, NJ: Ablex. Newman, F.L., Tippet, M.T., & Johnson, D.A. (1992, July). A screening instrument for consumer placement in a level of CSP: Psychometric properties. Paper presented at the NIMH National Conference on Mental Health Statistics in Washington, DC. O'Brien, R.G., & Kaiser, M.K. (1985). MA-NOVA method for analyzing repeated measures designs. Psychological Bulletin, 97, 316-333. Orlinsky, D.E., & Howard, K.I. (1975). Varieties of psychotherapeutic experience. New York: Teacher's College Press. Orlinsky, D.E., & Howard, K.I. (1986). Process and outcome in psychotherapy. In S.L. Garfield and A.E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 311-381). New York: Wiley. Ransohoff, P., Zackary, R.A., Gaynor, J.A., & Hargreaves, W.A. (1982). Measuring the restrictiveness of psychiatric care. Hospital and Community Psychiatry, 33, 361-366. Rogosa, D.R., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 92, 726-748. Rogosa, D.R., & Willett, J.B. (1983). Demonstrating the reliability of the difference score in the measurement of change. Journal of Educational Measurement, 20, 335-343. Rogosa, D.R., & Willett, J.B. (1985). Understanding correlates of change by modeling individual differences in growth. Psychometrika, 50, 61-72. Rosnow, R.L., & Rosenthal, R. (1989). Definition and interpretation of interaction effects. Psychological Bulletin, 105, 143-146. Sachs, J.S. (1983). Negative factors in brief psychotherapy: An empirical assessment. Journal of Consulting and Clinical Psychology, 51, 557-564. Saunders, S.M. (1991). The process of seeking psychotherapy: Routes, difficulty and social support. Unpublished doctoral dissertation, Northwestern University, Psychology Department, Evanston, IL. Scandura, T.A., & Tejeda, M.J. (1997, August). Models as fiction in structural equation modeling. 
Paper presented at the annual meeting of the Academy of Management, Boston, MA. Shadish, W.R., & Sweeney, R.B. (1991). Mediators and moderators in meta-analysis: There's a reason we don't let dodo birds tell us which psychotherapies should have prizes. Journal of Consulting and Clinical Psychology, 59, 883-893. Shoham-Soloman, V. (1991). Introduction to special section on client-therapy interaction research. Journal of Consulting and Clinical Psychology, 59, 203-204. Shoham-Soloman, V., Avner, R., & Neeman, R. (1989). Your are changed if you do and changed if you don't: Mechanisms underlying paradoxical interventions. Journal of Consulting and Clinical Psychology, 57, 590-598. Silbershatz, G. (1994). Spurious or uncorrelated? Comments on Stiles and Shapiro. Journal of Consulting and Clinical Psychology, 62, 949-951. Smith, B., & Sechrest, L. (1991). Treatment of aptitude x treatment interactions. Journal of Consulting and Clinical Psychology, 59, 233-244. Snow, R.E. (1991). Aptitude-treatment interaction as a framework for research on individual differences in psychotherapy. Journal of Consulting and Clinical Psychology, 59, 205-216. Stiles, W.B., & Shapiro, D.A. (1994). Abuse of the drug metaphor: Psychotherapy process-outcome correlations. Journal of Consulting and Clinical Psychology 62, 942-948. Turner, R.M., Newman, F.L., & Foa, E. (1983). Relating obsessive-compulsive emotional structure to the cost and outcome of long term behavior therapy. Journal of Consulting and Clinical Psychology, 59, 233-244. Uehara, E., Smukler, M., & Newman, F.L. (1994). Linking resource use to consumer level of need in a local mental health system: Field test of the "LONCA" case mix method. Journal of Consulting and Clinical Psychology, 62, 695-709. Webster, H., & Bereiter, C. (1963). The reliability of changes measured by mental test scores. In C.W. Harris (Ed.), Problems in measuring change (pp. 39-59). Madison, WI: University of Wisconsin Press. Willett, J.B. (1988). 
Questions and answers in the measurement of change. In E.Z. Rothkopf (Ed.), Review of research in education (Vol. 15, pp. 345-422). Washington, DC: American Educational Research Association.

< previous page

page_266

next page > Page 266

Willett, J.B. (1989). Some results on reliability for the longitudinal measurement of change: Implications for the design of studies of individual growth. Educational and Psychological Measurement, 49, 587-602. Willett, J.B., Ayoub, C.C., & Robinson, D. (1991). Using growth modeling to examine systematic differences in growth: An example of change in the functioning of families at risk of maladaptive parenting, child abuse, or neglect. Journal of Consulting and Clinical Psychology, 59, 38-47. Willett, J.B., & Sayer, A.G. (1994). Using covariance structure analysis to detect correlates and predictors of individual change over time. Psychological Bulletin, 116, 363-381. Yates, B.T. (1980). Improving effectiveness and reducing costs in mental health. Springfield, IL: Thomas. Yates, B.T. (1996). Analyzing costs, procedures, processes, and outcomes in human services. Thousand Oaks, CA: Sage. Yates, B.T., & Newman, F.L. (1980). Findings of cost-effectiveness and cost-benefit analyses of psychotherapy. In G. VandenBos (Ed.), Psychotherapy: From practice to research to policy (pp. 163-185). Beverly Hills, CA: Sage. Zimmerman, D.W., & Williams, R.H. (1982a). Gain scores in research can be highly reliable. Journal of Educational Measurement, 19, 149-154. Zimmerman, D.W., & Williams, R.H. (1982b). The relative error magnitude in three measures of change. Psychometrika, 47, 141-147. [Page 266(a)]

PART II CHILD AND ADOLESCENT ASSESSMENT INSTRUMENTATION

Chapter 9
Use of the Children's Depression Inventory

Gill Sitarenios
Multi-Health Systems, Inc.

Maria Kovacs
University of Pittsburgh, School of Medicine

From a clinical perspective, a syndrome refers to a characteristic constellation of psychopathologic symptoms and signs. A depressive syndrome typically implies the presence of negative dysphoric mood, and complaints such as a sense of worthlessness or hopelessness, preoccupation with death or suicide, difficulties in concentration or making decisions, disturbance in patterns of sleep and food intake, and reduced energy. A disorder implies the presence of a particular syndrome that has been shown to have the characteristics of a diagnosable condition, that is, it has a recognizable pattern of onset and course, clear negative consequences with respect to the individual's functioning, distinct biologic or related correlates, an association with known etiologic or risk factors, and a course that may be altered in predictable ways by various treatments.

Major depressive disorder and dysthymic disorder are two forms of depressive disorders that affect children and adults. Episodes of major depression in childhood last about 10 months on average and may have psychotic or melancholic features associated with them (Kovacs, Obrosky, Gatsonis, & Richards, 1997). Major depression often is comorbid with other disorders, most commonly with disorders of anxiety and conduct (Kovacs, Gatsonis, Paulauskas, & Richards, 1989; Puig-Antich, 1982; Strober & Carlson, 1982). Major depression in childhood is associated with a high rate of recovery; there is, however, a very high risk of episode recurrence, and an increased risk for the development of other related disorders (Kovacs, 1996a, 1996b; Kovacs et al., 1989; Strober & Carlson, 1982).

Compared to major depression, dysthymic disorder is a milder and possibly less impairing form of depression. However, dysthymia usually lasts longer than major depression, with an average duration of about 3 1/2 years or longer (Kovacs et al., 1997). Similar to major depression, dysthymia has a high rate of eventual recovery. But, it also is associated with a high rate of comorbid psychiatric disorders, and dysthymia increases the risk for major depression and other related conditions (Kovacs, Akiskal, Gatsonis, & Parrone, 1994; Kovacs et al., 1997).

Weiss et al. (1991) noted that depression in childhood, which was once thought to be rare or nonexistent, is now the subject of much clinical and research activity and is currently recognized by almost all authoritative sources (e.g., DSM-IV). In fact, estimates
of prevalence rates of depressive disorders in children have been found to be quite high (e.g., see Kashani et al., 1981) and some clinicians have diagnosed them as early as preschool age (e.g., Kashani & Carlson, 1985). The pattern of symptoms seen in childhood depression is similar to that seen in adults, with similar affective, cognitive, behavioral, and somatic complaints (Kaslow, Rehm, & Siegel, 1984), and there appears to be little variability in the associated features of the disorder across the life span (Kovacs, 1996b). Depressive disorders can disrupt the functioning of children and adolescents in a number of areas, most notably in school, and cause significant developmental delays. Moreover, children who have depressive disorders may have trouble "catching up" in development (Kovacs & Goldston, 1991, p. 389).

Assessment of Depression Using Self-Report

Assessment of depression can focus on the early identification of the extent and severity of depressive symptoms, the diagnosis of depression and associated disorders, and monitoring the effectiveness of interventions. Self-rated inventories have long been a part of the assessment of depressive symptoms in adults (e.g., Beck Depression Inventory; Beck, 1967). Such inventories typically are easy to administer, inexpensive, and readily analyzable. Because they quantify the severity of the depressive syndrome, they have been used for descriptive purposes, to assess treatment outcome, to test research hypotheses, and to select research subjects. However, because self-rated inventories do not assess the temporal features, the onset, the course, or the contributing factors of the syndrome being examined, they cannot yield diagnostic information.

For children, self-report inventories nonetheless provide especially useful information in that many features of depression are internal and are not easily identified by informants such as parents or teachers. Moreover, according to psychological models, children's self-perceptions are of predictive value in their own right (Kovacs, 1992; C.F. Saylor, Finch, Baskin, Furey, & Kelly, 1984a).

The Children's Depression Inventory (CDI) has been one of the most widely used and cited inventories of depression. According to Fristad, Emery, and Beck (1997), the CDI was used in over 75% of the studies with children in which self-report depression inventories were employed. The initial version of the CDI was developed in 1977. Formal publication of the instrument in 1992 increased its accessibility.

This chapter provides a timely opportunity to summarize the research history and usage of the CDI since its inception and publication. The CDI and its components, as well as its various forms, associated manuals, and scoring forms, are described in the first part of this chapter. Current research and theory related to the CDI are also highlighted. The CDI manual (Kovacs, 1992) includes an annotated bibliography of about 150 related research studies up to the end of 1991 and, according to recent literature (Fristad et al., 1997), at least 200 additional articles pertaining to the CDI have been published since that time.

Other goals of this chapter are to examine current usage of the CDI, distinguish proper usage of the instrument from improper usage, and address questions frequently asked by practitioners. The CDI can be useful in the early identification of symptoms and in the monitoring of treatment effectiveness. The CDI also can play a role in the diagnostic process but, as already noted, should not be used alone to diagnose a
depressive disorder. Finally, this chapter describes the ongoing development of the CDI, including anticipated accessories, future research directions, and extended applications.

Summary of the Development of the CDI

The Beck Depression Inventory (Beck, 1967), a clinically based, 21-item, self-rated symptom scale for adults, was the starting point for the development of a paper-and-pencil tool appropriate for children. The research literature supported the decision to use an "adult" scale as the model, given that there appeared to be much overlap between the salient manifestations of depressive disorders in juveniles and in adults (Kovacs & Beck, 1977). Scale construction proceeded in four phases.

Phase I

The first version of the children's inventory (March 1975) was derived with the help of a group of 10- to 15-year-old "normal" youths and similar-age children from an urban inpatient and partial hospitalization program. After the purpose of the scale revision project was explained individually to the children, they were asked for advice on how the items could be worded to make them "clear to kids." Although in this phase of scale construction the Beck item on sexual interest was replaced by an item on loneliness, the content and format of 20 items of the "adult" scale were essentially retained. However, five "Appendix" items, adapted from Albert and Beck (1975), were added concerning school and peer functioning. Piloting yielded further semantic changes.

Phase II

Data from normal youth and children who were under psychiatric-psychological care were used along with a semantic and conceptual item analysis to produce a second major revision (February 1976) that also included a new item on self-blame. This version of the inventory was administered to 39 8- to 13-year-old children who were consecutively admitted to a child guidance center's hospitalization units, 20 "normal" 8- to 13-year-olds with no history of psychiatric contacts, and 127 10- to 13-year-old fifth- and sixth-grade students in the Toronto public school systems. The resultant data were analyzed according to standard psychometric principles and the findings were used to derive a completely new version of the scale. Two of the original 21 items ("shame" and "weight loss") and two of the Appendix items ("family fights" and "self-blame") were replaced by four new items that had face validity and appeared age appropriate (e.g., "feeling unloved").

The CDI item-choice distributions in these samples also revealed that the items could be recast into a three-choice format: One choice reflects "normalcy," the middle choice pertains to definite although not disabling symptom severity, and the other response option reflects a clinically significant complaint. In order to prevent response bias, approximately 50% of the items (randomly selected) were worded so that the first response choice suggested the most pathology, whereas the response choice order was reversed for the remaining items.

Phase III

The newly modified version of the CDI (May 1977) was again pilot tested and sent to colleagues for a critique. A cover page was added with revised instructions and a sample item. Based on the results of piloting, the items were further refined and reworded in order to improve face validity and comprehensibility.

Phase IV

One minor change preceded preparation of the final version of the CDI (August 1979). The score values were eliminated from the inventory and scoring templates were developed.

Current Work

Since the initial development of the CDI, additional psychometric analyses have been conducted. Based on these analyses, five factors have been identified and are fully described in the CDI manual (Kovacs, 1992). A short form of the CDI has been derived as well, and software has been developed for online administration, scoring, and reporting. The instrument is now available in several foreign languages.

Overview of the CDI

The CDI is appropriate for children and adolescents from age 7 to 17. The instrument quantifies a range of depressive symptoms, including disturbed mood, problems in hedonic capacity and vegetative functions, low self-evaluation, hopelessness, and difficulties in interpersonal behaviors. Several items pertain to the consequences of depression with respect to contexts that are specifically relevant to children (e.g., school). Each of the 27 CDI items consists of three choices, keyed 0 (absence of a symptom), 1 (mild symptom), or 2 (definite symptom), with higher scores indicating increasing severity and yielding a total scale score that can range from 0 to 54. In addition to the total score, the CDI also yields scores for five factors or subscales. These factors are labeled Negative Mood, Interpersonal Problems, Ineffectiveness, Anhedonia, and Negative Self-esteem.
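
The scoring scheme described above (27 items keyed 0 to 2, with the printed order of the choices reversed on roughly half of the items) can be sketched as follows. This is only an illustration under stated assumptions: the set of reversed items used here (`REVERSED_ITEMS`) is hypothetical, since the chapter does not list which items are reversed, and the function names are invented for the example rather than taken from any CDI software.

```python
from typing import Sequence

N_ITEMS = 27          # number of CDI items stated in the text
MAX_ITEM_SCORE = 2    # each item is keyed 0, 1, or 2

# HYPOTHETICAL: 0-based indices of items whose first printed choice is the
# most severe one. The actual reversed items are defined by the published form.
REVERSED_ITEMS = frozenset(range(1, N_ITEMS, 2))


def item_score(position: int, reversed_order: bool) -> int:
    """Map the printed position of the chosen response (0, 1, or 2) to a severity score."""
    if position not in (0, 1, 2):
        raise ValueError("response position must be 0, 1, or 2")
    # On reversed items the first printed choice is the most severe (score 2).
    return MAX_ITEM_SCORE - position if reversed_order else position


def total_score(positions: Sequence[int]) -> int:
    """Sum the keyed item scores; the result lies between 0 and 54."""
    if len(positions) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(positions)}")
    return sum(
        item_score(pos, index in REVERSED_ITEMS)
        for index, pos in enumerate(positions)
    )
```

Keeping the reversal in one place (`item_score`) mirrors the role of a scoring template: the respondent sees only the choices, and the keyed values are applied afterward.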

Although author-approved definitions of these subscales have been available for some time, the definitions have not been widely published (although they are given in the recent Software User's Manual; Kovacs, 1995). Therefore, these definitions are provided in Table 9.1.

Reliability

Psychometric information on reliability is directly related to the proper use and interpretation of an instrument. The reliability of the CDI has been examined in terms of internal consistency, test-retest reliability, and standard error.

TABLE 9.1
Definitions of the Subscales of the CDI

Scale | Definition
Negative Mood | This subscale reflects feeling sad, feeling like crying, worrying about "bad things," being bothered or upset by things, and being unable to make up one's mind.
Interpersonal Problems | This subscale reflects problems and difficulties in interactions with people, including trouble getting along with people, social avoidance, and social isolation.
Ineffectiveness | This subscale reflects negative evaluation of one's ability and school performance.
Anhedonia | This subscale reflects "endogenous depression," including impaired ability to experience pleasure, loss of energy, problems with sleeping and appetite, and a sense of isolation.
Negative Self-esteem | This subscale reflects low self-esteem, self-dislike, feelings of being unloved, and a tendency to have thoughts of suicide.

Internal Consistency. Internal consistency refers to the degree to which all items on a given instrument measure the same dimension. Kovacs (1992) summarized several research studies that reported alpha reliability statistics for the CDI. Alpha coefficients from .60 to .70 are usually taken to indicate satisfactory reliability, .70 to .80 indicate good reliability, and .80 to .95 indicate excellent reliability. The majority of the studies reported total score alpha values over .80, and all of the values were greater than .70. For instance, Kovacs (1985) found the total score coefficient alpha to be .86 with a heterogeneous, psychiatrically referred sample of children, .71 with a pediatric-medical outpatient group, and .87 with a large sample of public school students (n = 860). Although the internal consistency of the CDI Total Score has often been reported, data on alpha coefficients for the five factor scores have been less available. Therefore, the internal consistency of the five subscales was assessed using two large data sets: the CDI normative sample of 1,266 children and an independent sample of 894 Canadian children. The reliability values obtained are shown in Table 9.2, along with a summary of alpha values previously reported for the CDI Total Score.
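
For readers who want to see how an alpha coefficient of the kind discussed above is obtained, the following is a minimal sketch of the standard Cronbach's alpha computation. It assumes complete data (no missing responses), and the function name `cronbach_alpha` is illustrative; it is not the procedure or software used to produce the values reported in this chapter.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    n_respondents = len(scores)
    if n_respondents < 2:
        raise ValueError("at least two respondents are required")
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Applied to the items of a single subscale, the same function gives a subscale alpha; shorter scales tend to yield lower values, which is consistent with the subscale coefficients being lower than those for the total score.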
Although reliability values for the five subscales are not as high as those for the CDI Total Score, the findings for the subscales are satisfactory. Furthermore, the alpha values obtained from the two samples are very similar.

TABLE 9.2
Estimates of Internal Consistency of the CDI and the Five CDI Factors

Scale | Internal Consistency (Cronbach's Alpha)
Total CDI | Alphas ranging from .71 to .89 (Kovacs, 1992)
Negative Mood | Normative Sample: .62; Canadian Sample: .65
Interpersonal Problems | Normative Sample: .59; Canadian Sample: .60
Ineffectiveness | Normative Sample: .63; Canadian Sample: .59
Anhedonia | Normative Sample: .66; Canadian Sample: .64
Negative Self-esteem | Normative Sample: .68; Canadian Sample: .66

Test-Retest Reliability. The CDI is completed based on the respondent's feelings, moods, and functioning during the 2-week period just prior to the test administration. Thus, the inventory measures state symptoms, rather than traits, which are less changeable over time. Because the CDI measures a state rather than a trait, the retest interval
for assessing reliability should be short (2 to 4 weeks). In the research reviewed by Kovacs (1992), studies using such short intervals found test-retest correlations between .56 and .87 (an outlier of .38 was obtained in one study) and the median test-retest correlation was .75. Thus, the CDI has acceptable short-term stability.

Standard Error. Two types of standard error (Lord & Novick, 1968) are most relevant to the CDI: standard error of measurement (SEM1) and standard error of prediction (SEM2). The standard error of measurement (SEM1) represents the standard deviation of observed scores if the true score is held constant. In the case of the CDI, this means that if parallel forms of the scale were used to assess the same individual at the same time, then about 68% of the scores would fall within ±1 SEM1 unit of the score obtained on the CDI scale, and about 95% of the scores would fall within ±1.96 SEM1 units.

The standard error of prediction (SEM2) has particular relevance because SEM2 has an intimate connection to outcome assessment. SEM2 represents the standard deviation of predicted scores if the obtained score is held constant. That is, if 100 individuals were reassessed on the CDI, about 68% of the retest scores would fall within ±1 SEM2 unit of the respective predicted scores and about 95% of the retest scores would fall within ±1.96 SEM2 units of the predicted scores. Thus, the SEM2 value is one way of assessing how much CDI scores can be expected to change due to random fluctuation. Any change in CDI scores that far exceeds the expected random fluctuation is most likely attributable to a significant change in the status of the individual's symptomatology.

The absolute value for the standard error of measurement (SEM1) or the standard error of prediction (SEM2) varies according to both the estimate of reliability and the estimate of the population standard deviation used in the calculation.
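
The dependence of both standard errors on the reliability estimate and the standard deviation can be illustrated with the usual classical test theory expressions. The chapter does not print the formulas it used, so the sketch below assumes the common textbook forms, SEM1 = SD × sqrt(1 − r) and SEM2 = SD × sqrt(1 − r²), and the function names are invented for the example. It is not intended to reproduce the values published for the CDI.

```python
import math
from typing import Tuple


def _check_reliability(reliability: float) -> None:
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Assumed textbook form: SEM1 = SD * sqrt(1 - r)."""
    _check_reliability(reliability)
    return sd * math.sqrt(1.0 - reliability)


def standard_error_of_prediction(sd: float, reliability: float) -> float:
    """Assumed textbook form: SEM2 = SD * sqrt(1 - r**2)."""
    _check_reliability(reliability)
    return sd * math.sqrt(1.0 - reliability ** 2)


def predicted_retest_score(score: float, mean: float, reliability: float) -> float:
    """Regression-based prediction of a retest score from an obtained score."""
    _check_reliability(reliability)
    return mean + reliability * (score - mean)


def band(center: float, standard_error: float, z: float = 1.96) -> Tuple[float, float]:
    """Interval of +/- z standard errors around a score (z = 1.96 for about 95%)."""
    return (center - z * standard_error, center + z * standard_error)
```

For example, a retest score falling outside `band(predicted_retest_score(...), standard_error_of_prediction(...))` would be a candidate for change beyond random fluctuation, in the sense described above.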
The SEM1 values were calculated using the median Cronbach's alpha for the CDI Total Score (see Table 9.2), and the SEM2 values were derived using the median 2- to 4-week test-retest reliability estimate for the CDI Total Score. The resulting standard error values are presented in Table 9.3.

TABLE 9.3
Standard Error Values for the CDI Total Score

Gender (Age Group) | Standard Error of Measurement (SEM1) | Standard Error of Prediction (SEM2)
Boys (overall) | 2.9 | 3.8
Boys (7-12) | 2.8 | 3.7
Boys (13-17) | 3.1 | 4.2
Girls (overall) | 2.6 | 3.5
Girls (7-12) | 2.7 | 3.6
Girls (13-17) | 2.4 | 3.2
Overall | 2.7 | 3.7

Validity

The validity of an instrument is assessed by estimating the extent to which it correctly measures the construct or constructs that it purports to assess. Because constructs cannot
be directly observed, several options are available to assess the validity of an instrument: namely, its correlation with other scales purported to measure the same construct (construct validity: other depression scales), its correlation with scales purported to measure related constructs (construct validity: related constructs), its correlation with independent ratings of behaviour (construct validity: other measures), factor analytic support for its subscale structure (factorial validity), and its ability to predict appropriate behaviors (predictive validity). Thus, the validity of a test rests on accumulated evidence from a number of studies using various methodologies (Campbell & Fiske, 1959). The CDI has been utilized in hundreds of clinical and experimental research studies and its validity has been well established using a variety of techniques. Overall, the weight of the evidence indicates that the inventory assesses important constructs that have strong explanatory and predictive utility in the characterization of depressive symptoms in children and adolescents. Table 9.4 shows a listing of some of the research related to different aspects of validity. Also, see Barreto (1994) for a brief review of validity information, and C.F. Saylor, Finch, Baskin, Furey, et al. (1984a) and C.F. Saylor, Finch, Spirito, and Bennett (1984c), who used the multitrait-multimethod approach to assess the construct validity of the CDI. Further validation data pertinent to specific uses of the CDI are presented later. CDI Short Form The 10-item CDI Short Form was developed to enable more rapid and economical assessment of depressive symptoms than the long form. The CDI Short Form can be used when a quick screening measure is desired or when the examiner's time with the child is limited. The short form takes 5 to 10 minutes to administerabout half the time it takes to administer the long version. However, the long and short forms generally provide comparable results. 
That is, the correlation between the CDI Total Score and the CDI Short Form total score was r = .89 (Kovacs, 1992). Administration of the CDI Reading Level Past computations of the reading level for the CDI have produced different grade readability estimates (Berndt, Schwartz, & Kaiser, 1983; Kazdin & Petti, 1982). A first-grade reading level for the CDI is most frequently cited (e.g., Kovacs, 1992). Variable assessments of the instrument's reading level probably reflect the use of different reading level formulae. The Dale-Chall formula (Dale & Chall, 1948) has been found to be the most valid and accurate of the nine commonly utilized readability formulas (e.g., Harrison, 1980). It is based on semantic (word) difficulty, and syntactic (sentence) difficulty. Usually, two 100-word samples are taken to calculate the reading level using the Dale-Chall formula (Chall & Dale, 1995). However, to provide greater accuracy, the computation reported here used all of the CDI items. In accordance with the Dale-Chall

< previous page

page_273

next page >

< previous page

page_274

next page > Page 274

TABLE 9.4 Studies Containing Information Relevant to the Validity of the CDI Salient Measures/Methodology Reference Construct Validity CDI compared to CBCL other measures of " childhood depression " Bodiford, Eisenstadt, " Johnson, & Bradlyn, 1988 Hammen, Adrian, Gordon, Burge, Jaenicke, & Hiroto, 1987 Hepperlin, Stewart, & Rey, 1990 Weiss & Weisz, 1988

CDS and others Haley, Fine, CDS Marriage, Moretti, BDI & Freeman, 1985 Rotundo & MMPI-D Hensley, 1985 DSRS Seligman, Peterson, Kaslow, Tanenbaum, Alloy, & Abramson, 1984 Lipovsky, Finch, & Belter, 1989 Asarnow & Carlson, 1985 CDI compared to Anxiety (RCMAS) measures of related " constructs " Eason, Finch, " Brasted, & C.F. " Saylor, 1985 Felner, Rowlison, Raley, & Evans, 1988 Kovacs, 1985 Norvell, Brophy, & Finch, 1985 Ollendick & Yule, 1990 Anxiety (STAI) Blumberg & Izard, "

" Wolfe, Finch, " C.F. Saylor, RADS Blount, RADS, Pallmeyer, & HAMILTON Carek, 1987 Worchel, Hughes, Hall, S.B. Stanton, H. Stanton, & Little, 1990 Nieminen & Matson, 1989 Shain, Naylor, & Alesi, 1990 CES-D Faulstich, " Carey, CES-D and Ruggiero, SAS Enyart, & CDS Gresham, 1986 Felner, Rowlison, Raley, & Evans, 1988 Weissman, Orvaschel, & Padian, 1980 Bartell & Reynolds, 1986

1986 Self-concept (PiersWolfe, Finch, C. Harris) Saylor, Blount, " Pallmeyer, & " Carek, 1987 " Allen & Tarnowski, 1989 Elliott & Tarnowski, 1990 Knight, Hensley, & Waters, 1988 Kovacs, 1985 McCauley, " Mitchell, Burke, & " Moss, 1988 " Rotundo & " Hensley, 1985 C.F. Saylor, Finch, Baskin, Furey, & Kelly, 1984a C.F. Saylor, Finch, Spirito, & Bennett, 1984c Self-esteem Kaslow, Rehm, & (Coopersmith) Siegel, 1984 " Kovacs, 1985 " Reynolds, Self-esteem (SelfAnderson, & esteem Inventory) Bartell, 1985 Kazdin, French, Unis, & EsveldtDawson, 1983 Attributional Style Bodiford, (CASQ) Eisenstadt, Johnson," & Bradlyn, 1988 " Curry & Craighead, " 1990 Gladstone & Kaslow, 1995 Hammen, Adrian, & Hiroto, 1988 " Kuttner, Delameter, " & Santiago, 1989 " McCauley, Hopelessness Mitchell, Burke, & (Hopelessness Scale) Moss, 1988 Nolen-Hoeksema, Girgus, & Seligman, 1986 Elliott & Tarnowski, 1990 Kazdin, French, " Unis, & Esveldt" Dawson, 1983 " Kazdin, French, Unis, EsveldtDawson, & Sherick, 1983 McCauley, Mitchell, Burke, & Moss, 1988 " Spirito, Overholser, Perceived Competence & Hart, 1991 Scale Fauber, Forehand, Social Adjustment Long, Burke, & Scale Faust, 1987 Weissman,

Orvaschel, & Padian, 1980 (continued) (table continued on next page)


TABLE 9.4 (Continued)

Reference | Salient Measures/Methodology

CDI compared to behavioral measures/observations of depressive behavior/symptoms
Blumberg & Izard, 1986 | Parent/Teacher rating/observation
Huddleston & Rust, 1994 | "
Ines & Sacco, 1992 | "
Reynolds, Anderson, & Bartell, 1985 | "
Renouf & Kovacs, 1994 | "
Sacco & Graves, 1985 | "
Shah & Morgan, 1996 | "
Slotkin, Forehand, Fauber, McCombs, & Long, 1988 | "
Breen & Weinberger, 1995 | Therapist/Staff ratings
Stocker, 1994 | Perceptions of relationships/adjustment
Hodges, 1990 | Interview findings
C.F. Saylor, Finch, Baskin, Furey, & Kelly, 1984a | Peer reports

Factorial Validity
Kovacs, 1992
Carey, Faulstich, Gresham, Ruggiero, & Enyart, 1987
Helsel & Matson, 1984
C.F. Saylor, Finch, Spirito, & Bennett, 1984c
Weiss & Weisz, 1988
Weiss, Weisz, Politano, Carey, Nelson, & Finch, 1991

Predictive Validity
Devine, Kempton, & Forehand, 1994 | Longitudinal procedure used
DuBois, Felner, Bartels, & Silverman, 1995 | "
Mattison, Handford, Kales, Goodman, & McLaughlin, 1990 | "
Reinherz, Frost, & Pakiz, 1991 | "
Marciano & Kazdin, 1994 | Statistical prediction procedure used
Slotkin, Forehand, Fauber, McCombs, & Long, 1988 | "

standard procedure for determining reading level, the number of complete sentences was counted and divided into the number of words to determine average sentence length (WDS/SEN). Next, the number of "unfamiliar" words (UFMWDS) was counted. A word is considered unfamiliar if it does not appear on a list of 3,000 "familiar" words compiled by Dale (revised in 1983). "Familiar" words are known by at least 80% of children in the fourth grade. Consideration of the number of familiar and unfamiliar words in a sample of text increases the accuracy of the reading level assessment. The grade level was determined using the following formula:
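As a minimal sketch of the computation described above, the function below uses the commonly published coefficients of the 1948 Dale-Chall formula (0.1579, 0.0496, and 3.6365). These coefficients, the function name, and the sample counts are assumptions brought in for illustration; they are not taken from this chapter, and the exact expression used by the authors may differ.

```python
def dale_chall_raw_score(words: int, sentences: int, unfamiliar_words: int) -> float:
    """Raw Dale-Chall score from counts taken over a text sample.

    Assumed form (commonly published 1948 coefficients):
        0.1579 * (percent unfamiliar words)
        + 0.0496 * (average sentence length)
        + 3.6365
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    avg_sentence_length = words / sentences            # WDS/SEN
    pct_unfamiliar = 100.0 * unfamiliar_words / words  # UFMWDS as a percentage of WDS
    return 0.1579 * pct_unfamiliar + 0.0496 * avg_sentence_length + 3.6365


# Hypothetical counts for illustration only; these are not the CDI's actual counts.
score = dale_chall_raw_score(words=1000, sentences=100, unfamiliar_words=20)
print(round(score, 4))
```

In the published procedure the raw score is then mapped to a grade band with a correction table, which is not reproduced here.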

The Dale-Chall procedure produced a third-grade reading level for the CDI, suggesting that the often-cited first-grade reading level for the CDI is not definitive. Administrators/practitioners should not assume that all younger children will be able to understand the language on the inventory. For 7- and 8-year-olds and children with reading difficulties, it is recommended (Kovacs, 1992) that the administrator read aloud the instructions and the CDI items while children read along on their own form.


Administration Methods

One way to administer the CDI is to allow children to indicate their responses on a special QuikScore™ form (Kovacs, 1992). The QuikScore™ form is self-contained and includes all materials needed to score and profile the CDI. Conversion to T-scores is automatically made in the QuikScore™ form. The CDI also can be computer administered and scored using an IBM-compatible microcomputer (Kovacs, 1995). Regardless of which option/format is chosen, the administrator should make sure the child carefully reads the instructions and fully understands the inventory. As already noted, for younger children or those with reading difficulties, it may be necessary to read the instructions and the items aloud while children read along on their own form or the computer screen. After reading each item, children select one of the three response options provided. Children may say that none of the choices in a given item really applies to them. In such a case, they should be instructed to select the item choice that fits them best.

Although the CDI is most often administered on an individual basis, group administration is permitted (e.g., Friedman & Butler, 1979; C.F. Saylor, Finch, Baskin, C.B. Saylor, et al., 1984b). Additionally, with nonclinical populations, some test administrators have considered inclusion of the suicide item to be inappropriate; in such instances, it may be preferable to use the CDI Short Form, which does not include this item.

Applicable Populations

In interpreting clinically significant patterns of total scale and factor scores on the CDI, it is important to consider the children's background, including their socioeconomic status, country of origin, and ethnicity. The norms presented in the main manual for the CDI (Kovacs, 1992) are based on a select sample of North American children. The validity of the instrument for other groups of children is suggested by research studies with different populations.
In general, this body of research, cited in Table 9.5 and Table 9.6, shows very widespread applicability of the CDI. Table 9.5 lists research citations in connection with the use of the CDI with children from different cultures and from different countries. The CDI research includes data on children who were African American, North American, Spanish, German, Australian, Egyptian, Japanese, Brazilian, Icelandic, Croatian, and French. These references should be consulted to aid in the interpretation of CDI results regarding those populations. Table 9.5 also cites some of the translated versions of the CDI. Table 9.6 lists some of the research on the CDI with children in special circumstances. Data have been obtained from samples of low socioeconomic status; urban and rural residents; those in public housing situations; and children with mental retardation, learning disability, or emotional problems. In addition to the studies cited in Table 9.6, the CDI has also been used in situations where a family member or the child has cancer (Siegel, Karus, & Raveis, 1996; Polaino & del-Pozo-Armentia, 1992), with children going through the tribulations of parental divorce (e.g., Pons-Salvador & del-Barrio, 1993), and with children who have insulin-dependent diabetes mellitus (Kovacs, Iyengar, Stewart, Obrosky, & Marsh, 1990).


TABLE 9.5 Research Reports on the Use of CDI with Children of Different Ethnic and National Backgrounds

Reference | Notes
Abdel-Khalek, 1996 | n = 1,981, Arabic version, Kuwaiti students
Abdel-Khalek, 1993 | n = 2,558*, Arabic version
Arnarson, Smari, Einarsdottir, & Jonasdottir, 1994 | n = 436, Icelandic version
Canals, Henneberg, Fernandez-Ballart, & Domenech, 1995 | n = 534, Spanish sample
Chartier & Lassen, 1994 | n = 792*, North American sample
DuRant, Getts, Cadenhead, Emans, & Woods, 1995 | n = 225, African American sample
Fitzpatrick, 1993 | n = 221, African American sample
Frias, Mestre, del-Barrio, & Garcia-Ros, 1992 | n = 1,286, Spanish sample
Ghareeb & Beshai, 1989 | n = 2,029*, Arabic version
Goldstein, Paul, & Sanfilippo-Cohn, 1985 | n = 85, African American
Gouveia, Barbosa, de-Almeida, & de Andrade-Gaiao, 1995 | n = 305, Brazilian version
Koizumi, 1991 | n = 1,090*, Japanese version
Lobert, 1989, 1990 | n = 128, German version
Mestre, Frias, & Garcia-Ros, 1992 | n = 952*, Spanish sample
Oy, 1991 | n = 432, Turkish sample
Reicher & Rossman, 1991 | n = 658, German version
Reinhard, Bowi, & Rulcovius, 1990 | n = 84, German version
Saint-Laurent, 1990 | n = 470, French version
Sakurai, 1991 | n = 237, Japanese version
Spence & Milne, 1987 | n = 386*, Australian sample
Steinsmeier-Pelster, Schurmann, & Urhahne, 1991 | n = 319, German sample
Steinsmeier-Pelster, Schurmann, & Duda, 1991 | n = 918, German version
Worchel, Hughes, Hall, S.B. Stanton, H. Stanton, & Little, 1990 | n = 135, Hispanic sample
Zivcic, 1993 | n = 480, Croatian version

* Sample sufficient to be considered normative data for this group.
TABLE 9.6 Some Research Reports on the Use of CDI with Special Groups

Reference | Notes
Benavidez & Matson, 1993 | n = 25, Mentally retarded children
DuRant, Getts, Cadenhead, Emans, & Woods, 1995 | n = 225, Public housing
Goldstein, Paul, & Sanfilippo-Cohn, 1985 | n = 85, Learning disabled children
Meins, 1993 | n = 798, Mentally retarded adults administered modified version of CDI
Mestre, Frias, & Garcia-Ros, 1992 | n = 25, Mentally retarded children
Nelson, Politano, Finch, Wendel, & Mayhall, 1987 | n = 535, Emotionally disturbed children
Oy, 1991 | n = 432, Different socioeconomic status
Politano, Nelson, Evans, Sorenson, & Zeman, 1985 | n = 551, Emotionally disturbed children
C.F. Saylor, Finch, Spirito, & Bennett, 1984c | n = 154, Children with emotional-behavioral problems


Approaches to CDI Interpretation

The manner in which CDI results are used or interpreted is generally a function of the setting in which the instrument was administered and the ostensible reason for the administration. Consequently, the interpretative focus can be a detailed consideration of the specific responses of a given child to each individual item. The interpretation also may emphasize the total CDI T-score or individual CDI factor T-scores, each of which "ranks" the child in comparison to "normal" age- and gender-matched peers.

Determining the Validity of the Results

Regardless of the interpretive focus, CDI results need to be examined in the context of potential threats to validity. One approach is determination of the quality of the completed inventory. Another approach is examination of the Inconsistency Index.

Procedural Issues. The following issues should be kept in mind in determination of the quality of the completed CDI:

1. Has the inventory been filled in properly? Missing items will invalidate the total score. Although the administrator may prorate a missing item (e.g., by taking the average score on all remaining items and assigning that value to the missing item), subsequent interpretation must take the missing item(s) into account.

2. Is there an apparent response bias? Response bias may be operating if a child consistently checks the first option on each item, the middle option, or the last option on each item. Random checking of options, which may be inferred by the detection of apparently contradictory answers to similar items, may represent biased responding as well. Such patterns invalidate the CDI Total Score.

3. Are there any suggestions of lack of truthfulness? In a clinical setting that involves testing a child who has been referred, this possibility may arise, as indicated by the child "denying" every symptom or endorsing the most severe option to all, or almost all, items. In such instances, inquiry into the child's expectations of the evaluation may be more informative than focus on the CDI score itself.

4. Is the testing environment appropriate to psychological examination? As with all forms of psychological assessment, the CDI should be completed in a setting that is free from distraction, affords the child the requisite privacy, and is reasonably comfortable. An unsuitable testing environment is likely to threaten the validity of the child's responses and must be considered in score interpretation.

The Inconsistency Index. Children may exaggerate or misrepresent symptoms in some circumstances. As a result, some self-rated instruments include special items or scales to identify distorted responses (e.g., Beitchman, 1996; Reynolds & Richmond, 1985). Alternatively, for some instruments (e.g., MMPI-2 VRIN and TRIN scales: Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989; MASC Inconsistency Index: March, 1997), an inconsistency index has been developed that does not usually require special items. Inconsistency indexes are based on the premise that the most similar items, or the most highly correlated items on a measure, elicit similar (although not necessarily identical) responses. As determined by statistical procedures, if there is a large discrepancy in the responses for several correlated item pairs, then inconsistent and possibly invalid responding must be considered.
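The first two procedural checks above (prorating a missing item and spotting a uniform response pattern) can be sketched roughly as follows. The function names and the example responses are hypothetical, and the sketch assumes that item scores are numeric and that the position of the checked option (first, middle, last) is recorded separately from the score.

```python
from typing import Optional, Sequence


def prorated_total(item_scores: Sequence[Optional[int]]) -> float:
    """Total score with each missing item (None) assigned the mean of the answered items."""
    answered = [s for s in item_scores if s is not None]
    if not answered:
        raise ValueError("no answered items to prorate from")
    item_mean = sum(answered) / len(answered)
    n_missing = len(item_scores) - len(answered)
    return sum(answered) + item_mean * n_missing


def uniform_option_pattern(option_positions: Sequence[int]) -> bool:
    """True when the same option position (0 = first, 1 = middle, 2 = last)
    was checked on every item, which may point to a response bias."""
    return len(set(option_positions)) == 1


# Hypothetical five-item excerpt with one unanswered item.
print(prorated_total([2, 1, None, 0, 1]))
print(uniform_option_pattern([1, 1, 1, 1, 1]))
```

A prorated total is only a convenience; as the text notes, the missing item(s) still have to be taken into account when the score is interpreted.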


An Inconsistency Index exists for the CDI. Each of the five scales on the CDI (i.e., Negative Mood, Interpersonal Problems, Ineffectiveness, Anhedonia, and Negative Self-esteem) contains sets of items that are highly correlated with one another. If a pair of items is highly correlated, then a child whose response is indicative of a symptom for one item of the pair should give a response indicative of a symptom for the other item of the pair. Although such consistency is generally expected, some inconsistency can and will occur to a limited extent, the magnitude of which can be assessed through the CDI Inconsistency Index (Kovacs, 1995). This index is generated based on a computer algorithm taking into account the factor loadings of items. For the Negative Mood scale, the following items form the highly correlated item group that is used to measure consistency: Items 1, 8, 10, and 11. For Interpersonal Problems, the item set consists of: Items 5, 26, and 27. For Ineffectiveness, the item set consists of: Items 15, 23, and 24. For Anhedonia, the item set consists of: Items 16, 19, 20, and 22. For Negative Self-esteem, the item set consists of: Items 7, 9, 14, and 25. In the normative sample for the CDI, only 89 out of 1,266 children (6.9%) scored greater than or equal to 7 on the Inconsistency Index. And, only 36 out of 1,266 (2.8%) scored greater than or equal to 9. Based on these data, the results from the Inconsistency Index are assessed as follows: If the Inconsistency Index is less than 7, then the responses are considered sufficiently consistent. If the Inconsistency Index is greater than or equal to 7 but less than 9, then the responses are somewhat inconsistent. If the Inconsistency Index is greater than or equal to 9, then the responses are very inconsistent. A high Inconsistency Index score should not be interpreted to mean that the CDI results should be disregarded. 
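The thresholds just described can be written down as a small lookup. This is only a sketch of the interpretation step: how the index itself is computed is not specified here beyond its reliance on factor loadings, so the function below takes an already computed index value. The names `CONSISTENCY_ITEM_SETS` and `classify_inconsistency` are hypothetical.

```python
# Item sets used to measure consistency, as listed in the text above.
CONSISTENCY_ITEM_SETS = {
    "Negative Mood": [1, 8, 10, 11],
    "Interpersonal Problems": [5, 26, 27],
    "Ineffectiveness": [15, 23, 24],
    "Anhedonia": [16, 19, 20, 22],
    "Negative Self-esteem": [7, 9, 14, 25],
}


def classify_inconsistency(index: float) -> str:
    """Map an already computed Inconsistency Index value to the guideline category."""
    if index < 7:
        return "sufficiently consistent"
    if index < 9:
        return "somewhat inconsistent"
    return "very inconsistent"


for value in (3, 7, 9):
    print(value, classify_inconsistency(value))
```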
Inconsistent responding can occur for a variety of reasons, including the child being unable to concentrate on the task or to understand the instructions. Such considerations must be part of interpreting the Inconsistency Index for a respondent.

Interpretive Steps

Interpretation of CDI results in the context of community-based or epidemiologic studies is straightforward insofar as such studies usually employ clinically validated cutoff scores or normative T-scores to define "caseness," and is not discussed in this chapter. Likewise, when the CDI is used as a screening instrument, a priori defined raw cutoff scores (or T-scores) are generally employed, with no need for specific interpretations. Because most questions regarding CDI score interpretation arise in the context of clinical assessment, and for clinical purposes such as planning interventions or evaluations, pertinent information on these aspects of CDI use is now described in detail.

Interpretation of Total Scores and Factor Scores as T-Scores. Normative data tables are incorporated into the Profile Form for the CDI. The normative data tables utilize T-scores that are standardized to have a mean or average of 50 and a standard deviation of 10. The normative tables automatically compare the child being assessed to children in the normative sample of the same gender and age, and allow each component in the profile to be compared to every other. T-scores above 65 are generally considered clinically significant when the child being studied is from a "high base-rate" group, such as children in a clinical setting. When the child is believed to be from a "low base-rate" group, such as children without identified behavioral problems, a much higher cutoff (e.g., a T-score of 70 or 75) should be used for inferring clinical problems. High scores suggest a problem, whereas low scores indicate the absence of the problem.

It should be noted that the T-scores used with the CDI are linear T-scores. Linear T-scores do not transform the actual distributions of the variables, and hence, whereas each variable has been transformed to have a mean of 50 and a standard deviation of 10, the distributions of the scale scores do not change. Variables not normally distributed in the raw data will continue to be nonnormally distributed after the transformation.

As a rule of thumb, T-scores for the CDI can be interpreted using the guidelines in Table 9.7. These interpretations reflect how an individual child's score compares to those of children of the same age range and gender from the normative sample. Note, however, that the suggested adjectives are guidelines, and there is no reason to believe there is a perceptible difference, for instance, between a T-score of 55 and a T-score of 56. Therefore, these guidelines should not be used as absolute rules.

For many clinical tests, it is common practice to interpret the overall profile based on the most elevated test scores. In such a case, a clinically elevated test score (in the metric of T-scores) would be defined as above 65. If, for a given set of scores, no test scores are above a T-score of 65, the profile is usually considered to be "normal." A profile in which a single T-score is elevated above 65 is usually considered to have a "1-point" code, and is referred to by the single elevated scale. In general, given the high correlations of the factors of the CDI, such profiles should be relatively rare and, when encountered, may be viewed as an indication of only moderate evidence of a problem. When two or more subscale scores are clinically elevated, the profile is usually categorized by the two factors that are the highest and is called a "2-point code."
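The scoring logic in the preceding paragraphs can be illustrated with a short sketch. It assumes the usual definition of a linear T-score (50 plus 10 times the standardized raw score), applies the Table 9.7 bands to T-scores rounded to whole numbers, and derives a 1-point or 2-point code from the factors elevated above 65. The function names, the normative mean and standard deviation, and the factor T-scores in the example are hypothetical.

```python
from typing import List, Mapping


def linear_t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Linear T-score: rescales to mean 50 and SD 10 without changing the
    shape of the distribution."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


def interpret_t_score(t: float) -> str:
    """Guideline label from Table 9.7, applied to the T-score rounded with round()."""
    t_int = round(t)
    bands = [
        (71, "Very much above average"),   # above 70
        (66, "Much above average"),        # 66-70
        (61, "Above average"),             # 61-65
        (56, "Slightly above average"),    # 56-60
        (45, "Average"),                   # 45-55
        (40, "Slightly below average"),    # 40-44
        (35, "Below average"),             # 35-39
        (30, "Much below average"),        # 30-34
    ]
    for lower_bound, label in bands:
        if t_int >= lower_bound:
            return label
    return "Very much below average"       # below 30


def point_code(factor_t: Mapping[str, float], cutoff: float = 65.0) -> List[str]:
    """Up to two most elevated factors with T above the cutoff, highest first.
    An empty list corresponds to a 'normal' profile."""
    elevated = [name for name, t in factor_t.items() if t > cutoff]
    elevated.sort(key=lambda name: factor_t[name], reverse=True)
    return elevated[:2]


# Hypothetical norms and scores, for illustration only.
t = linear_t_score(raw=15, norm_mean=10, norm_sd=5)
print(t, interpret_t_score(t))
print(point_code({"Negative Mood": 68, "Anhedonia": 72, "Ineffectiveness": 60}))
```

The band labels are guidelines rather than rules, so adjacent scores on either side of a boundary should not be read as meaningfully different.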
Although 2-point codes have not usually been employed with the CDI, some clinical practitioners may find it useful to do so. Experience with inventories such as the MMPI or the Personality Inventory for Children (PIC) indicates that 2-point codes tend to be useful and robust ways of categorizing clinically meaningful patterns of behavior (Lachar & Gdowski, 1979).

TABLE 9.7 Interpretive Guidelines for CDI T-scores

T-score | Interpretation of Overall Symptoms/Complaints*
Above 70 | Very much above average
66-70 | Much above average
61-65 | Above average
56-60 | Slightly above average
45-55 | Average
40-44 | Slightly below average
35-39 | Below average
30-34 | Much below average
Below 30 | Very much below average

* Compared to children of similar age and gender in the normative sample.

In general, therefore, thoughtful examination of the CDI subscale profile should be more informative than consideration of only the total score. The CDI subscale T-score profile can be used to indicate specific areas of vulnerability as well as areas of strength. For example, from a clinical perspective, elevated T-scores on the Anhedonia factor or the Ineffectiveness factor may be particularly important. Because the Anhedonia factor contains items traditionally associated with "endogenous" depression, a child with a high T-score on this factor may be at particular risk for a serious depressive episode. A high score on the Ineffectiveness factor may indicate notable functional impairment, which may warrant additional interventions for a particular child. Concomitantly, in interpreting the CDI profile, a child who has elevated T-scores on both of the aforementioned scales may be of greater clinical concern than a child who has an elevated score on the Anhedonia factor but an average score on the Ineffectiveness factor. In the former case, the child may be evidencing both functional impairment and troublesome depressive symptoms, whereas in the latter case, the troublesome depressive symptoms (area of vulnerability) are somewhat counteracted by having maintained reasonable functioning (area of strength).

Examination of Total Raw Score and Item Response Pattern. A practitioner conducting a clinical assessment may decide to focus on the raw CDI score and individual item responses. For example, a total CDI score of 20 may result if a child endorses only 10 items, but each to its most severe degree. Alternatively, a child may receive a score of 20 by endorsing up to 20 items, but each to a mild degree. Examination of the number of items and the options for the items that contributed to the total CDI score can provide useful information about the extent and severity of the child's complaints and symptoms.

The examiner also may find it helpful to group the items endorsed by a child into phenomenologically meaningful categories. This approach can provide an additional perspective regarding the nature of the child's complaints. For example, if most, or all, endorsed CDI items pertain to physical and neurovegetative symptoms (somatic complaints, problems with sleep, appetite, energy), then a pediatric examination may be warranted. If all items with symptomatic responses relate to school or peer problems, then a closer examination of those aspects of the child's life may be in order.
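The contrast between the two ways of reaching a raw score of 20 can be made concrete with a small sketch. It assumes that each of the three response options is scored 0, 1, or 2 and that the inventory has 27 items; the function name and the two response vectors are hypothetical.

```python
from typing import Dict, Sequence


def summarize_item_pattern(item_scores: Sequence[int]) -> Dict[str, int]:
    """Break a total raw score down into how many items were endorsed and
    how many were endorsed at the most severe option (assumed 0/1/2 scoring)."""
    if any(score not in (0, 1, 2) for score in item_scores):
        raise ValueError("item scores must be 0, 1, or 2")
    return {
        "total": sum(item_scores),
        "items_endorsed": sum(1 for score in item_scores if score > 0),
        "items_most_severe": sum(1 for score in item_scores if score == 2),
    }


# Two hypothetical children with the same total raw score of 20.
few_but_severe = [2] * 10 + [0] * 17   # 10 items, each at the most severe option
many_but_mild = [1] * 20 + [0] * 7     # 20 items, each at the mild option

print(summarize_item_pattern(few_but_severe))
print(summarize_item_pattern(many_but_mild))
```

Both summaries report the same total, so it is the endorsed-item and most-severe counts that distinguish the two response patterns.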
Examination of Individual Item Responses. By studying the individual responses of a child to the CDI items, the examiner may form hypotheses about the range and type of the child's difficulties. Furthermore, in conjunction with other information, item analyses can help to identify cases at particular risk for serious depression, even in the absence of a highly elevated total score. For example, endorsements of the most severe options on Item 1 (sadness), Item 4 (anhedonia), and Item 10 (crying) are indicative of pervasive despondent mood. Insofar as the presence of such a mood state has been shown to represent an early phase of depression, a child with such responses may warrant ongoing monitoring. Similarly, research evidence has suggested that children who are isolated may be at risk for a variety of adjustment problems. Thus, even if the total CDI score is low, a child who endorses both Item 20 (loneliness) and Item 22 (lack of friends) may be at risk for subsequent difficulties and could benefit from monitoring.

Unlike many other inventories, specific items on the CDI have not been designated as "critical" because all of the items have been preselected by the author and validated by numerous investigators as pertinent to the syndrome of depression in the juvenile years. However, the question pertaining to suicidal thoughts (Item 9) may be particularly important for screening children in clinical settings or identifying those at risk. Endorsement of this item should prompt the examiner to conduct a detailed clinical assessment to determine the frequency and severity of suicidal ideation, whether it involves a specific contemplated method, and whether the child has ever attempted suicide. The information obtained should facilitate the planning of strategies for management or treatment.


Integrate the CDI Scores with All Other Information About the Child. The examiner should observe the child directly and the CDI results should be integrated with other test scores and with information about the child's background, family history, and school adjustment. Interviews with the child, parent, and possibly teachers should be obtained. Consideration of such diverse information sources should result in a more valid conclusion regarding children's problems as well as strengths, and the extent to which depression may be undermining their functioning.

Determination of Appropriate Intervention or Remediation Strategy for the Child. Based on all sources of information, the examiner should decide what kinds of feedback are appropriate and ethical for the parent(s) and how to make that information available, how and when a report should be filed, and who should have access to the information. A treatment plan should be developed in concordance with the parents or an appropriate referral should be made.

The results of the CDI can be particularly useful in determining suitable interventions for the child and in selecting treatment targets. As already noted, CDI factor scores and responses to items can identify problems or areas of concern. For example, children with an elevated score on the Interpersonal Problems factor may benefit from social skills training, modeling, or targeted group intervention as a way to treat their depression. Children with an elevated score on the Ineffectiveness factor may benefit from remedial help as well as behavior modification. A very high score on the Negative Mood factor may indicate consideration of referral for antidepressant pharmacotherapy. If children have a particularly high score on the Negative Self-esteem factor, the intervention may focus on improving self-image and building confidence. In a similar vein, endorsement of items such as "I never have fun at school" and "I have to push myself all the time to do my schoolwork" would suggest that the treatment have a school-based component.

Use of the CDI for Clinical Purposes

The Standards for Educational and Psychological Testing, developed through collaboration of the American Psychological Association (APA, 1985) and other professional organizations, has emphasized the need to validate a measure with respect to each of its proposed purposes or uses. Therefore, in the following sections, validation information is integrated with descriptions of the main uses of the CDI.

Screening for Depression

The CDI is recommended for use as a screening tool and has been widely used for this purpose (e.g., Canals, Henneberg, Fernandez-Ballart, & Domenech, 1995; Polaino-Lorente & Domenech, 1993; Stavrakaki, Williams, Walker, Roberts, & Kotsopoulos, 1991). As a screening tool, the CDI can serve to identify children who are at risk for a depressive disorder and may require further assessment with a more complex test battery (including behavioral observations, interviews, other psychological testing, etc.). The validity of the use of the CDI for this purpose largely depends on the ability of the inventory to differentiate children identified with depressive disorders from those who have not been identified with a depressive disorder. Many research studies have shown that the CDI effectively differentiates between depressed and nondepressed children. Table 9.8 lists some of this supporting literature.


TABLE 9.8 Literature Citations of Research Showing Differences on the CDI Between Depressed and Nondepressed Children

Citations
Armsden, McCauley, Greenberg, Burke, & Mitchell, 1990
Carey, Faulstich, Gresham, Ruggiero, & Enyart, 1987
Craighead, Curry, & Ilardi, 1995
Fine, Moretti, Haley, & Marriage, 1985
Fristad, E.B. Weller, R.A. Weller, Teare, & Preskorn, 1988
Hodges, 1990
Hodges & Craighead, 1990
Jensen, Bloedau, Degroot, Ussery, & Davis, 1990
Kazdin, Rodgers, & Colbus, 1986
Kazdin, Esveldt-Dawson, Unis, & Rancurello, 1983
Knight, Hensley, & Waters, 1988
Kovacs, 1985
Lipovsky, Finch, & Belter, 1989
Lobovits & Handal, 1985
Marriage, Fine, Moretti, & Haley, 1986
McCauley, Mitchell, Burke, & Moss, 1988
Moretti, Fine, Haley, & Marriage, 1985
Rotundo & Hensley, 1985
C.F. Saylor, Finch, Spirito, & Bennett, 1984c
Spirito, Overholser, & Hart, 1991
Stark, Kaslow, & Laurent, 1993
Worchel, Nolan, & Willson, 1987

The validity of the CDI as a screening tool also has been examined in terms of sensitivity and specificity. Sensitivity refers to the percentage of diagnosable depressed children who are correctly classified by the test. Specificity refers to the percentage of nondepressed children who are correctly classified by the test. For example, Craighead, Curry, and Ilardi (1995) reported that the five CDI factor scores classified participants as depressed versus not depressed with a high degree of accuracy. Using the CDI total score cutoff of 17 as the classification criterion, these investigators also found sensitivity of 80% and specificity of 84%.

When the CDI is used for screening purposes, a specific cutoff is usually selected and children scoring above the cutoff are identified as those at risk. Different cutoff values may be used depending on the relative importance of sensitivity and specificity in a particular screening situation (Kovacs, 1992). In general, raising the cutoff value decreases sensitivity while it increases specificity of a test.
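A minimal sketch of how sensitivity and specificity shift with the cutoff is given below. The scores and diagnostic labels are invented for illustration, the function name is hypothetical, and the rule that a child is flagged when the raw score is greater than or equal to the cutoff is an assumed convention.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[int], depressed: Sequence[bool], cutoff: int
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when children scoring at or above
    the cutoff are flagged as at risk (assumed convention)."""
    tp = fn = tn = fp = 0
    for score, is_depressed in zip(scores, depressed):
        flagged = score >= cutoff
        if is_depressed:
            if flagged:
                tp += 1   # depressed child correctly flagged
            else:
                fn += 1   # depressed child missed (false negative)
        else:
            if flagged:
                fp += 1   # nondepressed child flagged (false positive)
            else:
                tn += 1   # nondepressed child correctly not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one depressed and one nondepressed case")
    return tp / (tp + fn), tn / (tn + fp)


# Invented data: eight children, four of them diagnosed as depressed.
scores = [25, 14, 22, 10, 18, 8, 19, 5]
depressed = [True, True, True, False, False, False, True, False]

for cutoff in (13, 20):
    print(cutoff, sensitivity_specificity(scores, depressed, cutoff))
```

With these invented data, the lower cutoff flags every depressed child at the price of one false positive, while the higher cutoff removes the false positive but misses half of the depressed children.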
Lowering the cutoff value has the opposite effect, increasing sensitivity and decreasing specificity. High cutoff scores are more appropriate than low ones when it is important to minimize "false positives," that is, falsely identifying a nondepressed child as at-risk for depression. As noted, however, with high cutoff scores, the "false negative" rate is increased; that is, many individuals who fall below the cutoff but are actually depressed will not be identified as at-risk. Low cutoff scores are preferred when it is important to minimize false negatives, that is, failure to identify a depressed child as at-risk. However, the use of a low cutoff score will result in a higher false positive rate, that is, more nondepressed individuals will be identified as at-risk.

When used as a general population-based screen, Kovacs (1992) recommended the raw score of 20 as a cutoff. An example of a situation where the CDI can be used as a general screen with this cutoff score is in a school system wherein routine testing is conducted on a large segment of the student population. On the other hand, for screening in clinical settings, a lower cutoff is appropriate because the base rate of depression can be expected to be higher. In the research literature (e.g., Garvin, Leber, & Kalter, 1991; Kazdin, Colbus, & Rodgers, 1986; Lobovits & Handal, 1985), cutoff scores as low as 12 or 13 have been proposed in clinical contexts.

Use as an Aid in the Diagnostic Process

Although the CDI can serve as an aid in the diagnostic process, it cannot by itself yield a diagnosis. As already noted, a psychiatric diagnosis of major depression or dysthymia requires that certain inclusionary and exclusionary diagnostic criteria be met, that the constellation of symptoms and signs be present for a particular duration, and that they be associated with distress or functional impairment (APA, 1980, 1985, 1994). The necessary information can only be obtained through a detailed clinical diagnostic interview. Regrettably, current usage of the CDI has not been satisfactory in this regard. An assessment by Fristad et al. (1997) found that 44% of the studies that used the CDI alone referred to high CDI scorers as "depressed" without providing a clear cautionary statement.

After a referred child has been administered the CDI, the results can be used in various ways to facilitate the process of diagnosis. If the clinical interview has confirmed the presence of a depressive disorder, the children's CDI score can serve as an indicator of the overall severity of their current symptomatology. For example, a youngster with a CDI score of 28 is clearly more severely depressed than a comparably aged child with a CDI score of 16. The CDI results also can be useful in reaching a diagnosis in cases where, subsequent to having interviewed the parent about the child, it is unfeasible to conduct a full face-to-face clinical assessment with the referred child.
In such a case, information from the CDI may clarify aspects of the data provided by the parent because the test items and DSM criteria for depression overlap. Ponterotto, Pace, and Kavan (1989), who reviewed the most commonly used depression measures, noted that the CDI was the only measure with items pertaining to each of the DSM-III-R symptom criteria for major depression. The criteria for major depression have remained essentially the same in the DSM-III, DSM-III-R, and DSM-IV. Table 9.9 shows the correspondence between the nine criterion symptoms and specific CDI items.

Alternatively, the child's responses on the CDI can be used as starting points for probes in the clinical interview. The evaluator may note which particular CDI items were endorsed. Then, citing those item responses to the child, the evaluator can ask the child during the interview to provide further information or to elaborate on those complaints.

Use of the CDI for Treatment Monitoring and Treatment Outcome Assessment

Because the CDI yields a quantified rating, the instrument is appropriate for monitoring levels of depressive symptoms during and at the end of treatment. For example, the CDI has been used to assess the effects of group therapy (e.g., Garvin et al., 1991),


social training (e.g., Milne & Spence, 1987), pharmacotherapy (e.g., Preskorn, E.B. Weller, Hughes, R.A. Weller, & Bolte, 1987), and preventive intervention (e.g., Garvin et al., 1991). The application of the CDI in clinical practice or treatment monitoring entails several issues or considerations that are described in the following sections.

TABLE 9.9
Correspondence of CDI Items to DSM-IV Symptom Criteria for Major Depression

DSM-IV Criterion, followed by Related CDI Item(s) and the Most Symptomatic Response

1. Depressed mood
   Item 1: "I am sad all the time."
   Item 2: "Nothing will ever work out for me."
   Item 10: "I feel like crying every day."
   Item 20: "I feel alone all of the time."
2. Markedly diminished interest or pleasure
   Item 4: "Nothing is fun at all."
3. Significant weight loss or decreased appetite nearly every day
   Item 18: "Most days I do not feel like eating."
4. Insomnia or hypersomnia
   Item 16: "I have trouble sleeping every night."
5. Psychomotor agitation or retardation
   Item 15: "I have to push myself all the time to do my schoolwork."
6. Fatigue or loss of energy nearly every day
   Item 17: "I am tired all the time."
7. Feelings of worthlessness or excessive guilt nearly every day
   Item 3: "I do everything wrong."
   Item 7: "I hate myself."
   Item 8: "All bad things are my fault."
   Item 25: "Nobody really loves me."
8. Diminished ability to think or concentrate or indecisiveness
   Item 13: "I cannot make up my mind about things."
9. Recurrent thoughts of death, suicidal ideation, or suicide attempt
   Item 9: "I want to kill myself."

Establish Baseline Severity of Symptomatology

If feasible and appropriate, the CDI should be administered twice at baseline. The resultant two scores can be averaged to yield an index of initial symptom severity. This procedure, also known as multiple baseline assessment, has been recommended by Milich, Roberts, Loney, and Caputo (1980), Conners (1997), and Nelson and Politano (1990), particularly for studies designed to evaluate treatment outcome.
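The averaging step can be sketched in a few lines of Python. This is only a minimal illustration, not part of the CDI scoring materials: the function name `baseline_severity` and the example scores are hypothetical.

```python
from statistics import mean


def baseline_severity(baseline_scores):
    """Average two or more pre-treatment CDI total raw scores.

    Hypothetical helper for illustration; it simply implements the
    "administer twice and average" idea described in the text.
    """
    if len(baseline_scores) < 2:
        raise ValueError("a multiple baseline needs at least two scores")
    return mean(baseline_scores)


# Hypothetical scores from two baseline administrations.
index = baseline_severity([34, 30])
print(f"Baseline severity index: {index}")
```

Later scores would then be compared with this averaged index rather than with a single pre-treatment score.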
Repeated administration of a scale, such as the CDI, is known to be associated with declines in scores (Finch, C.F. Saylor, Edwards, & McIntosh, 1987; Kaslow et al., 1984; Meyer, Dyck, & Petrinack, 1989), which may reflect a methodological artifact, statistical regression to the mean, a "placebo" response to the initial assessment, or some spontaneous improvement. Therefore, a multiple baseline (rather than a single baseline) assessment is usually considered to yield a more valid index of symptom severity at the beginning of treatment.

Determine a Treatment Goal

The goals of treatment can include an a priori defined decrement in overall symptom severity, the absence of depressive symptoms, and improvements in specific areas of the child's functioning. Changes in the total CDI score can be interpreted as reflecting changes in the severity of the child's depressive symptoms. If CDI item responses scored


"2" are initially selected as treatment targets, the clinician's goal may include the lessening or elimination of these particular complaints. Additionally, change (or the lack of change) in factor scores may help pinpoint areas of functioning wherein therapy has had the most (or the least) impact.

Frequency of CDI Administration During Treatment

Practical considerations are likely to affect how often the CDI can be readministered during treatment. Such considerations may include the time interval between sessions with the child, as well as the burden of other assessments to which the child may be subjected. In general, a 2-week test-retest interval may be most appropriate (Kovacs, 1992), and the time required for any given test battery (including the CDI) should not exceed 20 minutes, particularly with younger patients. If possible, the instrument should be administered at about the same time of day and in the same location on each occasion in order to control extraneous variables that might affect the responses.

Assess the Statistical/Clinical Significance of Change in CDI Scores

CDI scores for the same respondent are likely to vary with repeated administration owing to random fluctuation in responses. Therefore, it is important to define the magnitude of change in CDI scores that is to be considered significant. On a purely descriptive level, significant improvement can be defined in terms of a desired change in responses to selected CDI items. For example, if one treatment target is to improve the child's sleep, then a change on Item 16 from "I have trouble sleeping every night" to "I have trouble sleeping many nights" or "I sleep pretty well" may be considered clinically meaningful. As Conners (1994) noted:

Clinically . . . it is always useful in assessing change to . . . circle three to five items that . . . are the most crucial problem areas.
Then, regardless of changes in factor scores, it is possible to examine particular target symptoms or behaviors for evidence of a treatment effect. Obviously, one must be mindful of the possibility of interpreting random fluctuations as real change, but this is precisely the reason for not relying on a single outcome measure. (p. 569)

From a clinical perspective, T-score changes of five or more points on the CDI subscales also may be considered indicative of significant change (e.g., Conners, 1994). This approach has the advantage of ease of application, and the guideline will be useful in most instances.

Other methods, including the procedure described in Jacobson and Truax (1991), address "significant change" with reference to statistical criteria (for a review, see Speer & Greenbaum, 1995). The Jacobson-Truax method involves obtaining the difference between the baseline raw score and the raw score obtained during or after treatment, which is then divided by the standard error of the differences. The standard error of the differences is computed from an appropriate reliability value for the test instrument, which can be a test-retest, Cronbach's alpha, or split-half reliability value.

A repeated measures t test represents an alternative statistical method of estimating significant change in scores. The responses from the baseline CDI administration are paired with the responses from the administration taken during or after treatment. The


repeated measures t test is produced automatically by the CDI software program (Kovacs, 1995), allowing ready access to information regarding the significance of change in CDI scores.

Decide on the Effects of Treatment

In general, downward trends in CDI scores are likely to indicate that treatment is progressing in the right direction. If CDI scores rise or fluctuate unpredictably from one administration to the next, then a full clinical reassessment is warranted to verify the child's psychiatric status and reevaluate the appropriateness of the intervention. Treatment studies of adults have shown that most of the improvement in symptom status occurs by the eighth treatment session (Howard, Kopta, Krause, & Orlinsky, 1986). Thus, after 1 or 2 months of treatment, there should be an observable reduction in the child's depressive symptomatology, although full remission would not yet be evident.

Decisions about the effects of treatment with a depressed child should not depend solely on the CDI. For example, one research study found a tendency among children to deny symptoms and to respond defensively (Joiner, K.L. Schmidt, & N.B. Schmidt, 1996). Such findings reinforce the need to corroborate self-report information prior to making decisions about the effects of treatment.

A Hypothetical Case Study

A hypothetical case study is now provided (using elements of actual clinical cases) to illustrate some of the aforementioned principles in the use of the CDI. This case includes screening, treatment planning, treatment monitoring, and outcome assessment components.

Tamara is a 9-year-old girl who has been living with her mother. Tamara's mother contacted the clinic out of concern for her daughter's behavior. The mother described Tamara as being overly sensitive and emotionally labile, as well as prone to extreme emotional outbursts. During some of these outbursts, Tamara screamed, cried, and voiced concerns that her mother would leave her.
The CDI was first administered to Tamara after the initial contacts with the mother. The first administration yielded a CDI total raw score of 34, which is well above established cutoff points for identifying children who are at risk.

In the 3-year period before the initial assessment, Tamara experienced several major negative life events, including a fire in the family home that involved the death of Tamara's older brother and the destruction of all of the family's personal belongings, and the subsequent disappearance of her natural father. A psychiatric interview with the mother revealed symptomatology for Tamara that dated back to the disappearance of her natural father. At the time of his disappearance, Tamara had developed considerable sadness, crying, negative self-esteem, and guilt. She also had difficulty sleeping. After the fire, she additionally developed nightmares. Tamara started to experience occasional thoughts of wanting to die, as well as difficulty with concentration. The latter symptom was verified by her school records and declining school grades.


In a psychiatric interview with Tamara, it became clear that she was aware of what was upsetting her and talked about her fear of being apart from her mother. She spoke of her long-standing sadness, difficulty with concentration, difficulty in sleeping, and feeling like a burden to others. She also believed that nothing would change in her life. She admitted to not wanting to go to school because of how the other children were treating her.

Based on the information obtained during these detailed psychiatric interviews, it was determined that Tamara met psychiatric diagnostic criteria (APA, 1994) for dysthymic disorder. She also had a diagnosable anxiety disorder. Examination of her CDI factor scores showed that negative affect, ineffectiveness, and anhedonia were more problematic for her than behavior problems or low self-esteem. The Ineffectiveness score was relatively elevated and was consistent with her recent school problems.

The integration of information from the CDI with the developmental history and clinical information resulted in the development of an intervention plan. Before treatment began, a second administration of the CDI was conducted in order to strengthen the accuracy of the baseline and to corroborate other clinical observations. Recommendations for individual and concomitant parent-child therapy sessions were made, and treatment began approximately 1 month after the initial evaluation.

Over the next few months of the intervention program, important improvements were noted. A third administration of the CDI indicated that the symptoms had been reduced to an acceptable level: Tamara's CDI total raw score had dropped to 11. The CDI software program was used to generate a comparison between the posttreatment administration and the baseline scores; the large change was determined to be statistically significant.
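The Jacobson-Truax computation described earlier can be illustrated with the total raw scores from this case (34 at the first baseline, 11 after treatment). The sketch below rests on stated assumptions: the standard deviation and reliability values are hypothetical placeholders rather than figures from the CDI manual, the function name is illustrative, and the 1.96 threshold is the conventional criterion for a two-tailed .05 level.

```python
import math


def reliable_change_index(baseline, followup, sd, reliability):
    """Jacobson-Truax reliable change index for one respondent.

    The score difference is divided by the standard error of the
    difference, which is derived from the scale's standard deviation
    and a reliability coefficient.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    if not 0 <= reliability < 1:
        raise ValueError("reliability must be in [0, 1)")
    se_measurement = sd * math.sqrt(1 - reliability)
    se_difference = math.sqrt(2) * se_measurement
    return (followup - baseline) / se_difference


# Hypothetical psychometric values, for illustration only.
ASSUMED_SD = 7.0
ASSUMED_RELIABILITY = 0.80

rci = reliable_change_index(34, 11, ASSUMED_SD, ASSUMED_RELIABILITY)
is_reliable = abs(rci) > 1.96
print(f"RCI = {rci:.2f}, reliable change: {is_reliable}")
```

A negative index reflects a decrease in the score. With real data, the standard deviation and reliability should come from the normative sample that matches the child being assessed.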
A full clinical evaluation at that point suggested that Tamara had recovered from her depression and anxiety. Periodic follow-up checks were done to make sure that Tamara had maintained the gains from the therapeutic intervention. Six months after discontinuing intervention, a follow-up (fourth) administration of the CDI was given, and although the scores had increased slightly compared to the third administration, Tamara continued to show reasonably benign levels of depressive symptomatology.

Figure 9.1 shows portions of the report produced by the CDI software, which includes a graph of the four CDI administrations and a statistical assessment of the magnitude of the change that occurred over administrations. There was no significant difference between the two baseline administrations but, after treatment, scores were significantly lower than both of the baseline results. These findings strongly suggest the treatment was effective in dealing with Tamara's depression.

Conclusions

The National Institute of Mental Health (NIMH) has specified 11 criteria for evaluating outcome measures (Ciarlo, Brown, Edwards, Kiresuk, & Newman, 1986; Newman & Ciarlo, 1994). The CDI rates favorably with respect to each of these criteria (set in italic in the following paragraphs).

The CDI has been highly useful with various populations and in different settings. As described earlier, the CDI has been validated with the key target groups of nonreferred children as well as clinically depressed children. It has also been used with various other populations. As emphasized throughout this chapter, proper use of the CDI involves


Fig. 9.1. Portions of Sample Report from CDI software (based on hypothetical case of Tamara).


its integration with information from multiple informants and sources in order to make diagnostic and treatment decisions. An amendment to the CDI, currently in progress, includes the development of parallel forms that can be completed by parents and teachers. Preliminary versions of the CDI-Parent version (CDI-P; Kovacs, 1997a) and CDI-Teacher version (CDI-T; Kovacs, 1997b) are being pilot tested and standardized.

It has been demonstrated that the CDI has a high degree of utility in the area of clinical services and is compatible with a variety of clinical theories and practices. Its results can easily be translated so as to be appropriate and useful in clinical treatment strategies. The CDI also can be used to evaluate the effectiveness of such treatment strategies.

The CDI adheres to the NIMH criterion that an outcome measure be useful in identifying relevant changes in the client during the process of treatment, which can be "behavioral markers of progress or risk level" (Newman & Ciarlo, 1994, p. 102). Several strategies were described in this chapter for assessing the significance of client change in CDI scores during treatment.
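One of those strategies, the repeated measures t test on paired item responses, can be sketched as follows. This is an illustrative reconstruction and not the CDI software's own routine: the function name and the example responses are hypothetical, and items are assumed to be scored 0, 1, or 2.

```python
import math
from statistics import mean, stdev


def paired_item_t(baseline_items, followup_items):
    """Paired (repeated measures) t statistic across item responses.

    Each baseline item response is paired with the response to the same
    item at a later administration. Returns (t, degrees_of_freedom).
    """
    if len(baseline_items) != len(followup_items):
        raise ValueError("both administrations need the same items")
    diffs = [b - f for b, f in zip(baseline_items, followup_items)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two paired items are required")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("identical differences: t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical responses, shortened to four items for brevity.
t_value, df = paired_item_t([2, 2, 1, 2], [1, 0, 1, 1])
print(f"t({df}) = {t_value:.2f}")
```

The resulting statistic would be compared with the critical t value for the given degrees of freedom; a positive value here means that item scores were lower at the later administration.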


The psychometric strengths of the CDI are well established and documented by an abundance of research publications. Normative data, described in the CDI manual (Kovacs, 1992), provide clinicians with benchmarks that act as objective referents to be used in interpreting test results. The norms in the manual are based on a North American sample, but data from many other countries are also available. Furthermore, in keeping with American Psychological Association and Association of Test Publishers stipulations, the CDI has been validated for each of its proposed uses.

From a pragmatic perspective, the CDI is simple and easy to use; manuals and materials are available to facilitate proper administration, scoring, and interpretation. In addition, the CDI is extremely cost-efficient, and its results are both easy to relay and readily comprehensible by nonprofessional audiences.

For all of the previous reasons, the CDI deserves the worldwide attention it has received in a variety of studies and a wide range of contexts both as a research and a clinical tool. Its adherence to NIMH standards for assessment instruments also supports its suitability for monitoring treatment and assessing outcome.

Acknowledgments

The authors wish to express their appreciation to Joanne Morrison and Karen Hirscheimer for their help gathering the information for this chapter.

References

Abdel-Khalek, A.M. (1993). The construction and validation of the Arabic Children's Depression Inventory. European Journal of Psychological Assessment, 9, 41-50.
Abdel-Khalek, A.M. (1996). Factorial structure of the Arabic Children's Depression Inventory among Kuwaiti subjects. Psychological Reports, 78, 963-967.
Albert, N., & Beck, A.T. (1975). Incidence of depression in early adolescence: A preliminary study. Journal of Youth and Adolescence, 4, 301-307.
Allen, D.M., & Tarnowski, K.J. (1989). Depressive characteristics of physically abused children. Journal of Abnormal Child Psychology, 17, 1-11.
American Psychiatric Association (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.
American Psychiatric Association (1985). Diagnostic and statistical manual of mental disorders (3rd rev. ed.). Washington, DC: Author.
American Psychiatric Association (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
American Psychological Association (1985). Standards for educational and psychological testing. Washington, DC: Author.
Armsden, G.C., McCauley, E., Greenberg, M.T., Burke, P.M., & Mitchell, J.R. (1990). Parent and peer attachment in early adolescent depression. Journal of Abnormal Child Psychology, 18, 683-697.
Arnarson, E.O., Smari, J., Einarsdottir, H., & Jonasdottir, E. (1994). The prevalence of depressive symptoms in pre-adolescent school children in Iceland. Scandinavian Journal of Behaviour Therapy, 23, 121-130.
Asarnow, J.R., & Carlson, G.A. (1985). Depression Self-Rating Scale: Utility with child psychiatric inpatients. Journal of Consulting and Clinical Psychology, 53, 491-499.
Barreto, S.J. (1994). Understanding the Children's Depression Inventory (CDI): A critical review. Child Assessment News, 3, 3-5.
Bartell, N.P., & Reynolds, W.M. (1986). Depression and self-esteem in academically gifted and nongifted children: A comparison study. Journal of School Psychology, 24, 55-61.


Beck, A.T. (1967). Depression: Clinical, experimental, and theoretical aspects. New York: Harper & Row.
Beitchman, J.H. (1996). Feelings, Attitudes, and Behaviors Scale for Children (FAB-C). Toronto, ON: Multi-Health Systems.
Benavidez, D.A., & Matson, J.L. (1993). Assessment of depression in mentally retarded adolescents. Research in Developmental Disabilities, 14, 179-188.
Berndt, D.J., Schwartz, S., & Kaiser, C.F. (1983). Readability of self-report depression inventories. Journal of Consulting and Clinical Psychology, 51, 627-628.
Blumberg, S.H., & Izard, C.E. (1986). Discriminating patterns of emotions in 10- and 11-year-old children's anxiety and depression. Journal of Personality and Social Psychology, 51, 852-857.
Bodiford, C.A., Eisenstadt, T.H., Johnson, J.H., & Bradlyn, A.S. (1988). Comparison of learned helpless cognitions and behavior in children with high and low scores on the Children's Depression Inventory. Journal of Clinical Child Psychology, 17, 152-158.
Breen, M.P., & Weinberger, D.A. (1995). Regulation of depressive affect and interpersonal behavior among children requiring residential or day treatment. Development and Psychopathology, 7, 529-541.
Butcher, J.N., Dahlstrom, W.G., Graham, J.R., Tellegen, A.M., & Kaemmer, B. (1989). Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring. Minneapolis, MN: University of Minnesota Press.
Campbell, D., & Fiske, D. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Canals, J., Henneberg, C., Fernandez-Ballart, J., & Domenech, E. (1995). A longitudinal study of depression in an urban Spanish pubertal population. European Child and Adolescent Psychiatry, 4, 102-111.
Carey, M.P., Faulstich, M.E., Gresham, F.M., Ruggiero, L., & Enyart, P. (1987). Children's Depression Inventory: Construct and discriminant validity across clinical and nonreferred (control) populations. Special Issue: Eating disorders. Journal of Consulting and Clinical Psychology, 55, 755-761.
Chall, J.S., & Dale, E. (1995). Readability revisited: The new Dale-Chall readability formula. Cambridge, MA: Brookline.
Chartier, G.M., & Lassen, M.K. (1994). Adolescent depression: Children's Depression Inventory norms, suicidal ideation, and (weak) gender effects. Adolescence, 29, 859-864.
Ciarlo, J.A., Brown, T.R., Edwards, D.W., Kiresuk, T.J., & Newman, F.L. (1986). Assessing mental health treatment outcome measurement techniques (DHHS Publication No. ADM 86-1301). Washington, DC: U.S. Government Printing Office.
Conners, C.K. (1994). Conners Rating Scales. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 550-578). Hillsdale, NJ: Lawrence Erlbaum Associates.
Conners, C.K. (1997). Conners' Rating Scales-Revised: Technical manual. Toronto, Ontario: Multi-Health Systems.
Craighead, W.E., Curry, J.F., & Ilardi, S.S. (1995). Relationship of Children's Depression Inventory factors to major depression among adolescents. Psychological Assessment, 7, 171-176.
Curry, J.F., & Craighead, W.E. (1990). Attributional style and self-reported depression among adolescent inpatients. Child and Family Behavior Therapy, 12, 89-93.
Dale, E., & Chall, J.S. (1948). A formula for predicting readability. Columbia, OH: Ohio State University Bureau of Educational Research.
Devine, D., Kempton, T., & Forehand, R. (1994). Adolescent depressed mood and young adult functioning: A longitudinal study. Journal of Abnormal Child Psychology, 22, 629-640.
DuBois, D.L., Felner, R.D., Bartels, C.L., & Silverman, M.M. (1995). Stability of self-reported depressive symptoms in a community sample of children and adolescents. Journal of Clinical Child Psychology, 24, 386-396.
DuRant, R.H., Getts, A., Cadenhead, C., Emans, S.J., & Woods, E.R. (1995). Exposure to violence and victimization and depression, hopelessness, and purpose in life among adolescents living in and around public housing. Journal of Developmental and Behavioral Pediatrics, 16, 233-237.


Eason, L.J., Finch, A.J., Jr., Brasted, W., & Saylor, C.F. (1985). The assessment of depression and anxiety in hospitalized pediatric patients. Child Psychiatry and Human Development, 16, 57-64.
Elliott, D.J., & Tarnowski, K.J. (1990). Depressive characteristics of sexually abused children. Child Psychiatry and Human Development, 21, 37-48.
Fauber, R., Forehand, R., Long, N., Burke, M., & Faust, J. (1987). The relationship of young adolescent Children's Depression Inventory (CDI) scores to their social and cognitive functioning. Journal of Psychopathology and Behavioral Assessment, 9, 161-172.
Faulstich, M.E., Carey, M.P., Ruggiero, L., Enyart, P., & Gresham, F. (1986). Assessment of depression in childhood and adolescence: An evaluation of the Center for Epidemiological Studies Depression Scale for Children (CES-DC). American Journal of Psychiatry, 143, 1024-1027.
Felner, R.D., Rowlison, R.T., Raley, P.A., & Evans, E. (1988). Depression in children and adolescents: A comparative analysis of the utility and construct validity of two assessment measures. Journal of Consulting and Clinical Psychology, 56, 769-772.
Finch, A.J., Saylor, C.F., Edwards, G.L., & McIntosh, J.A. (1987). Children's Depression Inventory: Reliability over repeated administrations. Journal of Clinical Child Psychology, 16, 339-341.
Fine, S., Moretti, M., Haley, G., & Marriage, K. (1985). Affective disorders in children and adolescents: The dysthymic disorder dilemma. Canadian Journal of Psychiatry, 30, 173-177.
Fitzpatrick, K.M. (1993). Exposure to violence and presence of depression among low-income, African-American youth. Journal of Consulting and Clinical Psychology, 61, 528-531.
Frias, D., Mestre, V., del Barrio, V., & Garcia-Ros, R. (1992). Estructura familiar y depresion infantil [Family structure and childhood depression]. Anuario-de-Psicologia, 52, 121-131.
Friedman, R.J., & Butler, L.F. (1979). Development and evaluation of a test battery to assess childhood depression. Final report to Health and Welfare, Canada, for Project No. 606-1533-44.
Fristad, M.A., Emery, B.L., & Beck, S.J. (1997). Use and abuse of the Children's Depression Inventory. Journal of Consulting and Clinical Psychology, 65, 699-702.
Fristad, M.A., Weller, E.B., Weller, R.A., Teare, M., & Preskorn, S.H. (1988). Self-report vs. biological markers in assessment of childhood depression. Journal of Affective Disorders, 15, 339-345.
Garvin, V., Leber, D., & Kalter, N. (1991). Children of divorce: Predictors of change following preventive intervention. American Journal of Orthopsychiatry, 61, 438-447.
Ghareeb, G.A., & Beshai, J.A. (1989). Arabic version of the Children's Depression Inventory: Reliability and validity. Journal of Clinical Child Psychology, 18, 323-326.
Gladstone, T.R.G., & Kaslow, N.J. (1995). Depression and attributions in children and adolescents: A meta-analytic review. Journal of Abnormal Child Psychology, 23, 597-606.
Goldstein, D., Paul, G.G., & Sanfilippo-Cohn, S. (1985). Depression and achievement in subgroups of children with learning disabilities. Journal of Applied Developmental Psychology, 6, 263-275.
Gouveia, V.V., Barbosa, G.A., de Almeida, H.J.F., & de Andrade-Gaiao, A. (1995). Inventario de depressao infantil-CDI: Estudo de adaptacao com escolares de Joao Pessoa [Children's Depression Inventory-CDI: Adaptation study with students of Joao Pessoa]. Jornal Brasileiro de Psiquiatria, 44, 345-349.
Haley, G.M.T., Fine, S., Marriage, K., Moretti, M.M., & Freeman, R.J. (1985). Cognitive bias and depression in psychiatrically disturbed children and adolescents. Journal of Consulting and Clinical Psychology, 53, 535-537.
Hammen, C., Adrian, C., Gordon, D., Burge, D., Jaenicke, C., & Hiroto, D. (1987). Children of depressed mothers: Maternal strain and symptom predictors of dysfunction. Journal of Abnormal Psychology, 96, 190-198.
Hammen, C., Adrian, C., & Hiroto, D. (1988). A longitudinal test of the attributional vulnerability model in children at risk for depression. British Journal of Clinical Psychology, 27, 37-46.
Harrison, C. (1980). Readability in the classroom. Cambridge, England: Cambridge University Press.


Helsel, W.J., & Matson, J.L. (1984). The assessment of depression in children: The internal structure of the Child Depression Inventory (CDI). Behaviour Research and Therapy, 22, 289-298.
Hepperlin, C.M., Stewart, G.W., & Rey, J.M. (1990). Extraction of depression scores in adolescents from a general-purpose behaviour checklist. Journal of Affective Disorders, 18, 105-112.
Hodges, K. (1990). Depression and anxiety in children: A comparison of self-report questionnaires to clinical interview. Psychological Assessment, 2, 376-381.
Hodges, K., & Craighead, W.E. (1990). Relationship of Children's Depression Inventory factors to diagnosed depression. Psychological Assessment, 2, 489-492.
Howard, K.I., Kopta, S.M., Krause, M.S., & Orlinsky, D.E. (1986). The dose-effect relationship in psychotherapy. American Psychologist, 41, 159-164.
Huddleston, E.N., & Rust, J.O. (1994). A comparison of child and parent ratings of depression and anxiety in clinically referred children. Research Communications in Psychology, Psychiatry and Behavior, 19, 101-112.
Ines, T.M., & Sacco, W.P. (1992). Factors related to correspondence between teacher ratings of elementary student depression and student self-ratings. Journal of Consulting and Clinical Psychology, 60, 140-142.
Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
Jensen, P.S., Bloedau, L., Degroot, J., Ussery, T., & Davis, H. (1990). Children at risk: I. Risk factors and child symptomatology. Journal of the American Academy of Child and Adolescent Psychiatry, 29, 51-59.
Joiner, T.E., Schmidt, K.L., & Schmidt, N.B. (1996). Low-end specificity of childhood measures of emotional distress: Differential effects for depression and anxiety. Journal of Personality Assessment, 67, 258-271.
Kashani, J.H., & Carlson, G.A. (1985). Major depressive disorder in a preschooler. Journal of the American Academy of Child Psychiatry, 24, 490-494.
Kashani, J.H., Husain, A., Shekim, W.O., Hodges, K.K., Cytryn, L., & McKnew, D.H. (1981). Current perspective on childhood depression: An overview. American Journal of Psychiatry, 138, 143-153.
Kaslow, N.J., Rehm, L.P., & Siegel, A.W. (1984). Social-cognitive and cognitive correlates of depression in children. Journal of Abnormal Child Psychology, 12, 605-620.
Kazdin, A.E., & Petti, T.A. (1982). Self-report and interview measures of childhood and adolescent depression. Journal of Child Psychology and Psychiatry, 23, 437-457.
Kazdin, A.E., Colbus, D., & Rodgers, A. (1986). Assessment of depression and diagnosis of depressive disorder among psychiatrically disturbed children. Journal of Abnormal Child Psychology, 14, 499-515.
Kazdin, A.E., Esveldt-Dawson, K., Unis, A.S., & Rancurello, M.D. (1983). Child and parent evaluations of depression and aggression in psychiatric inpatient children. Journal of Abnormal Child Psychology, 11, 401-413.
Kazdin, A.E., French, N.H., Unis, A.S., & Esveldt-Dawson, K. (1983). Assessment of childhood depression: Correspondence of child and parent ratings. Journal of the American Academy of Child Psychiatry, 22, 157-164.
Kazdin, A.E., French, N.H., Unis, A.S., Esveldt-Dawson, K., & Sherick, R.B. (1983). Hopelessness, depression, and suicidal intent among psychiatrically disturbed inpatient children. Journal of Consulting and Clinical Psychology, 51, 504-510.
Kazdin, A.E., Rodgers, A., & Colbus, D. (1986). The Hopelessness Scale for Children: Psychometric characteristics and concurrent validity. Journal of Consulting and Clinical Psychology, 54, 241-245.
Knight, D., Hensley, V.R., & Waters, B. (1988). Validation of the Children's Depression Scale and the Children's Depression Inventory in a prepubertal sample. Journal of Child Psychology and Psychiatry, 29, 853-863.
Koizumi, S. (1991). The standardization of Children's Depression Inventory. Syoni Hoken Kenkyu, 50, 717-721.
Kovacs, M. (1985). The Children's Depression Inventory. Psychopharmacology Bulletin, 21, 995-998.
Kovacs, M. (1992). The Children's Depression Inventory (CDI) manual. Toronto, ON: Multi-Health Systems.


Kovacs, M. (1995). The Children's Depression Inventory (CDI) software manual. Toronto, ON: Multi-Health Systems. Kovacs, M. (1996a). The course of childhood-onset depressive disorders. Psychiatric Annals, 26, 326-330. Kovacs, M. (1996b). Presentation and course of major depressive disorder during childhood and later years of the life span. Journal of the American Academy of Child and Adolescent Psychiatry, 35, 705-715. Kovacs, M. (1997a). The Children's Depression Inventory Parent Version (CDI-P). Toronto, ON: Multi-Health Systems Inc. Kovacs, M. (1997b). The Children's Depression Inventory Teacher Version (CDI-T). Toronto, ON: Multi-Health Systems. Kovacs, M., Akiskal, H.S., Gatsonis, C., & Parrone, P.L. (1994). Childhood onset dysthymic disorder: Clinical features and prospective naturalistic outcome. Archives of General Psychiatry, 51, 365-374. Kovacs, M., & Beck, A.T. (1977). An empirical-clinical approach toward a definition of childhood depression. In J.G. Schulterbrandt & A. Raskin (Eds.), Depression in childhood: Diagnosis, treatment, and conceptual models (pp. 1-25). New York: Raven. Kovacs, M., Gatsonis, C., Paulauskas, S.L., & Richards, C. (1989). Depressive disorders in childhood: IV. A longitudinal study of comorbidity with and risk for anxiety disorders. Archives of General Psychiatry, 46, 776-782. Kovacs, M., & Goldston, D. (1991). Cognitive and social cognitive development of depressed children and adolescents. Journal of the American Academy of Child and Adolescent Psychiatry, 30, 388-392. Kovacs, M., Iyengar, S., Stewart, J., Obrosky, S., & Marsh, J. (1990). Psychological functioning of children with insulin-dependent diabetes mellitus: A longitudinal study. Journal of the American Academy of Child and Adolescent Psychiatry, 30, 388-392. Kovacs, M., Obrosky, S., Gatsonis, C., & Richards, C. (1997). First-episode major depressive and dysthymic disorder in childhood: Clinical and sociodemographic factors in recovery. 
Journal of the American Academy of Child and Adolescent Psychiatry, 36, 777-784. Kuttner, M.J., Delamater, A.M., & Santiago, J.V. (1989). Learned helplessness in diabetic youths. Journal of Pediatric Psychology, 15, 581-594. Lachar, D., & Gdowski, C.L. (1979). Actuarial assessment of child and adolescent personality: An interpretive guide for the Personality Inventory for Children Profile. Los Angeles, CA: Western Psychological Services. Lipovsky, J.A., Finch, A.J., & Belter, R.W. (1989). Assessment of depression in adolescents: Objective and projective measures. Journal of Personality Assessment, 53, 449-458. Lobert, W. (1989). Untersuchung von Merkmalen depressiver Verstimmung in der Pubertat mit dem Kinder-Depressions-Inventar nach Kovacs [Investigation of symptoms of depressive moodiness during puberty with the Children's Depression Inventory according to Kovacs]. Zeitschrift fur Kinder und Jugendpsychiatrie, 17, 194-201. Lobert, W. (1990). Untersuchung zur Struktur der depressiven Verstimmung in der Pubertat mit dem GCDI (German Children's Depression Inventory) [Investigation of the structure of depressive moodiness during puberty with the GCDI (German Children's Depression Inventory)]. Zeitschrift fur Kinder und Jugendpsychiatrie, 18, 18-22. Lobovits, D.A., & Handal, P.J. (1985). Childhood depression: Prevalence using DSM-III criteria and validity of parent and child depression scales. Journal of Pediatric Psychology, 10, 45-54. Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley. March, J.S. (1997). The Multidimensional Anxiety Scale for Children (MASC). Toronto, ON: Multi-Health Systems. Marciano, P.L., & Kazdin, A.E. (1994). Self-esteem, depression, hopelessness, and suicidal intent among psychiatrically disturbed inpatient children. Journal of Clinical Child Psychology, 23, 151-160. Marriage, K., Fine, S., Moretti, M., & Haley, G. (1986). 
Relationship between depression and conduct disorder in children and adolescents. Journal of the American Academy of Child Psychiatry, 25, 687-691. Mattison, R.E., Handford, H.A., Kales, H.C., Goodman, A.L., & McLaughlin, R.E. (1990). Four-year predictive value of the Children's Depression Inventory. Psychological Assessment, 2, 169-174.


McCauley, E., Mitchell, J.R., Burke, P., & Moss, S. (1988). Cognitive attributes of depression in children and adolescents. Journal of Consulting and Clinical Psychology, 56, 903-908. Meins, W. (1993). Assessment of depression in mentally retarded adults: Reliability and validity of the Children's Depression Inventory (CDI). Research in Developmental Disabilities, 14, 299-312. Mestre, V., Frias, D., & Garcia-Ros, R. (1992). Propiedades psicometricas del Children's Depression Inventory (CDI) en poblacion adolescente: Fiabilidad y validez [Psychometric properties of the Children's Depression Inventory (CDI) in the adolescent population: Reliability and validity]. Psicologica, 13, 149-159. Meyer, N.E., Dyck, D.G., & Petrinack, R.J. (1989). Cognitive appraisal and attributional correlates of depressive symptoms in children. Journal of Abnormal Child Psychology, 17, 325-336. Milich, R., Roberts, M.A., Loney, J., & Caputo, J. (1980). Differentiating practice effects and statistical regression on the Conners' Hyperkinesis Index. Journal of Abnormal Psychology, 8, 549-552. Milne, J., & Spence, S.H. (1987). Training social perception skills with primary school children: A cautionary note. Behavioural Psychotherapy, 15, 144-157. Moretti, M.M., Fine, S., Haley, G., & Marriage, K. (1985). Childhood and adolescent depression: Child-report versus parent-report information. Journal of the American Academy of Child Psychiatry, 24, 298-302. Nieminen, G.S., & Matson, J.L. (1989). Depressive problems in conduct-disordered adolescents. Journal of School Psychology, 27, 175-188. Nelson, W.M., & Politano, P.D. (1990). Children's Depression Inventory: Stability over repeated administrations in psychiatric inpatient children. Journal of Clinical Child Psychiatry, 19, 254-256. Nelson, W.M., Politano, P.M., Finch, A.J., Wendel, N., & Mayhall, C. (1987). Children's Depression Inventory: Normative data and utility with emotionally disturbed children. 
Journal of the American Academy of Child and Adolescent Psychiatry, 26, 43-48. Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates. Nolen-Hoeksema, S., Girgus, J.S., & Seligman, M.E.P. (1986). Learned helplessness in children: A longitudinal study of depression, achievement, and explanatory style. Journal of Personality and Social Psychology, 51, 435-442. Norvell, N., Brophy, C., & Finch, A.J. (1985). The relationship of anxiety to childhood depression. Journal of Personality Assessment, 49, 150-153. Ollendick, T.H., & Yule, W. (1990). Depression in British and American children and its relation to anxiety and fear. Journal of Consulting and Clinical Psychology, 58, 126-129. Oy, B. (1991). Children's Depression Inventory: A study of reliability and validity. Turk Psikiyatri Dergisi, 2, 132-136. Polaino-Lorente, A., & del-Pozo-Armentia, A. (1992). Modification de la depresion mediante un programa de intervencion psicopedagogica en ninos cancerosos no hospitalizados [Modification of the depression through a psychopedagogic intervention program in childhood cancer hospitalization]. Analisis y Modificacion de Conducta, 18, 493-503. Polaino-Lorente, A., & Domenech, E. (1993). Prevalence of childhood depression: Results of the first study in Spain. Journal of Child Psychology and Psychiatry and Allied Disciplines, 34, 1007-1017. Politano, P.M., Nelson, W.M., Evans, H.E., Sorenson, S.B., & Zeman, D.J. (1985). Factor analytic evaluation of differences between Black and Caucasian emotionally disturbed children on the Children's Depression Inventory. Journal of Psychopathology and Behavioral Assessment, 8, 1-7. Pons-Salvador, G., & del Barrio, V. (1993). Depresion infantil y divorcio [Child depression and divorce]. 
Avances en Psicologia Clinica Latinoamericana, 11, 95-106. Ponterotto, J.G., Pace, T.M., & Kavan, M.G. (1989). A counselor's guide to the assessment of depression. Journal of Counseling and Development, 67, 301-309.


Preskorn, S.H., Weller, E.B., Hughes, C.W., Weller, R.A., & Bolte, K. (1987). Depression in prepubertal children: Dexamethasone nonsuppression predicts differential response to imipramine vs. placebo. Psychopharmacology Bulletin, 23, 128-133. Puig-Antich, J. (1982). Major depression and conduct disorder in prepuberty. Journal of the American Academy of Child Psychiatry, 21, 118-128. Reicher, H., & Rossmann, P. (1991). Zu den psychometrischen Eigenschaften einer deutschen Version des Children's Depression Inventory [The psychometric properties of a German version of the Children's Depression Inventory]. Diagnostica, 37, 236-251. Reinhard, H.G., Bowi, U., & Rulcovius, G. (1990). Stabilitat, Reliabilitat und Faktorenstruktur einer deutschen Fassung des Children's Depression Inventory [Reliability, stability, and factor structure of a German version of the Children's Depression Inventory]. Zeitschrift fur Kinder und Jugendpsychiatrie, 18, 185-191. Reinherz, H.Z., Frost, A.K., & Pakiz, B. (1991). Changing faces: Correlates of depressive symptoms in late adolescence. Family and Community Health, 14, 52-63. Renouf, A.G., & Kovacs, M. (1994). Concordance between mothers' reports and children's self-reports of depressive symptoms: A longitudinal study. Journal of the American Academy of Child and Adolescent Psychiatry, 33, 208-216. Reynolds, C.R., & Richmond, B.O. (1985). Revised Children's Manifest Anxiety Scale manual. Los Angeles: Western Psychological Services. Reynolds, W.M., Anderson, G., & Bartell, N. (1985). Measuring depression in children: A multimethod assessment investigation. Journal of Abnormal Child Psychology, 13, 513-526. Rotundo, N., & Hensley, V.R. (1985). The Children's Depression Scale. A study of its validity. Journal of Child Psychology and Psychiatry, 26, 917-927. Sacco, W.P., & Graves, D.J. (1985). Correspondence between teacher ratings of childhood depression and child self-ratings. Journal of Clinical Child Psychology, 14, 353-355. 
Saint-Laurent, L. (1990). Psychometric study of Kovacs's Children's Depression Inventory with a French-speaking sample. Canadian Journal of Behavioural Science, 22, 377-384. Sakurai, S. (1991). The relation between depression and causal attributional style in Japanese children. Japanese Journal of Health Psychology, 4, 23-30. Saylor, C.F., Finch, A.J., Baskin, C.H., Furey, W., & Kelly, M.M. (1984a). Construct validity for measures of childhood depression: Application of multitrait-multimethod methodology. Journal of Consulting and Clinical Psychology, 52, 977-985. Saylor, C.F., Finch, A.J., Jr., Baskin, C.H., Saylor, C.B., Darnell, G., & Furey, W. (1984b). Children's Depression Inventory: Investigation of procedures and correlates. Journal of the American Academy of Child Psychiatry, 23, 626-628. Saylor, C.F., Finch, A.J., Spirito, A., & Bennett, B. (1984c). The Children's Depression Inventory: A systematic evaluation of psychometric properties. Journal of Consulting and Clinical Psychology, 52, 955-967. Seligman, M.E.P., Peterson, C., Kaslow, N.J., Tanenbaum, R.L., Alloy, L.B., & Abramson, L.Y. (1984). Attributional style and depressive symptoms among children. Journal of Abnormal Psychology, 93, 235-238. Shain, B.N., Naylor, M., & Alessi, N. (1990). Comparison of self-rated and clinician-rated measures of depression in adolescents. American Journal of Psychiatry, 147, 793-795. Shah, F., & Morgan, S.B. (1996). Teacher's ratings of social competence of children with high versus low levels of depressive symptoms. Journal of School Psychology, 34, 337-349. Siegel, K., Karus, D., & Raveis, V.H. (1996). Adjustment of children facing the death of a parent due to cancer. Journal of the American Academy of Child and Adolescent Psychiatry, 35, 442-450. Slotkin, J., Forehand, R., Fauber, R., McCombs, A., & Long, N. (1988). Parent-completed and adolescent-completed CDIs: Relationship to adolescent social and cognitive functioning. 
Journal of Abnormal Child Psychology, 16, 207-217. Speer, D.C., & Greenbaum, P.E. (1995). Five methods for computing significant individual client change and improvement rates: Support for an individual growth curve approach.


Journal of Consulting and Clinical Psychology, 63, 1044-1048. Spence, S.H., & Milne, J. (1987). The Children's Depression Inventory: Norms and factor analysis from an Australian school population. Australian Psychologist, 22, 345-351. Spirito, A., Overholser, J., & Hart, K. (1991). Cognitive characteristics of adolescent suicide attempters. Journal of the American Academy of Child and Adolescent Psychiatry, 30, 604-608. Stark, K.D., Kaslow, N.J., & Laurent, J. (1993). The assessment of depression in children: Are we assessing depression or the broad-band construct of negative affectivity? Journal of Emotional and Behavioral Disorders, 1, 149-154. Stavrakaki, C., Williams, E.C., Walker, S., Roberts, N., & Kotsopoulos, S. (1991). Pilot study of anxiety and depression in prepubertal children. Canadian Journal of Psychiatry, 36, 332-338. Stiensmeier-Pelster, J., Schurmann, M., & Duda, K. (1991). Das Depressionsinventar fur Kinder und Jugendliche (DIKJ): Untersuchungen zu seinen psychometrischen Eigenschaften [The psychometric properties of the German version of the Children's Depression Inventory]. Diagnostica, 37, 149-159. Stiensmeier-Pelster, J., Schurmann, M., & Urhahne, D. (1991). Das Depressionsinventar fur Kinder und Jugendliche (DIKJ): Einsetzbarkeit in der Primarstufe [The Depression Inventory for Children and Adolescents (DICA): Its applicability on the elementary school level]. Zeitschrift fur Entwicklungspsychologie und Padagogische Psychologie, 23, 171-176. Stocker, C.M. (1994). Children's perceptions of relationships with siblings, friends, and mothers: Compensatory processes and links with adjustment. Journal of Child Psychology and Psychiatry and Allied Disciplines, 35, 1447-1459. Strober, S., & Carlson, G. (1982). Bipolar illness in adolescents with major depression. Clinical, genetic, and psychopharmacologic predictors in a 3- to 4-year prospective follow-up investigation. Archives of General Psychiatry, 39, 549-555. Weiss, B., & Weisz, J.R. (1988). 
Factor structure of self-reported depression: Clinic-referred children versus adolescents. Journal of Abnormal Psychology, 97, 492-495. Weiss, B., Weisz, J.R., Politano, M., Carey, M., Nelson, W.M., & Finch, A.J. (1991). Developmental differences in the factor structure of the Children's Depression Inventory. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3, 38-45. Weissman, M.M., Orvaschel, H., & Padian, N. (1980). Children's symptom and social functioning self-report scales: Comparison of mothers' and children's reports. Journal of Nervous and Mental Diseases, 168, 736-740. Wolfe, V.V., Finch, A.J., Jr., Saylor, C., Blount, R.L., Pallmeyer, T.P., & Carek, D.J. (1987). Negative affectivity in children: A multitrait-multimethod investigation. Journal of Consulting and Clinical Psychology, 55, 245-250. Worchel, F.F., Hughes, J.N., Hall, B.M., Stanton, S.B., Stanton, H., & Little, V.Z. (1990). Evaluation of subclinical depression in children using self-, peer-, and teacher-report measures. Journal of Abnormal Child Psychology, 18, 271-282. Worchel, F., Nolan, B., & Willson, V. (1987). New perspectives on child and adolescent depression. Journal of School Psychology, 25, 411-414. Zivcic, I. (1993). Emotional reactions of children to war stress in Croatia. Journal of the American Academy of Child and Adolescent Psychiatry, 32, 709-713.


Chapter 10
The Multidimensional Anxiety Scale for Children (MASC)

John S. March
Duke University Medical Center

James D.A. Parker
Trent University

Presumably because pathological anxiety is associated with significant suffering, disruption in normal psychosocial and academic development and family functioning, and increased utilization of medical services, "worry" is among the more common reasons for referral to children's mental health care providers (Black, 1995; Simon, Ormel, VonKorff, & Barlow, 1995). Unfortunately, the population prevalence of childhood-onset fears, the structure of anxiety symptoms in the general pediatric population, and the relative importance of specific anxiety dimensions within gender, ethnic, or cultural groupings across time have, until recently, remained unclear. This is in part because of a lack of acceptable measurement tools (Costello & Angold, 1995; Last, Perrin, Hersen, & Kazdin, 1992).

Ideally, instruments intended to assess anxiety in pediatric patients should provide reliable and valid ascertainment of symptoms across multiple symptom domains; discriminate symptom clusters; differentiate normal from pathological anxiety both qualitatively and quantitatively; incorporate and reconcile multiple observations, such as parent and child ratings; and be sensitive to treatment-induced change in symptoms. Other factors that may influence instrument selection include the reasons for the assessment (screening, diagnosis, or monitoring treatment outcome, for example) as well as time required for administration, level of training necessary to administer and/or interpret the instrument, reading level, and cost. Finally, with increasing emphasis on multidisciplinary approaches to assessment and treatment, assessment tools must facilitate communication, not only among clinicians, but also between clinicians and regulatory bodies, such as utilization review committees within managed care environments. 
Although currently available instruments fall well short of these goals, a complex matrix of tools for assessing normal and pathological fears is now available (March & Albano, 1996). This chapter describes one such instrument, the Multidimensional Anxiety Scale for Children (MASC), which was designed to address the multidimensional assessment of anxiety in children and adolescents in a psychometrically rigorous fashion (March, Parker, Sullivan, Stallings, & Conners, 1997). Excellent reviews of pediatric anxiety disorders in general (March, 1995), and assessment issues in particular (Albano,


1991; March & Albano, 1998) are available. The interested reader is referred to these and other recent articles for a detailed discussion of assessment issues (March & Albano, 1996; McNally, 1991; Thyer, 1991; R.P. Wolff & L.S. Wolff, 1991), neuropsychological assessment (Conners, March, Erhardt, & Butcher, 1995; Hooper & March, 1995), structured interviews (Kearney & Silverman, 1990; Silverman, 1991, 1994), developmental classification (Cicchetti & Cohen, 1995; Costello & Angold, 1993), observational measures (Dadds, Rapee, & Barrett, 1994; Steketee, Chambless, Tran, Worden, & Gillis, 1996), and family assessment (Dadds, Barrett, Rapee, & Ryan, 1996).

Overview of the MASC

Background

Instruments designed specifically to address anxiety in children and adolescents are required for several reasons. First, children appear to undergo a developmentally sanctioned progression in anxiety symptoms (Keller et al., 1992; Last, Strauss, & Francis, 1987). Second, their day-to-day environments differ from those most typically experienced by adults, so that the presentation of anxiety also differs, as in "school phobia." Third, to differentiate normal from pathological anxiety, gender and age norms are necessary. Finally, some fears may be viewed as adaptive or protective; only when anxiety is excessive or the context is developmentally inappropriate does anxiety become clinically significant (Marks, 1987). Other fears, such as those seen in obsessive-compulsive disorder (OCD), are developmentally inappropriate under many, if not all, circumstances (Leonard, Goldberger, Rapoport, Cheslow, & Swedo, 1990). Thus, clinicians and researchers interested in childhood anxiety disorders face the challenging task of differentiating pathological anxiety from fears occurring as a part of normal developmental processes. The DSM-III-R addressed this nosological conundrum by introducing a subclass of anxiety disorders of childhood and adolescence (American Psychiatric Association, 1987). 
The DSM-IV both refined these constructs and established a greater degree of continuity (developmental and nosological) with the adult anxiety disorders (American Psychiatric Association, 1994). The DSM taxonomy in essence reflects an expert consensus regarding the actual clustering of anxiety in pediatric samples (Shaffer, Campbell, Cantwell, & Bradley, 1989), but empirical support in some cases is questionable (Beidel, 1991), and the hypothesis has never been tested. Some anxiety symptoms, such as refusing to attend school in the patient with panic disorder and agoraphobia, are readily observable; other symptoms are open only to child introspection and thus to child self-report. For this and other reasons, self-report measures of anxiety, which provide an opportunity for children to reveal their internal or "hidden" experience, have found wide application in both clinical and research settings. Typically, self-report measures use a Likert scale format in which a child is asked to rate each questionnaire item using either a frequency or intensity format. For example, a child might be asked to rate "I feel tense" on a 4-point frequency scale that ranges from "almost never" to "often." Self-report measures are easy to administer, require a minimum of clinician time, and economically capture a wide range of important anxiety dimensions from the child's point of view. Taken together, these features make self-report measures ideally suited to gathering data prior to the initial evaluation, because self-report measures used in this fashion increase clinician efficiency by facilitating


accurate assessment of the prior probability that a particular child will or will not have symptoms within a specific symptom domain. For the most part, available self-report rating scales for assessing pediatric anxiety have, until now, represented age-downward extensions of adult measures that fail to capture or adequately operationalize important dimensions of anxiety in young persons (March & Albano, 1996). Three commonly cited instruments have been in use for over 20 years. The Fear Survey Schedule for Children-Revised (FSSC-R) focuses primarily on phobic symptoms, including fear of failure and criticism, fear of the unknown, fear of injury and small animals, fear of danger and death, and medical fears. The Revised Children's Manifest Anxiety Scale (RCMAS) provides three factors: physiological manifestations of anxiety, worry and oversensitivity, and fear/concentration (Reynolds & Paget, 1981). However, the presence of mood, attentional, impulsivity, and peer interaction items on the RCMAS clearly confounds anxiety with other diagnoses, such as attention deficit hyperactivity disorder (ADHD) and major depression (Perrin & Last, 1992). Another widely used measure, the State-Trait Anxiety Inventory for Children (STAIC) (Spielberger, Gorsuch, & Luchene, 1976), consists of two independent 20-item inventories that assess anxiety symptoms from a variety of domains, but that do not exhaustively cover the symptom constellations represented in DSM-IV. The State scale purports to assess present-state and situation-linked anxiety; the Trait scale addresses temporally stable anxiety across situations. Numerous authors have questioned the validity of the state-trait distinction (Kendall, Finch, Auerbach, Hooke, & Mikulka, 1976), and the nature of item selection for the STAIC (Finch, Kendall, & Montgomery, 1976; Perrin & Last, 1992). 
Table 10.1 contrasts these older measures with the MASC with respect to construct validity, applicability to DSM-IV, reliability, and convergent and divergent validity. Thus, the MASC was developed within the context of a broad agreement by clinicians and researchers that new instruments were necessary if the field of pediatric anxiety disorders was to progress scientifically (see, e.g., Jensen, Salzberg, Richters, & Watanabe, 1993; March & Albano, 1996).

Characteristics of the MASC

The MASC is a 39-item, Likert-style self-report measure that was developed to index a wide range of anxiety symptoms in elementary, junior high, and high school youngsters from age 8 to 19. As shown in Table 10.2, the MASC has four main factors, three of which can be further divided into two subfactors. Taken together, these factors and subfactors capture the central constructs of pediatric anxiety as they emerge in both population and clinical samples.

TABLE 10.1
Anxiety Rating Scales

                          MASC   RCMAS     FSSC-R    STAIC
Broad Conceptualization   Yes    Yes       No        Yes
Specific Dimensions       Yes    Partial   Phobias   No
Matches DSM-IV            Yes    No        No        No
Reliable                  Yes    Yes       Yes       Trait Scale
Convergent Validity       Yes    Yes       Yes       Yes
Divergent Validity        Yes    No        No        No


TABLE 10.2
MASC Factors and Subfactors

Physical Symptoms
    Tense
    Somatic
Social Anxiety
    Humiliation Fears
    Performance Fears
Harm Avoidance
    Perfectionism
    Anxious Coping
Separation Anxiety

Procedures for developing and psychometrically validating a new rating scale are complex and time consuming (Cicchetti, 1994). In developing the MASC, the following sequence was taken:

- An exhaustive review of available rating scales, diagnostic interviews, and the DSM-IV, which generated over 400 potential items.
- A Q-sort procedure in which these items were divided into cognitive, emotional, physical, and behavioral categories.
- A data reduction procedure that generated a 41-item scale representing the four conceptual domains.
- A pilot study of over 1,000 elementary, junior high, and senior high school students conducted in a school-based community sample.
- Based on results from the pilot study, which yielded a 5-factor solution, a 104-item scale (with approximately 20 items per factor) was again piloted in a school-based sample.
- Principal components factor analyses of data from this population survey provided the current MASC factor structure, which shows excellent internal reliability without excessive redundancy in item content.
- Based on further clinical and research experience using the scale with children and adolescents from age 5 to 18, 39 items were retained for the final version of the MASC.
- Confirmatory factor analyses in clinical and community populations and in a large sample of ADHD children replicate the MASC factor structure.
- Parent-child and parent-parent concordance was shown to be poor overall, supporting the clinical utility of the MASC as a child self-report measure.
- Convergent and divergent validity of the MASC with respect to parent ratings of externalizing behavior and internalizing symptoms was shown to be high.
- Test-retest reliability (stability over time) has been demonstrated in clinical and epidemiological samples.
- The MASC has been shown to be treatment sensitive.
- The MASC is now in wide use in industry-, foundation-, and NIMH-funded studies of pediatric anxiety disorders.

Development of the MASC

Preliminary Studies

Although work to date on the taxonomy of anxiety in children and adolescents provides limited support for the DSM-IV anxiety clusters (see, e.g., Silverman & Eisen, 1992), some have suggested that a broader conceptualization is necessary (Ollendick & King,


1994; Ollendick, Matson, & Helsel, 1985). In contrast to scales that assess a specific DSM-IV anxiety construct (see, e.g., Beidel, Turner, & Morris, 1994), the MASC was developed to assess a wide spectrum of common anxiety symptoms in children across the elementary, junior, and senior high school age range. Thus, when beginning the item selection procedure, nothing was assumed about the normative clustering of pediatric anxiety symptoms other than the hypothesis that specific descriptors should, on theoretical grounds (March, 1995; Marks, 1987), fall within emotional, cognitive, physical, or behavioral symptom domains.

The actual procedure followed several steps. First, available self-report anxiety scales covering general and specific symptom domains, as well as the DSM-III-R criterion items, were reviewed. Each of the over 400 resulting items/questions from these measures was then placed on a 3 × 5 card and sorted by two expert clinicians into four symptom domains: cognitive, physical, emotional, and behavioral. Cognitive items were defined as ascertaining a thought, urge, or image, which could be specific (e.g., a fear of dogs) or general (e.g., "worry"). Physical items were characterized by physiological indicators, such as nausea or a racing heart. Emotional items were defined as ascertaining a subjective feeling (e.g., fear), or a subjective sensation (e.g., tension). Behavioral items were defined as ascertaining operant mechanisms of anxiety reduction through approach behaviors (e.g., reassurance seeking) or avoidance behaviors (e.g., avoiding public speaking). Disagreements were resolved by forced consensus judgment. Second, the item pools were reduced by (a) retaining items that were easy to understand, covered the desired age range, and closely reflected one and only one of the four chosen anxiety dimensions; and (b) eliminating duplicates and rewording. Third, a Q-sort procedure was used to enhance item-content validity. 
Expert clinicians, members of an anxiety disorders support group, and lay nonexperts classified 60 items (15 per group) into the four selected domains. Fourth, based on their comments and the pattern of misclassification, a 41-item, 4-point Likert scale (having approximately 10 items per hypothesized symptom domain) was developed and piloted in a population sample of 1,066 fourth- through eighth-grade students. A 3-point Likert version was discarded because of the possibility of excessive midpoint responding. Results from this preliminary study suggested a 5-factor solution, which only partially conformed to the hypothesized four-domain model of anxiety: somatic/autonomic arousal (14 items), fears and worries (7 items), social fears (10 items), behavioral avoidance/approach (6 items), and separation anxiety (4 items). The uneven distribution of the items, which attenuated the internal reliability of the smaller factors, coupled with the lack of precision in the model, clearly showed the need for further scale development.

Based on the results from the first study, additional items were added to the 5 factors enumerated above to bring each up to a total item pool of approximately 20 items. The resultant 104-item questionnaire was then administered to a population sample of 374 3rd- through 12th-grade students. One classroom from each school was chosen at random for each grade; subjects thus were evenly split from grades 4 to 12. Elementary school students were tested in their usual classroom; junior high school students were tested in homeroom. Questionnaires were read aloud to students, who had the opportunity to ask questions about individual items but not to seek clarification about how they should respond. 
Like the earlier questionnaire, this questionnaire also used a 4-point Likert scale in which respondents were asked to rate each question as "Never," "Rarely," "Sometimes," or "Always true about me." Students with reading disabilities were given extra time or reading support as needed. Teachers provided demographic information.


Factor Structure

With these data in hand, a series of exploratory principal components factor analyses (using Varimax rotation) was conducted on the total sample. A robust 4-factor solution emerged: physical symptoms, social anxiety, separation anxiety, and harm avoidance (March et al., 1997). All had 9 items except the first, which had 10 items. Specifying a conventional eigenvalue of 1.0 as the principal components analysis (PCA) entry criterion generated additional factors. In contrast to the reported factor structure, where between-factor overlap proved minimal at the item level, these smaller factors explained little additional variance and contained items that tended to load across multiple factors.

Each major factor was then subjected to a principal components factor analysis (again using Varimax rotation). Three of the four main factors (all except the separation anxiety factor) produced a clear 2-factor solution, using an eigenvalue of 1.0 as the entry criterion. Physical symptoms factored into tense/restless and somatic/autonomic subfactors. Harm avoidance factored into perfectionism and anxious coping. Social anxiety factored into humiliation/rejection fears and performance anxiety, whereas the separation anxiety factor was found to be unidimensional. In all cases, the first listed subfactor carried the majority of the variance (March et al., 1997).

A large body of literature suggests that anxieties of all sorts are more common in females than males (Benjamin, Costello, & Warren, 1990), and some symptoms, for example separation anxiety, vary with age (Francis, Last, & Strauss, 1987). To establish between-group differences for age or gender when using a self-report questionnaire, it is crucial to first establish that the factor structures are identical.
To this purpose, a multisample confirmatory factor analysis was conducted using the EQS (Bentler, 1995) statistical program to test whether the 4-factor model for the 39 MASC items was equivalent for males and females. All factor loadings were constrained to be equal for males and females, as were the correlations between the four MASC factors. Multiple goodness-of-fit indicators revealed that the 4-factor model fit well in both sexes. The nonnormed fit index (NNFI; Bentler & Bonnett, 1980) was .913, the comparative fit index (CFI; Bentler, 1988) was .916, and the incremental fit index (IFI; Bolen, 1989) was .917. The magnitude of the three indexes (above .90, as suggested by Bentler, 1995) suggests that the model had excellent fit to the data regardless of gender.

A multisample confirmatory factor analysis was also conducted to test whether the 4-factor model was equivalent for younger and older students. The sample was separated into two groups: 12 years old and under (n = 159) and 13 years old and over (n = 211). As suggested by Weiss on theoretical grounds (Weiss et al., 1991), this age cutoff approximates the move from concrete to formal operations in the context of emerging puberty. Multiple goodness-of-fit indicators revealed that the 4-factor model fit well in both age groups: NNFI = .976, CFI = .977, and IFI = .978. Thus, it was concluded that the MASC factor structure is invariant across age and gender.

Confirmatory Factor Analyses

Having established the factor structure of the MASC, the next step was to replicate the factor structure in two groups of subjects: a second large, school-based sample of 2,698 children and adolescents and a clinical sample of 390 children and adolescents (March, 1998). As before, multiple goodness-of-fit indices were used to evaluate the fit of the data to the measurement model. In both nonclinical and clinical samples, the 4-factor model for the 39-item MASC met the standard criteria for adequacy of fit (Bentler, 1988). Parameter estimates for the relations were statistically significant. Thus, the data had good fit to the MASC model. Confirmatory factor analyses for the 4-factor MASC model also have been conducted in a large sample of children with ADHD, again demonstrating adequacy of fit of the data (March, unpublished data). The overall conclusion to be gained from the CFA procedures is that the MASC factor structure replicates nicely across diverse samples of children and adolescents.

Reliability

Reliability in psychometric terms has several meanings. Internal reliability represents consistency between items within a group of items comprising a discrete factor (Chronbach, 1970). Test-retest reliability represents consistency in a set of scores by the same rater (single case intraclass correlation coefficient, ICC) or set of raters (mean ICC) over time (Shrout & Fleiss, 1979). Test-retest reliability varies with the conditions under which the test is administered, practice or memory effects, and true change in the variable(s) of interest, plus an instability component due to measurement error attributable to the instrument itself. Without adequate reliability, it is not possible to determine whether differences in scores between individuals or within-subject over time are due to "true" differences or to "chance" error.

Internal Reliability. Using a cutoff of .6 (below which internal consistency is suspect), total sample α reliabilities, which range from .6 to .85, are acceptable for all main factors and subfactors for the 39-item MASC (March et al., 1997). Internal reliability for the MASC total score is .9. Furthermore, α reliabilities for the MASC total score are generally comparable for males (.85) and females (.87). Very high reliability coefficients (above .9) indicate excessive redundancy at the item level.
Inspection of item content shows individual items within a factor or subfactor to be face valid for the measured construct, but not redundant with respect to item content.

Test-Retest Reliability. In a clinical population of children and adolescents with a mixture of anxiety disorders and/or ADHD (March et al., 1997), the test-retest reliability of the MASC at 3 weeks and 3 months was examined using the intraclass correlation coefficient (ICC) calculated according to procedures outlined by Shrout and Fleiss (1979). Mean ICCs for the MASC total score were .785 at 3 weeks and .933 at 3 months, indicating satisfactory to excellent test-retest reliability (March et al., 1997). Similarly, mean ICCs for all factors and subfactors save the harm avoidance factor fell in the satisfactory to excellent range at 3 weeks; all factors and subfactors proved satisfactory to excellent at 3 months (March et al., 1997). Mean ICCs for the MASC-10 and anxiety disorders index ranged from .64 to .89, again indicating satisfactory to excellent stability (March, in press). More recently, the test-retest reliability of the MASC in a school-based sample of children and adolescents was explored (March, Sullivan, & Parker, in press). For both single case and mean ICCs, the MASC exhibited satisfactory to excellent stability across all factors and subfactors. Satisfactory test-retest reliability also was demonstrated for an empirically derived short form, the MASC-10, and for an anxiety disorder index with high discriminant validity. Thus, the MASC can be said to demonstrate excellent test-retest reliability in both clinical and epidemiological samples.
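
The two kinds of reliability coefficient discussed in this section can be made concrete with a short sketch. The Python below is an illustration only, not the procedure used in the MASC studies: `cronbach_alpha` computes the standard internal-consistency coefficient, and `icc_two_way_random` computes single-case and mean ICCs under the two-way random-effects model described by Shrout and Fleiss (1979). The choice of that particular ICC model, the function names, and the example ratings are assumptions made for illustration; the text does not state which ICC form was used.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(item_scores[0])
    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the summed scale score across respondents.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def icc_two_way_random(scores):
    """Return (single-case ICC, mean ICC) for n subjects by k occasions.

    Assumes a two-way random-effects ANOVA model (an illustrative choice).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    bms = ss_rows / (n - 1)  # between-subjects mean square
    jms = ss_cols / (k - 1)  # between-occasions mean square
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    single = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    average = (bms - ems) / (bms + (jms - ems) / n)
    return single, average


# Hypothetical data: five children rating three items on a 0-3 scale.
items = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 3], [1, 2, 1]]
print("alpha:", round(cronbach_alpha(items), 3))

# Hypothetical total scores for five children at two testing occasions.
retest = [[40, 42], [55, 50], [61, 65], [30, 33], [48, 47]]
single, average = icc_two_way_random(retest)
print("single-case ICC:", round(single, 3), "mean ICC:", round(average, 3))
```

The mean ICC is always at least as large as the single-case ICC when both are positive, which is one reason the two are reported separately.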


Validity

Correlational Analysis. The factor structure of the MASC also is unique among extant scales in its subdivision of main factors into subfactors that nevertheless explain a meaningful proportion of the variance (March et al., 1997). With the exception of perfectionism, which shows a weaker relation to physical symptoms in females than in males, the pattern of shared variance as indicated by correlational analysis is similar for males and females. Importantly, although almost all correlations are significant at a Bonferroni-corrected alpha level of .05 or lower, the absolute magnitude of the shared variance is in the low to moderate range for most pairs. This suggests that the MASC is indeed measuring separate dimensions of anxiety, even at the subfactor level, which in turn should make it ideally suited to discriminate patterns of anxiety in subgroups of children with anxiety disorders.

Convergent and Divergent Validity. For the MASC to be useful clinically, the MASC factors should share greater variance with measures in the same symptom domain (convergent validity) than with measures in different domains (divergent validity). In a test of this hypothesis in a clinical sample of children and adolescents with a variety of internalizing and externalizing disorders, it was hypothesized that the MASC would be strongly correlated with a measure of anxiety (RCMAS), less so with depression (CDI), and not at all correlated with a measure of disruptive behavior (Abbreviated Symptom Questionnaire-Parent, or ASQ-P). In all instances, the results went in the predicted direction, implying that the MASC is a specific indicator of pediatric anxiety symptomatology. Notably, the MASC performed significantly better than either the RCMAS or the CDI in this regard (March et al., 1997).

Discriminant Validity.
Of course, these results say little or nothing about the ability of the MASC to discriminate between children with anxiety disorders and normal or psychopathological controls as defined by structured interviews. Perrin and colleagues showed that the RCMAS and the STAI-C differentiated children with DSM-III-R anxiety and attention deficit disorders from normals but not from each other, whereas the Fear Survey Schedule for Children-Revised (FSSC-R) was ineffective at discriminating between any grouping (Perrin & Last, 1992). The discriminant validity of the four central scales from the MASC was examined by using discriminant function analysis to predict group membership in patients with anxiety disorders versus normal controls. Two groups of children and adolescents were used in the present analysis. The first group consisted of children and adolescents who met DSM-IV criteria for an anxiety disorder (American Psychiatric Association, 1994) other than obsessive-compulsive disorder. The second group (nonclinical) consisted of children and adolescents randomly selected from a large pool of normative data on the MASC (March, 1998) and matched with the clinical sample on the basis of age and sex. A discriminant function analysis was performed using the four MASC subscales as predictors of membership in two groups (clinical vs. nonclinical). Discriminant function scores from this analysis were used to classify subjects into clinical or nonclinical groups. Following the definitions and procedures outlined by Kessel and Zimmerman (1993), a variety of diagnostic efficiency statistics were calculated from these classification results: sensitivity was 90%, specificity was 84%, positive predictive power was 85%, negative predictive power was 89%, false positive rate was 16%, false negative rate was 11%, kappa was .74, and the overall correct classification rate was 87%.
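
The diagnostic efficiency statistics listed above all derive from a 2 × 2 table that crosses true group membership with predicted group membership. The following Python sketch shows the usual definitions. It is an illustration rather than the authors' computation: the function name and the cell counts are hypothetical, and the assumption of 100 children per group was chosen only because the text does not report group sizes for this analysis.

```python
def diagnostic_efficiency(tp, fn, tn, fp):
    """Diagnostic efficiency statistics from the four cells of a 2 x 2 table.

    tp/fn: clinical cases classified as clinical/nonclinical.
    tn/fp: nonclinical cases classified as nonclinical/clinical.
    """
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_power": tp / (tp + fp),
        "negative_predictive_power": tn / (tn + fn),
        "false_positive_rate": fp / (tn + fp),
        "false_negative_rate": fn / (tp + fn),
        "kappa": (accuracy - p_chance) / (1 - p_chance),
        "correct_classification": accuracy,
    }


# Hypothetical counts, assuming 100 clinical and 100 nonclinical children.
stats = diagnostic_efficiency(tp=90, fn=10, tn=84, fp=16)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With equal-sized groups, counts of this kind yield values close to those reported in the text. Because the predictive power statistics depend on the proportion of clinical cases in the sample, they would differ in a sample with a different mix of cases.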


Females Are More Anxious Than Males. The literature is consistent across ages and disorders that girls show more anxiety than boys (March, 1995). As expected, females show more anxiety than males in Bonferroni-corrected planned contrasts between item-mean scores for males and females on the 39-item MASC, with these differences significant at the p < .001 level; however, the absolute magnitude of the differences is low in most cases.

MASC Anxiety Index

To further highlight discriminant validity for both normal and psychopathological controls, an anxiety index was also developed. Two groups of children and adolescents were used to develop the anxiety index for the MASC. The first group consisted of 40 children and adolescents (24 males and 16 females) who met DSM-IV clinical criteria for an anxiety disorder (American Psychiatric Association, 1994) other than obsessive-compulsive disorder. The mean age was 11.96 years (SD = 2.07) for males and 10.88 years (SD = 2.80) for females. The second group (nonclinical) consisted of 40 children and adolescents randomly selected from a large pool of normative data on the MASC (March, 1998) and matched with the clinical sample on the basis of age and sex.

Having defined a sample of subjects with and without anxiety disorders, the next step in the development of the MASC Anxiety Index was to identify items from the MASC that appeared to discriminate between clinical and nonclinical groups. Based on a series of t-test analyses, 15 items were identified as significantly discriminating between the two groups. A direct discriminant function analysis was performed using the 15 items as predictors of membership in two groups (clinical vs. nonclinical). Items with the lowest standardized discriminant function coefficients (coefficients below .25) were dropped from the item pool, and the analysis was repeated until the only items remaining had coefficients above .25.
Discriminant function scores from the 10 items identified in this analysis were then used to classify the 80 children and adolescents into clinical or nonclinical groups. Again, following the definitions and procedures outlined by Kessel and Zimmerman (1993), a variety of diagnostic efficiency statistics were calculated from these classification results: sensitivity was 95%, specificity was 95%, positive predictive power was 95%, negative predictive power was 95%, false positive rate was 5%, false negative rate was 5%, kappa was .90, and the overall correct classification rate was 95%. Cross-validation in an identically derived sample produced similar results (March, 1998).

Having established that the Anxiety Index discriminates anxiety disordered from normal children and adolescents, a similar discriminant validity needed to be established between children and adolescents with an anxiety disorder and those with a DSM-IV diagnosis of ADHD. As pointed out by Perrin, this is psychometrically a more difficult problem than discriminating between subjects with and without clinical symptoms (Perrin & Last, 1992). Two groups of children and adolescents were used in the present analysis: one that met DSM-IV criteria for an anxiety disorder other than OCD, the other for ADHD matched with the anxiety disorder group on the basis of age and sex. A direct discriminant function analysis was performed using the MASC Anxiety Index. Discriminant function scores were then used to classify the 140 children and adolescents into anxiety or ADHD groups. The following diagnostic efficiency statistics were calculated from these classification results: sensitivity was 75%, specificity was 67%, positive predictive power was 73%, negative predictive power was 69%, false positive rate was 33%, false negative rate was 25%, kappa was .42, and the overall correct classification rate was 71%. Although not quite as robust as the anxiety versus normal comparison, the Anxiety Index nevertheless shows an acceptable ability to discriminate between children with anxiety disorders and those with attention deficit hyperactivity disorder.

Parent-Child and Parent-Parent Concordance

In general, parent-child and parent-parent concordance is low for internalizing symptoms, especially for domains that are relatively less observable by parents (see, e.g., Jensen et al., 1993; Jensen, Traylor, Xenakis, & Davis, 1988). In considering this issue, it is important to keep clear the distinction between concordance (i.e., agreement at a single point in time) and reliability (i.e., stability of agreement over time irrespective of concordance), because it is at least theoretically possible that parent reports would show low concordance and high reliability, or the converse. In a preliminary study in which fathers and mothers were asked to complete MASC ratings of their child's symptoms, it was hypothesized that fathers would be less concordant than mothers with respect to their child's MASC scores and that parent-child agreement would be poor (March et al., 1997). As predicted, parent-child concordance was poor. Fathers proved less likely than mothers to identify anxiety symptoms in their offspring. Both parents were more likely to identify anxiety symptoms, such as social avoidance, that are readily observable and stable over time.

The Use of the MASC for Treatment Planning

The task of the mental health practitioner using the MASC is to understand the presenting symptoms in the context of constraints to normal development. Practitioners must also devise a treatment program that ameliorates those constraints so that the youngster can resume a normal developmental trajectory insofar as is possible.
For most children with anxiety disorders, this requires a careful multimodal evaluation and some combination of cognitive-behavioral, psychopharmacological, and, in many cases, behavioral or pedagogical academic interventions (March, 1995). Leaving out one or more legs of this three-legged stool is a common cause of so-called treatment resistance. Because few practitioners possess all the essential skills, and because reimbursement schedules increasingly constrain practice patterns, such complex assessment and treatment regimens are best delivered within a multidisciplinary "team" milieu using efficient diagnostic assessment tools, such as the MASC and other dimensional rating scales.

The Initial Evaluation

It goes without saying that a thorough diagnostic assessment, including a clinical interview and a multimethod and multiinformant empirical evaluation, is essential to generating a comprehensive treatment plan (Conners et al., 1995; March, Mulle, Stallings, Erhardt, & Conners, 1995b). In the Program for Child and Adolescent Anxiety Disorders at Duke University Medical Center, the evaluation begins with the initial telephone contact and proceeds through previsit data gathering and a clinical interview before concluding with a feedback and treatment planning session. To speed and concentrate the evaluation process, a sizable amount of data was gathered prior to the patient's initial visit, with the same evaluation used for every child seen within the subspecialty clinic. In addition to requesting psychiatric/psychological, neuropsychological, hospitalization, and school records, patients and family members were asked to complete a packet of materials designed to assess important domains of psychopathology in the context of the patient's presenting concerns. In addition to information about the clinic, these materials include rating scales that screen for the major internalizing and externalizing symptom constellations and the Conners-March Developmental Questionnaire (Conners & March, 1996). Table 10.3 lists the rating scales typically obtained from child and parent/teacher; Table 10.4 summarizes the information obtained in the Conners-March Developmental Questionnaire.

Each patient and family completes an extensive clinical evaluation (lasting 1 1/2 hours) by a child psychiatrist or psychologist. The overall goal is to move from the presenting complaint through a DSM-IV five-axis diagnosis to an ideographic portrayal of the problems besetting the child patient. This initial visit includes a clinical interview of the children and their parents covering Axis I through Axis V of DSM-IV; review of findings from the rating scale data; the Conners-March Developmental Questionnaire (Conners & March, 1996); school records and previous mental health treatment records; a formal mental status examination; and, in some cases, a specialized neurodevelopmental evaluation.

TABLE 10.3
Rating Scales

Rating Scale                                            Type of Information
Conners Parent Rating Scale                             Parent-rated general psychopathology
Conners Teacher Rating Scale                            Teacher-rated general psychopathology
Multidimensional Anxiety Scale for Children (MASC)      Self-reported anxiety
Leyton Obsessional Inventory                            Self-reported OCD
Child and Adolescent Trauma Survey                      Self-reported stressors and PTSD symptoms
Children's Depression Inventory                         Self-reported depression

TABLE 10.4
Conners/March Developmental Questionnaire

Information                          Specific Type of Information
Demographics                         Age, gender, race, school grade, SES
History of Presenting Problem        Narrative summary by parent
Previous Treatment Providers         List of providers and addresses
Treatment History                    Type and adequacy of drug and psychotherapy trials
Birth and Pregnancy History          Pre- and perinatal risk factors
Early Developmental History          Temperament and developmental milestones
School History/Learning Problems     Pedagogic and behavioral school experience
Peer Relationships                   Number and quality of friendships
Family Psychiatric History           Multigenerational FH of mental illness
Family Medical History               Heritable medical illnesses
Patient Medical History              General medical history

By carefully examining the MASC in advance of seeing the patient, the assessing clinician's "prior probabilities" were adjusted relative to the major domains of anxiety
(Weinstein & Fineberg, 1980). By examining the other scales, it was possible to estimate the likelihood of complicating comorbidities. This allows the clinician to set up a diagnostic hierarchy, with a primary diagnosis or diagnoses and a set of rule-out diagnoses and unlikely diagnoses that guide the clinical interview. Ideally, a structured interview, such as the Anxiety Disorders Interview Schedule for Children (ADIS; Silverman & Eisen, 1992), should be part of every diagnostic assessment. Unfortunately, there is a lack of staffing resources to complete an ADIS, which requires separate interviews of child and one parent on all clinical patients. Thus, the development of reliable, valid, and cost-effective instrumentation (e.g., the MASC) that can be used in combination with other assessment tools (e.g., the Conners scales) in lieu of structured interviews is of considerable interest. Because the clinician has reviewed the child's MASC responses at the item level in advance of seeing the child, it is very easy to empathetically gather information about the anxiety symptoms besetting the patient. This both speeds the interview, and builds trust between the doctor and the patient, which in turn facilitates treatment planning. Following a careful discussion of the diagnostic impression, recommendations are then made in each of the following categories: additional assessment procedures, when required; cognitive-behavioral psychotherapies; pharmacotherapies; behavioral and/or pedagogic academic interventions, when necessary; and level of care, including expected time to response and setting in which care can reasonably be delivered. Unlike less formal evaluations that concentrate more heavily on family interactional elements, this study attempted to implement interventions that present a logically consistent and compelling relation between the disorder, the treatment, and the specified outcome. 
In particular, an effort was made to keep the various treatment targets ("the nails") distinct with respect to the various treatment interventions ("the hammers") so that aspects of the symptom picture that are likely to require or respond to a psychosocial as distinct from a psychopharmacological intervention are kept clear insofar as is possible. This method allows for a detailed review of the indications, risks, and benefits of proposed and alternative treatments, after which parents and patient generally choose a treatment protocol usually consisting of cognitive-behavior therapy (CBT) alone or CBT in combination with an appropriate medication intervention. Such a procedure is consistent with medical evaluation procedures across medical specialties, and meets goals for guidelines-based practice in managed care (Lenhart & March, 1996).

General Interpretive Considerations

Having described a general framework for approaching the anxious child or adolescent, it is time to consider the administration, scoring, and interpretation of the MASC. This involves considering whether the child's responses are valid indicators of the measured constructs, reviewing the item scores, reviewing the total score and factor and subfactor scores, looking for patterns in the factor scores that might suggest a diagnosis, and placing these data in the context of all the other information available about the child.

Are the Child's Responses Valid? Before proceeding with the actual interpretation of the MASC, it is crucial to consider threats to the validity of information contained in the MASC. Although self-report measures, such as the MASC, directly ascertain a subject's anxiety level across multiple behavioral/symptomatic domains, MASC scores are subject to a variety of biases (La & Silverman, 1993; Weissman, Orvaschel, & Padian, 1980). For example, some children tend to underestimate or underreport anxiety in the service of presenting a favorable evaluation of themselves (Silverman, 1987). Some children overreport anxiety in order to minimize enforced exposure to phobic stimuli; others do the opposite (e.g., underreport symptoms) for exactly the same reason. Gender and cultural differences also may influence reporting, with females more willing to endorse fearfulness than males (Ollendick et al., 1985). A child's ability to read and to understand the questionnaire items directly influences the validity of responses. When help is necessary to read the questions, the expectations of the child regarding the adult helper may set up a response bias that in turn may influence the validity of the data obtained. Although the MASC shows excellent test-retest reliability and validity at the group level, these and other factors may lead to poor test-retest reliability and suspect validity in the individual child. Thus, it is important to ask about the circumstances under which the child completed the questionnaire, and to ask directly about whether the child had difficulty in interpreting or understanding particular questions.

To further aid in interpreting the validity of the child's responses, the MASC includes an empirically derived inconsistency index (March, 1998). The inconsistency index uses summed difference scores on items that are expected to be highly intercorrelated. Using a T-score distribution for these scores derived from the normal sample, it is possible to establish cutoff scores beyond which valid responding is questionable. Low scores indicate a pattern consistent with valid responding; scores above an age- and gender-adjusted cutoff suggest that the MASC should be interpreted cautiously.

Interpreting Item Responses. The first step in interpreting the results from the MASC is to examine individual item responses.
Each MASC factor has approximately 10 items; each subfactor contains approximately half that many. From perusing the "Often" or "Always" response categories, it is often apparent which categories of anxiety are problematic for the patient. For example, a child may endorse many symptoms in the social anxiety category, but few indicating separation anxiety, thereby keying the interviewer to think first of social phobia. Alternatively, a youngster who appears tense, has mild to moderate worries from many categories and is high in perfectionism and anxious coping may be showing signs of generalized anxiety disorder. It is also informative to examine items that receive a "Never" response, as these often flag symptom domains that are not important or reflect developmental considerations. For example, adolescents generally do not sleep with a light on; when they do, it may indicate significant panic symptomatology. Conversely, when perusing individual items, it is important to look for consistency in the pattern of responses, and not to overinterpret any individual response with respect to predictive power for a DSM-IV disorder. In this context, the MASC contains no "critical items" (i.e., items weighted as more important than other items). Importantly, symptoms at the item level may be important indicators of ideographic treatment targets (i.e., targets defined at the point at which treatment is tailored for the individual child). For example, a child with separation anxiety disorder and palpitations will be approached differently in cognitive-behavioral psychotherapy than another child with the same diagnosis but with dizziness as the most prominent somatic/autonomic symptom. To habituate the somatic/autonomic cue, one will climb stairs; the other will spin until dizziness no longer initiates the panic cascade (Carter & Barlow, 1993). 
The MASC makes it easy to pinpoint several of the more important somatic/autonomic symptoms, which in turn allows the clinician to efficiently and empathetically direct the clinical interview. Item review permits a similar approach to many other important signs and symptoms that may be present, and allows the clinician to pay relatively less attention to symptoms that have not been endorsed.

Interpreting the Total Score and Individual Factor Scores. As noted earlier, interpretation of the factor scores for the MASC requires that the reader have a general understanding of the nature of anxiety in pediatric patients. Given such an understanding, the MASC is easy to interpret based on an analysis of where a particular child or group of children falls with respect to MASC population norms. Using T-scores, high scores represent a problem; lower scores suggest the child does not have these particular symptoms or set of symptoms. For example, a child with a T-score above 70 on the social anxiety factor is likely to have significant concerns regarding self-presentation and may meet DSM-IV diagnostic criteria for social phobia. When using this strategy (for example, using T-score norms to compare child report to population norms), it is important to note at the outset that population norms in this case must represent an appropriate comparison group. For the MASC, normative comparisons are presented by gender and age for a normal population sample.

Configural Interpretation. When interpreting the MASC, the clinician will wish to examine the pattern of elevation in T-scores in addition to considering individual T-score elevations. Where no T-score is above 65, the MASC is not indicative of clinically elevated anxiety symptoms. When one T-score is above 65, then the pattern is marginal; in turn, the greater the number of factors and subfactors that show clinically relevant elevations, the greater the likelihood that the MASC scores indicate a problem in the moderate to severe range. Additionally, elevations in the social or separation anxiety factors are often accompanied by elevations in physical symptoms and anxious coping behaviors.
When social or separation anxiety is elevated, it is therefore useful to examine the physical symptoms and anxious coping factors, subfactors, and items to better understand the child's total symptom picture.

A Step-by-Step Interpretive Strategy

The following steps represent a typical sequence for interpreting the MASC.

Step 1: Is the MASC a Valid Representation of Anxiety for This Particular Child? Given an understanding of children's motivation to complete the scale, the impact of comorbidities on their ability to complete the scale accurately and without bias, the setting in which the MASC was administered, and the purpose for which the results will be used, the clinician must make a judgment regarding the validity of the MASC data. As a first step, inspection of the Validity Index provides an estimate of whether the child's pattern of item responses is both internally consistent and consistent with the response patterns shown by other children of the same age, gender, and race. If it is not, then the results may or may not be valid, depending on other information available to the clinician. Motivational issues include the child's desire to avoid treatment by inflating symptoms ("it is too hard; where's the magic pill") or minimizing symptoms ("I don't need it"). Concerns regarding self-presentation (for example, the need to look perfect in the eyes of valued adults) may introduce a systematic response bias, especially if the child knows that a parent will see the results. This is a particular concern when a parent is required to help the child read and/or understand the scale items. Not surprisingly, it is also important to consider whether response biases associated with gender and/or cultural background might influence the child's report of symptoms. Where norms by age, gender, and race are available, these factors are controlled to some extent. However, regional and cultural differences may extend

even to the neighborhood level, which requires a level of molecular analysis not possible in a manualized format. Finally, the MASC can be used as both a clinical and an epidemiological instrument. In the former situation, MASC T-score elevations will be less likely to be associated with a false positive result because the prior probability of clinically significant anxiety symptomatology is already elevated in the population. In epidemiological surveys, the investigator will need to individualize the T-score cutoff to best balance the percentages of false positive and false negative results, depending on whether the purpose of the scale is to capture all positive cases (lower cutoff), to eliminate false positive children (screen), or to optimize the two (Costello & Angold, 1988). Conventionally, receiver operating characteristic (ROC) curve analyses have been used for this purpose (Weinstein & Fineberg, 1980).

Step 2: What is the Overall Level of Anxiety Symptomatology? The MASC total score represents a measure of the overall level of anxiety. Norms are given for population and clinical samples by age and gender, which allows clinicians to refine their estimate of whether the MASC total score is elevated into the clinical range. T-scores above 65 likely represent clinically significant symptoms in a "high base rate" group, such as a mental health clinic or a population study of pediatric posttraumatic stress disorder (PTSD) after a natural disaster. Conversely, the clinician may wish to use a higher criterion score (say, a T-score of 70 or even 75) in a "low base rate" group, such as a population of children without identified behavior problems, for inferring clinical problems.

Step 3: Are All Scales Elevated or is there a Pattern that Suggests a Specific Anxiety Disorder? Many children show elevations in all scales; others show selective elevation of specific domains of anxiety.
Examining the MASC factor and subfactor scores allows the clinician to identify problem areas versus those areas in which the child does not appear to be clinically symptomatic. In many cases, these patterns may correspond to diagnostic groupings. For example, a child with separation anxiety disorder will likely show elevations on physical symptom (especially the somatic/autonomic subfactor), harm avoidance (especially anxious coping), and the primary separation anxiety factors.

Step 4: What Item Responses are Elevated? Having obtained a good sense of the child's global level of anxiety and which MASC factors and subfactors appear problematic, it is now possible to scan the individual items for those that are or are not particularly problematic. Particular items are very useful in helping the clinician target questions during the clinical interview and in selecting targets for treatment. For example, a child with heart rate accelerations that are panicogenic will require habituation to this particular cue; a child with dizziness but not heart rate triggers will be approached differently when constructing an idiographic exposure hierarchy.

Step 5: Integrate Information from the MASC with Other Information. Using available information from other rating scales, parent and child interviews, teacher reports, and data from other mental health professionals, the clinician can now interpret the MASC scores with respect to validity and clinical significance.

Step 6: Taking All Sources of Information into Consideration, Including the MASC, Define a Set of Recommendations for Additional Assessments, Psychosocial Treatment(s), Possible Use of Medication, and/or Pedagogic/Behavioral Interventions at School. In addition to deciding on a treatment plan that is tailored to the needs of a particular child, the clinician will need to decide how best to make use of the MASC data with respect to discussing the child's problems with the child, the family, and the school.
Additionally, the MASC format lends itself nicely to report generation, but whether and when a report is released, and to whom, are decisions for the clinician and family.

Use of the MASC in Monitoring Treatment Outcome

Considerable attention has been paid to the problem of measurement error in assessing treatment outcome (see, e.g., Hsu, 1995; Jacobson & Revenstorf, 1988). Because the MASC provides a reliable and valid estimate of the "true score" variance associated

with the measured construct(s), it is an excellent candidate measure to be used as the dependent variable (or as a mediator or moderator variable) in treatment outcome studies (March & Curry, 1998). In a recent study of cognitive-behavioral psychotherapy for PTSD, the MASC proved sensitive to therapeutic change (March, Amaya-Jackson, Murry, & Schulte, 1998). Based on its robust psychometric profile, and the lack of satisfactory alternatives, the MASC is in wide use in industry, foundation, and NIMH-funded treatment outcome studies despite the fact that it is a relatively new scale. Whereas the most robust criterion for response to a clinical therapeutic intervention is movement from the clinical range (e.g., a T-score above 60 to 65) into the normal range, the MASC is sufficiently stable that a half standard deviation T-score change of 5 (if clinically supportable) represents meaningful change.

Evaluation of the MASC Against NIMH Criteria for Outcomes Measures

In an update of criteria, developed by a panel of experts convened by the National Institute of Mental Health (NIMH), for selecting measures for screening, treatment planning, and/or evaluating the outcome of treatment, Newman and Ciarlo (1994) proposed five groupings by which a measure should be judged: applications of the measure, methods and procedures, psychometric features, cost considerations, and utility considerations. Although these main groupings, and the criteria subsumed under them, are not orthogonal, they represent the main concerns that clinicians and researchers hold relative to the usefulness of assessment measures.

Applications. As a general pediatric anxiety measure, the MASC clearly meets the first criterion, which has two parts: first, the measure should be relevant to the target group to which it is being applied and, second, it should be independent of any treatment provided.
In particular, an argument can be made that the MASC is the only scale that accurately represents the factor structure of anxiety in the pediatric population irrespective of age (8-18), gender, or race. At the subfactor level, the MASC taps constructs that represent the DSM-IV constructs of social phobia, separation/panic anxiety, and generalized anxiety. Additionally, the MASC targets anxiety-reinforcing coping behaviors, which by themselves are often targets for treatment. At the item level, each MASC item is face valid for the constructs represented, thereby encouraging agreement between provider and patient vis-à-vis selection of target symptoms for treatment. The MASC is sensitive to treatment-induced change (March, Amaya-Jackson, et al., 1998; March et al., 1997), and has been chosen as a dependent measure in a wide variety of NIMH-funded comparative and single modality treatment studies, including the Multimodal Treatment of ADHD study (Hinshaw et al., 1997), the Research Units on Pediatric Psychopharmacology (RUPP) study of fluvoxamine in pediatric anxiety disorders (Greenhill, Pine, March, & Riddle, in press), and the newly funded comparative treatment outcome study in pediatric OCD (J. March and E. Foa, personal communication).

Methods and Procedures. As illustrated in the MASC manual (March, 1998), the MASC also meets the second criterion: the use of simple, teachable methods. In particular, the MASC items, subfactors, and factors are all face valid for the constructs they represent, making it very easy to interpret MASC scores at the item or factor level. Similarly, the MASC is easy to administer and score, using either computer-scored scannable forms or the pencil-and-paper QuickScore forms. The MASC includes a

detailed manual that provides a review of anxiety disorders in children and adolescents, instructions for administering and interpreting the MASC, normative data (by three age groupings and by gender), and documentation of psychometric adequacy for both clinical and research applications (March, 1998). The third criterion, the use of objective referents, is a particular strength of the MASC. Before the MASC was published, the factor structure was replicated in both clinical and population samples and across age and gender (March, 1998); an anxiety disorder index with high discriminant validity was established for normal and ADHD samples (March, 1998); stability over time was documented in both clinical and population samples (March et al., in press; March et al., 1997); a validity index was developed to provide an estimate of valid versus invalid responding (March, 1998); and normative data were provided in a large population sample of children and adolescents to allow a clinician, researcher, or utilization reviewer to establish the extent of deviance (need for treatment) and to determine when a patient has returned to the normal range (signifying the end of treatment; March et al., 1997). No other pediatric anxiety scale provides these assurances of robust psychometric properties. Additionally, the MASC scales, subscales, and items are specifically designed to provide important information regarding treatment planning and outcome monitoring. For example, a child with excessive motor tension is a candidate for relaxation training; absent such a complaint, this intervention may not be necessary. The MASC tense/restless subfactor provides this information. Anecdotally, patients report that the detailed symptom review inherent in the MASC factor structure often indicates to the child that the clinician is interested in and understands those behavioral/symptomatic indicators that are disturbing to the child.
In this fashion, the MASC facilitates communication between provider and patient, ultimately identifying unique targets (e.g., suffocation anxiety) for ad hoc treatment interventions as implemented in empirically validated treatment packages (Barlow, 1997). With respect to the fourth criterion, the use of multiple respondents, children and adolescents typically are much better reporters of internalizing symptoms than their parents (Faraone, Biederman, & Milberger, 1995; Jensen et al., 1988). In the initial study of the MASC, parent-child agreement was poor even in a sample of clinically ill children who might have been expected to show readily observable symptoms (March et al., 1997). The fourth criterion, therefore, may not be as applicable to the assessment of pediatric anxiety disorders as it might be, for example, to children with disruptive behavior disorders (Angold, 1989; Faraone et al., 1995; Jensen et al., 1988). Lastly, the fifth criterion, the use of process-identifying outcome measures, is of critical importance with respect to understanding the mechanisms by which treatment works and to disseminating new treatments (March & Curry, 1998). Although process identification is not a stated goal of the MASC, it is unique among general pediatric anxiety scales in that it includes a harm avoidance factor, which in turn is subdivided into perfectionism and anxious coping subfactors. To the extent that these anxiety-reinforcing coping strategies are modified by treatment, a reduction in scores on the anxious coping factor may be construed as reflecting corollary therapy processes, for example, in single case designs aimed specifically at component analyses (Hayes, 1981; March & Curry, 1998).

Psychometric Features.
With the exception of cross-cultural documentation, where the RCMAS clearly shows important strengths (see, e.g., Ollendick & Yule, 1990; Yang, Ollendick, Dong, & Xia, 1995), the MASC, though a recent scale, shows more robust psychometric properties and, surprisingly, documentation than older scales, such as the

RCMAS, FSSC-R, or STAI-C (March, 1998; March & Albano, 1998; March et al., 1997). More important, the MASC unquestionably measures anxiety, unlike other extant scales that purport to measure the full range of pediatric anxiety symptoms but do not (Perrin & Last, 1992). For example, the MASC shows a particular strength in discriminant validity (March, 1998; March et al., 1997). Given excellent test-retest reliability and robust population norms, the MASC appears to be an appropriate instrument for identifying sufficient deviance/impairment to warrant consideration for treatment at the single subject level.

Cost Considerations. The seventh criterion, low cost, is unfortunately not a strength of the MASC, which is only available commercially through MultiHealth Systems, Toronto, Canada. On the other hand, with the exception of the FSSC-R and the Children's Manifest Anxiety Scale, which are in the public domain, other commonly used scales for this purpose are also proprietary instruments. Given efficiencies in the diagnostic process and validity considerations, the MASC likely is cost-effective for its intended purpose, though empirical data supporting this assertion are as yet lacking. Furthermore, the MASC is available at reduced cost for researchers interested in using it in research protocols. The MASC is explicitly supported with respect to research purposes, with data from research protocols fed back into MASC psychometric studies; research collaboration for this purpose is invited.

Utility Considerations. Criterion 8 (understanding by nonprofessional audiences), Criterion 9 (easy feedback and uncomplicated interpretation), and Criterion 10 (utility in clinical services) have been addressed previously. Criterion 11, compatibility with clinical theories and practices, is an important strength of the MASC.
As pointed out earlier, the MASC was developed in an atheoretical fashion to represent the factor structure of anxiety in the population rather than to conform to a particular theory of the genesis of anxiety or any anxiety subtype. Hence, the MASC fits well with a variety of theoretical perspectives where the objective is to ascertain anxiety symptoms and not specifically to represent a particular theoretical perspective. In this regard, the MASC should minimize measurement error across divergent treatment interventions, making it suitable for comparative treatment outcome studies (Arnold, 1993; Jensen, 1993).

Case Study

Ann is a 7-year-old Caucasian girl from a two-parent, lower middle-class family. About 1 month before coming to the clinic, she began to experience stomachaches at school. Many other children were sick with a stomach virus at the time, so Ann's symptoms did not arouse unusual concerns. After a visit to the pediatrician, which did not reveal anything unusual, Ann soon went back to school. Unfortunately, although the other children were back to normal, Ann continued to have stomachaches; other sick feelings, such as dizziness, began as well. After several more days of this, she began to resist going to school because she felt better at home. Ann's mother, who was on the shy side and was generally sympathetic and protective of Ann because Ann reminded her of herself when she was a child, let Ann stay home. In contrast, Ann's father grew angry when she repeatedly wanted to stay home. Over Mom's objections, he insisted that Ann go to school; Ann went, but cried all the way. By midmorning, she experienced her first

full panic attack, actually throwing up in class so that her mother had to come to school to take Ann to the pediatrician, who again found nothing wrong. By this time, Ann had become clingy, refused to stray far from home, and repeatedly expressed fears that something might happen to her parents, particularly her mother. She worried that her mother might not be able to help her when she felt sick and scared. By the time she came to the clinic on the advice of her pediatrician, Ann had been out of school for 2 weeks. By this point, Ann and her family were "at war" over whether Ann was sick or just being oppositional. Family history was positive for panic disorder and social phobia in Ann's mother, and subclinical affective disorder in her father. She was generally healthy, and neither she nor other family members were under any unusual stress.

Step 1: Is the MASC a Valid Representation of Anxiety for This Particular Child? Ann filled out the MASC with her mother, who had to read but not to explain the questions. Like most anxious kids, Ann knew from her own experiences what the questions meant. Clinically, it appeared that the mother's bias, like Ann's, was toward endorsing rather than minimizing symptoms, though only for symptoms that actually were present. Given the presenting complaint and a normal MASC validity index, it appeared that the MASC represented a valid index of Ann's symptoms.

Step 2: What is the Overall Level of Anxiety Symptomatology? Ann's MASC total T-score was mildly elevated at 65, reflecting the fact that not all anxiety domains were problematic, and even within symptomatic domains, not all symptoms were equally problematic.

Step 3: Are All Scales Elevated or is there a Pattern that Suggests a Specific Anxiety Disorder? As might be expected, her T-score for separation anxiety was markedly elevated (T-score of 80), as were the T-scores for anxious coping (T-score of 74) and somatic/autonomic symptoms (T-score of 68).
Conversely, the T-scores for all other factors and subfactors were either not elevated or only marginally elevated. Clinical questioning later revealed that the elevation in humiliation fears related to her fears about the effects of separation anxiety symptoms on her relationships at school.

Step 4: What Item Responses are Elevated? Consistent with her history, dizziness and gastrointestinal symptoms were maximally elevated; conversely, she endorsed little in the way of cardiac symptoms. Unlike many children with separation anxiety, Ann did not endorse a fear of sleeping away from home, perhaps because her fears had not had time to generalize beyond the school setting.

Step 5: Integrate Information from the MASC with Other Information. The Conners Parent and Teacher Rating Scales suggested problems with disruptive behaviors at school and home plus elevated anxiety/shyness. The Children's Depression Inventory suggested problems with ineffectiveness, but no other indicators of depression. Taken together, the family history, clinical picture, and testing data all pointed to a diagnosis of separation anxiety disorder.

Step 6: Taking All Sources of Information into Consideration, Including the MASC, Define a Set of Recommendations for Additional Assessments, Psychosocial Treatment(s), Possible Use of Medication, and/or Pedagogic/Behavioral Interventions at School. No additional assessments seemed necessary. Treatment began with cognitive-behavioral psychotherapy, with the possibility of the later addition of a medication if she was not rapidly responsive to CBT. To encourage a graded return to school, school personnel were closely involved in the CBT intervention.
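The configural reading of Ann's profile in Step 3 can be sketched in a few lines of Python. The cutoff of 65 and the none/one/several rule come from the interpretive guidelines given earlier in the chapter; the function name, the pattern labels it returns, and the dictionary layout are illustrative assumptions rather than part of any MASC scoring software.

```python
CUTOFF = 65  # T-score threshold for clinical elevation, as stated in the text


def configural_summary(t_scores, cutoff=CUTOFF):
    """Classify a profile by how many factor/subfactor T-scores exceed the cutoff.

    Returns (pattern, ranked), where ranked lists the elevated scales from the
    highest T-score down. The pattern labels are illustrative wording only.
    """
    elevated = [(name, t) for name, t in t_scores.items() if t > cutoff]
    ranked = sorted(elevated, key=lambda item: item[1], reverse=True)
    if not ranked:
        pattern = "not clinically elevated"
    elif len(ranked) == 1:
        pattern = "marginal"
    else:
        pattern = "multiple elevations"
    return pattern, ranked


# Factor and subfactor T-scores reported for Ann in the case study.
ann = {
    "somatic/autonomic": 68,
    "anxious coping": 74,
    "separation anxiety": 80,
}

pattern, ranked = configural_summary(ann)
print(pattern)
for name, t in ranked:
    print(f"{name}: T = {t}")
```

Ranking the elevated scales puts separation anxiety first, followed by anxious coping and somatic/autonomic symptoms, which mirrors the pattern the text associates with separation anxiety disorder. The sketch only organizes the scores; the diagnostic judgment still rests on the integration described in Steps 5 and 6.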
Conclusions

To summarize, the MASC provides reliable and valid ascertainment of anxiety symptoms across all major symptom domains as they exist in young persons from age 8 to 18; discriminates between symptom clusters within anxiety groupings and between anxiety and other psychopathological groupings; evaluates severity against age and gender norms; provides information from the most important rater, the child or adolescent;

and indexes treatment-induced symptom change. With the increasing emphasis on multidisciplinary assessment and treatment strategies, the MASC should facilitate communication, not only among clinicians, but also between clinicians and regulatory bodies, such as utilization review committees. Finally, in a world where research advances increasingly drive differential therapeutics within a medical model, it is critical that mental health providers develop rapid and efficient tools for defining treatment targets for medication and psychosocial treatments. Perhaps because of insufficient time, lack of training, or methodological constraints (Setterberg et al., 1991), clinical practice all too often fails to include a semistructured interview incorporating information from multiple informants (Reich & Earls, 1987). The result is often missed diagnoses and ineffective treatment planning (Costello et al., 1988). In addition, clinicians under managed care will increasingly rely on practice guidelines, which in turn require systematic assessment tools (Barlow, 1994). Self-report measures like the MASC represent a time-efficient way to capture information about a wide variety of anxiety symptoms. In the Pediatric Anxiety Disorders Program at Duke, all new patients/parents are asked to complete a comprehensive developmental questionnaire, the MASC, the Children's Depression Inventory, and the Conners Parent and Teacher Rating Scales before their first visit (March, Mulle, Stallings, Erhardt, & Conners, 1995a). Reviewing this information in advance of seeing the patient dramatically increases the efficiency of the clinical diagnostic interview by establishing a set of prior probabilities for specific diagnoses (Weinstein & Fineberg, 1980). The clinician is thereby freed to allocate more time to devising a comprehensive tailored treatment plan where the hammers (the treatments) accurately match the nails (the targets). 
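The role of prior probabilities, and the earlier advice to raise the T-score criterion in "low base rate" groups, can be illustrated with Bayes' rule. The Python sketch below computes the positive predictive value of a score above the cutoff for two base rates. The sensitivity and specificity figures are hypothetical values chosen only for illustration and are not published MASC operating characteristics; the function name is likewise an assumption.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Probability that a child scoring above the cutoff truly has the disorder."""
    for name, value in (("base_rate", base_rate),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


# Hypothetical operating characteristics for a fixed cutoff; NOT MASC data.
SENSITIVITY = 0.80
SPECIFICITY = 0.90

clinic = positive_predictive_value(0.50, SENSITIVITY, SPECIFICITY)
community = positive_predictive_value(0.05, SENSITIVITY, SPECIFICITY)
print(f"high base rate (50%): PPV = {clinic:.2f}")
print(f"low base rate (5%):   PPV = {community:.2f}")
```

With these assumed figures, the same elevated score is correct about nine times in ten in the clinic sample but fewer than one time in three in the community sample, which is the reasoning behind using a more stringent cutoff when the base rate is low.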
Scales, like the MASC, will drive this process forward much to the benefit of anxious pediatric patients. References Albano, A.M. (1991). Assessment of anger in children: Multitrait-multimethod methodology and validation of a cognitive assessment method. Dissertation Abstracts International, 52(6-B). American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd rev. ed.). Washington, DC: American Psychiatric Press. American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: American Psychiatric Press. Angold, A. (1989). Structured assessments of psychopathology in children and adolescents. In C. Thompsen (Ed.), The instruments of psychiatric research (pp. 271-304). Chichester, England: Wiley. Arnold, L. (1993). Design and methodology issues for clinical treatment trials in children and adolescents. Psychopharmacology Bulletin, 29, 3-4. Barlow, D.H. (1994). Psychological interventions in the era of managed competition. Clinical Psychology Science and Practice, 1(2), 109-122. Barlow, D.H. (1997). Cognitive-behavioral therapy for panic disorder: Current status. Journal of Clinical Psychiatry, 58(Suppl. 2), 32-6, discussion 36-7. Beidel, D.C. (1991). Social phobia and over-anxious disorder in school-age children. Journal of the American Academy of Child and Adolescent Psychiatry, 30(4), 545-552. Beidel, D., Turner, S., & Morris, T. (1994, March). The SPAI-C: A new child self-report inventory for children. Paper presented at the annual meeting of the Anxiety Disorders of America, Santa Monica, CA. Benjamin, R.S., Costello, E.J., & Warren, M. (1990). Anxiety disorders in a pediatric sample. Journal of Anxiety Disorders, 4(4), 293-316. Bentler, P. (1988). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.


Bentler, P. (1995). EQS Structural Equations Program Manual. Encino, CA: Multivariate Software, Inc. Bentler, P., & Bonett, D. (1980). Significance test and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606. Black, B. (1995). Anxiety disorders in children and adolescents. Current Opinion in Pediatrics, 7(4), 387-391. Bollen, K. (1989). A new incremental fit index for general structural equation models. Sociological Methods and Research, 17, 303-316. Carter, M.M., & Barlow, D.H. (1993). Interoceptive exposure in the treatment of panic disorder (Vol. 12). Sarasota, FL: Professional Resource Press. Cronbach, L. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row. Cicchetti, D., & Cohen, D.J. (1995). Perspectives on developmental psychopathology. In D. Cicchetti & D. Cohen (Eds.), Developmental psychopathology: Theory and methods (pp. 3-20). New York: Wiley. Cicchetti, D.V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Special Section: Normative assessment. Psychological Assessment, 6(4), 284-290. Conners, C., & March, J. (1996). The Conners/March Developmental Questionnaire. Toronto, Canada: MultiHealth Systems. Conners, C., March, J., Erhardt, D., & Butcher, T. (1995). Assessment of attention-deficit disorders. Journal of Psychoeducational Assessment, 28, 186-205. Costello, E., & Angold, A. (1988). Scales to assess child and adolescent depression: Checklists, screens, and nets. Journal of the American Academy of Child and Adolescent Psychiatry, 27(6), 357-363. Costello, E.J., & Angold, A. (1993). Toward a developmental epidemiology of the disruptive behavior disorders. Special Issue: Toward a developmental perspective on conduct disorder. Development and Psychopathology, 5(1-2), 91-101. Costello, E.J., & Angold, A. (1995). Epidemiology. In J. March (Ed.), Anxiety disorders in children and adolescents (pp. 109-124).
New York: Guilford. Costello, E.J., Edelbrock, C., Costello, A.J., Dulcan, M.K., Burns, B.J., & Brent, D. (1988). Psychopathology in pediatric primary care: The new hidden morbidity. Pediatrics, 82(3 Pt. 2), 415-424. Dadds, M.R., Barrett, P.M., Rapee, R.M., & Ryan, S. (1996). Family process and child anxiety and aggression: An observational analysis. Journal of Abnormal Child Psychology, 24(6), 715-734. Dadds, M.R., Rapee, R.M., & Barrett, P.M. (1994). Behavioral observation. In T. Ollendick, N. King, & W. Yule (Eds.), International handbook of phobic and anxiety disorders in children and adolescents (pp. 349-364). New York: Plenum. Faraone, S.V., Biederman, J., & Milberger, S. (1995). How reliable are maternal reports of their children's psychopathology? One-year recall of psychiatric diagnoses of ADHD children. Journal of the American Academy of Child and Adolescent Psychiatry, 34(8), 1001-1008. Finch, A.J., Jr., Kendall, P.C., & Montgomery, L.E. (1976). Qualitative difference in the experience of state-trait anxiety in emotionally disturbed and normal children. Journal of Personality Assessment, 40(5), 522-530. Francis, G., Last, C.G., & Strauss, C.C. (1987). Expression of separation anxiety disorder: The roles of age and gender. Child Psychiatry and Human Development, 18(2), 82-89. Greenhill, L., Pine, D., March, J., & Riddle, M. (in press). Assessment issues in treatment research of pediatric anxiety disorders: What is working, what is not working and what needs improvement. Psychopharmacology Bulletin. Hayes, S. (1981). Single case experimental design and empirical clinical practice. Journal of Consulting and Clinical Psychology, 49, 193-211. Hinshaw, S., March, J., Abikoff, H., Arnold, L., Cantwell, D., Conners, C., Elliott, G., Greenhill, L., Halperin, J., Hechtman, L., Hoza, B., Jensen, P., March, J., Newcorn, J., Pelham, W., Richters, J., Severe, J., Schiller, E., Swanson, J., Veeren, D., Wells, K., & Wigal, T. (1997). 
Comprehensive assessment of childhood attention-deficit hyperactivity disorder in the context of a

multisite, multimodal clinical trial. Journal of Attention Disorders, 1(4), 217-234. Hooper, S.R., & March, J.S. (1995). Neuropsychology. In J. March (Ed.), Anxiety disorders in children and adolescents (pp. 35-60). New York: Guilford. Hsu, L.M. (1995). Regression toward the mean associated with measurement error and the identification of improvement and deterioration in psychotherapy. Journal of Consulting and Clinical Psychology, 63(1), 141-144. Jacobson, N.S., & Revenstorf, D. (1988). Statistics for assessing the clinical significance of psychotherapy techniques: Issues, problems, and new developments. Special Issue: Defining clinically significant change. Behavioral Assessment, 10(2), 133-145. Jensen, P.S. (1993). Development and implementation of multimodal and combined treatment studies in children and adolescents: NIMH perspectives. Special Section: Design and methodology issues for clinical treatment trials in children and adolescents. Psychopharmacology Bulletin, 29(1), 19-26. Jensen, P.S., Salzberg, A.D., Richters, J.E., & Watanabe, H.K. (1993). Scales, diagnoses, and child psychopathology: I. CBCL and DISC relationships. Journal of the American Academy of Child and Adolescent Psychiatry, 32(2), 397-406. Jensen, P.S., Traylor, J., Xenakis, S.N., & Davis, H. (1988). Child psychopathology rating scales and interrater agreement: I. Parents' gender and psychiatric symptoms. Journal of American Academy of Child and Adolescent Psychiatry, 27(4), 442-450. Kearney, C.A., & Silverman, W.K. (1990). A preliminary analysis of a functional model of assessment and treatment for school refusal behavior. Behavior Modification, 14(3), 340-366. Keller, M.B., Lavori, P.W., Wunder, J., Beardslee, W.R., Schwartz, C.E., & Roth, J. (1992). Chronic course of anxiety disorders in children and adolescents. Journal of the American Academy of Child and Adolescent Psychiatry, 31(4), 595-599. Kendall, P.C., Finch, A.J., Jr., Auerbach, S.M., Hooke, J.F., & Mikulka, P.J. (1976).
The State-Trait Anxiety Inventory: A systematic evaluation. Journal of Consulting and Clinical Psychology, 44(3), 406-412. Kessel, J.B., & Zimmerman, M. (1993). Reporting errors in studies of the diagnostic performance of self-administered questionnaires: Extent of the problem, recommendations for standardized presentation of results, and implications for the peer review process. Psychological Assessment, 5, 395-399. Last, C., & Silverman, W.K. (1993). Parent reports of child behavior problems: Bias in participation. Journal of Abnormal Child Psychology, 21(1), 89-101. Last, C.G., Strauss, C.C., & Francis, G. (1987). Comorbidity among childhood anxiety disorders. Journal of Nervous and Mental Disease, 175(12), 726-730. Last, C., Perrin, S., Hersen, M., & Kazdin, A.E. (1992). DSM-IIIR anxiety disorders in children: Sociodemographic and clinical characteristics. Journal of American Academy of Child and Adolescent Psychiatry, 31(6), 1070-1076. Lenhart, L., & March, J. (1996). Treatment of psychiatric disorders in children and adolescents. In B. Levin & J. Petrilla (Eds.), Mental health services: A public health perspective (pp. 211-233). New York: Oxford University Press. Leonard, H.L., Goldberger, E.L., Rapoport, J.L., Cheslow, D.L., & Swedo, S.E. (1990). Childhood rituals: Normal development or obsessive-compulsive symptoms? Journal of the American Academy of Child and Adolescent Psychiatry, 29(1), 17-23. March, J. (1995). Anxiety disorders in children and adolescents. New York: Guilford. March, J. (1998). Manual for the Multidimensional Anxiety Scale for Children (MASC). Toronto: MultiHealth Systems. March, J., & Albano, A. (1996). Assessment of anxiety in children and adolescents. In L. Dickstein, M. Riba, & M. Oldham (Eds.), Review of psychiatry (Vol. 15, pp. 405-427). Washington, DC: American Psychiatric Press. March, J., & Albano, A. (1998). New developments in assessing pediatric anxiety disorders. In T. Ollendick & R.
Prinz (Eds.), Advances in clinical child psychology (Vol. 20, pp. 213-242). New York: Plenum Press.
March, J., Amaya-Jackson, L., Murry, M., & Schulte, A. (1998). Cognitive-behavioral psychotherapy for children and adolescents with post-traumatic stress disorder: A controlled trial of a new protocol-driven treatment

package. Journal of the American Academy of Child and Adolescent Psychiatry, 37(6), 585-593.
March, J., & Curry, J. (1998). The prediction of treatment outcome. Journal of Abnormal Child Psychology, 26(1), 39-52.
March, J., Mulle, K., Stallings, P., Erhardt, D., & Conners, C. (1995a). Organizing an anxiety disorders clinic. In J. March (Ed.), Anxiety disorders in children and adolescents (pp. 420-435). New York: Guilford.
March, J.S., Mulle, K., Stallings, P., Erhardt, D., & Conners, C.K. (1995b). Organizing an anxiety disorders clinic. In J. March (Ed.), Anxiety disorders in children and adolescents (pp. 420-435). New York: Guilford.
March, J.S., Parker, J.D., Sullivan, K., Stallings, P., & Conners, C.K. (1997). The Multidimensional Anxiety Scale for Children (MASC): Factor structure, reliability, and validity. Journal of the American Academy of Child and Adolescent Psychiatry, 36(4), 554-565.
March, J., Sullivan, K., & Parker, J. (in press). Test-retest reliability of the Multidimensional Anxiety Scale for Children (MASC). Journal of Anxiety Disorders.
Marks, I. (1987). Fears, phobias, and rituals. New York: Oxford University Press.
McNally, R.J. (1991). Assessment of posttraumatic stress disorder in children. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3(4), 531-537.
Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates.
Ollendick, T.H., & King, N.J. (1994). Fears and their level of interference in adolescents. Behaviour Research and Therapy, 32(6), 635-638.
Ollendick, T.H., Matson, J.L., & Helsel, W.J. (1985). Fears in children and adolescents: Normative data. Behaviour Research and Therapy, 23(4), 465-467.
Ollendick, T.H., & Yule, W. (1990).
Depression in British and American children and its relation to anxiety and fear. Journal of Consulting and Clinical Psychology, 58(1), 126-129.
Perrin, S., & Last, C.G. (1992). Do childhood anxiety measures measure anxiety? Journal of Abnormal Child Psychology, 20(6), 567-578.
Reich, W., & Earls, F. (1987). Rules for making psychiatric diagnoses in children on the basis of multiple sources of information: Preliminary strategies. Journal of Abnormal Child Psychology, 15(4), 601-616.
Reynolds, C.R., & Paget, K.D. (1981). Factor analysis of the Revised Children's Manifest Anxiety Scale for blacks, whites, males and females with a national normative sample. Journal of Consulting and Clinical Psychology, 49, 352-359.
Setterberg, S.R., Ernst, M., Rao, U., Campbell, M., Carlson, G.A., Shaffer, D., & Staghezza, B.M. (1991). Child psychiatrists' views of DSM-III-R: A survey of usage and opinions. Journal of the American Academy of Child and Adolescent Psychiatry, 30(4), 652-658.
Shaffer, D., Campbell, M., Cantwell, D., & Bradley, S. (1989). Child and adolescent psychiatric disorders in DSM-IV: Issues facing the work group. Journal of the American Academy of Child and Adolescent Psychiatry, 28(6), 830-835.
Shrout, P., & Fleiss, J. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.
Silverman, W.K. (1987). Childhood anxiety disorders: Diagnostic issues, empirical support, and future research. Journal of Child and Adolescent Psychotherapy, 4(2), 121-126.
Silverman, W.K. (1991). Diagnostic reliability of anxiety disorders in children using structured interviews. Special Issue: Assessment of childhood anxiety disorders. Journal of Anxiety Disorders, 5(2), 105-124.
Silverman, W.K. (1994). Structured diagnostic interviews. In T. Ollendick, N. King, & W. Yule (Eds.), International handbook of phobic and anxiety disorders in children and adolescents (pp. 293-315). New York: Plenum.
Silverman, W.K., & Eisen, A.R. (1992).
Age differences in the reliability of parent and child reports of child anxious symptomatology using a structured interview. Journal of the American Academy of Child and Adolescent Psychiatry, 31(1), 117-124.
Simon, G., Ormel, J., VonKorff, M., & Barlow, W. (1995). Health care costs associated with depressive and anxiety disorders in primary

care. American Journal of Psychiatry, 152(3), 352-357.
Spielberger, C., Gorsuch, R., & Lushene, R. (1976). Manual for the State-Trait Anxiety Inventory. Palo Alto, CA: Consulting Psychologists Press.
Steketee, G., Chambless, D.L., Tran, G.Q., Worden, H., & Gillis, M.M. (1996). Behavioral avoidance test for obsessive compulsive disorder. Behaviour Research and Therapy, 34(1), 73-83.
Thyer, B.A. (1991). Diagnosis and treatment of child and adolescent anxiety disorders. Behavior Modification, 15(3), 310-325.
Weinstein, M., & Fineberg, H. (1980). Clinical decision analysis. Philadelphia: Saunders.
Weiss, B., Weisz, J., Politano, M., Carey, M., Nelson, W., & Finch, A. (1991). Developmental differences in the factor structure of the Children's Depression Inventory. Psychological Assessment, 3(1), 38-45.
Weissman, M.M., Orvaschel, H., & Padian, N. (1980). Children's symptom and social functioning self-report scales: Comparison of mothers' and children's reports. Journal of Nervous and Mental Disease, 168(12), 736-740.
Wolff, R.P., & Wolff, L.S. (1991). Assessment and treatment of obsessive-compulsive disorder in children. Behavior Modification, 15(3), 372-393.
Yang, B., Ollendick, T.H., Dong, Q., & Xia, Y. (1995). Only children and children with siblings in the People's Republic of China: Levels of fear, anxiety, and depression. Child Development, 66(5), 1301-1311.

Chapter 11
Characteristics and Applications of the Revised Children's Manifest Anxiety Scale (RCMAS)

Anthony B. Gerard
Western Psychological Services

Cecil R. Reynolds
Texas A&M University

The Revised Children's Manifest Anxiety Scale (RCMAS; Reynolds & Richmond, 1985) assesses both the degree and quality of anxiety experienced by children and adolescents. Based on the original Children's Manifest Anxiety Scale (CMAS; Castaneda, McCandless, & Palermo, 1956), the RCMAS is a relatively brief instrument suitable for group or individual administration in both clinical and educational settings. Norms are available for children from age 5 through 19.

The work that led to the development of the RCMAS and subsequent experience with the instrument have shown it to be a valid, useful indicator of anxiety. Because it can be administered to groups, it is highly suitable for use in school contexts. Administration of the RCMAS is, of course, only part of a thorough clinical evaluation of a child's anxiety. Because of the strategies employed in its development, however, the RCMAS can be an effective aid in guiding diagnosis and treatment at both the individual and institutional level.

The RCMAS more than meets Koppitz's (1982) first basic rule of the use of personality tests with children: "... use the simplest and the most economical tests, in terms of time and effort, first" (p. 275). If necessary, additional or more detailed questions can be asked at follow-up. In these days of managed care, use of the RCMAS certainly makes sense in any screening for childhood psychopathology. Koppitz, for example, finds the RCMAS to be especially useful early in the evaluation process because it provides fodder for follow-up in the process of diagnosis and treatment.

Overview

The RCMAS is a 37-item instrument subtitled "What I Think and Feel." Each of the items embodies a description of feelings and actions that, in turn, reflect an aspect of anxiety.
For that reason, all of the items are positively keyed, and scoring consists of a count of "Yes" responses. Yielding a Total Anxiety score, three empirically derived subscale scores (Physiological Anxiety, Worry/Oversensitivity, and Social Concerns/Concentration),

and a Lie scale score, the RCMAS is suitable for assessing anxiety in children and adolescents from age 6 to 19. Items from the RCMAS subscales are presented in Table 11.1. Because it is both brief and specific, it is useful as both a screener and an assessment instrument.

The RCMAS is the product of an intensive development effort dependent on both research specifically aimed at the construction of a new instrument and a great deal of prior work on the measurement of anxiety. The research that led to the development of the RCMAS, including both the standardization and validation studies, was informed by the goal of producing an instrument that is powerful and flexible, but also brief. Consequently, the RCMAS is not only psychometrically sound, but meets many of the sometimes conflicting demands placed on a measure of a phenomenon as common, variable, and widespread as anxiety. Witt, Heffer, and Pfeiffer (1990) noted that the RCMAS "appears to be a reliable and valid measure of general anxiety" (p. 384), and that the RCMAS assesses the two primary modes of expression of anxiety: physiological and cognitive. Moran (1990) suggested that, compared with omnibus, multidimensional personality scales, the RCMAS is a more reliable measure of anxiety.

Development

The RCMAS addresses many of the limitations of the original CMAS. Although it had been used successfully for some time (and perhaps because of clinicians' extensive experience with it), the CMAS received a number of criticisms over the years. Teachers described some of the items as too difficult for younger children and poor readers. Some of the CMAS items, researchers and clinicians recognized, failed to meet the criteria usually applied to test items (Flanagan, Peters, & Conry, 1969).

TABLE 11.1
Sample Items from the Four RCMAS Subscales

Physiological Anxiety (10 items)
  1. I have trouble making up my mind.
  5. Often I have trouble getting my breath.
  17. Often I feel sick in the stomach.
  19. My hands feel sweaty.
  33. I wiggle in my seat a lot.

Worry/Oversensitivity (11 items)
  18. My feelings get hurt easily.
  22. I worry about what is going to happen.
  30. I worry when I go to bed at night.
  34. I am nervous.

Social Concerns/Concentration (7 items)
  3. Others seem to do things easier than I can.
  15. I feel alone even when there are people with me.
  31. It is hard for me to keep my mind on my schoolwork.
  35. A lot of people are against me.

Lie (L) (9 items)
  4. I like everyone I know.
  8. I am always kind.
  32. I never say things I shouldn't.
  36. I never lie.

Note. Items from the RCMAS ("What I Think and Feel"), copyright © 1985 by Western Psychological Services. Reprinted by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, U.S.A. Not to be reprinted in whole or in part for any additional purpose without the expressed, written permission of the publisher. All rights reserved.

As research in the

anxiety of children progressed, it also became clear that the CMAS did not measure some important aspects of anxiety, or did not measure them thoroughly enough. In addition, users wanted an instrument that would be a valid measure of anxiety in children across a much wider age range. The development of the RCMAS was an effort to make a popular instrument better by addressing these issues (Reynolds & Richmond, 1978). As described more completely in the manual for the RCMAS (Reynolds & Richmond, 1985), instrument development included a number of goals that, if they could be met, would result in an instrument that was easy to use, psychometrically sophisticated, and clinically useful. The first objective was to create a measure of children's anxiety that was suitable for group administration, which requires an instrument with relatively few items that can be administered during a short time period. To meet objections about CMAS items, the items in the new instrument had to be clear and easy to read. The norms were conceived as addressing a broad range of contexts, demanding not only a large-scale standardization study, but one that took into account the manifestation of anxiety among diverse groups of children. To the extent possible, the development studies were designed to determine whether manifest anxiety is best conceived as unidimensional or multidimensional. Within the limits imposed by the construction of a practical instrument that teachers, researchers, and clinicians could actually use, development goals required that the measure as a whole satisfy contemporary psychometric standards. A new version of the CMAS suitable for standardization research was constructed with these goals in mind. Some of the wording of the items was altered so that they would be easier to read and understand, with the effect of improving the items, expanding the potential age range of the original instrument, and giving the instrument greater currency. 
Every effort was made to ensure that items could be read at a third-grade level. New items generated by a panel of experienced teachers and clinicians were also included. This larger research instrument, which contained 73 items, was administered to 329 children representing the entire age range of the proposed instrument (grades 1-12).

Using these data, the items themselves were subjected to a rigorous item analysis. All items with a probability of endorsement less than .30 or greater than .70 were eliminated; also, if the biserial correlation of an item with the total score was less than .40, it was eliminated. When these criteria were applied, 37 items remained (i.e., 28 anxiety items and 9 lie items). The KR-20 reliability estimates for the Total Anxiety score computed on the development sample and on a cross-validation sample of 167 children were in the .80 range. The low correlation obtained between the Total Anxiety and Lie scores was expected and desired.

The new instrument contained five fewer items than did the original scale, but had reliabilities on the same order as those reported for the CMAS. The presence of fewer items almost automatically reduces the time of administration, a feature rendering the instrument more attractive as a screener than was its predecessor. In spite of the improvement in length, the new instrument retained 25 of the 28 Anxiety items from the original CMAS.

Consistent with the results of previous studies (Bledsoe, 1973; Castaneda et al., 1956) and confirmed in even larger, more recent samples (Reynolds & Kamphaus, 1992), the girls received higher Total Anxiety scores than did the boys, suggesting the need for separate norms. Consistent differences appeared between the scores of Black and White participants, also suggesting the need for separate norms. Nevertheless, with some qualifications discussed in the RCMAS manual (Reynolds & Richmond, 1985), the scale behaved similarly regardless of age, ethnicity, or gender.
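
The item-retention rule described above (endorsement probability between .30 and .70, and a biserial correlation with the total score of at least .40) can be illustrated with a short Python sketch. This is not the authors' analysis code: the function names, the toy data, and the decision to leave each item in the total score it is correlated with are assumptions made only for illustration.

```python
from statistics import NormalDist, mean, pstdev


def endorsement_rate(item):
    """Proportion of respondents answering "Yes" (coded 1) to an item."""
    return sum(item) / len(item)


def biserial(item, totals):
    """Biserial correlation between a 0/1 item and the total score.

    Uses the textbook formula r_b = (M1 - M0) / s_t * (p * q / y),
    where y is the standard normal ordinate at the point that cuts
    off proportion p. Assumption: the total includes the item itself.
    """
    p = endorsement_rate(item)
    s_t = pstdev(totals)
    if p in (0.0, 1.0) or s_t == 0:
        return 0.0  # undefined; treat as no relationship
    m1 = mean([t for x, t in zip(item, totals) if x == 1])
    m0 = mean([t for x, t in zip(item, totals) if x == 0])
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (m1 - m0) / s_t * (p * (1 - p) / y)


def retained_items(responses, lo=0.30, hi=0.70, min_rbis=0.40):
    """Return indices of items that pass both retention criteria."""
    totals = [sum(row) for row in responses]
    keep = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        if lo <= endorsement_rate(item) <= hi and biserial(item, totals) >= min_rbis:
            keep.append(j)
    return keep


# Hypothetical data: 10 children (rows) by 4 items (columns), 1 = "Yes".
toy = [
    [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 1], [1, 0, 1, 0],
    [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1],
]
print(retained_items(toy))  # items 0 and 1 pass; 2 is over-endorsed, 3 correlates negatively
```

In this toy set the third item is dropped because 90% of respondents endorse it, and the fourth because it relates negatively to the total. With real data a corrected item-total correlation (excluding the item from the total) is a common alternative; the source does not say which was used.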
Factor analytic procedures were used to develop the RCMAS subscales. The purpose of these procedures was twofold. First, factor analytic techniques address questions

about the unidimensionality or multidimensionality of anxiety. Second, the factors that have consistently emerged from a series of studies were used to establish a scale structure. For this reason, the current RCMAS embodies a rigorously derived theoretical model of manifest anxiety in children that has been tested against the results of an extensive series of studies. Factor analyses of the RCMAS have consistently yielded remarkably similar results. An early factor analysis of the CMAS yielded three factors labeled Worry/Oversensitivity, Physiological, and Concentration (Finch, Kendall, & Montgomery, 1974). When Reynolds and Richmond (1978) examined the factor structure of the RCMAS using the original development sample, they also retained a three-factor varimax solution as the most statistically and psychologically sound reflection of performance on the instrument. Ultimately, they applied essentially the same labels as those used by Finch et al. Subsequent factor analytic studies yielded results consistent with those of earlier studies. For example, a study by Reynolds and Paget (1981), employing the data from the RCMAS standardization sample (described in the following section), yielded a 5-factor solution consisting of three Anxiety factors and two Lie factors. A factor analytic study by Paget and Reynolds (1984) using data obtained from 106 learning disabled students had similar results, as did a study by Reynolds and Scholwinski (1985) using results obtained from a large group of gifted students. Factor analytic evidence obtained from RCMAS results and extended to a more comprehensive description of children's manifest anxiety suggests the presence of a strong general anxiety factor (Ag), represented by the Total Anxiety score of the RCMAS, and three more specific anxiety factors, represented by the Anxiety subscales of the RCMAS. 
Based on multiple large sample studies and expert review of the content (e.g., Finch et al., 1974), anxiety in children is represented within the RCMAS as a multidimensional construct. That both anxiety and the overall symptom presentation of children with various psychopathological disorders may be multidimensional has long been recognized (American Psychiatric Association, 1994). Therefore, the description of anxiety on which the RCMAS is based fits closely with the diagnostic and treatment process as a whole. Different children may present with different patterns of symptoms of anxiety and different symptoms may respond to treatment differently. The RCMAS is designed, through the presence of a general anxiety factor, to allow the clinician to assess and to monitor overall anxiety levels, and through the subscales, to permit monitoring of selective changes in symptom patterns. The RCMAS subscales also assist in differentiating between anxiety as a disorder (most likely when the Total Anxiety score is elevated) and anxiety as a symptom of other disorders (most likely when one or possibly two subscales are elevated but the Total Anxiety score remains below 70T). When anxiety is present as a symptom of another disorder such as depression, the RCMAS can be useful in identifying these symptoms and in monitoring their responsiveness to treatment as well.

Standardization

The standardization sample for the RCMAS was both large and diverse. It included approximately 5,000 children from age 6 to 19. Half were female and roughly 10% were Black. The participants represented all regions of the United States and were drawn from rural, suburban, and urban areas. In addition to the norms based on this large sample, group data reported in the RCMAS manual (Reynolds & Richmond, 1985) for

97 kindergarten children may be used as norms for this younger age group. The testing procedures used to collect these data employed the same instructions that presently accompany the RCMAS. All of the data were collected through group administration, with the items being read to the younger children.

Standard score distributions for the RCMAS Total Anxiety scale, the three specific anxiety scales, and the Lie scale were derived through normalized transformation of the raw score distributions using the method of rolling weighted averages. Slight smoothing of the score distributions was necessary. For each scale, there are separate norms for boys and girls at each age from 6 to 16, as well as separate gender-by-ethnicity (Black and White) norms for each age. Scores for participants from 17 to 19 years old were collapsed to form a single normative group for each gender and for Blacks and Whites of each gender. Total Anxiety is expressed as a T-score with a mean of 50 and a standard deviation of 10; the scaled scores for the subscales have a mean of 10 and a standard deviation of 3.

Psychometrics

Reliability

Two aspects of an instrument's reliability are usually of interest: the accuracy of scores at the time of assessment and the stability of scores over time. The first of these is largely a function of the internal consistency of the scale as a whole and of its subscales. The temporal stability of a test is ordinarily perceived as coextensive with its test-retest reliability. The statistic typically used to estimate internal consistency is coefficient alpha (Cronbach, 1951), and it is generally agreed that the coefficient alpha for a psychological scale should be approximately .70 or more (Nunnally, 1978). Across all age and ethnicity groups as well as across samples, coefficients alpha for the Total Anxiety score of the RCMAS are, with few exceptions, in the .80 range.
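
For readers who want to see how an internal-consistency estimate of this kind is obtained, the following Python sketch computes coefficient alpha from a respondents-by-items matrix. It is a minimal illustration, not code from the RCMAS manual; the function name and the toy matrices are hypothetical. For items scored 0/1, as on the RCMAS, this quantity coincides with KR-20.

```python
from statistics import pvariance


def coefficient_alpha(responses):
    """Cronbach's coefficient alpha for a respondents-by-items matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    Population variances are used for both terms; using sample variances
    for both would give the same ratio.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: three items that always agree give alpha = 1.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
# Hypothetical data: two unrelated items give alpha = 0.
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]

print(round(coefficient_alpha(consistent), 3))
print(round(coefficient_alpha(unrelated), 3))
```

The two toy matrices mark the ends of the scale: items that rise and fall together push alpha toward 1, and items that share no variance push it toward 0. Values in the .80 range, as reported for Total Anxiety, sit well toward the consistent end.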
For the Physiological Anxiety subscale and the Social Concerns/Concentration subscale, the alpha coefficients are typically in the .60 or .70 range, but are occasionally below .60. Reliability estimates for the Worry/Oversensitivity and Lie subscales are typically in the .70 or .80 range. The reliability of RCMAS scores has also been demonstrated to be equivalent for children with disabilities (Paget & Reynolds, 1984). Another view of coefficient alpha is that it assesses how well (i.e., accurately) the particular set of items in question samples the domain of potential items. The relatively high alpha value obtained for the RCMAS thus indicates a good sampling of the general domain of potential anxiety items. Reynolds (1981) reported a test-retest reliability coefficient of .68 for the Total Anxiety score after an interval of 9 months, which is relatively high given both the time between testings and the temporal stability of personality measures in general.

Validity

The RCMAS rests on a sound empirical foundation, which is described in detail in the test manual. A substantial proportion of the validity picture for the RCMAS depends

on the results of the factor analyses that determined the instrument's scale structure. These findings suggest that RCMAS results are constant across a range of subject variables, including gender, ethnicity, and IQ. In addition, a series of studies comparing children's RCMAS scores with their scores on the State-Trait Anxiety Inventory for Children (STAIC; Spielberger, 1973) demonstrated that RCMAS scores are highly correlated with scores on the Trait subscale and essentially uncorrelated with scores on the State subscale (Reynolds, 1980, 1982, 1985). These results comport with the conception of the RCMAS as a measure of manifest anxiety, conceived as an enduring response to stress.

Validity research on the RCMAS is voluminous. The original article reporting the development of the RCMAS (Reynolds & Richmond, 1978) is the most frequently cited article published in the Journal of Abnormal Child Psychology as of this writing; because of this distinction, it was reprinted in the 25th anniversary issue of the journal. In implicit acknowledgment of the extensive data supporting the RCMAS as a valid measure of chronic, manifest anxiety, the RCMAS is commonly used in studies validating other instruments (e.g., Casey, Lubin, & Bruwer, 1992; Kaslow, Stark, Pritz, Livingston, & Tsai, 1992; Kearney & Silverman, 1993).

Cross-Cultural Applications and Equivalence

Unlike the vast majority of personality scales available, the RCMAS has been examined extensively for its cross-cultural validity, as well as for ethnicity and gender bias. There is surprisingly little empirical work designed to detect cultural bias in personality tests or their individual items (Reynolds, Lowe, & Saenz, in press), and the RCMAS is one of a very few personality scales for which cross-cultural and cross-gender bias has been examined (Moran, 1990).
Dana (1993), reviewing cross-cultural assessment with personality scales, concluded that most comparative studies across cultural groups have used inadequate statistics, selected samples inappropriately, and failed to provide an adequate basis for cross-cultural application of most measures of affect or personality. The RCMAS is an exception, having a foundation in several studies of ethnic and gender bias, which are reviewed in the RCMAS manual (Reynolds & Richmond, 1985). Unlike typical studies of cross-cultural applications of personality tests, which tend to focus principally on mean score differences among groups, studies of cross-cultural applications of the RCMAS have focused on issues of validity across gender and ethnic lines. Most cross-cultural research involving the RCMAS has examined its use with Blacks and Whites within the United States, although some work in other countries has been done (Boehnke, Silbereisen, Reynolds, & Richmond, 1986; Pela & Reynolds, 1982).

Cultural and ethnic influences on the willingness to report affective responses and characteristics can be quite strong (Moran, 1990). Therefore, it is necessary to assess how easily a measure of affect traverses ethnic and gender boundaries. RCMAS item bias was evaluated empirically by Reynolds, Plake, and Harding (1983), who found that the RCMAS does contain some potentially biased items. Individuals from different ethnic backgrounds and of different genders, but with equivalent levels of anxiety, respond differently to some items on the RCMAS. The effect was, however, acceptably small: the race-by-item and gender-by-item interaction terms were each associated with effect sizes of less than 1% cumulatively across all of the items, suggesting little if any bias of clinical significance. The direction of the bias was found to be balanced across groups as well.

Comparative factor analysis across groups is another method of examining cross-cultural equivalence of tests that is viewed as quite important in determining whether test-takers of various backgrounds perceive and respond to a given item based on a common latent cognitive structure (Dana, 1993; Reynolds et al., in press). Reynolds and Paget (1981) examined the factor structure of the RCMAS across ethnicity and gender for a large sample of Blacks and Whites from 5 to 19 years old. The high coefficients of congruence obtained demonstrate the equivalence of the factor structure across groups. Examination of the internal consistency of the scales across groups revealed that young Black females (below age 11) responded less reliably than other groups to these anxiety items, but this finding has not been replicated. Considerably more detail regarding these various results may be found in the RCMAS manual (Reynolds & Richmond, 1985).

Few, if any, personality scales have been scrutinized cross-culturally as carefully as the RCMAS. It is now in use in more than 16 countries throughout the world, representing a myriad of cultures. Because of its emphasis on sound psychometric principles in its early development and on the universal construct of trait anxiety, it has held up well. At this stage, clinicians should be relatively comfortable in applying RCMAS results to the diagnosis of minority group members in the United States, and to monitoring the effects of treatment on minority group members. Clinicians in other countries are also likely to find local literature addressing cross-cultural applications of the RCMAS.

Basic Interpretive Strategy

A child experiencing high stress at home or in school is likely to reveal this stress in responses to the RCMAS items. The results may guide counseling sessions through identifying the sources of anxiety and the means for ameliorating fearful and stressful reactions.
Not only do the Total Anxiety score and the scores on the anxiety subscales suggest the character of the child's anxiety, but responses to individual items may indicate specific areas of concern. Children with depression who go untreated also show significant increases over time in their anxiety levels, as measured on the RCMAS (DuBois, Felner, Bartel, & Silverman, 1995). Symptoms of anxiety may be masked by externalizing disorders, especially when the assessment follows an interview format, but are often present in children with externalizing disorders who do not receive an anxiety diagnosis. In either instance, such symptoms may well require treatment (Rabian, Peterson, Richters, & Jonsen, 1993). Although the RCMAS can be a powerful tool for identifying and classifying anxiety, interpretation of RCMAS results, particularly those indicating the presence of significant anxiety, must always be embedded within a frame of reference informed by empirical research and clinical experience. To determine the validity of RCMAS results requires both the application of clinical insight and attention to the form of the child's responses. The administration of the instrument constitutes part of the larger evaluation process because it affords the opportunity to observe the child's willingness to answer the items carefully and honestly. Obvious resistance to taking the test or a marked inability to record self-perceptions deserves particular attention. Because few children have trouble completing the RCMAS, failure to complete it according to instructions signals a problem. Resistance to reporting symptoms is most often accompanied by elevated Lie

scale scores, but children who resist even completing the scale require additional investigation. First, it must be determined whether the child can read the questions and may be resisting out of fear or embarrassment. If the RCMAS or another objective questionnaire is the first task facing a child, then the examiner may wish to move to something perhaps less threatening, like a simple projective drawing. Continued resistance may require long-term efforts to establish a relationship with a troubled, cautious child before the child is comfortable relating feelings and cognitions to the clinician.

In some instances, of course, a child clearly suffering from anxiety does not receive high RCMAS scores. Even in those instances when the scores themselves do not contribute to an accurate assessment of the child's level of anxiety, the pattern of responding may provide other clinically useful information, signaling difficulties with concentration, reading problems, or defiance. The Lie subscale, in addition to providing a check on the validity of the child's responses, functions as a measure of "faking good," or defensiveness manifesting as the need to provide socially desirable responses, which can sometimes point to a distorted view of self and others.

The raw Total Anxiety score ranges from 0 to 28. The first task in the interpretation of an RCMAS protocol is to determine the deviance of the Total score. In general, scores falling at least one standard deviation above the mean (60T or greater) are of clinical interest, and those falling two standard deviations or more above the mean (70T or greater) are clearly deviant and may indicate significant pathology. As discussed under the heading "Item Analysis" later in this chapter, it is important to note which items the child endorses, because the individual pattern of item endorsement may indicate problems that are not indicated by the pattern of subscale scores.
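
The two cutoffs just described can be written down as a small decision rule. The Python sketch below is only an illustration of that rule: the function names and band labels are hypothetical, and the conversion from a raw score to a T-score is deliberately left out because it depends on the age-, gender-, and ethnicity-specific norm tables in the RCMAS manual, which are not reproduced here.

```python
def raw_total_anxiety(yes_responses):
    """Count "Yes" answers over the 28 Total Anxiety items.

    `yes_responses` is a list of 28 booleans, one per anxiety item
    (Lie items excluded). The ordering is an assumption of this sketch.
    """
    if len(yes_responses) != 28:
        raise ValueError("expected responses to the 28 anxiety items")
    return sum(1 for answer in yes_responses if answer)


def total_anxiety_band(t_score):
    """Place a Total Anxiety T-score (mean 50, SD 10) in an interpretive band."""
    if t_score >= 70:      # two or more SDs above the mean
        return "clearly deviant"
    if t_score >= 60:      # at least one SD above the mean
        return "of clinical interest"
    return "below the clinical-interest cutoff"


# Hypothetical protocol: 5 of the 28 anxiety items endorsed.
print(raw_total_anxiety([True] * 5 + [False] * 23))
# Hypothetical T-scores, as would be looked up in the manual's norm tables.
for t in (48, 63, 72):
    print(t, total_anxiety_band(t))
```

A band label is a starting point for interpretation, not a diagnosis; as the text notes, item-level responses and the Lie scale still need to be examined alongside the Total score.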
Each of the three factor-based subscales reflects a different aspect of anxiety. A high score on the Physiological Anxiety subscale suggests that the child is experiencing a number of the physiological signs of anxiety, such as stomach pains and sweaty hands. A high score on the Worry/Oversensitivity subscale is a sign that the child internalizes the experience of anxiety. Because this often means that the child feels overwhelmed, it is important for him or her to develop ways of relieving anxiety through discussing feelings and of coping through reaching out to others. A high score on the Social Concerns/Concentration subscale suggests that the child feels unable to live up to the expectations of parents and other important figures. The feeling of not being as capable as others can generate a level of anxiety that makes it difficult to concentrate on schoolwork or other responsibilities.

Responses to individual RCMAS items can yield information or suggest clinical hypotheses about the origin and nature of a child's anxiety. There are no norms for individual items. Because the items reflect aspects of anxiety, however, examination of individual items can help in determining the extent and character of the child's anxiety. In addition, it may be possible to discuss each endorsed item with the child, not only giving the clinician more information regarding the child's distress, but giving the child practice in exploring and expressing emotion. Such discussions about RCMAS items can, therefore, serve both assessment and treatment goals.

Use of the RCMAS in Treatment Planning

Anxiety appears to be, in part, a function of the increasing complexity of society. Paradoxically, as industrialization and social modernization improve the latitude for action of many individuals, the number of decisions and the pressures to keep pace
provide the perfect atmosphere to elevate anxiety levels. Relatively low levels of anxiety can facilitate performance, but chronic anxiety ultimately reduces an individual's effectiveness and can adversely affect both mental and physical health. Anxiety is the most frequent indicator of mental health problems, and anxiety may form the basis of depression. In a wide range of psychotherapeutic settings, the first task of the psychologist, psychiatrist, or counselor is to alleviate the symptoms of anxiety, permitting the client to function more easily and effectively.

Anxiety is unique among the psychopathologies in that it may be either a symptom or a disorder. Research results, review of the DSM-IV, and clinical experience with patients clearly show, moreover, that anxiety and depression are related constructs, but can and should be differentiated (Crowley & Emerson, 1996; Ialongo, Edelsohn, Werthamer-Larsson, Crockett, & Kellam, 1996; Reynolds & Kamphaus, 1992). It is, of course, common for children with a diagnosis of depression to display significant symptoms of anxiety. The RCMAS is useful across a range of clinical contexts in part because its results comport well with the DSM-IV criteria for Generalized Anxiety Disorder (GAD; Tracey, Chorpita, Douban, & Barlow, 1997), but it is detailed enough in its assessment approach to address anxiety as a symptom of other disorders. (As pharmaceutical advertisements claimed so insistently in the 1970s and 1980s, "underlying every depression, is anxiety.") The specificity of the RCMAS helps clinicians to distinguish the two problems and thus address them in treatment planning, and to monitor the breadth of the child's symptom patterns in response to interventions (Crowley & Emerson, 1996).

Unfortunately, vulnerability to the stresses of society is not confined to adults. Many children experience anxiety in response to the pressure placed on them in a world that demands ever more decisions and ever higher performance.
Naturally, school represents perhaps the most common source of stress for children; they worry about their academic progress and they grow apprehensive with the approach of each test. Peer and family relationships constitute other sources of anxiety for children. Younger children become involved in negative interactions on the playground or with their siblings; adolescents face the prospect of relationships with members of the opposite sex, which is a realm full of worries even for a relatively well-adjusted child. Problems with one or both parents can, of course, manifest themselves in debilitating anxiety and lead to further negative self-talk, a downward spiral associated with increased anxiety levels in children and adolescents (Rowan & Kendall, 1997).

For these reasons, the character and extent of a child's anxiety are important information for the clinician, teacher, or parent. This information can also be of great value to the child, assuming it is presented in a manner consistent with his or her level of development. Because anxiety appears to result inevitably from the complexities of life as it is now constituted, efforts to treat anxiety can be seen as lifelong, and they can begin with an understanding gained in childhood or early adolescence. The key to such an approach is organized information about the individual's anxiety, information that can be gained, in part, from examination of the RCMAS profile.

Objective measures of anxiety play an essential role in identifying a child's problems. The teacher, parent, or mental health professional may not be fully aware of the complex interrelation of emotion, stress, and performance in a child's life. A structured description of each child's level of anxiety can help a teacher gauge the overall level of anxiety in the classroom, which can help in predicting which children will need intervention.
By the same token, parents armed with fairly precise information about a child's level of anxiety may be in a better position to help a child cope with anxiety-provoking circumstances. A
counselor, social worker, or psychologist can, of course, make use of objective data about a child's anxiety in treating an array of difficulties. In addition, structured and specific information about anxiety presented directly to the client may support efforts to cope with the pressures of growing up. Because children usually cannot recognize either the extent or antecedents of anxious feelings, they naturally cannot discover effective strategies for overcoming those feelings or their possible effects. For example, children typically cannot reason that anxieties rooted in family relationships have caused their grades to fall. A closer look at a family conflict and the stresses within it, as well as the emotional and physical reactions to those stresses, can help them to develop better means of adapting and coping.

Interpretive Strategies and Treatment Planning

By providing insight into the child's feelings across situations, the RCMAS can illuminate the process of treatment planning. Because of the prevalence of anxiety and its relation to depression, the evaluation of a child's anxiety level is crucial to the larger process of assessment. Information about the overall level of anxiety, as well as more detailed information about the type and pattern of anxiety symptoms, may inform the choice of treatment modality. As a relatively brief instrument, the RCMAS also lends itself to use in screening for anxiety in the classroom, and RCMAS results can be used to guide the design of programs for preventing or ameliorating anxiety among groups of children.

The development of the RCMAS assured that individual items correspond closely to symptoms associated with anxiety. For that reason, the child's endorsement of a given item or group of items points directly to his or her symptomatology. This information is available for use in counseling sessions to generate discussion, identify causes of anxiety-related symptoms, and construct efforts to alleviate those symptoms.
Furthermore, because the RCMAS items embody anxiety symptoms, including anxiety-driven attitudes and behaviors, attention to the items the client has endorsed can support the selection of treatment modality. The Total Anxiety score, the pattern of RCMAS subscale scores, and the individual items all provide information useful in treatment planning. Among other things, the overall level of anxiety accurately predicts the degree of dysfunction. The pattern of scale scores, if it is consistent with other test scores and with additional information about the client, can reveal the contours of the client's experience of anxiety. The RCMAS items themselves provide clues to the child's condition, and they can be incorporated into the treatment process.

Total Anxiety. The Total Anxiety score indicates the breadth of symptomatology and provides the best assessment of the presence of GAD. Treatment approaches such as cognitive behavior modification (CBM), or perhaps play therapy with younger children, may be appropriate. This score is sensitive to treatment effects as well, and should decline over time.

Physiological Anxiety. The score on the Physiological Anxiety scale is important to both diagnosis and treatment planning because physiological symptoms are central to the experience of anxiety. Most of the items on this scale correspond closely to the symptoms of chronic overarousal: sleep problems, nightmares, irritability, indecisiveness, restlessness. Although all of the items ultimately imply a negative physical response
to stress, a few items (those referring to sweaty palms, breathlessness, and nausea) correspond to the more immediate aspect of anxious arousal. Learning disabled children, along with children who have experienced trauma, tend to have elevated scores on this scale as well. Those with high scores on this scale are experiencing the signs of physical tension and the accompanying autonomic arousal. For that reason, it may be necessary to select a form of treatment that more or less directly reduces the level of tension. Some types of strenuous physical exercise, such as running or swimming, may help in achieving this goal. On the other hand, training in progressive relaxation, for example, may be used to reduce the overall level of tension and may also form a part of treatment for specific anxieties such as test anxiety. Biofeedback is often chosen for individuals with these symptoms.

Worry/Oversensitivity. Items on the Worry/Oversensitivity scale either contain the word "worry," or mention the experience of fear, nervousness, and excitability. A high score on this scale indicates strong reactions to environmental pressures. Because this often means that the child feels overwhelmed by external events and internal pressures, it is important for him or her to develop ways of relieving anxiety through discussing feelings and of coping through reaching out to others.

Social Concerns/Concentration. Items on this subscale tend to reflect concerns about the self in interaction with others and a tendency to express problems with concentration. A good assessment of social skills and other measures of interpersonal relationships, available in the Behavior Assessment System for Children (Reynolds & Kamphaus, 1992), would be an excellent follow-up to elevations on this scale. In addition to CBM, social skill development and practice in role playing might also be useful.
Negative self-talk is a significant problem for those children who are overly concerned that they are not as good, effective, or capable as others.

Item Analysis and PTSD. Individual item responses may be of particular importance in children suspected of having posttraumatic stress disorder (PTSD). Although hypervigilance, a key organic symptom of PTSD, is associated with general increases in anxiety levels, more specific symptoms may appear. A content analysis of items in comparison with the DSM-IV criteria for a diagnosis of PTSD suggests the following critical items, some or all of which may be PTSD-related:

Item 6: I worry a lot of the time.
Item 13: It is hard for me to get to sleep at night.
Item 22: I worry about what is going to happen.
Item 25: I have bad dreams.
Item 29: I wake up scared some of the time.
Item 30: I worry when I go to bed at night.
Item 37: I often worry about something bad happening to me.

This list is not exhaustive, by any means, but represents symptoms associated with PTSD in a wide variety of cases. Although multiple approaches are necessary in the appraisal of PTSD and the monitoring of its resolution in treatment, endorsement of select RCMAS items does predict abuse, particularly sexual abuse (Spaccarelli & Fuchs, 1997). As these symptoms resolve, treatment may be seen to progress.
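As an illustration of the item analysis described above, the following sketch checks a child's endorsed items against the critical items listed in the text. The function name and the input format (a list of endorsed item numbers) are assumptions made for this example. The output is a prompt for clinical follow-up and discussion, not a diagnostic result, since the text stresses that the list is not exhaustive and that multiple approaches are needed.

```python
# Critical items quoted in the text (item number -> wording).
PTSD_CRITICAL_ITEMS = {
    6: "I worry a lot of the time.",
    13: "It is hard for me to get to sleep at night.",
    22: "I worry about what is going to happen.",
    25: "I have bad dreams.",
    29: "I wake up scared some of the time.",
    30: "I worry when I go to bed at night.",
    37: "I often worry about something bad happening to me.",
}


def endorsed_critical_items(endorsed):
    """Return (item number, wording) pairs for endorsed critical items.

    `endorsed` is any iterable of item numbers the child answered "yes" to.
    Results are sorted by item number; duplicates are ignored.
    """
    return [(n, PTSD_CRITICAL_ITEMS[n])
            for n in sorted(set(endorsed))
            if n in PTSD_CRITICAL_ITEMS]


if __name__ == "__main__":
    # Hypothetical protocol: items 3, 6, 11, and 25 were endorsed.
    for number, wording in endorsed_critical_items([3, 25, 6, 11]):
        print(f"Item {number}: {wording}")
```

Repeating the check at each readministration gives a simple record of which of these symptoms have resolved.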

Monitoring and Treatment Outcome Assessment

Comparisons of a client's scores involving, for example, multiples of the standard error of the difference or increments based on the scale standard deviation are not recommended for use in evaluating clinical change based on RCMAS results. Although statistical or psychometric criteria are often applied to the problem of evaluating change in therapy, such efforts are fraught with difficulty. Much could be said about the complexities of comparing change scores, but there are two aspects of the statistical context that are worth noting: First, all scores are subject to statistical regression, which means that any purely statistical evaluation of change scores must take regression into account; second, comparisons involving a single client's scores have limited statistical power because only one pair of scores is involved. Going beyond these technical considerations, the issue at hand is not whether the change in scores is significant according to some essentially arbitrary standard, but whether the level of relief is significant according to the client.

For this reason, the recommended method for assessing treatment outcomes using RCMAS results is the simple tracking of score trends, a tracking informed by clients' description of their status. The principal method for evaluating change is to look for consistent declines in anxiety as reflected in the Total score or the subscale scores, rather than attempting to identify specific point differentials that theoretically reflect clinically meaningful change. It may be useful, for example, to readminister the RCMAS periodically and to record the results graphically. This form of outcome assessment may appear to be less satisfying than procedures that yield actual numbers against which to measure progress.
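The periodic readministration and graphic recording described above can be sketched as follows. The definition of a "consistent decline" used here, the tolerance parameter, the function names, and the sample scores are all assumptions made for illustration; the text deliberately avoids fixed point thresholds, so this is a bookkeeping aid to be read alongside the client's own description of his or her status, not a significance test.

```python
def is_consistent_decline(scores, tolerance=0.0):
    """Return True when scores trend downward across administrations.

    Assumption for this sketch: no score rises more than `tolerance`
    above the preceding one, and the latest score is below the first.
    """
    if len(scores) < 2:
        return False
    no_rise = all(later <= earlier + tolerance
                  for earlier, later in zip(scores, scores[1:]))
    return no_rise and scores[-1] < scores[0]


def text_chart(labels, scores):
    """Render one bar per administration so the trend is visible at a glance."""
    return [f"{label:<8} {'#' * round(score)} {score}"
            for label, score in zip(labels, scores)]


if __name__ == "__main__":
    # Hypothetical Total Anxiety T-scores from four administrations.
    labels = ["intake", "month 2", "month 4", "month 6"]
    scores = [74, 70, 66, 61]
    print("\n".join(text_chart(labels, scores)))
    print("consistent decline:", is_consistent_decline(scores))
```

The same two functions can be applied to each subscale score to see whether particular aspects of anxiety are resolving at different rates.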
On the other hand, simple trend tracking is methodologically sound without depending on the complicated calculations required by a statistically rigorous process that may fall short of the appropriate standard, that is, clinical significance.

Case Studies

Case 1: Distractibility and Parental Neglect

In some cases, ruling out anxiety can prove to be of value in arriving at an accurate assessment that will, in turn, lead to appropriate treatment. A lack of concentration and a tendency to act out can result from anxiety, but they may also reflect other problems, including conditions that affect attentional mechanisms. Furthermore, it is important to see past a child's presenting symptoms to the possibility of problems within the family.

John, who is 8 years old, is in the third grade. His parents brought him to a counseling center because of his problems at school. John's teacher describes his behavior as immature and inattentive. He looks around the schoolroom and out the window frequently, and talks to other children during lessons. Not only does this behavior interfere with his own learning, but it disrupts learning and discipline in the classroom as a whole.

On the surface, there did not seem to be significant problems in John's family. His mother, who is a native of an Asian country, has two children by a previous marriage. She met his father while he was overseas in the military, and they have lived in the United States for 6 years. Both parents were college students, and all three children lived with them. They had few problems with John at home, although they had noticed
that he rarely stayed with any task for very long. He got along well with his older brother and sister.

His achievement and IQ test scores suggested that John is of average ability. On the Wechsler Intelligence Scale for Children (WISC-III; Wechsler, 1991), he obtained a Verbal IQ of 99 and a Performance IQ of 94, resulting in a Full Scale IQ (FSIQ) of 96. His performance on the Human Figure Drawing Test (Mitchell, Trent, & McArthur, 1993) suggested a mental age of 8-0; he obtained a developmental age of 7-6 to 8-0 on the Bender-Gestalt Test (Clawson, 1962). His achievement scores on the Norris Educational Achievement Test (NEAT; Switzer & Gruber, 1992) were adequate for his age level; he scored 107 on Reading, 107 on Spelling, and 101 on Arithmetic, and registered a grade level of 3.6.

John's behavior during the testing session was characterized by the psychologist as initially cooperative and polite. He proceeded, however, to exhibit avoidance behaviors, such as saying that he was tired, bored, or hungry. During testing, he got up and walked around the room asking questions, which necessitated bringing him back on task several times. At the end, he appeared to miss a few items deliberately to shorten the time. Consequently, his achievement and IQ test results may not reflect his full ability.

John's RCMAS scores suggested average or below-average levels of anxiety, but also a response bias. John obtained a scaled score of 17 on the Lie scale. His Total Anxiety score, however, was only 46, and he had scaled scores of 11, 6, and 11, respectively, on the Physiological Anxiety, Worry/Oversensitivity, and Social Concerns/Concentration subscales. Of the 11 anxiety items he did endorse, 6 were on the Physiological Anxiety subscale, 4 were on the Social Concerns/Concentration subscale, and 1 was on the Worry/Oversensitivity subscale.
Although he did not appear to suffer from anxiety, John's Lie score suggested the need to present himself in a socially desirable light, and the pattern of his scores is consistent with the presenting problem.

In conference, the parents recognized that their involvement in their own activities had left them with little time for John and the other children. Consequently, they were not aware of John's distractibility and immature behavior. Because his mother was especially busy with studying, household chores, and adapting to a new society, she spent practically no time with John individually. John's immature behavior did not result in his obtaining the social rewards he needs. Until the crisis precipitated by his teacher's report, this behavior did not help him gain the attention of his parents. The defensiveness he displayed in his RCMAS responses implies that he viewed himself as perfect in order to compensate for the experience of rejection; not only did he feel neglected by his parents, but his peers tended to avoid him because of his acting out. He needed help to find his way out of this vicious cycle.

The parents decided to restructure their daily routines, allowing more time to interact with John and his siblings, and they planned more family activities. Although the elevations of his RCMAS scores do not indicate problems with anxiety, the pattern of scores is consistent with the possibility that John has difficulties with arousal and attention. He received assistance in developing better on-task behavior, and in asking for help in a responsible way. It is assumed that if these changes prove insufficient, John would be evaluated for ADHD.

Case 2: Acting-Out Adolescent

Jeannie, a 17-year-old girl in residence at a group home for emotionally disturbed adolescents, has had serious academic, emotional, and social problems since she was in
8th grade. She has managed to complete 10th grade, but her problems persist. Those problems began to appear around the time her brother was born and her interest in boys emerged. Prior to that, she had been a good student, earning As and Bs, and an outstanding athlete. The previous year she had been an all-star pitcher in the local American Girl softball league, and she was expected to make the varsity on her high school softball team as a freshman; she also excelled in several track and field events. Her father had always been proud of Jeannie's talent and of her willingness to work in order to succeed on the field. Ever since her special abilities began to appear when she was in elementary school, the relationship between father and daughter had increasingly revolved around her participation in sports and his role as her coach. When she started dating and stopped participating in sports, her father objected strongly and a serious rift quickly developed between them. At the same time, and perhaps out of disappointment with his daughter, the father began to focus on his infant son. Feeling shut out herself, Jeannie let her grades slip, became promiscuous, started smoking cigarettes, and began abusing drugs and alcohol. When he discovered a pack of cigarettes and a bag of marijuana in her room, Jeannie's father threw her out of the house, claiming that she had betrayed everything he had taught her. Jeannie moved in with her mother's sister, who lived in the same town, promising to "cool her jets." Soon, however, she stopped coming home at night, and had all but dropped out of school. When she had a falling out with her aunt, with whom she had always been close, the family decided that Jeannie had to enter the group home, which was in another town about an hour's drive away. Apparently, Jeannie is of roughly average intellectual ability. 
She obtained a Full Scale IQ of 94 on the Wechsler Adult Intelligence Scale-Revised (WAIS-R; Wechsler, 1981); her Performance IQ was slightly elevated, but not significantly higher than her Verbal IQ. Her scores on the NEAT were 102 for Arithmetic, 119 for Reading, and 98 for Spelling. Although she made no errors on the Bender-Gestalt, she did have several erasures and made second attempts at some drawings. Jeannie's performance on projective instruments was revealing. Her responses on a sentence completion task and the Thematic Apperception Test (TAT; Murray, 1943) revealed a strong attachment to her father and a need for his acceptance. She was frightened and depressed because he no longer wanted her in the home. Quite dependent, with a poor self-concept and a marked inability to envision solutions to her problems, she expected others to solve her problems for her. Her descriptions of her own previous behavior alternately reflected strong feelings of guilt and a deep sense of rejection. Jeannie's RCMAS scores comport closely with the rest of the symptom picture. She had a Total Anxiety T-score of 74 (99th percentile), and Physiological Anxiety, Worry/Oversensitivity, and Social Concerns/Concentration subscale scores of 19, 16, and 14, respectively; her Lie scale score was 9. Clearly, she was experiencing high levels of anxiety. When interviewed, she reported feeling so "nervous and upset" that she found it nearly impossible to cope with any kind of stress, especially that related to her family situation. She also reported feeling overcome by worries about her future. Her pleasant world of acceptance, success, and love had deteriorated rapidly over the previous 2 years. She was afraid, and saw herself as unable to deal with her situation. In the context of her history and her other test scores, the pattern of Jeannie's RCMAS scores guided the approach to treatment. 
Most of the RCMAS items she endorsed fell on the Physiological Anxiety and Worry/Oversensitivity scales. Although Jeannie's overall anxiety is high, according to her self-description in interviews she suffers most from the physical elements of nervous arousal, from badly hurt feelings,
and from her fears about the future. Therefore, the intervention was organized around three goals. First, it was deemed important to find ways for her to relax by releasing pent-up energy. Second, efforts were made to eliminate some of the external sources of stress from her life. Third, she needed to confront her feelings of hurt and rejection, perhaps as a first step in reconciling with her father.

The means to address the physiological component of her anxiety were already a large part of her life. Jeannie was encouraged to reacquaint herself with athletics, but with a difference. Instead of focusing on competition, which was too reminiscent of the difficulties that brought her to treatment, she focused on conditioning, including weight training and aerobic exercise. Eventually her huge competitive spirit could not be denied, and she enrolled in a kung-fu course, which had the dual benefit of bringing her back into competition and of helping her develop greater self-confidence.

To reduce her fears about the future, Jeannie must acquire some skills and education. She is already a year behind in school. Therefore, she receives additional tutoring to help ensure that she will progress at an acceptable pace in her academic work. Because she already has had the experience of succeeding in school, she simply needs to rediscover the skills and strategies she used before she entered high school. It is hoped that as she does succeed in her schoolwork, her self-image will improve.

The most difficult part of the treatment involves healing the hurt Jeannie has sustained in the destructive conflict with her father. Clearly, she needs to differentiate her own contribution to her difficulties from her father's overly harsh reaction to the changes in her, which were partly a consequence of her entry into adolescence. For a child like her, so dependent on the good opinion of others (particularly her father), the more or less sudden withdrawal of affection was devastating.
Jeannie often describes what happened as her father having lost interest in her because she was "just a girl" and he finally had a son. Interviews with the father indicate that he recognizes that he did not handle the situation with appropriate sensitivity. He is still very angry at his daughter, but he may be ready to reconcile if, as he says, "she stops using drugs and sleeping around." For her part, Jeannie must acquire a more realistic view of her father and her family, and better control over the angry impulses that lie at the root of her acting out. She is receiving training in both impulse control and assertiveness.

Case 3: Academic Underachiever

Chuck was 14 years old and was in the eighth grade at the time of this evaluation. His father is a high school mathematics teacher, and his mother is a nurse. He has a 10-year-old brother. He was referred by his parents for counseling because of his poor academic progress. Unable or unwilling to concentrate on his schoolwork, Chuck usually does not do his homework and seems uninterested in academic achievement. The parents, who have always valued good grades, reported having tried everything to encourage Chuck to do better. Although they punished him, praised him, and helped him with his homework, he remained unmotivated. They grew concerned that they were punishing him too much and that the conflict over Chuck's grades had become the most significant feature of family interaction. In addition, the younger brother began to resent the situation because Chuck received more attention.

Chuck's test results suggest that his academic achievement does not match his ability. On the WISC-III, Chuck obtained a Verbal IQ of 100 and a Performance IQ of 128, resulting in an FSIQ of 113, which is in the high average range. On the other hand, his performance on the Wide Range Achievement Test (WRAT-3; Wilkinson, 1993) was
poor. He is functioning at the 9th percentile in Arithmetic, the 18th in Spelling, and the 42nd in Reading, which are scores lower than his ability would predict. He reports an interest in science and art, but indicates that school is usually boring.

Chuck's RCMAS scores suggest that he is highly anxious. His Total Anxiety score of 75 (at the 99th percentile) reflects his responding in the scale-positive direction on all but five of the anxiety items. Not surprisingly, therefore, his scores on the anxiety subscales are also high: 15 on Physiological Anxiety, 16 on Worry/Oversensitivity, and 16 on Social Concerns/Concentration. His Lie scale score was 9.

Chuck is an extremely anxious youngster with anxiety severe enough to interfere with his concentration and with his ability to develop better social and interpersonal skills. He realizes that his poor grades form the subject of many family conflicts. He expresses concern that he does not have any friends, and he believes that his parents, as well as his peers, dislike him. Chuck is unhappy and sees no solution to his problems. His relatively low verbal ability may reflect his inability to verbalize his feelings of frustration and anxiety.

The RCMAS subscale scores he received suggest that Chuck's anxiety has affected him in many ways. Nevertheless, his Social Concerns/Concentration score of 16 represented endorsement of every item on that scale. For that reason, and because the presenting problem was his inability to concentrate, the initial two-pronged intervention focused on this aspect of his anxiety. He received help recognizing the feelings of anxiety he experiences in social situations. At the same time, treatment involved an attempt to restructure his perceptions of others and of his relationships with them. The items on the Social Concerns/Concentration scale became the basis of discussions about the realities of social interaction, and about the accuracy of his ideas about people and situations.
For example, one of the items (11) is "I feel that others do not like the way I do things," and another (27) is "I feel someone will tell me I do things the wrong way." Chuck was asked to examine whether others usually disapprove of his actions or see him as incompetent. One item on the Social Concerns/Concentration scale (3) is "Others seem to do things easier than I can." He was helped to see that some people are indeed better than he is at some things, but that this is not a reflection on him and does not, in itself, cause others to see him as incompetent or unattractive. Chuck's grades have improved. Moreover, he recently asked a girl to go with him to a dance, and she accepted.

Conclusions

The design of the RCMAS facilitates its use in planning and monitoring treatment. The RCMAS, more or less in its current form, has been used in many research studies since the late 1970s. Prior to that time, more than 100 research articles using the CMAS appeared as part of the effort to define accurately the nature of manifest anxiety in children and its relation to a number of cognitive, affective, and achievement variables. Well over 100 papers using the RCMAS have appeared since the 1978 revision. The development of the RCMAS proceeded, in part, from a recognition that a scale used to identify the symptoms of anxiety must embody features that permit the detection of relations between anxiety and other disorders, as well as between anxiety and external factors.

The RCMAS is a flexible, powerful instrument that functions well as an assessment tool, but is sufficiently brief to use in screening for anxiety within a wide range of clinical and institutional settings. It is a useful diagnostic indicator and an effective guide to treatment selection. It is also a tool that can be used in therapy, because scores on the subscales can help in characterizing a child's anxiety and in identifying sources of anxiety. The overall level of anxiety, as measured by the total Anxiety score, predicts the level of dysfunction, and the pattern of scale scores may reveal the contours of the client's experience of anxiety.

Chapter 12
Overview of the Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A)

Robert P. Archer
Eastern Virginia Medical School

This chapter reviews the Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A), a revision of the original MMPI specifically designed for use with teenagers. As with the development of the MMPI-2, the MMPI-A attempted to build on the most useful and productive aspects of the original test instrument. Thus, for example, the original MMPI basic clinical scales were retained in the MMPI-A. The MMPI-A, however, also represents an attempt to improve on several aspects of the original test instrument in relation to adolescent assessment. These changes include a 16% reduction in the total length of the item pool, revision of 70 items to simplify or improve wording, the collection of new national norms representing diverse geographic and ethnic groups, and the development of several new scales specifically related to adolescent development and psychopathology. Since the publication of the MMPI-A in 1992, there has been a steady, albeit small, flow of publications on this instrument. This chapter attempts to summarize the development, structure, and more recent literature on the MMPI-A.

Overview of the MMPI-A

Summary of Development

The restandardization committee responsible for the development of the MMPI-2 (Butcher, W.G. Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) was also involved in the initial development of the MMPI-A. Specifically, the restandardization committee developed an experimental form of the MMPI (Form TX) that contained 704 items, and supervised the initial collection of normative data with Form TX in several geographic settings in the United States. On July 1, 1989, the MMPI Adolescent Project Committee was created by the University of Minnesota Press and consisted of James N. Butcher, Auke Tellegen, Beverly Kaemmer, and Robert P. Archer.
This committee made the final determination to proceed with the development and publication of an
adolescent form of the MMPI and provided recommendations concerning normative criteria, item and scale selection, and profile construction to be incorporated in the adolescent form. The adolescent project committee, wishing to maintain continuity between the original MMPI and the MMPI-A, sought to preserve the standard or basic MMPI scales. Scale F was substantially modified, however, to improve its psychometric performance with adolescents, and Scales Mf and Si were shortened to reduce the total item pool of the instrument. The MMPI basic clinical scales were developed by Hathaway and McKinley using a criterion keying method. Items were selected for scale membership based on the occurrence of item response frequencies that differentiated between a criterion group manifesting a specific diagnosis or characteristic, and a comparison group (the Minnesota adult normal sample) thought not to manifest the trait or characteristic under study. Indeed, the original MMPI is widely cited (e.g., Anastasi, 1982) as an outstanding example of this method of test construction. In addition to the basic clinical scales, the MMPI-A contains 4 new validity scales presented within the Basic Scale Profile, 15 content scales, 6 supplementary scales, 28 Harris-Lingoes subscales, and 3 Si subscales. Table 12.1 provides an overview of the scale structure of the MMPI-A, with scales organized into three broad headings corresponding to the three MMPI-A profile sheets. The new validity scales in the basic scale profile include the F1 and F2 subscales, each containing a 33-item subset of the 66-item MMPI-A F scale. These items were selected based on a criterion that the item was endorsed in the deviant direction by no more than 20% of males and females in the MMPI-A normative sample. 
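The item selection rule just described for the F scale can be illustrated with a short sketch. It assumes the criterion is applied to each gender separately (an item is kept only if no more than 20% of males and no more than 20% of females endorse it in the deviant direction); the item numbers and endorsement rates are invented for illustration and are not taken from the MMPI-A manual.

```python
def select_infrequent_items(rates_male, rates_female, max_rate=0.20):
    """Return item ids whose deviant-direction endorsement rate is at or
    below max_rate in BOTH groups (assumed reading of the criterion)."""
    return sorted(
        item
        for item in rates_male
        if item in rates_female
        and rates_male[item] <= max_rate
        and rates_female[item] <= max_rate
    )


# Hypothetical endorsement rates (proportion answering in the deviant direction).
male_rates = {1: 0.05, 2: 0.25, 3: 0.18, 4: 0.10}
female_rates = {1: 0.08, 2: 0.15, 3: 0.22, 4: 0.20}

selected = select_infrequent_items(male_rates, female_rates)
print(selected)  # items 1 and 4 pass in both groups
```

Item 2 fails for males and item 3 fails for females, so only items that are rare in both groups survive.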
The MMPI-A validity scales also include the Variable Response Inconsistency (VRIN) scale and the True Response Inconsistency (TRIN) scale, consistency measures developed using a methodology very similar to that employed in the development of the MMPI-2 counterparts of these measures. The order of appearance of validity scales on the basic scale profile, from left to right, is as follows: VRIN, TRIN, F1, F2, F, L, and K. The 15 MMPI-A content scales heavily overlap with both the MMPI-2 content scales (Butcher et al., 1989) and with the Wiggins scales (Wiggins, 1966, 1969) created for use with the original MMPI. The MMPI-A content scales were developed based on a combination of rational and statistical criteria as described in the MMPI-A manual (Butcher et al., 1992) and by Williams, Butcher, Ben-Porath, and Graham (1992), who specifically focused on the MMPI-A content scales. The six supplementary scales for the MMPI-A include the continuation, in modified form, of three scales used with the original form of the MMPI. These scales are slightly shortened versions of Welsh's (1956) Anxiety (A) and Repression (R) scales, and a revision of MacAndrew's (1965) Alcoholism Scale, the MacAndrew Alcoholism Scale-Revised (MAC-R). In addition, there are three new supplementary scales developed for the MMPI-A: the Immaturity (IMM) scale, the Alcohol/Drug Problem Acknowledgment (ACK) scale, and the Alcohol/Drug Problem Potential (PRO) scale. The Harris-Lingoes content scales (Harris & Lingoes, 1955) developed for the original MMPI were carried over to the MMPI-A, with a few item deletions resulting from modifications of the item pool within the basic scales. The Si subscales are identical to the MMPI-2 Si subscales, and are presented on the same MMPI-A profile sheet with the Harris-Lingoes subscales. 
In addition to the 58 items deleted from the original standard scales of the MMPI (88% of these items occurring in relation to F, Mf, or Si), 69 items were modified from their appearance in the original test form. Archer and Gordon (1994) and Williams, Ben-Porath, and Hevern (1994) examined the equivalency of the revised form of these items in adolescent samples. The findings from these studies indicated that the items rewritten for the MMPI-A resulted in response frequencies similar to those of the original versions of these items.

TABLE 12.1
Overview of the MMPI-A Scales and Subscales

Basic Profile Scales (17 scales)
  Validity Scales (7)
    VRIN (Variable Response Inconsistency)
    TRIN (True Response Inconsistency)
    F1
    F2
    F (Frequency)
    L (Lie)
    K (Defensiveness)
  Clinical Scales (10)
    1/Hs (Hypochondriasis)
    2/D (Depression)
    3/Hy (Hysteria)
    4/Pd (Psychopathic Deviate)
    5/Mf (Masculinity-Femininity)
    6/Pa (Paranoia)
    7/Pt (Psychasthenia)
    8/Sc (Schizophrenia)
    9/Ma (Mania)
    0/Si (Social Introversion)

Content and Supplementary Scales (21 scales)
  Content Scales (15)
    A-anx (Anxiety)
    A-obs (Obsessiveness)
    A-dep (Depression)
    A-hea (Health Concerns)
    A-aln (Alienation)
    A-biz (Bizarre Mentation)
    A-ang (Anger)
    A-cyn (Cynicism)
    A-con (Conduct Problems)
    A-lse (Low Self-esteem)
    A-las (Low Aspirations)
    A-sod (Social Discomfort)
    A-fam (Family Problems)
    A-sch (School Problems)
    A-trt (Negative Treatment Indicators)
  Supplementary Scales (6)
    MAC-R (MacAndrew Alcoholism-Revised)
    ACK (Alcohol/Drug Problem Acknowledgment)
    PRO (Alcohol/Drug Problem Potential)
    IMM (Immaturity)
    A (Anxiety)
    R (Repression)

Harris-Lingoes and Si Subscales (31 subscales)
  Harris-Lingoes Subscales (28)
    D1 (Subjective depression)
    D2 (Psychomotor retardation)
    D3 (Physical malfunctioning)
    D4 (Mental dullness)
    D5 (Brooding)
    Hy1 (Denial of social anxiety)
    Hy2 (Need for affection)
    Hy3 (Lassitude-malaise)
    Hy4 (Somatic complaints)
    Hy5 (Inhibition of aggression)
    Pd1 (Familial discord)
    Pd2 (Authority problems)
    Pd3 (Social imperturbability)
    Pd4 (Social alienation)
    Pd5 (Self-alienation)
    Pa1 (Persecutory ideas)
    Pa2 (Poignancy)
    Pa3 (Naivete)
    Sc1 (Social alienation)
    Sc2 (Emotional alienation)
    Sc3 (Lack of ego mastery, cognitive)
    Sc4 (Lack of ego mastery, conative)
    Sc5 (Lack of ego mastery, defective inhibition)
    Sc6 (Bizarre sensory experiences)
    Ma1 (Amorality)
    Ma2 (Psychomotor acceleration)
    Ma3 (Imperturbability)
    Ma4 (Ego inflation)
  Si Subscales (3)
    Si1 (Shyness/Self-consciousness)
    Si2 (Social avoidance)
    Si3 (Alienation-self and others)

Note. From MMPI-A: Assessing Adolescent Psychopathology (2nd ed., pp. 54-55) by R.P. Archer, 1997, Mahwah, NJ: Lawrence Erlbaum Associates. Copyright © 1997 by Lawrence Erlbaum Associates. Reprinted with permission.

The final version of the MMPI-A is a 478-item, true-false objective measure of psychopathology. Scoring for the instrument is accomplished through hand-scoring templates or by computer programs available through organizations licensed to score the MMPI-A by the University of Minnesota Press. The scoring of the MMPI-A continues the MMPI tradition of using a simple summation of items endorsed in the critical direction for a particular scale, without the use of differential weighting formulas for items. It should be noted, however, that the scoring formula for the TRIN scale, described in the test manual (Butcher et al., 1992), is more complex than that of other scales because the endorsement of certain item pairs may result in a subtraction from the total raw score value, and because TRIN scale T-score values must be ≥ 50.
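The simple-summation scoring described above can be sketched in a few lines. This is only an illustration of unit-weighted scoring: the function names, the scoring key, and the item numbers are hypothetical, not the actual MMPI-A keys, and the more complex TRIN formula is deliberately not attempted because its details are given only in the test manual.

```python
def raw_score(responses, key):
    """Unit-weighted raw score: count the items answered in the keyed direction.

    responses: dict mapping item number -> True, False, or None (omitted)
    key:       dict mapping item number -> keyed (critical) response
    """
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)


def count_omissions(responses, n_items):
    """Number of items left unanswered out of n_items."""
    return sum(1 for item in range(1, n_items + 1) if responses.get(item) is None)


# Hypothetical three-item scale: items 3 and 12 keyed True, item 7 keyed False.
example_key = {3: True, 7: False, 12: True}
example_responses = {3: True, 7: True, 12: True, 20: False}

score = raw_score(example_responses, example_key)
print(score)  # items 3 and 12 match the key; item 7 does not
```

Because every item carries the same weight, an omitted item simply contributes nothing to the raw score, which is one reason the number of omissions is reviewed separately.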
MMPI-A Norms

The MMPI-A normative data were collected in eight states, seven of which also provided normative data for the MMPI-2. Adolescent normative subjects were generally solicited
by mail from the student rosters of junior and senior high schools in preselected areas, and subjects were tested in group sessions usually conducted within school settings. Adolescents in all sites except New York were paid for their participation in the MMPI-A normative data collection, with subjects receiving between $10 and $15 at the time of their completion of testing materials. New York subjects participated without reimbursement as part of school activities. In total, approximately 2,500 adolescents were evaluated in data collection procedures in California, Minnesota, New York, North Carolina, Ohio, Pennsylvania, Virginia, and Washington state.

A variety of exclusion criteria were applied to the collected data to create the final normative set. Subjects were excluded if they did not complete all data collection measures, left more than 35 items unanswered on MMPI Form TX, or produced a raw score value > 25 on the F scale (using the original item pool for this scale). Subjects below age 14 or above age 18 also were excluded from the normative sample. Using these criteria, the final MMPI-A norms were based on 805 male and 815 female respondents. The ethnic backgrounds of these subjects reflected a reasonably balanced sample, with approximately 76% of the data collected from White respondents and roughly 12% from Black adolescents. The remaining 12% came from adolescents representing several ethnic groups, including Hispanics and Native Americans. The MMPI-A normative sample ethnic distribution appears reasonably consistent with U.S. census figures, and several data collection sites were selected to increase sampling from diverse ethnic backgrounds (Butcher et al., 1992).

Data presented in the MMPI-A manual (Butcher et al., 1992) summarize parental educational levels as reported by adolescents in the normative sample. These data show that the parents of the MMPI-A normative sample overrepresented the higher educational levels in comparison to the 1980 U.S.
census, and clearly represent a well-educated group (Archer, 1997). Approximately 50% of fathers and 40% of mothers of adolescents who participated in the MMPI-A normative sample had obtained an educational level equal to or greater than a baccalaureate degree. In comparison, the 1980 census indicates that only 20% of males and 13% of females reported comparable educational levels. This degree of overrepresentation of better educated individuals in the MMPI-A sample is very similar to the educational bias found for the MMPI-2 adult normative sample (Archer, 1992b) and could be subject to some of the same debates focused on this issue in relation to the MMPI-2. Archer (1997) speculated that this type of educational and occupational bias is related to the use of unselected, volunteer subjects in normative data collection, a procedure that tends to sample from better educated components of the society. Additional descriptive data concerning the MMPI-A normative sample, including adolescents' grade levels, parental occupational levels, and adolescents' living situations, are reported in the MMPI-A test manual (Butcher et al., 1992).

The MMPI-A norms are based on adolescents between ages 14 and 18, inclusive. The mean age for male adolescents in the MMPI-A normative sample was 15.5 years (SD = 1.17 years) and the mean age for females was 15.6 years (SD = 1.16 years). The age 18 adolescent group overlaps with the 18-year-old subsample of the MMPI-2 norms, reflecting that an 18-year-old respondent could potentially be evaluated with either the MMPI-A or the MMPI-2. In this regard, the MMPI-A manual recommends the following criterion for determining the form most appropriate to evaluate the 18-year-old: "A suggested guideline would be to use the MMPI-A for those 18-year-olds who are in high school and the MMPI-2 for those who are in college, working, or otherwise living an independent adult lifestyle. The MMPI-A, not the MMPI-2, should always
be used for those 17 and younger, regardless of whether they are in school" (Butcher et al., 1992, p. 23). In the application of these guidelines, however, it is quite possible to encounter an occasional adolescent for whom the selection of the most appropriate form is a difficult or ambiguous decision. For example, an 18-year-old single mother with a 6-month-old infant, who is in her senior year in high school but living with her parents, presents a considerable challenge in terms of identifying the most appropriate MMPI form for use with this individual. In these difficult cases, an important question arises concerning what effects, if any, the selection of the MMPI-A versus the MMPI-2 might have on the resulting T-score profile. Shaevel and Archer (1996) examined the effects of scoring 18-year-old respondents on the MMPI-2 and the MMPI-A and found that substantial differences can occur in T-score elevations as a function of this decision. Specifically, Shaevel and Archer reported that 18-year-olds scored on MMPI-2 norms generally produced lower validity scale values and higher clinical scale values than the same adolescents scored on MMPI-A norms. These differences ranged as high as 15 T-score points, and resulted in different single-scale and 2-point profile configurations in 34% of the cases examined in this study. Shaevel and Archer concluded that for those relatively rare assessment cases in which the selection of the MMPI-A versus the MMPI-2 is a difficult decision for the 18-year-old respondent, a reasonable practice would be to score that individual on both MMPI-A and MMPI-2 norms in order to permit the clinician to assess the relative effects of instrument selection on profile characteristics.

Subjects below the age of 14 were eliminated from the MMPI-A normative sample.
Preliminary data analyses were interpreted by MMPI Adolescent Project Committee members as indicating that 12- and 13-year-old subjects tended to produce substantially different normative values than those in the 14- through 18-year-old grouping; consequently, there were concerns regarding the usefulness of MMPI-A data produced by adolescents under age 14. The MMPI-A manual notes that the instrument can be used cautiously with 12- and 13-year-old respondents with an awareness of the higher rate of administration difficulties found in this population. Archer (1997) provided a set of MMPI-A adolescent norms for 13-year-old boys and girls. He based these norms on linear T-score conversions and used the same exclusion criteria employed for the 14- through 18-year-old norms developed for the test instrument. In general, preliminary studies (Archer, 1997) appear to indicate that MMPI-A norms based on this 13-year-old sample tend to produce lower T-score values on most clinical scales in comparison to T-scores from the 14- through 18-year-old MMPI-A norms applied to identical raw score values. The 13-year-old norm set was created to promote research with this age group and to provide the clinician with the potential to evaluate a 12- or 13-year-old adolescent who meets all administration criteria on this specialized norm set in conjunction with the standard MMPI-A norms. Such a comparison would allow the clinician to refine interpretive comments made from the use of the standard MMPI-A norms, based on elevation differences found for the 13-year-old norm set. The profile interpretation, however, should be primarily based on the standard MMPI-A norms.

The MMPI-A should not be employed with adolescents below the age of 12, and the 12- and 13-year-old age group will contain many adolescents unable to successfully read and comprehend the MMPI-A item pool. For many years, a sixth-grade reading level was generally accepted as the basic requirement for MMPI administration.
The MMPI-2 manual (Butcher et al., 1989) indicates a reading level of eighth grade is required for successful MMPI-2 administration.
Archer (1997) noted that over 80% of the MMPI-A item pool can be accurately read and comprehended by adolescents reading at the seventh-grade reading level. Archer also reviewed a variety of methods of evaluating reading comprehension on the MMPI-A, including the use of total test administration time, VRIN scale values, and the random MMPI-A profile configuration expected for the basic scales and the content and supplementary scales. W.G. Dahlstrom, Archer, Hopkins, Jackson, and L.E. Dahlstrom (1994) evaluated the reading difficulty of the MMPI, MMPI-2, and MMPI-A using various indices of reading difficulty. One important finding derived from this study was that the instructions provided in the MMPI test booklets tended to be somewhat more difficult to read than the typical item contained within the inventories. Therefore, clinicians should ensure that the instructions are fully understood by respondents. It is often appropriate to ask the test-taker to read the instructions aloud and explain the meaning of the instructions in order to ensure adequate comprehension. W.G. Dahlstrom et al. (1994) found that the average difficulty level for all three forms of the MMPI was approximately the 6th-grade level. The MMPI-A test instructions and items were slightly easier to read than the MMPI-2 or the original form of the MMPI; however, the total differences tended to be relatively small. If the most difficult 10% of items were excluded, the remaining 90% of items on all three versions of the MMPI had an average difficulty level of the 5th grade. Dahlstrom et al. also reported that approximately 6% of the MMPI-A items required a 10th-grade reading level or better. On average, the most difficult items appeared on Scale 9, whereas the easiest items tended to be presented within the item pool of Scale 5. They cautioned that the number of years of education completed by a subject is often an unreliable index of the individual's reading competence.
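The age-based guidance on form selection discussed earlier in this section can be summarized as a small decision helper. This is a rough sketch rather than a published algorithm: the function name, argument names, and return strings are hypothetical, the treatment of respondents older than 18 is an assumption (the MMPI-2 being the adult form), and reading comprehension still has to be checked separately by the clinician.

```python
def suggest_form(age, in_high_school=False, adult_lifestyle=False):
    """Suggest which MMPI form to consider, based on age alone.

    adult_lifestyle: in college, working, or otherwise living independently.
    """
    if age < 12:
        return "MMPI-A not recommended"
    if age <= 13:
        return "MMPI-A with caution"
    if age <= 17:
        return "MMPI-A"
    if age == 18:
        if in_high_school and not adult_lifestyle:
            return "MMPI-A"
        if adult_lifestyle and not in_high_school:
            return "MMPI-2"
        # Mixed or unclear indicators: follow the suggestion of scoring on both norm sets.
        return "score on both MMPI-A and MMPI-2 norms"
    return "MMPI-2"  # assumption: adult form for respondents over 18


for age in (11, 13, 16, 25):
    print(age, suggest_form(age))
print(18, suggest_form(18, in_high_school=True, adult_lifestyle=True))
```

The helper returns the "score on both" suggestion whenever an 18-year-old shows mixed indicators, which mirrors the practice proposed by Shaevel and Archer (1996) for difficult cases.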
The MMPI-A, similar to the MMPI-2, employs both linear T-score and uniform T-score transformation procedures within its collection of scales. This may be contrasted with the original adult norms, and the adolescent norms developed by Marks and Briggs (1972) for the original form of the MMPI, which were exclusively based on linear transformations to convert raw scores to T-score values. The MMPI-A retained the use of linear T-score conversions for all validity scales and for MMPI-A basic scales 5 and 0. Additionally, linear T-score transformations were employed for all 6 MMPI-A supplementary scales, and for all scales appearing on the Harris-Lingoes and Si subscales profile sheet. In contrast, 8 of the clinical scales on the MMPI-A basic scale profile (1, 2, 3, 4, 6, 7, 8, and 9) and all of the 15 content scales employ uniform T-score transformations. These latter two scale groupings were selected for uniform T-scores because these clinical scales produced similar score distributions, and scales within each set (i.e., basic and content scales) were developed using similar construction strategies. The rationale and methods involved in uniform T-score transformation are discussed extensively in the MMPI-A manual (Butcher et al., 1992). In general, uniform T-score transformations produce T-score values that essentially represent the average linear T-score found for the scales employed in the composite distributions for the basic scales and the content scales analyzed separately by gender. The T-score values obtained by uniform transformations are quite similar to those that would be obtained from linear T-score conversions for a given scale. The purpose of the uniform T-score procedure is to produce T-score values with equivalent percentile value meanings across scales for a given T-score.
This procedure, however, also maintains the underlying positive skew in the distribution of scores from these measures; thus, uniform T-scores do not convert to percentile values that would be expected if scores were normally distributed (e.g., a
uniform T-score value of 50 does not convert to the 50th percentile on the MMPI-A, but rather to the 55th percentile). Most of the differences found between the Marks and Briggs (1972) norms for the original form of the MMPI and the MMPI-A norms are not attributable to the issue of uniform versus linear T-score transformation procedures. Rather, these T-score differences result from the substantial differences in the raw score means and standard deviations produced by the two normative groups on most basic scales. The overall effect of these differences, as is discussed later, is that MMPI-A T-score values for a given raw score tend to be lower than those produced by the Marks and Briggs traditional norms. Appendix G of the MMPI-A manual, and Appendix E of Archer (1997), provide T-score conversion tables to permit estimates of the Marks and Briggs normative values that would be produced for a given MMPI-A basic scale raw score value. This allows the clinician to evaluate the extent to which the adolescent's item responses would have produced a similar profile on the original MMPI, in relation to the profile obtained on the MMPI-A. This issue is relevant to the degree to which the research literature developed for the original MMPI may be generalized for use in the interpretation of MMPI-A findings for a specific adolescent.

Basic Validity and Reliability Information

The MMPI-A manual (Butcher et al., 1992) provides information concerning test-retest reliability, internal consistency, and factor structure of MMPI-A scales. The test-retest correlations for the MMPI-A basic scales range from r = .49 for F1 to r = .84 for Si and are very similar to the test-retest correlations found for MMPI-2 basic scales. The typical standard error of measurement of basic scales is estimated to be two to three raw score points (Butcher et al., 1992).
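Two standard psychometric formulas sit behind the preceding discussion: the linear T-score conversion, T = 50 + 10(X - M)/SD, and the standard error of measurement, SEM = SD * sqrt(1 - r). The sketch below illustrates both with invented numbers; the means, standard deviations, and norm-set labels are hypothetical and are not MMPI-A or Marks and Briggs values. Uniform T-scores are not reproduced here because they depend on the composite distributions described in the manual.

```python
import math


def linear_t(raw, norm_mean, norm_sd):
    """Linear T-score: mean 50, SD 10 in the normative group."""
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


def standard_error_of_measurement(sd, reliability):
    """Classical SEM from a scale's SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)


# The same raw score evaluated against two hypothetical norm sets.
raw = 20
t_norms_a = linear_t(raw, norm_mean=15.0, norm_sd=5.0)  # higher normative mean
t_norms_b = linear_t(raw, norm_mean=12.0, norm_sd=4.0)  # lower normative mean
print(t_norms_a, t_norms_b)  # the norm set with the higher mean yields the lower T

# Hypothetical scale with SD = 5 raw points and reliability r = .84.
sem = standard_error_of_measurement(5.0, 0.84)
print(round(sem, 2))  # about 2 raw score points
```

The first comparison shows why an identical raw score can produce a lower T-score on one norm set than on another: the difference comes from the normative mean and standard deviation, not from the transformation itself.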
The internal consistency of MMPI-A scales, as represented in coefficient alpha values, ranges from low to moderate values found for scales such as Mf (r = .43) and Pa (r = .57), to high (r ≥ .80) values found for many of the content scales and the IMM scale. These latter scales were constructed using methods designed to produce high internal consistency values. The factor analytic findings for the MMPI-A are reasonably consistent with prior factor analytic findings reported in adolescent populations for the original MMPI (Archer, 1984; Archer & Klinefelter, 1991).

Validity data from normal and clinical samples of adolescents are also presented in the MMPI-A manual. In addition to MMPI Form TX, the MMPI-A normative sample was administered a 16-item biographical information form and a 74-item life stress events questionnaire. These forms served as external correlate sources to evaluate the concurrent validity of the MMPI-A. The MMPI-A manual also reports validity findings for a clinical sample of 420 boys and 293 girls between the ages of 14 and 18 receiving psychological services in Minnesota. In addition, Archer (1997) provided MMPI-A scale correlate data from a sample of 128 adolescent inpatients collected in Virginia.

Three recent studies have examined the psychometric characteristics or correlates of specific aspects of the MMPI-A. Imhof and Archer (1997) examined the concurrent validity of the Immaturity (IMM) scale based on a residential treatment sample of 66 adolescents, from age 13 to 18. The MMPI-A IMM scale was developed to provide an objective measure of ego development or maturation. Participants were administered the MMPI-A, the Defining Issues Test (DIT), a short form of the Washington University Sentence Completion Test (WUSCT), the Extended Objective Measure of Ego Identity
Status-2nd Revision (EOM-EIS-2), and standardized measures of intelligence and reading ability. The results of this study provided evidence of the concurrent validity of the IMM scale, and a number of IMM correlate descriptors were reported.

Archer and Jacobson (1993) examined the endorsement frequency of the Koss-Butcher (1973) and the Lachar-Wrobel (1979) critical items resulting from administrations of the MMPI-A to normal and clinical adolescent samples, and compared these results to MMPI-2 findings for adults. These data indicated that adolescents in both normal and clinical samples endorse critical items with a higher frequency than do normal adults. Further, results demonstrated that significant differences were uniformly found between the endorsement frequencies for normative versus clinical subjects for the MMPI-2 samples, whereas similar comparisons for the MMPI-A samples typically showed that adolescents in clinical settings did not endorse critical items more frequently than normal adolescents. These findings indicate that it may be difficult to construct critical item lists for adolescents based on the type of empirical methodology used with adults in which items are selected based on endorsement frequency differences found between comparison groups.

Finally, Alperin, Archer, and Coates (1996) attempted to derive age-appropriate K-weights for the MMPI-A to determine the degree to which the use of this procedure could improve test accuracy in the classification of participants into normal and clinical groups. Discriminant function analyses were performed to determine the K-weight that, when combined with basic scale raw score values, optimally predicted normal versus clinical status for each of the eight basic clinical scales. Hit rate analyses were utilized to assess the degree to which K-corrected T-scores resulted in improvements in classification accuracy in contrast to standard MMPI-A non-K-corrected norms.
Results indicated that the adoption of a K-correction procedure for the MMPI-A did not result in systematic improvements in test accuracy, and these findings did not support the clinical use of a K-correction factor for interpreting MMPI-A protocols.

Basic Interpretive Strategy

Several guides have been provided for the interpretation of the MMPI-A, including extensive recommendations in the test manual (Butcher et al., 1992) and recommendations by Butcher and Williams (1992a) and Williams et al. (1992). Table 12.2 provides a brief overview of the interpretive approach offered by Archer (1997). The first two steps presented in this model emphasize the importance of considering the setting where the MMPI-A is administered, and the history and background information available for the adolescent. As reviewed in Archer (1997), the original form of the MMPI has been used for research and clinical purposes with adolescents in a variety of settings, including public and private schools, medical groups, alcohol and drug treatment settings, correctional and juvenile delinquency programs, and outpatient and inpatient psychiatric settings. It is always important to combine MMPI-A test data with information from other psychological tests and from demographic, psychosocial history, and psychiatric history information collected in individual and family interviews to increase the accuracy and utility of inferences derived from the MMPI-A. The third step noted in Table 12.2 involves the evaluation of the technical validity of the MMPI-A profile. This process begins with the review of the number of items omitted in the response process, with a recommendation that profiles be viewed as invalid if more than 30 item omissions have occurred in the response record. Validity

TABLE 12.2
Steps in MMPI-A Profile Interpretation

1. Setting in which the MMPI-A is administered
   a. Clinical/psychological/psychiatric
   b. School/academic evaluation
   c. Medical
   d. Neuropsychological
   e. Forensic
   f. Alcohol/drug treatment
2. History and background of patient
   a. Cooperativeness/motivation for treatment or evaluation
   b. Cognitive ability
   c. History of psychological adjustment
   d. History of stress factors
   e. History of academic performance
   f. History of interpersonal relationships
   g. Family history and characteristics
3. Validity
   a. Omissions
   b. Consistency of responses
   c. Accuracy of responses
4. Codetype (provides main features of interpretation)
   a. Degree of match with prototype
      (1) Degree of elevation
      (2) Degree of definition
      (3) Caldwell A-B-C-D Paradigm for multiple high points
   b. Low-point scales
   c. Note elevation of scales 2 (D) and 7 (Pt)
5. Supplementary scales (supplement and confirm interpretation)
   a. Factor 1 and Factor 2 issues
      (1) Welsh A and R
   b. Substance abuse scales
      (1) MAC-R and PRO
      (2) ACK
   c. Psychological maturation
      (1) IMM scale
6. Content scales
   a. Supplement, refine, and confirm basic scale data
   b. Interpersonal functioning (A-fam, A-cyn, A-aln), treatment recommendations (A-trt), and academic difficulties (A-sch and A-las)
   c. Consider effects of overreporting/underreporting
7. Review of Harris-Lingoes subscales and critical item content
   a. Items endorsed can assist in understanding reasons for elevation of basic scales
8. Structural Summary (Factor Approach)
   a. Identify factor dimensions most relevant in describing an adolescent's psychopathology
   b. Use to confirm and refine traditional interpretation
   c. Consider effects of overreporting/underreporting on factor patterns

Note. From MMPI-A: Assessing Adolescent Psychopathology (2nd ed., p. 272) by R.P. Archer, 1997, Mahwah, NJ: Lawrence Erlbaum Associates. Copyright © 1997 by Lawrence Erlbaum Associates. Reprinted with permission.

assessment continues with an evaluation of the degree to which the adolescent responded in a consistent manner (e.g., VRIN and TRIN scale scores) and in an accurate manner (F, L, and K configural pattern) using the validity assessment model proposed by Greene (1989, 1991). In this model, a distinction is made between response consistency, defined as the extent to which the respondent endorses items in a reliable pattern, and response accuracy, defined as the degree to which the respondent has overreported or underreported symptomatology. Response consistency may be viewed as a necessary, but not

sufficient, condition for technical validity. The tendency to overreport or underreport symptomatology, in turn, may be seen as relatively independent from the respondent's actual level of symptomatology (Greene, 1989, 1991).

The fourth step in MMPI-A interpretation involves the review of the basic scale clinical profile. This review should examine the degree to which one or more basic scales manifest clinical-range elevations and the relative magnitude of these elevations. In general, the greater the magnitude of an adolescent's T-score on a particular basic scale, the more likely it is that the respondent is accurately described by the correlates typically associated with elevations on that scale. In addition, the degree of correspondence between the profile configuration and the existing 2-point codetype literature should be examined. In this regard, the degree of definition manifested by an adolescent's 2-point code also should be evaluated, with 2-point codetype definition defined by the degree of T-score elevation difference between the second and third highest elevations within the clinical profile. The greater the degree of definition for the 2-point code, the more likely it is that the descriptive statements associated with that codetype are accurate for a particular adolescent. If an adolescent's MMPI-A profile does not display clearly elevated and defined 2-point code characteristics, then the profile may be interpreted by an approach emphasizing individual scale descriptors (Archer, 1997; Butcher et al., 1992). Basic individual scale descriptors have been established based on the empirical literature for the original instrument in adolescent samples, and for the MMPI-A summarized in the test manual (Butcher et al., 1992). In the case of an MMPI-A basic scale profile that displays clinical-range elevations on more than two clinical scales, the A-B-C-D paradigm developed by Caldwell (1976) also may be employed for profile interpretation purposes. This latter approach emphasizes the common descriptor characteristics generated from multiple 2-point configurations. For example, a 2-4-7 codetype would be broken into 2-point codes and interpreted based on common descriptors found for the 2-4, 2-7, and 4-7 codetypes. The 2-point codetype correlate literature rests on the work of Marks, Seeman, and Haller (1974), and this literature has been summarized and extended by Archer (1997) for the MMPI-A. In this regard, Janus, Tolbert, Calestro, and Toepfer (1996) investigated the accuracy of MMPI-A codetype narratives in a sample of 134 adolescent psychiatric inpatients. The single and 2-point codetype narratives generated for each patient from two sets of adolescent norms, as well as one set of adult norms, were blindly rated along various accuracy dimensions by inpatient treatment staff. Results indicated that the MMPI-A produced higher accuracy ratings when codetype narratives were based on either the original set of adolescent norms developed by Marks and Briggs (1972) or standard MMPI-A adolescent norms than when adult K-corrected norms were used to generate codetype narratives. The study by Janus et al. is an important initial step in establishing the clinical utility of codetype interpretation with the MMPI-A.

A review of the MMPI-A supplementary and content scales is involved in Steps 5 and 6. Supplementary scales A and R provide an overall estimate of the adolescent's degree of maladjustment and the use of repression as a primary defense mechanism, respectively. Extensive substance abuse screening information is available through the combined use of supplementary scales MAC-R, ACK, and PRO. In particular, the adolescent's willingness to acknowledge substance abuse problems is reflected in ACK scale scores, whereas the adolescent's similarity to teenagers with known substance abuse problems is assessed through responses to the MAC-R and PRO scales.
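
The two codetype computations described in Step 4 (the definition of a 2-point code and the breakdown of a multi-scale elevation into 2-point codes) can be sketched in a few lines of Python. This is an illustrative sketch only, not a scoring procedure from the test manual: the function names, the example T-scores, and the clinical-range cutoff of 65 are assumptions introduced here, and the choice of which basic scales enter the profile is left to the caller.

```python
from itertools import combinations

# Assumed clinical-range cutoff; the text above does not state a value.
CLINICAL_CUTOFF = 65


def two_point_code(t_scores):
    """Return (code, definition) for a dict mapping scale number -> T-score.

    Definition is the T-score difference between the second and third
    highest scales in the profile, as described in the text.
    """
    ranked = sorted(t_scores.items(), key=lambda item: item[1], reverse=True)
    (first, _), (second, t_second), (_, t_third) = ranked[:3]
    return f"{first}-{second}", t_second - t_third


def pairwise_codes(t_scores, cutoff=CLINICAL_CUTOFF):
    """List every 2-point code formed by scales at or above the cutoff."""
    elevated = sorted(s for s, t in t_scores.items() if t >= cutoff)
    return [f"{a}-{b}" for a, b in combinations(elevated, 2)]


# Hypothetical profile with elevations on Scales 2, 4, and 7.
profile = {1: 52, 2: 78, 3: 55, 4: 74, 6: 58, 7: 70, 8: 60, 9: 49}
print(two_point_code(profile))   # ('2-4', 4)
print(pairwise_codes(profile))   # ['2-4', '2-7', '4-7']
```

In this hypothetical profile the 2-4 code is only 4 T-score points above the third highest scale, so it is weakly defined, which is the situation in which the pairwise breakdown of the 2-4-7 pattern becomes useful.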
The supplementary IMM scale also allows for an assessment of the adolescent's maturational level

as related to cognitive processes, ability to engage meaningfully in interpersonal relationships, and degree of egocentricity and frustration tolerance (Archer, Pancoast, & Gordon, 1994). The 15 content scales provide valuable information in refining and augmenting the interpretation of the basic scales (Williams et al., 1992). For example, scores from A-anx may be helpful in refining interpretation of Scale 7, A-biz may be useful in clarifying interpretation of Scale 8, and scales such as A-con and A-fam may be useful in refining interpretations of Scale 4. Further, scales such as A-trt may be useful in evaluating the adolescent's readiness to engage in a therapy process, particularly when used in conjunction with Scales L and K. Content scales such as A-fam, A-cyn, and A-aln provide valuable information concerning the adolescent's interpersonal functioning, and scales such as A-sch and A-las provide important information concerning possible problems in the academic environment. In evaluating the findings from the content scales, it is important to consider the effects of overreporting and underreporting on profile accuracy because the content scales consist primarily of obvious and face-valid items (Archer, 1997). Thus, content scales can easily be biased by the adolescent's attempt to underreport or overreport symptomatology. In the seventh stage of profile analysis, the clinician may wish to further consider and evaluate the content of the adolescent's MMPI-A responses. This may entail a selective review of the Harris-Lingoes and Si subscales, and may also include a cautious review of responses to critical item lists such as the Koss and Butcher (1973) and Lachar and Wrobel (1979) lists. In reviewing critical items, it should be remembered that responses to any individual MMPI-A item are inherently unreliable, and normal adolescents tend to endorse critical items with a relatively high frequency.
Archer (1997) provided information on the 99 Lachar-Wrobel critical items retained in the MMPI-A item pool, and offered guidelines for content subscale and critical item interpretation. In the final (eighth) stage of profile analysis, the interpreter may wish to use the MMPI-A Structural Summary form to organize MMPI-A scale data in a manner that identifies the most salient dimensions of the adolescent's current functioning. The first seven steps covered in the interpretive approach provide correlates and inferences concerning the adolescent's behaviors based on an organization of scales into traditional categories such as validity scales, basic clinical scales, content and supplementary scales, and Harris-Lingoes and Si subscales. The Structural Summary approach promotes a comprehensive assessment of the adolescent's functioning, deemphasizing the largely arbitrary distinction between categories of scales. The use of this approach serves to remind the clinician that data derived from the MMPI-A scales are highly intercorrelated and reflective of broad underlying dimensions of psychological functioning. The interpretive guidelines for the Structural Summary form involve the following two propositions:

1. The higher the percentage of scales and subscales within a factor that produce critical values, the greater the role of that factor or dimension in providing a comprehensive description of the adolescent.
2. A majority of the scales or subscales associated with the particular factor must reach critical values (T ≥ 60 or T ≤ 40 depending on the scale or subscale) before the interpreter emphasizes that dimension as salient in describing the adolescent's personality characteristics.

As illustrated in Figs. 12.1a and 12.1b, the first section of the Structural Summary organizes information relevant to the evaluation of the validity of the MMPI-A along

Fig. 12.1. MMPI-A Structural Summary Form. From R.P. Archer and R. Krishnamurthy, 1994, Odessa, FL: Psychological Assessment Resources, Inc. Copyright © 1994 by Psychological Assessment Resources, Inc. Reprinted with permission.

Fig. 12.1. (Continued)
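
The two interpretive propositions for the Structural Summary form amount to a simple tally per factor: the proportion of a factor's scales that reach their critical value, and whether that proportion is a majority. The sketch below illustrates this under stated assumptions. The function name, the scale labels, and the T-scores are hypothetical; only the critical values (T of 60 or above, or T of 40 or below, depending on the scale) come from the text, and "majority" is read here as more than half.

```python
def factor_salience(t_scores, high_is_critical, high_cut=60, low_cut=40):
    """Tally critical values for one Structural Summary factor.

    t_scores:         dict mapping scale name -> T-score
    high_is_critical: dict mapping scale name -> True if the scale is
                      critical at T >= high_cut, False if critical at
                      T <= low_cut
    Returns (proportion of scales at critical values, majority reached).
    """
    critical = [
        name for name, t in t_scores.items()
        if (t >= high_cut if high_is_critical[name] else t <= low_cut)
    ]
    proportion = len(critical) / len(t_scores)
    return proportion, proportion > 0.5


# Hypothetical four-scale factor; the names are placeholders, not MMPI-A scales.
scores = {"scale_a": 66, "scale_b": 61, "scale_c": 38, "scale_d": 55}
direction = {"scale_a": True, "scale_b": True, "scale_c": False, "scale_d": True}
print(factor_salience(scores, direction))  # (0.75, True)
```

Comparing the proportions across all eight factors then follows the first proposition: the factor with the highest proportion plays the largest role in the description.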

Fig. 12.2. MMPI-A basic scale profile sheet for clinical case example (Deborah). From J.N. Butcher, C.L. Williams, J.R. Graham, R.P. Archer, A. Tellegen, Y.S. Ben-Porath, and B. Kaemmer, 1992, Minneapolis, MN: Regents of the University of Minnesota. Copyright © the Regents of the University of Minnesota, 1942, 1943 (renewed 1970), 1992. This profile form 1992. Reproduced by permission of the publisher.

three dimensions, which include the number of item omissions, indices related to response consistency, and indices of response accuracy. The remainder of the Structural Summary presents groupings of MMPI-A scales and subscales organized around the eight factors identified by Archer, Belevich, and Elkins (1994) in the MMPI-A normative sample, and replicated by Archer and Krishnamurthy (1997b) in a clinical sample of 358 adolescents. Within each factor, scales and subscales are grouped logically within the traditional categories of basic scales, content scales, and supplementary scales. Within each of these groupings, scales are presented in descending order from those measures that have the highest correlation with a particular factor (i.e., those scales and subscales serving as the most effective markers) to those scales that show progressively lower correlations with the total factor. With very few exceptions, all of the scales and subscales presented in the Structural Summary produce correlations ≥ .60 or ≤ -.60 with their assigned factor. The Structural Summary also presents spaces at the bottom of each factor grouping to derive the total number, or percentage, of scales that show critical values for a specific factor. The basic concept underlying the development of the MMPI-A Structural Summary involves parsimoniously organizing the myriad of data provided by the MMPI-A to assist the clinician in identifying the most salient dimensions to be utilized in describing
adolescents' personality functioning. Archer and Krishnamurthy (1994) provided a description of the empirical correlates of the MMPI-A Structural Summary factors based on an investigation of the 1,620 adolescents in the MMPI-A normative sample and an inpatient sample of 122 adolescent respondents. A comprehensive presentation of all external correlates of the Structural Summary factors is provided in the MMPI-A Casebook by Archer, Krishnamurthy, and Jacobson (1994), and a narrative summary of these correlates is provided in Table 12.3.

Computer-Based Test Interpretation (CBTI)

Several computer-based test interpretation packages are currently available for the MMPI-A. These include the revised MMPI-A CBTI report developed by Archer (1992a, 1996) and an MMPI-A CBTI report developed by Butcher and Williams (1992b). Both CBTI products are based on combinations of expert judgment and actuarial data. Butcher (1987) and Archer (1997) provided guidelines for the evaluation and assessment of CBTI products, including the relative advantages and disadvantages associated with this approach. It should be emphasized that the use of a CBTI product in the interpretation of the MMPI-A (or any other assessment instrument) does not reduce clinicians' responsibility for the accuracy of their interpretation of the individual patient's profile.

Use of the MMPI-A in Treatment Planning

General Treatment Planning Issues

Archer, Maruish, Imhof, and Piotrowski (1991) surveyed 165 clinicians who routinely evaluate teenage clients and asked respondents who used the MMPI to indicate the primary advantages and disadvantages associated with the original instrument.
Results indicated the advantages associated with the MMPI were its usefulness in treatment planning issues, including the accuracy of interpretive statements generated from profile information, the comprehensive aspect of the measures of psychopathology assessed by the MMPI, and the extensive research literature available to assist in the interpretation process. The major disadvantages associated with the MMPI by survey respondents were the length of the item pool and administration time required, the outdated aspects of the adolescent norms, the reading requirements of the MMPI, and the inclusion of inappropriate or outdated items. The developers of the MMPI-A attempted to address most of these problem areas by reducing instrument length, providing contemporary norms, and revising many items to simplify wording and increase the appropriateness of item content. Nevertheless, the MMPI-A has continued to manifest many of the same advantages and disadvantages as the original instrument. Despite potential improvements in the MMPI-A, the revised instrument continues to require substantial patience on the part of the adolescent to deal with the lengthy item pool, and requires a level of literacy that renders administration problematic with many adolescents. It is also useful to recognize that the MMPI-A, like the original MMPI, is designed as a measure of psychopathology rather than an assessment instrument appropriate

TABLE 12.3
Description of the MMPI-A Structural Summary Factors

General Maladjustment (23 scales or subscales)
This factor is associated with substantial emotional distress and maladjustment. Adolescents who score high on this dimension experience significant problems in adjustment at home and school and feel different from other teenagers. They are likely to be self-conscious, socially withdrawn, timid, unpopular, dependent on adults, ruminative, subject to sudden mood changes, and to feel sad or depressed. They are viewed as less competent in social activities and as avoiding competitive situations with peers. These adolescents are more likely than other teenagers to report symptoms of tiredness or fatigue, sleep difficulties, and suicidal thoughts, and to be referred for counseling and/or psychotherapy. Academic problems including low marks and course failures are common, and they are likely to be referred for counseling or psychotherapy.

Immaturity (15 scales or subscales)
The Immaturity dimension reflects attitudes and behaviors involving egocentricity and self-centeredness, limited self-awareness and insight, poor judgment and impulse control, and disturbed interpersonal relationships. Adolescents who obtain high scores on this factor often have problems in the school setting involving disobedience, suspensions, and histories of poor school performance. Their interpersonal relationships are marked by cruelty, bullying, and threats, and they often associate with peers who get in trouble. These adolescents act without thinking and display little remorse for their actions. Familial relationships are frequently strained, with an increased occurrence of arguments with parents. Their family lives are also often marked by instability that may include parental separation or divorce. High-scoring boys are more likely to exhibit hyperactive and immature behaviors whereas girls are prone to display aggressive and delinquent conduct.

Disinhibition/Excitatory Potential (12 scales or subscales)
High scores in this dimension involve attitudes and behaviors related to disinhibition and poor impulse control. Adolescents who score high on this factor display significant impulsivity, disciplinary problems, and conflicts with parents and peers. They are perceived as boastful, excessively talkative, unusually loud, and attention seeking. They display increased levels of heterosexual interest and require frequent supervision in peer contacts. High-scoring adolescents typically have histories of poor school work and failing grades, truancy, disciplinary actions including suspensions, school drop-out, and violations of social norms in the home, school, and social environment. Their interpersonal relationships tend to be dominant and aggressive, and they quickly become negative or resistant with authority figures. These adolescents are likely to engage in alcohol/drug use or abuse. Their behavioral problems include stealing, lying, cheating, obscene language, verbal abuse, fighting, serious disagreements with parents, and running away from home. In general, they may be expected to use externalization as a primary defense mechanism.

Social Discomfort (8 scales or subscales)
Adolescents who elevate the scales involved in this dimension are likely to feel withdrawn, self-conscious, and uncertain in social situations, and display a variety of internalizing behaviors. They are frequently bossed or dominated by peers and tend to be fringe participants in social activities. These adolescents are typically perfectionistic and avoid competition with peers. They are viewed by others as fearful, timid, passive or docile, and acting young for their age. They may present complaints of tiredness, apathy, loneliness, suicidal ideation, and somatic complaints. These adolescents have a low probability of acting-out behaviors, including disobedience, alcohol or drug use, stealing, or behavioral problems in school.

Health Concerns (6 scales or subscales)
Adolescents who obtain high scores on the Health Concerns dimension are seen by others as dependent, socially isolated, shy, sad, and unhappy. They are prone to tire quickly and have relatively low levels of endurance. They may display a history of weight loss and report sleep difficulties, crying spells, suicidal ideation, and academic problems. A history of sexual abuse may be present. High-scoring boys are likely to be viewed as exhibiting schizoid withdrawal, whereas high-scoring girls are primarily seen as somatizers. These adolescents typically display lower levels of social competence in the school setting. They are unlikely to be involved in antisocial behaviors or have histories of arrests.

Naivete (5 scales or subscales)
High scores on the Naivete factor are produced by adolescents who tend to deny the presence of hostile or negative impulses and present themselves in a trusting, optimistic, and socially conforming manner. They may be described as less likely to be involved in impulsive, argumentative, or socially inappropriate behaviors, and are more often seen as presenting in an age-appropriate manner. They have a low probability of experiencing internalizing symptoms such as nervousness, fearfulness, nightmares, and feelings of worthlessness, or of acting-out and provocative behaviors (i.e., lying or cheating, disobedience, and obscene language).

Familial Alienation (4 scales or subscales)
Adolescents who score high on scales or subscales related to this dimension are more likely to be seen by their parents as hostile, delinquent, or aggressive, and as utilizing externalizing defenses. They are also viewed as being loud, verbally abusive, threatening, and disobedient at home. These adolescents tend to have poor parental relationships involving frequent and serious conflicts with their parents. Presenting problems in psychiatric settings may include histories of running away from home, sexual abuse, and alcohol/drug use. In addition to family conflicts, high-scoring adolescents are also more likely to have disciplinary problems at school resulting in suspensions and probationary actions.

Psychoticism (4 scales or subscales)
Adolescents who produce elevations on the Psychoticism factor are more likely to be seen by others as obsessive, socially disengaged, and disliked by peers. They may feel that others are out to get them, and are more likely to be teased and rejected by their peer group. Sudden mood changes and poorly modulated expressions of anger are likely. They may also exhibit disordered behaviors including cruelty to animals, property destruction, and fighting, and are likely to have histories of poor academic achievement.

Note. From MMPI-A Casebook (pp. 17-18) by R.P. Archer, R. Krishnamurthy, and J.M. Jacobson, 1994, Odessa, FL: Psychological Assessment Resources. Copyright © 1994 by Psychological Assessment Resources, Inc. Reprinted with permission.

for the evaluation of normal range personality dimensions. Thus, the information drawn from the MMPI-A is restricted in describing adaptive functioning characteristics or nonpathological dimensions beyond Masculinity-Femininity (Mf), Social Introversion-Extroversion (Si), and possibly some of the content domain of the Immaturity (IMM) scale. Additionally, as discussed in Archer (1997), both the MMPI and the MMPI-A are best used as measures of the individual's current level of functioning in relation to standardized measures of psychopathology. Moreover, the MMPI-A, like its predecessor, is limited in yielding data useful in long-range predictions of personality functioning due
to the instability manifested in adolescents' psychopathology and the consequent instability of test findings over extended periods (e.g., Hathaway & Monachesi, 1963).

Research Findings and Clinical Applications

Most clinicians see the development of an accurate and comprehensive diagnosis as central to the design of an effective intervention or treatment plan. In this regard, a substantial literature relates several diagnostic groupings or issues to relatively specific MMPI profile patterns. As noted in the MMPI-A manual (Butcher et al., 1992), the original form of the MMPI was used to examine a variety of diagnostic issues among adolescents, including behavioral problems, borderline personality disorder, depressed mood, eating disorders, homicidal behavior, aggression, incest and sexual abuse, sleeping problems, medical and neurological problems, schizophrenia, and suicide. The earliest research application of the MMPI with adolescents centered on the usefulness of this instrument in identifying groups of delinquent adolescents (Capwell, 1945a, 1945b). In a research study begun in 1947, Hathaway and Monachesi (1963) examined the usefulness of the MMPI in predicting the onset of delinquent behaviors in Minnesota samples involving approximately 15,000 adolescents. They reported modest relations between adolescents' original MMPI profiles and the later onset of delinquent behaviors. Hathaway and Monachesi found that elevations on Scales 4, 8, or 9, singly or in combination, were associated with higher rates of delinquent behavior, and they therefore labeled these excitatory scales. Hathaway and Monachesi also noted much instability in the elevation pattern in adolescents' profiles when ninth-graders were reevaluated during their senior year in high school.
They did observe, however, that adolescents who produced marked elevations during the ninth-grade assessment were more likely to show relative stability on those scales when reevaluated 3 years later. The issue of the relation between MMPI data and clinicians' diagnostic judgments has been examined in several studies within adolescent samples. Archer and Gordon (1988) investigated the relation between Scale 2 and Scale 8 elevations and the occurrence of clinical diagnoses related to depression and schizophrenia in a sample of 134 adolescent inpatients. Archer and Gordon found little evidence of a meaningful relation between Scale 2 elevations and clinicians' use of depression-related diagnoses. However, they did report that Scale 8 elevations were an effective and sensitive indicator of schizophrenic diagnoses. A criterion of T-score values ≥ 75 on Scale 8, used to identify schizophrenia in this study, resulted in an overall classification accuracy rate of .76. This level of performance is comparable to findings reported for Scale 8 in adult populations (e.g., Hathaway, 1956). A recent investigation by Archer and Krishnamurthy (1997a) extended the earlier Archer and Gordon research by examining the extent to which combining indices from the MMPI-A and the revised (Exner, 1986) Rorschach Comprehensive System furnishes incremental validity in terms of improved diagnostic prediction. The predictive accuracy of selected MMPI-A and Rorschach variables conceptually related to diagnoses of depression and conduct disorder was compared in a clinical sample of 152 adolescents. Results of these analyses revealed some significant differences between diagnostic groups on several MMPI-A scales, and one significant difference on the Rorschach involving the Vista variable.
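
The classification accuracy (hit rate) figures cited in this section, such as the .76 rate for the Scale 8 criterion, are the proportion of cases in which a cutoff-based prediction agrees with the clinical diagnosis. The sketch below shows that calculation. It is an illustration only: the function name and the eight cases are invented for the example and are not data from Archer and Gordon (1988).

```python
def hit_rate(t_scores, has_diagnosis, cutoff=75):
    """Proportion of cases where 'T-score >= cutoff' matches the diagnosis.

    t_scores:      list of T-scores on the scale of interest
    has_diagnosis: list of booleans, True if the clinician assigned
                   the diagnosis to that case
    """
    predictions = [t >= cutoff for t in t_scores]
    correct = sum(p == d for p, d in zip(predictions, has_diagnosis))
    return correct / len(t_scores)


# Invented example: eight cases, two misclassified (T = 72 and T = 78).
scale_8 = [80, 76, 60, 72, 90, 55, 78, 64]
diagnosed = [True, True, False, True, True, False, False, False]
print(hit_rate(scale_8, diagnosed))  # 0.75
```

An overall hit rate combines both kinds of error, missed cases and false alarms, so two cutoffs with the same hit rate can still differ in sensitivity.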
Stepwise discriminant function analyses resulted in two MMPI-A scales and two Rorschach variables that collectively accounted for a small proportion of variance in the diagnosis of depression, and three MMPI-A scales that accounted

for a significant component of variance in the conduct disorder diagnosis. Classification accuracy results indicated that the hit rate for the depression diagnosis did not improve using an optimal linear combination of these four variables over the .68 hit rates produced by the single use of either the MMPI-A Depression content scale (A-dep) or Scale 2. For the conduct disorder diagnosis, the optimal linear combination of MMPI-A Conduct Problems (A-con), Cynicism (A-cyn), and the IMM scales served as the best predictor, and no Rorschach variables contributed significantly to classification accuracy. These results replicated the findings of Archer and Gordon (1988) in indicating that the combined use of MMPI-A and Rorschach variables does not appear to produce incremental increases in accuracy of diagnostic classification. Johnson, Archer, Sheaffer, and Miller (1992) investigated the relations between characteristics of MMPI and Millon Adolescent Personality Inventory (MAPI) profiles and psychiatric diagnoses in a sample of 199 adolescent inpatients and outpatients. Results indicated low levels of congruence or agreement between MMPI-derived diagnoses and clinician judgments. This finding is consistent with those typically obtained by researchers in adult populations employing broad diagnostic groups (e.g., Hedlund, Won Cho, & Wood, 1977; Moreland, 1983; Pancoast, Archer, & Gordon, 1988). The results of these studies underscore cautions that have been provided concerning the use of the MMPI, or any other personality measure used in isolation, in attempts to provide definitive psychiatric diagnoses for patients. Graham (1993) noted that the poor correspondence traditionally found between MMPI results and psychiatric diagnosis may be a result of the high degree of intercorrelation between standard MMPI scales, as well as the unreliability of specific diagnostic groups employed by Hathaway and McKinley in the original MMPI.
These findings also likely reflect the well-established problems in reliability that appear to be inherent in the psychiatric nosology embodied in the Diagnostic and Statistical Manual (DSM) series.

MMPI-A Scales Related to Treatment Planning

The MMPI-A includes a variety of scales relevant to a number of treatment planning issues. For example, research by Archer, White, and Orvin (1979) associated higher scores on validity scales L and K with longer treatment durations for hospitalized adolescents. Elevations on Welsh's Repression (R) scale and the Negative Treatment Indicators (A-trt) content scale also appear to be relevant to evaluating the adolescent's readiness and capacity to engage in the treatment process. Basic scale measures including Scales 2 and 7, and the supplementary scale Anxiety (A), have direct relevance in attempting to estimate the degree of affective distress experienced by the adolescent. This issue is also illuminated by content scales Anxiety (A-anx), Obsessiveness (A-obs), and Depression (A-dep). Issues related to impulse and behavioral control, as noted in the discussion of excitatory scales, are related to elevations on Scales 4, 8, and 9. This issue is also relevant to findings from supplementary Scale IMM, and content scales such as Conduct Problems (A-con), Anger (A-ang), and Cynicism (A-cyn). Potential problems can be identified in a number of specific life areas using the MMPI-A, including the academic environment (A-sch, A-las) and family environment (A-fam). Also of note are the relative contributions of the MAC-R, ACK, and PRO scales to screening and evaluation of substance abuse problems among teenagers. In addition, Marks et al. (1974) noted an association between several 2-point codetypes and the occurrence of abuse or alcohol problems, including 2-4/4-2 and 4-9/9-4 codetypes. Archer and Klinefelter (1992) demonstrated that, in a sample of 1,347 adolescents in clinical

settings, certain MMPI codetypes, particularly those involving elevations on Scale 4 or Scale 9, are much more likely to be associated with elevations on the MAC scale. As noted by Archer (1987, 1997), the MMPI has proved to be a very useful tool in treatment planning for adolescents for over 40 years. It is likely that the MMPI-A will continue and expand this tradition, particularly as more information becomes available concerning the correlate patterns for new MMPI-A scales.

Integration of MMPI-A Results with Other Evaluation Data for Prediction of Therapeutic Outcome

Findings from the MMPI-A should be routinely integrated with results from other test instruments, clinical interview, family assessment, and psychosocial history data in deriving diagnostic and treatment planning recommendations. Gallucci (1990) reviewed the literature related to the combination of MMPI results with data from other instruments, including the Wechsler Intelligence Scales, the Rorschach, and the Millon inventory in adult populations. Archer and Krishnamurthy (1993b) reviewed the literature derived from 37 studies that reported interrelations between MMPI and Rorschach variables in adult populations. The results of this body of literature indicated, with impressive consistency, generally limited or minimal relations between the MMPI and Rorschach. Archer and Krishnamurthy (1993a) also examined the empirical findings concerning the relations between Rorschach and MMPI variables in seven studies conducted with adolescent samples and found consistently modest or nonsignificant relations between the two instruments in this population as well. Most recently, Krishnamurthy, Archer, and House (1996) conducted an empirical investigation on the relation between carefully selected MMPI-A and Rorschach variables in a clinical sample of 152 adolescents based on a priori hypotheses focused on specific construct areas.
The constructs examined included such areas as anxiety, depression, somatic concern, defensiveness, bizarre thinking, self-image, and impulse control and produced hypotheses that involved a total of 28 MMPI scales and 43 Rorschach variables. Once again, the results consistently indicated very limited associations between conceptually related MMPI-A and Rorschach variables. Krishnamurthy et al. observed that a logical conclusion from this body of literature is that variables receiving similar labels on the MMPI and Rorschach, such as the Rorschach DEPI variable and Scale 2 of the MMPI, actually measure different constructs, or at least markedly different components of the same broad construct. Perhaps most troubling, they observed that a matrix comprised of MMPI-A and Rorschach variables would not display significant evidence of convergent validity in terms of the patterns of intercorrelations that might be expected given the theoretical constructs attributed to these variables. Krishnamurthy et al. (1996) cautioned that scores on sets of similarly labeled variables across the Rorschach and the MMPI should not necessarily be viewed as confirming or disconfirming the data provided by either instrument. In cases where the Rorschach and MMPI would lead to contradictory clinical inferences, Archer and Krishnamurthy (1993a) recommended that the clinician place particular emphasis on the use of additional sources of data, including individual and family interview data and psychosocial history findings, in reaching interpretive conclusions. In addition to the clinical interview of the adolescent, the assessment of parental perceptions concerning the adolescent's functioning is very important. Several instruments, including the Child Behavior Checklist-Revised (CBCL) by Achenbach and

Edelbrock (1983), provide a standardized format to collect this type of information. Archer (1987, 1997) and Williams (1986) also stressed the importance of MMPI assessment of the parents of adolescents being evaluated to generate a greater understanding of possible family dynamics and influences that may help to shape or distort parental perceptions of their child's functioning. As Archer (1987) noted:

The current literature supports the involvement of parents of psychiatrically disturbed children in psychiatric treatment efforts. Perhaps the clearest finding from this literature is that the parents of psychiatrically disturbed children typically display substantial features of psychological distress and maladjustment. This conclusion is particularly marked for the parents of children in inpatient treatment settings. Therefore the involvement of parents in treatment programs that are responsive to the psychological features of the parents, as well as the symptomatology of the adolescent patient, appears to have firm empirical grounds. Clearly, such a treatment involvement does not require a causal assumption of a parental role in the etiology of the child's disorder. These treatment efforts may be more parsimoniously based upon the recognition of the marked degree of psychological pain and disturbance commonly reported among parents of children experiencing deviant psychological development. (p. 178)

Provision of MMPI-A Feedback

Archer (1997) noted that the provision of MMPI-A feedback to the adolescent is an important factor in increasing the adolescent's motivation to cooperate with testing procedures. The issue of MMPI test feedback has been a central emphasis in several texts (Butcher, 1990; Finn, 1996; Lewak, Marks, & Nelson, 1990). Also, a computer software package has been developed by Marks and Lewak (1991) to assist the clinician in providing MMPI test feedback to adolescent clients, and a feedback manual was recently developed by Finn (1996).
MMPI-A feedback with adolescents should begin with an explanation of the test instrument, including the ways MMPI-A data are used to generate hypotheses concerning patients' personality characteristics. Adolescents should be encouraged to interact with the psychologist during the feedback session. The adolescent's input into the feedback process allows the psychologist an opportunity to appraise the adolescent's reaction to, and acceptance of, various features of test findings. It is usually much easier for the adolescent to accept test feedback when findings are presented individually rather than within the framework of a family therapy session. It is probable that many clinicians underestimate the extent of information that an adolescent is capable of usefully assimilating, particularly if the psychologist is careful to avoid the use of technical jargon and uses language and concepts understandable to the patient when presenting test findings.

Areas of Limitations or Potential Problems in MMPI-A Use

Several limitations or potential problems can be identified when using the MMPI-A for treatment planning purposes. Issues similar to those regarding the generalizability of the literature from the MMPI to the MMPI-2 with adults have been raised concerning the applicability of adolescent research findings based on the original form of the MMPI to the MMPI-A. The 2-point codetype congruence rates between the MMPI and MMPI-A for adolescents in the normative sample were 67.8% for males and 55.8% for females, and 69.5% for males and 67.2% for females in a clinical sample (Butcher et al., 1992). Using a 5-point codetype definition requirement, the congruence rates increased

to 95.2% for males and 81.8% for females in the normative sample, and 95.4% for males and 94.4% for females in the clinical sample (Butcher et al., 1992). These data are very similar to the 2-point codetype congruence rates between the MMPI and MMPI-2 for normal and clinical samples of adults (Butcher et al., 1989). In addition to issues concerning the generalizability of findings from the original form of the MMPI to the MMPI-A, there are 15 content scales, 3 supplementary scales, 3 Si subscales, and 4 new validity scales that do not have counterparts on the original form of the MMPI. These measures will require ongoing validity studies to establish the correlate meanings for these measures in clinical populations. As more clinical correlate data is firmly established, the interpretation of these scales should become less tentative and provisional.

It has been noted that the MMPI-A requires a substantial amount of cognitive maturation and reading ability for successful administration, and the revision of the test instrument has not substantially changed these administration requirements. Adolescents must still have the capacity and motivation to complete a relatively long and demanding test instrument. As with the original form of the MMPI, short forms are not recommended as a way of attempting to reduce the requirements of the MMPI-A for the adolescent respondent. Butcher and Hostetler (1990) defined the term short form as "sets of scales that have been decreased in length from the standard MMPI form. An MMPI short form is a group of items that is thought to be a valid substitute for the full scale score even though it might contain only four or five items from the original scale" (p. 12). Archer (1997) noted the potential problems involved in the use of short forms with the MMPI-A. These problem areas center on the loss of important clinical information when short forms are substituted for the administration of the full MMPI-A item pool.
Short-form administrations are contrasted with abbreviated administrations in which a clinician elects to administer the first 350 items in the MMPI-A. This administration approach will result in item endorsements necessary to score the basic clinical scales. The abbreviated administration will not, however, provide sufficient information to score the content scales, several of the supplementary scales, or the validity scales VRIN, TRIN, F, and F2. If an abbreviated format increases the motivation or cooperation of an adolescent, this option may be used with an understanding of what data the clinician can gather from such an approach and what scales and measurement areas cannot be addressed.

A final area of potential limitation related to the MMPI-A is associated with the relatively low magnitude of MMPI-A basic scale elevations that are likely to occur with this revised instrument. As noted by Archer (1987), normal range mean profiles for adolescent populations were often found on the original form of the MMPI, leading to the recommendation by Ehrenworth and Archer (1985) that T-score values ≥ 65 be used as the demarcation point for clinical range elevations when using adolescent norms. Archer, Pancoast, and Klinefelter (1989) found that the use of a clinical scale T-score value of 65 or greater (rather than 70) to define clinical levels of psychopathology resulted in increases in sensitivity in accurately identifying profiles produced by normal adolescents versus adolescents receiving treatment in outpatient and inpatient settings. The MMPI-A produces even lower mean T-score values for adolescent clinical samples than those found on the original form of the MMPI using the Marks and Briggs (1972) adolescent norms (Archer, 1997). The MMPI Adolescent Project Committee recognized that the revised test instrument would often produce lower T-score values for adolescents in comparison with the original test instrument.
This observation led to the development of the "gray zone," or "shaded

zone," on the MMPI-A profile sheets. Specifically, the use of a single "black line" value to delineate the demarcation point between normal and clinical range scores was abandoned in favor of the creation of a range of scores that serve as a transition area between normal and clinical range elevations. On the MMPI-A, this zone is placed in the range of T-score values ≥ 60 and ≤ 65 for all MMPI-A scales, regardless of whether linear or uniform T-score procedures were used for that particular scale.

A central question requiring further study relates to the sensitivity and specificity of the MMPI-A instrument in identifying psychopathology in adolescents. Substantial research data is needed to determine whether the MMPI-A may be subject to increased problems in the accurate detection of psychopathology (sensitivity) because of the reduction of T-score values. These issues are directly related to the questions of how often a normal adolescent will produce T-score values within normal ranges on the MMPI basic scales, and how frequently adolescents experiencing significant psychopathology will produce one or more significant elevations on the MMPI-A basic scales. Although much more research is needed to resolve this issue, Alperin et al. (1996) provided data on the relative efficacy of applying a T-score value ≥ 60 versus T ≥ 65 as the criterion in defining MMPI-A clinical range elevations when using the standard MMPI-A norms in the normative sample of 1,620 adolescents and a clinical sample of 122 adolescent inpatients. In this investigation, the T ≥ 65 criterion produced an overall hit rate of 70% accurate identification, in contrast to a 57% hit rate with the T ≥ 60 criterion, and the former criterion also produced a more effective balance between test sensitivity (71%) and specificity (70%).
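
The cutoff logic and accuracy indices discussed above can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of any MMPI-A scoring software. Because the shaded zone is described as running up to 65 while a clinical elevation is defined as a T-score of 65 or greater, the sketch assumes that a score of exactly 65 is treated as clinical. The counts in the usage lines are invented for illustration and are not data from Alperin et al. (1996).

```python
def classify_t_score(t_score):
    """Place a T-score in the normal, transitional, or clinical range.

    Assumption: a score of exactly 65 counts as clinical, so the
    transitional ("shaded") zone here covers 60 through 64.
    """
    if t_score >= 65:
        return "clinical"
    if t_score >= 60:
        return "transitional"
    return "normal"


def accuracy_summary(true_pos, false_neg, true_neg, false_pos):
    """Return sensitivity, specificity, and overall hit rate as proportions."""
    total = true_pos + false_neg + true_neg + false_pos
    return {
        # Proportion of clinical cases correctly flagged as elevated.
        "sensitivity": true_pos / (true_pos + false_neg),
        # Proportion of normal cases correctly left unflagged.
        "specificity": true_neg / (true_neg + false_pos),
        # Proportion of all cases classified correctly.
        "hit_rate": (true_pos + true_neg) / total,
    }


# Illustrative (invented) counts only.
print(classify_t_score(62))
print(accuracy_summary(true_pos=7, false_neg=3, true_neg=70, false_pos=30))
```

Lowering the cutoff in such a scheme generally raises sensitivity at the expense of specificity, which is the trade-off that the comparison of the two criteria addresses.
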
These results appear consistent with the recommendation contained in the MMPI-A manual that "a clinically significant elevation is defined as an MMPI-A T-score ≥ 65" (Butcher et al., 1992, p. 43).

Use of the MMPI-A for Treatment Monitoring and Outcome Assessment

General Issues

The use of the MMPI-A has been discussed in terms of objectively evaluating and describing an adolescent's level of functioning in relation to standardized measures of psychopathology. The MMPI-A also may be used in repeated administrations to assess changes in functioning across intervals of time. This use of the MMPI-A is particularly important because many aspects of psychopathology manifested by adolescents during this developmental stage are subject to rapid changes over time. When the MMPI-A is administered at various points in the treatment process, it can provide the clinician with a sensitive index of therapeutic progress. Further, when the MMPI-A is administered at the conclusion of treatment it can provide a comprehensive assessment of psychological change as a function of the intervention process.

Evaluation Against Criteria for Outcome Measures

Whereas many aspects of the MMPI-A contain new features requiring much further research and investigation, it is possible to offer some speculations concerning the ability of the MMPI-A to meet the criteria for outcome assessment measures as formulated by Ciarlo, Edwards, Kiresuk, Newman, and Brown (1981) and discussed earlier. It is likely, for example, that the MMPI-A will have substantially more relevance to the

assessment of adolescent psychopathology than the original MMPI form because of the inclusion in the revised instrument of items and scales specifically targeted for this population. Thus, the MMPI-A retains the benefits of the original MMPI in the assessment of a wide range of psychopathology, and extends the applicability of the instrument to the adolescent age group in a manner consistent with the Ciarlo et al. Criterion 1 emphasis on the relevance of an instrument to the target group. In addition to Criterion 1, the MMPI-A would appear to hold special utility in meeting the 6th criterion related to psychometric strength of an instrument, and the 10th criterion regarding the usefulness of the instrument in clinical services. More is probably known concerning the psychometric properties of the original form of the MMPI than any other widely used psychopathology-related assessment instrument. For example, Butcher (1987) estimated that over 10,000 articles and books have documented the use of the MMPI, and Butcher and Owen (1978) estimated that 84% of all research conducted in the personality inventory domain has centered on the MMPI. Archer (1997) provided approximately 400 references relevant to the use of the MMPI and MMPI-A with adolescents, and the MMPI-A manual provides extensive information concerning the reliability and validity of the revised instrument (Butcher et al., 1992). The MMPI and MMPI-A are also particularly strong in the area of the assessment findings related to the provision of clinical services. The use of the MMPI-A as a treatment outcome assessment measure provides extensive clinical information regarding usefulness to both the treatment team and to the patient. Finally, it might also be noted that the MMPI and MMPI-A have a particular strength in reference to the last criterion listed by Ciarlo et al. (1981), that is, Criterion 11, regarding compatibility with clinical theories and practices. 
Although the original MMPI, and to a lesser extent the MMPI-A, were developed in an atheoretical and empirical fashion, these instruments are clearly compatible with a very wide range of theories of psychopathology from the behavioral to the psychoanalytic. This compatibility with a broad range of clinical orientations and theories is probably one of the most important factors in the widespread popularity of this instrument in assessment practices with both adults and adolescents.

Balanced against these areas of strength for the MMPI-A, it might be argued that the revised instrument, as well as the original MMPI, are less effective in meeting other outcome measure criteria developed by Ciarlo et al. (1981), including their emphasis on instruments with a simple, teachable methodology (Criterion 2), use of outcome measures that might be employed with multiple respondents (Criterion 4), criteria related to cost factors (Criterion 7), understandability by nonprofessional audiences (Criterion 8), and simplicity of feedback and interpretation processes (Criterion 9). It could also be noted that these latter factors are likely to be particularly valued in a managed health care environment. In these areas it should be acknowledged that the MMPI-A is a complicated, extensive test instrument requiring substantial time on the part of the adolescent to respond to the lengthy item pool and extensive training and expertise on the part of the psychologist in order to ensure accurate interpretation practices.

Research Findings

Systematic and controlled treatment outcome studies are not yet available for the MMPI-A. However, much treatment outcome research data is available concerning MMPI basic and special scales in adult populations. For example, Barron (1953) developed the Ego Strength scale by identifying items that separated the response

patterns of 17 neurotic patients judged to have clearly improved after 6 months of psychotherapy versus 16 neurotic patients judged unimproved over the same time interval. Because of the largely contradictory results generated by studies examining the usefulness of the Ego Strength scale, however, this measure was not retained by the MMPI Adolescent Project Committee for the MMPI-A. In contrast, a revised form of the MacAndrew Alcoholism scale (MAC-R) was retained in the MMPI-A. Individuals' scores on the MAC scale appear to remain relatively stable across time (Archer, 1987, 1997). For example, MAC scale scores in alcoholics remained elevated following treatment in studies by Gallucci, Kay, and Thornby (1989) and others. In addition to the MAC scale, Welsh's Anxiety (A) and Repression (R) scales were carried over from the original MMPI to the MMPI-A. Welsh (1956) created the Anxiety and Repression scales to measure the first two factors of the MMPI. The particular usefulness of the A and R scales in the assessment of treatment outcome may be directly related to their relation to the factor structure of the MMPI. Welsh found that the first factor of the MMPI had high positive loadings on MMPI Scales 7 and 8, with a negative loading on Scale K. This factor was originally labeled general maladjustment (Tyler, 1951) and subsequently labeled by Welsh as Anxiety (Welsh, 1956). This first factor has also been identified in factor analyses of adolescents' basic scale values on the MMPI (Archer, 1984; Archer & Klinefelter, 1991) and on the MMPI-A (Butcher et al., 1992). Thus, the MMPI-A Welsh's A scale served as a "marker" for first factor variance in the test instrument. In the MMPI-A normative sample, the A scale was highly intercorrelated with several other MMPI-A measures, including basic scales K (r = -.72), Pt (r = .89), Sc (r = .76), and content scales A-anx (r = .83), A-obs (r = .82), and A-dep (r = .80). 
Thus, T-score values on all of these measures except Scale K (which is negatively correlated to the first factor) tend to be lower when an adolescent reports lower reevaluation levels of emotional distress and maladjustment as a result of successful treatment efforts. Welsh's second factor, although less clearly defined than the first factor, tends to be related to elevations on Scale 3 and negatively related to elevations on Scale 9. Welsh labeled this factor Repression, and this factor has also been identified in factor analytic studies of adolescents with the original form of the MMPI (Archer, 1984; Archer & Klinefelter, 1991) and with the MMPI-A (Butcher et al., 1992). The Repression scale is most highly intercorrelated with Scales L (r = .44), K (r = .45), and Ma (r = -.43) in the MMPI-A normative sample. All 33 items in the MMPI-A R scale are scored in the false direction, and involve the denial of symptomatology, particularly aggressive or hostile feelings, and the expression of disinterest in sensation-seeking activities. As a component of this dimension, Scale K is highly and negatively intercorrelated with several MMPI-A scales, including content scales A-anx (r = -.59), A-obs (r = -.67), A-ang (r = -.62), and A-cyn (r = -.70), and supplementary scale A (r = -.72). This pattern implies that MMPI-A test-retest administrations may often show a pattern where reduction of Factor 1 symptomatology will be associated with increased elevations on Factor 2 related scales such as Repression and particularly the K scale. This pattern may be related to the observation that the K scale, in its use in adult populations, has often been seen as an indicator of psychological health rather than a measure exclusively of defensiveness. 
An understanding of the interrelations between Factor 1 and Factor 2 patterns in the MMPI-A will assist in interpreting individual change in test-retest MMPI-A administrations by providing a conceptual organization to the changes shown on the individual scale level.
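
One hedged way to picture this conceptual organization is to group retest changes by the scales that the text associates with each factor. The Python sketch below is illustrative only: the grouping constants, the function name, and the T-scores in the usage lines are assumptions made for the example and do not represent a published scoring procedure. Scales with negative relations to a factor (for example, Ma for Factor 2) are left out of the groupings to keep the averages interpretable.

```python
# Scales the text describes as positively related to each factor.
# Assumption: these groupings are a simplification for illustration.
FACTOR_1_MARKERS = ("A", "Pt", "Sc", "A-anx", "A-obs", "A-dep")
FACTOR_2_MARKERS = ("R", "K", "L")


def mean_change(baseline, retest, scales):
    """Average retest-minus-baseline T-score change over the given scales.

    Scales missing from either administration are skipped. Returns None
    when no listed scale is present in both score dictionaries.
    """
    diffs = [retest[s] - baseline[s]
             for s in scales if s in baseline and s in retest]
    return sum(diffs) / len(diffs) if diffs else None


# Hypothetical T-scores for a single adolescent at intake and at retest.
intake = {"A": 70, "Pt": 72, "K": 45, "R": 50}
retest = {"A": 60, "Pt": 64, "K": 52, "R": 55}

print(mean_change(intake, retest, FACTOR_1_MARKERS))  # negative: less distress reported
print(mean_change(intake, retest, FACTOR_2_MARKERS))  # positive: higher K and R
```

A drop on the Factor 1 markers accompanied by a rise on the Factor 2 markers is the pattern the preceding discussion leads one to expect; whether it reflects improvement or defensiveness still has to be judged from the validity scales and other data.
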

Clinical Applications

Butcher and Tellegen (1978) and Ullmann and Wiggins (1962) reported that from 80% to 85% of the items on the original MMPI were worded in a manner that related to trait personality features or biographical information that should not change on retest. This estimate leaves approximately 15% to 20% of the original item pool to provide information on changes in psychological characteristics. Even if only 15% of the original 550 items were capable of showing state changes in functioning, however, there would still be a pool of approximately 83 items (0.15 × 550 = 82.5) capable of reflecting changes in psychological functioning.

Several studies have been conducted on the stability of high-point, 2-point, and even 3-point codetypes for the original MMPI, and this literature has been reviewed by Graham (1993). Among his conclusions, Graham noted that codetypes are likely to be more stable when the primary scales are more elevated, and when there is a greater degree of elevation of the primary scales in relation to other scales in the profile (i.e., when the codetype is well defined). Graham also noted that although codetypes may change from one administration to another, they are likely to remain within the same broad diagnostic grouping.

Pancoast et al. (1988) examined the agreement or congruence rate between discharge diagnoses rendered by psychiatrists, and the admission and discharge MMPI-derived diagnoses from four diagnostic classification systems developed for the MMPI. The four classification systems included a simple high-point code based on the most elevated clinical scale in the profile, Henrichs' (1964, 1966) revision of the Meehl-Dahlstrom rules (Meehl & W.G. Dahlstrom, 1960), the Goldberg Equation (Goldberg, 1965), and the system developed by Lachar (1974). This study indicated a modest hit rate of between 24% and 34% for MMPI-derived diagnoses (across the various classification systems) and psychiatric diagnoses.
Further, the stability of MMPI-based diagnoses from admission to discharge ranged from 48% to 51% depending on the classification system employed. Thus, there appeared to be little difference in the accuracy or the stability of profiles related to the complexity of the system used for diagnostic classification purposes.

Of the several factors that may affect the evaluation of change on the MMPI-A, perhaps the most important issue relates to the concept of the standard error of measurement. As previously noted, the MMPI-A manual reports that the standard error of measurement for the MMPI-A basic scales is approximately two to three raw score points or four to six T-score points (Butcher et al., 1992). This standard error of measurement estimate indicates that if an individual were to retake the MMPI-A within a very brief period of time with their emotional and psychopathology status remaining constant, their T-score values on the basic scales would be expected to fall within a range of plus or minus approximately five T-score points roughly 68% of the time. The standard error of measurement range on the MMPI-A places practical limits on the interpretation of small T-score differences in the evaluation of an individual's degree of change obtained by comparing original with readministration scores from the MMPI-A. As noted in the MMPI-A manual, this standard error of measurement also has implications for codetype interpretation. For example, a 2-4-8 codetype, with all three scales having T-score values of 70, would be arbitrarily placed within a 2-point code category (i.e., 2-4), but could be markedly different from a clearly defined 2-4 profile type with a substantial T-score difference between the second and third most elevated scales.
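
The two practical points above, screening retest differences against the standard error of measurement and checking how well a 2-point codetype is defined, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the default of 5 T-score points is the approximate value quoted above rather than a scale-specific figure, and a single standard error band covers only about 68% of retest variation, so it is a lenient screen rather than a formal test of reliable change.

```python
APPROX_SEM_T = 5  # approximate T-score standard error cited in the text


def change_exceeds_sem(baseline_t, retest_t, sem_t=APPROX_SEM_T):
    """True when the absolute T-score change is larger than one SEM."""
    return abs(retest_t - baseline_t) > sem_t


def two_point_definition(profile):
    """T-score gap between the second and third most elevated scales.

    `profile` maps scale labels to T-scores and needs at least three
    entries. A gap of 0 means the 2-point code is arbitrary, as in the
    2-4-8 example in which all three scales equal 70.
    """
    ranked = sorted(profile.values(), reverse=True)
    return ranked[1] - ranked[2]


print(change_exceeds_sem(72, 69))                        # False: within one SEM
print(change_exceeds_sem(72, 60))                        # True: larger than one SEM
print(two_point_definition({"2": 70, "4": 70, "8": 70}))  # 0
print(two_point_definition({"2": 75, "4": 72, "8": 60}))  # 12
```

Comparing the definition gap with the same standard error gives a rough sense of whether a codetype would be likely to survive readministration.
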

Use with Other Evaluation Data

As previously noted under the discussion of the MMPI-A in treatment planning, findings from the MMPI-A should be routinely integrated with the results of other sources of information concerning the adolescent, particularly those that provide other perspectives on the adolescent's functioning, including reports and ratings as provided by teachers, parents, and treatment team members. These external sources of information provide very valuable and unique data that supplement the types of information the adolescent can provide in the MMPI-A self-report format.

Provision of MMPI-A Feedback Regarding Assessment Findings

As previously noted, the provision of MMPI feedback has become a central emphasis in discussions of this instrument (Butcher, 1990; Finn, 1996; Lewak et al., 1990). Unfortunately, much of this discussion relates to feedback connected with the use of the instrument for treatment planning rather than as a treatment outcome assessment measure. Nevertheless, it is clear that the MMPI and the MMPI-A can provide valuable information when used in a feedback process to document the adolescent's change over time as a result of participation in the treatment process. Used within this format, the initial testing provides a baseline against which later MMPI administrations can be compared in order to evaluate the degree of change in personality and psychopathology patterns over the course of treatment. The review of such test findings provides both the adolescent and the therapist with an important opportunity to explore the degree of consensual agreement between therapist, patient, and test findings concerning the amount and nature of change that has been experienced. The process of readministering the MMPI-A to evaluate treatment process or treatment outcome is usually not resisted by adolescents if they have a "stake" in such testing in the sense of a clear understanding that they will receive feedback concerning test findings.
As previously noted, adolescents are capable of receiving and understanding a great deal of information concerning the MMPI-A. In addition to avoiding technical jargon, however, the therapist should avoid the use of feedback sessions as a means of "confronting" reluctant or resistant adolescents concerning their lack of treatment progress. Whereas such confrontations might be indicated for a particular adolescent, the use of the MMPI-A to provide the grounds for such a confrontation may reduce the adolescent's willingness to accurately report on this instrument in future evaluations.

Limitations/Potential Problems in MMPI-A Use

As previously noted, the greatest single problem in the evaluation of change on the MMPI is related to the overemphasis of small T-score shifts representing changes that are less than the standard error of measurement on the test (i.e., 5 T-score points). In addition, the MMPI interpreter is often left with the challenge of determining whether an adolescent's improvement as reflected in MMPI-A T-score reductions on clinical scales represents actual positive changes in psychological functioning, or the adolescent's use of a defensive response set in an attempt to minimize report of psychopathology during the test readministration. One of the relatively unique aspects of the MMPI-A, which

< previous page

page_368

next page >

< previous page

page_369

next page > Page 369

substantially helps in this differentiation task, is the presence of extensive validity scale information concerning the adolescent's approach to the response process. Using the original form of the MMPI, Herkov, Archer, and Gordon (1991) examined the relative efficacy of the traditional validity scales and the Wiener-Harmon SubtleObvious subscales in identifying fake-bad and fake-good response sets among adolescents. This study involved 403 adolescents from a nonpatient adolescent group administered the MMPI-A under standard conditions, a nonpatient group instructed to "fake bad," and a psychiatric inpatient group instructed to "fake good." The results of this study indicated that elevations on Scale L were a highly sensitive indicator of adolescents' attempts to fake good, whereas elevations on MMPI Scale F were quite sensitive in accurately identifying adolescents attempting to overreport symptomatology on the test instrument. In general, the use of the MMPI and MMPI-A validity scales should be able to provide very important assistance to the interpreter in determining the accuracy of the adolescents' reports of change in symptomatology across MMPI-A administrations. Clinical Case Example Examples of MMPI-A interpretation principles can be found in the test manual (Butcher et al., 1992), as well as in Archer (1997), Archer, Krishnamurthy, and Jacobson (1994), and Butcher and Williams (1992a). The following clinical case example was selected from Archer (1997) to illustrate the use of the MMPI-A for the purposes of personality description and treatment planning. Deborah, a 17-year-old, White, female adolescent, was admitted to an acute inpatient unit in a psychiatric hospital. This patient had a history of antisocial behaviors and legal violations that included loitering, petty larceny, vagrancy, possession of drugs, and possession of drugs with intent to distribute. 
Her psychiatric symptomatology at the time of hospitalization included anger, hostility, and depression. On admission, the treatment team DSM-III-R diagnoses for this patient included Dysthymic Disorder (300.40), Conduct Disorder, Undifferentiated Type (312.90), and Psychoactive Substance Abuse (305.90). She had an extensive history of alcohol and substance abuse, including hallucinogens, marijuana, cocaine, and barbiturates. Immediately prior to hospitalization, Deborah required emergency hospitalization for an unintentional drug overdose from her use of a combination of Valium and cocaine.

This adolescent was an only child from an upper socioeconomic class background. Deborah's father was an executive vice president for a multinational corporation and his job responsibilities resulted in multiple relocations of the family to a variety of Western European countries. Approximately 1 year prior to the patient's current psychiatric admission, she had been arrested by British authorities for the possession and sale of narcotics. Deborah's parents reported a long history of difficulty controlling their daughter's behavior, and indicated that Deborah had an extensive history of school truancy and episodes of running away from home. Her parents also indicated their suspicions regarding their daughter's possible use of prostitution to support and maintain her drug use.

Deborah's academic records indicated a history of underachievement with grades in the average to below-average range. Administration of the Wechsler Adult Intelligence Scale-Revised (WAIS-R) produced a Verbal IQ score of 110, a Performance IQ score of 124, and a Full Scale IQ score of 116. The Child Behavior Checklist (CBCL), developed by Achenbach and Edelbrock (1983), was administered to Deborah's mother with resulting elevations on the Delinquent and Hyperactive scales. Staff ratings on the Devereux Adolescent Behavior (DAB) rating scale, developed by Spivack, Haimes, and Spotts (1967), showed elevations on the Unethical and Defiant/Resistant behavior factors.

Deborah's MMPI-A basic scale profile is shown in Fig. 12.3. This profile displays T-score values based on MMPI-A norms (Butcher et al., 1992) and on the norms developed by Marks and Briggs (1972) for the original form of the MMPI that can be found in Appendix G of the MMPI-A manual (Butcher et al., 1992).

The third step of the interpretive model noted in Table 12.2 involves the evaluation of the technical validity of the MMPI-A profile. This step is undertaken by review of scales and raw score values appearing on the left side of the basic scale profile sheet. Note that Deborah omitted only one item on the Cannot-Say scale, a value clearly within acceptable limits for profile interpretation. The response consistency measures VRIN (T = 43) and TRIN (T = 54) also produced values within acceptable limits for valid profile interpretation. Also note that there is relatively little difference between the T-score elevations on scales F1 (T = 66) and F2 (T = 53), providing evidence that Deborah did not respond to the latter part of the test booklet in a random manner. The validity scale configuration produced by MMPI-A Scales F, L, and K is also within acceptable limits and consistent with a meaningful and useful interpretation of MMPI-A clinical scale findings.

The fourth step shown in Table 12.2 involves a review of the basic scale clinical profile. Deborah's basic scale profile is a well-defined 4-9 codetype. The term definition, as applied to 2-point codetypes, refers to the degree of T-score difference between the second (Scale 9) and third (Scale 8) most elevated clinical scales.
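The notion of codetype definition given above lends itself to a small worked example. The Python sketch below ranks clinical-scale T-scores, names the 2-point codetype, and reports the gap between the second and third most elevated scales. Several things here are assumptions and not statements from the chapter: the function and constant names are hypothetical, Scales 5 and 0 are left out of the ranking as a common convention, the 5-point minimum gap for calling a code "well-defined" is an illustrative threshold, and the T-scores are made up (Deborah's actual clinical-scale values appear only in the figure).

```python
# Hypothetical sketch: derive a 2-point codetype and its definition
# from clinical-scale T-scores.
WELL_DEFINED_MIN_GAP = 5  # assumption: illustrative threshold, not from the chapter


def two_point_codetype(t_scores, excluded=("5", "0")):
    """Return (codetype, definition) for a dict of clinical-scale T-scores.

    codetype   -- the two most elevated scales, highest first, e.g. "4-9"
    definition -- T-score gap between the 2nd and 3rd most elevated scales
    Scales listed in `excluded` are left out of the ranking (an assumption).
    """
    ranked = sorted(
        ((scale, t) for scale, t in t_scores.items() if scale not in excluded),
        key=lambda item: item[1],
        reverse=True,
    )
    if len(ranked) < 3:
        raise ValueError("need at least three scales to compute definition")
    (first, _), (second, t_second), (_, t_third) = ranked[:3]
    return f"{first}-{second}", t_second - t_third


# Illustrative (made-up) T-scores, not the values from the case example.
scores = {"1": 48, "2": 50, "3": 52, "4": 78, "5": 55,
          "6": 58, "7": 49, "8": 62, "9": 72, "0": 45}
codetype, definition = two_point_codetype(scores)
print(codetype, definition, definition >= WELL_DEFINED_MIN_GAP)
```

Keeping the threshold as a named constant makes it easy to substitute whatever definition criterion a given interpretive system actually specifies.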
The 4-9 codetype is very commonly found among adolescents in clinical settings on both the original form of the MMPI and the MMPI-A (Archer, 1987, 1997). In the Marks et al. (1974) description of 2-point codetypes, the 4-9 code was found for adolescents who were described as defiant, impulsive, disobedient, and school truant. Marks et al. also noted that these adolescents were likely to be runaways and were often described by their parents as difficult to control. The chief defense mechanism of the 4-9/9-4 adolescent was acting out, and therapists described these adolescents as resentful of authority, insecure, socially extroverted, and capable of initially arousing liking in others. Marks et al. (1974, p. 221) referred to these adolescents as "disobedient beauties" and provided descriptors including seductive, provocative, and handsome. The clinical correlate data for the 4-9/9-4 codetype indicate that individuals with this MMPI pattern are often in trouble with their environment because of antisocial behaviors. In the adult literature, individuals with this codetype pattern often receive a diagnosis of antisocial personality disorder and are described as selfish, impulsive, and self-indulgent.

As noted in the model for profile interpretation, it is often useful to review values for Scales 2 and 7 to assess the overall degree of affective distress. Deborah's scores on these measures are markedly low for an adolescent recently admitted to inpatient treatment, and are equivalent to those found for the MMPI-A normative population. This absence of emotional or affective distress is a negative prognostic indicator for Deborah and may reflect a lack of necessary motivation (i.e., emotional distress) to engage in the therapeutic change process.

Steps 5 and 6 in the profile interpretation process involve a review of the content and supplementary scales as presented in Fig. 12.3.
Consistent with the absence of affective distress reflected in the basic scale profile, Deborah's score on Welsh's A (T = 51) suggests little distress or discomfort at the time of her MMPI-A assessment. Further, her score on Welsh's R scale (T = 46) reinforces the findings from her 4-9/9-4 codetype in suggesting that acting out, rather than repression, is her primary defense mechanism. A review of Deborah's supplementary scale scores also provides a number of interesting observations related to potential substance abuse problems. This adolescent's raw score value of 30 on the MAC-R would result in her classification as a probable substance abuser, a finding consistent with her elevated score on the PRO scale (T = 84) and her psychosocial history findings. Additionally, research by Archer, Gordon, Anderson, and Giannetti (1989) indicated that adolescents with elevated MAC scores are much more likely to receive diagnoses related to conduct disorder. In contrast, Deborah's score is within normal limits on the ACK scale (T = 56), a measure of this adolescent's willingness to acknowledge or discuss alcohol or drug use symptoms and problems. Thus, Deborah may have many more problems in the area of drugs and alcohol than she will admit in clinical interview. Finally, Deborah also shows a marginal elevation on the IMM scale, a measure of deficits and problems in the area of ego maturation, self-awareness, and the ability to form meaningful and nonexploitive relationships with others. Archer, Pancoast, and Gordon (1994) found that female adolescents who produce elevations on the IMM scale have poor relationships with their parents and frequently have histories of school truancy.

Fig. 12.3. MMPI-A content and supplementary scale profile for clinical case example (Deborah). From J.N. Butcher, C.L. Williams, J.R. Graham, R.P. Archer, A. Tellegen, Y.S. Ben-Porath, and B. Kaemmer, 1992, Minneapolis, MN: Regents of the University of Minnesota. Copyright © the Regents of the University of Minnesota, 1942, 1943 (renewed 1970), 1992. This profile form 1992. Reproduced by permission of the publisher.

Deborah's content scale profile, consistent with her low score on the Welsh's A scale, produced normal-range values on measures of affective distress and internal symptoms. This is reflected in her normal-range values on Scales A-anx, A-obs, A-dep, A-hea, and A-biz. Deborah is likely to have substantial difficulty in interpersonal functioning as reflected in her substantial elevation on the A-fam scale, which indicates the presence of marked family conflict and discord, as well as marginal elevations on A-ang and A-cyn (T ≥ 55 to T ≤ 60).
Deborah also shows a marginal elevation on the A-con scale, indicative of problem behaviors involving unlawful actions or attitudes and behaviors that violate societal standards. Deborah's score on the A-trt content scale probably reflects a negative attitude toward mental health treatment, or doubts concerning her capacity to benefit from psychotherapy. Her A-trt scale value underscores that Deborah is likely to present substantial initial barriers to the treatment process, a factor often found for conduct-disordered adolescent patients. Finally, it can be noted that Deborah's A-sch score is quite elevated and accurately reflects her extensive problems in the academic environment. These problems have included extensive school truancy, repeated suspensions and disciplinary actions, and marginal academic performance given this adolescent's intellectual potential.

The next step in the MMPI-A profile interpretation illustrated in Table 12.2 involves a review of Harris-Lingoes subscales and critical item content. A review of Harris-Lingoes subscales for basic Scale 4 (see Fig. 12.4) indicates marked elevations on Pd1 (Familial Discord) and Pd2 (Authority Problems). These scores indicate that Deborah is likely to perceive her home environment as unsupportive, conflictual, and controlling and critical (Pd1), that she is likely to harbor substantial resentment of authority that may include a history of academic or legal difficulties (Pd2), and that she is likely to feel misunderstood, alienated, isolated, and unhappy (Pd4). Deborah's Scale 9 elevation is related to Harris-Lingoes subscale elevations on Ma1 (Amorality) and Ma3 (Imperturbability). Such adolescents might be described as relating to others in an opportunistic, manipulative, and selfish manner (Ma1), and they tend to operate independently, seek out excitement, and deny social anxiety (Ma3).

Deborah's item endorsement pattern on the Lachar-Wrobel critical items indicates few items related to depression or emotional distress, with the majority of her critical item endorsements related to adjustment problems in the areas of antisocial attitude and family conflict.


Fig. 12.4. MMPI-A profile for Harris-Lingoes and Si subscales for clinical case example (Deborah). From J.N. Butcher, C.L. Williams, J.R. Graham, R.P. Archer, A. Tellegen, Y.S. Ben-Porath, and B. Kaemmer, 1992, Minneapolis, MN: Regents of the University of Minnesota.


Figure 12.5 provides the MMPI-A Structural Summary data for Deborah. Of the eight factor dimensions included in the MMPI-A Structural Summary, Deborah produced critical range findings on the Familial Alienation (Factor 7) dimension. As noted in the correlate study by Archer, Krishnamurthy, and Jacobson (1994), adolescents who produce elevations on a majority of the scales or subscales of this dimension are likely to utilize externalizing defenses and to be seen as delinquent, aggressive, or hostile. Empirical findings have related elevations on this dimension to the occurrence of frequent and serious parental conflicts and to significant disciplinary problems in the academic environment. Further, histories of alcohol and drug abuse are related to elevations on Familial Alienation. Congruent with prior interpretation of this profile, Deborah produced few critical range elevations on the General Maladjustment dimension, consistent with the view that she is experiencing little generalized emotional distress at the time of this MMPI-A assessment.

Fig. 12.5. MMPI-A Structural Summary for clinical case example (Deborah). From R.P. Archer and R. Krishnamurthy, 1994, Odessa, FL: Psychological Assessment Resources, Inc. Copyright © 1994 by Psychological Assessment Resources, Inc. Reprinted with permission.

Deborah was discharged to outpatient care following 27 days of intensive inpatient treatment. Outpatient recommendations included individual, family, and group psychotherapy sessions. Deborah's prognosis at the conclusion of inpatient treatment was rated as guarded by treatment staff who believed she continued to manifest a low frustration tolerance, interpersonal manipulativeness, and a relative absence of guilt or remorse concerning antisocial behaviors. Within 6 months of her discharge from inpatient treatment, Deborah dropped out of high school and resumed involvement in drug use and distribution. Deborah was arrested on multiple counts of possession and distribution of drugs approximately 18 months following her discharge from the inpatient treatment unit.

Unfortunately, negative treatment outcomes for the 4-9/9-4 codetype are common among both adolescents and adults. Marks and Seeman (1963) reported that adult psychiatric patients admitted with a 4-9/9-4 profile also tended to produce the same codetype at discharge. Lachar (1974) reported that 80% of adult patients with an admission codetype of 4-9 were rated as unimproved at the time of discharge from treatment. Given that Deborah was only 17 years old at the time of this MMPI-A evaluation, it would be inappropriate to assume that these characteristics had the same stability as might be expected for a 30-year-old adult with similar profile features. It does seem probable, however, that had a discharge profile been obtained for this adolescent, it would have remained a 4-9 configural pattern.
The consistency of the 4-9 profile through an admission, discharge, and readmission process is presented in a clinical case example on the original MMPI by Archer (1987). The MMPI-A admission characteristics exhibited by Deborah underscored the treatment challenges and difficulties that were likely to be encountered in this case.

Conclusions

This chapter provided an overview of the MMPI-A, including a description of the development of the test instrument and the MMPI-A normative sample. A number of recommendations were offered for the clinical use of the MMPI-A in treatment planning, and directions for future research were also noted. These research areas included the need for additional codetype congruency studies between the original form of the MMPI and the MMPI-A to determine the generalizability of findings from the original form to the revised instrument. Additional external validity studies also will prove useful in establishing the correlate meanings for MMPI-A measures, particularly those newly created for this instrument. Research also should focus on the sensitivity and specificity of the MMPI-A in the accurate detection of adolescents with and without histories of psychiatric symptomatology.

Finally, this chapter reviewed the use of the MMPI-A for treatment outcome evaluation. Areas of relative strengths and weaknesses were noted, and emphasis was placed on interpreting change related to the factor structure of the instrument and validity scale findings. Future research on the MMPI-A should include controlled treatment outcome studies in which the test instrument is used in a test-retest format to assess psychological change as a function of psychotherapy.


References

Achenbach, T.M., & Edelbrock, C. (1983). Manual for the Child Behavior Checklist and Revised Child Behavior Checklist. Burlington, VT: Department of Psychiatry, University of Vermont. Alperin, J.J., Archer, R.P., & Coates, G.D. (1996). Development and effects of an MMPI-A K-correction procedure. Journal of Personality Assessment, 67, 155-168. Anastasi, A. (1982). Psychological testing (5th ed.). New York: Macmillan. Archer, R.P. (1984). Use of the MMPI with adolescents: A review of salient issues. Clinical Psychology Review, 4, 241-251. Archer, R.P. (1987). Using the MMPI with adolescents. Hillsdale, NJ: Lawrence Erlbaum Associates. Archer, R.P. (1992a). MMPI-A interpretive system [computer program]. Odessa, FL: Psychological Assessment Resources, Inc. Archer, R.P. (1992b). Review of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2). The 10th Mental Measurements Yearbook. Lincoln, NE: Buros Institute of Mental Measurements. Archer, R.P. (1996). MMPI-A interpretive system [computer program]. Odessa, FL: Psychological Assessment Resources, Inc. Archer, R.P. (1997). MMPI-A: Assessing adolescent psychopathology (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates. Archer, R.P., Belevich, J.K.S., & Elkins, D.E. (1994). Item-level and scale-level factor structures of the MMPI-A. Journal of Personality Assessment, 62, 332-345. Archer, R.P., & Gordon, R.A. (1988). MMPI and Rorschach indices of schizophrenic and depressive diagnoses among adolescent inpatients. Journal of Personality Assessment, 52, 276-287. Archer, R.P., & Gordon, R.A. (1994). Psychometric stability of MMPI-A item modifications. Journal of Personality Assessment, 62, 416-426. Archer, R.P., Gordon, R.A., Anderson, G.L., & Giannetti, R.A. (1989). MMPI special scale correlates for adolescent inpatients. Journal of Personality Assessment, 52, 707-721. Archer, R.P., & Jacobson, J.M. (1993). Are critical items "critical" for the MMPI-A? Journal of Personality Assessment, 61, 547-556.
Archer, R.P., & Klinefelter, D. (1991). MMPI factor analytic findings for adolescents: Item- and scale-level factor structures. Journal of Personality Assessment, 57, 356-367. Archer, R.P., & Klinefelter, D. (1992). Relationships between MMPI codetypes and MAC scale elevations in adolescent psychiatric samples. Journal of Personality Assessment, 58, 149-159. Archer, R.P., & Krishnamurthy, R. (1993a). Combining the Rorschach and MMPI in the assessment of adolescents. Journal of Personality Assessment, 60, 132-140. Archer, R.P., & Krishnamurthy, R. (1993b). A review of MMPI and Rorschach interrelationships in adult samples. Journal of Personality Assessment, 61, 277-293. Archer, R.P., & Krishnamurthy, R. (1994). A structural summary approach for the MMPI-A: Development and empirical correlates. Journal of Personality Assessment, 63, 554-573. Archer, R.P., & Krishnamurthy, R. (1997a). MMPI-A and Rorschach indices related to depression and conduct disorder: An evaluation of the incremental validity hypothesis. Journal of Personality Assessment, 69, 517-533. Archer, R.P., & Krishnamurthy, R. (1997b). MMPI-A scale-level factor structure: Replication in a clinical sample. Assessment, 4, 337-349. Archer, R.P., Krishnamurthy, R., & Jacobson, J.M. (1994). MMPI-A casebook. Odessa, FL: Psychological Assessment Resources, Inc. Archer, R.P., Maruish, M., Imhof, E.A., & Piotrowski, C. (1991). Psychological test usage with adolescent clients: 1990 survey findings. Professional Psychology: Research and Practice, 22, 247-252. Archer, R.P., Pancoast, D.L., & Gordon, R.A. (1994). The development of the MMPI-A Immaturity (IMM) scale: Findings for normal and clinical samples. Journal of Personality Assessment, 62, 145-156. Archer, R.P., Pancoast, D.L., & Klinefelter, D. (1989). A comparison of MMPI code types produced by traditional and recent


adolescent norms. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 1, 23-29. Archer, R.P., White, J.L., & Orvin, G.H. (1979). MMPI characteristics and correlates among adolescent psychiatric inpatients. Journal of Clinical Psychology, 35, 498-504. Barron, F. (1953). An Ego-Strength scale which predicts response to psychotherapy. Journal of Consulting Psychology, 17, 327-333. Butcher, J.N. (1987). Computerized psychological assessment: A practitioner's guide. New York: Basic Books. Butcher, J.N. (1990). MMPI-2 in psychological treatment. New York: Oxford University Press. Butcher, J.N., Dahlstrom, W.G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring. Minneapolis: University of Minnesota Press. Butcher, J.N., & Hostetler, K. (1990). Abbreviating MMPI item administration: What can be learned from the MMPI for the MMPI-2? Psychological Assessment: A Journal of Consulting and Clinical Psychology, 2, 12-21. Butcher, J.N., & Owen, P.L. (1978). Objective personality inventories: Recent research and some contemporary issues. In B. Wolman (Ed.), Clinical diagnoses of mental disorders (pp. 475-546). New York: Plenum. Butcher, J.N., & Tellegen, A. (1978). Common methodological problems in MMPI research. Journal of Consulting and Clinical Psychology, 46, 620-628. Butcher, J.N., & Williams, C.L. (1992a). Essentials of MMPI-2 and MMPI-A interpretation. Minneapolis: University of Minnesota Press. Butcher, J.N., & Williams, C.L. (1992b). The Minnesota report: Adolescent interpretive system [computer program]. Minneapolis: National Computer Systems. Butcher, J.N., Williams, C.L., Graham, J.R., Archer, R.P., Tellegen, A., Ben-Porath, Y. S., & Kaemmer, B. (1992). MMPI-A (Minnesota Multiphasic Personality Inventory-Adolescent): Manual for administration, scoring, and interpretation. Minneapolis: University of Minnesota Press. Caldwell, A.B. (1976, January). 
MMPI profile types. Paper presented at the 11th Annual MMPI Workshop and Symposium, sponsored by the University of Minnesota Press, Minneapolis, MN. Capwell, D.F. (1945a). Personality patterns of adolescent girls: I. Girls who show improvement in IQ. Journal of Applied Psychology, 29, 212-228. Capwell, D.F. (1945b). Personality patterns of adolescent girls: II. Delinquents and nondelinquents. Journal of Applied Psychology, 29, 284-297. Ciarlo, J.A., Edwards, D.W., Kiresuk, T.J., Newman, F.L., & Brown, T.R. (1981). Final report: The assessment of client/patient outcome techniques for use in mental health programs (NIMH Contract No. 278-80-0005DB). Denver, CO: University of Denver. Dahlstrom, W.G., Archer, R.P., Hopkins, D.G., Jackson, E., & Dahlstrom, L.E. (1994). Assessing the readability of the Minnesota Multiphasic Personality Inventory Instruments: The MMPI, MMPI-2, MMPI-A (MMPI-2/MMPI-A Test Rep. No. 2). Minneapolis: University of Minnesota Press. Ehrenworth, N.V., & Archer, R.P. (1985). A comparison of clinical accuracy ratings of interpretive approaches for adolescent MMPI responses. Journal of Personality Assessment, 49, 413-421. Exner, J.E. (1986). The Rorschach: A comprehensive system: Vol. 1. Basic foundations (2nd ed.). New York: Wiley. Finn, S.E. (1996). Manual for using the MMPI-2 as a therapeutic intervention. Minneapolis: University of Minnesota Press. Gallucci, N.T. (1990). On the synthesis of information from psychological tests. Psychological Reports, 67, 1243-1260. Gallucci, N.T., Kay, D.C., & Thornby, J.I. (1989). The sensitivity of 11 substance abuse scales from the MMPI to changing clinical status. Psychology of Addictive Behaviors, 3, 29-33. Goldberg, L.R. (1965). Diagnosticians versus diagnostic signs: The diagnosis of psychosis versus neurosis from the MMPI. Psychological Monographs, 79 (Whole No. 602). Graham, J.R. (1993). MMPI-2: Assessing personality and psychopathology (2nd ed.). New York: Oxford University Press. Greene, R.L. (1989). Assessing the validity of MMPI profiles in clinical settings. Clinical Notes on the MMPI (No. 11). Minneapolis: National Computer Systems. Greene, R.L. (1991). The MMPI-2/MMPI: An interpretive manual. Boston: Allyn & Bacon. Harris, R.E., & Lingoes, J.C. (1955). Subscales for the MMPI: An aid to profile interpretation. Unpublished manuscript, University of California. Hathaway, S.R. (1956). Scales 5 (Masculinity-Femininity), 6 (Paranoia), and 8 (Schizophrenia). In G.S. Welsh & W.G. Dahlstrom (Eds.), Basic readings on the MMPI in psychology and medicine (pp. 104-111). Minneapolis: University of Minnesota Press. Hathaway, S.R., & Monachesi, E.D. (1963). Adolescent personality and behavior: MMPI patterns of normal, delinquent, dropout, and other outcomes. Minneapolis: University of Minnesota Press. Hedlund, J.L., Won Cho, D., & Wood, J.D. (1977). Comparative validity of MMPI-168 factors in clinical scales. Multivariate Behavioral Research, 12, 327-329. Henrichs, T.F. (1964). Objective configural rules for discriminating MMPI profiles in a psychiatric population. Journal of Clinical Psychology, 20, 157-159. Henrichs, T.F. (1966). A note on the extension of MMPI configural rules. Journal of Clinical Psychology, 22, 51-52. Herkov, M.J., Archer, R.P., & Gordon, R.A. (1991). MMPI response sets among adolescents: An evaluation of limitations of the Subtle-Obvious subscales. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3, 424-426. Imhof, E.A., & Archer, R.P. (1997). Correlates of the MMPI-A Immaturity (IMM) scale in an adolescent psychiatric sample. Assessment, 4, 169-180. Janus, M.D., Tolbert, H., Calestro, K., & Toepfer, S. (1996). Clinical accuracy ratings of MMPI approaches for adolescents: Adding 10 years and the MMPI-A. Journal of Personality Assessment, 67, 364-383. Johnson, C., Archer, R.P., Sheaffer, C.I., & Miller, D. (1992). Relationships between the MAPI and the MMPI in the assessment of adolescent psychopathology. Journal of Personality Assessment, 58, 277-286. Koss, M.P., & Butcher, J.N. (1973).
A comparison of psychiatric patients' self-report with other sources of clinical information. Journal of Research in Personality, 7, 225-236. Krishnamurthy, R., Archer, R.P., & House, J.J. (1996). The MMPI-A and Rorschach: A failure to establish convergent validity. Assessment, 3, 179-191. Lachar, D. (1974). The MMPI: Clinical assessment and automated interpretation. Los Angeles: Western Psychological Services. Lachar, D., & Wrobel, T.A. (1979). Validating clinicians' hunches: Construction of a new MMPI critical item set. Journal of Consulting and Clinical Psychology, 47, 277-284. Lewak, R.W., Marks, P.A., & Nelson, G.E. (1990). Therapist guide to the MMPI & MMPI-2: Providing feedback and treatment. Muncie, IN: Accelerated Development, Inc. MacAndrew, C. (1965). The differentiation of male alcoholic outpatients from nonalcoholic psychiatric outpatients by means of the MMPI. Quarterly Journal of Studies on Alcohol, 26, 238-246. Marks, P.A., & Briggs, P.F. (1972). Adolescent norm tables for the MMPI. In W.G. Dahlstrom, G.S. Welsh, & L.E. Dahlstrom (Eds.), An MMPI handbook: Vol. 1. Clinical interpretation (rev. ed., pp. 388-399). Minneapolis: University of Minnesota Press. Marks, P.A., & Lewak, R.W. (1991). The Marks MMPI adolescent feedback and treatment report [computer program]. Los Angeles: Western Psychological Services. Marks, P.A., & Seeman, W. (1963). The actuarial description of personality: An atlas for use with the MMPI. Baltimore: Williams & Wilkins. Marks, P.A., Seeman, W., & Haller, D.L. (1974). The actuarial use of the MMPI with adolescents and adults. New York: Oxford University Press. Meehl, P.E., & Dahlstrom, W.G. (1960). Objective configural rules for discriminating psychotic from neurotic MMPI profiles. Journal of Consulting Psychology, 24, 375-387. Moreland, K.L. (1983). Diagnostic validity of the MMPI and two short forms. Journal of Personality Assessment, 47, 492-394. Pancoast, D.L., Archer, R.P., & Gordon, R.A. (1988).
The MMPI and clinical diagnosis: A comparison of classification system outcomes with discharge diagnoses. Journal of Personality Assessment, 52, 81-90.

Shaevel, B., & Archer, R.P. (1996). Effects of MMPI-2 and MMPI-A norms on T-score elevations for 18-year-olds. Journal of Personality Assessment, 67, 71-77. Spivack, G., Haimes, P.E., & Spotts, J. (1967). Devereux Adolescent Behavior Rating Scale manual. Devon, PA: The Devereux Foundation. Tyler, F.T. (1951). A factorial analysis of fifteen MMPI scales. Journal of Consulting Psychology, 15, 451-456. Ullmann, L.P., & Wiggins, J.S. (1962). Endorsement frequency in the number of differentiating MMPI items to be expected by chance. Newsletter of Research in Psychology, 4, 29-35. Welsh, G.S. (1956). Factor dimensions A and R. In G.S. Welsh & W.G. Dahlstrom (Eds.), Basic readings on the MMPI in psychology and medicine (pp. 264-281). Minneapolis: University of Minnesota Press. Wiggins, J.S. (1966). Substantive dimensions of self-report in the MMPI item pool. Psychological Monographs, 80 (22, Whole No. 630). Wiggins, J.S. (1969). Content dimensions in the MMPI. In J.N. Butcher (Ed.), MMPI: Research developments and clinical applications (pp. 127-180). New York: McGraw-Hill. Williams, C.L. (1986). MMPI profiles from adolescents: Interpretive strategies and treatment considerations. Journal of Child and Adolescent Psychotherapy, 3, 179-193. Williams, C.L., Ben-Porath, Y.S., & Hevern, B.W. (1994). Item level improvements for use of the MMPI with adolescents. Journal of Personality Assessment, 63, 284-293. Williams, C.L., Butcher, J.N., Ben-Porath, Y.S., & Graham, J.R. (1992). MMPI-A content scales: Assessing psychopathology in adolescents. Minneapolis: University of Minnesota Press.


Chapter 13
Studying Outcome in Adolescents: The Millon Adolescent Clinical Inventory and Millon Adolescent Personality Inventory

Roger D. Davis
Mark Woodward
Antonio Goncalves
Institute for Advanced Studies in Personology and Psychopathology

Sarah E. Meagher
University of Miami and Institute for Advanced Studies in Personology and Psychopathology

Theodore Millon
Harvard University, University of Miami, and Institute for Advanced Studies in Personology and Psychopathology

The Millon inventories have proven extremely popular with clinicians. Although the Millon Clinical Multiaxial Inventory (MCMI), the Millon Behavioral Health Inventory (MBHI), and the Millon Adolescent Personality Inventory (MAPI) are among the top 10 tests administered by psychologists (Piotrowski & Lubin, 1990), the MAPI and its recent revision, the Millon Adolescent Clinical Inventory (MACI), are the only Millon inventories designed specifically for an adolescent population. Like the MCMI, the MAPI and MACI were constructed to be consonant with the multiaxial format of the Diagnostic and Statistical Manual of Mental Disorders (DSM), and are thus geared toward the assessment of both the problematic behaviors and clinical conditions of Axis I and the personality variables of Axis II. Although the main focus of this chapter is the MACI, the MAPI remains in widespread use and is presented as well.

Scales and Structure of the MACI and MAPI

The MAPI is a 150-item, true-false, self-report inventory consisting of eight Personality Styles scales, eight Expressed Concerns scales, and four Behavioral Correlates scales. The eight personality styles described in the MAPI mirror the styles posited by Millon's (1969) theory of personality. These styles, at maladaptive levels, correspond somewhat to the personality disorders described in the DSM-III-R (American Psychiatric Association, 1987). However, a decision was made to avoid the term disorder, defined in DSM-III-R (p. 335) as referring to "behaviors or traits that are characteristic of the person's recent (past year) and long-term functioning since early adulthood," because
the MAPI is normed for adolescents as young as 13 years old. The eight Expressed Concerns scales focus on worries that many teens experience at one time or another, and the remaining four scales address specific behavioral issues.

Previously, two separate answer forms were available: the MAPI(G) for educational and guidance purposes and the MAPI(C) for clinical cases. The MACI, with its several new, clinically oriented scales, supplants the MAPI(C) for use in assessing clinical cases within the teenage population. The MAPI is now intended only for nonclinical educational and vocational appraisals, and can be used with a sixth-grade or higher reading level. The structure of the MACI and MAPI is given in Table 13.1. Scale descriptions for the MAPI are given in Table 13.2.

Since the publication of the DSM-III in 1980, a total of 14 personality constructs have been represented in the body of Axis II or in the appendix. Sadistic and Self-Defeating were added to the appendix of DSM-III-R. In the DSM-IV (APA, 1994), both of these disorders were dropped, the Depressive was added, and the Passive-Aggressive was broadened in content and renamed the Negativistic; both of the latter disorders were placed in the appendix.

TABLE 13.1
Structure and Scales of the MAPI and MACI

MAPI                          MACI

Personality Styles
1. Introversive               1. Introversive
2. Inhibited                  2a. Inhibited
                              2b. Depressive
3. Cooperative                3. Submissive
4. Sociable                   4. Dramatizing
5. Confident                  5. Egotistic
6. Forceful                   6a. Unruly
                              6b. Forceful
7. Respectful                 7. Conforming
8. Sensitive                  8a. Oppositional
                              8b. Self-demeaning
                              9. Borderline Tendency

Expressed Concerns
A. Self-concept               A. Identity Confusion
B. Personal Esteem            B. Self-devaluation
C. Body Comfort               C. Body Disapproval
D. Sexual Acceptance          D. Sexual Discomfort
E. Peer Security              E. Peer Insecurity
F. Social Tolerance           F. Social Insensitivity
G. Family Rapport             G. Family Discord
H. Academic Confidence

Behavioral Correlates         Clinical Indices
SS. Impulse Control           AA. Eating Dysfunctions
TT. Societal Conformity       BB. Academic Noncompliance
UU. Scholastic Achievement    CC. Alcohol Predilection
WW. Attendance Consistency    DD. Drug Proneness
                              EE. Delinquent Disposition
                              FF. Impulsivity Propensity
                              GG. Anxious Feelings
                              HH. Depression Affect
                              II. Suicidal Ideation
TABLE 13.2
MAPI Scale Descriptions

Scale                     Descriptors

Personality Styles
Introversive              Quiet and unemotional; interpersonally remote due to indifference toward others
Inhibited                 Shy; socially ill at ease; lonely, yet keeps to self due to fear of rejection
Cooperative               Avoids asserting self, letting others take the lead; plays down own achievements and underestimates own abilities; kind and sentimental in relationships
Sociable                  Talkative; charming; dramatic and emotionally expressive; easily bored with routine and long-term relationships
Confident                 Rarely doubtful of own self-worth; seen by others as self-centered and egocentric; takes others for granted
Forceful                  Tends to lead and dominate; strong willed and tough minded; blunt, unkind, and impatient with others
Respectful                Rule conscious, serious minded, and efficient; lives orderly life; avoids unexpected and unpredictable situations; behaves properly
Sensitive                 Unpredictable shifts of mood; negative attitude; discontented and pessimistic

Expressed Concerns
Self-concept              Clarity of one's identity or self-image
Personal Esteem           Level of satisfaction with oneself
Body Comfort              Level of satisfaction with body development and personal appearance
Sexual Acceptance         Attitudes regarding emerging sexuality and its associated impulses
Peer Security             Feelings of acceptance by one's peers
Social Tolerance          Degree of empathy for others, especially peers
Family Rapport            Degree of conflict and tension with family members
Academic Confidence       Extent to which one feels comfortable and/or satisfied with school performance

Behavioral Correlates
Impulse Control           Degree of control over problematic impulses
Societal Conformity       Inability or unwillingness to comply with social regulations
Scholastic Achievement    Influences resulting in underachievement
Attendance Consistency    Signs of either school phobia or school truancy

The magnitude of these content changes required that the MAPI(C) be revised in order to coordinate the Millon clinical inventories more closely with the DSM-IV. The resulting revision of the MAPI(C), the MACI, is a 160-item, true-false, self-report inventory that both corresponds more closely to the DSM-IV personality constructs and assesses those clinical issues seen more frequently among troubled adolescents. Whereas the distinction between incipient adolescent personality styles and adult personality disorders was retained, all MACI scales received more negatively toned labels to reflect the inventory's clinical focus. The MACI's 12 personality scales include revisions of the original 8 from the MAPI, as well as the Doleful, Forceful, Self-demeaning, and Borderline Tendency scales. The clinical codes for these constructs parallel those of the MCMI and reflect the underlying generative theory on which all the Millon inventories are based.

Changes also have been made to the Expressed Concerns scales. Whereas the MAPI focused on expressed concerns within the context of a more normal adolescence, the expressed concerns of more clinically disordered youths assume a more negative tone. The MAPI measures level of Personal Esteem, and the MACI assesses Self-devaluation. Family Rapport in the MAPI is translated into Family Discord in the MACI, and so on. Similarly, the item content of these scales has been revised to allow discrimination
within clinical populations. Moreover, whereas the MAPI includes four scales that address the behavioral issues of Impulse Control, Societal Conformity, Scholastic Achievement, and Attendance Consistency, the events that bring adolescents to the attention of clinicians often take the form of more maladjusted behaviors. For this reason, the MACI includes nine Clinical Indices oriented to such serious problems as eating dysfunctions, substance dependencies, mood disorders, and nonconformity behaviors. Given this increased clinical focus, approximately 70% of the MACI items are unique, that is, not contained in the MAPI.

Construction of the MAPI and MACI

The Role of Theory in Test Construction

Unlike most instruments widely used in psychological assessment, both the MAPI and MACI were constructed through a synthesis of theoretical and empirical perspectives, notably the biopsychosocial reinforcement (Millon, 1969, 1981) and evolutionary (Millon, 1990) theories of personality and its disorders. Although an early proponent of the relatively blind, empirical, criterion-keying approach used to develop tests such as the MMPI, Meehl (1972) took the view that theory should be an indispensable part of test construction: "I now think that at all stages in personality test development, from initial phase of item pool construction to a late-stage optimized clinical interpretive procedure for the fully developed and 'validated' instrument, theory (and by this I mean all sorts of theory, including trait theory, developmental theory, learning theory, psychodynamics, and behavior genetics) should play an important role" (p. 150).

The theory underlying the eight basic personality styles assessed by the MAPI can be explained using two basic dimensions to form a four-by-two matrix. One dimension describes an individual's basic coping pattern as either active or passive, depending on how the person usually behaves to obtain pleasure and minimize pain. The other dimension pertains to the primary source from which the individual gains this reinforcement, either from self or others. Individuals who receive little reinforcement from self or others are termed "Detached." Individuals whose values are based primarily on what others think and feel about them are called "Dependent," and those who derive reinforcement through themselves are termed "Independent." Finally, some persons, termed "Ambivalent," develop a style born out of conflict between opposing dependent and independent tendencies. Crossing these theoretical dimensions results in the eight personality styles addressed by the MAPI: the passive-detached (Introversive), active-detached (Inhibited), passive-dependent (Cooperative), active-dependent (Sociable), passive-independent (Confident), active-independent (Forceful), passive-ambivalent (Respectful), and active-ambivalent (Sensitive).

In contrast, the theory on which the MACI is grounded reflects advances both in Millon's personality theory (Millon, 1990) and recent developments in the DSM. A supplementary dimension has been added, reflecting a reversal of reinforcement between pleasure and pain. Those termed passive-discordant were referred to as "self-defeating personalities" in the DSM-III-R, whereas those termed active-discordant were referred to as "sadistic personalities." Additionally, the MACI includes a scale that assesses structural pathology of personality, the Borderline Tendency scale. The Depressive
personality, presented in the appendix of DSM-IV, is interpreted as a passive-pain orientation; its clinical code reflects its relation to the Avoidant personality. The former represents an acceptance of pain, whereas the latter reflects the anticipation of pain. The adolescent stylistic variants of these disorders are represented in the Doleful (2b) and Inhibited (2a) scales, respectively. Admittedly, the pervasiveness of both depression and anxiety across both Axis I and Axis II presents challenges to psychometricians who would tease apart what is long-standing and pervasive from what is transient and situational or reactive.

Stages of Development

Validity is a consideration at all phases of test development, not a quality to be examined once inventory items have been finalized. In contrast to such established inventories as the MMPI-2 and MMPI-A, modern psychological inventories are constructed by balancing a variety of theoretical-substantive, internal-structural, and external-criterion parameters (Jackson, 1970; Loevinger, 1957). This section reviews the construction of the MAPI, because it served as the foundation of the MACI.

The theoretical-substantive stage concerns how closely the content of the individual scale items matches the guiding theory behind the instrument and the constructs it measures. For the MAPI, the initial theory-driven item pool for the personality style scales was derived from personality and abnormal psychology textbooks and a review of other psychological tests. Over 1,000 items formed the initial pool, many of which were specially written for their respective constructs. After numerous studies (see Millon, Green, & Meagher, 1982, for a synopsis), the MAPI Personality Style scales were trimmed to just 64 items, and the Expressed Concerns scales to 80 items. Six validational items were generated for a total MAPI of 150 items.

The second stage of test construction, internal-structural validation, was driven by theoretically predicted relations between scales, not factorial requirements. Because the underlying theory predicts a certain degree of scale overlap, internal-structural validation could not center on a factor analytic search for pure personality traits. Both the Inhibited and Introversive personality styles, for example, are related through their detached coping style. Likewise, content overlap also may occur logically between some Personality Style scales and those in the Expressed Concerns, because some personalities are inclined toward particular concerns and issues rather than others. The goal of internal-structural validation, then, was not the elimination of items that could be logically assigned to multiple scales. Instead, internal scale consistency required that each particular item show its strongest, but not necessarily its only, correlation with its own theoretically designated scale. The assignment of items to multiple scales also allows the number of test items to be kept at a minimum.

The last stage, external-criterion validation, involved the administration of the final test form to a 2,157-member "normal" comparison group, and a 430-member "problem" criterion group chosen from clinical and school counseling settings. Item responses from individuals with specific diagnosed psychopathology were then compared to the responses within the criterion group. This procedure enhances differential diagnosis, and stands in contrast to the approach used to construct some other personality inventories. For example, the authors of the MMPI simply compared the responses of groups judged to belong to particular diagnostic categories with the responses of "normals." Meehl and Rosen (1955) argued persuasively against such a procedure. External validation
also included clinical judgment data from the psychologists, counselors, and social workers who administered the MAPI to the 430 clinical criterion group subjects. Blind to the results of the test, these professionals were asked to rate their respective clients using a "clinical judgment form," which described the eight basic personality styles. The four Behavioral Correlates scales were derived by determining which items statistically differentiated criterion from comparison groups. Although the significant items were assessed later as to their content and internal consistency, empirical considerations were given primary attention with these four scales.

Construction of the MACI followed the same three-stage logic outlined earlier, building on the foundation created by the MAPI. The MACI now includes three modifying indices that assess the response styles of examinees. The first scale, Disclosure, appraises the degree to which patients are open and revealing of themselves. The two other scales, Desirability and Debasement, assess efforts to present oneself in a good or bad light, respectively. Because these response styles affect the validity of other scales, they were used to develop certain correction factors. This idea should not be new to persons familiar with tests like the MMPI and MCMI, which use such scales for similar purposes. Additionally, the modifying scales may be of intrinsic interest to clinicians. Information regarding the way patients wish to present themselves, for example, by responding openly and frankly, or by denying or concealing pathology, is often of special assistance to clinicians during early treatment planning.

Reliability

Test-retest and cross-validated internal consistency reliabilities for the MACI and MAPI are presented in Table 13.3. All range from sufficient to superior for clinical assessment and support the use of the MACI and MAPI in outcome research.
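
For readers who want to see how the two kinds of coefficient reported in Table 13.3 are typically computed, the following is a minimal sketch, not the procedure documented in the test manuals. It assumes that internal consistency is Cronbach's alpha computed on true-false items scored 0/1 and that test-retest reliability is a Pearson correlation between scale scores from two administrations. The function names and all data are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: one list per respondent, each holding k item scores
    (here 0 = false, 1 = true). Sample variances are used throughout.
    """
    k = len(responses[0])
    item_vars = [variance([r[i] for r in responses]) for i in range(k)]
    total_var = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: five respondents answering a four-item scale.
item_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(item_responses)

# Hypothetical scale scores for the same five people a few days apart.
first_testing = [4, 3, 2, 1, 0]
second_testing = [4, 2, 3, 1, 0]
retest_r = pearson_r(first_testing, second_testing)
```

Real reliability studies use far larger samples than this toy example; the note to Table 13.3 reports N = 333 for cross-validation and N = 47 for retest.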

TABLE 13.3
MACI Alpha and Test-Retest Reliability Coefficients

MACI Scales                      Number of Items   Cross-validation   Test-Retest

Personality Patterns
1   Introversive                       44               .82               .63
2A  Inhibited                          37               .86               .70
2B  Doleful                            24               .85               .83
3   Submissive                         48               .73               .88
4   Dramatizing                        41               .84               .70
5   Egotistic                          39               .82               .82
6A  Unruly                             39               .83               .79
6B  Forceful                           22               .81               .85
7   Conforming                         39               .86               .91
8A  Oppositional                       43               .82               .76
8B  Self-demeaning                     44               .89               .88
9   Borderline Tendency                21               .86               .92

Expressed Concerns
A   Identity Diffusion                 32               .76               .77
B   Self-devaluation                   38               .90               .85
C   Body Disapproval                   17               .84               .89
D   Sexual Discomfort                  37               .69               .74
E   Peer Insecurity                    19               .77               .57
F   Social Insensitivity               39               .79               .83
G   Family Discord                     28               .76               .89
H   Childhood Abuse                    24               .81               .81

Clinical Syndromes
AA  Eating Dysfunction                 20               .85               .78
BB  Substance Abuse Proneness          35               .88               .90
CC  Delinquent Predisposition          34               .76               .80
DD  Impulsive Propensity               24               .75               .78
EE  Anxious Feelings                   42               .75               .85
FF  Depressive Affect                  33               .88               .81
GG  Suicidal Tendency                  25               .87               .91

Modifying Indices
X   Disclosure                                                            .86
Y   Desirability                       17               .75               .71
Z   Debasement                         16               .85               .84

Note. For cross-validation samples, N = 333; test-retest interval of 3-7 days, N = 47.

Scoring and Interpretation

Third-party payers are increasingly requesting documentation in support of psychological diagnoses. Although the responsibility of mental health professionals is primarily to the welfare of their clients, psychological assessment should nevertheless serve both sides beneficially. Here, outcome assessment is concerned with a single subject. At the beginning of treatment, the question is: What are the subject's clinical diagnoses, and how do they relate to the subject's personality characteristics, level of functioning, and current psychosocial milieu? Near the end of treatment, the question is: Which of the subject's problems have been addressed, and what degree of progress has been made?

Step 1: Scoring

Proper use of the MACI and MAPI first requires that the instruments be properly scored. Given the number of scales that comprise both the MAPI and the MACI, the assignment of items to multiple scales with differential weighting, the existence of separate group norms based on gender and age differences, the various correction factors that may be invoked depending on response style, and the unavoidable probability of human error at some point in the scoring process, it becomes clear that only computer scoring can reduce the probability of scoring error to negligible levels. Besides accuracy and speed, computer scoring has a number of other distinct advantages. Test forms and computer scoring are available through the publisher. Completed forms may be processed for scoring and interpretation using an in-office personal computer system that provides immediate feedback, or they may be mailed or scanned using telecommunications devices.

Most psychologists are familiar with the transformation of raw scores into other metrics, such as percentile ranks or T-scores. In contrast, in the MACI and MAPI raw scores are transformed into what are termed "Base Rate" (BR) scores.
Whereas T-scores implicitly assume that personality patterns and clinical syndromes are equally prevalent and normally distributed within the clinical population, BR scores do not. Clinical wisdom argues strongly that within any given psychological treatment venue,
certain disorders are more common than others. For example, Borderline Personality Disorder and Depression are relatively more common, whereas Sadistic Personality and Systematic Delusions are relatively rare. If approximately twice as many subjects are diagnosed depressed as schizophrenic, then the test should be constructed to mirror this reality. The BR score is a distinct shift from more conventional cutoff assignments, not only in jettisoning the tradition of identifying pathology by employing a single, universal standard score cutoff for all scales (e.g., two standard deviations above the mean), but also in establishing cutoffs on prevalence data based on representative and relevant national norms.

Step 2: Making Diagnoses

For the MAPI, prevalence estimates were provided by the clinical appraisals of mental health professionals, who rated a total of 430 subjects in terms of their basic personality style. After minor adjustments to reduce false positives, two arbitrary BR thresholds were set as signifying the presence or prominence of scale features. Scale scores of BR 75 and above identify the presence of each personality style, expressed concern, behavioral correlate, or clinical index. For the MACI, prevalence estimates were determined not only through data from the normative sample, but also from the author's years of experience gained through use of the MAPI. In the MACI personality scales, scores of BR 75 and BR 85 indicate the presence of a trait or disorder or its prominence, respectively. The same is true for the Clinical Indices and Expressed Concerns scales. However, in cases where the term disorder is not applicable (as with Family Discord, for example), an elevation at BR 75 or above indicates this to be an area worthy of clinical attention.

Step 3: Configural Interpretation

The interpretive yield of an instrument is significantly enhanced when its scales are interpreted configurally rather than one at a time.
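
The threshold logic of Step 2, and the first move of a configural reading, can be sketched in a few lines. This is an illustration only and not the publisher's scoring algorithm: the BR 75 and BR 85 thresholds come from the text, but the function names, the rule of listing at most three scales, the restriction of that list to scales at or above BR 75, and all scores are assumptions made for the example. The last function illustrates the general idea of anchoring a cutoff to a prevalence estimate instead of a fixed standard-score cutoff; the actual raw-to-BR transformation is not described in this chapter.

```python
PRESENCE_BR = 75    # chapter: BR 75 and above signals presence
PROMINENCE_BR = 85  # chapter: BR 85 signals prominence on the MACI


def classify_br(br):
    """Label a single BR score using the two thresholds above."""
    if br >= PROMINENCE_BR:
        return "prominent"
    if br >= PRESENCE_BR:
        return "present"
    return "not elevated"


def highest_elevations(profile, max_scales=3):
    """Return up to max_scales elevated scales, highest BR first.

    Limiting the list to scales at or above BR 75 is an assumption
    made for this sketch.
    """
    elevated = [(name, br) for name, br in profile.items() if br >= PRESENCE_BR]
    elevated.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in elevated[:max_scales]]


def prevalence_cutoff(raw_scores, prevalence):
    """Raw score met or exceeded by roughly `prevalence` of the sample.

    Tied scores can push the realized proportion above the target.
    """
    ranked = sorted(raw_scores, reverse=True)
    n_cases = max(1, round(prevalence * len(ranked)))
    return ranked[n_cases - 1]


# Hypothetical BR scores on five MACI personality scales.
profile = {"Inhibited": 88, "Doleful": 79, "Submissive": 62,
           "Oppositional": 75, "Egotistic": 30}
labels = {name: classify_br(br) for name, br in profile.items()}
top_scales = highest_elevations(profile)

# Hypothetical raw scores on one scale for a ten-person sample.
raw = [3, 5, 8, 10, 12, 14, 15, 18, 20, 22]
cutoff_common = prevalence_cutoff(raw, 0.20)  # more prevalent condition
cutoff_rare = prevalence_cutoff(raw, 0.10)    # rarer condition
```

In this toy sample the rarer condition receives the higher raw-score cutoff, which is the sense in which a prevalence-based metric departs from a single universal cutoff applied to every scale.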
For the personality scales, configural interpretation is not just a matter of clinical convention. Instead, it is formally required by the nature of personality itself, the dynamic patterning of variables across all domains of the person. Just as individual scales stand in place of and assess hypothetical constructs, a personality profile should stand in place of the person. This naturally leads to the idea of "idiographic validity," the person-centered parallel of construct validity. It reflects the extent to which the clinical formulation or case conceptualization derived from the assessment in fact mirrors individual reality. Because personality is an intrinsically integrative construct, an adequate assessment of personality requires a configural synthesis of the personality scales available in any given instrument. Idiographic validity is maximized where the scales that compose the personality profile are themselves anchored to a coherent and generative personality theory. The two or three most highly elevated Axis II scales are thus taken as the basic interpretive context for case conceptualization, into which the meaning of symptom patterns, in conjunction with psychosocial events and individual history, is integrated. Ideally, this idiographically valid case
formulation would become the foundation of treatment planning and thus the baseline against which any outcome assessment would be made.

In contrast, there is no necessary reason why the Expressed Concerns and Clinical Indices scales must be interpreted configurally. Elevations among these scales are best interpreted in the context of the Axis II personality styles scales, the Axis IV psychosocial milieu, and current stressors. The logic of the multiaxial model holds that personality characteristics transform the meaning of clinical syndromes. For example, depression in a narcissistic patient has a different meaning and requires a different intervention than does depression in a compulsive subject. In addition, valid interpretation also should take into account the individual's background, biographic history, and other auxiliary data.

Step 4: Configural Domain Synthesis

Personality styles and disorders are not unitary traits. Instead, they are higher order constructs representing characteristics covariant across the entire matrix of the person. Although personality may be considered as being exclusively psychodynamic or exclusively biological, such positions are narrow and restrictive. The integrative perspective encouraged here views personality as a multidetermined and multireferential construct that may be profitably studied and assessed across a variety of content areas. Diagnostic features are best distinguished in accord with the data levels they represent (biophysical, intrapsychic, phenomenological, and behavioral), in accord with the four historic approaches that characterize the study of psychopathology. These domains can be systematically organized in a manner similar to distinctions drawn in the biological realm, that is, by dividing them into structural and functional attributes.

Functional domains represent "expressive modes of regulatory action," that is, behaviors, social conduct, cognitive processes, and unconscious mechanisms that manage, adjust, transform, coordinate, balance, discharge, and control the give and take of inner and outer life. The Expressive Acts domain relates to the observables seen at the "behavioral level" of data. The Interpersonal Conduct domain captures how actions impact on others, intended or otherwise; the attitudes that underlie, prompt, and give shape to these actions; the methods by which others are engaged in satisfying needs; and ways of coping with social tensions and conflicts. The Cognitive Style domain assesses how the patient focuses and allocates attention, encodes and processes information, organizes thoughts, makes attributions, and communicates reactions and ideas. Finally, the Regulatory Mechanisms domain captures efforts at self-protection, need gratification, and conflict resolution derived primarily from an unconscious level.

In contrast to functional characteristics, structural attributes represent a deeply embedded and relatively enduring template of imprinted memories, attitudes, needs, fears, conflicts, and so on, which guide experience and transform the nature of ongoing life events. Psychic structures have an orienting and preemptive effect in that they alter the character of action and the impact of subsequent experiences in line with preformed inclinations and expectancies. For purposes of definition, structural domains may be conceived as substrates and action dispositions of a quasi-permanent nature. The Self-image domain seeks to crystallize the patient's implicit sense of who they are, though these schemes differ greatly in clarity, accuracy, and complexity. The Object Representations
domain synthesizes internalized representations of significant figures and relationships of the past. The Morphologic Organization domain conveys the overall architecture of the "psychic interior," referring to the structural strength, interior congruity, and functional efficacy of the personality system. Finally, Mood/Temperament variables are conveyed by terms such as distraught, labile, fickle, or hostile, but are revealed as well by level of activity, speech quality, and physical appearance.

Domain descriptions for the Avoidant personality are given in Table 13.4 for illustrative purposes. Although these have been developed for adult personalities and are not presented in the MACI or MAPI manuals, their characteristics may be extrapolated backward to adolescents. There is, after all, continuity between adolescence and adulthood. Because the personalities of adolescents are, however, presumably more malleable or less crystallized than those of adults (so that the term personality disorder is strictly inapplicable), clinicians who draw on these descriptions should adjust their interpretations to reflect lower levels of severity. Pure prototypes are seldom encountered in clinical practice; in the vast majority of cases, individuals receive elevated scores on multiple scales.

TABLE 13.4
Avoidant Descriptors

Behavioral Level
(F) Expressively Fretful (e.g., conveys personal unease and disquiet, a constant timorous, hesitant, and restive state; overreacts to innocuous events and anxiously judges them to signify ridicule, criticism, and disapproval).
(F) Interpersonally Aversive (e.g., distances from activities that involve intimate personal relationships and reports extensive history of social pan-anxiety and distrust; seeks acceptance, but is unwilling to get involved unless certain to be liked, maintaining distance and privacy to avoid being shamed and humiliated).

Phenomenological Level
(F) Cognitively Distracted (e.g., warily scans environment for potential threats and is preoccupied by intrusive and disruptive random thoughts and observations; an upwelling from within of irrelevant ideation upsets thought continuity and interferes with social communications and accurate appraisals).
(S) Alienated Self-image (e.g., sees self as socially inept, inadequate, and inferior, thereby justifying isolation and rejection by others; feels personally unappealing, devalues self-achievements, and reports persistent sense of aloneness and emptiness).
(S) Vexatious Objects (e.g., internalized representations are composed of readily reactivated, intense, and conflict-ridden memories of problematic early relationships; limited avenues for experiencing or recalling gratification, and few mechanisms to channel needs, bind impulses, resolve conflicts, or deflect external stressors).

Intrapsychic Level
(F) Fantasy Mechanism (e.g., depends excessively on imagination to achieve need gratification, confidence building, and conflict resolution; withdraws into reveries as a means of safely discharging frustrated affectionate, as well as angry, impulses).
(S) Fragile Organization (e.g., a precarious complex of tortuous emotions depends almost exclusively on a single modality for its resolution and discharge, that of avoidance, escape, and fantasy and, hence, when faced with personal risks, new opportunities, or unanticipated stress, few morphologic structures are available to deploy and few back-up positions can be reverted to, short of regressive decompensation).

Biophysical Level
(S) Anguished Mood (e.g., describes constant and confusing undercurrent of tension, sadness, and anger; vacillates between desire for affection, fear of rebuff, embarrassment, and numbness of feeling).
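
The organization of the eight domains by data level and by structural versus functional status can be laid out as a small data structure. The sketch below is one possible reading of the text and of Table 13.4, not a published coding scheme. It assumes that the (F) and (S) markers in the table stand for functional and structural, and the pairing of each Avoidant descriptor with a named domain is inferred from the wording. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str      # domain named in the text
    level: str     # data level at which the domain is observed
    kind: str      # "functional" or "structural"
    avoidant: str  # Avoidant descriptor from Table 13.4 (inferred pairing)


DOMAINS = [
    Domain("Expressive Acts", "behavioral", "functional", "Expressively Fretful"),
    Domain("Interpersonal Conduct", "behavioral", "functional", "Interpersonally Aversive"),
    Domain("Cognitive Style", "phenomenological", "functional", "Cognitively Distracted"),
    Domain("Self-image", "phenomenological", "structural", "Alienated Self-image"),
    Domain("Object Representations", "phenomenological", "structural", "Vexatious Objects"),
    Domain("Regulatory Mechanisms", "intrapsychic", "functional", "Fantasy Mechanism"),
    Domain("Morphologic Organization", "intrapsychic", "structural", "Fragile Organization"),
    Domain("Mood/Temperament", "biophysical", "structural", "Anguished Mood"),
]


def group_by(domains, attribute):
    """Group domain names by the value of one attribute ('level' or 'kind')."""
    groups = {}
    for domain in domains:
        groups.setdefault(getattr(domain, attribute), []).append(domain.name)
    return groups


by_level = group_by(DOMAINS, "level")
by_kind = group_by(DOMAINS, "kind")
```

Grouping by `kind` returns four functional and four structural domains, and grouping by `level` reproduces the four headings of Table 13.4.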


Factor Subscales

Personality may be described on several levels of abstraction. Personality styles represent the covariant structure of personality traits. When these styles are expressed rigidly, they tend to create and perpetuate problems over and over again. Alternatively, they may predispose the person to the development of symptoms, and thus shade into personality disorders.

The content of personality assessment instruments can be examined using any number of empirical methods, including cluster and factor analysis. If factor analysis is chosen, a decision must be made whether to factor scales or items. If items are chosen, a further decision must be made concerning whether the items should be grouped in some logical fashion, that is, whether the items assigned to Axis II should be factored separately from those assigned to Axis I, whether only the items within a particular personality cluster should be factored, or whether only the items within a particular scale should be factored. Further, where items are weighted depending on their centrality to the construct assessed, as in the Millon inventories, a decision must be made concerning whether only core features should be factored (for the MACI, those weighted either three or two points) or whether the analysis should include all scale items (i.e., both core and peripheral features). Different choices lead to different results.

Thus far, exploratory studies with the MACI personality scales using data from the normative sample have been conducted by factoring all the items within each scale. First, 3-, 4-, 5-, 6-, and 7-factor solutions were extracted for each scale. Next, the resulting item loadings were inspected to determine which solution best conformed to theoretical expectations. Finally, the internal consistencies of each subscale were calculated, and those found to be inadequate were dropped. A list of the resulting factors is presented in Table 13.5.
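
The last two steps of this procedure, inspecting loadings and screening subscales for internal consistency, can be sketched as follows. The chapter does not report the criteria that were used, so everything numeric here is an assumption: items are assigned to the factor on which they load most strongly, a minimum loading of .30 is required, and a minimum alpha of .60 is taken as "adequate." The loading matrix is hypothetical and would in practice come from factor-analysis software, once for each candidate solution. Function names and data are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_var_sum = sum(variance([r[i] for r in rows]) for i in range(k))
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def assign_items(loadings, min_loading=0.30):
    """Assign each item to the factor with its largest absolute loading.

    loadings[i][f] is the loading of item i on factor f. Items whose
    best loading falls below min_loading (an assumed cut) are left out.
    """
    subscales = {}
    for item, row in enumerate(loadings):
        best = max(range(len(row)), key=lambda f: abs(row[f]))
        if abs(row[best]) >= min_loading:
            subscales.setdefault(best, []).append(item)
    return subscales


def screen_subscales(responses, subscales, min_alpha=0.60):
    """Keep subscales whose alpha reaches min_alpha (an assumed floor)."""
    kept = {}
    for factor, items in subscales.items():
        if len(items) < 2:
            continue  # alpha is undefined for a single item
        rows = [[r[i] for i in items] for r in responses]
        if cronbach_alpha(rows) >= min_alpha:
            kept[factor] = items
    return kept


# Hypothetical two-factor solution for a five-item scale.
loadings = [[0.7, 0.1], [0.6, 0.2], [0.1, 0.5], [0.2, 0.6], [0.1, 0.2]]

# Hypothetical true-false responses (1 = true) from six respondents.
responses = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1],
]

candidate = assign_items(loadings)
surviving = screen_subscales(responses, candidate)
```

In this toy example the fifth item loads too weakly to be assigned, and the second candidate subscale is dropped because its two items do not cohere.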
Because factor analysis relies on the covariance of items, not the item weight, the logical distinction between more core and more peripheral features is lost. Items that are assumed to be prototypal of their respective constructs cannot necessarily be assumed to be prototypal of the traits extracted through factor analysis. Moreover, the relatively large number of factors relative to scales, which already share a proportion of items, means that the issue of item overlap is amplified for the subscales. Some subscales share all their items. The surviving subscales were thus named within the context of the personality style from which they were derived, and not on the basis of item content alone. The advantage of this method is that it provides a loose guide to the content of the personality prototypes factored. Future factor studies will be directed toward comparing results obtained with only the more central items (those weighted three and two points) against results obtained with both central and peripheral features, which constitute the factors presented here.

The Automated Interpretive Report

In addition to their own interpretive skills, clinicians have available to them a large database of automated MACI and MAPI reports. Whereas professionals may be able to remember some features associated with a number of the more common profile configurations, a computer provides quick access to all salient data for each possible profile. Moreover, even experienced clinicians may have difficulty interpreting novel profiles. Due to their low frequency, these relatively uncommon configurations receive little actuarial attention. Both the MACI and MAPI use the underlying theory to inform


TABLE 13.5
Factor Content Scales of MACI Personality Scales

Scale: Factors
Introversive: Existential Aimlessness; Anhedonic Affect; Social Isolation; Sexual Indifference
Inhibited: Existential Sadness; Preferred Detachment; Self-conscious Restraint; Sexual Aversion; Rejection Feelings; Unattractive Self-image
Doleful: Brooding Melancholia; Social Joylessness; Self-destructive Ideation; Abandonment Fears
Submissive: Deficient Assertiveness; Authority Respect; Pacific Disposition; Attachment Anxiety; Social Correctness; Guidance Seeking
Dramatizing: Convivial Sociability; Attention Seeking; Attractive Self-image; Optimistic Outlook; Behavioral Disinhibition
Egotistic: Admirable Self-image; Social Conceit; Confident Purposefulness; Self-assured Independence; Empathic Indifference; Superiority Feelings
Unruly: Impulsive Disobedience; Socialized Substance Abuse; Authority Rejection; Unlawful Activity; Callous Manipulation; Sexual Absorption
Forceful: Intimidating Abrasiveness; Precipitous Anger; Empathic Deficiency
Conforming: Interpersonal Restraint; Emotional Rigidity; Rule Adherence; Social Conformity; Responsible Conscientiousness
Oppositional: Self-punitiveness; Angry Dominance; Resentful Discontent; Social Inconsiderateness; Contrary Conduct
Self-demeaning: Self-ruination; Low Self-valuation; Undeserving Self-image; Hopeless Outlook
Borderline Tendency: Empty Loneliness; Capricious Reactivity; Uncertain Self-image; Suicidal Impulsivity


such sparse actuarial data. The resulting interpretive report is considered a professional consultation, and constitutes a rich source of clinical hypotheses from which relevant descriptive paragraphs or phrases may be culled when writing the final interpretive report. Every individual is unique, but there are only a finite number of clinical reports and variations. Thus, the professional is advised to personalize the report by adapting it to the unique characteristics and psychosocial situation of the subject.

Treatment Planning

The idea of using standardized instruments for treatment planning and the assessment of outcome is controversial. According to Choca, Shanley, and Van Denburg (1992), some maintain that the most important information about a client can only be obtained through personal interview sessions, and others contend that testing before the onset of, or during, treatment obfuscates the therapeutic relationship (Dewald, 1967). Finally, some researchers have attached little clinical significance to assessment or diagnoses (Beutler, 1989), whereas others have believed testing during treatment to be almost always detrimental (Langs, 1973). However, Choca et al. (1992) also cited several other sources that show that assessment is relied on and encouraged by a sizable number of clinicians (Berndt, 1983; van Reken, 1981).

In some cases, the individual's current psychic state is such that immediate intervention is warranted to protect the subject from self or others. Although these conditions are typically assessed as part of the clinical interview, the subject's status may be further inspected through the examination of so-called noteworthy responses. Here, the response to a single item suggests a condition requiring immediate clinical attention, such as suicidal or homicidal intentions. For example, Item 16 states, "I think everyone would be better off if I were dead."
Alternatively, a noteworthy response may suggest conditions that should be addressed in therapy. For example, Item 137 states, "People did things to me sexually when I was too young to understand."

Most clinical cases, however, do not require immediate crisis hospitalization. In the era of managed care, therapy is brief, and the most relevant clinical goal is remediation of those problems that are currently most pressing. Although personality provides an important context for the development of Axis I symptoms, brief therapy requires that only the most troublesome issues be considered. Here, Personality Style scales are deemphasized, and the Expressed Concerns and Clinical Indices become the proper focus of treatment efforts. Given that only the most observable and vivid problems will be treated, behavioral or cognitive-behavioral interventions can be expected to dominate. The clinical question is: How can current problems best be addressed or stopped? Whatever direction therapy eventually takes, the relatively high test-retest reliabilities of the MACI scales make outcome assessment a relatively simple affair: The test can simply be administered again at a later date, with the difference between beginning and final BR scores becoming a rough measure of therapeutic change.

Where therapy is less time limited, the focus shifts from the immediate problem to the subject's characteristic way of viewing and responding to the world as the major predisposing factor in the development and perpetuation of psychological symptoms. Here, the Personality Style scales move into the foreground. The clinical question is: What characteristics do individuals possess that cause them to perpetuate the same dysfunctional coping responses over and over again? Rigid and extreme personality


styles are thus viewed as major factors that contribute to a vulnerability toward symptom development, be it anxiety, depression, or other Axis I syndromes. As Choca et al. (1992) related, "In the majority of cases we see, especially after the symptomatology diminishes, the client is left to struggle with cumbersome or pathological personality traits" (p. 199).

A hypothetical example may concern an emaciated anorexic who presents with elevated Borderline Tendency, Identity Confusion, Body Disapproval, and Eating Dysfunctions scale scores. Such a person might require immediate medical supervision, supplemented with behavioral therapy. After some degree of physical stability has been attained, supportive, insight-oriented, or even family therapy might be administered depending on the elevation and configuration of other scales.

The construction of treatment plans based on configural codes is best accomplished on the basis of the case conceptualization already outlined. However, knowledge of typical issues that different personalities bring to therapy in their prototypal form can be valuable when developing plans for individuals whose clinical codetype synthesizes multiple scales. For example, because an avoidant personality's mistrust of others contributes to and reinforces social withdrawal, development of a therapeutic alliance presents a special challenge. This introductory process may require an extended period of supportive enhancement of the client's self-esteem. Once the bond has been formed, the second phase of treatment may center on evoking insight regarding the client's unique etiology. Such reappraisal may help the client recognize current problems and deal with them more effectively.
The following techniques may prove helpful as adjuncts: medication and/or behavior modification to alleviate stresses involved with therapy and its generalization, principles of cognitive therapy to counter distorted thinking patterns, and family and group therapy to improve communicative and social skills.

Unlike the avoidant, the dependent personality typically poses no threat to the early development of the therapeutic bond. Such a client usually is eager to assume the familiar submissive stance within the therapeutic milieu. Thus, although the introductory stage of treatment may move quickly and smoothly, the client will be highly resistant to the therapist's later efforts to engender a healthy degree of autonomy. Directive therapies are logically contraindicated because these would simply reinforce the client's dependency needs. Nondirective dynamic and humanistic approaches usually emphasize the importance of the client and, over time, can be effective in improving self-esteem. These therapies may be too anxiety provoking for severe dependents, however. In these cases, medication may be required before the client is capable of producing the insight needed for change. Through additional group treatment, the dependent may learn new social skills and gain increased self-confidence.

In contrast, the unruly adolescent usually arrives for treatment at the insistence of family members or school administrators. Because this client has little motivation to change, prognosis generally is seen as poor. However, if the therapist can patiently withstand the client's disruptive behavior (e.g., attempts at humiliation, belittlement, bluff, arrogance), a modicum of rapport can be built in some cases. If this is achieved, the therapist can act as a model mixture of "power, reason, and fairness" (Millon, 1981, p. 214) for the teen. In addition, group therapies can help foster social and communication skills.
These examples are but a few of the literally infinite number of combinations of personality styles, expressed concerns, and symptoms that adolescents present. The structure of the Millon inventories parallels the multiaxial model. Clinicians should be familiar with the principles of multiaxial assessment to use the instrument to its fullest potential.
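The rough outcome measure mentioned earlier in this section, the difference between beginning and final BR scores, can be sketched as follows. This is an illustration under stated assumptions rather than a scoring procedure from the MACI manual: the function name and the scores are hypothetical, and the default cutoff of BR 75 is taken from the chapter's later statement that BR 75 indicates the presence of pathology for most scales.

```python
def br_change(pre, post, cutoff=75):
    """Summarize change between two administrations of the inventory.

    pre, post: dicts mapping scale name to BR score at intake and at
    follow-up. Only scales present in both administrations are compared.
    cutoff: BR value treated as the clinical threshold (assumed 75).
    """
    summary = {}
    for scale, before in pre.items():
        if scale not in post:
            continue
        after = post[scale]
        summary[scale] = {
            # Negative values mean the BR score dropped after treatment.
            "change": after - before,
            # True only when the score started at or above the cutoff
            # and ended below it.
            "crossed_below_cutoff": before >= cutoff and after < cutoff,
        }
    return summary


# Hypothetical scores for a single adolescent.
intake = {"Depressive Affect": 88, "Body Disapproval": 70}
follow_up = {"Depressive Affect": 68, "Body Disapproval": 72}

for scale, result in br_change(intake, follow_up).items():
    print(scale, result)
```

A drop in BR score is read here as improvement, which is reasonable for symptom-oriented scales but should not be assumed for every Personality Style scale.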


Assessment of Treatment Outcome

Evaluation of the MACI as an Outcome Measure

The MACI fares well when evaluated against the Newman and Ciarlo (1994) criteria presented in the first edition of this text. Whereas other inventories represent a downward extension of instruments originally constructed with adult populations, the MACI was specifically normed on adolescent subjects. Moreover, the inventory was constructed as a multiaxial instrument coordinated with both a coherent clinical theory and the DSM-IV nosology. Although some adolescents will require supervision, its 160-item length and sixth-grade reading level make it basically self-administering. The inventory requires less than 30 minutes to complete. As with the other Millon inventories, scale scores are based on national samples, and prevalence rates are informed by clinical ratings on the normative population, external validity studies, and clinical wisdom. Correction factors are available to mitigate the influence of response biases. Assessments of the reliability and validity of the instrument were an integral part of the test construction process. Given that the inventory is still relatively new, a smaller database of publications is available than for the MCMIs. However, the two inventories are based on the same clinical theory, and were developed using the same underlying logic of test construction. Computer scoring is available and provides either a profile report or the more comprehensive interpretive report written in easy-to-understand language. The scale names are descriptive, and scale elevations beyond the BR cutoff scores indicate the relative prominence of the personality features or the relative severity of Expressed Concerns or Clinical Syndrome scores.
General Considerations

Although it is an implicit assumption among nosologists that legitimate psychological disorders should "breed true" over time, the interaction between intrinsic maturational capacities and variegated environmental influences creates diverse multiple pathways of development that make adolescent pathologies extremely difficult to study. For example, in outcome assessments conducted approximately 5 to 10 years following hospitalization (Weiss & Burke, 1970), the majority of school phobic youths were found to be high school graduates who had performed academically at or above their expected levels. Thus, on the surface, it seemed that therapeutic interventions had been effective. However, at the time of the later assessment, most of the subjects did not conceive of their earlier problem as being school phobia. Further, around half of the subjects were assessed as having made inadequate social adjustment.

As with any study, researchers are advised to be aware of multitrait-multimethod factors. Accurate diagnosis and treatment planning should take into consideration not only self-reports, but also reports from parents, teachers, and others associated with the youth. Outcome assessment techniques also must advance to accommodate multiple measures from a variety of information sources. As more information is integrated into the assessment, clinical baselines become successively more qualitative, less quantitative, and less amenable to empirical study, simply because the individual is understood as a unique developmental entity, rather than a collection of scale scores.

Researchers designing outcome studies with multiaxial instruments must first define the scope of the outcome to be assessed. In a managed care setting, for example, personality change is often not addressed because therapy is intended to be palliative


rather than substantive. Here, a minimal interpretation of efficacy might examine only pre- and posttreatment difference scores on the Expressed Concerns and Clinical Syndromes scales to which treatment is addressed. Because the raw scores of most MACI and MAPI scales are not normally distributed, nonparametric statistics are recommended as a means of determining the statistical significance of change scores. Most nonparametric tests result in only modest loss of statistical power relative to parametric tests as sample size increases.

While pre-post differences on the Personality Styles scales would thus appear useful only with longer term interventions where personality change becomes a primary goal, the Personality Styles scales can be incorporated in outcome studies in a variety of ways. Elevations on the Personality Styles scales could be inspected as a means of subject selection, for example, to select primarily narcissistic subjects, or to divide the sample into contrast groups with high and low levels of self-reported personality pathology on the basis of their BR scores. If a large sample is available, the raw scores of the personality scales could be factor analyzed and pre-post difference scores compared on the resulting factors.

If the outcome assessment is intended for a single subject, MACI scores can be used to document treatment efficacy. Research with the MCMIs has shown that for some subjects, the BRs of certain scales actually increase in response to therapy, namely the Histrionic, Narcissistic, and Compulsive scales. This is likely to be the case for the MACI as well. These three constructs possess normal variants that are often highly adaptive in modern society. The self-confidence of normal range narcissism, for example, is seen as positive and motivating, whereas the sociability of the normal range histrionic is a positive form of extraversion. For these scales, the relation between scale score and pathology is nonlinear.
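The recommendation above to evaluate change scores with nonparametric statistics can be illustrated with a minimal sketch. The text does not name a particular test; the exact two-sided sign test is used here only because it is simple and makes no distributional assumption about the scores. A rank-based test such as the Wilcoxon signed-rank test would usually be the more powerful choice. The function name and the scores are hypothetical.

```python
from math import comb


def sign_test(pre, post):
    """Exact two-sided sign test on paired pre/post scores.

    Returns the p-value for the null hypothesis that increases and
    decreases are equally likely. Pairs with no change are dropped.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [after - before for before, after in zip(pre, post) if after != before]
    n = len(diffs)
    if n == 0:
        return 1.0  # no subject changed, so there is no evidence either way
    decreases = sum(1 for d in diffs if d < 0)
    k = min(decreases, n - decreases)
    # Probability of k or fewer outcomes in the rarer direction under
    # a fair-coin model, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical raw scores on one scale for eight subjects.
before_treatment = [31, 28, 35, 40, 27, 33, 38, 30]
after_treatment = [25, 27, 30, 36, 20, 29, 31, 22]

p_value = sign_test(before_treatment, after_treatment)
print(f"two-sided p = {p_value:.4f}")
```

The sign test uses only the direction of each subject's change, so it is insensitive to the skew of the raw score distributions, at the cost of ignoring how large each change was.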
Too little self-confidence is bad, too much is bad, but a certain level is actually valued and even envied.

Although repeated administration of inventories is questioned by some, many clinicians find follow-up assessments to be useful. Furthermore, insurance companies, lawyers, consumer interest groups, and others are increasingly calling for documentation that supports the value of treatment. The BR thresholds built into the instrument provide reference points against which the efficacy of treatment for a single subject may be judged. Because BR 75 indicates the presence of pathology for most scales, posttreatment scores that drop below BR 75 suggest pathologies that have been treated into the subclinical range. This does not necessarily mean no further basis for treatment exists, because the scales that are often the focus of outcome assessment are those related to Axis I-like conditions. Because the MACI is a multiaxial instrument, the focus of treatment should be understood in advance before results are communicated. For example, the best index of recovery for a patient referred for the treatment of depression is the change score in the Depressive Affect scale. The personality profile and its overall elevation and relation to the subject's symptoms may be interesting, but if the question is the disposition of the referral issue, certain scales may not be relevant.

Conclusions

The Millon inventories have proven extremely popular with clinicians. The MACI and MAPI are intended primarily for adolescent subjects. Because the MACI is a relatively recently published instrument, an important direction for research is the use of the


MACI as an instrument in outcome studies. The reliability of the MACI scales, their basis in a coherent theory of personality and psychopathology, and their coordination with the DSM-IV should be attractive to researchers seeking to quantify outcome in adolescent groups. At the same time, the availability of interpretive reports is of assistance to clinicians seeking to document baseline and progress in their own therapy.

References

American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd rev. ed.). Washington, DC: Author.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Berndt, D. (1983). Ethical and professional considerations in psychological assessment. Professional Psychology: Research and Practice, 14, 580-587.
Beutler, L. (1989). Differential treatment selection: The role of diagnosis in psychotherapy. Psychotherapy, 26, 271-281.
Choca, J.P., Shanley, L.A., & Van Denburg, E. (1992). Interpretive guide to the Millon Clinical Multiaxial Inventory. Washington, DC: American Psychological Association.
Dewald, P. (1967). Therapeutic evaluation and potential: The psychodynamic point of view. Comprehensive Psychiatry, 8, 284-298.
Jackson, D.N. (1970). A sequential system for personality scale development. In C.D. Spielberger (Ed.), Current topics in clinical and community psychology (Vol. 2, pp. 61-92). New York: Academic Press.
Langs, R. (1973). The technique of psychoanalytic psychotherapy (Vol. 1). Northvale, NJ: Aronson.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635-694.
Meehl, P.E. (1972). Reactions, reflections, projections. In J.N. Butcher (Ed.), Objective personality assessment (pp. 131-189). New York: Academic Press.
Meehl, P.E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194-216.
Millon, T. (1969). Modern psychopathology. Philadelphia: Saunders.
Millon, T. (1981). Disorders of personality: DSM-III, Axis II. New York: Wiley.
Millon, T. (1990). Toward a new personology: An evolutionary model. New York: Wiley.
Millon, T., Green, C.J., & Meagher, R.B. (1982). Millon Adolescent Personality Inventory manual. Minneapolis: National Computer Systems.
Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates.
Piotrowski, C., & Lubin, B. (1990). Assessment practices of health psychologists: Survey of APA Division 38 clinicians. Professional Psychology: Research and Practice, 21, 99-106.
van Reken, M. (1981). Psychological assessment and report writing. In C.E. Walker (Ed.), Clinical practice of psychology: A guide for mental health professionals. New York: Pergamon.
Weiss, M., & Burke, A. (1970). A 5- to 10-year follow-up of hospitalized school phobic children and adolescents. American Journal of Orthopsychiatry, 40, 672-676.


For Abby, Katie, and Shelby


Chapter 14
Personality Inventory for Children, Second Edition (PIC-2), Personality Inventory for Youth (PIY), and Student Behavior Survey (SBS)

David Lachar
University of Texas-Houston Medical School

This chapter introduces a family of relatively new or recently revised measures used in the evaluation of school-age children: the Personality Inventory for Children, Second Edition (PIC-2), Personality Inventory for Youth (PIY), and Student Behavior Survey (SBS). These questionnaires are similar to those measures developed by Achenbach, Conners, and Reynolds and Kamphaus described in subsequent chapters. Each of these families of measures assesses multiple dimensions of problem behavior; collects observations from parents, teachers, and youth; and provides standard scores based on contemporary national samples. Lachar (1998) offered an integrated review of these four families of measures.

Multidimensional assessment is both efficient and accurate. Important clinical phenomena are measured using the same format and are evaluated using the same or a similar standardization sample (vs. an assessment in which narrow instruments with different response characteristics and different normative samples are integrated into a description of the same student). Adoption of multidimensional assessment acknowledges that the documentation of symptom or problem absence is as important as the documentation of symptom or problem presence. A multidimensional approach to assessment also recognizes that a pattern of significant clinical problems often occurs in the same child. These problem constellations or patterns of diagnoses are often designated as "comorbid." For example, problem dimensions of anxiety and depression are often comorbid (Lonigan, Carey, & Finch, 1994), as are a variety of problem dimensions and attention deficit hyperactivity disorder (ADHD; August, Realmuto, MacDonald, Nugent, & Crosby, 1996; Vaughn, Riccio, Hynd, & Hall, 1997).
Routine initial application of a multidimensional assessment measure also acknowledges that children referred for an evaluation often experience problems that are different from those hypothesized to be present in the first place. For example, application of a narrowly focused ADHD measure at the beginning of an evaluation of a child with possible ADHD may be problematic for two reasons. Such children may not obtain clinically elevated scale scores on such a questionnaire, leaving alternative explanations for observed inattention (e.g., depression, anxiety, situational adjustment, learning disability,


and acquired cognitive disability) unexplored. Alternatively, application of such a narrowly focused ADHD measure may result in one or more clinically significant scale scores but provide no evidence of the presence or absence of frequently comorbid conditions. It would therefore seem most logical and efficient to first apply a multidimensional measure and then subsequently focus on further differentiation of problem areas highlighted in the initial assessment effort. Such a "successive hurdles" approach recognizes the value of initial psychometric information in the design of subsequent evaluative effort. Multisource assessment has become the preferred model for the evaluation of child and adolescent emotional and behavioral adjustment. Unlike the evaluation of adjustment in adults, which usually relies on self-report, such self-description by itself is usually not adequate in the evaluation of school-age children. Indeed, the context of assessment is fundamentally different for children and adolescents who are unlikely to refer themselves for evaluation or treatment and may not possess the academic, cognitive, or motivational status to complete a comprehensive self-report instrument. Young elementary schoolchildren, perhaps students in kindergarten through third grade, may be unable to describe themselves adequately through response to questionnaire statements. These children are unlikely to have mastered the range of vocabulary necessary to adequately describe dimensions of adjustment; such language competence is not usually attained before fourth or fifth grade. Another consideration is that youth are most often referred for evaluation because they are either noncompliant with the requests of the significant adults in their lives or exhibit problems in academic achievement, often presenting with inadequate reading skills. Therefore, completing a self-report inventory of several hundred items could present an assessment challenge even for a high school student. 
Parents and teachers, in addition to making referrals for assessment, are also the primary sources of useful systematic observation. Certainly adults are the most direct informants because they can report the child's noncompliance with their requests firsthand. Parents are the only consistently available source for report of early childhood development and description of behavior in the home. Teachers offer the most accurate observations concerning the age-appropriateness of a child's adjustment in the classroom and academic achievement, as well as the attentional, motivational, and social phenomena unique to the classroom and to the school. It is likely, however, that such observational accuracy decreases after the elementary school grades, as middle school and high school teachers in regular education classrooms have very little time for continuous observation of students (usually 45 minutes a day, along with 30 other students). Youth self-description, regardless of problems that have been documented for this source of information (Greenbaum, Dedrick, Prange, & Friedman, 1994; Jensen et al., 1996), still represents the most direct and accurate expression of personal thoughts and feelings, once the potentially distorting effects of response sets are identified. (Note that Michael and Merrell, 1998, demonstrated adequate short-term temporal stability for the self-report of third- to fifth-grade students.)

The availability of two or three independent sets of child descriptions provides a natural opportunity for comparison across informants. Achenbach, McConaughy, and Howell (1987) conducted a comprehensive literature review and found very limited concordance between the report of parent, teacher, and youth, although relatively greater between-source agreement was obtained for scales representing externalizing behaviors. A review of similar studies that evaluated the responses to objective interviews from


parent and child concluded that greater agreement was obtained for visible behaviors and for child-parent pairs with increasing child age (Lachar & Gruber, 1993). Although one reasonable approach to the interpretation of differences between parent, teacher, and youth is to assign such differences to situation-specific variation (e.g., the child is only oppositional at home, not in the classroom), other explanations are equally plausible. Cross-informant variance may reflect the fact that scales with similar names may contain significantly different content. On the other hand, the development and application of valid, strictly content-parallel measures may limit instrument validity. Such assignment of only parallel scale content may restrict the diagnostic potential of each informant source by excluding the measurement of attributes that may be uniquely obtained from only one informant source. Such attributes might be measured through a parent-completed measure of developmental delay, a teacher-completed measure of classroom behavior, or a youth-completed measure of self-concept. The PIC-2, PIY, and SBS attempt in their structure and content to provide the opportunity both to compare similar scale content across informants and to measure phenomena that may be uniquely obtained from only one informant.

Along with dissimilar scale content, another cause of poor across-informant agreement is the substantial effect of response sets on the accuracy of such information. The child or adolescent being assessed may not adequately comply with questionnaire instructions, due to either inadequate language or reading skills or lack of adequate motivation for the task. It is equally likely that a youth may not want to describe a personal history of maladaptive behavior and current internal discomfort for mental health professionals, although a negative presentation of parent adjustment and home conflict may be readily provided.
At times, youth may also be motivated in an assessment context to admit to problems and symptoms that are not present. This same variety of motivations and conditions may also influence parent report. Indeed, there has been some concern that poor parent adjustment may compromise the validity of parent report (Achenbach, 1981). Subsequent review (Richters, 1992) and specific analysis of this issue with the Personality Inventory for Children (Lachar, Kline, & Gdowski, 1987) have found no empirical support for this consideration. The PIC-2 and the PIY incorporate validity scales to identify response sets. These scales are designed to measure random or inadequate response to scale statements, defensive denial of existing problems, as well as admission of symptoms that are unlikely to be present or exaggeration of actual problems. Clinicians may approach interpreting inconsistencies across informants in a variety of ways. At one extreme, a clinician might consider accepting any evidence of symptom presence from any informant source. At the other extreme, a clinician's focus on symptoms may exclude the interpretation of all scale dimensions not demonstrated within the clinical range by at least two, or all three, informant sources. Although an optimal approach to the interpretation of multisource questionnaires has not been established (certainly the opinions of mental health professionals and parents have been studied: Loeber, Green, & Lahey, 1990; Phares, 1997), there is a distinct pragmatic advantage in using an assessment system with comparable parent, teacher, and youth versions. Conditions regularly occur in the conduct of psychological evaluations that make it difficult or impossible to obtain a parent, teacher, or youth report. Children may be too young, uncooperative, or language impaired. The evaluation may occur during the summer vacation, or the youth may not have consistently attended one classroom, or


may have left school permanently. Parents may miss family appointments when their child is hospitalized, or children may be under agency guardianship. In such instances, a set of comprehensive parent, teacher, and self-report measures that can be applied independently of each other provides the flexibility to facilitate data collection.

The Personality Inventory for Children, Second Edition (PIC-2)

Forty years ago, two University of Minnesota psychologists began the development of a new inventory approach to the evaluation of children and adolescents. They assembled a 600-item administration booklet and named it the "Personality Inventory for Children. For use with children from six through adolescence." The directions stated that each item was to be answered as either "True" or "False" by the child's mother in order to describe both the child and family relationships. Professors Wirt and Broen accumulated administration booklet descriptive statements following an agreed-on outline. To ensure comprehensive coverage of child behavior and adjustment, 50 statements were accumulated for each of 11 different content areas: Aggression, Anxiety, Asocial Behavior, Excitement, Family Relations, Intellectual Development, Physical Development, Reality Distortion, Social Skills, Somatic Concern, and Withdrawal. To these 550 potential scale items, 50 items were added in an effort to strengthen or clarify the meaning of certain areas of concern. Although all statements were to be completed by a parent, statement format varied: Some items described historical fact (e.g., "My child has failed a grade [repeated a year] in school."), other items concerned the observation of others ("School teachers complain that my child cannot sit still."), and still other items involved direct parental report. These direct statements describe behaviors (e.g., "My child sometimes swears at me."), as well as emotional states (e.g., "My child worries some.").
These items altogether have been found to require a sixth- to seventh-grade reading level (Harrington & Follett, 1984). Following many of the general procedures employed in the development of the Minnesota Multiphasic Personality Inventory, PIC scales were constructed from these potential items over a span of 20 years. The initial 1977 published profile included a visual display of the linear T-scores of 3 validity scales, a general screening scale, and 12 measures of child ability and adjustment and family function developed through either empirical item selection techniques or through iterative content valid procedures. In 1981, the administration booklet was revised and PIC-R items were sorted into one of four parts. Completion of part I (Items 1-131) allowed the scoring of four additional broad-band factor-derived scales (Lachar, Gdowski, & Snyder, 1982). Completion of parts I and II (Items 1-280) generated the entire clinical profile with "shortened scales" (Lachar, 1982), and parts I through III (Items 1-420) allowed the scoring of the original length scales. The final 180 items of this booklet were eventually dropped from the administration booklet because they did not appear on any of the standard full-length profile scales. From the beginning, the task of PIC scale and profile interpretation attributed relatively little importance to item content (except for the construction of a Critical Items list). Instead, an integrated program of research established external correlatives and interpretive guidelines for individual profile scales (Lachar & Gdowski, 1979) and replicated profile patterns (Gdowski, Lachar, & Kline, 1985; Kline, Lachar, & Gdowski, 1987; Lachar, Kline, Green, & Gruber, 1996; Lacombe, Kline, Lachar, Butkus, &


Hillman, 1991). A profile interpretive procedure in which similarity coefficients are calculated between the individual profile to be interpreted and the mean profiles of students receiving specific special education services has also been incorporated into profile scoring and interpretation software (Kline, Lachar, Gruber, & Boersma, 1994). Special effort has also focused on demonstrating that PIC scale validity is not restricted by a child's age, gender, or ethnicity status (Kline & Lachar, 1992; Kline, Lachar, & Sprague, 1985). A comprehensive review of this version of the PIC may be obtained from the first edition of this volume (Lachar & Kline, 1994). In addition, a bibliography of approximately 400 PIC citations is also available from this author. Current test revision efforts began in 1989 with the rewriting of the first 280 items of the PIC booklet into the self-report format for the Personality Inventory for Youth (PIY; Lachar & Gruber, 1993; Lachar & Gruber, 1995a, 1995b). Development of the PIY facilitated concurrent critical review of the structure and content of PIC scales and the PIC profile. Revision efforts have been sensitive to the need to maintain continuity with PIC interpretation principles established over the past 20 years, as well as to introduce psychometric changes that improve its efficiency. A research edition administration booklet that allowed both the scoring of the PIC profile and the collection of data on revised and new inventory items facilitated test revision. Over 1,000 of these clinical protocols have been subjected to considerable statistical analysis through which the PIC (now the PIC-2) and the PIY have achieved a great deal of similarity to facilitate the comparison of parent description and self-description. This similarity incorporates comparable subscale-within-clinical-scale structure and specific item revisions that have improved comprehension and application. 
These revisions have included the removal of almost all negatively worded statements to which "True" (vs. the natural response of "False") represents the unscored response (e.g., "My child has never had cramps in the legs." False), as well as the substitution of the word "parent(s)" for designation of either mother or father in family description. The final PIC-2 administration booklet of 275 statements generates a profile of gender-specific linear T-scores of 3 validity scales, 9 scales, and 21 subscales, as well as a second profile of 8 shortened scales used for screening and the measurement of treatment effectiveness. The PIC-2 provides a representative normative sample of school-age children (kindergarten through 12th grade).

PIC-2 Clinical Scales and Subscales

Each of the nine clinical scales was constructed using the same iterative process (Lachar, 1999). Initial scale composition was based on either previous PIC or PIY item placement or substantive item content. Item-to-scale correlation matrices generated from a sample of 950 clinical protocols were then inspected to establish the accuracy of these initial item placements. Each inventory statement retained on a final clinical scale demonstrated a significant and substantial correlation to the scale on which it was placed. When an item obtained a significant correlation to more than one clinical dimension, it was placed in almost all cases on the dimension to which it obtained the largest correlation. In this manner, 94% of the 264 PIC-2 statements that comprise these nine scales are placed on only one scale. The 16 items that were placed on two of these final scales obtained substantial correlations to both of these dimensions and represented substantive content consistent with both scale dimensions. For example, "Others often say that my


child is moody" has been placed on DIS2: Depression (.63) and DLQ2: Dyscontrol (.61), because "moody" signifies both dysphoria and anger. The relatively unique item composition of the nine PIC-2 clinical scale dimensions is in contrast to the previous PIC-R structure. For example, in the PIC-R, 68% of Anxiety scale items also appear on the Depression scale. In addition, considerable between-scale overlap occurs among the three PIC-R cognitive scales: Achievement (56%), Intellectual Screening (37%), and Development (84%). As presented in Table 14.1, these scales demonstrate substantial internal consistency. PIC-2 clinical scales average 31 items in length (range of 19 to 47 items) and obtain a median coefficient alpha of .898 (range of .796 to .957). Similarity of PIC-2 to the PIC-R and PIY clinical scales was measured by percent item overlap, as well as correlation between PIC-2 and comparable PIC/PIY scales. PIC-2 scales, on average, obtain 66% overlap with PIC-R scales (range of 33% to 96%) and obtain a substantial median correlation of .93 (range of .81 to .99) with the PIC-R equivalent. As would be expected, PIC-2 clinical scales obtain substantial item overlap with PIY scales similarly named (average 79%, range of 51% to 100%). In spite of this substantial scale similarity, the difference in informants (parent to youth) resulted in only moderate concordance estimates (median correlation = .43, range of .28 to .53). The items of each clinical scale are also partitioned into two or three subscales. Application of principal component factor analysis with varimax rotation guided the identification of two or three relatively homogeneous item subsets for each clinical scale. PIC-2 subscales average 13 items in length (range of 6 to 21 items), with only 3 of 21 subscales incorporating less than 10 items. Table 14.1 provides coefficient alpha values for subscales and lists two representative items for each subscale. 
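Coefficient alpha, the internal-consistency index reported for every scale and subscale in Table 14.1, can be illustrated with a minimal sketch. The function below applies the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). The four response rows are invented for illustration and are not PIC-2 data, and the name `coefficient_alpha` is hypothetical.

```python
from statistics import variance


def coefficient_alpha(responses):
    """Cronbach's coefficient alpha for a respondents-by-items matrix.

    `responses` is a list of rows, one per respondent; each row holds that
    respondent's item scores (here 1 = scored response, 0 = unscored).
    """
    k = len(responses[0])                      # number of items on the scale
    items = list(zip(*responses))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Invented True/False answers (coded 1/0) from four parents on a three-item scale.
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]

print(round(coefficient_alpha(toy_responses), 2))  # prints 0.75
```

Alpha rises as items covary more strongly relative to their individual variances, which is why short subscales such as ADH2 (6 items) tend to obtain lower values than long scales such as DLQ (47 items).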
The majority of subscales demonstrate psychometric characteristics comparable to scales on shorter published questionnaires. In all instances, the division of scales into subscales facilitates the interpretation process. For example, the actuarial interpretation of the PIC-R Delinquency scale (Lachar & Gdowski, 1979) identified T-score ranges associated with the dimensions of noncompliance, poorly controlled anger, and antisocial behaviors. These dimensions are each represented in PIC-2 DLQ subscales; their patterns of elevation represent the dominant endorsed content of this clinical scale. (Note the comparable subscales on the PIY Delinquency scale.) Correlations between PIC-2 scale scores and clinician, teacher, and youth descriptions readily provide interpretive guidelines for the following nine dimensions: Cognitive Impairment (COG). The statements that reflect limited general intellectual ability (COG1), problems in achieving in school (COG2), and a history of developmental delay or deficit (COG3) have been placed on this scale. COG2 elevation has been found to be associated with a broad range of inadequate academic habits and poor achievement in the classroom. Both teacher and clinician ratings demonstrate a strong relation between COG3 elevation and language deficits. Impulsivity and Distractibility (ADH). The majority of these items (21 of 27) appear on the first dimension. ADH1 (Disruptive Behavior) receives substantial support from teacher ratings: Elevation on this subscale is associated with poor behavioral control in the classroom that disrupts the classroom process, whereas clinicians report impulsive, hyperactive, and restless behaviors associated with excessive attention seeking. The second dimension (ADH2, Fearlessness) appears to measure an aspect of bravado that may best be classified as a personality dimension. Delinquency (DLQ). DLQ1 (Antisocial Behavior) elevation is associated with behaviors readily associated with the total scale name.
Subscale elevation suggests admission by both clinician and youth of a variety of unacceptable behaviors: truancy, alcohol and drug misuse, theft, running away from home, deceit, and association with other youth who are similarly troubled. DLQ2


TABLE 14.1
PIC-2 Clinical Scales and Subscales

COGNITIVE IMPAIRMENT (39 items, α = .88)
PIC-R Overlap: 74%, ACH: .87, IS: .67, DVL: .93
PIY Overlap: 51%, COG: .43
  COG1: Inadequate Abilities (13 items, α = .79)
    Others think my child is talented.
    My child seems to understand everything that is said.
  COG2: Poor Achievement (13 items, α = .79)
    It is hard for my child to make good grades.
    Reading has been a problem for my child.
  COG3: Developmental Delay (13 items, α = .80)
    At one time my child had speech difficulties.
    My child could ride a tricycle by age 5 years.

IMPULSIVITY AND DISTRACTIBILITY (27 items, α = .92)
PIC-R Overlap: 33%, HPR: .81
PIY Overlap: 63%, ADH: .31
  ADH1: Disruptive Behavior (21 items, α = .92)
    My child jumps from one activity to another.
    My child cannot keep attention on anything.
  ADH2: Fearlessness (6 items, α = .69)
    My child will do anything on a dare.
    Nothing seems to scare my child.

DELINQUENCY (47 items, α = .96)
PIC-R Overlap: 53%, DLQ: .93
PIY Overlap: 83%, DLQ: .53
  DLQ1: Antisocial Behavior (13 items, α = .88)
    My child has been in trouble with the police.
    My child has run away from home.
  DLQ2: Dyscontrol (17 items, α = .92)
    When my child gets mad, watch out!
    Many times my child has become violent.
  DLQ3: Noncompliance (17 items, α = .93)
    My child often breaks the rules.
    My child tends to see how much he/she can get away with.

FAMILY DYSFUNCTION (25 items, α = .88)
PIC-R Overlap: 96%, FAM: .99
PIY Overlap: 100%, FAM: .44
  FAM1: Conflict Among Members (15 items, α = .85)
    There is a lot of tension in our home.
    Our family argues a lot at dinner time.
  FAM2: Parent Maladjustment (10 items, α = .78)
    One of the child's parents drinks too much alcohol.
    The child's parents are now divorced or living apart.

REALITY DISTORTION (29 items, α = .90)
PIC-R Overlap: 34%, PSY: .85
PIY Overlap: 69%, RLT: .28
  RLT1: Developmental Deviation (14 items, α = .84)
    My child often gets confused.
    My child needs protection from everyday dangers.
  RLT2: Hallucinations and Delusions (15 items, α = .82)
    My child thinks others are plotting against him/her.
    My child is likely to scream if disturbed.

SOMATIC CONCERN (28 items, α = .83)
PIC-R Overlap: 79%, SOM: .95
PIY Overlap: 86%, SOM: .43
  SOM1: Psychosomatic Preoccupation (17 items, α = .79)
    My child is worried about disease.
    My child often has an upset stomach.
  SOM2: Muscular Tension and Anxiety (11 items, α = .68)
    Recently my child has complained of chest pains.
    My child often has back pains.


PSYCHOLOGICAL DISCOMFORT (39 items, α = .91)
PIC-R Overlap: 79%, D: .94, ANX: .90
PIY Overlap: 79%, DIS: .42
  DIS1: Fear and Worry (13 items, α = .73)
    My child will worry a lot before starting something new.
    My child is often afraid of little things.
  DIS2: Depression (18 items, α = .88)
    My child has little self-confidence.
    My child hardly ever smiles.
  DIS3: Sleep Disturbance/Preoccupation with Death (8 items, α = .77)
    My child's sleep is calm and restful.
    My child thinks about ways to kill himself/herself.

SOCIAL WITHDRAWAL (19 items, α = .80)
PIC-R Overlap: 68%, WDL: .91
PIY Overlap: 95%, WDL: .30
  WDL1: Social Introversion (11 items, α = .78)
    My child is usually afraid to meet new people.
    Shyness is my child's biggest problem.
  WDL2: Isolation (8 items, α = .66)
    My child does not like to be close with others.
    My child often stays in his/her room for hours.

SOCIAL SKILL DEFICITS (28 items, α = .92)
PIC-R Overlap: 75%, SSK: .96
PIY Overlap: 82%, SSK: .49
  SSK1: Limited Peer Status (13 items, α = .85)
    My child often brings friends home.
    My child is very popular with other children.
  SSK2: Conflict with Peers (15 items, α = .89)
    My child seems to get along with every one.
    Other children make fun of my child's ideas.

Note. Scale/subscale α values and PIC-2, PIC-R correlations based on a clinical sample n = 905. PIC-2, PIY correlations based on a clinical sample n = 382.

(Dyscontrol) elevation suggests the presence of disruptive behavior associated with poorly modulated anger. Teachers note fighting and youth admit to similar problems (e.g., "I lose friends because of my temper."). Clinicians rate these children as assaultive, defiant, argumentative, irritable, destructive, and manipulative. This lack of emotional control often results in behaviors that demonstrate poor judgment.
DLQ3 (Noncompliance) elevation emphasizes disobedience to parents and teachers, the ineffectiveness of discipline, and the tendency to blame others for problems. Youth agreement with this perception is demonstrated by a variety of PIY item correlates, including "I give my parent(s) a lot of trouble." Family Dysfunction (FAM). This scale is divided into two meaningful dimensions. FAM1 reflects conflict within the family (e.g., "There is a lot of tension in our home." "My parents do not agree on how to raise me."). Clinicians note conflict between the child's guardians and concern regarding the emotional or physical abuse of the child. The second FAM dimension more directly measures parent adjustment. Self-report correlates of FAM2 include: "One of my parents sometimes gets drunk and mean" and "My parents are now divorced or living apart." Reality Distortion (RLT). This content valid scale is considerably different from the empirically keyed PIC-R Psychosis scale, although substantial overlap is obtained with the PIY RLT scale (RLT1: 57%, RLT2: 80%). RLT1 (Developmental Deviation) elevation describes a level of intellectual, emotional, and social functioning usually associated with substantial developmental retardation or regression. RLT2 (Hallucinations and Delusions) describes symptoms and behaviors often associated with a psychotic adjustment.


Somatic Concern (SOM). The first dimension of this scale measures a variety of health complaints often associated with poor psychological adjustment. SOM1 (Psychosomatic Preoccupation) elevation is often associated with the self-report of similar complaints (e.g., "I feel tired most of the time." "I often have headaches." "I often have an upset stomach. "). The second SOM dimension (SOM2, Muscular Tension and Anxiety) appears to measure the somatic components of internalization. Psychological Discomfort (DIS). This relatively long scale of 39 items is best described as a measure of negative affectivity, divided as in the PIY into three meaningful dimensions. The first dimension (DIS1) measures fearfulness and worry, and is associated with clinician description of anxiety, fear, and tearfulness, as well as self-report of fear and emotional upset. The second dimension (DIS2) is a general measure of depression that obtains considerable correlation with parent, teacher, and youth description. Teachers see students with an elevated DIS2 scale score as sad or unhappy, moody and serious, and not having fun. Clinicians note many of the classical symptoms of depression, including feelings of helplessness, hopelessness, and worthlessness. Demonstrating inadequate self-esteem, such children are overly self-critical and usually expect rejection. The third dimension is similar to the PIY DIS3 dimension, combining report of problematic sleep and a preoccupation with death. Elevation of DIS3 correlates with clinician documentation of concern regarding suicide potential and a wide variety of self-report description, including sleep disturbance, dysphoria, and thoughts about suicide. Social Withdrawal (WDL). This is the shortest PIC-2 clinical scale (19 items). The two WDL dimensions parallel those of the PIY: The first subscale (WDL1) measures the personality dimension social introversion. Most items reflect psychological discomfort in social interactions. 
Clinician observation and youth self-report describe shyness and an unwillingness to talk with others. The second WDL dimension (Isolation) is a brief subscale of eight items that describes intentional lack of contact with others. Social Skill Deficits (SSK). This scale consists of two dimensions. Both dimensions receive considerable support from self-report correlates. The first SSK subscale reflects limited social influence. SSK1 elevation relates to self-report of few friends, a lack of popularity with peers, and little influence. Teachers note avoidance of peers and a lack of awareness of the feelings of others. SSK2 elevation, in contrast, measures problematic relations with peers. Self-report correlates document this conflict, and clinicians observe poor social skills and a problematic social adjustment. Table 14.2 presents the PIC-2 clinical scales and subscales that describe five students seen for outpatient evaluation. (Although interpretive guidelines should be applied within the context of each evaluation, these individual T-scores are marked for potential interpretation if clinical scales exceed 59T and subscales exceed 64T.) The Case A PIC-2 profile presents only one minimal elevation of WDL with WDL2 at 62T. This 6-year-old girl was assessed within the context of a custody evaluation; the grandmother who had raised her completed the PIC-2. This relatively quiet young girl obtained WISC-III and achievement scores within the normal range and teacher ratings reflecting a normal adjustment. Case B includes a PIC-2 completed by this 9-year-old boy's father. The PIC-2 was obtained as a screening measure to rule out problems in adjustment as part of a periodic reevaluation. This student was placed in a self-contained special education classroom for the cognitively impaired.
Isolated elevation of COG and COG subscales were consistent with individual assessment performance in which all standard scores of measures of intellectual functioning, academic achievement, and adaptive behavior (except for socialization) were below 60. Case C's PIC-2 was completed by his mother as part of a comprehensive outpatient evaluation. Just entering the third grade, this 9-year-old boy had received problematic evaluations by his first- and second-grade teachers, who had complained of inadequate concentration and participation in classroom activities, undercontrolled anger, and


TABLE 14.2
Case Examples of PIC-2 Clinical Scales and Subscales

                                              Case Examples
Scale/Subscale                                A     B     C     D     E
Cognitive Impairment                          50    73*   56    80*   78*
  Inadequate Abilities                        49    72*   52    74*   64
  Poor Achievement                            49    67*   63    75*   78*
  Developmental Delay                         54    78*   48    76*   78*
Impulsivity & Distractibility                 59    48    63*   62*   84*
  Disruptive Behavior                         61    49    66*   64    80*
  Fearlessness                                50    39    47    50    81*
Delinquency                                   46    45    69*   65*   89*
  Antisocial Behavior                         46    54    46    46    71*
  Dyscontrol                                  43    46    78*   58    98*
  Noncompliance                               49    43    66*   72*   79*
Family Dysfunction                            58    50    48    63*   79*
  Conflict Among Members                      62    48    45    69*   78*
  Parent Maladjustment                        49    54    54    49    71*
Reality Distortion                            45    56    44    94*   85*
  Developmental Deviation                     46    60    46    88*   73*
  Hallucination & Delusions                   44    50    43    92*   92*
Somatic Concern                               53    45    41    47    74*
  Psychosomatic Preoccupation                 55    48    42    46    87*
  Muscular Tension & Anxiety                  49    42    42    49    50
Psychological Discomfort                      45    41    59    75*   97*
  Fear & Worry                                51    43    49    71*   76*
  Depression                                  43    42    67*   67*   99*
  Sleep Disturbance/Preoccupation with Death  45    43    43    81*   73*
Social Withdrawal                             62*   39    39    56    96*
  Social Introversion                         62*   41    41    40    82*
  Isolation                                   57    42    42    79*   95*
Social Skill Deficits                         49    50    66*   80*   68*
  Limited Peer Status                         50    56    59    74*   52
  Conflict with Peers                         48    43    70*   79*   83*

* Significant scale T > 59, subscale T > 64.

impulsive, disruptive behaviors. Although the test examiner reported a constant need to redirect C to the tasks at hand, individual assessment demonstrated age-appropriate intellectual abilities and 6- to 12-month academic delay. Profile C provided evidence of ADHD-related classroom phenomena (ADH1), noncompliance and poorly modulated anger (DLQ2, DLQ3), as well as poor peer relationships (SSK2). A borderline COG2 (63T) in this case reflects developing problems in academic achievement. An isolated DIS2 may reflect a dysphoric reaction to his entry into a newly established stepfamily. Even a cursory review of Profile D reveals T-scores suggestive of severe developmental and behavioral disability.
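The marking rule stated for Table 14.2 (clinical scales above 59T, subscales above 64T) can be sketched in a few lines, using Profile D as the example. The T-scores below are the Case D column of Table 14.2; the function name, the dictionary layout, and the use of scale abbreviations as keys are illustrative assumptions and do not describe the published scoring software.

```python
SCALE_CUTOFF = 59      # a clinical scale is marked when T > 59
SUBSCALE_CUTOFF = 64   # a subscale is marked when T > 64

# Case D from Table 14.2: scale T-score, then its subscale T-scores.
case_d = {
    "COG": (80, {"COG1": 74, "COG2": 75, "COG3": 76}),
    "ADH": (62, {"ADH1": 64, "ADH2": 50}),
    "DLQ": (65, {"DLQ1": 46, "DLQ2": 58, "DLQ3": 72}),
    "FAM": (63, {"FAM1": 69, "FAM2": 49}),
    "RLT": (94, {"RLT1": 88, "RLT2": 92}),
    "SOM": (47, {"SOM1": 46, "SOM2": 49}),
    "DIS": (75, {"DIS1": 71, "DIS2": 67, "DIS3": 81}),
    "WDL": (56, {"WDL1": 40, "WDL2": 79}),
    "SSK": (80, {"SSK1": 74, "SSK2": 79}),
}


def marked_elevations(profile):
    """Return (marked scales, marked subscales) under the two cutoffs."""
    scales, subscales = [], []
    for scale, (t_score, subs) in profile.items():
        if t_score > SCALE_CUTOFF:
            scales.append(scale)
        # Subscales are checked independently of their parent scale.
        subscales.extend(name for name, t in subs.items() if t > SUBSCALE_CUTOFF)
    return scales, subscales


scales, subscales = marked_elevations(case_d)
print(scales)     # ['COG', 'ADH', 'DLQ', 'FAM', 'RLT', 'DIS', 'SSK']
print(subscales)
```

Because the rule requires a score to exceed the cutoff, ADH1 at 64T is not marked, which matches the unstarred entry in the table.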
This 16-year-old girl had been diagnosed as autistic or with pervasive developmental disorder for the past 11 years. Cognitive disability is reflected in elevations of RLT1 and COG1-3. RLT1, WDL2, and SSK1-2 suggest substantial deficits in adaptive behavior. Case D is described as fearful, emotionally labile, preoccupied with death, and demonstrating a sleep disorder (DIS1-3). D is in constant need of supervision, often noncompliant and disruptive (DLQ3, ADH1). This family, and


especially D's parents, have experienced a great deal of stress in trying to meet D's needs and to plan for her future (FAM1). Profile E also demonstrates severe psychopathology. (The review of validity scales rules out the likelihood of either symptom exaggeration or inconsistent response.) Relevant history and adjustment during this evaluation and subsequent treatment were consistent with scale elevations. This PIC-2 profile was completed by E's mother as part of an outpatient evaluation. The referral questioned the presence of a learning disability in this 13-year-old boy. He repeated kindergarten, received special education services for the learning disabled in grades two through six, obtained declining academic grades in the seventh grade, obtained borderline estimates of both intellectual ability and academic achievement in current assessment (COG1-3), was disruptive at school (ADH1-2; DLQ1-3), and was suspicious of peers (SSK1, SSK2). In the fifth grade, "voices" told him to hurt himself (RLT1-2). This young adolescent was afraid to be separated from his mother and demonstrated a continuing preoccupation with death on both projective assessment and the PIY (DIS1-3). E's mother expressed despair in her attempts to help her son overcome his fears, because of the severity of his disabilities and her own psychological problems (FAM1-2). Elevations on SOM1 and WDL1-2 reinforce the severity of current problems and demonstrate the lack of social support for this young adolescent. Treatment with weekly therapy, psychotropic medication, continued monitoring of suicide potential, and need for psychiatric hospitalization were consistent with a diagnosis of major depression with psychotic features.

PIY Scales and Subscales

The majority of PIY items were derived from rewriting the first 280 items of the PIC-R administration booklet to a first-person format (see Table 14.3 for examples of PIY items).
PIY self-report scales and subscales were constructed in the same manner as PIC-2 scales (Lachar & Gruber, 1993, 1995a, 1995b). The nine clinical scales were constructed with a uniform methodology, resulting in assignment of each of 231 items to only one scale, as well as a high degree of both scale content appropriateness and homogeneity. In addition, each of these scales has been further divided into two or three nonoverlapping subscales that represent factor-guided dimensions of even greater content homogeneity. The pattern of scale and subscale elevation is a major focus of the PIY and PIC-2 interpretive process. Gender-specific linear T-scores have been derived from a national normative sample of 2,327 regular education students in grades 4 through 12, and a variety of analyses have been conducted using a large sample of clinically referred students (N = 1,178). PIY clinical scales average 26 items in length (range of 17 to 42 items) and the median coefficient alpha in referred protocols was .85 (range of .74 to .92). The 24 subscales average 10 items in length (range of 4 to 16 items, with 5 subscales fewer than 8 items in length) and the mean coefficient alpha in referred protocols was .73 (range of .44 to .84, with 8 of 24 subscales less than .70). The PIY Administration and Interpretation Guide (Lachar & Gruber, 1995a) provides interpretive guidelines for scales and subscales as well as 15 case studies. (PIY profile data are integrated into a demonstration of the effect of a defensive response set and in a study of treatment effectiveness presented later.) Differences in the nature of self-report and parent report are demonstrated when PIY scale and subscale content are compared to the PIC-2 equivalents. The PIY Cognitive Impairment scale includes only half of the items of the comparable PIC-2


TABLE 14.3
PIY Clinical Scales and Subscales

COGNITIVE IMPAIRMENT (20 items, α = .74)
  COG1: Poor Achievement and Memory (8 items, α = .65)
    I often forget to do things.
    School has been easy for me.
  COG2: Inadequate Abilities (8 items, α = .67)
    People say that I have common sense.
    I think I am stupid or dumb.
  COG3: Learning Problems (4 items, α = .44)
    I have been held back a year in school.
    Because of my learning problems, I get extra help, or am in a special class in school.

IMPULSIVITY AND DISTRACTIBILITY (17 items, α = .77)
  ADH1: Brashness (4 items, α = .54)
    I brag a lot.
    I often nag and bother other people.
  ADH2: Distractibility/Overactivity (8 items, α = .61)
    I cannot wait for things like other kids can.
    Most of the time I run rather than walk.
  ADH3: Impulsivity (5 items, α = .54)
    I often act without thinking.
    I am often restless.

DELINQUENCY (42 items, α = .92)
  DLQ1: Antisocial Behavior (15 items, α = .83)
    I sometimes skip school.
    I use illegal drugs.
  DLQ2: Dyscontrol (16 items, α = .84)
    People think that I am mean.
    I lose friends because of my temper.
  DLQ3: Noncompliance (11 items, α = .83)
    I often disobey my parent(s).
    Punishment does not change how I act.

FAMILY DYSFUNCTION (29 items, α = .87)
  FAM1: Parent-Child Conflict (9 items, α = .82)
    I am unhappy about my home life.
    My parent(s) are too strict with me.
  FAM2: Parent Maladjustment (13 items, α = .74)
    My parents disagree a lot about how to raise me.
    My parents often argue.
  FAM3: Marital Discord (7 items, α = .70)
    My parent(s) always discuss things before they make a big decision.
    My parents' marriage has been solid and happy.

REALITY DISTORTION (22 items, α = .83)
  RLT1: Feelings of Alienation (11 items, α = .77)
    I do strange or unusual things.
    I often get confused.
  RLT2: Hallucinations and Delusions (11 items, α = .71)
    I am afraid I might be going insane.
    People secretly control my thoughts.

SOMATIC CONCERN (27 items, α = .85)
  SOM1: Psychosomatic Syndrome (9 items, α = .73)
    I often get very tired.
    I often have headaches.
  SOM2: Muscular Tension and Anxiety (10 items, α = .74)
    At times I have trouble breathing.
    Sometimes my heart pounds or races.
  SOM3: Preoccupation with Disease (8 items, α = .60)
    I often talk about sickness.
    Being sick upsets me more than it does most others.


PSYCHOLOGICAL DISCOMFORT (32 items, α = .86)
  DIS1: Fear and Worry (15 items, α = .78)
    Small problems do not bother me.
    I worry about things that adults worry about.
  DIS2: Depression (11 items, α = .73)
    I try to make the best of most things.
    I am often in a good mood.
  DIS3: Sleep Disturbance (6 items, α = .70)
    I have a lot of nightmares.
    I often think about death.

SOCIAL WITHDRAWAL (18 items, α = .80)
  WDL1: Social Introversion (10 items, α = .78)
    Talking to others makes me nervous.
    I am often embarrassed.
  WDL2: Isolation (8 items, α = .59)
    I almost always play alone.
    I keep my thoughts to myself.

SOCIAL SKILL DEFICITS (24 items, α = .86)
  SSK1: Limited Peer Status (13 items, α = .79)
    Other kids look up to me as a leader.
    People always listen when I speak.
  SSK2: Conflict with Peers (11 items, α = .80)
    I do not get along with the other students at school.
    I wish that I were more able to make and keep friends.

Note. Coefficient α values based on a clinical sample N = 1,178.

scale. This difference reflects the exclusion of developmental or historical items in the self-report format (children are not accurate reporters of developmental delay), as well as the reality that fewer self-report items correlated with this dimension for youth. The PIY Impulsivity and Distractibility scale also incorporated fewer scale items (17) than its PIC-2 equivalent (27 items). Perhaps the report of ADH disruptive behavior would be more likely to be expected from an adult informant who finds such behavior distressful than from a student who may not find such behaviors disturbing. Such results suggest that the PIC-2 COG and ADH scales will demonstrate superior diagnostic performance in comparison to these PIY scales. In contrast, the other seven PIY clinical scales achieved a significant degree of similarity in content and length with the PIC-2 scale equivalents.
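The difference in scale length between the two instruments can be checked directly from the item counts printed in Tables 14.1 and 14.3. The sketch below only restates those published counts and computes a simple length ratio; the variable names are illustrative.

```python
# Items per clinical scale, copied from Table 14.1 (PIC-2) and Table 14.3 (PIY).
PIC2_ITEMS = {"COG": 39, "ADH": 27, "DLQ": 47, "FAM": 25, "RLT": 29,
              "SOM": 28, "DIS": 39, "WDL": 19, "SSK": 28}
PIY_ITEMS = {"COG": 20, "ADH": 17, "DLQ": 42, "FAM": 29, "RLT": 22,
             "SOM": 27, "DIS": 32, "WDL": 18, "SSK": 24}

# Ratio of PIY scale length to PIC-2 scale length for each scale.
length_ratio = {scale: PIY_ITEMS[scale] / PIC2_ITEMS[scale] for scale in PIC2_ITEMS}

# List the scales from the smallest ratio (PIY much shorter) to the largest.
for scale, ratio in sorted(length_ratio.items(), key=lambda pair: pair[1]):
    print(f"{scale}: PIY {PIY_ITEMS[scale]:2d} vs PIC-2 {PIC2_ITEMS[scale]:2d} items "
          f"(ratio {ratio:.2f})")
```

COG (20 of 39 items) and ADH (17 of 27 items) obtain the two lowest ratios, the same two scales singled out in the text, and the PIY counts sum to the 231 items assigned across its nine clinical scales.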
PIC-2 and PIY Screening and Short Forms

The PIY and PIC-2 each incorporate a screening or short assessment procedure. The first 80 items of the PIY include a 32-item screening scale chosen to provide an optimal identification of those regular education students who, when administered the full PIY, produce clinically significant results. These items also include three "scan items" for each clinical scale. Scan items were selected so that students who endorse two or more of each set of three items would be those with a high probability of scoring greater than 59T on the corresponding clinical scale. Shortened versions of three validity scales can also be derived from these items.
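The scan-item rule described above (two or more of three items endorsed suggests a likely elevation above 59T) can be sketched in a few lines of Python. This is only an illustration: the function name, the data layout, and the example responses are hypothetical, and "endorsed" is assumed to mean answered in the scored direction. Actual PIY scoring follows the published manual.

```python
from typing import Dict, List


def flag_scales_from_scan_items(
    endorsed: Dict[str, List[bool]], min_endorsed: int = 2
) -> List[str]:
    """Return the clinical scales whose three scan items meet the rule.

    `endorsed` maps a scale abbreviation to three booleans, one per scan
    item, where True means the item was answered in the scored direction
    (an assumption made for this sketch).
    """
    flagged = []
    for scale, items in endorsed.items():
        if len(items) != 3:
            raise ValueError(f"{scale}: expected 3 scan items, got {len(items)}")
        # Two or more endorsed items suggest a probable score above 59T.
        if sum(items) >= min_endorsed:
            flagged.append(scale)
    return flagged


# Hypothetical responses for three of the nine clinical scales.
example = {
    "ADH": [True, True, False],
    "DIS": [False, True, False],
    "SSK": [True, True, True],
}
flagged = flag_scales_from_scan_items(example)
print(flagged)
```

A flagged scale is a prompt for fuller assessment, not a score in itself.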


The PIC-2 provides a short form designed specifically to measure change in clinical status associated with therapeutic intervention. Although PIC scales have demonstrated such sensitivity to change (see example of treatment-related PIC-R and PIY change in Lachar & Kline, 1994), a brief form tailored specifically for this purpose was constructed. Selected items were written in the present tense, frequently endorsed in the context of clinical assessment, and described clinical phenomena often the focus of short-term intervention. Using these guidelines, the 12 most favorable items from each of eight PIC-2 clinical scales were selected. Comparable items were not selected from the Cognitive Impairment scale because the majority of COG items demonstrate either historical content or a lack of appropriate associated therapeutic focus due to the global or stable nature of the descriptions on this dimension. These 96 inventory statements have been placed at the beginning of the 275-item PIC-2 administration booklet to serve as both a short form and a method of efficient reevaluation of a child following short-term intervention. These 96 items will also be available in a self-scoring format. It is intended for these scale scores to be graphed on the same profile at baseline and at appropriate interim and posttreatment intervals to demonstrate both dimensions of change and dimensions of stability. The initial concurrent validity of these shortened PIC-2 scales was established through the correlation of scale scores with clinician ratings, teacher descriptions, and self-report descriptions. These correlations were drawn from the data generated from an ongoing clinical project in which PIC-2, PIY, SBS, clinician ratings, diagnoses, and the results of individually administered cognitive ability and academic achievement tests are being collected to some degree in over 1,000 assessments. 
Although this analysis was only exploratory, some care was taken in the selection of these correlates. Each obtained scale descriptor was classified with only the one shortened scale with which it received the largest correlation, all being at least significant at p < .01. Table 14.4 summarizes the number of external ratings identified in this manner from each source (clinician, teacher, and student) and provides up to three examples from each rating source for each of these eight scales. Correlations between these shortened scales and their full-length versions are also presented. Table 14.4 documents that these 12-item scales correlate substantially with their full-length versions (.92-.96) and obtain independent correlates from nonparent observers that match expressed scale content and diagnostic intent. Clinician ratings provided the greatest support for ADH, DLQ, and DIS, focusing on problems of disruptive and noncompliant behavior and intense and dysphoric affect that often form the basis of clinical referral. These analyses also demonstrate that ADH, as previously demonstrated for the PIC Hyperactivity scale (Lachar & Gdowski, 1979), assesses those behaviors most related to problems in classroom adjustment. In addition, observations obtained directly from the student under study provide those internal and subjective judgments that demonstrate the clinical value of PIC-2 dimensions that do not receive robust correlates from clinicians or teachers. The PIC-2 short-form scales are applied in this chapter's section on evaluating treatment effectiveness.

PIY and PIC-2 Validity Scales

PIY and PIC-2 profiles both include a set of three comparable validity scales. The Inconsistency scale evaluates the likelihood that responses to items are random or reflect in some manner inadequate comprehension of inventory statements or compliance with


TABLE 14.4
Correlates of PIC-2 Brief Clinical Scales

IMPULSIVITY & DISTRACTIBILITY (ADH12: .96*)
  Clinician Ratings: total = 14
    Hyperactive, overactive
    Restless
    Frequently frustrated
  Teacher Ratings: total = 34
    Disobeys class or school rules
    Impulsive, acts without thinking
    Misbehaves unless closely supervised
  Student Ratings: total = 9
    Brags about being sent to the principal
    School has sent notes home about bad behavior
    Like to show off

DELINQUENCY (DLQ12: .93)
  Clinician Ratings: total = 44
    Argues, Oppositional
    Disobedient to parents
    Poorly modulated anger
  Teacher Ratings: total = 11
    Does not demonstrate polite behavior
    Blames others for his/her problems
    Becomes upset for little or no reason
  Student Ratings: total = 24
    Gives parents a lot of trouble
    Sometimes swear at parents
    Ran away from home

FAMILY DYSFUNCTION (FAM12: .93)
  Clinician Ratings: total = 6
    Conflict between parents
    Parent divorce/separation
    Emotionally abused
  Teacher Ratings: total = 6
    Uses alcohol or drugs
    Strikes or pushes school personnel
    Preoccupied with sex
  Student Ratings: total = 18
    Family does not enjoy being with each other more than other families
    A lot of tension in the home
    Parents often argue

REALITY DISTORTION (RLT12: .95)
  Clinician Ratings: total = 3
    Auditory hallucinations
    Inappropriate emotion, affect
    Delusions, paranoia
  Teacher Ratings: total = 0
  Student Ratings: total = 13
    Don't get along with others most of the time
    Need a lot of help from others
    Do strange or unusual things

(Continued)


TABLE 14.4 (Continued)

SOMATIC CONCERN (SOM12: .92)
  Clinician Ratings: total = 6
    Excessive sleeping
    Sexually abused
    Continually tired, listless
  Teacher Ratings: total = 0
  Student Ratings: total = 33
    Often get very tired
    At times have trouble breathing
    Don't have as much energy as most other kids

PSYCHOLOGICAL DISCOMFORT (DIS12: .93)
  Clinician Ratings: total = 22
    Depressed, sad, unhappy
    Moodiness
    Inadequate self-esteem
  Teacher Ratings: total = 4
    Appears sad or unhappy
    Doesn't take successes and failures in stride
    Worries about little things
  Student Ratings: total = 25
    Shy with kids my own age
    Not often in a good mood
    Often afraid of little things

SOCIAL WITHDRAWAL (WDL12: .96)
  Clinician Ratings: total = 4
    Withdrawn
    Shy
    Uncommunicative, seldom talks
  Teacher Ratings: total = 2
    Pessimistic about the future
    Overcritical of himself/herself
  Student Ratings: total = 12
    Shyness is my biggest problem
    Hardly ever talk
    Stay in the house for days at a time

SOCIAL SKILL DEFICITS (SSK12: .96)
  Clinician Ratings: total = 5
    Teased by peers
    Isolated, few or no friends
    Attends self-contained special education class
  Teacher Ratings: total = 17
    Does not seem to have fun
    Avoids social interaction in class
    Unaware of the feelings of others
  Student Ratings: total = 28
    Often rejected by other kids
    Have very few friends
    Not very popular with other kids

Note. See text for explanation of correlate selection procedure.
* Indicates correlation between total and 12-item version of PIC-2 clinical scale in a sample of 905 referred students.


test instructions. The Dissimulation scale identifies profiles that may result from either exaggeration of current problems or a malingered pattern of atypical or infrequent symptoms. The third validity scale, Defensiveness, identifies profiles likely to demonstrate the effect of minimization or denial of current problems. The PIY also provides a fourth unique validity measure that consists of six items written so that either a True or False response would be highly improbable, such as a False response to "I sometimes talk on the telephone." These six items, although also highly infrequent in the form of parent description, were omitted from the PIC-2 as they contributed no additional information beyond that provided by the other three validity scales.

PIY and PIC-2 Inconsistency scales (INC) measure semantic inconsistency (Tellegen, 1988) through the classification of response to 35 pairs of highly correlated items drawn from all nine clinical scales (e.g., "I have many friends./I have very few friends." "My child has a lot of talent./My child has no special talents."). For each item pair, two response combinations are consistent and two are inconsistent (either True/True and False/False or True/False and False/True). Each inconsistent pair identified in a given protocol contributes one point to the INC raw score. With a single cutting score, raw scores of less than 13 correctly identified 90% to 95% of clinical protocols, and raw scores greater than 12 correctly identified 92% to 96% of random protocols.

The PIY and PIC-2 Dissimulation scales (abbreviation FB, representing "Fake Bad") were empirically constructed through item analyses that compared clinical protocols and two sets of protocols completed by nonreferred regular education students or their mothers. The PIY or PIC-2 was first completed with directions to provide an accurate or valid description.
The same student or mother then completed a second questionnaire in which the student was now described as in need of mental health counseling or psychiatric hospitalization. Selected FB items in the scored direction were very infrequent in valid normal (PIY: 11%, PIC-2: 4%) and in valid clinical protocols (PIY: 18%, PIC-2: 15%), and highly frequent (PIY: 83%, PIC-2: 55%) in the "fake bad," or dissimulated, condition. FB items reflect "erroneous stereotype" in that they reflect face valid content by naïve informants, but demonstrate no empirical validity (Lanyon, 1997). Examples of the 42 PIY FB items include "People are out to get me" and "I do not care about having fun." Examples of the 35 PIC-2 FB items include "My child is not as strong as most children" and "My child often talks about sickness." Application of one FB cutting score to PIY data correctly identified 99% of accurate, 98% of fake bad, and 96% of clinical protocols. Application of two potential cutting scores to similar PIC-2 protocols revealed that both correctly classified from 97% to 100% of accurate regular education student descriptions. A cutting score of greater than 8 resulted in correct classification of 92% of dissimulated and 78% of clinical protocols (possible dissimulation), and a cutting score of greater than 14 resulted in correct classification of 70% of dissimulated and 95% of clinical protocols (probable dissimulation). The pattern of FB and INC scale elevation facilitates the differentiation of inadequate from inaccurate response. A deliberate exaggerated response (or for that matter, an accurate description of a severe or atypical psychopathological adjustment) would generate an elevated FB score and an unelevated INC scale score. Protocols completed without adequate statement comprehension, in contrast, obtain raw INC and FB scores approximating 50% of each scale's length; in this case, both scales are clinically elevated (see Figs. 11 and 12 in Lachar & Gruber, 1995b). 
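The INC scoring rule and the PIC-2 FB cutting scores described above can be sketched in Python as follows. The cutoffs (INC raw score greater than 12; FB raw score greater than 8 or greater than 14) come from the text. The function names, the pair encoding, and the wording of the returned labels are assumptions made for illustration and are not part of the published scoring materials. The label for an elevated INC with an unelevated FB is also an assumption, because the text does not discuss that combination.

```python
from typing import List, Tuple

INC_CUTOFF = 12   # raw INC > 12 was typical of random protocols (from the text)
FB_POSSIBLE = 8   # PIC-2 FB raw > 8: possible dissimulation (from the text)
FB_PROBABLE = 14  # PIC-2 FB raw > 14: probable dissimulation (from the text)


def inc_raw_score(
    pair_responses: List[Tuple[bool, bool]], same_is_inconsistent: List[bool]
) -> int:
    """Count inconsistent item pairs; each contributes one raw-score point.

    For oppositely worded pairs ("many friends" / "very few friends"),
    matching answers are inconsistent, so the flag is True. For similarly
    worded pairs, differing answers are inconsistent, so the flag is False.
    """
    score = 0
    for (first, second), same_bad in zip(pair_responses, same_is_inconsistent):
        if (first == second) == same_bad:
            score += 1
    return score


def classify_fb(fb_raw: int) -> str:
    """Apply the two PIC-2 FB cutting scores."""
    if fb_raw > FB_PROBABLE:
        return "probable dissimulation"
    if fb_raw > FB_POSSIBLE:
        return "possible dissimulation"
    return "not elevated"


def response_pattern(inc_raw: int, fb_raw: int) -> str:
    """Combine INC and FB to separate inadequate from inaccurate response."""
    inc_high = inc_raw > INC_CUTOFF
    fb_high = fb_raw > FB_POSSIBLE
    if inc_high and fb_high:
        return "both elevated: inadequate comprehension or random response"
    if fb_high:
        return "FB elevated only: exaggeration or severe atypical presentation"
    if inc_high:
        return "INC elevated only: inconsistent response"  # assumed label
    return "no INC/FB validity concern"


# Three illustrative pairs: only the first is answered inconsistently.
pairs = [(True, True), (True, False), (False, False)]
flags = [True, True, False]
print(inc_raw_score(pairs, flags))
# Raw scores reported for the case in Table 14.5: INC = 4, FB = 24.
print(classify_fb(24), "|", response_pattern(4, 24))
```

Note that an elevated FB with an unelevated INC does not by itself distinguish deliberate exaggeration from an accurate description of severe psychopathology; as the text states, both produce the same pattern.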
Table 14.5 presents the PIC-2 and Student Behavior Survey (SBS) results that describe a hospitalized 7-year-old, first-grade boy with a history of multiple psychiatric hospitalizations. This boy had attended a self-contained special education classroom for the emotionally impaired. His current psychiatric hospitalization was due to reported verbal


TABLE 14.5
A Case of Symptom Malingering and the PIC-2 Profile

PIC-2 Validity Scales: Inconsistency: Dissimulation: Defensiveness: COG: 78 COG1: 68 COG2: 60 COG3: 98 ADH: 75 ADH1: 78 ADH2: 56

47 127 27 FAM: FAM1: FAM2:

82

RLT: RLT1: RLT2:

102

SOM: SOM1: SOM2:

74

DLQ: 89 DLQ1: 54 DLQ2: 102 DLQ3: 81 Student Behavior Survey (SBS) Academic Performance: Academic Habits: Social Skills: Parent Participation: Health Concerns: Emotional Distress: Unusual Behavior: Social Problems:

46 58 57 59 38 43 35

DIS: 81DIS1: 71DIS2: DIS3:

106

92WDL: 104WDL1: WDL2:

89

82SSK: 57SSK1: SSK2:

94

76 95 123 77 88

Verbal Aggression: Physical Aggression: Behavior Problems: Attention Deficit Hyperactivity Disorder: Oppositional Defiant Disorder: Conduct Disorder:

75 106 40 43 39 40 38 43

and physical aggression toward family members, noncompliance, auditory and visual hallucinations, agitation, running in front of moving cars, attempting to drown himself, hair pulling, running away from home, and oppositional defiant behavior. His parents were asking for assistance in obtaining an agency placement, because they were unable to cope with his undercontrolled behavior. Several psychiatrists had been sufficiently impressed by these complaints to prescribe a variety of psychotropic medications (stimulants, neuroleptics, mood stabilizers, and antidepressants). In contrast to this presentation of severe emotional and behavioral psychopathology, it became known that this young boy had called children's protective services to allege physical and emotional abuse by his parents. No indications of maladjustment had been observed during a week of hospitalization, and a telephone conversation with his teacher elicited that he was "a bright and cooperative student, obtaining excellent grades." At this point, the mother completed the PIC-2, the teacher the SBS, and the child was individually administered tests of intellectual ability and academic achievement. An obtained Full Scale IQ of 122 and SBS Academic Habits T = 58 and Social Skills T = 57 (positively worded statements reflect above average adaptive behaviors in the classroom) were in remarkable contrast to his mother's description of a severely disturbed child. A review of the PIC-2 revealed an Inconsistency raw score of 4 (of 35) and a Dissimulation raw score of 24 (a score equivalent to the top third of dissimulated protocols), which raised serious concerns about the validity of this PIC-2. Of nine dimensions, only two received any support: FAM suggested considerable family conflict and parent maladjustment, and SOM (and the SBS) possibly reflected somatic components of stress experienced within this family. Subsequent weeks of hospitalization


documented in a variety of ways that this child had been the scapegoat in this family and had experienced considerable emotional abuse.

The third set of validity scales, Defensiveness (DEF), consists of expanded versions of the PIC Lie scale. DEF items represent denials of common problems (e.g., "Sometimes I put off doing a chore." False; "My child almost never argues." True) and attributions of improbable positive adjustment (e.g., "My child always does his/her homework on time." True; "I am almost always on time and remember what I am supposed to do." True). Such items represent inaccurate knowledge in the form of overendorsement (Lanyon, 1997). DEF elevations above 59T, even in hospitalized patients, result in profiles that either minimize current problems or consistently deny the presence of most or all problems in adjustment. A secondary interpretation of an elevated PIY DEF scale is that such a youth would be unlikely to be a good candidate for talk therapy. Youth who respond with denial to items of an administration booklet are most likely to respond in a similar manner during a diagnostic interview.

The INC/FB/DEF pattern readily identifies profiles that must be interpreted with caution. Table 14.6 presents the PIC-2 and PIY profile pairs obtained for three 12-year-old hospitalized patients. Review of Case A profiles identified PIC-2 and PIY validity scale T-scores that did not suggest the accuracy of these profiles had been compromised in any systematic manner. It is important to observe that both profiles included clinically elevated scale scores. First, consider similarities in scale and subscale pattern and scale and subscale clinical elevations (e.g., apparent disagreement between PIY and PIC-2 COG3 actually represents difference in subscale content).
COG3 for parent report reflects developmental issues (Developmental Delay), and COG3 for student report represents problems in learning (Learning Problems), a dimension much more similar to PIC-2 COG2 (Poor Achievement). Case A is a 12-year-old in his fifth psychiatric hospitalization who attended a self-contained special education classroom for the behaviorally maladjusted. His current diagnoses were ADHD combined type, oppositional defiant disorder, and conduct disorder. His behavior at home and in the hospital demonstrated serious behavioral dyscontrol. He had been noncompliant in taking psychotropic medication to improve his emotional and behavioral control. He had threatened to kill himself (see DIS3 scores) and had assaulted his mother, who he said did not want him at home (see PIY FAM1, Parent-Child Conflict and FAM2, Parent Maladjustment). He attempted to escape from the hospital and required multiple time-outs and seclusions to control his rages, threats, and aggressive and inappropriate behavior (see DLQ2, Dyscontrol). Off medication, Case A demonstrated an attention span of less than 15 minutes (see ADH values). He had a history of impulsive and disruptive behavior, fighting with peers, noncompliance with adults, verbal and physical aggression, and running away from home (see elevated DLQ values for PIY and PIC-2).

The profiles of Case B, in contrast to Case A, document considerable disagreement between parent and child. The only area of agreement is the problem area of academic achievement (PIC-2 COG2, Poor Achievement, PIY COG3, Learning Problems). History and psychometric assessment document retention in grade, special class placement, and academic achievement substantially below assessed intellectual ability. Clearly, the PIC-2 most accurately describes this 12-year-old male who presents with multiple handicaps.
The elevation of the PIY DEF scale (T = 64) is the most likely explanation for this PIY profile that is essentially within normal limits. Indeed, review of this patient's medical record details his repeated denial and minimizing of problems during this hospitalization in an attempt to facilitate his early discharge from treatment. This child's psychiatric history was secondary to a traumatic motor vehicle accident. He felt


TABLE 14.6 The Influence of Respondent Defensiveness on PIC-2-PIY Profile Pairs Case A Case B Case C Scale/Subscale PIC-2 PIY PIC-2 PIY PIC-2 PIY Inconsistency 53 49 67 57 50 68 Dissimulation 60 72 81 48 47 72 64* 65* 50 Defensiveness 30 39 30 Cognitive Impairment 61 63 67 57 43 47 COG1 56 57 68 52 49 42 COG2 67 50 70 45 41 45 COG3 53 85 48 85 43 67 Impulsivity & Distract 79 69 81 39 44 64 ADH1 75 77 83 49 46 53 ADH2 81 62 64 37 41 62 57 ADH3 41 66 Delinquency 83 75 86 41 42 49 DLQ1 71 75 63 43 46 49 DLQ2 82 71 98 41 43 56 DLQ3 79 67 76 43 42 42 69 Family Dysfunction 60 58 57 67 49 FAM1 58 69 58 52 62 47 FAM2 60 74 54 57 72 49 53 FAM3 58 53 Reality Distortion 67 71 73 48 50 79 RLT1 69 64 83 53 55 74 RLT2 62 75 56 41 44 79 Somatic Concern 82 75 74 49 41 65 SOM1 82 66 76 38 42 59 SOM2 72 71 65 53 41 65 73 SOM3 59 65 Psychological Discomfort 90 59 97 63 53 68 DIS1 70 64 81 64 51 66 DIS2 88 38 92 52 49 58 DIS3 83 70 83 63 63 68 Social Withdrawal 47 83 85 41 46 59 77 WDL1 46 72 46 49 53 WDL2 50 78 88 38 42 65 Social Skill Deficits 57 43 86 56 49 57 SSK1 43 40 68 53 50 56 SSK2 74 50 97 59 48 55 * Note elevated DEF for PIY (Case B) and for PIC-2 (Case C). lonely and scared, he frequently cried, sobbed, shook, avoided others, and was preoccupied with excessive worries (DIS, WDL). He externalized his problems (DLQ) and had difficulties with peers and had little insight into his role in these conflicts (SSK2, Conflict with Peers). This pattern of PIY defensiveness is fairly common in inpatient settings. Case C is quite unusual in that the PIC-2 completed by C's mother is essentially within normal limits, with the exception of FAM1, Conflict Among Members, FAM2, Parent Maladjustment, and DIS3, Sleep Disturbance/Preoccupation with Death obtaining some elevation. This PIC-2 profile would be quite difficult to interpret without the presence of validity scales as C's mother had referred her 12-year-old daughter for this


current hospitalization. Case C presented with suicidal ideation, low self-esteem, depression, crying spells, poor appetite, and associated weight loss (DIS, WDL2, Isolation). She actively demonstrated somatic concern and somatic symptoms in response to conflict during this hospitalization (SOM) and told others that she would not talk about her problems with her mother, because she was afraid that she would distress her mother who was under psychiatric care (PIC-2 FAM2, Parent Maladjustment). Clinicians were sufficiently concerned with C's internalizing problems to assign discharge diagnoses of Generalized Anxiety Disorder and Depressive Disorder NOS and to continue her on antidepressant medication at discharge. Why was C's mother defensive in describing her daughter's problems? It was clearly documented in the medical record that C's mother was concerned that she would be seen as an inadequate mother because of her and C's psychiatric problems and consequently could lose custody of her child to another adult family member.

Student Behavior Survey (SBS)

The development of the Student Behavior Survey (SBS) (Lachar, Kline, Wingenfeld, & Gruber, 1999) consisted of several iterations in which the test authors, in their review of established teacher rating scales and in the writing of new rating statements, focused on content appropriate to teacher observation. SBS items are not derived from the PIY or PIC-2. Unlike measures that provide separate parent and teacher norms for the same questionnaire items (see, e.g., the Devereux Scales of Mental Disorders; Naglieri, LeBuffe, & Pfeiffer, 1994, and chap. 15 in this volume), the SBS items demonstrate a specific school focus. Review of the SBS reveals that 58 of 102 items specifically refer to in-class or in-school behaviors and judgments that can only be made by school staff (Wingenfeld, Lachar, Gruber, & Kline, 1998).
The SBS items are profiled onto 14 scales that assess student academic status and work habits, social skills, parental participation in the educational process, and problems such as aggressive or atypical behavior and emotional stress. Norms that generate linear T-scores are gender-specific and divided into two age groups: 5 to 11 and 12 to 18 years. SBS items and their rating options appear on three pages. These items are sorted into content-meaningful dimensions and are placed under 11 scale headings to enhance the clarity of item meaning rather than being presented in a random order. The SBS consists of three sections. In the first section, the teacher selects one of five rating options (Deficient, Below Average, Average, Above Average, Superior) to describe eight areas of achievement, such as reading comprehension and mathematics, which are then summed to provide an estimate of current Academic Performance (AP). The remaining 94 items are rated on a 4-point frequency scale: Never, Seldom, Sometimes, and Usually. The second section (Academic Resources) presents positively worded statements divided into three scales. The first two of these scales consist of descriptions of positive behaviors that describe the student's adaptive behaviors: Academic Habits (AH) and Social Skills (SS). The third scale consists of ratings of parents that are very school specific. In Parent Participation (PP), the teacher is asked to judge the degree to which parents support a student's educational program. The third SBS section, Problems in Adjustment, provides seven scales that consist of negatively worded items: Health Concerns (HC), Emotional Distress (ED), Unusual Behavior (UB), Social Problems (SP), Verbal Aggression (VA), Physical Aggression


(PA), and Behavior Problems (BP). Table 14.7 provides examples of SBS scale items, scale length, and coefficient alpha based on 601 protocols from students either obtained in clinical evaluation or receiving special education services in grades K through 12. Initial item and scale performance documented that 99 of 102 items statistically separated clinical and special education protocols from SBS protocols of regular education students. All items demonstrated that they had been placed on the scale with which each obtained the largest correlation. Scale scores of regular education and referred students obtained meaningful three-factor solutions (Wingenfeld et al., 1998). Additional effort (Pisecco, Lachar, Gallen, Gruber, & Huzinec, 1998) was applied to the construction of additional 16-item nonoverlapping scales. These scales consisted of SBS items drawn from several content dimensions that were nominated as fitting into one of three DSM-IV diagnoses: Attention Deficit Hyperactivity Disorder Combined Type (9 items from AH, 4 items from BP, and one each from SS, UB, and SP), Oppositional Defiant Disorder (4 items each from ED and VA, 3 items each from SS and BP, and 2 items from SP), and Conduct Disorder (8 items from BP, 5 items from PA, and 3 items from VA). Item-to-scale correlations and a three-factor solution of these 48 SBS items empirically supported the placement of these scale items. A substantial degree of criterion validity is demonstrated when SBS scales are correlated with PIC-2 scales and subscales. Table 14.8 presents these correlations in a sample of 287 students seen for evaluation. Inspection of each table column with special attention to correlations greater than .49 reveals support for 11 of 14 SBS scales. 
TABLE 14.7
SBS Scales, Their Psychometric Characteristics, and Sample Items

Scale Name                                  Items/α   Examples
Academic Performance                        8/.88     Reading comprehension; Speech articulation
Academic Habits                             13/.92    Completes class assignments; Remembers teacher's directions
Social Skills                               8/.88     Helps other students; Participates in class activities
Parent Participation                        6/.88     Parent(s) encourage achievement; Parent(s) meet with school staff when asked
Health Concerns                             6/.87     Complains of headaches; Talks about being sick
Emotional Distress                          15/.90    Appears sad or unhappy; Mood changes without reason
Unusual Behavior                            7/.88     Says strange or bizarre things; Seems disoriented or lost
Social Problems                             12/.86    Angers other students; Teased by other students
Verbal Aggression                           7/.91     Argues and wants the last word; Threatens other students
Physical Aggression                         5/.88     Hits or pushes other students; Destroys property when angry
Behavior Problems                           15/.93    Disobeys class or school rules; Lies to school personnel
Attention Deficit Hyperactivity Disorder    16/.94    Waits for his/her turn; Talks excessively
Oppositional Defiant Disorder               16/.95    Mood changes without reason; Insults other students
Conduct Disorder                            16/.93    Swears at school personnel; Skips classes
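As a quick arithmetic check on the SBS structure, the scale lengths in Table 14.7 and the item counts given in the text for the three 16-item disruptive behavior scales can be tallied in Python. The dictionary layout and variable names below are illustrative choices only. The figures themselves are taken from the table and the text.

```python
# Lengths of the 11 content scales, as listed in Table 14.7.
SBS_SCALE_LENGTHS = {
    "AP": 8, "AH": 13, "SS": 8, "PP": 6, "HC": 6, "ED": 15,
    "UB": 7, "SP": 12, "VA": 7, "PA": 5, "BP": 15,
}

# Source scales and item counts for the three DSM-IV oriented scales,
# as reported in the text.
DISRUPTIVE_SCALES = {
    "ADHD": {"AH": 9, "BP": 4, "SS": 1, "UB": 1, "SP": 1},
    "ODD": {"ED": 4, "VA": 4, "SS": 3, "BP": 3, "SP": 2},
    "CD": {"BP": 8, "PA": 5, "VA": 3},
}

total_items = sum(SBS_SCALE_LENGTHS.values())
frequency_rated_items = total_items - SBS_SCALE_LENGTHS["AP"]
disruptive_lengths = {
    name: sum(parts.values()) for name, parts in DISRUPTIVE_SCALES.items()
}

# Nonoverlapping scales cannot draw more items from a content scale
# than that scale contains.
items_drawn = {}
for parts in DISRUPTIVE_SCALES.values():
    for source, count in parts.items():
        items_drawn[source] = items_drawn.get(source, 0) + count
within_limits = all(
    count <= SBS_SCALE_LENGTHS[source] for source, count in items_drawn.items()
)

print(total_items, frequency_rated_items, disruptive_lengths, within_limits)
```

The tallies agree with the text: 102 items in total, 94 rated on the frequency scale, and 16 items in each disruptive behavior scale (48 in all). The Behavior Problems and Verbal Aggression scales are fully used, since 4 + 3 + 8 = 15 and 4 + 3 = 7.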


Teacher-rated Academic Performance correlates substantially with all three COG dimensions; note the correlation between AP and COG2 of .56. Academic Habits and Social Skills obtain similar correlations to PIC-2 subscales, demonstrating that disruptive behavior and limited social adjustment at school correlate with noncompliant (DLQ3) and disruptive (ADH1) behaviors at home, as well as parent report of poor school performance (COG2). Specific support was also achieved for Emotional Distress (DIS2 .53) and Social Problems (SSK .50, ADH1 .50). Parent description of poorly modulated anger (DLQ2) correlated with both Physical and Verbal Aggression in the classroom, whereas ADH and other DLQ dimensions also correlated with teacher-rated Verbal Aggression. Teacher-rated Behavior Problems in students also obtained substantial correlations with undercontrolled behaviors (ADH, DLQ) as described by their mothers. Health Concerns, Unusual Behavior, and Parent Participation obtained minimal, if any, support from this analysis. Parent/youth agreement between PIC-2 and PIY Somatic

PIC-2 COG

TABLE 14.8 Correlations Between SBS and PIC-2 Scale Scores SBS Scale AP AH SS PP HC ED UB SP VA PA BP ADHD 62 56 47 39 35 44 38 32 38 52 43 47 42 38 30 37 35 30 33 44

ODD CD 43 36 38 33

COG1 COG2 56 63 52 36 34 45 42 32 52 57 47 COG3 52 ADH 56 57 34 38 31 49 59 47 62 60 60 ADH1 32 57 57 33 40 32 50 57 47 61 61 59 ADH2 32 36 44 31 44 36 40 DLQ 47 52 37 42 47 60 47 64 49 58 DLQ1 31 30 41 55 36 DLQ2 39 47 35 39 39 58 51 54 40 54 DLQ3 50 50 31 40 48 51 40 57 51 54 FAM 33 32 38 FAM1 32 37 FAM2 32 30 RLT 35 43 42 35 42 43 33 40 41 43 RLT1 39 41 39 33 41 40 31 34 39 41 RLT2 38 40 34 39 31 40 37 38 SOM SOM1 SOM2 DIS 35 41 52 41 39 35 38 33 41 DIS1 33 DIS2 38 43 53 43 38 34 39 34 42 DIS3 33 35 37 32 39 44 31 36 WDL WDL1 WDL2 33 32 30 SSK 32 41 45 41 31 50 37 33 35 40 42 SSK1 34 35 36 38 45 33 32 SSK2 39 45 35 46 41 36 37 40 43 Note. Abbreviations explained in Tables 14.1 and 14.6. Decimals and correlations


Concern performance (noted elsewhere) suggest that teachers may be less aware of these phenomena. Parent Participation is a dimension unique to the SBS, whereas Unusual Behavior may represent behaviors that are very infrequent in the classroom, even for students who receive mental health services.

Evaluation of Treatment Effectiveness

Baseline application of the PIC-2, SBS, and PIY at intake or program admission supports treatment planning by providing an efficient, comprehensive, and expeditious focus on a child's problems, which may then be placed within both a historical or developmental as well as a family systems context. Not only is FAM valuable in this context, but independent administration of the PIC-2 to each parent allows subsequent identification of problem areas with which parents do and do not agree. The provision of feedback to parents from PIC-2 profiles is quite straightforward and usually well received as these profiles summarize parent observation. Therapeutic effectiveness is documented through questionnaire readministration following intervention efforts. The focus and form of measure readministration should be guided by both the setting in which therapeutic intervention has occurred, as well as the nature of the identified problem dimensions under treatment. For example, if the problem focus is inattention and disruptive behavior primarily observed in the classroom, then repeated teacher ratings are most appropriate. On the other hand, if a child's individual psychotherapy focuses on current problems that have been demonstrated to be related to negative affect and a problematic self-concept, then repeated assessment using a content-appropriate self-report measure should be considered. Certainly the questionnaire that had documented the problems under treatment would be a most likely candidate for readministration.
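The comparison of a baseline profile with a readministered one can be sketched as below. This is an illustration under stated assumptions: the function name, the scale values, and the labels are hypothetical, and the default threshold of 10 T-score points (one standard deviation on the T-score metric, which has a mean of 50 and a standard deviation of 10) is an illustrative choice rather than a rule taken from the test manuals.

```python
from typing import Dict, Tuple


def t_score_changes(
    baseline: Dict[str, float],
    follow_up: Dict[str, float],
    threshold: float = 10.0,  # assumed: one SD on the T-score metric
) -> Dict[str, Tuple[float, str]]:
    """Label each scale as decreased, increased, or stable between two
    administrations. Scales missing from the follow-up are skipped."""
    changes = {}
    for scale, base in baseline.items():
        if scale not in follow_up:
            continue
        delta = follow_up[scale] - base
        if delta < -threshold:
            label = "decrease"
        elif delta > threshold:
            label = "increase"
        else:
            label = "stable"
        changes[scale] = (delta, label)
    return changes


# Hypothetical T-scores for two problem scales and one adaptive scale.
baseline = {"ADHD": 70, "HC": 45, "AH": 38}
three_months = {"ADHD": 55, "HC": 47, "AH": 52}
result = t_score_changes(baseline, three_months)
print(result)
```

The labels are deliberately neutral about direction of benefit: a decrease reflects improvement on a problem scale, whereas on positively worded scales such as Academic Habits an increase reflects improvement.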
Table 14.9 presents the baseline PIC-2 and SBS scale scores and subsequent SBS results 1 and 3 months post baseline for a 6-year-old boy. His pediatrician initiated this referral, questioning the presence of an early learning disability or a behavior disorder. In his kindergarten class, he was observed to have difficulty settling down, staying in his seat, paying attention, and following instructions. He was described as disruptive and seeking excessive attention; demonstrating a low frustration tolerance, he often started fights with classmates and was described as the "class clown." On the other hand, he was liked by several of his peers and performed well academically. Inconsistent with a learning disability, individual assessment of this boy revealed above-average intellectual ability and advanced academic achievement 6 to 12 months beyond age expectations. PIC-2 and SBS scale scores document disruptive and noncompliant behaviors (ADH1, DLQ2, DLQ3, FAM1, BP, ADHD) and specific problems in attention and peer relationships (AH, SP). A significant degree of internalizing symptoms (somatic response to stress, fearfulness, sleep disturbance) was also suggested (SOM1, DIS1, DIS3); these problems, however, were not elicited in an interview with this child's parents. In response to a diagnosis of ADHD combined type, his pediatrician treated him with stimulant medication. Table 14.9 presents the SBS results 1 month and 3 months following initiation of medication. The SBS was selected because his problems were observed and most problematic in the classroom. SBS scale scores clearly document both improvement in inattentive, disruptive, and uncooperative behaviors (AH, SS, SP,


VA, BP) and stability of these changes over 3 months. Of particular note is the change in the ADHD score from 68T at baseline to 48T following 3 months of treatment.

TABLE 14.9
Measuring the Effects of Therapeutic Intervention for a 6-Year-Old Boy with the Student Behavior Survey

Student Behavior Survey scale               Baseline   1 Month   3 Months
Academic Performance                           54         54        56
Academic Habits                                36         52        53
Social Skills                                  44         54        59
Parent Participation                           56         56        56
Health Concerns                                42         51        51
Emotional Distress                             42         47        42
Unusual Behavior                               43         41        41
Social Problems                                59         46        43
Verbal Aggression                              50         40        43
Physical Aggression                            43         43        43
Behavior Problems                              60         52        54
Attention Deficit Hyperactivity Disorder       68         53        48
Oppositional Defiant Disorder                  49         43        43
Conduct Disorder                               48         44        48

PIC-2 scales and subscales, baseline assessment only
COG: 50    COG1: 64   COG2: 39   COG3: 48
ADH: 73    ADH1: 75   ADH2: 56
DLQ: 66    DLQ1: 46   DLQ2: 66   DLQ3: 68
FAM: 62    FAM1: 68   FAM2: 49
RLT: 53    RLT1: 51   RLT2: 56
SOM: 63    SOM1: 76   SOM2: 42
DIS: 70    DIS1: 76   DIS2: 53   DIS3: 83
WDL: 43    WDL1: 41   WDL2: 50
SSK: 53    SSK1: 52   SSK2: 52

Note. Significant interpretive elements presented in bold.

Table 14.10 presents intake and 3-month reassessment of a 14-year-old adolescent girl referred by her parents for school refusal. Initial assessment revealed a significantly depressed and distressed adolescent in conflict with her parents. Her parents also emphasized her noncompliant and disruptive behaviors. This patient entered into cognitive behavioral therapy focused on return to school, and she also began treatment with an antidepressant. She and her mother completed a second assessment after she returned to school and began to attend consistently. Review of parent report measures reveals that five of eight short scales demonstrate significant change (greater than one standard


deviation, or 10 T-score points), although only two (WDL, SSK) present a pattern of a clinically elevated scale falling within the normal range at reassessment. Parent report documents improved social adjustment and improved affect and behavioral control. This adolescent, in contrast, was much more explicit in documenting change. In two PIY profiles in which validity scales fall within normal limits, baseline assessment demonstrated nine clinically elevated subscales that fell within normal limits at the second administration. Prominent changes were demonstrated on DIS and SOM subscales, suggesting improved mood, sleep, and somatic state. Subscales also suggest improved behavioral control, less conflict with parents (FAM1), and improved relationships with peers (SSK2).

TABLE 14.10
Measuring the Effects of Therapeutic Intervention with the Personality Inventory for Youth and PIC-2 Shortened Scales

Personality Inventory for Youth
Scale    Intake   3 Months
COG        57        47
COG1       62        42
COG2       55        50
COG3       43        43
ADH        57        47
ADH1       42        42
ADH2       55        48
ADH3       66        50
DLQ        62        55
DLQ1       65        57
DLQ2       56        52
DLQ3       60        53
FAM        67        55
FAM1       71        59
FAM2       62        54
FAM3       58        48
RLT        53        51
RLT1       56        52
RLT2       48        48
SOM        67        48
SOM1       68        46
SOM2       69        56
SOM3       52        38
DIS        75        54
DIS1       66        54
DIS2       72        53
DIS3       74        50
WDL        47        56
WDL1       49        53
WDL2       45        58
SSK        61        59
SSK1       53        59
SSK2       70        55

PIC-2 Shortened Scales
Scale    Intake   3 Months
ADH12      81        70
DLQ12      79        64
FAM12      64        68
RLT12      61        55
SOM12      83        78
DIS12      92        75
WDL12      73        50
SSK12      72        54

Note. Significant interpretive elements presented in bold.
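The change criterion applied to the parent report in this case can be sketched in a few lines of Python. This is an illustration only, not a published scoring procedure: the 10-point threshold comes from the text, the scores are the PIC-2 shortened scale values from Table 14.10, and the clinical cutoff of 60T is an assumption introduced here so that "clinically elevated" has a concrete meaning. The function name `summarize_change` is hypothetical.

```python
# Threshold from the text: change greater than one standard deviation (10 T-score points).
CHANGE_THRESHOLD = 10
# ASSUMPTION for illustration only: scores of 60T or higher count as clinically elevated.
CLINICAL_CUTOFF = 60

# PIC-2 shortened scale T-scores from Table 14.10: (intake, 3 months).
pic2_short = {
    "ADH12": (81, 70), "DLQ12": (79, 64), "FAM12": (64, 68), "RLT12": (61, 55),
    "SOM12": (83, 78), "DIS12": (92, 75), "WDL12": (73, 50), "SSK12": (72, 54),
}

def summarize_change(scores, change_threshold=CHANGE_THRESHOLD,
                     clinical_cutoff=CLINICAL_CUTOFF):
    """Return (scales with a significant drop, scales that also moved from
    the elevated range at intake to the normal range at follow-up)."""
    significant = []
    normalized = []
    for scale, (intake, followup) in scores.items():
        drop = intake - followup  # positive value means fewer reported problems
        if drop > change_threshold:
            significant.append(scale)
            if intake >= clinical_cutoff and followup < clinical_cutoff:
                normalized.append(scale)
    return significant, normalized

significant, normalized = summarize_change(pic2_short)
print(significant)  # ['ADH12', 'DLQ12', 'DIS12', 'WDL12', 'SSK12']
print(normalized)   # ['WDL12', 'SSK12']
```

Under these assumptions the sketch reproduces the pattern described for this adolescent: five of the eight short scales change by more than 10 T-score points, and only WDL and SSK fall from the elevated range into the normal range.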


Commentary

The revision of the PIC, the addition of a multidimensional teacher rating scale, and the collection of a nationally representative normative sample for each have gone a long way to respond to the concerns raised by others regarding the PIC's age (Kamphaus & Frick, 1996; Knoff, 1989; Merrell, 1994). Critical evaluations of the SBS and PIC-2 manuals and the study of their demonstrated ability to evaluate emotional adjustment at baseline and to quantify response to intervention will continue well into the next century. The emphasis on evaluating response accuracy using validity scales and the empirical determination of interpretive guidelines continues to characterize these measures. Many psychologists unconvinced of the importance of these psychometric phenomena may not value their contributions to assessment.

Although the PIC has been reduced from 420 to 275 items, into which a set of subscales and a brief 96-item form have been incorporated, some clinicians may still judge the length of these questionnaires to be problematic. Indeed, the PIC is still being described in recent publications as a 600-item inventory (Noll et al., 1997). Although this chapter is obviously biased against the view that inventory length is intrinsically a negative attribute, it is certain that the breadth and depth of a measure's content establish the potential boundaries of its utility. Even the 270 items of the PIY are easily completed in less than 45 minutes by students in the fourth grade. PIC-2, PIY, and SBS efficiency has been improved by rejecting any item not actively used in the interpretive process, as well as by providing computer software for scoring and interpretation. The value of saving 10 or 15 minutes of teacher, parent, or youth effort should be balanced against what is lost in measure reliability and in the restriction of the variety of dimensions assessed.
Because these measures are either new or revised, the approximately 400 PIC-relevant publications (bibliography available from this author) at best suggest the diagnostic potential of these forms on the dimensions that retain the greatest similarity from original to revised formats. A great deal of effort will be necessary to establish the diagnostic utility of the new and revised forms and to achieve the standards of the original inventory (Lachar & Kline, 1994). Such efforts have begun (note Tables 14.4 and 14.8 in this chapter). For example, matched samples of adolescents with discharge diagnoses of either Conduct Disorder or Major Depression were correctly classified by PIY subscales in 83% of these cases (Lachar, Harper, Green, Morgan, & Wheeler, 1996).

Conclusions

This chapter reviews the development and application of a "family" of parent, teacher, and self-report multidimensional inventories for use with school-age children and adolescents (grades K-12). These objective questionnaires integrate a variety of psychometric components that improve efficiency and facilitate inventory interpretation, such as validity scales, subscale-within-scale structure, and screening forms designed to be sensitive to treatment effects. The PIC-2, PIY, and SBS measure dimensions of internalizing and externalizing problem behaviors, family character, and cognitive ability. Each measure incorporates dimensions that are similar across informants as well as dimensions that are unique to a given informant source. These questionnaires can be applied independently or in combination. This chapter demonstrates instrument validity, applies the instruments to case studies, and illustrates the use of the PIC-2, PIY, and SBS to document treatment effects.


References

Achenbach, T.M. (1981). A junior MMPI? (Review of Multidimensional description of child personality: A manual for the Personality Inventory for Children and Actuarial assessment of child and adolescent personality: An interpretive guide for the Personality Inventory for Children profile.) Journal of Personality Assessment, 45, 332-333.
Achenbach, T.M., McConaughy, S.H., & Howell, C.T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101, 213-232.
August, G.J., Realmuto, G.M., MacDonald III, A.W., Nugent, S.M., & Crosby, R. (1996). Prevalence of ADHD and comorbid disorders among elementary school children screened for disruptive behavior. Journal of Abnormal Child Psychology, 24, 571-595.
Gdowski, C.L., Lachar, D., & Kline, R.B. (1985). A PIC profile typology of children and adolescents: I. An empirically-derived alternative to traditional diagnosis. Journal of Abnormal Psychology, 94, 346-361.
Greenbaum, P.E., Dedrick, R.F., Prange, M.E., & Friedman, R.M. (1994). Parent, teacher, and child ratings of problem behaviors of youngsters with serious emotional disturbances. Psychological Assessment, 6, 141-148.
Harrington, R.G., & Follett, G.M. (1984). The readability of child personality assessment instruments. Journal of Psychoeducational Assessment, 2, 37-48.
Jensen, P.S., Watanabe, H.K., Richters, J.E., Roper, M., Hibbs, E.D., Salzberg, A.D., & Liu, S. (1996). Scales, diagnoses, and child psychopathology: II. Comparing the CBCL and the DISC against external validators. Journal of Abnormal Child Psychology, 24, 151-168.
Kamphaus, R.W., & Frick, P.J. (1996). Clinical assessment of child and adolescent personality and behavior. Boston: Allyn & Bacon.
Kline, R.B., & Lachar, D. (1992). Evaluation of age, sex, and race bias in the Personality Inventory for Children (PIC). Psychological Assessment, 4, 333-339.
Kline, R.B., Lachar, D., & Gdowski, C.L. (1987). A PIC typology of children and adolescents: II. Classification rules and specific behavior correlates. Journal of Clinical Child Psychology, 16, 225-234.
Kline, R.B., Lachar, D., Gruber, C.P., & Boersma, D.C. (1994). Identification of special education needs with the Personality Inventory for Children (PIC): A profile-matching strategy. Assessment, 1, 301-313.
Kline, R.B., Lachar, D., & Sprague, D.J. (1985). The Personality Inventory for Children (PIC): An unbiased predictor of cognitive and academic status. Journal of Pediatric Psychology, 10, 461-477.
Knoff, H.M. (1989). Review of the Personality Inventory for Children, Revised Format. In J.C. Conoley & J.J. Kramer (Eds.), The tenth mental measurements yearbook (pp. 624-630). Lincoln, NE: Buros Institute of Mental Measurements.
Lachar, D. (1982). Personality Inventory for Children (PIC) Revised Format manual supplement. Los Angeles: Western Psychological Services.
Lachar, D. (1998). Observations of parents, teachers, and children: Contributions to the objective multidimensional assessment of youth. In A.S. Bellack & M. Hersen (Series Eds.) & C.R. Reynolds (Vol. Ed.), Comprehensive clinical psychology: Vol. 4. Assessment (pp. 371-401). New York: Pergamon.
Lachar, D. (1999). Personality Inventory for Children-2 (PIC-2) manual. Los Angeles: Western Psychological Services.
Lachar, D., & Gdowski, C.L. (1979). Actuarial assessment of child and adolescent personality: An interpretive guide for the Personality Inventory for Children profile. Los Angeles: Western Psychological Services.
Lachar, D., Gdowski, C.L., & Snyder, D.K. (1982). Broad-band dimensions of psychopathology: Factor scales for the Personality Inventory for Children. Journal of Consulting and Clinical Psychology, 50, 634-642.
Lachar, D., & Gruber, C.P. (1993). Development of the Personality Inventory for Youth: A self-report companion to the Personality Inventory for Children. Journal of Personality Assessment, 61, 81-98.
Lachar, D., & Gruber, C.P. (1995a). Personality Inventory for Youth (PIY) manual: Administration and interpretation guide. Los Angeles: Western Psychological Services.


Lachar, D., & Gruber, C.P. (1995b). Personality Inventory for Youth (PIY) manual: Technical guide. Los Angeles: Western Psychological Services.
Lachar, D., Harper, R.A., Green, B.A., Morgan, S.T., & Wheeler, A.C. (1996, August). The Personality Inventory for Youth: Contribution to diagnosis. Paper presented at the 104th Annual Convention, American Psychological Association, Toronto, Canada.
Lachar, D., & Kline, R.B. (1994). The Personality Inventory for Children (PIC) and the Personality Inventory for Youth (PIY). In M. Maruish (Ed.), Use of psychological testing for treatment planning and outcome assessment (pp. 479-516). Hillsdale, NJ: Lawrence Erlbaum Associates.
Lachar, D., Kline, R.B., & Gdowski, C.L. (1987). Respondent psychopathology and interpretive accuracy of the Personality Inventory for Children: The evaluation of a "most reasonable" assumption. Journal of Personality Assessment, 51, 165-177.
Lachar, D., Kline, R.B., Green, B.A., & Gruber, C.P. (1996, August). Contribution of self-report to PIC profile type interpretation. Paper presented at the 104th Annual Convention, American Psychological Association, Toronto, Canada.
Lachar, D., Kline, R.B., Wingenfeld, S.A., & Gruber, C.P. (1999). Student Behavior Survey (SBS) manual. Los Angeles: Western Psychological Services.
LaCombe, J.A., Kline, R.B., Lachar, D., Butkus, M., & Hillman, S.B. (1991). Case history correlates of a Personality Inventory for Children (PIC) profile typology. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 13, 1-14.
Lanyon, R.I. (1997). Detecting deception: Current models and directions. Clinical Psychology: Science and Practice, 4, 377-387.
Loeber, R., Green, S.M., & Lahey, B.B. (1990). Mental health professionals' perception of the utility of children, mothers, and teachers as informants on childhood psychopathology. Journal of Clinical Child Psychology, 19, 136-143.
Lonigan, C.J., Carey, M.P., & Finch, A.J., Jr. (1994). Anxiety and depression in children and adolescents: Negative affectivity and the utility of self-reports. Journal of Consulting and Clinical Psychology, 62, 1000-1008.
Merrell, K.W. (1994). Assessment of behavioral, social, and emotional problems: Direct and objective methods for use with children and adolescents. New York: Longman.
Michael, K.D., & Merrell, K.W. (1998). Reliability of children's self-reported internalizing symptoms over short to medium-length time intervals. Journal of the American Academy of Child and Adolescent Psychiatry, 37, 194-201.
Naglieri, J.A., LeBuffe, P.A., & Pfeiffer, S.I. (1994). Devereux Scales of Mental Disorders manual. San Antonio, TX: The Psychological Corporation.
Noll, R.B., Stehbens, J.A., MacLean, W.E., Jr., Waskerwitz, M.J., Whitt, J.K., Ruymann, F.B., Kaleita, T.A., & Hammond, G.D. (1997). Behavioral adjustment and social functioning of long-term survivors of childhood leukemia: Parent and teacher reports. Journal of Pediatric Psychology, 22, 827-841.
Phares, V. (1997). Accuracy of informants: Do parents think that mother knows best? Journal of Abnormal Child Psychology, 25, 165-171.
Pisecco, S., Lachar, D., Gallen, R.T., Gruber, C.P., & Huzinec, C. (1998). Development of disruptive behavior DSM-IV scales from teacher ratings.
Richters, J.E. (1992). Depressed mothers as informants about their children: A critical review of the evidence for distortion. Psychological Bulletin, 112, 485-499.
Tellegen, A. (1988). The analysis of consistency in personality assessment. Journal of Personality, 56, 621-663.
Vaughn, M.L., Riccio, C.A., Hynd, G.W., & Hall, J. (1997). Diagnosing ADHD (predominantly inattentive and combined subtypes): Discriminant validity of the Behavior Assessment System for Children and the Achenbach parent and teacher rating scales. Journal of Clinical Child Psychology, 26, 349-357.
Wingenfeld, S.A., Lachar, D., Gruber, C.P., & Kline, R.B. (1998). Development of the teacher-informant Student Behavior Survey. Journal of Psychoeducational Assessment, 16, 226-249.


For Abby, Katie, and Shelby


Chapter 15
The Child Behavior Checklist and Related Instruments

Thomas M. Achenbach
University of Vermont

The Child Behavior Checklist (CBCL) is one of a family of instruments designed to obtain data on behavioral/emotional problems and competencies. These instruments represent an approach called multiaxial empirically based assessment. Unlike the multiaxial aspects of the American Psychiatric Association's (1994) Diagnostic and Statistical Manual (DSM), this approach is "multiaxial" in the sense that it focuses on assessment data obtained from multiple sources. In the clinical assessment of most children, data from parents, teachers, standardized tests, physical examinations, and direct assessment of the child are relevant, as summarized in terms of the five axes shown in Table 15.1. This chapter focuses on standardized data obtained from parents (Axis I), teachers (Axis II), and direct assessment of the child, including observations, interviews, and self-reports (Axis V).

Meta-analyses of many studies using many different instruments have shown low to moderate correlations among different sources of data concerning children's behavioral/emotional problems (Achenbach, McConaughy, & Howell, 1987). Because children's behavior varies between contexts (e.g., home and school), and between different interaction partners (e.g., mothers, fathers, and teachers), no single source of data can serve as a gold standard. Instead, comprehensive assessment of children requires data from multiple sources.

This approach is "empirically based" in the sense that both the assessment data and the derivation of syndromes and scales are empirical, rather than being based on a priori assumptions about the nature of psychopathology or on theoretical inferences about underlying determinants. The empirical derivation of syndromes is especially important, because there has been no well-validated taxonomy of child and adolescent behavioral/emotional disorders.
The approach to assessing behavioral/emotional problems began with gathering numerous candidate items designed to be rated by particular types of informants, such as parents, teachers, youths, direct observers, or clinical interviewers. The item pools were assembled from reviews of the literature and our own research, as well as from suggestions by mental health professionals and by the informants for whom the instruments were designed. Each draft instrument was tested and


revised over multiple pilot editions that were filled out and critiqued by the relevant informants for real cases. After the items were finalized for a particular instrument, the instrument was used to assess a large number of children who had been referred for professional help with behavioral/emotional problems. The referred children were drawn from diverse mental health and special educational settings in order to avoid the biases inherent in the caseloads of individual settings.

TABLE 15.1
Examples of Multiaxial Assessment

Ages 2-5
  Axis I, Parent Reports: CBCL/2-3; CBCL/4-18; history; parent interview
  Axis II, Teacher Reports: C-TRF/2-5 (a); preschool records; teacher interview
  Axis III, Cognitive Assessment: ability tests; perceptual-motor tests; language tests
  Axis IV, Physical Assessment: height, weight; medical exam; neurological exam
  Axis V, Direct Assessment of Subject: observations during play; interview

Ages 6-11
  Axis I, Parent Reports: CBCL/4-18; history; parent interview
  Axis II, Teacher Reports: TRF (b); school records; teacher interview
  Axis III, Cognitive Assessment: ability tests; achievement tests; perceptual-motor tests; language tests
  Axis IV, Physical Assessment: height, weight; medical exam; neurological exam
  Axis V, Direct Assessment of Subject: DOF (c); SCICA (d)

Ages 12-18
  Axis I, Parent Reports: CBCL/4-18; history; parent interview
  Axis II, Teacher Reports: TRF; school records; teacher interview
  Axis III, Cognitive Assessment: ability tests; achievement tests; language tests
  Axis IV, Physical Assessment: height, weight; medical exam; neurological exam
  Axis V, Direct Assessment of Subject: DOF; YSR (e); SCICA; personality tests

Ages 18-30
  Axis I, Parent Reports: YABCL (f)
  Axis III, Cognitive Assessment: ability tests
  Axis IV, Physical Assessment: medical exam; neurological exam
  Axis V, Direct Assessment of Subject: YASR (g); clinical interview; personality tests

(a) Caregiver-Teacher Report Form for Ages 2-5 (Achenbach, 1997a).
(b) Teacher's Report Form (Achenbach, 1991c).
(c) Direct Observation Form (Achenbach, 1991b).
(d) Semistructured Clinical Interview for Children and Adolescents (McConaughy & Achenbach, 1994).
(e) Youth Self-Report (Achenbach, 1991d).
(f) Young Adult Behavior Checklist (Achenbach, 1997b).
(g) Young Adult Self-Report (Achenbach, 1997b).

Within samples of children of each sex in particular age ranges, correlations were computed among the problem items scored on a particular instrument. To identify syndromes of problems that tend to co-occur, principal components analyses were performed on the correlations among problems found in each sample. (The term syndrome is used in the generic sense of problems that tend to occur together. It does not imply any assumptions about the etiology or diagnosis of disorders.) In order to identify syndromes that were statistically robust, varimax rotations were performed on different numbers of principal components from each analysis. Sets of items that remained together throughout multiple rotations were retained as a basis for syndromes characterizing children of a particular age and sex.

Historically, this research program began with data derived from child psychiatric case histories (Achenbach, 1966). Thereafter, it led to the development of rating forms to be filled out by each type of informant, including parents, teachers, youths, observers, and interviewers. In the initial derivation of syndromes, each syndrome was based entirely on the items found to occur together in the analyses of either boys or girls in a particular age range scored on a particular instrument (Achenbach, 1978; Achenbach & Edelbrock, 1983, 1986, 1987).
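The retention rule described above, in which only sets of items that stay together across rotations of different numbers of components are kept, can be illustrated with a small sketch. It is not the procedure actually used to derive the syndromes, and it rests on assumptions made here for illustration: each rotated solution is represented simply as a list of item sets (one set per component, an item belonging to the component on which it loads most highly), the item labels are hypothetical, and the function name `stable_item_sets` is invented.

```python
from collections import defaultdict

def stable_item_sets(solutions, min_items=2):
    """Return groups of items that share a component in every solution.

    `solutions` is a list of rotated solutions; each solution is a list of
    sets, one set of item labels per component (an assumed representation).
    """
    all_items = set()
    for solution in solutions:
        for component in solution:
            all_items |= component

    groups = defaultdict(set)
    for item in all_items:
        # Signature: which component the item belongs to in each solution.
        signature = []
        for solution in solutions:
            index = next(
                (i for i, component in enumerate(solution) if item in component),
                None,
            )
            signature.append(index)
        # Items missing from any solution cannot be part of a stable set.
        if None not in signature:
            groups[tuple(signature)].add(item)

    return sorted(sorted(group) for group in groups.values()
                  if len(group) >= min_items)

# Hypothetical item assignments from rotations of 2, 3, and 4 components.
solutions = [
    [{"argues", "fights", "temper"}, {"worries", "fearful", "cries"}],
    [{"argues", "fights", "temper"}, {"worries", "fearful"}, {"cries", "lonely"}],
    [{"argues", "fights"}, {"temper", "cries"}, {"worries", "fearful"}, {"lonely"}],
]
print(stable_item_sets(solutions))
# [['argues', 'fights'], ['fearful', 'worries']]
```

In this toy example only two pairs of items remain together in all three solutions, so only they would be carried forward as the core of a candidate syndrome.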
Profiles were constructed for scoring the syndromes obtained for each sex in each age range on each instrument in relation to normative samples of the same sex and age. Some of the syndromes had counterparts in virtually all analyses. Versions of a syndrome designated as Aggressive Behavior, for example, were found for both sexes in all age ranges, as rated by each type of informant. Other syndromes were more variable, being found for only one sex, limited age ranges, or particular informants.

As the use of these instruments spread, clinicians and researchers increasingly sought to compare findings for children of both sexes and different ages, rated by different informants. They also sought to compare findings for the same child rated at different ages and/or by different informants. The variations in syndromes by sex, age, and informant impeded such comparisons.

Cross-Informant Syndromes Derived from the CBCL/4-18, TRF, and YSR

In 1991, major revisions were made in the profiles for scoring parents' reports on the Child Behavior Checklist for Ages 4-18 (CBCL/4-18), teachers' reports on the Teacher's Report Form for Ages 5-18 (TRF), and self-reports on the Youth Self-Report for Ages 11-18 (YSR). The profiles are designed to score all three instruments in terms of eight cross-informant syndrome constructs that have been derived from data on both sexes in multiple age ranges, rated by different informants (Achenbach, 1991b, 1991c, 1991d). The cross-informant syndrome constructs are defined by sets of problems that were found to co-occur in a majority of samples of boys and girls of different ages, as rated by different types of informants. The problems are assessed by having the informants


rate items such as Cruel to animals and Nightmares on three-step scales where 0 = "Not true (as far as you know)," 1 = "Somewhat or sometimes true," and 2 = "Very true or often true." The eight syndrome constructs that met the criteria for robustness across sex, age, and informant are listed in Table 15.2. The Internalizing and Externalizing groupings of syndromes shown in Table 15.2 were formed from second-order factor analyses of scores on the eight syndromes (Achenbach, 1991a). Children can be scored on scales comprising all the Internalizing items and all the Externalizing items, as well as on the syndrome scales and a scale of total problem scores.

TABLE 15.2
Cross-Informant Syndrome Constructs Scored from the CBCL/4-18, TRF, and YSR

Internalizing         Neither Internalizing nor Externalizing   Externalizing
Withdrawn             Social Problems                            Delinquent Behavior
Somatic Complaints    Thought Problems                           Aggressive Behavior
Anxious/Depressed     Attention Problems

Profiles for Scoring the CBCL/4-18, TRF, and YSR

The profiles for the CBCL/4-18, TRF, and YSR all include scales for scoring the eight cross-informant syndrome constructs. In addition, the profiles preserve important sex, age, and informant variations. This is done in three ways. First, items that were strongly associated with a cross-informant syndrome in ratings by only one type of informant are included in the scale for scoring that syndrome from that type of informant. For example, an item concerning suicidal thoughts was strongly associated with the Anxious/Depressed syndrome only in self-ratings by youths on the YSR. Because this item was not strongly associated with the Anxious/Depressed syndrome in parents' ratings on the CBCL/4-18 or teachers' ratings on the TRF, it is not included in the cross-informant construct for the Anxious/Depressed syndrome, nor in the CBCL/4-18 and TRF scales for scoring this syndrome. However, it is included in the scale for scoring the Anxious/Depressed syndrome on the YSR profile.
A second way in which sex, age, and informant variations are preserved is that syndromes found only in analyses of a particular group rated by a particular type of informant are scored for that group and type of informant. For example, in boys' YSR ratings, a syndrome was found that was designated as Self-Destructive/Identity Problems. There was no counterpart of this syndrome in girls' YSR ratings, nor in parent or teacher ratings. To preserve this syndrome that is specific to boys' self-ratings, it is scored on the YSR profile for boys. A second example is a syndrome designated as Sex Problems that was found in parents' ratings of 4- to 11-year-old boys and girls, but not in parents' ratings of 12- to 18-year-olds, nor in teacher ratings or self-ratings. The Sex Problems syndrome is therefore scored from the CBCL for both sexes at ages 4 to 11, but not from the CBCL at ages 12 to 18, or from the TRF or YSR.

A third way in which sex, age, and informant variations are preserved is by providing norms for each syndrome scale based on children of a particular sex in a particular age range rated by a particular type of informant. The normative samples of each sex and


type of informant were drawn from a nationally representative sample of children that excluded those who had received mental health services or special remedial school classes in the preceding 12 months. A child's standing on each syndrome is displayed in terms of percentiles and T-scores derived from a normative sample of nonreferred children of the same sex and age range, rated by the same type of informant. Figure 15.1 displays a computer-scored version of the CBCL profile for 15-year-old Ginny. Percentiles of the normative sample for the child's sex and age are indicated on the left side of the profile, and T-scores are indicated on the right side. The raw scale score and T-score for each syndrome scale are printed below the item scores for the scale.

Fig. 15.1. Computer-scored CBCL problem profile for 15-year-old Ginny. From Empirically Based Taxonomy: How to Use Syndromes and Profile Types Derived From the CBCL/4-18, TRF, and YSR (p. 45) by T.M. Achenbach, 1993, Burlington, Vermont: University of Vermont. Copyright © 1993 by T.M. Achenbach. Reprinted by permission.
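The relation between item ratings and a raw scale score, including the first mechanism for preserving informant variations, can be sketched as follows. This is a minimal illustration under stated assumptions: a raw scale score is taken to be the sum of the 0-1-2 ratings on the scale's items, the item labels are hypothetical stand-ins rather than actual CBCL, TRF, or YSR items, and the names `CROSS_INFORMANT_ITEMS`, `INFORMANT_SPECIFIC_ITEMS`, and `raw_scale_score` are invented for the example. Conversion of raw scores to percentiles and T-scores requires the published norms and is not attempted here.

```python
# Hypothetical item labels for illustration; the actual item sets are defined
# by the published CBCL/4-18, TRF, and YSR scoring profiles.
CROSS_INFORMANT_ITEMS = {
    "Anxious/Depressed": ["cries", "fearful", "worries"],
}
# Items scored for a syndrome only on one informant's profile, such as the
# suicidal-thoughts item that is scored only on the YSR.
INFORMANT_SPECIFIC_ITEMS = {
    ("Anxious/Depressed", "YSR"): ["suicidal_thoughts"],
}
VALID_RATINGS = (0, 1, 2)  # not true / somewhat or sometimes true / very or often true

def raw_scale_score(ratings, syndrome, informant):
    """Sum the 0-1-2 ratings of the items scored for this syndrome and informant."""
    items = list(CROSS_INFORMANT_ITEMS[syndrome])
    items += INFORMANT_SPECIFIC_ITEMS.get((syndrome, informant), [])
    total = 0
    for item in items:
        rating = ratings[item]
        if rating not in VALID_RATINGS:
            raise ValueError(f"rating for {item!r} must be 0, 1, or 2")
        total += rating
    return total

ratings = {"cries": 1, "fearful": 2, "worries": 2, "suicidal_thoughts": 1}
print(raw_scale_score(ratings, "Anxious/Depressed", "CBCL"))  # 5
print(raw_scale_score(ratings, "Anxious/Depressed", "YSR"))   # 6
```

The same set of ratings yields a higher raw score on the YSR version of the scale because the informant-specific item is counted there and nowhere else.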


The broken lines across the profile at T-scores of 67 and 70 demarcate a borderline clinical range. Syndrome T-scores below 67 are in the normal range, whereas syndrome T-scores above 70 are in the clinical range. The borderline range is included in the profile to emphasize the quantitative nature of a child's standing on each scale. Rather than indicating whether a child is "sick" or "well," the scales indicate how high the child scores relative to normative samples of peers rated by the same type of informant. For the Internalizing, Externalizing, and total problem scores, which encompass broader ranges of problems than the individual syndrome scales, the borderline clinical range is defined by T-scores from 60 to 63. Table 15.3 summarizes the clinical samples from which the scales were derived and the normative samples on which the percentiles and T-scores are based.

TABLE 15.3
Derivation Samples for CBCL/4-18, TRF, and YSR Syndromes and Norms (a)

              Syndrome Derivation                                     Normative Sample
Instrument    N       Sources                                         N       Sources
CBCL/4-18     4,455   52 mental health services                       2,368   Parents in national home interview survey
TRF/5-18      2,815   58 mental health & special education services   1,391   Teachers of children in national survey
YSR/11-18     1,272   26 mental health services                       1,315   Youths in national home interview survey

(a) Children were drawn from a sample selected to be representative of the 48 contiguous states with respect to socioeconomic status (SES), ethnicity, region, and urban-suburban-rural residence. Children were excluded if they had disabilities, had received mental health services or special remedial school classes in the previous 12 months, or lacked an English-speaking parent or parent surrogate. Details of the samples, SES, ethnicity, region, and informants are presented by Achenbach (1991b, 1991c, 1991d) and McConaughy, Stanger, and Achenbach (1992).
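The cutoffs just described translate directly into a small classification rule. The sketch below is illustrative only. The syndrome-scale boundaries (67 and 70) and the borderline range for the broad scales (60 to 63) come from the text; treating broad-scale scores below 60 as normal and above 63 as clinical is an inference made here, and the function name `classify_t_score` is hypothetical.

```python
def classify_t_score(t_score, broad_scale=False):
    """Classify a T-score as normal, borderline clinical, or clinical.

    broad_scale=False: one of the eight syndrome scales (borderline 67-70).
    broad_scale=True: Internalizing, Externalizing, or total problems
    (borderline 60-63; normal below and clinical above are assumed).
    """
    low, high = (60, 63) if broad_scale else (67, 70)
    if t_score < low:
        return "normal"
    if t_score <= high:
        return "borderline clinical"
    return "clinical"

print(classify_t_score(66))                    # normal
print(classify_t_score(68))                    # borderline clinical
print(classify_t_score(71))                    # clinical
print(classify_t_score(61, broad_scale=True))  # borderline clinical
```

Keeping the three-way label, rather than a yes/no flag, follows the chapter's point that the scales describe a child's quantitative standing relative to peers and do not sort children into "sick" and "well."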
Coordinating Parent, Teacher, and Self-Ratings

To facilitate comparisons among parent, teacher, and self-ratings for clinical or research purposes, the problem portions of the CBCL/4-18, TRF, and YSR profiles are laid out in a uniform format, as illustrated in Fig. 15.1. Thus, if ratings are obtained on one or more CBCLs, one or more TRFs, and the YSR, profiles from each of them can be directly compared to identify syndrome scales that show agreement or disagreement among the informants. This can be done using either hand-scored profiles or profiles scored from computer programs that are available from the author of this chapter. A cross-informant program offering the following features is available:

1. The user can enter data from any combination of CBCLs, TRFs, and YSRs for the same child.
2. Profiles can be scored and printed from each informant.
3. The scores obtained from all informants for 89 items common to the three instruments can be displayed side by side.
4. T-scores obtained from all informants for the eight syndrome scales, Internalizing, Externalizing, and total problems can be displayed side by side.


5. Q-correlations are printed between the 89 item scores and between the 8 syndrome scale scores from all combinations of informants who have rated the child. Q-correlations are also printed from large reference samples of informants.
6. All the raw scores, T-scores, and Q-correlations can be stored and used as input for statistical analyses.

The computer program also computes intraclass correlations between a child's profile and profile types identified via cluster analyses (Achenbach, 1993).

Figure 15.2 illustrates the portion of the cross-informant printout that displays Ginny's T-scores from ratings on CBCLs completed by both her parents, the YSR completed by Ginny, and TRFs completed by two teachers. The printout provides side-by-side listings of the T-scores for the eight syndrome scales common to all three instruments, Internalizing, Externalizing, and total problems. It prints a single cross beside each score that is in the borderline clinical range and two crosses beside each score that is in the clinical range.

To provide a quantitative index of agreement between pairs of informants, the program computes Q-correlations between the item scores obtained from the informants and also computes Q-correlations between the eight syndrome T-scores obtained from pairs of informants. The Q-correlations are obtained by using the Pearson product-moment formula to compute the association between a set of scores obtained from one informant and the corresponding set of scores obtained from a second informant. For example, the scores obtained from Ginny's YSR on the 89 items common to all three instruments are correlated with the scores obtained from her mother's CBCL on the same 89 items. Similarly, the T-scores obtained from Ginny's YSR on the eight syndrome scales are

Fig. 15.2. Cross-informant printout of scores obtained by 15-year-old Ginny. Copyright © 1993 by T.M. Achenbach. Reprinted by permission.


correlated with the T-scores obtained from her mother's CBCL on the eight syndrome scales. As Figure 15.2 shows for the syndrome scales, the Q-correlations between informants are printed beneath the side-by-side listings of the syndrome scale scores. In addition, to provide a basis for evaluating the level of agreement for particular pairs of informants, the program prints out the 25th percentile, mean, and 75th percentile Q-correlation obtained in large reference samples of similar pairs of informants. As shown in Fig. 15.2, Q-correlations that are below the 25th percentile are considered to indicate below average agreement, those between the 25th and 75th percentiles are considered to be in the average range, and those above the 75th percentile are considered to be above average.

For both clinical and research purposes, the cross-informant program thus enables users to identify cases for which agreement between particular pairs of informants is low, average, or high. It also enables users to identify individual informants whose reports are especially discrepant from reports by other informants about the same child. Unless their reports can be substantiated, informants whose reports are discrepant in important ways from those of all the other informants may be targeted for interventions to change their perceptions of the child's behavior or their own behavior toward the child.

Competencies Scored from the CBCL/4-18, TRF, and YSR

A key goal of the multiaxial empirically based approach has been to derive syndromal constructs as a basis for a taxonomy of child and adolescent disorders that can be assessed from multiple sources of data. However, children's need for help may depend not only on the problems they manifest, but also on their competencies or lack thereof.
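As a brief aside on the Q-correlations described earlier in this section, the computation can be sketched in a few lines. This is an illustration under stated assumptions rather than the cross-informant program itself: the function names `q_correlation` and `agreement_level` are hypothetical, the example scores are invented, and the reference percentiles used in the example are placeholders, not published values.

```python
from math import sqrt


def q_correlation(scores_a, scores_b):
    """Pearson product-moment correlation between two informants' scores
    on the same set of items or syndrome scales."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 entries")
    n = len(scores_a)
    mean_a = sum(scores_a) / n
    mean_b = sum(scores_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    var_a = sum((a - mean_a) ** 2 for a in scores_a)
    var_b = sum((b - mean_b) ** 2 for b in scores_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when one informant's scores are constant")
    return cov / sqrt(var_a * var_b)


def agreement_level(q, pct25, pct75):
    """Label agreement relative to reference-sample percentiles."""
    if q < pct25:
        return "below average"
    if q > pct75:
        return "above average"
    return "average"


# Hypothetical T-scores on the eight syndrome scales from two informants.
youth = [70, 66, 58, 62, 55, 73, 60, 64]
mother = [68, 61, 57, 65, 52, 75, 63, 66]
q = q_correlation(youth, mother)
# Placeholder reference percentiles; real values come from the reference samples.
label = agreement_level(q, pct25=0.20, pct75=0.60)
```

Because the correlation is computed across scales within one child, it indexes similarity in the shape of the two informants' profiles rather than agreement on overall elevation.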
A child who has a high score on the Attention Problems syndrome, but who has good social skills, an IQ of 130, and strong academic interests, for example, may not need the same help as a child with the same score on the Attention Problems syndrome but poor social skills, an IQ of 90, and no interest in school. To combine the assessment of competencies with the assessment of behavioral/emotional problems, the CBCL/4-18 and YSR include items for tapping the amount and quality of involvement in sports, nonsports activities, organizations, jobs and chores, social relationships, and school performance. The TRF has items for assessing academic performance in terms of grade level, plus adaptive functioning in terms of how hard children are working, how appropriately they are behaving, how much they are learning, and how happy they are. These items are scored on scales that compare children's standing with scores obtained by the same national samples as were used to norm the problem scales. As shown on the hand-scored version of Ginny's profile in Fig. 15.3, the competence scales scored from the CBCL/4-18 are designated as Activities, Social, and School. A total competence score is computed by summing the three scale scores. Like the problem portion of the profile, the competence portion displays percentiles on the left and T-scores on the right that are based on a national normative sample. The broken lines printed across the profile demarcate the borderline clinical range. However, unlike the problem scales, scores below the bottom broken line (indicating a lack of competencies) are in the clinical range. Scores above the top broken line (indicating more competencies)


Fig. 15.3. Hand-scored CBCL competence profile for 15-year-old Ginny. Copyright © 1993 by T.M. Achenbach. Reprinted by permission.


are in the normal range. Analogous profiles are provided for scoring competencies from the YSR and adaptive functioning from the TRF.

Other Empirically Based Instruments

The CBCL/4-18, TRF, and YSR are designed to obtain similar types of data in a similar format from parents, teachers, and youths at ages where two or all three types of informants are relevant. This empirically based approach has also been used to develop analogous assessment instruments for preschoolers and adults. In addition, the approach has been extended to observational assessment of children in group settings, such as classrooms, and in individual clinical interviews.

The Child Behavior Checklist for Ages 2-3 (CBCL/2-3)

There has been relatively little research on behavioral/emotional problems occurring prior to age 4. Neither clinical assessment procedures nor nosologies, such as the American Psychiatric Association's (1994) Diagnostic and Statistical Manual (DSM), provide differentiated pictures of disorders among toddlers. To extend empirically based assessment to younger ages, the CBCL/2-3 was developed. The CBCL/2-3 has 99 problem items plus an open-ended item for additional problems that are scored on three-step scales like those for the CBCL/4-18, TRF, and YSR. Fifty-nine of the items have counterparts on the CBCL/4-18.

Other than developmental milestones, it is difficult to specify what characteristics should be considered to represent social competence among toddlers. Consequently, the CBCL/2-3 does not include competence items. However, it does have an open-ended item for describing the best things about the child, as do the CBCL/4-18, TRF, and YSR.

The profile for scoring the CBCL/2-3 (Achenbach, 1992) resembles the profile shown in Fig. 15.1 for the CBCL/4-18. It provides percentiles and normalized T-scores based on a randomly selected general population sample, and it demarcates a borderline clinical range from T-scores of 67 to 70 on the syndrome scales.
The syndromes were derived from principal components/varimax analyses of 2- and 3-year-olds who had been referred for mental health services or who scored in the top 50% of nonreferred children, as summarized in Table 15.4. Six syndromes were identified. Those designated as Anxious/Depressed, Withdrawn, Somatic Problems, and Aggressive Behavior have approximate counterparts in the cross-informant syndromes identified for older ages. The remaining CBCL/2-3 syndromes, designated as Sleep Problems and Destructive Behavior, do not have clear counterparts among the syndromes identified for older ages. Second-order factor analyses yielded an Internalizing grouping consisting of the Anxious/Depressed and Withdrawn syndromes and an Externalizing grouping consisting of the Aggressive Behavior and Destructive Behavior syndromes. The computer scoring program provides comparisons of multiple CBCL/2-3 forms for each child.

The Caregiver-Teacher Report Form for Ages 2-5 (C-TRF/2-5)

The C-TRF/2-5 is designed to assess young children on the basis of data obtained from day-care providers and preschool teachers (Achenbach, 1997a). Like the CBCL/2-3, the C-TRF/2-5 has 99 problem items, plus open-ended items for additional problems and


TABLE 15.4
Derivation Samples for CBCL/2-3, C-TRF/2-5, DOF, SCICA, YABCL, and YASR Syndromes and Norms

Instrument | Syndrome derivation N | Syndrome derivation sources | Normative N | Normative sources
CBCL/2-3 | 546 | 7 mental health & special education services, plus children having high problem scores in other samples | 368 | Home interview surveys of parents in national sample & Worcester, MA area
C-TRF/2-5 | 1,001 | Children referred for services or obtaining high scores in 26 day-care & preschool settings | 1,075 | Day-care & preschool settings at 15 sites
DOF | 212 | Children referred for mental health or school psychological services from 45 public & parochial schools | 287 | Classroom control children in 45 public & parochial schools
SCICA | 168 | Children referred to outpatient psychiatry service or school psychologists | 237 | Children referred to outpatient psychiatry service or school psychologists
YABCL | 1,532 | Young adults seen in 12 mental health settings or obtaining high scores in national sample | 1,074 | National sample
YASR | 1,455 | Same as YABCL | 1,058 | Same as YABCL

for describing the best things about the child. However, 17 of the C-TRF/2-5 problem items differ from those of the CBCL/2-3 in being oriented to day care and preschool rather than family contexts. Seven syndromes were derived from principal components/varimax analyses of the C-TRF/2-5 completed for 1,001 children who were seen in 26 day-care and preschool settings. Syndromes designated as Somatic Problems, Attention Problems, and Aggressive Behavior have clear counterparts among the syndromes derived from the other instruments. Syndromes designated as Anxious/Obsessive and Depressed/Withdrawn have approximate counterparts in the Anxious/Depressed and Withdrawn syndromes derived from the other instruments. However, syndromes designated as Fears and Immature embody patterns that may be rather specific to day-care and preschool settings. The C-TRF/2-5 is normed on 1,075 children assessed at 15 sites.
The computer program for scoring the C-TRF/2-5 can compare multiple C-TRF forms for each child.

The Direct Observation Form (DOF)

The DOF was developed to apply the empirically based approach to the assessment of problem behavior observed in school classrooms and other group settings. The DOF has 96 problem items, plus an open-ended item for entering additional problems. The items are designed to capture problem behavior that can be observed in 10-minute observational samples. Seventy-two of the problem items have counterparts on the CBCL/4-18, whereas 85 have counterparts on the TRF. The DOF also records whether the child is on task or not on task at the end of each minute of observation. The on-task scores are summed to provide a score of 0 to 10 for each 10-minute observational session.
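The on-task tally just described is simple arithmetic, sketched below under stated assumptions. The function names `on_task_score` and `mean_on_task` are hypothetical and the observation data are invented; the averaging step reflects the chapter's later recommendation to average scores over three to six occasions, and is not a reproduction of the DOF scoring program.

```python
def on_task_score(minute_checks):
    """Sum ten end-of-minute on-task checks (True = on task) into a 0-10 score."""
    if len(minute_checks) != 10:
        raise ValueError("a 10-minute session needs exactly 10 end-of-minute checks")
    return sum(1 for on_task in minute_checks if on_task)


def mean_on_task(sessions):
    """Average the 0-10 on-task scores over several observational sessions."""
    if not sessions:
        raise ValueError("at least one session is required")
    scores = [on_task_score(checks) for checks in sessions]
    return sum(scores) / len(scores)


# Hypothetical checks for a target child over three 10-minute sessions.
target_sessions = [
    [True, True, False, True, False, False, True, True, False, True],
    [True, False, False, True, True, False, False, True, False, False],
    [True, True, True, False, True, True, False, True, True, False],
]
target_mean = mean_on_task(target_sessions)
```

The same two functions could be applied to the control children's sessions so that target and control means are compared on the same 0 to 10 metric.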


To provide a record of what is seen in each observational session, the observer writes a narrative description of the child's behavior and interactions over a 10-minute interval. The narrative is written in a space provided on the DOF near the list of items to be rated. To take account of the context, the behavior of others toward the child, and characteristics of the child that might not be captured by precoded items, the observer describes the actual stream of behavior, including the events impinging on the child. At the end of each 10 minutes of observation, the observer scores the items on a 4-step scale ranging from 0 = "not observed" to 3 = "definite occurrence with severe intensity or greater than 3 minutes duration." To obtain a stable index of problems and on-task behavior, 10-minute samples of behavior should be obtained on three to six occasions. Scores are then averaged over these occasions. To provide a baseline for the behavior of peers in the same context, it is recommended that the DOF be completed for one "control" child of the same sex observed just before the target child and a second control child of the same sex observed just after the target child. To facilitate comparisons of the target and control children across multiple occasions, the scoring program for the DOF prints out mean scores for the target and control children averaged over up to six occasions. The program displays these comparisons for on-task behavior, six empirically derived syndromes, Internalizing, Externalizing, and total problem scores. The following syndromes were derived from principal components/varimax analyses of 212 clinically referred 5- to 14-year-old children: Withdrawn-Inattentive, Nervous-Obsessive, Depressed, Hyperactive, Attention Demanding, and Aggressive. 
Percentiles for the problem scores are based on 5- to 14-year-old children (N = 287) observed as controls for referred children in regular classrooms of 45 public and parochial schools of 23 school systems located in Vermont, Nebraska, and Oregon. Hand-scored profiles are available for on-task, Internalizing, Externalizing, and total problem scores. However, because the syndrome scales are too laborious to average by hand, they are scorable only from the DOF computer-scoring program. Mean scale scores for referred and nonreferred children are provided by Achenbach (1991b). Reliability and validity data have been reported by McConaughy, Achenbach, and Gent (1988), and Reed and Edelbrock (1983).

The Semistructured Clinical Interview for Children and Adolescents (SCICA)

Interviews are probably the most widely used clinical assessment procedures. They typically form the centerpiece of clinical contacts in that they provide the clinician with direct impressions of clients and their responses to the clinician's probing. What the clinician gleans from interviews is apt to be more influential than other types of assessment data, because the clinician may filter other data through the firsthand interview impressions. Despite their popularity and influence, clinical interviews of children have not been thoroughly researched to determine what data can be best obtained at what ages and how such interviews can be integrated with other data.

Since the advent of DSM-III (American Psychiatric Association, 1980), a common approach to clinical interviews has been to ask children whether they have each of the symptoms that are used as criteria for DSM disorders (e.g., Reich & Welner, 1989; Shaffer et al., 1996). Diagnoses are then based on whether the symptoms affirmed by the child meet the criteria for DSM disorders.


Structured interviews have increased the rigor with which diagnoses of adults are made in epidemiological surveys (e.g., Helzer et al., 1990). Yet, there is considerable evidence that children do not reliably and validly report DSM symptoms when asked about them in structured interviews. For example, many of the symptoms that children affirm in an initial interview are denied in a repeat interview several days later (Edelbrock, Costello, Dulcan, Kalas, & Conover, 1985). This results in low test-retest reliability and a large decline in the number of diagnoses made from the first interview to the second interview with the same children. Poor agreement has also been found between diagnoses made from structured interviews with children and diagnoses made from interviews with their parents or from comprehensive clinical evaluations (Costello, Edelbrock, Dulcan, Kalas, & Klaric, 1984; Shaffer et al., 1988). In addition, children fail to understand many of the questions posed by structured diagnostic interviews (Breton et al., 1995). To apply empirically based assessment to interviews with children, an interview has been developed that is geared to the cognitive levels and interactive styles of children. Instead of asking children whether they have particular symptoms, the SCICA (McConaughy & Achenbach, 1994) provides open-ended questions designed to elicit children's reports and views on various important areas of their lives, including family, friends, school, activities, concerns, and fantasies. It also includes a kinetic family drawing, brief achievement tests, screen for fine and gross motor abnormalities, and questions about problems attributed to the child by others, such as parents and teachers. The SCICA focuses not only on what the child says in response to questions, but also on the child's nonverbal behavior. While administering the SCICA, the interviewer notes on the protocol what the child says and does. 
Immediately after the interview, the interviewer scores the 120 observational items and 125 self-report items, as well as additional problems that can be added by the interviewer. Each item is scored on a four-step scale like that of the DOF. Principal components/varimax analyses have yielded five syndrome scales based on the SCICA observational items, designated as Anxious, Withdrawn, Attention Problems, Strange, and Resistant. The analyses also yielded three syndrome scales based on the self-report items, designated as Anxious/Depressed, Family Problems, and Aggressive Behavior. The sample from which the syndromes were derived is summarized in Table 15.4. The syndrome scales, Internalizing, Externalizing, and total problem scores are entered on the SCICA profile using either a computer-scoring program or hand-scoring forms. Supplementary questions and scoring items are provided for adolescents through age 18.

A videotape of excerpts from SCICA interviews is available for training raters. The trainee watches a segment, scores it on the SCICA observation and self-report forms, and then enters the item scores into the SCICA scoring program. The program prints out comparisons between the trainee's scores and scores obtained from experienced raters. The trainee can then repeat the process until achieving a high correlation with the experienced raters' scores. Table 15.5 summarizes the scale names for all the instruments discussed to this point.

Assessment of Young Adults

It has long been assumed that many adult disorders have their origins in childhood and that childhood problems may predict adult disorders. Yet, there is a large gap between the procedures and criteria for assessing childhood problems, on the one hand, and the


TABLE 15.5
Scales Scored from the CBCL/2-3, C-TRF/2-5, CBCL/4-18, TRF, YSR, DOF, SCICA

Instrument columns: CBCL/2-3, C-TRF/2-5, CBCL/4-18, TRF, YSR, DOF, SCICA
Scale^a

Adaptive/Competence
1. Academic Performance + +
2. Activities + +
3. Adaptive +
4. On-task +
5. School +
6. Social + +
7. Total adaptive/competence + + +

Problems
1. Aggressive Behavior + + + + + + +
2. Anxious/Depressed + + + + + + +
3. Attention Demanding +
4. Attention Problems + + + + + +
5. Delinquent Behavior + + +
6. Destructive Behavior +


7. Family Problems +
8. Fears +
9. Immature +
10. Nervous-Obsessive +
11. Resistant +
12. Self-Destructive/Identity Problems boys
13. Sex Problems ages 4-11
14. Sleep Problems +
15. Social Problems + + +
16. Somatic Complaints + + + + +
17. Strange +
18. Thought Problems + + +
19. Withdrawn + + + + + + +

Broad Band
1. Internalizing + + + + + + +
2. Externalizing + + + + + + +
3. Observational +
4. Self-Report +
5. Total Problems + + + + + +

Note. + indicates that the scale is scored from the instrument; indicates that the scale is not scored from the instrument.
^a Scale names differ somewhat among the instruments.


procedures and criteria for assessing adult disorders, on the other. This gap becomes especially obvious in longitudinal and follow-up research on relations between child and adult problems. Between the late teens and late twenties, neither problem behaviors nor competencies can be judged by the standards of either the preceding or subsequent years. Young people follow diverse developmental pathways from adolescence to adulthood. Many may manifest behavior that would be quite deviant in earlier or later years, but that may not be so deviant during the transition to adulthood.

The Young Adult Self-Report (YASR; Achenbach, 1997b), which has a format similar to that of the YSR, was developed to extend empirically based assessment to this period. Many YASR items have counterparts on the YSR, but items have been modified and new items have been added to tap functioning in spousal and similar relationships, work, higher education, and other situations relevant to young adults. The YASR also has questions about tobacco, alcohol, and drug use. The Young Adult Behavior Checklist (YABCL; Achenbach, 1997b) was developed for young adults who maintain sufficient contact with their parents or others who know their behavior well. Many CBCL items have counterparts on the YABCL, with modifications and additions appropriate for young adulthood.

Principal components/varimax analyses of the samples summarized in Table 15.4 yielded the following seven syndromes, which are scorable from both the YASR and YABCL and have counterparts among the CBCL/4-18-TRF-YSR cross-informant syndromes: Aggressive Behavior, Anxious/Depressed, Attention Problems, Delinquent Behavior, Somatic Complaints, Thought Problems, and Withdrawn. No clear counterpart of the CBCL/4-18-TRF-YSR Social Problems syndrome was found in the analyses of the YASR or YABCL.
However, a syndrome designated as Intrusive was found, consisting of problems such as the following: Brags, Demands a lot of attention, Showing off, Talks too much, Teases a lot, and Unusually loud. These items, plus items indicative of overt aggression, also loaded on the CBCL/4-18-TRF-YSR Aggressive Behavior syndrome. Among young adults, however, the Aggressive Behavior syndrome comprised only the more overtly aggressive behaviors, whereas the socially obnoxious behaviors formed the separate syndrome designated as Intrusive. Longitudinal research has shown that child/adolescent scores on the CBCL/4-18-TRF-YSR Aggressive Behavior syndrome strongly predict scores on both the YASR-YABCL Aggressive Behavior and Intrusive syndromes (Achenbach, Howell, McConaughy, & Stanger, 1995c). This indicates that some aggressive youths become less overtly aggressive in young adulthood but still display the socially obnoxious behavior comprising the YASR-YABCL Intrusive syndrome. Other aggressive youths, by contrast, continue to display the overtly aggressive behaviors comprising the YASR-YABCL Aggressive Behavior syndrome.

Derivation of the YASR-YABCL syndromes also revealed developmental changes in the Attention Problems syndrome. Although the possibility of adult attention deficit hyperactivity disorder (ADHD) has received a great deal of publicity (e.g., Shaffer, 1994), the DSM does not provide specific diagnostic criteria for adult ADHD. The Attention Problems syndrome derived from principal components/varimax analyses of the YASR and YABCL includes attentional problems like those of the CBCL/4-18-TRF-YSR, but does not include problems of overactivity. Instead, the YASR-YABCL version includes problems such as irresponsibility. Nevertheless, child/adolescent scores on the CBCL/4-18-TRF-YSR Attention Problems syndrome have been found to predict young adult scores on the YASR-YABCL version (Achenbach et al., 1995c). This


suggests continuity in attention problems over developmental periods when overactivity declines and maladaptive behavioral byproducts of attention problems become more evident.

The YASR-YABCL computer-scoring program produces profiles scored from both kinds of forms, plus cross-informant comparisons of multiple forms for each young adult. The cross-informant printouts display side-by-side comparisons of item and scale scores like those described earlier for the CBCL/4-18, TRF, and YSR. In addition, the YASR profile displays scores for adaptive functioning scales designated as follows: Friends, Education, Job, Family, Spouse (or partner), and Mean Adaptive. The YASR profile also displays scales for tobacco, alcohol, and drug use, plus a composite substance use scale. The YASR and YABCL scales are normed on national samples of young adults, separately for each gender. Hand-scored profiles, as well as computer-scoring programs, are available.

Reliability and Validity

The manuals for the CBCL/2-3, CBCL/4-18, TRF, YSR, SCICA, YASR, and YABCL present extensive data on test-retest reliability, internal consistency, interinformant agreement (where relevant), and stability over various periods (Achenbach, 1991b, 1991c, 1991d, 1992, 1997b; McConaughy & Achenbach, 1994). The manuals also present extensive data on validity, including content validity, construct validity, and criterion-related validity. Additional correlates of the CBCL/4-18-TRF-YSR cross-informant syndromes have been surveyed by Achenbach and McConaughy (1997). Data on the reliability and validity of the DOF, as well as scale scores for referred and nonreferred children, have been published by Achenbach (1991b), McConaughy et al. (1988), and Reed and Edelbrock (1983).
The Guide for the Caregiver-Teacher Report Form for Ages 2-5 (Achenbach, 1997a) presents data on test-retest reliability, internal consistency, cross-informant agreement, and validity in terms of discrimination between referred and normative samples. A full-length manual will be published when more extensive data are available.

Reliability and validity data are summarized in Table 15.6 for all instruments. Hundreds of studies have demonstrated reliability and validity with respect to many criteria. A bibliography of some 3,000 published studies, including topic listings, author listings, and bibliographic references, is updated as of April 1 of each year (Vignoe & Achenbach, 1999).

Basic Interpretive Strategy

In keeping with the empirically based approach, the instruments are designed to provide psychometrically sound, standardized descriptions of functioning, as seen by a particular informant at a particular point in time. Rather than being "interpreted" to reveal hidden entities, the descriptions obtained from each informant are to be compared with data from other sources. In the assessment of an individual, data from multiple sources can


TABLE 15.6
Reliability and Validity Data

Instrument | Reliability^a | Validity
CBCL/2-3 | .87 | 1. All scales discriminate between referred and nonreferred at p < .01. 2. Significant r with Richman (1977) instrument at p < .01.
C-TRF/2-5 | .84 | 1. All scales discriminate between referred and normative at p < .05.
CBCL/4-18 | .88 | 1. All scales discriminate between referred and nonreferred at p < .01. 2. Significant rs with corresponding scales of Conners (1973) and Quay and Peterson (1987) instruments.
TRF | .91 | 1. All scales discriminate between referred and nonreferred at p < .01. 2. Significant rs with corresponding scales of Conners Revised Teacher Rating Scale (Goyette, Conners, & Ulrich, 1978).
YSR | Ages 11-14: .67; ages 15-18: .83 | 1. All scales discriminate between referred and nonreferred at p < .01.
DOF | Total problems: .91^b; on-task: .82^b | 1. All scales discriminate between referred and randomly selected controls at p < .01.
SCICA | .73^c | 1. All but one scale discriminate between referred and nonreferred at p < .05. 2. Significant rs with CBCL and TRF scores.
YABCL | .87 | 1. All scales discriminate between referred and nonreferred at p < .01. 2. Significant rs with CBCL/4-18-TRF-YSR syndromes assessed in adolescence.
YASR | .84 | 1. All problem scales and most adaptive and substance use scales discriminate between referred and nonreferred at p < .01. 2. Significant rs with CBCL/4-18-TRF-YSR syndromes assessed in adolescence and with MMPI scales.

Note. Many other reliability and validity data are presented in the manual for each instrument and in hundreds of studies listed in the Bibliography of Published Studies Using the Child Behavior Checklist and Related Materials (Vignoe & Achenbach, 1999).
^a Unless otherwise indicated, reliability is mean of rs between all scale scores obtained over 7- to 15-day intervals, as reported in the manuals for the respective instruments.
^b Reliabilities for DOF are rs between two observers scoring behavior observed during the same intervals (McConaughy, Achenbach, & Gent, 1988).
^c Reliability for SCICA is mean of rs between all scale scores obtained from two interviewers independently interviewing and rating children at a mean interval of 12 days (McConaughy & Achenbach, 1994).

be used to form a mosaic picture of how the individual is seen by different people in different contexts. The cross-informant computer program for the CBCL/4-18, TRF, and YSR facilitates comparisons among father, mother, teacher, and self-reports obtained on the same child. The program also indicates whether agreement between particular pairs of informants is below average, average, or above average for those types of informants. Where agreement is especially poor, the reasons should be explored to determine whether the child's functioning differs markedly from one context to another or whether characteristics of the informants markedly affect their reports. Differences between the pictures obtained from different sources can be clinically useful, because they provide


specific foci for asking informants about their perceptions of the child in those particular areas, the circumstances under which problem behaviors occur, and possible eliciting factors. For example, if a mother's CBCL yields a much higher Aggressive Behavior score than does the father's CBCL, the clinician can inquire about the circumstances in which the mother sees the aggressive behaviors she reports, whether the father is also present at the same time, whether one parent elicits different behavior than the other parent, whether the parents have different standards for judging the same behavior, and so on. The YASR-YABCL computer-scoring program has similar features for making comparisons between problems reported by young adults and other informants, such as parents. The CBCL/2-3 and C-TRF/2-5 programs make comparisons among up to five informants who have all completed the same type of form. For purposes of research, such as treatment outcome evaluations, data from each source can be analyzed separately to determine whether they point to similar or different conclusions. Data from multiple sources can also be aggregated by converting raw scale scores from each source to z scores within the research sample and then computing a mean or weighted combination of scores to use as a composite variable. The T-scores provided by the CBCL, TRF, and YSR profiles can likewise be averaged to obtain composite scores that indicate how each individual compares with national normative samples. The scoring of the instruments is designed to facilitate analyses at the levels of items, narrow-band competence and syndrome scores, broad-band Internalizing and Externalizing scores, and global total competence and problem scores. In addition, the computer-scoring programs enable users to determine the degree of similarity between a child's CBCL/4-18, TRF, or YSR profile and profile types derived from cluster analyses (Achenbach, 1993). 
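The aggregation of data from multiple sources described above (z scores within the research sample, then a mean or weighted combination) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `z_scores` and `composite_scores`, the source labels, and the data are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is a choice made here, since the text does not specify which standard deviation to use.

```python
from statistics import mean, pstdev


def z_scores(raw):
    """Standardize raw scale scores within the research sample."""
    m = mean(raw)
    sd = pstdev(raw)  # assumption: population SD of the sample at hand
    if sd == 0:
        raise ValueError("z scores are undefined when all scores are identical")
    return [(x - m) / sd for x in raw]


def composite_scores(raw_by_source, weights=None):
    """Combine z scores from several sources into one composite per subject.

    raw_by_source maps a source label to raw scores listed in the same
    subject order; weights optionally maps each label to a weight.
    """
    sources = list(raw_by_source)
    lengths = {len(raw_by_source[s]) for s in sources}
    if len(lengths) != 1:
        raise ValueError("every source must supply one score per subject")
    if weights is None:
        weights = {s: 1.0 for s in sources}
    total_weight = sum(weights[s] for s in sources)
    z_by_source = {s: z_scores(raw_by_source[s]) for s in sources}
    n_subjects = lengths.pop()
    return [
        sum(weights[s] * z_by_source[s][i] for s in sources) / total_weight
        for i in range(n_subjects)
    ]


# Hypothetical raw Aggressive Behavior scores for four children.
raw = {"parent": [12, 20, 7, 15], "teacher": [9, 22, 5, 10]}
composite = composite_scores(raw, weights={"parent": 1.0, "teacher": 2.0})
```

Standardizing within the sample puts each source on a common metric before combining, so a source with a wider raw-score range does not dominate the composite unless it is deliberately weighted more heavily.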
The manuals for these instruments provide numerous examples of practical applications to individual cases, as well as applications of the instruments to research that extends beyond the individual case (Achenbach, 1991b, 1991c, 1991d, 1992, 1997a, 1997b; McConaughy & Achenbach, 1994).

Treatment Planning

All of the instruments can be used to provide baseline assessments of functioning. The specific problems reported and the scores on the various scales can be used to determine whether intervention is warranted and to identify specific targets for intervention. The CBCL/2-3, C-TRF/2-5, CBCL/4-18, TRF, YSR, YABCL, and YASR can all be obtained from the relevant informants as part of the clinical intake process. The SCICA can be used by clinicians for their initial interviews with children and adolescents. For school psychologists and others who work in school settings, the DOF can be used to obtain observations in classrooms and other group settings such as recess. Because professional qualifications and extensive training are not needed to obtain data on the DOF, teacher aides and other school personnel may be able to obtain the observational data. Many states require classroom observations as part of the evaluation process for special education services. For those who do not work in school settings, it may be possible to employ their own observers or to arrange for school personnel to collect DOF data. Many schools and teachers permit observations for this purpose, if permission is obtained from the parents of the target child and if the control children are not identified by name.


Research Applications and Findings

These instruments can be used in diverse research contexts to select subjects having particular problem patterns and levels of deviance. For example, several studies have used the CBCL to identify children with high scores on particular scales such as the Aggressive Behavior or Delinquent Behavior scale (e.g., Kazdin, Esveldt-Dawson, French, & Unis, 1987). Children who were initially deviant in the target areas were then assigned to different treatment conditions. Following treatment, comparisons were made between changes in scores among children receiving the different conditions. Table 15.7 lists treatment-related topics for which published studies have reported use of these instruments.

Because research on the treatment of child psychopathology is still in an early stage of development, it has focused largely on determining which interventions work at all for fairly broad classes of problems, such as aggressive behavior, attention problems, and depression. This type of research is clearly needed to determine whether particular treatments are effective even for broad classes of problems and to weed out ineffective treatments. However, as research on child treatment advances, it should focus increasingly on more differentiated patterns of problems. Profiles of scale scores make it possible to identify permutations of problems and competencies that may not be adequately captured by broad diagnostic categories. If particular profile patterns are found to be shared by significant numbers of children, children grouped according to these patterns can be compared on variables relevant to treatment planning. For example, cluster analyses of profiles scored from the CBCL/4-18 have identified several patterns that characterize substantial proportions of clinically referred children (Achenbach, 1993).
Children classified according to these CBCL patterns have, in turn, been found to differ significantly on variables relevant to treatment planning, such as teacher ratings, behaviors observed in school, and scores on ability and achievement tests (McConaughy et al., 1988).

TABLE 15.7
Examples of Treatment-related Topics for Which Studies Have Been Published on the CBCL and Related Instruments

Abdominal pain              Headaches                       Peer interaction
Anxiety                     Lead toxicity                   Posttraumatic stress disorder
Asthma                      Learning problems               Psychotherapy
Attention deficit disorder  Obesity                         Schizophrenia
Colitis                     Obsessive-compulsive behavior   School refusal
Conduct disorder            Oppositional disorder           Seasonal affective disorder
Delinquent behavior         Outcomes of problems            Self-concept
Diabetes                    Pain                            Self-esteem
Divorce                     Parent management training      Separation
Drug studies                Parent-child relationships      Sex abuse
Eating problems             Parental perceptions            Sleep disturbance
Encopresis                  Parental psychopathology        Stress
Enuresis                                                    Suicide
Epilepsy                                                    Teacher perceptions
Fire-setting                                                Temperament
Gender problems                                             Tourette syndrome
                                                            Treatment

Note. The Bibliography of Published Studies Using the Child Behavior Checklist and Related Materials (Vignoe & Achenbach, 1999) provides references to the studies relevant to each topic.


Clinical Applications

In planning treatments for individuals, data from multiple sources should be compared to identify areas of agreement and disagreement. Disagreements between different sources should not be regarded as error but as clinically informative. If profiles scored from CBCLs completed by a child's mother and father show major disagreements, for example, the parents can be interviewed to determine the possible reasons. If the clinician judges the parents to be sufficiently sophisticated, they can be shown the profiles scored from their CBCLs. Interviews may reveal reasons, such as the following, for discrepancies between reports by a mother and father: One parent has too little contact with the child to be a good informant; only one parent sees the child in certain situations where the problems occur; one parent provokes the child's problem behaviors; the parents differ in their thresholds for noticing or reporting particular behaviors.

If a teacher's report is quite discrepant from reports by other informants, a classroom observer can complete the DOF to determine whether what the teacher reports is also evident to the observer. The DOF may confirm that the child's behavior is markedly different in the teacher's class than elsewhere, or it may reveal important discrepancies between what is observed and what is reported by the teacher. If a discrepant report by a particular informant appears to reflect idiosyncrasies of that informant's view of the child, then that informant's view of the child may be chosen as an important focus of treatment. Similarly, if the behavior of a particular informant toward the child appears to provoke problems not seen in interactions with others, that informant's behavior may be targeted for change.
After the clinician determines which problems are consistent across situations and which are specific to particular situations, treatment can be planned to deal with the syndromes and/or competencies that indicate the greatest need for help. If profiles scored from multiple informants agree in showing deviance on several syndromes, interventions should be designed to change broad aspects of functioning across multiple situations. On the other hand, if the data indicate different types of problems in different contexts, such as home versus school, then different interventions may be needed for each context. The particular configurations of problems provide an important basis for planning the types of interventions as well as for selecting the contexts for intervention. For example, two children might both have scores in the clinical range on the Aggressive Behavior scale. However, a different treatment plan would be appropriate for a child who also scores in the clinical range on the Anxious/Depressed syndrome but in the normal range on the competence scales than for a child who is not deviant on the Anxious/Depressed syndrome but is in the clinical range on the competence scales.

Use with Other Evaluation Data

Table 15.1 listed examples of other data that should be obtained in comprehensive clinical evaluations. Such data include developmental histories, family background, school records, cognitive test results, physical assessment, and, for clients able to complete them, tests of self-concept and personality. All of these types of data may be relevant to determining what interventions are feasible and desirable for particular clients. According to the multiaxial approach, the variations in clients' functioning revealed by different assessment procedures may argue for a variety of interventions to address different problems in different contexts.


Applications to Managed Care

The empirically based instruments described in this chapter are especially appropriate for managed care. Except for the DOF and SCICA, the instruments are self-administered and require little professional time. They can all be computer scored by clerical personnel. The data are stored in ASCII files, which can be fed into statistical and other programs for analysis of individual cases and entire caseloads, as well as for testing changes from intake to subsequent assessments. They can be readministered periodically to monitor treatments and to evaluate outcomes. For quick processing, machine-readable forms of the CBCL/4-18, TRF, and YSR are available that can be processed by reflective-read scanners, image scanners, and fax machines. To avoid the need for forms, software is available to enable clients to directly key enter their responses to the CBCL/4-18, TRF, and YSR. The cross-informant program can score and compare data from all three types of informants, whether the data are obtained via the classic forms, machine-readable forms, or key entry by clients.

Because all the instruments except the DOF and SCICA are self-administered, they can be employed in diverse settings that lack mental health professionals. For example, the pediatric, family practice, and general medical services of managed care organizations can use the self-administered instruments to distinguish between problems that can be targeted for intervention within these services versus those that may warrant referral to mental health specialists. Providers such as pediatricians and family practitioners can learn to use the empirically based profiles to quickly identify specific problems as well as scale scores outside the normal range. In some cases, the providers may elect to obtain further information through interviews and/or by having other informants complete the assessment instruments.
The providers may then use the results to decide whether they can arrange appropriate interventions or should refer clients to specialty services. In other cases, the initial profiles may indicate problems that are severe or complex enough to warrant immediate referral to specialty services. Managed care organizations may also provide guidelines to be followed by providers in deciding which cases to refer to specialty services. Providers can use the empirically based profiles to document their decisions about individual cases.

If clients are referred to mental health services, the completed forms and their scored profiles can be used by the mental health professionals as a basis for their evaluation. Because clinical interviews are usually the keystones of mental health evaluations, the SCICA can be administered to make comparisons between findings from the self-administered instruments and what the clinician can glean from firsthand contacts. If a more differentiated picture of school functioning is needed than is obtained from TRFs, then the clinician can request that DOFs be completed. This can be done in cooperation with school psychologists who may personally complete DOFs or have paraprofessionals, trainees, or teacher's aides complete them. The profiles scored from multiple sources provide an explicitly documented basis for interventions. The profiles also document the client's baseline functioning. To measure change, users can compare the baseline profiles with profiles obtained later.

Feedback Regarding Assessment Findings

In the assessment of children, the consumers of assessment findings are usually adults such as parents and school personnel. The form in which the findings should be conveyed depends on the relationship between the clinician and consumer, as well as on the


consumer's level of sophistication about standardized psychological assessment. If a clinician is working directly with fairly sophisticated parents, the profiles scored from their CBCLs can be shared with them. Within the limits of confidentiality assurances given to the informants, profiles scored from C-TRFs, TRFs, DOFs, and CBCLs may be shared with those who oversee special education to help them in planning educational interventions. Because not only adults but also children and youths should be assured of confidentiality, completed SCICAs, YSRs, and their profiles should be carefully protected from family and teachers. No scored profiles from any of the instruments should be provided to classroom teachers, nor should completed forms or scored profiles be placed in students' school records where teachers, office workers, or other students might see them.

Limitations/Potential Problems in Use

Users of the empirically based instruments are admonished not to label individual children in terms of their scores on profile scales or to equate scale scores with diagnoses. Many labels, administrative categories, and diagnostic terms are applied to children without adequate empirical support. In the empirically based approach, data are obtained on specific problem items from informants who interact with children in various contexts. The data are then aggregated into scales. The scales are normed to provide a basis for judging the degree of deviance indicated by a particular informant's report relative to what is reported by similar informants for large normative samples. Descriptive labels have been provided to summarize the content of each scale. However, it is not the labels but the problems actually reported and the comparisons with normative samples that are the basis for judging a child's functioning and need for treatment. The empirically derived scales should be regarded as ways of describing problems reported for children.
The scales are not assumed to represent disease entities or inherent attributes of the children. Furthermore, because all assessment procedures are subject to error, no single procedure or set of scores should be the sole basis for decisions about treatments. Instead, multiple sources of data should be compared to identify findings that are consistent across sources, those that are inconsistent but may validly reflect functioning in particular contexts, and those that may reflect errors of measurement or respondent characteristics that might themselves be in need of change. To maximize the efficacy and cost-effectiveness of treatment, managed care requires systematic monitoring of treatment. The empirically based instruments can be used to monitor treatment by having them completed periodically by relevant informants, such as parents, caregivers, teachers, youths, and young adults. Because changes in important behaviors require time to occur, to stabilize, and to become evident to informants, it is usually desirable to allow at least 2 months between assessments. The CBCL/2-3, C-TRF/2-5, and TRF specify 2-month rating periods. To use the CBCL/4-18, YSR, YABCL, and YASR for treatment monitoring over periods shorter than the 6 months that they specify, users can change the instructions to specify shorter periods, such as 2 months. If this is done, it is also desirable to instruct respondents to use the shorter period when they initially complete the instruments, so that each assessment will cover intervals of equal length. Shorter intervals may yield slightly lower scores because they may miss low frequency problems. However, if each rating interval is of the same length and if raw scale scores are used to analyze changes from one interval to another, then the shorter intervals are unlikely to distort measurement of change.


Profiles obtained during treatment monitoring can be compared to the initial profiles in order to identify changes in patterns of problems, as well as decreases and increases in problem levels. Scores on each scale can be compared from one assessment to another in order to identify changes. To be certain that improvement is genuine and general, it is important to obtain data from multiple informants at each assessment. The DOF and SCICA can also be used to monitor behavioral changes across the course of treatment. They can be repeated at any intervals chosen by the user, with no modifications of instructions.

Outcomes Assessment

Just as all these instruments can be used to provide baseline assessments and to monitor the course of treatment, they can also be used to assess outcomes after treatment or no treatment. The CBCL/2-3, C-TRF/2-5, CBCL/4-18, TRF, YSR, YABCL, and YASR can be completed by the same informants who originally completed them. The SCICA and DOF can be repeated by interviewers and observers after treatment.

Evaluation Against "Ideal Criteria" for Outcome Measures

Eleven criteria for the assessment of mental health outcome measures have been proposed by Ciarlo, Edwards, Kiresuk, Newman, and Brown (1981). Table 15.8 lists the 11 criteria and evaluates the instruments according to each criterion.

Research Applications

For rigorous tests of the effectiveness of particular treatments, appropriate research designs are needed. In group comparison designs, it is desirable to randomly assign similar groups to different treatment and control conditions, either via completely randomized designs or randomized blocks designs. Where more than one type of treatment is plausible for a particular condition, it is desirable to assign similar groups to Treatment A, Treatment B, and a placebo control condition.
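One way to carry out the randomized blocks assignment mentioned above is sketched here. It is an illustration only: the function name, the subject identifiers, and the baseline scores are hypothetical, and blocking on a single baseline scale score is an assumption made for simplicity rather than a procedure prescribed by the manuals.

```python
import random

def randomized_blocks(baseline_scores, conditions, seed=None):
    """Assign subjects to conditions within blocks of similar baseline score.

    baseline_scores maps a subject ID to a baseline scale score.
    Returns a dict mapping each subject ID to its assigned condition.
    """
    rng = random.Random(seed)
    k = len(conditions)
    # Rank subjects by baseline score so each block holds similar subjects.
    ranked = sorted(baseline_scores, key=baseline_scores.get)
    if len(ranked) % k != 0:
        raise ValueError("number of subjects must be a multiple of the number of conditions")
    assignment = {}
    for start in range(0, len(ranked), k):
        block = ranked[start:start + k]
        shuffled = list(conditions)
        rng.shuffle(shuffled)  # random order of conditions within this block
        for subject, condition in zip(block, shuffled):
            assignment[subject] = condition
    return assignment

# Hypothetical subjects and baseline raw scores.
baseline = {"s01": 24, "s02": 31, "s03": 19, "s04": 27, "s05": 22, "s06": 35}
groups = randomized_blocks(
    baseline, ["Treatment A", "Treatment B", "Placebo"], seed=1
)
```

Each block contributes exactly one subject to every condition, so the groups start out with similar score distributions while assignment within a block remains random.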
Comparing the efficacy of one treatment to another, as well as comparing both treatments to a control condition, can be far more informative than merely finding that a particular treatment is followed by better outcomes than no treatment. To avoid confounding the assessment of treatment effects with "hello-goodbye" effects or with differences between the durations of different treatments, it is important to select a uniform interval for the reassessment of all subjects. For example, all subjects should initially be assessed at a similar point of entry to the study, before variations in treatment can have effects. The subjects should then be assessed again at a uniform interval after the initial assessment. If a treatment is planned to last no more than 4 months, for example, the first outcome assessment might be scheduled for 6 months after the initial assessment, to avoid "goodbye" effects associated with termination of treatment. Follow-up assessments could be scheduled for 12, 18, and 24 months after the initial assessment to track longer term effects of treatment. If treatment groups are selected for deviance, regression toward the mean may affect all kinds of subsequent assessment. That is, subjects who have very deviant scores at one point in time are likely to have less deviant scores when reassessed, because deviant


TABLE 15.8
Evaluation Against "Ideal Criteria" for Outcome Measures

Criteria, each followed by Comments

1. Relevant and appropriate to client groups
   Items and scales were developed on people referred for mental health and special education services for diverse problems seen in diverse settings, including both sexes, multiple ages, different ethnic groups, and all SES levels.

2. Simplicity and uniformity of procedures
   Standardized forms, scoring procedures, and profiles are usable in all relevant settings. Scoring can be done by clerical workers, although training is necessary to obtain observational data via the DOF and interview data via the SCICA.

3. Clear and objective referents
   Quantitatively scored descriptive items are summed to yield scale scores that are converted to standard scores based on normative samples.

4. Reflect the perspectives of relevant participants
   Instruments are designed to obtain analogous data from parents, teachers, caregivers, self-reports, direct observations, and clinical interviews.

5. Data on how treatments produce effects
   Because the instruments are designed for so many different applications, they do not focus on specific treatment processes. However, comparisons among the diverse scale scores and informants can reveal differential treatment effects.

6. Psychometric adequacy
   Table 15.6 summarizes reliability and validity data. Manuals present further details, plus data on distributions of item and scale scores in normative and clinical samples, longer term stabilities, standard deviations, standard errors, coefficient alpha, and correlations among scales. Instruments can be completed by informants blind to treatment conditions. Studies listed by Vignoe and Achenbach (1999) have demonstrated sensitivity to treatment-related change.

7. Inexpensive
   Classic forms cost 40¢ each. Machine-scorable forms cost 80¢ each. Can be hand scored or computer scored on-site in a few minutes by clerical workers. Computer programs provide unlimited runs at no extra cost.

8. Understandable and sensible
   Items use ordinary descriptive language that is understandable without special training. Scales have descriptive titles to summarize content. Scores are displayed on easily understood profiles.

9. Quick, easy feedback and norms
   Profiles and normal, borderline, and clinical ranges provide quick, easy, norm-based feedback. Scores are displayed in relation to percentiles for normative samples in addition to raw scores and T-scores. Manuals display means, standard deviations, and standard errors of scores for demographically matched referred and normative samples.

10. Useful in clinical service
   Instruments are designed for routine clerical assessment of individual cases in diverse settings. Data can be used in evaluation reports, treatment planning, and outcome assessments. Forms and profiles provide documentation for case records.

11. Compatible with diverse theories
   The standardized descriptive data are compatible with virtually all theories and treatments.

Note. Criteria are from Ciarlo, Edwards, Kiresuk, Newman, and Brown (1981).

scores are partly a function of random influences that will not all operate in the same direction on subsequent occasions. Even in nondeviant groups, however, interviews, questionnaires, and checklists tend to show fewer problems on a second administration shortly after the first (e.g., Achenbach, 1991b; Edelbrock et al., 1985; Vandiver & Sher, 1991). This test-retest attenuation effect has also been found in diagnostic interviews designed to obtain lifetime diagnoses, where it is illogical for lifetime diagnoses to


decrease from an initial occasion to a later occasion when subjects' lifetimes span more time (Robins, 1985). Although the exact reason for test-retest attenuation is not known, its pervasiveness, plus regression of deviant scores toward the mean, argue strongly for applying exactly the same assessment sequences to groups receiving different treatment conditions. Even if all groups show declines in problems, comparisons among groups are needed to determine whether the declines are significantly greater for certain treatment conditions than for others.

For mental health interventions that lend themselves to single-subject, ABAB, or multiple baseline designs, it is important to have enough replications over different subjects to ensure that treatment effects are robust across individuals. Although treatment of an individual may be deemed successful if the individual improves, research on treatment effectiveness should produce knowledge that is generalizable to many individuals. Even when single-subject designs are optimal for testing efficacy, findings must be replicated on new individuals to ensure that they are generalizable.

All of the competence, adaptive functioning, syndrome, Internalizing, Externalizing, and total problem scales of these instruments can be used for assessing outcomes. For individuals, profiles obtained before and after treatment can be visually compared. For statistical analyses of groups randomly assigned to different treatment conditions, appropriate statistics include repeated measures analyses of variance (ANOVA), analyses of covariance (ANCOVA), multivariate analyses of variance (MANOVA), and multivariate analyses of covariance (MANCOVA) to compare pretreatment with posttreatment scores obtained by each group. Group profiles can also be constructed from the mean syndrome scores obtained by all members of one treatment group for comparison with the mean syndrome scores obtained by all members of another treatment group.
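The case for comparing groups rather than interpreting a single group's decline can be illustrated with a small simulation of regression toward the mean. This is a sketch under loudly stated assumptions: the score model (a stable true score plus independent random influences at each occasion), the cutoff, and all numeric settings are hypothetical and are not drawn from the instruments' norms. No treatment is simulated at all.

```python
import random
from statistics import mean

def simulate_retest(n=20000, cutoff=65.0, seed=0):
    """Simulate two assessments of untreated subjects selected for deviance.

    Returns the mean intake score and the mean retest score of the
    subjects whose intake score met the cutoff.
    """
    rng = random.Random(seed)
    intake_selected, retest_selected = [], []
    for _ in range(n):
        true_score = rng.gauss(50, 8)           # stable component
        intake = true_score + rng.gauss(0, 6)   # random influences, occasion 1
        retest = true_score + rng.gauss(0, 6)   # random influences, occasion 2
        if intake >= cutoff:                    # selected for deviance at intake
            intake_selected.append(intake)
            retest_selected.append(retest)
    return mean(intake_selected), mean(retest_selected)

intake_mean, retest_mean = simulate_retest()
print(f"intake mean {intake_mean:.1f}, retest mean {retest_mean:.1f}")
```

In this model the selected group's retest mean falls below its intake mean even though nothing was done to the subjects, which is why a decline in one group cannot by itself be attributed to treatment.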
The competence scales of the CBCL/4-18 and YSR, the adaptive functioning scales of the YASR, and the academic performance scale of the TRF may be less likely than problem scales to detect short-term changes in response to treatment. This is because they encompass characteristics that are apt to take longer to change, such as involvement in activities and organizations, friendships, and academic achievement. However, individual competence items, such as how well an individual gets along with parents and siblings, can show changes over relatively brief periods.

Prediction of Outcomes from Case Characteristics

Besides controlled studies of treatment effects, these instruments can be used to determine whether initial case characteristics predict differences in outcomes. This can be done for untreated as well as treated cases. As an example, 3-year and 6-year follow-ups of over 2,000 children who were initially assessed in a national home interview survey have been carried out. Some of the children were identified as being deviant on particular empirically derived syndromes. We then tested prediction of outcomes from scores on each syndrome and other variables, such as family characteristics, stressful experiences, and receipt of mental health services (Achenbach, Howell, McConaughy, & Stanger, 1995a, 1995b, 1995c, 1998). The outcome variables included scores on the CBCL/4-18, TRF, YSR, YABCL, and YASR, plus reports of suicidal behavior, referral for mental health services, contacts with the police, academic and behavioral problems in school, substance abuse, and parents' judgment that their child needed additional professional help for behavioral/emotional problems. The results are too complex to present here but can be found in Achenbach et al. (1995a, 1995b, 1995c, 1998).


In parallel research, long-term outcome assessments of 1,100 children who were referred to a child psychiatry service were conducted. This research demonstrated the power of CBCL/4-18 scores and other case characteristics to predict outcomes according to follow-up CBCL/4-18, TRF, YSR, YABCL, and YASR scores, as well as other outcome data (Stanger, MacDonald, McConaughy, & Achenbach, 1996). Because so little is known about which children really need treatment, which ones would have poor long-term outcomes even after treatment, and which ones would have good outcomes without treatment, studies of this sort can help to determine where to focus efforts to improve treatment.

Clinical Applications

For the clinical evaluation of treatment in individual cases, it is always useful to readminister the assessment instruments periodically in order to detect unexpected as well as hoped-for changes. Because they depend on cumulative accomplishments, the competence and adaptive functioning items may be less sensitive to short-term changes than are the problem items. However, to gain a full picture of the impact of treatment on competencies as well as on problems, outcome assessments should include follow-ups over a period of 1 or more years whenever possible.

In the clinical evaluation of outcomes for individual cases, users can roughly estimate the statistical significance of changes in scale scores on the basis of the standard error of measurement for the relevant scale. The manuals display the standard error of measurement for each scale for referred and normative samples. Suppose, for example, that a 10-year-old boy referred for mental health services initially obtained a raw score of 21 on the Externalizing scale of the CBCL/4-18. At the outcome assessment, the boy's score has dropped to 11. As shown in the manual for the CBCL/4-18 (Achenbach, 1991b), the standard error of measurement is 3.4 for clinically referred 10-year-old boys on the Externalizing scale.
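The rough estimate set up in this example amounts to expressing the change in raw score as a multiple of the standard error of measurement. A minimal sketch follows; the function name is hypothetical, and the 1.96 cutoff is the conventional two-tailed .05 criterion under a normal-distribution assumption rather than a rule taken from the manual.

```python
def change_in_standard_errors(initial, outcome, sem):
    """Express a change in raw scale score as a multiple of the SEM."""
    if sem <= 0:
        raise ValueError("standard error of measurement must be positive")
    return (initial - outcome) / sem

# Values from the example: raw scores of 21 and 11, SEM of 3.4.
ratio = change_in_standard_errors(21, 11, 3.4)

# Assumption: compare against the conventional two-tailed .05 cutoff.
exceeds_cutoff = abs(ratio) > 1.96

print(round(ratio, 1))  # 2.9
```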
The decline of 10 points from 21 to 11 is thus equal to 2.9 standard errors, which would be likely to occur by chance in less than 5% of cases. However, for a variety of reasons, tests of statistical significance are not really applicable to individual scores.

The clinical significance of particular scale scores can be evaluated by reference to the appropriate normative samples. For 10-year-old boys, a raw score of 21 on the Externalizing scale is close to the mean of 20.9 for the sample of clinically referred boys and well above the mean of 9.8 for the nonreferred boys (Achenbach, 1991b). The raw score of 21 is equivalent to a T-score of 65, which is above the T-score of 63 at the top of the borderline clinical range. The boy's posttreatment raw score of 11 is equivalent to a T-score of 56, which is below the borderline clinical range and within the normal range. In terms of clinical significance, the boy's Externalizing score has thus declined from being clearly in the clinical range to being clearly in the normal range, indicating a clinically important change.

Use with Other Evaluation Data

Any of the instruments can be used in conjunction with any other data that are relevant to a particular outcome assessment. For example, if treatment focuses on a particular area of dysfunction or diagnostic category, such as depression or attention deficits,


measures specific to the target disorder can be used in conjunction with these measures. If the more specialized measures are used alone, they may miss worsening, improvement, or lack of change in areas that were not specifically targeted for treatment. Findings based only on specialized measures may therefore be much less informative than if they are accompanied by findings from measures that tap multiple areas of functioning assessed and normed in a uniform fashion.

Feedback Regarding Outcome Findings

Outcome findings can be presented in terms of the amount of change in scale scores and comparisons of pre- versus posttreatment profiles. Group results can be tested for statistical significance. For effects found to be significant, the percent of variance accounted for can be computed as a measure of effect size (Cohen, 1988). Because norms are available for these scales, the pre- and posttest scores can be described as being in the normal, borderline, or clinical range. Reference to norms is helpful in the overall evaluation of outcomes. Many studies report significant effects of treatment without providing data on how deviant the subjects were from their peers either pre- or posttreatment. Treatment effects are obviously more valuable if they move clients from the clinical to the normal range than if the clients remain in the clinical range after treatment or if they were not initially in the clinical range.

The presentation of outcome findings should be tailored to the consumer. For parents or school personnel, reports of outcomes for individual children need not provide details of scale scores, but can verbally describe the areas in which changes have occurred and whether the child is now within the normal range. Research reports should include initial and outcome scale scores, the statistical significance of changes in scale scores, effect sizes, and comparisons of the subjects' scores with those for normative samples.
These normative comparisons can be made in descriptive terms with respect to the normal, borderline, and clinical range, as well as in terms of percentiles and T-scores. Note that, because the raw score and T-score distributions differ in shape, the mean raw score for a group should not be directly converted to a T-score. Instead, if T-scores are desired, each raw scale score in a sample should be individually converted to a T-score by the computer program. The mean of these T-scores should then be computed.

Limitations/Problems in Use

Any assessment of treatment outcomes must be carefully designed to avoid artifacts. As mentioned earlier, the tendency for problem scores on most measures to decline over time argues for multiple comparison groups that receive different treatment conditions but similar assessment protocols. Placebo controls should be used whenever possible to control for the effects of receiving an active treatment. Data from informants who are blind to treatment conditions are essential for rigorous outcome assessments. If an intervention involves only children and their parents, for example, pre- and posttreatment TRFs, SCICAs, and DOFs can be obtained from people who are blind to treatment conditions. However, YSRs and CBCLs would also be helpful to determine whether changes are reported by those directly involved in the treatment. Where multiple treatment conditions are compared, significant differences in outcomes reported by participants
in each condition can also provide evidence for differential efficacy, at least with respect to their perceptions.

Scale scores should not be equated with diagnoses or disease entities. Pretreatment to posttreatment changes in scale scores reflect changes as described by a particular informant who knows the subject under particular conditions. Similar findings from multiple informants augment the evidence that changes are pervasive. Yet, even significant improvements in reports by one informant can indicate beneficial changes in that informant's perceptions. Because there is no litmus test for definitively diagnosing the presence or absence of behavioral/emotional disorders, treatments that yield significant improvements in outcome measures may not necessarily have "cured" a disorder. Instead, long-term monitoring and outcome evaluations are needed to determine whether functioning is better in multiple areas among those who did, as opposed to those who did not, receive a particular treatment.

Case Study

Nine-year-old Erica was referred for evaluation when her fourth-grade teacher became concerned about her total lack of productive work and her difficulties with peers. Cognitive testing showed superior ability, with no indication of specific learning disabilities. Achievement tests yielded scores that were in the average range but below expectations for her cognitive ability. Medical and neurological examinations indicated immature motor development but no specific abnormalities.

Figure 15.4 shows the problem portion of the profile scored from the TRF completed by Erica's teacher, and Fig. 15.5 shows the profile scored from the CBCL completed by her mother. The total problem scores on both instruments were in the clinical range and both profiles indicated exceptional elevations on the Withdrawn scale. The TRF yielded scores in the borderline range on the Social Problems and Attention Problems scales.
The CBCL yielded scores in the clinical range on both these scales, as well as a borderline clinical score on the Thought Problems scale. On the TRF adaptive functioning scales and CBCL competence scales, Erica obtained low scores in the areas related to social and academic functioning. The profile obtained by averaging DOFs completed on three occasions by a teacher's aide confirmed problems with interpersonal relationships and showed that Erica was on task much less than the two girls observed as controls. Rather than attending to her work, Erica was either daydreaming or gazing at other students. The specific behaviors described on the DOF included awkward attempts to intrude into interactions among peers and angry gestures toward people behind their backs. In the SCICA interview, Erica appeared depressed. She reported that her classmates disliked her and picked on her. In view of the large discrepancy between her ability and achievement scores, as well as her TRF, CBCL, and DOF scale scores in the clinical range, Erica was deemed eligible for special education services.

Fig. 15.4. Hand-scored TRF problem profile for 9-year-old Erica. Copyright © 1993 by T.M. Achenbach. Reprinted by permission.

Fig. 15.5. Hand-scored CBCL problem profile for 9-year-old Erica. Copyright © 1993 by T.M. Achenbach. Reprinted by permission.

A treatment plan was developed focusing on overcoming the problems of withdrawal and social relationships that were evident both at school and home. Because treatment was to be supported by special education funding and because there were obstacles to direct work with Erica's family, treatment was carried out at school, accompanied by occasional phone contacts with Erica's mother. A school-based therapist met weekly with Erica to identify specific frustrations and conflicts in interpersonal relationships that had occurred during the week. The therapist worked with Erica to highlight ways in which she may have contributed to the situations and how she might handle them differently. The therapist also worked on reinforcing Erica's interest in science and encouraged her to apply her considerable cognitive ability to science-related schoolwork. In addition, the therapist met periodically with Erica's teacher to obtain feedback on Erica's behavior and to instruct the teacher on reinforcing adaptive social and academic behaviors.

The treatment continued for the 6 months remaining in that school year. During this period, Erica became more outgoing and developed a friendship with a classmate, but she also showed outbursts of temper in the classroom. In the following school year, the therapist met with Erica's fifth-grade teacher to encourage continued reinforcement of specific social and academic behaviors. The therapist also met with Erica a few times during the year.

Figure 15.6 shows profiles scored from follow-up TRFs completed by two of Erica's seventh-grade teachers who had not been involved in any intervention efforts. (The profiles are both drawn on one hand-scored TRF profile form.) Although social problems were still reported, the total problem scores and all scale scores, including academic performance and adaptive behavior (not shown in Fig. 15.6), were now in the normal range. Figure 15.7 shows the profile scored from a CBCL completed by Erica's mother at this time. The CBCL total problem score remained in the clinical range, but it and the previously deviant syndrome scores were now considerably lower than on her initial CBCL.

The case of Erica illustrates some of the ways in which the empirically based instruments can be used to identify specific targets for intervention. The choice of treatment for dealing with the identified problems was determined partly by the feasibility of a school-based intervention and by obstacles to a family-based intervention.
The fact that problems of withdrawal and interpersonal relationships were similarly evident to the teacher, parent, and direct observer made these appropriate targets for treatment, no matter where the treatment was based. The follow-up TRFs indicated that Erica's school behavior was well within the normal range by seventh grade. The follow-up CBCL indicated improvement but still some problems in the clinical range, as seen by Erica's mother. If direct work with the family had been feasible, perhaps the problems reported by Erica's mother would have shown more improvement. However, comparisons of profiles from Erica's teachers and mother highlighted the differences between the outcomes at school, where treatment was focused, and in the family, where treatment had not been feasible.

Divergent/Conflicting Data from Multiple Sources

The multiaxial empirically based approach is designed to take account of the fact that multiple sources of data are needed for comprehensive clinical assessment. This approach emphasizes that multiple sources do not necessarily converge on a single diagnostic construct or disease entity. Discrepancies between data from different sources do not mean that one source is right and another is wrong or that the data are unreliable. Instead, different sources may reliably and validly yield different pictures of a client's functioning in different contexts, as perceived by different informants.

Fig. 15.6. Profiles scored from two follow-up TRFs for Erica at age 11. Broken line indicates science teacher's ratings; solid line indicates home economics teacher's ratings. Copyright © 1993 by T.M. Achenbach. Reprinted by permission.

Fig. 15.7. Profile scored from follow-up CBCL for Erica at age 11. Copyright © 1993 by T.M. Achenbach. Reprinted by permission.

The multiaxial approach is designed to enable users to explicitly compare data from multiple sources, to identify the specific agreements and disagreements, to quantify the overall level of agreement, and to compare this with the level of agreement obtained in relevant reference samples. For both clinical and research purposes, the obtained level of agreement may be an important variable in its own right. Clients for whom there is high agreement among all informants, for example, may be less affected by situational variations than are clients for whom there are large differences among informants' reports. On the other hand, if data from one informant are highly discrepant from data obtained from all other sources and if that informant's reports are not verified, then that informant's perceptions may be targeted for change.

Both consistencies and discrepancies among multiple sources should be regarded as potentially informative. The clinician's job is to synthesize from multiple sources an understanding of the case in all its complexities and to formulate treatment plans that take account of both the consistencies and discrepancies. Comprehensive assessment may often yield a mosaic of pieces from which a complex picture is constructed, rather than a single, seamless image.

For the researcher, the task is to use multiple kinds of data to construct generalizable knowledge that can be applied to many new cases despite all the variations that make each bit of data and each case in some respects unique. The specific ways in which multisource data are used depend on the research questions to be answered. For some purposes, data from each source should be analyzed separately to determine whether each source points to similar or different conclusions than the other sources. For other purposes, a taxonomic decision tree can be used to identify cases for which each possible combination of agreement and disagreement occurs (see Achenbach, 1991a).
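The overall level of agreement between two informants can be quantified in several ways. The sketch below uses a Q-correlation, taken here to mean a Pearson correlation computed across scales between two informants' score profiles for the same child, and compares it with a reference value. The scale scores, the reference value of 0.30, and the function name are hypothetical, and the cross-informant scoring program may compute its Q-correlations differently (for example, across items rather than scales).

```python
import math


def q_correlation(profile_a, profile_b):
    """Pearson correlation between two informants' score profiles."""
    if len(profile_a) != len(profile_b) or len(profile_a) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat profile has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical raw scores on eight syndrome scales for one child.
SCALES = [
    "Withdrawn", "Somatic Complaints", "Anxious/Depressed",
    "Social Problems", "Thought Problems", "Attention Problems",
    "Delinquent Behavior", "Aggressive Behavior",
]
mother = [12, 3, 9, 8, 4, 10, 2, 7]
teacher = [10, 1, 6, 7, 2, 8, 1, 3]

# Hypothetical reference-sample value, for illustration only.
REFERENCE_Q = 0.30

q = q_correlation(mother, teacher)
label = "above" if q > REFERENCE_Q else "at or below"
print(f"Q = {q:.2f} ({label} the reference value of {REFERENCE_Q})")
```

A high Q-correlation indicates that the two informants describe a similar profile shape; it does not by itself show that they report the same overall level of problems.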
For still other purposes, scores from multiple sources can be aggregated by converting the raw scale scores to z scores within the samples from each source and averaging the z scores across sources.

Conclusions

Multiaxial empirically based assessment is an approach to the assessment of problems and competencies that obtains empirical assessment data from multiple sources. The sources relevant to children include parents, teachers, cognitive and achievement tests, physical examinations, and direct assessment of the child, such as observations, interviews, and self-reports. This chapter described instruments for obtaining standardized data on child, adolescent, and young adult problems and competencies, including the CBCL/2-3, C-TRF/2-5, CBCL/4-18, TRF, YSR, DOF, SCICA, YABCL, and YASR. Syndromes of co-occurring problems have been empirically derived via principal components/varimax analyses of each instrument. Large samples have been used to norm profiles of scales for scoring the syndromes, Internalizing, Externalizing, total problems, competencies, and adaptive behavior. Eight cross-informant syndromes are similarly scorable from parents' CBCL/4-18 ratings, teachers' TRF ratings, and adolescents' YSR ratings.

A cross-informant computer program enables users to enter data for the same subject from mother, father, teacher, and self-ratings, and prints out separate profiles scored from each informant's data. The program also displays side-by-side the item scores and scale scores from all informants, computes Q-correlations to provide a quantitative index of agreement between informants,
and displays the corresponding Q-correlations for large reference samples. Similar programs are available for comparing YASR versus YABCL ratings, and for comparing ratings by multiple informants on the CBCL/2-3 and on the C-TRF/2-5. These features enable the clinician to identify specific agreements and disagreements among informants and to judge how the overall level of agreement compares with that typically found for similar combinations of informants. This information, in turn, provides a basis for planning interventions in relation to different contexts and interaction partners.

All the empirically based instruments can be used to obtain baseline data for treatment planning, reassessments to monitor changes over the course of treatment, and outcome data for evaluating the effects of treatment. Considerations relevant to clinical and research applications were presented. Because the empirically based procedures are designed to explicitly display variations in data from different sources, the inconsistencies as well as the consistencies between findings from different sources are useful for both clinical and research purposes.

References

Achenbach, T.M. (1966). The classification of children's psychiatric symptoms: A factor-analytic study. Psychological Monographs, 80(No. 615).
Achenbach, T.M. (1978). The Child Behavior Profile: I. Boys aged 6-11. Journal of Consulting and Clinical Psychology, 46, 478-488.
Achenbach, T.M. (1991a). Integrative guide for the 1991 CBCL/4-18, YSR, and TRF profiles. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M. (1991b). Manual for the Child Behavior Checklist/4-18 and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M. (1991c). Manual for the Teacher's Report Form and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M. (1991d). Manual for the Youth Self-Report and 1991 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M. (1992). Manual for the Child Behavior Checklist/2-3 and 1992 Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M. (1993). Empirically based taxonomy: How to use syndromes and profile types derived from the CBCL/4-18, TRF, and YSR. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M. (1997a). Guide for the Caregiver-Teacher Report Form for Ages 2-5. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M. (1997b). Manual for the Young Adult Behavior Checklist and Young Adult Self-Report. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M., & Edelbrock, C. (1983). Manual for the Child Behavior Checklist/4-18 and Revised Child Behavior Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M., & Edelbrock, C. (1986). Manual for the Teacher's Report Form and Teacher Version of the Child Behavior Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M., & Edelbrock, C. (1987). Manual for the Youth Self-Report and Profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T.M., Howell, C.T., McConaughy, S.H., & Stanger, C. (1995a). Six-year predictors of problems in a national sample of children and youth: I. Cross-informant syndromes. Journal of the American Academy of Child and Adolescent Psychiatry, 34, 336-347.
Achenbach, T.M., Howell, C.T., McConaughy, S.H., & Stanger, C. (1995b). Six-year predictors of problems in a national sample of children and youth: II. Signs of disturbance. Journal of the American Academy of Child and Adolescent Psychiatry, 34, 488-498.
Achenbach, T.M., Howell, C.T., McConaughy, S.H., & Stanger, C. (1995c). Six-year predictors of problems in a national sample: III. Transitions to young adult syndromes. Journal of the American Academy of Child and Adolescent Psychiatry, 34, 658-669.
Achenbach, T.M., Howell, C.T., McConaughy, S.H., & Stanger, C. (1998). Six-year predictors of problems in a national sample: IV. Young adult signs of disturbance. Journal of the American Academy of Child and Adolescent Psychiatry, 37, 718-727.
Achenbach, T.M., & McConaughy, S.H. (1997). Empirically based assessment of child and adolescent psychopathology: Practical applications (2nd ed.). Thousand Oaks, CA: Sage.
Achenbach, T.M., McConaughy, S.H., & Howell, C.T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101, 213-232.
American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Breton, J.J., Bergeron, L., Valla, J.P., Lepine, S., Houde, L., & Gaudet, N. (1995). Do children aged 9 through 11 years understand the DISC Version 2.5 questions? Journal of the American Academy of Child and Adolescent Psychiatry, 34, 954-956.
Ciarlo, J.A., Edwards, D.W., Kiresuk, T.J., Newman, F.L., & Brown, T.R. (1981). Final report: The assessment of client/patient outcome techniques for use in mental health programs (Contract No. 278-80-0005 DB). Bethesda, MD: National Institute of Mental Health.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press.
Conners, C.K. (1973). Rating scales for use in drug studies with children. In Psychopharmacology bulletin: Pharmacotherapy with children. Washington, DC: U.S. Government Printing Office.
Costello, A.J., Edelbrock, C., Dulcan, M.K., Kalas, R., & Klaric, S.H. (1984). Report on the Diagnostic Interview Schedule for Children (DISC). Pittsburgh: University of Pittsburgh, Department of Psychiatry.
Edelbrock, C., Costello, A.J., Dulcan, M.K., Kalas, R., & Conover, N.C. (1985). Age differences in the reliability of the psychiatric interview of the child. Child Development, 56, 265-275.
Goyette, C.H., Conners, C.K., & Ulrich, R.F. (1978). Normative data on revised Conners Parent and Teacher Rating Scales. Journal of Abnormal Child Psychology, 6, 221-236.
Helzer, J.E., Canino, G.J., Yeh, E.K., Bland, R.C., Lee, C.K., Hwu, H.G., & Newman, S. (1990). Alcoholism-North America and Asia: A comparison of population surveys with the Diagnostic Interview Schedule. Archives of General Psychiatry, 47, 313-319.
Kazdin, A.E., Esveldt-Dawson, K., French, N.H., & Unis, A.S. (1987). Problem-solving skills training and relationship therapy in the treatment of antisocial child behavior. Journal of Consulting and Clinical Psychology, 55, 76-85.
McConaughy, S.H., & Achenbach, T.M. (1994). Manual for the Semistructured Clinical Interview for Children and Adolescents. Burlington, VT: University of Vermont, Department of Psychiatry.
McConaughy, S.H., Achenbach, T.M., & Gent, C.L. (1988). Multiaxial empirically based assessment: Parent, teacher, observational, cognitive, and personality correlates of Child Behavior Profiles for 6-11-year-old boys. Journal of Abnormal Child Psychology, 16, 485-509.
McConaughy, S.H., Stanger, C., & Achenbach, T.M. (1992). Three-year course of behavioral/emotional problems in a national sample of 4- to 16-year-olds: I. Agreement among informants. Journal of the American Academy of Child and Adolescent Psychiatry, 31, 932-940.
Quay, H.C., & Peterson, D.R. (1987). Manual for the Revised Behavior Problem Checklist. Coral Gables, FL: University of Miami, Department of Psychology.
Reed, M.L., & Edelbrock, C. (1983). Reliability and validity of the Direct Observation Form of the Child Behavior Checklist. Journal of Abnormal Child Psychology, 11, 521-530.
Reich, W., & Welner, Z. (1989). DICA-R-C. DSM-III-R version. Revised version of DICA for children ages 6-12. St. Louis, MO: Washington University, Department of Psychiatry.
Richman, N. (1977). Is a behaviour checklist for preschool children useful? In P.J. Graham (Ed.), Epidemiological approaches to child psychiatry (pp. 125-136). London: Academic Press.
Robins, L.N. (1985). Epidemiology: Reflections on testing the validity of psychiatric interviews. Archives of General Psychiatry, 42, 918-924.
Shaffer, D. (1994). Attention deficit hyperactivity disorder in adults. American Journal of Psychiatry, 151, 633-638.
Shaffer, D., Fisher, P., Dulcan, M.K., Davies, M., Piacentini, J., Schwab-Stone, M.E., Lahey, B.B., Bourdon, K., Jensen, P.S., Bird, H.R., Canino, G., & Regier, D.A. (1996). The NIMH Diagnostic Interview Schedule for Children version 2.3 (DISC-2.3): Description, acceptability, prevalence rates, and performance in the MECA study. Journal of the American Academy of Child and Adolescent Psychiatry, 35, 865-877.
Shaffer, D., Schwab-Stone, M., Fisher, P., Davies, M., Piacentini, J., & Gioia, P. (1988). A revised version of the Diagnostic Interview Schedule for Children. New York: Columbia University Division of Child Psychiatry.
Stanger, C., MacDonald, V., McConaughy, S.H., & Achenbach, T.M. (1996). Predictors of cross-informant syndromes among children and youths referred for mental health services. Journal of Abnormal Child Psychology, 24, 597-614.
Vandiver, T., & Sher, K.J. (1991). Temporal stability of the Diagnostic Interview Schedule. Psychological Assessment, 3, 277-281.
Vignoe, D., & Achenbach, T.M. (1999). Bibliography of published studies using the Child Behavior Checklist and related materials: 1999 edition. Burlington, VT: University of Vermont, Department of Psychiatry.

Chapter 16
Conners Rating Scales-Revised

C. Keith Conners
Duke University Medical Center

The Revised Conners Rating Scales represent the culmination of 30 years of work. The original scales appeared in the mid-1960s. Empirical work on the original scales is described in an annotated bibliography (Wainright, 1996) and in Conners (1994b). Whereas the original scales were developed almost entirely by Conners,1 from children seen personally in an outpatient clinic or collected in local Baltimore public schools, the current restandardization involved a number of colleagues as well as data collection from 200 sites throughout North America.2 A technical manual and a user's manual describe the technical development, norms, reliability, validity, and user aids for acquiring and displaying data (Conners, 1997).

There were several reasons for undertaking a revision and restandardization of the Conners Rating Scales at this time: (a) Relatively little empirical work was available at the time the original scales were created, and since then there has been extensive use of these scales in hundreds of studies as well as feedback regarding aspects of use. (b) There have been substantial changes in the demographics of North America, with the old norms being based on relatively restricted samples in Baltimore, Pittsburgh, and Ottawa. (c) Researchers often used "pirate" versions of these scales, altering them for their own purposes, so that standardized item content and format were often compromised. (d) There was a need for information derived directly from adolescents by way of self-report. (e) A large, national survey of parents using scales by Achenbach, Conners, and Quay provided guidance regarding factor constructs and item content not previously available from restricted samples (Achenbach, Conners, Quay, Verhulst, & Howell, 1989; Achenbach, Howell, Quay, & Conners, 1991). (f) New analytic methods and more sophisticated psychometric approaches are available now that were not typical of earlier research with these scales. And (g), most importantly, there has been a series of advances in the nosology of childhood psychiatric disorders, culminating in the Diagnostic and Statistical Manual, fourth edition (DSM-IV; American Psychiatric Association, 1994).

1 Parts of the original scales were in use at the Harriet Lane Home for Children at Johns Hopkins Hospital, and were created by Anita Bond from section headings of Leo Kanner's textbook of child psychiatry. The items were further modified by Leon Eisenberg, Eli Breger, and Arthur Lockman. A smaller restandardization on a census tract sample in Pittsburgh was assisted by Charles Goyette and Richard Ulrich (Goyette, Conners, & Ulrich, 1978).

2 I am particularly indebted for technical assistance to Gill Sitarenios, James D.A. Parker, George Huba, Drew Erhardt, Jeffrey Epstein, and Elizabeth Sparrow. Karen Wells provided invaluable insights in the construction of the adolescent self-report scales.

The original Conners scales included short and long forms for teacher and parent, as well as abbreviated (10-item) scales known as the Hyperactivity Index. Descriptors for each item included "Not at all true," "Just a little bit true," "Pretty much true," and "Very much true." The current versions, referred to as the Conners Rating Scales-Revised (CRS-R), add additional descriptors of frequency of occurrence: Never, Seldom; Occasionally; Often, Quite a bit; and Very often, Very frequent. Short forms were constructed as exact subsets of the longer forms, and more parallel item and factor content was maintained between parent and teacher versions. New adolescent self-report and adult self-report scales were added. Additional specialty scales include DSM-IV subtypes and ADHD indexes. The acronyms, subscales, number of items, and age range of the new scales in comparison to the original scales are shown in Table 16.1. This chapter describes the development of the new scales, with particular emphasis on their differences from the original scales.

Overview of the New Instruments

The current versions of the scale have a number of features not found in the earlier versions. First, there are long and short scales in which careful attention was paid to making the shorter scales as reliable as the longer ones, so that choices could be made on grounds other than the superior reliability of a longer scale. Second, factor and item content were made as parallel as possible between parent and teacher versions.
Because the parent and teacher scales are usually collected for the same subjects, this parallel content makes it possible to plot results on the same profile sheets, enhancing comparison of findings across settings. Third, new adolescent and adult self-report rating scales were constructed, extending the depth of coverage for assessment of adolescents, and adding adult norms for similar constructs found in childhood, as well as new constructs for adolescents and adults that were warranted on theoretical grounds. Fourth, it was now possible to include items closely modeled on the DSM-IV symptomatic criteria for Attention Deficit Hyperactivity Disorder (ADHD) and Oppositional Defiant Disorder (ODD). Fifth, the internalizing scales were greatly expanded. Sixth, the most often used scale, the 10-item so-called Hyperactivity Index, was normed and factor analyzed into two subscales, retaining the sensitive properties of the Hyperactivity Index.3 Finally, true ADHD indexes were developed by careful extraction of items discriminating between well-diagnosed ADHD samples and age- and gender-matched cases from the standardization sample.

3 This scale has often been misunderstood, as evidenced by the appellation of "Hyperactivity Index." Because the scale was constructed from the highest loaded items on each of the other scales, it is really a "Psychopathology Index." But because it showed great efficiency in detecting hyperkinetic children, and was extremely sensitive to drug treatments of hyperactive youngsters, it came to be known as a hyperactivity index.

Scale Parent Scales Long Scales CPRS-R:L

TABLE 16.1 Differences Between the CRS and CRS-R Subscales

Version CRS-R

Oppositional, Cognitive Problems, Hyperactivity, Anxious-Shy, Perfectionism, Social Problems, Psychosomatic, CGI (former "Hyperactivity Index," including Emotional Lability and RestlessImpulsive), ADHD Index, DSM-IV Symptoms (including DSM-IV Inattentive and DSM-IV Hyperactive-imclusive)

CPRS-93 CRS

TABLE 16.1 (Continued)

| Rater | Form | Scale | Version | Subscales | # of Subscales | # of Items | Age Range |
|---|---|---|---|---|---|---|---|
| Parent | Long | CRS-R | CPRS-R:L | (subscales listed in the first part of the table) | 14 | 80 | 3-17 |
| Parent | Long | CRS | CPRS-93 | Conduct Disorder, Anxious-Shy, Restless-Disorganized, Learning Problem, Psychosomatic, Obsessive Compulsive, Antisocial, Hyperactive-Immature, Hyperactivity Index | 9 | 93 | 6-14 |
| Parent | Short | CRS-R | CPRS-R:S | Oppositional, Cognitive Problems, Hyperactivity, ADHD Index | 4 | 27 | 3-17 |
| Parent | Short | CRS | CPRS-48 | Conduct Problem, Learning Problem, Psychosomatic, Impulsive-Hyperactive, Anxiety, Hyperactivity Index | 6 | 48 | 3-17 |
| Parent | Auxiliary | CRS-R | CGI-P | Emotional Lability, Restless-Impulsive, CGI Total | 3 | 10 | 3-17 |
| Parent | Auxiliary | CRS | ASQ-P | Hyperactivity Index | 1 | 10 | 3-17 |
| Parent | Auxiliary | CRS-R | CADS-P | ADHD Index, DSM-IV Symptoms (including DSM-IV Inattentive and DSM-IV Hyperactive-Impulsive) | 4 | 26, 18, 12 | 3-17 |
| Teacher | Long | CRS-R | CTRS-R:L | Oppositional, Cognitive Problems, Hyperactivity, Anxious-Shy, Perfectionism, Social Problems, CGI (former "Hyperactivity Index," including Emotional Lability and Restless-Impulsive), ADHD Index, DSM-IV Symptoms (including DSM-IV Inattentive and DSM-IV Hyperactive-Impulsive) | 11 | 59 | 3-17 |
| Teacher | Long | CRS | CTRS-39 | Hyperactivity, Conduct Problem, Emotional Overindulgent, Anxious-Passive, Asocial, Daydream-Attention Problem, Hyperactivity Index | 7 | 39 | 4-12 |
| Teacher | Short | CRS-R | CTRS-R:S | Oppositional, Cognitive Problems, Hyperactivity, ADHD Index | 4 | 28 | 3-17 |
| Teacher | Short | CRS | CTRS-28 | Conduct Problem, Hyperactivity, Inattentive-Passive, Hyperactivity Index | 4 | 28 | 3-17 |
| Teacher | Auxiliary | CRS-R | CGI-T | Emotional Lability, Restless-Impulsive, CGI Total | 3 | 10 | 3-17 |
| Teacher | Auxiliary | CRS | ASQ-T | Hyperactivity Index | 1 | 10 | 3-17 |
| Teacher | Auxiliary | CRS-R | CADS-T | ADHD Index, DSM-IV Symptoms (including DSM-IV Inattentive and DSM-IV Hyperactive-Impulsive) | 4 | 27, 18, 12 | 3-17 |
| Adolescent | Long | CRS-R | CASS:L | Family Problems, Emotional Problems, Conduct Problems, Cognitive Problems, Anger Control Problems, Hyperactivity, ADHD Index, DSM-IV Symptoms (including DSM-IV Inattentive and DSM-IV Hyperactive-Impulsive) | 10 | 87 | 12-17 |
| Adolescent | Short | CRS-R | CASS:S | Conduct Problems, Cognitive Problems, Hyperactivity, ADHD Index | 4 | 27 | 12-17 |
| Adolescent | Auxiliary | CRS-R | CADS-A | ADHD Index, DSM-IV Symptoms (including DSM-IV Inattentive and DSM-IV Hyperactive-Impulsive) | 4 | 30, 18, 12 | 12-17 |


Summary of Development

Normative Data Collection. Subjects were recruited by professionals at over 200 sites in North America, including 49 of 50 states and all of the provinces of Canada.4 Forms were completed individually (not in group settings).

Item Pool. For each instrument, a large item pool was created from existing items on the earlier scales, new items based on recent diagnostic developments, and new items designed to tap constructs considered of importance to the broad range of internalizing and externalizing disorders of children, adolescents, and adults. For example, the new parent rating scale (long version) started with an item pool of 190 items, which was ultimately reduced to 80 items. Clinical experience contributed many items based on parent interviews, work with teachers, discussions with adolescents, and assessment of adults presenting with complaints related to attentional and self-regulation problems at an outpatient clinic.

Analytic Approach. A similar methodology was followed for each of the scales. Papers describing the development of each of the scales are available (Conners, Erhardt, Epstein, Parker, & Sitarenios, 1998; Conners, Parker, & Sitarenios, 1998; Conners, Sitarenios, Parker, & Epstein, 1998a; Conners, Sitarenios, Parker, & Epstein, 1998b; Conners, Wells, Parker, Sitarenios, Diamond, & Powell, 1997; Parker, Sitarenios, & Conners, 1996; Parker, Sitarenios, & Conners, in press). For each scale, samples were split into a derivation sample and a confirmatory sample. The correlation matrix from the derivation sample was subjected to a series of principal axis factor analyses to determine which items to retain.

Items were included on the final version if two criteria were met. First, an item had to load significantly (> .30) on a given factor and lower than .30 on the other factors (exceptions were made for items with very high loadings on one factor and loadings only slightly above .30 on one or more other factors). Second, following the rational approach to scale construction, an item was eliminated if it lacked conceptual coherence with its factor. The scree test and eigenvalues (> 1.0) were used to select the number of factors for rotation (Cattell, 1978). In addition, the split-half factor comparabilities method (Everett, 1983) was employed to determine the most reliable factor solution.

The factor structure of the final model was then tested with the cross-validation sample using confirmatory factor analysis with EQS for Windows, version 5.1 (Bentler, 1995). As recommended by Cole (1987) and Marsh, Balla, and McDonald (1988), multiple criteria were used to assess the goodness of fit of the factor model: the goodness-of-fit index (GFI; Joreskog & Sorbom, 1986), the adjusted GFI (AGFI), and the root mean-square residual (RMS). Based on the recommendations of Anderson and Gerbing (1984), Cole (1987), and Marsh et al. (1988), the following criteria were used to indicate the goodness of fit of the model to the data: GFI > .85, AGFI > .80, and RMS < .01.

The long forms include both externalizing and internalizing scales, whereas the short scales include only scales related to the core constructs for ADHD. Thus, if assessment for only these core ADHD scales is desired, then the short forms will suffice. However, as noted later, the long forms are recommended for an initial assessment of most children.

4 A complete list of site coordinators may be found in the technical manual (Conners, 1997).
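
The item-retention and model-fit rules described above can be expressed as a short sketch. This is only an illustration of the stated thresholds, not the authors' actual analysis code: the function names and the example loadings are hypothetical, the use of absolute loadings is an assumption, and the judgment-based parts of the procedure (the exception for very high primary loadings and the check for conceptual coherence) are deliberately left out.

```python
from typing import Sequence


def retain_item(loadings: Sequence[float], cutoff: float = 0.30) -> bool:
    """Keep an item only if it loads above the cutoff on exactly one factor.

    Absolute values are used here as an assumption; the text only says "> .30".
    """
    salient = [abs(loading) > cutoff for loading in loadings]
    return sum(salient) == 1


def acceptable_fit(gfi: float, agfi: float, rms: float) -> bool:
    """Check the three fit thresholds quoted in the text."""
    return gfi > 0.85 and agfi > 0.80 and rms < 0.01


# Hypothetical loadings for three items on three factors.
example_loadings = {
    "item_a": [0.62, 0.10, 0.05],  # salient on factor 1 only: keep
    "item_b": [0.45, 0.41, 0.02],  # cross-loads on factors 1 and 2: drop
    "item_c": [0.20, 0.12, 0.08],  # no salient loading: drop
}
retained = [name for name, row in example_loadings.items() if retain_item(row)]
print(retained)  # ['item_a']
```

In practice an item such as item_b would still be reviewed by hand, because the text allows exceptions when one loading is very high and the others are only slightly above .30.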


Types of Available Norms

Normative data are fully described in the technical manual (Conners, 1997). Data are provided for ages 3 through 17, separately for each gender, as T-scores, percentile ranks, and means and standard deviations. The normative sample for the child scales included 2,482 children, of whom 83% were Caucasian, 4.8% were African American, 3.5% were Hispanic, 2.2% were Asian, 1.1% were Native American, and 4.9% were classified as "Other." The normative sample for the adolescent self-report scale included 3,394 adolescents between the ages of 12 and 17. Of these, 62% were Caucasian, 29.9% were African American, 2.3% were Hispanic, 1.6% were Asian, 1.3% were Native American, and 3.1% were classified as "Other." Age, sex, and ethnicity effects on each of the scales are fully described in the manual (pp. 102-110). The standardization sample for the new adult scale included 839 normal adults between 18 and 81 years of age, approximately evenly divided between males and females (Conners, Erhardt, Epstein, Parker, & Sitarenios, 1998).

Basic Validity and Reliability Information

Reliability. Overall, for all forms, the coefficients of internal consistency (Cronbach's alpha) are highly satisfactory across the normative groups. For the CPRS-R:L, the total reliability coefficients ranged from .728 to .942. For the CPRS-R:S, coefficients ranged from .857 to .938. For the CTRS-R:L, the coefficients ranged from .773 to .958. For the CTRS-R:S, coefficients ranged from .882 to .952. For the adolescent self-report scale (CASS:L), the coefficients ranged from .754 to .917. For the CASS:S, reliability coefficients ranged from .752 to .852. Test-retest reliability was estimated from samples tested 6 to 8 weeks apart. Median reliabilities ranged from .70 to .87.

Factorial Validity. Factor structure for each of the scales was evaluated in two ways: by examining intercorrelations between the subscales, separately for males and females, to see whether they met theoretical expectations, and by applying a statistical procedure to test the replicability of the subscale structure. To test for gender differences in the pattern of intercorrelations, the equality of the correlation matrices was examined as recommended by Tanaka and Huba (1984). In all cases, the patterns of intercorrelations for the two sexes were virtually identical. As noted earlier, the factor comparabilities method and confirmatory factor analysis were employed to examine the replicability of the scales. In all cases, the scales met rigorous criteria for comparability.

Convergent and Divergent Validity. Given the overlap between items in the short and long forms, it would be expected that correlations between long and short forms would be high. Correlations were very near 1.0 in all cases (range .96-.99). The correlation between parent and teacher scales for the ADHD Index was .49. As expected, parent-teacher correlations for the other scales were somewhat lower (.12-.55). In all cases across informants, the ADHD Index showed the highest correlations, generally averaging about .50. This suggests that for the core features of ADHD, parents, teachers, and adolescents are in reasonable agreement about the degree of the key symptoms.

Because the new scales included the 18 DSM-IV diagnostic items for ADHD, it was possible to calculate prevalence rates. When presence of a symptom is defined as a score of 3 on the 0 to 3 scale, the prevalence rates for ADHD are 3.84% for teacher ratings and 2.30% for parent ratings, consistent with the prevalence figures reported in DSM-IV (3%-5%). It should be noted in this regard that if the DSM-IV
criterion of "often" were used for each symptom (the equivalent of a "2" on our scales), prevalence figures would have been substantially larger.

For the specialty index scales, samples of comprehensively studied ADHD patients were paired with age- and gender-matched cases from the normative database, and discriminant function analyses were conducted to identify the subset of long-form items that provided the best discrimination. As recommended by Kessel and Zimmerman (1993), a variety of diagnostic efficiency statistics were calculated: sensitivity, specificity, positive predictive power, negative predictive power, false positive rate, false negative rate, kappa, and overall classification rate. Sensitivity ranged between 91% and 100%. Specificity ranged between 77% and 98%. Overall classification rates ranged between 84% and 96%. An important finding was that teacher ratings produce considerably higher false positive rates (17%-18%) than parent ratings (2%-8%). These data are presented in Table 16.2.

Basic Interpretive Strategy

As with many similar instruments, an understanding of T-scores and their relation to percentiles is necessary for interpretation. T-scores allow direct comparison of relative standing on each factor dimension without regard to the factorial composition or number of items involved. The revised rating scales are sensitive to developmental trends and sex differences, so T-scores are also important in comparing relative performance at different ages or between genders. These conversions can be misleading, however. Consider a child who has recently turned 12 and who obtains a T-score of 70. This would be interpreted as "moderately atypical" in terms of the conventions used in the manual for 12- to 14-year-olds. If the same raw score is applied to the 9- to 11-year-old age range, the score would be interpreted as "mildly atypical." Thus caution is urged when repeated measures span age ranges.
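
The diagnostic efficiency statistics named above all follow from a 2 x 2 cross-classification of true status against the index's prediction. The sketch below is illustrative and is not the authors' procedure. The cell counts are an assumption: they were chosen because, with 57 ADHD cases and 57 matched controls (N = 114), they reproduce the original teacher-sample column of Table 16.2. The function name is hypothetical.

```python
from typing import Dict


def diagnostic_stats(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Compute diagnostic efficiency statistics from 2 x 2 cell counts.

    tp/fn: ADHD cases classified correctly/incorrectly
    fp/tn: normative controls classified incorrectly/correctly
    """
    n = tp + fn + fp + tn
    overall = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    chance = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_power": tp / (tp + fp),
        "negative_predictive_power": tn / (tn + fn),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "kappa": (overall - chance) / (1 - chance),
        "overall_classification": overall,
    }


# Assumed counts: 56 of 57 cases and 47 of 57 controls classified correctly.
stats = diagnostic_stats(tp=56, fn=1, fp=10, tn=47)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these counts the sketch gives sensitivity .982, specificity .825, positive predictive power .848, negative predictive power .979, kappa .807, and overall classification .904, matching the tabled teacher values. Note that predictive power depends on the base rate, which is 50% in a matched design and far lower in the general population.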
TABLE 16.2
Diagnostic Utility of Empirically Derived ADHD Index for Teacher, Parent, and Adolescent Self-Report Ratings

| Statistic | Teacher Rating Scale: Original | Teacher Rating Scale: Replication | Parent Rating Scale: Original | Parent Rating Scale: Replication | Adolescent Self-Report: Original | Adolescent Self-Report: Replication |
|---|---|---|---|---|---|---|
| Sensitivity | 98.2 | 97.1 | 92.3 | 100.0 | 90.7 | 90.7 |
| Specificity | 82.5 | 81.6 | 98.1 | 92.5 | 88.4 | 76.7 |
| Positive Predictive Power | 84.8 | 84.0 | 98.0 | 93.0 | 88.6 | 79.6 |
| Negative Predictive Power | 97.9 | 96.6 | 92.7 | 100.0 | 90.5 | 89.2 |
| False Positive Rate | 17.5 | 18.4 | 1.9 | 7.5 | 11.6 | 23.3 |
| False Negative Rate | 1.8 | 2.9 | 7.7 | 0.0 | 9.3 | 9.3 |
| Kappa | .807 | .786 | .904 | .925 | .791 | .674 |
| Overall Classification | 90.4 | 89.3 | 95.2 | 96.3 | 89.5 | 83.7 |
| N | 114 | 206 | 104 | 80 | 86 | 86 |

In terms of interpretations for ethnic subgroups, when age and gender are taken into account, there is practically no variance attributable to ethnicity. This finding is similar to the findings of other large-scale studies in which ethnicity vanishes as an effect
when the sample is large enough and age, sex, and social class effects are controlled (Achenbach et al., 1991). However, in the case of the adolescent self-report scale, there are sufficiently large numbers of African Americans across age and gender that separate norms are provided.

An eight-step interpretation sequence is suggested.

Examine Threats to Validity of the Scale. These include random responding due to poor motivation, rushed responding due to time limits, reading difficulties, or misunderstandings of how the instrument will be used. Consider different forms of response bias, such as overreporting to obtain a diagnosis, or underreporting due to fear of labeling. Adolescents completing self-reports may underreport as an oppositional strategy or because of social desirability. It should be noted that the normative data were collected under conditions of anonymity, so clinic patients may appear less impaired than the normative sample if they minimize their behavior difficulties for reasons of social desirability. Note logical inconsistencies, such as endorsing the item "fidgeting" while denying the item "Fidgets with hands or feet or squirms in seat." Try to determine the reason for inconsistencies across raters (e.g., between mother and father or between parent and teacher). Sometimes such differences reflect true differences in the subjective perceptions of the raters, but at other times they are clues to invalid responses created by differing standards. For example, some parent raters use themselves as children as the standard of comparison, whereas others use same-age children in the neighborhood as the referent against which to judge the severity or intensity of a behavior.

Analyze the Index Scores. For each scale, the ADHD Index is the best initial indicator of whether a child is likely to have an attentional or hyperactive-impulsive problem. The Global Index (called the Hyperactivity Index in the previous version of the scales) also may be elevated, with or without an elevated ADHD Index. This situation suggests the presence of internalizing problems along with a significant restless-impulsive component. A recent factor analysis of this scale indicates that both restlessness and emotional lability contribute to this Global Index (Parker et al., 1996).

Examine the Overall Profile and Subscale Patterns. There are several basic patterns of interest. The typical, or normal, profile has scores within a few points of 50 (the average for the standardization sample). The mildly elevated profile shows one or more scales hovering close to 60, that is, one standard deviation above the mean. Although such scores may raise suspicion of a problem, they need careful further investigation. The elevated profile, Type G, is one in which three or more independent subscales are elevated above 60 or 65. "Independent" refers to scales that are conceptually different; for example, the three internalizing scales (Anxious-Shy, Perfectionistic, Psychosomatic) are conceptually different from the three externalizing scales (e.g., Hyperactivity, Conduct, and ADHD Index). The elevated profile, Type P, shows a more focused elevation of scales, as when the Hyperactivity and Cognitive/Inattention scales are elevated. These profiles typically point to "particular" problem areas such as ADHD. Certain patterns are consistent with "pure ADHD," such as elevated Hyperactivity, Inattention, and ADHD Index in the absence of other elevations. Others suggest common comorbidities, as when the former three scales are elevated in the presence of oppositional problems. Comorbid learning disorders are indicated by elevations of the ADHD Index along with high elevations on the Cognitive/Inattention dimension. Note that this latter dimension includes academic subjects that are not found in the DSM-IV categorical approach, where learning problems have been relegated to other Axis I
disorders (e.g., developmental writing or reading disorders). The fact that there is empirical clustering of academic and inattention items is noteworthy and reflects the common perception that problems of inattention are closely linked to academic failure.

Analyze Subscale Scores. This is the most common approach to interpretation of the Conners scales. Each factor can be interpreted according to the predominant conceptual unity implied by the item content. Special parent and teacher feedback forms are available for ease of communicating the meaning of the factors to referring teachers or to the parents.

Analyze the DSM-IV Symptom Subscales. The interpretive position adopted in the manual for the Conners scales is a conservative one. It is recommended that only those items scored as 3 (i.e., "Very Much True," "Very Often," "Very Frequent") count toward a categorical DSM-IV diagnosis. This will sometimes lead to a discrepancy between the DSM-IV item count and the dimensional, normatively based T-score approach. DSM-IV itself words items with the prefix "often . . ." but gives no criteria as to what counts as often. If this definition were to be used literally, the items with a score of 2 (i.e., "Pretty Much True," "Often," "Quite a Bit") would count toward the criterion symptom count. How this problem is approached depends on whether there is more concern about over- or underdiagnosis when using the categorical approach. The preference is to rely on the dimensional T-score approach, counting the presence of Hyperactivity-Impulsivity and Inattention only when the scores reach a T-score level of 65 or greater, that is, one and one half standard deviations above the mean. But in a managed care environment, where a more literal (and liberal) interpretation might be required to obtain services, one may prefer the "often . . ." approach, counting items rated 2 or greater toward the DSM-IV criterion.

Examine Individual Item Responses. Interpretations at the item level are best considered hypotheses for further exploration. For example, a teacher rating may not show an elevated factor score for Cognitive Problems/Inattention, but perusal of the items shows that she gave a 3 on the item "Seems tired or slowed down all the time." This might suggest further inquiry regarding some physical problem such as an endocrine disorder, vitamin deficiency, or lack of sleep. Remember that many treatable problems do not rise to the level of a disorder or syndromal level of impairment, but are nevertheless important. Although statistically coherent, the factors can be considered to contain subfactors of interest. For example, the reading, spelling, and handwriting items on the Cognitive/Inattention factor might suggest an academic or learning problem; elevations on items of forgetting things already learned and frequently misplacing items might suggest specific memory problems. Although hazardous as conclusions, such findings provide rich material for investigative hypotheses.

Integrate Results with All Other Available Information. The real art of clinical practice consists in reconciling data from many different sources, including the rating scales, family psychiatric and medical history, family interactions, child developmental history, educational information, intrapsychic functioning such as self-efficacy and self-esteem, early temperament, language and motor development, and so on. Rating scales are good guides to narrowing the focus of an inquiry, but by themselves are insufficient to tell a comprehensive and accurate story regarding individual patients. Used in conjunction with other relevant historical, medical, and current function data, they can provide important adjuncts to the descriptive and diagnostic enterprise, as well as guides for treatment and instruments for treatment monitoring.
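
The two ways of counting DSM-IV symptoms discussed under "Analyze the DSM-IV Symptom Subscales," together with the dimensional T-score rule, can be compared in a short sketch. It is illustrative only: the function names and the example ratings are hypothetical, and the requirement of six of nine symptoms per domain is taken from DSM-IV itself rather than from this chapter.

```python
from typing import Sequence

REQUIRED_SYMPTOMS = 6  # DSM-IV: six of the nine symptoms in a domain


def symptom_count(ratings: Sequence[int], min_rating: int) -> int:
    """Count items (rated 0 to 3) at or above the chosen rating."""
    return sum(1 for rating in ratings if rating >= min_rating)


def meets_symptom_count(ratings: Sequence[int], min_rating: int = 3) -> bool:
    """Categorical rule: enough items reach the chosen rating."""
    return symptom_count(ratings, min_rating) >= REQUIRED_SYMPTOMS


def dimension_elevated(t_score: float, cutoff: float = 65.0) -> bool:
    """Dimensional rule: T-score at least 1.5 SD above the mean of 50."""
    return t_score >= cutoff


# Hypothetical ratings on the nine DSM-IV Inattentive items.
inattentive = [3, 3, 2, 2, 3, 2, 3, 1, 2]

conservative = meets_symptom_count(inattentive, min_rating=3)  # only 3s count
liberal = meets_symptom_count(inattentive, min_rating=2)       # 2s and 3s count
print(conservative, liberal)  # False True
```

The same set of ratings fails the conservative rule (four items rated 3) and passes the liberal rule (eight items rated 2 or higher), which is the kind of discrepancy the text attributes to the undefined meaning of "often."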


Using All Sources of Information, Determine the Appropriate Intervention or Remediation Strategy. After a multimodal assessment there are several issues to be considered: what is appropriate and ethical in terms of feedback, who should have access to the information, and who should participate in the treatment planning deliberations.

Treatment Planning

The CRS-R scales were developed after 30 years of clinical practice and experience with the earlier versions. Of course, many researchers have utilized the scales in a variety of research contexts as well. But the individual, person-to-person application of the scales has usually been done in the context of a full clinical evaluation whose central focus is the development of a treatment plan for the clients.

With parents, the scales provide a convenient and direct method of summarizing the behavior pattern of the child. Typically, this involves providing parents with either a typed summary (obtained from the output of the computer version) or a special feedback form on which the T-scores have been entered. One begins by perusal of individual items, particularly those rated as 3 ("Very much true"), to highlight the most salient individual behaviors. Then the meaning of the factor clusters to which the items belong is discussed, using the plotted profile to focus on the overall pattern of findings. This information is then integrated with the entire "story" of the child's life, including the family context, developmental trends, cognitive and academic strengths and weaknesses, and social interaction style. The essential aim of this stage of feedback to the parents is to provide a meaningful and coherent account of the child's current state and how the child got to that point. Often this will require a summary of the family psychiatric history as it bears on the current problems (e.g., pointing out family-genetic trends that help explain the findings). Or it might be necessary to interpret the problem in the context of significant birth or developmental risks, drawing on early birth and developmental information. The parent rating scale profiles give a concrete and visible document to which the complex life story can be anchored. This process is enhanced by using a systematic recording form filled out by parents or adults (Conners & March, 1994).

The primary problems for treatment planning are generally suggested by those factors most significantly elevated (e.g., one to two standard deviations above the normative mean). Secondary problems are often individual behaviors that are prominent even though the factors from which they come are not significantly above the normative mean. For example, there might not be an elevated Cognitive/Inattention factor score, but there may be ratings of 3 on reading, spelling, or handwriting items. These could in fact become primary treatment targets, not secondary problems, needing vigorous intervention at the school level before other factors are addressed. The timing of the introduction of treatments is determined by a number of factors, such as which behaviors are causing the most immediate distress or functional impairment. This information can only be obtained as part of the overall assessment.

Each of the factors carries with it certain implications regarding treatment, based on a large body of clinical research evidence and experience. For example, oppositional or aggressive conduct suggests the need for parent training and behavioral interventions at home, whereas comparable behaviors at school might imply the need for similar programs there also. The fact that a diagnosis based on parent report is a strong predictor
of diagnosis based on teacher report (Biederman, Keenan, & Faraone, 1990) should not deter one from having information from both settings because treatment in both settings may well be required. DSM-IV actually requires cross-situational presence of symptoms for a diagnosis of ADHD. On the other hand, remember to point out to parents that aggressive or oppositional problems are also responsive to stimulants (Speltz, Varley, Peterson, & Beilke, 1988). An elevated hyperactivity score will clearly raise the issue of the need for medication, whereas high anxiety, perfectionism, or psychosomatic anxiety elevations suggest caution in the use of stimulants (Pliszka, 1989) while pointing to the possible need for antistress techniques such as progressive relaxation, cognitive behavioral therapies, or anti-anxiety or mood-stabilizing medications. Many ADHD children will have significant cognitive and academic problems, as indicated by elevated scores on the Cognitive Problems/Inattention scale. A variety of classroom management and academic interventions will then be appropriate treatment targets. For example, the use of a home-based reward system such as the daily report card might be an appropriate intervention for classroom disruptive, interfering, or other off-task behavior (Abramowitz & O'Leary, 1991). Elevations on this factor might also suggest the need for further intellectual and academic assessment if these are not already available. Many parents will justifiably be concerned about recommendations regarding medication for ADHD and related behavior problems. By using the rating profile to indicate that "this is the type of pattern that research and clinical experience shows to be responsive to medication," the parents may be reassured. Of course, a thorough understanding of the research literature on medications is important in order to avoid embarrassing "show me" responses by the parent (Brown & Sawyer, in press). 
It is often wise for nonspecialists to share the findings with a pediatrician or psychiatrist, leaving the discussion of medication to them. Treatment planning in the school is often facilitated by using the scales as a focus for describing the overall pattern of the child's behavior. Convenient feedback forms for the teacher provide the essential descriptive information that can be useful in a teacher or school team treatment planning conference. For both teacher and parent, the scales help prevent a focus on only one area of behavior, to the exclusion of other and possibly equally important treatment targets. For example, a child may require counseling or therapy directed at inappropriate goal setting when excessive perfectionism causes a high level of frustration in achievement-related areas. With adolescents and adults, self-report rating scales provide an important focus for direct feedback regarding the nature of their problem. Of particular importance is the identification of specific cognitive deficits that are responsible for academic and vocational impairment. Treatment of these specific deficits constitutes an important element in the management of adolescent and adult ADHD (Hallowell & Ratey, 1994). With adolescents, complaints regarding anger management suggest specific cognitive anger management interventions (Hinshaw, Buhrmester, & Heller, 1989; Pelham, Milich, Cummings, & Murphy, 1991; Saylor, Benson, & Einhaus, 1985). 
Also, among both children and adolescents with ADHD, family processes are prominent complications of the disorder, frequently requiring family interventions (Arnold, Sheridan, & Estreicher, 1986; Barkley, Anastopoulos, Guevremont, & Fletcher, 1992a; Barkley, Guevremont, Anastopoulos, & Fletcher, 1992b; Brown & Pacini, 1989; Coker & Thyer, 1990; Cunningham, Benness, & Siegel, 1988; Edwards, Schulz, & Long, 1995; Marshall, Longwell, Goldstein, & Swanson, 1990; Schachar & Wachsmuth, 1991; Silver, 1989; Wells, 1988; Ziegler & Holden, 1988).


Certain limitations in the use of rating scales for treatment planning are apparent. One such limitation comes with adolescents who may refuse to reveal ongoing problems, and instead deny presence of symptoms. Often, such problems arise when the adolescent is an unwilling participant in the evaluation process. It may be necessary to develop a trusting relationship in which the adolescent feels comfortable in revealing problems. In any case, it is often advisable to structure the administration of the self-report scales by indicating that if they choose, the results will not be shared directly with parents. Instead, the adolescent can be reassured that admission of specific problems (e.g., covert substance use or criticisms of the family process) will only be shared in a general way to help find appropriate treatment or problem-solving strategies. A review of how the scale is to be used will often calm fears that they will get themselves into more hot water by being honest about their current behavior. Many adult patients have poor insight into their own behavior and often underestimate the impact of their behavior on family members. For this reason, a separate form filled out by a significant other is often a useful supplement to treatment planning and monitoring. For example, whenever possible, the spouse of an adult ADHD client should be asked to provide an independent assessment, as well as repeated measures during the course of treatment. This is a particularly desirable feature in clinical trials research when treatment sensitivity to medications is an important issue. An adult with ADHD often believes that there has been little change from a stimulant drug, whereas the spouse notices many areas of improvement. The Conners Rating Scales are particularly appropriate for use in a managed care environment. The scales provide a standard, economical, and well-validated descriptive framework for determining when treatment is needed. 
They also provide a sensitive and well-established method for monitoring treatment response and determining when the target behaviors are within normal limits. The scales are ideally suited as documentation for managed care caseworkers, provided that certain guidelines are followed. If the scales are going to be useful and interpretable over time, practitioners should identify the target behaviors of interest. In addition to highly elevated factors, individual target behaviors should be identified by asking parents, teachers, or clients to circle the three to five items that are the most important. It is highly recommended that at least two administrations of the scales be obtained at baseline to avoid artifactual decreases unrelated to treatment, and to establish a benchmark for comparing subsequent change (Milich, Roberts, Loney, & Caputo, 1980). A form for recording repeated observations is available that allows tracking of responses by multiple observers over time (Conners, 1997).

Although it is important to try to record teachers' observations for children receiving drug treatment, during summer and holidays the reports by parents may be crucial. But parents should be made aware that the dosing schedule may mean they see the child at the end of the day, when most of the treatment has worn off. Recent trends in stimulant medication therapy have moved to three-times-daily dosing regimens (Greenhill et al., in press), and instructing parents to observe behavior at a specific time period in the evening will make drug response observations more valid.

The main limitations on the use of rating scales in planning treatments come from the classic "garbage-in, garbage-out" problem. When the responses are invalid due to positive or negative halo effects, logical inconsistencies, inappropriate referents of what is normative, and various other limitations of rating instruments (Piacentini, 1993), then treatment planning is hazardous. Usually these problems can be dealt with by following the eight-step interpretive approach described earlier.
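
The monitoring guideline above (at least two baseline administrations, then repeated follow-up ratings compared against a benchmark) can be sketched as follows. This is a minimal illustration under stated assumptions, not a procedure from the manual: the function name is hypothetical, the most recent baseline rating is assumed to serve as the benchmark so that the early readministration drop is not counted as treatment response, and the default normal limit of T = 60 is an assumption that should be replaced by whatever cutoff the clinician adopts.

```python
from typing import Dict, List, Sequence, Union


def monitor(
    baseline: Sequence[float],
    followups: Sequence[float],
    normal_limit: float = 60.0,  # assumed cutoff, not taken from the manual
) -> List[Dict[str, Union[float, bool]]]:
    """Compare follow-up T-scores with the last baseline administration."""
    if len(baseline) < 2:
        raise ValueError("at least two baseline administrations are recommended")
    benchmark = baseline[-1]  # assumption: last baseline rating is the benchmark
    return [
        {
            "t_score": t,
            "change": t - benchmark,
            "within_normal_limits": t < normal_limit,
        }
        for t in followups
    ]


# Hypothetical T-scores on one target scale from a single rater.
baseline_scores = [72.0, 68.0]   # two pre-treatment administrations
followup_scores = [64.0, 58.0]   # ratings obtained during treatment

for row in monitor(baseline_scores, followup_scores):
    print(row)
```

In this example the 4-point drop between the two baseline ratings is treated as a readministration artifact, so the first follow-up shows a change of -4 rather than -8, and only the second follow-up falls below the assumed normal limit.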


Use of the Revised Conners Scales in Treatment Outcome Assessment

As described in Maruish (1994), the validity of outcome assessments with the Conners scales is highly dependent on task setting and task complexity. Research indicates that these global ratings tend to be most valid when made within the context of structured activities (Oettinger, Majovski, & Gauch, 1978) and with somewhat demanding tasks in terms of difficulty and complexity (Steinkamp, 1980). Thus, in planning an outcome study, it is important to encourage teachers and parents to make their ratings based on structured academic situations such as arithmetic or language arts, not free play or recess. Similarly, observations at home might best be made in homework situations or in responses to task demands or chores, not while the child is watching television or playing Nintendo-like games. Situations involving structured problem solving (Cohen, Sullivan, Minde, Novak, & Keens, 1983) or response delay or transitions between tasks (Zentall, Gohs, & Culatta, 1983) are likely to be more sensitive to the Global Index or ADHD Index measures. Ratings for externalizing problems such as hyperactivity are likely to be more valid when made in contexts with high levels of activity cues in the environment (Copeland & Weissbrod, 1978).

Many research studies use pre- and postmeasures in an experimental and control group design. It is important to recall that research with earlier versions of the scales shows short-term drops in symptoms after a single readministration (Milich et al., 1980), and that long-term administration may result in gradual increases over time (Diamond & Deane, 1990). Single case studies without controls are therefore suspect. More importantly, it would seem de rigueur to have at least one readministration at baseline before beginning a treatment or intervention.

The question of which scales to use in outcome research depends on several factors.
If the research is highly focused on a narrow condition, such as ADHD, then it seems reasonable to use only those factors pertaining to the condition, or to use the short ADHD Index or Global Index. The robustness of past versions of the Global Index in treatment studies, the ease and lack of subject burden, all recommend the new indexes as dependent measures in treatment outcome research. Similarly, if anxiety is the target behavior, then the anxiety scale might be extracted as a dependent measure. However, based on clinical experience, it is unusual to have single conditions presenting in the absence of comorbid characteristics. Thus, ADHD often carries oppositional-defiant behaviors or conduct disorder scales along with it. Only if frequent repeated measures are called for would it be recommended to use a brief 12-item scale, such as the ADHD Index or the 10-item Global Index, by itself. Otherwise, it is reasonable to use the short forms if only a few repeated measures are required. Typically, the long forms are used at the very beginning (e.g., screening or baseline) and as the last posttreatment measure. Because the short scales and indexes are exact subsets of the longer scales, they can always be extracted from the longer versions for comparison with briefer interim measures. Maruish (1994) noted that the revisions of DSM-IV are likely to have a significant impact on outcome research. It was for this reason that all of the DSM items for ADHD were included in the standardization of the revised parent and teacher scales. It is now possible to use the DSM scales as outcome measures when it would be judicious to do so. In an ADHD drug trial, Food and Drug Administration (FDA) guidelines might require documentation regarding the disease entity being targeted by the new drug. Note that the symptom criteria for ADHD in DSM-IV use the descriptor "often" as the frequency required for each symptom to meet criteria. Experience suggests that this

would lead to an overdiagnosis of the condition compared with normative data, and it may be less sensitive to treatment response than the data obtained in comparison with norms. However, where the relatively undifferentiated DSM criteria are needed, then symptom scores based on ratings of 2 or greater are certainly possible and appropriate. Typically, the T-scores will be examined and the focus will be on treatment changes of half a standard deviation or more (T-score changes of 5 or more scale points). However, it is important to remember that the factor scales represent a structure derived from large numbers of cases, and individuals may have only a few target symptoms relevant to themselves within a particular factor. Clinically, therefore, it is always useful in assessing change to have parents or teachers circle three to five items they think are the most crucial problem areas. Then, regardless of changes in factor scores, it is possible to examine particular target symptoms or behaviors for evidence of treatment effect. Obviously, in clinical situations involving a single client, clinicians must be mindful of the possibility of interpreting random fluctuations as real change; but this is precisely the reason for not relying on a single outcome measure. It is possible that a factor might show significant change, but a particular target symptom of interest to the parent or teacher does not change. It is therefore important to maintain an "ipsative" mindset as well as a normative one in evaluating change in the clinical setting with rating scales. In a research context, the target-symptom approach may be important because those symptoms most troubling or most elevated are likely to show the most treatment gain. Thus, target symptoms may be more sensitive to treatment than general factor score elevations. In general, endorsements of "Pretty Much" or "Very Much" represent clinically significant levels of behavior or symptomatology. 
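The two scoring rules described above, a DSM-style symptom count based on item ratings of 2 or greater and a half-standard-deviation criterion for T-score change, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example ratings are hypothetical, and the sketch assumes the conventional T-score metric (mean 50, standard deviation 10), under which half a standard deviation equals 5 T-score points.

```python
T_SD = 10.0  # assumed: conventional T-score standard deviation


def symptom_count(ratings, threshold=2):
    """Count items rated at or above `threshold` on the 0-3 item scale."""
    return sum(1 for r in ratings if r >= threshold)


def t_change_is_meaningful(pre_t, post_t, min_sd=0.5):
    """True if the T-score dropped by at least `min_sd` standard deviations."""
    return (pre_t - post_t) >= min_sd * T_SD


# Hypothetical ratings for nine symptom items, each scored 0-3
baseline = [3, 2, 2, 1, 3, 0, 2, 1, 2]
followup = [1, 1, 2, 0, 1, 0, 1, 1, 0]

print(symptom_count(baseline))          # 6
print(symptom_count(followup))          # 1
print(t_change_is_meaningful(68, 62))   # True: a 6-point drop
print(t_change_is_meaningful(68, 65))   # False: a 3-point drop
```

Keeping the two rules as separate functions mirrors the point made in the text: the categorical count and the normative T-score answer different questions, and either can be reported without the other.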
But some raters use a constricted scale, so that a change from "Not at All" to "Just a Little" represents a major shift in judgment of symptom intensity. It is therefore important in a clinical context to note the overall pattern of parent or teacher item endorsements and judge the impact of treatment accordingly. For the most part, however, clinically significant change has been interpreted as constituting a change from "Very Much" or "Pretty Much" to "Not at All" or "Just a Little." Based on this reasoning, average scale changes (where each item is scored 0, 1, 2, or 3) of 2 points are considered clinically meaningful.

In drug therapy outcome studies, an essential source of data to be used in conjunction with the scales is the Treatment-Emergent Symptoms Scale (TESS). Because behavioral symptoms can be increased by drugs, and therefore constitute treatment-emergent symptoms, specific lists of symptoms for this purpose have been devised (Guy, 1973). Other collateral information of importance is Who, Where, and When: Who made the ratings, in what setting, and when were they done in relation to the treatment? Obviously, this advice is most pertinent to time-sensitive and context-sensitive treatments such as pharmacologic agents; but experience shows that changes in raters from pre- to posttreatment, as well as variations in time and place of data collection, affect the amount of noise in the rating data. A change in rater from one parent to the other, one classroom or teacher to another, or to a different time of day, will add an unpredictable element to assessment of outcome, and most likely will diminish the sensitivity of ratings.

Some cautions are needed regarding the use of the new adolescent and adult self-rating scales in treatment outcome research. By their very nature, these scales allow the patient or research subject to estimate the level of their own symptomatology.
What these patients may reveal can be greatly affected by the demand characteristics of the treatment situation, as well as their own attitudes regarding the treatment. Adolescents, for example, may deliberately minimize their symptoms or the extent to which the treatment affects them as part of their general oppositionality to adult authority. It is therefore important to clarify

this issue up front in order to avoid a common error with global scales, in which the internal referent for severity and change may be different than expected by the investigator or clinician. Adults may wish to receive a drug and therefore exaggerate the severity of their symptoms or the amount of change the drug evoked. Because the scales were standardized under anonymous conditions, some attempt to replicate this context of assessment should be made with the adolescent and adult (e.g., by ensuring anonymity in published results, privacy with respect to parental or spousal access, etc.). By the very nature of their problems, it is to be expected that adolescents and adults with ADHD will have poorly organized recall of information that is not recent. For this reason, in treatment outcome studies where self-report is used as an outcome or study entry criterion, several measures are typically obtained using a daily diary format. In this approach, the time frame for the presence and severity of the symptoms is one day, rather than the typical weekly or monthly time frame.

Case Studies (footnote 5)

Case Study 1

DSM-IV Diagnosis: The Integration of Other Information

Background. T.H., a 14-year-old male, has always struggled in school, and both his teachers and his parents have become frustrated with trying to help him over the years. His grades since entering middle school have dropped dramatically. Nevertheless, with a lot of effort, he has managed to achieve small gains in areas such as being responsible for his work. He appears distractible and without motivation, and T.H. has been diagnosed with learning disabilities in written expression. He has a 114 Verbal IQ, a 104 Performance IQ, and a 110 Full Scale IQ. Achievement scores were within the expected range, with the exception of his slow motor performance and writing.
A psychometrist's evaluation showed weaknesses in fine motor coordination, visual-sequential memory, and difficulties in visual tracking and copying from the board. He was given extra help in writing for 45 minutes per day. At the end of second grade, T.H. was evaluated at a university clinic, was given a diagnosis of Attention Deficit Hyperactivity Disorder, and was placed on Dexedrine (5 mg/a.m., 2.5 mg/p.m.). He was taken off Dexedrine and placed on Ritalin 30 mg/a.m., 5 mg/noon. While on Ritalin, he exhibited mood swings, irritability, decreased appetite, crying, and temper tantrums. Ritalin treatment was terminated, and he was tried on Prozac, 10 mg per day. With Prozac, he had side effects of restlessness, dizziness, face rashes, and headaches. T.H. seemed more content on Prozac, but his motivation remained poor. Adderall was tried for a month, but it was discontinued when the physician relocated. T.H. appeared to have received some benefits from Adderall, but the trial was too short to be conclusive.

T.H. lives with his mother, father, and his 5-year-old sister. His parents are high school graduates with some college education. There is no family history of psychiatric, neurological, alcohol, or substance abuse problems, but his father may have had some learning difficulties when he was younger. T.H.'s parents report considerable family stress as a result of the energy required to deal with his problems. The father has cultivated a somewhat tense and angry relationship with his son.

5 This section is excerpted from the technical manual for the revised rating scales (Conners, 1997) by permission from the publisher.

Although these family relationship

problems did not cause T.H.'s difficulties, they now make it difficult to deal effectively with him. T.H. was considered a "wonderful baby" who was cautious, normally active, and nonimpulsive. He had persistent difficulties falling and staying asleep and woke up every hour until he was 6 months old. He continues to have nightmares and sleepwalking episodes. Other observers (i.e., family members, friends of family, teachers) have commented on T.H.'s distractibility, but they generally noted a lack of motor hyperactivity and impulsivity. CRS-R Ratings. Figure 16.1 shows a profile sheet that summarizes CTRS-R:L information from three of T.H.'s teachers. Figure 16.2 summarizes CPRS-R:L data from the mother and Fig. 16.3 shows the completed CASS:L profile sheet. Although there is some variation, overall, the teachers' responses are in agreement with each other. These data, like the parents' ratings, reveal problems with inattention and conduct. The parent responses also indicate internalizing problems (psychosomatic complaints and anxiety). The self-report responses (Fig. 16.3) were in general agreement with the other informants. The long self-report scale is a particularly good source for insights regarding internalizing problems. Scores on the mood-related subscales (Emotional Problems and Anger Control subscales) show only minor elevations. Further Testing and Information. A DSM-IV interview with T.H.'s mother indicated only three of a possible nine hyperactive-impulsive symptoms, but eight of nine inattentive symptoms. This outcome qualifies him for a diagnosis of Attention Deficit Hyperactivity Disorder, Inattentive Subtype. Symptoms have been present since T.H.'s first grade and have created significant dysfunction at both home and school. T.H. also was positive on seven of eight symptoms of oppositional defiant disorder. Although he can be physically aggressive, he does not meet criteria for conduct disorder. 
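The way the interview symptom counts above map onto a provisional subtype can be sketched as follows. The thresholds reflect the DSM-IV symptom criteria (six or more of nine symptoms in an ADHD domain; four or more of eight for oppositional defiant disorder); the function name is hypothetical, and the sketch deliberately covers only the symptom-count criterion. Age of onset, impairment across settings, and exclusion criteria, which the case narrative addresses separately, are not modeled.

```python
ADHD_THRESHOLD = 6  # DSM-IV: six or more of nine symptoms per domain
ODD_THRESHOLD = 4   # DSM-IV: four or more of eight symptoms


def adhd_subtype(inattentive, hyperactive_impulsive):
    """Return the subtype suggested by symptom counts alone, or None."""
    inatt = inattentive >= ADHD_THRESHOLD
    hyper = hyperactive_impulsive >= ADHD_THRESHOLD
    if inatt and hyper:
        return "Combined"
    if inatt:
        return "Predominantly Inattentive"
    if hyper:
        return "Predominantly Hyperactive-Impulsive"
    return None


# Counts reported for T.H. in the parent interview
print(adhd_subtype(inattentive=8, hyperactive_impulsive=3))
# Predominantly Inattentive
print(7 >= ODD_THRESHOLD)
# True: seven of eight ODD symptoms exceeds the symptom threshold
```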
He was an appropriately dressed, friendly, and cooperative youngster. His response style was deliberate and careful. Activity level was minimal, and verbal fluency was spontaneous and mature. He was hyperverbal at times, especially about computers, which he seems to know well. Because of the moderate elevations on the anxiety, social, and mood-related scales, follow-up instruments were administered to probe some of these areas. However, T.H. reported few problems of anxiety or depression on the Revised Children's Manifest Anxiety Scale (Reynolds & Richmond, 1978), the Multidimensional Anxiety Scale for Children (March, Parker, Sullivan, Stallings, & Conners, 1997), and the Children's Depression Inventory (Kovacs, 1993). On these tests, he answered "Yes" to a number of items addressing concern with peer evaluation and being considered stupid. T.H. was administered the Test of Spatial Attention (Swanson et al., 1991), which requires the child/adolescent to respond to targets presented in the left and right visual fields. On this test, he had a distinct difficulty in disengaging his attention from the right visual field and shifting his attention to the left visual field. This difficulty occurred only when very fast processing of the warning cues was required, which suggests a deficit in automatic visual attention of the right visual field. Finally, the Conners Continuous Performance Test (CPT; Conners, 1994a) was employed as part of the assessment. For the CPT, the child/adolescent must learn to adapt to different presentation rates, as well as continuously respond over a 14-minute period. T.H. made many errors of omission and commission, was highly variable in his performance, and tended to perform worse as time went on. Thus, he showed marked impairment of sustained attention, response control, and precision.
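The three features noted in the CPT description (omission errors, commission errors, and variability that worsens over time) can be illustrated with a small per-block summary. This is a hedged sketch, not the scoring algorithm of the Conners CPT: the trial format, the function name `summarize_block`, and the example data are all hypothetical. Each trial is assumed to record whether a response was required, whether one was made, and the reaction time in milliseconds.

```python
from statistics import mean, pstdev


def summarize_block(trials):
    """Summarize one block of (response_required, responded, rt_ms) trials."""
    # Omission: a response was required but none was made
    omissions = sum(1 for req, resp, _ in trials if req and not resp)
    # Commission: a response was made when none was required
    commissions = sum(1 for req, resp, _ in trials if not req and resp)
    # Reaction times for correct responses only
    rts = [rt for req, resp, rt in trials if req and resp]
    return {
        "omissions": omissions,
        "commissions": commissions,
        "mean_rt": mean(rts) if rts else None,
        "rt_sd": pstdev(rts) if len(rts) > 1 else None,
    }


# Hypothetical early and late blocks from one session
early = [(True, True, 400), (True, True, 420), (False, False, None),
         (True, True, 410), (False, True, 300)]
late = [(True, False, None), (True, True, 380), (False, True, 250),
        (True, True, 620), (False, True, 260)]

for name, block in (("early", early), ("late", late)):
    print(name, summarize_block(block))
```

Comparing the early and late summaries side by side is one simple way to make "tended to perform worse as time went on" concrete: in this made-up data, errors and reaction-time spread both increase in the later block.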

Fig. 16.1. Conners' Teacher Rating Scale-Revised: long version. Taken from Conners' Rating Scales-Revised Technical Manual by Dr. Keith Conners, copyright © 1997, Multi-Health Systems, Inc., 908 Niagara Falls Blvd., North Tonawanda, NY 14120-2060. All rights reserved. Reproduced by permission.

Fig. 16.2. Conners' Parent Rating Scale-Revised: long version. Taken from Conners' Rating Scales-Revised Technical Manual by Dr. Keith Conners, copyright © 1997, Multi-Health Systems, Inc., 908 Niagara Falls Blvd., North Tonawanda, NY 14120-2060. All rights reserved. Reproduced by permission.

Fig. 16.3. Conners-Wells' Adolescent Self-Report Scale: long version. Taken from Conners' Rating Scales-Revised Technical Manual by Dr. Keith Conners, copyright © 1997, Multi-Health Systems, Inc., 908 Niagara Falls Blvd., North Tonawanda, NY 14120-2060. All rights reserved. Reproduced by permission.

Analysis. The practitioner followed all of the interpretation steps identified earlier in this chapter. The following are interpretive steps with regard to T.H.'s case:

1. There was no reason to question the validity of the findings. The consistency of the CRS-R findings across informants supports the validity of the ratings and subsequent conclusions based on the ratings.

2. Elevations on the ADHD Index argue for the presence of an attention-related disorder. The elevation of the Conners Global Index on the teacher and parent forms supports this hypothesis and also suggests the problem may include comorbid internalizing problems (mood lability).

3. The general impression derived from the profile implies that the problem entails more than just attention problems. Elevations are seen on subscales pertaining to oppositional behavior and social and psychosomatic problems.

4. The subscale elevations suggest that this adolescent is likely to break rules, have problems with persons in authority, and react with anger and annoyance in many situations. This adolescent is also likely to have trouble paying attention. Elevations on the Anxious-Shy subscale indicate an emotional and sensitive teenager. The psychosomatic elevations identified by the parent connote the adolescent's tendency to report more physical symptoms than others his or her age.

5. The DSM-IV symptom subscales show elevations on the Inattentive subscale and only slightly above average scores on the Hyperactive-Impulsive dimension. When the DSM-IV symptoms subscales were examined categorically, the symptom count roughly corresponded to the symptom counts generated from the parent interview.

6. The practitioner examined individual item responses to help identify specific concerns. For example, T.H.'s response of "Pretty much true (Often, quite a bit)" to "People bug me and get me angry" suggested an oversensitivity to criticism/comments from others that may contribute to his anger control problems and oppositional behavior. His response of "Just a little true (Occasionally)" to "I am truant from school (i.e., stayed out of school without permission)" was also a concern.

7. Other tools were used in conjunction with the CRS-R assessment. In particular, aptitude testing was conducted to evaluate cognitive/learning problems, interviews were utilized to help arrive at a diagnosis, the RCMAS and MASC were used to further explore anxiety issues, the CDI was applied to explore mood problems in more depth, the spatial attention test was used to examine selective aspects of attention and to consider biophysical elements, and the CPT was used as an objective assessment of attention problems. Historical, family, and observational information were all synthesized with the psychometric results to arrive at the conclusions. Only after all this information was integrated with the CRS-R results was it reasonable to issue a diagnosis. In this case, the diagnosis might be:

Axis I: 314.00 Attention Deficit Hyperactivity Disorder, Predominantly Inattentive Subtype; 313.81 Oppositional Defiant Disorder
Axis II: None
Axis III: None
Axis IV: Educational problems; family stress
Axis V: Global Assessment of Functioning = 60

8. The information gathered led to recommendations for parent management training, with an emphasis on developing methods for enhancing self-esteem; individualized classroom instruction and educational accommodations; and reevaluation of medication with proper monitoring.

Case Study 2

Using Short Forms for Initial Screening

Although administration of the long forms is preferable in most cases due to the comprehensive information they produce, occasionally it may be more appropriate to administer the short forms of the CRS-R.
In the following case study, the subject, a

reasonably successful student with no apparent behavioral problems, is troubled by a sense of not fulfilling his potential. The CRS-R short forms were employed as part of an initial screening procedure to evaluate the presence of a cognitive or attentional problem. Background. M.W. is a 17-year-old male who reports a continuous difficulty performing up to par on standardized tests. He complains of memory difficulties and, on discovery of a paternal uncle and a cousin with ADHD, felt that he shared many of the same problems. Although the teacher and the parents generally felt he was a cooperative, popular, and hard-working student, screening for attentional and cognitive problems was undertaken to ascertain whether there was an identifiable problem that might account for M.W.'s subjective sense of underachievement. CRS-R Ratings. The completed profiles from the short forms provided by M.W.'s parents, his teacher, and M.W. himself are shown in Figs. 16.4, 16.5, and 16.6. The ratings indicate potential problems in cognitive functions (T-score of 65 for mother and 63 for the father); and the mildly elevated ADHD Index (T-scores of 63 and 59, respectively) raises a suspicion of ADHD. The teacher ratings were within normal limits, but the self-report ratings also suggested cognitive problems (T-score of 66), and possible ADHD (T-score of 58). An examination of the specific items of the ADHD Index endorsed by parents and M.W., respectively, indicated a preponderance of items consistent with problems in organizing and completing tasks, and difficulty concentrating on work requiring mental effort. Further Testing and Information. Given these suggestive but inconclusive findings, a full evaluation was recommended. Significant risk factors were noted: M.W.'s father and brother have been diagnosed with expressive written language disorders, and his paternal uncle and cousin have been diagnosed with ADHD. 
However, given his adequate academic performance and classroom behavior, the teacher's unremarkable ratings are not surprising. In a subsequent semistructured DSM-IV interview of ADHD symptoms, both M.W. and his parents endorsed few symptoms that supported the presence of either hyperactivity-impulsivity or inattention. A significant number of oppositional defiant disorder symptoms were endorsed by the parents, but these turned out to reflect primarily resistance around homework issues. Further psychological testing revealed that M.W. had a significant problem in written language. He noted that expressing himself in writing was always very slow and effortful. Though he managed to complete his classroom and homework assignments, the continuing frustration and effort eventually caused him to rebel and resist school-related tasks. With this information, it was clear that the apparent elevation of Inattention-related items was in fact part and parcel of M.W.'s learning disability. Recommendations were made for accommodations that would allow less writing in class and in homework, the use of oral reports, a "note-taking buddy," and extra time in taking written examinations.

Case Study 3

Monitoring Treatment Effectiveness

Background. K.H. was given a full clinical evaluation, which suggested ADHD, Hyperactive-Impulsive subtype and, after trying and considering various types of interventions, led to the recommendation of pharmacological treatment. The decisions involved in introducing and monitoring medication are complex (Brown & Sawyer, in

press; Conners, March, Erhardt, Butcher, & Epstein, 1996). But in situations where medications are being recommended, the following steps are in order:

1. Identify the target behavior and subscales/items for monitoring change. The first step in the process is to select a subscale or subscales for monitoring change. Because multiple readministration of scales is required, the practitioner decided to use the CADS-P and CADS-T because they are short, provide data on attention problems, and provide information that can be linked directly to diagnostic symptoms. The CADS scales include the ADHD Index and the DSM-IV symptoms subscales for both ADHD subtypes.

Fig. 16.4. Conners' Parent Rating Scale-Revised. Taken from Conners' Rating Scales-Revised Technical Manual by Dr. Keith Conners, copyright © 1997, Multi-Health Systems, Inc., 908 Niagara Falls Blvd., North Tonawanda, NY 14120-2060. All rights reserved. Reproduced by permission.

Fig. 16.5. Conners-Wells' Self-Report Scale. Taken from Conners' Rating Scales-Revised Technical Manual by Dr. Keith Conners, copyright © 1997, Multi-Health Systems, Inc., 908 Niagara Falls Blvd., North Tonawanda, NY 14120-2060. All rights reserved. Reproduced by permission.

2. Choose and follow a specific protocol. In this case, the child's classroom behavior was of particular concern, so increased emphasis was placed on the teacher's responses. Additionally, the teacher was not told when the medication was begun to minimize the possibility of bias. Scores from the ADHD Index and the DSM-IV subscales from the initial administration of the CTRS-R:L were used as baseline measures. It was decided to start methylphenidate at 5 mg twice daily, and to increase dosage once every 2 weeks. Because academic concerns were of interest, curriculum-based measurement (CBM) was used by obtaining brief samples of handwriting and math (Gickling & Thompson, 1985; Ross, Poidevant, & Miner, 1995).

Fig. 16.6. Conners' Teacher Rating Scale-Revised. Taken from Conners' Rating Scales-Revised Technical Manual by Dr. Keith Conners, copyright © 1997, Multi-Health Systems, Inc., 908 Niagara Falls Blvd., North Tonawanda, NY 14120-2060. All rights reserved. Reproduced by permission.

3. Examine results on the subscale level and evaluate medication/dosage effectiveness. The Treatment Progress ColorPlot is shown in Fig. 16.7. It shows the scores on seven administrations (one premedication result and six postmedication results). Normally, it would be desirable to have at least two baseline measures. Although there was some improvement with 5 mg, the change was less than one-half standard deviation and cannot be taken as a meaningful effect. The increase to 10 mg produced a drop of almost two standard deviations. The level of improvement was roughly from the 98th percentile for elevated symptoms to the 66th percentile, or near-normal performance. Notably, a further increase of dosage failed to show any further decrease in symptoms, so K.H. was maintained on 10 mg twice daily. Note that it would be possible to also

use self-ratings if the patient had been an adolescent. Usually, it would also be desirable to include ratings by parents so that improvement can be compared across settings.

Fig. 16.7. Conners' Rating Scale-Revised Treatment Progress ColorPlot. Taken from Conners' Rating Scales-Revised Technical Manual by Dr. Keith Conners, copyright © 1997, Multi-Health Systems, Inc., 908 Niagara Falls Blvd., North Tonawanda, NY 14120-2060. All rights reserved. Reproduced by permission.

Conclusions

ADHD is a developmental disorder that covers the entire life span. However, its presentation is different for each sex, and the manifestation of symptoms differs at various developmental stages. There is usually a set of other symptoms that either reflect comorbid disorders or the cumulative developmental consequences of the primary disorder of behavior, cognition, or learning. Current categorical methods of diagnosis do not provide a sound empirical basis for assessment and treatment monitoring because they provide no empirical standards of levels of impairment. Nor do they account for the changing threshold for diagnosis that should be required in older patients. As normal development occurs, the frequency and intensity of behaviors change so that what previously may have been abnormal is no longer appropriate as a referent (Barkley, Murphy, & Kwasnik, 1996).

The present restandardized Conners Rating Scales provide a national normative database and rigorously developed psychometric standards for scales that are now applicable across the life span. No diagnostic category can be fully assessed in isolation from potential alternative diagnoses. Therefore, in addition to scales primarily focused on issues related to ADHD, scales for other important behaviors relating to conduct, oppositionality, anxiety, somatization, self-esteem, mood, and family function have been carefully developed. The scales now include the DSM-IV categorical symptom list, and it is possible to use them simply as symptom counts as in the DSM-IV field trials. However, the scalar and normative features also allow one to bring to bear the powerful dimensional measurement concept that views each individual as the intersect of several underlying dimensions or influences. In combination with the categorical approach, dimensional measures provide a cross-check that places diagnoses within a normative and developmental framework.

The rating scales are particularly well-suited for treatment monitoring by virtue of having brief but highly reliable subscales such as the new ADHD Index. These scales represent the optimum set of items for discriminating well-diagnosed ADHD from non-ADHD, and have been carefully replicated on independent samples. Their use in drug studies may be of particular value because the earlier versions (the so-called Hyperactivity Index) have proven to be brief, drug-sensitive, and well-liked by teachers, parents, and professionals.

References

Abramowitz, A.J., & O'Leary, S.G. (1991). Behavioral interventions for the classroom: Implications for students with ADHD. School Psychology Review, 20, 220-234. Achenbach, T.M., Conners, C.K., Quay, H.C., Verhulst, F.C., & Howell, C.T. (1989). Replication of empirically derived syndromes as a basis for taxonomy of child/adolescent psychopathology. Journal of Abnormal Child Psychology, 17, 299-323.
Achenbach, T.M., Howell, C.T., Quay, H.C., & Conners, C.K. (1991). National survey of problems and competencies among 4- to 16-year-olds: Parents' reports for normative and clinical samples. Monographs of the Society for Research in Child Development, 56, 1-131. Anderson, J., & Gerbing, D. (1984). The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, 155-173. American Psychiatric Association (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author. Arnold, L.E., Sheridan, K., & Estreicher, D. (1986). Multifamily parent-child group therapy for behavior and learning disorders. Journal of Child and Adolescent Psychotherapy, 3, 279-284. Barkley, R.A., Anastopoulos, A.D., Guevremont, D.C., & Fletcher, K.E. (1992a). Adolescents with attention deficit hyperactivity disorder: Mother-adolescent interactions, family beliefs and conflicts, and maternal psychopathology. Journal of Abnormal Child Psychology, 20, 263-288. Barkley, R.A., Guevremont, D.C., Anastopoulos, A.D., & Fletcher, K.E. (1992b). A comparison of three family therapy programs for treating family conflicts in adolescents with attention-deficit hyperactivity disorder. Journal of Consulting and Clinical Psychology, 60, 450-462. Barkley, R.A., Murphy, K., & Kwasnik, D. (1996). Psychological adjustment and adaptive impairments in young adults with ADHD. Journal of Attention Disorders, 1, 41-54. Bentler, P. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software. Biederman, J., Keenan, K., & Faraone, S.V. (1990). Parent-based diagnosis of attention

deficit disorder predicts a diagnosis based on teacher report. Journal of the American Academy of Child and Adolescent Psychiatry, 29, 698-701. Brown, R.T., & Pacini, J.N. (1989). Perceived family functioning, marital status, and depression in parents of boys with attention deficit disorder. Journal of Learning Disabilities, 22, 581-587. Brown, R.T., & Sawyer, M. (in press). Medication for school-age children. New York: Guilford. Cattell, R. (1978). The scientific use of factor analysis in behavioural and life sciences. New York: Plenum. Cohen, N.J., Sullivan, J., Minde, K., Novak, C., & Keens, S. (1983). Mother-child interaction in hyperactive and normal kindergarten-aged children and the effect of treatment. Child Psychiatry and Human Development, 13(4), 213-224. Coker, K.H., & Thyer, B.A. (1990). School- and family-based treatment of children with attention-deficit hyperactivity disorder. Families in Society, 71, 276-282. Cole, D. (1987). Utility of confirmatory factor analysis in test validation research. Journal of Consulting and Clinical Psychology, 55, 584-594. Conners, C., Parker, J., & Sitarenios, G. (1998). Parent and teacher ratings of DSM-IV ADHD symptoms: A test of the subtype model. Manuscript in preparation. Conners, C.K. (1994a). The Conners Continuous Performance Test. Toronto, Canada: Multi-Health Systems. Conners, C.K. (1994b). The Conners Rating Scales: Use in clinical assessment, treatment planning and research. In M. Maruish (Ed.), Use of psychological testing for treatment planning and outcomes assessment (pp. 550-578). Hillsdale, NJ: Lawrence Erlbaum Associates. Conners, C.K. (1997). Conners' Rating Scales-Revised technical manual. North Tonawanda, NY: Multi-Health Systems. Conners, C., Erhardt, D., Epstein, J., Parker, J., & Sitarenios, G. (1998). Self-ratings of ADHD symptoms in adults: Normative data, factor structure, reliability, and diagnostic sensitivity. Manuscript in preparation. Conners, C.K., & March, J.S. (1994).
A developmental history form for ADHD and related disorders. North Tonawanda, NY: Multi-Health Systems. Conners, C.K., March, J.S., Erhardt, D., Butcher, T., & Epstein, J. (1996). Assessment of attention deficit disorders (ADHD): Conceptual issues and future trends. Journal of Psychoeducational Assessment. Special ADHD Issue (Monograph Series), 185-204. Conners, C.K., Sitarenios, G., Parker, J.D., & Epstein, J.N. (1998a). The Revised Conners' Parent Rating Scale (CPRS-R): Factor structure, reliability, and criterion validity. Journal of Abnormal Child Psychology, 26(4), 257-268. Conners, C.K., Sitarenios, G., Parker, J.D., & Epstein, J.N. (1998b). The Revised Conners' Teacher Rating Scale (CTRS-R): Factor structure, reliability, and criterion validity. Journal of Abnormal Child Psychology, 26(4), 279-291. Conners, C.K., Wells, K.C., Parker, J.D., Sitarenios, G., Diamond, J., & Powell, J.W. (1997). A new self-report scale for assessment of adolescent psychopathology: Factor structure, reliability, validity, and diagnostic sensitivity. Journal of Abnormal Child Psychology, 25(6), 487-497. Copeland, A.P., & Weissbrod, C.S. (1978). Behavioral correlates of the hyperactivity factor of the Conners Teacher Questionnaire. Journal of Abnormal Child Psychology, 6, 339-343. Cunningham, C.E., Benness, B.B., & Siegel, L.S. (1988). Family functioning, time allocation, and parental depression in the families of normal and ADDH children. Journal of Clinical Child Psychology, 17, 169-177. Diamond, J.M., & Deane, F.P. (1990). Conners Teachers' Questionnaire: Effects and implications of frequent administration. Journal of Clinical Child Psychology, 19(3), 202-204. Edwards, M.C., Schulz, E.G., & Long, N. (1995). The role of the family in the assessment of attention deficit hyperactivity disorder. Special Issue: The impact of the family on child adjustment and psychopathology. Clinical Psychology Review, 15, 375-394. Everett, J. (1983). 
Factor comparability as a means of determining the number of factors and their rotation. Multivariate Behavioral Research, 197-218. Gickling, E.E., & Thompson, V.P. (1985). A personal view of curriculum-based assessment.

< previous page

page_493

next page >

< previous page

page_494

next page > Page 494

Special Issue: Curriculum-based assessment. Exceptional Children, 52, 205-218. Goyette, C.H., Conners, C.K., & Ulrich, R.F. (1978). Normative data on revised Conners Parent and Teacher Rating Scales. Journal of Abnormal Child Psychology, 6, 221-236. Greenhill, L., Abikoff, H., Arnold, E., Conners, C., Wells, K., Elliott, G., Hechtman, L., Hinshaw, S., Hoza, B., Jensen, P., March, J., Newcorn, J., Pelham, W., Severe, J., Swanson, J., & Vitiello, B. (in press). Medication treatment strategies in the MTA study: Relevance to clinicians and researchers. Journal of the American Academy of Child and Adolescent Psychiatry. Guy, W. (1973). Treatment Emergent Symptom Scale (TESS). In Psychopharmacology Bulletin: Special Issue on Children (p. 220). Washington, DC: NIMH. (DHEW Publication No. [HSM] 73-9002) Hallowell, E., & Ratey, J. (1994). Driven to distraction. New York: Pantheon. Hinshaw, S.P., Buhrmester, D., & Heller, T. (1989). Anger control in response to verbal provocation: Effects of stimulant medication for boys with ADHD. Journal of Abnormal Child Psychology, 17, 393-407. Joreskog, K., & Sorbom, D. (1986). LISREL VI: Analysis of linear structural relationships by maximum likelihood, instrumental variables, and least squares methods. Morresville, IN: Scientific Software. Kessel, J., & Zimmerman, M. (1993). Reporting errors in studies of the diagnostic performance of selfadministered questionnaires: Extent of the problem, recommendations for standardized presentation of results, and implications for the peer review process. Psychological Assessment, 5, 395-399. Kovacs, M. (1993). The Children's Depression Inventory (CDI). Toronto, Canada: Multi-Health Systems. March, J.S., Parker, J.D., Sullivan, K., Stallings, P., & Conners, C.K. (1997). The Multidimensional Anxiety Scale for Children (MASC): Factor structure, reliability, and validity. Journal of the American Academy of Child and Adolescent Psychiatry, 36(4), 554-565. Marsh, H., Balla, J., & McDonald, R. (1988). 
Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103, 391-410. Marshall, V.G., Longwell, L., Goldstein, M.J., & Swanson, J.M. (1990). Family factors associated with aggressive symptomatology in boys with attention deficit hyperactivity disorder: A research note. Journal of Child Psychology and Psychiatry and Allied Disciplines, 31, 629-636. Maruish, M.E. (Ed.). (1994). The use of psychological testing for treatment planning and outcome assessment. Hillsdale, NJ: Lawrence Erlbaum Associates. Milich, R., Roberts, M.A., Loney, J., & Caputo, J. (1980). Differentiating practice effects and statistical regression on the Conners Hyperkinesis Index. Journal of Abnormal Child Psychology, 8, 549-552. Oettinger, L., Majovski, L.V., & Gauch, R.R. (1978). Coding A and Coding B on the WISC are not equivalent tasks. Perceptual and Motor Skills, 47, 987-991. Parker, J., Sitarenios, G., & Conners, C. (1996). Abbreviated Conners' Rating Scales revisited: A confirmatory factor analytic study. Journal of Attention Disorders, 1, 55-62. Parker, J.D.A., Sitarenios, G., & Conners, C.K. (in press). Assessment of Attention-Deficit/Hyperactivity Disorder: A new index for parent, teacher, and adolescent ratings. Journal of Abnormal Child Psychology. Pelham, W.E., Milich, R., Cummings, E.M., & Murphy, D.A. (1991). Effects of background anger, provocation, and methylphenidate on emotional arousal and aggressive responding in attention-deficit hyperactivity disordered boys with and without concurrent aggressiveness. Journal of Abnormal Child Psychology, 19, 407426. Piancentini, J. (1993). Checklists and rating scales. In T. Ollendick & M. Hersen (Eds.), Handbook of child and adolescent assessment (pp. 82-97). Boston: Allyn & Bacon. Pliszka, S.R. (1989). Effect of anxiety on cognition, behavior, and stimulant response in ADHD. Journal of the American Academy of Child and Adolescent Psychiatry, 28, 882-887. Reynolds, C.R., & Richmond, B.O. 
(1978). Factor structure and construct validity of "What I Think and Feel": The Revised Children's Manifest Anxiety Scale. Journal of Personality Assessment, 43, 281-283. Ross, P.A., Poidevant, J.M., & Miner, C.U. (1995). Curriculum-based assessment of writing fluency in children with attention-deficit hyperactivity disorder and normal children.

< previous page

page_494

next page >

< previous page

page_495

next page > Page 495

Reading and Writing Quarterly: Overcoming Learning Difficulties, 11, 201-208. Saylor, C.F., Benson, B., & Einhaus, L. (1985). Evaluation of an anger measurement program for aggressive boys in inpatient treatment. Journal of Child and Adolescent Psychotherapy, 2, 5-15. Schachar, R.J., & Wachsmuth, R. (1991). Family dysfunction and psychosocial adversity: Comparison of attention deficit disorder, conduct disorder, normal and clinical controls. Special Issue: Childhood disorders in the context of the family. Canadian Journal of Behavioural Science, 23, 332-348. Silver, L.B. (1989). Psychological and family problems associated with learning disabilities: Assessment and intervention. Journal of the American Academy of Child and Adolescent Psychiatry, 28, 319-325. Speltz, M.L., Varley, C.K., Peterson, K., & Beilke, R.L. (1988). Effects of dextroamphetamine and contingency management on a preschooler with ADHD and oppositional defiant disorder. Journal of the American Academy of Child and Adolescent Psychiatry, 27, 175-178. Steinkamp, M.W. (1980). Relationship between environmental distractions and task performance of hyperactive and normal children. Journal of Learning Disabilities, 13, 209-214. Swanson, J.M., Posner, M., Potkin, S., Bonforte, S., Youpa, D., Fiore, C., Cantwell, D., & Crinella, F. (1991). Activating tasks for the study of visual-spatial attention in ADHD Children: A cognitive anatomic approach. Journal of Child Neurology, 6, S119-S127. Tanaka, J.S., & Huba, G.J. (1984). Confirmatory hierarchical factor analyses of psychological distress measures. Journal of Personality and Social Psychology, 46. Wainright, A. (1996). Conners' Rating Scales: 30 years of research. Toronto, Canada: Multi-Health Systems. Wells, K.E. (1988). Social learning and systems family therapy for childhood oppositional disorder: Comparative treatment outcome. Comprehensive Psychiatry, 296, 138-146. Zentall, S.S., Gohs, D.E., & Culatta, B. (1983). 
Language and activity of hyperactive and comparison children during listening tasks. Exceptional Children, 50(3), 255-266. Ziegler, R., & Holden, L. (1988). Family therapy for learning disabled and attention-deficit disordered children. American Journal of Orthopsychiatry, 58, 196-210.


For Abby, Katie, and Shelby


Chapter 17
Youth Outcome Questionnaire (Y-OQ)

M. Gawain Wells
Gary M. Burlingame
Michael J. Lambert
Brigham Young University

At the present juncture of psychotherapy outcome measurement, three rather disparate influences meet, and perhaps even appear to collide: the demands of rigorous research, the concerns of clinicians in practice, and the call for accountability from customers and third-party payers such as health care corporations. This chapter describes the mutual effort of individuals from each sector (a university-based research team, administrators from a large managed health organization, parents and child patients, and clinicians in practice) in developing a multiscale treatment outcome instrument for children and adolescents that is both scientifically standardized and useful in the clinical world of today's mental health care market. Moreover, although many of the major assessment instruments for children and adolescents were designed for diagnosis of psychopathology, the use of tests specifically to evaluate psychotherapy outcome is of more recent interest.

The Youth Outcome Questionnaire (Y-OQ; Burlingame, Wells, & Lambert, 1996) is such an instrument: a parent report measure constructed specifically to track treatment progress. The questionnaire is a continuation and extension of an adult outcomes instrument, the Outcome Questionnaire (OQ-45; Lambert & Burlingame, 1996; see Lambert & Finch, chap. 27, this volume). The discussion begins with a brief review of the current "intersect" among the stakeholders and of the construction of the instrument before considering its characteristics and its utility.

Intersection of Research, Clinical Work, and Managed Health Care

Research

Recent reviews of psychotherapy research suggest significant gains have been made in demonstrating the effectiveness of psychotherapy and refining treatment strategies for specific populations in need of care (Bergin & Garfield, 1994). However, the parallel research literature in the treatment of emotional and behavior difficulties of children and adolescents is less well articulated, given both the range of dysfunctions to be studied and the treatment techniques employed (Kazdin, 1988, 1991, 1993; Kazdin, Bass, Ayers, & Rodgers, 1990; Shirk & Russell, 1992; Weisz, Weiss, & Donenberg, 1992; Weisz, Weiss, Alicke, & Klotz, 1987). Allen, Tarnowski, Simonian, Elliot, and Drabman (1991) examined more than 15,000 studies and found that only 6% related to children. Tramonta (1980) estimated that the adolescent outcome literature lagged 15 years behind the adult literature, although several more recent articles have suggested that research in the area of child and adolescent therapy outcome is progressing (Kazdin, 1991, 1993; Kendall & Morris, 1991).

General conclusions drawn from several meta-analytic reviews (Barrnett, Docherty, & Frommelt, 1991; Casey & Berman, 1985; Hoag & Burlingame, 1997; Kazdin, 1990; Kovacs & Paulaskas, 1986; Weisz et al., 1987) indicate the following: (a) psychotherapy for children and adolescents is effective (Kazdin, 1993), with the average treated child showing improvement that exceeds that of 76% of children in a comparable sample (effect size = .71); (b) individual therapy with children is approximately as effective as that with adults (Brown, 1987); and (c) comparisons of alternative treatment techniques appear to favor behavioral approaches thus far, although methodological confounds may also explain these results (Kazdin, 1993).

A 1995 issue of the Journal of Clinical Child Psychology devoted to methodological issues in child psychotherapy research chronicled the state of the art. In this issue, Kazdin (1995) reported that the last decade has seen significant research advances but suggested that therapy research as a whole is restricted and narrow as presently conceived.
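The percentile reading of the effect size cited above (an effect size of .71 taken to mean that the average treated child exceeds 76% of a comparison sample) can be checked against the standard normal distribution. The sketch below assumes normally distributed outcomes with equal variance in both groups; that assumption and the function name are illustrative and are not stated in the chapter.

```python
from statistics import NormalDist


def percentile_of_average_treated(effect_size: float) -> float:
    """Percent of the comparison group scoring below the average treated child.

    Assumes normal outcomes with equal variance in both groups, so the
    answer is the standard normal CDF evaluated at the effect size.
    """
    return 100.0 * NormalDist().cdf(effect_size)


print(round(percentile_of_average_treated(0.71)))  # 76
```

An effect size of zero maps to the 50th percentile under the same assumption, which is a quick sanity check on the conversion.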
Moreover, because methodological issues are directly related to the ability to draw valid inferences, the current methodological limitations constrict present conclusions about the effectiveness of psychotherapy with children. Kazdin called for much more varied and careful methodological research to explicate a variety of issues. For instance, he noted that there is no standard outcome assessment practice, which prevents the combination of findings that would build a common knowledge base. Standardized measures, he suggested, would serve to profile children and adolescents in a consistent way to enable researchers to integrate studies about specific types of problems.

Durlak, Wells, Cotten, and Johnson (1995) found reason for more optimism than Kazdin and other earlier reviewers from a study of 516 clinical interventions with children under the age of 13. Their data suggested that the methodologies employed have improved significantly, including a majority of investigations with adequate controls, random assignment to treatment groups, multiple outcome measurements, and assessment of generalization of treatment effects. On the other hand, Durlak and associates noted a number of important neglected variables in child psychotherapy research, among them the need for normed outcome measures that would permit assessment of clinical significance (as opposed to statistical significance) of treatment effects. They concluded that "there is, however, certainly room for improvement. In general, child psychotherapy research would be strengthened by greater use of normed outcome measures, assessing the general as well as the specific impact of treatment, using attention-placebo controls, and collecting follow-up data" (p. 143).

Clinicians

Theoretically, the fundamental consumers of psychotherapy research are the therapist practitioners who read the literature to keep abreast of new developments that they will incorporate into their treatment strategies. Yet several surveys find practitioners to be, at best, reluctant and suspicious participants in the process, for two reasons. First, for many, a careful reading of psychotherapy research suggests it is such a dissimilar enterprise as to be irrelevant. Kazdin (1991) noted that "research and clinical practice differ in who is treated, how they are treated, with what treatment approaches, how long they are treated, and range of persons involved in the child's treatment. There is genuine risk that the generalizability of research findings to clinical practice is quite strained. A critical research issue for the field is the deep chasm between research and practice on child treatments" (p. 795). Accordingly, Kazdin, Siegel, and Bass (1990) reported an increasing tendency for clinicians to dismiss psychotherapy research as irrelevant to their practice needs.

There is a second and perhaps more fundamental reason for the "chasm" between the researcher and the clinician. Although research findings may seem irrelevant to the clinician, they are not irrelevant to third-party payers, who may evaluate the clinician's skills on that basis. Many therapists feel that they face losing their jobs or positions on provider panels if they do not produce substantial improvement in brief periods of time with more severe clients and more chaotic families than the researchers have ever attempted to treat.

Durlak et al. (1995) directly addressed this latter point. They found more convergence between research topics and practice concerns in recent psychotherapy research of children than is generally thought. In their evaluation of 516 intervention studies, they examined several practitioner concerns, among others: that there have been few studies of nonbehavioral treatments, that child research has seldom been conducted on subjects from minority populations, that long-term treatments have not been studied, and that few treatments have been investigated.
They found that approximately 40% of the recent studies evaluated nonbehavioral interventions, 18% of the studies treated minority children, 20% of the reports evaluated treatments of 20 sessions or longer, and 19 categories were required to describe the types of treatment entailed. Thus, Durlak and associates opined that the quality of child psychotherapy research is clearly germane to the practicing clinician.

Health Care Corporations

The cost of providing mental health treatment to children and adolescents is difficult to estimate, but the evidence from several sources justifies the concern in the health care marketplace for cost containment and accountability in quality care (e.g., Burlingame, Lambert, Reisinger, Neff, & Mosier, 1995). In 1985, monetary costs for children's mental health care (without adolescent care) were estimated to be $1.5 billion (Institute of Medicine, 1989). The total cost of health care in the United States in 1991 was estimated to be $800 billion (Health Care Financial Administrative, 1992) and growing at a rate exceeding inflation by more than 4% (Survey of Current Business, 1992). Moreover, mental health costs in particular were growing at an alarming rate of 30% to 40% during the early 1980s (Cummings, 1987).

The health care industry's response to such spiraling costs has led to managed health care systems and the so-called era of accountability. Thus, in spite of the suspicions of practitioners, third-party payers make it essential for health care providers to be able to document therapeutic progress. Consequently, psychotherapy outcomes research has been pushed into the center arena as a means of tracking the quality of mental health treatment. Moreover, the adoption of the continuous quality improvement (CQI) model in 1992 by the Joint Commission on the Accreditation of Healthcare Organizations (JCAHO) requires ongoing monitoring of patient care in addition to traditional outcomes assessment. In this setting, standardized outcome assessment becomes vital. According to Burlingame et al. (1995), "Continuous monitoring of outcome ideally requires standardized data to profile reliable and valid patterns of improvement across time, providers, programs, and patient groups rather than data generated solely from professional judgment that tends to be more variable and unstable" (p. 227).

This chapter describes the development and uses of the Youth Outcome Questionnaire (Burlingame et al., 1996), a 64-item parent report measure constructed specifically to track treatment progress. Parents or others with reasonably extensive interaction with the client complete the questionnaire at intake to establish a severity baseline and then complete it at regular intervals to follow that person's perception of the child's psychological state. As is described later, psychometric calculations from the normative database permit determination of the client's behavioral similarity at each measurement interval to inpatient populations, outpatient populations, and a large untreated community sample. Utilizing cutoff scores and a reliable change index, clinicians and/or administrators can determine if and when the client's behavior has entered the "normal" range of behavior. Although the instrument was published very recently and much research remains to be done to evaluate its properties, its use has spread surprisingly rapidly. It currently is employed as an outcome instrument for health care corporations covering 18 million lives.

Overview of the Instrument

Summary of Development

Originally conceived as the child and adolescent equivalent of the Outcome Questionnaire (OQ-45.2) (Lambert & Burlingame, 1996; see chap. 27, this volume), the Youth Outcome Questionnaire (Y-OQ), like its predecessor, was constructed to be brief, sensitive to change over short periods of time, and available at a nominal cost, while maintaining high psychometric standards of reliability and validity. Inasmuch as psychotherapy outcome research for children and adolescents has been considered less "mature" than the research in adult outcome (Kazdin, 1995), a comprehensive instrument construction process was undertaken to identify the most salient content domains to include in the instrument. The process comprised a literature search, focus groups, and chart reviews. Thus, input was sought from all stakeholders in the process.

Initial Literature Review. A comprehensive review of the literature was performed, including both narrative and meta-analytic reviews of the general psychotherapy treatment literature for children and adolescents. The search utilized PsycLit, recent relevant publications, and reference sections of previous reviews of child and adolescent psychotherapy to identify studies related to outcome in psychotherapy. Topics searched included treatment acceptance, factors contributing to dropout, satisfaction with treatment, expectations for psychotherapy, the development of other outcome measures, evaluating outcome with children as well as adults, and assessing clinically significant change. The primary purpose of such an exhaustive literature review was to search for content domains that had empirical support for being sensitive to change in clinical work from a wide variety of orientations with children and adolescents with a wide variety of disorders. Similarly, from recent meta-analytic reviews, content domains were considered in which a .5 effect size was obtained in particular content areas (e.g., Baer & Nietzel, 1991; Casey & Berman, 1985; Grossman & Hughes, 1992; Prout & DeMartino, 1986; Roberts & Camasso, 1991; Russel, Greenwald, & Shirk, 1991; Shirk & Russell, 1992; Weisz et al., 1987). In other words, the goal was to identify domains of content in which the average treated child or adolescent experienced improvement greater than 69% of those not in treatment. An effect size of .5 has been classified by Cohen (1988) as being in the medium range. It was reasoned that areas of child functioning that had a demonstrated research record of being amenable to change as a consequence of treatment would be particularly suitable for inclusion in the outcome measure.

Next, focus groups made up of consumers (i.e., former clients and their parents) as well as inpatient and outpatient providers (i.e., psychologists, psychiatrists, and other support staff) were queried. The incorporation of this qualitative method assured a rich source of information from those most intimately involved with the mental health change process in children and adolescents. Both consumers and providers were drawn from a large Western health care corporation. Professional focus group leaders led 10 separate groups of 5 to 10 participants to identify characteristics of change thought to be the direct results of treatment. Of particular importance were the following questions: In what ways does treatment affect children and their families, and when change was taking place, what was changing? Audiotapes of the focus groups were transcribed and became part of the material reviewed for selection of item content and subscale development.

Hospital Records. Hospital records were examined to assess characteristic behavior change goals being addressed in treatment planning for both inpatient and outpatient clients.
A manifest content analytic process was used to delineate the most frequently occurring change themes noted by providers in these two settings. A list of these themes was then compared to domains generated by both the literature review and the focus groups. In some cases, item content for the Y-OQ directly reflects change described in patient records.

The results from literature reviews, focus groups, and hospital charts constituted the sources from which the final common content domains were developed. Six general domains were identified, each sufficiently distinct to warrant its own subscale. For example, inpatient therapist focus groups argued that the purposes and goals for inpatient treatment in the current health care market were stabilization and referral, goals very different from those of outpatient treatment. The Critical Items subscale (described later) was designated as the content domain to capture behaviors that would be particularly representative of and amenable to change during inpatient hospitalizations. Parenthetically, in some instances, striking similarities were found between focus group suggestions and the published literature.

Fifteen to 20 items from each of the domains were created and compared to existing items from other instruments. A final review eliminated items that duplicated content in other subscales or that described diagnostic symptoms unlikely to be amenable to change (e.g., historical facts), resulting in the current format of the instrument. Normative data have been gathered from children ranging in age from 4 to 17. Most parents require from 5 to 7 minutes to complete the measure, although particularly careful parents may take as long as 20 minutes. Each item is rated on a 5-point Likert scale (0-4). Eight of the items are written in a reverse direction, identifying healthy behaviors that may be seen to increase during treatment. Thus, sensitivity to change in the measure may be seen in the increase of socially approved behaviors as well as in the decrease of symptoms. Methodologically, the inclusion of positive items increases the range of measurement to avoid an artificially low floor. These items are scored differently to reflect the continuum between the absence of healthy behaviors and their emergence (e.g., "My child handles frustration or boredom appropriately.").

Description of Subscales

Six subscales were found to be optimal in capturing the domains of change identified by the aforementioned sources.

Intrapersonal Distress (ID). This subscale assesses the amount of emotional distress in the child/adolescent. Anxiety, depression, fearfulness, hopelessness, and self-harm are aspects measured by the ID subscale. Because depression and anxiety are frequently correlated in assessment instruments and in patients who come for treatment (Burlingame et al., 1995), no attempt was made to differentiate these symptoms. High scores indicate an elevated sense of distress in the patient.

Somatic (S). This subscale assesses change in somatic distress that the child/adolescent may be experiencing. Items address symptoms that are typical presentations, including headaches, dizziness, stomachaches, nausea, bowel difficulties, and pain or weakness in joints. High scores indicate that the patient's caregiver is aware of a large number of somatic symptoms, whereas low scores indicate either absence or unawareness of such symptoms.

Interpersonal Relations (IR). This subscale assesses issues and actions relevant to the child's/adolescent's relationship with parents, other adults, and peers. Parents or caretakers evaluate the child's attitude toward others, his or her communication and interaction with friends, cooperativeness, aggressiveness, arguing, and defiance. High scores indicate that the caregiver is reporting significant interpersonal difficulty, and low scores reflect a cooperative, pleasant interpersonal demeanor.
Social Problems (SP). This subscale assesses problematic social behaviors, particularly delinquent or aggressive behaviors that precipitate bringing a child or adolescent into treatment for conduct difficulties. Although aggressiveness is also assessed in the IR scale, the aggressive content tapped by this subscale is of a more severe nature, typically involving the breaking of social mores. Items include truancy, sexual problems, running away from home, destruction of property, and substance abuse.

Behavioral Dysfunction (BD). This subscale evaluates the child's ability to organize tasks, complete assignments, concentrate, and handle frustration, including times of inattention, hyperactivity, and impulsivity. Many of the items on this scale tap features of specific disorders such as attention deficit hyperactivity disorder. However, the intent of the scale is to identify difficulties and their severity and then to observe the emergence of behavior change, not to attempt to facilitate a diagnostic decision.

Critical Items (CI). This subscale describes features of children and adolescents often found in inpatient services where short-term stabilization is the primary change sought. It assesses the presence of paranoid ideation, obsessive-compulsive behaviors, hallucinations, delusions, suicidal feelings, mania, and eating disorder issues. High scores are indicative of those who may need immediate intervention beyond standard outpatient treatment (inpatient, day treatment, or residential care). Moreover, a high score on any individual item warrants serious attention and follow-up assessment by the provider, and significant change should be observed before moving the child to less protected environments.

Scoring

Scoring the Y-OQ is a straightforward procedure involving simple addition of item values. For example, if Item 1 is endorsed at a 3, the weight given to this item for the subscale and total Y-OQ score is 3. Subscale scores are calculated by adding the items that are assigned to each subscale. The eight healthy behavior items are scored from -2 to +2 to adjust for the presence or absence of positive behaviors in the child or adolescent. The total score is simply a summation of item weights from all six scales. It reflects total distress in a child/adolescent's life and ranges from -16 to +240. A child rated as having no pathological symptoms and the highest level of all positive behaviors would receive a total Y-OQ score of -16, whereas a child rated as having the highest level of all symptoms would receive a total score of 240. Like the OQ-45 total (Lambert, Thompson, Nebeker, & Andrews, 1996), this value tends to be the best index to track global change and has the highest reliability and validity. A computerized scoring program and graphic presentation of patient scores across time are available.

Types of Available Norms

Normative data were drawn from several samples across the U.S. Rocky Mountain region, primarily in Utah and Idaho. Previous studies of geographic or regional norm differences on the OQ-45 found no differences; similarly, to this point, data collection from sites extending across 12 eastern, southern, and western states suggests no reason for geographic norms. A large community normal sample (approximately 680) was collected by randomly telephoning residents in a western community of 250,000.
After identifying themselves and the research project, the telephone solicitors asked if the household had any children from age 4 to 17. If there were children in the age range who were not currently receiving mental health treatment (either medication or therapy), the parent was invited to complete a Y-OQ on only one child or adolescent in the household. The questionnaires, along with a self-addressed stamped envelope, a brief summary of the project, and a consent form, were mailed to parents who agreed to participate in the project. Approximately 50% of those who agreed to participate returned completed, usable questionnaires.

In addition, two separate inpatient and outpatient samples totaling approximately 500 children and adolescents were collected using a cluster sampling procedure through the intake offices of a large multistate western health care corporation. All parents of children and adolescents who presented for treatment at outpatient and inpatient facilities located in Utah and Idaho owned by the corporation completed the Y-OQ as part of their initial screening. Collecting data at intake from all new patients captured the initial level of disturbance of each child.

Collectively across the three samples (community normal, outpatient, and inpatient), males slightly outnumbered females (57% to 43%, respectively). Of the total normative sample, 57% were children of the community normal group, 29% were outpatients, and 15% were inpatients. Means and standard errors for the normative sample groups are


TABLE 17.1
Normative Group Data for the Y-OQ Total Score

Sample        N      Mean    S.E.
Inpatient     174    100     3.05
Outpatient    342    78.6    1.97
Community     683    23.2    1.02

depicted in Table 17.1. Analyses found large differences on the total Y-OQ scores between the three samples, F(2, 1095) = 528.5, p < .0001. As can be seen, community sample total means were substantially below those taken from the two clinical settings. Reliable differences also occurred between the two patient populations, with inpatients demonstrating more overall symptoms than outpatients. Analysis of the domain or subscale scores in Table 17.2 supports similarly strong significant differences between the community normal and the two clinical samples on every subscale except the Behavioral Dysfunction scale. Although the clinical samples reliably differ from the community normal group on this subscale, the inpatient and outpatient groups demonstrate similar elevations. Reliable differences were found between the three samples on all remaining subscales. Table 17.3 presents data from three outpatient treatment settings with diverse geographic locations and demographic characteristics, illustrating the comparability of the Y-OQ across settings. No significant differences exist between sites on total or subscale means.

Gender. Analysis of Y-OQ total scores found no differences between males and females in any of the three normative samples, nor were any significant interactions observed (see Table 17.4 for means and standard errors for each sample). As indicated in Table 17.5, gender differences were observed on two Y-OQ subscales: Behavioral Dysfunction (males score higher across all samples), F(1, 1128) = 16.23, p < .01, and Somatic (females score higher than their male counterparts across all samples), F(1, 1128) = 16.46, p < .01. Inasmuch as the Behavioral Dysfunction scale taps behavior frequently associated with attention deficit disorder, this finding might be expected, given that three to six times more males than females are diagnosed with ADHD (Whalen, 1989). Similarly, Werry (1986) found that females predominate in the incidence of somatization disorders.
TABLE 17.2
Normative Group Raw Score Data for the Y-OQ Domains

                           Inpatient       Outpatient      Community
                           (N = 174)       (N = 342)       (N = 683)
Sample                     Mean    S.E.    Mean    S.E.    Mean    S.E.
Intrapersonal Distress     34.4    1.11    26.4    0.71    8.9     0.38
Somatic                    9.2     0.43    7.8     0.31    3.3     0.13
Interpersonal Relations    13.0    0.62    10.4    0.42    0.6     0.21
Social Problems            10.2    0.51    6.3     0.32    0.7     0.12
Behavioral Dysfunction     21.7    0.75    20.1    0.49    6.8     0.28
Critical Items             11.5    0.48    7.7     0.27    3.0     0.11


TABLE 17.3
Total and Subscale Y-OQ Raw Scores for Outpatients from Three Health Maintenance Organizations

                  Total                    Subscale Scores
Source            Y-OQ      BD         CI       ID        SP        IR       S
Company A    M    71.1637   18.0529    7.5316   24.1397   9.2299    6.1905   5.4813
             N    171       189        190      179       187       189      187
             SD   36.3415   9.2367     5.1183   12.3378   7.5296    4.9213   5.3273
Company B    M    84.9071   20.18817   8.5339   26.2222   12.8265   6.5559   9.0455
             N    226       328        339      297       340       340      330
             SD   37.8705   8.8797     5.3732   12.3965   7.4177    5.2219   5.8922
Company C    M    66.6078   19.3490    6.4157   22.0510   8.3686    5.5451   4.8784
             N    255       255        255      255       255       255      255
             SD   32.7433   9.0532     4.2869   11.8302   6.9874    4.3892   4.2190
Total        M    74.1457   19.5130    7.6020   24.2572   10.5128   6.1390   6.8057
             N    652       772        784      731       782       784      772
             SD   36.3823   9.0652     5.0580   12.3045   7.5822    4.9052   5.5970

Note. Company A is a large multistate HMO based on the eastern seaboard. It includes a hospital, but is primarily an outpatient facility. Company B is a large community mental health center in the western United States serving a catchment area of 750,000 people. It serves primarily Medicare/Medicaid patients. Company C is a large multistate HMO serving patients throughout the United States. The current sample primarily comes from east and west coast offices and the southern states.

TABLE 17.4
Comparison of Total Scores on the Y-OQ by Gender and Setting

Sample              N      Mean     S.E.
Inpatient
  Male              108    97.9     3.87
  Female            65     104.0    5.07
Outpatient
  Male              208    79.4     2.43
  Female            132    77.2     3.37
Community
  Male              361    24.5     1.41
  Female            320    22.0     1.48

Age. Data from the community normal sample were examined by four age groups that correspond somewhat to those used in other diagnostic instruments (e.g., Conners Parent Rating Scale, Achenbach Child Behavior Checklist). Analysis by age found no reliable differences between age groups on the Y-OQ total, F(3, 555) = 2.14, p > .05, although slightly higher mean elevations (4-7 point spread) were noted for the adolescent age groups (12-14, 15-17) than for preschool and latency age children (see Table 17.6).
Subscale means from the community sample highlight a few significant differences that may be developmental in nature (see Table 17.7). The greatest age difference was on the Interpersonal Relations subscale, where the two older groups showed reliably higher distress scores than the 9- to 11-year-old group, F(3, 555) = 5.09, p


TABLE 17.5
Comparison of Gender Scores on the Y-OQ Domain Scores

                               Inpatient         Outpatient        Community
                               (N = 108/65)      (N = 208/132)     (N = 361/320)
Sample                         Mean    S.E.      Mean    S.E.      Mean    S.E.
Intrapersonal Distress
  Male                         33.1    1.42      25.9    0.86      8.6     0.51
  Female                       36.6    1.76      27.1    1.21      9.3     0.56
Somatic
  Male                         8.6     0.55      7.3     0.39      2.9     0.16
  Female                       10.4    0.68      8.5     0.50      3.6     0.20
Interpersonal Relationships
  Male                         12.8    0.78      10.5    0.54      .95     0.28
  Female                       13.4    1.04      10.2    0.69      .16     0.27
Social Problems
  Male                         9.8     0.61      6.6     0.41      1.1     0.18
  Female                       10.9    0.92      5.6     0.52      0.2     0.15
Behavioral Dysfunction
  Male                         22.1    0.94      21.3    0.59      7.9     0.41
  Female                       21.0    1.29      18.1    0.85      5.7     0.38
Critical Items
  Male                         11.4    0.63      7.7     0.34      3.0     0.15
  Female                       11.7    0.76      7.5     0.46      3.0     0.17

Note. Sample sizes are given as male/female.

TABLE 17.6
Y-OQ Total Score by Age in Community Normal Sample

                      Total Score
Age Range    N        Mean    S.E.
6-8          170      20.4    1.73
9-11         155      22.8    2.15
12-14        123      27.2    2.80
15-17        111      27.0    2.89

TABLE 17.7
Y-OQ Subscale Scores by Age in a Sample of Community Normals

                           Age 6-8         Age 9-11        Age 12-14       Age 15-17
                           (N = 170)       (N = 155)       (N = 123)       (N = 111)
Subscale                   Mean    S.E.    Mean    S.E.    Mean    S.E.    Mean    S.E.
Intrapersonal Distress     7.6     0.61    9.3     0.79    10.8    1.07    10.7    1.09
Somatic                    2.8     0.22    3.7     0.30    3.5     0.31    3.8     0.37
Interpersonal Relations    .05     0.33    -0.2    0.41    1.6     0.54    1.7     0.54
Social Problems            0.5     0.20    0.3     0.23    0.7     0.30    1.4     0.40
Behavioral Dysfunction     6.6     0.53    6.8     0.60    7.2     0.70    6.3     0.69
Critical Items             2.8     0.19    2.9     0.23    3.5     0.32    3.1     0.28

on the Intrapersonal Distress scale are also higher for the 15- to 17-year-olds than for the youngest group, the 6- to 8-year-olds. Finally, the Social Problems scale is also higher among the 15- to 17-year-olds than the 9- to 11-year-olds, F(3, 305) = 2.66, p < .05. The Social Problems subscale contains items that assess behaviors that are irrelevant to younger children (e.g., "Uses alcohol or drugs"). Thus, the therapist interpreting individual protocols should be aware that high scores on the Social Problems subscale for young children signal potentially more serious difficulties than for adolescents. Analyses by age in clinical groups found virtually identical profiles, with the exception that the 15- to 17-year-old group reported higher total distress than did the two youngest groups. The 6- to 8-year-old group had the lowest Social Problems subscale score. Taken as a whole, the previous group differences in particular subscales appear to represent issues relevant to the particular developmental stage of the child.

Cutoff Score. Utilizing the formula from Jacobson and Truax (1991) and the rationale for multiple cutoffs proposed by Tingey, Lambert, Burlingame, and Hansen (1996), a cutoff score has been calculated between the community sample and the two clinical samples (inpatient and outpatient combined), because this seems the most logical place to compare individuals for treatment outcome. The cutoff score for the Y-OQ total score is 46; in other words, total severity scores falling below that figure suggest that the patient is demonstrating behaviors no more extreme than those of the untreated normal population. Cutoff scores for the subscales are as follows: Intrapersonal Distress, 16; Somatic, 5; Interpersonal Relations, 4; Social Problems, 3; Behavioral Dysfunction, 12; Critical Items, 5. Cutoffs can, of course, be derived between any two normative samples for comparative purposes in evaluating treatment outcome.
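For readers who want to derive a cutoff between two normative samples of their own, the following is a minimal sketch of the Jacobson and Truax (1991) criterion that weights each group mean by the other group's standard deviation. The function names are hypothetical, and the example inputs are illustrative only: the standard deviations are recovered from a standard error and sample size (SD = S.E. times the square root of N), and the combined clinical mean and SD are not reported in the text, so this sketch is not expected to reproduce the published cutoff of 46.

```python
import math


def sd_from_se(se, n):
    """Recover a standard deviation from a standard error of the mean."""
    return se * math.sqrt(n)


def jt_cutoff(mean_clinical, sd_clinical, mean_normal, sd_normal):
    """Jacobson-Truax cutoff between a clinical and a normal distribution.

    Each mean is weighted by the other group's SD, so the cutoff sits
    closer to the mean of the group with the smaller spread.
    """
    return (sd_normal * mean_clinical + sd_clinical * mean_normal) / (
        sd_normal + sd_clinical
    )


# Community values taken from Table 17.1 (mean 23.2, S.E. 1.02, N = 683).
community_mean = 23.2
community_sd = sd_from_se(1.02, 683)

# Illustrative clinical values (assumptions, not figures from the text).
clinical_mean = 85.0
clinical_sd = 38.0

cutoff = jt_cutoff(clinical_mean, clinical_sd, community_mean, community_sd)
```

When the two groups have equal standard deviations the formula reduces to the midpoint of the two means, which is a convenient sanity check for any implementation.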
If special populations are being assessed, it may be appropriate to construct new normative samples and compute new cutoff scores for that particular group. The accuracy of the present cutoff scores is demonstrated specifically in the description of "Sensitivity and Specificity" calculations to follow.

Reliable Change Index. A reliable change index (RCI) has been derived between the community and clinical samples. The RCI is used to determine if the change exhibited by an individual in treatment is reliable. It is one of the two hurdles necessary for estimating clinically significant change (Jacobson & Truax, 1991). In order for an individual's score to be considered clinically significantly changed, it must cross a cutoff score and the change must be at least as large as the RCI. The RCI value that has been computed using reliability estimates from the community samples is 13, meaning that an individual's score must change by at least 13 points on the Y-OQ to be considered clinically significantly changed. The RCIs for each of the subscales are as follows: Intrapersonal Distress, 8; Somatic, 5; Interpersonal Relations, 4; Social Problems, 5; Behavioral Dysfunction, 8; and Critical Items, 5. Inasmuch as the RCI value represents a large and diverse normative sample, it will be serviceable for most general purposes. If specialized or more specific RCI values are desired, appropriate norms can be gathered for computation of new RCI values according to the Jacobson and Truax formula.

Other Normative Groups. Thus far, the Y-OQ has been translated into Spanish, German, French, and Laotian. Data gathering is underway for African American families and Hawaiian Island-Polynesian families. Results of analyses of Lao-speaking families (Mills, Manivanh, Burlingame, Wells, Peterson, & Nuttal, 1997) found the mean Y-OQ total scores to be significantly higher (46.58) than the means of the


community normal sample (22.40), but less elevated than the outpatient treatment sample (77.41). It is possible, of course, that the findings are spurious. Southeast Asians are known to be suspicious of psychology and psychologists (Dinh, B. Sarason, & L. Sarason, 1994). Moreover, verbal reports from those assisting in the project found that some parents were confused about the structure of a Likert scale format even after having it explained to them in Laotian. If the findings accurately reflect differences, however, it would appear that Laotian parents are experiencing somewhat more agitation or perturbation with their children than parents of the English-speaking normative community sample. Dinh et al. (1994), for example, found that Vietnamese students consistently reported more problems in their relationship with their parents than did American-born Asian students.

Clinicians using the Y-OQ have commented that adolescents could complete a self-report version, and frequently, adolescents come to outpatient therapy without their parents, making it more difficult to capture parent reports. Therefore, a parallel adolescent self-report Y-OQ has been developed and focus group tested. Normative data are currently being analyzed in a manner analogous to the procedure of the original Y-OQ, with untreated community normals from junior high and high school, outpatient adolescents in treatment, and inpatient adolescents in treatment.

Basic Validity and Reliability Information

Reliability. Internal consistency reliability estimates of the Y-OQ used Cronbach's alpha (Cronbach, 1970) and were based on a nonclinical sample drawn from a large elementary school (N = 427), the original community normative sample of 651 participants, and the clinical normative sample of 490. The Y-OQ total score had remarkably high and similar internal consistency estimates of .97 across the three samples.
The Critical Items and Somatic subscales had the lowest internal consistency estimates of the six, suggesting greater item heterogeneity (see Table 17.8). These findings make intuitive sense in that these scales cover very broad content areas (e.g., diverse somatic complaints and equally diverse behaviors suggesting a need for inpatient treatment). Overall, the high reliability estimate of the total Y-OQ suggests a strong single factor underlying the six subscales of the instrument. The presence of a strong single factor was also found on the OQ-45.2 (Lambert et al., 1996), which is particularly useful because the most frequently used score is the total score.

TABLE 17.8
Internal Consistency Values for the Y-OQ Total and Domain Scores

                                     Internal Consistency
                           Student     Community    Patient      Total
Subscale                   (N = 41)    (N = 642)    (N = 516)    (N = 1199)
Intrapersonal Distress     .84         .90          .89          .93
Somatic                    .72         .68          .70          .76
Interpersonal Relations    .69         .79          .82          .89
Social Problems            .51         .71          .78          .84
Behavioral Dysfunction     .85         .86          .84          .91
Critical Items             .61         .63          .69          .76
Total Score                .93         .95          .94          .97
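Reliability estimates such as these are one of the two inputs to the reliable change index described earlier. The sketch below follows the Jacobson and Truax (1991) approach: the standard error of measurement is SD times the square root of (1 - reliability), the standard error of a difference score is the square root of 2 times that value, and a change is treated as reliable when it exceeds 1.96 standard errors of the difference. The function names are hypothetical. The inputs are assumptions made for illustration (a community SD recovered from the S.E. and N in Table 17.1, and a total-score reliability of .97); the text does not state which exact values produced the published RCI of 13.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)


def reliable_change_threshold(sd, reliability, z=1.96):
    """Smallest raw-score change treated as reliable at the given z level."""
    s_diff = math.sqrt(2.0) * standard_error_of_measurement(sd, reliability)
    return z * s_diff


# Assumed inputs: community SD recovered from Table 17.1 (S.E. 1.02, N = 683)
# and the total-score internal consistency of .97.
community_sd = 1.02 * math.sqrt(683)
threshold = reliable_change_threshold(community_sd, 0.97)
```

With these assumed inputs the threshold comes out at roughly 12.8 points, which is in the neighborhood of the published value. A lower reliability estimate or a larger SD would raise the threshold, which is why recomputing the RCI is suggested when specialized norms are gathered.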


Atkin, Whoolery, Peterson, Burlingame, Wells, and Nebeker (1997) calculated test-retest reliability correlation coefficients from two separate subsamples in which retests were completed at 2 weeks and 4 weeks. As Table 17.9 illustrates, a strong relation is seen between the first administration of the Y-OQ and scores taken at 2-week (r = .84, N = 56) and 4-week (r = .81, N = 93) intervals, producing an average test-retest reliability coefficient of .83. Similarly, all subscale test-retest correlations were significant at p < .001, ranging from .56 to .82. Thus, the total instrument, as well as its separate subscales, appears to have good to excellent test-retest reliability.

Validity. A concurrent validity study (Atkin et al., 1997) compared the relation between the Y-OQ total and subscale scores to parallel subscales from the Child Behavior Checklist (CBCL; Achenbach, 1991) and the Conners Parent Rating Scale-93 (CPRS; Conners, 1989). The parents of a sample of 423 children completed the three instruments. Data analyses found a highly significant correlation (p < .001) of .78 between Y-OQ total and CBCL total scores, suggesting high convergent validity between the two tests (see Table 17.10). Moreover, all Y-OQ subscales correlated most highly with the analogous subscale of the CBCL. Inasmuch as a total score is not computed for the CPRS, no total comparisons could be made; however, again, nearly all Y-OQ subscales correlated most highly with the corresponding subscale from the Conners. The one (nonsignificant) exception was a slightly higher correlation between the Y-OQ Behavioral Dysfunction subscale and the Conners Conduct Disorder subscale (r = .61) than between Behavioral Dysfunction and the Conners Hyperactive-Immature subscale (r = .59). Although not as distinctive as hoped for, given the substantial comorbidity between attention deficit disorder and conduct disorder in the literature (cf. Hinshaw, Lahey, & Hart, 1993), the findings are not surprising.
Moreover, small and nonsignificant coefficients can be noted between dissimilar subscales, suggesting adequate divergent validity. Overall, the findings from these two samples suggest that the relations between the Y-OQ and established criteria are very promising.

Support for the construct validity of the Y-OQ was also sought by comparing inpatient and outpatient scores on the Y-OQ with those of the community samples. Assuming that scores would be ordered from most pathological to least pathological, it was expected that the inpatient sample would be most disturbed, followed by the outpatient, and then the community sample. A one-way analysis of variance (ANOVA) demonstrated that the three samples were significantly different at the .001 level in the expected order. Table 17.11 demonstrates the clear differences between clinical and nonclinical groups; children in the community sample were, on average, the most healthy, and inpatient children, on average, were most severely disturbed. Other studies are underway that specifically evaluate various populations of court-adjudicated youth,

TABLE 17.9
Test-Retest Reliability Estimates for the Y-OQ Using Normal Sample

Time Elapsed Between              Subscale Scores
Administrations         ID     S      IR     SP     BD     CI     Total
2 Weeks (N = 56)        .82    .70    .75    .78    .82    .56    .84
4 Weeks (N = 93)        .79    .67    .57    .71    .78    .65    .81
Total (N = 149)         .78    .69    .66    .75    .79    .61    .83

Note. Scores are significant at p < .001.


TABLE 17.10
Validity Estimates between Y-OQ Subscale Scores and CBCL/CPRS-93

Youth Outcome Questionnaire subscale columns: ID, S, IR, IP, SP, BD, Total

Criterion Scale              r
CBCL
  Anxious/Depressed          .70
  Withdrawn                  .64
  Somatic Complaints         .61
  Aggressive Behavior        .63
  Delinquent Behavior        .58
  Attention Problems         .64
  Total Problems             .78
CPRS-93
  Anxious-Shy                .67
  Psychosomatic              .62
  Conduct Disorder           .60 (.61)
  Antisocial                 .49
  Hyperactive-Immature       .67
  Restless-Disorganized      .59

Note. The subscale scores use a normal sample (N = 423), and are significant at p < .001.

TABLE 17.11
Comparison of Level of Psychopathology as Measured by the Total Y-OQ Score across Patient and Nonpatient Samples

F(2, 1196) = 591.4 (significant, p


community normal group, that are correctly identified. The specificity of the Y-OQ is .81, that is, 81% of the members of the normal group were correctly placed with a cutoff score of 46 (see Table 17.12). Other operating characteristics include Positive Predictive Power (PPP) = .7654 and Negative Predictive Power (NPP) = .8099. The sensitivity and specificity values are similar to those obtained by the OQ-45.2 (Lambert et al., 1996) and provide an index of the accuracy of the Y-OQ as a screening tool. It is important to note that whereas the sensitivity, specificity, and other operating characteristics are values that are clearly useful at the levels obtained, some consideration of the measurement process itself may indicate why they are not even higher. By definition, the children of the community sample are considered as normal and the children from the patient samples as abnormal. Yet, parents have individually defined thresholds of sufficient concern to seek treatment for their child; therefore, some children, who in the opinion of most professionals should be receiving treatment, are not receiving help (the "false negative" child) because their parents have not chosen to refer them. 
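The operating characteristics discussed above can be recomputed from the four cells of Table 17.12. The sketch below uses the standard definitions and assumes that the rows of the table are the criterion (actual) groups and the columns are the predicted groups, a reading inferred from the fact that the row sums match the printed totals of 642 and 511. The function name is hypothetical, and the predictive values this reading yields need not match the figures printed in the text exactly.

```python
def screening_stats(true_neg, false_pos, false_neg, true_pos):
    """Standard operating characteristics of a two-group screening cutoff."""
    return {
        # Proportion of actual patients classified as patients.
        "sensitivity": true_pos / (true_pos + false_neg),
        # Proportion of actual normals classified as normal.
        "specificity": true_neg / (true_neg + false_pos),
        # Proportion of predicted patients who are actual patients.
        "positive_predictive_power": true_pos / (true_pos + false_pos),
        # Proportion of predicted normals who are actual normals.
        "negative_predictive_power": true_neg / (true_neg + false_neg),
    }


# Cell counts from Table 17.12 under the assumed orientation:
# normal sample: 520 predicted normal, 122 predicted patient;
# patient sample: 120 predicted normal, 391 predicted patient.
stats = screening_stats(true_neg=520, false_pos=122, false_neg=120, true_pos=391)
```

Under this reading the specificity rounds to .81, matching the value quoted in the text. Moving the cutoff up or down trades sensitivity against specificity, so the same function can be rerun on a reclassified table to compare candidate cutoffs.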
Other parents may refer their child for much less serious difficulties, even prophylactic reasons such as divorce adjustment (the "false positive" child).

TABLE 17.12
Y-OQ Classification with a Cutoff Score of 46

                            Predicted Group
Criterion Group       Normal Sample    Patient Sample    Total
Normal Sample         520              122               642
Patient Sample        120              391               511

Basic Interpretive Strategy

The Y-OQ is designed to serve two general purposes: to evaluate the outcomes of mental health treatment through repeated administrations, and to serve as a screening instrument designed to establish need for treatment or alert mental health professionals to the need for more careful evaluation. In the first edition of this book, Derogatis and DellaPietra (1994) commented on the use of screening measures in psychiatry:

    The screening process represents a relatively unrefined sieve that is designed to segregate the cohort under assessment into "positives" who presumptively have the condition, and "negatives" who are ostensibly free of the disorder. Screening is not a diagnostic procedure per se. Rather, it represents a preliminary filtering operation that identifies those individuals with the highest probability of having the disorder in question for subsequent specific evaluation. Individuals found negative by the screening process typically are not evaluated further. (p. 23)

Thus, when parents complete the Y-OQ at intake, the scores may be utilized at three different levels to determine the need for treatment. The first and most conservative procedure is to examine the Y-OQ total severity elevation. The Cronbach's alpha coefficients clearly suggest that the test is best understood as having one main underlying factor common to all the subscales. Comparing the individual case to the current


normative groups, the clinician can immediately determine if children's scores suggest that they are in need of treatment, and what level of treatment is most likely to be warranted: outpatient or a more restrictive environment. Again, it must be emphasized that the questionnaire should be used as a screening measure, not as a complete diagnostic tool. In the words of one psychiatrist who uses the Y-OQ with his clientele, "It's like being able to take a child's mental health temperature. It is not a diagnostic blood test. It just tells you something is wrong and a little about how wrong and where to look" (Ferre, personal communication, September, 1997).

The second and third levels of interpretation speak to the issues of "how wrong and where to look." The clinician studying the Y-OQ printout can next examine the subscale elevations, particularly the Critical Items (CI) subscale. CI questions are written specifically to speak to concerns for increased protective actions, such as inpatient hospitalization (e.g., "Sees, hears, or believes things that are not real," "Has times of unusual happiness or excessive energy," etc.). Although their empirical utility has yet to be established, cutoff scores and RCI calculations are available for each subscale. At the third level, further evaluation of both subscale elevations and individual items assists the clinician in generating hypotheses about the most effective treatment protocol. Again, for example, frequent endorsement of Critical Items, such as "Believes that others can hear her/his thoughts, or that s/he can hear the thoughts of others," signals the need for psychiatric evaluation for medication as opposed to planning for parent training.

The Y-OQ is designed to be brief enough to allow for repeated administrations to parents. The scores can be plotted on a "Y-OQ tracking sheet" to monitor the parent's report of perceived changes over the course of treatment.
When the total score drops below the community normal cutoff score of 46, the clinician knows that the parent sees the child as behaving "within normal limits" and can adjust treatment accordingly (e.g., work to consolidate gains, prepare for termination, etc.). Erratic scores may suggest to the clinician the need for careful evaluation of the treatment protocol, the introduction of second-level treatment, the influence of risk factors that militate against rapid treatment response, and so on. These issues are considered in greater detail later.

Use of the Instrument for Treatment Planning

General Treatment Planning Issues

To date, research with the Y-OQ has not addressed many issues related to treatment planning. That which is offered, therefore, should be seen as possibilities as opposed to empirically demonstrated and validated practices. It is equally clear, however, that the same statement could be applied to the entire topic of treatment planning in child and adolescent psychotherapy. Although significant advances have been made, Kazdin (1995) asserted that, in child and adolescent treatment, "treatment research is at an early stage in relation to the range of clinical dysfunctions that has been systematically studied and the types of research questions that are addressed" (p. 125). A survey conducted by Kazdin et al. (1990) found that commonly offered forms of therapy (e.g., family therapy, play therapy) for children have very small research bases. Behavioral and cognitive-behavioral therapies, on the other hand, comprised approximately one half of the published outcome research studies. Of the reported treatment


outcome studies, only 9% examined child characteristics in relation to outcome. Fonagy (1996) similarly noted that, at this point, research should direct efforts toward studies of "ecological" relevance, with somewhat less emphasis on those designed for maximal internal validity. Drotar, LaGreca, Lemanek, and Kazak (1995) suggested that the relative immaturity of the field requires research at all levels of sophistication, including carefully done case studies. And Peterson and Bell-Dolan (1995) attempted to describe a balance between investigative rigor and clinical relevance that will encourage the continuing search for a "best practice" approach. Given this caveat, the Y-OQ may be employed as a means of tracking a parent's perceptions of a child's change. Consider the following example of an actual treatment failure. The tracking sheet for F.T., a 16-year-old male being treated for oppositional defiant disorder and attention deficit hyperactivity disorder shown in Fig. 17.1, demonstrates reliable negative change over the first five sessions, that is, the repeated Y-OQ administrations document deterioration, at least to this point in treatment. The graph also illustrates the clinical expectation that children with such comorbid conditions are very difficult to treat successfully in their mid to late adolescence. Although there is not sufficient information to ascertain if what is being observed is the frequent upward swing before the drop in an extinction curve, the data do clearly demonstrate the need for careful treatment monitoring and possibly the need for a different level of treatment. Regrettably, F.T.'s case was transferred to a different treatment facility, precluding definitive conclusions. Similarly, the watchful clinician or treatment team may, by repeated administrations of the Y-OQ, observe the effects of additional interventions or environmental impacts, such as psychotropic medications or the filing for divorce of parents. 
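The kind of tracking-sheet review described above can be sketched as a simple rule that compares each repeated administration with the intake score, using the total-score cutoff of 46 and the reliable change index of 13 quoted earlier in the chapter. The function names and category labels are hypothetical, and the example scores are invented for illustration; they are not the values plotted for F.T. or any other case.

```python
CUTOFF = 46  # total-score cutoff quoted in the text
RCI = 13     # total-score reliable change index quoted in the text


def classify_change(baseline, current, cutoff=CUTOFF, rci=RCI):
    """Label the change from baseline to the current administration."""
    change = current - baseline
    if change >= rci:
        return "reliably deteriorated"
    if change <= -rci:
        # Clinically significant change needs both hurdles: a reliable
        # drop and a move from at or above the cutoff to below it.
        if baseline >= cutoff and current < cutoff:
            return "clinically significant improvement"
        return "reliably improved"
    return "no reliable change"


def review_tracking_sheet(scores, cutoff=CUTOFF, rci=RCI):
    """Classify every administration after the first against the intake score."""
    baseline = scores[0]
    return [classify_change(baseline, s, cutoff, rci) for s in scores[1:]]


# Invented series showing a steady rise in parent-reported distress.
example = review_tracking_sheet([92, 98, 104, 110, 115])
```

Comparing every administration with intake is one design choice; a clinician could equally compare each score with the immediately preceding one to detect abrupt shifts, such as those that follow a change in medication or family circumstances.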
Fig. 17.1. Case example of F.T., age 16, diagnosed with attention deficit hyperactivity disorder and oppositional defiant disorder.

Figure 17.2 illustrates the continuing problem of medication compliance. D.M., a 14-year-old female, was diagnosed with bipolar disorder. At intake, her Y-OQ total score was well within the range of the inpatient treatment population. Time 2 administration demonstrates the stabilization effect of inpatient treatment and initiation of lithium treatment. The Time 3 Y-OQ was given 2 months later on readmission to the inpatient unit. D.M. had stopped attending her therapy group and had stopped taking her medication. Y-OQs administered at Times 4 and 5 were completed by the parent at 2-week intervals following D.M.'s resumption of medication and psychotherapy. According to her mother's perception, D.M. has maintained her therapeutic gains.

Fig. 17.2. Case example of D.M., a 14-year-old female diagnosed with bipolar disorder.

Clinically, M. Latkowski (personal communication, October 1997) suggested that repeated Y-OQ administrations provide something akin to a structured anecdotal report from a parent that decreases the time the clinician must take to query the parent about the child's progress. The first administration provides a baseline level from which treatment effects can be monitored. When scores fluctuate in unpredictable directions, the therapist can often learn about other external stressors that would not otherwise be discovered but may need to be addressed therapeutically. Latkowski also found a subtle but definite shift among clinicians who carefully review the Y-OQ tracking sheet toward therapeutic impact questions as opposed to diagnostic questions. In other words, he noticed that therapists spend less time in evaluation because the parents are providing the clinician with the treatment targets, and the clinicians move more directly to the question, "What needs to be done to address this difficulty?"

In his call for systematic progress in clinical child outcome research, Kazdin (1995) noted the current dearth of knowledge regarding intensity of treatment delivery.
He suggested, for instance, that some disorders (e.g., major depression) are sufficiently episodic that treatment should similarly be geared or timed to the onset and offset of symptoms. Perhaps some disorders such as attention deficit hyperactivity disorder will require continuous treatment (albeit spaced to perhaps monthly sessions) throughout


the course of development. Presently, there are few, if any, studies that speak to this issue. However, in that regard, child psychiatrists in the managed health care corporation who assisted in the development of the Y-OQ have employed it as a "need-for-treatment" monitoring device (R. Ferre, personal communication, March 1997). Once a child has reached a stable, therapeutic dose of medication, a portion of the practice of child psychiatry is devoted to quarterly check-ups for medication evaluation. These physicians have clinic secretaries regularly mail out Y-OQ protocols. When they are received, they are scored and the results entered on the Y-OQ tracking sheet. In the capitated health care system, the physicians' timely review of tracking sheets alerts them to the continuing efficacy of treatment or the need for a new evaluation, resumption of more intensive treatment, or further treatment modifications.

Applications to Treatment Planning Issues

Identification of Primary and Secondary Problems. As already described, the Y-OQ was not devised as a diagnostic instrument as such. Existing standardized diagnostic instruments, such as the Child Behavior Checklist (Achenbach, 1991) and Personality Inventory for Children (Lachar, 1982), are more appropriate for that purpose. Conceptually, however, the clinician may examine subscale patterns of the instrument to glean overall presenting problems as perceived by parents. So-called broad-band difficulties with externalizing or internalizing behavior, or potential psychotic spectrum difficulties, will become apparent. Individual item responses may key further questioning or evaluation in such arenas as eating disorders (e.g., "Has lost significant amounts of weight without medical reason") or somatization (e.g., "Complains of stomach pain or feeling sick more than other children of the same age").
Some clinicians have used extreme item or subscale scores as treatment targets; however, the nature of the symptom(s) expressed and the child's parents' concern may not match the measure. Diagnosis of a primary problem may not be synonymous with the most important problem to be treated. As Kazdin's (1995) commentary on child and adolescent psychotherapy research contends, not enough is known yet about the "natural history" of treatment with children and adolescents to do more than either question parents regarding their gravest concerns or target those problems that appear to be most ecologically dangerous. It is possible, for instance, that in terms of the long-term success of ADHD children, their social skill deficits are much more deleterious than are deficits in study and homework skills, the more usual focus of concern. In other words, the 14-year-old girl who lives with the recognition that she has never been invited to a party ultimately may be at more risk than if she recognizes that she has never received better than a "C" in math or history.

Appropriate Level of Care. As described earlier, total score elevations on the Y-OQ differ significantly between inpatient and outpatient clinical samples. Thus, in a rough clinical sense, the clinician employing the Y-OQ can say that a total score of 110, for instance, is most comparable to the scores of children being treated on an inpatient or residential treatment basis. Similarly, high scores on the Critical Items subscale reveal the need for a more protected or restricted treatment environment, such as inpatient or residential treatment. Several studies are ongoing that examine severity and subscale pattern levels of particular populations segregated by presenting problems (e.g., court-adjudicated youth vs. substance-abusing youth) or current level of care (e.g., residential treatment vs. inpatient vs. partial hospitalization vs. intensive outpatient).

TABLE 17.13
Hypothetical Summary Prepared by Psychologist for Demonstration of Effectiveness as Psychotherapist: Summary of Services for Managed Care Company X, 1996

Severity    # Patients    Mean Y-OQ Initial    Mean Treatment    Mean Y-OQ
Level       Seen          Severity Level       Duration          Termination
High        12            109                  33                90
Medium      44            91                   20                68*
Low         21            68                   9                 47*

Note. The median time lapse between initial contact and date of first appointment is 5 days. The percentage of high risk factor patients is 13%. The * indicates a score within the normal population.

Potential Use or Limits for Treatment Planning in a Managed Care Setting

Adoption of the continuous quality improvement (CQI) model in 1992 by the Joint Commission on the Accreditation of Healthcare Organizations (JCAHO) requires ongoing monitoring of patient care in addition to traditional outcomes assessment. Many managed care organizations are doing more with outcomes assessment than ever before. In one scenario illustrating a potential CQI application, the Y-OQ is completed at regular intervals to track therapeutic progress. In the office, the measure is scored and entered into the client's chart. The therapist employs the measure as an intake measure of initial severity of symptoms and index of risk factors that moderate expectations for rapid improvement, a tracking device of therapeutic change, and a potential summary source for demonstration of effectiveness of therapeutic interventions. At the corporate level, each completed measure is electronically received at the central corporate office and entered into a data bank. The corporation uses the resulting analyses for reporting therapeutic effectiveness to subscriber companies and for profiling individual providers, establishing decision algorithms to empirically determine appropriate session limits, and answering further research questions such as evaluating the efficacy of innovative approaches to treatment.
In a similar vein, a hypothetical therapist may summarize her psychotherapy services for Managed Care Company X in the year 1997, as illustrated in Table 17.13. The therapist details the outcomes of patients referred from the managed care company, with appropriate case-mixing information drawn from initial severity ratings of the Y-OQ. This type of summary should be useful as a report to Company X as well as in applying for admission to other panels.

Use of the Instrument for Treatment Monitoring

Purpose of Treatment Monitoring

This chapter began by considering the interface between three audiences: researchers, clinicians, and health care administrators. In the development of the measure, an obvious additional stakeholder (the clients, or, in the case of the Y-OQ, the parents of the child being treated) was added. Treatment monitoring has an obvious, albeit different, part to play for each stakeholder. Much has been said already about the relatively immature state of the art in child and adolescent psychotherapy outcome research. For the researcher and, ultimately, the clinician, treatment monitoring provides ecological relevance and answers in standardized format the question, "How is the patient doing at this stage of treatment?" For Fonagy (1996), monitoring is essential to bridge the gap between academic research and everyday actual clinical practice:

Monitoring the process of a service goes hand in hand with routine monitoring of outcomes of clinical practice. . . . Where outcomes are poorer than anticipated on the bases of research findings, and the discrepancy is not accounted for by the deviations from laid down standards of performance, the monitoring process has provided further research questions about the essential components of treatment or patient characteristics which place limits on treatment effectiveness. In an ideal world further theoretical and clinical development follow, leading to research which in turn addresses the shortcomings of the treatment protocol. (p. 39)

Administrators of health care organizations find treatment monitoring equally essential to build a knowledge base for decisions regarding which patients benefit from particular services and what the duration and intensity of treatment should entail. Echoing Fonagy's vision of treatment monitoring, Burlingame et al. (1995) urged that administrators see treatment monitoring as an informational process by which the organization as well as providers can refine the treatment process and, therefore, its "products." Families, too, are benefitted by treatment monitoring. They, too, are asking, "How is my child doing?"
In the absence of more regularized information, they must rely on their impressions of the recent past, unable to track for themselves gradual but substantial changes that may have occurred. Clinicians reported increased consumer satisfaction in a child psychiatry practice when they routinely shared the Y-OQ tracking charts with the parents of patients.

How to Use the Instrument for Treatment Monitoring

Practitioners who have integrated the Y-OQ into their practices request that parents complete the questionnaire at each session. They reportedly interpret the printout at three levels: First, the overall severity and/or response to Critical Items suggests the parents' sense of crisis or agitation with their child. Even in the midst of ongoing treatment, the occurrence of three data points that suggest increasing behavioral problems prompts the clinician to search with the parents for the source of the negative change. Second, initial examination of subscales frequently indicates the general therapeutic thrust required (i.e., higher levels of externalizing behaviors requiring a different therapeutic protocol than higher levels of internalizing behaviors). And, third, when the child has made significant progress, the clinician is alerted to consider less restrictive treatment alternatives for the child.

The following example, illustrated in Fig. 17.3, may serve to demonstrate the utility of treatment monitoring, although therapy continued beyond the last available data point. T.S., a 9-year-old male with mild attention deficit difficulties, was being treated for an adjustment reaction to his parents' divorce. As can be seen in the figure, T.S. was experiencing moderate distress at intake, the point at which clinicians usually attempt to collect the initial Y-OQ protocol. Perusal of subscales, as well as his mother's report, revealed considerable intrapersonal distress, moderate conflict with his mother (who, already feeling overwhelmed by the changing family circumstances, was unnerved by his acting-out behaviors), and mild difficulties with behaviors associated with attention-related problems. In other words, T.S. presented a picture of a boy experiencing a welter of confused feelings about his parents' divorce, including depression, anxiety, anger, and increased distractibility. The Y-OQ tracking sheet illustrates a 3-month period of outpatient supportive treatment to mother and child. Although the chart illustrates essentially a continuing course of difficulty without significant improvement, it is noteworthy that the session-by-session Y-OQs capture the agitation and pain of Christmas holidays for this boy, followed by a return to his earlier level of difficulty. His mother, when shown the tracking sheet, admitted that she was not surprised, given the turmoil everyone in the family was experiencing. She stated, however, that her expectation was that her son would have been much worse without the support of the child's therapist. She saw the tracking sheet as evidence that the entire family was still in the middle of some distressing adjustments. Nevertheless, clinicians, recognizing the no-change status of their client, may have considered a second-level form of treatment, such as a divorce adjustment group for the mother and children.

Fig. 17.3. Case example of T.S., a 9-year-old male with mild attention deficit disorder and adjustment reaction to divorce.

Frequency of Administration

Recent research in psychotherapy outcome has found value and scientific justification for increasing the number of administrations of outcome measures (Bryk & Raudenbush, 1988; Willet, 1989). Willet contended that increasing the number of administrations ("waves") increases the reliability of growth rate data, although repeated administrations may result in retest effects whereby raters respond with increasing carelessness (mechanical responding) or with social desirability motives. In a direct examination of these issues with a randomized block design, Vermeersch, Durham, and Lambert (1997) studied 320 subjects who completed the OQ-45 (Lambert & Burlingame, 1996) in one of four experimental conditions (weekly, biweekly, monthly, pre-post) using a 9-week overall interval. Vermeersch et al. reported no evidence for social desirability and mechanical responding affecting total OQ-45 scores. Moreover, a small (2-4 point) drop in scores occurred only between the first and second administration, with OQ-45 scores being unaffected by weekly, biweekly, and monthly administration. It should be noted that this drop is far smaller than either the RCI for the measure (14 points) or its standard deviations, thus making it practically irrelevant to interpretation. A study is underway to test these retest artifacts with the Y-OQ. Thus, unless future evidence contradicts these findings, session-by-session administration of the Y-OQ provides for the most sensitive tracking process, and this schedule does not seem to be unduly affected by retest, social desirability, or mechanical response artifacts.

Use of the Y-OQ for Treatment Outcomes Assessment

As noted earlier, the Y-OQ was developed specifically for the purpose of tracking outcomes. Some of its advantages and limitations are discussed for this purpose. It is important to note that the Y-OQ, in its present form, is limited to addressing outcomes from the point of view of the patient's primary caregiver. Unlike adults, children are not considered reliable informants of their behavioral and mental states. Adolescents who are commonly coerced into treatment are notoriously poor informants.
It would be desirable if there were a form of the test that could be administered to the child, mental health specialists, teachers, or other relevant sources besides the parents. As mentioned earlier, such Y-OQ forms are currently under development. However, few people except a primary caregiver are privy to the extensive information needed to assess the broad range of problems assessed by the Y-OQ, and data collected from other sources will necessarily vary in format and accuracy. Naturally, the aforementioned considerations limit the generalizability and usefulness of outcome data because these data come from a single source (i.e., the parent or guardian). Additional factors affect the value of outcome data generated by significant other ratings. For example, if outcome data are collected on a weekly basis, different informants (e.g., mother, father, an older sibling, etc.) may bring the child to weekly sessions. In this instance, there may be large differences between informants because of the amount of information they possess, or their response bias toward the child; consequently, the outcome data collected may not be meaningfully interpreted. Similar problems exist within inpatient treatment centers where parents may not have enough contact to adequately rate the child. Additionally, no one particular inpatient staff member may have spent enough time with the child to accurately rate the full range of behaviors present on the Y-OQ. The knowledgeable reader will recognize these problems as methodological issues inherent in all data that are collected from significant others, and therefore not unique to the Y-OQ. Nevertheless, it is important to note that these and related shortcomings will affect conclusions about the effectiveness of treatment interventions as measured by the Y-OQ.

Evaluation of the Y-OQ Against NIMH Criteria for Outcomes Measures

Newman and Ciarlo (1994) suggested 11 criteria by which measures of outcome can be judged. These criteria were based on the recommendations of a panel of experts convened by the National Institute of Mental Health. The following paragraphs summarize each criterion and evaluate the Y-OQ's compliance with these criteria.

Relevance to Target Group and Independent of Treatment Provided. The Y-OQ is relevant for children from age 4 to 17 who have a primary caregiver who can read at the sixth-grade level. It is most appropriate for tracking outcome in outpatients. It can be applied with inpatients as well, although with greater difficulty due to the caregiver's infrequent contact with the child. Its content is related to day-to-day functioning and is not based on any particular treatment theory or modality. It has proven to be as appropriate for monitoring patients undergoing psychoactive pharmaceutical interventions as for those undergoing psychological interventions based on any theory of change.

Simple, Teachable Methods. The Y-OQ was specifically designed for ease of administration. It is intended to be administered by a wide range of service professionals ranging from clinic receptionists to clinicians themselves. Administrative instructions are very straightforward and do not require a complex understanding of the instrument itself. Scoring may be accomplished a number of ways depending on the version of the instrument being used. The most straightforward version provides the Likert point values on the form itself, allowing it to be scored by simply transferring the point value to the appropriate subscale column (also clearly indicated), and then adding up the columns. Recent versions have been produced that can be scanned and scored by computer.
A recent commercially released software package also allows for actual Y-OQ administration on the computer with automatic scoring and storage of the data in a cumulative database for each client, as well as tabled treatment summaries by treatment providers.

Use of Measures with Objective Referents. The items on the Y-OQ are based on objective constructs indicative of issues of importance to families, third-party payers, mental health professionals, and government agencies. However, the very nature of an observer-based measure, derived from parental judgments and a subjective understanding of their child's current condition, limits the nature of the data collected. The Y-OQ is not exempt from this limitation. In fact, it requires not only personal conceptualization of current psychological functioning, but also a rating of intensity. It does not call for counting behavior, reporting school grades, or recording actual behaviors as they occur. Typical items include rating child behaviors on a scale from "never" to "almost always" (e.g., "steals or lies," "is fidgety," "restless," or "hyperactive").

Use of Multiple Respondents. The Y-OQ currently does not make use of multiple respondents. It is limited to the responses of a caregiver whose interactions with the patient are on a daily basis. Forms are being developed for completion by older patients and other informants, such as inpatient clinicians and nurses.

More Process-Identifying Outcome Measures. Again, the Y-OQ only focuses on a subjective understanding of current psychological functioning. It is not intended to identify the process, course, or likely outcome of a pathological condition. Were the Y-OQ designed to measure such constructs, it would likely lose many of its most desirable attributes: ease of administration, short administration time, and straightforward scoring and interpretation. It is likely that repeated administrations of the Y-OQ combined with other meaningful diagnostic data and professional interpretation can provide valuable information leading to process identification.

Psychometric Strength: Reliable, Valid, Sensitive to Treatment-Related Change, and Nonreactive. As reported previously in the reliability, validity, and sensitivity sections of this chapter, the Y-OQ is a psychometrically sound instrument exhibiting high validity and consistent reliability, with the ability to measure client change across sessions and to discriminate between normal, outpatient, and inpatient populations. It does not appear to be reactive in the sense that it influences the client's behavior, because clients themselves do not complete the scale.

Low Costs. One of the requirements of the Y-OQ design protocol was that it be very cost-effective. Use of the Y-OQ requires a nominal licensing fee, which allows the licensee the lifetime right to reproduce and administer this instrument as well as the Outcome Questionnaire-45.2. Cost per administration thus becomes limited to reproduction and administration costs.

Understanding by Nonprofessional Audiences. The Y-OQ was intended for general use in a wide range of settings, and as such it was designed to be easily understood both conceptually and practically by laypersons and professionals alike.
This developmental tactic has lent itself to ease of administration, but it has also been discovered that the results of Y-OQ administrations can be easily understood by older patients and parents as well as other nonprofessional observers, when clinicians choose to share that information. Most people appear to understand its utility as similar to a blood test being taken for analysis of current physical functioning. A lower score is likely to indicate better functioning and less pathology, and a high score represents some level of psychological distress.

Easy Feedback and Uncomplicated Interpretation. Development of computer scoring as well as hand-scored Y-OQ forms has yielded a straightforward instrument that is typically easy to interpret. Interpretation can begin with comparing the total score of one administration against the norms, establishing the level of distress currently being experienced and whether this would be considered normal or abnormal. Interpretation of a single protocol can become more complicated by looking at the individual subscale domains as well as responses to individual items. However, even this level of interpretation is not very complex. The Y-OQ is also capable of presenting a slightly more complex interpretive picture when repeated measures are used to track individual client progress across sessions. Interpretation can also be expanded to include evaluation of score profiles for a specific treatment provider, therapeutic intervention, or patient population. Feedback follows a similar course from a simple explanation of the total score to complex statistical analysis and explanation of trends, patterns, and cycles.

Useful in Clinical Services. The Y-OQ has proven to be very useful in a number of clinical settings. It can help establish levels of needed treatment, justify or nullify an extended number of sessions, track patient progress across time, and monitor treatment effectiveness. The simplicity of use, low cost, and straightforward interpretation are additional features that make the Y-OQ a very useful tool in a clinical setting.

Compatibility with Clinical Theories and Practices. The Y-OQ was intentionally developed to be atheoretical with regard to extant psychological theories. This was done with the hope that it would allow the Y-OQ to be a powerful and meaningful instrument for any clinician to use regardless of clientele, theoretical perspective, or therapeutic style. The current research with the Y-OQ has shown that it can be used effectively in a diverse range of settings, providing meaningful if not different information in each instance. Obviously, differing theories and practical applications are going to require varied implementation strategies; however, to date, the Y-OQ appears to be flexible enough to meet these needs and demands.

Research Findings Relevant to Use of the Y-OQ as an Outcome Measure

Sensitivity to Change. The Y-OQ was not designed to be a diagnostic instrument, but rather as an assessment tool to track patient progress and treatment outcome, so its utility rests on its ability to be sensitive to change during and following participation in therapy or treatment. A logical criterion against which to compare the Y-OQ's sensitivity to change is subsequent administrations of the Y-OQ. The Y-OQ has proven to be a very stable measure with a test-retest reliability estimate of .83 in nonpatients (Atkin et al., 1997). This suggests that over brief periods of time (i.e., 2- and 4-week intervals), outside of the influence of treatment or other intervening events, Y-OQ scores should remain relatively constant. Therefore, observed changes in Y-OQ scores over relatively brief time periods that exceed the magnitude expected by known instability coefficients (test-retest estimates) are suggestive of meaningful changes in the pattern and level of a subject's symptomatology.
Thus, to a great extent, the Y-OQ's construct validity depends on its ability to detect change following interventions such as psychotherapy. Accordingly, it is expected that the scores of patients receiving psychological or psychopharmacological interventions would become lower over time. Past psychotherapy research demonstrates a pattern wherein most patients typically improve in therapy, and a portion improve in placebo treatments. Moreover, the greatest gains are expected to take place by the eighth therapy session (Lambert & Bergin, 1994). Given all this, the litmus test for the Y-OQ is its sensitivity to change.

Unfortunately, there are several methods used to operationalize "sensitivity to meaningful change." The simplest is to calculate a pre- to posttreatment difference score. Fortunately, the reliability of this difference score can be calculated according to the method described by Allen and Yen (1979), which provides a frame of reference to compare difference scores. This method essentially estimates the inflation in difference scores due to shared variance in pretreatment and posttreatment observations (i.e., correlation between the two scores).

A second way to assess sensitivity to change is by using methods proposed by Jacobson and Truax (1991). This involves applying two statistical indices to the pre-post change to classify clients as "recovered" or "significantly changed." The first index is based on the principle that following clinical intervention, a client's outcome or posttest score should be in the normal population range of functioning rather than in the dysfunctional population range. A cutoff score that establishes the threshold between the functional and dysfunctional populations is calculated and serves as the criterion for "recovery" against which the posttest score is compared. Two such cutoff scores, or thresholds, have been computed for the Y-OQ that differentiate between the severe symptomatology characteristic of inpatient populations, the more moderate symptomatology characteristic of outpatient samples, and the lower range scores characteristic of community normals. The second statistical index is Jacobson and Truax's Reliable Change Index (RCI), a ratio formed by subtracting the pretreatment score from the posttreatment score and then dividing this difference by the standard error of the differences between the two test scores. This index essentially controls for variation in pre- to posttreatment scores that can be attributed to measurement error inherent in the test (known test-retest variability). The RCI value that has been computed for the Y-OQ is 13, meaning that an individual's score must change by at least 13 points to be considered reliable change (Burlingame et al., 1996). When both of these indices are met, a client is considered to have demonstrated clinically significant change. Specifically, if the pre- to posttreatment difference score exceeds the RCI value (13) and the posttest score has crossed the threshold between a dysfunctional and functional population, then the client is classified as having demonstrated clinically significant change.

Speer (1992) described a third method of classifying client change. This approach, known as the Edwards-Nunnally (EN) method, involves centering confidence intervals of ±2 standard errors of measurement on the client's unbiased estimated or true initial score. If the client's postscore then falls outside the confidence interval, then it is considered significantly different from the prescore at p < .05. The EN method is somewhat more conservative than the RCI method because it adjusts for regression to the mean by using the estimated true score.
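The three approaches just described can be sketched in a few lines of code. The following is a minimal illustration only, not the authors' scoring software: the function names are hypothetical, the cutoff between dysfunctional and functional ranges must be supplied by the caller because its value is not given in this section, and the population mean and standard deviation used in any call are placeholders rather than published Y-OQ norms. Change is expressed as pretreatment minus posttreatment score, so a positive value means improvement.

```python
import math

RCI_POINTS = 13  # reliable-change value for the Y-OQ reported in the text


def difference_score_reliability(r_xx, r_yy, r_yx):
    """Reliability of a pre-post difference score (Allen & Yen form).

    r_xx, r_yy: reliabilities of the Time 1 and Time 2 scores.
    r_yx: correlation between the Time 1 and Time 2 scores.
    """
    return (0.5 * (r_xx + r_yy) - r_yx) / (1.0 - r_yx)


def classify_rci(pre, post, cutoff, rci_points=RCI_POINTS):
    """Jacobson-Truax style classification in raw score points.

    ASSUMPTION of this sketch: "crossing the threshold" means the pre
    score is at or above `cutoff` and the post score is below it.
    """
    change = pre - post  # positive = lower (better) score at posttest
    if change >= rci_points:
        if post < cutoff <= pre:
            return "clinically significant change"
        return "reliable improvement"
    if change <= -rci_points:
        return "reliable deterioration"
    return "no reliable change"


def classify_en(pre, post, mean, sd, r_xx):
    """Edwards-Nunnally classification.

    The interval is centered on the estimated true initial score,
    r_xx * (pre - mean) + mean, with half-width 2 * sd * sqrt(1 - r_xx).
    """
    true_score = r_xx * (pre - mean) + mean
    half_width = 2.0 * sd * math.sqrt(1.0 - r_xx)
    if post < true_score - half_width:
        return "improved"
    if post > true_score + half_width:
        return "deteriorated"
    return "unchanged"
```

Because the EN interval is centered on the estimated true score rather than on the observed prescore, it is pulled toward the population mean. For a client whose initial score lies above the mean, a given rise in score is therefore more likely to fall outside the interval than the same-sized drop.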
The formulas used in calculating each of these methods are listed in Table 17.14.

TABLE 17.14
Formulas Used in Measuring Sensitivity to Change

Method              Formula
Allen & Yen         $r_{DD} = \dfrac{\tfrac{1}{2}(r_{xx} + r_{yy}) - r_{yx}}{1 - r_{yx}}$
Jacobson & Truax    $RCI = \dfrac{x_A - x_L}{S_{diff}}$
Edwards-Nunnally    $x_L > \text{or} < \left[ r_{xx}(x_A - M) + M \right] \pm 2\,SD\,(1 - r_{xx})^{1/2}$

Note. $r_{xx}$ = internal reliability estimate of Time 1 Y-OQ total scores; $r_{yy}$ = internal reliability estimate of Time 2 Y-OQ total scores; $r_{yx}$ = the correlation between Time 2 Y-OQ total score and Time 1 Y-OQ total score; $x_L$ = individual's raw last assessment score; $x_A$ = individual's raw assessment score; $S_{diff}$ = standard error of the differences between $x_A$ and $x_L$; $M$ = population mean of total Y-OQ scores; $SD$ = population standard deviation of Y-OQ total scores; $r_{xx}$ = test-retest reliability of the Y-OQ.

Mosier, Burlingame, Nebeker, and Wells (1997) gathered data from an outpatient sample of 185 on the Y-OQ to directly compare these three methods for detecting and interpreting change. The sample included 103 males and 83 females ranging in age from 4 to 18 years (M = 12.24, SD = 3.40). The majority (n = 124) were seen in an outpatient setting; however, the sample also included children and adolescents receiving day or residential treatment. Pre- to posttreatment change scores were normally distributed, ranging from -151 to 82. The average participant change score was 16.6 (SD = 37.8), indicating overall improvement in the sample that met the RCI criteria. The reliability of Mosier et al.'s (1997) pre- to posttest difference scores, when estimated via the method suggested by Allen and Yen (1979), yielded an estimate of .78. This value is within the conventional range of acceptability and leads to greater confidence in drawing comparative conclusions of the difference scores. Although high difference score reliability is expected (given the aforementioned reliability of the Y-OQ), it cannot be guaranteed. Thus, the importance of generating empirical estimates of such cannot be overestimated.

Although all methods support the Y-OQ with regard to sensitivity to change in the Mosier et al. (1997) sample, each produces a different profile of overall patient change. The Edwards-Nunnally method determined that of the 185 patients, 85 were improved, 22 unchanged, and 78 deteriorated. Using the Reliable Change Index, 91 subjects were classified as significantly improved, 59 as unchanged, and 35 as deteriorated. The actual frequency totals and the percentage of hits and misses are depicted in Table 17.15. Although there is a high rate of agreement between the two methods regarding which subjects are improving (93%), more discrepancy exists among those subjects who are classified as either remaining the same or deteriorating (45% and 45%, respectively). The EN method classified 46% of Mosier et al.'s (1997) subjects as improved and the RCI method classified 49% improved. This finding fits literature-based expectations, with the majority of subjects exhibiting improvement and a portion of presumably more difficult or recalcitrant cases remaining the same or deteriorating. The EN method classifies more subjects as deteriorators (i.e., worse off than at admission or initial administration) and fewer subjects as unchanged than the RCI method. However, this is largely due to the asymmetrical confidence interval that is generated via the EN method and should not be interpreted to mean that the EN method is necessarily superior to the RCI method.
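The percentages quoted above follow directly from the cell counts in Table 17.15, and the arithmetic can be checked with a short sketch. The variable and function names below are hypothetical, and the cells marked "**" in the table are treated as zero, an assumption that is consistent with the reported row and column totals.

```python
# Cell counts from Table 17.15.
# Outer key = Edwards-Nunnally (EN) class, inner key = RCI class.
# Cells shown as "**" in the table are ASSUMED to be zero.
CROSSTAB = {
    "improved":     {"improved": 79, "unchanged": 6,  "deteriorated": 0},
    "unchanged":    {"improved": 12, "unchanged": 10, "deteriorated": 0},
    "deteriorated": {"improved": 0,  "unchanged": 43, "deteriorated": 35},
}
LABELS = ("improved", "unchanged", "deteriorated")


def en_totals():
    """Row totals: how many clients each EN class contains."""
    return {en: sum(CROSSTAB[en].values()) for en in LABELS}


def rci_totals():
    """Column totals: how many clients each RCI class contains."""
    return {rci: sum(CROSSTAB[en][rci] for en in LABELS) for rci in LABELS}


def sample_size():
    return sum(en_totals().values())


def same_label_percent(label):
    """Percent of clients in an EN class given the same label by the RCI."""
    return round(100 * CROSSTAB[label][label] / en_totals()[label])


def percent_of_sample(count):
    return round(100 * count / sample_size())


if __name__ == "__main__":
    print("EN totals:", en_totals())
    print("RCI totals:", rci_totals())
    for label in LABELS:
        print(label, "agreement:", same_label_percent(label), "%")
    print("EN improved:", percent_of_sample(en_totals()["improved"]), "%")
    print("RCI improved:", percent_of_sample(rci_totals()["improved"]), "%")
```

Run as a script, this reproduces the row totals (85, 22, 78), the column totals (91, 59, 35), the agreement figures of 93%, 45%, and 45%, and the overall improvement rates of 46% (EN) and 49% (RCI).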
Moreover, the findings of this study are in agreement with those reported by Speer (1992), giving greater confidence in both (i.e., replication). Sensitivity to change is the raison d'être for a treatment outcome device. In the ideal circumstance, any recorded change of scores would represent actual change in behavior. In the clinical world, however, measurement error, regression to the mean, and other possible confounding variables require statistical techniques to determine how much change in recorded scores is necessary to validly demonstrate that meaningful behavioral change has occurred. Overall, the findings of the Mosier et al. (1997) study suggest that change is occurring in the expected direction and support the Y-OQ as a measure that is sensitive to the changes occurring in the pattern and level of subjects' symptomatology when measured over the course of treatment. Two other Y-OQ studies also speak to this question: an inpatient treatment study designed primarily to test the effectiveness of the current "stabilization" goals of managed health care (Wells et al., 1997) and a TABLE 17.15 Cross-tabulation of EN and RCI Classification Methods Reliable Change Index Edwards-Nunnally Improved Unchanged Deteriorated Total Improved (87%) 79 (93%) (10%) 6 (7%) ** 85 Unchanged (13%) 12 (55%) (17%) 10 (45%) ** 22 Deteriorated ** (73%) 43 (55%) (100%) 35 (45%) 78 Total 91 59 35 185 Note. Percentages for EN values are on the left side and read from top to bottom. Percentage values of clasiffication for RCI method are to the right of the actual frequency count and are read horizontally.


psychoeducational treatment study of adjudicated youth (McCollam, Burlingame, Vanderwal, Hardinger, & Wells, 1997). Wells et al. (1997) focused on evaluating behavior change via RCI standards in an inpatient population of 36 children and adolescents. They found that 61% of the sample achieved clinically significant change (difference scores exceeding the RCI of 13, with the Time 2 score falling within the normative cutoff score), 28% made change that was less than the RCI (defined as no change), and 11% produced reliable deterioration (i.e., more severe symptomatology at Time 2 than at Time 1). Note that Weisz, Weiss, Han, Granger, and Morton's (1995) meta-analysis reported a similar pattern of effectiveness for child mental health interventions. Finally, McCollam et al. (1997) evaluated sensitivity to change in a psychoeducational group program for court-adjudicated youth. Parents of 95 adolescents completed Y-OQs at intake and at the final treatment session. Matched-pair t-test comparisons of pre- and posttreatment scores found no significant differences. However, RCI indices of change found approximately 25% of the sample reaching criterion for reliable improvement, whereas an additional 25% reached criterion for reliable deterioration. Thus, the net "no-change" finding on the t-test comparisons can be explained by the change exhibited by the improvers being cancelled out by an equivalent number of deteriorators. These findings illustrate the importance of not confounding therapeutic change produced by effective treatment with the average observed change on the instrument. Careful evaluation of the treatment and the characteristics of the population (psychoeducational group treatment for adolescents who, by definition, are repudiating adult guidance) suggests that a less intensive treatment presented to a difficult audience is a more persuasive explanation of the findings than the measure's inability to detect meaningful change.
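The contrast between case-level RCI classification and a group-level t test can be illustrated with a small Python sketch. It is a sketch under stated assumptions rather than the authors' procedure: the RCI of 13 comes from the text, but the cutoff value, the handling of changes exactly equal to the RCI, the function names, and the score pairs are all hypothetical and chosen only for illustration. Lower scores are taken to mean less severe symptomatology.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev

RCI = 13      # reliable-change threshold quoted in the text for the Y-OQ
CUTOFF = 46   # hypothetical normative cutoff, used only for illustration


def classify(pre, post, rci=RCI, cutoff=CUTOFF):
    """Label one case from its Time 1 and Time 2 total scores."""
    change = pre - post  # positive change means symptoms decreased
    if change >= rci:
        # Reliable improvement counts as clinically significant only if
        # the Time 2 score also falls below the normative cutoff.
        return "clinically significant" if post < cutoff else "reliably improved"
    if change <= -rci:
        return "reliably deteriorated"
    return "no reliable change"


def paired_t(pairs):
    """Matched-pair t statistic computed on pre minus post differences."""
    diffs = [pre - post for pre, post in pairs]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Made-up (pre, post) totals in which improvers and deteriorators balance.
pairs = [(80, 60), (75, 58), (70, 85), (66, 82),
         (72, 70), (68, 71), (77, 75), (64, 66)]

tally = Counter(classify(pre, post) for pre, post in pairs)
t_stat = paired_t(pairs)

print(tally)
print(round(t_stat, 2))
```

In this toy sample a quarter of the cases improve reliably and a quarter deteriorate reliably, yet the mean difference is close to zero and the t statistic is small, which mirrors the pattern described for the McCollam et al. (1997) data.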
Clinical Applications of the Y-OQ for Outcomes Assessment

Figure 17.4 provides the report generated by the software program Y-OQ-OATS for the purpose of providing clinicians with feedback. This report form can also be used clinically as part of the therapy process. The report form provides a graphical representation of patient progress over the course of therapy, with emphasis on the most recent administration of the scale. Detailed information is provided about the patient's most recent level of functioning, including critical item scores and subscale scores presented in relation to normative groups. Graphical representation of change over time clearly shows any trends in change that may be important in treatment decision making. This format provides the clinician with a useful summary of the pattern of improvement, the range of patient functioning, and problems that the patient may be currently experiencing.

Use of Findings from the Y-OQ with Other Evaluation Data

To date, the Y-OQ has not been used simultaneously with other measures of treatment outcome. Thus, it is not possible to compare outcomes across measures for the purpose of identifying measures that produce larger or smaller effect sizes. Future research should be aimed at such comparisons. One study from the research program is underway, but results are not yet available; in it, Y-OQ change scores are correlated with empirically


demonstrated risk factors for psychopathology such as arrest and recidivism, school performance, and other environmental indices.

Fig. 17.4. Y-OQ-OATS computerized printout of summary for S.D.

Provision of Feedback Regarding Outcomes Assessment Findings

Feedback based on the results of Y-OQ administrations may be used in a wide range of applications. Frequently, parents will ask what purpose the measure serves and inquire as to their personal results. The course of action to be followed here is typically left for


the clinician to determine. This may include a full disclosure of the results. Such an inquiry is essentially the equivalent of a patient's parents asking the question, "How is my son doing . . . is he getting better?" and should be handled on a case-by-case basis. Charting the progress of a specific client may also be quite informative to a clinician and can even provide validating feedback as to therapeutic setbacks, stagnation, or rate and pattern of progress. For a clinician or a third-party provider, the most meaningful feedback is typically provided by an aggregate of clients and sessions. Once Y-OQ results have been accumulated across multiple clients and sessions, the resulting data may provide critical feedback on the progress of patients, typical patterns of improvement for the patients of different clinicians, and the effectiveness of treatments found in various hospitals and regions. To date, the most effective means of accessing this vital information is through the use of the computerized administration and scoring program. Clients can take the Y-OQ on the computer terminal itself, or a clinician can enter responses or score totals from a completed profile. The program will then provide tabled results describing clinician or clinic efficacy in terms of percentage of clients improved and/or recovered across the number of sessions used. An example of the clinician record is provided in Fig. 17.5, which presents the output graph comparing the therapists working in a large metropolitan children's hospital. This routine report allows clinicians and program directors to see a summary of patient outcomes. It is generated by the OQ-OATS software program. A glance at the report shows the percentage of patients who have made clinically significant improvement at the time of the report as well as the percentage showing reliable deterioration. Additionally, it allows clinicians to evaluate (to some degree) the severity of their caseload relative to other clinicians.
For example, there are obvious average differences between Clinician A's and Clinician B's clients at intake, as well as interesting differences in the amount of change they show between intake and last testing. When comparisons across hospitals, clinics, programs, or providers are made, they must first be case-mix adjusted to produce equivalent findings. Case-mix adjustment is accomplished through a variety of methods on any number of relevant variables (e.g., initial severity scores, patient diagnosis, chronicity of disorders), with the goal of each being a balanced or matched comparison based on the patient's disorder or severity of illness. The present research program has shown some promising efforts toward equalizing patient characteristics in comparative analyses, but empirically validated case-mix adjustments using the Y-OQ are not yet ready for general clinical application.

Limitations in the Use of the Y-OQ for Outcomes Assessment

High intercorrelations were found for the Y-OQ subscales, suggesting that the subscales share considerable variance and are likely tapping a common underlying source of variance. These findings mirror the findings of analyses conducted on frequently used tests such as the Minnesota Multiphasic Personality Inventory (MMPI; Butcher, Graham, Williams, & Ben-Porath, 1989; W.G. Dahlstrom, Welsh, & L.E. Dahlstrom, 1972), Symptom Checklist-90-R (SCL-90-R; Cyr, Doxey, & Vigna, 1988; Cyr, McKenna-Foley, & Peacock, 1985), Millon Clinical Multiaxial Inventory (MCMI; Millon, 1981), and Inventory of Interpersonal Problems (IIP; Horowitz, Rosenberg, Baer, Ureno, & Villasenor, 1988). Despite such high intercorrelations among subscales


of prominent psychological measures, researchers have provided persuasive rationales for why individual subscales may still make unique clinical contributions. For example, the highly related Hysteria and Hypochondriasis scales of the MMPI have been found to provide unique classification value (Dahlstrom et al., 1972; McKinley & Hathaway, 1944). It may be that highly related Y-OQ subscales will find utility in their ability to provide unique classification information. However, until further research empirically demonstrates the unique utility of individual Y-OQ subscales, the total score should remain the primary unit of analysis for clinician and researcher alike.

Fig. 17.5. Y-OQ-OATS printout comparison across treatment settings.

Perhaps the biggest limitation of the Y-OQ remains in the area of implementation. As with all outcome measures, the most difficult challenge with the Y-OQ is the systematic


collection of data from informants across clinical sessions to interpret the overall pattern of change. It may be that the problem of missing data will be somewhat resolved as more and more HMOs require such data from their health care provider panels.

Potential Use of the Y-OQ as a Data Source for Mental Health Service Report Cards

As already pointed out, with the costs of mental health care rising, third-party providers are more frequently requiring health care providers to document therapeutic progress. This requirement for "accountability" leaves clinicians in fear of losing their livelihood if their patients do not exhibit documentable improvement. Further concerns revolve around comparisons that fail to take into account case-mix adjustments where the severity of client pathology is empirically factored into health care administrators' expectations for therapeutic progress (Wells, Burlingame, Lambert, Hoag, & Hope, 1996). But there is reason for optimism. The results of outcome research based on HMOs covering tens of millions of lives have begun to paint a field-based, realistic picture of treatment outcome. This portrait appears to be in line with the actual experiences of providers rather than the assumed limits and restraints of health care management and provider systems. Nonetheless, the age of accountability (Burlingame et al., 1995) no longer gives clinicians the same degree of freedom they once enjoyed. The marriage of outcome research and managed health care has not resulted in the grim demise that has been predicted.

Case Studies

The cases illustrated are actual treatment cases from a large inpatient and outpatient treatment facility. As with many cases from hospital charts, more details about significant treatment and environmental events, as well as more frequent administrations of the Y-OQ, would be most helpful. Nevertheless, the cases that follow illustrate how the Y-OQ is used in actual practice.
Figure 17.6 presents the tracking sheet for B.L., an 8-year-old boy being treated for intense aggressive behaviors. He had received a diagnosis of attention deficit hyperactivity disorder and oppositional defiant disorder. B.L.'s level of initial severity placed him in the range most often seen with children being treated on an inpatient service; however, he was treated on an outpatient basis and his parents received corollary intervention in the form of parent training. Although the imposition of a behavior modification program had a positive effect on his total symptomatic profile (Y-OQ total), as evidenced by his improvement extending beyond the RCI, his acting out was still extreme enough that it was particularly problematic when his parents moved rather suddenly from the community to seek other employment. Examination of chart notes suggests that a careful diagnostic workup might have found B.L. to fit the criteria for early-onset conduct disorder (Hinshaw, Lahey, & Hart, 1993).

Figure 17.7 highlights the rapid and very beneficial response to antidepressant medication and psychotherapy on the part of I.H., a 10-year-old female experiencing major depression and suicidal feelings. Y-OQs were completed by the mother at intake, at 1 month into treatment, and at termination after 3 months of treatment. The chart clearly


Fig. 17.6. Case example of B.L., an 8-year-old male treated for intense aggressive behaviors.

Fig. 17.7. Case example of I.H., a 10-year-old female diagnosed with major depression.


and succinctly illustrates the progress of the child. Note that clinicians have found that the availability of the tracking sheets helps to decrease the length of progress and termination reports.

Conclusions

Like its predecessor, the OQ-45, the Youth Outcome Questionnaire is designed to be a brief, psychometrically standardized instrument for tracking the progress of child and adolescent patients receiving mental health treatment. Parents or other caretakers with reasonable acquaintance with the patient's behavior complete the questionnaire first at intake and then at intervals throughout treatment. Scores from six content domains are summed to provide an overall severity elevation, which may roughly indicate the appropriate level of treatment. Repeated measures permit evaluation of the patient's changes in behavior as perceived by the parent/caretaker respondent. Established cutoff scores and RCI values signal significant symptomatic change and the point in time when the patient's behavior has become similar in elevation to the untreated community normal group. Although its history is still quite brief and much research remains to be done, evidence thus far suggests that the Y-OQ holds considerable promise. It requires little time to complete, is easy to administer, and is available at nominal cost. It has been translated into Spanish, French, and Laotian, and ethnic group comparisons are underway. Thus far, it has demonstrated acceptable-to-excellent reliability and validity, has been shown to be sensitive to change, and has been found to be useful by both clinicians and health care administrators. Currently, data analyses suggest that, as with the OQ-45, interpretation of the total score rather than subscale scores is the most justifiable tracking procedure; however, subscales and individual item responses may be used for hypothesis generation, as with any other objective measure.
Further refinements in the near future include confirmatory factor analyses, establishment of norms for more specific treatment populations (residential treatment, court-adjudicated youth), parallel self-report versions for adolescents, and a primary care physician screening version.

References

Achenbach, T.M. (1991). Manual for the Child Behavior Checklist/4-18 and 1991 profile. Burlington, VT: University of Vermont, Department of Psychiatry. Allen, J.S., Jr., Tarnowski, K.J., Simonian, S.J., Elliot, D., & Drabman, R.S. (1991). The generalization gap revisited: Assessment of generalized treatment effects in child and adolescent behavior therapy. Behavior Therapy, 22, 393-405. Allen, M.J., & Yen, W.M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole. Atkin, Q., Whoolery, M.L., Peterson, G., Burlingame, G.M., Wells, M.G., & Nebeker, R.S. (1997, April). Reliability and validity of the Youth Outcome Questionnaire. Paper presented at the annual meeting of the Western Psychological Association Convention, Seattle, WA. Baer, R.A., & Nietzel, M.T. (1991). Cognitive and behavioral treatment of impulsivity in children: A metaanalytic review of the outcome literature. Journal of Clinical Child Psychology, 20, 400-412. Barrnett, R.J., Docherty, J.P., & Frommelt, G.M. (1991). A review of psychotherapy


research since 1963. Journal of the American Academy of Child and Adolescent Psychiatry, 30, 1-14. Bergin, A.E., & Garfield, S.L. (1994). Handbook of psychotherapy and behavior change (4th ed.). New York: Wiley. Brown, J. (1987). A review of meta-analyses conducted on psychotherapy outcome research. Clinical Psychology Review, 7, 1-23. Bryk, A.S., & Raudenbush, S.W. (1988). Toward a more appropriate conceptualization of research on school effects: A three level hierarchical linear model. American Journal of Education, 97, 65-108. Burlingame, G.M., Lambert, M.J., Reisinger, C.W., Neff, W.L., & Mosier, J.I. (1995). Pragmatics of tracking mental health outcomes in a managed care setting. Journal of Mental Health Administration, 22, 226-236. Burlingame, G.M., Wells, M.G., & Lambert, M.J. (1996). Youth Outcome Questionnaire. Stevenson, MD: American Professional Credentialing Services. Butcher, J.N., Graham, J.R., Williams, C.L., & Ben-Porath, Y. (1989). Development and use of the MMPI-2 content scales. Minneapolis: University of Minnesota Press. Casey, R.J., & Berman, J.S. (1985). The outcome of psychotherapy with children. Psychological Bulletin, 98, 388-400. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press. Conners, C.K. (1989). Manual for Conners Rating Scales. North Tonawanda, NY: Multi-Health Systems. Cronbach, L.J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row. Cummings, N.A. (1987). The future of psychotherapy: One psychologist's perspective. American Journal of Psychotherapy, 41, 349-360. Cyr, J.J., Doxey, N.C., & Vigna, C.M. (1988). Factorial composition of SCL-90-R. Journal of Social Behavior and Personality, 3, 245-252. Cyr, J.J., McKenna-Foley, J.M., & Peacock, E. (1985). Factor structure of the SCL-90-R: Is there one? Journal of Personality Assessment, 49, 571-578. Dahlstrom, W.G., Welsh, G.S., & Dahlstrom, L.E. (1972). An MMPI handbook: Vol. 1. 
Clinical interpretation (rev. ed.). Minneapolis: University of Minnesota Press. Derogatis, L.R., & DellaPietra, L. (1994). Psychological tests in screening for psychiatric disorder. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 22-54). Hillsdale, NJ: Lawrence Erlbaum Associates. Dinh, K., Sarason, B., & Sarason, L. (1994). Parent-child relationships in Vietnamese immigrant families. Journal of Family Psychology, 8, 471-488. Drotar, D., La Greca, A.M., Lemanek, K., & Kazak, A. (1995). Case reports in pediatric psychology: Uses and guidelines for authors and reviewers. Journal of Pediatric Psychology, 20, 549-566. Durlak, J.A., Wells, A.M., Cotten, J.K., & Johnson, S. (1995). Analysis of selected methodological issues in child psychotherapy research. Journal of Clinical Child Psychology, 24, 141-148. Fonagy, P. (1996, October). Evaluating the effectiveness of interventions in child psychiatry: The state of the art. Invited address at the Kansas Conference on Clinical Child Psychology, Lawrence, KS. Grossman, P.B., & Hughes, J.N. (1992). Self-control interventions with internalizing disorders: A review and analysis. School Psychology Review, 21, 229-245. Health Care Financial Administrative (1992). Health care financial administrative review. Washington, DC: U.S. Department of Health and Human Services. Hinshaw, S.P., Lahey, B.B., & Hart, E.L. (1993). Issues of taxonomy and comorbidity in the development of conduct disorder. Development and Psychopathology, 5, 31-49. Hoag, M.J., & Burlingame, G.M. (1997). Evaluating the effectiveness of child and adolescent group treatment: A meta-analytic review. Journal of Clinical Child Psychology, 26, 234-246. Horowitz, L.M., Rosenberg, S.E., Baer, B.A., Ureno, G., & Villasenor, V.S. (1988). Inventory of Interpersonal Problems: Psychometric properties and clinical applications. Journal of Consulting and Clinical Psychology, 56, 885-892. Institute of Medicine (1989). 
Research on children and adolescents with mental, behavioral,


and developmental disorders. Washington, DC: National Academy Press. Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19. Kazdin, A.E. (1988). Child psychotherapy: Developing and identifying effective treatments. Elmsford, NY: Pergamon. Kazdin, A.E. (1990). Premature termination from treatment among children referred for antisocial behavior. Journal of Child Psychology and Psychiatry, 31, 415-425. Kazdin, A.E. (1991). Treatment research: The investigation and evaluation of psychotherapy. In M. Hersen, A.E. Kazdin, & A.S. Bellack (Eds.), The clinical psychology handbook (2nd ed., pp. 293-312). New York: Pergamon. Kazdin, A.E. (1993). Psychotherapy for children and adolescent psychotherapy research: Limited sampling of dysfunctions, treatments, and client characteristics. Journal of Consulting and Clinical Psychology, 58, 729-740. Kazdin, A.E. (1995). Scope of child and adolescent psychotherapy research: Limited sampling of dysfunctions, treatments, and client characteristics. Journal of Clinical Child Psychology, 24, 125-140. Kazdin, A.E., Bass, D., Ayers, W.A., & Rodgers, A. (1990). Empirical and clinical focus of child and adolescent psychotherapy research. Journal of Consulting and Clinical Psychology, 58, 729-740. Kazdin, A.E., Siegel, T.C., & Bass, D. (1990). Drawing upon clinical practice to inform research on child and adolescent psychotherapy: A survey of practitioners. Professional Psychology: Research and Practice, 21, 189-198. Kendall, P.C., & Morris, R.J. (1991). Child therapy: Issues and recommendations. Journal of Consulting and Clinical Psychology, 59, 777-784. Kovacs, M., & Paulaskas, S. (1986). The traditional psychotherapies. In H.C. Quay & J.S. Werry (Eds.), Psychopathological disorders of childhood (3rd ed., pp. 496-522). New York: Wiley. Lachar, D. (1982).
Personality Inventory for Children (PIC) revised format manual supplement. Los Angeles: Western Psychological Services. Lambert, M.J., & Bergin, A.E. (1994). The effectiveness of psychotherapy. In A.E. Bergin & S.L. Garfield (Eds.), The handbook of psychotherapy and behavior change (4th ed., pp. 143-189). New York: Wiley. Lambert, M.J., & Burlingame, G.M. (1996). Outcome Questionnaire 45.2. Wilmington, DE: American Professional Credentialing Services. Lambert, M.J., Thompson, K.C., Nebeker, R.S., & Andrews, A. (1996, June). The retest artifact and its implications for establishing dose-effect estimates of patient change following psychotherapy. Paper presented at the annual meeting of the Society for Psychotherapy Research, Amelia Island, FL. McCollam, P.M., Burlingame, G.M., Vanderwal, G., Hardinger, C., & Wells, M.G. (1997, April). The Youth Outcome Questionnaire: Characteristics of a juvenile delinquent population. Paper presented at the annual meeting of the Western Psychological Association Convention, Seattle, WA. McKinley, J.C., & Hathaway, S.R. (1944). The MMPI: Hysteria, hypomania, and psychopathic deviate. Journal of Applied Psychology, 28, 153-174. Millon, T. (1981). Disorders of personality: DSM III, Axis II. New York: Wiley. Mills, J.R., Manivanh, T., Burlingame, G.M., Wells, M.G., Peterson, G., & Nuttall, M. (1997, April). Developing the Youth Outcome Questionnaire (Y-OQ) for Lao refugee children and adolescents. Paper presented at the annual meeting of the Western Psychological Association Convention, Seattle, WA. Mosier, J.I., Burlingame, G.M., Nebeker, R.S., & Wells, M.G. (1997, April). Correlates of change in treatment in a clinical sample of children and adolescents. Paper presented at the annual meeting of the Western Psychological Association Convention, Seattle, WA. Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M.E.
Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates. Peterson, L., & Bell-Dolan, D. (1995). Treatment outcome research in child psychology:


Realistic coping with the "ten commandments of methodology." Journal of Clinical Child Psychology, 24, 149-162. Prout, H.T., & DeMartino, R.A. (1986). A meta-analysis of school based studies of psychotherapy. Journal of School Psychology, 24, 285-292. Roberts, A.R., & Camasso, M.J. (1991). The effects of juvenile offender treatment programs on recidivism: A meta-analysis of 46 studies. Notre Dame Journal of Law, Ethics, and Public Policy, 5, 421-441. Russell, R.L., Greenwald, S., & Shirk, S.R. (1991). Language change in child psychotherapy: A meta-analytic review. Journal of Consulting and Clinical Psychology, 51, 42-53. Shirk, S.R., & Russell, R.L. (1992). A reevaluation of estimates of child therapy effectiveness. Journal of the American Academy of Child and Adolescent Psychiatry, 31, 703-709. Speer, D.C. (1992). Clinically significant change: Jacobson and Truax (1991) revisited. Journal of Consulting and Clinical Psychology, 60, 402-408. Survey of Current Business. (1992). Washington, DC: Department of Commerce. Tingey, R., Lambert, M., Burlingame, G., & Hansen, N. (1996). Assessing clinical significance: Proposed extensions to method. Psychotherapy Research, 6, 109-123. Tramontana, M.G. (1980). Critical review of research on psychotherapy outcome with adolescents: 1967-1977. Psychological Bulletin, 88, 429-450. Vermeersch, D.A., Durham, C.J., & Lambert, M.J. (1997, April). The sensitivity to change of items on a psychotherapy outcome questionnaire. Paper presented at the annual conference of the Rocky Mountain Psychological Association, Las Vegas, NV. Weisz, J.R., Weiss, B., Alicke, M.D., & Klotz, M.L. (1987). Effectiveness of psychotherapy with children and adolescents: A meta-analysis for clinicians. Journal of Consulting and Clinical Psychology, 55, 542-549. Weisz, J.R., Weiss, B., & Donenberg, G.R. (1992). The lab versus the clinic: Effects of child and adolescent psychotherapy. American Psychologist, 47, 1578-1585.
Weisz, J.R., Weiss, B., Han, S.S., Granger, D.A., & Morton, T. (1995). Effects of psychotherapy with children and adolescents revisited: A meta-analysis of treatment outcome studies. Psychological Bulletin, 117, 450-468. Wells, M.G., Burlingame, G.M., Lambert, M.J., Hoag, M., & Hope, C. (1996). Conceptualization and measurement of patient change during psychotherapy: Development of the Youth and Adult Outcome Questionnaires. Psychotherapy: Theory, Research, and Practice, 33, 275-283. Wells, M.G., Mohlman, J., Trecarico, S., Gandhi, P., Turner, L., Stark, C., Peterson, S., & Burlingame, G.M. (1997, April). Measuring therapeutic change among child and adolescent inpatients using the Youth Outcome Questionnaire. Paper presented at the annual meeting of the Western Psychological Association, Seattle, WA. Werry, J.S. (1986). Physical illness, symptoms, and allied disorder. In H.C. Quay & J.S. Werry (Eds.), Psychopathological disorders of childhood (3rd ed., pp. 232-293). New York: Wiley. Whalen, C.K. (1989). Attention deficit and hyperactive disorders. In T.H. Ollendick & M. Hersen (Eds.), Handbook of child psychopathology (pp. 131-169). New York: Plenum. Willet, J.B. (1989). Some results on reliability for the longitudinal measurement of change: Implications for the design of studies of individual growth. Educational and Psychological Measurement, 49, 587-602.


Chapter 18
Use of the Devereux Scales of Mental Disorders for Diagnosis, Treatment Planning, and Outcome Assessment

Jack A. Naglieri
Ohio State University

Steven I. Pfeiffer
Duke University

The purpose of this chapter is to describe the use and interpretation of the Devereux Scales of Mental Disorders (DSMD; Naglieri, LeBuffe, & Pfeiffer, 1994). The scale is described in detail, as are its development, standardization, and norming procedures. The interpretation method described in the DSMD manual is presented. In addition, the computer scoring and interpretation program is used to illustrate how the DSMD can be used. Special attention is given to the issue of treatment planning and evaluation of treatment effectiveness.

Overview of the Instrument

The DSMD is a behavior rating scale that can be used to assess maladjustment in children and adolescents from age 5 to 18. The instrument is designed to identify psychopathological and behavioral problems in children and adolescents by evaluating overt behaviors exhibited by the individual. The DSMD provides this information from the report of the parent, teacher, or professional. Ratings from each of a child's parents, as well as from teachers or other professionals who have had the opportunity to observe the child, can provide a rich source of information about the variability or consistency of behavior across several settings and under different environmental conditions. The behavior rating scale is easy to administer and score either by hand or by using the computer scoring system. The DSMD is especially useful for the direct assessment of changes in behavior over time as a function of psychological, psychiatric, or behavioral treatment, as described by Pfeiffer (1989), because of several unique psychometric features included during development of the instrument. The DSMD comprises 111 (age 5-12) or 110 (age 13-18) items included on a two-page record form.
Each item begins with the stem "During the past 4 weeks, how often did the child . . ." or "During the past 4 weeks, how often did the adolescent . . ." for the two versions of the scale. Teachers and parents rate the child using the


same form. The ratings "Never," "Rarely," "Occasionally," "Frequently," and "Very Frequently" are assigned scores of 0 through 4, respectively, on the second page of the record form. The rater's marks are automatically transferred from the page the rater sees to the second page, which contains the scoring system. Scoring the DSMD is easily accomplished with the assistance of a well-organized and visually informative record form. For example, arrows show the user the flow of the scoring system, and text instructs the practitioner which norms tables to use. Scoring involves the following four steps:

1. Item scores are summed to yield raw scores for each of six scales.
2. Each scale's raw score is converted to a T-score using a conversion table.
3. The six scales are combined into three pairs to yield a sum of T-scores used to obtain a T-score for the Externalizing, Internalizing, and Critical Pathology Composite scales.
4. The sum of the six scale T-scores is used to obtain a T-score for the Total scale.

The DSMD yields an overall score and scores for several factorially derived scales (see the Validity section later) that reflect major categories of psychopathological symptoms. The rating scale yields standard scores for the three broad scales of Externalizing, Internalizing, and Critical Pathology. Within each of these three scales are two separate scales: Conduct and either Attention (for age 5-12 only) or Delinquency (for age 13-18 only); Anxiety and Depression; and Autism and Acute Problems. Additionally, the DSMD provides an approach for the evaluation of specific item scores outside of the normal range and can be used to guide diagnosis and treatment planning. The DSMD can therefore aid professionals in identifying psychological or emotional difficulties, in specifying the type of psychopathology, and in formulating a treatment plan.
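The four scoring steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the published scoring program: the item-to-scale assignments, the two conversion functions, and the ratings are hypothetical stand-ins, because the actual item keys and norms tables appear only in the DSMD manual. The pairing of scales into composites is inferred from the order in which the text lists them.

```python
# Pairing of scales into composites for the child form, inferred from
# the order in which the scales are listed in the text.
COMPOSITES = {
    "Externalizing": ("Conduct", "Attention"),
    "Internalizing": ("Anxiety", "Depression"),
    "Critical Pathology": ("Autism", "Acute Problems"),
}


def score_dsmd(item_scores, scale_items, raw_to_t, sum_to_t):
    """Apply the four scoring steps to one completed rating form."""
    # Step 1: sum item scores into a raw score for each of the six scales.
    raw = {scale: sum(item_scores[i] for i in items)
           for scale, items in scale_items.items()}
    # Step 2: convert each raw score to a T-score through a lookup.
    t = {scale: raw_to_t(scale, value) for scale, value in raw.items()}
    # Step 3: sum each pair of scale T-scores and convert to a composite T.
    composites = {name: sum_to_t(name, t[a] + t[b])
                  for name, (a, b) in COMPOSITES.items()}
    # Step 4: sum all six scale T-scores and convert to the Total T.
    total = sum_to_t("Total", sum(t.values()))
    return raw, t, composites, total


# Hypothetical item key: two items per scale (the real form has far more).
scale_items = {
    "Conduct": [1, 2], "Attention": [3, 4], "Anxiety": [5, 6],
    "Depression": [7, 8], "Autism": [9, 10], "Acute Problems": [11, 12],
}
# Hypothetical ratings for one child, keyed by item number.
ratings = {1: 4, 2: 3, 3: 2, 4: 2, 5: 1, 6: 0,
           7: 1, 8: 1, 9: 0, 10: 0, 11: 0, 12: 1}


def toy_raw_to_t(scale, raw):
    """Placeholder for a norms table; NOT the published conversion."""
    return 40 + 5 * raw


def toy_sum_to_t(name, t_sum):
    """Placeholder for a norms table; NOT the published conversion."""
    return t_sum // (6 if name == "Total" else 2)


raw, t, composites, total = score_dsmd(
    ratings, scale_items, toy_raw_to_t, toy_sum_to_t)
print(raw, t, composites, total)
```

Passing the conversions in as functions keeps the placeholders separate from the scoring flow, so the age-appropriate norms tables from the manual could be substituted without changing the four steps.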
Summary of Development

The DSMD is a revision of the Devereux Child Behavior Rating Scale (Spivack & Spotts, 1966) and Devereux Adolescent Behavior Rating Scale (Spivack, Spotts, & Haimes, 1967), which were among the first behavior rating scales developed. These scales were developed as measures of behaviors that provide information about how "the child relates to his world of things and people" (Spivack & Levine, 1964, p. 702). The identification of the original behaviors included in the scales was based on extensive field research that proved to be an effective and efficient method of detecting behavioral problems associated with psychopathology in children and adolescents.

Item construction for the DSMD was based on three sources. The original items included in the previous behavior scales were examined. These items were categorized according to the DSM-III-R (American Psychiatric Association, 1987) and the DSM-IV Options Book: Work in Progress (American Psychiatric Association, 1991). Additional items were written where needed. Other relevant literature (e.g., Garfinkel, Carlson, & Weller, 1990; Hooper, Hynd, & Mattison, 1992; Lewis & Miller, 1990) was considered, as was the need to modify items that included outdated language (e.g., terms no longer in use, sexist language). The results of these efforts served as the starting point in the construction of the DSMD.

One important value of using the DSM as a structure for item identification was to formalize the assessment of a large set of behaviors associated with psychopathology that are representative of the major categories. The advantage of using the DSM as a base is that it provides a rich source of options to include and then evaluate from a

psychometric perspective. Second, this approach provides the kinds of information that can aid professionals in the selection of an appropriate therapeutic intervention. Although diagnosis does not necessarily determine the exact type of therapy, it may narrow the field of choices considerably and, therefore, add to the efficiency, quality, and effectiveness of treatment. Accurate diagnosis can be especially important when selecting, for example, behavioral, cognitive, and psychopharmacological therapies that can have differential effectiveness. This was noted by Dougherty, Saxe, Cross, and Silverman (1987), who concluded that behavioral treatment is clearly effective for phobias and enuresis, and cognitive behavioral therapy is effective for a range of disorders involving self-control (except aggressive behavior). Group therapy has been found to be effective with delinquent adolescents, and family therapy appears to be effective for children with conduct disorders and psychophysiological disorders. Psychopharmacological treatment, while not curative, has been found to have limited effectiveness with children with ADD-H, depression, or enuresis, and also in managing the behavior of children who are severely disturbed. (p. 114) These conclusions emphasize the need for accurate differential diagnosis, which should influence the selection of the most appropriate therapeutic intervention and eventual outcomes. Using a child's scores on the various scales of the DSMD in conjunction with information from item-level analysis, along with a psychosocial history and other findings, the professional can determine the existence of psychological or emotional difficulties, specify the type of psychopathology according to the DSM-IV, and develop an effective treatment plan. The reading level of the items and rater directions were also carefully written so that the overall readability level of the text would be as low as possible. 
All text was evaluated according to the Living Word Vocabulary (Dale & O'Rourke, 1981), which provides a percentage score on more than 44,000 words and terms familiar to students in grades 4, 6, 8, 10, 12, 13, and 16. Words that were too difficult were eliminated. Direct and simple instructions were written at the readability level of the average newspaper (about sixth-grade level).

The item development phase also included a review of the items for possible cultural, racial, and gender bias. Experts in the field qualitatively examined every item for racial, ethnic, and gender bias. Statistical analyses were also conducted to indicate the extent to which the items showed mean score differences across these groups. Based on both the review of content and statistical evidence of mean score differences, a small number of items (approximately five) were deleted from the DSMD.

Development of DSMD Scales

Items were organized into statistically and logically derived scales. Statistically derived scales were identified with the aid of item factor analyses, which guided the assignment of items onto six factorially defined scales. The factor-based scales were consistent with generally accepted conceptualizations of developmental psychopathology. There were three superordinate categories within which three pairs of scales could be subsumed. For age 5 through 12, both Conduct and Attention factors could be viewed as Externalizing problems; Anxiety and Depression both suggested an Internalizing dimension; and the Autism and Acute Problems factors together formed a Critical Pathology category. Similarly, for age 13 through 18, the Conduct and Delinquency factors represented

Externalizing problems; Depression and Anxiety constituted an Internalizing dimension; and the Autism and Acute Problems factors a Critical Pathology group. This categorization of the factor-based scales formed the basis of the organization of the DSMD into Externalizing, Internalizing, and Critical Pathology composites shown in Fig. 18.1. (See the Validity section for more information about the factor analytic results.)

[Figure 18.1 diagram: the Total scale divides into three composites, each made up of two scales. Externalizing: Conduct and Attention (ages 5-12) or Conduct and Delinquency (ages 13-18). Internalizing: Anxiety and Depression (both age groups). Critical Pathology: Autism and Acute Problems (both age groups).]
Fig. 18.1. Organization of the DSMD scales and composites.

Standardization

The DSMD was carefully standardized so that the sample would be representative of the U.S. population. The data collection process included both regular education (for generation of norms) and special education and clinical (for standardization and validity studies) samples of children and adolescents from age 5 to 18. Data were collected from spring through summer 1991 from public school districts, private special education settings, and clinical treatment programs from across the United States. Students who attended regular education classes at least part time were included in the normative sample, as were those enrolled in part-time special education classes and identified as having learning disabilities, speech or language impairments, or other disabilities. Children and adolescents who were receiving either part- or full-time special education services for the seriously emotionally disturbed or the mentally retarded were not included in the normal standardization sample, but those identified as seriously emotionally disturbed were included in the validity studies. Parents and/or teachers rated each of the students.

The DSMD standardization sample comprises 3,153 children and adolescents from age 5 to 18. The sample was stratified according to age, gender, geographic region, race, ethnicity, socioeconomic status, community size, and educational placement. On average, each of the fourteen 1-year age groups included 225 children.
Data were collected from sites in 17 states in the four geographic regions: northeast, midwest, south, and west. The sample closely matches the U.S. population across five major race categories: White, African American, Asian/Pacific Islander, Native American, and other. The proportions of children and youth of Hispanic origin included in the standardization sample are also very similar to those of the U.S. population.

Norming Procedures

Following data collection for the standardization sample, a series of procedures was conducted to ensure the quality of the norms. First, the Total scale raw scores (i.e., the sums of all item raw scores) were examined for age, rater, and gender differences. Results of the analyses of these data indicated that the scores across each of the 14 ages differed

by gender and rater but showed no meaningful age progression within the 5 to 12 or 13 to 18 age groups; thus, standard scores by age were not needed. Differences in Total scale raw scores by gender were small to moderate but statistically significant (Naglieri, LeBuffe, & Pfeiffer, 1994). In both the 5 to 12 and 13 to 18 year age groups, the Total scale mean raw score was significantly greater for males than for females. These results suggested that separate norms by gender were appropriate. Similarly, scores varied by rater (teachers or school staff vs. parents or other appropriate caregivers), and therefore separate norms by rater were needed to account for the differences. The final norms for the DSMD are therefore reported separately for each combination of rater and gender.

Normality of the Total Scale Raw Score Distributions

The distributions of scores by rater, gender, and age all approached a normal distribution but were positively skewed (Naglieri et al., 1994). In each case, the range of scores extended approximately one standard deviation below the mean and about three standard deviations above the mean. Typically, the Total scale raw score mean and median values differed by approximately 8 to 10 raw score points. Given that the distributions were skewed, normalization was considered but rejected. If a nonnormal raw score distribution represents the distribution of scores that would have been obtained if the entire population were tested, then a linear standard score approach that retains the shape of the original distribution should be used (Crocker & Algina, 1986). Moreover, because it was assumed that emotional functioning, as measured by the DSMD, is not normally distributed (because behaviors associated with developmental psychopathology are not typical in the general population), and current standardization results support this view (normal distributions were not obtained), normalization was further contraindicated.
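The effect of a linear standard score transformation can be illustrated with a short Python sketch. The raw scores below are invented, and the use of the population standard deviation is an assumption made for the example; the point is only that a linear rescaling to a mean of 50 and a standard deviation of 10 leaves the skew of the distribution unchanged, whereas a normalizing transformation would remove it.

```python
from statistics import mean, pstdev


def linear_t_scores(raw_scores):
    """Linear T-scores: T = 50 + 10 * (x - mean) / SD."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: population SD of the norm group
    return [50 + 10 * (x - m) / sd for x in raw_scores]


def skewness(values):
    """Moment-based skewness (third standardized moment)."""
    m = mean(values)
    sd = pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


# Invented, positively skewed raw scores (long upper tail).
raw = [2, 3, 3, 4, 5, 5, 6, 8, 12, 20]
t = linear_t_scores(raw)

print(round(mean(t), 6), round(pstdev(t), 6))
print(round(skewness(raw), 6), round(skewness(t), 6))  # same value twice
```

Because the shape is retained, percentile ranks for such T-scores have to be read from the observed distribution rather than from the normal curve.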
Normalization of the raw scores, therefore, was not conducted because the nonnormal shape of the raw score distribution is believed to accurately reflect the distribution of behaviors associated with severe emotional disturbance in the population (Crocker & Algina, 1986).

Derivation of Standard Scores

Standard scores for the DSMD were computed separately for the six scales, the Externalizing, Internalizing, and Critical Pathology composites, and the Total scale (Naglieri et al., 1994). T-scores for each scale and composite and for the Total scale, set at a mean of 50 and a standard deviation of 10, were developed. The T-scores for the DSMD Externalizing, Internalizing, and Critical Pathology composites and Total scale were calculated on the basis of the sum of the T-scores. For each composite, the T-score was based on the sum of the T-scores on the contributing scales (e.g., the Externalizing T-score was based on the sum of the T-scores on the Conduct and Attention scales). The Total scale T-score was calculated on the basis of the sum of the T-scores on the six scales. This method weights each scale equally in the derivation of composite and Total scale scores.

Percentile Rank

The DSMD percentile scores are based on the actual distributions of scores obtained from the standardization sample (Naglieri et al., 1994). These scores are not the same as those that would be obtained if the distributions of raw scores were normal. Tables

for converting standard IQ scores to percentiles have been reported by many authors who used a normalization procedure (e.g., Wechsler, 1991). The conversion of IQ scores to percentiles in these cases is based on the normal curve. Because DSMD T-scores are based on the actual distribution of Total scale raw scores obtained during standardization and these scores are positively skewed, T-score to percentile rank conversions based on the normal curve technically do not apply to the DSMD, although the differences between the two are not large (Naglieri et al., 1994).

Reliability

Internal Consistency Reliability. Internal consistency reliability of the DSMD was determined using Cronbach's alpha for each of the 10 scales (Naglieri et al., 1994). The median reliability coefficients (across rater and gender) provided in Table 18.1 show that the scale has excellent internal consistency reliability. These data are especially important because they are used to obtain the standard errors of measurement, which in turn are used to guide interpretation of the scales. For example, standard errors of measurement are used to obtain the values needed to judge the significance of differences between scores obtained by different raters, between scale scores for an individual child, and between scores obtained over the course of treatment. Each of these issues is discussed more fully in the Interpretation section of this chapter.

Test-Retest Reliability. Test-retest reliability was examined from data obtained from the same rater for the same individual on two occasions. The magnitude of the obtained value informs about the degree to which random changes influence the scores (Anastasi, 1988). Two test-retest reliability studies were conducted for the DSMD: one with a clinical sample, the other with a regular education sample.
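The link between internal consistency and the standard error of measurement can be sketched in Python. Cronbach's alpha is computed here from its textbook definition, and the standard error uses the conventional formula SEM = SD × sqrt(1 − r); whether the manual derives its values in exactly this way is an assumption. The example values (a T-score standard deviation of 10 and a Total scale reliability of .98) are taken from this chapter; the function names are illustrative.

```python
from math import sqrt
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents' item-score lists."""
    k = len(item_scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(sd, reliability):
    """Conventional SEM: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


# T-scores have SD = 10; the Total scale median alpha is reported as .98.
sem_total = standard_error_of_measurement(10, 0.98)
print(round(sem_total, 2))
```

A smaller standard error means that a smaller difference between two T-scores is needed before the difference can be treated as more than measurement error.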
TABLE 18.1
Median DSMD Reliability Coefficients Across Rater and Gender

Scale                  5-12 years   13-18 years
Externalizing             .97          .94
  Conduct                 .96          .97
  Attention               .84          n/a
  Delinquency             n/a          .75
Internalizing             .94          .96
  Anxiety                 .88          .84
  Depression              .89          .93
Critical Pathology        .90          .93
  Autism                  .90          .88
  Acute Problems          .78          .90
Total Scale               .98          .98

Participants in the clinical study (n = 48) attended psychiatric day or residential treatment programs at Devereux facilities and represented a wide range of severe psychiatric disorders. Two ratings by the same rater were obtained for each participant over a 24-hour interval. Teachers provided paired ratings for 30 participants who were mostly males (67%) and White (83%), who ranged in age from 10 to 17 years (M = 14.5). The median scale test-retest reliability coefficient was .81. At the composite level, the reliabilities were

.89 for Externalizing, .85 for Internalizing, .91 for Critical Pathology, and .90 for the Total scale. Similar results were obtained for a regular education sample from public schools (Naglieri et al., 1994).

Interrater Reliability. Interrater reliability for the DSMD was also examined for a sample of 45 children evaluated by their teachers and teacher's aides. The children resided in the inpatient unit of a children's psychiatric hospital in the mid-Atlantic area. The paired teachers and teacher's aides completed ratings for the same child within a 1-week period. The results of this study indicate that the DSMD has good interrater reliability with a clinical population. The interrater reliability coefficients were .66 (Conduct), .55 (Attention), .48 (Anxiety), .52 (Depression), .46 (Autism), .44 (Acute Problems), .61 (Externalizing), .44 (Internalizing), .45 (Critical Pathology), and .52 (Total scale). All of the coefficients are significant (p < .01; Naglieri et al., 1994). The median coefficient for the six scales is .50, and for the three composites, .45. Coefficients of this magnitude are typical of those found in interrater studies with behavior rating scales. For example, in their review of 119 articles reporting interrater reliability findings with behavior rating scales, Achenbach, McConaughy, and Howell (1987) found a mean correlation coefficient of .60 with similar informants (e.g., teachers and teacher's aides).

Validity

The validity of the DSMD was examined through an extensive program designed to evaluate several types of validity evidence. Content-related validity was assessed to examine the extent to which the items included in the test represent the domain(s) being assessed. Construct-related validity was studied to determine the extent to which the DSMD measures the relevant theoretical constructs or traits.
Especially important was criterion-related validation, which involved the examination of how the test is related to an individual's performance within a particular domain. These dimensions of validity are summarized in the following sections. For more details, interested readers should consult chapter 4 of the DSMD manual (Naglieri et al., 1994).

Construct Validity. The DSMD items were subjected to a series of factor analyses so that the number of factors that best describes the underlying relations among the items could be identified. Analyses were conducted for the 5 to 12 and 13 to 18 age groups separately. Each correlation matrix was subjected to principal component analysis to obtain an indication of the size of the eigenvalues and guidance on the number of factors to consider. Next, principal factor analysis was used with multiple R² in the diagonal and varimax rotation.

Factor Analysis. The results of the factor analyses for the 5 to 12 age group indicated a large first factor (eigenvalue of 34.6) and eight additional factors with eigenvalues greater than 1.0 (Naglieri et al., 1994). Following this first step, four- through nine-factor solutions were examined. First a nine-factor solution was examined, then eight, followed by seven, and so on until a factor solution that produced the clearest groups of items was found. The solutions with more than six factors had no items with their highest loadings on the sixth, seventh, eighth, or ninth factors. This evidence of overfactoring suggested that the six-factor solution was the best solution. This solution also contained the closest approximation of simple structure with the greatest number of factors that were

interpretable. With this solution, nearly all the items loaded decisively on a factor, and seldom did an item's loading on any other factor approach its highest loading.

The results of the factor analyses for the 13 to 18 age group were remarkably similar to those obtained for the younger sample. Initial results indicated there was a large first factor (eigenvalue of 33.6) and eight additional factors with eigenvalues greater than 1.0 (Naglieri et al., 1994). Solutions with more than six factors had no items with their highest loadings on the sixth, seventh, eighth, or ninth factors. Also, with the six-factor solution, the items had high loadings on one factor and low loadings on the other factors. Therefore, the six-factor solution appeared to be the best solution. In addition, the six-factor solution was easily interpretable from a conceptual framework based on the present understanding of developmental psychopathology.

Confirmatory Factor Analysis. The DSMD organization of six scales into three composites was evaluated with confirmatory factor analyses. The LISREL-7 structural equation modeling program (Joreskog & Sorbom, 1989) was employed to contrast models of one, two, and three composite factors with a null-factor model (Naglieri et al., 1992). The organization of the six DSMD scales was examined for the 5- to 12-year-olds and 13- to 18-year-olds in the standardization sample, and several indices of model-data fit were computed (Marsh, Balla, & McDonald, 1988). These included the LISREL goodness-of-fit index (GFI), the goodness-of-fit index adjusted for degrees of freedom (AGFI), the root mean squared residual (RMSR) index, and the Bentler (1990) comparative fit index (CFI). The results of the fit statistics appear comparable and within acceptable ranges for all three models. Logical and theoretical decisions, as well as the statistical quality of the fit, drove the selection of a three-factor composite model.
For children age 5 to 12, the three-factor model yielded a superior GFI (.989), AGFI (.963), and CFI (.993) relative to the alternative models. For adolescents from age 13 to 18, the three-factor solution yielded a higher GFI (.931) and CFI (.935), but a somewhat reduced AGFI (.758) relative to the alternative models. These results provide strong support for the organization of the six scales into three composites. Together with expectations based on current understanding of mental disorders, both statistical and theoretical support for organization of the DSMD was found (Naglieri et al., 1994).

Criterion-Related Validity

Diagnostic Criterion Groups for Clinical Samples. The sensitivity of the DSMD's six scales and three composites to differences between samples provides important criterion-related evidence related to the diagnostic utility of the scales. The degree to which DSMD scores assist in this process provides a measure of criterion-related validity (Anastasi, 1988) and is especially important in validating the DSMD. Naglieri et al. (1994) provided considerable information about several investigations that compared the mean scale, composite, and Total scale scores for groups of patients with specific psychiatric diagnoses. The six groups included diagnoses of conduct, attention deficit hyperactivity, anxiety, depressive, autistic, and psychotic disorders. These diagnoses are among the most prevalent child and adolescent disorders.

The subjects in the diagnostic groups criterion validity studies were carefully selected from a large pool of clinical samples. Participants were included if they had only one DSM-III-R psychiatric diagnosis from one of the six groups already noted. Over 750 cases were reviewed to obtain

the 128 cases. Each child or adolescent was rated by a teacher or parent, and the results are graphically presented in Fig. 18.2 and discussed here.

Fig. 18.2. Mean scale, composite, and total scores for six diagnostic groups.

The results for the conduct disorder sample are clear. The Externalizing composite mean score of 75 is more than one standard deviation greater than the next highest composite mean score (Critical Pathology, 62). The profile of scale mean scores is characterized by elevations on the Conduct scale (M = 71) and the Delinquency scale (M = 75), as would be expected in adolescents with conduct disorders. The third highest scale mean score, 65 on the Acute Problems scale, is the result of items on that scale associated with severe conduct disorders (e.g., "set or threaten to set a fire, hurt or torture animals, run away from home").

The children in the attention deficit hyperactivity disorder (ADHD) sample earned a Total scale mean in the Very Elevated range (73). The highest composite mean score (75) is on the Externalizing composite and is consistent with an ADHD diagnosis. The highest scale mean scores (73 and 73) were found on the Conduct and Attention scales, respectively. The relatively high score on the Autism scale is attributable to a small number of items on that scale related to ADHD (e.g., "show a lack of fear of getting hurt in dangerous activities," "become easily overexcited").

The anxiety disorders group included diagnoses of posttraumatic stress disorder, separation anxiety disorder, obsessive-compulsive disorder, and overanxious disorder and was considered a more heterogeneous group. The Total scale mean for this group is 68, which is in the Elevated range. The sample earned its highest composite mean score (68) on the Internalizing composite, which is not unexpected. The sample earned high scores on the Depression (68), Acute Problems (67), and Anxiety (65) scales. The profile for the anxiety disorders

sample is not characterized by distinct scale elevations as is apparent for the other groups, which likely reflects the more heterogeneous nature of this sample.

The depressive disorders group was a relatively homogeneous sample, with each participant having a single diagnosis of major depression. The Total scale mean score for this group is in the Very Elevated range (73). The Internalizing composite mean score of 74 is also in the Very Elevated range. The profile of scores shows a pattern with a distinct elevation on the Depression scale (79), which is clearly consistent with this group's diagnoses.

The results for individuals with autism show that the group had a high Total scale mean (70), which is in the Elevated range. The Critical Pathology composite, which incorporates the Autism scale, has the highest mean score (73). The profile of the six scale scores shows distinct elevations on the Autism scale (79) and the Depression scale (74). The high score on the Depression scale is logical because this scale includes items related to social withdrawal and isolation, which are key clinical features of autism.

The psychotic disorders group included persons with undifferentiated schizophrenia, psychotic disorder not otherwise specified, paranoid schizophrenia, and brief reactive psychosis. The group's Total scale mean (77) is in the Very Elevated range and is the highest for all the diagnostic criterion samples. The Critical Pathology composite mean score of 79 is also in the Very Elevated range. The profile of mean scores, including elevations on the Depression (77), Autism (76), and Acute Problems (76) scales, reflects the variety of symptoms associated with the different phases in the clinical course of schizophrenia.

Summary of Clinical Group Studies. The Total scale mean scores of the six diagnostic groups are all elevated, which indicates that the groups are comparable in their total level of pathology.
For each of the six samples there were appropriate elevations in the composite scales. For example, Externalizing was elevated for the conduct disorder and attention deficit hyperactivity disorder samples, Internalizing was elevated for the anxiety disorders and depressive disorders samples, and Critical Pathology was elevated for the autistic and psychotic disorders samples. The profiles are pronounced for the five diagnostic groups that were characterized by relatively homogeneous diagnoses (conduct disorder, attention deficit hyperactivity disorder, depressive disorders, autistic disorder, and psychotic disorders). For the anxiety disorders sample, the complexity of the profile is consistent with the presence of diverse diagnoses. In conclusion, "these data provide strong evidence that the DSMD is sensitive to differing psychiatric diagnoses and that the DSMD can contribute to psychological and psychiatric assessment, diagnosis, and treatment planning" (Naglieri et al., 1994, p. 79).

Diagnostic Criterion Groups for Special Education Samples

In addition to studying several samples of individuals in clinical settings, Naglieri et al. (1994) also provided results of research with children in special education settings. The results of these studies are provided here and summarized in Table 18.2 by study and age of the sample. The table provides two important statistics: the percentage of correct classification and the d-ratio. In all instances, the experimental and control samples were matched on the basis of age, sex, race, and geographic region. In every study, mean scores were compared between the samples, and the percentage of correct classification and the d-ratio between the experimental

and control samples was computed. The percentage of correct classification was determined using a cutoff score of 60 on the DSMD Total scale. The efficiency of differentiation between hospitalized individuals and those in the control sample was evaluated by chi-square analysis. That is, all the subjects in the experimental and control groups who had a Total scale T-score of 60 or more (i.e., 1 or more SDs above the mean) and those with a T-score less than 60 were identified as having significant emotional problems or not, respectively. These results were compared with their actual group membership (experimental vs. control). This provided a measure of the percentage of the total sample correctly identified by the DSMD Total scale score. Additional statistical results, such as multivariate analysis of variance (MANOVA), chi-square analyses, and so on are provided in the DSMD manual (Naglieri et al., 1994). Each study is described next.

TABLE 18.2
DSMD Total Scale Classification Accuracy for Several Samples of Children Identified as SED and Regular Education by Age Group

                                   Ages 5-12                 Ages 13-18
Study                        Percent  d-ratio    n      Percent  d-ratio    n
SED National Special Ed        72.2     1.0     223       65.5     1.4     142
SED Local Special Ed           78.1     1.3      64       69.0     1.2      84
SED/LD                          n/a     n/a     n/a       87.5     1.2      88
SED in Psychiatric             69.6     1.5     112       89.7     2.5      58
SED in Residential             81.1     1.7     106       77.8     1.9     180
SED in Clinical Treatment      77.5     1.8     209       72.7     1.6     132
Average                        75.7     1.5               77.0     1.6
Total n of cases                                714                        684

Note. From DSMD manual, p. 99. Copyright © 1992, The Devereux Foundation. Adapted with permission.

SED in National Special Education Settings. Seriously emotionally disturbed (SED) students who were receiving public special education services were contrasted to a matched regular education national sample. Both part- and full-time students identified by the local school systems as having a serious emotional disturbance and placed in special educational settings were included in this investigation.
The regular education control sample for this investigation was selected from the standardization sample matched on the basis of age, race, ethnicity, and gender.

SED in a Local Special Education Setting. Seriously emotionally disturbed students who were receiving public special education services were compared with a matched regular education control sample. Both groups were from the greater Philadelphia area. The control sample of regular education students was matched to the SED sample on the basis of age, race, ethnicity, and gender.

SED/LD in Private Special Education. Dually diagnosed learning disabled (LD) and emotionally disturbed students in private educational settings were compared with a normal control sample. The samples were similar on the basis of age, race, ethnic origin, and gender. The students with learning and emotional problems attended a private school, where they received specialized educational intervention as well as group and individual counseling. Students with learning and emotional problems were included

in this sample if school records contained evidence of both educational and psychological problems. Criteria for selection included psychological treatment in school or community-based settings or a history of behavioral problems.

SED: Psychiatric. Severely emotionally disturbed students in psychiatric hospitals were compared with a normal control group selected from the standardization sample on the basis of age, race, ethnicity, and gender. Two age groups, children from age 5 to 12 and adolescents from age 13 to 18, were included in this study; each group included an experimental (SED) and a control sample.

SED: Residential. Severely emotionally disturbed children and adolescents in residential treatment were compared with a matched normal control sample. The experimental group included individuals in long-term psychiatric treatment for severe psychological problems. The clinical sample was matched with a control group selected from the standardization sample on the basis of age, race, region (all from the northeast region), ethnicity, and gender. Both 5 to 12 and 13 to 18 year age groups were included.

SED in Clinical Treatment. Clinically diagnosed individuals receiving special education services were compared with a normal control sample that participated in this study. The clinical sample was composed of Devereux Foundation clients served in residential and day special education and psychiatric treatment programs in Arizona, Florida, Massachusetts, New Jersey, and Texas. They exhibited a wide range of psychopathological disorders. The comparison group was drawn from public facilities in Pennsylvania and Kentucky. Any individual who was receiving counseling, psychotherapy, or special education services was excluded from the normal comparison group.

A summary of these results is provided in Table 18.2. Included are the Total scale classification accuracy rates by age level for each of the six studies.
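The two statistics summarized in Table 18.2 can be sketched in Python. The classification rule follows the description given earlier (a Total scale T-score of 60 or more is treated as indicating significant emotional problems). The d-ratio is assumed here to be the difference between the group means divided by the pooled standard deviation; the chapter does not give the manual's exact formula. The T-scores below are invented for illustration, and the function names are not from the manual.

```python
from math import sqrt
from statistics import mean, variance


def classification_accuracy(clinical_t, control_t, cutoff=60):
    """Percentage of all cases whose side of the cutoff matches their group."""
    hits = sum(t >= cutoff for t in clinical_t)             # clinical, flagged
    correct_rejections = sum(t < cutoff for t in control_t)  # control, not flagged
    total = len(clinical_t) + len(control_t)
    return 100 * (hits + correct_rejections) / total


def d_ratio(clinical_t, control_t):
    """Assumed definition: mean difference divided by the pooled SD."""
    n1, n2 = len(clinical_t), len(control_t)
    pooled_var = (
        (n1 - 1) * variance(clinical_t) + (n2 - 1) * variance(control_t)
    ) / (n1 + n2 - 2)
    return (mean(clinical_t) - mean(control_t)) / sqrt(pooled_var)


# Invented Total scale T-scores for five clinical and five control cases.
clinical = [72, 65, 58, 80, 61]
control = [48, 52, 61, 45, 55]

accuracy = classification_accuracy(clinical, control)
effect = d_ratio(clinical, control)
print(accuracy, round(effect, 2))
```

In this invented example one clinical case falls below the cutoff and one control case falls above it, so 8 of the 10 cases are classified in line with their actual group membership.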
The results clearly indicate that the DSMD did an excellent job of separating children with various degrees of mental disorders from normal control samples. For age 5 to 12, the accuracy rates varied from 69.6% (Study 4) to 81.1% (Study 5). For ages 13 to 18, the lowest accuracy rate was 65.5% (Study 1), and the highest, 89.7% (Study 4). The average total classification rates are 75.7% and 77.0% for the 5 to 12 and 13 to 18 age levels, respectively (Naglieri et al., 1994). The overall d-ratio values at the two age levels were also very large (averaging 1.5 and 1.6, respectively; see Table 18.2). These d-ratios indicate the extent to which the group mean scores for children and adolescents with serious emotional problems differ from those for individuals who do not have emotional problems. These results provide considerable support for the criterion validity of the Devereux Scales of Mental Disorders as a measure that can separate those who have emotional disorders from those who do not (for more details, see Naglieri et al., 1994).

Interpretation of the DSMD

Interpretation of the DSMD involves a carefully prescribed sequence of examination that moves from most general (composite scales) to most specific (item level). Although attention is paid to the three basic elements of the scale (Externalizing, Internalizing, and Critical Pathology), the scales within these three composites are also carefully examined. Finally, analysis of the individual item scores is used to facilitate treatment planning. Follow-up analysis includes comparisons of pre- and posttest scores on the three composites and six scales.


Definition of Composite Scales

Externalizing behaviors involve problems that relate to a person's conflicts with others in the environment. A child or adolescent with a high score on the Externalizing composite will likely be seen as aggressive, disobedient, annoying, disruptive, undercontrolled, restless, or inattentive. This composite includes Conduct problems (disruptive and hostile behaviors in which the basic rights of others are violated and age-appropriate norms are disregarded), Attention problems (difficulty with concentration and distractibility), and Delinquency problems (behaviors considered to be out of accord with societal standards or the law).

Internalizing problems involve excessive worrying, social withdrawal, anxiety, and overcontrol or inhibition associated with the individual's state of well-being. Included are behaviors reflecting Anxiety (excessive worry, fears, tension, low self-concept, and somatic complaints) and signs of Depression (withdrawal from social contacts, depressed mood, and decreased interest or pleasure in activities).

Critical Pathology behaviors represent severe mental, behavioral, or emotional disturbances. They include behaviors that are often symptomatic of an individual who may be out of contact with reality, resulting in disruption of daily functioning. The behaviors in the Critical Pathology composite scale are not seen in normal individuals and are problematic if they occur at all. They include behaviors that indicate impaired social interaction and communication, such as disorganized and echolalic speech, inappropriate social interactions, and odd motoric responses (Autism scale). The hallucinatory and bizarre behaviors described in the Acute Problems scale are typical of individuals with severe psychological disturbances that can be described as psychotic. Descriptions of the DSMD Total scale and its composites and scales are summarized in Fig. 18.3.
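The composite definitions above can be summarized as a small lookup table. The sketch below is only an illustration: the dictionary layout, the keys "child" and "adolescent", and the helper name scales_for are assumptions made here, not part of the DSMD materials. The scale names and age bands are those given in this chapter.

```python
# Composite-to-scale structure as described in this chapter.
# The data layout and all identifiers are illustrative assumptions.
DSMD_STRUCTURE = {
    "child": {  # ages 5-12
        "Externalizing": ["Conduct", "Attention"],
        "Internalizing": ["Anxiety", "Depression"],
        "Critical Pathology": ["Autism", "Acute Problems"],
    },
    "adolescent": {  # ages 13-18
        "Externalizing": ["Conduct", "Delinquency"],
        "Internalizing": ["Anxiety", "Depression"],
        "Critical Pathology": ["Autism", "Acute Problems"],
    },
}


def scales_for(form):
    """Return the six scale names for a form, in composite order."""
    return [scale
            for scales in DSMD_STRUCTURE[form].values()
            for scale in scales]


print(scales_for("child"))
print(scales_for("adolescent"))
```

The only difference between the two forms is the second Externalizing scale: Attention for ages 5-12 and Delinquency for ages 13-18.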
DSMD Scores

The DSMD Total scale, three composites, and six scales all have a mean of 50 and a standard deviation of 10. The Total scale T-score is the most reliable way of describing a child's or adolescent's performance across the areas measured by the six scales of the DSMD. The scores on the composites (Externalizing, Internalizing, and Critical Pathology) and scales (Conduct, Attention, Anxiety, Depression, Autism, and Acute Problems for ages 5-12 and Conduct, Delinquency, Anxiety, Depression, Autism, and Acute Problems for ages 13-18) describe the individual's performance at more specific levels and are therefore most informative. These scores are further described using percentile and categorical ratings. Naglieri et al. (1994) suggested that a cutoff score of 60 (1 SD above the mean) could be used for determining when a person's scores on the DSMD depart substantially from the average for the standardization sample and therefore are indicative of significant problems reported by the rater.

Summary of Interpretive Steps for the DSMD

The following steps should be followed when interpreting the DSMD. These steps are listed first, then described in some detail. The steps, presented graphically in flow charts in Figs. 18.4 and 18.5 (see Naglieri et al., 1994, for more information), are as follows:

1. Examine the Total scale and the Externalizing, Internalizing, and Critical Pathology composites to determine if any fall above a T-score of 59.


2. Compare the Externalizing, Internalizing, and Critical Pathology composites to determine if any one of these is significantly higher than the child's or adolescent's mean.
3. Compare the Conduct, Attention (age 5-12), Delinquency (age 13-18), Anxiety, Depression, Autism, and Acute Problems scales to determine if any one of these is significantly higher than the child's or adolescent's mean.
4. Analyze the items to determine which problem items were rated significantly high for the child and warrant consideration in treatment planning.
5. Compare ratings obtained from parents and teachers to determine if the child is rated differently in different environments or by different raters.
6. Compare pre- and posttest scores following treatment to determine effectiveness of interventions.

Fig. 18.3. Description of the DSMD scales and composites.

Step 1: Initial Examination of Total and Composite Scores. Step 1 is a descriptive one that involves examination of the T-scores on the broad scales. The cutoff score of 60 is suggested as the point at which to conclude that significant problems exist, but it should not be used rigidly. Practitioners should consider the advantages and disadvantages of higher or lower values based on the aims of the evaluation and the specific populations of interest.
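Step 1 amounts to a simple threshold check on the broad T-scores. The following is a minimal sketch, assuming the suggested cutoff of 60 as the default and leaving it adjustable, as the text recommends. The function name flag_elevated is hypothetical, and the example scores are the Total and composite T-scores from the case study reported later in this chapter.

```python
# Suggested cutoff: 1 SD above the normative mean of 50 (Naglieri et al., 1994).
DEFAULT_CUTOFF = 60


def flag_elevated(t_scores, cutoff=DEFAULT_CUTOFF):
    """Return the names of scales whose T-score is at or above the cutoff.

    `cutoff` is adjustable because the chapter advises against applying
    the value of 60 rigidly.
    """
    return [name for name, t in t_scores.items() if t >= cutoff]


# Total and composite T-scores from the chapter's case study.
broad_scores = {
    "Total": 57,
    "Externalizing": 55,
    "Internalizing": 67,
    "Critical Pathology": 47,
}

print(flag_elevated(broad_scores))             # standard cutoff of 60
print(flag_elevated(broad_scores, cutoff=55))  # a more lenient cutoff
```

With the standard cutoff only the Internalizing composite is flagged. Lowering the cutoff to 55 also flags the Total scale and the Externalizing composite, which illustrates how the choice of value changes the screening result.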


Fig. 18.4. Flow chart for interpretation of the DSMD.

Steps 2 and 3: Intraindividual Comparisons. Comparisons of the three composites as well as the six scales are accomplished using intraindividual comparisons. For example, comparing the six scale T-scores allows the clinician to determine if one or more of those scores are significantly elevated relative to the child's or adolescent's average of those scores. The purpose of such an analysis is to identify any scale (or composite) scores that are significantly greater than the child's or adolescent's mean and thereby to pinpoint particular areas of concern. Because the T-scores for the six scales (or for the three composites) are compared for an individual for a single rating, this is an intraindividual approach to interpretation. This interpretive perspective, often used in interpreting intelligence test results (Naglieri, 1993; Sattler, 1988), is based on the comparison of each scale T-score to the individual's own mean T-score and was originally described by Davis (1959).
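The intraindividual comparison can be sketched in a few lines. The example below is an illustration only: the function name and return format are assumptions, and the T-scores and 90% required differences are the ones this chapter reports in Table 18.3 for an 8-year-old rated by a parent. In practice the required differences come from Table B.1 of the DSMD manual.

```python
def intraindividual_comparison(t_scores, required_diff):
    """Compare each scale T-score with the individual's own mean T-score.

    Returns the mean and, per scale, a (difference, significant) pair.
    Negative differences are never significant, because the aim is to
    identify high scores only.
    """
    mean_t = sum(t_scores.values()) / len(t_scores)
    results = {}
    for scale, t in t_scores.items():
        diff = t - mean_t
        significant = diff > 0 and diff >= required_diff[scale]
        results[scale] = (diff, significant)
    return mean_t, results


# Values reported in Table 18.3 of this chapter.
t_scores = {"Conduct": 58, "Attention": 68, "Anxiety": 55,
            "Depression": 53, "Autism": 54, "Acute Problems": 45}
required_90 = {"Conduct": 6.1, "Attention": 9.9, "Anxiety": 8.3,
               "Depression": 8.7, "Autism": 8.3, "Acute Problems": 9.9}

mean_t, results = intraindividual_comparison(t_scores, required_90)
print(f"Child's mean T-score: {mean_t}")
for scale, (diff, significant) in results.items():
    label = "Significant" if significant else "Not Significant"
    print(f"{scale:15s} {diff:6.1f}  {label}")
```

For these data the mean is 55.5 and only the Attention scale (difference of 12.5 against a required 9.9) is flagged, matching the result given in Table 18.3.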


Fig. 18.5. Flow chart for interpretation of the DSMD.

The procedure for identifying significant differences between an individual's scale T-scores and the mean of scale T-scores is basically the same as the procedure for comparing composite T-scores with the mean of composite T-scores. The following example illustrates the steps for scale T-scores using the data provided in Table 18.3. This illustration involves an 8-year-old girl rated by her mother (parent rater). To determine if any of the six T-scores is significantly greater than the girl's mean T-score, refer to the values in Table B.1 from the DSMD manual (Naglieri et al., 1994, pp. 198-199) and then:

1. Calculate the average of the six T-scores (in this case, 55.5). Subtract the mean from each of the scale T-scores to obtain the difference score (the difference between each score and the


child's average score). A positive sign indicates the scale T-score is above the child's mean; a negative sign indicates the scale T-score is below the mean. Ignore negative difference scores because the aim of the intraindividual comparisons is to identify high scores only.

2. Compare the difference scores to the values provided in Table B.1 of the DSMD manual. A difference score equal to or greater than the tabled value (and positive in sign) indicates that the scale T-score is significantly greater than the person's mean scale T-score. In this example, the Attention scale is an area of particular difficulty for this child, relative to her other scale T-scores.

TABLE 18.3
Intraindividual Interpretative Example

Scale             T-Score   Difference from   Difference Required   Result
                            Child's Mean      at 90%
Conduct           58          2.5             6.1                   Not Significant
Attention         68         12.5             9.9                   Significant
Anxiety           55         -0.5             8.3                   Not Significant
Depression        53         -2.5             8.7                   Not Significant
Autism            54         -1.5             8.3                   Not Significant
Acute Problems    45        -10.5             9.9                   Not Significant
Child's Mean      55.5

Note. Difference = T-score minus the child's average T-score. Negative difference scores are ignored because the goal of intraindividual comparisons is to identify high scores.

Step 4: Problem Item Identification. Identification of specific problem items is the most detailed level of DSMD analysis. This level of analysis reflects specific behavioral difficulties that contribute to the overall T-scores for the scales and composites. The method of identifying specific problem items follows an approach similar to the one used by Naglieri, McNeish, and Bardos (1991). The method specifies that an individual item score is considered significant when it exceeds the normative mean item score plus one standard deviation. The values needed to identify a particular item as significant are provided in the DSMD manual (Tables C.1 and C.2) and noted on the DSMD record form.
These scores indicate that the rater reported the child or adolescent is experiencing significant difficulty with a particular behavior to the extent that the item score is outside of the normal range. Problem item identification provides important information for interpretation and intervention. First, problem item identification allows the practitioner to explain the nature of the behaviors that may contribute to a high score, for example, in the Conduct scale. The items rated at the problem item level are most important and specifically clarify the nature of the reported problem. Second, problem item analysis yields information that is very relevant for treatment planning. The items that were identified can be particularly useful for the development of behavioral goals and/or objectives in individual treatment or education plans.

Step 5: Comparison of Scores Across Raters. The standard error of the difference between scores must be taken into consideration when a practitioner compares DSMD scores obtained from different raters (Naglieri et al., 1994). Total scale T-scores based on ratings from a child's mother and father, for example, will differ as a function of measurement error. In order to account for measurement error, the DSMD manual provides the standard error of the difference between two scores as a method of


determining when T-scores based on ratings by both parents, by two teachers, or by a parent and a teacher differ significantly. Naglieri et al. (1994) provided the differences between T-scores needed for significance based on ratings obtained from two raters. Practitioners simply compare the two scores and contrast the result to tabled values (at the 90% level of significance). For example, suppose a 16-year-old female earned T-scores of 60 and 71 on the Anxiety scale when rated by a parent and a teacher, respectively. The 11-point difference is significant because it exceeds the required value of 10 reported in Table D.3 on page 207 of the DSMD manual. The next step in interpretation of the DSMD is conducted after treatment is provided. For this reason, the sixth step in interpretation is presented in the next section.

Use of the DSMD for Treatment Monitoring

The final step in the interpretation of the DSMD is the examination of change in scores over the course of treatment. This section, therefore, begins with the examination of pre- and posttest scores. The method described here is intended to provide a statistically supported evaluation of treatment effectiveness.

Step 6: Pretest-Posttest Comparisons

Continuing with the last step in Fig. 18.5, changes in a child's or adolescent's T-scores over the course of treatment can be evaluated using the DSMD. To do so, the statistical significance of the difference between pretest and posttest scores should be determined to account for psychometric issues. Naglieri et al. (1994) used a method described by Atkinson (1991), which requires the comparison of the posttest score with a range of scores that represents the variability expected from both regression to the mean and measurement error. These ranges are provided in Appendix E of the DSMD manual.
For example, if an 8-year-old male is initially rated by a teacher and earns a Conduct score of 76, a range of 72 to 79 provides the scores that can be expected due to measurement error and regression effects. If the boy was rated by the teacher after treatment and earned a score less than 72, then the posttest T-score would fall outside the range provided in the table, and the change would indicate a significant pre-post treatment effect.

The evaluation of the effectiveness of treatment outcomes based on the DSMD should also involve the dual criteria of statistically reliable and clinically meaningful change, as recommended by Jacobson and Truax (1991). The first criterion, statistically reliable change, is addressed through use of the standard error of prediction. The second is the clinical meaningfulness of the change: the closer the posttest score is to the normative mean, the more meaningful the change. Posttreatment scores are considered optimal if they fall below 60 (1 SD above the mean). Posttreatment scores that are significantly lower than the initial scores, but still above 60, indicate that improvement has been shown but behavioral problems are still present. For example, if a 10-year-old male obtains a pretreatment Total DSMD T-score of 70 as rated by his teacher and a posttreatment score of 64, the difference is significant (range = 65-74 for a pretest Total scale score of 70). Because the posttest score (64) falls below this range, there has been a significant change in behavior as reflected by the DSMD score. Because the posttreatment


score is above 60, problems are still indicated, but they are of lesser severity than initially found.

DSMD and Treatment Monitoring: Conclusions

A crucial decision in the design and implementation of treatment monitoring and outcome efforts is the selection of appropriate dependent measures. To be maximally useful, the measures need to satisfy a variety of technical and practical criteria (Newman & Ciarlo, 1994). Perhaps most important, the measures need to be directly related to the goals of the treatment plan (i.e., have high content validity). The monitoring or outcome instrument should examine "the most important and frequently observed symptoms, problems, goals or other domains of change for the (individual or) group" (Ciarlo, Brown, Edwards, Kiresuk, & Newman, 1986, p. 26). In addition, the measures need to be standardized and well developed, and they need to have strong psychometric properties. Of special relevance is the need for the measures to be reliable and sensitive indices of changes in the child's or adolescent's behavior over the course of treatment.

Treatment monitoring provides the client, the family, the therapist, and the funding source with reliable and clinically meaningful "real-time" data to gauge the ongoing success of the planned interventions. In this way, all parties are able to evaluate progress. Youngsters get feedback on the relative success of their efforts in working toward the agreed-on treatment goals. The parents similarly receive data on their youngsters' progress. Perhaps most important, the therapist is provided with reliable, timely feedback in the form of specific behavioral data on how well the treatment is progressing. The monitoring allows the therapist to make midcourse adjustments, which reflects sound clinical practice and is the hallmark of a continuous quality improvement philosophy (Pfeiffer & Shott, 1996; Vermillion & Pfeiffer, 1993).
The DSMD meets all these criteria and is therefore an excellent method for treatment monitoring.

Use of the DSMD for Treatment Planning

Until recently, the history of behavioral health care has been marked by a lack of attention to formalized treatment planning. Previously, most mental health providers offered at best only a sketchy and vague plan that did not include either a clinical formulation or implementation plan for the treatment, much less a set of measurable treatment goals. However, the health care field has entered an era of accountability (Linder, 1991). Governmental and private funders of mental health care are increasingly demanding outcome data to justify expenditures and select providers. Mirin and Namerow (1991) predicted that "reliable data about the outcomes of care will be essential in demonstrating that particular types of mental health care are worth paying for" (p. 1008). Reliable outcome data necessitate a more carefully crafted treatment plan that meets the client's needs, is individualized and specific, and sets measurable goals and objectives that can be used to chart the client's ongoing progress and ultimate outcome.

A detailed written treatment plan benefits not only the client, therapist, and insurance company, but also the psychotherapeutic process. The client is served by a written treatment plan in that it stipulates the focus of the treatment and the specific outcomes that the client and therapist are jointly and collaboratively working toward


(Jongsma, Peterson, & McInnis, 1996). Therapists benefit because the plan serves as a road map that guides the treatment process and selection of therapeutic interventions, and keeps them on course working toward the resolution of the agreed-on therapeutic goals.

The DSMD is only one important instrument used in the development of a treatment plan. The foundation of a well-conceived treatment plan includes a thorough biopsychosocial assessment, with particular attention to both historical and present factors that contribute to the client's presenting problem, as well as the anticipated resolution of the problem. In working with children and adolescents, in particular, the clinician will want to include pertinent information on family issues, medical and early developmental issues, school and academic concerns, current stressors and resources (both individual and within the family and community), current physical health, and a host of case-relevant psychosocial, emotional, behavioral, and interpersonal factors. The DSMD can be an important measure of a youngster's present psychiatric status that helps guide the development of the problem definition, case formulation, planned intervention, and anticipated treatment goals.

Treatment planning begins with the identification of a very select number of the most significant problems. The clinician benefits from identifying no more than one or two highest priority primary problems (see the DSMD intraindividual method), with perhaps an equally limited number of secondary significant problems (see DSMD scale and problem item analysis). This does not imply that the client is not presenting with a multitude of significant behavioral concerns. Rather, it simply implies that the focus of treatment needs to start somewhere, and that it is best to begin with a prioritization of issues so that the youngster, family, and therapist can together work toward the resolution of a select number of specific issues.
The DSMD is very well suited to play a key role in this process. The DSMD provides the clinician with a methodology to identify specific symptoms that can guide the clinician in prioritizing behaviors rated as problematic by the youngster's parents and teachers. Additionally, the DSMD affords a linkage with the diagnostic criteria and codes found in the Diagnostic and Statistical Manual of Mental Disorders (APA, 1994), and behaviorally specific statements that can serve as a platform or starting point for personally crafted treatment goals and objectives.

Use of DSMD with Other Data

The DSMD, like any psychological test, is most appropriately used in treatment planning when incorporated with other diagnostic and clinical information. Although the DSMD is an instrument that provides ratings on descriptive behaviors of relatively low inference and high reliability, it is most useful when used in conjunction with other data to corroborate clinical hypotheses. Specifically, the DSMD should be used in concert with information obtained from a developmental, medical, and school history, behavioral observations (preferably conducted in a variety of settings), clinical interview with the child and parents, and other appropriate diagnostic information (e.g., self-report measure, personality test). Ultimately, the various sets of diagnostic information should complement the development of a detailed functional analysis of the youngster's behavior, with particular attention to both strengths and areas of concern, as well as a road map to guide intervention.

The DSMD affords the clinician a broad landscape of topographical descriptors of potential problem behaviors. In essence, the DSMD provides in an efficient and highly


reliable fashion a focus on the "what" of problem behaviors. The clinician still needs to incorporate other information in developing a functional analysis of the problem behaviors, that is, the "what for" that is driving or supporting the child's problems.

Potential Use or Limits for Treatment Planning in a Managed Care Setting

There is considerable utility and value in using the DSMD within a managed care environment, especially because the scale allows the clinician to efficiently assess the client, identify the problem, and formulate a reasonable (i.e., not costly, attainable) and measurable treatment plan. The DSMD is helpful to the clinician because the items readily translate into behaviorally measurable treatment goals. Items can easily be identified as high priority and therefore selected by the clinician, in collaboration with the youngster and parents, as most needing intervention. Additionally, the scale lends itself to evaluation of treatment effectiveness (see DSMD pre-post section).

Use of DSMD for Treatment Outcomes Assessment

The DSMD is ideally suited for use in mental health treatment outcome assessment. As mentioned earlier, the content of the scale is derived primarily from the diagnostic criteria of the Diagnostic and Statistical Manual of the American Psychiatric Association (DSM-IV; American Psychiatric Association, 1994). In addition, it reflects the full range of psychopathology seen in childhood and adolescence, including the more severe disorders that are often missing from other scales. For instance, the DSMD includes items related to stereotypy, echolalia, fire setting, self-stimulatory and self-abusive behaviors, and hurting and torturing animals. A unique feature of the DSMD regarding treatment outcome assessment is the pre-post comparison methodology.
The dual criterion approach, which recognizes behavioral changes that are both statistically significant (i.e., reliable) and clinically meaningful (i.e., socially valid), provides practitioners with an effective treatment outcome methodology. The first aspect of this dual criterion addresses the issue of how large the change in T-scores must be to enable the therapist to conclude that there has been change in behavior, that is, that the pretest-posttest difference reflects "real differences as opposed to ones that are illusory, questionable or unreliable" (Jacobson & Truax, 1991, p. 12). The second aspect of the dual criterion addresses the social validity or real-life meaning of the noted changes in the youngster's behavior. Newman and colleagues (Green & Newman, 1996; Newman & Ciarlo, 1994) provided a useful set of criteria for selecting psychological instruments for treatment outcome assessment. The criteria include relevance to the target group (standardization sample); a simple measure with good instructions and administration manual; a measure with objective referents (for which concrete examples are given at key points on the rating scale); use of multiple respondents; psychometric strength, including reliability, validity, and treatment sensitivity; low cost of the measure; readily understood by nonprofessional audiences; easy feedback and uncomplicated interpretation; useful in clinical services; and compatibility with clinical theories and practices. The DSMD meets all of these criteria and is therefore an excellent tool for treatment outcome assessment.
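The dual criterion can be made concrete with a short sketch. It is an illustration under stated assumptions, not the manual's procedure: the function name evaluate_change is hypothetical, the expected range for a given pretest score must be looked up in Appendix E of the DSMD manual (it is not computed here), and the clinical criterion is simplified to "posttest below the cutoff of 60". The numbers in the usage lines are the two examples given earlier in this chapter.

```python
def evaluate_change(posttest, expected_range, cutoff=60):
    """Apply the two criteria for treatment outcome to a posttest T-score.

    expected_range: (low, high) range of posttest scores attributable to
        measurement error and regression to the mean for the observed
        pretest score; taken from the manual's tables, not computed here.
    Returns (reliable_improvement, below_cutoff).
    """
    low, _high = expected_range
    reliable_improvement = posttest < low  # falls below the expected range
    below_cutoff = posttest < cutoff       # simplified clinical criterion
    return reliable_improvement, below_cutoff


# Chapter example: pretest Total T-score of 70 (range 65-74), posttest 64.
print(evaluate_change(64, (65, 74)))  # reliable change, still above 60

# Chapter example: pretest Conduct T-score of 76 (range 72-79).
print(evaluate_change(71, (72, 79)))  # below 72, so reliable change
print(evaluate_change(75, (72, 79)))  # inside the range, no reliable change
```

A posttest score that satisfies only the first criterion corresponds to the situation described in the text: improvement has been shown, but problems are still present.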


Potential Use for Service Report Cards

Governmental and private funders of mental health care are increasingly relying on monitoring and outcome data to justify expenditures, and to select and continue with providers. Reliable clinical data about the outcomes of mental health care are now essential in demonstrating that particular types of mental health services are worth paying for. Accrediting bodies such as the Joint Commission on the Accreditation of Health Care Organizations are systematizing the collection and reporting of outcome indicators, leading O'Leary (1993) to assert that report card day is coming for health care providers. Individual consumers of mental health services are becoming increasingly sophisticated in demanding demonstrable value for their health care dollar. Because of the strengths of the DSMD already articulated, it can be considered an excellent outcome measure to include in a report card.

Case Study

Selected Background Information

The case of Vincent is marked by considerable familial turmoil. Vincent, age 7, is a boy who has a history of behavior problems that have begun to have significant impact on his performance at school. His anxiety problems may be related to the marital discord of his parents, which has been present for at least the last 6 years, and the recent finalization of the divorce. He now resides with his father. Vincent's mother lives some distance away, in a different state.

DSMD Results

The DSMD Scoring Assistant provides scores based on ratings by Vincent's father. Vincent's overall patterns of behavior were not extreme enough to attain an elevated DSMD rating (see Fig. 18.6). His Total scale T-score of 57 (90% confidence interval = 54-60) is in the borderline range. A borderline score indicates that Vincent may be experiencing problems in certain areas or with specific behaviors. Vincent was rated as exhibiting a higher level of behavioral disturbance than 81% of children.
Significant elevations were found on the Internalizing composite and on the Attention and Depression scales. Vincent's highest composite rating was on the Internalizing composite, a measure of behaviors that reflects the individual's state of psychological well-being. Internalizing behaviors are those that include excessive worrying, social withdrawal, anxiety, and overcontrol or inhibition. On this composite Vincent received a T-score of 67 (90% confidence interval = 61-70), which falls in the elevated range. This score indicates that Vincent is experiencing more symptoms in this area than 93% of children. The Internalizing composite is composed of the Anxiety and Depression scales, on which Vincent obtained T-scores of 51 (90% confidence interval = 45-57) and 82 (90% confidence interval = 72-84), respectively. The Intraindividual Comparison, which compares individual scale


Client Information
Name: Vincent
Gender: Male
Date of Rating: 10/3/97
Date of Birth: 10/3/90
Age: 7 years
Rater: Frank M.
Relationship to Child: Father
Norms Used: Parent

Scale Score Analyses (significant findings are indicated by shaded areas)

DSMD Score Summary

Scale / Composite     Raw Score   Vincent's T Score   90% Confidence Interval   Percentile
Conduct               29          44                  41-48                     30
Attention             13          65                  56-69                     93
Externalizing                     55                  51-58                     72
Anxiety               16          51                  45-57                     62
Depression            37          82                  72-84                     99
Internalizing                     67                  61-70                     93
Autism                8           50                  45-55                     64
Acute Problems        0           45                  38-54                     49
Critical Pathology                47                  42-53                     53
Total                             57                  54-60                     81

* This table illustrates Vincent's functioning in comparison to a normative group. T scores have a mean of 50 and a standard deviation of 10; a score of 60 or higher indicates an area of concern.

Fig. 18.6. Devereux Scales of Mental Disorders: child form, ages 5-12.

and composite scores to Vincent's mean scale and composite scores, indicated significant elevations on the Internalizing composite and the Depression scale (see Fig. 18.7). It is apparent from these data that Vincent is experiencing attention problems as well as signs of depression. His DSMD Attention score of 65 is elevated, but his Depression score of 82 is significantly higher than the mean of his six T-scores on the separate scales of the DSMD (see the section on Intraindividual Comparison). This means that the most severe area of concern is with signs of Depression, which should be the primary focus of intervention. Attention problems, although noted and important, are secondary to problems with Depression. Further analysis of the DSMD Problem Items indicates that Vincent's depressed mood is apparent from high ratings on items involving Somatic Complaints (difficulty sleeping and complaining of physical problems), lowered affect (appearing discouraged, unhappy, etc.), and interpersonal problems such as social isolation, emotional variability, poor self-esteem, and dependency. His high level of anxiety is apparent from his inattention


[Fig. 18.7. Intraindividual comparison. Columns: Vincent's T Score; Difference Score (T Score - Mean T Score); Significant?]


DSMD Considerations

Vincent's behavior should be assessed by multiple raters in multiple environments to examine the consistency of behavioral ratings. These ratings should then be compared using the DSMD Interrater Comparison Procedure (Devereux Scales of Mental Disorders manual, p. 107) to assess the situational specificity or generality of Vincent's behavioral problems. Treatment planning should consider the reports of the following:

1. Somatic Complaints: A medical evaluation may be advisable to eliminate possible physiological determinants of Vincent's physical complaints.
2. Social Isolation: Frank M. reported Vincent to be socially withdrawn. Social skills training may be an appropriate intervention to increase and encourage social contacts.
3. Depressed Affect: Vincent is reported to exhibit behaviors reflecting Depressed Affect, indicating that he may be suffering from a Mood Disorder. This possibility should be investigated using the DSM-IV Inquiry and other appropriate assessment techniques.
4. Vincent is at risk of truancy. Close contact should be maintained with his school.
5. A medical evaluation is recommended to rule out a physical cause for Vincent's somatic complaints.

Vincent's response to interventions should be measured using the DSMD Treatment Outcome Evaluation Procedure (Devereux Scales of Mental Disorders manual, p. 109). Because Vincent's Total scale T-score was not in the clinical range (60 or higher), the effectiveness of interventions should be assessed at the composite, scale, or item level. The following posttest T-scores would indicate reliable improvement: Internalizing composite, 59; Attention scale, 52; Depression scale, 69.

Conclusions

This chapter has provided a summary of information more fully discussed in the DSMD manual (Naglieri et al., 1994). Emphasis in this chapter, as in the DSMD manual, has been on illustrating how the rating scale can be used to assist in the identification of mental disorders.
In addition, it is clear from the information presented that the DSMD gives practitioners and researchers a method that has high reliability, considerable validity support, and numerous interpretation methods. Most important are the psychometric methods for interpretation, including comparison of rater scores, comparisons of scores for an individual child, determination of specific behaviors to target in treatment, and determination of the effectiveness of treatment through examination of pre-post scores. The dual criterion of reliable and clinically meaningful change, in conjunction with other interpretive methods, provides clinicians with a tool to meet today's standards for identification, treatment monitoring, and determination of treatment effectiveness.

References

Achenbach, T.M., McConaughy, S.H., & Howell, C.T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101, 213-232.


American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., rev.). Washington, DC: Author. American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author. American Psychiatric Association Task Force on DSM-IV. (1991). DSM-IV options book: Work in progress (7/1/91). Washington, DC: Author. Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan. Atkinson, L. (1991). Three standard errors of measurement and the Wechsler Memory Scale-Revised. Psychological Assessment, 3, 136-138. Bentler, P.M. (1990). Comparative fit indices in structural models. Psychological Bulletin, 107, 238-246. Ciarlo, J.A., Brown, T.R., Edwards, D.W., Kiresuk, T.J., & Newman, F.L. (1986). Assessing mental health treatment outcome measurement techniques (DHHS Publication No. ADM 86-1301). Washington, DC: U.S. Government Printing Office. Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart & Winston. Dale, E., & O'Rourke, J. (1981). The living word vocabulary. Chicago: World Book Childcraft International. Davis, F.B. (1959). Interpretation of differences among averages and individual test scores. Journal of Educational Psychology, 50, 162-170. Dougherty, D.M., Saxe, L.M., Cross, T., & Silverman, N. (1987). Children's mental health: Problems and services. Durham, NC: Duke University Press. Garfinkel, B.D., Carlson, G.A., & Weller, E.B. (Eds.). (1990). Psychiatric disorders in children and adolescents. Philadelphia: Saunders. Green, R., & Newman, F. (1996). Criteria for selecting instruments to assess treatment outcomes. In S.I. Pfeiffer (Ed.), Outcome assessment in residential treatment (pp. 29-48). Binghamton, NY: Haworth Press. Hooper, S.R., Hynd, G.W., & Mattison, R.E. (Eds.). (1992). Child psychopathology: Diagnostic criteria and clinical assessment. Hillsdale, NJ: Lawrence Erlbaum Associates. 
Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19. Jongsma, A.E., Peterson, L.M., & McInnis, W.P. (1996). The child and adolescent psychotherapy treatment planner. New York: Wiley. Joreskog, K.G., & Sorbom, D. (1989). LISREL 7: A guide to the program and applications (2nd ed.). Chicago, IL: SPSS. Lewis, M., & Miller, S.M. (Eds.). (1990). Handbook of developmental psychopathology. New York: Plenum. Linder, J.C. (1991). Outcomes measurement: Compliance tool or strategic initiative? Health Care Management Review, 4, 21-33. Marsh, H.W., Balla, J.R., & McDonald, R.P. (1988). Goodness of fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103, 391-410. Mirin, S.M., & Namerow, M.J. (1991). Why study treatment outcome? Hospital and Community Psychiatry, 10, 1007-1013. Naglieri, J.A. (1993). Pairwise and ipsative comparisons for the WISC-III IQ and index scores. Psychological Assessment, 5, 113-116. Naglieri, J.A., LeBuffe, P.A., & Pfeiffer, S.I. (1994). Devereux Scales of Mental Disorders. San Antonio, TX: The Psychological Corporation. Naglieri, J.A., McNeish, T.J., & Bardos, A.N. (1991). Draw a Person: Screening procedure for emotional disturbance. Austin, TX: Pro-Ed. Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates. O'Leary, D.S. (1993). The measurement mandate: Report card day is coming. Journal for Quality Improvement, 19, 487-491. Pfeiffer, S.I. (1989). Follow up of children and adolescents treated in psychiatric facilities: A methodology review. The Psychiatric Hospital, 20, 15-20. Pfeiffer, S.I., & Shott, S. (1996). Treatment outcomes assessment: Conceptual, practical, and ethical considerations. In C.E. Stout
(Ed.), The complete guide to managed behavioral healthcare (pp. 1-11). New York: Wiley. Sattler, J. (1988). Assessment of children (3rd ed.). San Diego: J.M. Sattler. Spivack, G., & Levine, M. (1964). The Devereux Child Behavior Rating Scales: A study of symptom behaviors in latency age atypical children. American Journal of Mental Deficiency, 68, 700-717. Spivack, G., & Spotts, J. (1966). Devereux Child Behavior Rating Scale. Devon, PA: The Devereux Foundation. Spivack, G., Spotts, J., & Haimes, P.E. (1967). Devereux Adolescent Behavior Rating Scale. Devon, PA: The Devereux Foundation. Vermillion, J.M., & Pfeiffer, S.I. (1993). Treatment outcome and continuous quality improvement: Two aspects of program evaluation. The Psychiatric Hospital, 24(12), 9-14. Wechsler, D. (1991). Wechsler Intelligence Scale for Children (3rd ed.). San Antonio, TX: The Psychological Corporation.


For Abby, Katie, and Shelby


Chapter 19
Treatment Planning and Evaluation with the BASC: The Behavior Assessment System for Children

R.W. Kamphaus
University of Georgia

Cecil R. Reynolds
Texas A&M University

Nancy M. Hatcher
University of Georgia

This chapter describes the Behavior Assessment System for Children (BASC; Reynolds & Kamphaus, 1992), a multimethod, multidimensional approach to evaluating the behavior and self-perceptions of children from age 4 to 18, and its new variant, the BASC ADHD Monitor (Kamphaus & Reynolds, 1998).

The original BASC is multimethod in that it has the following five components, which may be used individually or in any combination: a self-report scale on which children can describe their emotions and self-perceptions; two rating scales, one for teachers and one for parents, which gather descriptions of children's observable behavior; a structured developmental history; and a form for recording and classifying directly observed classroom behavior. The BASC is multidimensional in that it measures numerous aspects of behavior and personality, including positive (adaptive) and negative (clinical) dimensions.

Teacher Rating Scales

The Teacher Rating Scale (TRS) has three forms with items targeted at three age levels: preschool (age 2 1/2-5), child (age 6-11), and adolescent (age 12-18). The forms contain descriptors of behaviors that the respondent rates on a 4-point scale of frequency, ranging from "Never" to "Almost always." The TRS takes 10 to 20 minutes to complete.

The TRS assesses clinical problems in the broad domains of Externalizing Problems, Internalizing Problems, and School Problems. It also measures Adaptive Skills. Table 19.1 shows the scales for all levels of the TRS. The slight differences between levels are due to developmental changes in the behavioral manifestations of child problems.


TABLE 19.1 Composites and Scales in the TRS and PRS

Teacher Rating Scales Parent Rating Scales Composite/Scale Preschool Child Adolescent Preschool Child Adolescent Externalizing Problems * * * * * * * * * * * * Aggression * * * * * * Hyperactivity * * * * Conduct Problems Internalizing Problems * * * * * * * * * * * * Anxiety * * * * * * Depression * * * * * * Somatization School Problems * * * * * * * * Attention Problems * * Learning Problems (Other Problems) * * * * * * Atypicality * * * * * * Withdrawal Adaptive Skills * * * * * * * * * * Adaptability * * * * * Leadership * * * * * * Social Skills * * Study Skills Behavioral Symptoms Index * * * * * *

Note. Italicized scales compose the Behavioral Symptoms Index. Reprinted with permission from Reynolds and Kamphaus (1992, p. 3). Copyright © 1992 by American Guidance Service, Inc.

Nevertheless, scales and composites with the same name contain essentially the same content at all age levels. In addition to scale and composite scores, the TRS provides a broad composite, the Behavioral Symptoms Index (BSI), that assesses the overall level of problem behaviors.

The TRS may be interpreted with reference to national age norms (general, female, or male) or to clinical norms. In addition, selected critical items may be interpreted individually. The TRS includes a validity check in the form of an F ("fake bad") index designed to detect a negative response set on the part of the teacher doing the rating.

Parent Rating Scales

The Parent Rating Scales (PRS) is a comprehensive measure of a child's adaptive and problem behaviors in community and home settings. The PRS uses the same four-choice response format as the TRS and takes 10 to 20 minutes to complete. Like the TRS, the PRS has three forms at three age levels: preschool, child, and adolescent. The age levels of the PRS are similar in content and structure. Table 19.2 shows the scale definitions of the PRS and TRS.
The PRS assesses almost all of the clinical problem and adaptive behavior domains that the TRS measures. However, the PRS does not have a School Problems composite, nor does it include the two TRS scales that are best observed by teachers (Learning Problems and Study Skills). The PRS offers the same norm groups as the TRS: national age norms (general, female, and male) and clinical norms. Like the TRS, the PRS includes an F index as a check on the validity of the parent ratings, and critical items that may be interpreted individually.

TABLE 19.2 Teacher and Parent Rating Scale Definitions

Scale: Definition
Adaptability: The ability to adapt readily to changes in the environment
Anxiety: The tendency to be nervous, fearful, or worried about real or imagined problems
Aggression: The tendency to act in a hostile manner (either verbal or physical) that is threatening to others
Attention Problems: The tendency to be easily distracted and unable to concentrate more than momentarily
Atypicality: The tendency to behave in ways that are immature, considered "odd," or commonly associated with psychosis (such as experiencing visual or auditory hallucinations)
Conduct Problems: The tendency to engage in antisocial and rule-breaking behavior, including destroying property
Depression: Feelings of unhappiness, sadness, and stress that may result in an inability to carry out everyday activities (neurovegetative symptoms) or may bring on thoughts of suicide
Hyperactivity: The tendency to be overly active, rush through work or activities, and act without thinking
Leadership: The skills associated with accomplishing academic, social, or community goals, including, in particular, the ability to work well with others
Learning Problems: The presence of academic difficulties, particularly in understanding or completing schoolwork
Social Skills: The skills necessary for interacting successfully with peers and adults in home, school, and community settings
Somatization: The tendency to be overly sensitive to and complain about relatively minor physical problems and discomforts
Study Skills: The skills that are conducive to strong academic performance, including organizational skills and good study habits
Withdrawal: The tendency to evade others to avoid social contact

Note. The PRS does not include TRS Learning Problems, Study Skills, or School Problems composite scales. Reprinted with permission from Reynolds and Kamphaus (1992, p. 48). Copyright © 1992 by American Guidance Service, Inc.

Self-Report of Personality

The Self-Report of Personality (SRP) is an omnibus personality inventory consisting of statements that are responded to as True or False. The SRP, which takes about 30 minutes to complete, has forms at two age levels: child (age 8-11) and adolescent (age 12-18). These levels overlap considerably in scales, structure, and individual items. Both levels have identical composite scores: School Maladjustment, Clinical Maladjustment, Personal Adjustment, and an overall composite score, the Emotional Symptoms Index (ESI). The child level (SRP-C) has 12 scales and the adolescent level (SRP-A) has 14 scales arranged into composites (see Table 19.3). Unlike the BSI for the rating scales, the ESI is composed of both negative (clinical) scales and positive (adaptive) scales whose scoring has been reversed, because these are the scales that load highest on a general factor. SRP scale definitions are presented in Table 19.4.

Like the rating scales, the SRP may be interpreted with reference to national age norms (general, female, and male) or to clinical norms. Special indexes are incorporated to assess the validity of the child's responses: the F index, the L ("fake good") index for the SRP-A only, and the V index designed to detect invalid responses due to poor reading comprehension, failure to follow directions, or poor contact with reality.

TABLE 19.3 Composites and Scales in the SRP

Composite/Scale Child Adolescent Clinical Maladjustment * * * * Anxiety * * Atypicality * * Locus of Control * * Social Stress * * Somatization School Maladjustment * * * * Attitude to School * * Attitude to Teachers * Sensation Seeking (Other Problems) * * Depression * * Sense of Inadequacy Personal Adjustment * * * * Relations with Parents * * Interpersonal Relations * * Self-esteem * * Self-reliance Emotional Symptoms Index * *

Note. Italicized scales compose the Emotional Symptoms Index. Reprinted with permission from Reynolds and Kamphaus (1992, p. 3). Copyright © 1992 by American Guidance Service, Inc.

Structured Developmental History

The Structured Developmental History (SDH) is an extensive history and background survey that may be completed by a clinician during an interview with a parent or guardian, or may be completed as a questionnaire by a parent, either at home or in the school or clinic. The SDH systematically gathers information that is crucial to the diagnostic and treatment process. Many developmental events and medical or related problems in the family may have an impact on a child's current behavior.
The SDH structures the gathering of the child and family history, both social and medical. Because it is comprehensive, the SDH should be an asset to any evaluation of a child, whether or not other BASC components are used.


TABLE 19.4 Student Self-report of Personality Scale Definitions

Scale: Definition
Anxiety: Feelings of nervousness, worry, and fear; the tendency to be overwhelmed by problems
Attitude to School: Feelings of alienation, hostility, and dissatisfaction regarding school
Attitude to Teachers: Feelings of resentment and dislike of teachers; beliefs that teachers are unfair, uncaring, or overly demanding
Atypicality: The tendency toward gross mood swings, bizarre thoughts, subjective experiences, or obsessive-compulsive thoughts and behaviors often considered "odd"
Depression: Feelings of unhappiness, sadness, and dejection; a belief that nothing goes right
Interpersonal Relations: The perception of having good social relationships and friendships with peers
Locus of Control: The belief that rewards and punishments are controlled by external events or other people
Relations with Parents: A positive regard toward parents and a feeling of being esteemed by them
Self-esteem: Feelings of self-esteem, self-respect, and self-acceptance
Self-reliance: Confidence in one's ability to solve problems; a belief in one's personal dependability and decisiveness
Sensation Seeking: The tendency to take risks, to like noise, and to seek excitement
Sense of Inadequacy: Perceptions of being unsuccessful in school, unable to achieve one's goals, and generally inadequate
Social Stress: Feelings of stress and tension in personal relationships; a feeling of being excluded from social activities
Somatization: The tendency to be overly sensitive to, experience, or complain about relatively minor physical problems and discomforts

Note. Reprinted with permission from Reynolds and Kamphaus (1992, p. 58). Copyright © 1992 by American Guidance Service, Inc.

Student Observation System

The Student Observation System (SOS) is a form for recording a direct observation of the classroom behavior of a child.
The SOS uses the technique of momentary time sampling (i.e., systematic coding during 3-second intervals spaced 30 seconds apart over a 15-minute period) to record a wide range of children's behaviors, including positive (e.g., teacher-student interaction) and negative behaviors (e.g., inappropriate movement or inattention). The BASC SOS may be used appropriately in regular and special education classes. It can be used in the initial assessment as part of the diagnostic process. It can also be used repetitively to evaluate the effectiveness of educational, behavioral, psychopharmacological, or other treatments.

The BASC SOS Parts A, B, and C, and other components, can contribute to the functional assessment of behavior from multiple perspectives:

Frequency. SOS Part A ratings of "never observed," "sometimes observed," and "frequently observed." SOS Part B assesses frequencies by category of behavior problem, and PRS and TRS ratings tally the frequency of behavior problems.
Duration. SOS Part B ratings of percentage of time engaged in a particular behavior by category.
Intensity. SOS Part A ratings of "disruptive." SOS Part B ratings of frequency by category.
Antecedent Events. SOS Part C descriptions of teacher position, behavior, and other variables that precede misbehavior.
Consequences. SOS Part C descriptions of teacher behavior, peer behavior, and other variables that follow a behavior.
Ecological Analysis of Settings. SOS observations made at various times of day and in various classroom settings. The PRS may be used for the assessment of behavior in the community and home environments.

Forms

The TRS, PRS, and SRP forms come in two formats: hand scoring or computer entry. The hand-scoring forms are printed in a convenient self-scoring format, allowing them to be scored rapidly without using templates or keys. Each form includes a profile of scale and composite scores. The computer entry forms, which are simpler one-part forms, are designed to allow the user to key item responses into a microcomputer in about 5 minutes or less.

Computer Software

A microcomputer program, BASC Plus, is available and offers online administration of the TRS, PRS, and SRP and computer scoring of a completed computer-scored or hand-scoring form. The manual for BASC Plus explains how to use the program to administer, score, and report the TRS, PRS, and SRP (see sample BASC Plus printout in Appendix). The BASC Enhanced ASSIST program offers users a simpler computer program that produces all possible scores, a graphical display of results, and item responses.

General Norms

The general norms are based on a large national sample that is representative of the general population of U.S. children with regard to sex, race/ethnicity, clinical or special education classification, and, for the PRS, parent education. These norms are subdivided by age and, therefore, indicate how the child compares with the general population of children that age. For many applications, these norms (combining females and males) will be the preferred norms, and are recommended for general use.

Several of the scales of the TRS, PRS, and SRP show gender differences.
Males tend to obtain higher raw scores on the Aggression, Conduct Problems, Hyperactivity, Attention Problems, and Learning Problems scales of the TRS and PRS and on the Sensation Seeking, Attitude to School, Attitude to Teachers, and Self-esteem scales of the SRP. Females tend to score higher than males on the Social Skills, Study Skills, Leadership, and Depression scales of the TRS and PRS and on the Anxiety and Interpersonal Relations scales of the SRP. These differences in scores likely reflect real differences between males and females in the incidence of the indicated behavioral or emotional problems.
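Because raw scores differ by gender on several scales, the standard score a child receives depends on the reference group used. The sketch below converts one raw score to a T-score against three reference groups. It assumes a simple linear T-score (mean 50, standard deviation 10); the means and standard deviations are invented for illustration and are not BASC norms, and the names `t_score`, `NORMS`, and `score_against` are hypothetical.

```python
def t_score(raw, mean, sd):
    """Linear T-score: 50 is the norm-group mean, 10 points is one SD."""
    return 50 + 10 * (raw - mean) / sd


# Hypothetical norm-group statistics for one scale (NOT actual BASC values).
NORMS = {
    "general": {"mean": 6.0, "sd": 4.0},  # combined-sex norms
    "male": {"mean": 7.0, "sd": 4.0},     # separate-sex norms
    "female": {"mean": 5.0, "sd": 4.0},
}


def score_against(raw, group):
    """Convert a raw score using the statistics of the chosen norm group."""
    stats = NORMS[group]
    return t_score(raw, stats["mean"], stats["sd"])


raw = 12
print(score_against(raw, "general"))  # 65.0
print(score_against(raw, "male"))     # 62.5
print(score_against(raw, "female"))   # 67.5
```

With these made-up figures, the same raw score looks less extreme against a male reference group than against a female one, whereas the combined-sex group yields a single value regardless of the child's gender.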


For these gender differences to be reflected in the normative scores, a common set of norms must be used for both males and females. The general combined-sex norms serve this purpose. General norms answer the question, "How commonly does this level of rated or self-reported behavior occur in the general population at this age?" Using general norms, more males than females will show high T-scores on Aggression, for example, and more females than males will have high T-scores on Social Skills. These norms preserve any observed gender difference in BASC standard scores. This is appropriate, and the general norms are used if the clinician believes that boys and girls are in fact different on various behavioral characteristics (i.e., observed differences are not due to psychometric artifacts). For example, girls score higher than boys on the SRP Anxiety scale (a common finding in research on anxiety; e.g., see Reynolds & Richmond, 1985). In determining which set of norms to use, the clinician must answer the question, "Are girls more anxious than boys, or are they simply more willing to admit to symptoms of anxiety?" If the former is true, then the general norms are more appropriate, but in the latter case, the gender-specific norms are the correct choice. Reynolds and Kamphaus (1992) recommended the use of the general norms, but the individual clinician may disagree and opt for the other norms. This allows the clinician more latitude than typically occurs on other behavioral and self-report scales.

Female and Male Norms

These norms are based on subsets of the general norm sample; each is representative of the general population of children of that age and gender. The effect of using these separate-sex norms is to eliminate differences between males and females in the distribution of T-scores or percentiles. For example, although raw score ratings on the Aggression scale tend to be higher for males than females, use of separate-sex norms removes this difference and produces distributions of normative scores that are the same for both genders.

Indexes of Validity and Response Set

Several indexes are provided to help the BASC user judge the quality of a completed form. Validity may be threatened by any of several factors, including failure to pay attention to item content, carelessness, an attempt to portray the child in a highly negative or positive light, lack of motivation to respond truthfully, or poor comprehension of the items. Information on the development of these indexes and the setting of cutoff scores is provided in Reynolds and Kamphaus (1992).

F Index

The F index, included on all of the BASC rating scale and self-report forms, is a measure of the respondent's tendency to be excessively negative about the child's behaviors or self-perceptions and emotions. The F scale was developed using traditional psychometric methods associated with Infrequency scales.
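As a rough illustration of an infrequency-style count on a rating-scale form, the sketch below tallies "Almost always" responses to negative-behavior items and "Never" responses to positive-behavior items. The item numbers, the polarity assigned to each item, and the middle response labels ("Sometimes," "Often") are assumptions made for the example; the actual index items and cutoff levels are given on the record forms and in Reynolds and Kamphaus (1992). The function name `f_index` is hypothetical.

```python
# Hypothetical F-index sketch for a rating-scale form (PRS/TRS style).
# Item numbers, item polarity, and the middle response labels are assumptions.

NEGATIVE = "negative"  # item describes a problem behavior
POSITIVE = "positive"  # item describes an adaptive behavior


def f_index(responses, item_polarity):
    """Count extreme negative ratings on the items designated for the index.

    responses: dict mapping item number -> response label
    item_polarity: dict mapping index item number -> NEGATIVE or POSITIVE
    """
    count = 0
    for item, polarity in item_polarity.items():
        answer = responses.get(item)
        if polarity == NEGATIVE and answer == "Almost always":
            count += 1
        elif polarity == POSITIVE and answer == "Never":
            count += 1
    return count


# Made-up example: four index items on a completed form.
item_polarity = {3: NEGATIVE, 11: POSITIVE, 27: NEGATIVE, 40: POSITIVE}
responses = {3: "Almost always", 11: "Never", 27: "Sometimes", 40: "Often"}

print(f_index(responses, item_polarity))  # 2
```

A higher count means more extreme negative ratings; whether a given count is high enough to be of concern is a matter for the published cutoffs, which this sketch does not reproduce.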


On the PRS and TRS, the F index is scored by counting the number of times the respondent answered "Almost always" to a description of negative behavior or "Never" to a description of positive behavior. Because responses on the SRP are limited to True and False, items selected for that F index are either extremely negative items to which the child responded True or positive items to which the response was False. Items were selected for these scales that have a low probability of co-occurrence, that is, they are seldom endorsed in concert with one another. The TRS, PRS, and SRP record forms show what levels of F index scores are high enough to be of concern. Detailed guidance to interpretation of the F index is given in Reynolds and Kamphaus (1992).

L Index

The L index, offered for the adolescent level of the SRP, measures adolescents' tendency to give an extremely positive picture of themselves, what might be called "faking good." The index consists of items that are unrealistically positive statements (e.g., "I like everyone I meet") or are mildly self-critical statements that most people would endorse (e.g., "I sometimes get mad"). Individuals scoring high on this scale may also be giving the most socially desirable response or possibly are psychologically naive relative to their peers. The SRP-A record form shows which L scores should be of concern.

V Index

Each level of the SRP includes a V index made up of five or six nonsensical or highly implausible statements (e.g., "Superman is a real person"). The V index serves as a basic check on the validity of the SRP scores in general. If a respondent marks two or more of these statements as True, the SRP may be invalid.

BASC ADHD Monitor

Nature and Purpose

The BASC ADHD Monitor fills a unique role in the assessment of children who are diagnosed with Attention-Deficit Hyperactivity Disorder (ADHD).
The monitor is the second step in an assessment regimen designed to enhance treatment planning and evaluation by more thoroughly assessing the primary symptoms of ADHD. Attention problems and hyperactivity constitute the core symptoms used by the DSM-IV to define the ADHD syndrome (Kamphaus & Frick, 1996). Problems in one or both of these areas are used to differentiate the three subtypes of ADHD: ADHD Predominantly Inattentive Type, ADHD Predominantly Hyperactive-Impulsive Type, and ADHD Combined Type.

Components of the original BASC system serve as the first step in the comprehensive assessment of children suspected of having ADHD. The BASC takes a broad sampling of child behavior in order to identify the full range of child problems. If the initial administration of the BASC reveals problems on the Attention Problems and/or Hyperactivity scales, the diagnosis of ADHD becomes a possibility. Of greater importance, however, is the necessity to use the BASC teacher, parent, and self-report forms to rule out co-occurring problems, which can only be done with the initial use of a broad-based measure (Kamphaus & Frick, 1996). This is particularly important in diagnosis of ADHD, where so many comorbid disorders occur and where other disorders (e.g., childhood depression) may superficially mimic ADHD. In the diagnostic process, the use of narrow-band scales may often result in overdiagnosis of ADHD.

The monitor represents the second step in the comprehensive assessment of ADHD in that it is concerned with treatment design and evaluation. The narrowly focused monitor is designed to assess an expanded range of attention problems and hyperactivity symptoms in a practical, time-efficient manner. This additional detail allows the clinician to refine the diagnosis of ADHD and, of greater importance, to design a comprehensive treatment program aimed at reducing behavioral problems. The monitor also provides Internalizing and Adaptive Skills scales that further encourage comprehensive treatment planning and evaluation of treatment effectiveness by allowing clinicians to include these important constructs easily in the treatment plan. The BASC and BASC ADHD Monitor represent a coordinated multiple-step assessment system that allows the clinician to proceed from referral for ADHD to diagnosis, treatment design, and treatment evaluation with greater ease and precision. In order to achieve these assessment objectives, the monitor utilizes information provided by parents, teachers, and a classroom observer to assess the constructs noted in Table 19.5.

Few child assessment measures are designed to meet the practical demands of treatment evaluation. In other words, few tests are constructed in a manner that facilitates the repeated collection and dissemination of child information.
The monitor is designed to meet the unusual practical demands dictated by the need for the repeated assessment of child behavior. The original BASC may be used repeatedly to evaluate treatment effects, particularly if a child is found to have multiple problems (e.g., ADHD, depression, anxiety, and conduct disorder) that cannot be fully assessed by the monitor. In the case of ADHD and its subtypes, however, the monitor is constructed so as to allow clinicians to evaluate treatment with greater efficiency.

TABLE 19.5 BASC ADHD Monitor Constructs

Component: Scales
Parent Monitor: Attention Problems; Hyperactivity; Internalizing Problems; Adaptive Skills
Teacher Monitor: Attention Problems; Hyperactivity; Internalizing Problems; Adaptive Skills
BASC SOS: Response to Teacher/Lesson; Peer Interaction; Work on School Subjects; Transition Movement; Inappropriate Movement; Inattention; Inappropriate Vocalization; Somatization; Repetitive Motor Movements; Aggression; Self-injurious Behavior; Inappropriate Sexual Behavior; Bowel/Bladder Problems
ADHD Monitoring Plan: All monitor components and scales
ADHD Monitor ASSIST: All monitor components and scales

The needs of child health care workers led to the development of the monitor. Health care and related professionals in psychology, medicine, and education are all struggling to meet the needs of the child with ADHD. The epidemiological estimates of ADHD of 3% to 5% of the U.S. population are striking (Rapoport & Castellanos, 1996). Moreover, approximately 2% to 6% of elementary age schoolchildren may be receiving psychostimulant medication treatment at any given time, making this the most frequently used pharmacological treatment received by children (Bender, 1997). The problems of children with ADHD also put them at higher risk for a diverse array of other problems, including learning disabilities (Frick et al., 1991) and anxiety disorders (Last, 1993). A high comorbidity rate of this nature suggests that children with these problems are likely to receive a variety of medical and behavioral treatments in home, school, and other settings. These facts lead invariably to the conclusion that the population of children with ADHD presents special assessment challenges as a result of the phenomenology of their problems, which dictates the creation of complex treatment strategies. One of the most significant challenges of ADHD treatment is the coordination of medical management and other treatment strategies. Consequently, assessment strategies have to be designed so as to enhance frequent and accurate communication among treatment providers, including parents, physicians, teachers, other clinicians, and the child.
The BASC ADHD Monitor is designed to enhance the work of all those individuals who provide services to children with ADHD. It has the following purposes: 1. To provide accurate and frequent feedback to the prescribing physician. The physician and other health care workers need accurate information to ensure that a child is receiving the most accurate psychotropic regimen and in order to adjust dosage. Information about the effects of medication on hyperactivity, attention problems, internalizing problems, and adaptive skills can aid the physician in making crucial medical treatment decisions. 2. To ensure that the ongoing assessment of ADHD problems is efficient, timely, and cost-effective. Given the multiple time demands on parents, teachers, and others, little time remains to complete lengthy or unnecessarily complex rating scales that are not specifically targeted to the needs of the child with ADHD. On the other hand, the monitor is designed to be adequately thorough in order to allow for the assessment of constructs in addition to the core dimensions of ADHDinternalizing problems and adaptive skills (Kamphaus & Frick, 1996). All of these assessment objectives must be achieved in an efficient way given the exigencies of health care. Accordingly, the monitor is brief, yet it provides coverage of four important domains related to the functioning of the child with ADHD. 3. To provide a system of devices that allows for input from multiple informants. Teacher, parent, and clinician observations are all of potential importance for the treatment process, and communication among these individuals is crucial for effective treatment (Bender, 1997). Each monitor form is designed to meet the specialized needs of each of these informants. 4. To emphasize the assessment of specific behavioral outcomes in order to demonstrate accountability for services. Increasingly, the effectiveness of child services is being challenged, thereby creating the need to assess outcomes. 
The monitor assesses the DSM-IV criteria for ADHD and includes items that are written in clear behavioral terms. In addition, the monitor software is designed to produce output that gives providers and administrators a clear indication of response to treatment. The monitor is designed to provide clinicians with the information needed to adjust treatment whenever response to intervention is not optimal.


5. To link assessment to treatment. The monitor is designed to be practical enough to be considered central to the treatment process. Heretofore, physicians and other clinicians have often had difficulty acquiring the feedback needed to adjust treatment. The test and software design of the monitor was guided throughout by the need to provide information relevant to treatment. The selection of items and scales, test length, scoring and reporting systems, graphical output, and other monitor characteristics were all guided by this central objective.
6. To allow teachers to use a single form for evaluating treatment effects. Teachers are often charged with evaluating the effects of medical and behavioral interventions, and they often have to review several behavioral charts, graphs, or other records in order to report on a child's progress (Bender, 1997). The BASC ADHD Monitor, because of its multiple components, is intended to assess a child's behavior both broadly and specifically in order to expedite the teacher's reporting duties.

Monitor Interpretation

ADHD Monitor interpretation can take several forms depending on the instrument(s) used, the theoretical orientation of the clinician, the nature of the evaluation questions posed, and other factors. Given space limitations, this chapter focuses on the "basics" of interpretation that are grounded in psychometric principles and customary practice. It is also important to keep in mind that the monitor is designed to create and evaluate treatment plans. Hence, interpretation of the scales as diagnostic devices is of considerably lesser importance. In evaluating monitor results, the individual clinician is asking whether or not significant change has occurred in response to treatment. For the parent and teacher monitors, four questions are generally posed:
Is treatment affecting symptoms of inattention?
Is treatment affecting symptoms of hyperactivity?
Is treatment affecting internalizing symptoms?
Is treatment affecting adaptive skills?
The questions related to change are numerous, and parallel questions apply to the SOS, where clinicians may be assessing change at the item or scale level. The assessment of change, however, is fraught with methodological and conceptual pitfalls (Jacobson & Truax, 1991). For instance, a high level of inference may be involved in the case where a child's hyperactivity has improved in response to the administration of methylphenidate. Certainly an improvement in scores may indicate positive change, yet it is not possible to be certain that the medication was the cause. Placebo and other effects can only be ruled out in well-controlled studies. Hence, when it is said that a child has shown improvement in response to treatment, such a statement risks making an inappropriate inference. The phrasing is used here merely because it makes the writing task easier.
It is common practice to use statistics to assess change. If, for example, it can be shown that a child's T-score on the Hyperactivity scale went from 75 to 63 in response to treatment, then this may be a statistically significant improvement. Is this reduction in symptoms, however, clinically significant in the sense that the child, parents, teachers, and others are pleased with the child's progress in school and other spheres? Would the interpretation of this statistically significant amount of change be modified if the pre- and posttreatment scores were 85 and 74, respectively? The point is that it may be difficult to quantify change to the satisfaction of all stakeholders in a child's life. The monitor attempts to improve assessment of treatment effects by measuring multiple dimensions of behavior, as opposed to producing a single score. It is more likely that clinically significant change has occurred if a child shows improvement in hyperactivity, attention problems, internalizing, and adaptive skills.
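The two T-score examples above (75 to 63, and 85 to 74) can be checked for statistical reliability with the reliable change index proposed by Jacobson and Truax (1991). The chapter does not describe this computation, so the following is only a minimal sketch. The T-score standard deviation of 10 follows from the definition of the T-score metric, but the reliability value of .85 is a hypothetical placeholder, not a published BASC coefficient, and the function name is illustrative.

```python
import math


def reliable_change_index(pre, post, sd, reliability):
    """Jacobson-Truax reliable change index for one pretest/posttest pair.

    sd is the standard deviation of the score metric and reliability is
    the scale's reliability coefficient (both supplied by the caller).
    """
    standard_error = sd * math.sqrt(1.0 - reliability)
    # Standard error of the difference between two scores.
    s_diff = math.sqrt(2.0 * standard_error ** 2)
    return (post - pre) / s_diff


T_SCORE_SD = 10.0           # T-scores are scaled to a standard deviation of 10
ASSUMED_RELIABILITY = 0.85  # hypothetical value chosen only for illustration

for pre, post in [(75, 63), (85, 74)]:
    rci = reliable_change_index(pre, post, T_SCORE_SD, ASSUMED_RELIABILITY)
    # |RCI| > 1.96 is the conventional two-tailed .05 criterion.
    verdict = "reliable" if abs(rci) > 1.96 else "not reliable"
    print(f"{pre} -> {post}: RCI = {rci:.2f} ({verdict} change)")
```

Under these assumed values both changes exceed the 1.96 criterion, yet the index says nothing about whether a posttreatment score of 63 or 74 is satisfactory to the child, parents, or teachers, which is the question of clinical significance raised above.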


Coordination with BASC Results

Use of the BASC will often precede administration of the ADHD Monitor. The use of an omnibus measure, such as the BASC, is crucial at the outset of the evaluation process (Kamphaus & Frick, 1996). Only a broad-based measure that assesses numerous constructs will allow the clinician to rule out comorbidities, alternative causes for the apparent symptomatology, and other diagnostic and treatment considerations. Once diagnostic decisions are made, the ADHD Monitor can be used to develop and evaluate treatment plans. There is, however, one important area of interpretive overlap between the BASC and the BASC ADHD Monitor Parent and Teacher Forms. A T-score baseline for treatment evaluation can be obtained from either set of measures. Two administration scenarios are most likely. First, a clinician may administer either or both of the BASC parent and teacher forms during the initial diagnostic evaluation. The obtained T-scores for the Hyperactivity, Attention Problems, Internalizing Problems, and Adaptive Skills scales may be entered into the BASC ADHD software and be used as the baseline against which subsequent administrations of the ADHD Monitor will be compared. Second, a clinician may administer either or both of the BASC ADHD Parent and Teacher Monitor Forms during the initial diagnostic evaluation. The obtained T-scores for the Hyperactivity, Attention Problems, Internalizing Problems, and Adaptive Skills scales will then be used as the baseline against which subsequent administrations of the ADHD Monitor Forms will be compared. It is important to establish a T-score baseline in a timely fashion regardless of the method used. In other words, a T-score baseline should be collected during the evaluation phase and prior to implementation of treatment. The ADHD Monitor T-scores for parent and teacher rating scales serve as the most reliable indicator of behavioral change over time (see Kamphaus & Reynolds, 1998).
The SOS is designed specifically for classroom-based intervention. SOS results therefore should not be considered when evaluating home-based intervention unless home- and school-based interventions are linked. For example, a home-based reinforcement program may be used to improve behavior at school. The frequency of classroom behavior problems is assessed by the SOS. Consequently, SOS results from Parts A and B may be used to identify behaviors in need of intervention. Specifically, any behavior problem that is exhibited or adaptive skill that is not exhibited becomes a potential candidate for intervention. Within these groups, problem behaviors of higher frequency can be given priority for intervention. Analogously, low-frequency adaptive skills also become candidates for intervention. The SOS is unique among Monitor components in that it allows clinicians to prioritize behaviors for classroom-based intervention. The SOS also measures the "bothersomeness" of a child's behavior problems via the disruptive category of Part A. Often children display a number of behavior problems, making it difficult to prioritize behaviors for intervention (Schwanz & Kamphaus, 1997). The ratings of disruptiveness can be used to identify behaviors that should be targeted first for treatment.

Interpretive Steps

BASC interpretation is described in some detail in Reynolds and Kamphaus (1992). However, additional interpretive guidance is being developed as new research findings


become available. The following proposals for additional interpretive steps are offered to supplement those found in the BASC manual. Specify Treatment Objectives and Target Behaviors Results from the BASC may form the basis for establishing treatment objectives. The examiner is cautioned, however, to avoid establishing unrealistic objectives and, therefore, expectations for change. It may, in fact, be inappropriate to offer treatment objectives for all BASC constructs. It may not be warranted, for example, to expect change on the Withdrawal or Anxiety scale if somatic therapy for hyperactivity and attention problems is the only treatment being used. For example, pharmacological treatments for ADHD (e.g., amphetamines) have not been shown to affect systematically internalizing problems in predictable ways. Therefore, it seems unwise to communicate the expectation that these scales would change when in fact there is not a treatment aimed at changing the construct. On the other hand, if comorbid depression is being treated through cognitive behavior therapy, then change in the Depression scale may be an appropriate treatment objective. Treatment objectives should always be linked to actual treatments delivered in order to ensure that parents, teachers, and others have realistic expectations regarding change. Any of the three BASC components (parent/teacher, SRP, SOS) may then be used to identify target behaviors for intervention. Erhardt and Conners (1995) suggested that deviant scales and items be used to set target behaviors by observing that "it is reasonable in drug studies to use the most elevated scales on rating measures as predictors of drug treatment outcome, as well as primary target behaviors for medication effects" (p. 130). Record Treatment Data The major interpretive inference made by the BASC user is a link between treatment and outcomes. Careful treatment records are necessary in order to draw such inferences. 
The reverse sides of the parent and teacher monitor forms, for example, provide space for the documentation of the nature of various treatments along with their onset, cessation, and adherence. Adherence is probably the most difficult to document, but attempts should be made to do so. It may be necessary to use telephone interviews with parents, teachers, or clinicians, or conduct reviews of clinician records in order to assess adherence to treatment regimens. The collection of treatment information allows the clinician to be confident in the assessment of change. Collect at Least Three Data Points Frequent follow-up with the BASC or BASC Monitor is encouraged by the design of the record forms and software. Francis, Fletcher, Stuebing, Davidson, and Thompson (1991) observed that individual growth is difficult to measure accurately with two data points, such as a pretest and posttest. The BASC is designed to make the collection of three or more data points convenient so as to encourage a more accurate assessment of a child's true underlying change trajectory. An evaluation of the BASC against the Newman and Ciarlo (1994) criteria for outcomes measures is presented in Table 19.6.
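The value of three or more data points can be illustrated by fitting a straight line to a child's repeated T-scores and reading the slope as an estimate of the change trajectory. This is only a simplified sketch of the individual growth idea discussed by Francis et al. (1991), not their full procedure; the function name, the monthly spacing, and the four T-scores are hypothetical values invented for illustration.

```python
def least_squares_slope(times, scores):
    """Ordinary least-squares slope of scores regressed on times.

    Requires at least three observations, because a line through two
    points fits perfectly and gives no indication of measurement noise.
    """
    n = len(times)
    if n != len(scores):
        raise ValueError("times and scores must have the same length")
    if n < 3:
        raise ValueError("at least three data points are required")
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("observation times must not all be identical")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
    return sxy / sxx


# Hypothetical Hyperactivity T-scores collected at monthly intervals.
months = [0, 1, 2, 3]
t_scores = [80, 74, 70, 66]

slope = least_squares_slope(months, t_scores)
print(f"Estimated change: {slope:.1f} T-score points per month")
```

A negative slope on a problem scale indicates improvement. With only a pretest and posttest the same calculation would reduce to a simple difference score, which is why additional observations give a better picture of the underlying trajectory.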


TABLE 19.6
BASC Evaluation Against Criteria for Treatment Outcome Assessment

1. Relevance to target group
"It is safe to assert that the BASC has been carefully developed and represents a synthesis of what is known about developmental psychopathology and personality development. The items for all of the components have been derived from a review of the relevant literature and collected clinical experience. All of the contents of the scales were selected from research findings, other measures, and clinical experience. The items were constructed with the help of professionals (including teachers) and students, and were carefully evaluated for readability, acceptability, and comprehensibility" (Sandoval & Echandia, 1994, p. 420). "Item content varies across levels to reflect developmental differences" (Flanagan, 1995, p. 179).

2. Simple and teachable methods
"The record forms for the TRS, PRS, and SRP are clear and user-friendly. A true-false format was chosen for the SRP so that it would be more likely that children and adolescents would attend to the task sufficiently. The forms are completed quickly, are understandable to respondents, and are readily scored" (Flanagan, 1995, pp. 179, 180, 185). "Minimal self-instructional training is required in order to use the SOS appropriately, and hence it can be used by a wide variety of clinicians and educational professionals with varied training backgrounds" (Hoza, 1994, p. 9). "The test materials for the PRS and TRS are noteworthy for their convenient design. In contrast to other tests that require separate item booklets, answer sheets, scoring templates, and profile forms, all of these are combined into a single record form for both the PRS and TRS" (Kline, 1994, p. 291).

3. Use of measures with objective referents
"The scale norms were constructed by using linear T-score scaling. This procedure is advisable because many of the constructs being measured are probably not normally distributed, and this metric preserves underlying distribution" (Sandoval & Echandia, 1994, p. 422).

4. Use of multiple respondents
"The BASC is a multimethod test because information about a child's behavior is obtained from teacher(s) perspective, parent(s) perspective, and a student's perspective" (Miller, 1994, p. 24). "The BASC is a multimethod, multidimensional assessment system that measures both adaptive and problem behaviors as well as self-perceptions of school children both in school and in home settings" (Merenda, 1996, p. 232).

5. More process-identifying outcome measures
"The BASC helps link dimensions of behavior and emotions to DSM-IIIR criteria, treatment programming, and educational classifications" (Adams & Drabman, 1994). "For subtypes of ADHD, and specifically the ADHD:PI subtype however, results would favor the use of the BASC PRS and TRS [over the Achenbach CBCL and TRF]" (Vaughn et al., 1997, p. 355).

6. Psychometric strengths
"The BASC rating scales have strong psychometric properties and useful scale content and structure" (Adams & Drabman, 1994, p. 4). "The BASC has several strengths, particularly with regard to the psychometrically well-developed TRS, PRS, and SRP scales" (Hoza, 1994, p. 9).
Reliability: "Three types of reliability are reported for the general samples of the TRS and PRS: internal consistency, test-retest reliability, and interrater reliability. Internal consistency and test-retest reliability are reported for the SRP. The majority of BASC components have reliabilities greater than .80, which is psychometrically acceptable" (Flanagan, 1995, p. 181).
Validity: Validity evidence reported by Reynolds and Kamphaus (1992) includes the following:
Factor analysis: confirmatory factor analysis (CFA) and exploratory factor analysis (EFA) for each of the Teacher Rating Scale, Parent Rating Scale, and Self-report of Personality.
Criterion-related correlation studies:
Teacher Rating Scale: Teacher's Report Form (2 studies), Revised Behavior Problem Checklist, Conners Teacher Rating Scales, Burks Behavior Rating Scales, Behavior Rating Profile
Parent Rating Scale: Child Behavior Checklist (3 studies), Personality Inventory for Children, Conners Parent Rating Scales, Behavior Rating Profile
Self-report of Personality: Minnesota Multiphasic Personality Inventory, Youth Self-report (2 studies), Behavior Rating Profile, Children's Personality Questionnaire


TABLE 19.6 (Continued)

7. Low measure costs relative
Materials cost:
Handscore form: each $1.00 OR Computer score form: each $0.64
Enhanced ASSIST Software (unlimited uses): $225.95 OR BASC Plus software (per use): $0.90
Structured Observation System form: each $1.16
Structured Developmental History form: each $1.36

8. Understanding by nonprofessional audiences
"The parent rating scales can be readily understood and completed by parents with minimal assistance from the professional administering the scale. The protocols are efficiently arranged and attractive" (Sandoval & Echandia, 1994, p. 424).

9. Easy feedback and uncomplicated interpretation
"In practice, the BASC has been positively received by children, parents, teachers, and school psychologists. The data are easily interpreted and presented to parents and school personnel" (Flanagan, 1995, p. 185). "Printouts are readily understandable and offer graphical representation of results as well as a brief narrative" (Davis, 1995, p. 22).

10. Useful in clinical services
"Integration of BASC data with other data obtained in a comprehensive assessment of a child is readily accomplished. Diagnosis of educational disabilities and descriptions of behavioral and emotional variables can be considerably simplified with the use of the BASC" (Flanagan, 1995, p. 185). "The BASC assists with several assessment needs including the description of adaptive and maladaptive dimensions disorders, the decision for educational classification, the evaluation of treatment programs for behavior, and research regarding childhood emotional and behavior disorders" (Adams & Drabman, 1994, p. 1). "The [BASC-PRS] has unique potential to aid in diagnostic decision making, as it contains conceptually derived scales created for use in conjunction with psychiatric and educational classification systems" (Doyle et al., 1997, p. 281).

11. Compatibility with clinical theories and practices
"Samples for the general, male, and female norms were selected to be representative of the 1990 U.S. population aged 4-18 years, including exceptional children, for race and ethnicity. In addition, the goal was to have overlap of the samples across PRS, TRS, and SRP, to make the sets of norms comparable. Clinical samples were drawn from self-contained classrooms, community mental health centers, residential schools, juvenile detention centers, and university and hospital outpatient mental health clinics" (Flanagan, 1995, pp. 180-181).

Assessing Change: Clinical Significance

The assessment of change, however, is characterized by numerous methodological confounds and issues. Jacobson and Truax (1991) criticized the use of meta-analyses for assessing change by observing that statistical significance may be unrelated to the important issue of the clinical significance of change. They cited the example of weight loss where a loss of 2 or 3 pounds may be deemed statistically significant, yet hardly satisfying to many of those undergoing treatment. In relation to change subsequent to psychotherapy, they noted that there are three potential indicators of clinical significance: (a) The level of functioning subsequent to therapy should fall outside the range of the dysfunctional population, where the range is defined as extending two standard deviations beyond (in the direction of functionality) the mean of the population. (b) The level of functioning subsequent to therapy should fall within the range of the functional or normal population, where the range is defined as within two standard deviations of the mean of that population. (c) The level of functioning subsequent to therapy places that client closer to the mean of the functional population than it does


to the mean of the dysfunctional population (p. 13). Add to these an important, but potentially more difficult, fourth indicator: (d) The response to treatment made a positive difference in the day-to-day life of the patient. Consider, for example, a severe case of ADHD with a response to psychopharmacotherapy that reduces a score on a hyperactivity scale from 90 to 75. This child is still more overactive than 99% of children at his age, but if the child can now attend a regular classroom with peers who have good social skills as opposed to being in a segregated classroom with behavior problem children who have lesser social skills, then the treatment would be quite successful. There is still a question, however, as to whether or not such goals are realistic for a syndrome such as ADHD. If ADHD is conceptualized as a developmental disorder, there may be cause for modest expectations for change. For example, children with mental retardation would never be expected to have intelligence test scores within the normal range. Similarly, there are the syndromes of reading disabilities and autism where expectations for change due to treatment may not include return to the normal range of functioning for many children. There are, however, other potential indicators of acceptable change that could include change recognizable by peers, teachers, parents, or others, and/or reduced risk for various health problems (Jacobson & Truax, 1991). Therefore, the limitations of statistical indicators of change cannot be overlooked. For the aforementioned reasons, it is important that clinical significance be considered in the assessment of change. In part, clinical significance will eventually become clear through research on the outcomes of children with various disorders. Until then, the clinical assessor of change must decide on a measure of clinical significance that is appropriate for each child. 
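Indicators (a) through (c) can be applied mechanically to a posttreatment score once means and standard deviations are available for both populations. The sketch below does this for a problem scale on which lower scores are healthier. The functional population values (mean 50, standard deviation 10) follow from the T-score metric, but the dysfunctional population values (mean 80, standard deviation 10) are hypothetical and chosen only for illustration, as is the function name. Indicator (d) concerns day-to-day functioning and cannot be computed from scores.

```python
def clinical_significance_indicators(score, functional_mean, functional_sd,
                                     dysfunctional_mean, dysfunctional_sd):
    """Check indicators (a)-(c) for a problem scale (lower score = healthier)."""
    # (a) More than two SDs from the dysfunctional mean, toward functionality.
    outside_dysfunctional = score < dysfunctional_mean - 2.0 * dysfunctional_sd
    # (b) Within two SDs of the functional population mean.
    within_functional = abs(score - functional_mean) <= 2.0 * functional_sd
    # (c) Closer to the functional mean than to the dysfunctional mean.
    closer_to_functional = (abs(score - functional_mean)
                            < abs(score - dysfunctional_mean))
    return {
        "a": outside_dysfunctional,
        "b": within_functional,
        "c": closer_to_functional,
    }


FUNCTIONAL_MEAN, FUNCTIONAL_SD = 50.0, 10.0        # T-score metric
DYSFUNCTIONAL_MEAN, DYSFUNCTIONAL_SD = 80.0, 10.0  # hypothetical clinical sample

# The posttreatment score of 75 from the severe ADHD example in the text.
result = clinical_significance_indicators(
    75, FUNCTIONAL_MEAN, FUNCTIONAL_SD, DYSFUNCTIONAL_MEAN, DYSFUNCTIONAL_SD)
print(result)
```

Under these assumed population values a posttreatment score of 75 meets none of the three statistical indicators, even though the change may satisfy indicator (d), which is the point the example in the text is making.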
Perhaps the best way is to define this clinical significance by setting objectives that are commensurate with the treatment objectives for the child, parents, and/or caregivers. In fact, setting appropriate goals for change may be an important topic to discuss with parents and others prior to the implementation of treatment. If treatment objectives are not specified and/or realistic for a given child, then the effectiveness of treatment will remain uncertain. Therefore, the establishment of treatment goals is the first step involved in BASC interpretation. Assessing Change: Other Methodological Issues Statistical procedures and formulae have been offered for individual growth models of change assessment by Francis et al. (1991). They also discussed some of the measurement prerequisites necessary for the accurate assessment of change. Francis et al. (1991) concluded that measures of change, such as the BASC, should have interval scales of measurement, be relatively free of ceiling or floor effects, encourage collection of three or more observations in order to assess an individual growth trajectory, and not use age-based standard scores for measuring change. Each of these criteria is discussed in turn as it applies to the BASC. The BASC strives to produce an interval scale of measurement through the use of T-scores. An interval scale is necessary for assessing change because a scale with unequal units would not allow for the precise computation of change indices. Moreover, ceiling and floor effects detract from the usefulness of a scale for assessing change. Consequently, the BASC T-score range is not artificially restricted to a prespecified range. The T-scores were scaled based on the distributional properties of the construct as sampled from the population (see Reynolds & Kamphaus, 1992). Of course,


ceiling and floor effects may still be encountered with the BASC, but they should not occur with significant frequency. Additionally, the BASC does use age-based standard scores (T-scores) as the featured score for measuring change. For many children, however, the effects of age standard scores on the interpretation of change will be nonexistent because raw scores are not significantly different across age groups, as is indicated by the use of the same norm tables for large age groups of children. In the case of the BASC PRS, the raw scores did not differ significantly for ages 8 through 11, allowing for the use of one norm table for this age range. For example, if a child is diagnosed initially at age 8, then the same raw score to T-score conversion table is used until the child reaches age 11. On the other hand, for a 7-year-old child, different norm tables will be used for the computation of T-scores at age 8, which could result in inaccurate assessment of change. When norm tables change, it may be wise for the clinician to consider raw scores in addition to T-scores when assessing change over this time period. Another methodological issue to consider is the reliability of difference (pretest-posttest) scores. Because an inverse relation exists between the reliability of a difference score and the correlation between the pretest and posttest, unreliable difference scores may reflect the possibility that the trait being measured changes little over time, or in response to treatment (Francis et al., 1991). The long-term stability of the BASC scales is currently known only for a 7-month period (see Reynolds & Kamphaus, 1992), but additional information, as it becomes available through future research, may shed light on the malleability that may be detected by the scales. Some information about expected change can be gained from prior research regarding specific syndromes, such as the well-documented effects of stimulant medication on ADHD symptomatology.
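The inverse relation between difference-score reliability and the pretest-posttest correlation, noted above, can be made concrete with the classical test theory formula for the reliability of a difference score. The sketch assumes equal score variances at pretest and posttest, and the reliability value of .85 is a hypothetical placeholder rather than a published BASC coefficient; the function name is illustrative.

```python
def difference_score_reliability(rel_pre, rel_post, pre_post_corr):
    """Classical reliability of (posttest - pretest), assuming equal variances.

    rel_pre and rel_post are the reliabilities of the two administrations;
    pre_post_corr is the correlation between pretest and posttest scores.
    """
    if pre_post_corr >= 1.0:
        raise ValueError("pre_post_corr must be less than 1")
    mean_reliability = (rel_pre + rel_post) / 2.0
    return (mean_reliability - pre_post_corr) / (1.0 - pre_post_corr)


ASSUMED_RELIABILITY = 0.85  # hypothetical value for both administrations

for corr in (0.2, 0.5, 0.8):
    rel_diff = difference_score_reliability(
        ASSUMED_RELIABILITY, ASSUMED_RELIABILITY, corr)
    print(f"pre-post correlation {corr:.1f}: "
          f"difference-score reliability {rel_diff:.2f}")
```

As the pretest-posttest correlation rises, meaning that children largely keep their rank order and the trait changes little, the reliability of the difference score falls, which matches the interpretation attributed to Francis et al. (1991).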
Brown, Dreelin, and Dingle (1997) suggested the following conclusions: 1. Although standardized intelligence test results are not likely to change in response to somatic therapy, there may be some improvements seen in rote memory and higher order cognitive skills tasks. 2. Standardized achievement test results do not change significantly. 3. Some research suggests that beneficial effects for the use of stimulant medications might include more academic productivity, better quality seatwork (e.g., greater percent correct), less classroom disruption, less aggression, greater adherence to classroom rules, more time on task and attention, and better handwriting. 4. Prosocial behaviors have not been shown to increase significantly in response to medication. These conclusions await confirmation, extension, and clarification by future researchers. The evidence currently available, however, does suggest that change should be seen in the BASC Attention Problems and Hyperactivity scales. Whether or not changes in other domains are likely remains uncertain pending future research. Moreover, the effects of behavior therapies on the domains of the BASC are still unknown. Inform Treatment Providers and Caregivers Due to the complexity of the presentation of symptoms commonly associated with children's disorders, children often receive multimodal therapies including somatic, behavioral, and educational interventions. The involvement of multiple treatment providers makes communication among providers critical. BASC output is designed to


foster communication among parents, physicians, teachers, psychologists, and other clinicians by offering simple graphical output. At the same time, T-scores and percentile ranks are also provided in order to enhance quantitative interpretation. BASC results should be shared generously among all individuals involved with a case. Results, however, should always be accompanied by competent interpretation in order to ensure that they are used appropriately. Evaluation of the BASC as an Outcomes Measure An attempt has been made to fit the BASC to the criteria suggested by Newman and Ciarlo (1994) for evaluating the worthiness of a test for outcomes evaluation (see Table 19.6). This assessment of the BASC is presented with the caveat that the criteria may be open to considerable interpretation and new or different criteria may eventually be of more value. One more caveat is in order, in that the wish is to evaluate the suitability of the BASC in as objective a manner as possible. Consequently, Table 19.6 was constructed to include the opinions of other scholars to the extent that is practicable. Case Studies Sally Sally is an 8-year-old third grader referred for suspected ADHD. She has a history of overactivity, impulsive behavior, and inattention dating to her preschool years. She has undergone numerous behavioral and educational treatments that have resulted in no noticeable improvement. Consequently, her teachers and parents have requested a formal psychological evaluation. Sally was administered an extensive battery of psychological tests including intelligence tests, achievement measurements, classrooms observations, and a variety of parent and teacher rating scales. The results of the BASC parent, teacher rating T-scores, and SOS are shown in Tables 19.7 and 19.8. The results of this initial evaluation were consistent with the diagnosis of ADHD Combined Type. The diagnosis was made and related behavioral interventions were suggested. 
Furthermore, Sally's parents were advised to seek a medical evaluation to determine the need for pharmacological intervention. Her parents did not seek pharmacological treatment, but some behavioral interventions were delivered, again with little evidence of improvement. Her teachers and parents became concerned during her fourth-grade year about her lack of progress. A second classroom observation (see Table 19.8) and other data were collected, suggesting that her ADHD symptomatology was not responding well to treatment. Consequently, her pediatrician placed her on a regimen of 5 mg Ritalin twice a day during the academic year, to begin in November 1997. Follow-up SOS observations and teacher and parent interviews documented an extraordinarily positive response to intervention, including increased schoolwork productivity and suppression of inappropriate movement, inappropriate vocalizations, and attention problems (see Table 19.8, December 1997 results).


TABLE 19.7
Sally's BASC Mother and Teacher Ratings at Initial Evaluation

Scale                  Mother   Teacher
Hyperactivity            61       70
Aggression               39       52
Conduct Problems         41       43
Anxiety                  40       73
Depression               43       54
Somatization             44       46
Atypicality              53       64
Withdrawal               41       55
Attention Problems       48       60
Learning Problems        -        61
Adaptability             53       32
Social Skills            49       52
Leadership               35       48
Study Skills             -        51

TABLE 19.8
SOS Results for Sally at Initial Evaluation, Second Evaluation, and 1 Month After Initiation of Somatic Therapy

Scale                            October 1996   October 1997   December 1997
Response to Teacher/Lesson            18             13              14
Peer Interaction                       0              0               0
Work on School Subjects                7              2              10
Transition Movement                    0              0               0
Inappropriate Movement                 5              8               1
Inattention                            6              1               1
Inappropriate Vocalization             4              2               2
Somatization                           0              0               0
Repetitive Motor Movements             0              0               0
Aggression                             0              0               0
Self-injurious Behavior                0              0               0
Inappropriate Sexual Behavior          0              0               0
Bowel/Bladder Movements                0              0               0

Jordan

Jordan is a 5-year-old referred for suspected ADHD. He has a previous diagnosis of mild mental retardation. At the time of evaluation he was being served in a preschool special education program. Evaluation results produced a diagnosis of ADHD. He was referred to a community health clinic, where he was administered the BASC ADHD Monitor at intake (see Table 19.9). His physician used the monitor results to conclude that the current medical regimen of 10 mg Ritalin administered twice daily should be maintained for another month until monitor teacher ratings are collected. Initial ratings and interview data from Jordan's mother show promising indications of successful intervention, particularly in the realm of hyperactivity and, to a lesser extent, attention problems.


TABLE 19.9
BASC ADHD Monitor Results for Jordan Both Before and After Initiation of Somatic Therapy

Monitor Scale          November 1997   December 1997
Attention Problems          80              67
Hyperactivity               84              58
Internalizing               52              42
Adaptive Skills             31              33

Recent Research

Much of the latest BASC research has focused on the assessment and diagnosis of clinical populations such as children with ADHD. Studies by Vaughn, Riccio, Hynd, and Hall (1997) and Doyle, Ostrander, Skare, Crosby, and August (1997) clarified the role that BASC parent and teacher ratings may play in the diagnosis of ADHD. Doyle et al. (1997) concluded that the BASC PRS and CBCL were roughly equivalent for the diagnosis of ADHD Combined Type. In addition, they stated a preference for the use of the BASC due to the rational derivation of its scales. Vaughn et al. (1997), on the other hand, found BASC PRS and TRS, and CBCL and TRF results to differ significantly for cases of ADHD Primary Inattentive Type. They again noted equivalence between the two systems for the diagnosis of ADHD Combined Type, although the authors suggested that "the CBCL may not be as accurate when diagnosing ADHD:CT children without externalizing disorders" (p. 356). Regarding the diagnosis of ADHD Primary Inattentive Type, they concluded that "when discriminating among ADHD:PI children, however, the BASC scales are more accurate" (p. 356). Numerous BASC research studies of clinical populations are underway by a variety of authors with no vested interest in supporting the BASC. These studies will undoubtedly add much to the clinician's understanding of the functioning of the BASC with a variety of populations. The BASC has also been used as a "basic" clinical science research tool, as exemplified by a recent study by Kamphaus, Huberty, DiStefano, and Petoskey (1997). This study involved a cluster analysis of the TRS-C norming sample with the aim of developing a typology of child classroom behavior for the United States.
The resulting seven-cluster typology may prove useful for the study of risk and protective factors in the context of developmental psychopathology research. Briefly, the seven clusters identified were well adapted, average, disruptive behavior problems, learning problems, physical complaints/worry, severe psychopathology, and mildly disruptive. This study provides an estimate of the prevalence of these clusters in the U.S. classroom population and a description of their phenomenology. Again, additional studies addressing a variety of applied and theoretical issues are underway at several research centers.

Conclusions

The BASC, like all psychological assessment tools, will be a work in progress for years to come. The accumulation of evidence is just beginning to reveal the worth of this set of instruments for a variety of purposes, including treatment planning and evaluation. Much more research is to come, a phenomenon that is at least partly due to the recent publication of promising findings and BASC-related methodologies. Hopefully, this momentum will continue, as a vibrant research enterprise represents the field's greatest opportunity to produce innovations in child personality and behavioral assessment.

Appendix


References

Adams, C.D., & Drabman, R.S. (1994). BASC: A critical review. Child Assessment News, 4, 1-5.
Bender, W.N. (1997). Medical interventions and school monitoring. In W.N. Bender (Ed.), Understanding ADHD: A practical guide for teachers and parents (pp. 107-122). Upper Saddle River, NJ: Merrill.
Brown, R.T., Dreelin, E., & Dingle, A. (1997). Neuropsychological effects of stimulant medication on children's learning and behavior. In C.R. Reynolds & E. Fletcher-Janzen (Eds.), Handbook of clinical child neuropsychology (2nd ed., pp. 539-572). New York: Plenum.
Davis, H. (1995). Behavior assessment system for children. Protocol: Maryland School Psychologists' Association, 15, 21-23.
Doyle, A., Ostrander, R., Skare, S., Crosby, R.D., & August, G.J. (1997). Convergent and criterion-related validity of the behavior assessment system for children-parent rating scale. Journal of Clinical Child Psychology, 26, 276-284.
Erhardt, D., & Conners, C.K. (1995). Methodological and assessment issues in pediatric psychopharmacology. In J.M. Wiener (Ed.), Diagnosis and psychopharmacology of childhood and adolescent disorders (2nd ed., pp. 97-137). New York: Wiley.
Flanagan, R. (1995). A review of the behavior assessment system for children (BASC): Assessment consistent with the requirements of the individuals with disabilities education act (IDEA). Journal of School Psychology, 33, 177-186.
Francis, D.J., Fletcher, J.M., Stuebing, K.K., Davidson, K.C., & Thompson, N.M. (1991). Analysis of change: Modeling individual growth. Journal of Consulting and Clinical Psychology, 59, 27-37.
Frick, P.J., Kamphaus, R.W., Lahey, B.B., Loeber, R., Christ, M.A.G., Hart, E.L., & Tannenbaum, L.E. (1991). Academic underachievement and the disruptive behavior disorders. Journal of Consulting and Clinical Psychology, 59, 289-294.
Hoza, B. (1994). Review of the behavior assessment system for children. Child Assessment News, 4, 5-10.
Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
Kamphaus, R.W., & Frick, P.J. (1996). Clinical assessment of child and adolescent personality and behavior. Needham Heights, MA: Allyn & Bacon.
Kamphaus, R.W., Huberty, C.J., DiStefano, C., & Petoskey, M.D. (1997). A typology of teacher rated child behavior for a national U.S. sample. Journal of Abnormal Child Psychology, 25(6), 453-463.
Kamphaus, R.W., & Reynolds, C.R. (1998). BASC ADHD Monitor. Circle Pines, MN: American Guidance Service.
Kline, R.B. (1994). New objective rating scales for child assessment: I. Parent and teacher informant inventories of the behavior assessment system for children, the child behavior checklist, and the teacher report form. Journal of Psychoeducational Assessment, 12, 289-306.
Last, C.G. (1993). Introduction. In C.G. Last (Ed.), Anxiety across the lifespan: A developmental perspective (pp. 1-6). New York: Springer.
Merenda, P.F. (1996). BASC: Behavior assessment system for children. Measurement and Evaluation in Counseling and Development, 28, 229-232.
Miller, D.C. (1994). Behavior assessment system for children (BASC): Test critique. Texas Association of School Psychologists Newsletter, 23-29.
Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates.
Rapoport, J.L., & Castellanos, F.X. (1996). Attention-deficit hyperactivity disorder. In J.M. Wiener (Ed.), Diagnosis and psychopharmacology of childhood and adolescent disorders (2nd ed., pp. 265-292). New York: Wiley.
Reynolds, C.R., & Kamphaus, R.W. (1992). Behavior Assessment System for Children. Circle Pines, MN: American Guidance Service.
Reynolds, C.R., & Richmond, B.O. (1985). Revised Children's Manifest Anxiety Scale. Los Angeles: Western Psychological Services.
Sandoval, J., & Echandia, A. (1994). Behavior assessment system for children. Journal of School Psychology, 32, 419-425.
Schwanz, K.A., & Kamphaus, R.W. (1997). Assessment and diagnosis of ADHD. In W.N. Bender (Ed.), Understanding ADHD: A practical guide for parents and teachers (pp. 81-106). Upper Saddle River, NJ: Merrill/Prentice-Hall.
Vaughn, M.L., Riccio, C.A., Hynd, G.W., & Hall, J. (1997). Diagnosing ADHD subtypes: Discriminant validity of the Behavior Assessment System for Children (BASC) and the Achenbach parent and teacher rating scales. Journal of Clinical Child Psychology, 26, 349-357.


For Abby, Katie, and Shelby


Chapter 20
Assessing Adolescent Drug Use with the Personal Experience Inventory

Ken C. Winters
William W. Latimer
Randy D. Stinchfield
University of Minnesota, Minneapolis

George Henly
University of North Dakota

Alcohol and other drug use among American teenagers has persisted as a public health concern over the past 25 years. Despite some leveling during the early 1990s, the frequency of drug use reported by high school students is still disturbingly high, and has even increased over the past 3 years (National Institute on Drug Abuse, 1997). The drug abuse treatment community has responded by creating developmentally appropriate services for drug-abusing youth (Rahdert, 1991). As a result, adolescents who abuse drugs are being identified at earlier stages of involvement than ever before, by a variety of professionals and in treatment service settings that are more specialized to adolescent problems. As adolescents enter the health care system at younger ages and with more variable problems, and given the greater diversity of professionals working in the field, the need for standardized and objective measures of adolescent drug abuse is heightened. The assessment field is further burdened by the demands of insurers, who, in the face of mounting costs, are requiring service providers to document more fully and better justify the need for treatment.

Research Goals and Applications

Research Goals

The development of the Personal Experience Inventory (PEI; Winters & Henly, 1989) was prompted by the view that assessment practices in the field in the 1980s were inadequate to meet the growing challenges in the service sector. Thus, the PEI was intended to provide clinicians and researchers with a comprehensive and standardized self-report inventory to assist in the identification, referral, and treatment of problems associated with teenage alcohol and drug abuse. To accomplish these goals, a norm-based instrument was to be developed, consisting of scales with sufficient reliability and validity that reflect multiple facets of adolescent substance use and related (coexisting) problems.


The goal of measuring comorbid problems addressed an important instrumentation gap at the time of project inception. Teenage drug use behaviors are complex: Their presence may cause other problems, or may be a symptom of another primary behavioral or mental disorder. Such problems may require treatment in their own right, or they may appreciably change the focus and nature of recommended treatment. Psychosocial risk factors may serve as predisposing or precipitating variables that are important in the initiation of substance use, or these factors may contribute to continued use. Assessing these related factors should be a vital part of any substance abuse assessment procedure. Thus, the position taken was to view adolescent drug abuse and related problems within the context of a broad array of psychological, social, and contextual behaviors, attitudes, consequences, and symptoms.

Applications

The PEI is intended to characterize the adolescent respondent according to (a) the severity of psychological and behavioral involvement with drugs, including alcohol; (b) the nature and style of the drug use (e.g., consequences, personal effects, setting variables, polydrug use); (c) the onset, duration, and frequency of use for 12 major psychoactive substance categories; (d) the presence of psychosocial risk and protective factors, with a focus on variables thought to be precipitating or resiliency factors in adolescent drug use severity; (e) the existence of co-occurring behavioral and psychiatric problems that may accompany drug use; and (f) the extent and nature of invalid response tendencies (e.g., faking bad, faking good). Whereas the main purpose of the PEI is to address the clinical need related to treatment referral, the questionnaire also can be used as an evaluation tool in pre- and posttreatment assessment. The use of the PEI for these purposes is discussed later in the chapter.

Scale Descriptions

The PEI consists of two parts.
The Problem Severity section (Part I) consists of 153 questions organized into five Basic scales, five Clinical scales, two validity indices, and a set of questions concerning drug use frequency/duration and age of onset. This section begins with 80 items addressing drug use experiences. Response options follow either a 4-point (never/once or twice/sometimes/often) or a 3-point (never/once or twice/more than once or twice) format. A set of 16 yes-no items follows, 11 of which measure defensiveness. The next 14 items measure general frequency of drug use; these are followed by 37 items that address frequency and duration of drug use at 3 months, 1 year, and lifetime for 12 specific classes of drugs. The final six questions concern the onset of initial and regular use of alcohol, marijuana, and other drugs.

The Psychosocial section (Part II) consists of 147 items, which are divided into eight Personal Adjustment scales, five Environment scales, six Problem screens, and two validity indices. These items have either a 4-point (strongly disagree/disagree/agree/strongly agree; or seldom or never/sometimes/often/almost always) or a 3-point response format (never/once or twice/more than once or twice).

The reading level of the PEI is approximately sixth grade, based on the procedure developed by Fry (1977). An effort was made to construct items that have short sentences and to avoid complicated double negatives. The definitions and number of items for each scale are provided below.

Part I: Problem Severity Section

Basic Scales

1. Personal Involvement with Chemicals (29 items). This scale measures the degree of psychological involvement with drug use. High scores indicate frequent use at times and in settings that are inappropriate for drug use; use for psychological benefit or self-medication; and restructuring activities to accommodate use. Low scorers report relatively infrequent use and involvement limited primarily to social and recreational settings.
2. Effects from Use (10 items). The items of this set measure the immediate psychological, physiological, and behavioral effects of using chemicals, most of which refer to negative or aversive states and feelings.
3. Social Benefits of Use (8 items). This scale reflects increased perception of social affiliation, competence, and peer acceptance as a result of drug use.
4. Personal Consequences of Use (11 items). Items in this set primarily focus on difficulties with friends, parents, school, and other social institutions resulting from substance use. Some items pertain to behavioral changes in the individual that may be related to these consequences.
5. Polydrug Use (8 items). The items defining this scale are all indicators of use of drugs other than alcohol, including marijuana, stimulants, tranquilizers, quaaludes or downers, cocaine, hallucinogens, and heroin or other opiates.

Clinical Scales

6. Transsituational Use (9 items). This set of items represents use in a variety of physical settings, particularly those that are inappropriate for drug use (e.g., school), and use across a range of temporal settings (e.g., early morning).
7. Psychological Benefits of Use (7 items). These items suggest the use of drugs to reduce negative emotional states (e.g., loneliness, depression, boredom, and anxiety) and use related to enhancing pleasurable affect.
8. Social-Recreational Use (8 items). Items from this scale are associated with use of drugs to enhance social situations and peer interactions.
9. Preoccupation (8 items). High scorers report preplanning future use, restructuring activities to promote private or social use, and rumination about use.
10. Loss of Control (9 items). This set of items is associated with the inability to abstain from chemical use, or difficulty using in moderation when chemicals are available.

Validity Indices

11. Infrequency (7 items). These items refer to extremely unlikely drug use behavior and thus are expected to show very low rates of endorsement. High scorers may be "faking bad," displaying inattention, or randomly responding.
12. Defensiveness (15 items). The basis for this set of items was the 33-item Marlowe-Crowne Social Desirability Scale (Crowne & Marlowe, 1960), a frequently used measure of defensiveness or social desirability. The items were modified slightly to make them appropriate for an adolescent population.

Drug Use Frequency/Duration and Age of Onset

Twelve items in this section summarize frequency of use of 12 categories of drugs during lifetime, the last 12 months, and the last 3 months. These items are similar to those used in national surveys of American high school seniors (Johnston, Bachman, & O'Malley, 1985). The final 6 items in Part I inquire about the school grade at which the respondent first used and regularly used alcohol, marijuana, and other drugs.

Part II: Psychosocial Scales

Personal Adjustment Scales

1. Negative Self-Image (10 items). This scale reflects lack of self-esteem and self-regard, personal dissatisfaction, and feelings of incompetence.


2. Psychological Disturbance (10 items). Items from this scale are associated with psychological problems and distress, such as difficulties with mood and thinking, and physical signs of distress.
3. Social Isolation (8 items). This scale taps perception of social discomfort and incompetence, and feelings of mistrust toward others.
4. Uncontrolled (12 items). These items focus on the tendency to act out, to display anger and aggressiveness, and to defy authority figures and rules.
5. Rejecting Convention (11 items). The items in this set measure the extent to which the individual does not endorse traditional beliefs about right and wrong. Items ask about attitudes toward lying, breaking rules, stealing, and oppositional behavior.
6. Deviant Behavior (10 items). High scores on this scale suggest actual participation in unlawful, delinquent, or oppositional behavior (e.g., hitting a teacher, breaking into a home).
7. Absence of Goals (11 items). Elevated scores on this scale represent lack of planning or thinking about future plans, goals, and expectations, including finishing school and career attitudes.
8. Spiritual Isolation (7 items). High scores on this scale point to lack of belief in a spiritual life or force, few spiritual experiences, and little use of prayer.

Family and Peer Environment Scales

9. Peer Chemical Environment (8 items). The items defining this scale indicate involvement with chemicals by one's peers.
10. Sibling Chemical Use (4 items). This set represents chemical use by brothers or sisters.
11. Family Pathology (14 items). Items from this scale are associated with family problems of chemical dependency, physical or sexual abuse, and severe family dysfunction.
12. Family Estrangement (9 items). This scale reflects lack of family solidarity and closeness, and the presence of parent-child conflict.

Screens for Other Problems

Brief screens are provided for the following problem areas:

13. Family Chemical Abuse (Parental and sibling chemical abuse).
14. Physical Abuse (Intrafamilial physical abuse).
15. Sexual Abuse (Intrafamilial and other source of sexual abuse).
16. Eating Disorders (Bulimia and anorexia nervosa).
17. Need for Psychiatric Referral (Signs that suggest need for psychiatric evaluation).
18. Suicide Potential (Indications of serious suicidality).

Validity Indices

19. Infrequent Responses (11 items). These items have either very low or very high rates of endorsement and may reflect "faking bad," inattention, or random responding (e.g., "I like sunny days"; "I would rather lose a game than win").
20. Defensiveness (12 items). This is another subset of items drawn from the Marlowe-Crowne Social Desirability Scale and adapted for use with an adolescent population.

Test Administration

Examinee Characteristics

The intended age range for the questionnaire is from 12 to 18 years. Although the test has been administered to older individuals, this is not generally recommended because the test's development and validity data are based on adolescents. Some professionals may consider using the PEI with general population subjects. In most cases, the low base rate of excessive drug involvement in a general adolescent population would argue against use of the Problem Severity section with this group. However, the instrument's Psychosocial scales may be relevant for general population administration because clinical and nonclinical score distributions on many of these scales are not markedly different. Finally, great caution should be taken when the PEI is administered to populations whose demographic and background characteristics differ substantially from those of the samples used in the validity analyses.

Materials and Scoring

The PEI can be administered in paper-and-pencil or computer formats (with materials marketed by Western Psychological Services). For paper-and-pencil administration, the six-page PEI test booklet is needed. Computer administration involves use of an IBM-compatible diskette. Score reports of the PEI are computer generated and consist of standardized scores (based on both drug clinic and nonclinical standardization samples), narrative descriptions for each scale, and summaries of responses to the drug use frequency and screening items. To assist with treatment referral decisions, an overview of residential treatment indicators is provided.

Scale Construction

Problem Severity Scales

Development efforts began by assembling a pool of 600 Problem Severity items from existing instruments that represented a broad range of drug abuse problem indicators. In reviewing the relevant literature, a diverse range of models and theoretical approaches to drug involvement were considered, including the Alcoholics Anonymous, learning, social-based, and psychiatric models. This approach to defining the target constructs was oriented toward searching for characteristics and symptoms reflecting problems that precede or result from drug involvement.

The process of developing the Problem Severity scales employed both rational and empirical strategies (see Henly & Winters, 1988, for more details). Under the rational approach, items were assigned a priori to scales, and then reassigned or deleted based on their correlations with their own scale and with other scales so that scale reliability and independence could be maximized.
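The rational refinement step described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the response data, the two scales, and the helper names `pearson` and `item_scale_correlations` are hypothetical and are not taken from the PEI development work. For each item, the code correlates the item with the rest of its own scale (the item itself excluded) and with the total of another scale; an item that correlates more strongly with the other scale would be a candidate for reassignment or deletion.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_scale_correlations(own_items, other_items):
    """Return (own-scale r, other-scale r) for each item in own_items.

    Each item is a list of responses, one per respondent. The own-scale
    total excludes the item being examined, so the item does not inflate
    its own correlation.
    """
    n_resp = len(own_items[0])
    other_total = [sum(item[i] for item in other_items) for i in range(n_resp)]
    results = []
    for k, item in enumerate(own_items):
        rest_total = [
            sum(it[i] for j, it in enumerate(own_items) if j != k)
            for i in range(n_resp)
        ]
        results.append((pearson(item, rest_total), pearson(item, other_total)))
    return results


# Hypothetical responses from six respondents (not PEI data).
scale_a = [
    [1, 2, 3, 4, 2, 3],
    [1, 2, 4, 4, 2, 3],
    [2, 2, 3, 4, 1, 3],
    [4, 1, 3, 2, 2, 4],  # assigned a priori to scale A, but tracks scale B
]
scale_b = [
    [4, 1, 3, 1, 2, 4],
    [3, 1, 4, 2, 2, 4],
]

# Flag items that correlate more strongly with the other scale than their own.
flags = [own < other for own, other in item_scale_correlations(scale_a, scale_b)]
print(flags)
```

In this made-up example only the fourth item is flagged, which is the pattern that would prompt a second look at its scale assignment.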
For the empirical approach, a variety of factor and cluster analytic procedures were employed in an effort to define relatively independent dimensions that emerged reliably across methods. Both procedures identified dimensions or constructs that were quite similar, and from the standpoint of psychometric adequacy, only modest differences between the two sets of scales existed. The relatively brief empirical scales had reliability estimates that approached or exceeded those of their longer rational counterparts, which is to be expected in view of the empirical scale construction methods employed.

Psychosocial Scales

A great deal of research effort has been devoted to understanding the role of psychosocial factors in the etiology and maintenance of adolescent drug use (Glantz & Pickens, 1992). An extensive review of this research literature was conducted in order to identify an appropriate list of psychosocial risk factors. Studies reviewed included cross-sectional and longitudinal studies of high school samples, and adult retrospective and prospective studies of prealcoholic characteristics. The set of identified variables was supplemented by factors suggested by consultants and service providers. Over 200 new items were written to reflect the identified factors and formatted to have either three or four response options: never/once or twice/more than once or twice; strongly disagree/disagree/agree/strongly agree; or seldom or never/sometimes/often/almost always. The following were added to these 232 items: 31 items adapted from the Marlowe-Crowne Social Desirability Scale (Crowne & Marlowe, 1960); 14 infrequency items expected to have rates of endorsement of less than 10% or greater than 90%; 11 psychosocial stressor items reflecting life events that might be important contextual factors in interpreting questionnaire responses, and that were viewed by the service providers as important clinical phenomena in their own right (e.g., intrafamilial sexual abuse); and a 28-item drug use frequency checklist assessing frequency of use for nine classes of substances over lifetime, 12 months, and 3 months.

As detailed elsewhere (Henly & Winters, 1989), scale scores, reliability estimates (coefficient alpha), and item-scale correlations were computed for each a priori scale based on results from a drug clinic sample. Twelve scales having acceptable levels of reliability (α > .70) and independence (proportion of unique, reliable variance ≥ .25) emerged from the scale analyses.

Problem Screens

Specific criterion groups related to problem areas of interest were defined according to staff ratings on each client at the participating agencies. Ratings were recorded on a problem checklist and were based on interviews with the client, parents, and school informants, and on any in-house questionnaire administered routinely to adolescent clients. The specific criterion groups defined were eating disorders, intrafamilial physical abuse, intrafamilial sexual abuse, family (parent or siblings) history of substance abuse/dependence, suicidal behavior, and need for psychiatric referral.
Psychosocial items relevant to each problem area were tested statistically for their discriminative value, and additive or configural decision rules were established that optimized the discrimination of the criterion groups identified from drug clinic samples.

Drug Use Frequency Checklist

The development of the PEI scales also included a drug use frequency checklist adapted from instruments used in national surveys of high school seniors (e.g., Johnston et al., 1985). Frequency of drug use for 12 drug categories (from alcohol to inhalants) during lifetime, the last 12 months, and the last 3 months was included (the latter time frame was chosen rather than the more standard 1-month period as a clinical consideration).

Reliability and Validity of the PEI

Samples and Procedures

Since the mid-1980s, the authors of the PEI have been collecting psychometric data from three types of sites: drug clinic, juvenile detention, and nonclinical (school). Participation in PEI research by an agency or program required an agreement from the administration that they either allow research staff to collect the data or have their staff trained in PEI testing procedures; collect additional data, such as concurrent measures; follow informed consent and confidentiality procedures; and provide all data to the research staff at the completion of their commitment.

The drug clinic data have been collected from over 45 drug abuse evaluation or treatment programs across the United States and Canada, representing a range of service settings (residential and nonresidential, 30-day and longer term, short-stay evaluation, hospital-based and freestanding) and locales (rural and urban). The juvenile offender samples were obtained from eight sites: four residential, state-operated facilities that house adolescent felons and those in need of supervision; one drug education program for teenagers convicted of minor charges related to drug use; and three state drug evaluation programs that received referrals from the court. The school samples were collected from three school districts (two urban, one rural); here participants were drawn from targeted classrooms believed to be representative of students within the participating school districts. Data collection for the drug clinic and juvenile offender samples has been an ongoing process since 1985; the school data were collected during a 12-month period (1986-1987).

The clinical samples (drug clinic and juvenile offender) were recruited by staff at the participating facilities. Adolescents who agreed to take the test were assured anonymity and confidentiality. For minors to participate, parental consent was also required. Test administrators (either facility employees or research staff) who administered the PEI received detailed and standard instructions, and they were responsible for collecting other data and questionnaires. Subjects with obvious signs of intoxication, withdrawal, or other cognitive or learning impairments were screened from testing.
If staff at the drug clinic or juvenile offender sites provided ratings and diagnoses, they were blind to PEI results. Drug clinic subjects were tested as soon as was practical after initial client contact, which was usually 2 or 3 days after intake. Juvenile offender subjects were tested prior to participation in any drug rehabilitation program. Alternate drug use questionnaires were administered concurrently with the PEI. The Minnesota Multiphasic Personality Inventory (MMPI) was administered independently of the PEI, but usually within 2 to 3 days of PEI administration. A small sample of parents of adolescents who had completed the PEI was administered a questionnaire about their child's behavior and experiences. Staff at participating programs or agencies provided client information (problem severity ratings, treatment history, referral recommendations, and discharge status) on a summary data form.

Provided next is an overview of representative findings from this series of studies. Interested readers can turn to the literature for more complete descriptions of these PEI psychometric studies (Henly & Winters, 1988, 1989; Winters, Stinchfield, Henly, & Schwartz, 1991; Winters, Stinchfield, & Henly, 1993, 1996).

Internal Consistency. Data on the internal consistency of the PEI scales were computed for the school, drug clinic, and juvenile offender samples as a function of gender and ethnic background. Internal consistency was also examined for two age groups (young = 12-15 years; old = 16+ years). The total internal consistency sample consisted of 6,759 subjects (6,066 drug clinic/juvenile offenders and 693 school subjects). Table 20.1 provides a summary of the reliability (coefficient alpha) data across ethnic groups. Results for the other subgroups (setting, gender, age group) are essentially indistinguishable. Good to excellent alpha coefficients were obtained across all samples.
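For readers who want to see how a coefficient alpha of the kind reported here is obtained, the following is a minimal sketch using the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the response data are hypothetical illustrations, not the authors' analysis code or PEI data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha for a scale.

    items: list of items, each a list of responses (one per respondent).
    Uses sample variances for both the items and the total scores.
    """
    k = len(items)
    if k < 2:
        raise ValueError("coefficient alpha needs at least two items")
    # Total score per respondent: sum across items.
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical 3-item scale answered by six respondents (not PEI data).
hypothetical_items = [
    [1, 2, 3, 4, 2, 3],
    [1, 2, 4, 4, 2, 3],
    [2, 2, 3, 4, 1, 3],
]
alpha = cronbach_alpha(hypothetical_items)
print(round(alpha, 2))
```

Because alpha depends on the ratio of item variance to total-score variance, scales with few items or restricted score ranges tend to yield lower values, which is worth keeping in mind when comparing coefficients across scales of different lengths.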
The range (and median) of alpha coefficients for the two main groups of PEI scales are as follows: Basic Problem Severity, .76-.97 (median = .88); Psychosocial, .59-.88 (median = .82). The internal consistency estimates for the Response Distortion scales (not shown in table) are somewhat lower than those for the substantive scales, but this is to be expected because they have smaller standard deviations. The data also indicate that scale internal consistency estimates are quite consistent across subgroups defined by site (school, drug clinic, and offender), sex, and age group. In addition, these favorable results suggest that the PEI was developed by proper sampling of the content domains, and that the individual PEI scales are composed of items that appear to be measuring a common attribute.

TABLE 20.1
Internal Consistency Coefficients for PEI Scales by Ethnic Groups

                                       White         Hispanic    African American  American Indian  Asian American
Scale                                  (N = 4,236)   (N = 996)   (N = 740)         (N = 253)        (N = 94)
Basic Scales
  Personal Involvement with Chemicals  .97           .97         .97               .97              .97
  Effects from Use                     .91           .89         .89               .89              .89
  Social Benefits                      .88           .84         .86               .83              .86
  Personal Consequences                .90           .87         .86               .87              .86
  Polydrug Use                         .87           .85         .76               .85              .80
Psychosocial Scales
  Negative Self-Image                  .84           .77         .67               .73              .64
  Psychological Disturbance            .83           .81         .78               .83              .82
  Social Isolation                     .70           .59         .60               .66              .69
  Uncontrolled                         .88           .88         .85               .85              .84
  Rejecting Convention                 .75           .71         .70               .69              .70
  Deviant Behavior                     .85           .86         .83               .87              .87
  Absence of Goals                     .82           .81         .82               .82              .83
  Spiritual Isolation                  .86           .82         .78               .81              .78
  Peer Chemical Environment            .83           .76         .73               .74              .80
  Sibling Chemical Use                 .85           .87         .81               .82              .86
  Family Pathology                     .83           .81         .82               .81              .81
  Family Estrangement                  .84           .80         .74               .79              .72

Note. Obtained results are based on composite data across three sites (drug clinic, juvenile offender, school).

Test-Retest Reliability. A subset of drug clinic and school subjects participated in a test-retest evaluation of the PEI. Due to second testing refusal or absenteeism, or to errors in identification codes, participation rates at retest were less than 100%. Averaging across the test-retest samples, about 10% of the test cases could not be matched with retest data.
Test-retest data were collected on two drug clinic samples (1-week interval, n = 59; 1-month interval, n = 44), a waiting list sample (1-month interval, n = 46), and a school sample (1-month interval, n = 123). The results suggest that the temporal stability of the PEI scales varies as a function of the retest interval length and intervening experiences. In general, scores were more stable for subjects who did not receive formal drug clinic services during the period between the two testings (i.e., the 1-month drug clinic waiting list sample and the school sample). The range of stability coefficients was .65 to .96 (median = .84) for the 1-month drug clinic waiting list sample and .65 to .90 (median = .79) for the school sample. Within the drug clinic samples, greater score stability was observed in the 1-week retest group than in the 1-month retest group, which is to be expected from test-retest theory.

Convergent Validity. Convergent validity evidence for the PEI scales documents the extent to which scale scores are associated with alternate measures of similar or related constructs. Data from the client and other informants (research staff, counselor, and parent/guardian) are presented in Table 20.2.

TABLE 20.2
Convergent Validity of the PEI Basic Scales

                                                       PEI Basic Scale
Rating source and measure          Personal     Effects    Social    Personal      Polydrug
                                   Involvement  from Use   Benefits  Consequences  Use
Client Rating
  AAIS (N = 224)                   .59          .56        .49       .59           .45
  ADS (N = 83)                     .59          .60        .53       .60           .35
  DUF (N = 140)                    .76          .63        .53       .66           .72
  Legal Problems (N = 115)         .43          .25        .28       .58           .44
Research Staff Rating
  SUD (N = 140)                    .82          .65        .63       .72           .74
Counselor Rating
  Referral (N = 140)               .88          .76        .66       .80           .81
  Consequences of Use (N = 140)    .68          .55        .47       .58           .67
  Symptoms (N = 140)               .74          .58        .57       .60           .70
  Global Rating (N = 140)          .79          .68        .57       .77           .75
Parent Rating
  Consequences of Use (N = 140)    .26          .23        .13       .19           .20

Note. AAIS = Adolescent Alcohol Involvement Scale; ADS = Alcohol Dependence Scale; DUF = Drug Use Frequency (aggregate across 12 drugs); SUD = Substance Use Diagnosis (1 = no diagnosis, 2 = abuse only, 3 = at least one dependence); Legal Problems, Consequences of Use, Symptoms, and Global Rating refer to rating forms described in Winters et al., 1996; Referral = Recommended level of drug treatment (1 = no drug treatment, 2 = drug treatment). All correlations are significant at the p

Correlation data were obtained between the Basic Problem Severity scales and the following self-report problem severity measures: the Adolescent Alcohol Involvement Scale (AAIS; Mayer & Filstead, 1979), the Alcohol Dependence Scale (ADS; Horn, Skinner, Wanberg, & Foster, 1982), a composite frequency score derived from a standard drug use frequency (DUF) checklist (Johnston et al., 1985), admitted legal problems, and substance use disorder diagnoses (no diagnosis vs. abuse vs. dependence) based on results from a structured interview (Winters & Henly, 1993). The results indicate that the PEI Problem Severity scales are highly correlated with these related measures. The magnitude of these correlations (.25-.76, median = .57) indicates that the PEI Basic scales reflect, to a large degree, the same construct measured by these questionnaires.

Based on semistructured intake interviews, ratings indicative of client drug involvement were obtained: global rating, consequences of drug use, drug use symptoms (based on Heilman's, 1973, signs of chemical dependency), substance use diagnosis, and referral recommendation (no drug treatment vs. drug treatment). Also, the designated parent/guardian completed the same consequences checklist at intake. Counselor and research staff ratings converged highly with the PEI; all of these ratings had rs of .47 or higher (median = .68). Perhaps not surprisingly, the parent rating of consequences yielded lower correlations (rs = .13-.26). Nevertheless, the significant and relatively large correlations between the PEI Basic scales and other informant ratings indicate that the PEI Problem Severity scales are not just congruent with other self-report measures.

Correlations between PEI Psychosocial scales and scales of the MMPI were computed. PEI scales that measure personal and interpersonal adjustment generally showed significant relations with MMPI scales; those PEI scales reflecting values and characteristics of others showed little relation to MMPI scales. In order to better define relations between MMPI scales and relevant PEI scales, an interbattery factor analysis was conducted. As part of this analysis, a set of revised orthogonal scales was created for each inventory. Each of the orthogonal scales relates maximally to an existing scale, but is statistically independent of the other orthogonal scales from its instrument. (Correlations between the original and corresponding orthogonal PEI scales ranged from .87 to .98; those for the MMPI orthogonal scales ranged from .69 to .91.) Four statistically significant interbattery factors were extracted and rotated to a univocal varimax solution. Loadings of the orthogonalized scales on the interbattery factors are presented in Table 20.3.

Factor I may be hypothesized to represent tendencies toward acting-out behavior, given the salient positive loadings for orthogonalized Rejecting Convention (PEI), Deviant Behavior (PEI), and Hypomania (MMPI), and a negative loading for Hysteria (MMPI). Factor II may reflect global psychological distress, in view of the substantial loading of the Psychological Disturbance scale (PEI) and moderate loadings for six of the nine MMPI scales. Factor III appears to be concerned with adolescent/parent conflict, based on salient loadings for the Family Estrangement (PEI) and Psychopathic Deviate (MMPI) scales (the latter often reflects family conflicts in adolescents). Factor IV, on which the Social Isolation (PEI) and Social Introversion (MMPI) scales have salient loadings, may be interpreted as reflecting feelings of loneliness or social inadequacy.

Discriminant Validity. To investigate the ability of the PEI to discriminate between meaningfully defined groups, adolescents were compared as a function of group status (drug clinic, juvenile offender, and normal; see Table 20.4). The analysis of variance (ANOVA) revealed three main results: mean scores on the Problem Severity and Psychosocial scales were virtually identical for the drug clinic and juvenile offender groups, except for a significantly higher Deviant Behavior mean score for the juvenile offender group;

TABLE 20.3
Selected PEI Psychosocial Scales Versus MMPI Scales

Interbattery Factor Loading (I, II, III, IV)

Orthogonalized PEI Scale
  Negative Self-image              .32  .33
  Psychological Disturbance        .79
  Social Isolation                 .79
  Uncontrolled
  Rejecting Convention             .33
  Deviant Behavior                 .66
  Absence of Goals                 .33
  Family Estrangement              .68
Orthogonalized MMPI Scale
  1 Hypochondriasis (Hs)           .30  .32
  2 Depression (D)                 .34
  3 Hysteria (Hy)                  .40  .34  .38
  4 Psychopathic Deviate (Pd)      .79
  6 Paranoia (Pa)                  .33
  7 Psychasthenia (Pt)             .45
  8 Schizophrenia (Sc)             .39
  9 Hypomania (Ma)                 .57
  0 Social Introversion (Si)       .70

Note. Loadings < .30 in absolute value are omitted.

TABLE 20.4
PEI Scale Standardized (T) Scores as a Function of Group Membership

                                      School      Juvenile    Drug
                                                  Offenders   Clinic
                                      (N = 567)   (N = 160)   (N = 889)
PEI Scale                             M           M           M
Basic Scales
  Personal Involvement with Chemicals 34.57       49.32       50.10
  Effects from Use                    38.36       50.09       50.22
  Social Benefits                     40.07       49.69       50.19
  Personal Consequences               38.35       51.58       50.00
  Polydrug Use                        34.64       50.25       49.87
Psychosocial Scales
  Negative Self-image                 42.12       51.11       49.95
  Psychological Disturbance           45.96       50.88       49.81
  Social Isolation                    47.71       53.06       49.98
  Uncontrolled                        42.74       48.86       50.10
  Rejecting Convention                48.00       49.79       49.85
  Deviant Behavior                    34.79       57.57       49.93
  Absence of Goals                    43.67       50.35       49.79
  Spiritual Isolation                 46.22       51.06       49.92
  Peer Chemical Environment           39.03       49.91       50.00
  Sibling Chemical Use                47.57       52.27       50.48
  Family Pathology                    41.99       52.57       49.81
  Family Estrangement                 40.63       47.11       49.95

Note. Due to group inequity in terms of sex and age distribution, PEI scale scores are reported in standardized form. All F ratios are significant at the p < .001 level. From Winters and Henly (1989).

the normal group had significantly lower mean scores on all the Problem Severity scales compared to the drug clinic and juvenile offender groups; and the normal group had mean scores on the Psychosocial scales that more closely resembled those of the other two groups, particularly with respect to Social Isolation, Rejecting Convention, and Sibling Chemical Use.

The between-group analysis for diagnosis was performed on only the Basic Problem Severity scales because they were developed with the expectation of discriminating groups according to severity of substance use diagnosis. The results of the ANOVA in Table 20.5 indicated that diagnosis group mean scores on each Basic scale differed significantly (p < .01), with mean differences ordered as expected (dependence > abuse > no diagnosis). Post hoc contrasts (Student-Newman-Keuls) yielded significant pairwise differences across all three groups on three of the five Basic scales. Scores on Effects from Use and Social Benefits for the abuse diagnosis group did not significantly differ from those of the no diagnosis group (although the small sample size of the abuse group provides a less than optimal comparison to the other groups).

TABLE 20.5
Between-Group Analysis of PEI Basic Scales and Diagnostic Groups

                        Diagnostic Group Scores (M ± SD)
                        No              Abuse           Dependence
                        Diagnosis       Diagnosis       Diagnosis
PEI Scale               (n = 48)        (n = 7)         (n = 60)        F
Personal Involvement    35.9 ± 6.6      40.1 ± 4.2      51.3 ± 8.8      53.2
Effects from Use        39.8 ± 8.3      41.6 ± 7.0      52.4 ± 9.5      28.0
Social Benefits         39.7 ± 7.5      46.6 ± 4.4      50.2 ± 9.7      19.7
Personal Consequences   39.1 ± 5.9      43.6 ± 7.5      51.5 ± 9.3      32.8
Polydrug Use            38.3 ± 6.5      42.0 ± 4.9      51.9 ± 9.8      34.8

Note. All univariate F ratios reported are significant at the p < .001 level.

The previous findings offer evidence as to the discriminant validity of the PEI. For the diagnostic comparisons, discrimination was found among groups, particularly between the abuse and dependence groups. Also, the widespread differences between the drug clinic and normal groups support the PEI's validity. It was somewhat surprising to find that the drug clinic and juvenile offender groups had comparable mean scores. However, officials from the participating juvenile detention centers felt the scores were not unexpected; they generally agreed that heavy drug use and personal and family problems are common to the juvenile offender population. Furthermore, the literature offers similar conclusions. Jessor, Donovan, and Costa (1991) found that the causal relations between various psychosocial variables and alcohol and marijuana abuse were essentially identical to those associated with other forms of problem behavior. Similarly, Kandel (1978), in a review of findings from eight major longitudinal studies of adolescent drug use, concluded that attitudes and behavior associated with delinquent teenagers typically precede rather than result from drug involvement. Recent descriptive studies of youth in detention centers indicate that they report multiple problems and risk factors (Dembo et al., 1990).

Predictive Validity: Treatment Involvement. Research on the PEI's ability to predict future behavior has been directed at the Basic scales' association with treatment variables. The issue regarding the use of the PEI in predicting treatment outcome is complex

enough that the topic is addressed in a separate section. The association of PEI and treatment retention and participation variables is discussed later. The predictive validity results for the treatment involvement variables are summarized in Table 20.6.

TABLE 20.6
Association of Intake PEI Basic Scales with Treatment Involvement Variables

                         Attended     Frequency of     Attended    Frequency of
                         Drug         Attendance at    Aftercare   Attendance at
                         Treatment    Drug Treatment               Aftercare
Intake PEI Scale         (N = 140)    (N = 140)        (N = 140)   (N = 140)
Personal Involvement     .39          .35              .31         .20
Effects from Use         .34          .27              .26         .17
Social Benefits          .26          .28              .26         .13
Personal Consequences    .26          .28              .29         .18
Polydrug Use             .40          .33              .36         .28

Note. All correlations (Pearson's r) are significant at the p < .05 level except those underlined. The categorical variables are coded as follows: Counselor's Treatment Referral Decision (1 = no treatment, 2 = drug education, 3 = outpatient treatment, 4 = inpatient treatment), Attended Drug Treatment (1 = no, 2 = yes), and Attended Self-help Aftercare (1 = no, 2 = yes).

Moderately positive coefficients were obtained for the four treatment involvement variables (mean r = .28; range r = .13-.40), suggesting that as drug use problem severity increases, client involvement in treatment tends to increase somewhat as well. The results also indicated a slight tendency for the size of the validity coefficients to decrease as the predicted variable became more temporally distant from the intake PEI measures. For example, the PEI scales had a mean r of .33 with attendance at drug treatment, in contrast to a mean r of .19 with frequency of attendance at aftercare. Based on the limited literature in this area, sizable correlations between client report of problem severity and treatment involvement variables should not be expected.
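As a check on the summary figures quoted for Table 20.6, the following minimal Python sketch recomputes them from the tabled correlations. It assumes that the reported means are simple arithmetic averages of the coefficients (no Fisher z transformation), which the source does not state; the variable and function names are illustrative only.

```python
from statistics import mean

# Column order used for every row below (taken from Table 20.6).
COLUMNS = [
    "attended drug treatment",
    "frequency of attendance at drug treatment",
    "attended aftercare",
    "frequency of attendance at aftercare",
]

# Correlations transcribed from Table 20.6; one row per PEI Basic scale.
TABLE_20_6 = {
    "Personal Involvement": [0.39, 0.35, 0.31, 0.20],
    "Effects from Use": [0.34, 0.27, 0.26, 0.17],
    "Social Benefits": [0.26, 0.28, 0.26, 0.13],
    "Personal Consequences": [0.26, 0.28, 0.29, 0.18],
    "Polydrug Use": [0.40, 0.33, 0.36, 0.28],
}


def column_means(table):
    """Mean r for each treatment involvement variable (simple average)."""
    return [round(mean(col), 2) for col in zip(*table.values())]


def overall_summary(table):
    """Return (mean r, smallest r, largest r) across every cell."""
    cells = [r for row in table.values() for r in row]
    return round(mean(cells), 2), min(cells), max(cells)


if __name__ == "__main__":
    for name, value in zip(COLUMNS, column_means(TABLE_20_6)):
        print(f"{name}: mean r = {value:.2f}")
    print(overall_summary(TABLE_20_6))
```

Under that assumption the column means come out at .33, .30, .30, and .19, and the overall mean at .28 with a range of .13 to .40, which agrees with the values cited in the text.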
Engagement in treatment is likely affected by a variety of factors. However, the modest correlations obtained suggest a tendency among youth who acknowledge more severe problems to attend treatment compared to those who report less severe problems. The extent to which adolescent drug treatment participation is mediated by other factors is an important line of research that has received a great deal more attention in the adult literature. A potentially fruitful research direction is to understand the role of motivational factors and stages (e.g., Prochaska & DiClemente, 1992) as they pertain to seeking and attending drug treatment. There have been some recent efforts to develop treatment readiness measures for adolescent drug abusers that incorporate motivational constructs (e.g., Cady, Winters, Jordan, Solberg, & Stinchfield, 1996).

Use of the PEI for Treatment Referral

Overview of Treatment Planning Issues

The primary aim of an effective treatment plan is to develop a set of interrelated treatment strategies tailored to the unique assets and problems of the client. Such plans are thus best considered within a larger context defined by ongoing relations between
assessment, treatment planning, and treatment outcome. This section describes how clinicians may use the PEI as one source of information to assist with the development of an individualized treatment plan. Theoretical issues central to treatment planning as well as current research in this area are presented to provide a context for how to use the PEI as a treatment planning tool.

Determining the Appropriate Treatment Across a Continuum of Care

Descriptions of the continuum of care available to treat adolescent drug abusers typically describe four basic levels: outpatient, partial hospitalization, residential, and inpatient treatment programs (Margolis, 1995; Muisener, 1994). The primary determinants of placement within a given level of care are the nature and severity of the adolescent's drug use and related problems, and the degree and quality of support present in the adolescent's environment. The former dimension concerns the adolescent's drug use problem severity, medical status, and psychosocial (including psychiatric) functioning. The environmental dimension concerns the behaviors, attitudes, and psychiatric status of family members, and the quality of interpersonal relationships with other adults and peers who play key roles in the adolescent's life.

Outpatient Treatment. Outpatient treatment may include a range of services and levels of intensity that do not provide overnight stay. One primary indication for outpatient treatment is the presence of a stable interpersonal environment. Outpatient treatment may also be indicated when the adolescent presents with no significant psychiatric or medical problems while possessing sufficient resources to take advantage of services with inherently less structure when compared to residential treatment.
Outpatient treatment for adolescent substance abuse is likely to be more effective when it coordinates individual and family sessions that address both the adolescent's substance use and the family issues underlying maladaptive behaviors (Minuchin, 1974).

Partial Hospitalization Program. Partial hospitalization, or day treatment, is the highest outpatient intensity level and has been defined as providing 20 or more hours of structured programming weekly (Schonberg, 1993). Indicators for partial hospitalization include continued substance abuse despite outpatient treatment, substantial relapse following abstinence achieved during residential care, or continuation of care following successful completion of residential treatment (Muisener, 1994). In addition, interpersonal support must be sufficient to promote abstinence, and the adolescent should not present with significant psychiatric or medical problems.

Residential Treatment. Residential treatment generally provides a range of interrelated individual, group, and family services applied within a setting where patients remain overnight (Schonberg, 1993). Length of treatment has declined substantially during recent years such that the average length of stay for residential patients is approximately 30 days (Latimer, Winters, Stinchfield, & Newcomb, 1998). Residential treatment is indicated for youth with significant substance abuse problems. In addition, residential treatment indicators include continued substance abuse despite partial hospitalization treatment, significant environmental distress that greatly heightens continued substance abuse or relapse risk by an adolescent currently in outpatient or partial
hospitalization treatment, or significant medical or psychiatric problems exhibited by the adolescent that require acute or ongoing management (Muisener, 1994).

Inpatient Treatment. Inpatient treatment provides the most intensive residential treatment experience and is housed in a medical unit where constant patient supervision is available, when indicated (Schonberg, 1993). Inpatient and residential treatment options have much in common. Given changes following the advent of managed care, the degree of problem severity required to obtain inpatient treatment is extreme; that is, the immediate safety of the adolescent patient or others is in question. Thus, the primary difference between residential and inpatient programs is that the latter generally serve adolescents who present with severe psychiatric or medical problems that threaten the immediate safety of the substance abusing adolescent.

The Use of the PEI to Inform Treatment Placement

Decisions regarding treatment placement must be made with a wide range of information sources and assessment strategies. The PEI can be used as one tool within a larger assessment protocol designed to determine placement. Table 20.7 suggests PEI-based guidelines for placing drug abusing youth in programs along a continuum of care. Obviously, these guidelines should not be the sole source for decision making, but should be used in conjunction with other reports and clinical judgment. PEI dimensions utilized in the placement guidelines include signs of psychological dependence, loss of control or excessive preoccupation with drugs, family problems, and psychiatric problems.

Theoretical Models of Treatment Planning

The conceptualization of adolescent drug abuse treatment planning presented here incorporates knowledge from two general research areas: adolescent development and biopsychosocial factors associated with substance abuse vulnerability (Glantz & Pickens, 1992; Margolis, 1995).

Adolescent Development and Treatment Planning.
Understanding developmental issues of adolescence is considered central to the formulation of effective substance abuse treatment plans for youth (Trad, 1993). Primary developmental issues of adolescence include negotiating levels of autonomy and dependence in relation to parents and families, identity formation, adjusting to physical changes, sexuality, academic functioning, and peer relationships (Erikson, 1963; Parrish, 1994). Awareness of developmental issues underscores, for example, the need for appropriate levels of self-determination during adolescence in order to develop self-esteem and psychosocial competencies. Such awareness has informed modifications in traditional 12-step approaches to treatment as well as the use of cognitive-behavioral and relapse prevention models when treating substance abusing youth. Table 20.8 illustrates how the PEI may be used to measure characteristics unique to adolescents that also may importantly influence treatment planning (Margolis, 1995).

Adolescent Substance Abuse Determinants and Treatment Planning. Theoretical conceptualizations of adolescent treatment planning have also relied heavily on research on biopsychosocial factors associated with the onset and severity of substance abuse.

TABLE 20.7
Guidelines for Substance Abuse Treatment Placement Along a Continuum of Care

Outpatient or Partial Hospitalization
  Depth of psychological dependence:
    40 ≤ T ≤ 60 on most or all Basic scales
  Signs of loss of control of drug use or excessive preoccupation with drug use:
    40 ≤ T ≤ 60 on Loss of Control OR Preoccupation with Drugs scales
  Signs of family dysfunction:
    40 ≤ T ≤ 60 on Family Pathology OR Family Estrangement OR Sibling Chemical Use scales
    AND Sexual and Physical Abuse Problem Screens are negative
  Signs of psychiatric problems:
    40 ≤ T ≤ 60 on Psychological Disturbance scale
    OR Psychiatric Referral Problem Screen is positive
    AND Suicide Potential Problem Screen is negative

Residential or Inpatient Treatment
  Depth of psychological dependence:
    T > 60 on most or all Basic scales
  Signs of loss of control of drug use or excessive preoccupation with drug use:
    T > 60 on Loss of Control scale
    OR T > 60 on Preoccupation with Drugs scale
  Signs of family dysfunction:
    T > 60 on Family Pathology scale
    OR T > 60 on Family Estrangement scale
    OR T > 60 on Sibling Chemical Use scale
    OR Sexual Abuse Problem Screen is positive
    OR Physical Abuse Problem Screen is positive
  Signs of psychiatric problems:
    T > 60 on Psychological Disturbance scale
    OR Psychiatric Referral Problem Screen is positive
    OR Suicide Potential Problem Screen is positive

Note. Guidelines presented are adapted from those presented in the original PEI manual (Winters & Henly, 1989).

Major assessment dimensions used to determine a treatment plan include drug abuse problem severity; areas of psychosocial strength and weakness, including comorbid psychiatric status; and available resources in the community (Friedman & Utada, 1989; Rahdert, 1991; Tarter, 1990; Trad, 1993).
Rather than viewing drug use as a disease, current research and treatment programs tend to view adolescent drug involvement in terms of interactions between individual, interpersonal, and contextual factors that together heighten vulnerability (Glantz & Pickens, 1992; Hawkins, Catalano, & Miller, 1992). The PEI is based on research and theory consistent with this view that adolescent drug abuse should be described according to problem severity and psychosocial dimensions. Within this framework, the following section provides suggestions, organized by these two central dimensions, on how PEI score profiles might inform specific treatment strategies for individual patients. Given the lack of definitive research regarding the effectiveness of different treatment strategies for adolescent drug abusers, these suggestions should be viewed as clinical starting points rather than as empirically derived findings.
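The guidelines in Table 20.7 amount to a handful of threshold rules, and a minimal Python sketch of the residential or inpatient column is given here for illustration. The `Profile` fields and the function name are hypothetical, "most or all Basic scales" is read as a strict majority (an assumption), and the output is only a list of flagged dimensions to be weighed alongside other reports and clinical judgment, not a placement decision.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Profile:
    """Hypothetical container for drug clinic T-scores and problem screens."""
    basic_t: Dict[str, float]          # T-scores on the five Basic scales
    loss_of_control_t: float
    preoccupation_t: float
    family_pathology_t: float
    family_estrangement_t: float
    sibling_chemical_use_t: float
    psychological_disturbance_t: float
    sexual_abuse_screen: bool          # True = screen is positive
    physical_abuse_screen: bool
    psychiatric_referral_screen: bool
    suicide_potential_screen: bool


def residential_indicators(p: Profile) -> List[str]:
    """List the Table 20.7 dimensions whose residential/inpatient rule is met."""
    flagged = []
    elevated = sum(t > 60 for t in p.basic_t.values())
    # Assumption: "most or all" means more than half of the Basic scales.
    if elevated > len(p.basic_t) / 2:
        flagged.append("psychological dependence")
    if p.loss_of_control_t > 60 or p.preoccupation_t > 60:
        flagged.append("loss of control or preoccupation")
    family_t = (p.family_pathology_t, p.family_estrangement_t,
                p.sibling_chemical_use_t)
    if max(family_t) > 60 or p.sexual_abuse_screen or p.physical_abuse_screen:
        flagged.append("family dysfunction")
    if (p.psychological_disturbance_t > 60 or p.psychiatric_referral_screen
            or p.suicide_potential_screen):
        flagged.append("psychiatric problems")
    return flagged


# Invented example profile, for illustration only.
example = Profile(
    basic_t={"Personal Involvement": 65, "Effects from Use": 62,
             "Social Benefits": 58, "Personal Consequences": 61,
             "Polydrug Use": 55},
    loss_of_control_t=55, preoccupation_t=59,
    family_pathology_t=50, family_estrangement_t=48,
    sibling_chemical_use_t=52, psychological_disturbance_t=45,
    sexual_abuse_screen=False, physical_abuse_screen=False,
    psychiatric_referral_screen=False, suicide_potential_screen=True,
)
print(residential_indicators(example))
```

In this invented profile, three of the five Basic scales exceed 60 and the Suicide Potential screen is positive, so two dimensions are flagged; the family and loss-of-control rules are not met.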

TABLE 20.8
Use of PEI to Address Unique Adolescent Characteristics Pertinent to Treatment Planning

Adolescent characteristic: "Adolescent drug abuse manifests itself through problem behaviors rather than overt signs of drug abuse."
Use of PEI: The Personal Involvement with Chemicals scale is an effective measure of psychological dependence and consequences of use for adolescents. The Uncontrolled and Deviant Behavior scales address problem behaviors associated with drug abuse. T-scores based on drug clinic norms between 40 and 60 reflect medium to high risk. T-scores above 60 reflect very high risk.

Adolescent characteristic: "The disorder progresses more rapidly in adolescents than it does in adults."
Use of PEI: The PEI assesses 3-month, 6-month, and 12-month use frequencies for alcohol and other drugs prior to intake assessment. Differences in frequency levels between these time points for each substance may be used to examine recent progression of use patterns. In addition, school grade of initial use is assessed for alcohol, marijuana, and other drugs.

Adolescent characteristic: "Adolescents abuse more than one drug; they may have a 'drug of choice,' but they almost always use several drugs."
Use of PEI: The PEI assesses use frequency levels across alcohol, marijuana, and 10 additional drug categories.

Adolescent characteristic: "Adolescents experience strong denial; they have not yet experienced the years of negative consequences that an adult has experienced."
Use of PEI: The PEI contains school- and drug-based norms; providing adolescents in apparent denial (e.g., who report that "everyone uses drugs") with information that their use is drastically elevated compared to typical adolescents (i.e., school-based norms), and equivalent to youth in drug abuse treatment programs (i.e., drug clinic norms) may help to alter cognitive distortions by refuting false beliefs.

Adolescent characteristic: "The enabling system surrounding adolescents is stronger than is usually found with adults. Usually, drug use is accepted in their peer group."
Use of PEI: Elevated T-scores based on drug clinic norms on the following scales reflect an interpersonal system that likely promotes universal adolescent substance use: Peer Chemical Environment, Sibling Chemical Use, and Family Pathology.

Adolescent characteristic: "Adolescents experience developmental delays directly caused by drug use."
Use of PEI: Elevated T-scores based on school norms on the following scales reflect possible delays on key psychosocial dimensions under development during adolescence: Negative Self-image, Psychological Disturbance, Social Isolation, Absence of Goals, Spiritual Isolation.

Note. Quotations from Margolis (1995, p. 172).

Defining Low, Medium, and High Risk on the PEI Problem Severity and Psychosocial Scales

Cutoff points defining low, medium, and high risk on PEI scales were based on extant research findings (e.g., Winters et al., 1993), clinical judgment, and a desire to make the standard PEI score report user-friendly as a treatment planning tool. As background, PEI scales are coded so that higher scores always reflect greater risk. Thus, T-scores below 40 based on drug clinic norms (i.e., approximately 16% of drug treatment youth in the "drug clinic" standardization sample) indicate low risk. Psychosocial dimensions on which adolescent drug abusers exhibit low risk generally represent either issues that require no specialized treatment, or personal assets or protective factors that may be utilized within treatment to promote self-esteem and improve coping in areas of weakness. T-scores between 40 and 60 reflect medium to high risk on the given scale and likely indicate a need for specialized services. Finally, T-scores above 60 reflect high to very high risk. A lack of specialized services addressing high risk issues indicated by this drug clinic T-score cutoff will likely compromise treatment effectiveness substantially.
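The risk bands described above can be expressed as a small lookup, sketched here in Python. The function names are hypothetical, and the share-of-sample calculation assumes T-scores that are normally distributed with a mean of 50 and a standard deviation of 10 in the norm group, which is the conventional T-score metric but only an approximation for any real sample.

```python
from statistics import NormalDist

# Conventional T-score metric: mean 50, standard deviation 10.
# Normality of the norm group is an assumption made for illustration.
T_SCALE = NormalDist(mu=50, sigma=10)


def risk_band(t_score: float) -> str:
    """Map a drug clinic T-score to the risk band described in the text."""
    if t_score < 40:
        return "low"
    if t_score <= 60:
        return "medium to high"
    return "high to very high"


def expected_share_below(t_score: float) -> float:
    """Proportion of a normal norm group expected to score below t_score."""
    return T_SCALE.cdf(t_score)


if __name__ == "__main__":
    for t in (35, 40, 52, 60, 67):
        print(t, risk_band(t))
    print(round(expected_share_below(40), 3))
```

Under the normality assumption, about 15.9% of the norm group falls below T = 40, which is consistent with the approximately 16% figure given for the drug clinic standardization sample.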

Drug Abuse Problem Severity

Research using the PEI generally focuses on drug use frequency and problem severity subdimensions. The use frequency dimension is defined in terms of the sum across the 12 drug categories. Psychological dependence, consequences of use, and reasons for use are assessed by the PEI's Basic scales. As noted in an earlier section, research suggests that the remaining Clinical scales of problem severity provide information that is largely redundant to the Basic scales (Henly & Winters, 1988). Higher levels of pretreatment substance use and Basic scale scores have been shown in research to be related to probability of referral to drug treatment (e.g., Winters et al., 1993). In this light, the Basic scales are considered primarily useful for assisting with the referral decision point of drug treatment versus no drug treatment.

Psychosocial Functioning

Individual, group, and family substance abuse treatment approaches are available for youth that incorporate a range of 12-step, behavioral, cognitive, systems, and psychodynamic strategies and underpinnings. The 12 PEI Psychosocial scales can be heuristically organized according to five subdimensions derived from a combination of factor analytic (Winters & Henly, 1989) and rational decision-making strategies. Below we summarize hypothetical treatment strategies that logically follow from psychosocial scale elevations. Of course, empirical research is needed to validate which treatment strategies work best across specific problem areas.

Attitudes and Beliefs. Studies examining the impact of attitudes have indicated that irrational beliefs (Binion, Miller, Beauvais, & Oetting, 1988; Denhoff, 1987), false perceptions of peer drug use (Ellickson, Bell, & Harrison, 1993), deviant attitudes (Newcomb & Bentler, 1988), and false beliefs about drug effects (Berdiansky, 1991) predict both adolescent drug abuse and relapse following treatment.
Modifying maladaptive beliefs is critical to attaining abstinence according to social learning conceptions of drug use determinants (Marlatt, 1979). Elevated T-scores on the Rejecting Convention scale indicate the rejection of traditional beliefs about right and wrong. Although adolescent substance abuse studies are lacking in this area, preliminary findings from the adult literature support the use of cognitive therapies to alter irrational beliefs. Adult alcoholics receiving RationalEmotive Therapy exhibited significant increases in rational thinking following treatment (Ray, Freidlander, & Solomon, 1984). Rational-Emotive Therapy has also been applied effectively within therapeutic communities (Yeager, DiGiuseppe, Olsen, & Lewis, 1988) and with alcoholic offenders (Rosenberg & Brian, 1986). Coping Skills. Problem-solving and coping skill deficits among adolescents also predict substance use problem severity and relapse following treatment (Brown, Stetson, & Beatty, 1989; Labouvie, 1986; Myers & Brown, 1990). Deficits in coping are also associated with HIV risk among drug abusing youth with cognitive-behavioral programs being implemented to reduce risk (St. Lawrence, Jefferson, Alleyne, & Brasfield, 1995). The importance of coping skills for relapse prevention is supported by an adult literature that has identified high risk situations for relapse (Curry & Marlatt, 1987). Among adolescents, peer pressure to use drugs appears to be the predominant relapse precipitant (Brown, Vik, & Creamer, 1989). Although the PEI does not directly assess the adolescent's perceived ability to cope with substance use risk situations, elevated T-scores on the Social Isolation scale reflect,


in part, poor coping skills pertinent to making prosocial friends. Developing skills for communication, adult connectedness, and prosocial peer relationships will likely provide a critical buffer against the stressors that typically operate when youth exhibit elevated T-scores on the Social Isolation scale (Hawkins et al., 1992). Cognitive therapies such as Problem Solving Therapy (D'Zurilla & Goldfried, 1971) may be used to address this deficit. Also related to coping skills problems are elevated T-scores on the Absence of Goals scale. This scale reflects coping deficits related to school functioning that are common among the one in two drug abusing adolescents who ultimately drop out (Friedman, Glickman, & Utada, 1985). Cognitive interventions with drug abusing (Palmer & Paisley, 1991) and low achieving (Hawkins, Doueck, & Lishner, 1988) adolescents have produced increases in school performance while also reducing rates of school expulsion and suspension (e.g., Eggert, Seyl, & Nicholas, 1990; Eggert, Thompson, Herting, & Nicholas, 1994).

Behavioral Problems. Behavioral and impulse control problems are common among drug abusing youth (DeMilio, 1989). Rates of conduct disorder among substance abusing youth range from 40% to 57% across several studies (Bukstein, Glancy, & Kaminer, 1992; Monopolis, Brooner, Jadwisiak, Marsh, & Schmidt, 1991). Although dual diagnosis studies have generally found higher rates of conduct disorder among boys than girls, the rate among girls treated for substance abuse is still high, with several studies suggesting that at least one in three exhibits significant conduct disturbance (Brown, Gleghorn, Schuckit, Myers, & Mott, 1996; Bukstein et al., 1992; McKay & Buka, 1994). Elevated T-scores on the Deviant Behavior, Uncontrolled, and Peer Chemical Environment scales are indicative of serious behavioral disturbance.
Research with adults suggests that sociopathic clients characterized by behavioral and impulse control problems respond more favorably to behavioral interventions that focus on coping skill development (Allen & Kadden, 1995). Thus, anger management, stimulus control, and reinforcement management techniques that use behavioral strategies to increase self-regulation may be used to reduce deviance among substance abusing youth. In addition, given that adolescents with more severe behavioral problems and insufficient family support are likely to be referred to treatment settings with greater restrictions, it is imperative that new self-regulation skills first be practiced in a safe environment where success is likely. Ultimately, however, it is equally important to provide opportunities for youth to apply skills in real-life settings comparable to those encountered following treatment (Marlatt & Gordon, 1985).

Psychiatric Problems. A small yet growing base of research suggests that an estimated 75% of youth treated for substance abuse have at least one comorbid psychiatric disorder (Regier, Boyd, & Burke, 1988; Stowell, 1991). Table 20.9 illustrates estimated rates of comorbidity based on a review of the literature (Bukstein et al., 1992; Horner & Scheibe, 1997; Latimer & Winters, 1998; Latimer & Jerstad, in press; Stowell, 1991). In addition, comorbidity status among youth treated for substance abuse is associated with less favorable treatment experiences and poorer outcomes, including shorter length of stay, earlier and more severe relapse episodes, school failure, and HIV exposure (Adams & Wallace, 1994; Moss, Kirisci, & Mezzich, 1994). Further research is critically needed in this area, but extant findings suggest that treatment plans focusing on reducing or eliminating substance use will be compromised substantially if comorbid disorders are left undetected and unaddressed.
TABLE 20.9
Estimated Rates of Comorbid Disorders Among Adolescents in Treatment for Psychoactive Substance Use Disorders

Disorder Comorbid with PSUD                   %
No comorbid condition                         25%
Attention Deficit Hyperactivity Disorder      30%
Oppositional Defiant Disorder                 30%
Conduct Disorder                              50%
Learning Disorders                            50%
Communication Disorders                       NR*
Mood Disorders                                40%
Anxiety Disorders                             40%

Note. Estimated rates based on literature review described in Latimer & Winters (1998).
*No research available on which to base estimate.

The PEI provides information pertinent to the assessment of emotional and behavioral problems among substance abusing youth. Although the PEI was not designed as a diagnostic tool, elevated T-scores on the Psychological Disturbance scale and, to a lesser degree, the Negative Self-image and Social Isolation scales reflect disturbances in mood, anxiety, and self-esteem. Psychosocial treatments for psychiatric disorders among youth include a broad range of behavioral, cognitive-behavioral, family systems, and psychodynamic approaches (Walker & Roberts, 1992). Unfortunately, little is known from systematic research about pharmacological treatment of psychiatric disorders among drug abusing youths, despite the possibility that psychiatric disorders may underlie a secondary problem with drugs (Klorman, Coons, Brumaghim, Borgstedt, & Fitzpatrick, 1988; Pelham et al., 1990). Similarly, few studies have compared the effectiveness of different psychosocial treatments designed to address comorbid disorders among substance abusing youth (Catalano, Hawkins, Wells, Miller, & Brewer, 1991). Nonetheless, the assessment of comorbid psychiatric disorders among substance abusing youth is essential to planning effective treatment.

Family Problems. Alcohol and drug abuse, as well as chaotic home environments, are common among family members of substance abusing youth (Hawkins et al., 1992). Not surprisingly, inclusion of a family treatment component for substance abuse has rapidly become a norm in adolescent settings.
Elevations on the following family-based scales of the PEI are relevant to such family problems: The Family Pathology scale reflects serious dysfunction, including parental substance abuse, sexual abuse, and physical abuse; the Family Estrangement scale indicates inconsistent or poor communication patterns between family members; and the Sibling Chemical Use scale indicates significant sibling substance use. A fundamental assumption of family therapy models is that individual pathology, including substance abuse, is largely a reflection of a dysfunctional family system (Nichols, 1987). Whereas this assertion may be less applicable to the chronic adult alcoholic, environmental determinants such as the family system likely play an even greater role in adolescent than in adult substance abuse. Elevated T-scores on the Family Pathology, Family Estrangement, and Sibling Chemical Use scales reflect family systems with problems associated with substance abuse, including the absence of adaptive communication patterns between parent-child and spousal subsystems, blaming and labeling of the substance abusing adolescent as the "bad seed," parents using the identified problem of their child's substance abuse to avoid marital conflicts, and active modeling of substance abuse by a parent or sibling. In addition, research suggests that sibling substance use is an underresearched yet highly potent predictor of adolescent substance abuse (Latimer, Winters, Stinchfield, & Traver, 1998).


Thus, scores in the abnormal range on the Family Pathology, Family Estrangement, and Sibling Chemical Use scales call attention to the need for a thorough family assessment to identify problematic subsystems and communication patterns. Family therapy may represent the most potent aspect of treatment for substance abusing youth with significant family dysfunction, particularly given the substantive role of the family following treatment.

Limitations of the PEI as a Treatment Planning Tool

A single instrument generally does not capture every dimension of interest pertinent to its assessment domain. For example, the PEI provides valuable problem severity and psychosocial information to inform treatment planning, yet it does not address pharmacological history, academic achievement, physical health status, and other areas undoubtedly relevant to substance abuse treatment and outcome. In addition, the PEI is not designed to directly assess key resources available in the adolescent's community, such as school-based programs that foster abstinence through extracurricular activities. Finally, the PEI does not focus on stage of change issues shown to have a significant impact on substance abuse treatment process and outcome among adolescents and adults (Prochaska & DiClemente, 1992). In sum, the PEI can serve as a useful, but not exclusive, component of an adolescent substance use assessment system for informing treatment planning.

Use of the PEI in Treatment Evaluation

Overview of Measuring Treatment Outcome

Drug treatment programs have generally received intensive scrutiny, perhaps more so than other health care services, because of the nature of addiction and the visibility of its effects. Treatment outcome information is thus invaluable to the field; such documentation provides a clearer picture of the types of clients served, helps programs determine the effectiveness and cost offsets of different strategies, and supports improved program performance.
Although the PEI was not developed specifically for the purpose of measuring treatment outcome, it is relevant to consider its role in this capacity. The value of any standardized questionnaire as a measure of change is an important statistical and clinical question (Collins & Horn, 1991). Some investigators use difference scores, but these tend to be less reliable than the scores used to compute them, and the value of the Time 1 score introduces a bias into the difference score calculation (Allen & Yen, 1979). Dividing the simple difference score by the Time 1 score provides a partial correction for this bias. From a clinical standpoint, the important question is how many clients got better, how many got worse, and how many did not change. Along these lines, Jacobson and Truax (1991) proposed using the concept of "clinically significant change," which refers to a score change from the abnormal to the normal range. They have statistically operationalized this concept with the reliable change index (RCI). The RCI yields a change score that is corrected for the amount of measurement error inherent in the instrument. This is done by computing the difference between


pretest and posttest scores and dividing by the standard error of difference for the measure (which is estimated from the measure's temporal stability). The RCI analysis is quite appealing because it addresses the practical needs of the treatment service provider while still maintaining statistical standards of significance. Thus, it can be argued that for an instrument to have utility as an outcome measure, it must demonstrate acceptably low measurement error and provide meaningful information to treatment providers and researchers.

The PEI as an Evaluation Tool

How does the PEI measure up to these two standards? The PEI authors took great care to develop highly meaningful scales with high utility. These scales should, on the face of it, be relevant to the measurement of treatment outcome. The Problem Severity scales, by measuring extent of involvement with drugs, are appropriate for evaluating level of change in drug use behaviors. The Drug Use Frequency Checklist provides a measure of posttreatment abstinence and levels of nonabstinence for specific drugs. Furthermore, the Psychosocial Risk scales provide measures of change for important areas of personal functioning and environmental status that are highly relevant to evaluating the client's outcome within the broader context of life functioning.

In terms of the PEI's measurement error, its temporal stability has been examined over a time interval that is in line with treatment outcome designs, namely a 1-year interval (Stinchfield & Winters, 1997). All five PEI Basic scales exhibited satisfactory 1-year test-retest reliability in a clinical sample (n = 37) that did not receive treatment over the test-retest interval. Test-retest correlation coefficients ranged from r = .86 to r = .89. Retest scores were generally higher than initial test scores in this untreated clinical sample.
This increase in involvement with drugs is not surprising given that these adolescents did not receive treatment during the 1-year interval. Satisfactory 1-year temporal stability is necessary for using these scales as a measure of change over a 1-year interval, and the magnitude of the stability coefficients exceeds Nunnally's (1978) standard of at least .70.

This PEI 1-year temporal stability information was then used to measure the significance of PEI change scores in a separate adolescent drug treatment sample (n = 45) using the RCI analysis. For the sake of brevity, the analysis focused on the primary Basic scale, the Personal Involvement with Chemicals Scale (PICS). Difference scores were computed for each subject by subtracting the intake PICS raw score from the 1-year follow-up PICS raw score and then dividing by the standard error of difference for the PICS (which was 10.4). Figure 20.1 shows a scatterplot of admission assessment and 1-year follow-up PICS raw scores for the treatment sample. The shaded diagonal illustrates a 90% confidence interval of plus or minus 1.65 standard errors of difference (i.e., ±1.65 × 10.4, or about ±17 points). A confidence interval around the RCI is computed to indicate three types of outcome: statistically significant improvement, statistically significant deterioration, and no change or unreliable change. In Fig. 20.1, points in the shaded diagonal indicate no change or unreliable change. Points above the shaded diagonal represent a statistically significant increase in PICS raw score (i.e., deterioration), and points below the shaded diagonal indicate a statistically significant decrease in PICS raw score (i.e., improvement). Figure 20.1 shows that 14 subjects (31%) obtained significantly higher scores at 1-year follow-up assessment (i.e., deteriorated), 22 (49%) exhibited no significant change, and 9 (20%) obtained significantly lower scores at 1-year follow-up assessment (i.e., improved).

Fig. 20.1. Scatterplot of admission assessment and 1-year retest raw scores on the PEI Personal Involvement with Chemicals Scale (PICS) in a treatment sample (n = 45), with shaded band showing 90% confidence interval around unreliable change.

For assessing clinical significance of change, Jacobson and Truax (1991) recommended using a cutoff score that provides optimal discrimination between functional and dysfunctional groups. The PEI was standardized on a school sample (N = 693) and a drug clinic sample (N = 1,120) (Winters & Henly, 1989), and these samples may be considered to represent functional and dysfunctional groups. The optimal cut score between these two groups was determined by computing a discriminant function analysis using the PICS score to predict group membership in either the school or drug clinic sample. A PICS raw cut score of ≤ 50 provided optimal discrimination with a hit rate of 90%. Figure 20.2 shows a scatterplot of the same admission and 1-year follow-up PICS raw scores shown in Fig. 20.1 for the treatment sample (n = 45). The shaded vertical and horizontal lines split the sample into functional and dysfunctional ranges at intake and 1-year retest. The shaded lines represent a cut score of ≤ 50, with a 90% confidence interval of plus or minus 1.65 standard errors of measurement (i.e., ±1.65 × 7.4, or about ±12 points). Points within the shaded lines indicate classification uncertainty; that is, there is a greater chance of misclassification of subjects in this score range. A clinically significant improvement is represented by a participant's score that moves from the dysfunctional range at admission assessment to the functional range at 1-year follow-up.
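The two decision rules just described, reliable change and movement across the clinical cutoff, can be sketched in Python as follows. This is only an illustrative sketch, not the authors' analysis code: the constants 10.4, 7.4, 1.65, and 50 come from the text, whereas the function names and the example score pairs are hypothetical. The helper relating the standard error of difference to the standard error of measurement follows the Jacobson and Truax (1991) formula and is included on the assumption that the two reported values are linked in that way.

```python
import math

# Values reported in the text for the PICS raw score.
S_DIFF = 10.4   # standard error of difference
SEM = 7.4       # standard error of measurement
Z_90 = 1.65     # multiplier used for the 90% bands
CUTOFF = 50     # raw cut score; scores at or below it are treated as functional


def s_diff_from_sem(sem):
    """Jacobson and Truax (1991) formula: S_diff = sqrt(2 * SEM**2).

    Assumption: for SEM = 7.4 this gives roughly 10.5, close to the
    reported 10.4; the small gap presumably reflects rounding.
    """
    return math.sqrt(2.0 * sem ** 2)


def reliable_change_index(pre, post, s_diff=S_DIFF):
    """Follow-up minus intake score, in standard-error-of-difference units."""
    return (post - pre) / s_diff


def classify_reliable_change(pre, post, s_diff=S_DIFF, z=Z_90):
    """Label a score pair using the +/- z band around zero change."""
    rci = reliable_change_index(pre, post, s_diff)
    if rci > z:
        return "deteriorated"   # reliably higher PICS score at follow-up
    if rci < -z:
        return "improved"       # reliably lower PICS score at follow-up
    return "no reliable change"


def crossed_into_functional_range(pre, post, cutoff=CUTOFF):
    """True when intake is above the cutoff and follow-up is at or below it."""
    return pre > cutoff and post <= cutoff


def near_cutoff(score, cutoff=CUTOFF, sem=SEM, z=Z_90):
    """True when a score lies inside the uncertainty band around the cutoff."""
    return abs(score - cutoff) <= z * sem


if __name__ == "__main__":
    # Hypothetical (intake, follow-up) pairs, for illustration only.
    for pre, post in [(70, 45), (40, 60), (55, 50)]:
        print(pre, post,
              classify_reliable_change(pre, post),
              crossed_into_functional_range(pre, post))
```

Keeping the two rules as separate functions mirrors the text, which reports reliable change (Fig. 20.1) and cutoff crossing (Fig. 20.2) separately; a stricter criterion would require both to hold for the same client.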
For this sample, clinically significant improvement was exhibited by four (9%) subjects who moved from the dysfunctional range at admission to the functional range at 1-year follow-up (i.e., the four subjects in the lower right-hand quadrant).

Fig. 20.2. Scatterplot of admission assessment and 1-year retest raw scores on the PEI Personal Involvement with Chemicals Scale (PICS) in a treatment sample (n = 45), with shaded bands showing 90% confidence intervals around PICS raw clinical cutoff score of 50.

Summary and Limitations

This chapter has demonstrated how an RCI analysis with the PEI may be used to measure treatment outcome. The model of treatment outcome methodology described here has several advantages, including (a) the focus on measuring change from pretreatment to posttreatment, which is superior to reporting posttreatment abstinence rates alone (Stinchfield, Owen, & Winters, 1994); (b) consideration of the instrument's measurement error in the change analysis; and (c) sensitivity of the statistical approach toward providing outcome results for individual clients. If change had been analyzed more traditionally, such as with a pretest-posttest design, it would have been difficult to identify that some clients improved meaningfully. In fact, the pre-post analysis (paired t test) for this particular treatment sample was not statistically significant (t = -.38, df = 44, p = .71).

Of course, even the most powerful statistical evaluation analysis has its limitations. The measurement error inherent to the instrument must be considered, as well as the problem of unreliable self-report. As is the case with the often-cited dictum about needing multiple assessment points when conducting a complete intake, a comprehensive follow-up evaluation likewise requires the integration of several lines of evidence.
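The contrast drawn above between a group-level paired t test and client-level RCI results can be illustrated with a short Python sketch. The score pairs below are hypothetical and were chosen only so that improvements and deteriorations roughly cancel; they are not data from the treatment sample. The function names are likewise assumptions, and the standard error of difference (10.4) and the 1.65 multiplier are taken from the text.

```python
import math
import statistics
from collections import Counter

S_DIFF = 10.4   # standard error of difference for the PICS (from the text)
Z_90 = 1.65     # multiplier for the 90% band (from the text)


def paired_t(pre_scores, post_scores):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [post - pre for pre, post in zip(pre_scores, post_scores)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


def classify(pre, post, s_diff=S_DIFF, z=Z_90):
    """Client-level reliable change label for one score pair."""
    rci = (post - pre) / s_diff
    if rci > z:
        return "deteriorated"
    if rci < -z:
        return "improved"
    return "no reliable change"


def tally(pre_scores, post_scores):
    """Count how many clients fall into each reliable change category."""
    return Counter(classify(a, b) for a, b in zip(pre_scores, post_scores))


# Hypothetical intake and follow-up raw scores for six clients.
pre_scores = [70, 65, 40, 45, 55, 60]
post_scores = [45, 40, 62, 68, 54, 61]

t, df = paired_t(pre_scores, post_scores)
counts = tally(pre_scores, post_scores)

print(f"paired t = {t:.2f}, df = {df}")
print(dict(counts))
```

In this made-up example the mean change is close to zero, so the t statistic is small, yet the tally still separates clients who changed reliably in each direction from those who did not.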


Case Studies

Two PEI case studies illustrate how this tool can assist the clinician in describing client problems and contribute to the treatment referral decision process. The case of Tom ("Many Troubles") brings to mind the importance of assessing the concomitant problems that often coexist with adolescent drug use. These other problems may have preceded Tom's drug use and may eventually require more treatment than the drug problems. Susan ("Losing and Using") is an example where it is useful to have drug clinic norms for PEI scores. She did not report an extended history of drug use but appeared to have begun using drugs recently in response to a distressing life event. The two cases focus on intake data only.

Case 1: Tom

This case shows how the course of drug involvement can be confounded by the onset of a psychiatric illness. Tom was a diagnostic challenge because of his coexisting drug abuse and depression. This 16-year-old white male was admitted to a drug treatment intake unit because of "suicidal thinking, severe depression, and recent intoxication, possibly of amphetamines." Tom had been quite shy and uncomfortable about his appearance since early adolescence. Despite considerable attention from his parents, this problem persisted. To aggravate matters, Tom's father had moved the family all over the country over the past few years because of the nature of his occupation. Tom rarely stayed in one school district or neighborhood for an extended period of time and thus had a difficult time establishing meaningful friendships. Furthermore, Tom was an aspiring soccer player but was unable to reap the full rewards that often come with being a stable member of a school team. During early adolescence, Tom used alcohol, but only sparingly because he did not like its taste. Later in adolescence, he began using marijuana and amphetamines.
These substances served the purpose of making him feel less pain, and he was able to tolerate them better than alcohol. When he was about 15, Tom began to have long bouts of depression. Sometimes he would not get out of bed in the morning and would go for days without talking to anyone. Tom's parents agonized over what to do about his depression. Occasional visits with a counselor helped Tom somewhat, but he continued to struggle with his problem. Recently, peers and teachers had noticed that Tom was not his normal self. He was much quieter than usual, occasionally sat at his desk with a somber look on his face, and found it more difficult to get out of bed to go to school. He admitted to being preoccupied by depressing thoughts and had recently contemplated suicide. The day before Tom was brought to the intake unit, he had spent the entire day drinking alcohol, smoking marijuana, and using amphetamines, while pondering whether he should continue with life.

Tom's drug clinic standardized scores on the PEI scales, presented in Table 20.10, reflect the comorbidities at work in his case. All of his scores on the Basic and Clinical Problem Severity scales were extremely elevated, and only one scale had a T-score below the drug clinic mean. Because it appears that Tom used drugs to self-medicate, it is not surprising that his score on the Psychological Benefits of Use scale was high and that his score on the Social Recreational Use scale was low. His very elevated T-score on


the Loss of Control scale reflects his severe involvement with drugs and the presence of substance use dependence symptoms. Tom's responses to the drug consumption items (not shown in Table 20.10) were consistent with the information obtained in the intake interview. Tom reported that he was a regular user of alcohol and marijuana, and he had recently started to use amphetamines heavily. A polydrug use pattern was evident from his report of occasional use of LSD, cocaine, barbiturates, and inhalants. The pattern of scores across the Problem Severity scales and the drug consumption items strongly indicated that Tom needed drug abuse treatment.

However, the second part of the PEI, which covers psychosocial problems, provided evidence that Tom had additional treatment needs. His other difficulties centered around symptoms related to an affective disorder, suicidal tendencies, social isolation, and delinquency. Not surprisingly, his T-scores exceeded the drug clinic mean for Negative Self-image, Psychological Disturbance, Social Isolation, Uncontrolled, Rejecting Convention, and Deviant Behavior. Also, his environment seemed to be contributing to his drug involvement; two Environmental Risk scales fell above the drug clinic mean (Peer Chemical Environment and Family Pathology). Thus, it appears that Tom's drug involvement was complicated by feelings of despair, social isolation, and a disruptive and possibly pro-drug use environment.

TABLE 20.10
PEI Scores for Case 1 and Case 2

                                          Drug Clinic T-Score
Scale                                     Case 1        Case 2
Problem Severity Scales
  Personal Involvement with Chemicals     58            40
  Effects from Use                        53            42
  Social Benefits                         58            36
  Personal Consequences                   53            41
  Polydrug Use                            56            43
  Social-Recreational Use                 49            44
  Psychological Benefits                  62            40
  Transsituational Use                    52            39
  Preoccupation with Drugs                61            44
  Loss of Control                         69            43
Psychosocial Risk Scales
  Negative Self-image                     64            39
  Psychological Disturbance               62            34
  Social Isolation                        63            34
  Uncontrolled                            60            44
  Rejecting Convention                    59            48
  Deviant Behavior                        58            40
  Absence of Goals                        57            45
  Spiritual Isolation                     58            37
  Peer Chemical Environment               71            49
  Sibling Chemical Use                    no siblings   52
  Family Pathology                        61            32
  Family Estrangement                     52            32
Problem Screens
  Need for Psychiatric Referral           positive      negative
  Eating Disorder                         negative      negative
  Sexual Abuse                            positive      negative
  Physical Abuse (intrafamilial)          positive      negative
  Family Chemical Dependency History      positive      negative
  Suicide Potential                       positive      negative

Among the Problem Screens, Need for Psychiatric Referral, Suicide Risk, Sexual Abuse, and Physical Abuse were positive. These screens indicated the need for immediate attention, including the possible necessity of referring Tom for psychiatric treatment prior to drug abuse treatment. Tom admitted to making "plans to kill" himself; this disclosure needed to be addressed immediately to determine the seriousness of his possible suicidal intention. Also, it was deemed important to inquire further about Tom's endorsement of sexual and physical abuse items.

Generally, Tom's PEI results appear to be valid, although both Infrequency scales were somewhat elevated. This was interpreted by the facility staff as a sign that Tom was crying out for help and not making an intentional effort to exaggerate his problems. Tom appeared to be an individual who was feeling a great deal of pain; his sensitivity to internal distress, as well as his apparent willingness to seek help, probably contributed to the elevations on the Problem Severity and Psychosocial scales, as well as to the moderately elevated scores on the Infrequency scales.

Based on the PEI results and additional information from interviews with Tom and his parents, it was decided to refer Tom to a residential drug treatment program that specialized in youth with coexisting psychiatric problems.

Case 2: Susan

Susan was a 17-year-old Black female high school student who was referred for an evaluation by her mother, who had recently caught her smoking marijuana. A clinical interview indicated that Susan had experienced a significant life stress of late. Her father had passed away about a year earlier, and the adjustment for her and her mother had been difficult.
The mother indicated during her intake interview that she was concerned about Susan's involvement with an older young man who may have been influencing her to use drugs. Susan's mother hinted that the loss of the father may have influenced Susan to seek out an older man. During the course of the interviews with both Susan and the mother, it became apparent that there had been an increase in conflict between them since the death of the father. The mother reported that Susan had become more distant toward her and more rebellious regarding the standard house "rules." Susan appeared to downplay these difficulties but admitted that the recent loss had added a strain to home life.

Susan's PEI scores are presented in Table 20.10. All the validity scales fell within normal limits, suggesting no compromised self-report. The Problem Severity scales were moderately elevated, with drug clinic T-scores ranging from 39 to 44. These scores generally confirm the findings from the interview that Susan was only a moderate drug user and was not suffering from a serious drug problem. The most elevated score in this group was Social-Recreational Use, which indicated that her drug use was mostly motivated by social reasons and that she did not have a personal need to use drugs to "medicate" psychological distress. The low scores on Loss of Control and Preoccupation were also consistent with the perception that Susan was not overly involved with drugs. Susan's responses to the drug consumption items (not shown in Table 20.10) pointed to a pattern of moderate alcohol and marijuana use. She reported that she was not a regular user of these drugs and reported no use of other illicit drugs. Susan's relatively late start in using drugs (after grade 11) appeared to coincide with the loss of the father.


The results on the Psychosocial section suggest that Susan has many strengths, and her few treatment entry points appear to be in her environment. Her pattern of scores on the Personal Risk scales suggested that she is not suffering from internal distress and does not have a tendency toward delinquency. It is interesting to note that her scores were moderately elevated on the Peer Chemical Environment and the Sibling Chemical Use scales, which confirmed the concerns of the mother. It was deemed important to conduct a follow-up interview with Susan to discuss issues of drug use by her peers (particularly the boyfriend) and siblings. The Problem Screen section indicated all negative results. She did endorse one item related to Family Chemical Dependency History, that is, "I have a brother/sister who gets drunk or high (sometimes)." This item endorsement is consistent with her elevated Sibling Chemical Use scale score.

Susan and her mother were both recommended for outpatient counseling. It was felt that they both could benefit from grief counseling and that drug intervention strategies for Susan (e.g., no-use contract) should be a high priority in treatment.

Conclusions

The PEI is an appropriate tool within a comprehensive adolescent drug abuse assessment system. The PEI was developed with the intention of representing a significant advance in this field. Current data on its psychometric properties offer promising evidence that the PEI can be a useful treatment planning and evaluation tool. The development of the PEI is an ongoing process. More research is needed on how PEI results at intake predict treatment outcome and on whether the tool has a role in matching clients to optimal treatment approaches. Its use in diverse ethnic and racial groups is still not well understood, although this topic is currently under investigation.
Also, it will be important to examine how useful the PEI score report is for clinicians and to update the instrument to reflect new trends in adolescent drug involvement (e.g., treatment suitability, stages of change, tobacco use). Finally, there are many basic research questions about the determinants of valid and invalid self-report, including temporal (Stinchfield, 1997) and method (Aquilino, 1994) factors that are naturally relevant to use of the PEI.

Acknowledgments

Partial support for this chapter was provided by NIDA grants DA04334 (Winters) and K21-DA00254 (Latimer).

References

Adams, L., & Wallace, J.L. (1994). Residential treatment for the ADHD adolescent substance abuser. Journal of Child and Adolescent Substance Abuse, 4, 35-44.
Allen, J.P., & Kadden, R.M. (1995). Matching clients to alcohol treatments. In R.K. Hester & W.R. Miller (Eds.), Handbook of alcoholism treatment approaches (2nd ed., pp. 278-291). Needham Heights, MA: Allyn & Bacon.
Allen, M.J., & Yen, W.M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
Aquilino, P. (1994). Interview mode effect. Public Opinion Quarterly, 58, 210-240.


Berdiansky, H. (1991). Beliefs about drugs and use among early adolescents. Journal of Alcohol and Drug Education, 36, 26-35. Binion, A., Miller, C.D., Beauvais, F., & Oetting, E. R. (1988). Rationales for the use of alcohol, marijuana, and other drugs by eighth-grade Native American and Anglo youth. International Journal of the Addictions, 23, 4764. Brown, S.A., Gleghorn, A., Schuckit, M.A., Myers, M.G., & Mott, M.A. (1996). Conduct disorder among adolescent alcohol and drug abusers. Journal of Studies on Alcohol, 57, 314-324. Brown, S.A., Stetson, B.A., & Beatty, P.A. (1989). Cognitive and behavioral features of adolescent coping in high risk drinking situation. Addictive Behaviors, 14, 43-52. Brown, S.A., Vik, P.W., & Creamer, V.A. (1989). Characteristics of relapse following adolescent substance abuse treatment. Addictive Behaviors, 14, 291-300. Bukstein, O.G., Glancy, L.J., & Kaminer, Y. (1992). Patterns of affective comorbidity in a clinical population of dually diagnosed adolescent substance abusers. Special section: Substance abuse. Journal of the American Academy of Child and Adolescent Psychiatry, 31, 1041-1045. Cady, M., Winters, K.C., Jordan, D.A., Solberg, K. B., & Stinchfield, R.D. (1996). Motivation to change as a predictor of treatment outcome for adolescent substance abusers. Journal of Child and Adolescent Substance Abuse, 5, 73-91. Catalano, R.F., Hawkins, J.D., Wells, E.A., Miller, J., & Brewer, D. (1991). Evaluation of the effectiveness of adolescent drug abuse treatment, assessment of risks for relapse, and promising approaches for relapse prevention. International Journal of the Addictions, 25(9A & 10A), 1085-1140. Collins, L.M., & Horn, J.L. (Eds.). (1991). Best methods for the analysis of change. Washington, DC: American Psychological Association. Crowne, D.P., & Marlowe, D. (1960). A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 24, 349-354. Curry, S., & Marlatt, G.A. (1987). 
Building self-confidence, self-efficacy, and self-control. In W.M. Cox (Ed.), Treatment and prevention of alcohol problems (pp. 117-135). New York: Academic Press. Dembo, R., Williams, L., Lavoie, L., Schneidler, J., Kern, J., Getreu, A., Berry, E., Genung, L., & Wish, E.D. (1990). A longitudinal study of the relationships among alcohol use, marijuana/hashish use, cocaine use, and emotional/psychological functioning problems in a cohort of high-risk youth. International Journal of the Addictions, 25, 1341-1382. DeMilio, L. (1989). Psychiatric syndromes in adolescent substance abusers. American Journal of Psychiatry, 146, 1212-1214. Denhoff, M.S. (1987). Irrational beliefs as predictors of adolescent drug abuse and running away. Journal of Clinical Psychology, 43, 412-423. D'Zurilla, T.J., & Goldfried, M.R. (1971). Problem solving and behavior modification. Journal of Abnormal Psychology, 78, 107-126. Eggert, L.L., Seyl, C.D., & Nicholas, L.J. (1990). Effects of a school-based prevention program for potential high school dropouts and drug abusers. International Journal of the Addictions, 25, 773-801. Eggert, L.L., Thompson, E.A., Herting, J.R., & Nicholas, L.J. (1994). Preventing adolescent drug abuse and high school dropout through an intensive school-based social network development program. American Journal of Health Promotion, 8, 202-215. Ellickson, P., Bell, R.M., & Harrison, E.R. (1993). Changing adolescent propensities to use drugs: Results from Project ALERT. Health Education Quarterly, 20, 227-242. Erikson, E.H. (1963). Childhood and society. New York: Norton. Friedman, A.S., Glickman, N., & Utada, A. (1985). Does drug and alcohol use lead to failure to graduate from high school? Journal of Drug Education, 15, 353-364. Friedman, A.S., & Utada, A. (1989). A method for diagnosing and planning the treatment of adolescent drug users (the Adolescent Drug Abuse Diagnosis Instrument [ADAD]). Journal of Drug Education, 19, 285-312. Fry, E. (1977). 
Fry's readability graph: Clarification, validity, and extension to Level 17. Journal of Reading, 21, 242-253. Glantz, M., & Pickens, R. (Eds.). (1992). Vulnerability to drug abuse. Washington, DC: American Psychological Association.

Hawkins, J.D., Catalano, R.F., & Miller, J.Y. (1992). Risk and protective factors for alcohol and other drug problems in adolescence and early adulthood: Implications for substance abuse prevention. Psychological Bulletin, 112, 64-105. Hawkins, J.D., Doueck, H.J., & Lishner, D.M. (1988). Changing teaching practices in mainstream classrooms to improve bonding and behavior of low achievers. American Educational Research Journal, 25, 31-50. Heilman, R.O. (1973). Early recognition of alcoholism and other drug dependencies. Center City, MN: Hazelden Foundation. Henly, G.A., & Winters, K.C. (1988). Development of problem severity scales for the assessment of adolescent alcohol and drug abuse. International Journal of the Addictions, 23, 65-85. Henly, G.A., & Winters, K.C. (1989). Development of psychosocial scales for the assessment of adolescent alcohol and drug involvement. International Journal of the Addictions, 24, 973-1001. Horn, J.L., Skinner, H.A., Wanberg, K., & Foster, F.M. (1982). Alcohol Use Questionnaire (ADS). Toronto: Addiction Research Foundation. Horner, B.R., & Scheibe, K.E. (1997). Prevalence and implications of attention-deficit hyperactivity disorder among adolescents in treatment for substance abuse. Journal of the American Academy of Child and Adolescent Psychiatry, 36, 30-36. Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19. Jessor, R., Donovan, J.E., & Costa, F.M. (1991). Beyond adolescence: Problem behavior and young adult development. Cambridge, England: Cambridge University Press. Johnston, L.D., Bachman, J.G., & O'Malley, P.M. (1985). Monitoring the future: Questionnaire responses from the nation's high school seniors, 1984. Ann Arbor, MI: Survey Research Center, Institute for Social Research. Kandel, D.B. (Ed.). (1978). 
Longitudinal research on drug use: Empirical findings and methodological issues. Washington, DC: Hemisphere. Klorman, R., Coons, H.W., Brumaghim, J.T., Borgstedt, A.D., & Fitzpatrick, P. (1988). Stimulant treatment for adolescents with attention deficit disorder. Psychopharmacology Bulletin, 24, 88-92. Labouvie, E.W. (1986). Alcohol and marijuana use in relation to adolescent stress. International Journal of the Addictions, 21, 333-345. Latimer, W.W., & Jerstad, S. (in press). Learning disabilities and psychoactive substance use disorders: Impact of an underresearched comorbidity among youth. Research policy brief, Institute on Community Integration Publications, University of Minnesota, Minneapolis. Latimer, W.W., & Winters, K.C. (1998). The prevalence of comorbid psychiatric disorders among youth treated for substance abuse. Manuscript in preparation. Latimer, W.W., Winters, K.C., Stinchfield, R.D., & Newcomb, M. (1998). Predictors of adolescent drug abuse treatment outcome. Manuscript under review. Latimer, W.W., Winters, K.C., Stinchfield, R.D., & Traver, R.E. (1998). The role of client and contextual variables in the outcome of adolescent drug treatment. Manuscript under review. Margolis, R. (1995). Adolescent chemical dependence: Assessment, treatment, and management. Psychotherapy, 32, 172-179. Marlatt, G.A. (1979). Alcohol use and problem drinking: A cognitive-behavioral analysis. In P.C. Kendall & S.D. Hollon (Eds.), Cognitive-behavioral interventions: Theory research and practice (pp. 199-152). New York: Academic Press. Marlatt, G.A., & Gordon, J.R. (1985). Relapse prevention: Maintenance strategies in the treatment of addictive behaviors. New York: Guilford. Mayer, J., & Filstead, W.J. (1979). The Adolescent Alcohol Involvement Scale: An instrument for measuring adolescents' use and misuse of alcohol. Journal of Studies on Alcohol, 40, 291-300. McKay, J.R., & Buka, S.L. (1994). Issues in the treatment of antisocial adolescent substance abusers. 
Journal of Child and Adolescent Substance Abuse, 3, 59-81. Minuchin, S. (1974). Families and family therapy. Cambridge, MA: Harvard University Press. Monopolis, S.J., Brooner, R.K., Jadwisiak, R.M., Marsh, E., & Schmidt, C.W. (1991).
Preliminary report on psychiatric comorbidity in adolescent substance abusers (NIDA Research Monograph 105, DHHS Publication No. ADM 91-1753). Washington, DC: U.S. Government Printing Office. Moss, H.B., Kirisci, L., & Mezzich, A.C. (1994). Psychiatric comorbidity and self-efficacy to resist heavy drinking in alcoholic and nonalcoholic adolescents. American Journal on Addictions, 3, 204-212. Muisener, P.P. (1994). Understanding and treating adolescent substance abuse. Thousand Oaks, CA: Sage. Myers, M.G., & Brown, S.A. (1990). Coping responses and relapse among adolescent substance abusers. Journal of Substance Abuse, 2, 177-189. National Institute on Drug Abuse. (1997). Marijuana and tobacco use up again among 8th and 10th graders. NIDA Notes, 12(2), 12. Newcomb, M.D., & Bentler, P.M. (1988). The impact of family context, deviant attitudes, and emotional distress on adolescent drug use: Longitudinal latent-variable analyses of mothers and their children. Journal of Research in Personality, 22, 154-176. Nichols, M.P. (1987). The self in the system. New York: Brunner/Mazel. Nunnally, J.C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill. Palmer, J.H., & Paisley, P.O. (1991). Student assistance programs: A response to substance abuse. School Counselor, 38, 287-293. Parrish, S.L., Jr. (1994). Adolescent substance abuse: The challenge for clinicians. Alcohol, 11, 453-455. Pelham, W.E., Jr., Greenslade, K.E., Vodde-Hamilton, M., Murphy, D.A., Greenstein, J.J., Gnagy, E.M., Guthrie, K.J., Hoover, M.D., & Dahl, R.E. (1990). Relative efficacy of long-acting stimulants on children with attention deficit-hyperactivity disorder: A comparison of standard methylphenidate, sustained-release methylphenidate, sustained-release dextroamphetamine, and pemoline. Pediatrics, 86, 226-237. Prochaska, J.O., & DiClemente, C.C. (1992). Stages of change in the modification of problem behaviors. In M. Hersen, R.M. Eisler, & P.M. 
Miller (Eds.), Progress in behavior modification (pp. 184-214). Sycamore, IL: Sycamore Press. Rahdert, E. (Ed.). (1991). The Adolescent Assessment/Referral System manual (DHHS Publication No. ADM 91-1735). Rockville, MD: National Institute on Drug Abuse. Ray, J.B., Freidlander, R.B., & Solomon, G.S. (1984). Changes in rational beliefs among treated alcoholics. Psychological Reports, 55, 883-886. Regier, D.A., Boyd, J.H., & Burke, J.D. (1988). One-month prevalence of mental disorders in the United States. Archives of General Psychiatry, 45, 977-986. Rosenberg, H., & Brian, T. (1986). Cognitive-behavioral group therapy for multiple-DUI offenders. Special Issue: Drunk driving in America: Strategies and approaches to treatment. Alcoholism Treatment Quarterly, 3, 47-65. Schonberg, K. (Ed.). (1993). Guidelines for the treatment of alcohol- and other drug-abusing adolescents (TIPS No. 4). Rockville, MD: Center for Substance Abuse Treatment. Stinchfield, R. (1997). Reliability of adolescent self-reported pretreatment alcohol and other drug use. Substance Use and Misuse, 32, 425-434. Stinchfield, R.D., Owen, P., & Winters, K.C. (1994). Group therapy for substance abuse: A review of the empirical research. In A. Fuhriman & G. Burlingame (Eds.), Handbook of group psychotherapy (pp. 458-488). New York: Wiley. Stinchfield, R., & Winters, K.C. (1997). Measuring change in adolescent drug misuse with the Personal Experience Inventory (PEI). Substance Use and Misuse, 32, 63-76. St. Lawrence, J.S., Jefferson, K.W., Alleyne, E., & Brasfield, T.L. (1995). Comparison of education versus behavioral skills training interventions in lowering sexual HIV-risk behavior of substance-dependent adolescents. Journal of Consulting and Clinical Psychology, 63, 154-157. Stowell, R. (1991). Dual diagnosis issues. Psychiatric Annals, 21, 98-104. Tarter, R.E. (1990). Evaluation and treatment of adolescent substance abuse: A decision tree method. American Journal of Drug and Alcohol Abuse, 16, 1-46. 
Trad, P.V. (1993). Substance abuse in adolescent mothers: Strategies for diagnosis, treatment and prevention. Journal of Substance Abuse Treatment, 10, 421-431.

Walker, C.E., & Roberts, M.C. (Eds.). (1992). Handbook of clinical child psychology (2nd ed.). New York: Wiley. Winters, K.C., & Henly, G.A. (1989). Personal Experience Inventory test and manual. Los Angeles: Western Psychological Services. Winters, K.C., & Henly, G.A. (1993). Adolescent Diagnostic Interview and manual. Los Angeles: Western Psychological Services. Winters, K.C., Stinchfield, R.D., & Henly, G.A. (1993). Further validation of new scales measuring adolescent alcohol and other drug abuse. Journal of Studies on Alcohol, 54, 534-541. Winters, K.C., Stinchfield, R.D., & Henly, G.A. (1996). Convergent and predictive validity of scales measuring adolescent substance abuse. Journal of Child and Adolescent Substance Abuse, 5, 37-55. Winters, K., Stinchfield, R., Henly, G., & Schwartz, R. (1991). Validity of adolescent self-report of alcohol and other drug involvement. International Journal of the Addictions, 25, 1379-1395. Yeager, R.L., DiGiuseppe, R., Olsen, J.T., & Lewis, L. (1988). Rational-emotive therapy in the therapeutic community. Journal of Rational-Emotive and Cognitive Behavior Therapy, 6, 211-235.

Chapter 21
Child and Adolescent Functional Assessment Scale (CAFAS)

Kay Hodges
Eastern Michigan University

Symptoms and diagnoses have been the traditional cornerstone of clinical assessments. Over the past decade, a second concept, impairment, has come to be regarded as important in making treatment decisions. Impairment reflects the consequences or effects of symptoms on functioning. In fact, epidemiological research over the past decade has demonstrated clearly that presence of a diagnosis is not comparable to impairment or need for treatment (Bird et al., 1990). These findings have in part been responsible for the inclusion of the concept of impairment in the most recent edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; American Psychiatric Association, 1994). Presence of impairment is a required criterion for many DSM-IV diagnoses, but it remains vaguely defined. DSM-IV references to impairment state that it should be in social, academic, or occupational areas of functioning or should be present in two or more settings. In addition, the definition of serious emotional disturbance (SED) includes the requirement that the youth have functional impairment that substantially interferes with or limits the child's role or functioning in family, school, or community activities (Federal Register, 1993).

State administrators applying for mental health block grants are required to operationally define functional impairment and develop standardized methods for estimating SED (Hodges, 1994d). Most states are in the process of developing managed care procedures for allocating state and federal monies for SED youths. These procedures typically assume that SED children should be offered services before less impaired youths, necessitating development of eligibility criteria for levels of intensity of services. 
Providers are also being asked to document treatment effectiveness, requiring data on outcome (i.e., change in the youth's condition for which services were sought). Almost two dozen states are using the Child and Adolescent Functional Assessment Scale (CAFAS®; Hodges, 1994b) as one component in determining whether a child is eligible for SED services and/or to assess outcome for youths receiving services. In addition, some statewide child-care associations have adopted the CAFAS as an outcome indicator.

These activities on the state and federal level parallel changes that have been taking place in the private sector. Increasingly, third-party payers are expecting mental health providers to document severity of impairment for the patient to qualify for more intensive or costly treatments. In addition, third-party payers are requiring continual monitoring of the client's level of functioning with an expectation that intensity of treatment will be reduced as impairment lessens. The term outcome monitoring differs from outcome evaluation in that the information obtained as treatment progresses may be used to modify the ongoing treatment plan. Outcome monitoring is a dynamic process, whereas outcome evaluation could be restricted to pre- and posttreatment events.

This chapter reviews the psychometric data on the CAFAS, as well as the current uses of the CAFAS instrument for treatment planning, treatment monitoring, and outcome assessment. The CAFAS is used for a variety of purposes, including as a criterion measure in determining intensity of services needed, as a treatment outcome measure (pre/post), and as an aid in case management.

Overview of the Instrument

Description and Development of the CAFAS

The CAFAS was designed to rate functional impairment in children and adolescents from age 7 to 17, who have or may have emotional, behavioral, substance use, psychiatric, or psychological problems. It was developed for use with children who were referred for, or are at risk for developing, these problems. The CAFAS measures the youths across different domains of social/personal functioning and, in each domain, along a continuum of impairment. Areas of impairment, and the extent of that impairment, are identified. It is very sensitive to changes in functioning over time, making it particularly useful as a measure of treatment effects and outcome for youths with SED.

The CAFAS is a menu of behavioral descriptors that are organized around domains of functioning. 
Within each domain, behaviors are grouped into levels of impairment: severe, moderate, mild, and no or minimal impairment. Typically, the youth is rated with the CAFAS by a clinician or case manager based on information usually collected as part of clinical services. The rater simply identifies the behaviors that describe the youth. The actual rating takes about 10 minutes. The CAFAS is not "administered." In situations where typical clinical information is not available, an optional structured 30-minute interview has been developed to obtain the information needed for the CAFAS. The current version of the CAFAS, developed in 1994 (Hodges, 1994b), replaced the original version developed in 1989 (Hodges, 1989). In this chapter, CAFAS refers to the current version unless otherwise specified. The CAFAS comprises eight scales on which the youth is rated (Hodges, 1994b). Three scales assess Role Performance: School/Work (i.e., functions satisfactorily in a group educational environment), Home (i.e., observes reasonable rules and performs age-appropriate tasks), and Community (i.e., respects the rights of others and their property and acts lawfully). The remaining scales include Behavior Toward Others (i.e., appropriateness of youth's daily behavior), Mood/Emotions (i.e., modulation of the youth's emotional life), Self-harmful Behavior (i.e., extent to which the youth can cope without resorting to self-harmful behavior or verbalizations), Substance Use (i.e., youth's
substance use and the extent to which it is inappropriate and disruptive), and Thinking (i.e., ability of youth to use rational thought processes). There are two scales to rate the youth's caregivers: the Material Needs scale (i.e., extent to which the youth's functioning is interfered with due to lack of resources, such as food, clothing, housing, medical attention, or neighborhood safety) and Family/Social Support (i.e., extent to which the youth's functioning is disrupted due to limitations in the family's psychosocial resources relative to the youth's needs). There are three sets of the Caregiver scales in the event that a youth has more than one caregiver. There are separate but identical scales for Primary Family, Noncustodial Caregiver, and Surrogate Caregiver. The intent of the Caregiver scales is to provide information on the context in which the youth functions. It provides a means of rating environmental factors that, if present, are important to consider in designing a treatment plan. Otherwise, the plan may be quite ineffective. Collaborating with other agencies or providing various supportive services to the family may be critical to empowering the caregivers. An optional scale, which permits documenting the client's strengths and goals, is also included in the current CAFAS. The CAFAS scales are unchanged, as is the scoring. However, immediately following each CAFAS scale, a list of positive characteristics relevant to the scale's domain appears. These positive behavioral descriptions can be coded as a strength for youths who have the characteristic, or as a goal for youths who lack the behavior but for whom achieving it is realistic. These positive qualities are presented for both the Youth and Caregiver scales. An example is "attends regularly," which follows the CAFAS School scale. In addition, for the first four scales, there are descriptions of environmental resources (e.g., specialized classroom). 
These resources can be coded for availability (i.e., whether the resource is available in the youth's community) and for utilization (i.e., whether the resource was utilized during the rating period). These additional items were developed and then submitted to experts in child psychopathology, child development, and minority issues. The intent was to develop a list of strengths that was sensitive to the expression of competence across various subcultures. The rationale for adding these items as an optional scale was to facilitate the link between assessing problem behaviors and developing intervention strategies. Understanding how to mobilize the family's strengths is an important component of successful treatment. Although strengths and behaviors reflecting normal behavior are listed in the "No or Minimal Impairment" category of each CAFAS scale, these items are by necessity limited in range. The items on this optional scale provide a much more elaborate description of the youth's and the family's assets. Although the 1989 CAFAS is no longer used, it is briefly compared to the current version because it is referenced in discussions of research findings. The 1989 version differs from the current version in that there was only one Role Performance scale, with all items relevant to School/Work, Home, and Community appearing on one scale; no separate Self-harmful Behavior scale; only one set of Caregiver scales; fewer behavioral items in each scale; and no accompanying list of strengths. For each scale, the rater determines the severity level that best describes the youth's most severe level of dysfunction for the time period specified. The user defines this time period (e.g., last month, last 3 months). 
The severity levels are as follows: Severe (i.e., severe disruption or incapacitation); Moderate (i.e., persistent disruption or major occasional disruption of functioning); Mild (i.e., significant problems or distress); and Minimal or No Impairment (i.e., no disruption of functioning). For each scale and each severity level, there is a set of items describing behavior. The rater reviews the items in
the Severe category first. If any item describes the youth's functioning, the youth's level of impairment for the scale is "severe." If none of the items in the Severe category characterize the youth, the rater continues to the Moderate category, progressing through the remainder of the categories as needed until the youth's level of functioning can be described. What if a youth's behavior is not described in the CAFAS items? To accommodate this situation, an item referred to as "Exception" is provided for each scale at each severity level. The rater can circle the item number corresponding to "Exception" and explain the reason in a designated space below each scale. For example, a third-grader who is encopretic in the classroom and alienates the other children by trying to ignore the fact that he defecated may be rejected by his peers. The rater may rate this child at the moderate level on the "Behavior Toward Others" scale. The CAFAS ratings are based on what the rater has observed or what has been reported by the youth or other informants. The youth's functioning is to be rated independent of previous diagnoses, prognosis, or presumed nature of the disorder. For the purposes of generating quantitative scores, the levels of impairment are assigned the following values: Severe, 30; Moderate, 20; Mild, 10; and Minimal or No Impairment, 0. A total score is generated for the youth by summing the eight Youth scales. The Caregiver scale scores are not combined with the Youth scales. A higher score indicates greater impairment, with a range from 0 to 240. The CAFAS is intended for use with youths at risk for, or referred because of, behavioral, emotional, mental, psychiatric, psychological, or substance use problems. The score can be used as a gauge to determine if a youth is impaired, and if so, the extent of impairment. Nonreferred youths typically receive scores indicating no impairment or minimal impairment (i.e., 0 or 10; Hodges, 1995). 
Thus, it is not feasible to collect data on children in the community with the intent of establishing normative data. However, it is appropriate to collect data on referred children for the purpose of establishing general guidelines that provide an expected range of scores for children varying in intensity of services needed. In fact, two studies report on the distribution of scores for youths varying in impairment, as indicated by their level of service (Hodges & Wong, 1996; Hodges, Wong, & Latessa, 1998). General guidelines for interpreting the CAFAS with referred youths are reported later. It is worth noting that for the 1989 version of the CAFAS, the total was generated on five scales, yielding a range of 0 to 150. Because there is considerable data available on the 1989 version, it may be desirable to generate a five-scale sum score for data from the current eight-scale CAFAS. To generate the five-scale sum (hereafter referred to as the "five-scale sum total score"), the highest of the three Role Performance scales (i.e., School/Work, Home, Community) is chosen for the Role Performance scale and the highest of the Mood/Emotions and Self-harmful Behavior scales is chosen for the Moods scale. There is a downward extension of the CAFAS for children from age 4 to 7, which is referred to as the Preschool and Early Childhood Functional Assessment Scale (PECFAS; Hodges, 1994e). Generally, if a child is attending a full-day school program in elementary school (e.g., kindergarten, first grade), the CAFAS is appropriate. The user should consider the children's developmental age, rather than chronological age, in deciding whether to rate them using the CAFAS or PECFAS. The PECFAS has the same scales as the CAFAS, with the exception of the omission of the Substance Use scale. The content of the items on the PECFAS refers to behavior that is more typical of younger children (e.g., temper tantrums, finicky eater, etc.). 
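The rating and scoring rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the CAFAS Computer System or any official scoring program: the scale keys, function names, and the treatment of a scale with no endorsed items as minimal impairment are assumptions made for the example. The point values, the most-severe-first review order, the eight-scale total, and the five-scale sum follow the description in the text.

```python
from typing import Dict, List

# Point values per severity level, as stated in the text.
SEVERITY_POINTS: Dict[str, int] = {"severe": 30, "moderate": 20, "mild": 10, "minimal": 0}

# The rater reviews categories from most to least severe.
REVIEW_ORDER: List[str] = ["severe", "moderate", "mild", "minimal"]

# Hypothetical keys for the eight Youth scales (names chosen for this sketch).
YOUTH_SCALES: List[str] = [
    "school_work", "home", "community", "behavior_toward_others",
    "mood_emotions", "self_harmful_behavior", "substance_use", "thinking",
]


def rate_scale(endorsed: Dict[str, List[str]]) -> str:
    """Return the most severe level at which at least one item was endorsed."""
    for level in REVIEW_ORDER:
        if endorsed.get(level):
            return level
    # Assumption: a scale with no endorsed items counts as minimal or no impairment.
    return "minimal"


def _points(levels: Dict[str, str]) -> Dict[str, int]:
    """Convert each Youth scale's severity level to its point value."""
    missing = [s for s in YOUTH_SCALES if s not in levels]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return {s: SEVERITY_POINTS[levels[s]] for s in YOUTH_SCALES}


def eight_scale_total(levels: Dict[str, str]) -> int:
    """Sum of the eight Youth scales (range 0 to 240); Caregiver scales are excluded."""
    return sum(_points(levels).values())


def five_scale_total(levels: Dict[str, str]) -> int:
    """Five-scale sum comparable to the 1989 version (range 0 to 150)."""
    p = _points(levels)
    role_performance = max(p["school_work"], p["home"], p["community"])
    moods = max(p["mood_emotions"], p["self_harmful_behavior"])
    return (role_performance + moods + p["behavior_toward_others"]
            + p["substance_use"] + p["thinking"])


if __name__ == "__main__":
    example = {
        "school_work": "severe", "home": "moderate", "community": "minimal",
        "behavior_toward_others": "mild", "mood_emotions": "moderate",
        "self_harmful_behavior": "severe", "substance_use": "minimal",
        "thinking": "minimal",
    }
    print(eight_scale_total(example), five_scale_total(example))
```

Taking the maximum within the Role Performance and Moods groupings mirrors the "highest of" rule in the text, so a youth rated severe on any one of the collapsed scales stays severe on the combined scale.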
There is also an automated version of the CAFAS, which is referred to as the CAFAS Computer System. This system allows clinicians to rate the youths directly on their
computer terminal. The CAFAS is shown in this computer version just as it is in the written version. Each scale is presented on a separate screen. The user clicks on a check box if the item is true for the youth. A clinical interpretive report can be printed out for each CAFAS administration, which can serve as the basis for treatment planning. The format for this report offers one model for interpreting the CAFAS. In addition to the CAFAS, the program collects other information thought to be important for treatment planning, including, at first assessment, demographic information, multiaxial diagnoses, and presence of current and past risk factors; at subsequent CAFAS ratings (e.g., every 3 months), services and interventions rendered and extent of interagency collaboration; and at termination, the circumstances of termination of services. In addition to the client report, the CAFAS Computer System generates two administrative reports that collapse data across clients for a time period specified by the user. One report describes the severity of impairment of youth at intake and the other describes the pre- to posttreatment outcome results. The computer program also permits exporting a computer file that contains the data collected by the program. This permits generating custom reports by analyzing the data file with spreadsheet or statistical programs (e.g., Excel®, SPSS®, and ACCESS®). The data file produced by the CAFAS Computer System has been used to characterize the youths served by public mental health dollars and to assess outcome in these various types of youths served (Hodges & Warren, 1997).

Reliability and Validity Data

Reliability and validity data have primarily been generated by two evaluation studies: the Ft. 
Bragg Evaluation Project (hereafter referred to as FBEP) and the national evaluation of the service grants funded by the Center for Mental Health Services (CMHS) Branch of Substance Abuse and Mental Health Services Administration of the Department of Health and Human Services (hereafter referred to as the CMHS evaluation). The two evaluations are briefly described here before presenting the psychometric data generated by them. FBEP. The purpose of the FBEP was to compare two systems of care, and, as has been reported elsewhere, the study showed similar results for the two systems (Lambert & Guthrie, 1996). For the purpose of describing the psychometric data on the CAFAS, the data from both systems of care were collapsed into one sample. Subjects were dependents of army personnel (i.e., active, retired, or discharged), who were referred for mental health services and agreed to participate in the Ft. Bragg Demonstration Evaluation Study (Breda, 1996). The children ranged in age from 5 to 17 years old, with a mean age of 11 years at intake. They were recruited from three army bases (Ft. Bragg, North Carolina; Ft. Campbell, Kentucky; and Ft. Stewart, Georgia) and were evaluated at four time points (intake, 6 months, 12 months, and 18 months postintake). Only those respondents whose CAFAS scores were present in the data were included in this sample. There were 984 respondents at intake (Wave 1), 780 respondents at 6 months (Wave 2), 617 respondents at 12 months (Wave 3), and 373 respondents at 18 months postintake (Wave 4). There were no eligibility criteria for seeking services; cases received services irrespective of the youth's level of impairment. Analyses at the end of the project revealed that the overwhelming majority of the youths (78%) received traditional outpatient therapy and did not receive intensive services (Summerfelt, Foster, & Saunders, 1996).

The CAFAS was rated by 28 interviewers who administered a structured diagnostic interview to the youth and the parent separately and rated the CAFAS based on the information obtained in the interview. The raters were research personnel not involved in the treatment or treatment decisions for any of the subjects, and were mostly lay raters. The research protocols were collected, stored, and processed by the research team employees only. Because the 1989 version of the CAFAS was used in the FBEP, the total score refers to the five-scale sum total score, which was previously described. CMHS. The CAFAS is being used in the evaluation of the demonstration grants funded by the Center for Mental Health Services. The study is still in progress, so only part of the total sample has been analyzed. Data from this study has been summarized by Hodges and colleagues (Doucette-Gates, Liao, Sondheimer, & Slaton, 1997; Hodges, Doucette-Gates, & Liao, 1997; Hodges, Doucette-Gates, Liao, & Wong, 1996; Hodges et al., 1997). In contrast to the FBEP, the CMHS sample were youths who were defined as seriously emotionally disturbed (SED), had a history of (or were at risk for) out-of-home placement, or were served by multiple agencies. Each grantee designated interventions for specified at-risk groups of youths. Thus, although this sample is not representative of referred youth, it consists of many different types of highly impaired youths. Results are available for a sample of 3,300 youths, from age 4 to 23 (Hodges et al., 1997), approximately two thirds of whom were males. The average age for boys was 11.89 years, whereas girls were on average 1 year older, with a mean of 12.9 years. The majority of the youths were Caucasian (64.0%), with 18.9% African American and 8.4% Hispanic. This was an impoverished sample, with 64% of the families living on an income below $15,000. The majority of the youths (54.1%) were from single-parent families. 
The breakdown of the sample by referral sources reveals that the youths were referred by a variety of child agencies, including the schools (21%), state social services (19%), mental health (19%), juvenile court or corrections (9%), parents (14%) and other sources (18%). Outcome data was available on 931 youths who were reevaluated at 6 months postintake. The design of the study only required collecting 6 months data if the youths were still receiving services. The CAFAS ratings were made primarily by staff who were involved in providing services to the youths. The ratings were based on the information collected as part of the normal course of events in the clinical setting. The eight scale sum total was calculated as well as the five scale; the latter for the purpose of comparison to the FBEP. Reliability. Cronbach's alpha ranged between .63 and .68 for the different waves in the FBEP (Hodges & Wong, 1996). In the CMHS report, Cronbach's alpha for the CAFAS was .73 at intake and .78 at 6 months (Hodges et al., 1997). These alpha values, which reflect on the homogeneity of the scales of the CAFAS, are supportive of the reliability of the CAFAS. This is especially true given that the separate scales are intended to assess different domains of impairment. Furthermore, the results indicated that the reliability for the entire scale would be lower if any of the individual scales were omitted from the CAFAS. The correlation between the five-scale and the eight-scale sum totals was very high, r(3,074) = .93, p < .0001 (Hodges et al., 1997). A test-retest reliability study on the CAFAS indicated good reliability (Hodges & Wong, 1996). Interrater reliability has been assessed with lay raters (i.e., undergraduate and graduate college students) and with frontline staff. Pearson correlations between the trainees and the criterion score (i.e., the "gold standard") were calculated. With both types of raters, high correlations (i.e., Pearson correlations above .92) have been

observed for the total CAFAS score. For the individual scales, the reliability has been consistently high (i.e., Pearson correlations above .83) for Role Performance, Behavior Toward Others, and Substance Use. Moderately high correlations were observed for Mood/Emotions. Comparable results were observed when intraclass correlations were determined to examine this consistency across raters, ignoring the criterion answers (Hodges & Wong, 1996). Content and Face Validity. Content validity depends on the extent to which an empirical measurement reflects a specific domain of content, whereas face validity is concerned with the extent to which a measure looks like it measures what it is intended to measure (Nunnally, 1978). In the construction of the CAFAS, the various domains of functioning were identified, and items were included only if they reflected on impairment in the specific domain. For example, for the School scale, items were developed for describing the problems related to attendance, disobedience in the classroom, unsafe behavior in the classroom, poor attention or overactivity, and below average academic achievement. The total score is derived from the separate scale scores, and thus reflects on impairment across specific domains of functioning. In fact, the factorial validity of the CAFAS was supported by research in which each topic included in each CAFAS scale (e.g., academic functioning for the School scale, etc.) was included in the analysis. The results were supportive of the content validity of the CAFAS (Newman, DeLiberty, Hodges, McGrew, & Tejeda, 1997; Newman & Hodges, 1995). In applied settings, face validity can play an important role if data will be used to try to influence decision making by laypersons, such as legislators and policymakers (Nunnally, 1978). 
The CAFAS has strong face validity because its scores can always be translated into specific problematic behaviors: all ratings on the CAFAS must be supported by endorsement of specific items that are objective and behavioral, such as "expelled from school." Also, improvement over time can be described by the changes in domain and global scores as well as by observable changes in specific behaviors, as reflected by the specific items endorsed. For example, a child suspended from school for aggressive behavior at intake may be described as having behavioral problems that can be handled by the classroom teacher, with no intervention by disciplinarians needed, at 6 months follow-up. Such concrete evidence of improvement, which is verifiable by examining specific item endorsements, is more persuasive than abstract scores not directly related to behavior for the specific youth.

Concurrent Criterion-Related Validity. Validity has also been examined by determining whether CAFAS scores were different for subgroups of youths who should presumably differ in extent of impairment. Specifically, analyses were conducted to determine whether CAFAS scores differed for youths being served at different levels of intensity of care, living in settings that differ in restrictiveness and in use of staff with specialized skills at handling problem behaviors, and differing in severity of psychiatric diagnosis. In the FBEP study, three groups varying in intensity of services at intake could be compared: residential (i.e., psychiatric inpatient, residential treatment center), alternative care (i.e., alternative care to traditional residential, including home-based services, day treatment, specialized foster care, and group home), and outpatient. As hypothesized, inpatients scored significantly higher than youths in alternative care, who in turn scored significantly higher than youths in outpatient care (Hodges & Wong, 1996). In the CMHS evaluation, four living arrangements could be compared. 
They were youths living in their own home, regular foster care, therapeutic foster care, and residential programs (i.e., psychiatric group home; residential treatment center, RTC;

psychiatric inpatient). It was hypothesized that youths living in their own home would be less impaired than youths placed in therapeutic foster care, who in turn would be less impaired than youths in residential programs. It was reasoned that youths in regular foster care could be more impaired or no different than children living in their own home because many children are placed in foster care as a result of their caregivers' behavior rather than their own behavior. Significant effects were observed for the data at intake and 6 months. Children living with their parents or in regular foster care were less impaired than youths in various residential placements. The children in the specialized foster care group scored in between; their scores were not significantly different from children in the other groups (Hodges et al., 1997). The assumption that higher levels of impairment would be observed for youths with more serious psychiatric disorders (e.g., pervasive developmental disorders, psychosis, autism) and lower levels of impairment for youths diagnosed as having less serious disorders (e.g., adjustment disorder) was examined in the CMHS evaluation. Diagnoses were taken from the clinical record at intake. There were significant differences between the diagnoses. Youths in the psychotic/autism and developmental disorders groups were rated as significantly more impaired than youths in the depression, anxiety, and conduct/oppositional groups, who in turn were significantly more impaired than youths in the adjustment disorder group (Hodges et al., 1997). This finding is consistent with literature on the clinical course of these disorders (American Psychiatric Association, 1994). Predictive Criterion-Related Validity. The CAFAS total score at intake significantly predicted service utilization and cost at 6 and 12 months postintake in the FBEP (Hodges & Wong, 1997). 
The utilization indicators were restrictiveness of care (i.e., inpatient, alternative care, and outpatient), total cost of all services received, number of bed days (i.e., days in a residential program), and number of days of service (i.e., number of days on which any services were offered). Multiple regression analyses revealed that higher impairment was significantly related to more restrictive care, higher cost, more bed days, and more days of services. Additional simultaneous multiple regressions were conducted to compare the predictive power of the CAFAS to other measures, including the widely used Child Behavior Checklist (CBCL; Achenbach, 1991). The CAFAS total score was a significant predictor of all four utilization indicators at both 6 and 12 months, whereas number of problems endorsed on the CBCL was not predictive of any of the utilization indicators (Hodges & Wong, 1997). Simultaneous multiple regressions were also conducted to compare the predictive power of the CAFAS to presence/absence of common diagnoses. At intake, diagnoses were determined via computer algorithm scoring, based on answers given in a structured diagnostic interview administered to the parent about their child's behavior, the parent version of the Child Assessment Schedule (PCAS; Hodges, 1990b, 1993; Hodges, Kline, Stern, Cytryn, & McKnew, 1982). The CAFAS at intake was the strongest predictor of subsequent cost, restrictiveness of services, number of bed days, and number of services at both 6 and 12 months. The only diagnosis that was significant at both 6 and 12 months was Conduct Disorder; even so, the CAFAS was a more powerful predictor (Hodges & Wong, 1997). These findings provide considerable evidence of the reliability and validity of the CAFAS. The measure has demonstrated both concurrent and predictive validity in studies operating in applied clinical settings. 
In addition, the CAFAS performed better than other measures in predicting subsequent clinical sequelae, which have important

real-life implications, such as costs spent and restrictiveness of setting in which the child can be served. Basic Interpretive Strategy Comprehensiveness of the Information Base for the CAFAS. Taking the youth's circumstances into consideration, the user of the CAFAS information needs to determine whether the information on which the CAFAS ratings are based is adequate in terms of comprehensiveness. Were all important informants contacted? Is there reason to be concerned about potential bias in reporting? The user also needs to look at each scale to determine whether it was rated. If not, is there an explanation given? Are all important caregivers rated on the Caregiver scales? To ensure that this issue is attended to, the following information is collected on the CAFAS: youth's age, whether youth is enrolled in school, whether youth has a job, youth's caregivers (e.g., biological mother, etc.), youth's living arrangement and/or residential placement (e.g., family home, residential treatment center), and sources of information, including the method of obtaining information (i.e., in-person, telephone, written documentation) and relationship of informant to the youth. Additional information about the youth's current circumstances is also obtained because any psychological interpretation should be made in light of the youth's current circumstances and history. The following information is also recorded on the CAFAS: whether the youth can be maintained in his or her own community, types of services received since last rating, and psychiatric medications received (e.g., stimulant, antidepressant). Interpretation of the Total CAFAS Score. After a youth is rated, three sets of information are generated: total score, individual scale scores, and the specific item endorsements. The interpretation of the total score and the individual scales are discussed separately. 
The CAFAS total score is the sum of the individual Youth scales, using the values assigned to the four levels of impairment. There are no cutoff scores for the CAFAS that would dictate treatment decisions. Treatment decisions consider many variables, and, in optimal situations, they evolve from collaboration among the professionals, the youth's caregivers, the youth, and other important persons in the youth's life. Although research has shown that the impairment level of the youth may be a cornerstone (Hodges & Wong, 1997), other variables that are considered important in making treatment decisions include whether the youth demonstrates behaviors that put him or her or others at risk, and the degree of judged risk; the youth's awareness of the problems and willingness to work on them constructively; the resources available in the family and community for managing the youth's behaviors; the ability of professionals to work in a model aimed at providing the least restrictive care; existing economic incentives; and whether there is a consensus in the clinical literature on the recommended treatment protocol. Although there are no cutoff scores, a general framework for putting the CAFAS total score into context, referred to as Overall Level of Dysfunction, appears to be useful for laypersons. These guidelines were derived from research with the CAFAS (Hodges et al., 1997; Hodges & Wong, 1996). This total impairment score, when based on the five-scale sum, is categorized as follows: None/Minimal (0-10), Mild (20-30), Moderate (40-60), Marked (70-80), and Severe (90-150). For youths with SED, an exit CAFAS of 30 or

below would be a very good outcome provided that no individual scale was rated as 30. For the eight-scale sum, research to date suggests that comparable categories would be as follows: None/Minimal (0-10), Mild (20-40), Moderate (50-90), Marked (100-130), and Severe (140 and higher). For youths with SED, an exit CAFAS of 40 or below would be a good outcome as long as no individual scale was rated as 30.

Interpretation of the Individual Scale Scores. The CAFAS Profile is the cornerstone of interpreting the CAFAS results. Figure 21.1 shows the Profile for the Youth scales, which appears on the second page of the CAFAS form and is available on a separate two-sided form. Figure 21.2 presents the profile for the Caregiver scales. The horizontal axis contains the names of the individual CAFAS scales and the vertical axis, the levels of impairment. The numbers in the body of the table refer to the item numbers in the CAFAS. For example, "2" under the School scale for Severe Impairment refers to item number 002, which is "Expelled or equivalent from school" on the CAFAS form. The rater circles the item numbers that correspond to those items marked on the CAFAS form. Then the rater fills in the circles indicating the severity level for each scale. When the circles are connected, a profile appears. In Fig. 21.3, two profiles are presented for a case example; the solid line reflects the youth's profile at intake and the dotted line shows it at 6 months postintake. The same procedure is done for the Caregiver scales. The profile provides an easy format for focusing discussions about the client's needs and progress, whether it be with the youths, their caregivers, or other professionals involved in serving the youths. Over time, staff may want to establish clinical care protocols. 
Specific profile patterns or high endorsements on specific scales may be seen as indicators for evaluative consults, particular treatment protocols, or a specific plan for prioritizing the progression of treatment goals to be addressed. The first step in interpreting the profile is to examine the child's functioning across all areas (i.e., all eight scales) to assess technical validity. The rater should look for any inconsistencies in the profile. At the simplest level, any inconsistencies could represent an error in rating. For example, if an item is endorsed on the Mood subscale indicating that the youth's impairment affects school behavior or social interactions, then some impairment on either the School or the Behavior Toward Others scales would be anticipated. The next step is to notice any "unevenness" in the profile so that any questions can be raised in a timely fashion. Essentially, this means examining the profile to determine if differences in severity levels across the profile "make sense." Sometimes the rater needs to obtain additional information. The question that should be raised is, "What's going on with this child?" An example is a youth who has no impairment at home but severe impairment at school. Is discrimination at school a problem? Another example is a youth who is doing well in school but very poorly at home. Has the clinician specifically asked the youth about abuse at home? Although many youths have "uneven" performance across various settings, raters should always ask, "Does this make sense?" In addition, the rater should examine the specific items endorsed on each of the scales to try to understand how the various behaviors may be related to one another. The perceived relation will influence the specifics of the interpretation. 
For example, a different interpretation would be made for a youth with severe impairments on the School scale (due to refusing to attend) and the Mood scale (due to severe separation anxiety) than for a youth with severe impairments on the School scale (due to disruptive behavior and academic failure) and the Mood scale (due to depression). In the former situation, not attending school and the separation anxiety reflect on the same issue (i.e., the child will not attend school so as to be with the caregiver). In the latter case, by contrast, the two elevations may represent two distinct problems that need to be addressed.
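The Overall Level of Dysfunction bands and the exit-outcome guidelines given above for the five-scale and eight-scale totals can be sketched as a simple lookup. This is only an illustration: the function and variable names are hypothetical, the band edges are those quoted in the text, and the sketch assumes that each scale is scored 0, 10, 20, or 30 so that totals fall on multiples of 10. As the text stresses, these bands are a framing aid and not cutoff scores for treatment decisions.

```python
# Illustrative sketch; names are hypothetical and not part of the CAFAS materials.
# Band edges are the ones quoted in the text. Assumption: each scale is scored
# 0, 10, 20, or 30, so totals are multiples of 10 and never fall between bands.

FIVE_SCALE_BANDS = [
    (0, 10, "None/Minimal"),
    (20, 30, "Mild"),
    (40, 60, "Moderate"),
    (70, 80, "Marked"),
    (90, 150, "Severe"),
]

EIGHT_SCALE_BANDS = [
    (0, 10, "None/Minimal"),
    (20, 40, "Mild"),
    (50, 90, "Moderate"),
    (100, 130, "Marked"),
    (140, None, "Severe"),  # "140 and higher": no upper edge
]


def overall_level(total, bands):
    """Return the Overall Level of Dysfunction label for a total score."""
    for low, high, label in bands:
        if total >= low and (high is None or total <= high):
            return label
    raise ValueError(f"total {total} does not fall in any band")


def good_exit_outcome(scale_scores, ceiling):
    """Apply the exit guideline for youths with SED.

    ceiling is 30 for the five-scale sum and 40 for the eight-scale sum;
    in both cases no individual scale may be rated 30.
    """
    return sum(scale_scores) <= ceiling and max(scale_scores) < 30


if __name__ == "__main__":
    print(overall_level(40, FIVE_SCALE_BANDS))
    print(overall_level(40, EIGHT_SCALE_BANDS))
    print(good_exit_outcome([10, 10, 0, 10, 0, 0, 0, 0], ceiling=40))
```

The same total can land in different bands depending on which sum is used (a total of 40 is Moderate on the five-scale sum but Mild on the eight-scale sum), which is why the band table is passed in explicitly.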

Fig. 21.1. CAFAS Profile: Youth's Functioning.

Fig. 21.2. CAFAS Profile: Caregiver Resources.

After any questions or issues are resolved, the clinician can begin to characterize the youths along a variety of dimensions that will form the foundation for the treatment plan. The Profile as well as individual items point to areas of difficulty and specify problem behaviors or behavioral deficits. The clinician may integrate these in a variety of ways depending on theoretical orientation. The CAFAS scores can provide a common ground for persons with different theoretical orientations.

Use of the CAFAS for Treatment Planning

In designing a treatment plan, specific behavioral goals are selected and a plan for accomplishing those goals is specified. From the list of specific items endorsed on the CAFAS, the team can specify the behaviors to be addressed during the next treatment period and prioritize those chosen. The list of goals provided for each CAFAS scale can be used to facilitate linking problem behavior with behavioral goals. Before a comprehensive plan for addressing problems can be generated, the context in which each problem exists needs to be sufficiently understood. The CAFAS Profile

Fig. 21.3. Example CAFAS Profile Comparing Pre- and Postfunctioning. Preintervention scores are solid; postintervention scores have dashes.

can be useful in preparing the treatment plan by identifying: areas of severe impairment, presence of risk factors, presence of comorbidity, pervasiveness of problems, youth's strengths, and family's resources. Identification of Areas of Severe Impairment Areas of severe impairment are easily identified by examining the CAFAS Profile. The fact that each problem is associated with an impairment level helps to identify the most critical issues to address. Problems indicating severe impairment or having the most pernicious effect on the youth's development are normally given priority in the treatment plan. Sometimes it is tempting to work on less intimidating goals; however, the clinician needs to be attentive to whether the treatment plan will eventually lead to an improvement in the youth's most impaired functioning. The CAFAS Profile and individual items provide a built-in focus on the most severe areas of impaired functioning. Risk Behaviors Specific items on the CAFAS inquire about behavior that poses a risk to the youth or others. The items, referred to as "Risk Behaviors," are listed on the first page of the CAFAS and are included on the client report generated by the CAFAS Computer System. They include suicidal, aggressive, sexual, firesetting, runaway, dangerous substance use, and psychotic behaviors. Psychotic behavior is included because sometimes the youths are dangerous to themselves because of their psychosis (e.g., put objects in electrical plugs; do not understand that they could fall out of a window) or are potentially dangerous to others (e.g., as seen in paranoid schizophrenia). These identified behaviors allow the team to consider whether immediate action is needed to prevent harm to the youth or others; whether the youth's school setting and living environment are appropriate, given the perceived risks; and how treatment will address these risk behaviors. Presence of Comorbidity Comorbidity refers to having more than one diagnosis or condition. 
Comorbidity is important because the treatment of choice for a disorder is typically different if the disorder is accompanied by another major disorder. The CAFAS Profile helps identify two important types of comorbidity: co-occurrence of a behavioral and emotional disorder and co-occurrence of a substance use disorder and another psychiatric disorder. Disorders reflecting behavioral problems (e.g., conduct disorder) are also described as externalizing disorders (Achenbach, 1991), and emotional problems (e.g., anxiety or depression) as internalizing disorders. Elevations on the Mood or Self-harmful Behavior scales would reflect internalizing symptoms, whereas elevations on the first four scales on the Profile (i.e., School/Work, Home, Community, Behavior Toward Others) generally reflect externalizing problems. Examination of the specific items would need to be made to confirm externalizing problems. Another important type of comorbidity is the presence of both a substance abuse disorder and another psychiatric disorder. This pattern is easily identified by a profile that peaks on the Substance Use scale and any other scale.
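The two comorbidity patterns just described can be expressed as a rough screening check on a profile. The sketch below is hypothetical in its names and, in particular, in its definition of an "elevation": the text does not give a numeric threshold, so a scale score of 20 or more is assumed purely for illustration. The scale groupings follow the paragraph above, and, as noted there, externalizing problems would still need to be confirmed by examining the specific items endorsed.

```python
# Illustrative screening sketch; names and the elevation threshold are assumptions.
EXTERNALIZING = ("School/Work", "Home", "Community", "Behavior Toward Others")
INTERNALIZING = ("Mood", "Self-harmful Behavior")
SUBSTANCE = "Substance Use"

# Assumption: a scale counts as "elevated" at 20 or above. The source text
# does not define a numeric threshold for an elevation.
ELEVATION_THRESHOLD = 20


def elevated_scales(profile):
    """Return the set of scale names whose score meets the assumed threshold."""
    return {name for name, score in profile.items() if score >= ELEVATION_THRESHOLD}


def comorbidity_flags(profile):
    """Flag the two comorbidity patterns described in the text.

    profile maps scale name to scale score. The result is a prompt for
    item-level review, not a diagnosis.
    """
    high = elevated_scales(profile)
    externalizing = any(name in high for name in EXTERNALIZING)
    internalizing = any(name in high for name in INTERNALIZING)
    other_than_substance = high - {SUBSTANCE}
    return {
        "behavioral_and_emotional": externalizing and internalizing,
        "substance_and_other": SUBSTANCE in high and bool(other_than_substance),
    }


if __name__ == "__main__":
    example = {
        "School/Work": 30,
        "Home": 20,
        "Community": 0,
        "Behavior Toward Others": 20,
        "Mood": 30,
        "Self-harmful Behavior": 0,
        "Substance Use": 0,
    }
    print(comorbidity_flags(example))
```

Keeping the threshold as a named constant makes it easy for a program to substitute whatever definition of an elevation its own staff agree on.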

Pervasiveness of Problems

Pervasiveness refers to the extent to which the youth has problems across different settings. Three of the scales are associated with specific settings (i.e., School/Work, Home, and Community). A fourth reflects on the youth's interpersonal relationships across settings (i.e., Behavior Toward Others). Generally, the more settings in which a youth has problems, the more costly and extensive the intervention is likely to be. In addition, a youth who has pervasive impairment for a prolonged period of time would have a much more guarded prognosis.

Youth's Strengths

A critical aspect of any treatment plan is to appreciate the youth's strengths and envision how to use these strengths to bring about desirable change. On the CAFAS, a rating of "No or Minimal Impairment" on any scale reflects on a relative area of strength. For example, on the Behavior Toward Others subscale, the item "Is able to establish and sustain a normal range of age-appropriate relationships" would indicate a strength in the area of social skills/interpersonal relationships. In addition, the strengths, listed under each scale in the current CAFAS, facilitate identification of the youth's and the caregiver's strengths within each domain of functioning. Strengths can also be added by choosing the "Exception" option under "No or Minimal Impairment," and describing a strength under "Explanation." On the computerized CAFAS, explanations for exceptions are printed out on the client report, permitting a way to include at least one strength on each scale. Doing this does not result in erroneous scoring because the CAFAS Computer System always uses the most severe endorsement to generate the scale score.

Family Resources

The Caregiver scales provide information on the extent to which the youth's functioning is negatively affected by the caregiver's difficulty in providing for the youths' material needs and in providing them with the emotional support and guidance they need. 
The Family/Social Support scale contains items describing caregiver behaviors and characteristics that need to be considered in the treatment plan if they are present. Some example items include caregiver substance abuse, serious psychiatric illness, inadequate supervision of the youth, domestic abuse, abusive or neglectful behavior, and inadequate care of a child who was previously abused. There are also items that are nonblaming in that they indicate that the youth's needs exceed the family's resources. For example, a caring, but impoverished, single parent of three who has an autistic youth and works two jobs would likely need considerable additional help simply because the youth's needs would be greater than the caregiver's resources. This scale helps identify issues that need to be addressed in order for progress to be made in the youth's condition and, thus, need to be considered in planning interventions. For example, a caregiver who feels like rejecting the youth may benefit from respite care until the youth's behavior becomes less burdensome on the family, or alcoholic caregivers may need to lessen their addiction before being able to responsibly parent the youth. The purpose of the Material Needs scale is to identify situations in which the youth's functioning is hampered due to inadequate economic resources. This may be due to no fault on the caregiver's part (e.g., a hard working but impoverished parent, an elderly

grandparent who is caring for the youth without additional monetary assistance) or may be a reflection of bad parenting (e.g., a drug addict who buys drugs instead of food). In any case, a high score on this scale may mean that interventions other than psychological may be needed. Collaboration with other agencies to provide these services may be indicated. An appreciation of the caregiver's positive resources is also important in trying to design a plan that provides services in the least restrictive environment. The strengths, listed for the Caregiver scales in the current CAFAS, can help identify behaviors that can be used to foster change in the family and can serve to remind clinicians of the family's positive characteristics (e.g., caregiver exercises good control when provoked). Sharing the CAFAS Results with the Family During the treatment planning process, the youth and caregiver should be given an opportunity to review the CAFAS Profile and endorsed items. In fact, the client report produced for the computerized CAFAS lists all of the items endorsed and shows the profile in a graph format. On the report, there is a place for both the therapist and caregiver to sign. This gives the therapist an opportunity to ask the caregiver whether his or her assessment was correct. Disagreements that arise in the discussion could potentially lead the therapist to change the endorsements because of misunderstanding the circumstances or could lead to caregivers revising their perception of the youth's difficulties. In any case, a productive and important interchange will likely occur. This is important if the resulting treatment plan is to be viable. Examining the CAFAS Profile for the youth and caregiver, as well as the specific item endorsements, should contribute to the conceptualization of the case and the prioritizing of specific treatment objectives. 
In fact, identifying common CAFAS Profiles within a client population could provide insights about the nature of the treatment plans that would likely emerge and the types of services needed to serve the families. Identification of CAFAS Profile Patterns to Develop Treatment Protocols Cluster analysis and other related analytic procedures can be used to identify groupings of youths on the basis of their CAFAS Profiles (i.e., the youth's scores on the individual CAFAS scales). After identifying the types of clients served, consideration can be given to whether specific care protocols or service packages should be associated with each type. For example, empirical data generated for state administrators have provided a useful description of the youths they serve. This information can be used in planning care and service options for these youths and in managing the limited funds available. Five clusters of youths were identified by Lemoine and McDermott (1997) with a Louisiana sample and by Hodges and Warren (1997) with a Michigan sample. There was considerable similarity in the clusters identified, although there were differences in the types of information available on the youths, other than the CAFAS scores. Because the CAFAS Computer System was used in the Michigan project, considerable information about the youths, the caregivers, and the services received was available. For the purpose of illustration, three of the cluster types are briefly described. The most impaired cluster of youths had elevations on the following scales: School, Home, Behavior Toward Others, and Mood. These youths could be described as comorbid

in that they had severe externalizing and internalizing behaviors. Their most common diagnoses were oppositional defiant disorder and depression. Compared to the other clusters, they were much more likely to have been previously hospitalized and/or to have been placed outside of the home. They were characterized by multiple risk behaviors and involvement with multiple agencies. Their total CAFAS score indicated a very high level of impairment, and in fact, their total score was significantly higher than the other clusters. They scored above the group mean on each CAFAS scale, including the Caregiver scales. These youths would likely need intensive services and may require prolonged monitoring over time. Services might include comprehensive diagnostic services and medication consult, inpatient hospitalization for acute episode, monitoring for suicide risk, intensive services including wraparound and case management, therapeutic and supportive services for the caregivers, and active collaboration among agencies. A second cluster of youths had elevations on all three Role Performance scales (i.e., School/Work, Home, and Community) and the Substance Use scale. This was the only cluster with an elevation on the Community scale, which is an indication of delinquent activity. Conduct disorder was the most common disorder. The mean total CAFAS score for this group indicated significantly lower impairment compared to the first cluster, but significantly higher impairment compared to the other clusters. These youths had high Caregiver scale scores, as did the first cluster. Serious risk behaviors were present and current and past involvement with juvenile justice was common. 
Youths in this cluster would likely need active collaboration with juvenile justice and the school, substance use services, enhanced services that could provide the structure these youths typically need in both their educational settings and living arrangements, and supportive/adjunct services for the caregivers. The third cluster was composed of youths with high elevations on only two scales, Mood and Self-harmful Behavior. Their scores on these scales were significantly higher than the scores observed for the other clusters. They scored below the group mean on the scales indicating externalizing problems. Not surprisingly, the most frequent diagnosis was depression. Previous hospitalization was not uncommon for these youths; however, risk behaviors (other than suicide) and involvement with other agencies were uncommon. The protocol for these youths may include acute brief hospitalization, medication consultation, plan for monitoring suicide risk, and/or ongoing outpatient services. The treatment plan for each youth would be tailored to the youth's problems, the family's circumstances, and the resources in the community. Although each treatment plan is unique, techniques such as cluster analysis can provide useful data for generating guidelines regarding the range and types of resources that might be needed to serve various youths. One of the most important uses of such guidelines is that they provide a mechanism for empirically studying the relation between intake profiles and subsequent service utilization and costs. These guidelines could be used informally in supervision sessions or, on the other end of the continuum, be mandated protocols. Although the protocols could be based solely on customary clinical care, it is hoped that empirical data will be used to identify the elements of the treatment plan associated with improvement.

Potential Limitations

A potential criticism of the CAFAS is that it focuses more on problems than strengths. 
Treatment can be conceptualized as the art of seeing how the family's strengths can be used to develop alternatives to the maladaptive coping that has developed. For this reason, the strengths were added to the CAFAS to help link problems, strengths, and


goals within each scale domain. However, this addition did not change the scoring of the CAFAS scales, which continues to reflect the extent of impairment. This seems appropriate given that outcome for youths with SED will most likely continue to be judged primarily by reduction in deficits and negative excessive behaviors. Some clinicians may object to measures like the CAFAS because they assume that the list of problems or goals generated dictates the content of therapy. That is not the case; any therapeutic approach thought to be an ethical and effective way to achieve the goals and reduce the problem behaviors is legitimate. For example, a youth may be brought for treatment for excessive disruptive behavior, and there may be no reports of depressive symptoms from the youth or the parents. The clinician may infer that there is underlying depression at the root of the problem. If there is no documentable depression present at intake, the Mood scale will not be scored as impaired. However, the therapist's intervention could focus on hypothesized underlying depression. Irrespective of the treatment approach, the therapist's work will be judged on whether the disruptive behavior documented on the School and Home scales decreased. Effective therapy could affect these behaviors even though it may not have addressed them directly. In summary, the CAFAS Profile provides a means of organizing the youth's problems and the caregiver's resources. It provides a concrete representation that helps focus treatment team discussions, supervision issues, and collaborative sessions with caregivers. The specific item endorsements can be easily translated into treatment goals. The CAFAS Profile pattern and the CAFAS total score provide a rough gauge of expected treatment intensity, anticipated course of treatment, and potential costliness. 
Use of the CAFAS for Treatment Monitoring

An important characteristic of the CAFAS is that it can be used to actively manage treatment and lead to documented outcomes. In most clinical settings, the ratings for the CAFAS are done at intake, every 3 months thereafter, and at discharge. In this scenario, the time frame rated is the last 30 days. By reviewing the profile over time, objective assessment can be used to influence future treatment decisions. For example, if a youth's targeted behavior is not changing, then an alternative strategy should be entertained. The purpose of treatment monitoring is to get a reading on how the youth's behaviors and the caregiver's resourcefulness are progressing. If there is a stalemate, then the treatment plan needs to be reevaluated. This could take many forms: concretely showing the profiles to the family and seeking their input, formally or informally consulting with colleagues, seeking more supervision, or progressing to a different protocol developed for these situations. In fact, programs can empirically study the fruitfulness of various strategies for difficult situations. Change over time can be reflected in the profile, as demonstrated in Fig. 21.3, in which different lines represent the points over time (e.g., solid line represents intake scores, dotted line indicates scores at 6 months). In the CAFAS Computer System, the client report provides a listing of all previous CAFAS ratings for the client and produces a graph, illustrating the first and last CAFAS ratings. This comparison can be instructive for staff and the families. On the hopeful side, the family members can see that, despite the work that remains to be done, much has been accomplished. In contrast, when there has been a lack of progress, the graph provides an objective basis for a discussion about


realistic treatment options. By using a concrete representation, as depicted in the CAFAS Profile and computer graph, the failure to make progress or the worsening situation is undeniable. Staff can no longer assume that more of the same is better. The family members are also faced with the problem of generating a realistic plan. How often should the youth be rated? What should be the duration of the time period being rated? These decisions can be tailored to the individual setting or program and are related to the anticipated length of treatment. The most common scenario is rating on a quarterly basis (i.e., every 3 months), with the duration rated being 30 days. Because the rating is of the youth's most severe functioning, rating the last 30 days provides an opportunity for the therapeutic interventions to have some effect. Rating less frequently is not a problem. For example, in both the FBEP and the CMHS evaluations, time intervals of 6 months and 1 year were used. In this case, the length of the duration for the time period being rated can be 1 month or 3 months.

Limitations of the CAFAS for Treatment Monitoring in a Managed Care Setting

The CAFAS was not designed to assess change over short time spans, such as 2 weeks. Other measures designed for daily ratings would be preferable. On the other hand, sometimes a 2-week stay in one program is part of a continuum of care. Thus, the youths could still be assessed with the CAFAS over time as they progress from more to less restrictive treatment settings. This is fine for assessing the youths, but what if an objective of the evaluation is to assess the program? In the situation of a short-term hospitalization, the unit could rate the youths on all eight scales before and after the short hospitalization. In addition, at intake, the staff could stipulate which individual scales of the CAFAS they plan to impact during the short-term hospitalization. 
Examples would be the Self-harmful Behavior scale for suicidal youths, the Mood/Emotions scale for depressed youths, the Thinking scale for psychotic youths, and the Behavior Toward Others scale for youths with a variety of other presenting problems. Provided that the unit chooses the outcome criteria ahead of time by stipulating the scales on which they will be assessed, this is an appropriate option. Rating the entire CAFAS provides a picture of how the youths look as they progress through each stage of the continuum of care. In fact, the CAFAS Profile and item endorsements can be forwarded with the youths as they progress through the continuum of care. For example, when they exit from a service, the staff can indicate how the CAFAS Profile and item endorsements have changed from admission to discharge and prepare a plan for the step-down program to which they are transferring.

Use of the Instrument for Treatment Outcomes Assessment

This section tackles three tasks. The CAFAS will be evaluated on the criteria for evaluating outcome measures proposed by Ciarlo, Newman, and colleagues (Ciarlo, Brown, Edwards, Kiresuk, & Newman, 1986; Newman & Ciarlo, 1994). Research data


relevant to sensitivity to change is presented. Example outcome indicators that can be used in applied clinical settings are described.

Criteria for Evaluating Outcome Measures

Relevance to Target Group. The target group for the CAFAS is youths referred for mental health problems, youths at risk for mental disorders, and youths with serious emotional disorders. It is appropriate for children ranging from minimal to high impairment. It has been successfully used to assess outcome in children referred from a variety of child service agencies, including schools, social services, juvenile justice, and substance use treatment programs (Hodges et al., 1997; Hodges & Wong, 1996). Information on the relation between client sociodemographic characteristics and the CAFAS has revealed only two consistent findings. Older youths score as more impaired than younger youths, and children from low income families are more impaired than children from higher income families. These two effects were observed in both the FBEP (Hodges & Wong, 1996, 1997) and the CMHS evaluations (Hodges et al., 1997). There was no significant difference for gender found in the FBEP study (Hodges & Wong, 1997); however, in the CMHS study, males tended to score higher because of more behavioral problems (i.e., on the three Role Performance scales and the Behavior Toward Others scale). However, because the CMHS sample was not representative, but a collection of at-risk youths, this may be secondary to the types of youths selected for the interventions. No effects due to race have been documented except for Caucasians scoring as slightly more impaired than other ethnic groups in the FBEP study (Hodges & Wong, 1997). No effects for caregiver education have been observed.

Simple, Teachable Methodology. The CAFAS has straightforward and easily understandable instructions that can be implemented by staff who are professionally charged with caring for or treating youths with mental health problems. 
The measure has objective referents for the most part, in that the items on the CAFAS are mostly behavioral descriptions. There are extensive training and support materials. The CAFAS Self-training Manual (Hodges, 1990a, 1994c) can be used to train raters to be reliable. It contains instructions for scoring, demonstration vignettes with answers, and vignettes to use for the purpose of establishing reliability. New employees can work through the manual on their own. It is helpful to assign one staff member to ensure that the training is satisfactorily accomplished. This person would score the new employee's reliability vignettes using the answer key provided. Simple "eyeball" criteria for judging reliability are given in the Manual for Training Coordinators, Clinical Supervisors, and Data Managers (Hodges, 1997). There are also Supplemental Vignettes (Hodges, 1994f) that can be used if the trainee does not achieve reliability with the Self-training Manual. The structured interview for the CAFAS, also referred to as CAFAS Interview Parent Report (Hodges, 1994a), can also be used as a teaching tool. It is organized around the CAFAS scales and contains inquiries for soliciting all of the information needed in order to rate youths on the CAFAS. It takes about 30 minutes, is done with a caregiver, and can be readily administered over the telephone. Rating the CAFAS is very straightforward after having conducted the interview. The CAFAS interview is optional. It is particularly useful for research purposes or when raters of the CAFAS are professionals


who have not been formally trained in the child mental health field. There are also training and support materials available for the PECFAS. Use of Objective Referents. Newman and Ciarlo (1994) defined an objective referent as one for which concrete examples are given for each level of a measure or at least at key points on the scale. The CAFAS has 200 concrete descriptions of behaviors grouped by level of impairment, from which the rater chooses those that apply to the youth. The items are specific, for example, "chronic truancy resulting in negative consequences (e.g., loss of course credit, failing courses or tests, parents notified)." The items are written in common everyday terms. A detailed set of instructions that provide the rationale underlying the scoring, definitions, and examples are contained in the CAFAS Self-training Manual (Hodges, 1994c). This characteristic of the CAFAS makes it a particularly credible means for justifying services or assessing outcome. The ratings are verifiable. In fact, specific supporting comments can be written on the same page as the CAFAS scale (e.g., on the School scale page, the rater could write: "youth was truant for 30 days in the last 4 months of school; absences were reported to the truant officer and parents were notified in December 1997"). This reduces redundancy in the medical record yet makes a clear record that holds up well under audits. This procedure was encouraged by a state administration that hired an accounting firm to audit 10% of all mental health applications for state funding. Use of Multiple Respondents. The CAFAS is completed by the clinician or staff member who has interviewed the family. The rater is instructed to seek information from all important informants and to base the ratings on the behavior observed and the information reported. 
To encourage raters to use multiple informants, the first page of the CAFAS asks the rater to indicate the type of informants (i.e., parent, youth, school personnel, foster parent, juvenile justice, social welfare, mental health worker, or public health worker) and the modes of communication (i.e., in-person, telephone, or review of documents). The interview developed for the CAFAS seeks information about all of the major aspects of the youth's life. In addition, in the computerized CAFAS, a place is reserved for the caregiver's signature on the CAFAS client report. This would likely ensure that caregivers have input and are able to state whether or not they agree with the assessment. More Process-Identifying Outcome Measures. The criterion states that it is preferable to use measures that provide a means for regularly collecting information on the client's progress, such as symptom or functioning assessments. The specific items endorsed on the CAFAS provide a map of what needs to be changed or addressed. Periodic ratings of the CAFAS provide an ongoing progress report. Low Cost. In its typical use, the CAFAS is very cost-effective because it can satisfy two requirements; it can serve as a measure of outcome as well as satisfy the requirement for initial and updating assessments needed to justify level of care. Rating the CAFAS can be integrated with the client tracking system for case management or treatment team reviews, as described by Newman and Ciarlo (1994). Typically, the CAFAS is rated at intake and every 3 months thereafter. Because the CAFAS does not require the administration of an interview or a questionnaire to the client, there is no time burden on the clinician other than the rating, which takes about 10 minutes. There are numerous formats available for the CAFAS: regular paper version, a scanning form that can be scanned by fax or scanners, and a computer program. There


are two scanning forms: NCS® and Teleform®. All three formats provide a profile for quick visual uptake of the youth's impairment level and, at the same time, indicate which specific items on the CAFAS were endorsed. The scanning form and the computer program both input the data into a computer file for later analysis, providing a straightforward and economical means of collecting and entering data. For sites without scanning or computer capacities, the first page (two-sided) of the written CAFAS form contains the profile on which the specific item endorsements are indicated. Thus, once the CAFAS form is completed, the first page can be detached and routed to data entry. The CAFAS form typically becomes part of the clinical monitoring process, so the time invested in rating the CAFAS is not an additional burden. With the CAFAS Self-training Manual (Hodges, 1994c), it is reasonable to require new employees to work through the manual on their own. Additional instruction is not needed. If the CAFAS Profile is routinely discussed in treatment planning sessions, the new employee's learning is enhanced. The CAFAS Self-training Manual is the only training material needed, and it is reusable.

Understanding by Nonprofessional Audiences. The CAFAS is easily understood by nonprofessional parties, including caregivers, professionals outside the mental health field, and administrative or bureaucratic personnel concerned with fiscal or public policy issues. The CAFAS requires no interpretation. The scores for the various scales have a qualitative meaning: severe, moderate, mild, or no impairment. The arenas assessed are straightforward and meaningful in real life (e.g., school, home). Impairment is described primarily in terms of impact on everyday behavior (e.g., sent to school authority figure because of failure to follow school rules, depression accompanied by refusal to go to school). 
Indicators of change over time can be qualitatively described for the individual youth and for groups of clients. Both consumers and other stakeholders can easily get a picture of how youths are functioning and whether they have improved. Data on the CAFAS can be aggregated across clients to answer questions posed by administrative and legislative policymakers. Statistical tests can be performed on the quantitative scores generated by the CAFAS to help evaluate program effectiveness. Both the scannable forms and computer software for the CAFAS produce electronic databases that can be used to answer questions posed by administrators and policy-makers. In addition, the CAFAS sheds light on the costs to the public in terms of illegal or delinquent behavior through the Community scale. Easy Feedback and Uncomplicated Interpretation. The scores from the CAFAS are not derived or based on any scoring key or algorithm. When the rater identifies the items that are true for the youth, the score is determined for that scale. The results are easily interpreted and require no further explanation than is contained on the CAFAS form itself or the client report produced by the computerized CAFAS. Both the written and computer forms contain narrative and graphic presentations of the results. Usefulness in Clinical Services. Newman and Ciarlo (1994) judged the usefulness of a measure by its ability to describe the likelihood that the client needs services; to help in planning the array and levels of services; to provide justification for the services for third-party payers; and to help assess whether the client is responding to the treatment, and if not, to delineate the areas of functioning for which the treatment appears to be unsuccessful. The CAFAS is useful in accomplishing each of these clinically relevant tasks. The CAFAS can be useful in a variety of clinical functions, including documenting


need for services, determining eligibility for levels of care, treatment planning, and treatment monitoring.

Compatibility with Clinical Theories and Practices. The CAFAS is not dependent on any given theory or view of child psychopathology. The CAFAS can be used to evaluate a variety of interventions; it is not dependent on a particular orientation. It assesses the positive effects of interventions. Irrespective of the intervention used, functioning is measured in terms of how the youth performs at various age-appropriate life tasks. The spectrum of scales covering various arenas of functioning also demonstrates improvement in specific areas, even if other areas have not changed.

Sensitivity to Change Over Time

The CAFAS consistently shows changes as treatment progresses and the youth improves. In the FBEP, a repeated-measures analysis of variance was conducted to determine whether impairment scores (i.e., CAFAS total score) became lower over time. There was a significant main effect for time. From intake to 12 months, youths who were in residential care at intake had a drop in CAFAS total score from 64 to 34; for youths in alternative care at intake, the reduction was from 54 to 29; and for the youths in outpatient care at intake, the reduction was from 41 to 23. At 18 months, the means for all three groups were within 3 points of one another (i.e., 22 to 25; Hodges & Wong, 1996). Analyses collapsing across all groups also found the CAFAS to be sensitive to change. There was significant reduction in total CAFAS score from intake, M = 45.65, SD = 26.47, to 6 months postintake, M = 31.39, SD = 26.03, t(780) = 14.33, p < .0001. The effect size for change from intake to 6 months was .51. For the interval from intake to 12 months, the effect size was .67, t(616) = 16.67, p < .0001, and for intake to 18 months, .78, t(372) = 15.00, p < .0001. These are moderate to large effect sizes (Hodges et al., 1998). 
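The passage does not state how the effect sizes were computed. As a rough check, and only under the assumption that they are standardized mean changes for a paired design (d = t / sqrt(n), where n = df + 1 is the number of paired cases), the reported t statistics reproduce the reported effect sizes. The function name below is illustrative and is not part of any CAFAS software.

```python
import math


def paired_effect_size(t_stat: float, df: int) -> float:
    """Standardized mean change for a paired t test.

    Assumption (not stated in the chapter): d = t / sqrt(n),
    where n = df + 1 is the number of paired observations.
    """
    n_pairs = df + 1
    return t_stat / math.sqrt(n_pairs)


# t statistics and degrees of freedom as reported in the text.
reported = {
    "intake to 6 months": (14.33, 780),
    "intake to 12 months": (16.67, 616),
    "intake to 18 months": (15.00, 372),
}

effect_sizes = {
    label: round(paired_effect_size(t, df), 2)
    for label, (t, df) in reported.items()
}

for label, d in effect_sizes.items():
    print(f"{label}: d = {d:.2f}")
```

Under this assumption the three intervals give .51, .67, and .78, which match the values in the text.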
The FBEP lacked a no-treatment or placebo-treatment control group; thus, the notion that change over time is related to treatment variables can only be assumed. In the CMHS evaluation (Hodges et al., 1997), there was a significant main effect for time. The youth's total CAFAS score decreased significantly from intake to 6 months, indicating less impairment over time. An important related question is whether the CAFAS is appropriate for assessing change over time in youths with various characteristics. In the CMHS evaluation, there was considerable variability in the characteristics of the youths and their families, permitting examination of this issue. A series of repeated-measures analyses of variance was conducted to determine whether the CAFAS was affected by demographic group membership, nature of the child's problems, or risk factors over the 6-month period. The dependent variable was total CAFAS score, with two levels (intake and 6 months postintake). A significant interaction would mean that some youths improved at a more rapid rate than others. All of the interactions for time by the various youth characteristics were nonsignificant. This means that although some youths were more impaired at intake than other youths, all of them improved. There were no interactions for demographic variables, including age, gender, race, family income, or custodial caregiver. Children referred from various agencies (e.g., mental health, social services, juvenile justice, school) improved at similar rates, as did youths with varying diagnoses. Youths with a history of difficulties in school, with the law, or with past mental health or substance use problems improved at a rate similar to that of children without those risk factors (Hodges et al., 1997). The CAFAS was


successful at documenting outcome along the entire spectrum of clinical need, from mild to severe impairment.

Outcome Indicators

Illustrations of various outcome indicators and how to derive them are available in the Manual for Training Coordinators, Clinical Supervisors, and Data Managers (Hodges, 1997). An example evaluation for an agency is presented in Hodges et al. (1998). These indicators can be used to describe clients at intake, as well as to describe outcome over time. In addition, outcome can be determined at the client level or for an aggregated group of clients.

CAFAS Total Score. The total score can be calculated by summing the eight Youth scales. The Caregiver scales are not included in this total. The eight-scale sum is preferable to the five-scale sum because it is more reliable and considers all of the information available on the youths. For aggregated data, the difference in the pre- and posttreatment scores can be analyzed with paired t tests, followed by calculation of the effect size (Lipsey & Wilson, 1993). At the individual client level, another approach is to determine whether the youth's score decreased (indicating improvement), increased (indicating further deterioration), or stayed about the same. Given that the CAFAS score is measured in 10-point increments, requiring a difference of at least 20 points before counting a change takes error into account. For quality assurance efforts, it makes sense to identify all cases in which the CAFAS total score increased. To the extent that other data is readily available in the same data set, analyses can be done to determine the client characteristics, environmental factors, and treatment history for youths whose clinical status deteriorated. Furthermore, it would be important to determine whether youths who deteriorated had participated in treatment or were early dropouts. 
For these reasons, the CAFAS Computer System collects data other than the CAFAS, including client characteristics, services provided, and circumstances of termination. CAFAS Individual Scale Scores. Paired t tests and effect sizes can also be determined for each of the eight Youth scales and the Caregiver scales. Typically, statistically significant results are observed for School/Work, Home, Community, Behavior Toward Others, Mood/Emotions, and Self-harmful Behavior. Significant findings for the Thinking or Substance Use scales would only be anticipated if there was impairment represented in the group at intake. Figure 21.4 presents an illustration of the decrease in impairment scores observed from pre- to posttreatment. This graph would typically be accompanied by a table presenting means, standard deviations, paired t-test results, and effect sizes. Number of Individual CAFAS Scales Rated as Severe. This is the total number of the eight scales rated as severely impaired. A logical goal for an agency is to have no severe impairments on discharge. For some youths, reducing some scales to no or mild impairment may be unrealistic. Figure 21.5 shows the change over time in the number of scales that were rated as severely impaired. In Fig. 21.5, the solid line represents youths who had no scales rated as impaired, whereas the nonsolid lines represent the other groups. If youths improve over time, the number of youths having no scales rated as severe should increase, whereas the youths with one, two, three, four, or five scales rated as severe should decrease.
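The client-level indicators described above (eight-scale total, change of at least 20 points, number of scales rated severe) can be sketched as follows. This is a minimal illustration, not the CAFAS Computer System: the function names and the example ratings are hypothetical, and the numeric coding of 0, 10, 20, and 30 for no, mild, moderate, and severe impairment is an assumption that this passage does not spell out.

```python
# The eight Youth scales named in the chapter.
YOUTH_SCALES = (
    "School/Work", "Home", "Community", "Behavior Toward Others",
    "Mood/Emotions", "Self-harmful Behavior", "Substance Use", "Thinking",
)

# Assumed coding: 0 = none, 10 = mild, 20 = moderate, 30 = severe.
SEVERE = 30


def total_score(ratings: dict) -> int:
    """Sum of the eight Youth scales (Caregiver scales are excluded)."""
    return sum(ratings[scale] for scale in YOUTH_SCALES)


def classify_change(pre_total: int, post_total: int, threshold: int = 20) -> str:
    """Label a change in total score, requiring at least `threshold` points."""
    change = post_total - pre_total
    if change <= -threshold:
        return "improved"
    if change >= threshold:
        return "deteriorated"
    return "about the same"


def count_severe(ratings: dict) -> int:
    """Number of Youth scales rated as severely impaired."""
    return sum(1 for scale in YOUTH_SCALES if ratings[scale] == SEVERE)


# Hypothetical ratings for one youth at intake and at 6 months.
intake = dict(zip(YOUTH_SCALES, (30, 20, 10, 20, 10, 0, 0, 0)))
six_months = dict(zip(YOUTH_SCALES, (20, 10, 0, 10, 10, 0, 0, 0)))

print(total_score(intake), total_score(six_months))
print(classify_change(total_score(intake), total_score(six_months)))
print(count_severe(intake), count_severe(six_months))
```

For this hypothetical youth the total falls from 90 to 50, which exceeds the 20-point criterion and is labeled as improvement, and the number of severe scales falls from one to none.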


Fig. 21.4. Example graph showing pre- and postscores for the CAFAS individual scales.

Fig. 21.5. Example graph showing change in number of youth rated as severely impaired on eight CAFAS scales at intake and 6 months.

Number of Individual CAFAS Scales Rated as Either Moderate or Severe Impairment. Youths could be described as having a clinically meaningful degree of impairment in an area of functioning if they are judged to be moderately or severely impaired on the relevant scale. Thus, another outcome indicator is the change in the number of scales rated as moderately or severely impaired. A graph similar to that shown in Fig. 21.5 could be produced. In addition, at intake, this indicator could be useful for program planning. The proportion of youths meeting this criterion for each of the eight Youth scales provides information about the need for specific services and for collaborative arrangements with treatment programs or agencies serving children and families (e.g., a substance abuse inpatient program, school liaison).
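The program-planning use of this indicator can be illustrated with a short sketch that computes, for each Youth scale, the proportion of youths rated moderately or severely impaired at intake. The four intake records are invented for illustration, and the coding of 20 for moderate and 30 for severe is an assumption rather than something stated in this passage.

```python
# The eight Youth scales named in the chapter.
YOUTH_SCALES = (
    "School/Work", "Home", "Community", "Behavior Toward Others",
    "Mood/Emotions", "Self-harmful Behavior", "Substance Use", "Thinking",
)

# Assumed coding: 20 = moderate, 30 = severe.
MODERATE = 20


def count_moderate_or_severe(ratings: dict) -> int:
    """Number of Youth scales rated moderate or severe for one youth."""
    return sum(1 for scale in YOUTH_SCALES if ratings[scale] >= MODERATE)


def proportion_impaired(clients: list, scale: str) -> float:
    """Share of youths rated moderate or severe on one scale."""
    return sum(1 for c in clients if c[scale] >= MODERATE) / len(clients)


# Hypothetical intake ratings for four youths, in YOUTH_SCALES order.
rows = [
    (30, 20, 10, 20, 10, 0, 0, 0),
    (20, 20, 0, 10, 20, 0, 20, 0),
    (10, 0, 0, 10, 30, 30, 0, 0),
    (0, 10, 0, 0, 10, 0, 0, 0),
]
clients = [dict(zip(YOUTH_SCALES, row)) for row in rows]

per_youth = [count_moderate_or_severe(c) for c in clients]
per_scale = {s: proportion_impaired(clients, s) for s in YOUTH_SCALES}

print(per_youth)
for scale, share in per_scale.items():
    print(f"{scale}: {share:.2f}")
```

In this invented sample, half of the youths meet the criterion on School/Work and a quarter on Substance Use, the kind of summary that could inform decisions about a school liaison or a substance abuse program.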


Change in Category for Overall Level of Dysfunction. This is a variation of comparing the pre- and post-CAFAS total score, generated by summing the eight Youth scales. The categories are the same as those previously described in the "Basic Interpretative Strategy" section. Figure 21.6 illustrates the change over time for each of the five levels of overall dysfunction. The solid line shows how the number of youths in the two lowest categories (i.e., 0-10 and 20-40) increased from first to last assessment, and the nonsolid lines show that the number of youths scoring in the higher levels of impairment at intake (i.e., 50 and higher) decreased from first to last CAFAS.

Youth's Risk Behaviors. Pre- and postcomparisons can be made on specific risk behaviors, which can be measured by the presence or absence of each behavior or by the total number of risk behaviors present for a youth. The behaviors of the youths, which can put themselves or others at risk, are defined as physical aggression in any setting, sexual inappropriateness in any setting, firesetting, runaway behavior, dangerous substance use, and suicidal talk or acts.

Precautions in Using the CAFAS as an Outcome Measure

The CAFAS would be inappropriate to use for community samples or for nonclinical samples. An example of the latter would be evaluating intervention programs with nonclinical samples, such as a group intervention offered after school to youths without clinical problems. It is important that each scale score on the CAFAS be supported by endorsement of at least one behavioral description of the youth. The CAFAS instructions stipulate this, and the form is designed to make justification easy to do. This is in contrast to global measures that do not require explicit justification of ratings (Hodges, 1994d; Hodges & Gust, 1995). In order to maintain the integrity of the measure, it is critical that CAFAS ratings on each scale be supported by at least one specific item endorsement. 
Fig. 21.6. Example graph showing change in dysfunction level from intake to 6 months.

Thus, only requiring scale scores for each scale would be unwise. This would endanger


the accuracy of the scores because it may result in clinicians being less precise in doing the ratings. If this were to happen, then the reliability of the measure would be lessened, with the end result of reducing the measure's sensitivity to assessing change. Although the clinician or case manager is likely to be the best person to rate the youths on the CAFAS, for some specialized purposes an independent assessment may be preferable. In these situations, it may be best to have trained raters administer the CAFAS Interview (Hodges, 1994a) and, afterward, rate the youths on the CAFAS. The CAFAS Interview provides the independent rater with a means of obtaining information and ensures that the same procedure is used for both pre- and postassessments. For example, a managed care company has used the CAFAS Interview, administered over the telephone, to obtain CAFAS ratings on a sample of their clients whose services were being funded through the department of social services. The state required that providers report on outcome for a sample of the clients served. This same approach can be used to conduct follow-up outcome studies for costly residential services in which the youth's behavior is constrained by external controls. Assessing outcome in the postdischarge time period after return to the community provides the most credible evidence.

Potential Use as a Data Source for Mental Health Service Report Cards

The CAFAS can be used as an outcome measure for mental health service report cards. In fact, the CAFAS was recommended by the task force convened by the Mental Health Statistics Improvement Program (MHSIP) of the Center for Mental Health Services to develop a consumer-oriented mental health report card (Mental Health Statistics Improvement Program Task Force, 1996). 
The purpose of the report card is to provide data that would permit health care purchasers, state mental health agencies, and mental health consumers to compare and evaluate mental health providers based on concerns important to consumers. Providers could also use the report for internal evaluation. The MHSIP report card assesses services in four domains, one of which is outcome. The CAFAS was recommended as a measure for evaluating children and adolescents with SED and for non-SED youths. The CAFAS was recommended for six outcome indicators. The outcome indicators and the CAFAS scores selected by the MHSIP Task Force are as follows: reduced psychological distress, proportion of youths with a decreased level on the CAFAS Mood/Emotions scale; reduced impairment from substance abuse, proportion of youths with a decreased level on the CAFAS Substance Use scale; improvement in school performance, proportion of youths with a decreased score on the CAFAS School Performance scale; reduced involvement in the criminal justice system, proportion of youths with a decreased level on the CAFAS Community Role Performance scale; increased social integration, proportion of youths with a decreased level on the CAFAS Behavior Toward Others scale; and increased overall level of functioning, proportion of youths with a decreased score on the CAFAS total scale. The MHSIP Task Force recommended administering the measures at admission, 3 months after treatment begins (or at the end of treatment), and a year from admission for those still receiving services. Data is currently being collected by some of the 19 states that received State Reform Grants for Assistance in State Planning and Managed Care Reform Efforts, funded by the Center for Mental Health Services. In fact, one of these sites has generated a mental health service report card in which the CAFAS was used to evaluate services to youths (Newman et al., 1997).
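Each of the six report card indicators listed above is a proportion of youths whose score decreased between two assessments. A minimal sketch of that calculation follows. The indicator-to-scale mapping is taken from the text, but the record layout, the function name, and all scores are hypothetical. The totals are entered directly and are not meant to equal the sum of the few scales shown.

```python
# Indicator-to-scale mapping as listed in the text.
INDICATORS = {
    "reduced psychological distress": "Mood/Emotions",
    "reduced impairment from substance abuse": "Substance Use",
    "improvement in school performance": "School Performance",
    "reduced involvement in the criminal justice system": "Community Role Performance",
    "increased social integration": "Behavior Toward Others",
    "increased overall level of functioning": "Total",
}


def proportion_decreased(youths: list, scale: str) -> float:
    """Share of youths whose follow-up score is lower than at admission."""
    decreased = sum(1 for y in youths if y[scale][1] < y[scale][0])
    return decreased / len(youths)


def make_record(values):
    """Build one hypothetical record: scale -> (admission, follow-up)."""
    return dict(zip(INDICATORS.values(), values))


# Hypothetical (admission, follow-up) scores for four youths.
youths = [
    make_record([(20, 10), (0, 0), (30, 20), (10, 0), (20, 20), (90, 50)]),
    make_record([(10, 10), (20, 10), (20, 20), (0, 0), (10, 0), (70, 50)]),
    make_record([(30, 10), (0, 0), (10, 20), (20, 10), (20, 10), (100, 60)]),
    make_record([(10, 20), (0, 0), (20, 10), (0, 0), (10, 10), (50, 50)]),
]

report = {
    indicator: proportion_decreased(youths, scale)
    for indicator, scale in INDICATORS.items()
}

for indicator, share in report.items():
    print(f"{indicator}: {share:.0%}")
```

A youth who scored 0 at admission cannot decrease, so an agency might also report the proportion among youths with any impairment at admission. That variation is a design choice and is not something the text specifies.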

Clinical Case Example

This section presents a clinical case example. It includes a summary of the information learned at intake, organized around the CAFAS Interview, and interpretation of the CAFAS scores based on the intake information and a subsequent assessment made 6 months later.

Background

Denny is a 12-year-old male who is in the sixth grade. He lives with his biological parents and an older brother. Because of serious financial problems, the family is currently living with Denny's maternal grandmother, her new husband, and his two sons. Denny has been placed in a classroom for the emotionally impaired since kindergarten. At age 9, he had to be hospitalized for a period of 10 weeks as a result of his uncontrollable behavior. Over the next 2 years, Denny received outpatient care and medication (for attention deficit hyperactivity disorder), and several unsuccessful attempts were made to mainstream him. This school year began with Denny once again being placed in a classroom for the emotionally impaired. Mrs. X has refused to accept medication for him any longer because she fears that his growth may be stunted by his continued use of the drug. This year, his behavior has become so harmful and destructive that 3 weeks ago he was expelled from school.

School. This year, Denny's grades have dropped to all Ds or below. Mrs. X says that Denny "hates school." Since discontinuing his medication (2 months ago), Denny has found it very difficult to concentrate or to motivate himself to do any work. He was expelled from school for aggressive and noncompliant behavior in the classroom. He has major attention problems and hyperactivity that have become so serious over the past 2 months that he is not able to continue even in the special setting provided by his school.

Home. Denny is very troubled by his current living arrangements. He resents having to share a room with his brother and his new stepuncles. He will do chores only if given repeated warnings.
He is constantly fighting with other family members and has threatened them with kitchen knives. Last month, he hit his brother with a glass bottle. His brother needed 15 stitches to repair the gash. Denny consistently provokes others by deliberately doing things that annoy them. He and his father are often yelling and swearing at each other. Denny uses obscene language and curse words as a normal part of his vocabulary. Mrs. X says that she is in tears a couple of times a week because Denny's problems seem to be more than she can manage.

Community. Denny often lies. Mrs. X does not think Denny has been involved in any delinquent behavior outside the home. She stated that "his saving grace" so far has been that he does not have friends with whom to do these things.

Behavior Toward Others. Denny's verbally and physically aggressive behavior makes it very difficult for him to sustain any positive peer relationships. He has bitten, hit, and kicked teachers and classmates. Last year he kicked a pregnant teacher in the abdomen. He is very argumentative and defiant. He will initiate physical fights with classmates and has threatened them with whatever object is within his immediate reach. He admits that he is cruel to people and animals. Last week, when playing with his pet
lizards, he killed one, supposedly by accident. Denny says he has friends, but they annoy him sometimes and they cannot be trusted. Mrs. X says that Denny really does not have any friends because of his behavior.

Mood/Emotions. Mrs. X reports that Denny has erratic and frequent mood changes. She states that his moods of sadness and irritability have become frequent and of greater intensity over the last several months. He worries about fires, especially at night, and is excessively worried about his parents' health. He feels sad and hopeless and also feels that there is no reason to live. Denny feels very guilty about his behavior, but he foresees no immediate change in his future. Denny reports continual stomachaches and extreme difficulty falling asleep. He says that it takes him up to 4 hours to calm down after going to bed so that he can get to sleep. He says his mind just "keeps racing" and will not slow down. An interview with Denny revealed that he has unresolved grief over the death of his grandfather 3 years ago. His grandfather appears to have been his main psychological, nurturant parent figure. His grandfather died of choking while Denny looked on helplessly. Denny reported feeling guilty and bad for not having been able to save his grandfather. Denny's parents seemed to be largely unaware of these feelings.

Self-Harmful Behavior. Denny has seriously considered suicide as recently as last week. When asked how he would do it, he said he would put a plastic bag around his head and suffocate himself. Mrs. X reports that occasionally Denny will get so frustrated that he harms himself. Last month at school, he became so angry that he dug a pencil up his arm until it bled.

Substance Use. Denny denies any substance use.

Thinking. No impairing thought disturbances were reported.

Primary Caregiver Resources: Material Needs. Mrs. X says that the last few months have been very hard on the family.
They went through some very "rough" times when they had to file for bankruptcy. She says that Denny's problems were made worse when the family finally had to move in with his grandmother. Although he has to sleep on the floor, at least this month he "has a roof over his head."

Primary Caregiver Resources: Family/Social Support. Mrs. X feels that Denny is a major cause of conflict in the family. Arguments between her and her husband have become more frequent and the tension is more noticeable. She says her husband has a bad temper and often becomes verbally insulting and openly blames Denny for the family's problems. Both parents agree that it might be a lot better were Denny to live elsewhere. It is noteworthy that the staff at the hospital thought that, when angered, Mr. X would be frightening to most adults and children.

CAFAS Evaluation at Intake

Denny's therapist rated the CAFAS after having interviewed Denny and his parents and after having talked to the school principal and counselor. Denny's profile at intake is represented by the solid line in Fig. 21.3. The item numbers circled in Fig. 21.3 correspond to specific item endorsements on the CAFAS. Denny's total score across the eight Youth scales is 150, which places him in the severe category for Overall Level of Dysfunction. His score is about two standard deviations above the mean for youths referred for mental health care. Another reflection of his high degree of dysfunction
is the fact that he is severely impaired on over half of the scales (i.e., five out of eight Youth scales). He is severely impaired in the following areas: School, Home, Behavior Toward Others, Mood, and Self-Harmful Behavior. Although there are exceptions, his lack of impairment in Community and Substance Use is expected for his preadolescent age. Denny is dysfunctional across different areas of functioning, including school, home, and social situations in general. In addition, his profile peaks suggest comorbidity, with both behavioral problems and clinical depression.

A review of the specific CAFAS items endorsed reveals two areas of concern: aggression and suicide. Items 4 and 43 are keyed to aggression at school and home, and Items 119 and 144 load on suicide risk. The school judged him to be such a significant risk to other students and teachers that he was expelled. Family members consider him dangerous, although they think he would like to be good. Given these concerns, a school and home environment that provides protection to others needs to be in place before he leaves the hospital. During his current hospital evaluation, he will need structure and close supervision for his aggressive and suicidal behaviors.

A review of the Caregiver scales indicates that the parents' financial situation has adversely affected Denny, in that overcrowding in the maternal grandmother's home has exacerbated difficulties in the home. In terms of their ability to provide a nurturant home, the parents' openly rejecting feelings toward Denny and the father's apparent modeling of poorly controlled anger need to be addressed.

Despite the problems, there are many strengths in this family. Denny is a very likable youth who arouses empathy in others.
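
The intake scoring described above can be reproduced with simple arithmetic. The Python sketch below assumes the usual CAFAS convention, which this section does not restate, that each scale is scored 30 (severe), 20 (moderate), 10 (mild), or 0 (minimal or no impairment). The per-scale values are inferred from the narrative (five severe scales and no impairment reported for Community, Substance Use, or Thinking) rather than read from Fig. 21.3, and the variable and function names are hypothetical.

```python
# Assumed CAFAS scale scoring (not stated in this section):
# 30 = severe, 20 = moderate, 10 = mild, 0 = minimal or no impairment.
SEVERE = 30

# Denny's intake profile, inferred from the case narrative (an assumption).
intake_profile = {
    "School": 30,
    "Home": 30,
    "Community": 0,
    "Behavior Toward Others": 30,
    "Mood/Emotions": 30,
    "Self-Harmful Behavior": 30,
    "Substance Use": 0,
    "Thinking": 0,
}


def total_score(profile: dict) -> int:
    """Sum the scores across the eight Youth scales."""
    return sum(profile.values())


def severe_scales(profile: dict) -> list:
    """List the scales rated at the severe level."""
    return [name for name, score in profile.items() if score == SEVERE]


intake_total = total_score(intake_profile)
intake_severe = severe_scales(intake_profile)
print(intake_total, len(intake_severe))
```

Under these assumptions the total and the count of severely impaired scales agree with the figures reported in the text (150 and five of eight).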
The following strengths were identified: School (Denny likes to read; he is willing to attend school even though he dislikes it); Home (Denny will do chores with persistent reminders; he seems to care about his parents and brother despite his behavior); Community (follows laws; has never been in trouble with the law; does not have delinquent friends); Behavior Toward Others (has a range of emotions; is aware to some degree of his depression and his grief and guilt over his grandfather's death; is aware of his problems and feels bad about them); and Thinking (despite his difficulties, Denny can communicate with others; can think well abstractly; is capable of good problem-solving skills although he does not always use them). In terms of the family, the parents are both employed and want to return to having their own home or apartment for their family. Neither parent is a substance user, and both appear to be concerned about Denny. Their feelings of rejection of Denny appear rooted in feeling tired, overburdened, and helpless to change the course of events.

CAFAS Evaluation at 6 Months

After 6 months, Denny was back in school in the classroom for the emotionally impaired. The school agreed to take him back with the stipulation that he would have a personal aide who would accompany him until he could manage problems without aggressiveness or threats. His behavior improved after starting Zoloft and Ritalin. The parents accepted the medication for his attention deficit hyperactivity disorder after it was demonstrated that Denny's growth had not been stunted. After 2 1/2 months, the personal attendant was not required; however, the counselor at the school worked closely with the teacher in the classroom for the emotionally impaired so that Denny could see her whenever he appeared to be getting agitated. He was sometimes argumentative with the teacher but was manageable in the classroom. Denny was no longer threatening aggression in
his interactions with peers. He still had considerable peer problems, mostly antagonizing others to the point that they were quite rejecting of him. Denny's behavior improved greatly after moving from the grandmother's house. It appeared that living in his grandfather's former home had exacerbated his feelings of depression, grief, and helplessness related to the loss of his grandfather. The parents reported they were less irritable and had more patience with Denny now that they were in their own place. With in-home counseling and respite services, both Denny and his father were in better control. Denny was able to express his objections verbally, although he still used profanity. Denny changed markedly in his moods and emotional control. He was feeling better about himself with his successful reintegration at school. He still had feelings of low self-esteem, became sad for short periods of time, and had disproportionate expressions of irritability at times. He no longer made suicidal threats or talked about suicide. He did continue with habits such as pinching himself and digging at his skin. However, he had no abrasions that warranted medical attention. Denny's scores at 6 months postintake are recorded by a nonsolid line in Fig. 21.3. Denny's total score across the eight Youth scales is 80, which places him in the moderate range for Overall Level of Dysfunction. He is not severely impaired on any of the individual CAFAS scales. Denny still has pervasive moderate problems, at school and at home. He no longer has any risk behaviors endorsed. In terms of Caregiver Resources, the Material Needs scale was rated as no impairment because the parents had been able to provide adequate and stable housing. Both parents are still working. On the Family/Social Support Caregiver scale, the family was rated as moderately impaired because there is still considerable family conflict and resentment (Items 224 and 225 on the CAFAS form). 
Consequently, there is a plan to continue in-home counseling and respite care as needed because the case manager believes that without them, Denny's condition will deteriorate.

Conclusions

The CAFAS is sensitive to change in client status, as demonstrated by several studies that included referrals for mental health services and highly impaired youths with SED. It works equally well for youths whose referral originates from juvenile justice, child welfare, education, or mental health. The outcome results generated by the CAFAS can have an important impact because administrators and policymakers can see that the quantitative differences represent meaningful qualitative changes that have wide appeal (e.g., keep kids in school, in their own home, and out of trouble). Furthermore, the CAFAS can be used not only for assessing outcome but also as one of the criteria for demonstrating need for services and for qualifying for specific levels of care. As the youth's service needs change, so should the CAFAS results. Even more importantly, the CAFAS can be readily understood by families, who are entitled to know how any decisions and judgments are made that might affect them and their children.

Typically, outcome measures have been offensive to clinicians, who view them as increasing the burden of paperwork yet adding little value to planning or actual treatment. This is exacerbated when the outcome results are turned over to an oversight authority before the clinical staff has any information on how those data reflect on the program. The CAFAS minimizes these negative elements. The CAFAS can be used by
clinicians to monitor the progress of treatment; the clinicians can modify treatment based on their observations. The graphic depiction of the CAFAS scores on both the CAFAS Profile and on the computerized CAFAS client report helps all persons involved in the youth's care, including the family, focus on extent of improvement in the most important domains of functioning. In this respect, the CAFAS is the clinician's ally. Another task that typically falls to the clinician is recording information. When multiple inputs of the same information are required, this too becomes onerous. The materials available for the CAFAS, such as the scannable forms and the computer program, were designed so as to minimize redundancy in recording information. The increasing requirements for reporting outcome and for justifying services can be used in a way that will benefit clients and their families. These activities can contribute to better clinical care and serve as a basis for clinical studies that provide insight into the types of services and programs that most effectively serve different client groups.

References

Achenbach, T.M. (1991). Manual for the Child Behavior Checklist/4-18 and 1991 Profile. Burlington, VT: University of Vermont Department of Psychiatry.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Bird, H.R., Yager, T.J., Staghezza, B., Gould, M.S., Canino, G., & Rubio-Stipec, M. (1990). Impairment in the epidemiological measurement of childhood psychopathology in the community. Journal of the American Academy of Child and Adolescent Psychiatry, 29, 796-803.
Breda, C.S. (1996). Methodological issues in evaluating mental health outcomes of a children's mental health managed care demonstration. Journal of Mental Health Administration, 23, 40-50.
Ciarlo, J.A., Brown, T.R., Edwards, D.W., Kiresuk, T.J., & Newman, F.L. (1986). Assessing mental health treatment outcome measurement techniques (National Institute of Mental Health, Series FN No. 9, DHHS Publication No. ADM 86-1301). Washington, DC: U.S. Government Printing Office.
Doucette-Gates, A., Liao, Q., Sondheimer, A., & Slaton, E. (1997, February). CMHS evaluation: Model changing: Metaphors and meaning. Symposium presented at the 10th Annual Research Conference of a System of Care for Children's Mental Health: Expanding the Research Base, Tampa, FL.
Federal Register. (1993, May). Substance Abuse and Mental Health Services Administration: Center for Mental Health Services. Washington, DC: U.S. Department of Health & Human Services, Office of Human Development Services.
Hodges, K. (1989). Child and Adolescent Functional Assessment Scale. Unpublished manuscript, Eastern Michigan University, Ypsilanti, MI.
Hodges, K. (1990a). Child and Adolescent Functional Assessment Scale self-training manual. Unpublished manuscript, Eastern Michigan University, Ypsilanti, MI.
Hodges, K. (1990b). Child Assessment Schedule-Parent Form (3rd ed.). Unpublished manuscript, Eastern Michigan University, Ypsilanti, MI.
Hodges, K. (1993). Structured interviews for assessing children. Journal of Child Psychology and Psychiatry, 34, 49-68.
Hodges, K. (1994a). CAFAS Interview: Parent Report. Ypsilanti, MI: Eastern Michigan University.
Hodges, K. (1994b). Child and Adolescent Functional Assessment Scale. Ypsilanti, MI: Eastern Michigan University.
Hodges, K. (1994c). Child and Adolescent Functional Assessment Scale self-training manual. Ypsilanti, MI: Eastern Michigan University.
Hodges, K. (1994d). Measures for assessing impairment in children and adolescents (paper prepared for the U.S. Center for Mental Health Services). Rockville, MD: Department of Health and Human Services, Substance Abuse and Mental Health Services Administration.
Hodges, K. (1994e). Preschool and Early Childhood Functional Assessment Scale. Ypsilanti, MI: Eastern Michigan University.
Hodges, K. (1994f). Supplemental vignettes. Ypsilanti, MI: Eastern Michigan University.
Hodges, K. (1995, February). Psychometric study of a telephone interview for the CAFAS using an expanded version of the scale. Paper presented at the Eighth Annual Research Conference: A System of Care for Children's Mental Health: Expanding the Research Base, Tampa, FL.
Hodges, K. (1997). Manual for training coordinators, clinical supervisors, and data managers. Ypsilanti, MI: Eastern Michigan University.
Hodges, K., Doucette-Gates, A., & Liao, Q. (1997). Validity of the Child and Adolescent Functional Assessment Scale (CAFAS). Unpublished manuscript, Eastern Michigan University, Ypsilanti, MI.
Hodges, K., Doucette-Gates, A., Liao, Q., & Wong, M. (1996, October). Measuring child and family outcomes in community based systems of care. Paper presented at the Sixth Annual Virginia Beach Conference: Children and Adolescents with Emotional Disorders, Virginia Beach, VA.
Hodges, K., & Gust, J. (1995). Measures of impairment for children and adolescents. Journal of Mental Health Administration, 22, 403-413.
Hodges, K., Kline, J., Stern, L., Cytryn, L., & McKnew, D. (1982). The development of a child assessment interview for research and clinical use. Journal of Abnormal Child Psychology, 10, 173-189.
Hodges, K., Latessa, M., Pernice, F., Wong, M., Doucette-Gates, A., & Liao, Q. (1997, February). Practical issues in using the CAFAS for clinical and administrative outcome. Symposium presented at the 10th Annual Research Conference of a System of Care for Children's Mental Health: Expanding the Research Base, Tampa, FL.
Hodges, K., & Warren, B. (1997, December). Level of Functioning Project. Paper presented for the Department of Community Health, State of Michigan, Lansing, MI.
Hodges, K., & Wong, M.M. (1996). Psychometric characteristics of a multidimensional measure to assess impairment: The Child and Adolescent Functional Assessment Scale (CAFAS). Journal of Child and Family Studies, 5, 445-467.
Hodges, K., & Wong, M.M. (1997). Use of the Child and Adolescent Functional Assessment Scale to predict service utilization and cost. Journal of Mental Health Administration, 24, 278-290.
Hodges, K., Wong, M.M., & Latessa, M. (1998). Use of the Child and Adolescent Functional Assessment Scale (CAFAS) as an outcome measure in clinical settings. Journal of Behavioral Health Services and Research, 25, 326-337.
Lambert, W.E., & Guthrie, P.R. (1996). Clinical outcomes of a children's mental health managed care demonstration. Journal of Mental Health Administration, 23, 51-68.
Lemoine, R.L., & McDermott, B.E. (1997). Assessing levels and profiles of service need using the CAFAS. In C.J. Liberton, K. Kutash, & R.M. Friedman (Eds.), Proceedings of the 10th Annual Research Conference: A system of care for children's mental health: Expanding the Research Base. Tampa, FL: Research and Training Center for Children's Mental Health.
Lipsey, M.W., & Wilson, D.B. (1993). The efficacy of psychological, educational, and behavioral treatment. American Psychologist, 48, 1181-1199.
Mental Health Statistics Improvement Program (MHSIP) Task Force. (1996, April). The MHSIP Consumer-Oriented Mental Health Report Card. Substance Abuse and Mental Health Services Administration: Center for Mental Health Services. Washington, DC: U.S. Department of Health & Human Services.
Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcomes assessment (pp. 98-108). Hillsdale, NJ: Lawrence Erlbaum Associates.
Newman, F.L., DeLiberty, R., Hodges, K., McGrew, J., & Tejeda, M. (1997, July). Hoosier assurance plan: Research results on assessment: Instruments and provider profile: Report card development. Unpublished manuscript, Indiana Department of Public Health, Indianapolis, IN.
Newman, F.L., & Hodges, K. (1995, December). Developing outcome assessment measures for children, adolescents, and adults to support a provider profile in Indiana. Paper presented at the meeting of the Second Annual Florida Conference on Behavioral Health Care Evaluation, Orlando, FL.
Nunnally, J.C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Summerfelt, W.T., Foster, E.M., & Saunders, R.C. (1996). Mental health services utilization in a children's mental health managed care demonstration. Journal of Mental Health Administration, 23, 80-91.

Chapter 22

The Child Health Questionnaire (CHQ): A Potential New Tool to Assess the Outcome of Psychosocial Treatment and Care

Jeanne M. Landgraf
Healthact

In 1989, the Congress of the United States mandated that patient-reported quality of life be incorporated into Food and Drug Administration (FDA) approved randomized clinical trials (Omnibus Reconciliation Act, 1989). This landmark decision was coincident with the significant development of generic and condition-specific instruments that measure the health-related outcomes of adults (McDowell & Newell, 1987). Its impact on the development of instruments designed especially for children and adolescents, however, has been relatively negligible. Although the conceptual and methodologic challenges inherent in constructing instruments for these populations have been sufficiently addressed in the literature (Eisen, Ware, Donald, & Brook, 1979; Grave & Pless, 1976; Landgraf & Abetz, 1996; Starfield, 1974, 1987; Walker & Richmond, 1985), there exists a dearth of practical yet well-validated health assessment tools for children and adolescents.

To be truly useful as an evaluative tool for practitioners and clinical decision-makers, pediatric instruments must reflect the cultural uniqueness of this population, and be developmentally sensitive and multidimensional in scope so as to capture both the physical and psychosocial well-being of children and adolescents (Landgraf & Abetz, 1996). It has become an insufficient standard to define physical health as merely the absence of disease (World Health Organization [WHO], 1948). The definitional convention, initially posited by the WHO, is "the state of complete physical, emotional, and social well-being." State-of-the-art assessment tools must therefore reflect this premise.
Further, an assessment tool must not only measure children's capacity to engage in physical activities, but also provide evidence concerning the degree of limitation they may experience in accomplishing different tasks that involve dexterity, motor skill, and exertion (Landgraf, Abetz, & Ware, 1996a). To be considered conceptually robust, both children's internal health (i.e., emotional status, self-esteem) and their external or observable well-being (i.e., behavioral problems or social limitations) must be captured. Finally, because children live within a family structure (however uniquely this unit may be defined), understanding the impact/outcome of therapeutic interventions on the family should be an essential component of the assessment process (Landgraf et al., 1996a; Landgraf & Abetz, 1996; Landgraf & Abetz, 1998).

TABLE 22.1
Number of Items in Different Lengths of the Child Health Questionnaire

                                Parent Completed        Child Completed
Concepts (a)                    PF98    PF50    PF28    CF87
Physical Functioning              9       6       3       9
Role/Social-Emotional             3     } 3     } 1       3
Role/Social-Behavioral            3     }       }         3
Role/Social-Physical              3       2       1       3
Bodily Pain                       3       2       1       2
General Behavior                 16       6       4      16
Mental Health                    16       5       3      16
Self-esteem                      10       6       3      14
General Health Perceptions       13       6       4      13
Change in Health                 (b)      1       1       1
Parental Impact-Emotional         5       3       2      (c)
Parental Impact-Time              5       3       2      (c)
Family Activities                12       6       2       6
Family Cohesion                  (b)      1       1       1
Number of Concepts               12      14      14      12
Number of Items                  98      50      28      87
Year Developed                 1990    1994    1995    1990

Note. Reproduced with permission from Landgraf, Abetz, and Ware (1996a, p. 32). All rights reserved. A brace ( } ) marks a single count shared by the Role/Social-Emotional and Role/Social-Behavioral rows.
(a) Concepts are listed in the order they appear in all versions of the CHQ.
(b) These items were constructed after the CHQ-PF98 was developed and field tests were underway. However, we strongly encourage their inclusion in studies using the full-length parent-completed CHQ.
(c) These scales are not included in the child-completed version of the CHQ.

Overview

The Child Health Project was initiated at the Health Institute/Tufts-New England Medical Center in 1990 with the principal mission of advancing measurement work in pediatric health outcomes using a well-grounded child development perspective. The research program formally concluded in 1997 with widespread dissemination and availability of the Child Health Questionnaire (CHQ), a self-completed, comprehensive instrument designed to assess the health outcomes of children age 5 and older (Landgraf, Abetz, & Ware, 1996a, 1996b; Landgraf, Ware, Schor, Davies, & Rossi-Roh, 1993a, 1993b). A thorough exposition of the conceptual framework for the CHQ has been provided elsewhere (Landgraf et al., 1996a). Briefly, the CHQ was designed to measure both the physical and psychosocial well-being of children age 5 and older. In direct response to industry demands, several lengths of the CHQ were developed.
As detailed in Table 22.1, each length of the CHQ consists of multi-item scales and single items that measure 14 health concepts: physical functioning; general health; pain; mental health; self-esteem; behavior; limitations in school and social functioning due to physical, emotional, or behavior difficulties; parental limitations and well-being; family limitations and well-being; and health transition. Table 22.2 lists the items found in the 50-item parent-completed version of the CHQ.
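
As a consistency check on Table 22.1, the per-concept item counts can be summed for each form and compared with the stated form lengths. The Python sketch below transcribes the counts from the table; the names are hypothetical, and `None` marks a cell that carries no count of its own (footnote b or c, or the second row of a braced pair whose shared count is recorded on the Role/Social-Emotional row).

```python
FORMS = ("PF98", "PF50", "PF28", "CF87")

# Item counts per concept, in the column order given by FORMS.
ITEM_COUNTS = {
    "Physical Functioning":       (9, 6, 3, 9),
    "Role/Social-Emotional":      (3, 3, 1, 3),    # PF50/PF28 counts shared with Behavioral
    "Role/Social-Behavioral":     (3, None, None, 3),
    "Role/Social-Physical":       (3, 2, 1, 3),
    "Bodily Pain":                (3, 2, 1, 2),
    "General Behavior":           (16, 6, 4, 16),
    "Mental Health":              (16, 5, 3, 16),
    "Self-esteem":                (10, 6, 3, 14),
    "General Health Perceptions": (13, 6, 4, 13),
    "Change in Health":           (None, 1, 1, 1),  # footnote b for PF98
    "Parental Impact-Emotional":  (5, 3, 2, None),  # footnote c for CF87
    "Parental Impact-Time":       (5, 3, 2, None),  # footnote c for CF87
    "Family Activities":          (12, 6, 2, 6),
    "Family Cohesion":            (None, 1, 1, 1),  # footnote b for PF98
}


def items_per_form(counts: dict) -> dict:
    """Sum the item counts for each form, skipping cells without a count."""
    totals = dict.fromkeys(FORMS, 0)
    for row in counts.values():
        for form, n in zip(FORMS, row):
            if n is not None:
                totals[form] += n
    return totals


totals = items_per_form(ITEM_COUNTS)
print(totals)
```

The column sums match the "Number of Items" row of the table (98, 50, 28, and 87), which supports the row and column assignments used here.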

TABLE 22.2
List of Items (a) and Response Options (b) Found in the CHQ-PF50

1. In general, would you say your child's health is:
2. Has your child been limited in any of the following activities due to health problems?
   a. Doing things that take a lot of energy, such as playing soccer or running?
   b. Doing things that take some energy such as riding a bike or skating?
   c. Ability (physically) to get around the neighborhood, playground, or school?
   d. Walking one block or climbing one flight of stairs?
   e. Bending, lifting, or stooping?
   f. Taking care of him/herself, that is, eating, dressing, bathing, or going to the toilet?
3. Has your child's schoolwork or activities with friends been limited in any of the following ways due to EMOTIONAL difficulties or problems with his/her BEHAVIOR?
   a. limited in the KIND of schoolwork or activities with friends he/she could do
   b. limited in the AMOUNT of time he/she could spend on schoolwork or activities with friends
   c. limited in PERFORMING schoolwork or activities with friends (it took extra effort)
4. Has your child's schoolwork or activities with friends been limited in any of the following ways due to problems with his/her PHYSICAL health?
   a. limited in the KIND of schoolwork or activities with friends he/she could do
   b. limited in the AMOUNT of time he/she could spend on schoolwork or activities with friends
5a. How much bodily pain or discomfort has your child had?
5b. How often has your child had bodily pain or discomfort?
6. How often did each of the following statements describe your child?
   a. argued a lot
   b. had difficulty concentrating or paying attention
   c. lied or cheated
   d. stole things inside or outside the home
   e. had tantrums or a hot temper
   f. Compared to other children your child's age, in general would you say his/her behavior is:
7. How much of the time do you think your child:
   a. felt like crying?
   b. felt lonely?
   c. acted nervous?
   d. acted bothered or upset?
   e. acted cheerful?
8. How satisfied do you think your child has felt about:
   a. his/her school ability?
   b. his/her athletic ability?
   c. his/her friendships?
   d. his/her looks/appearance?
   e. his/her family relationships?
   f. his/her life overall?
9. How true or false is each of these statements for your child?
   a. My child seems to be less healthy than other children I know.
   b. My child has never been seriously ill.
   c. When there is something going around my child usually catches it.
   d. I expect my child will have a very healthy life.
   e. I worry more about my child's health than other people worry about their children's health.
   f. Compared to one year ago, how would you rate your child's health now?
10. How MUCH emotional worry or concern did each of the following cause YOU?
   a. Your child's physical health
   b. Your child's emotional well-being or behavior
   c. Your child's attention or learning abilities
11. Were you LIMITED in the amount of time YOU had for your own needs because of:
   a. Your child's physical health
   b. Your child's emotional well-being or behavior
   c. Your child's attention or learning abilities
12. How often has your child's health or behavior:
   a. limited the types of activities you could do as a family?
   b. interrupted various everyday family activities (eating meals, watching tv)?
   c. limited your ability as a family to "pick up and go" on a moment's notice?
   d. caused tension or conflict in your home?
   e. been a source of disagreements or arguments in your family?
   f. caused you to cancel or change plans (personal or work) at the last minute?
13. Sometimes families may have difficulty getting along with one another. They do not always agree and they may get angry. In general, how would you rate your family's ability to get along with one another?

Note. Reproduced with permission from Landgraf, Abetz, & Ware (1996a, pp. 363-369). All rights reserved.
(a) A 4-week recall period is used for all scales except for the Change in Health (CH) and Family Cohesion (FC) items and the General Health (GH) scale. The recall stem for change in health is "compared to last year."
(b) Options include the following:
   Excellent; Very good; Good; Fair; Poor
   Yes, limited a lot; Yes, limited some; Yes, limited a little; No, not limited
   None; Very mild; Mild; Moderate; Severe; Very severe
   None of the time; Once or twice; A few times; Fairly often; Very often; Every/almost every day
   Very often; Fairly often; Sometimes; Almost never
   All of the time; Most of the time; Some of the time; A little of the time; None of the time
   Very satisfied; Somewhat satisfied; Neither satisfied nor dissatisfied; Somewhat dissatisfied; Very dissatisfied
   Definitely true; Mostly true; Don't know; Mostly false; Definitely false
   Much better now than 1 year ago; Somewhat better now than 1 year ago; About the same now as 1 year ago; Somewhat worse now than 1 year ago; Much worse now than 1 year ago
   None at all; A little bit; Some; Quite a bit; A lot


Each concept measured in the CHQ yields an independent scale score. For ease of interpretation, raw scale scores are transformed to an arbitrary 0 to 100 continuum, with 0 representing the lowest possible score and 100 the highest. The higher the score, the more favorable the rating. To further assist interpretation, preliminary normative data and clinical benchmarks have been collected in a general sample of children in the United States (Landgraf et al., 1996a), and similar efforts are underway in Australia (Waters & Landgraf, 1997) and Ireland (McErlain & Gaffney, 1997). In addition, the resulting 14 scores can be summarized as two complementary component scores, physical health and psychosocial health, each standardized to a mean of 50 and a standard deviation of 10. These component scores have been shown to account for 59.2% of the total measured variance in a general sample of children in the United States (Landgraf et al., 1996a). Figure 22.1 provides a graphic illustration of the measurement model for the CHQ based on a sample of 914 children in the United States. The psychometric properties of the CHQ have been thoroughly evaluated across common child conditions (e.g., asthma, attention deficit hyperactivity disorder, cystic fibrosis, epilepsy, juvenile rheumatoid arthritis, and psychiatric problems), child characteristics

Fig. 22.1. CHQ measurement model. Reprinted with permission from Landgraf, Abetz, and Ware (1996a, p. 285).


(e.g., age, gender; Landgraf & Abetz, 1997; Landgraf et al., 1996a), and parental characteristics (e.g., ethnicity, education, work, and marital status; Landgraf & Abetz, 1998; Landgraf et al., 1996a) using multitrait item scaling analysis (Hayashi & Hays, 1987; Hays, Hayashi, Carson, & Ware, 1988). The minimum criterion for item internal consistency (≥ .40; Campbell & Fiske, 1959; Howard & Forehand, 1962) was exceeded, on average, in 91% of all item tests performed in the representative U.S. sample and subgroups, and in 84% of all item tests in the clinical samples. The average success rate for tests of item discriminant validity was 94%. The reliability (internal consistency) of the CHQ scales was estimated in a representative sample of children in the United States, 16 corresponding child and parent subgroups, and 6 clinical samples using Cronbach's alpha coefficient (Cronbach, 1951). Results have been thoroughly documented (Landgraf et al., 1996a). In general, coefficients ≥ .80 were observed across most of the scales and samples. Evidence of the CHQ's ability to detect differences across clinical samples was noted in early dissemination efforts and was carefully documented in parallel with its widespread release (Landgraf et al., 1993a, 1993b, 1996a). It has been shown to discriminate within groups varying in severity of asthma, juvenile rheumatoid arthritis, and attention deficit hyperactivity disorder (Landgraf, Abetz, DeNardo, & Tucker, 1995; Landgraf et al., 1993a, 1993b; Landgraf, Ware, & Rossi-Roh, 1993; McGrath et al., 1998).
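The two reliability checks named above can be illustrated with a short sketch. It assumes complete data (no missing items) and uses the textbook formulas for Cronbach's alpha and for the item-scale correlation corrected for overlap (each item correlated with the sum of the other items in its scale). The function names and the tiny response matrix are hypothetical and are not taken from the CHQ User Manual, which remains the authority on how these analyses were actually run.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data: one row per respondent, one column per item."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_scale_correlations(rows):
    """Correlate each item with the sum of the remaining items in its scale."""
    result = []
    for j, col in enumerate(zip(*rows)):
        rest = [sum(row) - row[j] for row in rows]
        result.append(pearson_r(col, rest))
    return result


# Hypothetical responses: 4 respondents x 2 items (illustration only, not CHQ data)
responses = [[1, 1], [2, 3], [3, 2], [4, 4]]
alpha = cronbach_alpha(responses)
item_r = corrected_item_scale_correlations(responses)
# Compare each item against the .40 criterion cited in the text
meets_criterion = [r >= 0.40 for r in item_r]
```

In this toy example both items clear the .40 criterion and alpha exceeds .80; real scales would of course involve more items and far more respondents.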
Using emergent international guidelines, the CHQ has been rigorously translated or adapted for use in Australia (Waters, Wake, Landgraf, Wright, Ceccato, & Kesketh, 1998; Waters, Wake, Landgraf, Wright, Kesketh, & Ceccato, 1998; Waters & Landgraf, 1997), Ireland (McErlain & Gaffney, 1997), France (Rodary, LePlege, Kalifa, & Bernard, 1997), Canada, Germany, and the United Kingdom (Landgraf, Maunsell, Nixon-Speechly, Bullinger, Campbell, Abetz, & Ware, 1998; Landgraf, Maunsell, Speechly, Abetz, Gibbons, Barrera, & Ware, 1995; Bullinger, Mackensen, & Landgraf, 1994), the Netherlands (Raat, Sturmans, Landgraf, Bonsel, & Gemke, 1998), and Sweden (Landgraf, Erling, Wiklund, Abetz, & Ware, 1995). Tests of clinical validity are currently underway in many of these countries. Additional translations are planned for use in Eastern Europe and other regions, including the Czech Republic, Greece, Hungary, Israel, Poland, Turkey, and Russia.

Use of the CHQ in Treatment Planning

The CHQ was designed to be used as a common yardstick to gauge the relative burden of disease in children and the relative benefit of treatment irrespective of the child's age, gender, or the nature of the condition (physical, behavioral, social). As a self-completed health assessment tool, the CHQ is designed to provide the clinical practitioner with further insight into parental perceptions concerning children's emotional health, their self-esteem, behavior, school performance, and social limitations, and a brief assessment of the state of family relationships. It was not designed for use as a diagnostic tool, although this may be a latent benefit as adoption of the CHQ becomes more widespread and further normative data become available. In the interim, however, the CHQ can be used as a tool to enhance the practitioner's current data-gathering processes.
Because the tool has been standardized, the information it provides may be useful in the identification of key areas to target and the planning of different treatment options. Presently, the CHQ has been adopted for use in randomized international clinical trials (Landgraf et al., 1996a; Silber, 1995), outpatient settings (Cafferata, 1992; Gillam et al., 1995; Kurtin, Landgraf, & Abetz, 1994; J. Thomas, personal communication, 1995;


Landgraf, Abetz, et al., 1995; Powers, Abetz, & Landgraf, 1996), doctors' offices, and HMOs. Internationally, the CHQ is also being used to assess the health of young schoolchildren in Victoria, Australia (Waters & Landgraf, 1997; Waters et al., 1998a, 1998b), to assess the mental health of children in Adelaide, Australia (Sawyer, Antniou, Toogood, & Rice, 1998), to assess the physical and emotional well-being of children in Northern Ireland (McErlain & Gaffney, 1997), and as a tool to assess the long-term health and well-being of patients surviving childhood cancer (Speechly et al., in press). It has been identified as a potential tool for defining special needs in children because of the availability of preliminary clinical benchmarks and normative data that can assist in the interpretation of results. Across these varied settings it has become standard practice to administer the CHQ prior to intervention as a baseline assessment and again postintervention as an objective measure of the outcome of care and treatment. Given the recent widespread release of the CHQ and the design of many of these studies, it may be 12 to 18 months before empirical evidence is available regarding use of the instrument in this way.

Use of the CHQ for Treatment Monitoring

Many scales within the CHQ use a standard 4-week recall period. Combined with the robust alpha coefficients observed across most scales (≥ .80), this makes the CHQ quite appropriate for monitoring change at the individual patient level; Nunnally and Bernstein (1994) recommended alpha coefficients of at least .80 for individual patient-level analyses. However, several constraints must be noted. The most noteworthy is that the CHQ was normed using 4 weeks as the standard time referent.
From a psychometric perspective, it is unwise to modify the time referent of a standardized instrument if clinicians wish to use the normative data as an interpretive aid. However, if the objective of a study is to assess the efficacy of a treatment that lasts only a few days (e.g., administration of an antibiotic) or an intervention that happens quickly (e.g., short-term psychotherapy), the gain in sensitivity to change from modifying the time referent may outweigh the loss of comparability with normative data. It is important to note that, as is the case with all aspects of validity, providing evidence of an instrument's sensitivity to change is an iterative process. Each application adds to a rich database of information concerning the uses and limitations of a given instrument. Although its development has spanned almost 7 years of industry-sponsored research, the CHQ is a relatively new assessment tool. Preliminary information concerning the sensitivity of the CHQ in clinical trials (proprietary unpublished data available from the author) and test-retest results from a pilot study designed to assess the efficacy of a brief 12-week psychotherapy program (J. Thomas, personal communication, 1995) are encouraging. However, further evidence is needed to determine the generalizability of these early results.

Use of the CHQ to Assess the Outcome of Treatment

Newman and Ciarlo (1994) identified five broad categories against which to assess the quality of instruments designed to measure the outcome of care and treatment: relevance


to target population, simplicity of use and interpretability, psychometric features, cost, and utility considerations. Each of these issues, as it relates to the CHQ, has been identified and discussed in the CHQ User Manual (Landgraf et al., 1996a) but is briefly readdressed here for the convenience of the reader. The exception is the psychometric strengths of the CHQ, which were summarized earlier in this chapter.

Relevance

The cultural diversity of children and adolescents in the United States is being dramatically redefined (Lewit & Baker, 1994). Given this, it has been argued that the standard criteria used to evaluate state-of-the-art health assessment tools be expanded to include evidence of their appropriateness in groups differing in cultural orientation, ethnicity, and race (Landgraf & Abetz, 1996). Thus, an important hallmark of the CHQ development was a thorough evaluation and documentation of the relevance of both the items and the concepts that constitute its architectural framework (Landgraf & Abetz, 1997; Landgraf et al., 1996a).

Methods and Procedures

The CHQ was designed and validated as a self-completed instrument. However, because certain study designs necessitate the use of trained interviewers, the content of the CHQ was scripted to facilitate its use in settings that require either phone or face-to-face interviews. Normative data, however, were collected using a mail-out/mail-back methodology. Based on evidence with adult assessment tools (McHorney, Kosinski, & Ware, 1994), differences in mean scores for the CHQ scales can be expected depending on the chosen mode of administration. It is hypothesized that scores will be inflated if an interviewer administers the CHQ; that is, individuals will report more favorable scores, indicating better functioning and well-being. To date, however, no studies have been conducted to confirm this hypothesis.
Regardless, users are strongly encouraged to consult the CHQ User Manual (Landgraf et al., 1996a) for specific guidelines on administering the form. Briefly, the CHQ should be completed independent of guidance from others, including family members or clinical, administrative, or study personnel. Using general conventions (six items/minute), the CHQ can be completed by most people in approximately 10 to 12 minutes. The readability of the CHQ was evaluated using the Flesch-Kincaid Readability Test. Results indicate that the CHQ-PF50 and CHQ-PF28 require a reading level of third grade, second to fifth month. The parent-completed CHQ should be understandable and easy to complete for 80% of respondents. The CHQ-CF87 requires a reading level of second grade, fifth month, and should be understandable and easy to complete for about 85% of children responding.

At the commencement of the Child Health Program, the goal was to develop a tool that could be administered across a broad age spectrum (5-18 years). Given the complexities of establishing validity in children under age 12, work began with the development and validation of the parent-completed instrument. About a year into the program, however, it became obvious that a complementary child-completed version would greatly facilitate understanding of salient measurement issues, most notably the potential tradeoffs in using multiple reporters.
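For readers unfamiliar with the readability test mentioned above, the following minimal sketch applies the standard published Flesch-Kincaid grade-level formula to word, sentence, and syllable counts. The chapter does not say which software or exact procedure produced the reported reading levels, so this is an illustration of the general technique only; the function name and the counts in the example are hypothetical.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts of a text sample.

    grade = 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a 100-word passage (not actual CHQ text)
grade = flesch_kincaid_grade(words=100, sentences=10, syllables=130)
print(round(grade, 2))
```

Short sentences and short words both pull the grade level down, which is why questionnaire items written as brief, plain statements tend to score at an early elementary reading level.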


Work on a child report version of the CHQ began as a direct result of an incident with my then 9-year-old daughter. She had observed the CHQ on our dining room table and began answering the questions concerning limitations in her physical well-being. We were in agreement for all but one of the items: limitations in strenuous activities. Having seen her excel on the soccer field, I scored her as having no limitations; however, she reported herself to be quite limited. After further inquiry and clarification, I learned that she was having considerable difficulty breathing, but had thought that such problems were "normal." After considerable deliberation and testing, she was diagnosed with asthma. Her condition is currently managed with several therapeutics, and she recently completed a successful year of freshman soccer. The experience convinced me that development of a parent-completed tool alone was insufficient and that, to be truly useful, parallel development and validation of a self-report tool would be essential to the success of the measurement program. As argued elsewhere (Landgraf & Abetz, 1996; Landgraf et al., 1993a, 1993b; Powers et al., 1996), the objective in using multiple reporters is not to establish concordance or evaluate interrater agreement. Rather, it is to provide a standard platform so that each voice can be heard and the best possible treatment plan can be devised. In addition, a better understanding of the perceptions of both parents and their children will help practitioners establish realistic expectations for all involved, often an essential ingredient in the success of any family-centered intervention, especially for children receiving treatment for psychiatric or behavioral problems. Its use in this way with more traditional chronic child conditions such as renal failure has been demonstrated (Kurtin, Landgraf, & Abetz, 1994).
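Whichever reporter completes the form, scale scoring follows the conventions described earlier in the chapter: raw scale scores are rescaled to a 0 to 100 continuum, and summary scores are expressed on a metric with a mean of 50 and a standard deviation of 10. The sketch below shows only the generic arithmetic behind those two steps. It assumes items have already been recoded so that higher values are more favorable, and it uses a plain linear standardization against a reference mean and standard deviation; the actual CHQ component scores are derived with the algorithm in the CHQ User Manual, which this sketch does not reproduce. All names and numbers are hypothetical.

```python
def rescale_0_100(raw, lowest, highest):
    """Linearly map a raw scale score onto 0 (lowest possible) to 100 (highest possible)."""
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    if not lowest <= raw <= highest:
        raise ValueError("raw score outside the possible range")
    return 100.0 * (raw - lowest) / (highest - lowest)


def standardize_50_10(score, reference_mean, reference_sd):
    """Express a score relative to a reference group on a mean-50, SD-10 metric."""
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    return 50.0 + 10.0 * (score - reference_mean) / reference_sd


# Hypothetical scale: five items scored 1-5, so raw sums range from 5 to 25
scale_score = rescale_0_100(raw=20, lowest=5, highest=25)

# Hypothetical reference values (not published CHQ norms)
standard_score = standardize_50_10(scale_score, reference_mean=80.0, reference_sd=15.0)
```

Here a raw sum of 20 becomes 75 on the 0 to 100 continuum, which falls one third of a standard deviation below the made-up reference mean.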
Currently, scoring, analysis, and interpretation of the CHQ require some understanding of statistical software traditionally used by the scientific community (e.g., SAS, SPSS). To facilitate analysis and interpretation and meet the burgeoning demands of a diverse group of users, a universal scoring engine for the CHQ is currently being developed. The engine will be transferable across all commercially available database platforms. The standard engine will also include data quality checks such as an index that reports on the consistency of an individual's responses within a given scale. Other software initiatives include the development of computer-generated reports that present an individual's responses over time relative to scores obtained for children of the same age and gender and across a broad array of health conditions. The use of patient-based tools to assess the outcome of care is quickly becoming standard practice in the United States. Anticipating this movement, the CHQ was designed to augment clinical aspects of services with minimal interference. It is currently being used in conjunction with other instruments approved by the provider community such as the Child Behavior Checklist (Achenbach & Edelbrock, 1979), the Health Utilities Index (Feeny, Furlong, Boyle, & Torrance, 1995), and asthma-specific modules (McGrath et al., 1998). It is anticipated that over the next few years, given the increasing exposure and use of the CHQ, the relation between the information it provides and the complementary data provided by other tools will be better understood.

Cost and Utility Considerations

Despite considerable advances concerning the interpretation of health outcomes data, relatively little attention has been given to the presentation of results to the lay medical community, including clinicians, administrators, and payers. Historically, software developers


have taken the lead in designing the standardized reports generated by patient-based assessment tools, and outcomes service organizations have developed systems to expedite the process by which data are collected (e.g., touchscreen programs, computer adaptive testing, voice simulated technology). Because many of these efforts are adjunct activities designed to increase market share, there is tremendous variability in the way data are collected, processed, analyzed, and ultimately presented. Currently, there are several national initiatives to make outcomes more accessible to specialty providers in conjunction with use of the CHQ. One program focuses on the development of user-friendly software. A second program focuses on the application of an integrated outcomes measurement system. The objective is to develop a system for use in clinical care that is unobtrusive, yet enriches the patient-provider relationship by providing clinicians with information about their patients' quality of life and their ability to self-manage their condition. In some instances, merely having an outcomes program in place has been beneficial for providers in negotiating managed care contracts. The biggest challenge, however, still remains: developing a reporting mechanism that provides time-sensitive information that is scientifically valid, clinically meaningful and useful, and easily understood and interpreted by the lay practitioner.

Hypothetical Case Study Using the CHQ

Considerable time and effort were spent on evaluating the relevance and conceptual framework of the CHQ items and scales in an effort to construct a practical and useful instrument for both research and clinical care (Landgraf et al., 1996a, 1996b). To optimize its development, access to the prestandardized CHQ was limited to carefully designed studies that provided opportunities to evaluate its clinical relevance across an array of conditions. As a result, data accrued from case-specific situations are limited.
However, the availability of normative data allows for the creation of a hypothetical case study to illustrate the potential utility of the CHQ for mental health providers. This illustration is based in part on information provided in early application studies. Sean, an active 11-year-old, is brought to the clinic for observation and testing. His parents reveal during in-depth interviews that he is generally aggressive and argumentative, often exhibiting outbursts of anger and frustration if requests are not granted. He appears to be extremely bright, but despite this, achieves low academic marks. His parents report that he appears relatively disinterested in activities at home and on occasion has expressed strong negative feelings toward his siblings. There are considerable conflicts at home. As part of the diagnostic process, Sean's parents are asked to complete several standardized instruments, including the CHQ. Comments obtained during the interview are reflected in low scores for six of the CHQ scales: Limitations in Schoolwork, Behavior, Mental Health, Self-esteem, Emotional Impact on the Parent, and Family Limitations. Scale scores are 69.2, 53.2, 66.5, 61.3, 57.0, and 63.5, respectively. Each of these scores is lower than those published (Landgraf & Abetz, 1996) for the U.S. sample of boys (91.4, 74.1, 79.6, 79.6, 78.0, and 88.9) and comparable to those for the sample of children with attentional problems (68.7, 54.5, 66.8, 62.6, 58.5, and 62.1). Not surprisingly, scores obtained for the other standardized tools support a diagnosis of attention deficit hyperactivity disorder and borderline depression. Based on these findings, the team designs a comprehensive treatment program that includes behavior


modification techniques, medication, and family therapy. After 6 months of treatment, the CHQ is readministered to monitor progress of treatment. Although scores on the School Limitations, Behavior, and Parental Anxiety scales are higher relative to baseline, differences in scores between baseline and the 6-month assessment are not significant. However, at the end of the 12-month program, t-test results indicate a significant difference for four of the scales: School Limitations, Behavior, Mental Health, and Parent Emotional Anxiety (+33.3, +12.8, +10.0, and +19.2, respectively). These findings suggest that there has been a noticeable improvement in Sean's school performance, he is less depressed, and some of the anxiety previously reported by his parents has been alleviated.

Conclusions

Clearly, a single instrument will not be appropriate for all applications. Early empirical evidence supports the use of the CHQ as a viable instrument to assess the health-related quality of life of children and their families. Its potential as a diagnostic instrument and as a tool to measure the outcome of care and treatment has also been established. As is the case with any new instrument, more detailed evidence in this regard is warranted. It is important to underscore that the CHQ was not designed as a diagnostic tool, although the presence of normative data and benchmarks for some conditions may facilitate its use in this regard. Currently, there is sufficient evidence to suggest that, relative to disease-specific modules, the CHQ is a robust tool that can detect meaningful differences among children varying in severity of asthma, juvenile rheumatoid arthritis, some types of cancer, and attention deficit hyperactivity disorder. However, in the absence of clinical benchmarks for other common conditions, it may be advantageous to use the CHQ in conjunction with condition-specific modules.
Head-to-head comparisons of the sensitivity of the CHQ and these other modules will provide further empirical insight into its strengths and weaknesses. The goal of the Child Health Assessment Project was to advance methods in pediatric measurement using both qualitative and quantitative methods of development and evaluation. The CHQ is a tangible outcome of this effort. It is important to remember that the motivation behind its construction and validation was a broader mission: to give voice to "those often unknown unremembered . . . half-heard between two waves of the sea" (T.S. Eliot, "Little Gidding," 1942-1943). As such, the CHQ provides practitioners with a useful, practical instrument to measure the impact of care and therapy on the everyday functioning and well-being of children and their families. As evidenced by its widespread use in the United States and abroad, the broader and more compelling objective of the program has been achieved. Further application will only serve to enhance this mission.

References

Achenbach, T.M., & Edelbrock, C.S. (1979). The Child Behavior Profile: II. Boys aged 12-16 and girls aged 6-11 and 12-16. Journal of Consulting and Clinical Psychology, 47, 223-233.
Bullinger, M., Mackensen, S., & Landgraf, J.M. (1994). Assessing the quality of life of children. Quality of Life Research, 3, 41.
Cafferata, G. (1992). The outpatient unit of the division of child and adolescent psychiatry:


Parent and staff surveys. Boston: New England Medical Center Department of Quality Assessment.
Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 85-105.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Eisen, M., Ware, J.E., Donald, C.A., & Brook, R.H. (1979). Measuring components of children's health status. Medical Care, 17, 902-921.
Feeny, D., Furlong, W., Boyle, M., & Torrance, G.W. (1995). Multi-attribute health status classification systems: Health Utilities Index. Pharmacoeconomics, 7, 490-502.
Gillam, F., Wyllie, Kashden, J., Kuzniecky, R., Faught, E., & Carr, S. (1995). Comprehensive outcome from pediatric epilepsy surgery. Epilepsia, 36, 140.
Grave, G.D., & Pless, I.B. (Eds.). (1976). Chronic childhood illness: Assessment of outcome (NIH DHEW Publication No. 76-877). Bethesda, MD.
Hayashi, T., & Hays, R.D. (1987). A microcomputer program for analyzing multitrait-multimethod matrices. Behavior Research Methods, Instruments, and Computers, 19, 345-348.
Hays, R.D., Hayashi, T., Carson, S., & Ware, J.E. (1988). User's guide for the Multitrait Analysis Program (MAP) (Report No. N2786-RC). Santa Monica, CA: The RAND Corporation.
Howard, K.L., & Forehand, G.C. (1962). A method for correcting item-total correlations for the effect of relevant item inclusion. Educational and Psychological Measurement, 22, 731.
Kurtin, P., Landgraf, J.M., & Abetz, L. (1994). Patient-based health status measures in pediatric dialysis: Expanding the assessment of outcomes. American Journal of Kidney Diseases, 24, 376-382.
Landgraf, J.M., & Abetz, L. (1996). Measuring health outcomes in pediatric populations: Issues in psychometrics and application. In B. Spilker (Ed.), Quality of life and pharmacoeconomics in clinical trials (2nd ed., pp. 793-802). Philadelphia: Lippincott-Raven.
Landgraf, J.M., & Abetz, L. (1997). Functional status and well-being of children representing three cultural groups: Initial self-reports using the Child Health Questionnaire (CHQ-CF87). Psychology and Health, Quality of Life: Recent Advances in Theory and Methods, 12, 1-16.
Landgraf, J.M., & Abetz, L. (1998). Influences of sociodemographic characteristics on parental reports of children's physical and psychosocial well-being: Early experiences with the Child Health Questionnaire (CHQ-PF50). In D. Drotar (Ed.), Measuring health-related quality of life in children and adolescents: Implications for research and practice (pp. 105-126). Hillsdale, NJ: Lawrence Erlbaum Associates.
Landgraf, J.M., Abetz, L., DeNardo, B.A., & Tucker, L.B. (1995, October). Clinical validity of the Child Health Questionnaire-Parent Form in children with rheumatoid arthritis. Poster presented at the 1995 National Scientific Meeting of the American College of Rheumatology, San Francisco, CA.
Landgraf, J.M., Abetz, L., & Ware, J.E. (1996a). The CHQ: A user's manual. Boston: The Health Institute.
Landgraf, J.M., Abetz, L., & Ware, J.E. (1996b, May). Psychometric results of the CHQ-PF50 form in a normative U.S. sample of children and its clinical application across several condition groups. Paper presented at the Third Annual Symposium of Contributed Papers, Quality of Life Evaluation, Boston.
Landgraf, J.M., Erling, A., Wiklund, I., Abetz, L., & Ware, J.E. (1995, October). The Child Health Questionnaire: Issues in translation, language, and culture for Swedish children and their parents. Poster presented at the Second Annual Meeting of the International Society for Quality of Life Research, Montreal, Quebec, Canada.
Landgraf, J.M., Maunsell, E., Nixon-Speechly, K., Bullinger, M., Campbell, S., Abetz, L., & Ware, J.E. (1998). Canadian-French, German, and United Kingdom versions of the Child Health Questionnaire: Methodology and preliminary item scaling results. Quality of Life Research, 7(5), 433-445.
Landgraf, J.M., Maunsell, E., Speechly, K.N., Abetz, L., Gibbons, L., Barrera, M., & Ware, J.E. (1995, October). Psychometric properties of the Child Health Questionnaire (parent form) among English and French-speaking


Canadian respondents: Preliminary results. Paper presented at the Second Annual Meeting of the International Society for Quality of Life Research, Montreal, Quebec, Canada.
Landgraf, J.M., Ware, J.E., & Rossi-Roh, K. (1993, October). Measuring the relative health and well-being of children with attention deficit hyperactivity disorder: Results from the Children's Health and Quality of Life Project. Boston: The Health Institute, New England Medical Center.
Landgraf, J.M., Ware, J.E., Schor, E., Davies, A.R., & Rossi-Roh, K. (1993a, June). Comparison of health status profiles for children with medical conditions: Preliminary psychometric and clinical results from the Children's Health and Quality of Life Project. Paper presented at the 10th Annual Meeting of the Association for Health Services Research, Washington, DC.
Landgraf, J.M., Ware, J.E., Schor, E., Davies, A.R., & Rossi-Roh, K. (1993b, June). Health profiles in children with psychiatric and other medical conditions. Paper presented at the World Congress of Psychiatry, Rio de Janeiro, Brazil.
Lewit, E.G., & Baker, L.G. (1994). Race and ethnicity changes for children. In R. Behrman (Ed.), The future of children: Critical health issues for children and youth (Vol. 3, pp. 134-144). Los Angeles: The Center for the Future of Children, the David and Lucile Packard Foundation.
McDowell, I., & Newell, C. (1987). Measuring health: A guide to rating scales and questionnaires. New York: Oxford University Press.
McErlain, S., & Gaffney, (1997). The CHQ-PF50: A new health instrument to assess child health outcomes. Internal report prepared for the Eastern Health and Social Services Board, Northern Ireland.
McGrath, M., Buckstein, D.A., Buchner, D.A., Guzman, G.L., Landgraf, J.M., & Goss, T.F. (1998). Assessment of the relationship between disease severity and general and disease-specific health-related quality of life in pediatric asthma patients. Manuscript submitted for publication.
McHorney, C., Kosinski, M., & Ware, J.E. (1994). Comparisons of the costs and quality of norms for the SF-36 Health Survey collected by mail versus telephone interview: Results from a national survey. Medical Care, 33, 15-28.
Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates.
Nunnally, J.C., & Bernstein, I.R. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
Omnibus Reconciliation Act. (1989). Public Law 101-239.
Powers, P., Abetz, L., & Landgraf, J.M. (1996, August). Adolescents with cystic fibrosis: Mother, father and child reports of health. Paper presented at the 104th Annual Convention of the American Psychological Association, Toronto, Canada.
Raat, H., Sturmans, F., Landgraf, J.M., Bonsel, G.J., & Gemke, R.J.B.J. (1998, August). Health status assessment with the Child Health Questionnaire in a Dutch population of children (5-13 years). Paper presented at the XXII International Congress of Pediatrics, Amsterdam, The Netherlands.
Rodary, C., LePlege, A., Kalifa, C., & Bernard, J.L. (1997). Adaptation in French of the Child Health Questionnaire (CHQ), and validation in children 10-18 years. Paper presented at the Ninth Annual Meeting of the International Association for Quality of Life Research, Vienna, Austria.
Sawyer, M., Antniou, G., Toogood, L., & Rice, M. (1998, June). Parent and child reports of the health-related quality of life in children treated for cancer. Paper presented at the International Workshop on Assessing Health-Related Quality of Life in Children With Cancer, Niagara-on-the-Lake, Ontario, Canada.
Silber, J.H. (1995). Enalapril after anthracycline cardiotoxicity (NHLBI, NIH 1R01HL50424-01). Philadelphia: Children's Hospital of Philadelphia, Division of Oncology.
Speechly, K.N., Maunsell, E., Desmeules, M., Schanzer, D., Landgraf, J.M., Feeny, D.H., & Barrera, M.E. (in press). Mutual concurrent validity of the Child Health Questionnaire and the Health Utilities Index: An exploratory analysis using survivors of childhood cancer. International Journal of Cancer.

Starfield, B. (1974). Measurement of outcome: A proposed schema. Milbank Memorial Fund Quarterly, 52, 39-50. Starfield, B. (1987). Child health status and outcome of care: A commentary on measuring the impact of medical care on children. Journal of Chronic Disease, 40, 109S-115S. Walker, D.K., & Richmond, J.B. (Eds.). (1985). Monitoring child health in the United States: Selected issues and policies. Cambridge, MA: Harvard University Press. Waters, E., & Landgraf, J.M. (1997, October). Measuring child health and well-being in a school-based sample of Australian parents and children. Paper presented at the Ninth Annual Meeting of the International Association for Quality of Life Research, Vienna, Austria. Waters, E., Wake, M., Landgraf, J., Wright, M., Ceccato, L., & Kesketh, K. (1998, August). Cross-cultural comparisons of health-related quality of life between Australian and U.S. children. Abstract presented at the XXII International Congress of Pediatrics, Amsterdam, The Netherlands. Waters, E., Wake, M., Landgraf, J., Wright, M., Kesketh, K., & Ceccato, L. (1998, August). Psychosocial and physical health status of Australian children: Child Health Questionnaire normative data. Abstract presented at the XXII International Congress of Pediatrics, Amsterdam, The Netherlands. World Health Organization. (1948). Constitution of the World Health Organization. Switzerland: Basic Documents.

PART III ADULT ASSESSMENT INSTRUMENTATION

Chapter 23
The SCL-90-R, Brief Symptom Inventory, and Matching Clinical Rating Scales

Leonard R. Derogatis
Kathryn L. Savitz
Clinical Psychometric Research, Inc. and Loyola College of Maryland

The SCL-90-R (Derogatis, 1977, 1994) is a 90-item self-report symptom inventory that evolved most directly from the Hopkins Symptom Checklist (HSCL; L.R. Derogatis, Lipman, Rickels, Uhlenhuth, & Covi, 1974a, 1974b). The HSCL has roots in both the Discomfort Scale (Parloff, 1954) and the Cornell Medical Index (Wider, 1948), and specific items shared by these scales can be traced back to the very first self-report symptom inventory, the Woodworth Personal Data Sheet (Woodworth, 1918). A prototype version of the SCL-90-R was first described in 1973 (L.R. Derogatis, Lipman, & Covi, 1973), and the final version of the instrument was completed 2 years later (L.R. Derogatis, 1975).

The inventory measures psychological symptoms and distress in terms of nine primary symptom dimensions and three global indices. The nine dimensions represent the constructs of Somatization, Obsessive-Compulsive, Interpersonal Sensitivity, Depression, Anxiety, Hostility, Phobic Anxiety, Paranoid Ideation, and Psychoticism. The global measures were designed to provide additional flexibility in measuring overall distress status; each provides a summary of global distress from a slightly different perspective. The globals are termed the Global Severity Index (GSI), the Positive Symptom Distress Index (PSDI), and the Positive Symptom Total (PST).

The SCL-90-R and its companion instruments in the series were developed to be utilized with an extensive range of respondents. The inventory may be validly employed with community respondents, medical patients, and various classes of psychiatric outpatients and inpatients. Currently, the SCL-90-R is available in over 26 languages, including English, French, Spanish, German, Dutch, Italian, and Russian.
Microcomputer scoring, administration, and interpretation programs are also available for the SCL-90-R.

The Brief Symptom Inventory (BSI; Derogatis, 1993; L.R. Derogatis & Melisaratos, 1983; L.R. Derogatis & Spencer, 1982) comprises 53 items and represents the brief form of the SCL-90-R. It was also completed in 1975, and reflects psychological distress in terms of the same nine symptom dimensions and three global indices as the longer test. The BSI was specifically designed for measurement situations in which the time constraints will not allow at least 15 minutes, the time typically required to complete the SCL-90-R. Scores on the SCL-90-R and the BSI are highly correlated, however,
and often the brief version of the test is preferred, even in the absence of time constraints. As with the SCL-90-R, the three global indices, nine principal symptom dimensions, and individual items reflect the three basic levels of clinical interpretation.

Although the SCL-90-R represents a significant psychological test in its own right, it also serves as the centerpiece of a series of matched, multimodality tests. The primary advantage of a multimodality approach is that it enables assessment of clinical status via both self-report and expert clinical judgment with comparable inventories and rating scales. To achieve this goal, several "companion" clinical rating scales to the SCL-90-R/BSI have been developed.

The Derogatis Psychiatric Rating Scale (DPRS) is a multidimensional clinical rating scale designed to be the clinician's version of the SCL-90-R. The first nine dimensions of the DPRS match the nine symptom constructs of the SCL-90-R. Eight additional dimensions, judged integral to valid clinical interpretation but not amenable to reliable self-report, also comprise the scale. A brief form of the DPRS (termed the Brief Derogatis Psychiatric Rating Scale, BDPRS) is also available; it consists of only the nine matching SCL-90-R/BSI symptom constructs. The DPRS and BDPRS are designed to be used by experienced clinicians trained in psychiatric nosology and psychopathology.

The SCL-90 Analogue Scale is a second companion scale to the SCL-90-R. It is designed for health professionals (e.g., physicians, nurses, social workers, lay interviewers) who have not received specific training in psychopathology and psychiatric nosology. It is a graphic, or analogue, scale representing the nine primary symptom dimensions of the SCL-90-R along 100-millimeter lines, extending from "not-at-all" at the minimum distress point to "extremely" at the maximum. Any of the three companion clinical observer's scales may be used in conjunction with either the SCL-90-R or the BSI.
Normative Samples

There are currently four formal norms for the SCL-90-R and BSI: psychiatric outpatients, community nonpatients, psychiatric inpatients, and community adolescents (L.R. Derogatis, 1994). All norms for the SCL-90-R are gender-keyed, which means that separate norms exist for males and females for both dimension and global scores. Gender-keying represents an important normative refinement when attributes involving emotional expression or psychological distress are being assessed because of well-established gender differences in reporting emotional distress.

The psychiatric outpatient norm for the SCL-90-R (Norm A) is based on 1,002 heterogeneous outpatients who presented for treatment at the outpatient psychiatry departments of four major teaching hospitals located in the east and midwest. The sample was comprised of 425 males and 577 females, was approximately two thirds white, and slightly more than 63% of the sample arose from social class Categories IV and V. Detailed demography of this and all other published norms may be found in the Administration, Scoring and Procedures Manual for the SCL-90-R (L.R. Derogatis, 1977, 1994).

The community nonpatient norm (Norm B) was established on a cohort of 973 individuals who represent a stratified random sample from a diversely populated county in a major eastern state. Demography is not as complete for this sample as it is for the others. Gender was evenly split, with 50.7% of the sample being male and 49.3% female. The racial composition of the sample was predominantly white (84.6%), with 11.6% Black. Other racial groups comprised 1.6% of the sample. A slight majority of the sample were married, and the mean age of the cohort was 46 years.

The psychiatric inpatient norm (Norm C) is based on a sample of 423 individuals who were a heterogeneous group of patients from the psychiatric inpatient services of three major eastern hospitals. Almost two thirds of the sample were female, with 55.7% being white and 43.6% being Black. About 45% were single and 26.1% were married. Almost 80% of these patients arose from social class Categories IV and V, and their mean age was 33.1 years.

The adolescent community norm (Norm E) is based on 806 adolescents who were enrolled in two distinct midwestern schools. Females comprised approximately 60% of the sample, which was almost exclusively white. Social class position was predominantly "middle class," with relatively small representations from adjacent "working" (Class IV) and "upper middle" (Class II) groups. Age ranged from 13 to 18, and was normally distributed around the mean age of 15.6 years.

Reliability and Validity

Reliability

Reliability essentially pertains to the consistency or replicability with which an instrument measures the characteristic(s) under observation. It is the converse of measurement error, and represents the proportion of variation in any measurement that is due to systematic variation of the attribute under study (e.g., intelligence, depression, impulsivity) as opposed to variance due to random or systematic error. Two formal types of reliability estimates are available for the symptom dimensions of the SCL-90-R: internal consistency and test-retest. The former serves to reflect the homogeneity of the item sets developed to represent each symptom construct; test-retest reliability is much more a measure of temporal stability, or score consistency across time. Internal consistency coefficients for the nine dimensions were calculated from the data of 209 "symptomatic volunteers" (L.R. Derogatis, Rickels, & Rock, 1976) in the form of coefficient alpha (α).
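As a rough illustration of the internal consistency statistic just named, coefficient alpha can be computed from a respondents-by-items score matrix as k/(k - 1) times (1 minus the sum of the item variances divided by the variance of the total score). The sketch below assumes complete data with no missing items; the function name and the sample ratings are hypothetical and are not drawn from any SCL-90-R sample.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of respondents, each a list of k item scores.

    Assumes every respondent answered every item (no missing data).
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the summed (total) score across respondents.
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: four respondents, three items of one dimension.
ratings = [
    [2, 3, 3],
    [1, 1, 2],
    [0, 1, 0],
    [4, 3, 4],
]
print(round(cronbach_alpha(ratings), 3))
```

Because the same variance estimator is used in the numerator and the denominator, choosing the population or the sample variance gives the same alpha.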
Coefficient alpha treats within-form correlations among the items as analogous to correlations between alternate forms, and makes the assumption that the average correlation among actual items is equivalent to the correlation among items in the hypothetical alternate form (Nunnally, 1970). Coefficients in this assessment were quite satisfactory, ranging from a low of .77 for Psychoticism to a high of .90 for Depression. Internal consistency coefficients for the SCL-90-R were also developed more recently by L.M. Horowitz, Rosenberg, Baer, Ureno, and Villasenor (1988) based on 103 outpatients presenting for psychotherapy. Coefficient alphas in that study ranged from a low of .84 for Interpersonal Sensitivity to a high of .90 for Depression (see Table 23.1).

The test-retest coefficients presented in Table 23.1 arose from a sample of 94 heterogeneous psychiatric outpatients who presented for evaluation and treatment at the psychiatric outpatient department of a major eastern teaching hospital. One week elapsed between testings, and as is clear from the sizes of the coefficients, the SCL-90-R possesses highly acceptable test-retest reliability. Coefficients ranged from a low of .78 on Hostility to a high of .90 on the Phobic Anxiety dimension. All other stability coefficients fell in the mid .80s. In addition to these estimates of temporal stability, Horowitz et al. (1988) also evaluated the test-retest reliability of the SCL-90-R in their sample of 103 psychiatric outpatients. Even across 10 weeks, coefficients were very acceptable, with the coefficient
for the GSI reported as .84, and subscale coefficients ranging from a low of .70 for Obsessive-compulsive to a high of .83 for Paranoid Ideation.

TABLE 23.1
Internal Consistency and Test-Retest Reliability Coefficients for the SCL-90-R

               Internal Consistency (Coefficient α)    Test-Retest (rP)
Symptom        Derogatis     Horowitz et al.           Derogatis     Horowitz et al.
Dimension      (1977)a       (1988)b                   (1983)c       (1988)b
I. SOM         .86           .88                       .86           .68
II. O-C        .86           .87                       .85           .70
III. INT       .86           .84                       .83           .81
IV. DEP        .90           .90                       .82           .75
V. ANX         .85           .88                       .80           .80
VI. HOS        .84           .85                       .78           .73
VII. PHOB      .82           .89                       .90           .77
VIII. PAR      .80           .79                       .86           .83
IX. PSY        .77           .80                       .84           .77
GSI                                                                  .84

aN = 219 symptomatic volunteers.
bN = 103 psychiatric outpatients.
cN = 94 heterogeneous psychiatric outpatients with 1 week elapsed between tests.

Factorial Invariance

Factorial invariance is an important characteristic of multidimensional measurement, although many professionals are unfamiliar with the concept. In simplest terms, factorial invariance refers to the degree of dimensional constancy a factorial definition maintains during the alteration of significant respondent parameters (e.g., age, sex, race, social class). For example, if a factorially derived depression dimension is developed from a middle-age sample, will that same set of items stay correlated to provide a valid definition of depression among the elderly? Will the dimensional composition retain its invariance when moving from male to female samples? If the factorial compositions of test dimensions are significantly altered (i.e., are unreliable) across fundamental parameters, then the operational definition of the construct has limited generalizability and, therefore, validity. Invariance data have been reported for very few psychological tests, in large measure because such studies are rather demanding, and there is no a priori hierarchy of which parameters to investigate. There are invariance data on the SCL-90-R, however.
Factorial invariance coefficients (after Pinneau & Newhouse, 1964) were developed by L.R. Derogatis and Cleary (1977) contrasting male and female samples on the nine dimensions of the SCL-90-R. All dimensions revealed very acceptable levels of invariance between males and females with the exception of Paranoid Ideation, which showed only moderate consistency across gender.

SCL-90-R Validity

Two principal issues should be appreciated about the validation of psychological test instruments in general: the specificity of validity and the programmatic
nature of the validation process. The former issue refers to the fact that in order for the question "Is this test valid?" to have any scientific meaning, the conditional statement "For what purpose?" must be appended. Psychological tests are not valid measures in general, but like all other scientific measuring instruments, are valid for specific purposes and generally invalid for most others. The second issue focuses on the fact that in recent years psychometric theorists have increasingly stressed construct validity as the principal criterion for the validation of psychological tests and the assignment of meaning to them (Messick, 1975, 1981). The validation process, when accomplished successfully, involves an extensive program of experiments and analyses that are highly analogous to the steps necessary to prove a scientific theory. Data from predictive, content, convergent-discriminant, and other types of validation studies serve to contribute to the ultimate validation of the hypothetical construct(s) that the test serves to operationalize. The process of establishing the validity of a test is represented by a methodical series of studies that function to extend and redefine the limits of generalizability of the test as a definition of the construct.

Convergent-discriminant validation is a fundamental form of validity that essentially demonstrates that the measure of interest correlates substantially with separate measures of the same construct, and shows little or no correlation with measures of dissimilar constructs. L.R. Derogatis, Rickels, and Rock (1976) illustrated excellent convergent-discriminant validity for the SCL-90-R in a study contrasting its dimensions with those of the Minnesota Multiphasic Personality Inventory (MMPI). In addition to the standard MMPI clinical scales, the MMPI was also scored for the Wiggins (1969) content scales and Tryon's (1966) cluster scales.
Results illustrated that SCL-90-R dimensions had their highest correlations with like MMPI constructs in every case except Obsessive-compulsive, which has no directly comparable MMPI scale. Boleloucky and Horvath (1974) reported a comparable study comparing SCL-90-R dimensions to the dimensions of the Middlesex Hospital Questionnaire (MHQ). With the large majority of dimensions there was very good convergence between like scales, and good discrimination as well. Both of these studies are presented in greater detail in the SCL-90-R administration manual (L.R. Derogatis, 1994).

More recently, Koeter (1992) evaluated the convergent-discriminant validity of the anxiety and depression dimensions of the SCL-90-R in comparison with the General Health Questionnaire (GHQ). He concluded that both instruments showed good convergent and discriminant validity. Similarly, Wiznitzer et al. (1992) utilized Receiver Operating Characteristic (ROC) analysis to contrast the SCL-90-R with the Young Adult Self-report (YASR) and the GHQ-28. The SCL-90-R and the YASR performed at equivalent levels in this population, with both outperforming the GHQ-28. Choquette (1994) contrasted the depression dimension of the SCL-90-R with the BDI and DIS criteria in identifying clinical depression in alcoholic patients, and concluded that the SCL-90-R and the BDI performed comparably. This conclusion is similar to that of Moffett and Radenhausen (1983) working with a comparable population.

In a validation study also directly related to the construct validity of the instrument, L.R. Derogatis and Cleary (1977) cast the hypothesized dimensional structure of the SCL-90-R into a binary "hypothesis matrix" (i.e., each item was assigned a "1" for the factor it loaded on and a "0" for all others). Subsequently, data from the SCL-90-Rs of 1,002 psychiatric outpatients were factor analyzed and the solution was rotated toward the "target" matrix via the Procrustes method (Hurley & Cattell, 1962).
Rotations were also accomplished via normalized varimax procedures (Kaiser, 1958). Comparisons of both solutions cleanly matched the hypothesized dimensional structure of the SCL-90-R, with only the Psychoticism dimension showing some scatter.

A rigorous and systematic series of validation experiments, reflecting elements of concurrent, criterion-oriented, and construct validity for the SCL-90-R, was recently reported by Peveler and Fairburn (1990). They compared and correlated scores from the SCL-90-R with those from the Present State Examination (PSE; Wing, Cooper, & Sartorius, 1974), a clinician-administered, detailed, structured interview. Two distinct samples were utilized in the study: a sample of diabetics (n = 102) representing a chronic medical disease group, and a cohort of bulimics (n = 71) exemplifying patients with high levels of "neurotic" symptoms. Three distinct validation experiments comprised the study.

In the first investigation, the case-finding power of the SCL-90-R was evaluated via ROC and logistic regression analyses. In this experiment, the proficiency of the SCL-90-R to detect PSE-defined psychiatric "caseness" was evaluated. The instrument performed efficiently in each instance, with areas under the ROC curve (AUC) of .90 ± .03 in both cases. In the diabetic sample the optimum sensitivity was 88% with a specificity of 80%, whereas with the bulimic sample sensitivity was 76% with a specificity of 92%. Logistic regression analysis relating the GSI from the SCL-90-R to the probability of being a PSE-defined case also characterized the instrument favorably. Sensitivity among diabetics was 72%, and specificity was 87%; among bulimics, values were 77% and 91%, respectively.

These investigators also evaluated the validity of the global indices of the SCL-90-R as accurate measures of general severity of psychopathology by correlating them with global indices from the PSE. Across both samples, all coefficients were statistically significant, and ranged from approximately .60 to .82. In addition, the validities of the SCL-90-R subscales were tested by evaluating their capacities to predict the presence of PSE syndromes through discriminant function analysis.
Appropriate subscales were revealed in 12 of 14 instances in the diabetic sample, and 11 of 14 cases in the bulimic cohort. A further concurrent validation exercise was conducted with the depression subscale of the SCL-90-R by correlating it with two independent depression inventories, the BDI and the Asberg Rating Scale. Correlations were .80 and .81, respectively.

As noted previously, the type of validation of most interest to clinicians and researchers is the more tangible and pragmatic form, that is, "predictive," or more generally, "criterion-oriented" validity. Current estimates, including SCL-90-R: A Bibliography of Research Reports, 1975-1990 (L.R. Derogatis, 1990), suggest that there are now well over 1,000 published reports involving the SCL-90-R with criterion-oriented validation, demonstrating the breadth of sensitivity of the instrument. Therapeutic intervention studies evaluating treatments as diverse as meditation (Carrington et al., 1980), multicenter psychotherapy protocols (Shapiro & Firth, 1987), and numerous psychotropic drug trials (Ballenger et al., 1988; Noyes et al., 1984) all attest to the instrument's sensitivity to treatment-induced change. Characteristic SCL-90-R profiles for most major diagnostic groups have been established, including those for anxiety (Cameron, Thyer, Nesse, & Curtis, 1986), depression (Prusoff, Weissman, Klerman, & Rounsaville, 1980), panic disorder (Buller, Maier, & Benkert, 1986), and sexual dysfunctions (L.R. Derogatis, Meyer, & King, 1981). Such profiles have also been developed for recently delineated compound nosologic subtypes, for example, comorbid panic/depression (Wetzler, Kahn, Cahn, van Praag, & Asnis, 1990) and substance abuse (Steer, Platt, Ranieri, & Metzger, 1989).
In addition to many studies of this nature, the SCL-90-R has been utilized as a distress measure with most major medical illness groups (e.g., cancer, cardiovascular, diabetes, renal diseases) and specialized disorder groups (e.g., eating disorders, stress, chronic pain).
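The sensitivity and specificity figures cited in the case-finding studies above are derived from a two-by-two table that crosses the screening result with the criterion diagnosis. The following is a minimal sketch of that arithmetic; the function name and the counts are invented for illustration and are not the data of Peveler and Fairburn (1990) or of any other study cited here.

```python
def screening_metrics(tp, fp, fn, tn):
    """Basic screening statistics from a 2x2 table.

    tp: screen positive, criterion case      fp: screen positive, non-case
    fn: screen negative, criterion case      tn: screen negative, non-case
    """
    return {
        # Proportion of true cases the screen detects.
        "sensitivity": tp / (tp + fn),
        # Proportion of non-cases the screen correctly clears.
        "specificity": tn / (tn + fp),
        # Probability that a screen positive is a true case.
        "ppv": tp / (tp + fp),
        # Probability that a screen negative is a true non-case.
        "npv": tn / (tn + fn),
    }


# Invented counts for a sample of 100 respondents, 25 of them criterion cases.
metrics = screening_metrics(tp=22, fp=15, fn=3, tn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity describe the test against the criterion, whereas the predictive values also depend on how common the condition is in the group being screened, so the same cutoff can yield different predictive values in different populations.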

Interpretation of the SCL-90-R

The SCL-90-R was designed to be interpreted in terms of three distinct but related levels of information: global scores, dimension scores, and items. The optimal interpretation of the test protocol depends on integration of information from all three sources.

A significant advantage associated with the SCL-90-R, and one that more clinical instruments are adopting, is that test scores are standardized and reported in terms of area T-scores. Compared to linear T-scores, area scores possess considerable advantages in that they involve a normalizing area transformation of the raw score distribution. This translates into the capacity to make actuarial statements concerning test respondents' status relative to the norm(s), and the capability to place them accurately in a normative centile position. As an example, regardless of the specific symptom dimension under consideration, an area T-score of 60 will always place the respondent in the 84th percentile of the referent norm. Similarly, an area T-score of 70 will place the individual in the 98th percentile. This feature not only enables the clinician to make accurate comparisons between a patient's status and various norms of interest, but also enables meaningful comparisons within individual profiles (e.g., comparison of depression versus anxiety scores), and a more meaningful interpretation of therapeutic change. When linear T-scores are used, such comparisons are at best rough approximations unless the underlying raw score distribution is precisely normal.

Global Scores

The GSI represents the most sensitive single quantitative SCL-90-R indicator concerning the respondent's overall psychological distress status. It reflects information on both the number of distress manifestations individuals are experiencing, and the intensity level of their distress.
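As a brief aside on the area T-scores described above: because an area transformation maps percentile ranks onto a normal curve with a mean of 50 and a standard deviation of 10, the percentile for any area T-score follows from the standard normal distribution. The sketch below shows that relationship and one way a raw score might be converted against a norm sample. The mid-rank percentile convention, the clamping of extreme ranks, and the function names are assumptions made for illustration; published norms are applied through the manual's conversion tables, not through this code.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def percentile_from_area_t(t_score):
    """Percentile of the referent norm implied by an area T-score."""
    return 100 * _STANDARD_NORMAL.cdf((t_score - 50) / 10)


def area_t_from_raw(raw, norm_sample):
    """Area T-score for a raw score, given raw scores from a norm sample.

    Assumption: percentile rank uses the mid-rank convention, and ranks are
    clamped away from 0 and 1 so the normal deviate stays finite.
    """
    n = len(norm_sample)
    below = sum(1 for x in norm_sample if x < raw)
    equal = sum(1 for x in norm_sample if x == raw)
    p = (below + 0.5 * equal) / n
    p = min(max(p, 0.5 / n), 1 - 0.5 / n)
    return 50 + 10 * _STANDARD_NORMAL.inv_cdf(p)


print(round(percentile_from_area_t(60)))  # 84
print(round(percentile_from_area_t(70)))  # 98
```

A linear T-score, by contrast, is 50 + 10 × (raw - mean) / SD, which preserves whatever skew the raw distribution has; that is why its percentile meaning varies from one dimension to another.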
By comparison, the PSDI is designed to be more of a "pure" intensity measure, adjusted for numbers of symptoms. In addition, the PSDI can provide useful information as to the respondent's distress style, that is, whether a person is apt to be an "augmenter" who typically exaggerates distress, or a "minimizer" who is more likely to utilize understatement. The PST reveals the number of symptoms respondents have endorsed to any degree. It contributes to interpretation by conveying the "breadth" or array of symptoms that individuals are currently experiencing.

Although there are no formal "validity" scales on the SCL-90-R, the PST can also provide a coarse indication of whether or not respondents are attempting to consciously misrepresent their status. Concerning symptom suppression, PST scores of 3 or less for adult normal females and 2 or less for adult normal males are extremely uncommon, and should be viewed with some misgivings. On the question of augmentation, PST scores greater than 70 for females and greater than 65 for males are rarely observed as valid scores outside of psychiatric inpatient populations. Although crude indicators, these values can be useful in identifying individuals in the community population with extreme response styles.

Dimension Scores and the SCL-90-R Profile

A major advantage associated with the use of the SCL-90-R resides in the fact that, in spite of its relative brevity, it delivers a multidimensional symptom profile, which contains a great deal of information concerning the respondent's symptomatic distress. Multidimensional measurement significantly enhances the breadth of clinical assessment
compared to unidimensional measurement by providing a syndromal context within which specific dimensional psychopathology may be more meaningfully evaluated. It delivers a symptomatic context against which to evaluate scores on any specific symptom dimension. In conjunction with global scores and data on specific symptoms, it enhances the development of an integrated picture of the respondent's clinical status and level of well-being.

Individual Symptoms

The third element in the interpretive strategy underlying clinical evaluation with the SCL-90-R involves the use of the discrete items or symptoms of the test. Not only is reference made here to the 83 items comprising the 9 primary symptom dimensions, but also the 7 additional or configural items of the test. For example, an elevated depression score plus a substantial score on suicidal ideation (Item 15) should be interpreted differently than the same depression score in the absence of evidence of suicidal ideation. In such an instance, suicidal ideation would be treated as a "symptom of note," the presence of which would clearly alter the clinical decision process. As another example, clinical levels of depression combined with early morning awakening (Item 64), loss of interest (Item 32), and high levels of guilt (Item 89) may signal the emergence of a major affective disorder. The same depression score with a different pattern of accompanying symptoms could be interpreted quite differently.

The configural items are not pure or univocal symptoms of any specific dimensional construct; they are designed to enhance specific predictions concerning the respondent's clinical status. They represent clinically significant symptom manifestations that are not unique to any of the SCL-90-R's primary symptom dimensions. Sleep and appetite problems, for example, are important general clinical manifestations.
They do not occur solely in the context of a particular syndrome, but their presence in a particular case can be a significant aid in clinical decision making.

Caseness Criteria for the SCL-90-R

When the SCL-90-R or any other psychological inventory or rating scale is utilized in a screening paradigm, an operational definition of "caseness" must be established. The caseness criterion essentially refers to the numerical value, that is, a "cutoff" score, on a test indicator, at or above which the respondent is considered to be a "positive" or a case. The caseness criterion is a probabilistic value, chosen to maximize valid case identification (e.g., sensitivity and specificity), and minimize errors (i.e., false positives and false negatives). In psychiatric screening, it is difficult to develop a definitive caseness criterion value for a particular test, because other important parameters (e.g., gender, age, prevalence of the condition in the population being screened) can significantly affect the validity of any criterion value. Nevertheless, it is possible to establish a general criterion for caseness that has demonstrated generalizability across a range of populations and has proven useful in a number of screening contexts. Such a general caseness criterion value is given here for the SCL-90-R.

Although it is not possible in the context of this chapter to provide complete supporting data for the general caseness criterion given here, some information can be provided. This criterion has shown effectiveness in accurately discriminating individuals comprising the
normative community nonpatient cohort from those comprising the psychiatric outpatient sample. Further, in a multicenter epidemiologic study of the prevalence of psychiatric disorder in newly admitted cancer patients, the predictive value of a positive was 86% using this criterion (L.R. Derogatis et al., 1983). According to this definition, a case is defined by:

GSI T-score (Norm B) ≥ 63, or any two primary dimension T-scores ≥ 63

This operational definition states that if the respondent has a GSI score (using Norm B, the community nonpatient norm) greater than or equal to a T-score of 63, or any two primary dimension scores greater than or equal to a T-score of 63, then the individual shall be considered at high risk for a psychiatric diagnosis and, therefore, a case.

The BSI and Its Development

The Brief Symptom Inventory (BSI) is the brief form of the SCL-90-R. It is a 53-item self-report symptom inventory designed to assess the psychological symptom patterns of community respondents and psychiatric and medical patients. The BSI was designed to provide valid, multidimensional measurement of psychological distress in a brief (10-minute) assessment period. The instrument's 53 items were selected to best reflect the nine primary symptom dimensions of the SCL-90-R in a brief measurement scale. The BSI also shares the three global indices of distress associated with the SCL-90-R: the Global Severity Index (GSI), the Positive Symptom Distress Index (PSDI), and the Positive Symptom Total (PST). As with the SCL-90-R, the global indices were designed to provide more flexibility in overall assessment of psychopathologic status, as well as psychometric appraisal at a third, more global level of psychological well-being.

As with the SCL-90-R, the BSI is scored and profiled in terms of its nine primary symptom dimensions and the three global indices of distress. The three global indicators, nine dimensions, and 53 items reflect the three principal levels of interpretation of the BSI, descending from general, global measures of psychological status, through dimensional syndromes, to individual symptoms.

BSI Norms

At present, four major norms derived from four distinct normative samples have been developed for the BSI: 974 community nonpatient adult subjects, 1,002 heterogeneous psychiatric outpatients, 423 psychiatric inpatients, and 2,408 community adolescent respondents.
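Because the caseness rule and the global indices both depend on scores referred to these norms, a short sketch may help make the scoring logic concrete. The caseness function follows the operational definition stated earlier (a GSI T-score of 63 or more, or any two primary dimension T-scores of 63 or more, using Norm B). The global index formulas are an assumption based on the usual scoring convention for these instruments, with items rated 0 to 4: GSI as the mean item rating, PST as the count of nonzero items, and PSDI as the rating sum divided by the PST. They are not spelled out in this chapter, the function names are hypothetical, and missing-item handling and the raw-to-T conversion (which requires the published norm tables) are deliberately left out.

```python
def global_indices(responses):
    """Return (GSI, PSDI, PST) from item ratings on an assumed 0-4 scale."""
    total = sum(responses)
    pst = sum(1 for r in responses if r > 0)       # symptoms endorsed at all
    gsi = total / len(responses)                   # breadth and intensity combined
    psdi = total / pst if pst else 0.0             # intensity among endorsed items
    return gsi, psdi, pst


def is_case(gsi_t, dimension_t, cutoff=63):
    """General caseness rule applied to Norm B area T-scores.

    gsi_t: T-score for the GSI.
    dimension_t: mapping of primary dimension name to T-score.
    """
    elevated = sum(1 for t in dimension_t.values() if t >= cutoff)
    return gsi_t >= cutoff or elevated >= 2


# Hypothetical profile: GSI below the cutoff, two dimensions at or above it.
profile = {"DEP": 66, "ANX": 63, "SOM": 52, "HOS": 48}
print(is_case(gsi_t=58, dimension_t=profile))
```

Keeping the cutoff as a parameter reflects the point made earlier that gender, age, and prevalence in the screened population can shift the most useful criterion value.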
Detailed information on the BSI formal norms is available in the BSI: Administration, Scoring and Procedures Manual (3rd ed.; Derogatis, 1993). In addition, Hale, Cochran, and Hedgepeth (1984) independently published norms for the elderly on the BSI, Cochran and Hale (1985) reported norms for American college students, and Canetti, Shalev, and DeNour (1994) reported norms for Israeli adolescents. The community adult nonpatient sample is essentially identical to the sample upon which norms for the SCL-90-R were developed, and forms the basis for the B norm on the BSI. The sample comprised 494 males and 480 females, and although data on race, marital status, and age were recorded, detailed demography on other variables was not available.

More complete information was available for the sample of psychiatric outpatients. There were 425 males and 577 females in this cohort, which was approximately two thirds white. Social class was skewed toward the lower end of the socioeconomic scale among the outpatients, with about 35% of the sample from Hollingshead Class III or above. This sample forms the basis for the A norm of the BSI.

Although smaller than the other samples, the psychiatric inpatient cohort also has detailed demographic information available. Females outnumbered males approximately 2 to 1, and slightly more whites than Blacks were involved. Approximately 45% of the inpatients were never married, whereas only 20% of them were from social Class III or above. This group is the basis for the C norm of the BSI.

The fourth BSI normative group formed the basis for the community adolescent nonpatient norm. Males outnumber females approximately 2 to 1 in this cohort, which was derived from six separate schools in two states. The racial composition of this sample is 58% white, 30% Black, and 12% other. Social class is modally distributed in the working class group; however, there is reasonable representation in the other socioeconomic groups as well. Age in the cohort ranged from 13 to 19, with a mean of 15.8 years. This sample formed the basis for the E norm.

BSI Reliability and Validity

Reliability

Internal consistency reliability coefficients were established based on a sample of 719 psychiatric outpatients, using Cronbach's coefficient alpha (α). The alpha coefficients for all nine dimensions of the BSI ranged from a low of .71 on the Psychoticism dimension to a high of .85 on Depression. Other investigators have independently reported internal consistency coefficients in a comparable range for the BSI (Aroian & Patsdaughter, 1989; Croog et al., 1986). Test-retest reliability is an indicator of the consistency of measurement across time.
Once established, psychological distress or psychopathology tends to endure for moderate to substantial periods of time if untreated; therefore, a test designed to measure symptomatic distress should register high test-retest coefficients over a span of 2 weeks. A sample of 60 nonpatient individuals was tested across a 2-week interval. Coefficients ranged from a low of .68 for Somatization to a high of .91 for Phobic Anxiety. The Global Severity Index also revealed an excellent stability coefficient of .90, providing assurance that the BSI represents consistent measurement across time. Internal consistency and test-retest reliability coefficients for the nine primary symptom dimensions and three global indices of the BSI are represented in Table 23.2.

Another form of reliability that is frequently discussed concerning psychological tests is alternate forms reliability. This form of reliability is reflected in the correlation between score distributions from two different forms of a test. Although there is no pure alternate form of the BSI, the SCL-90-R represents a test that measures identical symptom constructs. Based on a sample of 565 psychiatric outpatients, the correlations between the two tests across the nine primary symptom dimensions they share were calculated and are represented in Table 23.3.
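The two kinds of coefficient discussed above can be illustrated with a short sketch. It shows the standard textbook formulas for Cronbach's alpha and for a Pearson test-retest correlation; the function names and the toy data are hypothetical and are not taken from the BSI manual.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for one dimension.

    scores: list of respondents, each a list of item scores
    (for example, the 6 Depression items rated 0-4).
    """
    k = len(scores[0])  # number of items in the dimension
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def pearson_r(x, y):
    """Pearson correlation, used here as a test-retest coefficient
    between scores at time 1 (x) and time 2 (y)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy data: 4 respondents, 3 items.
items = [[0, 1, 0], [1, 1, 2], [3, 2, 3], [4, 4, 3]]
print(round(cronbach_alpha(items), 2))

# Hypothetical dimension scores two weeks apart.
time1 = [0.3, 1.2, 2.5, 3.1]
time2 = [0.5, 1.0, 2.4, 3.3]
print(round(pearson_r(time1, time2), 2))
```

Alpha rises as items within a dimension covary more strongly, whereas the test-retest coefficient depends only on the stability of each respondent's score across the two administrations.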

TABLE 23.2
Internal Consistency and Test-Retest Reliability Coefficients for the Nine Primary Symptom Dimensions and Three Global Indices of the BSI

Dimension                                  Number of   Internal Consistency   Test-Retest
                                           Items       (α) (N = 719)          (r_tt)
I.    Somatization (SOM)                   7           .80                    .68
II.   Obsessive-Compulsive (O-C)           6           .83                    .85
III.  Interpersonal Sensitivity (I-S)      4           .74                    .85
IV.   Depression (DEP)                     6           .85                    .84
V.    Anxiety (ANX)                        6           .81                    .79
VI.   Hostility (HOS)                      5           .78                    .81
VII.  Phobic Anxiety (PHOB)                5           .77                    .91
VIII. Paranoid Ideation (PAR)              5           .77                    .79
IX.   Psychoticism (PSY)                   5           .71                    .78
Global Indices
Global Severity Index (GSI)                                                   .90
Positive Symptom Distress Index (PSDI)                                        .87
Positive Symptom Total (PST)                                                  .80

TABLE 23.3
Correlations Between Symptom Dimensions of the SCL-90-R and the BSI Based on 565 Psychiatric Outpatients

SOM   O-C   INT   DEP   ANX   HOS   PHOB  PAR   PSY
.96   .96   .94   .95   .95   .99   .97   .98   .92

The data demonstrate very high correlations between the BSI and the SCL-90-R on all nine symptom dimensions. At least for a psychiatric population, the two tests show high agreement across the nine symptom constructs.

BSI Validity

A comprehensive review of criterion-oriented validity studies involving the BSI was recently provided by L.R. Derogatis (1993). Approximately 120 research reports on the BSI were reviewed involving substantive areas such as screening/case-finding, oncology, sexual disorders, psychoneuroimmunology, psychopathology, pain assessment/management, therapeutic interventions, HIV research, hypertension, student mental health, and various general clinical areas. In addition, L.R. Derogatis and M.F. Derogatis (1996) also published a comprehensive review of research with both the SCL-90-R and the BSI. These studies collectively demonstrate the BSI to be broadly sensitive to the manifestations of psychological distress and interventions designed to ameliorate it across a broad range of contexts.
To illustrate the BSI's sensitivity to psychological distress status, several of the more exemplary of these studies are briefly reviewed next.

Evidence for the BSI's sensitivity in a screening paradigm is given by a recent report contrasting several methods for the psychosocial screening of newly diagnosed cancer patients (Zabora, Smith-Wilson, Fetting, & Enterline, 1990). These investigators reported an 84% "hit rate" for the BSI in identifying patients who were determined by independent criteria to be suffering from clinical levels of distress, both at time of initial diagnosis, and subsequently at 1-year follow-up. Additionally, a comparative cost benefits analysis resulted in a strong recommendation for the BSI. Gift (1991) also reported on the sensitivity of BSI subscales to differential respiratory status in a sample of adult asthmatics. In an attempt to determine the underlying causes of episodes of dyspnea (difficulty breathing) in these patients, Gift utilized the BSI and measured airway obstruction and oxygen saturation during periods of high and low dyspnea. Significant elevations were noted on Anxiety, Depression, Somatization, and Hostility scales during periods of high dyspnea. Thompson, Gallagher, and Breckenridge (1987) demonstrated high sensitivity for the BSI in a study of treatment-induced change. These investigators compared the relative efficacy of three distinct psychotherapies in applications with depressed elderly patients. Although no substantial differences were observed between treatments, the BSI showed significant reductions in psychological distress for all three interventions across time, a finding that supported an alternate hypothesis. Finally, in an interesting study reported by Chiles, Benjamin, and Cahn (1990), the BSI was utilized with a random sample of 802 members of the Washington State Bar Association to contrast the psychological distress levels of smokers versus nonsmokers. Results showed that among male members of the bar, almost all BSI subtests revealed smokers to be significantly more highly distressed than nonsmokers. 
The Somatization, Anxiety, and Depression scales made the greatest contribution to discrimination, with the highly distressed group also showing significantly greater alcohol use. No comparable differences were observed for females.

BSI Interpretive Strategy

Strategies for interpreting the BSI are, for the most part, identical to those outlined earlier for the SCL-90-R. BSI test scores are standardized in terms of area T-scores, and formal gender-keyed norms are available for the same four normative groups. General caseness criteria are also essentially identical for the BSI, and clinical interpretation is based on integration of information from the same three levels of data as the SCL-90-R, although minor differences do exist at the individual symptom level.

The SCL-90-R and the BSI in Treatment Planning

In order for a psychological test instrument to be useful in treatment planning, it must be sensitive (e.g., possess predictive validity) to three distinct but related aspects of patient status: patient status at initial evaluation, treatment-induced change, and patient status posttreatment and during any maintenance regimens. Obviously, before an effective treatment plan can be developed, a clinician must know as much as possible about the nature and magnitude of the patient's presenting condition. Diagnostic interviews, medical records, psychological testing, and interviews
with relatives all represent sources of information that facilitate the development of an effective treatment plan. Rarely is information from a single modality (e.g., psychological testing) definitive. Ideally, each source provides an increment of unique information that, taken collectively with data from other sources, contributes to an ultimate understanding of the case at hand. Although seldom conclusive, a substantial array of pertinent information can be developed from an initial psychological assessment. At a minimum, the degree or magnitude of a patient's psychological morbidity or distress should be discernible. Although diagnosis per se is usually not possible from point-in-time assessments, the basic nature (e.g., depression, panic attacks) and profile of the client's distress should be appreciable, as well as salient symptom characteristics (e.g., suicidal ideation, early morning insomnia). Degree of divergence from normative levels and the potential for effective intervention (i.e., prognosis) are also characteristics that would ideally be estimable at initial evaluation. Initial assessment of patient status should communicate the magnitude, nature, and pervasiveness of a patient's presenting condition, and some estimate of the likelihood of successful therapeutic intervention. Although an ideal, information on the relative probabilities of success associated with different therapeutic approaches would also be very useful. Treatment-induced change has both common (to all therapeutic modalities) and specific components. When used for treatment planning, psychological test instruments should be sensitive to both kinds of effects. Also, depending on the nature, pervasiveness, and chronicity of the condition, and the coping resources available to the patient, the degree of benefit anticipated from a course of "effective" treatment can vary widely. 
The ideal planning instrument should be capable of not only registering the influence of such factors, but should be sensitive across each factor's effective range of values. Patient status at the termination of treatment represents a critical assessment. It reflects the basic evidence of the delivery of treatment and its relative efficacy, at least in terms of the indicators being measured. Meaningful decision making concerning clinical status at treatment termination requires not only adequate clinical and community norms for a test, but ideally, a mechanism that would define what constitutes a "therapeutic," or clinically reliable, magnitude of change. Such a method would communicate how much change is considered clinically meaningful given the patient's baseline status, and whether or not it is possible to be confident that this change is clinically as well as statistically significant. Recently, an effective approach to this long-standing question has been advanced by Jacobson and his colleagues (Jacobson & Truax, 1991), a method discussed in detail in another section. A realistic concept of effective treatment must also be grounded in multiple domains of experience (e.g., vocational performance, interpersonal relationships, reduction of symptomatic distress, family relationships). Currently, few test instruments adequately reflect all relevant domains, reminding clinicians that most tests address only limited aspects of treatment efficacy and effectiveness. Bearing these caveats in mind, the next section is devoted to reviewing a series of recent reports demonstrating the treatment planning relevance of symptom distress data as measured by the SCL-90-R/BSI. An extremely diverse range of respondents are represented in these studies, illustrating the general utility of these brief tests across a broad array of clinical contexts. 
Consistent with the interpretive strategy outlined previously, information from all three levels of interpretation (i.e., global score, dimension score, and item score) has been utilized in contributing to treatment planning.
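To make the idea of a "clinically reliable" magnitude of change concrete, the following is a minimal sketch of the reliable change index as it is commonly attributed to Jacobson and Truax (1991). It assumes scores are expressed as T-scores (standard deviation of 10 in the normative sample) and borrows the GSI test-retest coefficient of .90 from Table 23.2; the patient values and function names are hypothetical.

```python
from math import sqrt


def reliable_change_index(pre, post, sd_baseline, reliability):
    """RCI = (post - pre) / S_diff, where
    SE     = sd_baseline * sqrt(1 - reliability)   (standard error of measurement)
    S_diff = sqrt(2 * SE**2)                        (standard error of the difference)
    """
    se_measurement = sd_baseline * sqrt(1 - reliability)
    s_diff = sqrt(2 * se_measurement ** 2)
    return (post - pre) / s_diff


def is_reliable_change(rci, z_crit=1.96):
    """Change larger than z_crit standard errors is unlikely (p < .05)
    to be due to measurement error alone."""
    return abs(rci) > z_crit


# Hypothetical patient: GSI T-score falls from 70 at intake to 58 at termination.
rci = reliable_change_index(pre=70, post=58, sd_baseline=10, reliability=0.90)
print(round(rci, 2), is_reliable_change(rci))  # -2.68 True
```

Because the index scales the raw change by the instrument's measurement error, a less reliable scale requires a larger raw change before the change is counted as reliable.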

Information on overall severity, pattern of distress/psychopathology, and the specific symptom picture may all be employed in the development of a treatment strategy.

Treatment Planning for Psychotherapy

For obvious reasons, treatment planning focused on psychotherapeutic interventions with psychiatric patients is of major interest to modern health care systems. Crits-Christoph (1992) completed a meta-analysis of 11 contemporary studies evaluating the efficacy of brief dynamic psychotherapy. The instrument chosen to register the "general level of psychiatric symptoms" in the meta-analysis was the SCL-90-R. Using J. Cohen's (1977) d statistic, the SCL-90-R revealed large treatment effects for brief dynamic psychotherapy when compared to waiting list control (d = .82), small effects (d = .20) when contrasted with alternative nonpsychiatric treatments, and equal effects compared to other psychotherapeutic approaches (d = .05). Crits-Christoph concluded, as have other investigators in this area, that various psychotherapies do not differ significantly in effectiveness. In a separate evaluation of brief dynamic psychotherapy, M.J. Horowitz, Wilner, Kaltreider, and Alvarez (1980) used the SCL-90-R to assess symptomatic distress during a 10-week waiting period followed by 20 sessions of active treatment. Results showed a slight but observable reduction in distress levels during the 10-week waiting period, followed by a dramatic reduction in distress during the first 10 psychotherapy sessions. Further reductions in distress were not noted in Sessions 10 through 20, probably because mean distress levels had reached the margins of the normal range by Session 10 and were unlikely to drop much further. As has been inferred by previous investigators, Horowitz et al. concluded that symptomatic improvement appears to take place predominantly during the earlier phases of treatment. Using a somewhat different design, Winston et al.
(1991) evaluated the efficacy of two distinct variations of brief dynamic psychotherapy in a sample of patients with nonacting-out personality disorders. Patients were randomly assigned to one of the two treatment paradigms on a once-per-week treatment schedule with a maximum of 40 sessions. Results showed a significant reduction in distress on the GSI and four of the primary symptom dimensions (Anxiety, Phobic Anxiety, Depression, and Psychoticism) from admission to termination for both treatments compared to controls. Moderate to large effect sizes were associated with these differences. In addition, Anxiety and Phobic Anxiety significantly discriminated one intervention group from controls, whereas Depression and Psychoticism distinguished the second therapeutic approach. The two treatment groups were not significantly different from each other on any measures. Several studies with the SCL-90-R and the BSI have evaluated alternatives to dynamic psychotherapies. Fairburn and associates (1991) compared two variations of cognitive behavior therapy and interpersonal therapy in a cohort of 75 patients diagnosed with bulimia nervosa. The SCL-90-R revealed significant efficacy for all three treatment approaches from baseline to treatment termination; however, no significant differential treatment effects were observed for any of the three treatments. The predominant symptomatology in this sample was depressive in nature, and mean severity of distress (i.e., GSI scores) dropped from the 98th percentile to the 85th percentile of the community female norm during treatment. In a similar treatment design with elderly depressed patients, Thompson et al. (1987) used the BSI to evaluate differences in therapeutic outcome between cognitive, behavioral, and brief psychodynamic psychotherapies.
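The d values cited for the Crits-Christoph meta-analysis are standardized mean differences. As a minimal sketch under the usual pooled-standard-deviation definition (the group data and function name below are hypothetical, not taken from any of the studies reviewed), d can be computed as follows. By Cohen's conventions, values near .20, .50, and .80 are read as small, medium, and large, which matches the labels used above for d = .20 and d = .82.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference: (mean_a - mean_b) / pooled SD."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical posttreatment GSI raw scores (higher = more distress).
waiting_list = [1.4, 1.1, 1.6, 0.9, 1.3]
treated = [0.8, 0.7, 1.2, 0.5, 0.9]
print(round(cohens_d(waiting_list, treated), 2))
```

With the control group entered first, a positive d indicates lower distress in the treated group.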

All three therapeutic approaches showed significant reductions on Anxiety and Depression as well as the GSI compared to controls; however, there were no significant therapeutic differences. More recently, Shear, Pilkonis, Cloitre, and Leon (1994) used the SCL-90-R to compare cognitive-behavioral treatment (CBT) and a nonprescriptive treatment (NPT) involving reflective listening to determine the efficacy of each modality in treating panic disorder. Both CBT and NPT were shown to be effective following 12 weeks of therapy as well as 6 months posttreatment, wherein treatment gains were still maintained. Results indicated strong similarities between CBT and NPT, challenging whether CBT strategies are as modality specific as assumed or if they are generalizable to other treatment methods. In contrast, Kabat-Zinn et al. (1992) demonstrated significant efficacy for a meditation-based group stress reduction program with patients suffering from generalized anxiety disorder and panic disorder. Those patients whose GSI were above the 70th percentile on the SCL-90-R community nonpatient norms were observed to benefit disproportionately from this treatment intervention. Recently, Beutler et al. (1991) employed the BSI in a comparison of the relative efficacy of group cognitive therapy, focused expressive psychotherapy, and supportive self-directed therapy in the treatment of major depressive disorder. Patient coping style (i.e., externalization/internalization, high/low defensiveness) was introduced in the design as a mediating variable. Study results had substantial implications for treatment planning in that very significant interactions were observed between improvement (i.e., reduced BSI scores), type of therapy, and the patient's predominant coping style. 
Relating more to psychotherapy process, Pekarik (1983) found the BSI to effectively discriminate among patients who gave different reasons for dropping out of outpatient psychotherapy, and Gilbar and Kaplan-DeNour (1988) used the BSI to demonstrate elevations in psychological distress levels among cancer patients who dropped out of cancer chemotherapy treatment versus those who remained in treatment until completion.

Treatment Planning in Anxiety and Depressive Disorders

For an instrument to be optimally useful in treatment planning it must be sensitive not only to psychotherapeutic effects, but to variations in psychopathology and other therapeutic influences as well. Waryszak (1982) published an interesting prospective study on symptomatic distress in a sample of Australian psychiatric inpatients. Using the SCL-90-R, he evaluated patients at admission to the unit, and at 1 month and 4 months postdischarge. Findings showed that symptomatic distress was reduced significantly from admission to 1-month follow-up, and continued to show significant reductions at 4 months postdischarge. General severity levels dropped from approximately the 98th percentile to the 84th percentile (a full standard deviation) of the community norm during the period from admission to the 4-month follow-up. Concurrent measurement of social adjustment showed a much slower and less dramatic recovery over this period. In another prospective study, Wicki and Angst (1991) demonstrated in a community sample of young adults that individuals who subsequently went on to receive a formal diagnosis of hypomania revealed significant elevations in symptomatology 7 years previously on the SCL-90-R. Most subscales showed a heightened sensitivity in those ultimately receiving a diagnosis, with Paranoid Ideation and Interpersonal Sensitivity revealing the largest predictive effects.
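The percentile figures quoted in these studies can be related to standard deviation units and to T-scores with a short worked example. The sketch assumes that area T-scores map onto the normal curve with a mean of 50 and a standard deviation of 10; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def percentile_to_z(percentile):
    """Normal-curve z value for a percentile rank (0-100, exclusive)."""
    return STD_NORMAL.inv_cdf(percentile / 100)


def t_to_percentile(t_score):
    """Percentile rank for an area T-score (mean 50, SD 10)."""
    return 100 * STD_NORMAL.cdf((t_score - 50) / 10)


# Drop from the 98th to the 84th percentile, in standard deviation units.
drop_in_sd = percentile_to_z(98) - percentile_to_z(84)
print(round(drop_in_sd, 2))  # 1.06

# Percentile rank of the T-score cutoff of 63 used in the caseness rule.
print(round(t_to_percentile(63), 1))  # 90.3
```

The first result agrees with the description of the Waryszak findings as a drop of about a full standard deviation, and the second shows that a T-score of 63 corresponds to roughly the 90th percentile of the reference norm.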

A study of untreated individuals with dysthymia by McCullough et al. (1994) revealed no significant changes in affective symptoms, severity of those symptoms, or global severity of symptoms as indicated by the SCL-90-R. As has been observed of dysthymia by previous investigators, this disorder is chronic and pervasive, and without effective treatment, psychological maladjustment is highly unlikely to remit. A comprehensive study of contrasting treatments of depression, using cognitive-behavioral (CB) versus psychodynamic-interpersonal (PI) therapy, was conducted by Shapiro et al. (1994). Patients were divided into low, moderate, and high severity levels of depression and were assessed post 8-week and 16-week sessions. Overall, patients demonstrated marked improvement of symptomatology, regardless of treatment modality; no significant differences were found between CB and PI treatments with regard to symptom improvement. There also were no significant interactions between severity of depression and treatment duration on the SCL-90-R Depression subscale. More recently, the relation between depression and somatic complaints in the elderly was investigated by Magni, Frisoni, Rozzini, De Leo, and Trabucchi (1996). Focusing on the BSI Depression scale, these researchers found a positive correlation between depression and the amount of pain reported. Furthermore, spontaneous reporting of somatic problems was more highly correlated with severity of depression than were reports of pain prompted by direct inquiry. These overall findings were observed only in elderly respondents with intact cognitive abilities, not in those with impaired mental faculties. In a dramatic demonstration of treatment-relevant differential sensitivity for the SCL-90-R, Rosenberg, Bech, Mellergard, and Ottosson (1991) used the SCL-90-R to discriminate various categories of panic patients with and without comorbid clinical depression.
The SCL-90-R demonstrated significant differences between patients diagnosed as having concomitant major depression, minor depression, and absence of mood disorder based on the Hamilton Rating Scale for Depression. In addition, the SCL-90-R also effectively discriminated between diagnostic categories of current major depressive episode, other mood disorder, and no mood disorder based on the Structured Clinical Interview for Diagnosis of DSM-III Disorders (SCID). Rosenberg et al. concluded that the data support a common diathesis for panic and mood disorders, with more severe examples of the condition being characterized by symptoms of both anxiety and depression. Vollrath, Koch, and Angst (1990) also recently reported on comparisons of patients with panic disorder versus those with panic and comorbid depression using the SCL-90-R. These investigators found that the Phobic Anxiety dimension, and to a lesser degree the Anxiety dimension, effectively discriminated these subgroups, with the panic/depression group revealing greater general severity and the indication of a more specific nosology.

The Suicidal Patient

A prominent issue in treatment planning concerns the reliable early identification of the potentially suicidal patient. Several recent studies have addressed this question using the SCL-90-R/BSI. Bulik, Carpenter, Kupfer, and Frank (1990) contrasted 67 patients suffering from recurrent major depression and a history of attempted suicide with 163 recurrent depressives without a history of suicidal behavior. Four subscales (Somatization, Interpersonal Sensitivity, Paranoid Ideation, and Psychoticism), as well as the global scores, significantly discriminated attempters from nonattempters. A logistic regression analysis with these and other variables enabled 77% correct prediction of cases.

There is increasing evidence (Coryell, 1988) that panic disorder, like a diagnosis of depression, carries an increased risk for suicide. In an analogous evaluation of panic patients who did and did not attempt suicide, Noyes et al. (1991) reported findings similar to those of Bulik and her colleagues (1990). Seven of the nine primary symptom dimensions of the SCL-90-R and the GSI successfully discriminated suicide attempters from those who did not make attempts. Like Bulik et al. (1990), these investigators found that patients who made suicide attempts had greater severity of distress in general, with particular elevations on measures of inferiority feelings and self-deprecation. L.J. Cohen, Test, and R.L. Brown (1990) employed the BSI among other measures to predict potential for suicide among schizophrenic patients being treated in a community treatment center. Eight of the 82 patients in the sample eventually committed suicide. In addition to greater dissatisfaction with their lives at the time of admission, these patients revealed significantly higher distress levels on the BSI. Swedo and her associates (1991) extended the predictive validity of the SCL-90-R relative to suicidal behavior to suicidal adolescents. They compared adolescents with a history of attempted suicide to adolescents judged to be at risk for suicide for a variety of reasons, and an adolescent control group. All SCL-90-R measures successfully distinguished the attempters from controls; the majority of subscales differentiated those at risk from controls; and the Obsessive-Compulsive subscore and the PSDI significantly discriminated the attempters from those at risk. Like their adult counterparts, adolescents who actually attempt suicide tend to perceive themselves as more distressed and hopeless than other adolescents who are at risk.

Trauma and PTSD

Several studies have recently evaluated the utility of the SCL-90-R and the BSI in identifying variables related to the onset or maintenance of posttraumatic stress disorder (PTSD). Shalev (1992) investigated the prevalence of PTSD in survivors of a terrorist attack on a civilian bus in Jerusalem. Approximately one third of those survivors were diagnosed with PTSD. Scores on the BSI of those individuals revealed high levels of obsessive symptoms, depression, anxiety, phobia, and hostility. In addition, GSI and PSDI scores corresponded to the BSI's norms for psychiatric outpatients. A follow-up assessment indicated an increase of avoidance symptoms and a decrease in intrusive symptoms over an 8- to 10-month period. Vietnamese boat refugees were examined by Hauff and Vaglum (1994) for chronic symptoms of PTSD following resettlement. Using the Vietnamese version of the SCL-90-R, responses to items associated with the Interpersonal Sensitivity, Hostility, and Psychoticism subscales distinguished those refugees with chronic PTSD from the remaining cohort without PTSD. It is important to note that responses to items on the Anxiety subscale were indistinguishable across groups. In addition, refugees with PTSD had been exposed to more traumatic stress prior to escape and exhibited more psychopathology than those without PTSD. From a different perspective, Beckham, Lytle, and Feldman (1996) examined caregiver burden in 58 partners of Vietnam war veterans with PTSD. Severity of PTSD was measured by the GSI of the SCL-90-R and was found to be significantly related to caregiver burden. Veterans whose caregivers reported higher levels of burden had substantially more severe symptoms of PTSD. An 8-month follow-up assessment revealed
changes in burden that were correlated with changes in caregivers' psychological distress and dysphoria.

Alcohol and Substance Abuse

Because of the relatively high prevalence of alcohol and substance abuse disorders and their prominent comorbidity with other psychiatric disorders, evidence of the utility of the SCL-90-R/BSI in treatment planning with these classes of patients is very important. Desoto, O'Donnell, Allred, and Lopes (1985) completed a very informative study on the recovery from alcoholism over time. They compared the symptomatology of 363 recovering alcoholics on the SCL-90-R across five temporal abstinence groups (< 6 mos.; 6 mos.-2 years; 2-5 years; 5-10 years; > 10 years). Results showed a slow but progressive reduction in symptomatic distress over the 10-year period (mean GSI = 1.04, 0.74, 0.56, 0.48, 0.37, respectively, for the five groups). Early during recovery, dramatic levels of distress were in evidence, followed by eventual reductions to normative levels. Normal levels of distress were not reached for 5 to 10 years, however. The most prominent elevations occurred on the Depression, Interpersonal Sensitivity, Obsessive-Compulsive, Psychoticism, and Anxiety subscales, with the symptom of guilt being predominant. Distress on these measures eventually fell to normal levels; however, the investigators noted a residual syndrome of cognitive dysfunction that remained present even after many years of abstinence. Alcoholism rarely occurs as a completely independent condition, thus it is important in developing optimal treatment strategies to identify subtypes of the disorder that have relevance for treatment course and outcome. Liskow, Powell, Nickel, and Penick (1991a) used both the SCL-90-R and the MMPI to discriminate four diagnostic subtypes among a sample of 360 male inpatient alcoholics. Twenty-nine percent of the sample was found to have a comorbid antisocial personality disorder (ASP).
These were further discriminated into ASP and alcoholism, ASP and alcoholism plus drug dependence, and ASP and alcoholism plus depression. The SCL-90R profiles for the four groups were highly discriminated, an important characteristic for treatment planning because these subtypes were observed to differ substantially in terms of onsets, severity, course of alcoholism, and pattern of medical complications. In a 1-year follow-up study, Liskow, Powell, Nickel, and Penick (1991b) observed that the ASP plus drug dependence subgroup showed the poorest rate of improvement, whereas the ASP plus depression subgroup showed substantial improvement. They concluded that the presence of additional drug problems in ASP alcoholics was a poor prognostic sign, and the presence of clinical depression indicated a high probability of successful treatment. In a study more oriented toward treatment evaluation per se, Dongier, Vachon, and Schwartz (1991) utilized the SCL-90-R to help evaluate the efficacy of bromocriptine as a treatment for alcohol dependence in an 8-week double-blind, randomized trial with ambulatory alcoholics. Results showed the SCL-90-R Interpersonal Sensitivity and Hostility subscales and all three global measures to significantly discriminate the bromocriptine versus placebo groups, with the Depression, Somatization, and Paranoid Ideation scales revealing marginally significant differences. Turning to drug dependency, a number of researchers have published studies recently with the SCL-90-R that have high relevance for treatment design. M.P. Carey, K.B. Carey, and Meisler (1991) demonstrated the dual impact of comorbid conditions in a study contrasting a heterogeneous sample of psychiatric patients who also abused drugs


with a matched sample of psychiatric outpatients with no history of drug abuse. The sample with additional drug abuse had significantly higher symptom distress scores on six of nine subscales and all three globals of the SCL-90-R.

Following on the work of Rounsaville, Glazer, Wilber, Weissman, and Kleber (1983), which showed a sensitivity of 89% for the SCL-90-R in detecting psychopathology among heroin addicts, Steer, Platt, Hendriks, and Metzger (1989) used modal profile analysis with Dutch and American cohorts of heroin addicts to identify three distinct subtypes based on the SCL-90-R: anxious-depressed, hostile, and paranoid. In addition to the observation that the paranoid subtype was much more likely to also use marijuana, they discussed a number of distinct treatment planning options that could hinge on the availability of this information. The same group of investigators (Steer, Platt, Ranieri, & Metzger, 1989) conducted a similar analysis of SCL-90-R data from 458 methadone patients. They observed the same three modal subtypes and, in addition, defined a fourth somaticizing subtype. The potential utility and impact on treatment planning of subtype membership in this group of chemical abusers were also discussed.

Demonstrating the instrument's sensitivity to differential levels of psychopathology in patients with substance abuse, Kleinman et al. (1990) administered the SCL-90-R to three distinct groups of cocaine abusers: those free of any additional DSM-III-R diagnosis, those with an additional DSM-III-R Axis II (personality disorder) diagnosis, and those with an additional DSM-III-R Axis I (clinical) diagnosis. Mean GSI scores for the three groups were .53, .65, and .87, respectively, illustrating high levels of discriminative sensitivity. More recently, M.E. Johnson, Brems, and Fisher (1996) compared psychopathology levels of drug abusers not receiving treatment with those in treatment. 
Using data from Mercier et al.'s (1992) treatment sample for comparison, they found that SCL-90-R scores in the treatment sample were significantly higher than those of the nontreatment sample on every scale except the PSDI. As predicted, drug abusers in treatment were found to be more symptomatic than those not in treatment, except on the Hostility and Paranoid Ideation scales, where the nontreatment group exhibited higher levels. Approximately 60% of male nontreatment abusers and 47% of female nontreatment abusers obtained GSI scores equal to or greater than the cutoff for "caseness" warranting a dual diagnosis. This study implies that the presence of a comorbid condition is associated with a greater likelihood that drug abusers will seek treatment.

Treatment Planning with Medical Patients

Psychological factors play a prominent role in the etiologies and courses of many medical conditions, an observation that has been historically well documented. Information on psychological status has infrequently been integrated into treatment plans for medical patients, however, in large measure because physicians in charge of these patients are unfamiliar with methods of psychological assessment and the interpretation of psychological test data. The studies cited next are a small sample of the research done with the SCL-90-R/BSI in medical cohorts, and indicate the potential value of brief measures of psychological distress for treatment planning in medical populations.

Johnstone et al. (1991) reported a differential response to treatment in two groups of cancer patients on standard treatment protocols (testis vs. Hodgkin's) comparable in prognosis and treatment intensity. Although both patient cohorts showed elevated SCL-90-R profiles at the beginning of treatment, Hodgkin's patients revealed a marked reduction in psychological distress at a 3-month follow-up evaluation. No comparable


reduction in distress was apparent among testis patients, even though they had been informed that their chances for survival were quite good. Interestingly, the partners of both patient groups showed a return to normal levels of psychological distress following treatment.

E.G. Levine, Raczynski, and Carpenter (1991) also used the SCL-90-R as a measure in a study of weight gain among breast cancer patients undergoing adjuvant treatment. They observed significant relations between a number of SCL-90-R measures and weight gain. Global measures of distress showed a positive relation to weight; however, both the Obsessive-Compulsive and Interpersonal Sensitivity subscales had significant negative coefficients in a regression equation.

Shain, d'Angelo, Dunn, Lichter, and Pierce (1994) investigated psychological adjustment in women diagnosed with breast cancer who had undergone either a mastectomy or conservative surgery (i.e., lumpectomy) plus radiation therapy. All patients were evaluated postoperatively as well as at three different follow-up periods: 6, 12, and 24 months. Neither group scored in the clinical range on any of the SCL-90-R subscales at any follow-up interval. There also were no significant differences between the groups on the nine major symptom dimensions or global measures of distress. Although there was evidence of emotional distress among both patient groups (especially the mastectomy group in the later assessment period), the symptoms exhibited were not of a clinical magnitude.

Tross et al. (1996) assessed the association between survival rate and psychological distress in women with stage II breast cancer. Patients were divided into high, medium, and low distress categories based on their SCL-90-R scores. After a 15-year follow-up period, GSI scores were not found to be significantly related to disease-free interval or overall survival across the three groups. 
Psychological distress levels were not effective predictors of survival rates in this patient cohort. Grassi and Rosti (1996) also examined psychological adjustment of survivors of long-term and advanced cancer, with sites including breast, stomach, lymph, kidney, and other types, over a 6-year period. From the time of initial assessment to their 6-year follow-up evaluation, patients' scores on the Interpersonal Sensitivity, Paranoid Ideation, and Psychoticism subscales were significantly lowered. Patients with a DSM-III-R diagnosis at follow-up were found to score significantly higher on the majority of SCL-90-R subscales compared to patients without a diagnosis. These findings suggest patients with early psychological maladjustment are more likely to sustain emotional difficulties later. Fricchione et al. (1992) evaluated psychological distress patterns among patients with end-stage renal disease who had been identified as high and low deniers. Significantly reduced scores were in evidence among high deniers compared to low deniers on the majority of SCL-90-R subscales and globals. The treatment implications for the detection and treatment of mood disorders among the high deniers were discussed and interpreted. Malec and Neimeyer (1983) used the SCL-90-R among spinal cord injured (SCI) patients with the anticipation of predicting length of inpatient rehabilitation and quality of performance of self-care at discharge. Results of the study showed the Depression subscale to be the best predictor of length of stay, whereas the GSI had the highest (inverse) correlation with a discharge self-care rating. Malec and Neimeyer recommended brief psychological measures as having substantial utility for treatment planning in SCI patients. In discriminating distress levels within the same condition, Sullivan et al. 
(1988) used the SCL-90-R to contrast patients suffering from tinnitus who were diagnosed as depressed, from tinnitus patients who were free of depression, and hearing-impaired controls. All SCL-90-R measures significantly discriminated the tinnitus plus major depression group from both of the other two samples.


D.E. Stewart, Reicher, Gerulath, and Boydell (1994) used the BSI to investigate women experiencing symptoms of vulvodynia both with and without a known physical cause. In addition, they compared vulvodynia patients with women with other forms of vulvar pathology and with physically well women. All women with vulvodynia or vulvar pathology had significantly higher scores on the BSI than did healthy women. Women with vulvodynia, regardless of cause, had significantly higher scores on the Somatization and Anxiety subscales of the BSI. Furthermore, vulvodynia patients without discernible physical cause had significantly higher levels of anxiety than those with identified physical pathology. Overall, all patients experiencing genital problems evidenced notable psychological distress, with highest levels of mental discomfort found with the essential vulvodynia patients. These findings strongly suggest that psychological issues be addressed in treating this condition. Dew et al. (1994) sought to identify psychosocial factors associated with distress in cardiac transplant patients. Using the SCL-90-R at 2, 7, and 12 months postsurgery, these researchers found significantly elevated scores on the Anxiety subscale at both 2- and 7-month follow-up assessments; however, levels dropped close to normative levels by 12-month postsurgery. Depression levels were also significantly elevated at 2 months; however, at 7 and 12 months they decreased to normative levels. Overall, Anxiety and Depression scores significantly improved over time; however, patients whose scores were in the clinical range at the initial 2-month assessment retained elevated distress levels at postevaluations. In considering other psychosocial variables, history of depression and/or anxiety disorders, low family caregiver support, and diminished feelings of mastery were associated with greater susceptibility to increased depression and anxiety during the 12-month recovery period. 
In addition to these variables, high use of avoidance coping strategies and younger age were found to be specifically related to anxiety levels, whereas life events involving loss were found to predict depressive symptoms. Investigators concluded that psychosocial variables should be assessed prior to heart transplant surgery to ensure effective psychological treatment in potentially emotionally vulnerable patients.

HIV/AIDS

Several recent studies have employed the BSI to investigate psychological factors associated with HIV infection. Kennedy, Skurnick, Foley, and Louria (1995) examined psychological distress among heterosexual couples with at least one HIV positive partner. Contrary to prediction, family support was not found to play a role in emotional distress. Gender was the only variable found to significantly affect psychological well-being such that females had higher elevations on all BSI subscales than males. This was true for both HIV positive and HIV negative females with HIV positive male partners. It is assumed from these findings that women in a relationship affected by HIV have greater difficulties coping than men. Hopefully, awareness of these psychological vulnerabilities will influence clinicians to institute specific treatment for HIV patients and their partners.

Research on HIV positive and negative homosexual men with a diagnosable personality disorder was conducted by J.G. Johnson, Williams, Rabkin, Goetz, and Remien (1995). HIV positive men with personality disorders indicated significantly more psychological distress on BSI Depression, Anxiety, and the GSI than HIV negative men and men without a personality disorder. Furthermore, one third of those HIV positive men with personality disorders (N = 21) also had a comorbid Axis I disorder. It was


concluded by J.G. Johnson et al. (1995) that the presence of both HIV and a personality disorder may enhance vulnerability to concurrent Axis I clinical disorders, particularly anxiety and depression.

Sexual Victimization

Psychological well-being has also been found to be dramatically affected by sexual victimization. The utility of both the SCL-90-R and the BSI has been demonstrated repeatedly in this important clinical area. Frazier and Schauben (1994) investigated the stressors experienced by college-age females in adjusting to the transition to college life. The most commonly reported problems were test pressure, financial problems, personal rejection, relationship dissolution, and academic failure. Both the number of stressors and the total degree of stressfulness were significantly correlated with increased psychological symptoms. In addition, survivors of sexual victimization had higher total scores on the BSI. Ethnic differences were also indicated such that Asian Americans revealed an increased number of stressors, higher levels of stress, and more psychological symptoms on the BSI. Asian American students scored higher on the Somatization, Phobic Anxiety, Obsessive-Compulsive, Psychoticism, Depression, and Paranoid Ideation subscales than did European American students.

In order to provide normative data for the BSI with this population, Bennett and Hughes (1996) recently examined female college students who were victims of sexual abuse. Results of the study indicated that sexual abuse victims had significantly greater GSI scores and increased adjustment problems compared to individuals without an abuse history. These researchers found that college females who have been abused revealed symptom profiles similar to individuals undergoing psychological treatment.

Chronic pain is another condition observed to be interactive with sexual victimization. 
Toomey, Seville, Mann, Abashian, and Grant (1995) assessed a heterogeneous group of chronic pain patients and found that those patients with a history of sexual abuse scored higher overall on the SCL-90-R than did nonabused patients. This suggests that previous history of abuse sensitizes individuals, who subsequently experience greater psychological distress than individuals without such a history. Furthermore, this sensitization may also be manifested in site-specific chronic pain syndromes. Results of this study make evaluation of sexual abuse a critical issue with this population of patients. Similar findings were reported by Walker et al. (1995), who also found that female patients with chronic pelvic pain evidenced significantly higher symptomatic distress levels compared to patients without pain. In fact, the mean score for chronic pelvic pain sufferers fell in the 60th percentile of psychiatric outpatient norms on the majority of SCL-90-R subscales. The pain group also was found to have a history of diagnosable psychiatric disorders, especially major depression, as well as somatization disorder, drug abuse, phobia, and sexual dysfunction. They also revealed a significantly greater incidence of sexual abuse as compared to a nonpain (tubal ligation) group.

Coffey, Leitenberg, Henning, Turner, and Bennett (1996) investigated 192 women with a history of childhood sexual abuse and whether their methods of coping with victimization resulted in healthy psychological adjustment. Women who had been sexually abused revealed a higher GSI score on the BSI than women in the nonabused control group, and a greater proportion of their BSI subscale scores fell in the clinical range. These findings support the idea that women with a history of sexual abuse tend to experience greater difficulties with psychological adjustment in general. In terms of


coping strategies, most victims of sexual abuse utilized methods of disengagement, a strategy that clearly contributed to higher degrees of psychological distress. Coffey et al. suggested that it is important to appreciate how specific methods of coping with sexual abuse, especially disengagement, can be ultimately counterproductive, and lead to greater distress and poorer adjustment.

Norris and Kaniasty (1994) took a somewhat different approach to studying criminal victimization by comparing psychological distress levels of violent crime victims (e.g., rape), property crime victims (e.g., burglary), and nonvictims. According to scores on the majority of BSI subscales, victims of violent crimes reported significantly higher levels of distress (at least one standard deviation above the norm) than the property crime victims, followed by those of the noncrime group, which remained within the normative range for community adults. It is evident from these data that victims of violent crimes experience a significant psychological trauma, which is reflected in a broad range of symptoms that are also manifest at more dramatic distress levels. Subsequent analyses revealed a significant interaction between type of crime and time passage. Evaluations at 3, 9, and 15 months postcrime indicated that symptom reduction occurred for both crime groups, of a magnitude that almost returned distress levels to those of the noncrime group by 15 months. Not surprisingly, most of the improvement occurred during the first 9 months, with little substantial reduction in distress levels subsequently. Victims continued to exhibit related psychological distress 15 months postcrime, suggesting that spontaneous improvement would be unlikely to occur beyond this time period.

Health Systems Planning

The SCL-90-R and BSI have also been utilized effectively in treatment planning studies with a health care systems orientation. 
Saravay, Steinberg, Weinschel, Pollack, and Alovis (1991) used the SCL-90-R to evaluate the impact of psychological morbidity on length of stay (LOS) in the general hospital. SCL-90-R Depression, Anxiety, and Global scores were significantly correlated with length of stay, although psychiatric diagnosis did not predict LOS. Saravay, Pollack, Steinberg, Weinschel, and Habert (1996) also recently reported on a 4-year follow-up study of psychiatric comorbidity in medical patients. Among major findings, these investigators observed that patients with elevated scores on the Interpersonal Sensitivity or Depression dimensions of the SCL-90-R at admission spent twice as many days rehospitalized, whereas patients with elevated Hostility scores experienced twice as many readmissions.

Katon et al. (1990) used the SCL-90-R to define "highly distressed" patients among 767 high health care utilizers in a large HMO. Fifty-one percent of the sample fit the criterion. Not only did these patients make disproportionate use of health care facilities, they also revealed a high prevalence of chronic medical problems, experienced significant limitation of activities associated with their illnesses, and had substantially elevated prevalence of major depressive disorder, dysthymia, and anxiety disorders. From a somewhat analogous perspective, Drossman et al. (1991) evaluated the nature of health care behavior in a sample of almost 1,000 patients with inflammatory bowel disease. In this study, the SCL-90-R was found to have significant predictive value in a regression model predicting number of physician visits during the previous 6 months.

Perhaps the most dramatic study of this type was a 6-month follow-up study reported by Allison et al. (1995) with a sample of 381 cardiac rehabilitation patients, referred for a variety of cardiovascular disorders and/or procedures. Using the SCL-90-R, these


investigators partitioned their cohort into "high psychological distress" versus low distress groups. Comparisons across the 6-month interval revealed that the high distress group had significantly higher rates of cardiac rehospitalization and recurrent cardiac events compared to the low distress group. More striking, however, was the fact that the mean rehospitalization cost for the high distress patients was more than four times the mean cost for the low distress group (i.e., $9,504 vs. $2,146). Allison et al. concluded that psychological distress has an obvious adverse impact on coronary patients, and systematic assessment programs should be instituted to accomplish successful identification and appropriate treatment of these patients.

Fontana and Rosenheck (1997) used the BSI in a study designed to evaluate the outcomes and relative cost-efficiency of three Veterans Administration (VA) models of inpatient treatment for PTSD: long-stay, specialized PTSD inpatient units; short-stay, evaluation and brief treatment PTSD units; and nonspecialized general psychiatric units. Data were developed from almost 800 patients in 10 units across the country. Results showed that all three treatment models produced significant improvement at discharge; however, during follow-up, patients treated in long-stay units showed more dramatic reemergence of symptoms and deterioration in social functioning than patients treated under the other two conditions. In addition, long-stay units were revealed to be 82.4% and 53.5% more expensive, respectively, than short-stay and general psychiatric units, and the latter showed high levels of patient satisfaction. Fontana and Rosenheck suggested that a restructuring of the VA's approach to the treatment of PTSD could result in both improved efficacy for PTSD patients and considerable cost savings. 
The SCL-90-R and the BSI as Treatment Outcome Measures

The ideal outcomes instrument will be highly sensitive to a broad range of treatment interventions and will demonstrate sensitivity to change along the entire spectrum of psychological dysregulation, from mild stress states in community samples to dramatic psychopathology in institutionalized individuals. Limitations in sensitivity, either qualitative or quantitative, along the psychological distress continuum can seriously constrain the usefulness of an outcomes measure. In a like manner, desirable outcomes instruments are sensitive to changes induced by a wide variety of therapeutic interventions, and are not limited to narrow therapeutic modalities. The following sections endeavor to demonstrate the extremely broad sensitivity of the SCL-90-R/BSI, both to the broad continuum of psychological dysregulation, and to a wide spectrum of traditional and nontraditional therapeutic interventions.

For a psychological test to achieve optimal utility as an outcomes measure, it should possess the capability of documenting the test respondent's status in meaningful clinical terms. Test scores, in and of themselves, are insufficient to communicate real-world status because the constructs that psychological tests serve to operationalize (e.g., depression, anxiety, quality of life) are nontangibles and their test scales carry no intrinsic meaning. This means that good psychological outcomes measures must have highly representative, well-developed norms, which enable the interpretation of a patient's score or change of status in meaningful actuarial terms. Well-constructed norms are designed to communicate the probabilistic expectation of a particular test score in the referent population of interest (e.g., community adults, psychiatric outpatients), and to help establish the phenomenologic meaning of any changes that have taken place.
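As a rough illustration of what norm-referenced interpretation involves, the sketch below converts a raw score into a normalized (area) T-score by locating it in a norm sample and mapping its percentile onto a normal curve with mean 50 and SD 10. The function name, the mid-rank percentile rule, and the tiny norm sample are assumptions made for illustration; they are not the published SCL-90-R/BSI norm tables or scoring procedure.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist


def area_t_score(raw, norm_scores):
    """Normalized (area) T-score of `raw` relative to a norm sample.

    Hypothetical helper: the percentile is the mid-rank position of `raw`
    in the norm sample, converted to a z value through the inverse normal
    CDF and rescaled to mean 50, SD 10.
    """
    ordered = sorted(norm_scores)
    n = len(ordered)
    below = bisect_left(ordered, raw)          # norm scores strictly below raw
    ties = bisect_right(ordered, raw) - below  # norm scores equal to raw
    p = (below + 0.5 * ties) / n               # mid-rank percentile
    # Clamp so the inverse CDF never receives exactly 0 or 1.
    p = min(max(p, 0.5 / n), 1.0 - 0.5 / n)
    z = NormalDist().inv_cdf(p)
    return 50.0 + 10.0 * z


# Made-up norm sample of GSI-like scores, for illustration only.
hypothetical_norms = [0.1, 0.2, 0.3, 0.4, 0.5]
t_at_median = area_t_score(0.3, hypothetical_norms)
```

A score at the median of the norm sample maps to a T-score of 50, and scores higher in the distribution map to higher T-scores. Separate norm samples (for example by gender or by clinical status) would simply be passed as different `norm_scores` arguments.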


An important refinement of any such library of norms, at least concerning psychopathology, is that they be gender-keyed. It is very well established (although often overlooked) that men and women are distinct in their reports of emotional distress and psychological symptoms, with women being much more willing to acknowledge emotional distress. Norms that are not gender-keyed fail to take these powerful effects into account, which can lead to distorted interpretations.

Another important aspect of valid outcomes measurement concerns the distinction between statistically and clinically significant change. It has been apparent for some time now (Garfield, 1981; Jacobson, 1988; Jacobson, Follette, & Revenstorf, 1984) that significant differences defined on a purely statistical basis are not synonymous with clinically meaningful differences. This situation has been accepted in large measure because no realistic solution to the problem has been offered. Recently, however, some notable advances have been made concerning this dilemma. Jacobson and Truax (1991) proposed a dual-criterion method for determining the clinical significance of therapeutically induced change. Optimal application of their technique requires that norms be available for both "normal" or community individuals and the "clinical" group (e.g., psychiatric outpatients, inpatients, etc.) under evaluation. The dual criterion for clinically significant change requires both that the patient return to normal functioning and that a reliable magnitude of change take place. Meeting the former criterion relies on establishing a cutoff score for discriminating "functional" from "dysfunctional" status, a value determined from the test's normative distributions. A reliable change index (RC) is calculated based on the standard error of the difference between an individual's pre- and posttreatment scores. 
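The two criteria can be sketched in code. This is a minimal illustration of the Jacobson and Truax (1991) formulas, assuming a symptom scale on which lower scores are healthier. The normative means, standard deviations, and reliability used in the example are hypothetical placeholders, not published SCL-90-R/BSI values, and using the clinical group's standard deviation for the standard error is likewise an assumption of this sketch.

```python
from math import sqrt


def reliable_change(pre, post, sd, reliability):
    """Reliable change index: (post - pre) / standard error of the difference."""
    se_measurement = sd * sqrt(1.0 - reliability)
    s_diff = sqrt(2.0 * se_measurement ** 2)
    return (post - pre) / s_diff


def cutoff_c(mean_clin, sd_clin, mean_norm, sd_norm):
    """Cutoff between the dysfunctional and functional distributions,
    weighted by the two standard deviations (Jacobson-Truax criterion c)."""
    return (sd_norm * mean_clin + sd_clin * mean_norm) / (sd_norm + sd_clin)


def clinically_significant(pre, post, mean_clin, sd_clin,
                           mean_norm, sd_norm, reliability, z_crit=1.96):
    """True only if both criteria hold for a lower-is-better scale:
    the posttreatment score falls on the functional side of the cutoff
    AND the improvement exceeds the reliable change threshold."""
    rc = reliable_change(pre, post, sd_clin, reliability)
    c = cutoff_c(mean_clin, sd_clin, mean_norm, sd_norm)
    return post < c and rc < -z_crit


# Hypothetical norms for a GSI-like score (illustrative values only).
norms = dict(mean_clin=1.2, sd_clin=0.6, mean_norm=0.3, sd_norm=0.3,
             reliability=0.90)

improved_and_normal = clinically_significant(1.4, 0.5, **norms)
improved_not_normal = clinically_significant(1.4, 1.0, **norms)
normal_not_reliable = clinically_significant(0.8, 0.5, **norms)
```

With these placeholder norms the cutoff works out to 0.6, so only the first patient, who both crosses the cutoff and changes by more than 1.96 standard errors of the difference, meets the dual criterion.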
Only if an individual's posttreatment score crosses the cutoff into the functional distribution and exceeds the reliable change index is the change considered clinically significant.

The SCL-90-R and BSI are among the very few instruments currently available with carefully constructed, gender-keyed norms, based on actuarially accurate area T-scores. With norms available for community adults, community adolescents, and inpatient and outpatient psychiatric patients, the SCL-90-R/BSI are among the few psychiatric outcomes measures that enable calculation of clinically significant change across a broad spectrum of clinical populations. Both Jacobson and Truax (1991) and Lambert (1994) provided more detailed discussions of this important evolving methodology.

Outcomes in Clinical Drug Trials

Pharmacotherapeutic drugs represent one of the cornerstones of the modern treatment of psychiatric disorders, both in conjunction with psychotherapeutic approaches and as sole interventions in these conditions. Drug-drug comparisons are often quite demanding of measurement when similar drugs are involved (L.R. Derogatis, Bonato, & Yang, 1968), because the power to detect a true difference is directly related to effect size (small in such comparisons) and inversely related to error of measurement. Although sensitivity to drug-drug comparisons represents an extremely rigorous standard for outcomes instruments, an acceptable outcomes measure must at the very least be sensitive to drug versus placebo differences. The selective review that follows attests to the fact that the SCL-90-R and BSI have proven their value as primary outcomes measures in drug trials for over 20 years, and have accrued substantial utility and validity in this capacity.

As examples, Ravaris, Robinson, Ives, Nies, and Bartlett (1980) used the SCL-90-R in the first definitive double-blind controlled trial comparing a monoamine oxidase


inhibitor (MAOI) with a tricyclic antidepressant (TCA). These investigators compared the tricyclic antidepressant amitriptyline to the MAO inhibitor phenelzine in the treatment of 105 outpatient depressives. Results demonstrated both drugs to have significant efficacy beyond placebo in reducing symptomatic distress over the 6 weeks of the trial, and in drug-drug comparisons phenelzine proved significantly better than amitriptyline in reducing anxiety. Comparisons with community norms for the SCL-90-R showed that although distress was significantly reduced at the end of 6 weeks, it remained elevated above normal levels.

A number of additional trials using the SCL-90-R have also demonstrated the efficacy of phenelzine. Soloff et al. (1993) compared phenelzine with haloperidol and placebo in a randomized, double-blind trial with a sample of hospitalized patients with borderline personality disorder. Phenelzine was found to be superior to haloperidol in the treatment of these patients on multiple SCL-90-R dimension and global scores. McGrath, Stewart, and Nunes (1993) also utilized the SCL-90-R in a comparison of phenelzine and imipramine in an intervention with treatment-refractory depressed outpatients. Analysis of symptomatic response showed that 67% of those patients who were refractory to imipramine showed clinical improvement on phenelzine.

The SCL-90-R also served as one of the principal outcome measures in a large multicenter trial (Ballenger et al., 1988) evaluating the efficacy of alprazolam in the treatment of agoraphobia and panic disorder. In this study, the scale demonstrated substantial efficacy for alprazolam compared to placebo. More recently, Woodman and Noyes (1994) used the BSI to evaluate the efficacy of divalproex sodium in the treatment of panic disorder. All patients were moderately to markedly improved, with a large majority showing sustained improvement at a 6-month follow-up. 
In a strong demonstration of the SCL-90-R's sensitivity to drug effects, Noyes et al. (1984) reported a double-blind crossover comparison of diazepam (Valium) versus the beta-blocker propranolol (Inderal) in the treatment of panic-driven agoraphobia. SCL-90-R measures of Anxiety, Phobic Anxiety, and the GSI showed highly significant efficacy for diazepam against propranolol in this study with no placebo group.

In a more specialized patient setting, S. Levine, Anderson, Bystritsky, and Barton (1990) used the SCL-90-R in a small sample of HIV patients with major depressive syndrome who were treated with fluoxetine (Prozac). They observed significant improvement on almost all SCL-90-R measures over the 4 weeks of active treatment, and treatment gains were sustained at a 2-month follow-up. Walsh, Hadigan, Devlin, Gladis, and Roose (1991) also used the SCL-90-R in a three-phase evaluation of the efficacy of desipramine in the treatment of depressed bulimics. Four of the primary dimension scores and the GSI revealed significant efficacy for the active drug over placebo. Also working with bulimic patients, this same research group (Walsh et al., 1997) used the SCL-90-R to evaluate the efficacy of combining fluoxetine with cognitive behavioral therapy (CBT) and supportive psychotherapy in a randomized, placebo-controlled trial. Their findings showed CBT plus fluoxetine to be the treatment of choice, with clear superiority over other treatments and treatment combinations.

Perse, Greist, Jefferson, Rosenfeld, and Dar (1987) employed the SCL-90-R to assess the efficacy of fluvoxamine compared to placebo in treating obsessive-compulsive disorder in a 20-week, double-blind crossover design. Eighty-one percent of patients on the active drug versus 19% on placebo improved, with multiple SCL-90-R scales, particularly the Obsessive-Compulsive dimension, demonstrating efficacy. Focusing on cost-efficiency/cost-benefit issues, Marder et al. 
(1984) contrasted the costs versus benefits in a double-blind comparison of 5 mg versus 25 mg of the depot neuroleptic fluphenazine decanoate with schizophrenic outpatients. The SCL-90-R was

< previous page

page_704

next page >

< previous page

page_705

next page > Page 705

used to assess symptomatic distress, and patients were followed for 1 year. Analyses of symptom data at 1 month and 3 months postinitiation showed the high dose group to have significantly higher levels of distress on a number of SCL-90-R subscales. In addition, drug side effects were more severe in the high dose group and relapse percentages were no better. No advantage was found for continuing the high dose regimen.

More recently, psychopharmacologists have increasingly attempted to treat Axis II personality disorders with pharmacologic agents. Consistent with this posture, Teicher et al. (1989) reported on an open trial of low dose thioridazine (Mellaril) in the treatment of borderline personality disorder. The SCL-90-R was utilized as a self-report measure of psychopathology. Results showed significant reductions in many SCL-90-R subscales, particularly for the subgroup that completed the full 12 weeks of the study. Similarly, Cornelius, Soloff, Perel, and Ulrich (1990), theorizing borderline personality disorder to be a condition based in deranged serotonin regulation, utilized the SCL-90-R as an outcome measure in an 8-week trial of the serotonin uptake inhibitor fluoxetine (Prozac). The majority of SCL-90-R measures were sensitive to a therapeutic effect for the drug over the 8-week period. Having relevance for Axis II conditions, Karterud and his colleagues (Karterud et al., 1995) proposed an SCL-90-R derived index of severity of personality disorders. They maintained that the mean score of the aggregate Interpersonal Sensitivity, Hostility, and Paranoid Ideation dimensions (which they termed the Personality Severity Index, PSI) can serve as a reliable and valid measure of Cluster A and Cluster B personality disorder severity.

In an additional fluoxetine study, Kim and Dysken (1990) used the SCL-90-R as an outcome assessment in a 12-week open trial with obsessive-compulsive disorder (OCD). 
Focusing on the Obsessive-Compulsive subscale of the instrument, these investigators found significant reductions in symptoms of OCD from baseline to treatment completion. Although clomipramine (Anafranil) is noted for its therapeutic effects in obsessive-compulsive disorder, Judd et al. (1990) utilized the drug in a therapeutic trial with patients suffering from panic disorder. In an 8-week treatment trial, the SCL-90-R reflected significant reductions in distress on most subscales, particularly on the Anxiety dimension. Similarly, Kahn et al. (1987) contrasted clomipramine with 5-hydroxytryptophan in an 8-week double-blind, placebo-controlled trial with mixed anxiety disorders. Both drugs were highly superior to placebo, and clomipramine showed significantly greater efficacy in treating depressive symptomatology in these patients. An analogous study comparing clomipramine with fluvoxamine in the same population revealed both drugs to be superior to placebo, but in drug-drug comparisons clomipramine was also superior to fluvoxamine on a number of SCL-90-R measures. Taken together, these studies not only demonstrate the requisite sensitivity to drug-placebo comparisons essential for a psychopharmacologic outcomes measure, but also reveal the more demanding capacity to identify differences between active pharmacotherapeutic drugs of the same functional class.

Psychotherapy Outcomes

Psychotherapeutic efficacy, in both absolute and relative terms, is an issue of major interest in contemporary health care. Does psychotherapy work? Does one psychotherapy work better than another, and if so, for whom? Does the incremental benefit of adding psychotherapy to a drug treatment regimen justify the additional costs? Is psychotherapy more effective than drugs for some disorders, and if so, which? These

are all pressing questions with high relevance for today's health care. In order to obtain answers to these questions, numerous outcomes studies have been conducted assessing the efficacy and effectiveness of psychotherapy. Many of them have utilized the SCL-90-R/BSI as primary outcomes measures in their evaluations, and the commentary that follows selectively reviews this important literature. Previously mentioned in the context of treatment planning, the recent meta-analysis of brief dynamic psychotherapy (BDP) studies reported by Crits-Christoph (1992) represents a convincing demonstration of the sensitivity of the SCL-90-R to psychotherapy outcomes. Aggregating over almost one dozen studies, this analysis highlighted sensitivity to BDP efficacy in comparison to waiting list control where effects were large (d = .82), revealed small effects in comparison to nonpsychiatric interventions (d = .20), and showed equivalent effects when compared to alternative psychotherapies (d = .05). These results are consistent with those of earlier trials. For example, M. J. Horowitz, Marmar, Weiss, Dewitt, and Rosenbaum (1984) studied the efficacy of BDP with bereaved individuals. They found SCL-90-R Anxiety and Depression subscales and globals were highly sensitive to treatment-induced improvement. They further noted that magnitude of distress reduction was significantly correlated with baseline distress levels. Findings showing considerable consistency with those already discussed have been reported in the large and well-designed series of British psychotherapy studies known as the Sheffield Psychotherapy Projects. In the first of these (Shapiro & Firth, 1987), depressed and anxious patients were randomly assigned to either cognitive behavioral therapy (CBT) or BDP for successive 8-week periods in a crossover design. The SCL-90-R was used as a primary outcomes measure and showed that both interventions effectively reduced distress, although CBT proved slightly more effective.
Further analysis of follow-up data (Shapiro & Firth-Cozens, 1990) showed a correlation of .64 (p < .01) between treatment completion and 2-year follow-up assessments on the SCL-90-R. In the second project (Shapiro, Barkham, Hardy, & Morrison, 1990; Shapiro et al., 1994), 120 white-collar professionals suffering from depression received either 8 or 16 sessions of one or the other treatment, in a 2 × 2 design. Findings indicated substantial improvement for both interventions across durations of approximately the same magnitude. No differences in time or magnitude of effect were evident. In a third Sheffield replication (Barkham et al., 1996) contrasting CBT and BDP over 8 versus 16 weeks, in this instance across three levels of severity of depression, findings were again similar. The two types of therapy revealed approximately equivalent effects, with few differences between 8 and 16 weeks of treatment. At the 3-month and 1-year follow-ups, however, the SCL-90-R revealed a significant recurrence of symptoms in patients undergoing both therapy durations that was not apparent in the second Sheffield study. The SCL-90-R has also demonstrated sensitivity to the therapeutic impact of less typical methods of intervention. Bohachick (1984) reported significant reductions in distress among a cohort of hypertensives exposed to the addition of a progressive relaxation paradigm to their standard exercise regimen compared to exercise only. Carrington et al. (1980) compared two distinct meditation techniques to progressive relaxation and waiting list control in a sample of 154 self-defined high stress individuals. Evaluations at the end of 6 months on the SCL-90-R revealed the two meditation techniques to be significantly better than progressive relaxation at reducing symptomatic distress. 
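The effect sizes (d) quoted earlier for the brief dynamic psychotherapy meta-analysis are standardized mean differences. The text does not say which variant was computed, so the following is only a minimal sketch assuming the common pooled-standard-deviation form; the function name `cohens_d` and the GSI scores are hypothetical and are not taken from any of the studies reviewed.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treated, control):
    """Standardized mean difference with a pooled sample standard deviation.

    Positive values mean lower distress in the treated group.
    """
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(control) - mean(treated)) / sqrt(pooled_var)


# Hypothetical end-of-treatment GSI scores, for illustration only.
treated_gsi = [0.6, 0.9, 0.7, 1.1, 0.8]
waitlist_gsi = [1.2, 1.0, 1.5, 0.9, 1.4]
print(round(cohens_d(treated_gsi, waitlist_gsi), 2))
```

On this reading, a value near .82 indicates that the treated group's mean distress lies about four fifths of a pooled standard deviation below the comparison group's, whereas a value near .05 indicates essentially no difference.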
In one of the more unusual therapeutic outcome studies in the professional literature, Griffith, Mahy, and Young (1986) reported significant reductions in symptomatic distress as a result of participation in the West Indian ritual of Spiritual Baptist "mourning." With the exception of the Somatization dimension, all SCL-90-R measures

showed significant efficacy for the solitary contemplative experience in reducing psychological symptomatic distress. The SCL-90-R has also been utilized to evaluate alternatives to dynamic psychotherapies. Fairburn et al. (1991) contrasted two variations of CBT and interpersonal therapy in a sample of bulimic patients. All three interventions showed efficacy on the SCL-90-R from admission to treatment termination, but none of the interventions showed superiority. Beck and colleagues (Beck, Stanley, Baldwin, Deagle, & Averill, 1994) used the SCL-90-R to establish the relative efficacy of CBT versus relaxation training and a minimum contact condition for the treatment of panic disorder in a small group format. At the end of 10 weeks, the Phobic Anxiety dimension revealed significant reductions in the CBT group compared to the other two interventions. Shear et al. (1994) also compared CBT with nonprescriptive reflective treatment over 12 treatment sessions in a sample of patients with panic disorder. SCL-90-R profiles at discharge and at 6 months follow-up showed the two interventions to be equally effective at reducing symptomatic distress, a finding shared with other outcomes measures. At follow-up, however, the SCL-90-R demonstrated continued improvement for the nonprescriptive reflective treatment, a finding not shown by other measures. A psychotherapy outcomes study with interesting implications for both the clinical and fiscal aspects of psychological treatment interventions was recently reported by Kopta, Howard, Lowrey, and Beutler (1994). Using the psychotherapy dosage model originally postulated by Howard, Kopta, Krause, and Orlinsky (1986), which operationalizes effect as the probability that a given score was derived from a normal population, these investigators administered the SCL-90-R/BSI to 854 psychotherapy outpatients at intake and during treatment. 
Jacobson and Truax's method (1991) was used to define clinically reliable change, and symptoms were partitioned into three classes on the basis of probit analysis: acute distress, chronic distress, and characterologic symptoms. Acute distress symptoms demonstrated the highest percent recovery across all doses (68%-95% after 52 weeks), whereas chronic distress symptoms revealed the fastest average response rate. Characterologic symptoms showed the slowest rate of response, with a number of these symptoms demonstrating less than a 50% chance of recovery after 52 weeks. For all symptoms, the percentages of patients recovered with selected doses were calculated. Consistent with previous work, these researchers found improvement was proportionally greater early in treatment, with diminishing benefits as treatment progressed to higher dose levels.

Anxiety, Depressive, and Somatoform Disorders

There is compelling evidence (L. R. Derogatis & DellaPietra, 1994; L. R. Derogatis & Wise, 1989) that anxiety and depressive disorders account for between 75% and 80% of the psychiatric conditions seen in either the community or primary care practice. Many of the manifestations of these disorders are somatic in nature, which causes them to be easily confused with medical disorders and somatoform psychiatric conditions (Kirmayer, Robbins, Dworkin, & Jaffe, 1993; Simon & Von Korff, 1991). Because many authorities estimate that depressive disorder is the most prevalent clinical problem in primary care (Katon & Sullivan, 1990), a section on outcomes specific to these disorders has been included. The SCL-90-R has been used as an outcome measure in many studies focused on depression. Weissman et al. (1977) used the instrument to characterize primary versus secondary depressions, and the same research group used the SCL-90-R in an

epidemiologic study of depression in five psychiatric populations (Weissman, Sholomskas, Pottenger, Prusoff, & Locke, 1977). Quitkin et al. (1984) applied the test as an outcome measure in a treatment trial of l-deprenyl in atypical depressives and found it sensitive to drug-placebo differences on a number of dimensions. Meanwhile, Wetzler et al. (1990) profiled differences between depressed and panic patients on the SCL-90-R, and Stewart, Quitkin, M. Terman, and J. S. Terman (1990) contrasted atypical depressions with seasonal affective disorders on the scale. In addition, Bryer, Borelli, Matthews, and Kornetsky (1983) used the SCL-90-R in a depressed sample to predict suppressors versus nonsuppressors on the dexamethasone suppression test (DST). Employing discriminant function analysis, these investigators were able to correctly predict DST status in 73% of cases. Working with a community cohort of young adults, Angst and Dobler-Mikola (1984) reported discriminating among three groups of depressives with the SCL-90-R partitioned according to frequency and duration of episodes. Discriminations among groups were made at both dimension score and item levels. In an interesting predictive study, Robinson, Olmsted, and Garner (1989) found that they could predict from elevated SCL-90-R scores during the second trimester of pregnancy which women would have difficulties adjusting at 1 year postpartum. In an informative review, Katon and Sullivan (1990) examined depression occurring among chronic medical populations, and enumerated a number of studies done with the SCL-90-R. More recently, Wetzler, Khadivi, and Oppenheim (1995) compared the psychological assessments of bipolar versus unipolar depressives on the SCL-90-R as well as the MMPI and the Millon Clinical Multiaxial Inventory (MCMI).
Interestingly, in spite of a great deal of clinical anecdote about phenomenologic differences between bipolar and unipolar depressions, no consistent differences were found in their mean profiles on any of the three measures. McCullough and his colleagues (McCullough et al., 1994) replicated their study of an untreated sample of community dysthymics, again including the SCL-90-R as part of their assessment battery. Twenty-four dysthymics were followed for 1 year, with three showing spontaneous remission at the end of that period. At the end of a subsequent 4-year follow-up, one of the three remissions had relapsed. Symptom profiles were extremely constant over the study period, leading McCullough et al. to again conclude that dysthymia is an enduring chronic disorder with insidious onset and problematic social functioning and levels of symptomatic distress. The BSI has also been utilized in outcomes studies of depression. Amenson and Lewinsohn (1981) used the BSI in a longitudinal prevalence study of depression. They observed, consistent with many other investigators, that the prevalence of depressive phenomena was higher for women than for men no matter which indicators they used. They also noted that women with previous histories of depression were much more likely to experience recurrences of their depression than were men with similar histories. Buckner and Mandel (1990) also utilized the BSI in a prospective study of young adult psychoactive drug users in an attempt to establish their risk for developing depression. The results of their analyses identified methaqualone use as being a reliable predictor of depression, along with low self-esteem and negative life events. In the case of anxiety disorders, Cameron et al. (1986) used the SCL-90-R to profile patients with distinct DSM-III anxiety disorders.
This same research group (Cameron & Hudson, 1986) employed the instrument in an engaging study to evaluate the influence of exercise on severity of anxiety in patients diagnosed with anxiety disorders. Thirty-one percent of patients with panic attacks were exercise sensitive, compared to only 7% of other patients. The SCL-90-R Anxiety and Phobic Anxiety subscales were particularly effective in making this discrimination.

Ae Lee and Cameron (1986) evaluated the relation between Type-A behavior, symptom distress patterns, and family history of coronary heart disease among males and females with anxiety disorders. Significant correlations between SCL-90-R Anxiety and Hostility scores and Jenkins Activity Scale (JAS) Type-A scores were observed among males, but not among female patients. These same investigators (Ae Lee, Cameron, & Greden, 1985) also used the instrument to evaluate the relation between caffeine consumption and the experience of anxiety in anxious patients. They discovered that severity of anxiety was not related to amount of caffeine consumption, but that the subset of patients who reported becoming anxious in response to drinking coffee had higher SCL-90-R Anxiety, Somatization, and Phobic Anxiety scores than those who did not, even though their daily consumption of caffeine was equivalent. In their comprehensive review, Katon and Roy-Byrne (1991) argued for the existence of a mixed anxiety-depression syndrome. They cited strong evidence to substantiate the existence of this diagnostic syndrome, with studies involving the SCL-90-R contributing substantial confirmational data. Individuals afflicted with the condition are found to have a high incidence of medically unexplained problems, and to be proportionally greater utilizers of health care systems. They also appear to be at increased risk for more severe anxiety and mood disorders. Similarly, Clark and Watson (1991) developed a tripartite model of anxiety and depression. Based on a meta-analysis of psychometric data, they argued that at the clinical level, anxiety and depressive phenomena may be explained by a general distress factor and two specific factors of anxiety and depression. Clark and Watson mobilized an impressive body of data to support their theory, in particular noting that this pattern was very explicit in numerous studies with the SCL-90-R.
Strauman (1992), working from a self-discrepancy theory model, also examined the relation between anxious, depressed, and anxious/depressed states by using the Anxiety subscale of the SCL-90-R to predict specific vulnerabilities to emotional disorders. The hypothesized patterns of vulnerability (i.e., anxious vs. depressive symptoms and affects) are theoretically based in actual-ideal versus actual-ought self-discrepancies, and were strongly confirmed by the outcome of the study. In a discriminative study involving two anxiety disorders, Noyes and his associates (1992) contrasted SCL-90-R dimension and symptom scores between patients diagnosed with generalized anxiety disorder (GAD) versus those diagnosed with panic disorder (PD). The GAD patients revealed symptoms indicative of CNS hyperarousal, whereas PD patients' profiles appeared more indicative of autonomic hyperactivity. Consistent with other reports, the GAD patients tended to manifest significantly lower scores on SCL-90-R Depression, Anxiety, and Phobic Anxiety scales, and to experience less overall psychological morbidity. Because the SCL-90-R/BSI are multidimensional and contain a Somatization dimension as well as Depression and Anxiety scales, they are well suited to outcomes studies of somatoform conditions. For example, Rief, Hiller, Geissner, and Fichter (1995) examined the course of pathology in 30 patients with somatoform disorders. Patients were assessed using the SCID and the SCL-90-R and were assigned to a variety of treatment interventions. Results indicated that patients with somatoform disorders with a comorbid affective disorder (N = 24) had somatoform symptoms that persisted through the 2-year follow-up period, whereas patients without a comorbid diagnosis were more likely to remit within this time period.
Overall, significant symptom reduction was observed for somatoform patients over the 2-year period as indicated by the Somatization, Depression, Anxiety, and Phobic Anxiety subscales of the SCL-90-R. Katon et al. (1990), focusing on the prognostic value of somatic symptoms, used the SCL-90-R to provide an operational definition of "high distressed-high utilizers" within

two large primary care practices. The high distress group was further divided into four subgroups on the basis of numbers of unexplained somatic symptoms. The investigators observed linear increases in SCL-90-R dimension scores of Somatization, Depression, and Anxiety, as well as independent diagnoses of psychiatric disorder, as they moved progressively through the somatic symptom subgroups from "low" to "high." Kellner, Hernandez, and Pathak (1992) also reported an interesting study with the SCL-90-R and somaticizing patients. These researchers related distinct dimensions of the SCL-90-R to different aspects of hypochondriasis. Although high levels of the SCL-90-R Somatization and Anxiety dimensions were observed to be predictive of hypochondriacal fears and beliefs, elevations on Depression were not. Further, they observed that fear of disease correlated most highly with the SCL-90-R Anxiety score, but that the false conviction of having a disease was more highly correlated with Somatization.

Stress Outcomes

Although some theorists view the construct of "stress" as simply a variant of anxiety with perhaps a more explicit environmental linkage, variations in the construct range from mild dysphoria arising from problems in daily living to a formal diagnostic entity. Posttraumatic Stress Disorder (PTSD) was conferred formal nosologic status in DSM-III (American Psychiatric Association, 1980). Addressing the more dramatic end of the stress spectrum, Horowitz et al. (1980) used the SCL-90-R to help distinguish PTSD from other anxiety-based disorders. Davidson, Kudler, Saunders, and Smith (1991) also used the SCL-90-R to profile the symptom patterns and severity of PTSD in groups of World War II versus Vietnam veterans. Vietnam veterans exhibited more severe PTSD symptoms and revealed higher distress scores on a number of SCL-90-R subscales.
More recently, Weathers and his colleagues (Weathers et al., 1996) derived what they termed a "war-zone PTSD scale (WZPTSD)" from the SCL-90-R. They reported the subscale to have good reliability and discriminative validity, with diagnostic utility (for PTSD) better than a number of dedicated PTSD scales. A similar subscale specific for "crime-related PTSD" was reported by Saunders, Arata, and Kilpatrick (1990). This scale was highly effective, demonstrating 89% correct assignment in a discriminant function analysis that used the Diagnostic Interview Schedule (DIS) (Robbins, Helzer, Croughan, & Ratcliff, 1981) as an external criterion. In addition to war, natural and man-made disasters have high potential for trauma, and can be extremely stressful to those who experience them. Winje (1996) reported on a longitudinal study with the SCL-90-R of the parents of children and spouses who were involved in a fatal school bus accident. The course and duration of posttraumatic symptoms were assessed at 1, 3, and 5 years after the accident. Analyses were done in terms of loss status and prior exposure to trauma. Significant proportions of the sample evidenced high levels of symptomatic distress (50%, 39%, and 42%, respectively) throughout the follow-up period. Individuals who suffered loss were not significantly more distressed than those who did not; however, individuals who had suffered previous trauma revealed a significantly smaller reduction in symptoms over time than did those free of prior traumatic experiences. Green, Grace, Lindy, Titchner, and Lindy (1983) also utilized the SCL-90-R to document residual levels of stress and functional impairment after another man-made disaster, the Beverly Hills Supper Club fire. In a fashion similar to Winje (1996), they observed significant levels of residual symptomatology and distress.

Fleming, Baum, Gisriel, and Gatchel (1982) used the SCL-90-R to evaluate the stress associated with a potential catastrophe at Three Mile Island after the nuclear accident. Although those exposed to possible radiation as a result of the accident had elevated levels of symptomatic distress, stress levels were found to be influenced by levels of social supports. High social support respondents essentially did not differ from controls in their levels of reported psychological distress. In a rather unique study of environmental stress, Girodo (1991) used the SCL-90-R to evaluate the stress levels of federal undercover agents. Evaluations were accomplished prior to, during, and subsequent to undercover assignments. Symptom levels were found to be most dramatic among agents currently on assignment. Active agents revealed mean symptom profiles analogous to those of psychiatric outpatients, with the exception that they manifested much lower levels of depression. Preoperational agents who had not yet been on undercover assignment had the least elevated symptom profile, with agents who had completed their assignments demonstrating an intermediate position. Transitions to new environments and responsibilities are well-known sources of stress for many, whether they be moving from one school to the next, or moving to a new country. Concerning the former, Hirsch and Dubois (1992) used the BSI to document stress levels of 143 students making the transition from elementary school to junior high. Their major finding was the observation of a strong inverse relation between levels of peer support and levels of symptomatology both at point in time and across time. Frazier and Schauben (1994) reported a somewhat analogous study with 282 female college students. 
In this age group, both the number of independent stressors and degree of perceived stress were highly correlated with BSI symptom scores, with major stressors being identified as bereavement, relationship dissolution, personal rejection, and poor school performance. Women who had suffered previous sexual victimization were also found to have higher symptom levels. A similar study was conducted by Lu (1994) with Taiwanese first-year college students using the Chinese version of the BSI. Results indicated that both life events and "hassles" had some predictive value regarding symptom levels, but that the personality attribute neuroticism had a major predictive effect. Although life transitions within a cultural environment certainly have the capacity to induce stress, sudden shifts in the sociocultural environment can often be even more dramatically stress inducing. Plante, Manuel, Mendez, and Marcote (1995) investigated the adjustment of a group of Salvadoran immigrants to the United States who had been displaced from their native country by the Salvadoran civil war. Using the SCL-90-R, they found significant symptom levels in this cohort, which they related to problems in adjusting to a new environment and residuals from the war in Salvador. Language skills and employment were observed to be an important part of making an effective adjustment, as were social supports and religious faith. Aroian, Patsdaughter, Levin, and Gianan (1995) also studied psychological distress levels among three immigrant groups (Filipino, Irish, and Polish) using the BSI. These investigators also conducted psychometric analyses on the translated versions of the BSI and reported very acceptable internal consistency coefficients, with the possible exception of the Psychoticism dimension.

Outcomes Research in Medicine

From their inception, the SCL-90-R and the BSI were designed for applications in primary care and specialized medical populations.
In terms of screening, these populations almost certainly contain the highest prevalences of occult psychiatric disorder (L. R.

Derogatis & DellaPietra, 1994), a highly relevant fact given that the outcomes of numerous medical treatment regimens can be dramatically affected by the psychological status of patients. Snyder, Lynch, Derogatis, and Gruss (1980) reported an early study with the SCL-90-R in a family practice setting. Their research showed that those patients who had significant communications problems with their physicians also demonstrated significantly higher symptom profiles on the SCL-90-R. More recently, Weidner, S. L. Connor, Hollis, and W. E. Connor (1992) used the SCL-90-R to show significant decreases in Depression and Hostility scores were associated with reductions in serum cholesterol over the course of a 5-year dietary intervention program. Working with diabetics, Irvine, Cox, and Gonder-Fredrick (1992) observed that worry over hypoglycemia and behaviors focused on avoiding this condition were clearly correlated with elevations on multiple SCL-90-R dimension scores. The SCL-90-R/BSI have been utilized extensively in oncology. Early in its development, Craig and Abeloff (1974) utilized the SCL-90-R to demonstrate clinical levels of psychological distress in cancer patients, and Abeloff and Derogatis (1977) used the scale to describe the specific psychological symptom picture of breast cancer patients. L. R. Derogatis, Abeloff, and Melisaratos (1979) employed the SCL-90-R to show that length of survival with metastatic breast disease was distinctly related to coping style, a finding also reported by Rogentine et al. (1979) with a malignant melanoma sample. More current studies include an investigation by Hannum, Geise-Davis, Harding, and Hatfield (1991) of breast cancer patients and their spouses who were tested within a year after diagnosis. Spouses' coping skills and their ratings of the quality of their marriages were the best predictors of patients' reported levels of symptomatic distress. 
Roberts, Rossetti, Cone, and Kavanaugh (1992) also used the SCL-90-R in a longitudinal study of the posttreatment levels of psychological distress in gynecologic cancer patients who had survived from 1 to 19 years. These researchers found that considerable symptomatology persisted, with mean levels on many dimension scores over the 85th percentile of the community norms. The BSI has also been used extensively with oncology populations. Baider, Peretz, and Kaplan-DeNour (1992) evaluated a heterogeneous group of cancer patients who had completed treatment, some of whom were also Holocaust survivors. Consistent with other research demonstrating the vulnerability associated with previous trauma, the Holocaust survivors revealed significantly greater distress. Gilbar (1991) also used the BSI to compare a heterogeneous group of cancer patients who completed their chemotherapy regimen to a group who terminated therapy prior to completion. Among other findings, the patients who dropped out of chemotherapy scored significantly higher on Hostility and a number of other BSI scales. Gotay and Stern (1995) provided a very useful review of SCL-90-R/BSI studies in oncology.

Sexual Victimization

Sexual abuse and victimization is a trauma and source of distress that can convey a long-standing residual emotional vulnerability. Both physical and sexual abuse, particularly during childhood, are extremely traumatic experiences that can have dramatic psychological sequelae. As examples, Kelly (1990) reported a study dealing with the stress engendered in the parents of children who have been abused. She contrasted SCL-90-R symptom profiles of parents of children who were sexually abused, a second group whose children were ritually abused in the context of cult worship, and the parents

of nonabused controls. Results showed both groups of parents of abused children displayed substantially elevated profiles, with the parents of the ritually abused children being significantly more distressed than the parents of the other abuse group. Also using the SCL-90-R, Williamson, Borduin, and Howe (1991) compared the symptomatic distress of physically and sexually abused adolescents with those who had been neglected, and adolescents who had not been maltreated in any way. The SCL-90-R showed the two abuse groups to be much more dramatically distressed than controls, with the neglect group falling in between. Swett, Surrey, and C. Cohen (1990) studied abuse histories of 125 adult psychiatric outpatients with the purpose of comparing the current symptomatic distress profiles of patients with histories of abuse to those free of abuse experiences. SCL-90-R profiles of patients with histories of sexual and/or physical abuse were significantly higher than patients without such histories. Bryer, Nelson, Miller, and Kroll (1987) also studied the abuse histories of 66 female psychiatric inpatients and linked them to score profiles of the SCL-90-R. These researchers categorized patients as not abused, sexually abused, physically abused, and sexually and physically abused. Using discriminant function analysis and childhood abuse as the independent variable, they were able to correctly assign 72.7% of patients on the basis of the SCL-90-R. They also completed a multiple regression analysis with the GSI as the dependent variable. The significant predictive variables and their respective predictive variance proportions were early sexual abuse (21.4%), alcohol abuse by father (10.2%), and early physical abuse (7.3%). The total R² was 38.9%. In addition, SCL-90-R scores for nonabused subjects were significantly below the inpatient psychiatric norm, whereas those who were both sexually and physically abused were significantly elevated on this norm.
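The regression figures reported for Bryer et al. (1987) can be checked with simple arithmetic. The sketch below assumes, as the text implies but does not state, that each percentage is the incremental share of GSI variance attributed to that predictor, so that the shares add up to the total; the variable names are illustrative only.

```python
# Variance proportions (percent of GSI variance) as reported in the text.
increments = {
    "early sexual abuse": 21.4,
    "alcohol abuse by father": 10.2,
    "early physical abuse": 7.3,
}

# Assumption: each figure is an incremental contribution, so they add up.
total_r2 = round(sum(increments.values()), 1)
unexplained = round(100 - total_r2, 1)
print(total_r2, unexplained)
```

The three shares sum to 38.9, matching the reported total, which leaves about 61% of the variance in global distress unaccounted for by these predictors.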
Finally, an extensive amount of research on sexual function/dysfunction has utilized the SCL-90-R/BSI as an outcome measure concerning comorbid psychopathology and psychological distress. L. R. Derogatis et al. (1981) reported on 325 sexually dysfunctional patients who had been evaluated at the Johns Hopkins Sexual Consultation Unit. Approximately 50% of the female patients and one third of the males received DSM-II and DSM-III psychiatric diagnoses. SCL-90-R profiles of these individuals were substantially elevated beyond the community norm, with many of them falling in the clinical range. Althof et al. (1991) also utilized the SCL-90-R as a distress measure in a clinical trial of the locally injected vasodilators papaverine/phentolamine in the treatment of erectile disorder. The SCL-90-R profile showed significant reductions from baseline distress at 3 months and 6 months posttreatment initiation. Similarly, symptomatic distress levels in response to treatment of erectile disorder with an external vacuum device were evaluated by Turner and her colleagues (1990) subsequent to 6 months of treatment. Eighty-nine percent of patients experienced success in treatment, with five SCL-90-R subscales showing significant reductions in distress. A 12-month follow-up study (Turner et al., 1991) revealed the efficacy rate holding at 87%, with six SCL-90-R measures showing significant distress reductions.

The SCL-90 Analogue and Derogatis Psychiatric Rating Scale (DPRS) in Outcomes Research

A specific advantage associated with the SCL-90-R/BSI concerns the fact that valid, matched clinical rating scales exist that may be used in conjunction with the self-report measures. If clinicians' judgments about the patient's psychological status are important

to the project of interest, then the same symptom constructs may be measured from both patient and clinician perspectives. Differences in perceptions can be accurately evaluated by comparing clinician judgments with patient self-ratings. Comparisons can be greatly facilitated by converting both sets of measurements to their respective standardized scores, thereby enabling comparisons in a common metric. As mentioned previously, the SCL-90 Analogue is a clinical observer's rating scale designed specifically for the health professional without detailed training in psychopathology or mental health. The SCL-90 Analogue is brief and uncomplicated, usually requiring less than 5 minutes to complete. In addition to representations for the nine SCL-90-R symptom dimensions, the rating scale also contains an analogue global distress scale. An example of the use of the SCL-90 Analogue scale is provided by a study done by L. R. Derogatis, Abeloff, and McBeth (1976) with a small sample of cancer patients. Shortly after admission, patients completed an SCL-90-R. Subsequently, the primary treating oncologist filled out an SCL-90 Analogue scale on the patient based on a clinical interview. Raw scores were converted to area T-scores for each patient on each measure, and doctor-patient difference scores (DTs) were calculated. Results showed that as physicians' ratings of global psychological distress rose, they tended to judge the patient to be increasingly distressed on the Interpersonal Sensitivity and Anxiety dimensions, but viewed much less distress arising from Depression than did the patient. Analyses also demonstrated that the highest subscale correlations with the physicians' independent global ratings of patient psychological distress were on Anxiety (r = .50) and Hostility (r = .48). Correlations between the physicians' global distress ratings and the patients' self-rated global scores showed only the correlation with the PSDI (r = .43) to be significant.
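The standardized-score comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the published scoring procedure: it assumes an area T-score is a normalized T-score (percentile rank in a normative sample, mapped to a normal deviate, then rescaled to mean 50 and SD 10), it takes the difference as clinician minus patient, and the two normative samples are invented purely for the example. The function name `area_t_score` and all variable names are hypothetical.

```python
from statistics import NormalDist


def area_t_score(raw, norm_sample):
    """Normalized (area) T-score of `raw` relative to `norm_sample`.

    Percentile rank (mid-rank convention) -> normal deviate -> 50 + 10z.
    """
    n = len(norm_sample)
    below = sum(1 for x in norm_sample if x < raw)
    equal = sum(1 for x in norm_sample if x == raw)
    p = (below + 0.5 * equal) / n
    # Keep p strictly inside (0, 1) so the inverse CDF is defined.
    p = min(max(p, 0.5 / n), 1 - 0.5 / n)
    return 50 + 10 * NormalDist().inv_cdf(p)


# Hypothetical normative raw scores, invented purely for illustration.
patient_norm = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1]
clinician_norm = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7]

patient_t = area_t_score(0.6, patient_norm)    # patient self-report
clinician_t = area_t_score(3, clinician_norm)  # clinician observer rating

# Doctor-patient difference score in the common T metric
# (sign convention assumed here: clinician minus patient).
difference = clinician_t - patient_t
print(round(patient_t, 1), round(clinician_t, 1), round(difference, 1))
```

Because each instrument is referred to its own norm, a positive difference here means the clinician placed the patient higher relative to the observer-scale norm than the patient placed himself or herself relative to the self-report norm. With these invented norms, a patient raw score at the median of its normative sample maps to T = 50.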
The PSDI finding in the Derogatis, Abeloff, and McBeth (1976) study indicated that oncologists were basing their judgments much more on selective indicators of distress than on numbers of manifest symptoms. In another study with cancer patients, Schleifer et al. (1991) used the SCL-90 Analogue to evaluate factors that affect oncologists' adherence to chemotherapy protocols. The sample consisted of 107 breast cancer patients who were followed for 26 weeks of treatment. Fifty-two percent of patients experienced an unjustified regimen modification. Physician perception of psychological distress was not a significant factor in modifying prescription in the majority of protocols; however, on the vincristine protocol, the global severity score and a number of SCL-90 Analogue subscale scores were significantly related to nonadherence. Steer and Hassett (1982) also used the SCL-90 Analogue to identify the differential weights assigned to various dimensions of psychopathology in arriving at staff judgments of global severity of illness. Over 1,000 mental health patients were contrasted with 809 substance abuse clients. They found that Interpersonal Sensitivity and Psychoticism were the best predictors of global severity ratings in mental health patients, whereas Anxiety and Paranoid Ideation scores accounted for most of the variance among ratings of substance abusers. The DPRS has also been utilized in a variety of interesting studies. Winokur, Guthrie, Rickels, and Nael (1982) used the DPRS as a validating instrument for patients' self-ratings of psychological distress on the SCL-90-R. Approximately 60 nonpsychiatric medical patients from two settings participated in the trial. Two psychiatrists who were completely unaware of each other's ratings and of the patients' self-reports completed all DPRS ratings. Psychiatrist-patient correlations were generally high, with Depression (r = .63), Anxiety (r = .63), and Phobic Anxiety (r = .72) showing the highest agreement.
The authors reported sensitivities for the SCL-90-R Depression scale of .91 and .89 in the two groups of patients, with specificities of .78 and .85, respectively. Perconte and Griger

(1991) used both the DPRS and the SCL-90-R to discriminate differential treatment responders among Vietnam veterans suffering from posttraumatic stress disorder. Although the investigators did not report on levels of agreement between the two instruments, both were highly successful in discriminating successful, unchanged, and relapsing patients. Similarly, Fricchione et al. (1992) used the DPRS and the SCL-90-R to evaluate high versus low deniers among patients with end-stage renal disease. DPRS subscales of Interpersonal Sensitivity, Anxiety, and Sleep Disturbance were significantly elevated among the low deniers, as were numerous SCL-90-R scales.

Conclusions

The SCL-90-R/BSI and their matching clinical rating scales represent a unique set of brief, multidimensional psychological test instruments for the assessment of psychological symptoms and psychological distress. Their successful use in hundreds of published outcomes research and clinical studies across an extremely broad spectrum of applications provides convincing confirmation of their reliability, validity, and utility. Sensitivity to pharmacologic, psychotherapeutic, and other treatment interventions, as well as to clinically meaningful variations in psychopathology and psychological distress levels, supports the use of these test instruments for both psychiatric screening and clinical outcomes measurement. The availability of the DPRS and the SCL-90 Analogue as matching clinician rating scales adds the uncommon capacity to obtain clinician ratings on the same symptom constructs that the patient reports on. An additional advantage of this series of test instruments is that the SCL-90-R and BSI are available in over two dozen languages and have been extensively utilized worldwide.

References

Abeloff, M. D., & Derogatis, L. R. (1977). Psychological aspects of the management of primary and metastatic breast cancer. In G. L. Stonsifer & E. F.
Lewison (Eds.), Breast cancer. Baltimore: Johns Hopkins University Press. Ae Lee, M., & Cameron, O. G. (1986). Anxiety, type A behavior and cardiovascular disease. International Journal of Psychiatry in Medicine, 16, 123-129. Ae Lee, M., Cameron, O. G., & Greden, J. F. (1985). Anxiety and caffeine consumption in people with anxiety disorders. Psychiatry Research, 15, 211-217. Allison, T. G., Williams, D. E., Miller, T. D., Patten, C. A., Bailey, K. R., Squires, R. W., & Gau, G. T. (1995). Medical and economic costs of psychological distress in patients with coronary disease. Mayo Clinic Proceedings, 70, 734-742. Althof, S. E., Turner, L. A., Levine, S. B., Risen, C. B., Bodner, D., Kursh, E. D., & Resnick, M. (1991). Sexual, psychological, and marital impact of self-injection of papaverine and phentolamine: A long-term prospective study. Journal of Sex and Marital Therapy, 17, 101-112. Amenson, C. S., & Lewinsohn, P. M. (1981). An investigation into the observed sex difference in prevalence of unipolar depression. Journal of Abnormal Psychology, 90, 1-13. American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author. Angst, J., & Dobler-Mikola, A. (1984). The Zurich study: The continuum from normal to pathological depressive mood swings. European Archives of Psychiatry and Neurological Sciences, 234, 21-29.

Aroian, K. J., & Patsdaughter, C. A. (1989). Multiple-method, cross-cultural assessment of psychological distress. Image: Journal of Nursing Scholarship, 21, 90-93. Aroian, K. J., Patsdaughter, C. A., Levin, A., & Gianan, M. E. (1995). Use of the Brief Symptom Inventory to assess psychological distress in three immigrant groups. International Journal of Social Psychiatry, 41, 131-146. Baider, L., Peretz, T., & Kaplan-DeNour, A. (1992). Effect of the Holocaust on coping with cancer. Social Science and Medicine, 34, 11-15. Ballenger, J. C., Burrows, G. D., Dupont, R. L., Lesser, I. M., Noyes, R., Pecknold, J. C., Rifkin, A., & Swinson, R. P. (1988). Alprazolam in panic disorder and agoraphobia: Results from a multicenter trial: 1. Efficacy in short-term treatment. Archives of General Psychiatry, 45, 413-422. Barkham, M., Rees, A., Shapiro, D., Stiles, W. B., Agnew, R. M., Halstead, J., Culverwell, A., & Harrington, V. G. (1996). Outcomes of time-limited psychotherapy in applied settings: Replicating the second Sheffield Psychotherapy Project. Journal of Consulting and Clinical Psychology, 64, 1079-1085. Beck, J. G., Stanley, M. A., Baldwin, L. E., Deagle, E. A., & Averill, P. M. (1994). Comparison of cognitive therapy and relaxation training for panic disorder. Journal of Consulting and Clinical Psychology, 62, 818-826. Beckham, J. C., Lytle, B. L., & Feldman, M. E. (1996). Caregiver burden in partners of Vietnam war veterans with Posttraumatic Stress Disorder. Journal of Consulting and Clinical Psychology, 64, 1068-1072. Bennett, S. E., & Hughes, H. M. (1996). Performance of female college students and sexual abuse survivors on the Brief Symptom Inventory. Journal of Clinical Psychology, 52, 535-541. Beutler, L. E., Engle, D., Mohr, D., Daldrup, R. J., Bergen, J., Meredith, K., & Merry, W. (1991). Predictors of differential response to cognitive, experiential, and self-directed psychotherapeutic procedures. Journal of Consulting and Clinical Psychology, 59, 333-340.
Bohachick, P. (1984). Progressive relaxation training in cardiac rehabilitation: Effect on psychological variables. Nursing Research, 33, 283-287. Boleloucky, Z., & Horvath, M. (1974). The SCL-90 rating scale: First experience with the Czech version in healthy male scientific workers. Activitas Nervosa Superior, 16, 115-116. Bryer, J. B., Borrelli, D. J., Matthews, E. J., & Kornetsky, C. (1983). The psychological correlates of the DST in depressed patients. Psychopharmacology Bulletin, 19, 633-637. Bryer, J. B., Nelson, B. A., Miller, J. B., & Krol, P. A. (1987). Childhood sexual and physical abuse as factors in adult psychiatric illness. American Journal of Psychiatry, 144, 1426-1430. Buckner, C. J., & Mandel, W. (1990). Risk factors for depressive symptomatology in a drug-using population. American Journal of Public Health, 80, 580-585. Bulik, C. M., Carpenter, L. L., Kupfer, D. J., & Frank, E. (1990). Features associated with suicide attempts in recurrent major depression. Journal of Affective Disorders, 18, 27-29. Buller, R., Maier, W., & Benkert, O. (1986). Clinical subtypes in panic disorder: Their descriptive and prospective validity. Journal of Affective Disorders, 11, 105-114. Cameron, O. G., & Hudson, C. J. (1986). Influence of exercise on anxiety level in patients with anxiety disorders. Psychosomatics, 27, 720-723. Cameron, O. G., Thyer, B. A., Nesse, R. M., & Curtis, G. C. (1986). Symptom profiles of patients with DSM-III anxiety disorders. American Journal of Psychiatry, 143, 1132-1137. Canetti, L., Shalev, A. Y., & DeNour, A. K. (1994). Israeli adolescents' norms for the Brief Symptom Inventory (BSI). Israeli Journal of Psychiatry and Related Sciences, 31, 13-18. Carey, M. P., Carey, K. B., & Meisler, A. W. (1991). Psychiatric symptoms in mentally ill chemical abusers. Journal of Nervous and Mental Disease, 179, 136-138. Carrington, P., Collings, G. H., Benson, H., Robinson, H., Wood, L. W., Lehrer, P. M., Woolfolk, R. L., & Cole, J. (1980).
The use of meditation-relaxation techniques for the management of stress in a working population. Journal of Occupational Medicine, 22(4), 221-231. Chiles, J. A., Benjamin, A. H., & Cahn, T. S. (1990). Who smokes? Why? Psychiatric aspects

of continued cigarette usage among lawyers in Washington state. Comprehensive Psychiatry, 31, 176-184. Choquette, K. A. (1994). Assessing depression in alcoholics with the BDI, SCL-90-R, and DIS criteria. Journal of Substance Abuse, 6, 295-304. Clark, L. A., & Watson, D. (1991). Tripartite model of anxiety and depression: Psychometric evidence and taxonomic implications. Journal of Abnormal Psychology, 100, 316-336. Cochran, C. D., & Hale, W. D. (1985). College students norms on the brief symptom inventory. Journal of Clinical Psychology, 41, 777-779. Coffey, P., Leitenberg, H., Henning, K., Turner, T., & Bennett, R. T. (1996). The relation between methods of coping during adulthood with a history of childhood sexual abuse and current psychological adjustment. Journal of Consulting and Clinical Psychology, 64, 1090-1093. Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press. Cohen, L. J., Test, M. A., & Brown, R. L. (1990). Suicide and schizophrenia: Data from a prospective community treatment study. American Journal of Psychiatry, 147, 602-607. Cornelius, J. R., Soloff, P. H., Perel, J. M., & Ulrich, R. F. (1990). Fluoxetine trial in borderline personality disorder. Psychopharmacology Bulletin, 26, 151-154. Coryell, W. (1988). Mortality of anxiety disorders. In R. Noyes, M. Roth, & G. D. Burrows (Eds.), Handbook of anxiety: Vol. 2. Classification, biological factors and associated disturbances (pp. 685-698). New York: Elsevier. Craig, T. J., & Abeloff, M. (1974). Psychiatric symptomatology among hospitalized cancer patients. American Journal of Psychiatry, 131, 1323-1327. Crits-Christoph, P. (1992). The efficacy of brief dynamic psychotherapy: A meta-analysis. American Journal of Psychiatry, 149, 151-158. Croog, S. H., Levine, S., Testa, M. A., Brown, B., Bulpitt, C. J., Jenkins, C. D., Klerman, G. L., & Williams, G. H. (1986). The effects of antihypertensive therapy on the quality of life. 
New England Journal of Medicine, 314, 1657-1664. Davidson, J.R.T., Kudler, H. S., Saunders, W. B., & Smith, R. D. (1991). Symptom and comorbidity patterns in World War II and Vietnam veterans with posttraumatic stress disorder. Comprehensive Psychiatry, 31, 162-170. Derogatis, L. R. (1975). The SCL-90-R. Baltimore: Clinical Psychometric Research. Derogatis, L. R. (1977). SCL-90-R: Administration, scoring and procedures manual I. Baltimore: Clinical Psychometric Research. Derogatis, L. R. (1990). SCL-90-R: A bibliography of research reports, 1975-1990. Baltimore: Clinical Psychometric Research. Derogatis, L. R. (1993). BSI: Administration, scoring and procedures manual (3rd ed.). Minneapolis, MN: National Computer Systems. Derogatis, L. R. (1994). SCL-90-R: Administration, scoring and procedures manual. Minneapolis, MN: National Computer Systems. Derogatis, L. R., Abeloff, M. D., & McBeth, C. D. (1976). Cancer patients and their physicians in the perception of psychological symptoms. Psychosomatics, 17, 197-201. Derogatis, L. R., Abeloff, M. D., & Melisaratos, N. (1979). Psychological coping mechanisms and survival time in metastatic breast cancer. Journal of the American Medical Association, 242, 1504-1508. Derogatis, L. R., Bonato, R. R., & Yang, K. C. (1968). The power of the IMPS in psychiatric drug research: As a function of sample size, number of raters, and choice of treatment comparison. Archives of General Psychiatry, 19, 689-699. Derogatis, L. R., & Cleary, P. A. (1977b). Factorial invariance across gender for the primary symptom dimensions of the SCL-90-R. British Journal of Social and Clinical Psychology, 16, 347-356. Derogatis, L. R., & DellaPietra, L. (1994). Psychological tests in screening for psychiatric disorder. In M. Maruish (Ed.), Psychological testing: Treatment planning and outcome assessment (pp. 22-54). Hillsdale, NJ: Lawrence Erlbaum Associates. Derogatis, L. R., & Derogatis, M. F. (1996). The SCL-90-R and the BSI. In B.
Spilker (Ed.), Quality of life and pharmacoeconomics in clinical trials (2nd ed., pp. 323-335). Philadelphia: Lippincott-Raven.

Derogatis, L. R., Lipman, R. S., & Covi, L. (1973). SCL-90: An outpatient psychiatric rating scale: Preliminary report. Psychopharmacology Bulletin, 9, 13-27. Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974a). The Hopkins Symptom Checklist (HSCL): A measurement of primary symptom dimensions. In P. Pichot (Ed.), Psychological measurements in psychopharmacology (pp. 79-111). Basel: Karger. Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974b). The Hopkins Symptom Checklist (HSCL): A self-report symptom inventory. Behavioral Science, 19, 1-15. Derogatis, L. R., & Melisaratos, N. (1983). The Brief Symptom Inventory: An introductory report. Psychological Medicine, 13, 595-605. Derogatis, L. R., Meyer, J. K., & King, K. M. (1981). Psychopathology in individuals with sexual dysfunction. American Journal of Psychiatry, 138, 757-763. Derogatis, L. R., Morrow, G., Fetting, J., Penaman, D., Piasetsky, S., Schmale, A. H., Hendrichs, M., & Carnricke, C. M. (1983). The prevalence of psychiatric disorders among cancer patients. JAMA, 249, 751-757. Derogatis, L. R., Rickels, K., & Rock, A. (1976). The SCL-90-R and the MMPI: A step in the validation of a new self-report scale. British Journal of Psychiatry, 128, 280-289. Derogatis, L. R., & Spencer, P. (1982). The Brief Symptom Inventory: Administration, scoring and procedures manual-I. Baltimore: Clinical Psychometric Research. Derogatis, L. R., & Wise, T. N. (1989). Anxiety and depressive disorders in the medical patient. Washington, DC: American Psychiatric Press. DeSoto, C. B., O'Donnell, W. E., Allred, L. J., & Lopes, C. E. (1985). Symptomatology in alcoholics at various stages of abstinence. Alcoholism, 9, 505-512. Dew, M. A., Simmons, R. G., Roth, L. H., Schulberg, H. C., Thompson, M. E., Armitage, J. M., & Griffith, B. P. (1994). Psychosocial predictors of vulnerability to distress in the year following heart transplantation. Psychological Medicine, 24, 929-945.
Dongier, M., Vachon, L., & Schwartz, G. (1991). Bromocriptine in the treatment of alcohol dependence. Alcoholism: Clinical and Experimental Research, 15, 970-977. Drossman, D. A., Leserman, J., Mitchell, C. M., Zhiming, M., Zagami, E. A., & Patrick, D. L. (1991). Health status and health care use in persons with inflammatory bowel disease: A national sample. Digestive Diseases and Sciences, 36, 1746-1755. Fairburn, C. G., Jones, R., Peveler, R. C., Carr, S. J., Solomon, R. A., O'Connor, M. E., Burton, J., & Hope, A. (1991). Three psychological treatments for bulimia nervosa: A comparative trial. Archives of General Psychiatry, 48, 463-469. Fleming, R., Baum, A., Gisriel, M. M., & Gatchel, R. J. (1982). Mediating influences of social support on stress at Three Mile Island. Journal of Human Stress, 14-22. Fontana, A., & Rosenheck, R. (1997). Effectiveness and cost of the inpatient treatment of posttraumatic stress disorder: Comparison of three models of treatment. American Journal of Psychiatry, 154, 758-765. Frazier, P. A., & Schauben, L. J. (1994). Stressful life events and psychological adjustment among female college students. Measurement and Evaluation in Counseling and Development, 27, 280-292. Fricchione, G. L., Howanitz, E., Jandorf, L., Krosesler, D., Zervas, I., & Woznicki, R. M. (1992). Psychological adjustment to end-stage renal disease and the implications of denial. Psychosomatics, 33, 85-91. Garfield, S. L. (1981). Evaluating the psychotherapies. Behavior Therapy, 12, 295-307. Gift, A. G. (1991). Psychologic and physiologic aspects of acute dyspnea in asthmatics. Nursing Research, 40, 196-198. Gilbar, O. (1991). The quality of life of cancer patients who refuse chemotherapy. Social Science and Medicine, 32, 1337-1340. Gilbar, O., & Kaplan-Denour, A. (1988). Adjustment to illness and dropout of chemotherapy. Journal of Psychosomatic Research, 33, 1-5. Girodo, M. (1991). Symptomatic reactions to undercover work. Journal of Mental Diseases, 179, 626-630.
Gotay, C. C., & Stern, J. D. (1995). Assessment of psychological functioning in cancer patients. Journal of Psychosocial Oncology, 13, 123-160. Grassi, L., & Rosti, G. (1996). Psychosocial morbidity and adjustment to illness among

long-term cancer survivors. Psychosomatics, 37, 523-532. Green, B. L., Grace, M. C., Lindy, J. D., Titchner, J. L., & Lindy, J. G. (1983). Levels of functional impairment following a civilian disaster: The Beverly Hills Supper Club fire. Journal of Consulting and Clinical Psychology, 51, 573-580. Griffith, E. H., Mahy, G. E., & Young, J. L. (1986). Psychological benefits of spiritual baptist "mourning": II. An empirical assessment. American Journal of Psychiatry, 143, 226. Hale, W. D., Cochran, C. D., & Hedgepeth, B. E. (1984). Norms for the elderly on the brief symptom inventory. Journal of Consulting and Clinical Psychology, 52, 321-322. Hannum, J. W., Giese-Davis, J., Harding, K., & Hatfield, A. K. (1991). Effects of individual and marital variables on coping with cancer. Journal of Psychosocial Oncology, 9, 1-20. Hauff, E., & Vaglum, P. (1994). Chronic posttraumatic stress disorder in Vietnamese refugees. Journal of Nervous and Mental Disease, 182, 85-90. Hirsch, B. J., & DuBois, D. L. (1992). The relation of peer social support and psychological symptomatology during the transition to junior high school. American Journal of Community Psychology, 20, 333-347. Horowitz, L. M., Rosenberg, S. E., Baer, B. A., Ureno, G., & Villasenor, V. S. (1988). Inventory of interpersonal problems: Psychometric properties and clinical applications. Journal of Consulting and Clinical Psychology, 56, 885-892. Horowitz, M. J., Marmar, C., Weiss, D. S., DeWitt, K. N., & Rosenbaum, R. (1984). Brief psychotherapy of bereavement reactions. Archives of General Psychiatry, 41, 438-448. Horowitz, M. J., Wilner, N., Kaltreider, N., & Alvarez, W. (1980). Signs and symptoms of posttraumatic stress disorder. Archives of General Psychiatry, 37, 85-92. Howard, K. I., Kopta, S. M., Krause, M. S., & Orlinsky, D. E. (1986). The dose-effect relationship in psychotherapy. American Psychologist, 41, 159-164. Hurley, J. R., & Cattell, (1962). 
The Procrustes program: Producing direct rotation to test a hypothesized factor structure. Behavioral Science, 7, 258-262. Irvine, A. A., Cox, D., & Gonder-Fredrick, L. (1992). Fear of hypoglycemia: Relation to physical and psychological symptoms in patients with insulin-dependent diabetes. Health Psychology, 11, 135-138. Jacobson, N. S. (1988). Defining clinically significant change: An introduction. Behavioral Assessment, 10, 131-132. Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Psychotherapy outcomes research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336-352. Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful clinical change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19. Johnson, J. G., Williams, J.B.W., Rabkin, J. G., Goetz, R. R., & Remien, R. H. (1995). Axis I psychiatric symptoms associated with HIV infection and personality disorder. American Journal of Psychiatry, 152, 551-554. Johnson, M. E., Brems, C., & Fisher, D. G. (1996). Self-reported levels of psychopathology and drug abusers not currently in treatment. Journal of Psychopathology and Behavioral Assessment, 18, 21-34. Johnstone, B.G.M., Silberfield, M., Chapman, J., Phoenix, C., Sturgeon, J., Till, J. E., & Sutcliffe, S. B. (1991). Heterogeneity in responses to cancer: Part 1. Psychiatric symptoms. Canadian Journal of Psychiatry, 36, 85-90. Judd, F. K., Burrow, G. D., Marriott, P. F., Farnbach, P., & Blair-West, S. (1990). A short-term open trial of clomipramine in the treatment of patients with panic attacks. Human Psychopharmacology, 6, 53-60. Kabat-Zinn, J., Massion, A. O., Kristeller, J., Peterson, L. G., Fletcher, K. E., Pbert, L., Lenderking, W. R., & Santorelli, S. F. (1992). Effectiveness of a meditation-based stress reduction program in the treatment of anxiety disorders. American Journal of Psychiatry, 149, 936-943. Kahn, R.
S., Westenberg, H. G., Verhoeven, W. M., Gispen-De Wied, C. C., & Kamerbeek, D. W. (1987). Effect of a serotonin precursor and uptake inhibitor in anxiety disorders: A double-blind comparison of 5-hydroxytryptophan, clomipramine and placebo.

International Clinical Psychopharmacology, 2, 33-45. Kaiser, H. E. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187-200. Karterud, S., Friis, S., Irion, T., Mehlum, L., Vaglum, P., & Vaglum, S. (1995). An SCL-90-R derived index of the severity of personality disorders. Journal of Personality Disorders, 9, 112-123. Katon, W., & Roy-Byrne, P. P. (1991). Mixed anxiety and depression. Journal of Abnormal Psychology, 100, 337-345. Katon, W., & Sullivan, M. D. (1990). Depression and chronic medical illness. Journal of Clinical Psychiatry, 15, 3-11. Katon, W., Von Korff, M., Lin, E., Lipscomb, P., Russo, J., Wagner, E., & Polk, E. (1990). Distressed high utilizers of medical care: DSM-III-R diagnoses and treatment needs. General Hospital Psychiatry, 12, 355-362. Kellner, R., Hernandez, J., & Pathak, D. (1992). Hypochondriacal fears and their relationship to anxiety and somatization. British Journal of Psychiatry, 160, 525-532. Kelly, S. J. (1990). Parental stress response to sexual abuse and ritualistic abuse of children in day-care centers. Nursing Research, 39, 25-29. Kennedy, C. A., Skurnick, J. H., Foley, M., & Louria, D. B. (1995). Gender differences in HIV-related psychological distress in heterosexual couples. AIDS Care, 7, 33-38. Kim, S. W., & Dysken, W. W. (1990). Open fixed dose trial of fluoxetine in the treatment of obsessive compulsive disorder. Drug Development Research, 19, 315-319. Kirmayer, L. J., Robbins, J. M., Dworkin, M., & Jaffe, M. J. (1993). Somatization and the recognition of anxiety and depression in primary care. American Journal of Psychiatry, 150, 734-741. Kleinman, P. H., Miller, A. B., Millman, R. B., Woody, G. E., Todd, T., Kempt, J., & Lipton, D. S. (1990). Psychopathology among cocaine abusers entering treatment. Journal of Nervous and Mental Disease, 178, 442-447. Koeter, M. W. (1992). Validity of the GHQ and SCL-90-R anxiety and depression scales: A comparative study.
Journal of Affective Disorders, 24, 271-279. Kopta, S. M., Howard, K. I., Lowry, J. L., & Beutler, L. E. (1994). Patterns of symptomatic recovery in psychotherapy. Journal of Consulting and Clinical Psychology, 62, 1009-1016. Lambert, M. J. (1994). Use of psychological tests for outcomes measurement. In M. Maruish (Ed.), The use of psychological tests for treatment planning and outcomes assessment (pp. 75-97). New York: Lawrence Erlbaum Associates. Levenson, J. L., & Collins, J. B. (1991). Sexual dysfunction, social maladjustment, and psychiatric disorders in women seeking treatment in a premenstrual syndrome clinic. International Journal of Psychiatry in Medicine, 21, 189-204. Levine, E. G., Raczynski, J. M., & Carpenter, J. T. (1991). Weight gain with breast cancer adjuvant treatment. Cancer, 67, 1954-1959. Levine, S., Anderson, D., Bystritski, A., & Barton, D. (1990). Eight HIV-seropositive patients with major depression responding to fluoxetine. Journal of Acquired Immune Deficiency Syndrome, 3, 1074-1077. Lu, L. (1994). University transition: Major and minor life stressors, personality characteristics and mental health. Psychological Medicine, 24, 81-87. Liskow, B., Powell, B. J., Nickel, E. J., & Penick, E. (1991a). Antisocial alcoholics: Are there clinically significant diagnostic subtypes? Journal of Studies on Alcohol, 52(1), 62-69. Liskow, B., Powell, B. J., Nickel, E. J., & Penick, E. (1991b). Diagnostic subgroups of antisocial alcoholics: Outcome at 1 year. Comprehensive Psychiatry, 31, 549-556. Magni, E., Frisoni, G. B., Rozzini, R., De Leo, D., & Trabucchi, M. (1996). Depression and somatic symptoms in the elderly: The role of cognitive function. International Journal of Geriatric Psychiatry, 11, 517-522. Malec, J., & Neimeyer, R. (1983). Psychologic prediction of duration of inpatient spinal cord injury rehabilitation and performance of self-care. Archives of Physical and Medical Rehabilitation, 64, 359-363. Marder, S.
R., Van Putten, T., Mintz, J., McKenzie, J., Lebell, M., Faltico, G., & May, P.R.A. (1984). Costs and benefits of two doses of fluphenazine. Archives of General Psychiatry, 41, 1025-1029. McCullough, J. P., McCune, K. J., Kaye, A. L., Braith, J. A., Friend, R., Roberts, W. C.,

Belyea-Caldwell, S., Norris, S. W., & Hampton, C. (1994). One year prospective replication study of an untreated sample of community dysthymia subjects. Journal of Nervous and Mental Disease, 182, 396-401. McGrath, P. S., Stewart, J. W., & Nunes, E. V. (1993). A double-blind crossover trial with imipramine and phenelzine for outpatients with treatment refractory depression. American Journal of Psychiatry, 150, 118-123. Mercier, C., Brochu, Girard, M., Gravel, J., Ouellet, R., & Pare, P. (1992). Profiles of alcoholics according to the SCL-90-R: A confirmative study. International Journal of Addictions, 27, 1267-1281. Messick, S. (1975). The standard problem: Meaning and values in measurement and evaluation. American Psychologist, 30, 955-966. Messick, S. (1981). Constructs and their vicissitudes in educational and psychological measurement. Psychological Bulletin, 89, 575-588. Moffett, L. A., & Radenhausen, R. (1983, August). Assessing depression in substance abusers: The SCL-90-R and Beck Depression Inventory. Paper presented at the 91st Annual Convention of the American Psychological Association, Anaheim, CA. Norris, F. H., & Kaniasty, K. (1994). Psychological distress following criminal victimization in the general population: Cross-sectional, longitudinal, and prospective analyses. Journal of Consulting and Clinical Psychology, 62, 111-123. Noyes, R., Anderson, D. J., Clancy, J., Crowe, R. R., Slymen, D. J., Ghoneim, M. M., & Hinrichs, J. V. (1984). Diazepam and propranolol in panic disorder and agoraphobia. Archives of General Psychiatry, 41, 287-292. Noyes, R., Christiansen, J., Clancy, J., Garvey, M. J., Suelzer, M., & Anderson, D. J. (1991). Predictors of serious suicide attempts among patients with panic disorder. Comprehensive Psychiatry, 32, 261-267. Noyes, R., Woodman, C., Garvey, M. J., Cook, B. L., Suelzer, M., Clancy, J., & Andersen, D. J. (1992).
Generalized anxiety disorder versus panic disorder: Distinguishing characteristics and patterns of comorbidity. Journal of Nervous and Mental Disease, 180, 369-379. Nunnally, J. (1970). Introduction to psychological measurement. New York: McGraw-Hill. Parloff, M., Kelman, H., & Frank, J. (1954). Comfort, effectiveness and self-awareness as criteria of improvement in psychotherapy. American Journal of Psychiatry, 3, 343-351. Perconte, S. T., & Griger, M. L. (1991). Comparison of successful, unsuccessful and relapsed Vietnam veterans treated for posttraumatic stress disorder. Journal of Nervous and Mental Disease, 179, 558-562. Perse, T. L., Greist, J. H., Jefferson, J. W., Rosenfeld, R., & Dar, R. (1987). Fluvoxamine treatment of obsessive-compulsive disorder. American Journal of Psychiatry, 144, 1543-1548. Pekarik, G. (1983). Improvement in clients who have given different reasons for dropping out of treatment. Journal of Clinical Psychology, 39, 909-913. Peveler, R. C., & Fairburn, C. G. (1990). Measurement of neurotic symptoms by self-report questionnaire: Validity of the SCL-90-R. Psychological Medicine, 20, 873-879. Pinneau, S. R., & Newhouse, A. (1964). Measures of invariance and comparability in factor analysis for fixed variables. Psychometrika, 29, 271-281. Plante, T. G., Manuel, G. M., Mendez, A. V., & Marcotte, D. (1995). Coping with stress among Salvadoran immigrants. Hispanic Journal of the Behavioral Sciences, 17, 471-479. Prusoff, B. A., Weissman, M. M., Klerman, G. L., & Rounsaville, B. J. (1980). Research diagnostic criteria subtypes of depression: Their role as predictors of differential response to psychotherapy and drug treatment. Archives of General Psychiatry, 37, 791-801. Quitkin, F. M., Liebowitz, M. R., Steward, J. W., McGrath, P. J., Harrison, W., Rabkin, J. G., Markowitz, J., & Davies, S. O. (1984). 1-Deprenyl in atypical depressives. Archives of General Psychiatry, 41, 777-780. Ravaris, C. L., Robinson, D. S., Ives, J.
O., Nies, A., & Bartlett, D. (1980). Phenelzine and amitriptyline in the treatment of depression. Archives of General Psychiatry, 37, 1975-1980. Rief, W., Hiller, W., Geissner, E., & Fichter, M. M. (1995). A 2-year follow-up study of patients with somatoform disorders. Psychosomatics, 36, 376-386. Robbins, L. N., Helzer, J. E., Croughan, J., & Ratcliff, K. S. (1981). National institue of




Chapter 24
Symptom Assessment-45 Questionnaire (SA-45)

Mark E. Maruish
United Behavioral Health

In chapter 1, Maruish provides an overview of the current uses of psychological testing and the potential contributions of psychological testing to clinical decision making and outcomes measurement (with the latter serving as the impetus for this chapter). The need for brief instruments to support these applications also is identified.

As has been observed, the form of assessment commonly used is moving away from the lengthy, multidimensional objective instruments (e.g., MMPI) or time-consuming projective techniques (e.g., Rorschach) that previously represented the standard in practice. The type of assessment authorized now usually involves the use of brief, inexpensive, yet well-validated problem-oriented instruments. This reflects modern behavioral health care's time-limited, problem-oriented approach to treatment. Today, the clinician can no longer afford to spend a great deal of time in assessment activities when the patient is only allowed a limited number of payer-authorized sessions. Thus, brief instruments will become more commonly employed for problem identification, progress monitoring, and outcomes assessment in the foreseeable future. (p. 32)

One way in which the need for brief measures has been successfully addressed is through a series of symptom checklists developed by Leonard Derogatis and his colleagues. This family of instruments had its beginnings with the Hopkins Symptom Checklist (HSCL; Derogatis, Lipman, Rickels, Uhlenhuth, & Covi, 1974a, 1974b). Subsequent to its development, Derogatis (1983) noted that several aspects of the HSCL limited its utility; consequently, he began work on a new symptom checklist, the Symptom Checklist-90 (SCL-90; Derogatis, Lipman, & Covi, 1973).
Clinical experience and further analyses of its psychometric properties later led to the development of the current version of the SCL-90, the Symptom Checklist-90-Revised (SCL-90-R; Derogatis, 1983, 1994; Derogatis, Rickels, & Rock, 1976). The development of the SCL-90-R was accompanied by the development of two companion clinician rating scales, the Hopkins Psychiatric Rating Scale (HPRS) and the SCL-90 Analogue Scale, as well as a much shorter (by almost half) version of the revised checklist, the Brief Symptom Inventory (BSI; Derogatis, 1992, 1993). The utility
of the BSI for both screening and outcomes assessment is attested to by the number of organizations that have chosen it for internal use (e.g., Pallak, 1994). However, cost can present a barrier to its use for many providers who routinely administer tests to their clientele.

With a need for an inexpensive, brief, multidimensional measure that could serve as a preliminary screener, treatment outcome indicator, and general purpose research tool, Strategic Advantage, Inc. (SAI), a Minneapolis-based behavioral health care outcomes assessment and consultation group, set out to develop an alternative.

The original SCL-90 was the logical source from which to derive and further develop a set of items to satisfy SAI's requirements. The professional literature supports the use of the SCL-90 as a valid and reliable measure of psychological distress that can be used for screening and the assessment of treatment outcomes. The SCL-90 is a public domain instrument that has demonstrated its suitability for use with both adults and adolescents and has gained widespread acceptance within the provider community. Moreover, as reported in the BSI manual (Derogatis & Spencer, 1982), Derogatis's earlier research with the SCL-90 (Derogatis & Cleary, 1977) had demonstrated that only a limited number of items from each of its scales was necessary to maintain the definition of the construct purportedly measured by that scale. Finally, SAI had employed the SCL-90 in outcomes consulting work for a number of years and was quite familiar with its strengths and limitations. In addition, SAI had a large number of data sets containing SCL-90 and other collateral data that could facilitate the completion of key aspects of instrument development.

Using a different approach from that used by Derogatis and his colleagues, SAI researchers selected 45 items from the SCL-90 (five items for each of the nine SCL-90 symptom domains) for inclusion in the Symptom Assessment-45 Questionnaire (SA-45).
Separate gender-based norms were developed for both adolescents and adults from both inpatient and nonpatient populations, and the requisite validity and reliability studies were completed. The SA-45 and its abbreviated version, the SA-24, have become key components of the instrumentation employed by SAI in its behavioral health care outcomes research. Having demonstrated its psychometric properties and utility, the SA-45 is now commercially available to all qualified behavioral health care providers.

Development and Use of the SA-45

The primary goal of SA-45 development efforts was to use the proven items and structure of the SCL-90 to create a brief, valid, and reliable measure of psychiatric symptomatology that could be used for the assessment of treatment outcomes. The instrument was intended to have additional utility for screening patients and tracking their progress during the course of treatment. As is demonstrated in this and the following two sections, these goals were achieved.

Summary of Development

The approach taken in developing the SA-45 was that of cluster analysis. For the purpose of initial item selection, the SCL-90 results of an inpatient sample tested at the time of admission to a large system of private psychiatric hospitals were used. This sample was described in Davison et al. (1997) as their inpatient intake sample, but hereafter
it is referred to as the development sample. It consisted of 690 adult females, 829 adult males, 466 adolescent females, and 400 adolescent males. To examine the structure of the symptom domains, items were intercorrelated and Ward's (1963) method of cluster analysis was applied to the correlation matrix. A nine-cluster solution was forced, with each cluster containing five items. Based on these findings, nine scales matching the symptom domain scales of the SCL-90 were constructed, each incorporating the five items from the respective parent SCL-90 scale that were identified through the cluster analytic procedures.

Subsequent cluster analyses were performed on five subgroups (adult inpatients, adolescent inpatients, adult and adolescent female inpatients, adult and adolescent male inpatients, and adult and adolescent nonpatients) in order to examine the degree to which items clustered according to expectations (Davison et al., 1997). The item response sets needed for this study were extracted from existing SCL-90 data sets. The number of hits (i.e., items that clustered according to expectation) ranged from 35 (78%) of the 45 items using nonpatient data to 43 (96%) of the items using adult inpatient intake data. In comparison, cluster analyses of SCL-90 intake data and BSI item responses extracted from that same SCL-90 data yielded hits for 51 (61%) of the 83 scored SCL-90 items, and for 42 (86%) of the 49 scored BSI items.

SAI's experience with the SCL-90 and BSI indicated that two summary indices found on both instruments, the Positive Symptom Total (PST) and the Global Severity Index (GSI), were useful as descriptors of overall level of psychopathology or symptomatology. The PST index is the total number of symptoms reported by the respondent to be present to any degree during the previous 7 days.
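The two summary indices can be illustrated with a short sketch. It is only a minimal illustration and not SAI's scoring procedure. It assumes that the 45 item responses are coded 1 to 5, that a response of 1 means the symptom was not present at all, and that the GSI is taken as the plain average of the item responses. The function names are hypothetical.

```python
def positive_symptom_total(responses, absent_value=1):
    """Count the items endorsed to any degree.

    Assumption for this sketch: a response equal to `absent_value`
    (here 1) means the symptom was not present.
    """
    return sum(1 for r in responses if r > absent_value)


def global_severity_index(responses):
    """Average item response across all answered items."""
    return sum(responses) / len(responses)


# Hypothetical protocol: 40 items rated 1, five items rated 2 to 5.
responses = [1] * 40 + [2, 3, 4, 5, 2]
print(positive_symptom_total(responses))
print(round(global_severity_index(responses), 2))
```

The sketch assumes a complete protocol; the handling of omitted items is a separate matter.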
The GSI represents the average item response value (ranging from 1 to 5) for all items on the SA-45 and thus provides a good indication of the respondent's overall level of distress or disturbance.

Normative Group Data

Normative data for the SA-45 items were extracted from SCL-90 data sets gathered on groups of 748 adult females, 328 adult males, 321 adolescent females, and 293 adolescent males. These nonpatient samples included employees of a large, national behavioral health care company and their family members, along with approximately 300 adolescents from a midwestern suburban high school. In calculating the mean and standard deviation for each SA-45 scale and index, cases in which one or more item responses for a given scale or index were missing (i.e., omitted or for which more than one response was indicated) were not included in the data for that scale or index. Thus, means and standard deviations for the 11 scales and indices are based on 714 to 748 adult females, 312 to 328 adult males, 302 to 321 adolescent females, and 293 adolescent males, depending on the scale or index being considered.

Recognizing that being able to compare a patient's results to those of inpatients would enhance the interpretation of results, SCL-90 data sets for groups of adult and adolescent inpatients were rescored to arrive at raw scores for each of the 11 SA-45 scales and indices. These groups included 5,317 adult females, 5,854 adult males, 2,889 adolescent females, and 2,331 adolescent males who were administered the SCL-90 at the time of admission to inpatient facilities for behavioral health treatment. The SA-45 development sample was included in these samples. Cases in which one or more items for a given scale or index were missing were excluded from the mean raw score calculations for the 11 SA-45 scales and indices. Thus, the inpatient means and standard deviations for the scales and indices are
based on 4,732 to 5,300 adult females, 4,753 to 5,276 adult males, 2,424 to 2,715 adolescent females, and 1,935 to 2,196 adolescent males.

Development of Area T-Score and Percentile Conversions

Standard scores were developed for the same nonpatient normative sample reported in Davison et al. (1997). These scores were developed by first calculating the frequency of the score distributions of the SA-45's nine symptom domain scales and the PST and GSI indices for the four age-by-gender nonpatient samples. These calculations resulted in a total of 44 score distributions that needed to be modeled. Although each sample was large, the number of cases observed at one end of the distribution often was low, resulting in possible instability in that area of the distribution. A smoothing function was applied to the sample data to adjust for this instability, and a family of nonlinear functions was identified that fit each of the given sample distributions. One of two nonlinear functions was able to fit the data, depending on whether the distribution was skewed to the right or left. The models were fit to the sample data using a quasi-Newton search algorithm and the maximum likelihood method. The resulting models were used to estimate a frequency distribution for each scale and index score for each age-by-gender sample. Using the estimated frequency distribution, the raw scores were transformed to area T-scores for the four nonpatient populations. This same procedure also was applied to the inpatient normative sample data to arrive at raw score-to-inpatient area T-score conversions.

Area T-scores, rather than linear T-scores, were selected for use in interpreting SA-45 results. Employment of the more commonly used linear T-scores assumes that the characteristic or construct being measured by the instrument is normally distributed, an assumption that one cannot make regarding psychopathological characteristics.
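The difference between the two kinds of T-score can be made concrete with a small sketch. It is a simplified illustration under stated assumptions and not the procedure used for the SA-45 norms: it works from the raw empirical cumulative percentage of a sample (counted here as the percentage at or below a score) rather than from smoothed, model-fitted distributions, and the sample values and function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def percent_at_or_below(sample, raw):
    """Percentage of the sample scoring at or below `raw`."""
    return 100.0 * sum(1 for s in sample if s <= raw) / len(sample)


def area_t_score(percentile):
    """Normalized (area) T-score for a percentile strictly between 0 and 100."""
    z = NormalDist().inv_cdf(percentile / 100.0)
    return 50.0 + 10.0 * z


def linear_t_score(raw, sample):
    """Linear T-score: a rescaled z-score that keeps the sample's shape."""
    return 50.0 + 10.0 * (raw - mean(sample)) / pstdev(sample)


# Hypothetical, positively skewed raw-score sample (most respondents score low).
sample = [5] * 60 + [6] * 20 + [8] * 10 + [12] * 6 + [20] * 4
raw = 8
area = area_t_score(percent_at_or_below(sample, raw))
linear = linear_t_score(raw, sample)
print(round(area, 1), round(linear, 1))
```

In a skewed sample such as this one, the two conversions give noticeably different values for the same raw score, which is why a percentile cannot be read directly from a linear T-score.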
Area T-scores have the effect of normalizing the distribution of scores and permitting accurate percentile determination. For example, an area T-score of 60 (i.e., one standard deviation above the mean) is equivalent to the 84th percentile. A linear T-score of 60 equates to the 84th percentile only when the distribution is normal; otherwise, it represents only a rough approximation.

Related SA-45 product offerings include software for online administration and automated scoring of the instrument. Development of this software required the transformation of the normative conversion data from a tabular form to a more compressed form for economical storage. To accomplish this, polynomial functions of degree five (for the nine domain scale scores and the PST score) or degree seven (for the GSI score) were developed to fit the area T-score for each nonpatient normative subsample. Quasi-Newton search algorithms that minimized the squared error were used to fit the polynomial functions, thus yielding a more efficient means of storing the SA-45 norms tables.

Another way in which the SA-45's scored data conversions differ from the standard is the manner in which percentiles have been computed. Whereas a percentile is commonly computed and interpreted to indicate the percentage of the normative sample that obtained a score lower than the score being referred to, an SA-45 percentile indicates the percentage of the normative sample that obtained scores equal to or lower than that score. Thus, 84% of the relevant age- and gender-specific SA-45 normative group scored equal to or lower than anyone obtaining a scale or index raw score that equals a
percentile of 84 (or area T-score of 60). At the same time, 16% of that same normative group scored higher than that person on that particular scale or index.

With scales composed of only five items each, the omission of even one SA-45 item means that the corresponding item pool of the symptom domain scale is reduced by 20%. As a result, a simple method needed to be developed that could be used to estimate the values of missing SA-45 items and thus provide an accurate estimate of the total raw score for a given scale or index. It was decided to limit these efforts to estimating a missing item's response value when it is the only response value missing in the scale. Similarly, the investigation of methods for estimating the total raw score for the PST and GSI indices was limited to those cases in which the total number of missing item response values is 10 or fewer for the PST index and 11 or fewer (approximately one quarter of the total SA-45 item pool) for the GSI.

Stepwise linear regression methods were used to develop an equation for predicting or estimating the value for each SA-45 item when it is the only item missing from its parent scale. In estimating the total raw score of the PST index and the GSI when 10 or fewer and 11 or fewer items, respectively, are missing from the SA-45 test protocol, the mean value replacement method was found to represent the optimal solution for arriving at the predicted total raw scores for the two summary indices. These means of correcting for missing items were later found to yield raw scores that generally correlate at about .98 with actual scores across age and gender groups.

Psychometric Considerations

The psychometric integrity of any psychological test, rating scale, or related measure or procedure is reflected in two broad constructs: reliability and validity. Reliability refers to the extent to which an instrument is consistent in what it measures.
Validity refers to the degree to which an instrument measures what it purports to measure.

Reliability. The internal consistency reliability of each of the nine symptom domain scales was evaluated using Cronbach's coefficient alpha for each of four adult samples and four adolescent samples (Davison et al., 1997). The coefficients for the adult samples were computed from the results of 1,471 to 1,498 mental health or chemical dependency inpatients who took the SCL-90 at the time of treatment intake, 1,003 to 1,017 of the intake patients who took the SCL-90 again at treatment termination, 938 to 951 of the intake patients who took the SCL-90 again 6 months following treatment termination, and 1,077 to 1,085 nonpatients. For adolescents, coefficients were computed from the results of 827 to 858 mental health or chemical dependency inpatients who took the SCL-90 at the time of treatment intake, 598 to 605 of the intake patients who took the SCL-90 again at treatment termination, 565 to 571 of the intake patients who took the SCL-90 again 6 months after treatment termination, and 610 to 619 nonpatients.

The alpha coefficients for the adult samples ranged from .71 for the Psychoticism scale for the Follow-Up sample to .92 for the Depression scale for the Termination sample. For the adolescent samples, the alphas ranged from .69 for the Psychoticism scale for both the Termination and Follow-Up samples to .90 for the Depression scale for the Intake sample. In general, the SA-45 coefficients are comparable to those for the BSI, but the SCL-90 coefficients are greater than both. This latter finding is not surprising, given that reliability generally increases with increased test length and the SCL-90 is approximately twice as long as both the SA-45 and BSI.
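For readers who want to see how these quantities are computed, the sketch below shows the standard formula for Cronbach's coefficient alpha, together with the Spearman-Brown formula, which is one common way of expressing how reliability is expected to rise with test length. The Spearman-Brown formula is brought in here for illustration and is not cited in the studies above; the response data and function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents' item responses.

    `rows` holds one list of item scores per respondent; all lists
    must have the same length (the number of items, k).
    """
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened by `length_factor`."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical responses of six people to one five-item scale (coded 1 to 5).
rows = [
    [1, 1, 2, 1, 1],
    [2, 2, 2, 3, 2],
    [3, 2, 3, 3, 4],
    [4, 4, 3, 5, 4],
    [1, 2, 1, 1, 2],
    [5, 4, 5, 4, 5],
]
print(round(cronbach_alpha(rows), 2))
print(round(spearman_brown(0.80, 2), 2))  # doubling the length of a test with r = .80
```

The second call illustrates the general point made above: under the Spearman-Brown assumptions, a test twice as long as one with a reliability of .80 would be expected to have a higher coefficient.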

In another examination of internal consistency, each of the SA-45's items was correlated with the total raw score for each of the nine symptom domain scales. In those instances in which an item was correlated with the scale of which it is a member, that item was removed from the scale before the scale score and correlation were calculated. For the combined development sample (2,442 male and female adult and adolescent psychiatric inpatients), it was determined that the highest correlations of 42 of the 45 items (93%) were with the scale to which each item belongs. For 19 of these 42 items (45%), the correlations were at least .10 greater than the correlation of these items with any other scale. For a cross-validation sample (13,550 male and female adult and adolescent psychiatric inpatients), 43 of the 45 items (96%) correlated highest with their parent scales, with correlations for 18 (42%) of these 43 items being at least .10 greater than their correlation with any other scale. Overall, the results of these two analyses compare quite favorably to those obtained by Boulet and Boss (1991) in a similar analysis of the BSI items.

Adult, nonpatient test-retest study data were gathered on 15 males and 42 females who were not receiving any behavioral health care services at the time of testing. The study employed a 1- to 2-week retest interval. The raw score-based correlations are generally in the .80s, with notable exceptions for the Somatization scale (.69) and Anxiety scale (.42). One possible explanation for these findings is that the Anxiety scale items are sensitive to variations in common, everyday experiences (e.g., "Feeling tense or keyed up," "Feeling so restless you couldn't sit still"). Similar sensitivities might also be operating with some of the Somatization items (e.g., "Soreness in your muscles").
Overall, these findings are somewhat lower but generally in line with the BSI 2-week test-retest reliabilities reported by Derogatis (1993) for a group of 60 nonpatients.

In a similar study, the SA-45 was administered to 48 adolescent males and 16 adolescent females, and then readministered 1 to 2 weeks later. The raw score-based correlations are quite variable, ranging from .51 for the Hostility scale to .85 for the Psychoticism scale. Consistent with the adult findings, the Anxiety scale coefficient (.58) is the next to the lowest of the coefficients. The area T-score-based correlation coefficients generally show only slight variations from those reported for the raw scores. Area T-scores remain relatively stable from the first to the second testing, dropping an average of only 1.12 points for the nine symptom domain scales and 2.27 points for the two summary indices.

SA-45 test-retest reliability coefficients also were computed for combined-gender adult, adolescent, and combined age group inpatient psychiatric samples retested at 1-, 2-, and/or 3-week intervals. Overall, moderate level correlations were obtained for all three age groups. In general, these correlations are consistent with what might be expected for a brief symptom measure that is administered to a psychiatric inpatient sample over the three time intervals.

Another way of expressing the reliability of a given measure is through its standard error of measurement (SEM). With the exception of the Somatization scale findings for adolescents, the adult and adolescent area T-score SEMs for the nine domain scales and two indices computed using the 1- to 2-week, nonpatient test-retest reliability coefficients do not exceed five T-score points.

Validity. The SA-45's construct validity has been demonstrated through various approaches. One approach was an investigation of the instrument's interscale correlations.
Using the SCL-90 item responses of more than 1,300 adult inpatients, the SA-45 interscale correlations were found to range from .38 between the Phobic Anxiety and
Hostility scales, to .75 between the Interpersonal Sensitivity and Depression scales, suggesting a substantial degree of shared variance (14% to 56%) and a lack of clear independence among these nine scales. Similar analyses were conducted on the interscale correlations for the inpatient adolescent sample (N = 770+), resulting in findings that were similar to those for the adult sample. Additional analyses of the same SCL-90 data indicated that the SA-45 scales are statistically more distinct than those in the SCL-90 for both adults and adolescents; and with one exception, the distinction between the SA-45 scales is equal to or better than that for the BSI for both age groups. An instrument developed to assess the presence and intensity of normal personality or psychopathological constructs should yield results that differentiate groups with varying degrees of those constructs. In the case of psychiatric inpatients, it would be expected that they would report more severe symptomatology at the time of admission than at the time of discharge or several months thereafter. It also would be expected that nonpatients would report less symptomatology than psychiatric inpatients at the time of admission, and also to report a level of symptomatology that would be no more (and probably less) than inpatients at the time of their discharge and on postdischarge follow-up. Results reported by Davison et al. (1997) generally revealed the expected group differences for adults. The results were somewhat different for the male and female adolescent subsamples but generally supported the SA-45's ability to discriminate among groups of different symptom severity levels. Related to the contrasted group comparisons is the SA-45's ability to accurately classify a respondent as belonging or not belonging to inpatient and nonpatient samples (i.e., sensitivity and specificity) using a single score or set of scores yielded by the instrument. 
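The two classification statistics just introduced can be illustrated with a short sketch. It shows only the standard definitions of sensitivity and specificity computed from classification counts; the function name and the counts are hypothetical and are not SA-45 data.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from classification counts.

    Sensitivity: proportion of actual inpatients classified as inpatients.
    Specificity: proportion of actual nonpatients classified as nonpatients.
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical counts chosen only for illustration (100 inpatients, 100 nonpatients).
sens, spec = sensitivity_specificity(true_pos=87, false_neg=13,
                                     true_neg=87, false_pos=13)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Moving a cutoff typically trades one statistic against the other, which is why separate cutoffs can be set to favor sensitivity, to favor specificity, or to balance the two.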
In establishing cutoffs for maximized sensitivity and specificity, a 90% rate of correct classification of inpatients and nonpatients, respectively, was used in order to match the prevalence or base rate of inpatients within the total available sample. Preliminary findings reveal that the use of scores from a subset of the SA-45 scales in a derived logistic regression equation is superior to the GSI score alone for classification purposes. Analyses revealed relatively high sensitivity and specificity values for both adult gender samples with the female values (.87 and .87, respectively) being somewhat higher than those for the males (.78 and .86, respectively) when optimized classification cutoffs are applied. For the two adolescent groups, the values for optimized classification showed a substantial drop, with the sensitivity and specificity values being .73 and .69, respectively, for the females and .57 and .68, respectively, for the males.

In order to cross-validate the item composition of the nine symptom domain scales, SA-45 item responses were extracted from SCL-90 intake data for four groups of psychiatric inpatients and submitted to the same cluster analytic procedures used to derive the scales. The four samples consisted of 8,459 adults, 3,793 adolescents, 6,110 males, and 6,142 females. For each group, the number of correct classifications was as follows: 44 (98%) for the adult patient group; 32 (71%) for the adolescent patient group; and 43 (96%) for each of the male and female patient groups. Overall, the findings support the cluster solution originally derived for the SA-45. However, as has been found in other investigations, the psychometric data for adolescents are not as strong as those for adults.

Because the SA-45 items were derived from the SCL-90 in a manner that retained the structure and representativeness of the symptom domains of the parent instrument, the SA-45's scales and global indices should correlate highly with those of the SCL-90.
In order to demonstrate this, the SCL-90 results of the adult and adolescent inpatient development samples were scored using standard SCL-90 scoring procedures and then
rescored to obtain SA-45 data. The correlations between the scales and indices of the two instruments for these large samples of adult and adolescent inpatients generally were found to be .95 or higher. The notable exception is the correlation for the Psychoticism scale (.88-.90). Remember that the relation between the two sets of scales was probably maximized owing to the fact that the SA-45 data were derived from the same SCL-90 data with which it was correlated.

Thirty-five of the SA-45's items are identical to items scored on the BSI; thus, one would expect scales from these two brief symptom measures also to be highly correlated. Because all scored BSI items are contained in the SCL-90, scores for the nine BSI scales were derived from the SCL-90 data sets used for the SA-45 development. Results similar to those found for the SA-45/SCL-90 correlations were found for the SA-45/BSI correlations. Again, differences in content likely account for the relatively low correlation between the two Psychoticism scales, and the fact that SA-45 and BSI data were derived from the same SCL-90 data likely maximized the obtained correlations.

The SA-45's content validity can be examined through its item-total scale correlations. As indicated earlier, each of the SA-45 items demonstrates its strongest relationship with the scale to which it belongs. All of these correlations are higher than the .30 to .50 range that Reynolds (1991) considered substantial when evaluating the content validity of an instrument. Two additional findings also support the content validity of the SA-45. First, the content of each scale's five items reflects symptoms that are pathognomonic or commonly associated with the broad group of disorders suggested by the scale's title. The possible exception lies with the Phobic Anxiety scale.
Also, the high correlations between the SA-45 and SCL-90 scales and indices lend support for the content validity of the SA-45 for the purpose for which it was developed.

Comparability of the Stand-Alone and SCL-90 Extracted Versions of the SA-45

The majority of the data reported for the SA-45 were extracted from or based on SCL-90 data sets of adult and adolescent patient and nonpatient groups. Consequently, an important consideration in evaluating these data is the degree to which SA-45 results obtained from the administration of SA-45 items presented by themselves are comparable to those that would be obtained if they were based on item responses given as part of an SCL-90 test administration.

One method of determining whether the stand-alone version and the SCL-90 extracted version of the SA-45 are comparable would be to examine the extent to which the psychometric characteristics of one version, derived from the results of one sample, are similar to those of the other version administered to a second sample from the same population. SAI used this approach because it had access to the first set of SA-45 stand-alone data on a large group of psychiatric inpatients being treated within the same hospital system from which the development and validation data were obtained. Despite the fact that the available data did not permit sample matching on the important demographic, diagnostic, and treatment variables, the data obtained from the administration of the stand-alone version yielded mean scale/index scores, interscale correlations, item-total scale correlations, and alpha coefficients that are quite consistent with those reported for the SCL-90 extracted version. However, caution in the use of the PST index is suggested, and results of the cluster analyses performed using stand-alone data were somewhat mixed, possibly owing to the particular sample used. Moreover, the
alpha coefficients for nonpatient samples, particularly for the adolescent sample, generally were lower than those found using SCL-90 extracted data for similar samples. The reason for this is unclear, although the small size and/or particular composition of each of the two nonpatient samples may account for the differences.

Overall, preliminary comparisons of the SCL-90 extracted and stand-alone versions of the SA-45 have supported the comparability of the two versions, at least for adults. For adolescents, the initial findings suggest that the stand-alone version may not be as reliable as it is for adults. As with any other instrument for which an updated or alternate version has been developed, the exact degree of comparability of the stand-alone and SCL-90 extracted versions will become clear as the results of other investigations around this issue begin to appear in the professional literature.

Basic Interpretive Strategy

The SA-45 was designed to serve as a measure of treatment outcome for psychiatric populations. It also can provide information that is important in identifying and monitoring significant psychological problems in day-to-day clinical work. However, clinicians must be aware of the SA-45's limitations, particularly if it is used for diagnostic or treatment planning purposes. It is a brief instrument that is not inclusive of all possible psychiatric symptomatology. Thus, the SA-45 generally should be used as only one source of information about the patient. When combined with information obtained from other psychological tests, patient and collateral interviews, and a review of medical records or other historical information, the SA-45 can assist the clinician in screening for the need for behavioral health care services, arriving at a diagnosis, formulating a treatment plan, and monitoring patient progress during treatment.
The SA-45 may be used in many types of settings for various purposes related to the measurement of psychological/psychiatric symptomatology that respondents may be experiencing at a given point in time. However, interpretation of an individual's SA-45 results begins with the assumption that the SA-45 is an appropriate instrument to administer to that individual. Factors related to the development of the SA-45 mandate that the instrument be administered only to those individuals who meet all of the following criteria: are 13 years of age or older; read at the sixth-grade level or higher; and are not experiencing a level of distress or agitation that would likely impair their ability to indicate valid answers to all items. (Note that the SA-45 may be administered when an individual's distress has subsided to a level that likely will not interfere with their ability to provide an accurate assessment of current symptomatology.)

At the core of the SA-45 data are the area T-scores and percentiles derived from the nonpatient normative data for the 11 symptom domain scales and summary indices. The area T-score provides a measure that is useful in determining the presence of significant problem areas. Separate nonpatient norms are available for use with male and female adult and adolescent groups. As a general rule, an area T-score of 60 or greater on a given scale or index (i.e., one standard deviation above the nonpatient normative group's average area T-score of 50) indicates a problem area warranting further investigation.

The SA-45 nonpatient percentiles can provide additional descriptive information. For example, an SA-45 percentile of 87 (i.e., 87th percentile) on the Depression scale means that 87% of the nonpatient age- and gender-matched normative sample obtained
a score equal to or lower than that of the respondent. Conversely, it also means that only 13% of that same nonpatient normative sample scored higher than the respondent.

SA-45 area T-scores and percentiles based on a mixed psychiatric inpatient sample also are available. Like their nonpatient counterparts, the inpatient area T-scores have a mean of 50 and a standard deviation of 10. Also, an area T-score of 60 or higher (or a percentile of 84 or higher) is considered significant. This additional information enables the clinician to compare the respondent's SA-45 results to both nonpatient and inpatient reference groups. This can be quite useful, particularly when evaluating a respondent with known psychological problems. The combination of these two sets of findings may have implications in triaging a person to the appropriate level of care, arriving at a diagnosis, or planning treatment should inpatient psychiatric treatment be indicated.

With this basic information in mind, a five-step approach to the interpretation of the SA-45 is recommended.

Step 1: Assess the General Validity of the Results. The nature of the SA-45 and other symptom checklists ("obvious" items with no "subtle" items) makes it relatively easy for the respondent to overreport ("fake bad") or underreport ("fake good") the presence of symptoms to just about any desired degree. And, like many symptom checklists, there currently are no empirically derived special scales or indices to detect the validity of the test-taker's responses to the SA-45 items. However, on a rational basis, the presence of any of the following conditions should lead to questions about the validity of the SA-45 profile:

Unusually quick completion time: Three minutes, that is, an average time of 4 seconds to read and respond to each of the 45 items, can probably serve as a useful, minimum completion time for the SA-45.
Thus, completion of the instrument in less than 3 minutes suggests that the respondent likely has not carefully attended and/or responded to the SA-45 items.

Unusually slow completion time: An SA-45 completion time of 30 minutes or more is much longer than is generally encountered in clinical settings. Long completion times may be the result of any of several factors, such as poor reading skills, obsessive rumination over the meaning of specific items and/or how to respond to them, or poor concentration, any of which may have interfered with the respondent's ability to accurately report how frequently they have recently experienced the listed psychological symptoms.

Missing items: The SA-45 manual (SAI, 1997) provides instructions for correcting the raw score of any of the nine symptom domain scales for which only one response to that scale's items is missing, the PST raw score when 10 or fewer of the 45 responses are missing, and the GSI raw score when 11 or fewer of the 45 responses are missing. Otherwise, depending on the number and scales to which the missing items belong, relevant symptom domain scales, the PST, and/or the GSI should not be interpreted.

Patterned responding: Visual inspection of the SA-45 answer form or computer-generated report may reveal that test-takers entered their responses in a questionable manner. On the one hand, consider the case in which the same response (e.g., "A little bit") is given to every item. It is highly unlikely, particularly in clinical settings, that a person would experience each of the SA-45 symptoms to the same degree during the past 7 days. On the other hand, consider another type of patterned responding in which the sequence of responses, beginning with Item 1, was 0, 1, 2, 3, 4, 3, 2, 1, 0, 1, and so on, or was 0, 1, 0, 1, 0, and so on, through Item 45. The probability that valid responding would yield these types of response patterns would seem to be very low.
Their presence therefore should lead to questions about the validity of the obtained profile.

Results inconsistent with presentation: Other causes for concern are SA-45 profiles that appear inconsistent with the respondent's clinical presentation or with other evaluation data (e.g., results from other abnormal personality measures). For example, it would be highly unlikely that
individuals taking the SA-45 at the time of their admission to a psychiatric inpatient unit would obtain a profile in which all T-scores are below 60 (i.e., indicating no significant psychological distress). Similarly, it is improbable that a high-functioning CEO of a large corporation would obtain a profile in which the T-score for every scale and index is elevated above 70.

Step 2: Evaluate Overall Level of Symptom Distress. Assuming a valid profile, begin the interpretation of SA-45 results with an evaluation of the respondent's overall level of symptom distress or disturbance by noting the nonpatient area T-scores and percentiles for both the PST index and GSI. Each indicates the respondent's overall level of distress and disturbance, one through a count of the number of symptoms reported to have been present to any degree during the previous 7 days (PST) and the other through the average intensity level of the 45 listed symptoms (GSI). A nonpatient norms-based GSI and/or PST area T-score of 60 or higher, or a percentile of 84 or higher, suggests that the number of symptoms respondents are reporting (PST) and/or the intensity at which they are experiencing them (GSI) is significant and warrants further investigation. The investigation may take the form of a psychosocial or diagnostic interview with the respondent, collateral interviews, more extensive psychological testing, or a combination of approaches. In interpreting percentiles, keep in mind that an SA-45 percentile equivalent indicates the percentage of the normative sample who obtained a score equal to or lower than that score.

Step 3: Evaluate Area T-Scores and Percentiles for the Nine Symptom Domain Scales. Examine the nonpatient area T-scores and percentiles for each of the symptom domain scales.
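The paired thresholds used in these steps (an area T-score of 60 and a percentile of 84) are two views of the same cutoff. The sketch below shows the correspondence under the assumption that area T-scores are normalized to a normal distribution with a mean of 50 and a standard deviation of 10. The function names are hypothetical, and actual SA-45 conversions should be read from the norm tables in the manual rather than computed this way.

```python
from statistics import NormalDist


def area_t_to_percentile(t_score, mean=50.0, sd=10.0):
    """Percentage of a normal reference group scoring at or below t_score.

    Assumes normalized (area) T-scores; this is an idealization, not the
    manual's lookup table.
    """
    return 100.0 * NormalDist(mean, sd).cdf(t_score)


def percentile_to_area_t(percentile, mean=50.0, sd=10.0):
    """Inverse mapping: the T-score at a given percentile (0 < percentile < 100)."""
    return NormalDist(mean, sd).inv_cdf(percentile / 100.0)


print(round(area_t_to_percentile(60)))     # percentile matching T = 60
print(round(percentile_to_area_t(84), 1))  # T-score matching the 84th percentile
```

Under this assumption, a T-score of 60 falls at roughly the 84th percentile, which is why the two thresholds are quoted together.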
Scale elevations based on nonpatient norms, viewed either alone or in comparison with other scales, can be useful in identifying problem areas that are present and have contributed to any elevation on either or both the GSI and PST index. Scale elevations also can help in developing a treatment plan. Again, a symptom domain scale area T-score of 60 or higher, or a percentile of 84 or higher based on nonpatient norms, suggests a likely problem area and the need for further investigation.

In evaluating scale elevations, it is important to know exactly what each scale is assessing. Following is a summary of the content of each of the nine symptom domain scales:

Anxiety (ANX): Items from this scale inquire about symptoms related to fearfulness, panic, tension, and restlessness.

Depression (DEP): This scale consists of items asking about recent experiences with feelings of loneliness, hopelessness, and worthlessness. Other symptoms that are assessed include a loss of interest in things and feeling blue.

Hostility (HOS): A number of hostility-related symptoms are found on this scale. They include having uncontrollable temper outbursts, getting into frequent arguments, shouting, and feeling urges to harm others or to break things.

Interpersonal Sensitivity (INT): The respondents' symptomatic feelings about themselves in relation to others are assessed here. These include feeling inferior or self-conscious around others, feeling that others are unsympathetic or unfriendly, and feeling uneasy when others are talking with or watching the respondent.

Obsessive-Compulsive (OC): Difficulty in concentrating or making decisions, repetitive checking or doing tasks slowly to ensure correctness, and problems with one's mind "going blank" are obsessive-compulsive symptoms presented on this scale.

Paranoid Ideation (PAR): Some of the subtler forms of paranoid thinking are assessed on this scale, such as feeling that others take advantage of the respondents, cannot be trusted, are
responsible for their troubles, fail to give credit for their achievements, and watch or talk about them.

Phobic Anxiety (PHO): On this scale, the respondents are asked to rate their recent experiences with fear or uneasiness when being in open spaces and crowds, using public transportation, and leaving home alone. This scale also asks about avoidance of specific places, things, and activities.

Psychoticism (PSY): A number of symptoms of disordered thinking are queried here. These include auditory hallucinations, feelings that others know or are controlling one's thinking, and ideas that one should be punished for one's sins.

Somatization (SOM): The presence of rather vague physical symptoms is assessed here, including hot or cold spells and feelings of numbness, soreness, tingling, and heaviness in various parts of the body.

Step 4: Evaluate Individual Item Responses. Examine the individual items to which the respondent indicated a response other than "Not at all" to attain additional interpretive information. This, along with scale composition information from the manual (SAI, 1997), will enable clinicians to determine which symptoms reported on the SA-45 contribute to each scale's overall score. In evaluating individual item responses, they will also obtain a more detailed picture of current symptoms, which can assist in developing specific goals for therapeutic work.

Individual item responses warrant special scrutiny in those cases in which the only "significant" scale elevations are either at or only slightly above a T-score of 60. This is particularly important if the results are being used to screen for or otherwise classify individuals as having psychological problems that require further evaluation. The metrics of the SA-45 are such that the report of relatively minor problems can result in these types of mild elevations on some of the scales. For example, a raw score of 8 on the Anxiety scale is transformed to an area T-score of 60 for adult males.
Although the 8 raw score indicates that the test-taker may have responded "A little bit" to three of the five Anxiety scale items (and "Not at all" to the other two Anxiety scale items), identification of individuals with relatively mild symptoms may not be of concern to the clinician or researcher. However, the individual who obtains an Anxiety scale raw score of 8 by indicating that they have been bothered "Quite a bit" by "Spells of terror and panic" during the past 7 days (and "Not at all" to the other four Anxiety scale items) would probably be an appropriate candidate for further evaluation or classification.

Step 5: Compare Results with Inpatient-Based SA-45 Norms. As already indicated, additional SA-45 interpretive information is available when SA-45 raw scores are converted to area T-scores based on inpatient norms. This additional information allows clinicians to determine severity of symptomatology relative to a group of patients who, by definition, are experiencing significant problems. This type of comparison probably has its greatest utility for respondents who are receiving inpatient treatment, but it also may be useful in other situations.

Use of the SA-45 For Determining Group Membership. SA-45 research indicates that the GSI can be useful in accurately classifying inpatient and nonpatient groups. Sensitivity and specificity statistics for age- and gender-specific cutoff points are provided in the SA-45 manual. They indicate the rates of accurate identification of adult and adolescent nonpatient and inpatient groups based on the GSI score alone, or in combination with other SA-45 variables entered into logistic regression equations. One set of cutoffs was established to maximize the accurate identification of inpatients (i.e., 90% of the "true positives"); another set of cutoffs was established to maximize the accurate
identification of nonpatients (i.e., 90% of the "true negatives"); and yet another set of cutoffs was established to optimize sensitivity and specificity. Sensitivity and specificity statistics provide information relative to SA-45 overall classification rates for groups of inpatients and nonpatients. However, more often than not, the clinician is more interested in determining the probability that an individual patient has been classified accurately based on a given set of test scores. For this reason, the SA-45's positive predictive power (PPP) and negative predictive power (NPP) values are available for a range of rates at which those similar to inpatients (in terms of symptomatology) might be present in a given provider setting. The SA-45 manual (SAI, 1997) also presents the logistic regression equations to be employed for classifying each of the four gender-by-age groups. Note that for each indicated inpatient-to-total patient ratio (i.e., prevalence or base rate), the intercept (b0) to be used in the equation, as well as the cutoffs for maximizing sensitivity and specificity and optimizing both, change in order to maintain the desired levels of sensitivity and specificity (approximately .90). Two notes of caution are warranted regarding the use of the logistic regression equations and accompanying GSI cutoffs. First, use of the appropriate GSI cutoffs for adolescents leads to more inaccurate classifications (false positives and false negatives) than it does for adults. At a broader level, the utility of the GSI and logistic regression cutoffs for classifying SA-45 respondents has only been explored with regard to discriminating inpatient from nonpatient samples. Thus, results of the investigations conducted to this point are applicable only when discriminations of this type are being made. 
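The dependence of positive and negative predictive power on the base rate can be sketched with Bayes' rule. The example below holds sensitivity and specificity fixed at .87 (the optimized adult female values reported earlier) and varies the base rate; this is a simplifying assumption, since the manual is described as adjusting the intercept and cutoffs for each base rate. The function name and the base rates are hypothetical.

```python
def predictive_power(sensitivity, specificity, base_rate):
    """Return (PPP, NPP) for a given base rate of the target condition.

    PPP: probability that a positive classification is correct.
    NPP: probability that a negative classification is correct.
    """
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    true_neg = specificity * (1 - base_rate)
    false_neg = (1 - sensitivity) * base_rate
    ppp = true_pos / (true_pos + false_pos)
    npp = true_neg / (true_neg + false_neg)
    return ppp, npp


# Illustrative base rates only; these are not values taken from the manual.
for base_rate in (0.10, 0.30, 0.50):
    ppp, npp = predictive_power(0.87, 0.87, base_rate)
    print(f"base rate {base_rate:.2f}: PPP = {ppp:.2f}, NPP = {npp:.2f}")
```

With these inputs, PPP is about .43 at a 10% base rate and rises to .87 at a 50% base rate, while NPP moves in the opposite direction. This is why the same score can warrant different confidence in different provider settings.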
It is hoped that future research will establish GSI and logistic regression cutoffs that will permit discriminations of inpatients from outpatients and outpatients from nonpatients.

Use of the SA-45 For Treatment Planning

Maruish (chap. 1, this volume) defined treatment planning in the behavioral health care setting as

that part of a therapeutic episode in which a set of goals for an individual presenting with mental health or substance abuse problems is developed, and the specific means by which the therapist or other resources will assist the patient in achieving those goals in the most efficient manner are identified. (p. 18)

It begins with the assumption that the patient is experiencing behavioral health problems and is motivated to eliminate or reduce the identified problems. Further, the goals of treatment are developed by the patient in collaboration with the clinician, are tied either directly or indirectly to the identified problems, have definable criteria for achievement, and are indeed achievable by the patient. Moreover, the prioritization of goals is reflected in the treatment plan.

General Treatment Planning Issues

Issues in treatment planning arise when those assumptions identified by Maruish (chap. 1, this volume) are not found to be true. Obviously, if no problems requiring behavioral health care intervention are found to exist, a treatment plan is not required. Low or no
motivation to change problems identified by either the patient or involved third parties (e.g., parents, court systems, school personnel, employers) may demand a different-than-usual approach. Also, motivational problems may result in a prolonged treatment process and/or yield less than the maximum gains that otherwise could be made. Even in a motivated patient, the amount and type of gains that are possible may be limited if relevant, achievable goals are not formulated with clearly identified criteria for success. Vague goals with or without specific indicators of achievement provide only vague direction for both the clinician and the patient. And, unless patients have participated in the development and prioritization of treatment goals, it is unlikely that they will be fully active participants in the therapeutic process.

Application of SA-45 Research and Clinical Findings to Treatment Planning

The SA-45 was developed primarily to enable the measurement of symptomatic improvement in individuals who have received inpatient psychiatric care. However, research conducted during its development and validation indicates that it also can be useful for treatment planning purposes.

Identification of Primary and Secondary Problems. As already discussed, the treatment planning process assumes that the individuals presenting themselves to the behavioral health care professional are experiencing one or more problems of sufficient concern to warrant their seeking help to ameliorate the problem(s). Administration of the SA-45 can assist the clinician in quickly determining if indeed there is a significant problem warranting a course of psychological treatment, or if another course of action (e.g., patient education, referral for medical evaluation) is more appropriate. The SA-45 manual (SAI, 1997) provides two means of identifying individuals with significant psychological problems. The simplest is the five-step approach to interpretation presented earlier.
Using nonpatient-based area T-score cutoffs for classification purposes, clinicians can easily identify those likely to need some form of psychological intervention, symptom domains that likely are particularly problematic for the patient, and specific symptoms within those domains that are especially troublesome. The manual also provides age- and gender-specific cutoffs for the GSI score and the value derived from logistic regression equations (employing several SA-45 variables) that can be used to help identify respondents likely to require treatment. Ability/Willingness of Patient to Become Engaged in Psychotherapy. There are no specific scales or indices that provide a direct measure of a respondent's ability or willingness to participate in psychotherapy. However, there are a few indicators that can be derived from the administration of the test, which if present may suggest potential problems in engaging patients in the therapeutic endeavor. One is their overt reaction to the request to complete the SA-45. The demands on the test-taker are minimal because the test is relatively brief and contains straightforward items. Complaints about the time required to complete the instrument or about item relevance, failure to respond to several items, or any other negative reaction or form of resistance might be predictive of problems in eliciting the patient's full cooperation during treatment. This may have
implications for the approach the therapist takes with the patient (e.g., direct vs. indirect, behavioral vs. psychoanalytic) and itself may become grist for the therapeutic mill. Appropriate Level of Care. At this time, the SA-45 can assist in addressing level-of-care issues in two ways. First, it can help the clinician determine if the respondent's overall level of psychiatric distress or disturbance and/or level of domain-specific symptomatology are significant enough to warrant the need for behavioral health care services. When employed this way, the SA-45 is being used as a screener. As previously noted, a T-score of 60 or greater on either of the two summary indices (GSI and PST) or one or more of the nine symptom domain scales should prompt the clinician to consider this option. Second, the clinician also may find the tables and formulae for calculating SA-45 base rate-adjusted PPP and NPP statistics presented in the manual (SAI, 1997) useful in discriminating those who may be in need of inpatient psychiatric services from those not in need of any services. Outpatient psychiatric normative data are not yet available, thus the information provided in the manual cannot assist in discriminating the likely need for inpatient versus outpatient services, or the need for outpatient versus no services. Regardless, these statistics may provide clues to the appropriate level of care for individuals presenting with a wide range of psychological problems. Appropriate Therapeutic Approach/Need for Therapeutic Adjuncts. The ability of the SA-45 to assist in determining the "best," or optimal, therapeutic approach was addressed indirectly in the two previous subsections. Similarly, the SA-45 may be useful in determining the appropriateness of certain adjuncts to care provided to psychiatric patients, including the use of psychotropic medication. 
Although there have been no studies designed to determine whether specific SA-45 indicators for the appropriateness of medication exist, the nonpatient normative data for the nine symptom domain scales may provide some direction. For example, a Depression scale area T-score of 70 or greater (based on nonpatient norms) indicates that 95% of the relevant age- and gender-specific normative group achieved a Depression scale score that was at or below the level achieved by the respondent. Conversely, only 5% of the normative group scored higher. Intuitively, the degree of endorsement of the content of this scale (recent experiences with feelings of depression, loneliness, hopelessness and worthlessness, and a loss of interest in things) would suggest the presence of a condition that might benefit from the addition of antidepressant medication to the therapeutic regimen. Similarly, significantly high elevations on other SA-45 symptom domain scales (e.g., Anxiety, Psychoticism) should lead clinicians to consider the use of other appropriate types of medications (e.g., antianxiety agents, major tranquilizers) as therapeutic adjuncts. Common Therapeutic Problems. Again, the manner in which the respondent completes the SA-45 should alert the clinician to problems that might be encountered during the course of treatment. These could include those related to the patient's ability to be sufficiently motivated, cooperative, and open to admitting to thoughts, feelings, and behaviors that might be anxiety provoking. In addition, a significantly elevated score on any of the Interpersonal Sensitivity, Hostility, and Paranoid Ideation scales may signal potential problems in forming a therapeutic bond. Additional effort in establishing rapport with patients such as these may be required if meaningful therapeutic progress is to be expected.

Use of SA-45 Findings with Other Evaluation Data

The SA-45 can be useful in the treatment planning process, but only if it is used in conjunction with other sources of information (e.g., interview and collateral data, behavioral observations, information from medical records, other test data). On the one hand, it may serve as a source of hypotheses about patient needs and resources that should be verified by data from other sources. For example, a significantly elevated Anxiety scale area T-score (T > 59) obtained from the routine administration of the SA-45 at intake should not lead to the assignment of an anxiety disorder diagnosis. Rather, it should lead the clinician to further evaluate the possibility of the presence of such a disorder and the need for treatment targeted to the alleviation of anxious symptomatology. Similarly, although a Depression scale area T-score that is less than 60 suggests that the respondent probably is not depressed, this factor by itself should not rule out the presence of depression, particularly if there is evidence to the contrary (e.g., clinical presentation). Conversely, SA-45 results can be a source of confirmatory information for hypotheses generated by other means. For example, a clinician might use the SA-45 results to validate impressions of a patient derived from a clinical interview. Here, the SA-45 can provide data that may help verify impressions about the presence or absence of significant psychological distress or domain-specific symptomatology, and/or the level at which one or both are being experienced by the respondent.

Potential Use and Limits for Treatment Planning in a Managed Care Setting

The SA-45 can provide managed care organizations (MCOs) and their providers with an efficient means of gauging a patient's overall level of distress and likely presence of disturbance in specific domains of symptomatology. 
In doing so, it complements the time-limited, problem-focused approach to treatment that has become the hallmark of MCOs. In conjunction with a report of the problem(s) in the patient's own words, the results of the SA-45 can help the clinician immediately begin formulating and implementing a plan of treatment. Also, repeated administration of the SA-45 at or about the end of the authorized treatment may provide data that can either justify continued treatment or provide evidence that further treatment is not required (discussed later).

Provision of Feedback of SA-45 Findings

Providing patients with feedback about their test results is not only good practice, but it also is now a requirement specified in the American Psychological Association's Ethical Principles (1992). According to ethical standard 2.09, "Psychologists ensure that an explanation of the results is provided using language that is reasonably understandable to the person assessed or to another legally authorized person on behalf of the client" (p. 8). The presentation of the results via the SA-45 profile allows patients to easily see potential problem areas and their overall level of psychological distress in relation to age- and gender-appropriate reference groups. Finn and his associates (Finn, 1996a, 1996b; Finn & Martin, 1997; Finn & Tonsager, 1992) developed an excellent framework

for providing feedback of the results of multidimensional instruments. Employing Finn's "therapeutic assessment" approach when providing patients with feedback about their SA-45 results potentially has the additional benefit of turning the feedback session into a therapeutic intervention. Thus, it is the recommended approach for providing SA-45 feedback.

Limitations/Potential Problems in the Use of the SA-45 for Treatment Planning

The SA-45 is an abbreviated version of the original SCL-90 that consists of items selected for their ability to mirror the nine symptom domains of the parent instrument. Thus, it is limited in its ability to detect the presence or absence of symptoms that are not represented by those broad domains. Consequently, it cannot be used to provide a comprehensive, detailed assessment of the psychological status of the test-taker. The manner and purpose for which the SA-45 was developed pose additional limitations to its use for treatment planning purposes. The SA-45 does not yield information relevant to psychological strengths or assets that can be used by the patient in therapy. However, this is a common characteristic of many other "abnormal personality" instruments and not specific to the SA-45.

Use of the SA-45 for Treatment Monitoring

The planning of behavioral health care treatment does not end with the initiation of the therapeutic intervention. It involves the ongoing evaluation of patient progress and (if necessary) modification of treatment to maximize the patient's chances of achieving the established goals. In some cases, it also may involve modification of the goals themselves. Monitoring patient progress during the course of treatment thus is critical to the success of any therapeutic endeavor. Most frequently, treatment progress monitoring is conducted on an informal basis. Based on their impressions, clinicians typically evaluate and document the patient's progress after each treatment intervention. 
However, clinicians now are beginning to move toward more structured means of tracking patient improvement over time. The brevity and content of the SA-45 make it an ideal instrument to use for treatment monitoring.

Treatment Monitoring with the SA-45

Verification of the appropriateness of the patient's individualized treatment regimen over time is required to ensure that what was initially thought to be the "best" approach continues to be just that. Periodic assessment of the patient's progress toward the achievement of established treatment goals is a critical part of the verification process. Impressionistic evaluation of the patient (i.e., "clinical judgment") certainly is one means of gauging patient progress. However, the subjective nature of this process limits its utility, particularly for tracking patients over extended periods of time. Clinical judgment does not permit close comparison with normative referents, nor does it lend itself well to the statistical analysis needed to determine if changes are "significant." For these reasons, the SA-45 (with its normative data and known psychometric properties) can provide a sound, defensible means of tracking improvement, stagnation, or deterioration

of general and domain-specific psychological disturbance as the patient progresses through treatment. General Considerations. Monitoring changes in the level of psychological distress or disturbance should begin with the administration of the SA-45 at the time of treatment initiation. This will serve as the baseline against which the patient's status may be compared thereafter. Although baseline measurement can be taken at any point in treatment, administering the SA-45 at the beginning of treatment has the added benefit of providing data for treatment planning purposes as described earlier. A decrease in the area T-score(s) of the GSI, PST, and/or relevant symptom domain indices and scales from one point in time to another would suggest that prescribed treatment is having a positive effect and support the continuation of the prescribed treatment. No change or an increase in relevant area T-scores would suggest that the patient's condition has not improved or has deteriorated. Assuming that symptomatic improvement should have taken place during the interval of time between testings, an evaluation of the appropriateness of the treatment would be warranted. Determining Statistically and Clinically Significant Change. How much change in an SA-45 scale or index score should be considered "significant" and deserving of attention? The answer to the question depends on whether clinicians prefer to base their judgment on statistically significant differences or clinically significant differences in test results. Recently, many clinicians and researchers have begun to apply Jacobson and Truax's (1991) reliable change (RC) index to determine the statistical significance of the differences between scores. The RC index is the difference between two scores at two points in time, divided by the standard error of difference (Sdiff). If the resulting value is less than -1.96, then the clinician can be 95% confident (p < .05) that real improvement has occurred. 
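
The RC computation just described can be sketched in a few lines of Python. This is a minimal illustration rather than the manual's own procedure: it assumes the usual Jacobson and Truax (1991) definition of the standard error of difference, Sdiff = sqrt(2 × SE²) with SE = SD × sqrt(1 − r), where r is the test-retest reliability. The function names and the example values (a T-score metric with SD = 10 and r = .80) are hypothetical and are not taken from the SA-45 manual.

```python
import math


def standard_error_of_difference(sd: float, reliability: float) -> float:
    """Return S_diff = sqrt(2 * SE**2), where SE = sd * sqrt(1 - reliability)."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    standard_error = sd * math.sqrt(1.0 - reliability)
    return math.sqrt(2.0 * standard_error ** 2)


def reliable_change_index(pre: float, post: float, sd: float, reliability: float) -> float:
    """RC = (post - pre) / S_diff; negative values mean fewer symptoms at retest."""
    return (post - pre) / standard_error_of_difference(sd, reliability)


def describe_change(rc: float, critical: float = 1.96) -> str:
    """Label an RC value using the two-tailed 95% criterion."""
    if rc < -critical:
        return "reliable improvement"
    if rc > critical:
        return "reliable deterioration"
    return "no reliable change"


# Hypothetical example: a scale T-score drops from 72 at intake to 58 at retest.
rc = reliable_change_index(pre=72.0, post=58.0, sd=10.0, reliability=0.80)
print(round(rc, 2), describe_change(rc))
```

Because SA-45 area T-scores are normalized rather than linear transformations of raw scores, values computed this way will not necessarily match the tabled minimum differences; the published table remains the reference for actual use.
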
The minimum raw score and area T-score differences required for statistical significance for each SA-45 scale and index were computed from the RC index formula and are presented in Table 24.1. Jacobson and Truax's (1991) RC index allows clinicians to use instruments such as the SA-45 to demonstrate whether behavioral health care intervention has resulted in "statistically reliable" change between any two points in time. However, they and their colleagues (Jacobson, Follette, & Revenstorf, 1984) also acknowledged the importance of determining clinically significant change to patients, clinicians, and researchers. Accordingly, clinically significant change (i.e., improvement) may be described as change that is both statistically reliable and moves the patient either from the range of dysfunction into that of normal functioning, or within the functional (normal) range. Movement occurs when the patient's level of functioning, however measured, falls either (a) at least two standard deviations from the mean of the dysfunctional population (in the direction of normal functioning), (b) within two standard deviations of the mean of the normal population, or (c) closer to the mean of the normal population than to the mean of the dysfunctional population. The raw-score-to-area T-score conversion tables in the manual (SAI, 1997) allow clinicians to determine whether the patient meets Criteria b or c. Note, however, that the nature of the SA-45 symptom domain T-scores (i.e., no inpatient symptom domain T-scores ≤ 30) makes it impossible to determine improvement on the symptom domain scales using Criterion a. Notwithstanding, according to Jacobson and Truax, Criterion c is preferable to Criterion b when the normal and dysfunctional distributions overlap, as is the case with the SA-45. This approach for determining clinically significant improvement in the patient's status also can be used to determine if clinically significant deterioration has occurred.
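
The three cutoff criteria can be expressed as simple formulas. The sketch below is illustrative only. It assumes a symptom measure on which lower scores are healthier, and it operationalizes Criterion c with Jacobson and Truax's (1991) weighted midpoint, c = (s_normal × M_dysfunctional + s_dysfunctional × M_normal) / (s_normal + s_dysfunctional). All means, standard deviations, and function names are hypothetical placeholders, not SA-45 normative values.

```python
def cutoff_a(mean_dysfunctional: float, sd_dysfunctional: float) -> float:
    """Criterion a: two SDs below the dysfunctional mean (toward normal functioning)."""
    return mean_dysfunctional - 2.0 * sd_dysfunctional


def cutoff_b(mean_normal: float, sd_normal: float) -> float:
    """Criterion b: upper edge of the band within two SDs of the normal mean."""
    return mean_normal + 2.0 * sd_normal


def cutoff_c(mean_normal: float, sd_normal: float,
             mean_dysfunctional: float, sd_dysfunctional: float) -> float:
    """Criterion c: the point equally far, in SD units, from both population means."""
    return ((sd_normal * mean_dysfunctional + sd_dysfunctional * mean_normal)
            / (sd_normal + sd_dysfunctional))


def clinically_significant_improvement(rc: float, post_score: float,
                                       cutoff: float, critical: float = 1.96) -> bool:
    """Require both reliable change (RC < -critical) and a post score below the cutoff."""
    return rc < -critical and post_score < cutoff


# Hypothetical populations: normal M = 50, SD = 10; dysfunctional M = 70, SD = 10.
c = cutoff_c(50.0, 10.0, 70.0, 10.0)
print(c, clinically_significant_improvement(rc=-2.21, post_score=58.0, cutoff=c))
```

Swapping the direction of the comparisons (RC greater than +1.96 and a post score above the chosen cutoff) gives the corresponding check for deterioration.
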

TABLE 24.1
Minimum Raw Score and T-Score Differences Required for Establishing Reliable Change Between Two Test Administrations for SA-45 Scales and Indices

                 Adults (a)                     Adolescents (b)
         Minimum Raw    Minimum         Minimum Raw    Minimum
         Score          T-Score         Score          T-Score
Scale    Difference     Difference      Difference     Difference
ANX       4.15          13.33            4.00          11.20
DEP       2.20           6.57            3.60           7.93
HOS       1.95           5.65            6.50          11.75
INT       2.90           7.73            4.30           8.70
OC        3.05           8.93            4.85          11.64
PAR       2.75           8.40            4.55          11.14
PHO       1.25           4.24            3.20           6.51
PSY       1.25           3.96            1.95           5.65
SOM       3.20          10.01            6.80          14.91
PST       9.95          10.39           11.20          10.62
GSI      13.50           8.09           21.15          10.92

Note: Change scores derived from application of Jacobson and Truax's (1991) procedures for computing the reliable change (RC) index using the test-retest reliability coefficients for nonpatient adult and adolescent samples reported in the SA-45 manual (SAI, 1997).
(a) Based on nonpatient test-retest reliability coefficients (N = 57).
(b) Based on nonpatient test-retest reliability coefficients (N = 64).

In the case of deterioration, an RC greater than +1.96 and movement from the range of normal functioning to that of dysfunction or within the dysfunctional range would be required. Jacobson and Truax (1991) did not specify what the clinical criteria for deterioration should be, as they did for determining improvement. However, based on the improvement criteria, clinicians might surmise that deterioration has occurred if the patient's SA-45 score(s) move to fall either two standard deviations from the mean of the normal population (in the direction of dysfunction), within two standard deviations of the mean of the dysfunctional population, or closer to the mean of the dysfunctional population than to the mean of the normal population. One might also surmise that the third criterion is the preferred standard. Another option might be useful for determining clinically significant change in SA-45 scores. 
Tables 4.6 through 4.11 of the manual (SAI, 1997) report cross-validated, age- and gender-specific GSI raw score cutoffs and logistic regression p-value cutoffs for classifying respondents as nonpatients or inpatients. The cutoffs for optimized classification and their associated sensitivity and specificity data for combined development and cross-validation age- and gender-based samples are summarized in Table 24.2. Using these cutoffs, individuals obtaining a GSI raw score or a logistic regression p value above the indicated cutoff for their age and gender would be classified as belonging to an inpatient (or dysfunctional) population; individuals with scores below that score or value would be classified as belonging to a nonpatient (normal) population. Thus, in the presence of statistically reliable change, clinicians might consider using change in classification from inpatient to nonpatient, or vice versa (regardless of whether GSI raw scores or p values are employed), from one point in time to another as indicating clinically significant change. Note that logistic regression cutoff values are more sensitive and specific than the GSI scores for all age and gender populations. Also, greater sensitivity and specificity is achieved with adult populations than with adolescent populations, regardless of whether GSI score or logistic regression values are used.
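
The classification-based option can be sketched as follows. This is an illustration under stated assumptions: the cutoff value is a made-up placeholder (the actual age- and gender-specific cutoffs must be taken from the manual), the function names are hypothetical, and a score exactly equal to the cutoff is treated as nonpatient, a boundary case the text does not address.

```python
def classify(value: float, cutoff: float) -> str:
    """Values above the cutoff are classified as inpatient, otherwise nonpatient."""
    return "inpatient" if value > cutoff else "nonpatient"


def significant_by_classification(pre: float, post: float, cutoff: float,
                                  rc: float, critical: float = 1.96) -> bool:
    """Reliable change (|RC| > critical) plus a change in classification."""
    reliable = abs(rc) > critical
    changed = classify(pre, cutoff) != classify(post, cutoff)
    return reliable and changed


# Hypothetical cutoff and scores, for illustration only.
HYPOTHETICAL_CUTOFF = 1.50
print(classify(2.10, HYPOTHETICAL_CUTOFF), classify(1.20, HYPOTHETICAL_CUTOFF))
print(significant_by_classification(2.10, 1.20, HYPOTHETICAL_CUTOFF, rc=-2.5))
```

The same function covers both directions: a move from nonpatient to inpatient with an RC above +1.96 would be flagged as clinically significant deterioration.
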

TABLE 24.2
Male and Female Adult and Adolescent GSI Raw Score Cutoffs and Logistic Regression Value Cutoffs for Optimized Classification of Inpatients and Nonpatients and Associated Sensitivities and Specificities

Adults Adolescents Males Females Males Females Logistic Regression Value Cutoff Cutoff score (a) .875 .875 .925 .775 .73 Sensitivity .57 .78 .87 .69 Specificity .68 .86 .87 GSI Raw Score Cutoff 1.58 1.44 1.67 Cutoff score (b) 1.44 .83 .57 .58 Sensitivity .77 .83 .58 .57 Specificity .78

Note: From Manual for the Symptom Assessment-45 Questionnaire (SA-45) (Tables 4.8 and 4.11), by Strategic Advantage, Inc., 1996, Minneapolis, MN: Author. Copyright © 1997 by Strategic Advantage, Inc. Adapted with permission.
(a) Logistic regression p value, above which respondents are classified as belonging to the inpatient sample.
(b) GSI raw score, above which respondents are classified as belonging to the inpatient sample.

Glide Path Approach to Monitoring. Another type of monitoring also may be implemented. Similar to the manner in which pilots use avionics to maintain their craft within a "glide path" during landing, clinicians may use the SA-45 to track how well the patient is following the "glide path of treatment" (R.L. Kane, personal communication, July 22, 1996). The glide path in this case represents expected symptomatic improvement at specific points over time based on objective data obtained from similar patients at various points during their treatment. As might be expected, this system would allow for minor deviations from the path. The end of the glide path is the treatment plan goal for symptomatic relief. Implementing a glide path system by any clinician or within an organization requires some degree of sophistication in statistical methods, thus limiting the number of settings in which it might be adopted. However, for those having the necessary resources, it can be a powerful means of determining if the patient is "on track" for goal attainment. 
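
One way to picture the glide path idea is as a list of expected scores at fixed points in treatment, each with a tolerance for minor deviations. The sketch below is a simplified illustration, not a validated system: the expected values, tolerances, and names are all hypothetical, and it assumes that only scores worse (higher) than expected count as being off the path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlidePoint:
    week: int            # weeks since the start of treatment
    expected_t: float    # expected T-score at this point (hypothetical)
    tolerance: float     # allowed deviation above the expected score


def on_track(observed_t: float, point: GlidePoint) -> bool:
    """A patient is on track if the observed score is within tolerance of the path."""
    return observed_t <= point.expected_t + point.tolerance


def off_track_weeks(path: list[GlidePoint], observed: dict[int, float]) -> list[int]:
    """Return the weeks at which an observed score exceeded the tolerated path."""
    return [p.week for p in path
            if p.week in observed and not on_track(observed[p.week], p)]


# Hypothetical path ending at the treatment goal of T = 60.
path = [GlidePoint(0, 72.0, 3.0), GlidePoint(4, 66.0, 3.0), GlidePoint(8, 60.0, 3.0)]
print(off_track_weeks(path, {0: 73.0, 4: 70.0, 8: 61.0}))
```

In an empirically based system, the expected values and tolerances would be estimated from data on similar patients, which is where the statistical sophistication mentioned above comes in.
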
Considerations for Monitoring Treatment with the SA-45

The frequency at which clinicians might use the SA-45 to monitor the patient is dependent on several factors. The first is the instrument itself. The demands of the SA-45 require the respondent to indicate, on a 5-point scale, "how much [each of the 45 listed symptoms] has bothered or distressed you during the past 7 days, including today." Thus, the SA-45 asks respondents to indicate how much they generally have been troubled by each symptom during the previous week. The responses elicited during a readministration that occurs less than 7 days after the first administration would include the patient's consideration of some portion of their status during the previously considered time period. This may make interpretation of the change in symptom status

(if any) from one administration to another difficult if not impossible. For this reason, it is recommended that the SA-45 not be readministered for at least 7 days. Another consideration is the anticipated length of treatment. If clinicians expect that the patient will be involved in treatment for only a limited time or number of sessions, multiple administrations of the SA-45 may be of little value from a monitoring standpoint. For example, limitations in a patient's health care plan may mean the patient cannot or will not be seen after 10 sessions; or it may be obvious that the presenting problem is, by its nature, likely to be time limited (e.g., bereavement after the death of a family member). In such cases, clinicians may wish to monitor patients only once or twice during the anticipated course of treatment to ensure that they are on track for the expected degree of improvement, given the brevity of the intervention. On the other hand, the clinician may wish to plan for more regular and frequent readministrations of the SA-45 (e.g., bimonthly) to those who likely will be seen on a more long-term basis (e.g., schizophrenics, some personality-disordered patients). Variables directly related to the patient may have some bearing on the frequency at which retesting should occur. It is probably safe to say that the chance of obtaining valid results from patients displaying resistance to treatment in general, or to completing the SA-45 in particular, is inversely related to the frequency at which they are required to retake the test. For this reason, resistant/uncooperative patients should not be retested any more than is absolutely necessary. Similarly, the benefits of obtaining retest data for monitoring purposes must be weighed against the psychological cost to the patient. 
For some individuals, the completion of instruments like the SA-45 can be quite stressful and, consequently, may not yield any benefits for the patient; and, in fact, readministration may result in more harm than good. Thus, clinicians must carefully assess the advisability of administering any instrument, even once, to such patients. Finally, the clinician must consider the patient's symptomatology. Certainly, clinicians need to be aware of fluctuations in those symptoms that might impair the patient's ability to render valid responses to the SA-45 items. At the same time, the experienced clinician should be able to determine, by the type and severity of the patient's problem(s), the number of sessions or points in time when significant or otherwise expected changes should occur, and then to plan for reassessment with the SA-45 accordingly. In this approach, the clinician essentially develops a nonempirical, clinically based glide path and uses monitoring via the SA-45 to ensure that patients are where they are supposed to be at predetermined points in time. The glide path likely would be problem or symptom specific, such that patients presenting primarily with a social phobia would have different glide paths and points of expected change than patients presenting with major depression.

Potential Uses and Limitations for Treatment Monitoring in a Managed Care Setting

The SA-45 lends itself well to use as a treatment progress monitoring instrument within managed care settings. Its brevity and low cost make it more accessible to organizations treating patients with limited health care benefits. This is an important consideration given that readministration of any psychological instrument can be expensive in terms of both actual costs and lost opportunities for actual treatment. 
Also, as a multidimensional measure of psychological symptomatology, the SA-45 provides the clinician with a broad survey of various symptom domains and a better measure of psychological distress and disturbance than can be obtained from brief measures of single-symptom

domains (e.g., depression screeners). Moreover, with the availability of scoring and reporting via the publisher's fax-in/fax-out processing service, most clinicians have access to technology permitting immediate access to SA-45 findings. The limits to the SA-45's utility in managed care settings resulting from short lengths of treatment have been discussed earlier. These same limitations apply to similar measures of psychopathology. Also, whereas the cost of the SA-45 is quite low relative to comparable instruments, funds used to purchase it may limit the amount of actual treatment that is available to the patient.

Use of the SA-45 for Treatment Outcomes Assessment

The SA-45 was developed to support the behavioral outcomes research work being conducted for a large, nationwide network of inpatient psychiatric facilities. Thus, features of the instrument (brevity, coverage of nine symptom domains, indices for summarizing overall level of disturbance, low cost) make it more attractive than other measures of psychological distress that were not developed specifically for outcomes assessment.

General Issues

Use of the SA-45 for outcomes assessment raises no special issues beyond those for similar instruments. Again, it was developed specifically for use in behavioral health care outcomes research with adults and adolescents. Consequently, its use for this purpose is likely to be of much less concern than the use of other instruments that were not developed primarily for outcomes assessment purposes.

Evaluation of the SA-45 Against NIMH Criteria for Outcomes Measures

Ciarlo, Brown, Edwards, Kiresuk, and Newman (1986; later updated in Newman & Ciarlo, 1994) presented arguably the most comprehensive and relevant criteria for evaluating the utility of psychological assessment instruments for treatment outcomes assessment. 
As is indicated later, evaluation of the SA-45 against these criteria supports its use as a measure of psychological distress/disturbance in studies investigating the outcomes of behavioral health care intervention. Relevance to Target Groups. The SA-45 was developed to assess the outcomes of treatment rendered to psychiatric inpatient populations. With its ability to broadly evaluate a number of symptom domains and the availability of gender- and age-specific nonpatient norms, it has applicability for use with individuals with a wide variety of symptomatology who are being seen in either inpatient or outpatient behavioral health care facilities. Simple, Teachable Methods. The simplicity of the SA-45 and its self-report format make it an instrument requiring little training to ensure appropriate administration and scoring. Its accompanying manual provides all the information the test administrator

needs to maximize the chances of obtaining the most valid results from the administration of the test. Use of Measures with Objective Referents. No objective referents or definitions are provided for the SA-45's five response choices (e.g., "Not at all," "All the time"). However, it is rare for any objective, self-report measure of psychopathology to provide definitions, examples, and so forth for the response choices that are available to the patient. Also, it might be argued that what clinicians are trying to measure with the SA-45 and similar instruments is patients' perception of themselves during a particular period of time, and that constraining respondents to arbitrarily determined definitions of constructs such as "Most of the time" lessens the validity of the response for individual patients. Use of Multiple Respondents. As a self-report instrument, the SA-45 permits the communication of the problems from only one perspective: that is, the patient's. However, the availability of age- and gender-specific normative data allows the SA-45 results to be evaluated along with the results obtained from the administration of patient-relevant instruments that are completed by collaterals (e.g., parent, spouse, teacher), provided that these instruments also have relevant normative data. More Process-Identifying Outcomes Measures. Ciarlo et al. (1986) indicated that "measures that provide information regarding the means or processes by which treatments may produce positive effects are preferred to those that do not" (p. 28). Like other symptom-oriented measures, the SA-45 does not offer a means of determining the process(es) by which therapeutic outcome (i.e., symptomatic relief) is effected. It has no theoretical basis, being designed to reflect what has changed and by how much, not why the change has occurred. Psychometric Strengths. 
The major findings supporting the validity and reliability of the SA-45 were summarized in the "Psychometric Considerations" section. In general, they support the use of the SA-45 for outcomes assessment purposes. Low Measure Costs Relative to Its Uses. The cost of SA-45 products and services lends itself well to the needs of today's behavioral health care providers and organizations. Use of SA-45 hand-scored materials is quite economical, even when administered multiple times to patients for treatment planning, monitoring, and outcomes assessment purposes. Although a bit more costly, use of the publisher's immediate fax-in scoring and fax-back reporting services is less expensive than comparable scoring and reporting services offered for similar instruments. In addition to providing immediate scoring and reporting, the fax-in service for the SA-45 has the additional benefit of providing to the user electronic files of all raw and scored data submitted by the user on a quarterly basis, at no extra cost. As a result, staff time needed for such activities is reduced or eliminated. This is an important consideration in settings where the data are required for program evaluation, utilization review, quality improvement, and/or any related purpose other than direct patient care. Understandability by Nonprofessional Audiences. The use of area T-scores based on nonpatient normative data facilitates the understanding of SA-45 results by the patient, relatives, staff, third-party payers, and other individuals with a vested interest in the patient's clinical status. Also, most people have little difficulty grasping the meaning of percentiles. Even when presented as patient norms-based percentiles and area T-scores, nonprofessional audiences should experience little difficulty in fully comprehending SA-45 findings.

Easy Feedback and Uncomplicated Interpretation. The SA-45 profile is designed to allow the plotting of the 11 scale and index area T-scores obtained from a single administration of the test. To facilitate the interpretation, horizontal lines at the area T-scores of 60 and 70 are printed across the profile. This allows for a graphical presentation of SA-45 results that enables the user to quickly determine if the patient's overall level of distress or their symptom-specific level of disturbance falls outside of the average range for nonpatients of the same age and gender group. This is particularly useful when the instrument is used for screening purposes, but it also has value when monitoring treatment in progress or assessing treatment outcomes. Manual plotting of SA-45 area T-scores obtained at intake onto a patient's SA-45 termination/discharge profile further facilitates an exposition of how much change has occurred as a result of treatment. Indications of which scale or index scores have "moved" into the unimpaired or "normal" range (T < 60) or into the less impaired range (59 < T < 70) also will be apparent. Usefulness in Clinical Services. Determination of whether or not the SA-45 meets this criterion can be made by answering the following question: Does the information obtained from the SA-45 justify the burden necessary to acquire that information? The answer is yes. Aside from its low cost, other significant features of the SA-45 are its brevity and the fact that it is a self-report instrument. A significant amount of information thus is obtained at minimal cost and with little burden to either patient or clinical staff. Compatibility with Clinical Theory and Practices. The SA-45 is an atheoretical measure of psychological disturbance that should be compatible with any clinical theory that incorporates an understanding of the nature of the psychological symptomatology measured by the SA-45. 
The availability of the age- and gender-specific nonpatient and inpatient norms makes it appropriate for evaluating the effect of treatment interventions for heterogeneous patient groups in various treatment settings. At the time of this writing, comparison of a patient's SA-45 results to gender- and age-relevant psychiatric outpatients is not possible. However, the development of norms for this population is anticipated.

Research Findings Relevant to Use of the SA-45 as an Outcomes Measure

The SA-45 is a relatively new instrument; consequently, it cannot draw on the type of empirical support that typically accompanies more established instruments. However, data obtained during its development are relevant to its use as an outcomes measure. First, there are the test-retest reliability coefficients obtained from adult and adolescent nonpatient samples. Recall that adult raw score-based correlations generally were found to be in the .80s, with the exceptions being the Somatization scale (.69) and the Anxiety scale (.42). This possibly was due to the fact that some of the items on each of these scales may be sensitive to variations in normal, everyday experiences. For adolescents, the raw score-based correlations are quite variable, ranging from .51 for the Hostility scale to .85 for the Psychoticism scale. Somewhat similar to the adult findings, the Anxiety scale coefficient (.58) is the next to the lowest of the coefficients for adolescents. Thus, factors unrelated to treatment may influence the results on certain SA-45 scales from one point in time to another. At the same time, employing statistically sound methods

such as the RC index (which takes into account the reliability of the scale or index) should lessen concern about use of the less reliable SA-45 scales. In addition, Davison et al. (1997) compared the SA-45 scores of gender- and age-matched nonpatients and three inpatient groups assessed at three points in time (intake, discharge, and 6-month follow-up). Generally, the findings were as would be hypothesized for an instrument designed to be sensitive to changes in level of psychological distress: The scores of patients at intake were greater than those at discharge, which in turn were greater than those at follow-up. Also, the scores of nonpatients were lower than those of all patient groups. There were some exceptions to this trend, particularly among the adolescent groups. This is consistent with other findings that suggest the SA-45 is more sensitive to changes in the psychological status of adults than to changes in that of adolescents. Clinical Applications of the SA-45 for Outcomes Assessment There are several ways in which the SA-45 may be used to assess the outcomes of behavioral health care treatment. How it is applied in a specific behavioral health care setting depends on a number of factors. What to Evaluate. The SA-45 provides nine symptom-specific measures and two measures of overall level of psychological distress. The GSI score probably is more frequently used for outcomes assessment purposes than any other SA-45 variable. It not only reflects both pervasiveness (i.e., number of symptoms) and frequency of psychological symptomatology in one score, but also is one of the most reliable of the SA-45 variables. At the same time, scores on one or more SA-45 symptom domain scales may prove to be equally valuable. For example, if an outpatient clinic's clientele typically present with depressive symptomatology, then the score on the Depression scale may be as important a variable as the GSI to be evaluated in the clinic's outcomes assessment program. Intended Use.
The SA-45 can serve any one of several purposes related to an organization's outcomes assessment initiative. The most obvious is that of providing an outcomes variable, that is, a direct measure of treatment outcomes in the domain of psychological functioning. This probably is the most common use of the SA-45 when employed for outcomes assessment. However, it also can serve other purposes. The first is as a predictor of other outcomes or related variables. For example, SA-45 scores obtained at the time of treatment initiation might serve either as a predictor of other outcomes, such as medical resource utilization or work functioning 6 months postdischarge, or as a predictor of variables that may have a relation to outcomes variables. These might include length of stay (LOS) in an inpatient facility, number of medications prescribed, or other process variables. The SA-45 also may help ensure fair and meaningful comparisons of outcomes among behavioral health care providers. When its results (GSI most frequently) are employed as a risk adjustor, it can help "level the playing field" by taking into account the fact that the patients of one provider typically may present with more severe psychological disturbance than the patients of the other providers. Risk adjusting outcomes by SA-45 results, either alone or along with other relevant variables (e.g., age, gender, history of previous treatment), is particularly important when facilities or organizations serving significantly different patient populations are being compared to each other or to a standard that represents the average of several facilities or organizations.
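
As a rough illustration of the risk-adjustment idea described above, the following Python sketch fits a single straight line predicting an outcome from intake GSI T-scores across all patients, and then compares providers on how far their patients' observed outcomes depart, on average, from the predicted values. The one-predictor least-squares model, the function names, and the sample numbers are all assumptions made for illustration; they are not taken from the SA-45 manual, and risk adjustment in practice typically involves more covariates and more careful modeling.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("intake scores must vary to fit a line")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def risk_adjusted_means(records):
    """Mean (observed - expected) outcome per provider.

    records: iterable of (provider, intake_gsi_t, outcome) tuples, where
    outcome is any numeric outcomes variable (hypothetical here).
    """
    records = list(records)
    intercept, slope = fit_line([r[1] for r in records],
                                [r[2] for r in records])
    residuals = defaultdict(list)
    for provider, gsi_t, outcome in records:
        expected = intercept + slope * gsi_t  # outcome predicted from severity
        residuals[provider].append(outcome - expected)
    return {provider: mean(values) for provider, values in residuals.items()}


# Hypothetical data: provider B's patients enter treatment with higher
# GSI T-scores and also show higher (worse) raw outcome scores.
sample = [
    ("A", 60, 32), ("A", 62, 33), ("A", 64, 34),
    ("B", 70, 37), ("B", 72, 38), ("B", 74, 39),
]
adjusted = risk_adjusted_means(sample)
```

In this made-up sample the raw outcome averages differ (33 for provider A versus 38 for provider B), yet both adjusted means come out at zero, because every patient's outcome is exactly what the intake score predicts.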

When to Measure. The outcomes of treatment can be determined by comparing a patient's status on the variable(s) of interest immediately before or at the time of treatment initiation (or intake), and then again at the point of treatment termination (or discharge). When administered at the beginning of treatment, the SA-45 provides a baseline measure of both overall and domain-specific psychological functioning. These results can then be compared with SA-45 termination results to determine how much change (if any) has occurred as a result of the treatment intervention. A comparison of SA-45 intake and discharge scores will yield important but limited information about the effect that treatment has had on the patient's psychological status. If the SA-45 is readministered one or more times after the discharge assessment (e.g., 3, 6, or 12 months postdischarge), the results from these follow-up assessments can be compared to those obtained at intake, discharge, or both. This will permit the clinician to draw additional and possibly more important conclusions regarding the effectiveness of treatment, that is, whether treatment has resulted in lasting effects on the patient's level of psychological functioning. Deciding whether to conduct follow-up assessment of patients with the SA-45 and/or any other assessment tool is not a simple matter. There are a number of issues that should be addressed when considering the incorporation of a follow-up assessment component into a provider's outcomes management program. Among the most important of these are: How useful is this type of information to the provider? How will the data be used? Who will be assessed (i.e., a sample of patients vs. all patients)? Should the assessment be conducted by phone interview or mail-out/mail-back survey? What financial and personnel resources are available for this undertaking? What is the likelihood of locating former patients months after discharge? 
There is no question that the SA-45 lends itself well to follow-up assessment and can provide valuable information; the major issue is whether this is an endeavor that the provider can successfully undertake. How to Analyze the Data. The previous discussion of methods for determining statistically and clinically significant changes in SA-45 scores for treatment monitoring also applies to the use of the instrument to assess the treatment outcomes for the individual patient. Use of Jacobson and Truax's (1991) RC index and group membership criteria to determine clinically significant change in psychological functioning is appropriate, regardless of whether the clinician is assessing change from intake to discharge, discharge to follow-up, intake to follow-up, or from one follow-up assessment to another. The analysis of aggregated SA-45 data requires a different approach that is beyond the scope of this chapter. The reader is referred to Newman and Tejeda (chap. 8, this volume) for an excellent discussion of approaches to the analysis of group data. Use of SA-45 Findings with Other Evaluation Data The SA-45 can meet the need for a measure of current level of psychological functioning within a comprehensive outcomes assessment and management system. Results of the SA-45 can easily be integrated with patient- or clinician-reported data pertaining to other aspects of patient functioning (social functioning, occupational functioning, academic performance, well-being, substance use, medical resources utilization) to present a clear picture of changes that have occurred as a result of treatment. SA-45 results are independent of findings from other measures and thus do not present the user with redundant information. In aggregate, the SA-45 results, particularly the GSI score, can
be used as risk-adjustment variables or predictors of outcomes in other domains of functioning. Provision of Feedback Regarding Outcomes Assessment Findings Treatment outcomes data is of potential interest to several stakeholders, including the patient, the service provider, and third parties with a vested interest in the patient (e.g., payers). The manner in which SA-45 findings are presented to illustrate changes in psychological functioning resulting from treatment will depend on the intended recipient of this information. The presentation of the results to the patient via the SA-45 profile of both pre- and posttreatment scores allows the patient to see how much they have improved as well as their overall level of psychological distress in relation to age- and gender-appropriate nonpatient norms. Depending on the level of interest and intellect of the patient, clinicians also might wish to supplement this information with discussions of changes in specific SA-45 item responses and/or the statistical and clinical significance of scale and index changes. A version of Finn's "therapeutic assessment" approach to providing assessment feedback to patients (Finn, 1996a, 1996b; Finn & Martin, 1997; Finn & Tonsager, 1992), modified for the discussion of posttreatment data as opposed to pretreatment data, may provide an excellent framework for this process. Providers and third parties are audiences that may require a different exposition of outcomes data. The primary difference is that these stakeholders are more likely to be interested in findings related to specific groups of patients rather than individual patients. A discussion of this type of exposition is presented later. Limitations of the SA-45 For Outcomes Assessment Purposes The SA-45 has the same limitations for use as an outcomes instrument as many other self-report, multidimensional measures of psychiatric symptomatology. 
Patients must have a certain minimum reading ability (sixth-grade level), and their psychological state must permit valid, reliable responding to the 45 items. If these requirements are not met at the time of treatment initiation, an objectively measured baseline of psychological distress may not be possible. However, the clinician may consider reading the SA-45 items to patients if their reading ability is the only issue. In this situation, having them indicate their answers on the SA-45 answer sheet rather than giving an oral reply to the examiner would probably lessen, but not eliminate, the common concerns about the effects of this nonstandardized form of administration on the psychometric characteristics of the instrument. Two other limitations may be important to consider. First, the SA-45 does not attempt to assess all possible psychiatric symptom domains. Thus, amelioration of certain types of symptoms (e.g., those related to eating disorders, specific types of phobias, sexual dysfunction) may not be reflected in SA-45 results. Another potentially important consideration is related to the test-retest reliability of certain SA-45 scales. Note that the coefficients for the Somatization scale are relatively low for both adults and adolescents, as is the Anxiety scale coefficient for adults. Consequently, greater
changes in area T-scores or raw scores from one point in time to another would be required for these scales than for the other symptom domain scales (see Table 24.1). Use as a Data Source for Behavioral Health Care Service Report Cards Behavioral health care report cards are becoming common tools for communicating aspects of an organization's effectiveness in treating the populations it serves. Among the most important information that is typically conveyed in these report cards is that of the degree of positive change in the patients' level of psychological distress or disturbance. Thus, the SA-45 provides the type of data that can be useful and informative for the intended report card audience. Moreover, it lends itself well to the types of analyses that are frequently employed for the purpose of these reports. Generally, there are two ways in which the SA-45 can help provide evidence of an organization's ability to effectively treat patients. The first is by providing a direct measure of change in psychological status. Although any of several variables might be employed to show change, the single best and most useful SA-45 measure for this purpose is the GSI area T-score. This is because the GSI is more representative of the patient's general level of symptomatology than any other SA-45 measure. It is, in fact, a combination and representation of all nine symptom domain measures. In addition, experience indicates that professionals and nonprofessional/lay audiences either are familiar with the GSI (through its use with other instruments) or can easily grasp the nature of what is being represented by this summary index. Similarly, the nonpatient norms-based area T-score is a metric that is easily understood by most patient care stakeholders. 
Thus, using GSI nonpatient-based area T-scores enables an organization to convey the most meaningful information about change in psychological status, in the most understandable form, to all parties with an interest in treatment outcomes. There are several ways in which aggregated GSI data can be used to represent a degree of improvement in the patient sample of interest. The most obvious (and arguably the most useful) is a straightforward average GSI area T-score change from treatment initiation to treatment termination. These data also may be delineated for subsamples based on diagnosis, age, LOS/number of outpatient sessions, payer of services, service unit, clinician, another outcomes variable (e.g., change in occupational functioning), or any other variables that would be meaningful for the intended audience. GSI data also can be presented in terms of the percent of patients who exhibited an area T-score change that is greater than a minimum standard set by the organization or another relevant party (e.g., accrediting body, payers). A useful minimum standard might be a decrease in GSI by 10 or more points, or a clinically significant GSI T-score decrease as defined by Jacobson and Truax (1991). Another way in which the SA-45 can help provide evidence of an organization's ability to effectively treat patients is by using it to risk adjust findings in other outcome domains. It is not uncommon for providers with less than favorable outcomes to complain that their patients' outcomes are worse because "My patients are sicker." As alluded to previously, SA-45 results can assist in making fair outcomes comparisons across providers, service units, and organizations by adjusting data that might have been influenced by the patients' level of psychological distress or disturbance. Further adjustment might be made based on other variables (e.g., LOS, education, motivation to engage in treatment) that the organization has found to be related to the outcome
domains of interest. There are a number of sophisticated statistical techniques that can be used for risk adjustment purposes. Discussion of these techniques is beyond the scope of this chapter; however, interested readers are referred to the work of Iezzoni (1994) for an excellent exposition on this topic. Case Study Mr. J is a 37-year-old single, African American male who has a long history of alcohol and other substance abuse dating back to the early 1980s. Most recently, he was admitted to a midwestern substance abuse treatment facility offering the full range of services. This most recent course of treatment was sought after the state threatened to discontinue welfare and disability benefits unless he resumed treatment for substance abuse. Mr. J has less than a high school education, is unemployed, and has no trade skills. He has a rather spotty work history, including a tour in the military. While there, he exhibited multiple problems, including substance abuse, and received a medical/psychiatric discharge. Medical records revealed these problems to include a personality disorder and an unspecified tic. Mr. J's history of substance abuse is accompanied by a history of numerous stays in inpatient psychiatric facilities and substance abuse treatment facilities. Over the years, Mr. J has been treated with a wide range of psychoactive medications, including antidepressants, major tranquilizers, anticonvulsants, and lithium. Initiation of the latest round of services began with Mr. J denying any problems with alcohol or other drugs, stating that, "If only you could fix my mental problems, I'd stop drinking and using drugs." This denial of problems was reflected in the results of the SA-45 at the time of intake. Only the PST T-score and a few of the symptom domain T-scores were elevated into the mildly impaired range, but just barely (i.e., T = 60). All other T-scores, including that for the GSI, fell within the average range for nonpatients (see Fig. 24.1). Mr. 
J was admitted to the treatment facility's intensive outpatient dual diagnosis program for the standard 6-week, 24-hour-per-week treatment regimen. His intake diagnoses included alcohol dependence and depressive disorder NOS, with a rule-out of borderline personality and personality disorder NOS with borderline and histrionic features. On completion of the program, the SA-45 was readministered. The results obtained at the second administration were dramatically different from those seen 6 weeks earlier (see Fig. 24.1). The T-scores for all nine symptom domain scales and two summary indices were elevated more than two standard deviations above the mean (T ≥ 70) for nonpatient adults. If the RC index values are applied from Table 24.1 and T-scores of 70 or greater are considered as being outside of the range of the "normal" population, then it would be accurate to conclude that clinically significant change (in this case, deterioration) occurred on all 11 SA-45 variables. However, in this particular instance, the clinician also must consider that the change in scores may actually reflect more of a lessening of an initial denial of problems (and, consequently, a more open admission of problems) than an exacerbation of symptomatology during the previous 6 weeks. This conclusion was, in fact, supported by his therapist's observations during that period of time. Mr. J's diagnosis at the end of the 6-week program was depressive disorder NOS and borderline personality disorder. For this reason, he was referred to a local mental health center for traditional, ongoing psychotherapy and medication follow-up. In addition, he participated in the treatment facility's twice-a-week, 15-week dual-diagnosis educational and support group geared to preventing substance abuse relapse. At the
completion of the time-limited support group, Mr. J completed the SA-45 once again. Compared to the time of the second administration, he reported clinically significant improvement on symptoms related to obsessive-compulsiveness, somatization, hostility, interpersonal sensitivity, and paranoid ideation, as well as on the total number of symptoms reported to any degree (see Fig. 24.1). At the same time, statistically significant deterioration was noted on scales assessing phobic anxiety and psychotic symptoms. According to his therapist, the SA-45 profile once again accurately reflected the pattern of symptoms exhibited by Mr. J at that point in time, including some transient psychotic symptoms. The therapist also noted that the variation in symptomatology reported over the three administrations of the SA-45 reflected the course of psychological disturbance that is typically exhibited by patients at the facility. Mr. J continues to reside in the community and to receive SSI benefits. He also continues active participation in twice-a-month psychotherapy and monthly medication checks for a prescribed antidepressant. Mr. J reported that the time between his initiation of the intensive outpatient treatment and the completion of the support group represented the longest period of abstinence from alcohol and other drugs that he could remember. At last contact, he was still substance free.

Fig. 24.1. Case study profile from three SA-45 administrations.

Conclusions The development of the SA-45 provides psychologists and others trained in the use of psychological tests with another useful tool for treatment planning and outcomes assessment. Derived from the original SCL-90, the SA-45 was designed to assess the same symptom domains as the parent instrument using half as many items. The development
of nonpatient and inpatient adult and adolescent gender-specific norms, use of area T-scores instead of the more traditional (but less appropriate) linear T-scores, and the ability to employ sophisticated means for replacing missing responses are among the features that enable wide applicability in a variety of settings with various populations. Although the instrument is relatively new, initial investigations into its psychometric characteristics support the SA-45 as a measure of psychiatric symptomatology. Cronbach's alpha coefficients and item-total correlations reveal acceptable levels of internal consistency reliability for the nine symptom domain scales. Test-retest correlations obtained from nonpatient adult and adolescent data are generally acceptable, with a few exceptions (most notably, those for the Anxiety scale) that warrant consideration when using the SA-45 for treatment monitoring or outcomes assessment purposes. The validity of the SA-45 has been examined from a variety of perspectives. It has been shown to be sensitive to expected differences between inpatient and nonpatient groups, as well as to changes in patient symptomatology over time as a result of treatment. This sensitivity is not as pronounced with adolescent populations as it is with adult populations. The SA-45 scales have been found to correlate highly with their companion scales from the SCL-90 and BSI. To some extent, this reflects the fact that the SCL-90 data also served as the source of data for both the SA-45 and BSI data in those investigations. Also, the interscale correlations are quite similar to those found for the SCL-90 and the BSI; at the same time, the SA-45 scales appear to be more independent than those found in these instruments. Finally, item-total scale correlations and the symptomatology assessed by each scale's five items attest to the content validity of the instrument.
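
To make concrete why the lower test-retest coefficients warrant consideration, the sketch below computes the RC index of Jacobson and Truax (1991) in Python. It assumes scores are expressed as T-scores with a normative standard deviation of 10 and borrows the adult raw-score test-retest coefficients quoted earlier in the chapter (.42 for the Anxiety scale, with .80 standing in for the scales reported to be in the .80s). The function names are hypothetical, and the resulting thresholds are illustrative only; they need not match the values in Table 24.1.

```python
import math


def standard_error_of_difference(sd, r_xx):
    """S_diff = sqrt(2 * SE^2), where SE = sd * sqrt(1 - r_xx)."""
    se = sd * math.sqrt(1.0 - r_xx)
    return math.sqrt(2.0 * se ** 2)


def reliable_change(pre, post, sd, r_xx):
    """RC index: (post - pre) / S_diff. |RC| > 1.96 suggests change
    unlikely (p < .05) to be due to measurement error alone."""
    return (post - pre) / standard_error_of_difference(sd, r_xx)


def minimum_reliable_difference(sd, r_xx, z=1.96):
    """Smallest score change whose RC index reaches the z criterion."""
    return z * standard_error_of_difference(sd, r_xx)


# Assumed T-score SD of 10; reliabilities taken from the adult figures above.
low_reliability = minimum_reliable_difference(10, 0.42)
high_reliability = minimum_reliable_difference(10, 0.80)
```

Under these assumptions, a scale with a coefficient of .42 needs a change of roughly 21 T-score points before it exceeds what measurement error alone could produce, against roughly 12 points for a scale with a coefficient of .80.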
The availability of separate sets of normative data and accompanying area T-score conversion tables for adult and adolescent nonpatients and inpatients facilitates the interpretation of SA-45 results. For all age and gender groups, the use of the nonpatient norms-based area T-score of 60 (i.e., one standard deviation above the mean) is an appropriate cutoff for determining the likelihood of significant distress, regardless of whether one is evaluating symptom-domain or overall level of distress. Further interpretive information can be obtained from comparing these findings with those resulting from the use of area T-scores based on age- and gender-appropriate inpatient norms, as well as from looking at individual responses to items comprising those scales with significant area T-score elevations. Moreover, identification of individuals likely to belong to nonpatient versus inpatient populations is facilitated by logistic regression equations employing multiple SA-45 variables and associated cutoffs with known sensitivity, specificity, and prevalence-based predictive powers (PPP and NPP). In addition to helping identify specific symptom domains requiring intervention and to place individuals into appropriate levels of care, the SA-45 can assist with other treatment planning purposes. It can be used to confirm hypotheses generated by patient data obtained by other means. It also can provide information about the respondent's willingness to engage in treatment. Overall, its brevity, low cost, and symptom-focused orientation lend themselves not only to treatment planning but also to monitoring change during the course of treatment. Finally, the SA-45 is well-suited as a measure of psychiatric symptomatology for use as part of a behavioral health care outcomes system. It fares quite well when evaluated against standard criteria for outcomes measures, particularly with regard to considerations of cost, psychometric integrity, ease of use, and understandability by nonprofessionals.
Also, it provides information that can be useful for the development of behavioral health care report cards from two perspectives. First, comparison of results from pretreatment testing to those from posttreatment and/or follow-up testing can yield data

regarding the effectiveness of the intervention; it thus can serve as a direct measure of treatment outcomes. Second, SA-45 variables can be used to risk adjust other outcomes variables (e.g., work/school performance) or process variables (e.g., length of stay/number of therapy sessions) according to symptom severity, thus facilitating fair comparisons across service providers. As would be the case with any new psychological test instrument, the full utility and value of the SA-45 for treatment planning, monitoring, and outcomes assessment purposes will become evident only as psychologists and other behavioral health care professionals employ it in their clinical and research work. This obviously will take time, but the initial data are encouraging. Acknowledgments Portions of this chapter were adapted from Strategic Advantage, Inc., Manual for the Symptom Assessment-45 Questionnaire (SA-45) (1997), with permission from Strategic Advantage, Inc., Minneapolis, MN. The author also wishes to acknowledge Edwin S. Rivera, CSW, and the Addiction Center of Broome County, NY, for their assistance in development of the case study presented in this chapter. Multi-Health Systems (MHS), Inc., of Toronto, Canada, now publishes and distributes SA-45 software and materials. The SA-45 manual and automated reports offered by MHS are updated and revised versions of the original manual and reports that were previously published and distributed by Strategic Advantage, Inc., and that were referred to throughout this chapter.
References
American Psychological Association. (1992). Ethical principles of psychologists and code of conduct. American Psychologist, 47, 1597-1611.
Boulet, J., & Boss, M. W. (1991). Reliability and validity of the Brief Symptom Inventory. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3, 433-437.
Ciarlo, J. A., Brown, T. R., Edwards, D. W., Kiresuk, T. J., & Newman, F. L. (1986).
Assessing mental health treatment outcomes measurement techniques (DHHS Publication No. ADM 86-1301). Washington, DC: U.S. Government Printing Office.
Davison, M. L., Bershadsky, B., Bieber, J., Silversmith, D., Maruish, M. E., & Kane, R. L. (1997). Development of a brief, multidimensional, self-report instrument for treatment outcomes assessment in psychiatric settings: Preliminary findings. Assessment, 4, 259-276.
Derogatis, L. R. (1983). SCL-90-R: Administration, scoring and procedures manual-II for the revised version (2nd ed.). Towson, MD: Clinical Psychometric Research.
Derogatis, L. R. (1992). BSI: Administration, scoring and procedures manual-II. Baltimore, MD: Clinical Psychometric Research.
Derogatis, L. R. (1993). Brief Symptom Inventory (BSI) administration, scoring and procedures manual (3rd ed.). Minneapolis, MN: National Computer Systems.
Derogatis, L. R. (1994). SCL-90-R: Symptom Checklist-90-R (SCL-90-R) administration, scoring, and procedures manual. Minneapolis, MN: National Computer Systems.
Derogatis, L. R., & Cleary, P. A. (1977). Confirmation of the dimensional structure of the SCL-90: A study in construct validation. Journal of Clinical Psychology, 33, 981-989.
Derogatis, L. R., Lipman, R. S., & Covi, L. (1973). SCL-90: An outpatient psychiatric rating scale - preliminary report. Psychopharmacology Bulletin, 9, 13-27.
Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974a). The Hopkins Symptom Checklist (HSCL): A measure of primary symptom dimensions. In P. Pichot (Ed.), Psychological measurements in psychopharmacology. Basel: Karger.
Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974b). The Hopkins Symptom Checklist (HSCL): A self-report symptom inventory. Behavioral Science, 19, 1-15.
Derogatis, L. R., Rickels, K., & Rock, A. (1976). The SCL-90 and the MMPI: A step in the validation of a new self-report scale. British Journal of Psychiatry, 128, 280-289.
Derogatis, L. R., & Spencer, P. M. (1982). The Brief Symptom Inventory (BSI): Administration, scoring and procedures manual-I. Towson, MD: Clinical Psychometric Research.
Finn, S. E. (1996a). Assessment feedback integrating MMPI-2 and Rorschach findings. Journal of Personality Assessment, 67, 543-557.
Finn, S. E. (1996b). Manual for using the MMPI-2 as a therapeutic intervention. Minneapolis, MN: University of Minnesota Press.
Finn, S. E., & Martin, H. (1997). Therapeutic assessment with the MMPI-2 in managed health care. In J. N. Butcher (Ed.), Personality assessment in managed care (pp. 131-152). Minneapolis: University of Minnesota Press.
Finn, S. E., & Tonsager, M. E. (1992). Therapeutic effects of providing MMPI-2 test feedback to college students awaiting therapy. Psychological Assessment, 4, 278-287.
Iezzoni, L. I. (Ed.). (1994). Risk adjustments for measuring health care outcomes. Ann Arbor, MI: Health Administration Press.
Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336-352.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
Newman, F. L., & Ciarlo, J. A. (1994).
Criteria for selecting psychological instruments for treatment outcome assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates.
Pallak, M. S. (1994). National outcomes management survey: Summary report. Behavioral Healthcare Tomorrow, 3, 63-69.
Reynolds, W. M. (1991). Adult Suicide Ideation Questionnaire professional manual. Odessa, FL: Psychological Assessment Resources.
Strategic Advantage, Inc. (1997). Manual for the Symptom Assessment-45 Questionnaire (SA-45). Minneapolis, MN: Author.
Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236-244.

For Abby, Katie, and Shelby

Chapter 25
Behavior and Symptom Identification Scale (BASIS-32)
Susan V. Eisen
Melissa A. Culhane
McLean Hospital

The 32-item Behavior and Symptom Identification Scale (BASIS-32) was developed to meet the need for a mental health status measure that would be useful in assessing the outcome of mental health treatment from the consumer's point of view. It is a brief but comprehensive measure of self-reported difficulty in the major symptom and functioning domains that lead to the need for mental health services. These domains include mood disturbances, anxiety, suicidality, psychotic symptoms, self-understanding, interpersonal relations, role functioning, daily living skills, impulsivity, and substance abuse. This chapter provides an overview of the BASIS-32, including a summary of its development, information about its reliability and validity, a basic interpretive strategy, and the status of available norms. The overview is followed by a discussion of the use of the instrument for treatment planning, treatment monitoring, and outcomes assessment, including a focus on its use in managed care settings. Limitations of the use of the BASIS-32 for these purposes are also presented. Several brief case studies are described to show how BASIS-32 assessments can be clinically useful on an individual level. Overview Three main characteristics distinguish the BASIS-32 from most other outcome measures (Eisen, Dill, & Grob, 1994). First, it was empirically derived directly from the patient/client's perspective (Eisen, Grob, & Dill, 1991; Eisen, Grob, & Klein, 1986). Second, it was developed on an acutely ill sample of mental health consumers who required inpatient psychiatric care. Third, it includes both symptoms and functioning difficulties within one measure. The BASIS-32 assesses the consumer's perspective on a broad range of symptoms and problems. The instrument was designed to cut across diagnoses, recognizing the

wide range of symptoms and problems that occur across the diagnostic spectrum. Thus, it is not designed to define a syndrome or to make a diagnosis (cf. Gibbons, Clark, & Kupfer, 1993), but it is appropriate for recipients of mental health services with a wide range of diagnoses.

Respondents are asked to report, for each item, the degree of difficulty they have been experiencing in the past week. Difficulty is rated on a 5-point scale as follows: 0 = "No difficulty", 1 = "A little", 2 = "Moderate", 3 = "Quite a bit", and 4 = "Extreme difficulty". Repeat assessments are obtained at desired intervals to assess change during or following treatment. The 32 items are scored into five subscales: relation to self and others, depression/anxiety, daily living/role functioning skills, impulsive/addictive behavior, and psychosis. In addition, an overall mean score is computed.

An information packet including a sample copy of the measure, an instruction manual including scoring instructions, and papers reporting reliability and validity is available from the authors. Although the measure is copyrighted by McLean Hospital, permission is routinely granted to mental health providers and facilities to use it for noncommercial purposes at no charge. Nonexclusive licensing arrangements for commercial use of the BASIS-32 by provider, insurance, managed care, pharmaceutical, software development, or consulting organizations can be made directly with McLean Hospital.

Target Population

Originally developed on psychiatric inpatients, the measure is now widely used among mental health service recipients receiving outpatient or partial hospital care. The BASIS-32 has been used among service recipients aged 14 and older, and is appropriate for recipients with any diagnosis except those involving severe cognitive impairment (e.g., dementia or severe mental retardation).
The instrument was one of nine selected for inclusion in a recent review of measures for assessing clinical outcomes among persons with serious mental illness (Dickerson, 1997).

Cultural and Language Adaptations

The BASIS-32 has been translated into Spanish, French, Japanese, Chinese, Korean, Cambodian, Vietnamese, and Tagalog. Testing of these translations has begun in Massachusetts, California, Hawaii, and Canada. However, psychometric properties and cultural equivalence of these translations have not yet been assessed.

Modes of Administration

Three different data collection procedures have been used to obtain BASIS-32 assessments: self-administration, structured interview, and telephone interview (Eisen, 1996). Questionnaire self-administration and structured interviews have been used at baseline and later time points. When the measure is administered as a structured interview, a clinician, researcher, support staff member, or volunteer reads the items to respondents and elicits their ratings with the help of 8 1/2" × 11" laminated "response cards" on which the rating scale is printed in large letters. Telephone interviews and mailed self-report questionnaires have been used at discharge/termination and follow-up time points.


Time Points

In experimental research in which subjects are randomly assigned to different treatment conditions, one administration of the BASIS-32 is sufficient to determine differential effects of two or more treatment interventions on outcome. However, in mental health services research that is observational rather than experimental, and in continuous quality improvement programs where client characteristics cannot be assumed to be randomly distributed across treatment programs, at least two assessments (one at baseline and one during or following a treatment episode) are generally done to assess change. Short- and long-term follow-up assessments also can be done.

Summary of Development

Work on the BASIS-32 began from a commitment to develop a mental health outcome measure appropriate for a broad range of psychiatric disorders that would reflect the consumer's point of view. Lazare and Eisenthal (1979) described the value of attending to the consumer's perspective and the positive effects of a consumer-oriented approach to the therapeutic process. Consistent with their findings, Eisen and Grob (1982) found that psychiatric outpatients in a rehabilitation program improved significantly in the areas they had identified as goals for treatment, but did not improve in areas they had not identified. The improvement was indicated by clinician report as well as self-report.

Thus, an individualized consumer-focused approach to assessment was begun by asking patients what symptoms or problems brought them to the hospital that they would like help with (Target Complaint Procedure; Battle et al., 1966). This process was similar to the "discovery phase" of instrument development described more recently by Kessler and Mroczek (1995). A total of 897 problems were obtained from 354 patients. Twenty clinicians then sorted each of the problems into categories. These sortings were cluster analyzed to derive items that would comprise a standardized measure.
Thus, an individualized, consumer-oriented assessment approach was used to develop a standardized measure comprising 32 items that were applicable to a wide range of mental health clients (Eisen et al., 1991).

The BASIS-32 was originally developed for use with psychiatric inpatient hospital populations. At the time, there were few self-report measures that were designed for acutely ill cases requiring inpatient care. Most self-report measures were designed for less impaired individuals treated as outpatients (Eisen et al., 1991). Skepticism regarding reliability and validity of self-reports contributed to the lack of tools available for outcome assessment in severely ill, hospitalized populations (Shrauger & Osborne, 1981). However, despite initial development on psychiatric inpatients, the BASIS-32 is now widely used among outpatients as well.

Types of Available Norms

Both inpatient and outpatient descriptive statistics at two or more time points have been published for the BASIS-32 (Eisen, 1995; Eisen & Dickey, 1996; Eisen et al., 1994; Eisen, Wilcox, Schaefer, Culhane, & Leff, 1997; Hoffmann, Capelli, & Mastrianni, 1997; Hoffmann & Mastrianni, 1995; Russo et al., 1997; Sederer et al., 1992). These data are available for comparative purposes. Eisen and Dickey (1996) reported admission and


discharge BASIS-32 scores for different age, sex, and diagnostic subgroups of hospitalized patients. Moreover, outpatient data are reported from a multisite outcome study that included 11 sites within the state of Massachusetts. In addition, several efforts are currently underway to compile BASIS-32 data collected by mental health providers and facilities throughout the United States. It is expected that norms for both treated and untreated samples will be available at the time of this volume's publication.

Reliability and Validity

Construction of the instrument was followed by a series of psychometric analyses to assess the factor structure, reliability, and validity of the measure.

Factor Structure. Factor analysis of BASIS-32 admission data collected on 387 recipients of inpatient treatment resulted in the subscales indicated in Table 25.1. An oblique rotation was used for the factor analysis because the dimensions of symptomatology and functioning were not independent from each other, and because the resulting factors were more clearly differentiated with an oblique rather than orthogonal rotation (Eisen et al., 1994). This factor analysis was replicated independently by two other research groups among inpatients in other settings. Russo et al. (1997) found good factorial convergence among a sample of 361 inpatients; 70% of the items loaded on the same subscales as reported by Eisen et al. Hoffmann et al. (1997) found good factorial convergence among adult inpatients, but lower levels of convergence among adolescents aged 12 to 18. (The BASIS-32 is not recommended for patients under age 14.)

Confirmatory factor analysis using linear structural relations (LISREL; Jöreskog & Sörbom, 1993) was performed on BASIS-32 intake scores obtained from 407 recipients of outpatient services who participated in a multisite outcome study (Eisen et al., 1997). In confirmatory factor analysis, the data are fitted to the original model and the goodness of fit is assessed.
Three measures of goodness of fit confirmed the original factor structure: the ratio of the chi-square to the degrees of freedom (2.97), the adjusted goodness-of-fit index (.80), and the root mean square error of approximation (.069).

TABLE 25.1
Internal Consistency (Cronbach's Alpha) of BASIS-32 Subscales and Overall Mean

BASIS-32 Subscale                1985-1986     1986          1995          1995-1996
                                 Inpatients    Inpatients    Inpatients    Outpatients
                                 (N = 387)a    (N = 144)a    (N = 1563)b   (N = 399)c
Relation to Self/Others          .76           .77           .88           .89
Daily Living/Role Functioning    .80           .79           .88           .88
Depression/Anxiety               .74           .76           .87           .87
Impulsive/Addictive Behavior     .71           .68           .66           .65
Psychosis                        .63           .43           .66           .66
Overall Mean                     .89           n.a.          n.a.          .95

a Eisen, Dill, and Grob (1994). b Eisen and Dickey (1996). c Eisen, Wilcox, Schaefer, Culhane, and Leff (1997).

Despite acceptable factorial convergence among recipients of outpatient care, item-scale correlations reported for two items on the Impulsive/Addictive Behavior subscale and


two items on the Psychosis subscale were less than .40, suggesting that an alternative subscale structure may provide a better fit for outpatients. Future work will further explore this possibility. Results of psychometric studies of the reliability and validity of BASIS-32 assessment data are summarized next.

Internal Consistency. Internal consistency coefficients (Cronbach's alpha; Cronbach, 1951) were computed for each BASIS-32 subscale and for the overall mean on three samples of psychiatric inpatients and one sample of psychiatric outpatients. Results for each of the samples are presented in Table 25.1 (Eisen et al., 1994, 1997; Eisen & Dickey, 1996). Alpha coefficients reported by Hoffmann et al. (1997) were comparable to those reported by Eisen et al. (1994) for adults. Internal consistency of the Psychosis subscale was lower, however, for adolescents aged 12 to 18 (r = .56; Hoffmann et al., 1997). Alpha coefficients reported by Russo et al. (1997) were higher than those reported in our own work, ranging from .74 to .81 for admission scores and .76 to .89 for discharge scores.

Test-Retest Reliability. Test-retest reliability coefficients, computed on a separate sample of 40 inpatients assessed twice within a 2- to 4-day period, were as follows: Relation to Self and Others, r = .80; Daily Living/Role Functioning, r = .81; Depression/Anxiety, r = .78; Impulsive/Addictive Behavior, r = .65; Psychosis, r = .76; Full Scale, r = .85 (Eisen et al., 1994).

Content Validity. Discriminant (content) validity was assessed by analyzing whether specific BASIS-32 subscale scores differentiated patients with corresponding diagnoses (Eisen et al., 1994). As expected, results indicated that patients with a diagnosis of unipolar depression had significantly higher scores on the Depression/Anxiety subscale compared to patients with other diagnoses.
Patients with a psychotic disorder had significantly higher scores on the Psychosis subscale than patients not diagnosed with psychosis; and patients with a substance abuse disorder had significantly higher scores on the Impulsive/Addictive Behavior subscale than patients without a substance abuse diagnosis.

Construct Validity. Construct validity was assessed by correlating BASIS-32 subscale scores with similar SF-36 subscale scores among psychiatric outpatients (Eisen et al., 1997). Results indicated high correlations among the SF-36 mental health dimensions and the Relation to Self/Others, Depression/Anxiety, and Daily Living/Role Functioning subscales of the BASIS-32 (rs ranged from .55 to .74).

Criterion Validity. Eisen et al. (1994) assessed concurrent criterion validity by relating objective indicators of functioning at follow-up (6 months posthospital admission) with BASIS-32 follow-up scores. Two objective criteria (continued hospitalization or rehospitalization during the 6 months after admission, and employment status at follow-up) were compared with patients' subjective reports of difficulty at the follow-up point. Results indicated a statistically significant linear trend. BASIS-32 follow-up scores were monotonically related to hospital status: patients who were discharged to the community and had remained there during the 6-month follow-up period reported significantly less difficulty than both patients who had been rehospitalized during the 6-month follow-up interval and patients who were hospitalized at the 6-month time point. The latter group reported more difficulty than those who had remained continuously in the community or had been rehospitalized but were currently in the


community. Regarding employment status among patients identified at admission as having a paid occupation, scores on the Daily Living and Role Functioning subscale at follow-up were expected to differentiate those who were from those who were not working at follow-up. Results supported this expectation; patients who were working reported significantly less difficulty with respect to daily living and role functioning than those who were not working.

Predictive Validity. Of particular interest in mental health services research is the value of a measure for predicting future service use. Predictive validity was assessed in an 11-site outpatient study that compared intake BASIS-32 scores of clients who had at least one return visit following initial evaluation, to intake scores of clients who did not return for further treatment (Eisen et al., 1997). Results indicated that those who did not return reported significantly less difficulty with Relation to Self/Others and with Depression/Anxiety. This finding is consistent with the idea that those who did not return felt less need for treatment due to lower levels of distress.

Sensitivity to Treatment Effects. The BASIS-32 has been shown to be sensitive to change following treatment in both inpatients and outpatients. Mean scores at admission/intake and discharge/follow-up for three different samples, along with effect sizes (Cohen, 1988), are summarized in Table 25.2. All of the reported pre-post differences are statistically significant beyond the .001 level. Effect sizes for inpatients treated in 1995 range from moderate to large (.46 to .89). Outpatients treated in 1995, reporting lower levels of difficulty at intake than inpatients, exhibited moderate effect sizes ranging from .31 to .53. In addition to the work reported, three studies have been published reporting BASIS-32 data from other sites.
In one, Hoffmann and Mastrianni (1995) used the BASIS-32 to assess outcome of two different treatment regimens (inpatient care followed by partial hospitalization vs. inpatient care followed by routine outpatient care). Results indicated that mean scores for both groups improved between hospital admission and discharge, but those receiving partial hospital care continued to improve during a 6-month follow-up interval, whereas those receiving routine outpatient care did not continue to improve. In two studies assessing reliability and validity of the BASIS-32 in adult inpatient settings, factor structure, internal consistency reliability, concurrent validity, and sensitivity to change were comparable to those reported earlier in the literature (Hoffmann et al., 1997; Russo et al., 1997). One of the studies (Hoffmann et al., 1997) replicated the analysis for adolescents separately and found lower levels of factorial congruence, reliability, and validity for adolescents.

Basic Interpretive Strategy

The BASIS-32 has a user-friendly, intuitive scoring system such that subscales and overall mean scores are computed on the same 0 to 4 rating scale as the individual items. Subscale and overall mean scores can be compared for different programs, payers, providers, patient/client demographic subgroups, and so on. In addition, when administered before and during or after treatment, pre-post comparisons can be made to assess improvement following treatment. Inferential statistical techniques can be used to assess statistical significance of both within- and between-group comparisons. Scores can also be risk adjusted to account for differences in facility, patient, and program characteristics.
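The scoring rule just described (each subscale score and the overall score is a mean of item ratings, so every score stays on the 0 to 4 item scale) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scoring software referred to in this chapter. The item-to-subscale key is supplied in the instruction manual and is not reproduced here, so the two-subscale key in the example is a made-up placeholder. The function names and the choice to average over answered items when a rating is missing are likewise assumptions made for the example.

```python
from statistics import mean

VALID_RATINGS = {0, 1, 2, 3, 4}  # 0 = "No difficulty" ... 4 = "Extreme difficulty"


def score_basis32(ratings, subscale_key):
    """Return subscale means and the overall mean on the 0-4 item scale.

    ratings:      dict mapping item number -> rating (0-4), or None if unanswered
    subscale_key: dict mapping subscale name -> list of item numbers
                  (the real key comes from the instruction manual)
    """
    for item, rating in ratings.items():
        if rating is not None and rating not in VALID_RATINGS:
            raise ValueError(f"item {item}: rating {rating!r} is outside 0-4")

    def item_mean(items):
        # Assumption: average only the answered items; None if nothing was answered.
        answered = [ratings[i] for i in items if ratings.get(i) is not None]
        return mean(answered) if answered else None

    scores = {name: item_mean(items) for name, items in subscale_key.items()}
    scores["Overall Mean"] = item_mean(list(ratings))
    return scores


def change_scores(pre, post):
    """Pre minus post for each score; positive values mean less reported difficulty."""
    return {name: pre[name] - post[name]
            for name in pre
            if pre[name] is not None and post.get(name) is not None}


# Placeholder key and made-up ratings, for illustration only.
placeholder_key = {"Subscale A": [1, 2, 3], "Subscale B": [4, 5, 6]}
intake = score_basis32({1: 2, 2: 3, 3: 0, 4: 4, 5: 1, 6: 2}, placeholder_key)
follow_up = score_basis32({1: 1, 2: 2, 3: 0, 4: 3, 5: 1, 6: None}, placeholder_key)
print(intake)
print(change_scores(intake, follow_up))
```

Because the change score is computed as pre minus post, a positive value corresponds to improvement (less self-reported difficulty at the later time point).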


TABLE 25.2
BASIS-32 Sensitivity to Change in Three Samples

1986 Inpatients (N = 247)a         Admission        6-Month Follow-up   Effect
                                   M       SD       M       SD          Size
Relation to Self/Others            1.47    .93      1.29    .99         .19
Daily Living/Role Functioning      1.77    1.05     1.21    .99         .55
Depression/Anxiety                 1.93    1.05     1.21    1.00        .70
Impulsive/Addictive Behavior       .86     .86      .45     .61         .55
Psychosis                          .77     .91      .40     .64         .47
BASIS-32 Overall Mean              1.34    .68      .93     .74         .58

1995 Inpatients (N = 949)b         Admission        Discharge           Effect
                                   M       SD       M       SD          Size
Relation to Self/Others            2.16    1.03     1.46    .99         .69
Daily Living/Role Functioning      2.24    1.04     1.46    .99         .76
Depression/Anxiety                 2.40    1.04     1.48    1.01        .89
Impulsive/Addictive Behavior       1.13    .85      .64     .73         .62
Psychosis                          .86     .88      .49     .73         .46
BASIS-32 Overall Mean              1.81    .80      1.15    .77         .84

1995-1996 Outpatients (N = 223)c   Intake           60-Day Follow-up    Effect
                                   M       SD       M       SD          Size
Relation to Self/Others            2.02    .98      1.51    .93         .53
Daily Living/Role Functioning      2.01    1.01     1.60    .97         .41
Depression/Anxiety                 2.05    1.03     1.51    1.01        .53
Impulsive/Addictive Behavior       .90     .74      .61     .66         .41
Psychosis                          .73     .79      .50     .71         .31
BASIS-32 Overall Mean              1.58    .78      1.18    .74         .53

Note: Rating Scale: 0 = "No difficulty," 1 = "A little," 2 = "Moderate," 3 = "Quite a bit," 4 = "Extreme."
a Eisen, Dill, & Grob (1994). b Eisen, Dill, & Grob (1996). c Eisen, Wilcox, Schaefer, Culhane, & Leff (1997).

On an individual level, responses can be used to gain an understanding of how patients perceive their own symptoms and problems. Items on which patients report a high degree of difficulty can become the focus of setting goals for treatment. Extreme difficulty reported on a number of specific items such as suicidal feelings or behavior, and controlling temper/violence, can serve to guide approaches to ensuring patient safety within the community or treatment setting.
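For readers who want to compute a comparable effect size on their own pre-post data, the following Python sketch may help. The chapter cites Cohen (1988) but does not spell out the formula used for Table 25.2, so the version below, a mean difference divided by the root mean square of the two standard deviations, is an assumption. It reproduces several tabled values to within about .01; small discrepancies on other rows are plausibly due to rounding of the published means and standard deviations. The function name is illustrative.

```python
from math import sqrt


def cohens_d(mean_pre, sd_pre, mean_post, sd_post):
    """Standardized pre-post difference using a pooled standard deviation.

    Assumption: the pooled SD is the root mean square of the two SDs.
    Positive values indicate lower reported difficulty at the later time point.
    """
    pooled_sd = sqrt((sd_pre ** 2 + sd_post ** 2) / 2)
    return (mean_pre - mean_post) / pooled_sd


# 1995 inpatients, Relation to Self/Others (values from Table 25.2).
print(round(cohens_d(2.16, 1.03, 1.46, 0.99), 2))  # 0.69, matching the table
```

Other conventions exist, such as dividing by the baseline standard deviation alone, and they give somewhat different values. Any comparison across studies should therefore confirm which definition was used.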
Use of BASIS-32 for Treatment Planning

Although originally developed as a research tool to assess treatment outcome, ongoing quality improvement and outcome assessment demands have led to the use of the BASIS-32 as an integral part of a clinical evaluation. Not designed to replace a


comprehensive clinical evaluation, BASIS-32 may be viewed as an important part of the evaluation: the part that provides the clinician with patients' perspective on their symptoms and problems. Use of the BASIS-32 for treatment planning is just beginning to be empirically evaluated. Three such applications of the instrument are summarized here.

Use of BASIS-32 to Predict Readmission Within 30 Days of Discharge

Readmission within 30 days of a previous discharge is one of four quality indicators identified by McLean Hospital's Continuous Quality Improvement program. Although many hospitalized patients have long histories of mental illness, if someone is rehospitalized so soon after discharge it is appropriate to examine whether the quality of care received in the hospital, the preparation for discharge, and/or the aftercare plans were successful in providing the supportive resources needed for patients to remain in the community.

In an effort to reduce the rate of rapid rehospitalization, a continuous quality improvement (CQI) team was organized to identify patients at high risk for readmission within 30 days, and to develop specific readmission prevention strategies for high-risk patients. Logistic regression analysis was done on 938 patients hospitalized in 1995 to identify predictors of readmission within 30 days. The analysis identified six statistically significant predictors, including BASIS-32 overall mean scores at admission and discharge from the hospitalization prior to the readmission: Those who were readmitted within 30 days reported more difficulty on admission, and on discharge from the hospitalization prior to the readmission.
Other significant predictors included male gender, marital status of separated or divorced, greater number of previous psychiatric hospitalizations, longer length of stay during the hospitalization prior to the readmission (an average of 15.3 days compared to 12.3 days for nonreadmitted patients), and diagnostic comorbidity (multiple Axis I diagnoses). Taken together, the identified predictors suggest that those readmitted within 30 days are more chronic, persistent, and severely ill than those not readmitted within 30 days. These findings are consistent with those reported by Postrado and Lehman (1995), who found that self-reported symptom severity and previous hospitalization were predictive of rehospitalization.

Figure 25.1 presents BASIS-32 scores on discharge from the initial hospitalization for patients rehospitalized within 30 days versus not rehospitalized within 30 days. Results indicated that patients readmitted within 30 days reported significantly more difficulty with respect to relation to self/others, depression/anxiety, daily living skills, impulsive/addictive behavior, and on the overall mean than did patients not readmitted within 30 days. After developing a means of identifying high-risk cases, the task force developed a relapse prevention plan to pilot test. Future work will assess the effectiveness of the relapse prevention plan in reducing readmissions within 30 days among those at greatest risk for rapid rehospitalization.

Use of BASIS-32 to Enhance Treatment Alliance

Treatment alliance has long been established as an important factor in therapeutic outcome (Allen et al., 1996; Horvath, 1994). Recognizing and acknowledging patients' perspectives on their symptoms and problems, as documented on the BASIS-32, provides


a means of including patients in the evaluation and treatment planning process that may enhance therapeutic alliance.

Fig. 25.1. Mean BASIS-32 scores at discharge for patients readmitted within 30 days versus non-readmitted patients.

In the interest of assessing potential utility of BASIS-32 as a therapeutic tool, a research project was initiated to determine whether using the BASIS-32 as a vehicle for acknowledging the patient's perspective will improve short-term, self-reported outcomes and satisfaction with care. To test utility of the BASIS-32 for this purpose, a pre-post comparison study with two groups and two time points was designed. The study is being implemented on an inpatient unit. Approximately half the patients on the unit are assigned a psychiatric resident who serves as a member of the treatment team; these patients comprise Group 1. When residents meet with each new patient assigned to them, they review with patients their BASIS-32 responses and scores. Residents are instructed to inform patients that their view of their symptoms and problems is important to the clinical staff and their concerns will be brought back to the rest of the clinical team; explain to patients what


BASIS-32 measures and what the scores mean; and begin to build an alliance with patients by recognizing and understanding the problems they have identified as causing the greatest difficulty. Thus, there is a systematic mechanism for a member of the clinical team to acknowledge and further explore the areas identified by the patient as most difficult, and to target those areas as a focus of the treatment plan. Approximately half the patients are not assigned a resident as a member of the treatment team, and thus comprise a comparison group (Group 2). They complete the BASIS-32 as part of the admission/evaluation process, but the staff does not review their responses with them. Both groups complete repeat BASIS-32 assessments prior to hospital discharge, along with a patient satisfaction survey. BASIS-32 outcome scores, patient satisfaction, and readmission within 30 days will be compared for the two groups to assess differences as a result of the intervention. Results are expected to be available by the time this volume is published.

Use of BASIS-32 to Identify Primary and Secondary Problems

On an individual level, the BASIS-32 can be used to identify primary and secondary problems from the client's perspective. Where BASIS-32 subscales overlap with diagnosis, there is usually consistency in that patients diagnosed with depression or anxiety tend to report more difficulty with depression and anxiety on the BASIS-32 than do patients with other diagnoses (Eisen et al., 1994). Frequently, however, patients report high levels of difficulty in areas that do not correspond to their diagnosis. For example, in a dual diagnosis program in which substance abuse may be the primary diagnosis, other symptoms such as depression may lead to a hospitalization, and may be of greater concern to the patient.
Problems in interpersonal relationships, daily living skills such as holding a job, and depression are often identified as more difficult than psychotic symptoms for patients diagnosed with schizophrenia. Thus, whereas a clinician may see psychosis as the main focus of treatment, the patient may identify other priority areas for treatment.

On an aggregate level, the BASIS-32 can be used to identify the prevalence of particular types of symptoms and problems in specific populations. For example, in two studies, the BASIS-32 was used to identify the extent of substance abuse among adult psychiatric inpatients admitted for other reasons (Eisen, Grob, & Dill, 1989), and among adolescents admitted for inpatient care (Eisen, Youngman, Grob, & Dill, 1992). Knowledge of the extent of particular kinds of symptoms and problems that are common to a group of service recipients can be extremely useful in program planning.

Potential Use or Limits for Treatment Planning in a Managed Care Setting

Because of their oversight of large numbers of cases receiving mental health services from a wide range of settings and providers, managed care companies are in a unique position to collect both treatment planning and outcomes data, and to compare results obtained for different providers, settings, and levels of care. They have the capacity to collect comprehensive utilization, quality of care, and cost data to which individual facilities do not have access. Moreover, they can combine these data to better understand the relations between quality, costs, and outcomes.


However, potential exists for misuse of such data to restrict access to needed services. A particular concern may arise when the same instruments are used both for outcome assessment and for "decision support." Decision support refers to the processes and methods used for decision making regarding treatment utilization, such as level of care (whether or not to hospitalize), number of psychotherapy sessions needed, type of medication to prescribe, and so on. Clearly, these decisions must be made, and it is useful to develop tools that can help make the most appropriate decisions (Lyons, Shasha, Christopher, & Vessey, 1996). Furthermore, clients should be involved in the decision-making process. However, input into decisions about treatment and input into assessment of outcome may conflict. For example, a patient may have experienced significant improvement as reflected in their BASIS-32 scores, but may need continued treatment to maintain that level in the community. An insurer or managed care company may feel that, based on the outcome, no further benefit will be achieved with continued treatment, and no reimbursement will be provided for additional services. If patients know that such treatment decisions will be made based on their responses to an outcome instrument, they may be motivated to respond in the way they believe will get them the desired treatment. Issues of informed consent also come into play. In the interest of obtaining honest responses to assessment measures, the BASIS-32 instruction manual (Eisen & Youngman, 1997) suggests informing patients at the outset that "This is not a test. There are no right or wrong answers. We just want to know how you are feeling in each of these areas." If the BASIS-32 is to be used to make decisions regarding level of care or services to be covered by insurance, patients should be so informed. 
The effect of what respondents are told regarding the use of a particular instrument on their responses is an empirical question that should be systematically investigated.

Provision of Feedback Regarding Findings from the Instrument

Both case-level and aggregate BASIS-32 data can be provided to clinicians involved in the treatment process. Scoring and reporting hardware and software are commercially available that allow for the scanning of BASIS-32 questionnaires, automated scoring of the items into the five subscale scores and the overall mean score, and the graphic presentation of the scores at multiple time points. When used as part of the clinical evaluation, the printed score reports are included as part of the medical record and are available to clinical staff for their use in treatment planning. The reports provide immediate feedback as to the client's self-reported symptom and problem difficulty. In addition to the subscale and overall mean scores, the report identifies responses indicating "quite a bit" or "extreme" difficulty on six key items of potential clinical significance: adjusting to major life stresses, suicidal feelings or behavior, fear/anxiety or panic, drinking alcoholic beverages, taking illegal drugs, and controlling temper/anger/violence. These items were identified by clinicians as particularly useful for treatment planning and case management.

To supplement the graphic reports that are automatically generated by the scoring software, a brief explanation has been prepared for clinical staff regarding how to read the reports. Feedback regarding BASIS-32 scores can also be shared with mental health consumers. For this purpose, a brief explanation for treatment recipients has also been prepared. If patients have questions or concerns about their scores, then they are encouraged to discuss them with their clinician. Staff and client explanations are presented in Fig. 25.2 and Fig. 25.3, and a sample individual BASIS-32 report is presented in Fig. 25.4.
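The key-item rule described above (list any of the six key items rated "quite a bit" or "extreme") is simple enough to sketch in Python. This is an illustration only and does not represent the commercial scoring software. Because the item numbers are not given in this chapter, the sketch keys ratings by the item wording used in the text, which is an assumption made for the example, as are the names `KEY_ITEMS` and `flag_key_items`.

```python
# The six key items, worded as in the text above.
KEY_ITEMS = (
    "adjusting to major life stresses",
    "suicidal feelings or behavior",
    "fear/anxiety or panic",
    "drinking alcoholic beverages",
    "taking illegal drugs",
    "controlling temper/anger/violence",
)

# Ratings that trigger a flag on the 0-4 difficulty scale.
FLAGGED_RATINGS = {3: "quite a bit", 4: "extreme"}


def flag_key_items(ratings_by_item):
    """Return {item: rating label} for key items rated 3 or 4.

    ratings_by_item: dict mapping item wording -> rating (0-4) or None.
    Items that are missing, unanswered, or rated below 3 are not flagged.
    """
    flagged = {}
    for item in KEY_ITEMS:
        rating = ratings_by_item.get(item)
        if rating in FLAGGED_RATINGS:
            flagged[item] = FLAGGED_RATINGS[rating]
    return flagged


# Made-up responses, for illustration only.
example = {
    "suicidal feelings or behavior": 4,
    "drinking alcoholic beverages": 1,
    "taking illegal drugs": 3,
}
print(flag_key_items(example))
```

Keeping the flagging rule separate from the subscale scoring mirrors the report layout described here, in which the key-item ratings are listed apart from the subscale and overall mean scores.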


CLINICIAN FEEDBACK

The Behavior and Symptom Identification Scale (BASIS-32) is a 32-item patient self-report scale that measures five symptom and functioning domains described on the left side on the back of this page, as well as an overall average of the 32 items. Scores can range from 0 to 4, with higher scores indicating greater difficulty.

The patterned bars show level of difficulty reported by the patient before intake. The black bars show level of difficulty reported by the patient at follow-up. Scores at both times are also printed to the right of the bars. By comparing intake with follow-up scores you can see if the patient's symptom and problem difficulty went up or down in the interval between intake and follow-up (generally 30-60 days).

Below the graphs is a section identified as initial, previous, and current. Initial means the first time the patient filled out the questionnaire (before intake). Current means the second time the patient filled out the questionnaire (follow-up). Previous means the time before current if the patient filled out the questionnaire more than twice. Since BASIS-32 was completed only two times, previous is not applicable (N/A). The dates listed are the dates the questionnaires were scanned into the computer (not the intake or follow-up dates).

If the patient reported quite a bit or extreme difficulty on any of six particular BASIS-32 items, these ratings are listed at the bottom of the page. The six possible items are: adjusting to major life stresses, fear/anxiety/panic, suicidal feelings or behavior, drinking alcoholic beverages, taking illegal drugs/misusing drugs, and controlling temper/outbursts of anger or violence.

Fig. 25.2. Staff explanatory feedback for patients treated in ambulatory settings.
Aggregate reports of intake and discharge or follow-up BASIS-32 scores are compiled and provided to clinical and administrative staff to describe particular patient/client subgroups, to compare programs, and to track changes over time in the client population or in patient/client outcomes. With any comparative program report, adjustments for program type, demographic and clinical characteristics, and case-mix severity should be made to ensure valid comparison groups. Alternatively, BASIS-32 scores for diverse sample subgroups can be examined to assess differences among particular demographic and clinical subgroups. Aggregate reports can present results in a variety of ways. Most commonly, mean BASIS-32 subscale and overall scores at admission/intake and discharge/follow-up are presented as shown in Fig. 25.5.

Limitations of the Use of BASIS-32 for Treatment Planning

The BASIS-32 provides only one perspective on symptom and problem difficulty: that of the patient or client. It is not a clinical measure of impairment and should not replace a thorough clinical evaluation. Rather, it is designed to be one part of the evaluation,

Thank you for taking the time to complete the Behavior and Symptom Identification Scale (BASIS-32). Your answers will help us learn how much difficulty you have been having with symptoms and problems that bring people to treatment. BASIS-32 asks how much difficulty you have been experiencing during the past week in each of 32 symptom and problem areas. Your answers were added into scores in six categories listed on the left side of the attached page (relation to self/others, depression/anxiety, etc.). Scores can range from 0 to 4, with higher scores indicating greater difficulty. The patterned bars show level of difficulty the first time you filled out the questionnaire (before intake). The black bars show level of difficulty the second time you filled out the questionnaire (at follow-up). Your scores at both times are also printed to the right of the bars. By comparing your intake with your follow-up scores you can see if your symptom and problem difficulty went up or down over this time period.

Below the graphs is a section identified as initial, previous, and current. Initial means the first time you filled out the questionnaire (before intake). Current means the second time you filled out the questionnaire (follow-up). Previous means the time before current if you filled out the questionnaire more than twice. Since you filled out BASIS-32 only two times, previous is not applicable (N/A). The dates listed are the dates your questionnaires were put into the computer, not intake or follow-up dates.

If you reported quite a bit or extreme difficulty on any of six particular BASIS-32 items, these ratings are listed at the bottom of the page. The six possible items are: adjusting to major life stresses, fear/anxiety/panic, suicidal feelings or behavior, drinking alcoholic beverages, taking illegal drugs/misusing drugs, and controlling temper/outbursts of anger or violence.
If you have any questions or concerns about your mental health status, please discuss them with your clinician.

Fig. 25.3. Client explanatory feedback for clients treated in ambulatory settings.

that which assesses the individual's self-reported level of difficulty with respect to each item. Clinical perspectives, as well as those of family members or others, also serve a valuable role in both outcome research and in treatment planning. Other limitations of the BASIS-32 include high intercorrelations among three of the subscales, suggesting limited discriminant validity of these domains. Generalizability of psychometric properties to more diverse populations is not yet known due to the small number of published studies from other settings. Internal consistency reliability of the Impulsive/Addictive Behavior and Psychosis subscales is not high enough for use at the level of the individual (McHorney, Ware, Lu, & Sherbourne, 1994). Although inpatient and outpatient BASIS-32 data have been reported in the literature for specific samples, national norms for inpatients, outpatients, and untreated community populations are still under development.

Fig. 25.4. Sample BASIS-32 individual report (Case Study 1): BASIS-32 scores at admission and discharge from inpatient care.

Fig. 25.5. Mean BASIS-32 scores at admission and discharge from a psychiatric hospital in 1996 (N = 749).

The BASIS-32 is subject to limitations of all self-report measures in that acutely intoxicated, psychotic, or demented patients may be unable to respond appropriately to the questions. It is not appropriate for children under age 14 or for clients with serious cognitive impairment (e.g., Alzheimer's disease). Clients who cannot read also cannot complete the BASIS-32 as a self-administered questionnaire. However, many potential respondents who are unable to complete the BASIS-32 independently as a self-report questionnaire are able to complete it when the questions are read to them by an interviewer. Many of these limitations are currently being addressed; others are proposed for future research on the utility, validity, and generalizability of the BASIS-32. For example, national norms are currently being compiled for both inpatient and outpatient treated populations and should be available at the time of this volume's publication.

Use of the Instrument for Treatment Monitoring

Treatment monitoring, as opposed to treatment outcome, implies assessment during the course of treatment rather than at its conclusion. For clients in short-term treatment, monitoring can be done at frequent intervals (e.g., weekly). Although there may be value in such frequent monitoring, the burdens and costs involved may outweigh any benefits. On the other hand, ongoing but less frequent treatment monitoring may be more cost-effective for evaluating those in long-term treatment (e.g., people with severe and persistent mental illness). The BASIS-32 can be administered at specified intervals during the course of long-term treatment. The intervals chosen should be based on length of time in the program, feasibility, burdens, and costs. It is often most feasible to repeat BASIS-32 assessments at intervals that coincide with those required for other types of monitoring (e.g., quarterly progress notes).

It should be noted that in long-term chronic care programs in which the goal is to maintain the functional status and symptom control necessary for self-maintenance in the community and avoidance of rehospitalization, BASIS-32 scores should not be expected to continue to improve indefinitely. Most research has shown that the greatest improvement occurs in the earliest phase of treatment (Howard, Kopta, Krause, & Orlinsky, 1986; Howard, Lueger, Maling, & Martinovich, 1993). In long-term treatment monitoring, maintenance of goals achieved earlier should be considered a positive outcome. Annual monitoring may be sufficient to document symptom and functional stability.

Use of the Instrument for Treatment Outcomes Assessment

General Issues

Historically, treatment outcomes assessment was the domain of research investigators.
Over the past few years, however, rising costs of mental health care, demands for accountability, and restructuring of both mental health service delivery and reimbursement practices have brought outcome research beyond the realm of the researcher and into the local hospital, community mental health center, and private practitioner's office (Eisen & Dickey, 1996; Sederer, Dickey, & Hermann, 1996). Clinical providers and facilities have long been involved in quality assurance, utilization, and peer review activities. However, these generally focused on the structures and processes of care rather than on outcomes. Currently, third-party payers and accrediting agencies are requiring that outcome assessment be included in the continuous quality improvement activities of both medical and behavioral health care providers. The Joint Commission on Accreditation of Healthcare Organizations (JCAHO, 1997) went a step further in its ORYX outcomes initiative. Accredited facilities will be required to choose, from a list of approved performance measurement systems, the specific system and outcome measures they plan to use to meet accreditation requirements. Data collected by facilities will be sent to a performance measurement site for processing and reporting. The performance measurement site is required to provide the facility with risk-adjusted aggregate reports of its own results, as well as comparative
reports of results from other similar sites. The goals of this initiative are to help standardize the processes and tools used for outcome and quality assessment, to provide benchmarking data against which facilities and providers can compare their own performance, and to enhance the likelihood that such information will be used to improve the quality of mental health care directly at the facility/provider level. Thus, providers are required to become directly involved in outcome assessment and quality improvement efforts within their facilities and practices.

Related to benchmarking are the concepts of provider profiling and report cards. By standardizing methods and tools for outcome assessment, and narrowing down the number of different tools in use, it is possible to collect large amounts of data using the same instruments, thus facilitating the development of mental health "report cards" that can provide appropriate comparisons across settings.

The expectation that outcome assessment become a routine part of clinical care means that cost-effective methods and assessment instruments must be used for this purpose. Lengthy research interviews and batteries are often not practical for a provider or clinic to administer for routine outcome assessment in a clinical setting. The BASIS-32 was specifically developed for purposes of outcome assessment. Based on experience reported by other users, the instrument is generally practical and useful for ongoing outcome assessment. The value of the BASIS-32 for this purpose has been recognized in a number of ways. In March 1996, the BASIS-32 received a Blue Ribbon Award from the New England Healthcare Assembly for innovation in health care delivery.
In January 1997, the Outcomes Roundtable Steering Committee, a national panel of experts in outcome assessment jointly sponsored by the National Alliance for the Mentally Ill and Johns Hopkins University, named McLean Hospital's Outcomes Management System as one of five exemplary programs in the United States. (Two of the five exemplary programs selected use the BASIS-32 as a primary outcome measure.) In May 1997, the BASIS-32 was approved by the Medical Outcomes Trust Scientific Advisory Committee and Board of Trustees to be incorporated into the trust's library of internationally available outcomes measurement instruments. In July 1997, JCAHO approved the McLean BASIS-32™ plus Performance Measurement System. Several other JCAHO-approved Performance Measurement Systems include BASIS-32 as an outcome indicator.

Evaluation of BASIS-32 Against NIMH Criteria for Outcome Measures

Newman and Ciarlo (1994) discussed 11 criteria for selecting measures for use in outcome assessment that were identified by an expert panel convened by the National Institute of Mental Health (NIMH) in 1986. The first of the 11 criteria is concerned with the measure's relevance and appropriateness to its target group. Newman and Ciarlo (1994) defined target group as "a cluster of clients with similar clinical-demographic characteristics that are expected to have similar treatment needs and course" (p. 99). The BASIS-32 was developed as a generic measure for use with a broad range of mental health service recipients. As such, the instrument has proven to be useful and sensitive among inpatient, outpatient, and partial hospitalization populations across a wide geographic region. In addition to its use in the United States, the BASIS-32 has been used in several countries abroad, including
England, Scotland, and Australia. Requests for information about the BASIS-32 have been received from mental health facilities in eight foreign countries. As noted by Newman and Ciarlo (1994), different groups or types of clients can be expected to differ in their responses. Consistent with that expectation, outpatients in the aggregate report lower levels of difficulty than do patients who require more intensive care in an inpatient setting (Eisen et al., 1997). However, the BASIS-32 has detected changes in severity following treatment within all sample populations tested. Thus, as intended, the BASIS-32 appears to have relevance to a broader target group than that identified by Newman and Ciarlo. The second criterion noted by Newman and Ciarlo (1994) is that the instrument should have simple, teachable methods. Again, the BASIS-32 meets the criterion. An information packet is available to those interested in using BASIS-32, which includes the measure, a brief instruction manual with scoring information, and reprints of published articles describing reliability and validity. The instruction manual contains background information on the development of the measure, general guidelines for administration, item clarifications and elaborations, and a sample protocol for self-administration. It also addresses procedural variations in the protocol, outlining possibilities for administration in situations where the data is collected by a staff member in a structured interview, or when data is collected as a part of a research project or other situation outside of a clinical treatment process. In addition, answers to some common questions about the BASIS-32, a statement concerning ethical considerations and general scoring algorithms are included. The BASIS-32 does not require administration by clinically trained staff. It is a self-report measure that can be explained to respondents in a few minutes, and generally completed in less than 15 minutes. 
It can also be administered as a structured interview by mental health workers, nursing assistants, support staff, or even volunteers. Scoring is relatively simple and straightforward. The instruction manual includes a hand-scoring sheet, as well as a program for computer scoring in Statistical Product and Service Solutions (SPSS) that can be easily adapted to other software packages. The scoring sheet facilitates the scoring process and clarifies the subscale construction by visually presenting the grouping of the items in each subscale. A recent BASIS-32 user survey indicated that out of 245 respondents, only 6.1% expressed difficulty with scoring the instrument. An additional tool for incorporating the BASIS-32 into an outcome system is the BASIS-32 Fact Sheet, a simple, one-page summary of the BASIS-32 information, including brief scoring information, general guidelines for administration, patient eligibility guidelines, and the background of the BASIS-32. This fact sheet is very concise and small enough that it can easily be photocopied and left within reach of all staff who are concerned with the administration of the instrument. Criterion 3 refers to the availability of objective referents within the instrument. The BASIS-32 addresses aspects of social, occupational, and community functioning, as well as common psychiatric symptoms. Many of the items include a number of specific examples to be included within the item. In addition, when administered as a structured interview, item clarifications and elaborations are available for the interviewer to provide to the respondent. These elaborations provide additional referents that respondents can use to rate their level of difficulty. Additional objective referents are included in a demographic section of the BASIS-32, and responses to these were used in the validation process. These items include employment and student status, and living arrangements in the community.
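The chapter describes scoring as simple averaging of the 0 to 4 item ratings within each subscale, plus an overall mean of all items. A minimal sketch of that arithmetic follows. It is not the SPSS program from the instruction manual: the item-to-subscale mapping shown is a hypothetical placeholder (the real groupings are on the hand-scoring sheet), and averaging only the answered items is an assumption about missing-data handling.

```python
from statistics import mean

# Hypothetical item-to-subscale mapping, for illustration only. The actual
# BASIS-32 groupings are given on the hand-scoring sheet in the manual.
EXAMPLE_SUBSCALES = {
    "subscale_a": [1, 2, 3],
    "subscale_b": [4, 5],
}


def score_subscales(ratings, subscales=EXAMPLE_SUBSCALES):
    """Average 0-4 item ratings within each subscale and across all items.

    `ratings` maps item numbers to ratings. Assumption: unanswered items are
    left out of the averages; a subscale with no answered items scores None.
    """
    for item, rating in ratings.items():
        if not 0 <= rating <= 4:
            raise ValueError(f"item {item}: rating {rating} is outside 0-4")
    scores = {}
    for name, items in subscales.items():
        answered = [ratings[i] for i in items if i in ratings]
        scores[name] = mean(answered) if answered else None
    all_answered = list(ratings.values())
    scores["overall"] = mean(all_answered) if all_answered else None
    return scores
```

Because every score is a mean of ratings on the same 0 to 4 scale, subscale and overall scores stay on that scale, which is what makes them directly comparable on the graphic reports.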

Criterion 4 specifies the use of multiple respondents. As a measure of the consumer's perspective, the BASIS-32 utilizes only one respondent. Multiple perspectives are valuable and measures reflecting a clinical perspective would add to the outcome assessment process. However, instruments specifically designed to reflect a clinical perspective should be used for that purpose. Criterion 5 calls for more process-identifying outcome measures. It is true that "there should be a relationship between process and outcome" (Newman & Ciarlo, 1994, p. 102). However, the measurement of process and outcome should be independent. The BASIS-32 is clearly able to measure outcome and therefore should be able to detect any differential effects of processes if, indeed, they have an impact on the outcome. On the other hand, instruments such as the BASIS-32 may be measuring effects that are unrelated to treatment process, such as the passage of time. As a generic outcome measure, the BASIS-32 was not designed to assess the effect of any one specific process, but rather was designed to assess effects of many types of treatment processes and interventions. Psychometric strengths of the BASIS-32 for both inpatient and outpatient samples have been presented earlier in this chapter (Criterion 6). In summary, its psychometric strength within adult inpatient populations has been confirmed by three independent researchers. Psychometric properties of the BASIS-32 have been reported in one multisite outpatient study. Further work with more diverse samples is proposed. Criterion 7, low costs relative to its uses, is clearly a strength for the BASIS-32. It is a very low cost instrument to utilize for many different reasons. The initial cost for obtaining the BASIS-32 information packet is minimal. Although copyrighted by McLean Hospital, permission is routinely granted to mental health providers and facilities to use the instrument for noncommercial purposes to assess outcomes of their clients. 
Automated scanning and scoring software and hardware are available for purchase from commercial vendors to facilitate use. However, purchase of an automated system is not required to use the BASIS-32. In terms of the cost of staff time and respondent burden, the BASIS-32 also appears highly cost-effective. The measure requires only 5 to 20 minutes to administer as a self-report or structured interview. Hand scoring of the instrument averages from 5 to 7 minutes. Manual data entry and computerized scoring, applying the program enclosed in the instruction manual to a statistical software package, saves considerable time over hand scoring and allows for generation of aggregate reports.

Ease of understanding by nonprofessional audiences (Criterion 8) is another benefit of the BASIS-32. Because scoring is accomplished by simple averaging of items in a subscale that range in value between 0 and 4, subscale scores are easily interpreted by nonprofessionals. Likewise, any of the available computer-generated graphic reports depicting BASIS-32 results are simple in their design and presentation, allowing for easy interpretation of results. As a measure of self-perceived difficulties, BASIS-32 scores are also easily understood by patients. As such, they can serve as a tool for facilitating communication with clinical staff. Aggregate results can be prepared and are found to be very useful for describing patient subgroups, comparing programs, and tracking changes over time in the client population for various influential stakeholders. (See Fig. 25.5 for an example of an aggregate report that presents mean subscale scores at admission and discharge.) These have been successfully presented to a wide range of audiences.

As stated previously, the BASIS-32 allows for both easy scoring and interpretation of results. The simple 5-point scale provides an easily understood indicator of difficulty,
whether scores are hand tabulated or automatically generated as a graphic report by scoring software.

The BASIS-32 was developed as a tool to measure treatment outcome and quality improvement; however, the instrument has been found to be extremely useful in clinical services as well (Criterion 10). As described in the section on usefulness of the measure for treatment planning, the BASIS-32 has been found to be useful in systematically presenting to the clinician the patients' own perspective on their symptom and problem difficulty. This information is helpful in understanding the patient, in treatment planning, and in providing more patient-centered mental health care.

The BASIS-32 was not designed to be utilized specifically in conjunction with one clinical theory or practice (Criterion 11), but rather to represent a broad range of treatments and services encompassing many theoretical orientations.

Research Findings Relevant to Use of BASIS-32 as an Outcomes Measure

Findings regarding reliability and validity of the BASIS-32, as well as sensitivity to change, have been described previously in this chapter. Consequently, this section summarizes other research findings that have emerged using the BASIS-32.

Method Effects: Self-Report Versus Structured Interview. In a study designed to assess the impact of mode of administration on BASIS-32 self-reported symptom and problem difficulty (Eisen, 1995), newly admitted inpatients were randomly assigned to one of three assessment procedures: interviewer administered, self-administered, or choice of procedure. Results indicated that participation rate was not affected by mode of administration. In terms of BASIS-32 scores, controlling for age and gender, patients assigned to self-administer the BASIS-32 reported significantly more difficulty with respect to relation to self and others than patients who were randomly assigned or who chose interviewer administration.
In addition, patients randomly assigned to self-administer the BASIS-32 reported significantly more difficulty with respect to daily living/role functioning skills than patients who chose interviewer administration. The remaining three subscales were not significantly affected by assessment method. Reports of greater difficulty in some areas under self-administration than under interviewer administration are consistent with findings in the literature (Hurt, Hyler, Frances, Clarkin, & Brent, 1984), and may be related to greater willingness to divulge sensitive information in the more anonymous situation of not having to report to an interviewer (Tausig & Freeman, 1988).

BASIS-32 in Other Outcome Studies. Sederer et al. (1992) used the BASIS-32 as an outcome measure in a study comparing the impact of case-based reimbursement on clinical outcome and utilization of services. Psychiatric inpatients covered by a case-based reimbursement insurance plan were compared with inpatients treated at the same time in the same programs under a variety of conventional insurance plans. Results indicated no significant differences between groups in BASIS-32 outcome scores. Both groups reported statistically significant improvement from admission to discharge. Hoffmann and Mastrianni (1995) used the BASIS-32 as an outcome measure in a study comparing the effect of partial hospitalization versus standard outpatient treatment, following an inpatient treatment episode. Results indicated that patients receiving partial hospital care continued to improve from the time of discharge to a follow-up
point 6 months postdischarge, whereas patients receiving standard outpatient care deteriorated slightly between discharge and follow-up. Thus, the BASIS-32 was used to show differential effects of particular treatment modalities. Russo et al. (1997) and Hoffmann et al. (1997) also reported that BASIS-32 is highly sensitive as an outcome measure for the purpose of assessing change following inpatient treatment. Eisen and Dickey (1996) discussed the use of BASIS-32 to assess outcomes within a behavioral health care setting's continuous quality improvement program, and present outcome profiles by age, sex, and diagnostic groups. Eisen et al. (1997) confirmed the utility of the BASIS-32 as a sensitive change measure among recipients of outpatient services, whose initial BASIS-32 scores indicate lower levels of symptom and problem difficulty than has been reported for recipients of inpatient care. In addition to the published literature in which use of the BASIS-32 is reported there is much work using the instrument that is not published in the literature. Interest by behavioral health providers in the BASIS-32 prompted compilation of a BASIS-32 information packet comprised of a sample copy of the measure, an instruction manual including scoring instructions and a reference list, and several published papers describing methodology, reliability, and validity of the instrument. To date, the information packet has been distributed on request to more than 1,500 providers and provider organizations across the United States and abroad. More than 200 organizations have purchased automated scoring and reporting software. A recent survey yielded 245 responses from facilities that had requested the information packet. One hundred forty-four facilities indicated that they had used the BASIS-32: 45 with inpatients, 41 with partial hospital programs, and 104 with outpatient programs. 
Among BASIS-32 users are a pharmaceutical company using the instrument in a clinical drug trial, and several managed care companies, insurance companies, health maintenance organizations, hospitals, community mental health clinics, substance abuse programs, group practices, and individual clinicians. A few of these efforts are described later, but the great majority represent facility-specific quality improvement efforts or outcomes assessment programs for internal quality improvement purposes. As such, they are not reported in the research literature. Council of Behavioral Group Practices Outcomes Management Project. The Council of Behavioral Group Practices is a consortium of outpatient facilities that implemented a collaborative outcome study in 1994, calling for pretreatment assessments, along with 3-month and 6-month follow-ups. Currently, 19 practices in 12 different states are participating, and baseline data has been collected on more than 11,000 clients. The BASIS-32 (along with the SF-36) is used to assess outcome in a comprehensive project that also includes assessment of expectations for treatment, content of treatment as assessed by the Treatment Events Checklist, and client satisfaction. This program is coordinated by the University of Cincinnati Department of Psychiatry. Preliminary results suggest that greatest improvement occurs between intake and 3-month follow-up, with continued but more gradual improvement at 6-month follow-up (Kramer & Mahesh, 1996). The Council's Outcomes Management Project was also recognized by the Outcomes Roundtable Steering Committee as an Exemplary Outcomes Program. Future projects for this program include the development of an inpatient and child and adolescent protocol. All aspects of this program are intended to support the NCQA and JCAHO review process. Massachusetts Society for the Prevention of Cruelty to Children (MSPCC) Outcome Study. 
Among the services MSPCC provides are home- and clinic-based mental health services to children and adults throughout Massachusetts. In 1996, MSPCC implemented
an outcome study. A preliminary report of results showed that their sample population (N = 414 adults and adolescents age 14 and over) included a high percentage of people of color (52%, including 36% Latino, 10% African American, and 6% other populations of color). The outcome study used the BASIS-32 at intake and 2-month follow-up. MSPCC's sample contained greater ethnic diversity than the sample in the outpatient study done by Eisen et al. (1997). However, based on MSPCC preliminary results, the BASIS-32 appears to capture changes between intake and follow-up within each ethnic group (Latino, African American, and European descent). These and other data collected on populations of color can be used to assess systematic differences among ethnic groups in the psychometric properties and cultural equivalence of the BASIS-32.

Greenleaf Health Systems, Inc. Outcomes Project. Greenleaf Health Systems, based in Chattanooga, Tennessee, has been using the BASIS-32 to assess outcomes of inpatient care for more than 4 years. Patient outcomes are considered essential to maintaining the highest standards of patient care in the program. Reports have been prepared to assess and compare outcomes of inpatients treated in generic and specialty programs (chemical dependency and adolescent services), with assessments done at admission, discharge, and at 6- and 12-month follow-up. Again, the BASIS-32 has indicated statistically and clinically significant change following treatment.

Applications of the Instrument for Outcomes Assessment

Aggregate outcomes data can be used for several purposes: to evaluate effectiveness of a program; to compare outcome results of multiple programs within and outside a particular facility; to help guide decision making about program structure and content, staff training, and clinical care; and to improve the quality of mental health care. The BASIS-32 can be used for all of these purposes.
In recent years, increasing costs of health care and consequent efforts to manage care have led to the demand for accountability. Purchasers of care (e.g., government agencies, commercial insurers, employers, and managed care companies) want to know the effectiveness and efficiency of the services provided by health care organizations with which they do business. In a competitive marketplace, health care organizations need to show that the care they provide results in favorable outcomes for consumers. Thus, outcome measures such as the BASIS-32 are increasingly being used to demonstrate improvement in symptom and problem difficulty following treatment. Accountability is being demonstrated at many levels, from insurance and managed care companies that are conducting their own outcome assessment projects, to large networks of providers, individual facilities, programs within facilities, and providers within programs. Organizations that are not assessing clinical outcomes and cannot demonstrate program effectiveness are at an increasing disadvantage in their efforts to maintain the contracts necessary to continue their work as health care providers.

Demands to demonstrate program effectiveness can be met in a variety of ways using experimental or observational research designs with comparison groups. Although outcome assessment done as part of quality improvement cannot randomly assign patients to different treatment conditions (or no treatment), outcome assessment efforts can compare patients in different programs. Without comparative data, demonstration of improvement following treatment is consistent with program effectiveness, but alternative
explanations cannot be ruled out. Improvement over time may occur without any treatment, reflecting the course of illness rather than the effect of the intervention. Baseline assessment with the BASIS-32 can help to guide decision making about program content by informing administrators and clinical leaders about the kinds of symptoms and problems presented by their patients. For example, high scores on the Daily Living/Role Functioning subscale may suggest the need for services geared toward those areas. Similarly, high levels of difficulty with substance abuse may point to the need for substance abuse services. Needs for particular types of services can guide staff training and recruitment efforts. BASIS-32 profiles can be used to identify programs, providers, or individual cases with poor outcomes. Program and provider comparisons, however, should only be made with appropriate case-mix and risk adjustments. Differences in patient characteristics (e.g., age, sex, and socioeconomic status) may result in significant variation in outcomes. Even more important are clinical factors. Differences in diagnosis, severity of illness, chronicity, and comorbidity may lead to different outcomes. Failure to account for case-mix and risk factors can result in erroneous conclusions (Iezzoni, 1994; Iezzoni et al., 1992). On an individual level, endorsement of particular BASIS-32 items can be used to guide treatment planning and referral for additional specialized services. Individual outcome profiles can identify consumers who do not report improvement. For example, clinically significant improvement is generally identified as improvement of .5 standard deviation. With a standard deviation of 1.00 (the approximate standard deviation found consistently for Relation to Self/Others, Daily Living Skills, and Depression/Anxiety subscales), clinically significant improvement thus translates to a half point improvement on the 5-point rating scale. 
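The half-point criterion described above can be written as a short calculation. The following is a minimal sketch in Python, not part of the BASIS-32 scoring materials: the function names and the example scores are hypothetical, and it assumes that subscale scores are means on the 0 to 4 difficulty scale, so that improvement appears as a decrease from admission to discharge.

```python
def improvement_threshold(sd: float, fraction: float = 0.5) -> float:
    """Smallest decrease in difficulty counted as clinically significant.

    `fraction` is the share of a standard deviation used as the criterion
    (0.5 in the text); `sd` is the subscale's standard deviation.
    """
    return fraction * sd


def is_clinically_improved(admission: float, discharge: float,
                           sd: float, fraction: float = 0.5) -> bool:
    """True if the drop in difficulty meets or exceeds the threshold."""
    # Scores measure difficulty, so improvement is admission minus discharge.
    return (admission - discharge) >= improvement_threshold(sd, fraction)


# Hypothetical (admission, discharge) subscale means for two patients.
patients = [(2.4, 1.7), (1.8, 1.6)]

threshold = improvement_threshold(sd=1.00)
flags = [is_clinically_improved(a, d, sd=1.00) for a, d in patients]

print(threshold)  # 0.5
print(flags)      # [True, False]
```

With a standard deviation of 1.00, the first hypothetical patient (a drop of 0.7) meets the criterion and the second (a drop of 0.2) does not. For a subscale with a smaller standard deviation, the same function yields a correspondingly smaller threshold.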
Standard deviations are typically lower for the Impulsive/Addictive Behavior and Psychosis subscales and for the overall mean (Table 25.2). Patients who do not improve to that degree can be identified and looked at in depth to try to identify patient, treatment, environmental, or other factors that might help in understanding the failure to respond to treatment. Identification of such factors can point toward a solution of the problem.

Use of Findings from the Instrument with Other Evaluation Data

BASIS-32 assessments provide outcome data from one perspective, that of the patient or client. However, measures reflecting other perspectives and other outcome domains can add greatly to a full understanding of outcome. A number of research studies report findings from other evaluation data. Russo et al. (1997) reported that, in addition to the BASIS-32, nurses administered the Lehman Quality of Life Interview at admission and discharge; this is a structured interview reflecting the patient's perspective (Lehman, 1988). Clinicians rated patients on the Psychiatric Symptom Assessment Scale (PSAS; Bigelow & Berthot, 1989) and the Social and Occupational Functioning Assessment Scale (SOFAS-Revised GAF; Roy-Byrne, Dagadakis, Unutzer, & Ries, 1996). Results of the study indicated that global life satisfaction and functional quality of life domains were significantly correlated with some of the BASIS-32 subscales. However, the correlations (which ranged from r = .17 to r = .27) accounted for only 3% to 7% of the variance, suggesting that the functional quality of life domains are relatively independent constructs from the domains assessed by the BASIS-32. Correlations of

patient-reported satisfaction with aspects of quality of life and BASIS-32 subscale scores were substantially higher, ranging from .17 to .51, accounting for up to 26% of the variance and suggesting much greater overlap of life satisfaction with symptom and problem difficulty. The psychiatrist-rated SOFAS-GAF was not significantly correlated with any of the BASIS-32 subscales. This lack of relation between psychiatrist ratings and self-reported symptom difficulty is consistent with nonsignificant correlations found in another study between SOFAS-GAF ratings and Brief Symptom Inventory (BSI) ratings (Piersma & Boes, 1995). On the other hand, PSAS ratings of depression and anxiety and of psychosis were significantly correlated with the BASIS-32 Depression/Anxiety and Psychosis subscales, respectively. In addition, items related to substance abuse were significantly correlated with BASIS-32 Impulsive/Addictive Behavior scores (Russo et al., 1997).

Eisen et al. (1997) had outpatients complete the SF-36 (Ware & Sherbourne, 1992) in addition to the BASIS-32. The five BASIS-32 subscales were correlated with the eight SF-36 subscales to assess the degree to which each scale measured different constructs. Results indicated that three of the BASIS-32 subscales (Relation to Self/Others, Daily Living Skills, and Depression/Anxiety) correlated moderately highly with five of the SF-36 subscales (General Health, Vitality, Social Functioning, Role Limitations Due to Emotional Problems, and Mental Health). The correlations ranged from .45 to .68. Correlations with the three purely physical health SF-36 subscales (Physical Functioning, Role Limitations Due to Physical Health Problems, and Bodily Pain) were lower, ranging from .18 to .46. Impulsive/Addictive Behavior and Psychosis subscale scores correlated moderately with the SF-36 mental health domains (rs from .18 to .44), and still lower with the SF-36 physical health domains (rs from .15 to .29).
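The percentages of variance quoted for the Russo et al. (1997) correlations are the squared correlation coefficients expressed as percentages. As a minimal illustration in Python (the function name is hypothetical and the rounding to whole percentages is an assumption made here to match the figures in the text):

```python
def shared_variance_pct(r: float) -> int:
    """Percentage of variance shared by two measures correlated at r."""
    # Shared variance is the square of the correlation coefficient.
    return round(100 * r * r)


# Correlations reported in the text for Russo et al. (1997).
shared = {r: shared_variance_pct(r) for r in (0.17, 0.27, 0.51)}
print(shared)  # {0.17: 3, 0.27: 7, 0.51: 26}
```

Applied to the SF-36 correlations reported by Eisen et al. (1997), the same calculation gives about 46% shared variance at r = .68 and about 3% at r = .18.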
This pattern of correlations suggests that physical health status can be differentiated from mental health status. Similarly, impulsive/addictive behavior and psychosis are relatively independent constructs from physical health and other aspects of mental health. Clearly, for patients for whom physical health problems are likely to be prevalent, such as the elderly or clients with physical disabilities, use of the SF-36 with the BASIS-32 adds an important dimension.

Dickerson (1997) noted a number of other commonly used outcome measures, including rates of rehospitalization, incarceration, mortality, unemployment, and housing stability. Such measures provide useful supplementary data regarding community functioning, and they can be used in addition to the BASIS-32 and other assessments of clinical status.

Provision of Feedback Regarding Outcomes Assessment Findings

For outcomes assessment findings to be useful for assessing program effectiveness, for quality improvement, or for decision making, feedback regarding those findings is essential. In a clinical setting, results of aggregate data analysis are reported in three different ways. First, clinical and administrative staff receive quarterly reports presenting aggregate BASIS-32 outcomes for all patients discharged from their programs during each quarter, and for all patients discharged hospital-wide. The reports present demographic and clinical descriptive data for the sample, as well as admission and discharge BASIS-32 scores presented graphically. Thus, change from admission to discharge can quickly be determined visually. Tests of statistical inference are used to assess statistical significance of the change, and this is reported as well. An annual hospital-wide report

that includes data from a full calendar year also is prepared with breakdowns by demographic and clinical variables. Admission and discharge data for hospitalized patients discharged in 1995, broken down by age, sex, and diagnosis, are reported in the literature (Eisen & Dickey, 1996). Aggregate outcome data using the BASIS-32 are included in proposals to managed care and insurance companies, as well as to state and accrediting agencies. Periodic in-house publications, called McLean Reports, are used to summarize work with different quality indicators identified by the Continuous Quality Improvement program. The BASIS-32 is one of the identified quality indicators. The focus of the reports is on the rationale behind the use of particular indicators, data collection methods and results, and how results are used to improve quality of care. McLean Reports is distributed to several thousand mental health providers and agencies throughout the region. As an academic teaching and research facility, the hospital also provides feedback regarding outcomes assessment findings via presentations at conferences and workshops, and through publication in books and peer-reviewed journals.

Limitations/Potential Problems in the Use of the Instrument for Outcomes Assessment

The major limitations of the BASIS-32 have been addressed in the section on limitations in the use of the instrument for treatment planning. Another potential limitation with respect to the use of the BASIS-32 for outcome assessment is the impossibility of detecting improvement should respondents report "no difficulty" on all items at admission (i.e., respondents may deny difficulty with all symptoms and problems). If, following treatment, patients come to recognize that they are having difficulty, the resulting increase in difficulty scores would suggest that they got worse when, in fact, clinicians may regard recognition of problems as the first step toward improvement.
In this case, longer term follow-up would be needed to show decreased levels of difficulty. Total denial of symptoms and problems is rare, occurring in approximately 1% of inpatients and outpatients. In these situations, findings from an outcome measure reflecting a clinical perspective would be highly desirable.

Although some benchmarking data are available, compilation of national norms for inpatient and outpatient mental health service recipients will greatly add to the utility of the BASIS-32 for outcome assessment. Increasingly widespread use of the BASIS-32 and requirements regarding performance measurement will speed up the creation of normative databases.

Potential Use as a Data Source for Mental Health Service Report Cards

The BASIS-32 has the potential to serve as a data source for mental health service report cards. Although the concept of health care report cards is not new, report cards are currently at an early stage of development (U.S. General Accounting Office, 1994). Several mental health-specific report card initiatives are under way, including the Mental Health Statistics Improvement Program (MHSIP; Center for Mental Health Services, 1996) and the Performance Measures for Managed Behavioral Healthcare Programs (PERMS; American Managed Behavioral Healthcare Association, 1995). In addition,

the ORYX initiative recently implemented by the Joint Commission on Accreditation of Healthcare Organizations (JCAHO, 1997) will require inpatient and ambulatory mental health programs to collect and submit outcomes data to approved performance measurement sites for processing and reporting. These sites will, in turn, prepare risk-adjusted facility-specific reports and compile aggregate data for regional and national benchmarking purposes. The BASIS-32 is included as an outcome measure in at least four performance measurement systems that have been approved by JCAHO.

One of the main challenges of creating a mental health report card is to obtain consensus on what domains of outcome to assess and what measures to use for this purpose. Lambert, Ogles, and Masters (1992) reported that in 348 studies published in 20 journals between 1983 and 1988, more than 1,430 different outcome measures were used. Meta-analyses of outcome studies conclude that different measures often lead to different conclusions (Miller & Berman, 1983; Oliver & Spokane, 1988). With so many different measures in use, it becomes difficult to create large normative databases for each measure. Thus, it is important for the field of outcome assessment to narrow down the number of outcome measures in use so that large databases can be created for a much smaller number of instruments.

Case Studies

Following are three brief case vignettes that describe how BASIS-32 assessments can be used to enhance the clinical understanding, treatment planning efforts, and clinical outcomes of three individuals admitted for inpatient care. Important details of these cases have been changed to ensure confidentiality.

Case 1

Ms. S is a 34-year-old single, White woman who is well known at this hospital. She carries the diagnosis of schizophrenia with paranoid delusions, for which she has had numerous hospitalizations over the past 3 years. Ms. S has been unable to work and is currently on psychiatric disability.
She has been living in a group home for the past 6 months and receives outpatient care, including medication management, from a psychiatrist at a local community mental health center. Ms. S wants no contact with her family, and social interaction with other residents at her group home has been minimal. Efforts to engage her in a day program have been marginally successful, with only sporadic attendance.

Ms. S states her chief complaint as "I am the good witch of the north." She was brought to the hospital this evening by staff at her group home following an incident in which she attempted to burn down the group home by throwing lit candles on her bed. This was reported as an effort to "please the members of the coven." This attempt was in response to auditory command hallucinations. Ms. S has also exhibited paranoid ideation in the form of beliefs that people on the subway are actually members of other "witch groups" and are spying on her in an attempt to "steal her power." She denies any visual hallucinations or suicidal ideation. Staff members at the group home have been suspicious that she has not been taking all of her medications over the last few weeks.

The patient's BASIS-32 scores on admission (Fig. 25.4) generally indicate "a little" difficulty overall, with lower levels of difficulty for psychosis and none for impulsive/addictive

behavior. Difficulty with daily living skills was reported to be somewhat higher, approaching "moderate" levels. "Quite a bit" or "extreme" difficulty was reported for adjusting to major life stresses, suggesting that a particular stressful event may have precipitated the current psychotic episode. Exploration of the nature of such an event by the clinical treatment team could enhance understanding of the patient's current condition and help to guide the treatment plan. The impulsivity and psychosis described by the clinical team and reported in the admission note are not reflected in the patient's BASIS-32 scores in these areas, suggesting that she herself does not see these areas as particularly problematic. In this case, the patient's perception does not coincide well with the clinical evaluation, clearly highlighting the importance of obtaining multiple perspectives to arrive at a complete clinical assessment.

During her 10-day stay at the hospital, the patient remained quite isolative and suffered from insomnia throughout. Medications were started again, along with provision of information about mental illness and appropriate use of medications. Three days before discharge she reported that the auditory hallucinations had ceased. BASIS-32 scores on discharge indicated reduced levels of difficulty with depression/anxiety, daily living skills, and psychosis. Difficulty with relation to self/others and impulsive/addictive behavior increased slightly, with the overall mean score remaining almost identical to that obtained at admission. Discharge plans included recommendations to encourage social interaction at the group home, and to continue efforts to educate the patient regarding her illness and the benefits of taking her medication.

Case 2

Mr. J is a 46-year-old African American male who has been sent to this hospital after medical treatment and clearance following an unintentional heroin overdose. Mr. J
has been using heroin for the past 3 years, with a 9-month period of sobriety dating back to 1 year ago. Additional diagnoses include posttraumatic stress disorder and dysthymic disorder. His chief complaint is, "I've got to get off of this or I'm going to die." Mr. J has been seasonally employed as a construction worker for the past 25 years. As this admission is in winter, he currently finds himself unemployed and living off his wife's income. His wife is an accountant and denies any personal drug use. Mr. J describes his marriage as very supportive and states that he is very fearful of losing this support should he find himself unable to end his drug use. The patient and his wife have two boys, ages 4 and 7. The patient describes himself as being very motivated to end his heroin habit.

Figure 25.6 presents admission and discharge BASIS-32 data for this case. The patient reported "moderate" to "quite a bit" of difficulty at admission on all BASIS-32 subscales except psychosis. In addition, "quite a bit" or "extreme" difficulty was reported on the following key items: adjusting to major life stresses; drinking alcoholic beverages; taking illegal drugs; misusing drugs; and controlling temper, outbursts of anger, violence.

Following an unremarkable detoxification, the patient participated fully in group sessions and in Narcotics Anonymous (NA) meetings three times per week during his 11-day stay on the dual diagnosis unit. Aftercare was arranged with a private therapist, who will also serve as a provider for couples therapy for the patient and his wife. The patient has agreed to attend local NA meetings three evenings per week, and has found himself a sponsor. Strategies to avoid former drug contacts were devised, and Mr. J agreed to make every effort to stay away from them. BASIS-32 scores at discharge indicate a consistent decrease in difficulty in all areas.

Fig. 25.6. Case Study 2: BASIS-32 scores at admission and discharge from inpatient care.

Fig. 25.7. Case Study 3: BASIS-32 scores at admission and discharge from inpatient care.

Case 3

Ms. C is a 27-year-old single, White female. She was brought to the hospital for evaluation by her outpatient provider after she expressed extreme distress and made numerous

references to suicide during her session today, with a plan to overdose on alcohol and sedatives. Ms. C was unable to contract for safety and was admitted on an involuntary commitment status. Ms. C had been under the care of her outpatient provider following an incident in which she was involved in an unsuccessful attempt to smuggle her Mexican boyfriend into the United States. The attempt ended tragically in his death by drowning, which the patient witnessed. She was arrested for her involvement in the smuggling; however, she is currently free on bail and awaiting trial. Ms. C states her chief complaint is, "It's all my fault. I don't want to live anymore." She endorses symptoms of depression and anxiety in the form of anhedonia, insomnia, feelings of dizziness, and tachycardia, all of which have escalated in the past week. She also admits to drinking 6 to 9 shots of whiskey at a time, 5 days a week.

Figure 25.7 presents BASIS-32 scores on admission and discharge. At admission, Ms. C reports extreme difficulty on all items relating to the subscale of Depression/Anxiety, resulting in a maximum subscale score of 4.00. Difficulty with Daily Living Skills also was quite high. Despite heavy drinking, scores on the Impulsive/Addictive Behavior subscale are moderated by lack of illegal drug use and other impulsive behaviors.

Following a 7-day hospital stay that included individual and group therapy, a consultation to evaluate her alcohol abuse, and the initiation of psychopharmacological treatment, the patient exhibited dramatically fewer problems with depression and states that she no longer intends to kill herself. She will continue with her outpatient therapist, as well as her prescribed antidepressants, which she appears to be tolerating well. She has been advised to attend Alcoholics Anonymous and was given a schedule of meeting times and locations in her town. Her BASIS-32 scores at discharge indicate lower levels of difficulty in all measured areas.
Conclusions

Since publication of reliability and validity data for the BASIS-32 in 1994 (Eisen et al., 1994), interest in using the instrument to assess outcomes from the consumer's point of view has greatly increased. All users of the BASIS-32 are encouraged to publish their data and report their experiences with the instrument. Researchers are encouraged especially to continue the work needed to generalize findings to more diverse populations, to test foreign translations of the instrument, to assess multicultural equivalence, and to empirically assess the utility of the instrument for treatment planning and outcomes assessment. Future work with the BASIS-32 will address its limitations and enhance its utility, reliability, and validity across the spectrum of mental health consumers.

References

Allen, J.G., Coyne, L., Colson, D.B., Horwitz, L., Gabbard, G.O., Frieswyk, S.H., & Newson, G. (1996). Pattern of therapist interventions associated with patient collaboration. Psychotherapy, 33, 254-261.
American Managed Behavioral Healthcare Association. (1995). PERMS 1.0 performance measures for managed behavioral healthcare programs. Quality Improvement and Clinical Services Committee of the American Managed Behavioral Healthcare Association.
Battle, C.C., Imber, S.D., Hoehn-Saric, R., Stone, A.R., Nash, E.R., & Frank, J.D. (1966). Target complaints as criteria of improvement. American Journal of Psychotherapy, 20, 184-192.

Bigelow, L.B., & Berthot, B.D. (1989). The psychiatric symptom assessment scale. Psychopharmacology Bulletin, 25, 168-179.
Center for Mental Health Services. (1996). The MHSIP consumer-oriented mental health report card. The final report of the Mental Health Statistics Improvement Program (MHSIP) task force on a consumer-oriented mental health report card. Washington, DC.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Dickerson, F.B. (1997). Assessing clinical outcomes: The community functioning of persons with serious mental illness. Psychiatric Services, 48, 897-902.
Eisen, S.V. (1995). Assessment of subjective distress by patient self-report versus structured interview. Psychological Reports, 76, 35-39.
Eisen, S.V. (1996). Behavior and Symptom Identification Scale (BASIS-32). In L.I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 65-69). Baltimore, MD: Williams & Wilkins.
Eisen, S.V., & Dickey, B. (1996). Mental health outcome assessment: The new agenda. Psychotherapy, 33, 181-189.
Eisen, S.V., Dill, D.L., & Grob, M.C. (1994). Reliability and validity of a brief patient-report instrument for psychiatric outcome evaluation. Hospital and Community Psychiatry, 45, 242-247.
Eisen, S.V., & Grob, M.C. (1982). Clients' rehabilitation goals and outcome. Psychological Reports, 50, 763-767.
Eisen, S.V., Grob, M.C., & Dill, D.L. (1989). Substance abuse in an inpatient population. McLean Hospital Journal, 14, 1-22.
Eisen, S.V., Grob, M.C., & Dill, D.L. (1991). Outcome measurement: Tapping the patient's perspective. In S.M. Mirin, J. Gossett, & M.C. Grob (Eds.), Psychiatric treatment, advances in outcome research (pp. 213-235). Washington, DC: American Psychiatric Press.
Eisen, S.V., Grob, M.C., & Klein, A.A. (1986). BASIS: The development of a self-report measure for psychiatric inpatient evaluation. The Psychiatric Hospital, 17, 165-171.
Eisen, S.V., Wilcox, M., Schaefer, E., Culhane, M.A., & Leff, H.S. (1997). Use of BASIS-32 for outcome assessment of recipients of outpatient mental health services. Report to Human Services Research Institute, Cambridge, MA.
Eisen, S.V., & Youngman, D. (1997). Instruction manual for BASIS-32. Unpublished manuscript.
Eisen, S.V., Youngman, D., Grob, M.C., & Dill, D.L. (1992). Alcohol, drugs and psychiatric disorders: A current view of hospitalized adolescents. Journal of Adolescent Research, 7, 250-265.
Gibbons, R.D., Clark, D.C., & Kupfer, D.J. (1993). Exactly what does the Hamilton Depression Rating Scale measure? Journal of Psychiatric Research, 27, 259-273.
Hoffmann, F.L., Capelli, K., & Mastrianni, X. (1997). Measuring treatment outcome for adults and adolescents: Reliability and validity of BASIS-32. Journal of Mental Health Administration, 24, 316-331.
Hoffmann, F.L., & Mastrianni, X. (1995). Partial hospitalization following inpatient treatment: Patient characteristics and treatment outcome. Continuum, 2, 247-261.
Howard, K.I., Kopta, S.M., Krause, M.S., & Orlinsky, D.E. (1986). The dose-effect relationship in psychotherapy. American Psychologist, 41, 159-164.
Howard, K.I., Lueger, R.J., Maling, M.S., & Martinovich, Z. (1993). A phase model of psychotherapy outcome: Causal mediation of change. Journal of Consulting and Clinical Psychology, 61, 678-685.
Horvath, A.O. (1994). Research on the alliance. In A.O. Horvath & L.S. Greenberg (Eds.), The working alliance: Theory, research and practice (pp. 259-286). New York: Wiley.
Hurt, S.W., Hyler, S.E., Frances, A., Clarkin, J.E., & Brent, R. (1984). Assessing borderline personality disorder with self-report, clinical interview or semistructured interview. American Journal of Psychiatry, 141, 1228-1231.
Iezzoni, L.I. (1994). Risk adjustment for measuring health care outcomes. Ann Arbor, MI: Health Administration Press.
Iezzoni, L.I., Restuccia, J.D., Schwartz, M., Schaumburg, D., Coffman, F.A., Kreger, B.E., Buterly, J.R., & Selker, H.P. (1992). The utility of severity of illness information in assessing the quality of hospital care. The role of the clinical trajectory. Medical Care, 30, 428-444.

Joint Commission on Accreditation of Healthcare Organizations. (1997). ORYX: The next evolution in accreditation. Oakbrook Terrace, IL: Author.
Joreskog, K., & Sorbom, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS command language. Chicago: Scientific Software.
Kessler, R.C., & Mroczek, D.K. (1995). Measuring the effects of medical interventions. Medical Care, 33, AS109-AS119.
Kramer, T.L., & Mahesh, N.M. (1996). The outcomes management project of the Council of Behavioral Group Practices. Unpublished quarterly report.
Lambert, M.J., Ogles, B., & Masters, K.S. (1992). Choosing outcome assessment devices: An organizational and conceptual scheme. Journal of Counseling and Development, 70, 527-532.
Lazare, A., & Eisenthal, S. (1979). A negotiated approach to the clinical encounter: I. Attending to the patient's perspective. In A. Lazare (Ed.), Outpatient psychiatry: Diagnosis and treatment (pp. 141-156). Baltimore, MD: Williams & Wilkins.
Lehman, A.F. (1988). A quality of life interview for the chronically mentally ill. Evaluation and Program Planning, 11, 51-62.
Lyons, J.S., Shasha, M., Christopher, N.T., & Vessey, J.T. (1996). Decision support technology in managed mental healthcare. In C. Stout (Ed.), The integration of psychological principles in policy development (pp. 161-170). Westport, CT: Praeger.
McHorney, C.A., Ware, J.E., Lu, R., & Sherbourne, C.D. (1994). The MOS 36-item short-form health survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Medical Care, 32, 40-66.
Miller, R.C., & Berman, J.S. (1983). The efficacy of cognitive behavior therapies: A quantitative review of the research evidence. Psychological Bulletin, 94, 39-53.
Oliver, L.W., & Spokane, A.R. (1988). Career intervention outcome: What contributes to client gain? Journal of Counseling Psychology, 35, 447-462.
Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-108). Hillsdale, NJ: Lawrence Erlbaum Associates.
Piersma, H.L., & Boes, J.L. (1995). Agreement between patient self-report and clinician rating: Concurrence between the BSI and the GAF among psychiatric inpatients. Journal of Clinical Psychology, 51, 153-157.
Postrado, L.T., & Lehman, A.F. (1995). Quality of life and clinical predictors of rehospitalization of persons with severe mental illness. Psychiatric Services, 46, 1161-1165.
Roy-Byrne, P.P., Dagadakis, C., Unutzer, J., & Ries, R. (1996). Evidence for limited validity of the revised GAF-SOFAS. Psychiatric Services, 47, 864-866.
Russo, J., Roy-Byrne, P., Jaffe, C., Ries, R., Dagadakis, C., Dwyer-O'Connor, E., & Reeder, D. (1997). The relationship of patient-administered outcome assessments to quality of life and physician ratings: Validity of the BASIS-32. Journal of Mental Health Administration, 24, 200-214.
Sederer, L.I., Dickey, B., & Hermann, R. (1996). The imperative of outcomes assessment in psychiatry. In L.I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 1-7). Baltimore: Williams & Wilkins.
Sederer, L.I., Eisen, S.V., Dill, D.L., Grob, M.C., Gougeon, M., & Mirin, S.M. (1992). Case-based reimbursement for psychiatric hospital care. Hospital and Community Psychiatry, 43, 1120-1126.
Shrauger, J.S., & Osborne, T.M. (1981). The relative accuracy of self-predictions and judgments by others in psychological assessment. Psychological Bulletin, 90, 322-351.
Tausig, J.E., & Freeman, E.W. (1988). The next best thing to being there: Conducting the clinical research interview by telephone. American Journal of Orthopsychiatry, 58, 418-427.
U.S. General Accounting Office. (1994). Health care forum: Report cards are useful but significant issues need to be addressed (Publication No. GAO/HEHS-94-219). Washington, DC: General Accounting Office.
Ware, J.E., & Sherbourne, C.D. (1992). The MOS 36-item Short-Form Health Survey (SF-36): I. Conceptual framework and item selection. Medical Care, 30, 473-483.

Chapter 26
Brief Psychiatric Rating Scale

William O. Faustman
Palo Alto Veterans Affairs Health Care System and Stanford University School of Medicine

John E. Overall
The University of Texas Health Science Center at Houston

The assessment of clinical symptoms in psychiatric patients is of pivotal importance to both clinical care and research. Symptoms tend to vary across settings, ranging from features common to less severely ill outpatients (e.g., guilt, anxiety, depressed mood) to others more often seen in acutely ill psychiatric inpatients (e.g., hallucinations, delusions, suspiciousness). Several clinical symptom rating scales (e.g., the Hamilton Rating Scale for Depression, the Scale for the Assessment of Positive Symptoms) have been developed to document features that are unique to specific diagnoses such as depression and schizophrenia. However, there is an important place in the assessment of psychiatric symptoms for an instrument that can be used routinely as a broad-spectrum measure of psychopathology in patients with a variety of diagnoses. An ideal instrument for such clinical and research use would be relatively brief in administration time, simple to score, sensitive to change, and designed for repeated administration.

The Brief Psychiatric Rating Scale (BPRS) is an instrument that meets the need for rapid and reliable global assessment of clinical symptoms across a broad range of psychiatric patients (e.g., patients with schizophrenia, mania, depression, or schizoaffective disorder, and dementia patients with psychiatric symptoms). It is a clinician-rated scale that provides for evaluation of 18 symptom constructs spanning much of the range of psychiatric manifestations at a relatively global level of integration. Though originally developed for inpatient populations, the rating scale has been documented to have broad utility in a diverse range of settings.
This notwithstanding, it initially attained widespread use in the evaluation of patients with a primary diagnosis of schizophrenia where it gained recognition as a research tool in clinical psychopharmacology research. The BPRS can be completed based on observations noted in a routine clinical interview lasting from 20 to 30 minutes. It is intended that the ratings utilize the clinical judgment of mental health professionals who are familiar with the constructs to be rated. Because the scale items are fundamental constructs of psychopathology, the instrument makes an effective teaching device for mental health trainees, with the psychiatric elements of a routine mental status examination (e.g., affect, mood, reality testing, thought process, orientation) being adequately represented in the scale. The BPRS thus can be integrated

into clinical teaching situations where the desire is to teach trainees the concepts and accurate assessment of psychiatric phenomena.

This chapter provides an overview of the clinical and research use of the BPRS. An exhaustive review of the use of the BPRS is well beyond the scope of this chapter, because the scale has now been used in thousands of studies. The aim here is to present sufficient information for a reader to appreciate the variety of uses of the BPRS as an assessment tool and outcome measure. A review of common problems and questions raised in the clinical use of the scale also is provided. In addition, specific suggestions on the clinical use of the scale (e.g., item definitions, interview techniques) are offered.

Overview

Scale Development

The late 1950s and early 1960s was a revolutionary period in the treatment of major psychiatric disorders. Medications were being identified that, for the first time, had the promise of selectively treating psychotic symptoms. As reviewed by Hollister and Csernansky (1990), these medications included the early phenothiazines that were found useful in the treatment of core symptoms of schizophrenia (e.g., thought disorder and auditory hallucinations). The sudden availability of these novel and potent treatments created a need for clinical assessment tools that were designed for the measurement of patient change. The BPRS was developed in the early 1960s to fill this need (Overall, 1974). The original BPRS (Overall & Gorham, 1962) consisted of an empirically derived set of 16 rating constructs that captured the primary content of two longer rating instruments (Multidimensional Scale for Rating Psychiatric Patients: Lorr, Jenkins, & Holsopple, 1953; and the Inpatient Multidimensional Psychiatric Scale). Preliminary work (e.g., Gorham & Overall, 1961; Overall, Gorham, & Shawver, 1961) detailed the evolution of the instrument.
The 16-item scale was derived from the larger collections of prospective items by means of factor analysis. Two additional items (i.e., excitement, disorientation) not contained in the original instrument were added in later years to constitute the 18 items now found in the typically administered BPRS. These items were added to increase the utility of the scale in classification work (Overall, 1974), and they have proved useful in evaluation of geropsychiatric patients as well (Beller & Overall, 1984). It should be noted that contemporary use of the 18-item BPRS typically makes historical reference to the original 16-item version of the scale (Overall & Gorham, 1962). A recent publication by Overall and Gorham (1988) provides an updated reference for the full 18-item scale. In addition, Overall and Gorham (1988) noted that the scale has been in the public domain since the mid-1960s and reproduction for future clinical and research use is authorized. Overall and Gorham only requested that the instrument be faithfully reproduced, remain intact, and that appropriate source citation be included. A copy of the 18-item BPRS is offered in Fig. 26.1. Each symptom concept is scored on a 7-point scale ranging from "not present" to "extremely severe." Scoring takes the form of adding up a total score for the 18 items (i.e., BPRS total score) or forming linear combinations of items based on factor analytic studies discussed later (see discussion of interpretive strategy). It is important to note that two different versions of the scale make use of different ranges for point scoring. The scoring originally proposed by Overall and Gorham and reiterated in subsequent publications (e.g., Overall, 1974) emphasizes "not present'' as a natural origin to be scored 0 with

"extremely severe" thus scored 6. Possibly because some computer programs cannot distinguish 0 from a blank data field, an alternative scoring assigns "not present" a score of 1, with "extremely severe" scored 7. Under the 1 to 7 convention the scale lacks a meaningful origin (the minimum total score is 18 rather than 0), which renders interpretation of a ratio such as "percentage change" problematic. As a consequence, more recent work (Thompson, Buckley, & Meltzer, 1994) has suggested routine use of a version of the scale in which the original 0 to 6 (rather than 1 to 7) range of scaling is employed. To avoid confusion, always state which scoring convention is being used when reporting BPRS results.

Fig. 26.1. The Brief Psychiatric Rating Scale. From Overall and Gorham (1988).

BPRS Item Definitions

The BPRS was empirically derived from clinical evaluations of hundreds of psychiatric patients. Accordingly, the items of the BPRS represent common constructs familiar to

mental health professionals. Six of the items (emotional withdrawal, tension, mannerisms/posturing, motor retardation, uncooperativeness, and excitement) are rated on the basis of behavior observed during the BPRS interview (Overall & Klett, 1972). The other 12 items are rated based on the clinician's impression of the content and quality of the patient interview. To maintain consistency in the definition of the items for clinical and research use, the following BPRS item descriptions are reproduced with permission from Overall and Klett (J.E. Overall & C.J. Klett, Applied Multivariate Analysis, McGraw-Hill, 1972, pp. 6-11). The 18 items of the scale are defined next.

Somatic Concern. The severity of physical complaints should be rated solely on the number and nature of complaints or fears of bodily illness or malfunction, or suspiciousness of them, alleged during the interview period. The evaluation is of the degree to which patients perceive or suspect physical ailments to play an important part in their total lack of well-being. Worry and concern over physical health is the basis for rating somatic concerns. No consideration of the probability of true organic basis for the complaints is required. Only the frequency and severity of complaints are rated.

Anxiety. Anxiety is a term restricted to the subjective experience of worry, overconcern, apprehension, or fear. Rating of degree of anxiety should be based on verbal responses reporting such subjective experiences on the part of the patient. Care should be taken to exclude from consideration in rating anxiety the physical signs that are included in the concept of tension, as defined in the BPRS. The sincerity of the report and the strength of the experiences as indicated by the involvement of the patient may be important in evaluating the degree of anxiety.

Emotional Withdrawal. This construct is defined solely in terms of the ability of the patient to relate in the interpersonal interview situation.
Thus, an attempt is made to distinguish between motor aspects of general retardation, which are rated as "motor retardation," and the more mental-emotional aspects of withdrawal, even though ratings in the two areas may be expected to covary to some extent. In the factor analyses of change in psychiatric ratings, a "general retardation" factor has emerged in several different analyses, and it has included emotional, affective, and motor retardation items. It is difficult to identify the basis for rating of "ability to relate"; however, initial work has indicated that raters achieve reasonably high agreement in rating this quality. Emotional withdrawal is represented by the feeling on the part of the rater that an invisible barrier exists between the patient and other persons in the interview situation. It is suspected that eyes, facial expression, voice quality, lack of variability, and expressive movements all enter into the evaluation of this important but nebulous quality of psychiatric patients.

Conceptual Disorganization. Conceptual disorganization involves the disruption of normal thought processes and is evidenced in confusion, irrelevance, inconsistency, disconnectedness, disjointedness, blocking, confabulation, autism, and unusual chain of associating. Ratings should be based on the patient's spontaneous verbal products, especially those longer, spontaneous response sequences, which are likely to be elicited during the initial, nondirective portion of the interview. Attention to the facial expression of the patient during the verbal response may be helpful in evaluating the degree of confusion or blocking.

Guilt Feelings. The strength of guilt feelings should be judged from the frequency and intensity of reported experiences of remorse for past behavior. The strength of the guilt feelings must be judged in part from the degree of involvement evidenced by the

patient in reporting such experiences. Care should be exercised not to infer guilt feelings from signs of depression or generalized anxiety. Guilt feelings relate to specific past behavior that the patient now believes to have been wrong and the memory of which is a source of conscious concern.

Tension. This construct is restricted in the BPRS to physical and motor signs commonly associated with anxiety. Tension does not involve the subjective experience or mental state of the patient. Although research psychologists, in an effort to attain a high degree of objectivity, frequently define anxiety in terms of physical signs, in the BPRS observable physical signs of tension and subjective experiences of anxiety are rated separately. Although anxiety and tension tend to vary together, developmental research with the BPRS has indicated that the degree of pathology in the two areas may be quite different in specific patients. A patient, especially when under the influence of a drug, may report extreme apprehension but give no external evidence of tension whatsoever, or vice versa. In rating the degree of tension, the rater should attend to the number and nature of signs of abnormally heightened activation level (e.g., nervousness, fidgeting, tremors, twitches, sweating, frequent changing of posture, hypertonicity of movements, and heightened muscle tone).

Mannerisms and Posturing. This symptom area includes the unusual and bizarre motor behavior by which a mentally ill person can often be identified in a crowd of normal people. The severity of manneristic behavior depends both on the nature and number of unusual motor responses. However, it is the unusualness, and not simply the amount of movement, that is to be rated. Odd, indirect, repetitive movements or movements lacking normal coordination and integration are rated on this scale. Strained, distorted, abnormal posture and integration that are maintained for extended periods are rated.
Grimaces and unusual movements of lips, tongue, or eyes are considered here also. Tics and twitches that are rated as signs of tension are not rated as manneristic behavior.

Grandiosity. Grandiosity involves the reported feeling of unusual ability, power, wealth, importance, or superiority. The degree of pathology should be rated relative to the discrepancy between self-appraisal and reality. The verbal report of the patient and not his demeanor in the interview situation should provide the primary basis for evaluation of grandiosity. Care should be taken not to infer grandiosity from suspicions of persecution or from other unfounded beliefs where no explicit reference to personal superiority as the basis for persecution has been elicited. Ratings should be based on opinion currently held by the patient, even though the unfounded superiority may be claimed to have existed in the past.

Depressive Mood. Depressive mood includes only the affective component of depression. It should be rated on the basis of expression of discouragement, pessimism, sadness, hopelessness, helplessness, and gloomy theme. Facial expression, weeping, moaning, and other modes of communicating mood should be considered, but motor retardation, guilt, and somatic complaints that are commonly associated with the psychiatric syndrome of depression should not be considered in rating depressive mood.

Hostility. Hostility is a term reserved for reported feelings of animosity, belligerence, contempt, or hatred toward other people outside the interview situation. Raters may attend to the sincerity and affect present in reporting such experiences when they attempt to evaluate the severity of pathology in the symptom area. It should be noted that

evidences of hostility toward the interviewer in the interview situation should be rated on the Uncooperativeness scale and should not be considered in rating hostility as defined here.

Suspiciousness. Suspiciousness is a term used to designate a wide range of mental experience in which patients believe themselves to have been wronged by another person or believe that another person has, or has had, intent to wrong them. Because no information is usually available as a basis for evaluating the objectivity of the more plausible suspicions, the term accusation might be more appropriate. The rating should reflect the degree to which the patient tends to project blame and to accuse other people or forces of maliciousness or discriminatory intent. The pathology in this symptom area may range from mild suspiciousness through delusions of persecution and ideas of reference.

Hallucinatory Behavior. The evaluation of hallucinatory experiences frequently requires judgment on the part of the rater whether the reported experience represents hallucination or merely vivid mental imagery. In general, unless the rater is quite convinced that the experiences represent true deviation from normal perceptual and imagery processes, hallucinatory behavior should be rated as "not present."

Motor Retardation. Motor retardation involves the general slowing down and weakening of voluntary motor responses. Symptomatology in this area is represented by behavior that might be attributed to the loss of energy and vigor necessary to perform voluntary acts in a normal manner. Voluntary acts that are especially affected by reduced energy level include those related to speech as well as gross muscular behavior. With increased motor retardation, speech is slowed, weakened in volume, and reduced in amount. Voluntary movements are slowed, weakened, and less frequent.

Uncooperativeness. This is the term adopted to represent signs of hostility and resistance to the interviewer and interview situation.
It should be noted that "uncooperativeness" is judged on the basis of response of the patient to the interview situation, whereas "hostility" is rated on the basis of verbal reports of hostile feelings or behavior toward others outside the interview situation. It was found necessary to separate the two areas because of an occasional patient who refrains from any reference to hostile feelings and who even denies them while evidencing strong animosity toward the interviewer.

Unusual Thought Content. This symptom area is concerned solely with the content of the patient's verbalization: the extent to which it is unusual, odd, strange, or bizarre. Notice that a delusional or paranoid patient may present bizarre or unbelievable ideas in a perfectly straightforward, clear, and organized fashion. Only the unusualness of content should be rated for this item, not the degree of organization or disorganization.

Blunted Affect. This symptom area is recognized by reduced emotional tone and apparent lack of normal intensity of feeling or involvement. Emotional expressions are apt to be absent or of marked indifference and apathy. Attempted expressions of feeling may appear to be mimetic and without sincerity.

Excitement. Excitement refers to the emotional, mental, and psychological aspects of increased activation and heightened reactivity. The excited patient tends to be active, agitated, quick, loud, and emotionally responsive. Whereas tension is a construct concerned with physical or motor manifestations of activation, excitement has reference primarily to the mental and emotional areas. Tension usually implies a binding of the physical activation potential, whereas excitement is the underlying activation potential. The degree of excitement depends on the strength of arousal and heightened affect.

Disorientation. This rating construct has been included to provide a place for recording the particular kind of confusion that is evidenced by lack of memory or proper association for persons, places, or times. Disoriented individuals may not know where they are, how to relate where they are to other points in the environment, or how to get from one place to another. The identities of persons that should be familiar may be confused. Location in time and place, and even personal identity, may be confused or unavailable for recall. Distortions in identity such as those that occur in delusional systems should not be rated under disorientation. Disorientation represents the type of confusion that frequently occurs in organic conditions.

BPRS Interview Procedures

The completion of the BPRS is intended to be based on a clinical interview with a patient. The format of this interview follows the general format of a routine, brief clinical assessment interview. Such an interview should cover a broad range of symptoms (e.g., mood, anxiety, thought content, and process) that are normally assessed in a 20- to 30-minute mental status examination. The original paper on the scale (Overall & Gorham, 1962) suggested spending approximately 3 minutes to develop rapport with the patient. This is followed by approximately 10 minutes of "nondirective interaction" in which clinical information can be obtained in an informal manner (Overall & Gorham, 1962). The final 5 to 10 minutes of the interview are used to ask specific questions to address topic areas that may not have been adequately addressed during the nondirective phase of the interview. Such an interview process allows for the most natural exchange of information between the patient and clinician. Naive BPRS interviewers often read down the 18-item rating scale asking pointed questions about each item in a fixed manner (e.g., "Are you worried about your health?"; "Are you anxious?"; "Have you been feeling guilty?").
Such an interview is less than ideal for both the comfort of the patient and the validity of the data being obtained. Rhoades and Overall (1988) offered sample questions for tapping various BPRS constructs. Structured clinical interviews for conducting the BPRS data gathering have also been offered (e.g., Tarell & Schulz, 1988). However, the wisdom of making the BPRS into a highly structured interview has been questioned (Overall & Klett, 1972; Rhoades & Overall, 1988). The BPRS was intended for use with an adaptable clinical interview that is capable of adjusting to a wide variety of patient and interview situations. Highly structured interviews may be capable of increasing reliability and symptom coverage when the BPRS is being used by less experienced clinicians. This increase in reliability may come at the cost of decreased validity in that a highly structured interview may not appropriately assess some patients (e.g., the guarded and suspicious patient who requires expert questioning to elicit symptomatology). In sum, it is recommended that the ratings be based on a generally informal clinical interview that optimizes both comfort for the patient and data gathering for the clinician.

Important Issues in the Clinical Use of the BPRS

There are several important issues that are routinely raised when training raters in the use of the BPRS. The following paragraphs address some of these common issues.

Use of a Reference in Time for the BPRS Rating. The BPRS was designed with the goal of documenting the change resulting from treatment interventions. Initial ratings may be performed at the initiation of treatment (e.g., inpatient admission interview, outpatient intake assessment). Clinicians using this instrument should clarify the exact time period that is relevant for the intended use of the particular BPRS data. At times, patients may vividly relate phenomena that they experienced many weeks ago (e.g., hallucinations, depressed mood, persecutory delusions), but that are reported in a believable manner to be currently absent or greatly diminished. Naturally, clinicians would not want to rate those remote experiences for the evaluation of current treatment interventions. Questions may be asked regarding how a patient has been feeling lately with a general time frame of the past week often being specified. Keep in mind that a large percentage of the items are rated based on behavior (e.g., tension, affect, conceptual organization) that the clinician directly observes within the interview and are not based on any form of self-report of the patient.

How Should Information Held by One Specific Rater in a Joint Rating be Treated? Several authors (Overall & Gorham, 1962; Overall & Klett, 1972) have suggested using two independent raters to enhance reliability of BPRS ratings. Usually, these raters participate in a single joint interview and then independently complete rating forms (see section on reliability). In clinical research settings, these raters typically include the patient's primary psychiatrist or psychologist and an additional clinical or research staff member (e.g., research assistant, nurse). Typically, one of the raters assumes primary responsibility for conducting the interview with the other participating as an observer.
However, each of these interviewers may come to the rating session with uniquely different observations of the patient in the time period immediately preceding the interview. For example, one of the raters may have heard the patient detailing a previously unmentioned bizarre delusion just prior to the rating session. However, the primary interviewer in the joint interview may be unaware of this newly detected symptom and not uncover it during the interview. Because of this possibility, it is recommended that the other rater be given an opportunity to ask the patient about aspects of the behavior that may be known only to that rater. Such a procedure assures a more thorough assessment with greater validity.

Familiarity with the Patient Being Interviewed. In some settings, staff members may be asked to conduct BPRS interviews with patients with whom they have little familiarity. Obviously, this is less than an ideal situation. Some patients require extensive evaluation to detect symptoms that may be well defended (e.g., patients with high levels of suspiciousness). It is recommended that at least one of the raters be a clinician who is quite familiar with the patient, such as a treating psychologist or psychiatrist. In situations where this is impossible, a brief review of clinical records or discussion with staff members prior to the session may help identify target symptoms for assessment. Although such prior information may provide an important foundation, care must be exercised to verify the information concerning the time frame covered by the interview. Some research protocols require that ratings be restricted to observations and information elicited in the interview, although the use of prior information may still serve as a guide.

Clinical Judgment and the Use of the BPRS. One of the features that makes the BPRS unique is that it is meant to utilize fully the clinical judgment of skilled mental health professionals.
The effective use of the scale is not, however, limited to clinicians with advanced training. Trained research assistants and allied staff (e.g., psychiatric

nurses, clinical social workers) certainly can become reliable raters. However, a common mistake in using the scale is failure to factor in the quality of clinical judgment, observation, and listening skills. This problem can be evidenced in several ways. For example, although a patient may deny a specific question about experiencing auditory hallucinations, it may be completely clear to raters that the patient is responding actively to internal stimuli. Naive raters may mark "not present" for hallucinations in such a situation because "the patient said he or she was not hearing voices." A similar problem can be evidenced in self-disclosures made by a patient. It is common that, at one point in an interview, a patient will deny a certain symptom, only then to make extensive disclosures about this or similar symptoms at other points in the interview. Consider all symptoms described by a patient, rather than relying on denial by the patient during direct questioning.

A Particular Symptom can be Rated on Multiple Items of the BPRS. Naive raters often feel that symptom constructs are independent and must correspond in a one-to-one manner with the item constructs on the BPRS. However, in clinical practice, a particular form of psychopathology may relate to multiple items. Grandiose delusions are an example, as such pathology may appropriately be rated on both the Grandiosity and Unusual Thought Content items of the scale.

Remembering the Distinctions Between Hostility Versus Uncooperativeness and Tension Versus Anxiety. The restrictive definitions of these constructs are important to emphasize in training raters to use the scale. Specifically, take care to note that hostility is meant to rate feelings directed toward individuals outside the rating setting, whereas feelings of anger directed toward the interviewer or interview situation are scored under the Uncooperativeness item (Overall & Klett, 1972).
In essence, a patient cannot be hostile to an interviewer, only uncooperative. Understanding the distinction between the item definitions for Anxiety (scored without accounting for motor behavior) and Tension (scored exclusively by observing motor behavior) is often an important hurdle in training raters to use the scale appropriately.

Neurological Disorders Such as Tardive Dyskinesia and the BPRS. Tardive dyskinesia (TD) is an involuntary movement disorder typically noted in the form of choreoathetoid movements of the orofacial area and upper limbs (Lohr & Wisniewski, 1987). The disorder often is typified by rhythmic, repetitive movements of the mouth, tongue, lips, face, trunk, upper extremities (e.g., hands), and lower extremities (e.g., feet). TD is related to chronic administration of most antipsychotic medications (Lohr & Wisniewski, 1987). Clozapine, an atypical neuroleptic, does not appear to produce the syndrome (Naber, Leppig, Grohmann, & Hippius, 1989). Though it is uncertain at this time, it is possible that new antipsychotic agents (e.g., olanzapine, risperidone) with superior side-effect profiles may also have a diminished risk of TD when compared to traditional agents such as haloperidol. Although epidemiologic studies offer diverse estimates of the prevalence of TD, it is undoubtedly present in a significant percentage (e.g., 20% or more) of some chronically treated psychiatric populations (Lohr & Wisniewski, 1987). The high prevalence of this movement disorder in some psychiatric populations can present problems in terms of ratings on the BPRS. Manifestations of this type can resemble stereotyped movements that potentially could be rated under the BPRS Mannerisms and Posturing item. However, rating TD under Mannerisms/Posturing may have drawbacks in research studies directed toward the evaluation of new antipsychotic compounds. In

the double-blind testing of potential new antipsychotic medications, a reference medication selected from currently approved medications (e.g., haloperidol) typically is compared to the new compound for treatment efficacy. Prior to randomization to treatment with the new or standard medication, patients often may show significant TD at an unmedicated baseline. Increasing doses of traditional neuroleptics such as haloperidol have the ability to temporarily "mask" TD as well as to produce it (Hollister & Csernansky, 1990), giving at least the appearance that the TD has decreased. New compounds that act through novel mechanisms may not reduce the severity of TD during double-blind testing. Accordingly, if clinicians rate TD under Mannerisms/Posturing, they actually may be introducing a bias against finding efficacy for novel compounds when they examine some BPRS-derived outcome criteria (e.g., total BPRS score). The current recommendation is that, if it is possible, clinicians should differentiate Mannerisms/Posturing from TD. Oral-buccal symptoms of TD (e.g., rhythmic movements of the tongue, lips, jaw) are common and can typically be separated from mannerisms and posturing seen in some psychotic patients. TD in the outer extremities (i.e., arms, legs) may be somewhat more difficult to differentiate from the mannerisms seen in some patients. In some settings (e.g., multicenter clinical drug studies), the use of expert clinicians who are able to identify a syndrome of TD and differentiate TD from psychotic mannerisms may yield greater validity and sensitivity for the BPRS. However, such clinicians may not always be available, and it may not always be possible to differentiate TD from mannerisms. At the very least, clinicians should address this problem and establish guidelines for the use of the BPRS within particular settings.

Avoiding Psychodynamic Inferences.
Though clinicians are encouraged to utilize clinical judgment in completing the BPRS, they should avoid inferring or assuming symptoms based on psychodynamic or other intrapsychic formulations. For example, a patient who has experienced a recent emotional loss may not actually manifest depressed mood or anxiety. In the absence of direct evidence for such symptoms, clinicians should not rate high levels of these symptoms on the grounds that the patient is "defending himself/herself from repressed anxiety and depression."

Expanded Versions of the BPRS and Other Rating Scales Related to the BPRS

There have been several attempts to form expanded versions of the BPRS through the addition of items. For example, Dingemans, Linszen, Lenior, and Smeets (1995) offered an expanded 24-item BPRS that includes new items for bizarre behavior, suicidality, self-neglect, motor hyperactivity, distractibility, and elevated mood. The adequacy of the BPRS in documenting the full range of symptoms in schizophrenia has been challenged in recent years. The Positive and Negative Syndrome Scale (PANSS; Kay, Fiszbein, & Opler, 1987) has been offered as a rating scale that contains the standard 18 BPRS items in addition to 12 new items. These new items include some constructs (e.g., delusions, poor rapport) similar to existing BPRS items (e.g., unusual thought content, emotional withdrawal). The scale has been designed to offer combinations of items thought to reflect positive, negative, and general psychopathology scales. The negative symptoms scale of the PANSS includes five new items (e.g., stereotyped thinking, difficulty in abstract thinking, passive/apathetic social withdrawal) not found on the BPRS. The PANSS has been used in a variety of factor analytic studies

(Lindenmayer, Bernstein-Hyman, & Grochowski, 1994; Lindenmayer, Grochowski, & Hyman, 1995; Lindenmayer, Bernstein-Hyman, Grochowski, & Bark, 1995) addressing the nature of symptom presentation in schizophrenia. There are several advantages and disadvantages in the use of the PANSS. Obviously, the scale takes the BPRS and turns it into a more diagnosis-specific instrument that may contain items of limited relevance in other diagnostic samples. Though an anchored BPRS is contained within the PANSS, the instrument may be of reduced utility and efficiency in research and clinical settings where a rating scale may need to be applied to patients with diverse diagnoses (e.g., depression, severe Axis II disorders, schizoaffective disorder) other than schizophrenia. The scale is nearly twice as long as the BPRS. This increased length may be an impediment in situations where patient and clinician time constraints are a reality. Nevertheless, the addition of new items to the BPRS may have advantages in schizophrenia research by increasing the depth of coverage for the primary dimensions of negative and positive symptoms. Although increased coverage of major symptom dimensions of schizophrenia might be expected to enhance reliability of their assessment, some recent work (Peralta & Cuesta, 1994) has noted potential psychometric limitations of the extended scale. In this work, several PANSS items (i.e., poor impulse control, preoccupation) demonstrated poor interrater reliability (intraclass correlation < .40) and the overall general psychopathology measure showed modest interrater reliability (intraclass correlation = .56). In addition, the positive scale was noted to have only modest internal consistency. These values are actually lower than those reported for the BPRS total score and factor scores. In sum, the PANSS is sufficiently different from the BPRS to make a global recommendation about its use difficult.
The decision to use the scale may take into consideration the diagnostic characteristics of the patient sample, efficiency concerns, and specific research needs.

Normative Information

The use of specific norms with this instrument is limited somewhat, as the scale generally is designed to measure change within a patient or a group of patients receiving a similar treatment (e.g., medication intervention), rather than make comparisons to specific norms for clinical decision making. Attempts have been made to categorize patient profile subtypes (e.g., anxious-depression, florid thinking disorder) based on cluster analytic work, and mean BPRS profiles for these "phenomenological types" are provided in Overall (1974). These profile subtypes are said to be the most common patterns of BPRS symptoms found in psychiatric populations. Also, factor analytic work has identified item factors that replicate across patients, and the examination of scores for these factors may isolate more specific treatment effects.

Reliability of the BPRS

Clinician-based psychopathology rating scales such as the BPRS raise unique issues in reliability assessment. A major variance determinant for the reliability of the BPRS does not lie with the scale, but rather with the raters using the scale. Accordingly, to properly use the BPRS, raters must be familiar with the identification of the constructs

contained within the scale and rate patients with firm adherence to the definitions outlined for the BPRS items (Overall & Gorham, 1962; Overall & Klett, 1972). Adequate reliability levels for the BPRS total score and individual items have been reported in several studies. The original work of Overall and Gorham (1962) noted generally adequate interrater item reliability (r = .62 to .87, except for the tension item where r = .56) when the scale was used by experienced raters observing a jointly conducted interview. Hedlund and Vieweg (1980) summarized numerous studies offering BPRS reliability data in an extensive review. In general, interrater reliability of the scale was shown to be fairly high, with typical reliability measures (expressed as Pearson correlations) for the BPRS total score varying around .85 (Hedlund & Vieweg, 1980). Individual item reliabilities can fall to lower levels; this observation has been the concern of some authors (e.g., Gabbard et al., 1987). Further work has noted adequate interrater reliability in an inpatient setting in the Netherlands (Dingemans, Winter, Bleeker, & Rathod, 1983). Gottlieb, R.E. Gur, and R.C. Gur (1988) examined the interrater reliability of the BPRS in a sample of patients with dementia of the Alzheimer type. Interrater reliability was found to be quite high even though this population represents a diagnostic group that varies greatly from the types of patients who typically are assessed with the BPRS. One recently published work (McAdams, Harris, Bailey, Fell, & Jeste, 1996) noted adequate BPRS reliability in the evaluation of geriatric outpatient schizophrenics. Other work (Swanson, Turetsky, Bilker, R.C. Gur, & R.E. Gur, 1997) examined sex differences in the ratings of experienced BPRS raters. The results suggested that experienced female raters may rate patients as having a greater level of psychopathology on the BPRS. No rater-sex by patient-sex interactions were noted.
The results suggest that such sex differences should be evaluated in research settings that make use of the scale; however, consistent rater differences do not necessarily imply reduced reliability. It is important that the same raters follow patients where repeated evaluations are required. It is also important to stress that interrater reliabilities typically are derived from a single joint interview, and reliability measures decrease if raters interview patients and complete ratings based on independent rating sessions (Flemenbaum & Zimmermann, 1973). Moreover, long-term test-retest reliability typically is not of interest with the BPRS, because one principal goal of the scale is to measure change resulting from acute treatment. Reliability always will set the upper limit of validity; therefore, techniques to maximize reliability are essential within settings using the BPRS for clinical or research purposes. First, adequate training and practice with the scale are essential. A weekly BPRS rating calibration session may be performed in research settings such as an inpatient clinical research center. A volunteer patient from the treatment program is interviewed by a clinician in front of a group of other clinicians, and independent ratings are completed after the interview. One procedure that can be performed is to have the two most experienced BPRS raters arrive at a consensus rating based either on a mean of their two ratings or on a negotiated consensus following a discussion of their independent ratings. Measures of agreement (e.g., Mahalanobis distance) then can be calculated between other raters and the consensus rating. Raters can be entered into the pool of calibrated raters after they meet a consistent standard of agreement that can be designated (e.g., within a certain Mahalanobis distance for a fixed number of consecutive ratings).
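The calibration procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure taken from the cited literature: the function names, the distance threshold, and the number of consecutive ratings are hypothetical, and the item covariance matrix is assumed to have been estimated beforehand from a reference set of earlier ratings. The consensus is formed here as the item-by-item mean of the two most experienced raters, which is one of the two options mentioned above.

```python
import math

def consensus_rating(rating_a, rating_b):
    """Item-by-item mean of the two most experienced raters' scores."""
    return [(a + b) / 2.0 for a, b in zip(rating_a, rating_b)]

def solve_linear(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    # Work on an augmented copy so the caller's data is left untouched.
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def mahalanobis_distance(rating, consensus, cov):
    """sqrt((r - c)^T cov^{-1} (r - c)) for item-score vectors r and c.

    cov is an assumed, pre-estimated covariance matrix of the item scores.
    """
    diff = [r - c for r, c in zip(rating, consensus)]
    weighted = solve_linear(cov, diff)  # cov^{-1} * diff without forming the inverse
    quad = sum(d * w for d, w in zip(diff, weighted))
    return math.sqrt(max(quad, 0.0))  # guard against tiny negative rounding error

def is_calibrated(distances, threshold, n_consecutive):
    """True if the most recent n_consecutive distances are all within threshold.

    Both threshold and n_consecutive are site-specific choices (hypothetical here).
    """
    recent = distances[-n_consecutive:]
    return len(recent) == n_consecutive and all(d <= threshold for d in recent)
```

With an identity covariance matrix the distance reduces to the ordinary Euclidean distance between the two rating profiles; a covariance matrix estimated from real ratings would weight disagreements on highly variable items less heavily.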
Ongoing measures of agreement can be calculated for all raters in such settings, and raters who drift out of reliable agreement can be removed temporarily from the rating pool until they once again demonstrate an adequate standard of reliability. This use of the BPRS as a training and research tool can be integrated easily into weekly

clinical case conferences that perform diagnostic interviews in a broad range of clinical settings. Recent work (Andersen et al., 1993) examined the rating skills acquisition rate of a pool of BPRS raters. The results suggested that extensive (30 sessions) rating practice was required before asymptotic levels of reliability were obtained. Interestingly, the results suggested that raters can become reliable more rapidly in the assessment of positive symptoms than in the assessment of negative symptoms. Another technique for maximizing reliability is the use of multiple raters observing a joint interview and independently completing ratings that are subsequently averaged. This technique was recommended in the original work of Overall and Gorham (1962), but it is not always used for practical reasons, such as lack of available rater time. Averaging the ratings of multiple raters generally increases reliability. As noted by Kraemer (1991), increasing reliability also increases the statistical power of experiments designed to detect treatment effects (e.g., medication or psychotherapy outcome studies). This ability to increase statistical power by using multiple raters rarely is considered in multicenter clinical drug studies, which typically require only one rater during patient evaluations. However, when the practical and ethical issues of such studies are considered (e.g., risks of unknown side effects), it is somewhat surprising that the standard of practice is not to employ multiple raters. With the increased power provided by multiple raters, evaluations of medication efficacy could employ fewer patients. In detecting efficacy with fewer subjects, risks can be minimized in initial drug screening.

Anchoring the BPRS

An additional strategy for increasing the reliability of the BPRS has been the development of versions of the BPRS with behavioral anchors. These modified versions of the BPRS provide behavioral descriptions for anchors that define the range of severity for each BPRS item.
These anchored scales vary somewhat in the specificity of item descriptors. Gabbard et al. (1987) published a version of the BPRS with suggested descriptions for the severity of very mild, moderate, and severe pathology on each of the BPRS items. Limited reliability data from a small sample of ratings (videotaped interviews) suggested that teams of raters employing the anchored version of the scale scored the tapes with a higher level of interrater reliability (Gabbard et al., 1987). Tarell and Schultz (1988) provided an anchored version of the BPRS, as well as specific interview questions for use by nursing personnel. Bech, Larsen, and Andersen (1988) provided a modified and anchored version of the BPRS, which they claimed to be tailored to the assessment of schizophrenic patients. However, the modifications of Bech et al. are rather extensive and include changing the scale from a 7-point to a 5-point scale and eliminating several items. An anchored version of the BPRS with extensive detail for item descriptors has been offered by Woerner, Mannuzza, and Kane (1988). Table 26.1 offers examples from two items of this scale in order to illustrate a version of the BPRS with anchors. The PANSS (Kay et al., 1987) contains an anchored version of the BPRS within a spectrum of additional items thought to measure positive and negative symptoms. This scale has the advantage of offering extended symptom coverage for applications in which schizophrenic patients are specifically selected for evaluation. Some work (Bell, Milstein, Beam-Goulet, Lysaker, & Cicchetti, 1992) has suggested that the items added

to the PANSS have a relatively low correlation with the original BPRS items, suggesting that they measure dimensions distinct from the original BPRS.

TABLE 26.1
Examples of Anchored BPRS Items from the Brief Psychiatric Rating Scale-Anchored (BPRS-A)

Item 10. Hostility. Animosity, contempt, belligerence, disdain for other people outside the interview situation. Rate solely on the basis of the verbal report of feelings and actions of the patient toward others in the past week. Do not infer hostility from neurotic defenses, anxiety, or somatic complaints.
1 = Not reported.
2 = Very Mild: occasionally feels somewhat angry.
3 = Mild: often feels somewhat angry, or occasionally feels moderately angry.
4 = Moderate: occasionally feels very angry, or often feels moderately angry.
5 = Moderately Severe: often feels very angry.
6 = Severe: has acted on his anger by becoming verbally or physically abusive on one or two occasions.
7 = Very Severe: has acted on his anger on several occasions.
9 = Cannot be assessed adequately because of severe formal thought disorder, uncooperativeness, or marked evasiveness/guardedness; or Not assessed.

Item 11. Suspiciousness. Belief (delusional or otherwise) that others have now, or have had in the past, malicious or discriminatory intent toward the patient. On the basis of verbal report, rate only those suspicions which are currently held, whether they concern past or present circumstances. Rate on the basis of reported (i.e., subjective) information pertaining to the past week.
1 = Not reported.
2 = Very Mild: rare instances of distrustfulness which may or may not be warranted by the situation.
3 = Mild: occasional instances of suspiciousness that are definitely not warranted by the situation.
4 = Moderate: more frequent suspiciousness, or transient ideas of reference.
5 = Moderately Severe: pervasive suspiciousness, frequent ideas of reference, or an encapsulated delusion.
6 = Severe: definite delusion(s) of reference or persecution that is (are) not wholly pervasive (e.g., an encapsulated delusion).
7 = Very Severe: as above, but more widespread, frequent, or intense.
9 = Cannot be assessed adequately because of severe formal thought disorder, uncooperativeness, or marked evasiveness/guardedness; or Not assessed.

Note: From Woerner, Mannuzza, and Kane (1988).

An additional recent scale that largely incorporates the BPRS is the Psychiatric Symptom Assessment Scale (PSAS; Bigelow & Berthot, 1989). This 22-item scale adds several items (e.g., motor hyperactivity) not contained within the BPRS. The scale contains suggested anchor points for each item. Initial factor analytic data have been provided for a sample that was too small (n = 60) for adequate interpretation of a 22-item instrument. However, these factor analytic data suggest that the PSAS factors differ somewhat from those typically associated with the BPRS. In sum, this instrument represents an expanded and somewhat altered BPRS and is too early in development to demonstrate clear-cut psychometric advantages over the BPRS. The proliferation of anchored versions of the BPRS raises the question of whether these scales offer advantages over the standard BPRS. Unfortunately, given the relative paucity of psychometric data offered for most of these anchored scales, one is also left with the question of whether these scales actually represent the BPRS, or are in fact new and different scales. Rhoades and Overall (1988) detailed the potential problems in adding anchor points to the BPRS. The anchored scales have generally not been subjected to repeated factor analytic studies with large sample sizes to assess whether

they yield factors comparable to the standard BPRS. There also is a possibility that anchored scales could be less sensitive to drug effects than the original BPRS by virtue of exclusionary definitions. Reliability always sets the upper limit to validity. However, it rarely is considered that procedures to increase reliability may actually reduce the validity and sensitivity of a scale. Strict behavioral anchors may reduce the ability to score subtle but observable changes in some items (e.g., suspiciousness). For example, during medication treatment, a clinician's overall impression of a patient's suspiciousness may be that it has improved greatly. Yet, if the patient continues to show a definite delusion (potentially fixed and longstanding), then the behavioral anchors may continue to suggest that the patient should be rated as severe in terms of suspiciousness. In sum, Rhoades and Overall (1988) were perhaps correct in urging caution in the use of anchored scales. More than 1,000 published studies demonstrate that the original BPRS can be used as a sensitive measure of treatment effects. Comparable published evidence for the reliability and validity of the altered scales is at present lacking. It would be useful to perform further work to determine whether anchored scales can replicate the factor structure and treatment sensitivity of the standard BPRS.

BPRS Validity

The major validation of the BPRS takes the form of discriminant validity obtained in a large number of controlled studies of medication response (Rhoades & Overall, 1988). A comprehensive accounting of the published studies is far beyond the scope of the present work because their number likely runs into the thousands. The scale and its factor subscales have been shown to be sensitive to treatment effects of antipsychotic medications in samples of schizophrenic patients (e.g., den Boer et al., 1990; Nair et al., 1986; Otani, Kondo, Kaneko, Ishida, & Fukishima, 1994).
The BPRS was used to demonstrate the superior efficacy of clozapine, an atypical antipsychotic medication that may be useful in the treatment of schizophrenic patients who are refractory to conventional medications (Kane, Honigfeld, Singer, & Meltzer, 1988). Recent work (Woerner, Alvir, Kane, Saltz, & Lieberman, 1995) has demonstrated the sensitivity of the BPRS in measuring clinical response to antipsychotic medications in elderly patient samples. The BPRS also has shown discriminant validity in the study of somewhat more novel treatment issues in schizophrenia. Several studies (De Freitas & Schwartz, 1979; Lucas et al., 1990) noted the detrimental effects (e.g., increased psychosis) of caffeine in schizophrenic patients. Antipsychotic medications are often of limited effectiveness in reducing a broad range of both psychotic and other (e.g., anxiety, blunted affect, depression) symptoms in schizophrenia. Accordingly, a growing number of augmentation studies (Kellner, Wilson, Muldawer, & Pathak, 1975; Marshall et al., 1989; Small, Kellams, Milstein, & Moore, 1975; Wolkowitz et al., 1988) have been performed and have used the BPRS to document the changes brought about when additional pharmacological treatments (e.g., antianxiety agents) are added to antipsychotics. Tandon, Mann, Eisner, and Coppard (1990) used the BPRS to measure the effects of anticholinergic medications on the clinical symptoms of schizophrenic patients. The BPRS has been used in studies examining the clinical efficacy of medications in the treatment of depression (Feighner, Merideth, & Claghorn, 1984; Hollister, Overall, Pokorny, & Shelton, 1971). However, it should be noted that other rating scales such as the Hamilton Rating Scale for Depression (HAM-D; Hamilton, 1960) and the Beck

Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961) frequently are included in such studies, with the BPRS included to provide a broader picture of associated psychopathology. Moreover, some data (Raskin & Crook, 1976) suggest that the BPRS may be limited in its sensitivity in documenting antidepressant drug effects. Further work supporting the validity of the BPRS has sought to determine the relationship of the scale to both other clinician-based rating scales and commonly employed self-report instruments. Many of these studies have examined relations to other instruments within samples of schizophrenic patients. Depressive Symptoms. Several works have examined correlations between the HAM-D (Hamilton, 1960) and the BPRS in schizophrenia. The evaluation of such relations is important because schizophrenic patients often manifest depressive symptoms (Becker, 1988) and the efficient documentation of the overall severity of these symptoms is important in both research and clinical practice. The HAM-D represents a somewhat lengthy scale (24 items in one commonly administered version) that requires additional questioning, item completion, and data recording. Craig, Richardson, Pass, and Bregman (1985) examined BPRS/HAM-D relations in 32 medicated inpatients with a diagnosis of schizophrenia. A combination of BPRS items reflecting depressive symptoms was found to be highly correlated (r = .79) with the HAM-D total score. Newcomer, Faustman, Yeh, and Csernansky (1990) evaluated this same relationship in a larger sample (n = 69) of inpatient schizophrenics who were free of treatment medications at the time of assessment. Once again, a robust (Spearman r = .80) relation was obtained between a BPRS cluster of depression related items and the total HAM-D score. Such a correlation is essentially at the level of interrater reliability for the instruments. These data suggest adequacy of the BPRS to yield a global depression score in schizophrenic patients. 
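The remark that a correlation of about .80 is "at the level of interrater reliability" can be made concrete with the classical correction for attenuation, under which the largest correlation observable between two measures is the square root of the product of their reliabilities. The sketch below is illustrative only: the function names are hypothetical, and the reliability values of .85 used in the usage lines are assumptions chosen for the example (the chapter reports roughly .85 for the BPRS total score, not for the depression cluster or the HAM-D).

```python
import math

def max_observable_correlation(rel_x, rel_y):
    """Ceiling on the observed correlation given the two reliabilities."""
    return math.sqrt(rel_x * rel_y)

def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / max_observable_correlation(rel_x, rel_y)

# Usage with ASSUMED reliabilities of .85 for both measures.
ceiling = max_observable_correlation(0.85, 0.85)
corrected = disattenuated_correlation(0.80, 0.85, 0.85)
```

Under these assumed values the observed correlation sits just below its ceiling, which is the sense in which the two measures leave little reliable variance unshared.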
More recently, it has been noted that the HAM-D contains a depressive symptom factor that correlates well with the BPRS depression factor (Goldman, Tandon, Liberzon, & Greden, 1992). In addition, the HAM-D was found to contain clusters of items that correlated with negative symptom measures from both the BPRS and the Scale for the Assessment of Negative Symptoms. Negative Symptoms. The assessment of negative symptoms in schizophrenia has been a subject of growing interest in recent years. Negative symptoms often are thought of as an "absence" of behaviors typically found in nonpsychiatric populations (e.g., social interaction, affective modulation; Kulhara & Chadda, 1987). The concept of primary (i.e., core disorder-related symptoms) and secondary (e.g., symptoms that may be induced by the presence of significant psychotic symptoms or medication side effects) negative symptoms has developed in recent years. Carpenter (1991) further proposed the notion of deficit symptoms to encompass enduring traits (e.g., poor vocational adjustment, social isolation) that may be related to an amotivational syndrome. This growing interest in the assessment of negative symptoms is important for several reasons. Antipsychotic medications used in the treatment of schizophrenia are often effective in the reduction of core positive symptoms, but they may be less effective, though not wholly ineffective, in the treatment of negative symptoms (Breier et al., 1987). Measurement considerations may well be of importance in this issue because it is likely more difficult to measure the absence of a subtle symptom (e.g., range of emotional expression) normally present than the presence of a symptom (e.g., delusions/unusual thought content) never present in unaffected individuals.

Some work (Guelfi, Faustman, & Csernansky, 1989) has found that negative symptoms are independent of core positive symptoms (e.g., hallucinations or unusual thought content) in unmedicated schizophrenic patients. The possibility exists that somewhat different biological mechanisms underlie negative symptoms (Csernansky et al., 1990; Newcomer, Faustman, Whiteford, Moses, & Csernansky, 1991). In addition, the specific treatment of negative symptoms in schizophrenia has become a high priority target for medication design and development. A broad range of rating scales has been offered for the evaluation of negative symptoms. One widely recognized instrument in current use is the Scale for the Assessment of Negative Symptoms (SANS; Andreasen & Olsen, 1982). This rating scale provides a broad evaluation in areas such as affective modulation and social interaction. Other rating scales tapping negative symptoms include a measure of emotional blunting developed by Abrams and Taylor (1978). The Positive and Negative Syndrome Scale (Kay et al., 1987) has also been proposed as a more comprehensive negative and positive symptom measure. Factor analytic studies have shown that the BPRS contains a distinct cluster of items measuring withdrawal/retardation. This combination of BPRS items has often been used to reflect a global negative symptom measure. Several studies have examined the degree to which such BPRS items correlate with the SANS, a scale specifically developed to measure negative symptoms. An initial work (Thiemann, Csernansky, & Berger, 1987) found that the BPRS withdrawal/retardation factor (Blunted Affect, Emotional Withdrawal, Psychomotor Retardation) correlated with the SANS total score at the level of interrater reliability. In other words, in this sample of schizophrenic patients, the scales were found to be redundant for yielding an overall measure of negative symptoms.
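As a minimal sketch of how such a comparison might be set up, the code below forms a withdrawal/retardation score from the three items named above and computes a Pearson correlation that could then be applied to paired BPRS factor scores and SANS totals. Several details are assumptions rather than anything specified in the studies cited: the score is taken as a simple sum of the three item ratings, ratings are assumed to use the 1 to 7 severity range, and the function names and dictionary layout are hypothetical.

```python
import math

# Items of the BPRS withdrawal/retardation factor as named in the text.
WITHDRAWAL_RETARDATION_ITEMS = (
    "Blunted Affect",
    "Emotional Withdrawal",
    "Psychomotor Retardation",
)

def factor_score(ratings, items=WITHDRAWAL_RETARDATION_ITEMS):
    """Sum the ratings for the given items (summing is an assumption).

    ratings maps item name to a severity rating on the 1-7 range.
    """
    total = 0
    for item in items:
        value = ratings[item]
        if not 1 <= value <= 7:
            raise ValueError(f"{item}: rating {value} is outside the 1-7 range")
        total += value
    return total

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In use, one factor score per patient would be paired with that patient's SANS total and the two lists passed to pearson_r; a code of 9 ("cannot be assessed") from an anchored version would need to be handled as missing before scoring.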
Czobor, Bitter, and Volavka (1991) generally replicated this finding, noting an extremely high correlation between the SANS composite score and the withdrawal/retardation factor of the BPRS. Individual SANS items and subscales appeared to contain information somewhat distinct from the BPRS measures, but such a finding would be expected due to any restriction in the number of items selected. Further work (R.E. Gur et al., 1991) has noted a strong correlation between overall ratings with the SANS and the withdrawal/retardation factor of the BPRS. More recent work (Nicholson, Chapman, & Neufeld, 1995) examined the relation between the SANS and several BPRS combinations of items reflecting negative symptoms. In a large sample (N = 100), the BPRS negative symptom measure had a composite correlation with the SANS of approximately .85. This is very close to the limit of reported reliability, suggesting that the two scales measure essentially an identical construct. The issue of the measurement of negative symptoms is clearly complicated. There has been a recent general notion that the use of three simple items from the BPRS "could not possibly measure such a complicated construct as negative symptoms." Yet, as noted earlier, repeated studies have suggested that the addition of lengthy rating scales such as the SANS offers limited, if any, additional information not found in the BPRS withdrawal/retardation factor. Use of other new rating scales such as the PANSS supplements the BPRS with newer items thought to reflect negative symptoms. Though this may be of value, clinicians should still be required to demonstrate that a longer and more complicated measure adds important new information not found in the original rating scale. In sum, if clinicians desire a global measure of negative symptoms in schizophrenia, the BPRS provides a measure that correlates with commonly used summary measures from the SANS, an instrument whose specific intent is to measure negative symptoms.

Moreover, as extensively detailed by Thiemann et al. (1987), the BPRS provides other advantages that should be taken into consideration in scale selection. Unlike the SANS, the BPRS provides assessment of a broader spectrum of symptoms (e.g., positive symptoms such as hallucinations, and other symptoms such as depression). Thiemann et al. (1987) noted that the combined use of the SANS and BPRS results in a greater cost (i.e., scale completion time, data storage, interview time) for both patients and clinicians. In addition, the use of multiple intercorrelated rating scales can complicate data analyses by increasing the probability of both Type I and Type II errors (Thiemann et al., 1987). New scales for the assessment of schizophrenia (e.g., PANSS) add symptom constructs thought to reflect negative symptoms. Further work would be of interest to determine whether these additional items add further sensitivity and validity to the measurement of negative symptoms in schizophrenia. Thought Disorder and the BPRS. As a global assessment scale measuring a broad range of symptoms, the BPRS provides no qualitative information on the exact nature of thought process disturbances in disorders such as schizophrenia. Clinical and research scales have been developed to provide a detailed measure of the characteristics of thought disorder often found in psychotic patients. The Scale for the Assessment of Thought, Language, and Communication (TLC; Andreasen, 1979) represents such an attempt to provide a detailed accounting of the structure of speech. Simpson and Davis (1985) examined the relation between measures derived from ratings that jointly used the BPRS and the TLC. Ratings were conducted in a mixed psychiatric population, and the two scales were jointly entered into a factor analysis. Two separate factors for thought disorder were derived. The first represented disordered thought structure and contained numerous TLC items and the BPRS Conceptual Disorganization item. 
A second factor was described as disordered thought content (e.g., BPRS items of Hallucinatory Behavior and Unusual Thought Content). The results were interpreted as suggesting that the addition of the TLC to the BPRS provides a more complete assessment of thought structure without altering the commonly obtained BPRS factor structure (Simpson & Davis, 1985). Thus, in specific applications where a detailed analysis of thought structure is desired, the addition of scales such as the TLC may be warranted. Correlates with Self-Report Measures. Self-report measures are often the major source of patient information in clinical settings and are a common component of research studies. Self-report measures are inexpensive to obtain and capable of measuring constructs that may not be readily obtainable with clinician-based scales. A limited number of studies have examined the relation between self-report measures and clinician-based ratings derived from the BPRS. Bitter, Jaeger, Agdeppa, and Volavka (1989) speculated that subjective complaints in schizophrenia can be measured with a scale labeled the Subjective Deficit Syndrome Scale (SDSS). Correlations were noted between this self-report measure and a range of BPRS symptoms. Correlates between a self-report measure of object relations (Bell Object Relations Inventory) and the BPRS have been sought in a mixed diagnostic sample (Bell, Billington, & Becker, 1986). Although overall BPRS scores were not related significantly to the items from the Bell scale, several of the BPRS items (e.g., Depressed Mood) correlated with self-report measures thought to reflect alienation, insecure attachment, egocentricity, and social incompetence (Bell et al., 1986). A number of studies have examined correlations between the Minnesota Multiphasic Personality Inventory (MMPI) and the BPRS. Tuthill, Overall, and Hollister (1967) noted numerous relationships between MMPI and BPRS items in a sample of patients

deemed to be candidates for antidepressant medications. Several other studies (Boerger, Graham, & Lilly, 1974; Lewandowski & Graham, 1972) noted BPRS and MMPI relationships in large samples of patients with mixed diagnostic features. Ward and Dillon (1990) focused on MMPI Scale 5 (Masculinity/Femininity, MF) relationships with the BPRS in a mixed-gender outpatient psychiatric sample. Using MF raw scores for analysis, they found that MF scores correlated significantly with ratings of depressed mood, guilt, and anxiety (Ward & Dillon, 1990). Significant MMPI-BPRS correlations also were found for the BPRS items of Somatic Concern, Anxiety, Depressed Mood, and Hostility; and the MMPI scales Hypochondriasis (Scale 1), Psychasthenia (Scale 7), Depression (Scale 2), and Hypomania (Scale 9), respectively (Ward & Dillon, 1990). Faustman, Moses, Csernansky, and White (1989) specifically examined replicable MMPI/BPRS correlations in a group of research-diagnosed inpatients with schizophrenia. Replicable relationships were found for BPRS measures of Depressed Mood (MMPI Scale 2, Depression), Hallucinatory Behavior (MMPI Scale F and the Wiggins Psychoticism content scale), Hostility (MMPI Scale 4, Psychopathic Deviate), and Tension (Wiggins Psychoticism content scale). Symptomatology in Schizophrenic Samples. There has been recent interest regarding the specific factor structure of symptoms in samples that are restricted to schizophrenic patients. Some work (e.g., Bell, Lysaker, Beam-Goulet, Milstein, & Lindenmayer, 1994) has used the PANSS to suggest the presence of a five-factor model in schizophrenia. This and other work (Lindenmayer, Grochowski, & Hyman, 1995) employing the PANSS has noted a cognitive dimension that strongly includes measures of disorganization. This disorganization/cognitive factor is joined by PANSS factors thought to reflect positive symptoms, negative symptoms, excitement, and depression (Lindenmayer et al., 1995).
Other work (Andreasen, Arndt, Alliger, Miller, & Flaum, 1995) using the Scale for the Assessment of Positive Symptoms and the Scale for the Assessment of Negative Symptoms has suggested that psychotic symptoms in schizophrenia can be divided into psychotic and disorganized dimensions. Harvey et al. (1996) questioned the use of the BPRS total score to reflect overall pathology in schizophrenia. They noted that the BPRS total score may add primarily error variance to research because it combines items that contribute to a good-fitting factor structure of schizophrenia with items that do not. This work also noted that the factor structure of the BPRS remained the same when patients were assessed in both an unmedicated and a medicated state. Harvey et al. further employed a confirmatory factor analysis to suggest that a five-factor solution (inclusive of Activation) proposed by Overall and Klett (1972) did not fit the data well. An exploratory factor analysis, however, produced a better fit to their data, with Conceptual Disorganization loading with other aspects of disorganization (e.g., Disorientation) and separate from Hallucinatory Behavior and Unusual Thought Content. Harvey et al. noted that one limitation of their work is a relatively small sample (N = 135) when compared to other factor analytic work with the BPRS. More recent work (Goldman et al., 1997), based on an expansion of other work (Goldman et al., 1991), has applied confirmatory factor analysis in larger schizophrenic samples (N > 400, with a cross-validation sample of 520 unmedicated patients). These analyses provided support for a four-factor model (positive symptoms, negative symptoms, mood, and agitation/activation) similar to that found in prior work (e.g., Overall et al., 1967). Other recent work (Czobor & Volavka, 1996) suggested adequate reproducibility of the commonly employed BPRS factor structure at pre-placebo

evaluation and after haloperidol treatment. The results suggested a change in the BPRS factor structure during the placebo treatment. In sum, further work in exclusively schizophrenic samples may be needed to clarify the best fitting factor structure for the BPRS in both the medicated and unmedicated condition. The presence of a separate factor representing thought/cognitive disturbance remains an interesting question for further investigation.

Interpretative Strategy

The BPRS yields scores representing several different levels of abstraction from global to specific for clinical and research interpretation. The total score is used as a global severity index, but this measure yields little qualitative symptom information and an exclusive reliance on this measure could be misleading. An elevated BPRS total score could relate to severe depression/anxiety in one patient while in another patient it may be due to a complex of psychotic symptoms such as hallucinations and unusual thought content. The total score thus has been used primarily to measure change in whatever symptoms a patient manifests at baseline in controlled clinical trials, and that baseline pattern is largely determined by diagnosis and other study inclusion criteria. Treatment studies using the BPRS often specify particular symptoms as inclusion criteria. For example, an antipsychotic medication trial may define inclusion criteria in terms of a combination of baseline (pretreatment) symptoms that include a minimum BPRS total score, as well as minimum levels of severity for target symptoms of treatment (e.g., hallucinatory behavior, unusual thought content, conceptual disorganization). Outcome of treatment may be analyzed in several ways. Patients may serve as their own controls, whereby changes from pretreatment baseline are compared with posttreatment values. Between-groups comparisons are also possible in studies employing several treatments and/or treatment levels.
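The following Python sketch illustrates the two points made above: that the same total score can reflect very different symptom pictures, and that trial inclusion can combine a minimum total score with minimum severities on target symptoms. It is only an illustration. The function names, the cutoff values, and the two patient profiles are hypothetical, the ratings are assumed to use 0 to 6 item scoring, and only a subset of the 18 items is listed for brevity.

```python
def total_score(ratings):
    """Global severity index: the unweighted sum of all item ratings."""
    return sum(ratings.values())


def meets_inclusion(ratings, min_total, target_minimums):
    """True when the total score and every target symptom reach their floors."""
    if total_score(ratings) < min_total:
        return False
    return all(ratings.get(item, 0) >= floor
               for item, floor in target_minimums.items())


# Hypothetical cutoffs for an antipsychotic trial (illustrative values only).
MIN_TOTAL = 20
TARGETS = {"Hallucinatory Behavior": 3, "Unusual Thought Content": 3}

# Two hypothetical baseline profiles that share the same total score.
psychotic = {"Hallucinatory Behavior": 5, "Unusual Thought Content": 5,
             "Conceptual Disorganization": 4, "Suspiciousness": 4,
             "Anxiety": 1, "Depressed Mood": 1}
depressed = {"Hallucinatory Behavior": 0, "Unusual Thought Content": 1,
             "Conceptual Disorganization": 0, "Suspiciousness": 1,
             "Anxiety": 6, "Depressed Mood": 6, "Guilt Feelings": 6}

for label, profile in (("psychotic", psychotic), ("depressed", depressed)):
    print(label, total_score(profile),
          meets_inclusion(profile, MIN_TOTAL, TARGETS))
```

Both profiles sum to 20, yet only the first satisfies the target-symptom floors, which is why a total score alone is a weak basis for selecting patients or interpreting change.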
The BPRS has been subject to extensive factor analytic studies. Overall, Hollister, and Pichot (1967) reported multiple factor analyses on separate samples that total into the thousands of patients. Table 26.2 displays data representative of the factor analytic work of Overall et al. (1967). The results of this and other studies (Coyne & Spohn, 1989; Dingemans et al., 1983; see Hedlund & Vieweg, 1980, for a review) suggest several general factors that emerge when the scale is used in general psychiatric samples. Clinical change is often measured in terms of scores on both the BPRS total score and on each of the major BPRS factors. Overall and Klett (1972) proposed the scoring of four general BPRS factors. The items most consistently falling on these relatively independent factors are presented in Table 26.3. Typically, scores on these four factors are formed by calculating an unweighted sum of the item scores on each factor. Naturally, other combinations of items may be formed as required in different applications of the BPRS. Also, as noted by Hedlund and Vieweg (1980), some additional items tend to fall onto the primary clusters already noted. A separate activation factor (e.g., Excitement, Tension) has been suggested in some work (Hedlund & Vieweg, 1980). Somatic Concern may cluster with the Anxious-Depression factor, and recent work (Newcomer et al., 1990) has shown that this four-item Anxious/Depression combination correlates strongly with the total score from the HAM-D. The Mannerisms/Posturing item at times may be combined into the Withdrawal/Retardation factor (Guelfi et al., 1989; Overall et al., 1967). Karson and Bigelow (1986) offered a formula for combining BPRS items to measure paranoia in schizophrenia. In addition to the fact that some items may gravitate to different factors depending on the composition

of the patient sample, Overall (1983) emphasized the advantages of having a balanced number of stable items scored for each factor. This is a primary reason for utilizing only 12 of the 18 BPRS items to define the balanced set of four factors. Additional literature on the factor structure of the BPRS in geriatric/geropsychiatric samples has been provided (Beller & Overall, 1984; Overall & Rhoades, 1988). This work suggested that the BPRS may have a somewhat different factor structure in a geropsychiatric population. Factor subtypes in this population were labeled Agitated Dementia, Retarded Dementia, Anxious Depression, Withdrawal Depression, and Paranoid Psychosis (Beller & Overall, 1984).

TABLE 26.2
Normalized Varimax-Rotated Factors from VA Drug Screening Data (N = 725)

Item                           I     II    III   IV
Somatic concern               .09   .10   .04   .69
Anxiety                       .01   .14   .18   .72
Emotional withdrawal          .05   .88   .01   .03
Conceptual disorganization    .45   .41   .06   .19
Guilt feelings                .08   .10   .05   .44
Tension                       .12   .04   .18   .38
Mannerisms-posturing          .27   .59   .15   .04
Grandiosity                   .39   .06   .18   .25
Depressive mood               .29   .02   .11   .78
Hostility                     .01   .04   .87   .08
Suspiciousness                .40   .02   .76   .10
Hallucinatory behavior        .82   .11   .04   .17
Motor retardation             .15   .46   .21   .34
Uncooperativeness             .00   .49   .40   .04
Unusual thought content       .84   .06   .18   .01
Blunted affect                .02   .76   .17   .15

Note. Reprinted with permission from Overall et al., 1967. Archives of General Psychiatry, 16, 146-151. Copyright © 1967, American Medical Association.

TABLE 26.3
Suggested Factor Dimensions of the BPRS

Thinking Disturbance              Hostile-Suspiciousness
  Conceptual Disorganization        Hostility
  Hallucinatory Behavior            Suspiciousness
  Unusual Thought Content           Uncooperativeness

Withdrawal-Retardation            Anxious Depression
  Emotional Withdrawal              Anxiety
  Motor Retardation                 Guilt Feelings
  Blunted Affect                    Depressed Mood

Note. Reproduced with permission from J.E. Overall and C.J. Klett, Applied Multivariate Analysis, McGraw-Hill, 1972, p. 12.
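As a rough illustration of the scoring described for the four balanced factors, the Python sketch below forms each factor score as an unweighted sum of the three items listed for it in Table 26.3. The item-to-factor map follows that table. The function name and the example ratings are hypothetical, and the ratings are assumed to use 0 to 6 item scoring.

```python
# Item-to-factor map taken from Table 26.3 (three items per factor).
FACTORS = {
    "Thinking Disturbance": ("Conceptual Disorganization",
                             "Hallucinatory Behavior",
                             "Unusual Thought Content"),
    "Hostile-Suspiciousness": ("Hostility", "Suspiciousness",
                               "Uncooperativeness"),
    "Withdrawal-Retardation": ("Emotional Withdrawal", "Motor Retardation",
                               "Blunted Affect"),
    "Anxious Depression": ("Anxiety", "Guilt Feelings", "Depressed Mood"),
}


def factor_scores(ratings):
    """Return the unweighted sum of item ratings for each factor."""
    return {name: sum(ratings[item] for item in items)
            for name, items in FACTORS.items()}


# Hypothetical ratings for one patient on the 12 factor-defining items.
example = {
    "Conceptual Disorganization": 4, "Hallucinatory Behavior": 5,
    "Unusual Thought Content": 5,
    "Hostility": 2, "Suspiciousness": 3, "Uncooperativeness": 1,
    "Emotional Withdrawal": 2, "Motor Retardation": 1, "Blunted Affect": 2,
    "Anxiety": 3, "Guilt Feelings": 0, "Depressed Mood": 2,
}

for name, score in factor_scores(example).items():
    print(f"{name}: {score}")
```

Because every factor is built from the same number of items, the four sums share a common range and can be compared directly, which is the practical benefit of the balanced item set.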
Cross-cultural work has demonstrated the general replicability of the factor structure of the BPRS. For example, one recent work (Mass, Burmeister, & Krausz, 1997) replicated the general four-factor (Thought Disturbance, Hostility/Suspiciousness, Anxiety/Depression, Anergia) structure in a sample of psychiatric patients in Germany. The factor structure of the BPRS in Alzheimer's disease has recently been investigated (Ownby, Koss, Smyth, & Whitehouse, 1994). The results were generally consistent with

prior work in psychiatric patients, though the presence of a unique factor including dimensions of Tension and Uncooperativeness was thought to reflect unique symptomatology in this illness.

The Brief Psychiatric Rating Scale for Children

Children tend to present with unique manifestations of psychopathology. A distinct and separate instrument, the Brief Psychiatric Rating Scale for Children (BPRS-C), has been developed for specific use with children (Overall & Pfefferbaum, 1982). The BPRS-C contains 21 items, some of which are found in the standard adult BPRS (e.g., Depressed Mood, Blunted Affect) and others that are unique to the BPRS-C (e.g., Stereotypy, Feelings of Inferiority). Similar to the BPRS used in adults, the symptoms and behavior constructs of the BPRS-C are rated on 7-point scales ranging from "not present" to "extremely severe." The items contribute to seven composite factor scores with three items entering into each score (Overall & Pfefferbaum, 1982). In the case of the BPRS-C, scale development emphasized a balanced factor structure with three items specifically chosen to represent each of seven primary factors. The names given these factor scores are as follows: Behavior Problems, Depression, Thinking Disturbance, Psychomotor Excitation, Withdrawal Retardation, Anxiety, and Organicity. The BPRS-C has not received the extensive psychometric validation of the adult BPRS, and Overall and Pfefferbaum (1982) noted that the scale needs further use to document its reliability and validity more fully. Overall and Pfefferbaum (1984) subsequently published a detailed factor analysis of the BPRS-C in which the seven postulated factors were clearly confirmed. Gale, Pfefferbaum-Levine, Suhr, and Overall (1986) examined the reliability of both individual rating items and the factor scores.
They concluded that the composite factor scores were adequately reliable, whereas reliability of scores on the individual items was marginal in some cases. Mullins, Pfefferbaum, Schultz, and Overall (1986) reported use of the BPRS-C to quantify information from medical records in standard analyzable form. Stavrakaki, Vargo, Boodoosingh, and Roberts (1987) integrated the BPRS-C into a study of the relation between anxiety and depression in children. Other work (Casat, Pleasants, Schroeder, & Parler, 1989) has included the BPRS-C in pediatric psychopharmacology protocols.

Use of the BPRS for Treatment Planning

The BPRS has typically been employed as an outcome assessment instrument rather than a tool in clinical treatment planning. However, the extensive psychometric and outcome literature on the scale provides valuable information supporting the use of the scale in treatment planning. The BPRS can be integrated into routine clinical practice in both inpatient and outpatient settings relatively easily. The scale is useful across a broad range of diagnoses and has a clinically meaningful factor structure in diagnostically mixed samples. It is relatively simple to score and can be administered on a repeated basis. Moreover, because the scale is not linked strongly to a theoretical treatment orientation, it can be used in a diverse range of treatment planning settings.

The BPRS is not without limitations. Time and training are required for this clinician-based measure to be used in a reliable manner. Obviously, budgetary and staff priorities will influence whether a particular clinical setting can make such investments.

Research Applications and Findings Relevant to Treatment Planning

The identification of target treatment symptoms in psychiatric patients is a major initial goal in the treatment planning process. Factor analytic studies with the BPRS provide direction for identifying symptom clusters for treatment planning. As noted in the section on interpretative strategy, there are four general symptom factors that can be identified in the BPRS when the instrument is used in diagnostically mixed samples. The use of these clusters (e.g., anxiety/depression, thinking disturbance) can yield target symptoms for treatment planning that are more behaviorally specific than the global severity index offered by the BPRS total score. In addition, one can obviously target specific items (e.g., depressed mood, conceptual disorganization) in the treatment planning process. Overall (1974) outlined techniques for patient classification using the BPRS. A major goal of this work was to identify patient profiles that may be responsive to different classes of medications without regard for traditional diagnostic classification. This work with the BPRS has sought to determine commonly appearing symptom complexes. Cluster analysis methodology was used to group together BPRS profiles with similar patterns and to segregate into different clusters profiles that did not satisfy similarity criteria. The general patient subtypes found in this way have been labeled as follows (Overall, 1974, pp. 69-70): florid thinking disorder, withdrawn-disorganized thinking disturbance, paranoid hostile-suspiciousness, anxious depression, hostile depression, and retarded depression.
Normative profiles of symptom severity for each of these "phenomenological subtypes" are provided by Overall (1974). Subsequently, Overall and Hollister (1982) developed BPRS codetype classification to assign psychiatric patients into these phenomenological subtypes, similar to the codetyping used with Minnesota Multiphasic Personality Inventory profiles. The hope of this work is that symptom-specific classification may prove to be a superior objective basis for choice of treatment. For example, florid thinking disorder patients may respond better to more potent antipsychotic medications, and withdrawn-disorganized patients may respond preferentially to drugs with activating properties. Similarly, retarded depression may respond best to a less sedating drug and anxious depression may require the tranquilizing effects of a more sedating drug. This goes to the heart of the concern over whether drug treatments aim at controlling symptoms or at more fundamental change, and it is a viable research issue for which the BPRS is well suited. These general profiles for patient classification may be useful in treatment planning research, as it would seem likely that patients who present with different symptom profiles would show differential response to a broad array of treatments ranging from medications to psychotherapy.

Clinical Applications of the BPRS in Treatment Planning

As a broad-scope clinician-based rating instrument, the BPRS can be helpful in the identification of target symptoms for treatment planning. The fact that the scale captures a broad spectrum of symptom manifestations is a particular strength. For example,

patients with schizophrenia may present with a range of associated symptoms that are not core features of the disorder. The BPRS may identify such patients as having high levels of anxiety, depression, or guilt. The identification of these symptoms may have relevance to psychotherapy or medication treatment (e.g., augmentation of antipsychotic medications with antianxiety agents or lithium carbonate). The application of the BPRS at the initiation of treatment allows for the identification of a wide range of symptoms for treatment planning. As noted previously, the routine evaluation of major symptom clusters (e.g., Thinking Disturbance, Anxious Depression), as well as individual symptoms, may be useful in treatment planning. Specific goals for symptom reduction may be set and the effects of interventions (e.g., medications, individual psychotherapy, behavioral interventions) can be monitored on an ongoing basis. Because the scale is particularly amenable to repeated administration, the BPRS could be useful in updating treatment plans. In this process, the status of previously identified symptoms can be monitored. Because the BPRS evaluates a fairly broad range of areas, the emergence of new symptoms can be monitored and interventions can be developed as part of the ongoing treatment planning process. Clinicians treating acutely ill patients who have significant chronic disorders such as schizophrenia often are faced with questions of whether a patient is optimally treated. In other words, in patients who display chronic symptoms even at an optimally treated baseline, it must be questioned whether such an optimal remission of symptoms has been achieved prior to discharge. Determination of the degree of improvement may influence decisions about possible augmentative psychopharmacology. For example, Small et al. (1975) suggested that the addition of lithium carbonate to antipsychotic medications may produce further symptom remission in some schizophrenic patients. 
Ongoing collection of measures, such as the BPRS, provides a means of monitoring exacerbation and remission, as well as determining the effectiveness of new treatment interventions.

Use of the BPRS with Other Evaluation Data

The BPRS yields global severity measures on a range of symptoms commonly seen in psychiatric patients with major syndromes (e.g., major depression, schizophrenia, bipolar disorder). In clinical settings, the BPRS can be supplemented with a variety of other useful clinical data. For example, the BPRS does not provide information about levels of adaptive functioning, personal coping, and self-care skills. Treatment planning in rehabilitation settings (e.g., day hospital programs) with generally stable but chronically ill patients may have particular interests in identifying and monitoring such measures of adaptive functioning. Measures such as the Quality of Life Scale (Heinrichs, Hanlon, & Carpenter, 1984) may be useful to determine general levels of adaptive skills (e.g., work history, social relations) in patients with severe disorders such as schizophrenia. The use of the BPRS in these situations may be to monitor remission and identify relapse (Lukoff, Liberman, & Nuechterlein, 1986). As a clinician-based observational measure, the BPRS does not yield a variety of measures familiar to many psychologists. Other measures outlined in this book (e.g., MMPI) provide a range of information regarding coping styles and psychopathology. The inclusion of such measures with the BPRS clearly can add information for treatment planning that cannot be derived from the BPRS alone. Specific recommendations for supplemental instrument selection are difficult, because the choice of self-report or projective

testing data to supplement a clinician-based rating scale is dependent on the interests and theoretical orientation of the practicing clinician and needs of the patient. However, as a fairly reliable measure of symptom severity that is independent of orientation preference, the BPRS yields an assessment of the severity and general quality of symptoms that can be observed by a clinician. As noted by Horowitz, Marmar, Weiss, Kaltreider, and Wilner (1986), such a measure may be among the most strongly related to patient outcome in settings where nonbehavioral therapies (e.g., dynamic psychotherapy) are preferred.

Potential Use or Limits for Treatment Planning in a Managed Care Setting

The BPRS can provide a fairly rapid means of documenting clinical symptoms observed in the course of clinical practice. A 10- to 15-minute clinical interview may be sufficient to complete the instrument in settings where the clinician is familiar with the scale. The scale may be of use in managed care settings where specific treatment goals are needed to justify and measure the effectiveness of various interventions (e.g., documentation of severe treatment refractory symptoms to justify atypical antipsychotic medications that may have higher initial treatment costs). However, the BPRS may have important limitations in managed care settings. As noted previously, the scale requires some degree of practice, and issues of between-clinician variability may be of importance. Time pressures in managed care settings may preclude these important training and monitoring requirements in the use of the scale. In addition, for most clinical applications, the scale is best used on about a weekly basis. Administering the scale more frequently (e.g., in managed care settings where treatment may last only days) entails difficulties.
Provision of Feedback Regarding Assessment Findings

A fair degree of flexibility may be used in providing feedback to clients about BPRS measures obtained in the course of treatment planning. Many of the BPRS symptom constructs (e.g., anxiety, depressed mood) are not too abstract to be understood by most clients. Accordingly, the BPRS may represent one form of objective clinical data that can be discussed readily with clients at the initiation of treatment and monitored across the treatment process. In addition, the BPRS total score or symptom cluster scores (e.g., Anxious Depression) can be scored in a fairly straightforward manner and can be followed with clients using measures such as percentage improvement since treatment initiation. It is well to recall that the recommended 0 to 6 scoring of the individual BPRS scales is important for the calculation of meaningful ratios, such as percentage improvement.

Limitations/Potential Problems in Use for Treatment Planning

The limitations of the BPRS fall into several categories. In sum, the major limitations include the following:

1. Although the BPRS contains items familiar to most mental health professionals, its use requires familiarity with the 18 symptom constructs and adherence to item definitions if it is to be used in a reliable manner. Some degree of staff training and monitoring is recommended to assure reliability.
2. Although the scale has been shown to be of some use in lesser degrees of psychopathology, it initially was developed for use in clinical drug studies with inpatient psychiatric samples. The scale may be of limited use in some outpatient psychotherapy settings where clients may show minimal levels of symptom severity.
3. The BPRS represents an outcome measure based solely on clinician-observed symptom severity. It is not capable of measuring intrapsychic constructs (e.g., ego strength, self-esteem) or adaptive functioning (e.g., interpersonal relations, vocational abilities). Questions such as these require the addition of other measures (e.g., Horowitz et al., 1986).

Use of the BPRS for Treatment Monitoring

Patients with significant psychiatric disorders, such as schizophrenia and bipolar disorder, may show periods of relative exacerbation and remission of symptoms. The monitoring of treatment progress during these phases of treatment is important in the clinical decision-making process. Acute exacerbations of schizophrenia tend to show variability in terms of the speed of remission of symptoms (Freidhoff & Silva, 1995), with some patients showing significant improvement in 1 to 2 weeks, while other patients are still showing noticeable improvements at 3 and 4 weeks of treatment. Accordingly, there is a clear role for treatment monitoring for both research and clinical work in psychotic disorders such as schizophrenia. Lukoff, Liberman, and Nuechterlein (1986) provided useful suggestions for BPRS symptom monitoring in schizophrenia. A modified version of the BPRS was employed to monitor schizophrenic patients during outpatient rehabilitation.
Such an ongoing evaluation of symptoms may aid in the identification of emergent symptoms that would be suggestive of a relapse into florid psychosis (Lukoff, Liberman, & Nuechterlein, 1986). Accordingly, ongoing symptom monitoring can be integrated into treatment planning, raising the possibility of detecting increasing symptoms before full relapse in severe disorders such as schizophrenia. The weekly collection of BPRS ratings allows for the rapid monitoring of the global severity of a range of symptoms frequently seen in psychiatric disorders. The effects of new interventions (e.g., adjunctive treatment medications such as mood stabilizers that are frequently used in psychotic disorders) can be monitored. As noted previously, the BPRS is optimally given on a weekly basis. The administration of the scale is therefore meant to be anchored on the symptoms of the patient over the course of the prior week. Administering the scale on a more frequent basis is fairly cumbersome and difficult. Accordingly, in some managed care settings (e.g., very short-term inpatient hospitalizations of less than 1 week) the use of the BPRS for treatment monitoring or outcome assessment may be difficult. On the other hand, managed care settings may require the demonstration of the ineffectiveness of some conventional treatments (e.g., standard antipsychotic medications) prior to the use of more expensive interventions (e.g., new generation antipsychotic medications). The use of the BPRS in such settings can provide useful monitoring and outcome data to provide justification for novel treatments with relatively higher initial treatment costs.
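The percentage-improvement measure mentioned in the discussion of client feedback, and the reason the 0 to 6 item scoring matters for it, can be illustrated with a short Python sketch. The function names and the two rating profiles are hypothetical, and the 1 to 7 coding is included only as an assumed alternative coding of the same seven rating steps for contrast.

```python
def rescale_to_zero_based(ratings_1_to_7):
    """Convert ratings coded 1-7 to the 0-6 scoring (0 = not present)."""
    return [r - 1 for r in ratings_1_to_7]


def percent_improvement(baseline_total, current_total):
    """Percentage reduction in a summed score relative to baseline."""
    if baseline_total <= 0:
        raise ValueError("baseline total must be positive")
    return 100.0 * (baseline_total - current_total) / baseline_total


# Hypothetical patient: 18 items, all rated at the fourth step at baseline
# and all rated "not present" at follow-up (full remission).
baseline_1_to_7 = [4] * 18
followup_1_to_7 = [1] * 18

# With 1-7 coding the floor of the total is 18, so full remission
# cannot register as 100% improvement.
with_floor = percent_improvement(sum(baseline_1_to_7), sum(followup_1_to_7))

# With 0-6 coding "not present" contributes zero to the total.
zero_based = percent_improvement(
    sum(rescale_to_zero_based(baseline_1_to_7)),
    sum(rescale_to_zero_based(followup_1_to_7)),
)

print(with_floor, zero_based)
```

In this example the same clinical change appears as 75% improvement under 1 to 7 coding and 100% under 0 to 6 coding, because only the zero-based total has a true zero point. The same function can be applied to weekly totals or to any of the symptom cluster scores.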

Use of the BPRS for Outcome Assessment

The BPRS has been employed as an outcome measure in a diverse range of treatment studies. As previously noted, the scale has several unique advantages in clinical and research settings. Use of the BPRS is not linked uniquely to any single patient diagnostic group. The scale provides for a rapid global assessment of a fairly broad spectrum of clinical constructs that are easily recognized by mental health professionals. As a clinician-based psychopathology rating scale, the BPRS obviously does not measure constructs (e.g., self-esteem, ego strength) that may be of interest in some clinical settings. The BPRS has mixed advantages and disadvantages in terms of monetary costs. Clinicians can complete the scale in a matter of several minutes and can generally obtain sufficient information during routine clinical interviews (e.g., clinic intake evaluations) that may be part of standard clinical care. As noted in the discussion of reliability, the scale does require some degree of sophistication, training, and adherence to item definitions. The time and effort required to attend to training and reliability may be a resource allocation issue in some settings.

Evaluation Against Criteria for Outcome Measures

Attempts have been made (Ciarlo, Brown, Edwards, Kiresuk, & Newman, 1986; Newman & Ciarlo, 1994) to define ideal criteria for client outcome measures. These goals for the development and use of outcome measures offer a means of evaluating the relative strengths and weaknesses of the BPRS. Newman and Ciarlo (1994) offered criteria for outcome assessment instruments that fall under the general headings of applications, methods/procedures, psychometric features, cost considerations, and utility considerations. In terms of applications, Newman and Ciarlo (1994) noted that an ideal outcome measure is useful in a wide range of settings and client samples. In this regard, the BPRS is fairly well suited.
Although the scale originally was developed for use in symptomatic inpatients, the scale also can be used in outpatient settings (Pull & Overall, 1977). Unlike numerous clinician-based rating scales that are developed to measure a specific construct (e.g., negative symptoms) within a single disorder (e.g., schizophrenia), the BPRS is capable of assessing common symptom constructs that cut across diagnostic categories. In terms of measure application, further definitions of ideal outcome measures note that a measure should have a simple methodology that can be easily learned (Newman & Ciarlo, 1994). The BPRS has a general level of simplicity as an 18-item clinician-based scale for obtaining global severity ratings on a variety of constructs. Although these constructs are familiar to a broad variety of mental health professionals, the use of the scale requires a certain degree of sophistication and training. Newman and Ciarlo (1994) specified that a scale should have objective referents (meanings) that are consistent across clients. This criterion has been the subject of recent attention with the BPRS, because the exact definition of item severity (e.g., "not present" to "extremely severe" on the 7-point scale) is not built in to the scale. In other words, clinicians within settings may use their own personal definitions of what constitutes the levels of severity of the BPRS items. Anchored versions of the scale have attempted to address this problem, but new scales generally lack sufficient psychometric data (e.g.,

factor structure demonstrated to be similar to the BPRS, data on sensitivity to treatment effects) at this time. Rhoades and Overall (1988) addressed the problem of defining levels of severity. The rating of "moderate" is suggested to be "the average or modal level of severity in patients who have the symptom in question" (Rhoades & Overall, 1988, p. 104). Rhoades and Overall (1988) continued by noting "other rating steps represent points between these three anchors: 'Very Mild' is closer to 'Not Present' than to 'Moderate,' whereas 'Mild' is closer to 'Moderate' than to 'Not Present.'" A similar partition is suggested for ratings between "moderate" and "extremely severe."

Outcome measures should have sound psychometric properties with demonstrated reliability, validity, sensitivity, and freedom from bias (Newman & Ciarlo, 1994). The BPRS can be used in a reliable manner following some degree of training. Ongoing monitoring of reliability can optimize research and clinical use of the scale. The validity and sensitivity of the scale have been demonstrated in hundreds of treatment studies. A major emphasis of these studies has been directed at determining efficacy for psychiatric treatment medications. The scale also has been used as a psychotherapy outcome measure. The scale is not free from bias, because clinician-based instruments may be influenced by the expectations and hopes of the rater. In research applications, this problem can be addressed partially by designing studies with raters who are blinded to the treatment condition of the patient. This is an obvious and easy element of clinical drug studies (i.e., FDA guidelines require double-blind design), but may be more complicated in psychotherapy outcome studies. Such studies may need to employ raters who are blind to the therapy treatment condition.

Among the final ideal criteria for outcome measures are costs and utility considerations (Newman & Ciarlo, 1994). These are general areas in which the BPRS excels.
The BPRS has little initial cost given that the scale has been in the public domain since 1965 (Overall & Gorham, 1988). The scale does require approximately 20 minutes of clinician time to collect data from a patient. However, this interview is quite similar to routine clinical interviews that are common in treatment settings. Newman and Ciarlo (1994) noted that outcome measures should be understandable to a range of professional and nonprofessional audiences. The BPRS yields symptom constructs that are familiar to a range of clinicians who may have varying degrees of research and clinical sophistication. In addition, the constructs and the global severity scoring procedures are fairly easily understood by nonprofessional individuals. The BPRS is easy to score (BPRS rating forms can easily be developed that allow for rapid scanning into computer databases) and can be summarized into approximately five scores reflecting overall symptom severity and specific symptom clusters. The scale can yield information that does not require sophisticated statistical analysis. Summary scores can be graphed easily with symptom severity scaled on the vertical (Y) axis and repeated ratings obtained across treatment (time) plotted on the horizontal (X) axis. In addition, the scale can be useful in clinical service functions in that it can be easily integrated into a standard intake interview employed in most treatment settings. Newman and Ciarlo (1994) noted that an outcome measure should be compatible with a range of clinical theories and practices. When properly used, the BPRS represents a relatively atheoretical instrument. The scale makes no assumptions about the underlying dimensions (e.g., intrapsychic processes, neurobiology) that produce symptom change during treatment. The sensitivity of the scale to patient change has been demonstrated in applications ranging from brief dynamic psychotherapy to antipsychotic drug studies.
There may be certain advantages in the atheoretical nature of the BPRS

in that it can be employed in diverse treatment settings to provide a global and uniform description of patient symptoms and change from treatment.

Research Applications and Findings Using the BPRS as an Outcome Measure

As previously noted, by far the greatest research application of the BPRS has been as an outcome measure in clinical psychopharmacology research. In this use, it has been the pivotal outcome measure used in the development of a broad array of psychiatric medications. The scale has found particular use in the evaluation of the efficacy of antipsychotic medications (Chouinard, Annable, & Campbell, 1989).

Early work with the BPRS used the instrument for descriptive purposes and sought relations between patient characteristics and BPRS symptom clusters. For example, some studies (Overall, 1971; Overall, Henry, & Ford, 1971) have examined issues such as the relation between marital status and outcome as measured by the BPRS. Relatively simple treatment outcome research (e.g., Heinemann, Yudin, & Perlmutter, 1975; Konick, Friedman, Paolino, & Graham, 1972) has examined the general effects of hospitalization or day treatment programs on BPRS scores in mixed diagnostic samples. Studies of the long-term outcome of specific disorders such as schizophrenia have also included BPRS measures (Pokorny & Faibish, 1968).

A group of studies has employed the BPRS to characterize unique outcome questions or predict patient characteristics important for treatment. Hoffmann, Wehler, and Noehl (1978) used the BPRS to compare schizophrenic patients with and without a history of bilateral prefrontal lobotomy. Pokorny and Kaplan (1976) used an expanded BPRS to determine the characteristics of patients who committed suicide following hospital discharge. Some work (Green, Nuechterlein, Ventura, & Mintz, 1990) has tracked schizophrenic patients with the BPRS to determine the temporal relation between psychotic and depressive symptoms.
Various studies (Dixon, Haas, Weiden, Sweeney, & Frances, 1991; Hoffmann & Wefring, 1972; Westermeyer & Neider, 1988) have included the BPRS in descriptive work with patients who have alcohol or substance abuse histories. Horowitz et al. (1981) used eight BPRS items to assess reactions to the death of a parent. Several studies (Yesavage, 1984; Yesavage et al., 1983) have used the BPRS to identify the characteristics of schizophrenic inpatients who exhibit assaultive behavior.

The BPRS has been used extensively (e.g., Faustman, Moses, & Csernansky, 1988; Glynn et al., 1990; Müller, Hofschuster, Ackenheil, & Eckstein, 1993; Newcomer et al., 1991) as a symptom measure for correlative research in schizophrenia and other disorders. The goal of such studies is to seek relations between disorder symptoms and biological/psychological variables, thereby elucidating symptom-biology relations that aid in understanding disorders such as schizophrenia.

The BPRS has been included as an outcome measure in psychotherapy treatment studies. Horowitz et al. (1986) evaluated numerous outcome measures following brief dynamic psychotherapy. Interestingly, BPRS and other measures of symptom change appeared to show more robust changes from treatment than measures of adaptive functioning. Claghorn, Johnstone, Cook, and Itschner (1974) included BPRS measures in assessing outcome of group psychotherapy in schizophrenic patients. Yenson et al. (1976) noted that the BPRS was sensitive to change produced by psychotherapy combined with the administration of methylenedioxyamphetamine. Several

studies (Martin, Moore, & Sterne, 1977; Martin, Moore, Sterne, & McNairy, 1977) have employed the BPRS as a measure in work examining the relation between therapist expectancies and patient outcome.

Use in Clinical Practice

The BPRS has a range of applications in clinical practice, although the most common use of the scale is in the assessment of medication treatments. Outpatients treated in community mental health settings may frequently receive medications in addition to psychotherapy. In addition, psychopharmacological treatment is extremely common in inpatient settings. All psychotherapeutic medications possess side effects and risks that can range from being relatively benign (e.g., sedation) to quite serious (e.g., tardive dyskinesia, cardiac complications, intentional overdose). The risks versus benefits of these interventions should be weighed on a patient-by-patient basis. The inclusion of the BPRS in monitoring psychopharmacological interventions represents a means of providing information on the degree of benefit that medications may provide. In settings such as an inpatient psychiatry unit, the weekly recording of BPRS data can be included in a patient's chart, thus providing detailed information for future clinicians who may be involved in the treatment of a patient.

Similar to the work of Horowitz et al. (1986), the scale can be used to complement measures of adaptive functioning in evaluating change during psychotherapy. The scale allows for repeated data collection and can be completed on a weekly basis if desired. The scale nearly always can be completed based on the type of information obtained in a routine intake interview in most clinical settings.

Similar to the use of the scale in research settings, clinical use of the BPRS allows for the examination of a range of different scores. The total score provides a general index of the level of psychopathology.
Use of the specific clusters (e.g., thinking disturbance, anxious depression) of items is also useful in tracking change during clinical practice.

A variety of statistical procedures can be used to analyze BPRS outcome measures. In the case of within-subjects designs, a simple pretreatment-posttreatment comparison of BPRS measures can be made with statistics such as paired t tests or Wilcoxon matched-pairs tests. A disadvantage of such pretreatment-posttreatment comparisons is that they fail to include rich information collected at intervening time points. Some suggestions for alternative forms of analysis, such as slopes analysis, have been made (Kraemer & Thiemann, 1989).

A final point about considerations of statistical versus clinical significance should be raised. Treatment interventions such as medications or psychotherapy may produce statistically significant results by producing small but consistent improvements in clinical symptoms. However, considerations of clinical significance should be weighed in determining the utility of a treatment. Issues such as side effects of medications and monetary costs of psychotherapy may need to be factored into the final decision about the usefulness of a treatment. Considerations of whether treatments produce significant changes in the quality of life of a patient also may be important.

Use with Other Evaluation Data

Research and clinical applications often include additional outcome measures to supplement the BPRS. The selection of these additional scales is dependent on the specific questions being addressed in the outcome analysis.
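
The within-subjects analyses mentioned under Use in Clinical Practice can be illustrated with a minimal Python sketch. The weekly totals below are hypothetical values invented for illustration, not data from any study, and the function names paired_t and weekly_slope are assumptions of this sketch. The per-patient least-squares slope is only one simple reading of a slopes analysis; it is not offered as the specific procedure of Kraemer and Thiemann (1989).

```python
import math


def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for pre/post scores."""
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    return mean_diff / math.sqrt(var_diff / n), n - 1


def weekly_slope(scores):
    """Least-squares slope of scores on week number (week 0 = baseline)."""
    n = len(scores)
    mean_week = (n - 1) / 2
    mean_score = sum(scores) / n
    num = sum((w - mean_week) * (s - mean_score) for w, s in enumerate(scores))
    den = sum((w - mean_week) ** 2 for w in range(n))
    return num / den


# Hypothetical BPRS total scores: baseline followed by four weekly ratings.
weekly_totals = {
    "patient_a": [48, 44, 41, 38, 36],
    "patient_b": [52, 50, 45, 41, 37],
    "patient_c": [40, 41, 37, 35, 34],
    "patient_d": [55, 49, 47, 42, 40],
}

# Pretreatment-posttreatment comparison: uses first and last ratings only.
pre = [scores[0] for scores in weekly_totals.values()]
post = [scores[-1] for scores in weekly_totals.values()]
t_stat, df = paired_t(pre, post)

# Slope per patient: uses every weekly rating.
slopes = {name: weekly_slope(scores) for name, scores in weekly_totals.items()}

print(f"paired t = {t_stat:.2f} on {df} df")
for name, slope in slopes.items():
    print(f"{name}: {slope:+.1f} points per week")
```

The paired comparison discards the intervening ratings, whereas the slope draws on all of them, which is the limitation of pretreatment-posttreatment designs that the text raises.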

The BPRS provides a global severity index across a range of symptoms, but the scale provides little specific qualitative information about symptoms. For example, although the BPRS yields a global rating of hallucinatory behavior, it does not allow for a description of the specifics of the hallucinations, such as multiple versus single voices talking to the patient, presence of command hallucinations, and mood congruent versus noncongruent hallucinations. The addition of items from a scale such as the Schedule for Affective Disorders and Schizophrenia (SADS; Endicott & Spitzer, 1978) can provide such qualitative information when used with the BPRS.

The BPRS does not provide for general measures of interpersonal relationships, vocational functioning, social support, or initiative. Such measures may be of particular interest in characterizing outcome of both psychological and medication treatments. A review of all the available outcome measures is beyond the scope of the present discussion. In clinical and research applications in which social and vocational functioning is of interest, one may want to consider the addition of the Quality of Life Scale (Heinrichs et al., 1984). Meltzer, Burnett, Bastani, and Ramirez (1990) showed that this instrument was sensitive to improvements obtained with clozapine treatment in patients who were resistant to typical antipsychotic medications.

As mentioned previously, there has been a growing interest in the assessment of negative symptoms in schizophrenia. This has led to the practice of adding one or more negative symptom rating scales to the BPRS. Some data (Thiemann et al., 1987) suggest that the most commonly used negative symptom rating scale (Scale for the Assessment of Negative Symptoms) is redundant with the BPRS withdrawal-retardation factor in yielding an overall negative symptom measure. Accordingly, consideration should be given to data collection goals (measurement of global negative symptoms vs. specific items) when adding negative symptom measures to the BPRS, because redundant measures can create problems in terms of both efficiency and data interpretation (Thiemann et al., 1987).

Provision of Feedback Regarding Assessment Findings

Perhaps the most meaningful feedback that can be provided to clients is in the realm of specific BPRS items or clusters of items. Treatment typically targets a specific set of items (e.g., anxiety and depression) and feedback regarding outcome can be provided for these selected items. A useful form of feedback to patients on changes in overall symptoms or specific symptom clusters can be provided in terms of a percentage reduction of symptoms from treatment initiation. If BPRS data are collected regularly across time (e.g., recorded weekly), the effects of specific interventions (e.g., initiation or dose changes in treatment medications) may be reviewed with patients. In sum, the BPRS can provide for a repeated measure of global psychopathology that can be used readily in a single-subject analysis of treatment outcome.

Limitations/Potential Problems in Use

The limitations and problems in the use of the BPRS in outcome assessment are essentially the same as those outlined for treatment planning. In sum, these pertain to the need for training and scale familiarity to assure reliability, limitations for use in patients with

few overt symptoms, and data restriction to clinician-observed ratings of psychopathology.

Case Studies

The following two case studies illustrate the use of the BPRS to monitor symptom changes during inpatient medication treatment. Both cases were inpatients treated at the Stanford/VA Mental Health Clinical Research Center. Both patients were free of antipsychotic medications when the initial BPRS rating was conducted. These patients were treated subsequently with haloperidol (20 mg/day), a widely used antipsychotic medication. Ratings were conducted on a weekly basis and in most cases two raters were employed. These raters conducted a single joint interview with the patient and independently completed the BPRS. Ratings for the two raters were averaged.

Case 1

Case 1 was a 24-year-old male who met research diagnostic criteria (RDC; Spitzer, Endicott, & Robins, 1978) for subacute schizophrenia. Figure 26.2 displays the weekly rating of overall BPRS scores. Figure 26.3 illustrates the symptom levels for the four major BPRS clusters described in the interpretive strategy section of this chapter. This patient demonstrated approximately a 12-point reduction in overall BPRS score, an amount that represents an approximate 25% reduction in general symptom severity. Figure 26.3 allows for the determination of which symptom clusters showed the most consistent change. These data illustrate that the most consistent and significant improvement across the course of treatment took place in the thinking disturbance cluster (hallucinatory behavior, conceptual disorganization, and unusual thought content). The total score for this cluster changed from 9.5 (half-point scores are possible because this score reflects an average of two raters observing a single rating session) at baseline to 4 at the end of 4 weeks of treatment. Changes in other symptom clusters were less consistent.
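
The arithmetic behind these summaries can be sketched in a few lines of Python. The item-level ratings below are hypothetical, chosen only so that the averaged cluster totals match the 9.5 and 4 reported for Case 1; the baseline total of 48 is likewise an assumption, picked because it is consistent with a 12-point drop being roughly 25%. All function and variable names are illustrative.

```python
# Items in the thinking disturbance cluster, as listed in the text.
THINKING_DISTURBANCE = (
    "hallucinatory_behavior",
    "conceptual_disorganization",
    "unusual_thought_content",
)


def average_raters(rater_a, rater_b):
    """Average two raters' item ratings from a single joint interview."""
    return {item: (rater_a[item] + rater_b[item]) / 2 for item in rater_a}


def cluster_total(ratings, items):
    """Sum the averaged ratings for the items in one cluster."""
    return sum(ratings[item] for item in items)


def percent_reduction(baseline, current):
    """Percentage reduction in a raw score relative to baseline."""
    return 100.0 * (baseline - current) / baseline


# Hypothetical item ratings (not the actual Case 1 item data).
baseline_a = {"hallucinatory_behavior": 4, "conceptual_disorganization": 3,
              "unusual_thought_content": 3}
baseline_b = {"hallucinatory_behavior": 3, "conceptual_disorganization": 3,
              "unusual_thought_content": 3}
week4_a = {"hallucinatory_behavior": 1, "conceptual_disorganization": 1,
           "unusual_thought_content": 2}
week4_b = {"hallucinatory_behavior": 2, "conceptual_disorganization": 1,
           "unusual_thought_content": 1}

baseline_cluster = cluster_total(average_raters(baseline_a, baseline_b),
                                 THINKING_DISTURBANCE)
week4_cluster = cluster_total(average_raters(week4_a, week4_b),
                              THINKING_DISTURBANCE)

cluster_change = percent_reduction(baseline_cluster, week4_cluster)
total_change = percent_reduction(48, 36)  # assumed baseline total of 48

print(f"cluster: {baseline_cluster} -> {week4_cluster} "
      f"({cluster_change:.1f}% reduction)")
print(f"total score: {total_change:.0f}% reduction")
```

Averaging the two raters item by item is what makes half-point cluster totals such as 9.5 possible.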

Fig. 26.2. Case 1. Total BPRS score at medication-free baseline and at weekly follow-up during haloperidol treatment.

Fig. 26.3. Sum of factor scores for Case 1 at unmedicated baseline and at weekly follow-up during haloperidol treatment.

Case 2

Case 2 was a 40-year-old male who met RDC for chronic schizophrenia. As noted in Fig. 26.4, this patient showed a 15-point reduction from pretreatment baseline to the fourth week of treatment. An examination of Fig. 26.5 shows that this patient displayed improvements across a range of symptom clusters. An improvement across the withdrawal/retardation (blunted affect, emotional withdrawal, psychomotor retardation), hostility/suspiciousness (suspiciousness, hostility, uncooperativeness), and thinking disturbance (hallucinatory behavior, conceptual disorganization, and unusual thought content) clusters was noted across 4 weeks of treatment.

These two cases illustrate unique response patterns to the same medication treatment in schizophrenia. Case 1 showed a treatment response that was most clearly evident in a reduction of the thinking disturbance cluster of items. Case 2 demonstrated that some patients may show improvements across a broad array of positive and negative symptoms commonly observed in schizophrenia.

Fig. 26.4. Case 2. Total BPRS score at medication-free baseline and at weekly follow-up during haloperidol treatment.

Fig. 26.5. Sum of factor scores for Case 2 at unmedicated baseline and at weekly follow-up during haloperidol treatment.

Conclusions

The BPRS represents a broad-scope psychiatric rating scale that has been the subject of extensive psychometric study. This scale probably has no equal as a clinician-based rating scale that is applicable to patients with a wide range of diagnoses. Although the BPRS may be useful in any setting where patients display ratable levels of psychiatric symptoms (e.g., depression, thought disorder, delusions, anxiety), the scale is best geared for inpatient populations with a fairly high degree of symptom severity. It has a replicable factor structure and has been shown in many studies to be sensitive to psychiatric treatments ranging from psychotherapy to medication treatment. The instrument is not without limitations. An adequate level of familiarity with the BPRS item content is required, and training for interrater reliability is recommended to assure reliable and appropriate use of the scale.

Acknowledgments

This work was supported, in part, by grants MH-30854 and MH-32457 to the Stanford/VA Mental Health Clinical Research Center and the University of Texas Houston Health Science Center, respectively, and research support from the Department of Veterans Affairs. Also, the authors acknowledge the extremely valued contributions of Pamela J. Elliott in administrative support and gathering of materials.

References

Abrams, R., & Taylor, M.A. (1978). A rating scale for emotional blunting. American Journal of Psychiatry, 135, 226-229.
Andersen, J., Kørner, A., Larsen, J.K., Schultz, V., Nielsen, B.M., Behnke, K., Munk-Andersen, E., & Bjørum, N. (1993). Agreement in psychiatric assessment. Acta Psychiatrica Scandinavica, 87, 128-132.
Andreasen, N.C. (1979). Thought, language, and communication disorders: I. Clinical assessment, definition of terms, and evaluation of their reliability. Archives of General Psychiatry, 36, 1315-1321.
Andreasen, N.C., Arndt, S., Alliger, R., Miller, D., & Flaum, M. (1995). Symptoms of schizophrenia: Methods, meanings, and

mechanisms. Archives of General Psychiatry, 52, 341-351.
Andreasen, N.C., & Olsen, S.A. (1982). Negative versus positive schizophrenia: Definition and validation. Archives of General Psychiatry, 39, 789-794.
Bech, P., Larsen, J.K., & Andersen, J. (1988). The BPRS: Psychometric developments. Psychopharmacology Bulletin, 24, 118-121.
Beck, A.T., Ward, C.H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.
Becker, R.W. (1988). Depression in schizophrenia. Hospital and Community Psychiatry, 39, 1269-1275.
Bell, M., Billington, R., & Becker, B. (1986). A scale for the assessment of object relations: Reliability, validity, and factorial invariance. Journal of Clinical Psychology, 42, 733-741.
Bell, M., Milstein, R., Beam-Goulet, J., Lysaker, P., & Cicchetti, D. (1992). The Positive and Negative Syndrome Scale and the Brief Psychiatric Rating Scale. Reliability, comparability, and predictive validity. Journal of Nervous and Mental Disease, 180, 723-728.
Bell, M.D., Lysaker, P.H., Beam-Goulet, J.L., Milstein, R.M., & Lindenmayer, J.-P. (1994). Five-component model of schizophrenia: Assessing the factorial invariance of the Positive and Negative Syndrome Scale. Psychiatry Research, 52, 295-303.
Beller, S.A., & Overall, J.E. (1984). The Brief Psychiatric Rating Scale (BPRS) in geropsychiatric research: II. Representative profile patterns. Journal of Gerontology, 39, 194-200.
Bigelow, L.B., & Berthot, B.D. (1989). Psychiatric Symptom Assessment Scale (PSAS). Psychopharmacology Bulletin, 25, 168-179.
Bitter, I., Jaeger, J., Agdeppa, J., & Volavka, J. (1989). Subjective symptoms: Part of the negative syndrome of schizophrenia? Psychopharmacology Bulletin, 25, 180-185.
Boerger, A.R., Graham, J.R., & Lilly, R.S. (1974). Behavioral correlates of single-scale MMPI code types. Journal of Consulting and Clinical Psychology, 42, 398-402.
Breier, A., Wolkowitz, O.M., Doran, A.R., Roy, A., Boronow, J., Hommer, D.W., & Pickar, D. (1987). Neuroleptic responsivity of negative and positive symptoms in schizophrenia. American Journal of Psychiatry, 144, 1549-1555.
Carpenter, W.T., Jr. (1991). Psychopathology and common sense. Biological Psychiatry, 29, 735-737.
Casat, C.D., Pleasants, D.Z., Schroeder, D.H., & Parler, D.W. (1989). Bupropion in children with attention deficit disorder. Psychopharmacology Bulletin, 25, 198-201.
Chouinard, G., Annable, L., & Campbell, W. (1989). A randomized clinical trial of haloperidol decanoate and fluphenazine decanoate in the outpatient treatment of schizophrenia. Journal of Clinical Psychopharmacology, 9, 247-253.
Ciarlo, J.A., Brown, T.R., Edwards, D.W., Kiresuk, T.J., & Newman, F.L. (1986). Assessing mental health treatment outcome measurement techniques (DHHS Publication No. ADM 86-1301). Washington, DC: U.S. Government Printing Office.
Claghorn, J.L., Johnstone, E.E., Cook, T.H., & Itschner, L. (1974). Group therapy and maintenance treatment of schizophrenics. Archives of General Psychiatry, 31, 361-365.
Coyne, L., & Spohn, H.E. (1989). Dimensions of the Brief Psychiatric Rating Scale in schizophrenic samples with increasing psychopathology. Schizophrenia Research, 2, 199.
Craig, T.J., Richardson, M.A., Pass, R., & Bregman, Z. (1985). Measurement of mood and affect in schizophrenic inpatients. American Journal of Psychiatry, 142, 1272-1277.
Csernansky, J.G., King, R.J., Faustman, W.O., Moses, J.A., Jr., Poscher, M.E., & Faull, K.F. (1990). 5-HIAA in cerebrospinal fluid and deficit schizophrenic characteristics. British Journal of Psychiatry, 156, 501-507.
Czobor, P., Bitter, I., & Volavka, J. (1991). Relationship between the Brief Psychiatric Rating Scale and the Scale for the Assessment of Negative Symptoms: A study of their correlation and redundancy. Psychiatry Research, 36, 129-139.
Czobor, P., & Volavka, J. (1996). Dimensions of the Brief Psychiatric Rating Scale: An examination of stability during haloperidol treatment. Comprehensive Psychiatry, 37, 205-215.
De Freitas, B., & Schwartz, G. (1979). Effects of caffeine in chronic psychiatric patients. American Journal of Psychiatry, 136, 1337-1338.
den Boer, J.A., Ravelli, D.P., Huisman, J., Ohrvik, J., Verhoeven, W.M.A., & Westenberg,

H.G.M. (1990). A double-blind comparative study of remoxipride and haloperidol in acute schizophrenia. Acta Psychiatrica Scandinavica, 82(Suppl. 358), 108-110.
Dingemans, P.M., Linszen, D.H., Lenior, M.E., & Smeets, R.M. (1995). Component structure of the expanded Brief Psychiatric Rating Scale (BPRS-E). Psychopharmacology, 122, 263-267.
Dingemans, P.M., Winter, M.L.F., Bleeker, J.A.C., & Rathod, P. (1983). A cross-cultural study of the reliability and factorial dimensions of the Brief Psychiatric Rating Scale (BPRS). Psychopharmacology, 80, 190-191.
Dixon, L., Haas, G., Weiden, P.J., Sweeney, J., & Frances, A.J. (1991). Drug abuse in schizophrenic patients: Clinical correlates and reasons for use. American Journal of Psychiatry, 148, 224-230.
Endicott, J., & Spitzer, R.L. (1978). A diagnostic interview: The Schedule for Affective Disorders and Schizophrenia. Archives of General Psychiatry, 35, 837-844.
Faustman, W.O., Moses, J.A., Jr., & Csernansky, J.G. (1988). Luria-Nebraska performance and symptomatology in unmedicated schizophrenic patients. Psychiatry Research, 26, 29-34.
Faustman, W.O., Moses, J.A., Jr., Csernansky, J.G., & White, P.A. (1989). Correlations between the MMPI and the Brief Psychiatric Rating Scale in schizophrenic and schizoaffective patients. Psychiatry Research, 28, 135-143.
Feighner, J.P., Merideth, C.H., & Claghorn, J.L. (1984). Multicenter placebo-controlled evaluation of nomifensine treatment in depressed outpatients. Journal of Clinical Psychiatry, 45, 47-51.
Flemenbaum, A., & Zimmermann, R.L. (1973). Inter- and intra-rater reliability of the Brief Psychiatric Rating Scale. Psychological Reports, 36, 783-792.
Friedhoff, A.J., & Silva, R.R. (1995). The effects of neuroleptics on plasma homovanillic acid. In F.E. Bloom & D.J. Kupfer (Eds.), Psychopharmacology: The fourth generation of progress (pp. 1229-1233). New York: Raven.
Gabbard, G.O., Coyne, L., Kennedy, L.L., Beasley, C., Deering, C.D., Schroder, P., Larson, J., & Cerney, M.S. (1987). Inter-rater reliability in the use of the Brief Psychiatric Rating Scale. Bulletin of the Menninger Clinic, 51, 519-531.
Gale, J., Pfefferbaum-Levine, B., Suhr, M.A., & Overall, J.E. (1986). The Brief Psychiatric Rating Scale for Children: A reliability study. Journal of Clinical Child Psychology, 15, 341-345.
Glynn, S.M., Randolph, E.T., Eth, S., Paz, G.G., Leong, G.B., Shaner, A.L., & Strachan, A. (1990). Patient psychopathology and expressed emotion in schizophrenia. British Journal of Psychiatry, 157, 877-880.
Goldman, R.S., Axelrod, B.N., Tandon, R., Woodward, J.L., Alphs, L.D., Faustman, W.O., Hohagen, F., Gattaz, W.F., Keshavan, M., & Greden, J.F. (1997). Latent structure modeling of the Brief Psychiatric Rating Scale in schizophrenia. Unpublished manuscript.
Goldman, R.S., Tandon, R., Liberzon, I., & Greden, J.F. (1992). Measurement of depression and negative symptoms in schizophrenia. Psychopathology, 25, 49-56.
Goldman, R.S., Tandon, R., Woodward, J., Faustman, W., Hohagen, F., Gattaz, W.F., Keshavan, M.S., & Muckherjee, S. (1991). Consistency of psychopathological dimensions in schizophrenia as determined by the BPRS: A multicenter factor analytic study. Biological Psychiatry, 29, 45A (abstract).
Gorham, D.R., & Overall, J.E. (1961). Dimensions of change in psychiatric symptomatology. Diseases of the Nervous System, 22, 576-580.
Gottlieb, G.L., Gur, R.E., & Gur, R.C. (1988). Reliability of psychiatric scales in patients with dementia of the Alzheimer type. American Journal of Psychiatry, 145, 857-860.
Green, M.F., Nuechterlein, K.H., Ventura, J., & Mintz, J. (1990). The temporal relationship between depressive and psychotic symptoms in recent-onset schizophrenia. American Journal of Psychiatry, 147, 179-182.
Guelfi, G.P., Faustman, W.O., & Csernansky, J.G. (1989). Independence of positive and negative symptoms in a population of schizophrenic patients. Journal of Nervous and Mental Disease, 177, 285-290.
Gur, R.E., Mozley, D., Resnick, S.M., Levick, S., Erwin, R., Saykin, A.J., & Gur, R.C. (1991). Relations among clinical scales in schizophrenia. American Journal of Psychiatry, 148, 472-478.

Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23, 56-62.
Harvey, P.D., Davidson, M., White, L., Keefe, R.S.E., Hirschowitz, J., Mohs, R.C., & Davis, K.L. (1996). Empirical evaluation of the factorial structure of clinical symptoms in schizophrenia: Effects of typical neuroleptics on the Brief Psychiatric Rating Scale. Biological Psychiatry, 40, 755-760.
Hedlund, J.L., & Vieweg, B.W. (1980). The Brief Psychiatric Rating Scale: A comprehensive review. Journal of Operational Psychiatry, 11, 49-65.
Heinemann, S.H., Yudin, L.E., & Perlmutter, F. (1975). A follow-up study of clients discharged from a day hospital aftercare program. Hospital and Community Psychiatry, 26, 752-754.
Heinrichs, D.W., Hanlon, T.E., & Carpenter, W.T. (1984). The Quality of Life Scale: An instrument for rating the schizophrenia deficit syndrome. Schizophrenia Bulletin, 10, 388-398.
Hoffmann, H., & Wefring, L.R. (1972). Sex and age differences in psychiatric symptoms of alcoholics. Psychological Reports, 30, 887-889.
Hoffmann, H., Wehler, R., & Noehl, G.V. (1978). Psychiatric symptoms of lobotomized and non-lobotomized chronic schizophrenics. Psychological Reports, 42, 262.
Hollister, L.E., & Csernansky, J.G. (1990). Clinical pharmacology of psychotherapeutic drugs (3rd ed.). New York: Churchill Livingstone.
Hollister, L.E., Overall, J.E., Pokorny, A.D., & Shelton, J. (1971). Acetophenazine and diazepam in anxious depression. Archives of General Psychiatry, 24, 273-278.
Horowitz, M.J., Krupnick, J., Kaltreider, N., Wilner, N., Leong, A., & Marmar, C. (1981). Initial psychological response to parental death. Archives of General Psychiatry, 38, 316-323.
Horowitz, M.J., Marmar, C.R., Weiss, D.S., Kaltreider, N.B., & Wilner, N.R. (1986). Comprehensive analysis of change after brief dynamic psychotherapy. American Journal of Psychiatry, 143, 582-589.
Kane, J., Honigfeld, G., Singer, J., & Meltzer, H. (1988). Clozapine for the treatment-resistant schizophrenic. A double-blind comparison with chlorpromazine. Archives of General Psychiatry, 45, 789-796.
Karson, C.N., & Bigelow, L. (1986). The paranoid quotient. A BPRS ratio for exploring subtypes in schizophrenia. Acta Psychiatrica Scandinavica, 73, 39-41.
Kay, S.R., Fiszbein, A., & Opler, L.A. (1987). The Positive and Negative Syndrome Scale (PANSS) for schizophrenia. Schizophrenia Bulletin, 13, 261-276.
Kellner, R., Wilson, R.M., Muldawer, M.D., & Pathak, D. (1975). Anxiety in schizophrenia. The responses to chlordiazepoxide in an intensive design study. Archives of General Psychiatry, 32, 1246-1254.
Konick, D.S., Friedman, I., Paolino, A.F., & Graham, J.R. (1972). Changes in symptomatology associated with short-term psychiatric hospitalization. Journal of Clinical Psychology, 28, 385-390.
Kraemer, H.C. (1991). To increase power in randomized clinical trials without increasing sample size. Psychopharmacology Bulletin, 27, 217-224.
Kraemer, H.C., & Thiemann, S. (1989). A strategy to use soft data effectively in randomized controlled clinical trials. Journal of Consulting and Clinical Psychology, 57, 148-154.
Kulhara, P., & Chadda, R. (1987). A study of negative symptoms in schizophrenia and depression. Comprehensive Psychiatry, 28, 229-235.
Lewandowski, D., & Graham, J.R. (1972). Empirical correlates of frequently occurring two-point MMPI code types: A replicated study. Journal of Consulting and Clinical Psychology, 39, 467-472.
Lindenmayer, J.-P., Bernstein-Hyman, R., & Grochowski, S. (1994). A new five factor model of schizophrenia. Psychiatric Quarterly, 65, 299-322.
Lindenmayer, J.-P., Bernstein-Hyman, R., Grochowski, S., & Bark, N. (1995). Psychopathology of schizophrenia: Initial validation of a 5-factor model. Psychopathology, 28, 22-31.
Lindenmayer, J.-P., Grochowski, S., & Hyman, R.B. (1995). Five factor model of schizophrenia: Replication across samples. Schizophrenia Research, 14, 229-234.
Lohr, J.B., & Wisniewski, A.A. (1987). Movement disorders: A neuropsychiatric approach. New York: Guilford.

Lorr, M., Jenkins, R.L., & Holsopple, J.Q. (1953). Multidimensional scale for rating psychiatric patients (VA Technical Bulletin No. 10-507). Washington, DC: Veterans Administration.
Lucas, P.B., Pickar, D., Kelsoe, J., Rapaport, M., Pato, C., & Hommer, D. (1990). Effects of the acute administration of caffeine in patients with schizophrenia. Biological Psychiatry, 28, 35-40.
Lukoff, D., Liberman, R.P., & Nuechterlein, K.H. (1986). Symptom monitoring in the rehabilitation of schizophrenic patients. Schizophrenia Bulletin, 12, 578-593.
Lukoff, D., Nuechterlein, K.H., & Ventura, J. (1986). Manual for expanded Brief Psychiatric Rating Scale (BPRS). Schizophrenia Bulletin, 12, 594-602.
Marshall, B.D., Glynn, S.M., Midha, K.K., Hubbard, J.W., Bowen, L.L., Banzett, L., Mintz, J., & Liberman, R.P. (1989). Adverse effects of fenfluramine in treatment refractory schizophrenia. Journal of Clinical Psychopharmacology, 9, 110-115.
Martin, P.J., Moore, J.E., & Sterne, A.L. (1977). Therapists as prophets: Their expectancies and treatment outcome. Psychotherapy: Theory, Research and Practice, 14, 188-195.
Martin, P.J., Moore, J.E., Sterne, A.L., & McNairy, R.M. (1977). Therapists prophesy. Journal of Clinical Psychology, 33, 502-510.
Mass, R., Burmeister, J., & Krausz, M. (1997). Dimensional structure of the German version of the Brief Psychiatric Rating Scale (BPRS). Nervenarzt, 68, 239-244.
McAdams, L.A., Harris, M.J., Bailey, A., Fell, R., & Jeste, D.V. (1996). Validating specific psychopathology scales in older outpatients with schizophrenia. Journal of Nervous and Mental Disease, 184, 246-251.
Meltzer, H.Y., Burnett, S., Bastani, B., & Ramirez, L.F. (1990). Effects of six months of clozapine treatment on the quality of life of chronic schizophrenic patients. Hospital and Community Psychiatry, 41, 892-897.
Müller, N., Hofschuster, E., Ackenheil, M., & Eckstein, R. (1993). T-cells and psychopathology in schizophrenia: Relationship to the outcome of neuroleptic therapy. Acta Psychiatrica Scandinavica, 87, 66-71.
Mullins, D., Pfefferbaum, B., Schultz, H., & Overall, J.E. (1986). Brief Psychiatric Rating Scale for Children: Quantitative scoring of medical records. Journal of Clinical Psychiatry, 19, 43-49.
Naber, D., Leppig, M., Grohmann, R., & Hippius, H. (1989). Efficacy and adverse effects of clozapine in the treatment of schizophrenia and tardive dyskinesia: A retrospective study of 387 patients. Psychopharmacology, 99, S73-S76.
Nair, N.P.V., Suranyi-Cadotte, B., Schwartz, G., Thavundayil, J.X., Achim, A., Lizondo, E., & Nayak, R. (1986). A clinical trial comparing intramuscular haloperidol decanoate and oral haloperidol in chronic schizophrenic patients: Efficacy, safety, and dosage equivalence. Journal of Clinical Psychopharmacology, 6(Suppl.), 30S-37S.
Newcomer, J.W., Faustman, W.O., Whiteford, H.A., Moses, J.A., Jr., & Csernansky, J.G. (1991). Symptomatology and cognitive impairment associate independently with post-dexamethasone cortisol concentrations in unmedicated schizophrenic patients. Biological Psychiatry, 29, 855-864.
Newcomer, J.W., Faustman, W.O., Yeh, W., & Csernansky, J.G. (1990). Distinguishing depression and negative symptoms in unmedicated patients with schizophrenia. Psychiatry Research, 31, 243-250.
Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates.
Nicholson, I.R., Chapman, J.E., & Neufeld, R.W.J. (1995). Variability in BPRS definitions of positive and negative symptoms. Schizophrenia Research, 17, 177-185.
Otani, K., Kondo, T., Kaneko, S., Ishida, M., & Fukishima, Y. (1994). Correlation between prolactin response and therapeutic effects of zotepine in schizophrenic patients. International Clinical Psychopharmacology, 9, 287-289.
Overall, J.E. (1971). Associations between marital history and the nature of manifest psychopathology. Journal of Abnormal Psychology, 78, 213-221.
Overall, J.E. (1974). The Brief Psychiatric Rating Scale in psychopharmacology research. In P. Pichot (Ed.), Psychological measurements in psychopharmacology: Modern problems

< previous page

page_828

next page >

< previous page

page_829

next page > Page 829

in pharmacopsychiatry (pp. 67-78). Basel: Karger. Overall, J.E. (1983). Brief Psychiatric Rating Scale and Brief Population History Form. In P.A. Keller & L.G. Ritt (Eds.), Innovation in clinical practice: A source book (Vol. 2, pp. 307-316). Sarasota, FL: Professional Resources Exchange. Overall, J.E., & Gorham, D.R. (1962). The brief psychiatric rating scale. Psychological Reports, 10, 799-812. Overall, J.E., & Gorham, D.R. (1988). The Brief Psychiatric Rating Scale (BPRS): Recent developments in ascertainment and scaling. Psychopharmacology Bulletin, 24, 97-99. Overall, J.E., Gorham, D.R., & Shawver, J.R. (1961). Basic dimensions of change in symptomatology of chronic schizophrenics. Journal of Abnormal and Social Psychology, 62, 597-602. Overall, J.E., Henry, B.W., & Ford, H. (1971). Background variables and outpatient psychopathology. Psychological Reports, 28, 303-309. Overall, J.E., & Hollister, L.E. (1982). Decision rules for phenomenological classification of psychiatric samples. Journal of Consulting and Clinical Psychology, 50, 535-545. Overall, J.E., Hollister, L.E., & Pichot, P. (1967). Major psychiatric disorders. A fourdimensional model. Archives of General Psychiatry, 16, 146-151. Overall, J.E., & Klett, C.J. (1972). Applied multivariate analysis. New York: McGraw-Hill. Overall, J.E., & Pfefferbaum, B. (1982). The Brief Psychiatric Rating Scale for Children. Psychopharmacology Bulletin, 18, 10-16. Overall, J.E., & Pfefferbaum, B. (1984). A brief scale for rating psychopathology in children. In P.A. Keller & L.G. Ritt (Eds.), Innovations in clinical practice (Vol. 3, pp. 257-266). Sarasota, FL: Professional Resource Exchange. Overall, J.E., & Rhoades, H.M. (1988). Clinician-rated scales for multidimensional assessment of psychopathology in the elderly. Psychopharmacology Bulletin, 24, 587-594. Ownby, R.L., Koss, E., Smyth, K.A., & Whitehouse, P.J. (1994). The factor structure of the Brief Psychiatric Rating Scale in Alzheimer's disease. 
Journal of Geriatric Psychiatry and Neurology, 7, 245-250. Peralta, V., & Cuesta, M.J. (1994). Psychometric properties of the Positive and Negative Syndrome Scale (PANSS) in schizophrenia. Psychiatry Research, 53, 31-40. Pokorny, A.D., & Faibish, G.M. (1968). Criteria of outcome in schizophrenia. Hospital and Community Psychiatry, 11, 341-346. Pokorny, A.D., & Kaplan, H.B. (1976). Suicide following psychiatric hospitalization. Journal of Nervous and Mental Disease, 162, 119-125. Pull, C.B., & Overall, J.E. (1977). Adequacy of the Brief Psychiatric Rating Scale for distinguishing lesser forms of psychopathology. Psychological Reports, 40, 167-173. Raskin, A., & Crook, T.H. (1976). Sensitivity of rating scales completed by psychiatrists, nurses, and patients to antidepressant drug effects. Journal of Psychiatric Research, 13, 31-41. Rhoades, H.M., & Overall, J.E. (1988). The semi-structured BPRS interview and rating guide. Psychopharmacology Bulletin, 24, 101-104. Simpson, D.M., & Davis, G.C. (1985). Measuring thought disorder with clinical rating scales in schizophrenic and nonschizophrenic patients. Psychiatry Research, 15, 313-318. Small, J.G., Kellams, J.J., Milstein, V., & Moore, J. (1975). A placebo-controlled study of lithium combined with neuroleptics in chronic schizophrenic patients. American Journal of Psychiatry, 132, 1315-1317. Spitzer, R.L., Endicott, J., & Robins, E. (1978). Research Diagnostic Criteria: Rationale and reliability. Archives of General Psychiatry, 35, 773-791. Stavrakaki, C., Vargo, B., Boodoosingh, L., & Roberts, N. (1987). The relationship between anxiety and depression in children: Rating scales and clinical variables. Canadian Journal of Psychiatry, 32, 433-439. Swanson, C.L., Turetsky, B.I., Bilker, W., Gur, R. C., & Gur, R.E. (1997). Rater gender and symptom ratings of patient with schizophrenia. Schizophrenia Reseach, 24, 23. Tandon, R., Mann, N.A., Eisner, W.H., & Coppard, N. (1990). 
Effect of anticholinergic medication on positive and negative symptoms in medication-free schizophrenic patients. Psychiatry Research, 31, 235-241. Tarell, J.D., & Schulz, S.C. (1988). Nursing assessment using the BPRS: A structured interview. Psychopharmacology Bulletin, 24, 105-111.

< previous page

page_830

next page > Page 830

Thiemann, S., Csernansky, J.G., & Berger, P. A. (1987). Rating scales in research: The case of negative symptoms. Psychiatry Research, 20, 47-55. Thompson, P.A., Buckley, P.F., & Meltzer, H. Y. (1994) The Brief Psychiatric Rating Scale: Effect of scaling system on clinical response assessment. Journal of Clinical Psychopharmacology, 14, 344-346. Tuthill, E.W., Overall, J.E., & Hollister, L.E. (1967). Subjective correlates of clinically manifested anxiety and depression. Psychological Reports, 20, 535-542. Ward, L.C., & Dillon, E.A. (1990). Psychiatric symptom correlates of the Minnesota Multiphasic Personality Inventory (MMPI) Masculinity-Femininity Scale. Psychological Assessment, 2, 286-288. Westermeyer, J., & Neider, J. (1988). Social networks and psychopathology among substance abusers. American Journal of Psychiatry, 145, 1265-1269. Woerner, M.G., Alvir, J.M.J., Kane, J., Saltz, B. L., & Lieberman, J.A. (1995). Neuroleptic treatment of elderly patients. Psychopharmacology Bulletin, 31, 333-337. Woerner, M.G., Mannuzza, S., & Kane, J.M. (1988). Anchoring the BPRS: An aid to improved reliability. Psychopharmacology Bulletin, 24, 112-117. Wolkowitz, O.M., Breier, A., Doran, A., Kelsoe, J., Lucas, P., Paul, S.M., & Pickar, D. (1988). Alprazolam augmentation of the antipsychotic effects of fluphenazine in schizophrenic patients. Archives of General Psychiatry, 45, 664-671. Yenson, R., DiLeo, F.B., Rhead, J.D., Richards, W.A., Soskin, R.A., Turek, B., & Kurland, A.A. (1976). MDAAssisted psychotherapy with neurotic outpatients: A pilot study. Journal of Nervous and Mental Disease, 163, 233-245. Yesavage, J.A. (1984). Correlates of dangerous behavior by schizophrenics in hospital. Journal of Psychiatric Research, 18, 225-231. Yesavage, J.A., Becker, J.M.T., Werner, P.D., Patton, M.J., Seeman, K., Brunsting, D.W., & Mills, M.J. (1983). Family conflict, psychopathology, and dangerous behavior by schizophrenic inpatients. Psychiatry Research, 8, 271-280.

< previous page

page_830

next page >

< previous page

page_831

next page > Page 831

Chapter 27
The Outcome Questionnaire

Michael J. Lambert
Arthur E. Finch
Brigham Young University

As health care becomes more specialized it also becomes more expensive, particularly for inpatient services. As a result, American health care systems must become ever more cost conscious. In an effort to contain costs, third-party health care providers in the United States have attempted to reduce the unnecessary utilization of services through unit price containment, or as it is commonly called, managed health care. As managed health care practices continue to emerge within the mental health care field, these third-party providers must continually seek to measure the efficacy or inefficiency of various mental health care services across visits in an effort to maximize their service-to-cost ratio (Bloom, 1987; Brokowski, 1991; Richardson & Austad, 1991; Sabin, 1991). These efforts to contain costs while maintaining quality appear to be a worldwide phenomenon rather than one that is limited to the United States or to particular payer systems and views.

Outcome assessment continues to emerge in this country and across the world as a reliable means of defining treatment goal criteria and monitoring the efficacy of treatments (Ahmed & Smith, 1991; Lambert, 1983; Mirin & Namerow, 1991; Moses-Zirkes, 1993). However, this process of outcome measurement is complicated by the numerous and varied measures being created and used by researchers and clinicians (J.E. Froyd, Lambert, & E. Froyd, 1996; Lambert, Ogles, & Masters, 1992; Moses-Zirkes, 1993). Some measures are well suited to the task of assessing patient improvement and deterioration, whereas others have more limitations than advantages.
The managed health care arena requires that any outcome measure be easy to score, have a low cost per administration, be highly sensitive to changes in psychological distress over short periods of time, and tap a wide array of characteristics associated with mental health functioning (Lambert, Hansen, et al., 1996). In response to this need for valid measures of psychotherapy outcome in managed care, Lambert and associates developed the Outcome Questionnaire (OQ-45), a 45-item, self-report instrument that requires patients to rate their feelings on a 5-point Likert scale ranging from "never" to "almost always." The OQ-45 is also designed to assess common symptoms across a wide range of adult mental disorders and syndromes, including stress-related
illness and DSM-IV Axis V codes. In addition, the OQ-45 was designed to be used as a baseline screening instrument with application for gross treatment assignment decisions. However, it was not developed to be used as a diagnostic tool, a task better served by traditional diagnostic practices, including the use of more appropriate instruments such as the MMPI-2 (Lambert, Burlingame, et al., 1996).

Overview

Summary of Development

The selection of specific items for the OQ-45 was determined by several considerations. First, items were selected that addressed commonly occurring problems across a wide variety of disorders. Second, items needed to tap the symptoms that are most likely to occur across patients, regardless of their unique problems. Third, items needed to measure personally and socially relevant characteristics that affect the quality of life of the individual. Finally, the number of items was limited so that administration of the OQ-45 assists, rather than hinders, customary clinical practice. The length of the OQ-45 makes it tolerable to patients and suitable for repeated testing while providing clinicians with data that can be used for decision making (Burlingame, Lambert, Reisinger, Neff, & Mosier, 1995).

The development of items for the OQ-45 was based on the belief that three broad content areas are of critical importance in measuring patient status and psychotherapy outcome. These three areas are summarized by Lambert and Hill (1994) as reflecting "the need to evaluate changes that occur within the client, in the client's intimate relationships, and in the client's participation in community and social roles" (p. 79). These three content areas are the focus of items in the three subscales on the OQ-45: Symptom Distress, Interpersonal Relations, and Social Role.

Symptom Distress.
The Symptom Distress subscale (25 items) was derived from the 1988 National Institute of Mental Health (NIMH) study that identified the most prevalent types of mental disorders across five U.S. catchment areas, and from a review of 1992 Human Affairs International data on the most frequently diagnosed DSM-III-R diagnostic codes in their nationwide service area. The 1988 epidemiological study of 18,571 people across the United States showed that 15.4% of the population over age 18 fulfilled diagnostic criteria for a mental disorder. Approximately 12% of the total population received either an anxiety diagnosis or an affective disorder classification (Regier et al., 1988). The Human Affairs International data on diagnostic codes given to 2,145 patients reported nearly one third of the diagnoses given involved a form of affective disorder. An additional one third were based on some kind of anxiety disorder, including posttraumatic stress disorder. These data suggest that the most common intrapsychic symptoms to measure are depression- and anxiety-based, particularly when adjustment disorders are also taken into account. However, considerable research suggests that the symptoms of anxiety and depression cannot be easily separated, and that they tend to occur simultaneously and in a wide variety of patients (e.g., Feldman, 1993). Therefore, the OQ-45 is heavily loaded with such items, but no attempt has been made to provide separate scales for anxious and depressive symptomatology. In addition to symptomatology characteristic of anxiety and depression, the OQ-45 includes two items that screen for substance abuse.

Interpersonal Relations. The OQ-45 includes 11 items that measure satisfaction with, as well as problems in, interpersonal relations. Research on life satisfaction and quality of life suggests that people consider positive relationships essential to happiness (Andrews & Witney, 1974; Beiser, 1983; Blau, 1977; Deiner, 1984; Veit & Ware, 1983). Research on patients seeking therapy has shown that the most frequent problems addressed in therapy are interpersonal in nature (Horowitz, 1979; Horowitz, Rosenberg, Baer, Ureno, & Villasenor, 1988). Although factors associated with quality of life vary from study to study, most emphasize the importance of intimate relationships and their central contribution to well-being (Deiner, 1984; Zautra, 1983). In addition, interpersonal problems are clearly related to interpersonal distress, either as a direct cause or result of psychopathology, or as both a cause and a result (Horowitz et al., 1988). Therefore, items dealing with friendships, family life, and marriage are included for assessment. These include items that attempt to measure friction, conflict, isolation, inadequacy, and withdrawal in interpersonal relationships. These items were derived from the marital and family therapy literature, as well as from research on those interpersonal problems most often complained of by patients who are undergoing psychotherapy (Horowitz et al., 1991).

Social Role. Social role performance is assessed by focusing on the patient's level of dissatisfaction, conflict, distress, and inadequacy in tasks related to their employment, family roles, and leisure life (nine items). Assessment of social roles suggests that a person's intrapsychic problems and symptoms can affect their ability to work, love, and play.
This is supported by the quality of life research already discussed, as well as the rationale that once people start to develop symptoms it is common for these symptoms to have an effect on their personal and work lives (Frisch, Cornell, Villaneuva, & Retzlaff, 1992). Kopta, Howard, Lowry, and Beutler (1994) also suggested that these symptoms can exist somewhat independently of intrapsychic symptoms and subjective distress. Thus, items were developed that measure performance in societal tasks such as work and leisure. Satisfaction in these areas is highly correlated with ratings of overall life satisfaction (Beiser, 1983; Blau, 1977; Frisch et al., 1992; Veit & Ware, 1983).

Administration

The OQ-45 is self-administering and requires no instructions beyond those printed on the answer sheet. It is appropriate for persons who have a sixth-grade reading level and who are age 18 or older. Patients should be encouraged to complete all items. Because of the test's inherent face validity, subjects taking it can be affected by the attitudes of those who are in charge of the administration. It is very important for the test administrator to encourage the subject to fill out the scale in an honest and conscientious manner. As with many other self-report inventories, negative or biasing attitudes on the part of clinicians or others who administer the test can severely impair its validity.

Under typical circumstances, subjects will complete the scale in about 5 minutes. Some particularly careful individuals may require as much as 18 to 20 minutes, whereas others can complete the test in 3 to 4 minutes. If patients are unable to read or physically unable to write, or if the test is administered by phone in a follow-up study, the test can be administered by reading the items to patients. This is accomplished by giving patients a card with a 0 to 4 numerical scale (i.e., "never" to "almost always"),
or by asking them to write the scale out and refer to it while the administrator reads the items. The administrator then enters the item responses on the test blank. Little is known at this time about the differences in face-to-face versus phone administration of the OQ-45. Therefore, caution should be used in interpreting scores that are based on phone or other oral administration of the test.

Types of Available Norms

Normative data were drawn from several samples collected across a variety of geographical locations in the United States. These consisted of undergraduate student samples, community samples, and samples from a variety of business settings. The undergraduate samples from Utah, Idaho, and Ohio were tested in a classroom setting, with a proctor administering the tests to the students after reading the test directions and obtaining informed consent. In order to ensure candid responses, subjects' names were not associated with their tests. Retest administration, conducted 3 weeks following the initial testing period, followed the same procedure. The OQ-45 has also been administered in a classroom setting on a weekly basis over a 10-week interval. Stability coefficients based on the Pearson product-moment coefficient allow estimates of the reliability of testing performed on a weekly basis.

The community sample was drawn from a variety of locations. A subsample of 208 individuals was collected from Utah. Subjects were selected by choosing every 10th name in the local Utah County phone directory and were first contacted by phone. At that time, adults in the household were asked if they would fill out questionnaires in order to help the researchers understand the tests and how people respond to them. If they consented to participate, they were mailed questionnaires along with the consent form and a return envelope. After a week they were contacted by phone to see if they had complied. If they had not, they were encouraged to do so.
Responses were anonymous to encourage candid reporting.

Additional normative groups were collected from business settings. A large national insurance firm with 800 employees agreed to allow researchers to administer the OQ-45. A letter was sent under the signature of the primary researcher to each of the employees. The purpose of testing was explained, and employees were asked to complete the OQ-45 and return it in a self-addressed envelope. Completion of the test was voluntary, and employees were instructed not to provide their name or other identifying information. Of the 800 OQ-45s that were mailed out, 365 (45%) were returned. The same procedure was replicated in Ohio among a variety of business settings. The data collected from the various community and business locations were analyzed for differences using a one-way analysis of variance (ANOVA). As no significant differences were found, the community data were merged into one large sample of 815 subjects.

Data from the clinical samples were typically collected by clinic receptionists, who administered the OQ-45 prior to the patient's first therapy session and any subsequent therapy session. Included in the test packet was information pertaining to subject confidentiality, as well as a consent form. University counseling center data came from a counseling center at a large private western university. Student clients were included in the sample whether or not they received a DSM diagnosis. Employee assistance program (EAP) patients came from a database supplied by Human Affairs International. Members of the EAP patient sample sought or were referred for assistance, and they received a DSM-III-R diagnosis. EAP patients who came for help or were referred by
a supervisor but who were not diagnosed or treated for an emotional problem were excluded from the study. Also excluded were patients who were immediately referred for inpatient or outpatient treatment to outside providers. The EAP data summarizes responses from patients across seven different states. The outpatient clinic sample was drawn from a university-based outpatient clinic used to train clinicians in social work, clinical psychology, and marriage and family therapy. The mental health sample was drawn from an Ohio-based community mental health center serving a mostly rural catchment area. Inpatient data came from samples in Utah and Massachusetts. Data from the various clinical samples have been combined when the OQ-45 subscale scores between groups were comparable. At this point in time, normative data on the following samples have been analyzed: college undergraduates, community volunteers, university counseling center clients, employee assistance program patients, university outpatient clinic patients, outpatient private practice and clinic patients, community mental health center patients, and inpatients. Some of this normative data is presented in Table 27.1 for the total score and in Table 27.2 for the domain or subscale scores. These and related tables present OQ data in terms of raw score units. It is apparent from these tables that there are clear differences between the nonpatient samples and the patient samples in mean scores. Analysis of this data has been reported elsewhere (Lambert, Burlingame, et al., 1996; Umphress, Lambert, Smart, Barlow, & Clouse, 1997). Gender Differences. Comparisons between males and females within normal and patient samples indicate no significant difference between these groups. Unlike the MMPI and some other psychological tests, separate norms do not appear to be necessary. 
For the groups where gender data were available, it is apparent that no differences exist between the average scores of males and females (see Table 27.3). Inferential statistics (F test) confirm the obvious similarities between male and female OQ-45 scores. This is true in both patient and nonpatient samples. Thus, it does not appear to be necessary to have distinct male/female norms or interpretative graphs.

TABLE 27.1
Normative Groups for the OQ-45 Raw Total Score

Sample                            N     M       SD
Undergraduate Students (Utah)     235   42.15   16.61
Undergraduate Students (Idaho)    131   51.34   24.45
Undergraduate Students (Ohio)     172   45.63   18.06
Community                         815   45.19   18.57
EAP Clinical Services             441   73.61   21.39
University Counseling Center      486   75.16   16.74
Community Mental Health           342   83.09   22.23
Inpatient                         207   88.80   26.66

Note. From Administration and Scoring Manual for the OQ-45 (p. 6), by Lambert, M.J., Hansen, N.B., Umphress, V., Lunnen, K., Okiishi, J., Burlingame, G.M., & Reisinger, C.W. (1996). Copyright © 1996 by American Professional Credentialing Services LLC, Stevenson, MD. Reprinted with permission.

TABLE 27.2
Normative Groups for the OQ-45 Subscale Raw Scores

                                        Distress        Interpersonal   Social Role
Sample                            N     M       SD      M       SD      M       SD
Undergraduate Students (Utah)     235   22.96   10.48   8.78    4.97    10.40   3.62
Undergraduate Students (Idaho)    131   27.51   14.55   12.42   7.20    11.41   4.73
Undergraduate Students (Ohio)     172   25.20   11.04   10.30   5.33    10.13   3.69
Community                         815   25.43   11.55   10.20   5.56    9.56    3.87
EAP Clinical Services             441   42.87   14.33   17.15   6.05    13.77   4.90
University Counseling Center      486   41.28   14.53   18.57   4.28    14.64   3.96
Community Mental Health           342   49.40   15.05   19.68   5.93    14.01   5.30
Inpatient                         207   49.92   15.97   20.73   7.44    15.90   7.67

Note. From Administration and Scoring Manual for the OQ-45 (p. 6), by Lambert, M.J., Hansen, N.B., Umphress, V., Lunnen, K., Okiishi, J., Burlingame, G.M., & Reisinger, C.W. (1996). Copyright © 1996 by American Professional Credentialing Services LLC, Stevenson, MD. Reprinted with permission.

TABLE 27.3
OQ-45 Raw Scores for Normative and Clinical Samples by Gender

                                        Total Score     Distress        Interpersonal   Social Role
Sample                            N     M (SD)          M (SD)          M (SD)          M (SD)
Undergraduate                     238   42.33 (16.60)   23.08 (10.53)   8.95 (5.39)     10.37 (3.62)
  Male                            91    42.73 (15.89)   22.71 (10.07)   9.81 (6.24)     10.43 (3.63)
  Female                          147   42.10 (17.21)   23.43 (10.89)   8.31 (4.72)     10.35 (3.65)
Community                         102   48.16 (18.23)   25.73 (10.26)   10.81 (5.74)    9.81 (3.91)
  Male                            46    49.20 (17.59)   25.37 (9.70)    11.51 (5.83)    10.43 (3.39)
  Female                          56    48.43 (18.48)   26.52 (10.85)   10.52 (5.39)    9.48 (3.95)
EAP                               504   73.02 (21.05)   41.83 (14.15)   17.13 (6.03)    13.76 (4.83)
  Male                            198   73.52 (21.87)   41.64 (14.48)   17.49 (6.26)    14.41 (5.04)
  Female                          306   72.70 (20.70)   41.96 (13.96)   16.90 (5.88)    13.33 (4.64)
University Outpatient Clinic      76    78.01 (25.71)   42.88 (14.72)   17.25 (6.61)    14.24 (5.72)
  Male                            23    76.27 (26.53)   40.86 (15.08)   17.86 (6.42)    14.27 (5.75)
  Female                          53    81.82 (23.58)   45.34 (13.82)   17.80 (6.17)    14.70 (5.62)

Note. From Administration and Scoring Manual for the OQ-45 (p. 8), by Lambert, M.J., Hansen, N.B., Umphress, V., Lunnen, K., Okiishi, J., Burlingame, G.M., & Reisinger, C.W. (1996). Copyright © 1996 by American Professional Credentialing Services LLC, Stevenson, MD. Reprinted with permission.

Age Differences. The OQ-45 has been administered to adults between the ages of 17 and 80. Data at the upper end of the age continuum are not yet sufficient to draw firm conclusions, but the data analyzed up to this point do not suggest a significant correlation between age and OQ-45 scores. Exemplary data on this topic are presented in Table 27.4. These data are from the employee assistance program database (Lambert, Hansen, et al., 1996).

Ethnicity. The OQ-45 has been administered to adults of several racial groups. The data available from some racial groups are not yet sufficient to draw definite conclusions, but for members of the Caucasian, African American, and Hispanic racial groups EAP data suggest that there is no significant relationship between race and OQ-45 total score, or between race and any of the three subscale scores (see Table 27.5). The lack of significant differences on the total score and on each of the subscale scores also suggests that the OQ-45 does not over- or underpathologize African American or Hispanic racial groups. At the same time, it is important to note that much of the data has been drawn from working adults so that interactions between social class and ethnic identity have been minimized (Nebeker, Lambert, & Heufner, 1995).
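The claim that total scores do not differ across ethnic groups can be checked, at least roughly, from summary statistics alone. The following is a minimal sketch that computes a one-way ANOVA F ratio from the group sizes, means, and standard deviations reported for the Total Score in Table 27.5. The function name `anova_from_summary` is ours, the standard deviations are assumed to be sample standard deviations, and the published analysis may well have been carried out differently.

```python
def anova_from_summary(groups):
    """One-way ANOVA F ratio from (n, mean, sd) triples.

    Returns (F, df_between, df_within). Each sd is assumed to be a
    sample standard deviation (n - 1 in the denominator).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares, recovered from each group's variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Total Score columns of Table 27.5, as (N, M, SD).
table_27_5_total = [
    (1931, 63.9, 22.7),  # Caucasian
    (274, 64.7, 24.1),   # African American
    (36, 63.5, 22.7),    # Hispanic
    (37, 66.1, 21.0),    # Other
]

f_ratio, df_b, df_w = anova_from_summary(table_27_5_total)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

With these inputs the F ratio comes out well below 1 (about 0.2), far short of the value of roughly 2.6 needed for significance at the .05 level with 3 and 2,274 degrees of freedom. The same function can be applied to the male and female rows of Table 27.3.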

TABLE 27.4
OQ-45 Raw Scores by Age in a Sample of EAP Patients

                                        Symptom         Interpersonal   Social Role
                        Total Score     Distress        Relations       Performance
Age Range       N       M (SD)          M (SD)          M (SD)          M (SD)
<20             21      71.95 (22.72)   42.10 (15.41)   15.62 (6.56)    14.95 (4.73)
20-39           303     73.44 (21.08)   41.99 (14.02)   17.50 (6.06)    13.73 (4.76)
40-59           172     72.49 (21.49)   41.37 (14.46)   16.80 (5.84)    13.76 (4.82)
>60             8       71.75 (13.86)   45.37 (10.24)   14.50 (7.23)    11.38 (7.56)

Note. From Administration and Scoring Manual for the OQ-45 (p. 8), by Lambert, M.J., Hansen, N.B., Umphress, V., Lunnen, K., Okiishi, J., Burlingame, G.M., & Reisinger, C.W. (1996). Copyright © 1996 by American Professional Credentialing Services LLC, Stevenson, MD. Reprinted with permission.

TABLE 27.5
OQ-45 Raw Scores by Ethnicity in a Sample of EAP Patients

                                            Symptom       Interpersonal   Social Role
                              Total Score   Distress      Relations       Performance
Race                  N       M (SD)        M (SD)        M (SD)          M (SD)
Caucasian             1,931   63.9 (22.7)   35.6 (14.7)   16.0 (6.5)      12.1 (4.8)
African American      274     64.7 (24.1)   35.1 (14.8)   16.6 (6.8)      12.7 (5.4)
Hispanic              36      63.5 (22.7)   36.7 (13.8)   15.5 (6.7)      12.7 (5.0)
Other                 37      66.1 (21.0)   37.0 (12.4)   16.1 (5.5)      12.2 (5.0)

Note. From Administration and Scoring Manual for the OQ-45 (p. 7), by Lambert, M.J., Hansen, N.B., Umphress, V., Lunnen, K., Okiishi, J., Burlingame, G.M., & Reisinger, C.W. (1996). Copyright © 1996 by American Professional Credentialing Services LLC, Stevenson, MD. Reprinted with permission.

The Nebeker et al. (1995) data did reveal significant differences on certain items between Caucasian and African American patients who were undergoing treatment. The items from the Symptom Distress subscale that differed significantly were: "I am unable to keep disturbing thoughts out of my mind," "I have thoughts of ending my own life," "I am satisfied with my life," "After heavy drinking I need a drink the next morning to get going," "I tire quickly," and "I feel worthless."
From the Interpersonal Relations subscale the following were different: "I have frequent arguments," "I feel lonely," "I feel my love relationships are full and complete," and "I am satisfied with my relationships with others." From the Social Role subscale, only one item differed: "I feel that I am not doing well in work/school." African Americans showed a tendency to report more symptomatology on all of these items. For a more detailed analysis of these data, the reader should consult Nebeker et al. (1995).

An additional study was performed by Nebeker, Seely, Stephenson, and Lambert (1996) in an effort to explore the generalizability of the norms developed for the OQ-45 to different populations of the Pacific Rim, generally classified as Asians and Polynesians. The actual ethnicity of the sample included Caucasian, Japanese, Chinese, Korean, Fijian, Maori, Kiribati, Cook Islander, Filipino, Hawaiian, Samoan, and Tongan college students. For OQ-45 total score, Caucasians scored significantly lower than Asians and Polynesians, and Polynesians scored significantly lower than Asians. This finding of
ethnic differences is consistent with many other comparative studies of Asian and Caucasian populations. Examples of such differences include higher rates of expressed symptomatology and higher overall rates of psychopathology in Asian populations (D. Sue & S. Sue, 1987; S. Sue & D. Sue, 1974; Ying, 1988). Yet there are also contradictory findings indicating no ethnic differences on some of the same measures (Tsushima & Onorato, 1982). Others have urged cautious interpretation of such differences because the confounding linguistic and cultural considerations are too vast to account for adequately (Park, Upshaw, & Koh, 1988; D. Sue & S. Sue, 1987; Williams, 1987).

The findings of Nebeker et al. (1996) suggest that with persons of Asian ancestry, particularly recent immigrants and their families, scores on the OQ-45 should be interpreted with caution. Normative sampling of clinical and asymptomatic Asian samples needs to be performed to determine reliable change indices and clinical significance cutoff scores for these populations. Until such data are obtained, clinicians and third-party providers using the OQ-45 should remember the following: There may be a response bias toward endorsing negative items and denying positive items; the collectivist heritage of many Asian respondents may clash with the individualistic questions of the OQ-45; and although some evidence suggests higher rates of psychopathology in Asian populations, elevations in OQ-45 scores should be interpreted in light of the specific subject's linguistic and cultural background. Together these factors may result in elevated scores and reduced internal and external validity when using the OQ-45 with Asian populations, particularly those with less exposure to and experience with Western culture. In testing with Polynesian populations, clinicians and third-party providers should be aware that there also may be elevated scores resulting from additional family pressures.
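The reliable change indices and cutoff scores mentioned above are often derived with the approach usually attributed to Jacobson and Truax (1991), which this section does not spell out; the sketch below assumes that approach. The means and standard deviations are the Community and EAP Clinical Services rows of Table 27.1, the reliability value of .84 is a purely hypothetical placeholder, and the function names are ours. The resulting numbers are illustrative only and are not the cutoffs published for the OQ-45.

```python
from math import sqrt


def clinical_cutoff(mean_clin, sd_clin, mean_norm, sd_norm):
    """Cutoff between a clinical and a normative distribution.

    Each mean is weighted by the other group's standard deviation,
    so the cutoff sits closer to the group with the smaller spread.
    """
    return (sd_clin * mean_norm + sd_norm * mean_clin) / (sd_clin + sd_norm)


def reliable_change_threshold(sd, reliability, z=1.96):
    """Smallest pre-post difference unlikely to reflect measurement error alone."""
    se_measurement = sd * sqrt(1.0 - reliability)
    se_difference = sqrt(2.0) * se_measurement
    return z * se_difference


# Raw total scores from Table 27.1.
COMMUNITY_MEAN, COMMUNITY_SD = 45.19, 18.57
EAP_MEAN, EAP_SD = 73.61, 21.39
ASSUMED_RELIABILITY = 0.84  # hypothetical value, chosen only for illustration

cutoff = clinical_cutoff(EAP_MEAN, EAP_SD, COMMUNITY_MEAN, COMMUNITY_SD)
threshold = reliable_change_threshold(COMMUNITY_SD, ASSUMED_RELIABILITY)
print(f"illustrative cutoff: {cutoff:.2f}")
print(f"illustrative reliable change threshold: {threshold:.2f}")
```

A normative study in a new population would supply its own means, standard deviations, and reliability estimate to the same two functions.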
Again, these differences should be interpreted with caution as it is currently unclear whether such elevations indicate higher prevalence of these problems, or linguistic and cultural factors affecting the reporting of such issues. The Nebeker et al. (1996) study concluded that in spite of significant differences in Total OQ-45 scores and response patterns, the OQ-45 can still be a helpful measure for tracking psychotherapeutic outcome within ethnic populations. Because the OQ-45 was designed to measure clinical change resulting from therapy, subjects' scores from repeated administrations of the OQ-45 should be highly related to each other and provide idiographic validity in within-subject designs. Cutoff scores specific to particular ethnic populations will need to be developed. In spite of varied racial and cultural response sets and a lack of race-specific norms, the OQ-45 is probably still capable of providing meaningful psychotherapeutic outcome data. When clinicians and third-party payers practice acceptable standards of care and base treatment decisions on comprehensive data from multiple sources (including cultural factors and individual psychosocial history), then repeated administrations of the OQ-45 should provide, at the very least, an adequate marker of the direction of movement during the course of treatment. Reliability of the OQ Both test-retest and internal consistency reliability have been assessed using various subsamples of undergraduate students. Internal consistency has also been calculated on the EAP patients. Both the total score of the OQ-45 and the Symptom Distress subscale have demonstrated excellent internal consistency (above .90). More heterogeneity has


been found in the Interpersonal Relations and Social Role Content domains (.70 to .74), which is not surprising given the breadth of functioning that these latter subscales attempt to assess. Undergraduates who retook the OQ-45 under similar circumstances at 7-day intervals after the original administration established retest coefficients that ranged from .66 to .86. All subscale and total scores appeared to be temporally stable (Lambert, Burlingame, et al., 1996; Lambert, Hansen, et al., 1996).

Validity

Validity data relating to the OQ-45 are organized into two sections: concurrent validity, in which the OQ-45 is correlated with measures of similar variables; and construct validity, in which the OQ-45 is analyzed for its ability to discriminate between levels of psychopathology, its sensitivity to change, its factor structure, and the relation of the subscales to the total score.

Concurrent Validity. Concurrent validity was estimated across various samples studied over a 3-year period by calculating Pearson product-moment correlation coefficients (J. Cohen & P. Cohen, 1983) on the OQ-45 total score and individual subscale scores with a variety of measures thought to assess similar constructs. The initial validity studies were conducted on undergraduate students who were not in treatment. Most of the validity data have been reported in detail in two publications (Lambert, Burlingame, et al., 1996; Umphress et al., 1997). After the initial validity data were collected, a large-scale validity study was completed involving three clinical samples (Umphress et al., 1997). These clients were volunteers who agreed to participate in the protocol without remuneration. Subjects included individuals from the local community and patient samples drawn from a university-affiliated community mental health clinic, a college counseling center, and an inpatient psychiatric unit.

The community sample comprised 210 persons chosen randomly from the 1994 telephone directory of a county in a western state. This sample had a mean age of 45.5 years with a standard deviation of 17.3 years, and ages ranged from 18 to 94. The composition of the sample was 43.8% male and 56.1% female, and 95.7% Caucasian, 0.5% American Indian, 1.4% Hispanic, 1.0% Asian, and 1.4% other. Ninety-seven percent of the sample had completed high school.

The counseling center sample included 53 subjects from a large private western university. Patients averaged 21.6 years of age with a standard deviation of 3.4 years. Around 23.7% were male and 77.3% female. Approximately 92.5% were Caucasian, 3.8% Hispanic, 1.9% Asian, and 1.9% other. About 18.9% of the counseling center clients were given a DSM-III-R Axis V code diagnosis. Over 9.4% were given a mood disorder diagnosis, 11.3% an anxiety disorder diagnosis, and 9.4% other miscellaneous diagnoses. Forty-three percent were not diagnosed because the patient failed to return after the intake interview. About 7.5% did not receive any diagnoses because they were judged to not meet the criteria for any DSM-III-R mental disorder or Axis V code (i.e., they had a problem requiring treatment but not meeting criteria for any specific mental disorder).

The outpatient university community training clinic sample included 106 subjects. Patients averaged 30.6 years of age with a standard deviation of 10.61 years. The composition of this sample included 37.7% male and 62.3% female subjects. About 92.4% were Caucasian, 3.8% Hispanic, 1% Black, 1% Native American, and 1.9% Asian. Ninety-two percent were high school graduates. Approximately 32.1% were diagnosed


with a DSM-III-R mood disorder, 30.2% with an Axis V code problem, 11.3% with an anxiety disorder, 6.6% with an adjustment disorder, 3.8% with undifferentiated attention deficit disorder, 2.8% with schizophrenia, and 4.7% with some other mental disorder. Almost 5% were undiagnosed.

The inpatient sample comprised 24 subjects. The mean age was about 32.5 years with a standard deviation of 8.55 years. About 41.6% were male, and 58.4% were female. All were Caucasian with the exception of one Native American. Seventy-nine percent were high school graduates. Over 79.1% of these patients were diagnosed with a major mood disorder such as bipolar disorder or major depression. About 12.5% received substance abuse or dependence diagnoses. The remaining 8.4% received primary diagnoses for schizophrenia or a personality disorder.

The OQ-45 and several symptom distress measures were administered to each of the four samples. The other measures included the SCL-90-R, a 90-item self-report questionnaire assessing common psychiatric symptoms; the Beck Depression Inventory (BDI), a 21-item questionnaire that was developed through clinical observation of 21 attitudes and symptoms common to depressed psychiatric clients (Beck, Steer, & Garbin, 1988; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961); the State-Trait Anxiety Inventory (STAI), Form Y, a 40-item test for measuring anxiety divided into two 20-item parts (Y-1, for assessing "state" anxiety; and Y-2, for assessing "trait" anxiety); the Zung Self-rating Depression Scale (ZSDS), a widely used 20-item questionnaire assessing the frequency of depressive symptoms, including pervasive affect, physiological symptoms, and psychological symptoms (Zung, 1965); the Zung Self-rating Anxiety Scale (ZSAS), a 20-item self-report instrument based on diagnostic criteria for anxiety (Zung, 1971); and the Taylor Manifest Anxiety Scale (TMAS), consisting of 50 anxiety-related items statistically analyzed and found to be the most indicative of manifest anxiety (Lambert, 1983).

Two other instruments were used for the purpose of this study. One was the Inventory of Interpersonal Problems (IIP), a 127-item self-report scale designed to measure the type of interpersonal difficulties patients experience, as well as the corresponding degree of discomfort (Horowitz et al., 1988). The second was the Social Adjustment Scale-Self-report (SAS-SR), a 42-item self-report form covering seven role areas (work, as a worker, homemaker, or student; social and leisure activities; relationships with extended family; marital role; parental role; and family unit role) and the individual's frictions, negative feelings, and satisfaction about the given roles. The SCL-90-R, IIP, and SAS-SR were also recommended to be part of the "core battery" of psychotherapy outcome instruments by experts at the 1994 American Psychological Association Conference held at Vanderbilt University (Horowitz, Strupp, Lambert, & Elkin, 1997).

The community sample was obtained by calling a random sample of residents from a county of about 200,000, using the local telephone directory. Three hundred individuals were originally contacted, and 209 returned completed packets, a return rate of 70%. Mental health and counseling centers participating in this study consisted of a university outpatient community clinic, a university counseling center, and an inpatient unit in a large western city. The return rates were 42.4%, 53%, and 24%, respectively. All scales were given to patients during the intake process prior to treatment to control for treatment effects. Although participants were instructed to complete the scales in the order assigned, no procedures were implemented to ensure that they filled out the scales in the prescribed order.

Assessment of the concurrent validity of the OQ-45 involved comparing each of the OQ-45 subscales (i.e., Symptom Distress, Interpersonal Relations, and Social Role


Functioning) with their respective criterion measure counterpart: the SCL-90-R, IIP, and SAS-SR. Comparisons between each of the subscales and the other criterion measures were also conducted to assess whether each of the OQ-45 subscales correlated highest with its respective criterion measure. Correlation coefficients were derived separately for each patient sample. The statistic used for this comparison was the Pearson product-moment correlation coefficient.

Pearson correlations indicate high validity coefficients between the OQ-45 scores (total, SD, IR, SR) and the criterion measures (SCL-90-R, IIP, SAS-SR) across various clinical samples, suggesting high convergent validity (see Table 27.6). On average, correlations between the OQ-45 total score and the criterion measures, as well as correlations between the Symptom Distress (SD) subscale and the criterion measures, were higher than correlations between the other two OQ-45 subscales (Interpersonal Relations, Social Role) and the criterion measures (t = 3.46 or higher, df = 180; J. Cohen & P. Cohen, 1983). Although the total score and the Symptom Distress score manifested higher correlations compared to the other two OQ-45 subscales, the expected pattern of high correlations with the matched criterion measure did not occur. For example, the Interpersonal Relations subscale correlated .64 with the IIP, but .63 with the GSI, and .62 with the SAS in the community clinic sample.

It appears from these data (in combination with those collected from college students) that the OQ-45 has high to moderately high concurrent validity with a wide variety of measures that are intended to measure similar variables. Correlations are strongest with the total score. Clinicians can be confident that the OQ-45 total score provides an index of mental health, one that correlates quite highly with a variety of scales intended to measure symptom clusters of anxiety, depression, quality of life, social adjustment, and interpersonal functioning.

TABLE 27.6
Concurrent Correlations for the Outcome Questionnaire Total and Domain Scores with Three Frequently Used Measures of Treatment Outcome

Criterion Measure             Symptom    Interpersonal   Social   OQ
                              Distress   Relations       Role     Total
Counseling Center (N = 53)
  GSI                         .82        .45             .55      .78
  IIP                         .60        .49             .63      .66
  SAS-SR                      .75        .53             .73      .79
Community Clinic (N = 106)
  GSI                         .84        .63             .55      .84
  IIP                         .70        .64             .55      .74
  SAS-SR                      .65        .62             .57      .71
Inpatient Unit (N = 24)
  GSI                         .92        .69             .51      .88
  IIP                         .86        .57             .54      .81
  SAS-SR                      .79        .69             .53      .81

Note. From "Concurrent and Construct Validity of the Outcome Questionnaire," by Umphress, V.J., Lambert, M.J., Smart, D.W., Barlow, S.H., & Clouse, G., 1997, Journal of Psychoeducational Assessment, 15, p. 47. Copyright © 1997 by Psychoeducational Assessment Corporation. Reprinted with permission. All correlations are significant at the .01 level of confidence. Pearson correlations between the criterion measures were calculated on the three combined samples and were GSI and IIP = .73, GSI and SAS = .69, and IIP and SAS = .70. GSI = Symptom Checklist 90-Revised Global Severity Index. IIP = Inventory of Interpersonal Problems. SAS-SR = Social Adjustment Rating Scale-Self-report version.

The status of the three subscales is less certain. The Symptom Distress subscale correlates very highly with symptomatic disturbance (typically in the


mid-eighties). Both the Interpersonal Relations and Social Role subscales show modest correlations (.60s) with symptomatic scales, as well as with scales aimed at measuring problems in other areas of functioning. Construct Validity. To assess the discriminant validity of the OQ-45, the community and patient samples were compared to determine the OQ-45's sensitivity to psychopathology between various normal and patient groups. The statistics for this consisted of a single-factor ANOVA conducted on the total OQ-45 score and each of the subscales separately (Keppel, 1982). The analysis on the total OQ-45 scores and the symptom subscale scores found significant differences among groups. Pairwise comparisons for the total scores found that individuals from the community sample scored significantly lower on the OQ-45 total score than each of the patient samples. Additionally, there were significant differences between the patient samples. The inpatient unit sample mean OQ-45 total score was significantly higher when compared with any of the other patient sample means. With the exception of the inpatient sample, the outpatient community clinic mean scores were significantly higher than any of the other samples. Similar results were found for the subscales, with one exception. On the Social Role subscale, the university counseling center and community clinic samples did not differ significantly. All other site comparisons differed on this subscale (see Table 27.7). Further support for the construct validity of the OQ-45 was assessed by examining the community clinic data to see if significant score differences were evident between patients diagnosed with a DSM disorder and those assigned a V-code. Only this site was chosen for such a comparison because the other patient samples lacked a sufficient number of clients diagnosed with an Axis V code problem. 
Diagnosed patients' OQ-45 total and subscale scores prior to their first session were compared with V-code patients' scores prior to their first session. Diagnoses were made independent of information about OQ-45 scores. The diagnoses were made by the clients' therapists and confirmed by supervisors, but no reliability data were collected nor were any standard diagnostic interviews used. An independent groups t test was conducted with each of the subscales and the total score. Significant differences between groups were found for the total OQ-45 score and two of the three subscales (see Table 27.8). Differences between groups were not detected on the Interpersonal scale. Five patients were not diagnosed and therefore the total number included in this comparison is 101 rather than 106.

Intercorrelations were also computed between the various patient sample OQ-45 subscale scores to assess the scales' independence from one another. It was hypothesized that if each subscale measured a unique domain, correlations between them would be low. Table 27.9 shows that all correlations (except one) were highly significant, with significant overlap between subscales.

Sensitivity and Specificity

As further evidence of the OQ-45's construct validity, its sensitivity and specificity were assessed. Because the OQ-45 may be used on occasion to screen medical patients as well as other samples of interest, it seemed important to document its usefulness in this task. Sensitivity is the proportion of the "true positives" that are correctly identified. The sensitivity of the OQ-45 is .84, which means that 84% of the true members of the patient group were properly classified as patients and 16% were misclassified (put in the


TABLE 27.7
OQ-45 Total and Subscale Raw Scores Across a Nonpatient Sample and Three Clinical Samples

                 Community            Counseling Center    Community Clinic     Inpatient Unit
Domains          M     (N)    SD      M     (N)    SD      M     (N)    SD      M     (N)    SD      F*
Total            42.5  (210)  17.3    67.6  (53)   20.7    80.8  (106)  26.5    99.9  (24)   28.7    110.8
Symptom          22.6  (210)  10.1    35.5  (53)   11.9    43.3  (106)  15.7    53.5  (24)   18.5    90.7
Interpersonal    9.3   (210)  5.2     15.9  (53)   5.6     19.2  (106)  7.5     23.2  (24)   7.0     87.2
Social Role      8.9   (210)  3.6     13.2  (53)   4.8     14.5  (106)  5.2     18.2  (24)   4.6     60.5

Note. From "Concurrent and Construct Validity of the Outcome Questionnaire," by Umphress, V.J., Lambert, M.J., Smart, D.W., Barlow, S.H., & Clouse, G., 1997, Journal of Psychoeducational Assessment, 15, p. 47. Copyright © 1997 by Psychoeducational Assessment Corporation. Reprinted with permission. Sample sites are ranked from lowest to highest: 1 = Community, 2 = Counseling Center, 3 = Community Clinic, 4 = Inpatient Unit.
* p < .001 for all scores.


TABLE 27.8
OQ-45 Total Raw Scores for Patients Diagnosed with a DSM Disorder Versus a V-Code Problem

Variable                   Group     Number   M      SD     df   t value
Total score                DSM       69       85.3   24.8   95   3.48*
                           V-Code    32       66.2   24.0
Symptom subscale           DSM       69       46.4   14.7   95   3.79*
                           V-Code    32       34.0   14.4
Interpersonal subscale     DSM       69       19.3   7.4    95   0.67
                           V-Code    32       18.2   7.5
Social role subscale       DSM       69       15.3   4.9    95   3.48*
                           V-Code    32       11.6   4.0

Note. From "Concurrent and Construct Validity of the Outcome Questionnaire," by Umphress, V.J., Lambert, M.J., Smart, D.W., Barlow, S.H., & Clouse, G., 1997, Journal of Psychoeducational Assessment, 15, p. 48. Copyright © 1997 by Psychoeducational Assessment Corporation. Reprinted with permission.
* t scores are significant at the .001 level.

TABLE 27.9
Intercorrelations Among the OQ-45 Subscales and Total Scores Across Three Clinical Samples

Subscale              IR       SR       Total
Counseling Center
  SD                  0.56*    0.74*    0.96*
  IR                           0.42*    0.72*
  SR                                    0.81*
Community Clinic
  SD                  0.66*    0.65*    0.96*
  IR                           0.47*    0.80*
  SR                                    0.75*
Inpatient Unit
  SD                  0.75*    0.67*    0.98*
  IR                           0.39*    0.83*
  SR                                    0.72*

Note. From "Concurrent and Construct Validity of the Outcome Questionnaire," by Umphress, V.J., Lambert, M.J., Smart, D.W., Barlow, S.H., & Clouse, G., 1997, Journal of Psychoeducational Assessment, 15, p. 48. Copyright © 1997 by Psychoeducational Assessment Corporation. Reprinted with permission.
* Coefficients significant at the .001 level.

nonpatient group) using the cutoff raw score of 63 (i.e., the cutoff score for identifying clinically significant change to be discussed shortly). Specificity is the proportion of "true negatives" that are correctly identified. The specificity of the OQ-45 is .83, meaning that 83% of the true members of the nonpatient group were placed in the nonpatient group using the cutoff score of 63. Along these


same lines, Table 27.10 clearly demonstrates that OQ scores are reasonably distinct across patient samples.

TABLE 27.10
Comparison of Level of Psychopathology as Measured by the OQ-45 Across Patient and Nonpatient Samples

Comparison Group               N     M (SD)           t (df) Value
Undergraduate                  438   46.49 (19.82)    1.15 (1251)
Community                      815   45.19 (18.57)

Community                      815   45.19 (18.57)    24.52* (1254)
Employee Assistance Program    441   73.61 (21.39)

Employee Assistance Program    441   73.61 (21.39)    6.05* (781)
University Outpatient Clinic   342   83.09 (22.23)

Note. From Administration and Scoring Manual for the OQ-45 (p. 15), by Lambert, M.J., Hansen, N.B., Umphress, V., Lunnen, K., Okiishi, J., Burlingame, G.M., & Reisinger, C.W. (1996). Copyright © 1996 by American Professional Credentialing Services LLC, Stevenson, Md. Reprinted with permission. F Ratio = 274.2196 (significant, p < .001).
* Significant beyond the .001 confidence level.

Clinically Significant Change: A Central Method for Interpretation of the OQ-45

A cutoff score has been devised between the community sample and several of the clinical samples, as this seems the most logical place to compare individuals for treatment outcome. University students were not included in this comparison as they may not be reflective of the general community population. EAP clinical data, university counseling center data, and inpatient data were not included either, as these groups do not seem reflective of the typical treatment population. Cutoffs for the OQ-45 total raw score and the subscale raw scores are as follows: Total: 63; Symptom Distress: 36; Interpersonal Relations: 15; and Social Role: 12. These cutoff values may then be used to assess client distress levels as well as monitor progress across sessions. The formula used to derive these cutoffs is:
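Jacobson and Truax (1991) define the cutoff between two overlapping distributions as c = (s_f × M_d + s_d × M_f) / (s_f + s_d), where M and s are the mean and standard deviation of the functional (f) and dysfunctional (d) samples. The following is a minimal sketch that assumes this expression is the one intended here. The function name is hypothetical, and the inputs are the community and university outpatient clinic rows of Table 27.10, used only for illustration; the text does not state exactly which sample statistics produced the published cutoffs.

```python
def jt_cutoff(mean_functional, sd_functional, mean_dysfunctional, sd_dysfunctional):
    """Jacobson-Truax cutoff between a functional and a dysfunctional
    distribution: each mean is weighted by the other group's SD."""
    numerator = (sd_functional * mean_dysfunctional
                 + sd_dysfunctional * mean_functional)
    return numerator / (sd_functional + sd_dysfunctional)


# Illustrative inputs (assumption): total-score means and SDs for the
# community and university outpatient clinic samples in Table 27.10.
cutoff = jt_cutoff(
    mean_functional=45.19, sd_functional=18.57,
    mean_dysfunctional=83.09, sd_dysfunctional=22.23,
)
print(round(cutoff, 2))
```

With these inputs the result is about 62.4, close to the published total-score cutoff of 63; the small gap may reflect rounding or slightly different sample statistics.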

It is recommended that the cutoff scores found within the manual be used for most general applications because they are based on large, diverse, and relatively representative samples. However, if special populations are being assessed, it may be more appropriate to construct new normative samples and compute new cutoffs. Similarly, a reliable change index (RCI) has been derived between the community and community mental health samples. The RCI index is used to determine whether the change exhibited by an individual in treatment is reliable, or clinically significant (Jacobson & Truax, 1991). In order for an individual's score to be considered clinically significantly changed, it must cross the cutoff score and have a magnitude greater than the RCI. The RCI value that has been computed between the community and outpatient clinic samples is 14, meaning that an individual's Total score must change by at least 14 points on the OQ-45 and cross the cutoff score of 63 to be considered clinically


significantly changed. The RCIs for each of the subscales are as follows: Symptom Distress, 10; Interpersonal Relations, 8; and Social Role, 7. The formula for computing the RCI is:
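Jacobson and Truax (1991) define the index as RC = (x2 − x1) / S_diff, with S_diff = sqrt(2 × SE²) and SE = s × sqrt(1 − r), where s is the standard deviation of the normative sample and r is the reliability of the measure; a change is treated as reliable at the .05 level when the absolute value of RC exceeds 1.96. The following is a minimal sketch that assumes this definition is the one intended here. The function names and category labels are hypothetical, and the standard deviation and reliability passed in at the end are placeholder values, not the figures behind the published RCI of 14. The second function combines the two stated requirements (a change of at least the RCI and crossing the cutoff of 63) into one classification.

```python
import math


def reliable_change_threshold(sd, reliability, z=1.96):
    """Smallest raw-score change that exceeds measurement error at the
    two-tailed .05 level, following Jacobson and Truax (1991)."""
    standard_error = sd * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0 * standard_error ** 2)
    return z * s_diff


def classify_change(pre, post, cutoff=63, rci=14):
    """Classify a pre/post total-score pair using the cutoff and RCI
    given in the text. Lower OQ-45 scores indicate less distress.
    The category labels are illustrative, not taken from the manual."""
    change = post - pre
    if change <= -rci and pre >= cutoff and post < cutoff:
        return "clinically significant change"
    if change <= -rci:
        return "reliable improvement"
    if change >= rci:
        return "reliable deterioration"
    return "no reliable change"


# Placeholder inputs (assumption), chosen only to exercise the functions.
threshold = reliable_change_threshold(sd=20.0, reliability=0.90)
label = classify_change(pre=85, post=60)
```

Under this sketch, a patient who moves from 85 to 60 has both changed by more than 14 points and crossed below 63, so the pair is labeled as clinically significant change; a move from 85 to 70 would count only as reliable improvement.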

As with the cutoff score, the use of the RCI presented here is recommended for most general purposes because it is based on large and diverse normative samples. If specialized or more specific RCI values are desired, appropriate norms can be gathered and new RCI values can be derived. The position of normative groups along the continuum of OQ scores and in relation to the cutoff is illustrated in Fig. 27.1.

Basic Interpretive Strategy

To use the OQ-45 clinically, the clinician should consider three elements: the subjects' answers to certain select items, the total score, and the subscale scores. Interpretive graphs based on raw scores are included with the manual for plotting the total and subscale scores.

Fig. 27.1. Means, standard deviations, and 95% confidence intervals for the OQ-45 raw scores. Reprinted by permission of Lambert, M.J., & Huefner, J. APA Workshop, August 1997, Chicago.


Item Evaluation. The clinician should first consider patient ratings on certain select items (critical items). Item 8 is a suicide potential screening item that should be investigated further if the subject gives any rating higher than 0. Items 11, 26, and 32 are substance abuse items and also should be investigated further if ratings other than 0 (never) are given. Item 44 screens for violence at work: Any rating other than 0 (never) should be investigated for the possibility of current and/or future work conflicts that may lead to violent acts against fellow employees.

Total Score (TOT). A high total score suggests that the patient is admitting to a large number of symptoms of distress (mainly anxiety, depression, somatic problems, and stress), as well as interpersonal difficulties and difficulties in their social role (such as work) and quality of life. In general, lower scores suggest that the patient is no more disturbed or distressed than the general population. An effective way to use the OQ-45 in clinical settings is to compare the patient's score with different normative samples. Ideally, normative data from inpatients, outpatients, community, and asymptomatic individuals would be available. At this time, only cutoff scores comparing patient and nonpatient samples are available for the OQ-45. Cutoff scores for the total score and subscale scores were derived using the procedures suggested by Jacobson and Truax (1991). As indicated on the total score graph, the cutoff for entering the community population has been set at 63. When a patient's score falls below 63, it is more likely that they are part of the community sample than the patient sample. In addition, when a patient's score changes by 14 points or more in either direction from pretest, this change is said to be reliable. That is, changes of 14 points or more suggest movement by the patient that reliably (p < .05) exceeds the measurement error of the OQ-45.

Subscale Scores. To identify specific areas of difficulty, subscale scores can be consulted. The OQ-45 yields scores on three subscales: Symptom Distress, Interpersonal Relations, and Social Role. It is not possible for a patient to have a high total score without also having high subscale scores. On the other hand, a low total score does not mean that the patient does not have problems in one or more subscale domains.

Research suggests that the most common disorders are anxiety disorders, affective disorders, adjustment disorders, and stress-related illness. The Symptom Distress (SD) subscale is composed of items that have been found to reflect the symptoms of these disorders, with a high score indicating that patients are bothered by these symptoms and low scores indicating either absence or a denial of the symptoms. Symptom scores correlate highly with measures of depression, such as the BDI. They also correlate highly with measures of anxiety, such as the STAI. The cutoff for this subscale was derived by the same method used for the Total score cutoff. As noted, the cutoff for Symptom Distress is 36. When a subject's score falls below this point, they are scoring similarly to those people who made up the nonpatient sample.

Research suggests that most patients experience difficulty in interpersonal relationships. In addition to the subjective discomfort reflected in the Symptom Distress subscale, patients complain of problems with their intimate relationships. Interpersonal Relations (IR) items assess complaints such as loneliness, conflicts with others, and family and marital problems. High scores suggest difficulties in those areas, whereas low scores suggest both the absence of interpersonal problems and satisfaction with the quality of intimate relationships. Scores below the cutoff of 15 suggest that the patient is experiencing a level of satisfaction in relationships that is equivalent to normal functioning.


Dysfunction may extend beyond individuals' subjective sense of discomfort and beyond their closest relationships into the behaviors that are commonly expected to be manifested by adults in our society. The Social Role (SR) subscale measures the extent to which difficulties in the social roles of worker, homemaker, or student are present. Conflicts at work, overwork, distress, and inefficiency in these roles are assessed. High scores indicate difficulty in social roles, whereas low scores indicate adequate social role adjustment. Additional attention should be given to low scores to determine whether they result from social role satisfaction or from subject unemployment (e.g., the subject arbitrarily marking the items "0" for never or not applicable). The cutoff score for SR is 12.

Use of the OQ-45 For Treatment Planning

The OQ-45 can be used in treatment planning if it is employed with other patient data. Human Affairs International (HAI), a large multistate managed care company, has used the OQ-45 total score at the inception of treatment to assist clinicians in initial level of care decisions. Their system is proprietary and so specific details cannot be offered. However, the generalities of procedures can be explained. HAI's system uses the OQ-45 intake score of either high, medium, or low to sort clients into categories. Other patient information, such as history of psychological treatment (e.g., no history of psychological treatment, recent inpatient care), is also used to categorize patients. Several other variables are used to categorize patients, and then all of these data are combined through algorithms to produce computer-generated suggestions to clinicians and care managers for treatment planning or referral.
Based on the picture of the patient that is collected at intake, some patients are retained in a brief therapy format (1 to 8 sessions) and others are referred for longer term outpatient treatment, medication consultation, substance abuse interventions, group therapy, and the like. Thus, the OQ-45 plays an important role in such decisions. In this context, it is considered an index of current psychopathology to be used in conjunction with clinical judgments, diagnostic formulations, and related information. As therapy continues, changes in OQ-45 scores (using intake as the baseline) are used (in conjunction with other information) to form additional algorithms for treatment planning decision making regarding the patient. For example, changes in OQ-45 scores can be used to trigger decisions regarding termination, step down to less intensive and costly treatments, or shifting to other alternate treatments such as medication. In addition, the early discovery of negative change can be highly helpful in sparking reviews of current treatment strategies, thus preventing or reducing patient drop out as well as ultimate negative effects from treatment. Informal evidence to date suggests that the best predictor of drop out from outpatient treatment, as well as ultimate patient outcome, is negative change from intake to session 3. Considerable research is necessary before it will be possible to be confident in the use of the OQ-45 for such decision making, because decisions may need to be based on the degree of acceleration in change not just its direction. This is discussed more fully when the issue of tracking patient progress is addressed. In addition to using the OQ-45 as part of the process of initial decision making, the OQ-45 can be used to help focus the treatment on specific aspects or patient difficulties. 
Although validity data do not provide strong support for the use of OQ-45 subtest scores, these scores can provide the clinician with clues about areas of dysfunction. Some patients,


for example, may express greater distress related to interpersonal functioning and others may appear to have greater dysfunction in social role performance than symptomatic discomfort. Occasionally, glancing at a patient's profile scores on each of the subscales provides a dramatic illustration of poor functioning in a particular domain.

At present, the OQ-45 has not been studied empirically with regard to its clinical use in treatment planning by individual clinicians. It was designed to serve the purpose of measuring patient progress and eventual outcome of mental health services. It is possible that certain patterns of OQ-45 responses may coincide with a specific symptomatic presentation, but it would be difficult to justify the use of such patterns as a guide for treatment planning. The OQ-45 may provide valuable feedback on patient progress in evaluating treatment efficacy and in deciding whether to terminate or continue a current treatment protocol, but it is simply not capable (by itself) of leading an individual therapist to the most productive treatment strategy. The OQ-45 is an outcome instrument in the same manner that the MMPI-2 is a diagnostic tool. Both are invaluable within their specific arena, but much less effective beyond those boundaries.

Use of the OQ-45 For Treatment Monitoring

Purpose of Treatment Monitoring

There are numerous reasons and motivations for monitoring the progress of psychotherapeutic interventions. Perhaps the most currently salient is the use of treatment monitoring within the parameters of managed health care. Treatment outcome or monitoring information becomes vital for determining initial approval of treatment plans, allotment of an original number of sessions, and the provision and justification of additional sessions. Additional reasons for treatment monitoring may include research evaluations, therapeutic validity testing, supervisory feedback in training settings, evaluating numerous therapeutic variables, and so forth.

How to Use the OQ-45 For Treatment Monitoring

The information provided by the OQ-45 becomes most meaningful when it is first administered to a patient prior to the application of any therapeutic interventions. The initial administration is best provided during the intake process. The OQ-45 takes a relatively small amount of time to complete, and therefore it should not be much of a burden for the client. Subsequent administrations may be given weekly, at any determined midpoint intervals, and at the conclusion of treatment. Although information about improvement following a specific session may be very meaningful, what is perhaps more important is the ability to see the patterns and trends exhibited by a specific patient across the course of therapy.

Considerations for Frequency of Monitoring Treatment Progress

To date, the best information available concerning treatment monitoring with the OQ-45 comes from a study by Kadera, Lambert, and Andrews (1996) that attempted to better understand the relation of therapeutic units of intervention (sessions) to patient recovery


status. Patients were 64 adults who received psychotherapy at a university-based outpatient clinic that serves as a training facility for clinical psychology and social work graduate students. Mood disorders were the predominant diagnosis for both male and female clients; however, the sample also contained individuals with posttraumatic stress disorder, phobias, anxiety disorders, and personality disorders. Patients completed the OQ-45 prior to each weekly therapy session. Completion of the pretest occurred immediately before the first session, the first posttest then preceded the second session, the second posttest preceded the third session, and so on. This procedure was consistent with OQ-45 instructions asking patients to describe their functioning "over the last week." Patients received an OQ-45 from the clinic receptionist at the time of their appointment, completed it in a waiting area, and returned it to the receptionist before beginning their session.

To summarize the outcome criteria of this study, patients were considered "recovered" when they met both criteria for clinically significant change: moving from the OQ-45 dysfunctional distribution into the OQ-45 functional distribution (i.e., scoring less than 67), and showing positive gains of sufficient magnitude to be considered statistically reliable (improvement of at least 15 OQ-45 points; see footnote 1). However, because the aim of this study was not only to assess whether a patient had recovered, but also to indicate when that recovery occurred in order to compute a dose-effect relationship, a third criterion had to be specified. Session-by-session assessment of change raised the possibility that some patients might be observed continuing in therapy after obtaining recovered status or might fluctuate between recovered and unrecovered status prior to termination.
Therefore, patients were considered recovered at the earliest session at which they persistently met the criteria for clinically significant change (i.e., during the remainder of therapy they did not return to a nonrecovered status).

In analyzing participant results, "recovered" patients, as discussed, met both criteria for clinically significant change. "Improved" patients met the criterion for statistical reliability by improving by at least 15 OQ-45 points but remained within the same dysfunctional or functional distribution they were in before starting therapy. "Deteriorated" patients moved at least 15 OQ-45 points in the direction of increasing psychopathology. Patients showing "no change" did not improve or deteriorate by as much as 15 OQ-45 points during therapy.

Results indicate that 21 patients (33%) recovered, 16 patients (25%) improved, 24 patients (37%) experienced no change, and 3 patients (5%) deteriorated. Thus, more than half (58%) of the patients showed reliably positive gains during therapy. The 21 recovered patients represent 47% of those categorized as dysfunctional at the beginning of treatment. These patients were used to formulate an initial dose-effect relationship. The minimum number of sessions required for patient recovery was 2, and all patients who recovered did so by 25 sessions. Approximately 14% of recovering patients were recovered by 4 sessions, 43% by 8 sessions, and 76% by 13 sessions.

In the original manuscript, Kadera et al. (1996) plotted the therapeutic courses of the 21 recovered patients. These graphical representations are interesting in that they indicate that patients not only vary greatly from one another in their responses to therapy, but also show wide fluctuation in their subjective estimates of the intensity of their symptoms. For example, Patient 1 moved in and out of the functional distribution seven times before meeting criteria for recovery at Session 25.
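The outcome categories and the "persistent recovery" rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the scoring procedure used by Kadera et al.: the function names, the treatment of scores exactly at the cutoff as dysfunctional, the convention that index k of the score list reflects status after k completed sessions, and the sample scores are all assumptions made for the example. The defaults (cutoff 67, RCI 15) are the values reported for this study.

```python
from typing import Optional, Sequence


def classify_outcome(pre: float, post: float,
                     cutoff: float = 67, rci: float = 15) -> str:
    """Classify change from a pretest score to a later score.

    Lower OQ-45 totals mean less distress, so a positive (pre - post)
    difference is an improvement. Scores below `cutoff` are treated as
    falling in the functional distribution (assumption: a score equal
    to the cutoff counts as dysfunctional).
    """
    change = pre - post
    if change >= rci:
        # Reliable improvement; "recovered" additionally requires moving
        # from the dysfunctional into the functional distribution.
        if pre >= cutoff and post < cutoff:
            return "recovered"
        return "improved"
    if change <= -rci:
        return "deteriorated"
    return "no change"


def earliest_persistent_recovery(scores: Sequence[float],
                                 cutoff: float = 67,
                                 rci: float = 15) -> Optional[int]:
    """Return the earliest index from which the patient stays recovered.

    scores[0] is the pretest; scores[k] is assumed to be the
    administration completed after k sessions. Returns None if the
    patient is not recovered at the final administration.
    """
    pre = scores[0]
    session = None
    # Walk backward from the last administration while status stays "recovered".
    for k in range(len(scores) - 1, 0, -1):
        if classify_outcome(pre, scores[k], cutoff, rci) != "recovered":
            break
        session = k
    return session


# Hypothetical weekly totals for one patient (not data from the study).
weekly_scores = [90, 80, 60, 72, 62, 58]
print(classify_outcome(weekly_scores[0], weekly_scores[-1]))  # recovered
print(earliest_persistent_recovery(weekly_scores))            # 4
```

In the hypothetical series, the patient first meets both criteria at index 2 but slips back above the cutoff at index 3, so persistent recovery is dated to index 4. This is the kind of fluctuation that weekly measurement is meant to capture.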
Footnote 1. It should be noted that the cutoff score of 67 and RCI of 15 used in this study were based on initial calculations with initial OQ samples and are no longer the recommended levels. Recalculation using currently recommended cutoff scores had little effect on the results as they are summarized here.

Patient 30


reliably deteriorated by Session 2, then improved dramatically over the next two sessions to meet criteria for recovery by Session 4. Patient 11 made steady improvement after an initial deterioration at Session 2, met the criteria for recovery at Session 13, but continued in therapy for 16 more sessions with only minor additional benefit. The modal number of sessions a patient remained in therapy after meeting criteria for recovery was 3, with the majority continuing within a range of 0 to 5 sessions. Although neither therapists nor patients received feedback about OQ-45 scores, there was fairly high concordance between termination and "recovery." Further examination of the results indicates that few patients changed in a steady linear fashion. This indicates a need for weekly or session-to-session measurements in order to accurately determine the session at which patients displaying such individualistic responses to therapy meet the criteria for clinically significant change.

Potential Use and Limits for Treatment Monitoring in a Managed Care Setting

Again, recall that the OQ-45 is a self-report instrument with high face validity. It is intended to measure the outcome of treatment and interventions across time. It is not designed for use as a definitive diagnostic tool, or as a means of predicting patient recovery or outcome. Further research needs to include a variety of measures of outcome rather than a single self-report scale, and should combine the views of patients and observers to effectively assess a wider range of mental health components as well as capture the different stages of change in the therapeutic process. This research should consider the most meaningful way to classify patients for comparative purposes. It would be important to know whether different patient subtypes require different dosages of therapy. Do some disorders require more treatment, on average, to effect change than others? How much more? It would also be important to know whether therapies intended to be brief are any more efficient than interventions having no theoretical or practical time constraints. What are the most efficient therapeutic interventions and modalities?

Use of the OQ-45 For Treatment Outcomes Assessment

General Issues

Formal outcome research is a manifold enterprise ideally incorporating numerous measures of patients' subjective discomfort, expert judge ratings, physiological indices, and environmental data sources, such as employer reports of work performance, and the like (Lambert & Lambert, chap. 4). Although it is commonly accepted that such a multidimensional approach offers greatly improved means of charting patient progress in terms of both scientific rigor and comprehensive assessment, practical considerations encountered in routine clinical practice limit a clinical researcher's ability to conduct comprehensive assessments that integrate criteria from multiple sources. Although the distinction is somewhat artificial, there are a few clear differences between efficacy research (best represented by clinical trials) and effectiveness research


illustrated by program evaluations or consumer satisfaction studies (e.g., Seligman, 1995). Efficacy research typically involves a small, diagnostically homogeneous sample of patients who are assessed through the use of specialized scales. For example, a study of unipolar depressed patients would certainly include several measures of depression, most typically the BDI (a self-report scale) and the Hamilton Rating Scale for Depression (a clinician-based rating scale). In addition, a study would be likely to include a structured diagnostic interview, a measure of dysfunctional thinking, a hopelessness scale, a suicidality measure, and so forth. In contrast, effectiveness research usually involves a diagnostically heterogeneous sample and requires more broad-based measures of outcome with greater limitations with regard to the time and money that can be devoted to the research endeavor. It is in this context that the OQ-45 has its greatest applicability. Although the OQ-45 would be an appropriate scale for use in efficacy research, its greatest advantage is in settings where patients are not screened and therefore present with the widest range of difficulties, from DSM-IV V-code diagnoses to the most severe pathology. Outcome-minded third-party payers show a continued interest in brief measures of patient improvement that tap a variety of potential outcomes, without the attendant methodological complexities of formal outcome research. The OQ-45 is designed for repeated measurement of client status through the course of therapy and at termination. Ease of administration and scoring, low cost, sensitivity to changes in psychological distress over short periods of time, and an ability to tap a wide array of symptomatology and role functioning may make this instrument useful in a variety of clinical and counseling applications. 
As mentioned previously, the OQ-45 was formulated in accordance with Lambert's (1983) organizational scheme for outcome assessment, suggesting that three dimensions, or content areas, be evaluated: intrapersonal (subjective discomfort) or symptomatic distress, interpersonal functioning, and social role performance. Use of this conceptualization seems justified in that its breadth affords a comprehensive review encompassing both the patient's inner life and functioning in applied situations like work and school. In addition, some items were included to tap positive states of mental health and life functioning. It was believed that these items would not only assess quality of life as perceived by the client, but also increase the range of measurement so that the test did not suffer from an artificially low ceiling (as is true in tests that only measure the presence or absence of psychopathology to the exclusion of aspects of healthy functioning). Essentially, the OQ-45 was developed in an attempt to bridge the gap between the demands of health care providers and the stringent requirements of the research community. Although there are obvious shortcomings to such a compromise, the net result is a psychometrically sound instrument that can actually be used in real-world applications for the assessment of mental health care treatment outcome.

Evaluation of the OQ-45 Against NIMH Criteria for Outcomes Measures

Newman and Ciarlo (1994) suggested 11 criteria by which measures of outcome can be judged. These criteria were based on the recommendations of a panel of experts convened by the National Institute of Mental Health. The following summarize each criterion and provide evaluation and justification of the OQ-45's compliance:


Relevance to Target Group and Independent of Treatment Provided. The OQ-45 is relevant for adults age 17 and older who can read at the sixth-grade level. It is most appropriate for tracking outcome in outpatients, but can be applied with inpatients as well. Its content is related to day-to-day functioning and is not based on or biased toward any particular treatment theory or modality. It is as appropriate for patients undergoing psychoactive pharmaceutical interventions as it is for those undergoing psychological interventions.

Simple, Teachable Methods. The OQ-45 was specifically designed for ease of administration. It is intended to be administered by a wide range of service professionals, from clinic receptionists to clinicians themselves. Administrative instructions are very straightforward and do not require a complex understanding of the instrument itself. Scoring may be accomplished in a number of ways depending on the version of the instrument being used. The most straightforward version provides the Likert point values on the form itself, allowing it to be scored by simply transferring the point value to the appropriate subscale column (also clearly indicated), and then adding up the columns. Recent versions have been produced that can be scanned and scored by computer at clinics with the appropriate equipment. A recent commercially released software package also allows for online OQ-45 administration; it then scores the responses, stores the data in a cumulative database for each client as well as each clinician, and provides graphic illustrations of any complete data as well as tabled treatment summaries.

Use of Measures with Objective Referents. The items on the OQ-45 are based on objective constructs indicative of both quality of life and psychological symptomatology. However, the very nature of a self-report measure requires that participants establish a subjective understanding of their current condition. The OQ-45 is not exempt from this limitation. In fact, it requires not only personal conceptualization of current psychological functioning, but also a rating of intensity.

Use of Multiple Respondents. The OQ-45 does not make use of multiple respondents. It is limited exclusively to the responses of the patient.

More Process-Identifying Outcome Measures. Again, the OQ-45 only focuses on a subjective client understanding of current psychological functioning. It is not intended to identify the process, course, or likely outcome of a pathological condition. Were the OQ-45 designed to measure such constructs, it would likely lose many of its most desirable attributes, including ease of administration, short administration time, and straightforward scoring and interpretation. It is likely that data from repeated administrations of the OQ-45 combined with other meaningful diagnostic data and professional interpretation can provide valuable information leading to process identification.

Psychometric Strength. As reported previously in the reliability, validity, and sensitivity sections of this chapter, the OQ-45 is a psychometrically sound instrument exhibiting high validity, consistent reliability, and the ability to measure client change across sessions; it also discriminates between normal, outpatient, and inpatient populations.

Low Costs. One of the requirements of the OQ-45 design protocol was that it be very cost-effective, with a minimal cost per administration. Use of the OQ-45 requires a minimal licensing fee ($25 for the clinician) that allows licensees the lifetime privilege to reproduce and administer the instrument on an unlimited basis. Cost per administration


thus becomes limited to reproduction and administration costs (average cost appears to be about 3 cents per administration).

Understanding by Nonprofessional Audiences. The OQ-45 was intended for general use in a wide range of settings, and as such was designed to be easily understood both conceptually and practically. Although this design approach has made administration easy, it has also been found that the results of OQ-45 administrations can be easily understood by patients and other nonprofessional observers when clinicians choose to share that information. Most appear to understand its utility almost as though it were a blood test being taken for analysis of current functioning. A low score is likely to indicate better functioning and less pathology, and a high score represents some level of psychological distress.

Easy Feedback and Uncomplicated Interpretation. Computer scoring as well as self-scored forms of the OQ-45 have yielded a straightforward instrument that is typically easy to interpret. Interpretation can begin with comparing the total score of one administration against the norms to establish the level of distress currently being experienced and whether this would be considered normal or abnormal. Interpretation of a single protocol can become more complicated by looking at the individual subscale domains as well as responses to individual items. However, even this level of interpretation is not very complex. Moreover, the OQ-45 is capable of presenting a slightly more complex interpretive picture when repeated measures are used to track individual client progress across sessions. This notion can be expanded to include evaluation of score profiles for a specific treatment provider, therapeutic intervention, or patient population. Feedback follows a similar course from a simple explanation of the total score to a complex statistical analysis and explanation of trends, patterns, and cycles.

Useful in Clinical Services. The OQ-45 has a very useful role to play in any number of clinical settings. It can help establish levels of needed treatment, justify or nullify an extended number of sessions, track patient progress across time, monitor treatment effectiveness, and so forth. The simplicity of use, low cost, and straightforward interpretation are additional features that make the OQ-45 a very useful tool in a clinical setting.

Compatibility with Clinical Theories and Practices. The OQ-45 was intentionally developed to be atheoretical with regard to psychological premises. This was done with the hope that it would allow the OQ-45 to be a powerful and meaningful instrument for any clinician to use regardless of clientele, theoretical perspective, or therapeutic style. The current research with the OQ-45 has shown that it can be used effectively in a diverse range of settings, providing meaningful, if not different, information in each instance. Obviously, differing theories and practical applications require varied implementation strategies. To date, the current OQ-45 appears to be flexible enough to serve these needs and demands.

Research Findings Relevant to Use of the OQ-45 As an Outcomes Measure

The OQ-45's construct validity depends, in part, on the ability of the OQ-45 to reflect change following interventions such as psychotherapy. Retest scores for individuals are not expected to fluctuate systematically over time, but it is expected that the scores of patients receiving psychological or psychopharmacological interventions would become


lower over time. Past psychotherapy research shows that most patients typically improve in therapy, and a portion improve in placebo treatments. The greatest gains are expected to take place by the eighth therapy session (Lambert & Bergin, 1994). Given the consistent nature of these past findings, the OQ-45 would be considered to have construct validity (to measure changes in level of psychological disturbance) if the scores for patients after seven sessions of therapy were lower than their pretherapy levels. This hypothesis was tested by following a subset of patients in treatment at the outpatient clinic. Of the 76 patients who took the OQ-45 prior to entering therapy, 40 patients had at least seven therapy sessions. As expected, a t test between the means of the patient pretest scores and their posttest scores after seven sessions of therapy revealed statistically significant improvement. This same data set was used to evaluate session-by-session change. The results suggest that the OQ-45 performs as an effective method of monitoring patient change. Data from the Kadera study showing mean group total score change over time is presented in Fig. 27.2. This data is presented in terms of the percentage of patients who met criteria for clinically significant change. As can be seen, 22% of patients met criteria for clinically significant change by the eighth session of therapy when drop-outs are included in the analysis. This same data can be contrasted with a similar graph based on asking a nonpatient group of college students to take the OQ-45 each week over 6 weeks. These students, who were enrolled in a psychology class, did not as a group change over time to the same degree

Fig. 27.2. Comparison of weekly OQ raw scores of treated patients and nonpatient controls who completed the OQ every week. Reprinted by permission of Lambert, Cattani-Thompson, Nebeker, Andrews, Kadera, and Erekson. Paper presented at the annual meeting of the Society for Psychotherapy Research, June 1996, Amelia Island, FL.


as those who were treated (Fig. 27.2). Both lines of improvement were based on the use of hierarchical linear modeling. The results suggest that patients improve significantly more rapidly than nonpatient controls (Lambert, Cattani-Thompson, Nebeker, Andrews, Kadera, & Erekson, 1996).

Clinical Applications of the Instrument for Outcomes Assessment

A pilot study of change in persons seeking or being referred for help in employee assistance programs managed by HAI provides interesting data on change (Lambert & Huefner, 1996). Across the country, 150 sites provided data, but no attempt was made to collect OQ-45 data on every employee who asked for assistance. It was possible to collect data on 3,302 patients who took the pretest and had at least two therapy visits. The maximum number of visits was 10. Pretreatment scores from 2,100 patients placed them in the dysfunctional range. Their pretreatment mean total score was 84.14 (SD = 15.82) with a range of 64 to 148, and the mean total score at posttreatment was 70.81 (SD = 22.46) with a range of 6 to 150. These patients had a mean of 3.9 sessions of treatment. The number of subjects who met criteria for clinically significant improvement (i.e., passing the total score cutoff [63] and improving by the RCI [14]), when summarized, suggests that patients improve in very brief treatments even when the standard of improvement is rigorous. The total number of subjects who significantly improved was 30% within 10 sessions. After one session, 107 recovered; after two sessions, 147; after three sessions, 110; after four sessions, 82; and after five sessions, 57, with 124 more improving through the tenth session. An additional 5 of the 58 patients (8.6%) improved by at least 14 points but did not pass the cutoff. Two percent deteriorated (i.e., at least a 14-point increase) by the end of therapy, whereas about 69% did not show substantial benefit.
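The session-by-session counts reported above can be tallied to check the 30% figure. The short Python sketch below is only a worked arithmetic check. It assumes that the denominator is the 2,100 patients who began in the dysfunctional range and that the 124 later recoveries are spread over Sessions 6 through 10; the text does not state either point explicitly, and the variable names are illustrative.

```python
# Counts taken from the text; the denominator is an assumption (see lead-in).
dysfunctional_at_intake = 2100
recovered_by_session = {1: 107, 2: 147, 3: 110, 4: 82, 5: 57}
recovered_sessions_6_to_10 = 124

total_recovered = sum(recovered_by_session.values()) + recovered_sessions_6_to_10
percent_recovered = 100 * total_recovered / dysfunctional_at_intake

# Cumulative count and percentage recovered after each of the first five sessions.
cumulative = []
running = 0
for session in sorted(recovered_by_session):
    running += recovered_by_session[session]
    cumulative.append((session, running,
                       round(100 * running / dysfunctional_at_intake, 1)))

print(total_recovered)               # 627
print(round(percent_recovered, 1))   # 29.9
for row in cumulative:
    print(row)
```

Under these assumptions the counts sum to 627 of 2,100 patients, or roughly 30%, which agrees with the percentage given in the text; about 24% had already met the criteria by the fifth session.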
Of those patients in the functional range to begin with (36%), about 50% improved by at least 14 points but, of course, could not pass the cutoff. Another way to characterize change following therapy is displayed in Fig. 27.3. Figure 27.3 uses sloping procedures to show change on OQ-45 scores over time in reference to entry into the ranks of the nonpatient sample. This graph shows that there is a relation between severity of disturbance (initial OQ-45 elevation) and number of sessions to (group) recovery. When patients are grouped by the number of sessions they had, it appears that these groups are rank ordered in regard to initial test scores. Patients in this database were drawn from an EAP sample similar to that described by Lambert and Huefner (1996).

Use of Findings from the OQ-45 With Other Evaluation Data

Lunnen and Ogles (1998) reported the only study to date that has simultaneously used the OQ-45 and other measures of outcome. The purpose of their study was to explore the practical meaning of cutoff scores and criteria for clinically significant improvement. They compared the perceived level of change as subjectively reported from three distinct perspectives (patient, therapist, and significant other). They also compared reports of the therapeutic alliance and satisfaction across outcome groups. The results of this study suggested that those patients who were classified as improved (20-point positive change on the OQ-45 total score) also were rated as most improved on therapist and client ratings of perceived change. They also tended to have higher alliance scores. Surprisingly,


perhaps, satisfaction scores for the most part did not distinguish between improvers, no-changers, and deteriorators.

Fig. 27.3. Relationship between number of sessions of therapy, pretest OQ-45 raw score, and rapidity of improvement. Reprinted by permission of Lambert, M.J., & Huefner, J. APA Workshop, August 1997, Chicago.

Provision of Feedback Regarding Outcomes Assessment Findings

Feedback based on the results of OQ-45 administrations may be used in a wide range of applications. Frequently, clients will ask what purpose the measure serves and inquire as to their personal results. The course of action to be followed here is typically left for the clinician to determine, and may even include a full disclosure of the results (see the case study later for an example of results shared with a client). Such an inquiry is essentially the equivalent of a client asking the question, "How am I doing. . . . Am I getting better?" and should be handled accordingly on a case-by-case basis. Charting the progress of a specific client may also be quite informative to a clinician and can even provide validating feedback as to therapeutic setbacks, stagnation, or rate and pattern of progress. For a clinician or a third-party provider, the most meaningful feedback is typically provided by an aggregate of clients and sessions. Once OQ-45 results have been accumulated across multiple clients and sessions, the resulting data may provide critical feedback on the progress of patients, typical patterns of improvement for the patients of different clinicians, and the effectiveness of treatments found in various hospitals and regions. Figure 27.4 presents the outcome for a particular therapist across 37 consecutive patients compared to outcome as reported by Kadera, Lambert, and Andrews (1996).
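Aggregate feedback of the kind described above amounts to grouping client outcomes by clinician (or clinic, or region) and reporting the percentage in each outcome category. The following Python sketch shows one minimal way this might be done; the function name, the clinician labels, and the records are hypothetical and are not drawn from any study cited here, and no case-mix adjustment is attempted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recovery_rate_by_clinician(
        records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return the percentage of each clinician's clients labeled "recovered".

    Each record is a (clinician_id, outcome_label) pair, where the label
    is one of "recovered", "improved", "no change", or "deteriorated".
    """
    totals: Dict[str, int] = defaultdict(int)
    recovered: Dict[str, int] = defaultdict(int)
    for clinician, outcome in records:
        totals[clinician] += 1
        if outcome == "recovered":
            recovered[clinician] += 1
    return {c: round(100 * recovered[c] / totals[c], 1) for c in totals}


# Hypothetical end-of-treatment outcomes for two clinicians.
records = [
    ("clinician_A", "recovered"), ("clinician_A", "improved"),
    ("clinician_A", "recovered"), ("clinician_A", "no change"),
    ("clinician_B", "no change"), ("clinician_B", "recovered"),
    ("clinician_B", "deteriorated"),
]
print(recovery_rate_by_clinician(records))
# {'clinician_A': 50.0, 'clinician_B': 33.3}
```

A raw percentage like this is only a starting point, since clinicians who see more severely distressed patients would be expected to show different recovery rates.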


Fig. 27.4. Recovery in patients seen by an experienced therapist or therapists in training as measured by the OQ-45. Reprinted from Lambert, Okiishi, Finch, and Johnson (1998), Professional Psychology: Research and Practice. Reprinted with permission.

These results suggest a therapist whose patients have unusually rapid gains. These results have been discussed elsewhere (Lambert, Okiishi, Finch, & Johnson, 1998). To date, the most effective means of accessing this vital information is through the use of the computerized administration and scoring program. Clients can take the OQ-45 on the computer terminal itself, or a clinician may enter responses or score totals from a completed profile. The program will then provide tabled results describing clinician or clinic efficacy in terms of percentage of clients improved in relation to expected levels of success based on patient case mix. An example of the patient record that forms the basis for clinic reports is provided in Fig. 27.5. The bar graphs at the bottom trace the patient's progress over time. As can be seen, the patient has worsened from the initial consultation and has crossed the cutoff of 63 once. The material in the top half of the report provides information on the most recent testing, including responses to critical items.

Limitations/Potential Problems in the Use of the OQ-45 For Outcomes Assessment

High intercorrelations were found for the OQ-45 subscales, implying that the subscales share considerable variance and may be mainly measuring the same underlying sources of variance. These findings are similar to those found by Lambert, Burlingame, et al. (1996) with a student sample, and they mirror the findings of analyses conducted on frequently used tests such as the MMPI (Butcher, Graham, Williams, & Ben-Porath, 1989; W.G. Dahlstrom, Welsh, & L.E. Dahlstrom, 1972), SCL-90-R (Cyr, Doxey, & Vigna, 1988; Cyr, McKenna-Foley, & Peacock, 1985), MCMI (Millon, 1983), BSI


Fig. 27.5. Fictitious patient record tracking progress across sessions and reporting patient status at most recent testing.

(Boulet & Boss, 1991), and IIP (Horowitz et al., 1988). For example, despite its prominence as a diagnostic tool and outcome measure, research with the original MMPI has found the intercorrelations between many of the scales to be quite high. Scales 7 (Psychasthenia) and 8 (Schizophrenia) have correlations ranging from .64 to .87, depending on the population sampled (Butcher et al., 1989; W.G. Dahlstrom et al., 1972). Factor analysis of the OQ-45 (Mueller, Lambert, & Burlingame, 1998) supports a three-factor structure for the OQ-45; however, the data suggest that a single-factor structure is equally plausible. Despite such high intercorrelations, researchers have provided rationales for why the scales may still make unique clinical contributions. For example, McKinley and Hathaway (1944) found substantial overlap between the Hysteria and Hypochondriasis clinical scales, which correlated .71. W.G. Dahlstrom et al. (1972) recounted that "careful


examination of the clinical contributions of each of these scales led McKinley and Hathaway to retain both scales . . . such as the fact that 32% of the cases of conversion hysteria had scores beyond the arbitrary cutting score on Hysteria, while being missed by the Hypochondriasis scale" (pp. 23-24). It could be argued, therefore, that before discounting the OQ-45 subscales' utility based on their statistical nonindependence, further research should be conducted to ascertain whether such unique contribution from the scales occurs. Despite this rationale for the potential utility of OQ-45 subscales, it is difficult to argue for their present utility. Surveying Table 27.3, it is easily noted that most of the variance is accounted for by the Symptom Distress subscale, which argues that the OQ-45 is best conceived of as a measure of general distress. The implication of this conclusion is that until further research indicates the unique utility of the OQ-45 subscales, the total score should be used by clinicians and researchers.

High correlations were found for the OQ-45 Symptom Distress, Interpersonal Relations, and Social Role subscales with their respective criterion measures, the SCL-90-R, the IIP, and the SAS-SR. All correlations were significant and suggest that the OQ-45 has good concurrent validity. With the various clinical samples, the Symptom Distress subscale correlated highest with the SCL-90-R GSI. However, the Interpersonal Relations subscale did not consistently correlate highest with its expected criterion measure, the IIP. Although the SAS-SR correlated highest with the Social Role subscale across sites, it should be noted that all correlations were high and the differences between correlations were not statistically significant.
The overall high correlations between each of the OQ-45 subscales and the other supposedly nonanalogous criterion measures indicate that the respective subscales of the OQ-45 are not measuring highly distinct characteristics of patient functioning. Such results are not entirely surprising, as similar results have been found in earlier research on the OQ-45 with a student sample (Lambert, Burlingame, et al., 1996), as well as personality inventories and other short-form outcome measures that were developed as unique subscales for instruments such as the MMPI (W.G. Dahlstrom et al., 1972), MCMI (Millon, 1983), SCL-90-R (Brophy, Norvell, & Kiluk, 1988), the Brief Symptom Inventory (Boulet & Boss, 1991), IIP (Horowitz et al., 1988), and HSCL-21 (Deane, Leathem, & Spicer, 1992). For example, in Horowitz et al.'s study (1988), the Pearson correlation between the total score on the IIP and that on the SCL-90-R was .64. The scores on the subscales of each scale also correlated highly with the SCL-90-R; most were above .40. In a study by Brophy et al. (1988), the SCL-90-R subscales were compared with the Beck Depression Inventory (BDI). Although the SCL-90-R Depression subscale correlated highest with the BDI at .73, all other subscales correlated significantly with the BDI. Some of the more highly correlated subscales included: Obsessive Compulsive (.62), Anxiety (.59), Paranoid Ideation (.57), and Psychoticism (.57). Brophy and colleagues (1988) also found that SCL-90-R subscales and MMPI clinical scales were correlated significantly except for Masculine-Feminine and Mania, which suggests considerable overlap between the various subscales. Although these various instruments have been devised to measure distinct areas of functioning and are regarded by many psychologists to be the best in their respective domains, very high correlations are found between them (Strupp, Horowitz, & Lambert, 1997). 
Similar results emerge from the correlations among the SCL-90-R, IIP, and SAS-SR in the present data. The SCL-90-R correlated .73 with the IIP and .69 with the SAS-SR, and the IIP and SAS-SR correlated .70.
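To make the degree of overlap concrete, a correlation can be squared to give the proportion of variance that two measures share. The following minimal sketch applies this to the three coefficients reported above. The function name and the dictionary layout are illustrative assumptions and are not part of any OQ-45 software.

```python
# Shared variance (r squared) implied by the criterion-measure correlations
# reported in the text. All names here are illustrative assumptions.

reported_r = {
    ("SCL-90-R", "IIP"): 0.73,
    ("SCL-90-R", "SAS-SR"): 0.69,
    ("IIP", "SAS-SR"): 0.70,
}


def shared_variance(r: float) -> float:
    """Return the proportion of variance shared by two measures (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Round to two decimals, the precision at which the correlations are reported.
shared = {pair: round(shared_variance(r), 2) for pair, r in reported_r.items()}

for (a, b), r2 in shared.items():
    print(f"{a} vs {b}: r = {reported_r[(a, b)]:.2f}, shared variance = {r2:.2f}")
```

Each pair of instruments shares roughly half of its variance (about .53, .48, and .49), which is in line with the argument that these nominally distinct measures tap a common distress dimension.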


The literature provides an explanation for high subscale intercorrelations and nonanalogous instrument correlations. Lambert and Hill (1994) found that individuals who are experiencing psychological distress suffer in their personal relationships as well as work functioning. Additional research has suggested that interpersonal problems are related to internal distress (Horowitz et al., 1988). Thus, these areas of impairment appear to accompany one another regardless of how much test developers may wish them to be separate domains. Lambert, Burlingame, et al. (1996) provided the following rationale after reporting similar findings with a nonclinical college student sample:

It is likely that these three domains co-vary so highly that they cannot be statistically separated. It appears that the Subscale scores, despite differences in content, may not provide distinct information. . . . It is important to note here that the OQ-45 was not designed as a multi-trait test such as the MMPI. The OQ-45 was instead designed as a single measure of patient recovery in mental health contexts that draws items from three theoretically interrelated domains, a presupposition reflected by its high internal consistency. (pp. 28-29)

Consideration of such "interrelated domains" is important in psychotherapy outcome studies, as they allow for a more complete picture of the patient's overall functioning. Interventions can have indirect effects (on interpersonal and work functioning), as well as direct effects on "psychological symptoms"; thus, all such areas of functioning need to be considered despite their lack of statistical independence.
The OQ-45 subscales present an as-yet-untested theoretical perspective on patient distress and outcome measurement; however, the inability of the subscales to assess unique domains, combined with their high intercorrelations, suggests that clinicians should look first and foremost to the patient's total score as a measure of general distress. At present, patients' subscale scores may have limited interpretive value for the clinician.

Potential Use as a Data Source for Mental Health Service Report Cards

As the costs of mental health care continue to rise, third-party insurers are more frequently requiring health care providers to document therapeutic progress. This is a controversial requirement, as stringent demands for "accountability" leave clinicians in fear of losing their livelihood. Such fears typically center around losing jobs or positions on preferred provider panels if patients do not exhibit substantial improvements in brief periods of time. Clinicians are further concerned that such demands fail to take into account the severity of client pathology and are based on theoretical notions and research results that have minimal real-world utility (Wells, Burlingame, Lambert, Hoag, & Hope, 1996).

Fortunately, the future does not have to be this discouraging. As better empirical data on psychotherapy outcome are gathered with more valid measures and instruments, there is reason for optimism. Rather than establishing unrealistic expectations, the results of outcome research are beginning to provide a realistic understanding of psychotherapeutic treatment. Furthermore, these results appear to be more in line with the actual experiences of providers rather than merely assumed limits and restraints of health care management and provider systems. This does not mean that clinicians in this age of accountability are free to do as they choose, but it does appear to indicate that the marriage of outcome research and managed health care will not result in the grim demise that has been predicted. Outcome measures such as the OQ-45 can provide clinicians with a very real picture of how their services are benefitting others.

In Wells et al. (1996), two different scenarios for implementation as a "report card" measure are illustrated. The OQ-45 is completed for each client during the original intake process and at scheduled points across sessions to track therapeutic progress. Within the clinician's office, the measure is scored and entered into the client's chart. This information on each client may then be used as: an intake measure of initial severity of symptoms and index of risk factors that will provide a more realistic picture of client presentation and potentially moderated improvement/recovery expectations, a tracking device for change, and a potential summary source for demonstrating the effectiveness of therapeutic interventions.

At a corporate level, the results of each OQ-45 administration are gathered into a central data bank for storage, interpretation, and feedback. The results may then be analyzed for: reporting therapeutic efficacy to subscribers and/or profiling individual providers; establishing decision algorithms to empirically determine appropriate session limits (e.g., expectancy tables); and answering further research questions, such as evaluating the efficacy of innovative approaches to treatment.

Again, the true value of an instrument such as the OQ-45 lies in the ability to provide valid, empirical feedback to patients, providers, and third parties about the efficacy of treatments. Hopefully, in spite of methodological limitations, outcome measures such as the OQ-45 will facilitate the aims of clients, clinicians, and policymakers. Figure 27.6 provides a fictitious report generated by the computer software version of the OQ.
The report can be generated at any time by selecting the desired report type. The data displayed are from the clinic report card and summarize outcomes across clinics within a hospital chain. In this particular report, no adjustment has been made for the severity of the treatment population, although it is clear that differences in pretest OQ-45 scores are present. Future versions of the software will report data in the form of expected level of recovery and actual recovery markers.

Case Study

Larry Jensen is a 22-year-old, White male currently attending a large private western university. Larry was diagnosed with obsessive-compulsive disorder and major depressive disorder, recurrent. He frequently experienced unwanted sexual thoughts that created a great deal of anxiety within him. Eventually the anxiety would become overwhelming, driving Larry to engage in a complex compulsive ritual of praying, repeating a memorized poem a fixed number of times, saying specific words a set number of times in a specified order, and reading six pages in his Bible. There were days that Larry would spend up to 8 hours engaging in these compulsive behaviors. This was an obvious drain on his time, making it difficult for him to succeed in his college classes or to hold a steady job. From time to time his frustration and anxiety became so overwhelming that he would be overcome with a severe sense of hopelessness and despair. At such times he seriously considered suicide as a potential solution to his problems.


Fig. 27.6. Report contrasting expected and observed outcome across clinics within a health maintenance organization.

When Larry first came to the campus mental health services clinic, he was experiencing a great deal of psychological distress and was mildly suicidal. He completed an OQ-45 and a BDI as part of the intake process. His total OQ-45 score was 115, indicating severe distress, which was further substantiated by a BDI score of 31. He met with a therapist for 1 hour, and the session largely focused on his suicidality. The following week Larry again completed the OQ-45 prior to his therapy session. This time his score was 93, indicating some improvement since the previous week; however, this score was still well within the clinical range. Over the next three sessions, Larry's OQ-45 score continued to decline by five points per session until at the fifth session his OQ-45 score was 78. To this point, Larry was progressing well in therapy and the reductions in his OQ-45 score appeared to be an accurate reflection of his improving psychological functioning (see Fig. 27.7).

Fig. 27.7. Outcome for "Larry" showing OQ-45 raw scores across 14 treatment sessions.

Prior to his sixth session, Larry again completed the OQ-45 and this time received a total score of 23. This represented a drop of 55 points from the previous week, a level of progress that seemed highly unlikely. As the session continued, Larry claimed that he was doing great, that he was no longer obsessing, and that he believed he was cured. At this point his therapist decided to share some of the results of his OQ-45 administrations with him. The therapist explained that such an immense and dramatic change was atypical and probably indicated a strong desire to improve and to no longer feel dependent on therapy for change, rather than an actual cure. Larry was skeptical but agreed to consider the feedback and return for an appointment the next week.

When Larry returned the next week, his OQ-45 total score was 82. He came into the session somewhat sheepishly and said that he had just experienced a pretty rough week of obsessing and that he was not yet "cured." He then explained to the therapist that he had done this same thing with three previous therapists. The therapist explained that his reaction the previous week was perfectly understandable and explained how clients will sometimes step into this "flight into health" because they want so badly to return to normal functioning.

Larry ended up completing 14 sessions with his therapist. Following this "flight into health," his progress in therapy was largely unremarkable. His OQ-45 total score continued to decline weekly until the total score at the final session was 52. At this point, Larry had almost completely eliminated his obsessive-compulsive behavior and was no longer depressed. He knew that he needed to continue working on this behavior, but felt that he would like to try handling it on his own and was optimistic that he would be able to do so.
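The kind of check the therapist made informally, noticing that a week-to-week swing was too large to be credible, can be sketched in a few lines. The example below uses only the OQ-45 totals stated in the case narrative. The 30-point threshold and all function and variable names are hypothetical choices made for illustration; they are not a published OQ-45 criterion or part of the OQ software.

```python
# Flag abrupt session-to-session swings in OQ-45 total scores.
# Scores are those stated in the case narrative; MAX_PLAUSIBLE_CHANGE is a
# hypothetical illustrative threshold, not a published OQ-45 criterion.

MAX_PLAUSIBLE_CHANGE = 30.0  # points per session (assumed for illustration)

# (session number, OQ-45 total score) for the administrations the text reports
larry_scores = [(1, 115), (2, 93), (5, 78), (6, 23), (7, 82), (14, 52)]


def flag_abrupt_changes(scores, max_change=MAX_PLAUSIBLE_CHANGE):
    """Return (session, per-session change) pairs whose size exceeds max_change."""
    flagged = []
    for (prev_session, prev_score), (session, score) in zip(scores, scores[1:]):
        # Average the change over the sessions elapsed between two known scores.
        per_session = (score - prev_score) / (session - prev_session)
        if abs(per_session) > max_change:
            flagged.append((session, per_session))
    return flagged


flags = flag_abrupt_changes(larry_scores)
for session, change in flags:
    print(f"Session {session}: change of {change:+.0f} points per session")
```

With these values, only the 55-point drop at session 6 and the 59-point rebound at session 7 are flagged, which corresponds to the "flight into health" episode described above. A flag of this kind is a prompt for clinical discussion, not a judgment about the patient.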


Conclusions

The OQ-45 is a brief self-report instrument designed for repeated measurement of client status through the course of therapy and at termination. Ease of administration and scoring, low cost, sensitivity to changes in psychological distress over short periods of time, and the ability to tap a wide array of symptomatology and aspects of role functioning may make this instrument useful in a variety of clinical and counseling applications.

The OQ-45 was formulated in accordance with Lambert's (1983) organizational scheme for outcome assessment, which suggests that three dimensions or content areas be evaluated: intrapersonal (subjective) discomfort or symptomatic distress, interpersonal functioning, and social role performance. Use of this conceptualization seems justified, in that its breadth affords a comprehensive review that encompasses inner life as well as progress in applied situations like work and school. In addition, some items were included to tap positive states of mental health and life functioning. It was believed that these items would not only assess quality of life as perceived by the client, but also increase the range of measurement so that the test did not suffer from an artificially low ceiling, as is true in tests that only measure the presence or absence of psychopathology rather than aspects of healthy functioning.

To this point, research has provided support for the validity of the OQ-45 as a measure of psychological distress. In particular, the construct validity of the OQ-45 was supported as the OQ-45 total and Symptom Distress scores were shown to be sensitive to varying levels of psychopathology. The concurrent validity of the OQ-45 was also supported across several patient populations through the comparison of subscales with well-known criterion measures.
In psychotherapy outcome research and clinical settings where the efficacy of psychotherapy and other treatment services is being assessed, an overall measure of psychological distress is often sought as a single summary indicator of patient distress and psychopathology. The OQ-45 has the advantage of being brief and psychometrically sound. Research has also indicated that the OQ-45 is an effective instrument for the purpose of tracking patient progress during and after treatment (Kadera et al., 1996). Because psychotherapy should serve to decrease the client's levels of distress, the scores on the OQ-45 should decline over treatment sessions. Changes in distress following therapy would therefore bolster the construct validity of the OQ-45. That such change is measurable with the OQ-45 was confirmed following seven sessions of individual psychotherapy. Essentially, the data presented to this point suggest that the items that make up the OQ-45 are rated differently over time by a majority of treatment participants.

Problems typically noted with brief self-report tests (Boulet & Boss, 1991; Derogatis, 1977) apply to the OQ-45 as a measure of patient distress and deserve mention. First, interpretation of OQ-45 scores typically relies on the assumption that clients will be accurate in the assessment and reporting of their mental or emotional states. Whether because of acquiescence, carelessness, boredom, lack of understanding, psychoticism, or numerous other factors, clients' responses may not be congruent with how they are really feeling. The OQ-45 has no control for response sets, of which social desirability is likely to be the most common and problematic. Although this problem may seem serious, it is a general problem in the scales that have typically been used to assess outcome. Research suggests no systematic bias as a consequence of using such self-report scales (e.g., Ogles, Lambert, & Masters, 1996) in outcome studies. 
However, it seems imperative to use the OQ-45 only in settings in which clients are motivated to accurately report their psychological state.


Research has also indicated several additional limitations of the OQ-45 and suggests directions for future research. First, past investigations have included criteria (SCL-90-R, IIP, and SAS-SR) based on self-report. Validation research that includes other sources of information is needed. In particular, assessment of interpersonal problems can be provided through spouse, family, or roommate ratings of functioning. The social role domain can be assessed by measures of success at work or school, along with workmate ratings of performance and functioning. Symptomatic distress can be measured through clinician-based or behavioral measures of symptomatic states such as anxiety and depression. This type of research would allow further validation of subscales without the confound of the source of status/performance ratings. The redundancy between subscales of the OQ-45 and criterion measures relevant to the subscales can then be interpreted in light of judgments from other sources.

A second limitation is the unknown reliability of patient diagnoses. Future research should make greater efforts to ensure diagnostic accuracy through the use of structured diagnostic interviews. Although the results of research to date are consistent with expectations about the relation of the OQ-45 with diagnostic formulations, more rigorous diagnostic efforts might show the OQ-45 to have even greater ability to differentiate levels of psychopathology.

Future research is also needed to investigate the relation between social class and scores on the OQ-45. Current studies have not collected such data, but there could be differences between treatment centers on this dimension. Because social class is a variable that is commonly related to measures of pathology, these data are essential to a full understanding of the meaning of OQ-45 scores. More than a dozen studies on different aspects of the OQ-45 are currently underway. 
Each should strengthen its value for assessing treatment outcome.

References

Ahmed, T., & Smith, R. (1991). Impacts of managed health care employee assistance programs on costs and utilization. A report prepared for Aetna Health Plans.
Andrews, F.M., & Withey, S.B. (1974). Developing measures of perceived life quality: Results from several national surveys. Social Indicators Research, 1, 1-26.
Beck, A.T., Steer, R.A., & Garbin, M.G. (1988). Psychometric properties of the Beck Depression Inventory: Twenty-five years later. Clinical Psychology Review, 8, 77-100.
Beck, A.T., Ward, C.H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 53-63.
Beiser, M. (1983). Components and correlates of mental well-being. Journal of Health and Social Behavior, 15, 320-327.
Blau, T.H. (1977). Quality of life, social interaction, and criteria of change. Professional Psychology, 8, 464-473.
Bloom, A. (1987). Liability concern of utilization review and quality assurance programs. HMO, 1, 128-133.
Boulet, J., & Boss, M.W. (1991). Reliability and validity of the Brief Symptom Inventory. Journal of Consulting and Clinical Psychology, 61, 433-437.
Brokowski, A. (1991). Current mental health care environments: Why managed care is necessary. Professional Psychology: Research and Practice, 22, 6-14.
Brophy, C.J., Norvell, H.K., & Kiluk, D.J. (1988). An examination of the factor structure and convergent and discriminant validity of the SCL-90-R in an outpatient clinic population. Journal of Personality Assessment, 52, 334-340.
Burlingame, G.M., Lambert, M.J., Reisinger, C.W., Neff, J., & Mosier, J. (1995). Pragmatics of tracking mental health outcomes in a managed care setting. Journal of Mental Health Administration, 22, 226-236.


Butcher, J.N., Graham, J.R., Williams, C.L., & Ben-Porath, Y.S. (1989). Development and use of the MMPI-2 content scales. Minneapolis: University of Minnesota Press.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cyr, J.J., Doxey, N.C., & Vigna, C.M. (1988). Factorial composition of SCL-90-R. Journal of Social Behavior and Personality, 3, 245-252.
Cyr, J.J., McKenna-Foley, J.M., & Peacock, E. (1985). Factor structure of the SCL-90-R: Is there one? Journal of Personality Assessment, 49, 571-578.
Dahlstrom, W.G., Welsh, G.S., & Dahlstrom, L.E. (1972). An MMPI handbook: Vol. 1. Clinical interpretation. Minneapolis: University of Minnesota Press.
Deane, F.P., Leathem, J., & Spicer, J. (1992). Clinical norms, reliability and validity for the Hopkins Symptom Checklist-21. Australian Journal of Psychology, 44, 21-25.
Derogatis, L.R. (1977). The SCL-90 manual: Scoring, administration and procedures for the SCL-90. Baltimore: Johns Hopkins University School of Medicine, Clinical Psychometrics Unit.
Diener, E. (1984). Subjective well-being. Psychological Bulletin, 95, 542-575.
Feldman, L.A. (1993). Distinguishing depression and anxiety in self-report: Evidence from confirmatory factor analysis on non-clinical and clinical samples. Journal of Consulting and Clinical Psychology, 61, 631-638.
Frisch, M.B., Cornell, J., Villaneuva, M., & Retzlaff, P.J. (1992). Clinical validation of the Quality of Life Inventory: A measure of life satisfaction for use in treatment planning and outcome assessment. Psychological Assessment, 4, 92-101.
Froyd, J.E., Lambert, M.J., & Froyd, E. (1996). A review of practices of psychotherapy outcome measurement. Journal of Mental Health, 5, 11-15.
Horowitz, L.M. (1979). On the cognitive structure of interpersonal problems treated in psychotherapy. Journal of Consulting and Clinical Psychology, 47, 5-15.
Horowitz, L.M., Locke, K.D., Morse, M.B., Waikar, S.V., Dryer, D.C., Tarnow, E., & Ghannam, J. (1991). Self-derogations and the integration theory. Journal of Personality and Social Psychology, 61, 68-79.
Horowitz, L.M., Rosenberg, S.E., Baer, B.A., Ureno, G., & Villasenor, V.S. (1988). Inventory of Interpersonal Problems: Psychometric properties and clinical applications. Journal of Consulting and Clinical Psychology, 56, 885-892.
Horowitz, L.M., Strupp, H.H., Lambert, M.J., & Elkin, I. (1997). Overview and summary of the core-battery conference. In H.H. Strupp, L.M. Horowitz, & M.J. Lambert (Eds.), Measuring patient changes in mood, anxiety, and personality disorders: Toward a core battery (pp. 11-54). Washington, DC: American Psychological Association.
Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
Kadera, S.W., Lambert, M.J., & Andrews, A.A. (1996). How much therapy is really enough? A session-by-session analysis of the psychotherapy dose-effect relationship. Journal of Psychotherapy Practice and Research, 5, 132-151.
Keppel, G. (1982). Design and analysis (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Kopta, S.M., Howard, K.I., Lowery, J.L., & Beutler, L.E. (1994). Patterns of symptomatic recovery in psychotherapy. Journal of Consulting and Clinical Psychology, 62, 1009-1016.
Lambert, M.J. (1983). Introduction to assessment of psychotherapy outcome: Historical perspective and current issues. In M.J. Lambert, E.R. Christensen, & S.S. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 3-32). New York: Wiley.
Lambert, M.J., & Bergin, A.E. (1994). The effectiveness of psychotherapy. In A.E. Bergin & S.L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 143-189). New York: Wiley.
Lambert, M.J., Burlingame, G.M., Umphress, V.J., Hansen, N.B., Vermeersch, D., Clouse, G., & Yanchar, S. (1996). The reliability and validity of a new psychotherapy outcome questionnaire. Clinical Psychology and Psychotherapy, 3, 249-258.
Lambert, M.J., Cattani-Thompson, K.C., Nebeker, R.S., Andrews, A.A., Kadera, S., & Erekson, K. (1996, June). The retest artifact and its implications for establishing dose-effect estimates of patient change following psychotherapy. Paper presented at the annual meetings of the Society for Psychotherapy Research, Amelia Island, FL.
Lambert, M.J., Hansen, N.B., Umphress, V., Lunnen, K., Okiishi, J., Burlingame, G.M., & Reisinger, C.W. (1996). Administration and scoring manual for the OQ-45.2. Stevenson, MD: American Professional Credentialing Services LLC.
Lambert, M.J., & Hill, C.E. (1994). Assessing psychotherapy outcomes and processes. In A.E. Bergin & S.L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 72-113). New York: Wiley.
Lambert, M.J., & Huefner, J.C. (1996). Measuring clinically significant improvement in the EAP environment. EAP Exchange, 6, 22-23.
Lambert, M.J., & Huefner, J.C. (1997). Measuring outcomes in clinical practice. A workshop presented at the annual meetings of the American Psychological Association, Chicago, IL.
Lambert, M.J., Ogles, B.M., & Masters, K.S. (1992). Choosing outcome assessment devices: An organizational and conceptual scheme. Journal of Counseling and Development, 70, 527-532.
Lambert, M.J., Okiishi, J.C., Finch, A.E., & Johnson, L. (1998). Outcome assessment: From conceptualization to implementation. Professional Psychology: Practice and Research, 29, 63-70.
Lunnen, C., & Ogles, B.M. (1998). A multi-perspective, multi-variable evaluation of reliable change. Journal of Consulting and Clinical Psychology, 66, 400-410.
McKinley, J.C., & Hathaway, S.R. (1944). The MMPI: Hysteria, hypomania, and psychopathic deviate. Journal of Applied Psychology, 28, 153-174.
Mueller, R.M., Lambert, M.J., & Burlingame, G.M. (1998). Construct validity of the Outcome Questionnaire: A confirmatory factor analysis. Journal of Personality Assessment.
Millon, T. (1983). Millon Clinical Multiaxial Inventory manual (3rd ed.). Minneapolis: National Computer Systems.
Mirin, S., & Namerow, M. (1991). Why study treatment outcome? Hospital and Community Psychiatry, 42, 1007-1013.
Moses-Zirkes, S. (1993, March). Outcome research: Everybody wants it. American Psychological Association Monitor, 24(3), 1.
Nebeker, R.S., Lambert, M.J., & Huefner, J.C. (1995). Ethnic differences on the Outcome Questionnaire. Psychological Reports, 77, 875-879.
Nebeker, R.S., Seely, K., Stephenson, A., & Lambert, M.J. (1996, April). Polynesian, Asian, and Caucasian differences on the Outcome Questionnaire. Paper presented at the annual meeting of the Rocky Mountain Psychological Association.
Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessments. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates.
Ogles, B.M., Lambert, M.J., & Masters, K.S. (1996). Assessing outcome in clinical practice. New York: Allyn & Bacon.
Park, K.B., Upshaw, H.S., & Koh, S.D. (1988). East Asian's responses to Western health items. Journal of Cross-Cultural Psychology, 21, 423-427.
Regier, D.A., Boyd, J.H., Burke, Jr., J.D., Rae, D.S., Myers, J.K., Kramer, M., Robins, L.N., George, L.K., Karno, M., & Locke, B.Z. (1988). One-month prevalence of mental disorders in the United States. Archives of General Psychiatry, 45, 977-986.
Richardson, L.M., & Austad, C.S. (1991). Realities of mental health practice in managed care settings. Professional Psychology: Research and Practice, 22, 52-59.
Sabin, J.E. (1991). Clinical skills for the 1990's: Six lessons from HMO practice. Hospital and Community Psychiatry, 42, 605-608.
Seligman, M.E.P. (1995). The effectiveness of psychotherapy: The Consumer Report's study. American Psychologist, 50, 965-974.
Strupp, H.H., Horowitz, L.M., & Lambert, M.J. (1997). Measuring patient changes in mood, anxiety & personality disorders: Toward a core battery. Washington, DC: American Psychological Association.
Sue, D., & Sue, S. (1987). Cultural factors in the clinical assessment of Asian Americans. Journal of Consulting and Clinical Psychology, 55, 479-487.
Sue, S., & Sue, D. (1974). MMPI comparisons between Asian-American and non-Asian students utilizing a student health psychiatric clinic. Journal of Counseling Psychology, 21, 423-427.


Tsushima, W.T., & Onorato, V.A. (1982). Comparison of MMPI scores of White and Japanese-American medical patients. Journal of Consulting and Clinical Psychology, 50(1), 150-151.
Umphress, V.J., Lambert, M.J., Smart, D.W., Barlow, S.H., & Clouse, G. (1997). Concurrent and construct validity of the Outcome Questionnaire. Journal of Psychoeducational Assessment, 15, 40-55.
Veit, C.T., & Ware, J.E. (1983). The structure of psychological distress and well-being in general populations. Journal of Consulting and Clinical Psychology, 51, 730-742.
Wells, M.G., Burlingame, G.M., Lambert, M.J., Hoag, M.J., & Hope, C.A. (1996). Conceptualization and measurement of patient change during psychotherapy: Development of the outcome questionnaire and youth outcome questionnaire. Psychotherapy, 33, 275-283.
Williams, C.L. (1987). Issues surrounding psychological testing of minority patients. Hospital and Community Psychiatry, 38, 184-189.
Ying, Y. (1988). Depressive symptomatology among Chinese-Americans as measured by the CES-D. Journal of Clinical Psychology, 44, 739-746.
Zautra, A.J. (1983). Social resources and quality of life. American Journal of Community Psychology, 11, 275-290.
Zung, W.W. (1965). A self-rating depression scale. Archives of General Psychiatry, 12, 63-70.
Zung, W.W. (1971). A rating instrument for anxiety disorders. Psychosomatics, 6, 371-379.


For Abby, Katie, and Shelby


Chapter 28
Primary Care Evaluation of Mental Disorders (PRIME-MD)

Steven R. Hahn
Albert Einstein College of Medicine and Jacobi Medical Center

Kurt Kroenke
Regenstrief Institute for Health Care and Indiana University School of Medicine

Janet B.W. Williams
Robert L. Spitzer
New York State Psychiatric Institute and Columbia University

The Primary Care Evaluation of Mental Disorders (PRIME-MD) is a two-stage case-finding and diagnostic instrument that was designed specifically for primary care clinicians in the general medical setting (Spitzer et al., 1994). Half of individuals with psychopathology who receive any medical care do so exclusively from primary care providers. Although one quarter of adult primary care patients have mental disorders, half or fewer of those disorders are detected, and those that are detected often receive suboptimal treatment. Admittedly, primary care patients with mental disorders may have fewer and milder symptoms, and less impairment than patients seen by mental health specialists. On the other hand, mental disorders encountered in primary care are associated with more functional impairment than most of the medical disorders that are typically the principal focus of general medicine.

There are many reasons why primary care physicians fail to detect and treat mental disorders. Deficiencies in both what physicians know how to do and how they use that knowledge contribute. Unlike new procedures or tests that supplant less efficient versions of what physicians already understand and use, the PRIME-MD was designed from the outset to change physicians' practice patterns and to remedy a knowledge deficit. This perspective dictated that the PRIME-MD had to be, on the one hand, a self-guiding, user-friendly educational tool that could effectively provide the knowledge of diagnostic criteria that primary care physicians lack. 
On the other hand, it also had to be a rapid, cost-effective procedure whose application would be consistent with the existing milieu of the primary care encounter. The generation of case-finding tools antecedent to the PRIME-MD, such as the Zung Self-rating Depression Scale (Zung, 1965), the General Health Questionnaire (GHQ; Goldberg & Hillier, 1978), and the Center for Epidemiological Studies Depression Scale (CES-D; Radloff, 1977), identified patients likely to have some mental disorder without making specific diagnoses. On the other hand, diagnostic tools capable of making specific diagnoses such as the Structured Clinical Interview for DSM-III-R (SCID; Spitzer, Williams, Gibbon, & First, 1992) and Diagnostic Interview Schedule (DIS; Robins et al., 1985) were far too complicated and time consuming to be compatible

with primary care practice. In contrast, the PRIME-MD was developed as a single procedure both to screen populations to determine who was at risk and to guide the clinician all the way to a specific DSM criteria-based diagnosis. Further, ease and efficiency of administration in the primary care setting were major aims.

The Prevalence and Health-Related Consequences of Mental Disorders

Prevalence

A number of studies dating from the late 1970s through the 1980s have examined the prevalence of common mental disorders in the primary care setting. Data from these studies have converged on the conclusion that major depressive disorder is present in 6% to 10% of primary care patients (Katon & Schulberg, 1992), and mental disorders meeting DSM-III or Research Diagnostic Criteria (RDC) are present in 20% to 26% of these patients. An important international study found a worldwide mean prevalence of eight common ICD-10 mental disorders of 21%, but also demonstrated considerable cross-cultural variability (8% in Shanghai, 53% in Santiago, Chile; Ormel et al., 1994; Sartorius et al., 1993; Ustun & Sartorius, 1996). The few studies examining psychiatric comorbidity demonstrated that patients frequently have more than one mental disorder (Coyne, Fechner-Bates, & Schwenk, 1994; Ormel et al., 1994; Zimmerman et al., 1994). The National Comorbidity Survey concluded that half of all lifetime mental disorders are accounted for by the 14% of the population having three or more disorders (R.C. Kessler et al., 1994).

Health-Related Outcomes

There is ample documentation that mental disorders have an adverse effect on health-related outcomes. Depression is associated with more impairment in health-related quality of life than most chronic medical diagnoses (Turner & Noh, 1988; Von Korff, Ormel, Katon, & Lin, 1992; Wells et al., 1989). Patients with psychopathology use more health services in the primary care setting than patients without mental disorders (Henk, Katzelnick, Kobak, Greist, & Jefferson, 1996; Karlsson, Lehtinen, & Joukama, 1995; Regier et al., 1988; Shapiro et al., 1984; Von Korff et al., 1992) and are less satisfied with care (Cherkin, Deyo, Street, & Barlow, 1996; Hansson, Borgquist, Nettelbladt, & Nordstrom, 1994; Hueston, Mainous, & Schilling, 1996; Wyshak & Barsky, 1995).

Detection of Mental Disorders

Measuring Physician Detection

Assessment of clinicians' ability to detect mental disorders in primary care must overcome several methodological challenges. The criteria for "detection" used in most studies have been inconsistent and have lacked diagnostic precision and rigor. Psychopathology has been labeled as detected if physicians believed any psychopathology was present, or if they identified broad categories of disorder such as depression (Katon & Von Korff, 1990; L.G.
Kessler, Cleary, & Burke, 1985). In some studies, the prescription of any psychotropic medication or provision of "counseling" was used as a surrogate for detection. In no study was specific diagnostic labeling required as a criterion of detection. Physician detection can be ascertained from chart review, which tends to underestimate rates of detection (Jenks, 1985; Katon & Von Korff, 1990), or by physician questionnaire, which alerts physicians to the purpose of the study and therefore tends to overestimate detection.

Rates of Detection

Despite these methodological problems, studies of physician detection converge on the conclusion that 30% to 60% of patients with mental disorders are not detected as having a mental disorder. Depression has been the most intensively studied of the mental disorders. Using several criteria for detection of depression, Kirmayer, Robbins, Dworkind, and Yaffe (1993) found that rates ranged from 24% to 67% depending on the rigor of the criterion used. Coyne, Schwenk, and Fechner-Bates (1995) discovered that family physicians endorsed a diagnosis of depression in 35% of patients with major depression and 28% of patients with any depressive disorder. Simon and Von Korff's (1995) study of 2,000 patients found that two thirds of depressed patients were recognized as distressed and half were prescribed medications.

Physician Factors Influencing Detection

Physician interviewing style has been shown to influence detection of mental disorders. Both the content of the interview and the communication process have an impact on the likelihood of detection. In Badger et al.'s (1994a) study of physician interviewing, inquiry regarding specific symptoms of depression was positively correlated with detection of depression. However, elicitation of specific depressive symptoms was generally low. Most physicians elicited only three symptoms, and physicians never made their diagnosis on the basis of complete DSM criteria (i.e., depressed mood or anhedonia and a total of five symptoms). Failure to elicit information regarding all the symptoms relevant to the diagnosis of psychopathology is presumed to be due in part to deficits in physicians' knowledge of these criteria. Indeed, studies have demonstrated that primary care physicians have incomplete knowledge of the diagnosis and management of mental disorders (Cohen-Cole et al., 1982; Penn, Boland, McCartney, Kohn, & Mulvey, 1997). Robbins, Kirmayer, Cathébras, Yaffe, and Dworkind (1994) demonstrated that knowledge that psychological problems influence physical illness correlated positively with physicians' detection of mood and anxiety disorders.

Physicians' skills have also been associated with detection. Skill in using "patient-centered" communication and in gathering a lot of information in general, as well as sensitivity to nonverbal communication, are positively associated with detection (Badger et al., 1994a, 1994b; Goldberg, Jenkins, Millar, & Faragher, 1993; Robbins et al., 1994; Roter et al., 1995).

Whereas interview content and process are consistently related to detection, the effect of physicians' attitudes is less clear. In one study, self-rated interest in psychosocial issues did not correlate with desirable interviewing style (Badger et al., 1994b). In another study, sensitivity to patients' emotions correlated negatively with detection of mental disorders (Robbins et al., 1994). A tendency to blame depressed patients for causing their illness has been shown to correlate with nondetection (Robbins et al., 1994). Main
et al. (1993) confirmed that clinicians' perceptions of the importance of detecting depression in their practice were related to a multitude of attitudes, including physicians' emotional discomfort dealing with depression, their perception that patients would be uncomfortable discussing depression, perceived self-efficacy and satisfaction in treating depression, and the perceived time and effort required for treatment.

Patient Factors Influencing Detection

Few studies have examined patient characteristics associated with physician detection of mental disorders in primary care, and virtually all have examined only mood disorders. Unsurprisingly, the most consistent observation has been that detection is better when psychiatric symptomatology is overt and more severe (Badger et al., 1994a; Coyne et al., 1995; Freeling, Rao, Paykel, Sireling, & Burton, 1985; Schwenk, Coyne, & Fechner-Bates, 1996). Three studies using DSM-IV Global Assessment of Functioning scores (GAF; Spitzer, Gibbon, Williams, & Endicott, 1996) demonstrated that undetected depressed patients have milder impairment (Coyne et al., 1994, 1995; Schwenk et al., 1996). The GAF relies heavily on symptom severity in assessing functional status, so the association between functional impairment and detection may be confounded by symptom severity.

It has commonly been taught that underlying depression is often missed when anxiety, which may be more obvious, is present (Bridges & Goldberg, 1987; Paykel & Priest, 1992; Rodin, Craven, & Littleford, 1991). Coyne et al. (1994) demonstrated the opposite: Comorbid anxiety was twice as prevalent in detected as in undetected depressed patients (58% vs. 27%).

The presence of multiple physical symptoms and comorbid medical conditions has also traditionally been described as a barrier to detection of mental disorders, resulting in so-called masked depression. The true relation between physical symptoms and detection probably depends both on the number of symptoms and on patients' willingness to accept psychological explanations for their symptoms and distress. Kirmayer et al. (1993) demonstrated that the likelihood of detection in patients with anxiety or mood disorders increased with the total number of unexplained physical symptoms and with hypochondriacal worry, but was decreased and delayed in patients reluctant to attribute their symptoms to psychological distress. Several studies suggest that patients' resistance to psychiatric labeling decreases detection and stems from fear of stigmatization and confusion about the implications of the diagnosis (Dew, Dunn, Bromet, & Schulberg, 1988; Olfson, 1991; Paykel & Priest, 1992).

Kirmayer et al. (1993) found that age had no effect on recognition, male gender delayed detection initially but had no effect over a 12-month period, and detection correlated positively with level of education. Other studies have suggested that female patients are more likely to have mental disorders diagnosed, but with some risk for false positive attribution (Cleary, Burns, & Nycz, 1990).

Systems Factors. Physicians frequently cite lack of time as one of the most important obstacles to detection (Main et al., 1993; Orleans, George, Houpt, & Brodie, 1985; Rost, Humphrey, & Kelleher, 1994). The adequacy and availability of mental health specialists and services may also affect primary care physicians' efforts to detect mental disorders (Klinkman, 1997). Moreover, the adverse effect of payment policies that preclude reimbursement for the treatment of mental disorders in the primary care setting is a major disincentive to detection and treatment (Glass, 1995; Hirschfeld et al., 1997), and
will be more important for economically disadvantaged patients who have greater difficulty obtaining mental health services from specialists. Rost, Smith, Matthews, and Guide (1994) documented nonreimbursement for care of mental disorders as a reason for deliberately misdiagnosing depression in primary care.

Overview

PRIME-MD Investigators

The PRIME-MD was developed by a team of investigators headed by Robert Spitzer and Janet Williams, whose previous accomplishments included pioneering work as senior editors of the American Psychiatric Association's Diagnostic and Statistical Manual (DSM), versions III and III-R. This work established fundamental principles of psychiatric nosology and classification used today. In addition, Spitzer and Williams developed the Structured Clinical Interview for DSM (SCID), a comprehensive diagnostic procedure designed to be used by a trained mental health provider (Spitzer et al., 1992). Other members of the PRIME-MD investigatory team were primary care internists and family physicians working in academic primary care training programs. In addition to being themselves representative of the end users of the PRIME-MD, members of the PRIME-MD team had extensive experience as clinical investigators of topics related to the epidemiology, diagnosis, and management of mental disorders in primary care settings and in teaching psychiatric and behavioral science to medical trainees.

Initial Development of the PRIME-MD

Development of the PRIME-MD incorporated the following assumptions based on the epidemiological and educational research already summarized:

1. Mental disorders are common.
2. Many patients have more than one type of mental disorder.
3. Mental disorders often are undetected.
4. Case finding that identifies patients at high risk for a mental disorder has an inadequate impact on diagnosis and treatment; the procedural endpoint should be a diagnosis.
5. Physician acceptance requires an instrument that is easy to use, rapid, and focused on common and important disorders.

The first version of the PRIME-MD was developed over an 8-month period during which preliminary versions were administered to 450 patients at seven primary care sites. Tested instrument items were discussed and revised in weekly conference calls, resulting in the final version of the PRIME-MD that was validated in the PRIME-MD 1,000 Study (Spitzer et al., 1994). Further minor modifications, described later, were made after the validation study to reflect differences in DSM-IV criteria and to streamline application.

General Considerations and Description of the PRIME-MD

Design of the PRIME-MD began with a two-stage "screen/case-find and diagnose" procedure accomplishing the second and critical third steps of the continuum of care. This design was based on the expectation that, in contrast to screening or case-finding
procedures, the specific diagnostic endpoint of the PRIME-MD would be the beginning of a self-sustained process of evaluation and treatment of the diagnosed condition. Although the objective of achieving a specific diagnosis determined the minimum length and complexity of the procedure, the need to create an acceptably brief instrument set limits on the level of diagnostic detail included, and on the extent to which evaluation of diagnosed disorders could be included.

The screening/case-finding component was to be a self-administered, paper-and-pencil self-report that could be administered in the waiting room prior to the clinical encounter. The screen needed both to identify patients in need of further evaluation and to enhance efficiency by limiting further evaluation to specific categories of disorder. The PRIME-MD diagnostic component was conceived as a branching-logic, physician-administered interview subdivided into modules addressing the categories of disorder screened by the patient self-report.

Criteria for the inclusion of a diagnosis in the PRIME-MD should ideally follow those established for screening or case finding in general (Campbell, 1987; Frame, 1986; Schwenk, 1996):

The condition must be sufficiently common and have an important impact on health-related quality of life or mortality.
The case-finding/screening procedure must be accurate and have an acceptable risk and cost.
Screening or case finding must improve outcomes compared to waiting for the disorder to become more apparent.
Treatment of the condition at the stage of screening or case finding must be available, acceptable, and effective.

In addition to excluding uncommon or trivial conditions, and in the interest of producing an acceptably brief and efficient instrument, the PRIME-MD would not include:

Protocols for subtypes of conditions or the secondary evaluation of the diagnosed conditions.
Diagnoses that would be detected in the course of the evaluation of comorbid conditions already included in PRIME-MD (e.g., posttraumatic stress disorder, which is usually accompanied by one of the included mood or anxiety disorders).
Conditions that are readily detected (if not accurately diagnosed) by means already routinely employed in primary care practice (e.g., conditions producing thought disorders that are apparent in the course of an ordinary medical interview).

The resulting instrument addresses five categories of mental disorders (listed in Table 28.1). The first component of the PRIME-MD is a one-page Patient Questionnaire (PQ), completed by the patient before seeing the physician (see Fig. 28.1). The second component is the Clinician Evaluation Guide (CEG), a structured interview administered by the physician (see Fig. 28.2). The PQ is used by the physician to determine which, if any, of the five "modules" of the CEG should be administered to the patient.

Patient Questionnaire. The Patient Questionnaire consists of 25 yes-no questions about symptoms and signs present during the previous month and a single question about the patient's overall health. The questions are divided into five groups corresponding to the five categories of mental disorder assessed in the PRIME-MD. The PQ is designed as a paper-and-pencil self-report, but can also be administered by clerical or nursing personnel or by the physician.

The first group of PQ items addresses 15 of the most common physical symptoms (excluding upper respiratory symptoms) encountered in primary care (Kroenke, Arrington, & Mangelsdorff, 1990; Schappert, 1992). The PQ begins with physical symptoms in the belief that patients visiting their primary care provider would be most comfortable
with these items and would perceive the provider as being interested in their overall health, both physical and emotional. Originally, three or more symptoms from this group were used to trigger administration of the somatoform module of the CEG. However, a seven-symptom threshold has recently been shown to identify most patients with a clinically significant somatoform disorder (Kroenke, Spitzer, deGruy, & Swindle, 1998). The original PQ contained a 16th item in this group that screened for hypochondriacal concerns, a diagnosis eliminated in the final version.

TABLE 28.1
Prevalence of Selected Psychiatric Disorders Detected by PRIME-MD in 1,000 Primary Care Patients

Mental Disorder | Total Sample, No. (%) | Site Range, %
Any psychiatric diagnosis | 386 (39) | 30-52
  Any DSM-IV threshold diagnosis | 257 (26) | 18-38
  Subthreshold only | 129 (13) | 10-14
Any mood disorder | 260 (26) | 19-35
  Major depressive disorder | 115 (12) | 7-19
  Dysthymia | 78 (8) | 5-15
  Partial remission or recurrence of major depression | 63 (6) | 4-9
  Minor depressive disorder | 64 (6) | 2-9
  Rule out depressive disorder due to physical disorder, medication, or other drug | 24 (2) | 2-4
  Rule out bipolar disorder | 8 (1) | <1-1
Any anxiety disorder | 178 (18) | 10-25
  Anxiety not otherwise specified | 90 (9) | 7-13
  Generalized anxiety disorder | 70 (7) | 2-13
  Panic disorder | 36 (4) | 1-6
  Rule out anxiety disorder due to physical disorder, medication, or other drug | 19 (2) | 1-3
Any somatoform disorder | 139 (14) | 9-29
  Multisomatoform disorder | 82 (8) | 4-18
  Somatoform disorder not otherwise specified | 42 (4) | 2-9
  Hypochondriasis | 22 (2) | <1-5
  Somatoform pain disorder | 8 (1) | 1-1
Probable alcohol abuse | 51 (5) | 3-7
Any eating disorder | 30 (3) | 1-7
  Binge eating disorder | 30 (3) | 1-7
  Bulimia nervosa | 1 (<1) | 0-<1
  Eating disorder not otherwise specified | 1 (<1) | 0-<1

Note. Adapted with permission from Spitzer, Williams, Kroenke et al. (1994) JAMA, 272, 1749-1756. Copyright © 1994, American Medical Association.

The somatoform PQ items are followed by a single screening question for eating disorders; a positive response triggers administration of the eating disorders module of the CEG. Mood disorders are addressed in two items that ascertain the presence of depressed mood or anhedonia, the two DSM "A criteria" for the diagnosis of mood disorders. A positive response to either of these items triggers the mood module of the CEG. Anxiety disorders are addressed in three items: one each for the cognitive and the physical symptoms of generalized anxiety, and a single item screening for panic attacks. A single positive response to any of these items is an indication that the anxiety module should be administered. Four items address alcohol use, the first three of which are taken from the CAGE Questionnaire (Ewing, 1984). The fourth alcohol screening item substitutes a more
sensitive question about consuming more than five drinks in one day for the least sensitive CAGE question (taking an "eye-opener," i.e., consumption to counteract symptoms of withdrawal or hangover). Again, a positive response to any one of the four items triggers the alcohol module of the CEG.

Fig. 28.1. The PRIME-MD Patient Questionnaire (PQ). Copyright © 1996, Pfizer Inc. All rights reserved. Reproduced with permission.

In answering most PQ items, patients are instructed to answer yes only if they have "A LOT been bothered during the last month" (capitalization in the PQ). The only
exceptions are the anxiety attack item and the four alcohol items, which do not require that the symptom be present "often." The last question on the PQ addresses the patient's perception of their health. This item was included because a discrepancy between patient and physician perception of health status can be a clue to hidden psychopathology (Olfson, Gilbert, Weissman, Blacklow, & Broadhead, 1995).

Fig. 28.2. The first page of the PRIME-MD Clinician Evaluation Guide (CEG) mood module. Copyright © 1996, Pfizer Inc. All rights reserved. Reproduced with permission.
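
The way PQ responses select CEG modules can be summarized in a short sketch. It is an illustration only and not part of the published instrument: the class, field, and function names are hypothetical, and the only rules encoded are the item counts and trigger thresholds given in the text (15 physical symptom items with a threshold of three, or seven under the later recommendation; one eating item; two mood items; three anxiety items; four alcohol items). Triggered modules are returned in the order in which the CEG presents them (mood, anxiety, eating, alcohol, somatoform).

```python
from dataclasses import dataclass
from typing import List

# Order in which the CEG presents its modules, as stated in the chapter.
CEG_ORDER = ["mood", "anxiety", "eating", "alcohol", "somatoform"]


@dataclass
class PatientQuestionnaire:
    """Yes/no answers to the 25 PQ screening items (hypothetical field names)."""
    physical_symptoms: List[bool]  # 15 physical symptom items
    eating: bool                   # single eating disorder item
    mood: List[bool]               # 2 items: depressed mood, anhedonia
    anxiety: List[bool]            # 3 items: 2 generalized anxiety, 1 panic attack
    alcohol: List[bool]            # 4 items


def triggered_modules(pq: PatientQuestionnaire,
                      somatoform_threshold: int = 3) -> List[str]:
    """Return the CEG modules indicated by the PQ, in CEG order.

    somatoform_threshold defaults to the original cutoff of three
    physical symptoms; pass 7 for the later, stricter cutoff.
    """
    triggered = set()
    if any(pq.mood):
        triggered.add("mood")
    if any(pq.anxiety):
        triggered.add("anxiety")
    if pq.eating:
        triggered.add("eating")
    if any(pq.alcohol):
        triggered.add("alcohol")
    if sum(pq.physical_symptoms) >= somatoform_threshold:
        triggered.add("somatoform")
    return [module for module in CEG_ORDER if module in triggered]


# Example: three physical symptoms and depressed mood endorsed.
example = PatientQuestionnaire(
    physical_symptoms=[True] * 3 + [False] * 12,
    eating=False,
    mood=[True, False],
    anxiety=[False, False, False],
    alcohol=[False, False, False, False],
)
print(triggered_modules(example))
print(triggered_modules(example, somatoform_threshold=7))
```

Making the somatoform cutoff a parameter keeps both thresholds mentioned in the text available without changing the rest of the logic.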

The Clinician Evaluation Guide. The CEG is a structured interview guide, divided into five modules addressing each category of mental disorder. The diagnostic data and decisions are based on DSM-IV criteria. The sequence in which the modules should be administered, indicated by their position in the CEG, was established to maximize efficiency by capitalizing on the redundancy of some items in the mood and anxiety modules. The chosen sequence (mood, anxiety, eating, alcohol, and somatoform) also makes it possible to determine if physical symptoms are due to a mood or anxiety disorder so that they are not also counted when considering somatoform disorders (as per DSM-IV criteria). In each module entered, the physician continues from item to item unless instructed to skip items or exit from the module. The anxiety module consists of two subsections: The items pertaining to the diagnosis of panic disorder are addressed only if the PQ question about panic attacks was endorsed; otherwise, the anxiety module begins with questions designed to determine if generalized anxiety disorder is present. The CEG takes an average of 8.5 minutes to administer.

Languages. Versions of the PRIME-MD are available in several languages, including four Philippine dialects, Afrikaans, Spanish, French, and Chinese (Pang, Chao, Fabb, Lai, & Leung, 1997; Pang, Chao, Fabb, Leung, Ng, & Yeung, 1997).

Automated Forms of the PRIME-MD

The Patient Problem Questionnaire: A Self-Administered Version of the PRIME-MD. The PRIME-MD was designed to be a cost-efficient method of diagnosing mental disorders, and in most respects it meets that objective admirably. However, the quest for greater efficiency in the primary care setting has become relentless in the current health care marketplace. Although the 8.5 minutes per patient required on average to administer the PRIME-MD does not seem like a lot of time to invest in the detection and diagnosis of mental disorders, it represents one half to one third of the time typically allocated for the routine primary care encounter in internal medicine. It is also the case that half of the CEG interviews performed will not be rewarded with a diagnosis. Furthermore, administering the PRIME-MD is just the beginning of a process that includes evaluation, treatment planning, and patient education, all of which also take time.

The use of midlevel providers to administer the PRIME-MD on behalf of the primary physician as described here is one strategy for enhancing the efficiency of the PRIME-MD. Another strategy is to administer not only the PQ screen, but the CEG itself as a patient self-report. A newly developed Patient Problem Questionnaire (PPQ) has been designed to do just that (Spitzer, Williams, & Kroenke, 1997). The PPQ is three pages long and contains questions similar to those found in the CEG. To simplify administration, several mood and anxiety disorders have been combined into single categories. Dysthymia, major depression in partial remission or recurrence, and minor depression have been combined into "other depression." Generalized anxiety disorder and anxiety not otherwise specified are combined into a single category, "other anxiety disorder." Because the PPQ relies upon self-report, diagnoses must be confirmed by the physician. However, instead of taking an average of 8.5 minutes of physician time to take an inventory of symptoms, the PPQ enables the physician to accomplish the symptom-focused part of the interview much more rapidly, allowing more time for patient-centered evaluation and treatment intervention. Base rates of diagnoses obtained using the PPQ in preliminary evaluations are similar to those obtained using the PRIME-MD (Spitzer et al., 1997).
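
The collapsing of PRIME-MD diagnoses into the broader PPQ categories described above amounts to a simple lookup. The sketch below is illustrative only: the dictionary and function names are hypothetical, the diagnosis labels are written as they appear in this chapter, and diagnoses the text does not describe as merged are assumed to keep their own names.

```python
# Groupings taken from the text; the names used here are hypothetical.
PPQ_CATEGORY = {
    "dysthymia": "other depression",
    "major depression in partial remission or recurrence": "other depression",
    "minor depression": "other depression",
    "generalized anxiety disorder": "other anxiety disorder",
    "anxiety not otherwise specified": "other anxiety disorder",
}


def ppq_category(diagnosis: str) -> str:
    """Map a PRIME-MD diagnosis label to its PPQ reporting category.

    Labels are compared case-insensitively; a diagnosis that is not
    merged into a combined category is returned unchanged (lowercased).
    """
    key = diagnosis.strip().lower()
    return PPQ_CATEGORY.get(key, key)


print(ppq_category("Dysthymia"))
print(ppq_category("Panic disorder"))
```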

Computer-Administered Telephone PRIME-MD. Kobak and his colleagues developed a version of the PRIME-MD that uses interactive voice response (IVR) technology to administer a computerized version by telephone (Kobak et al., 1997). In their validation study, a convenience sample of 200 subjects selected to ensure high prevalence of PRIME-MD disorders was interviewed by a computer-driven system that enables patients to record their responses to questions using a Touch-Tone phone. The IVR-PRIME-MD uses the same branching structure as the PRIME-MD, but includes questions for social phobia and obsessive-compulsive disorder, and excludes the somatoform module. The operating characteristics and the validity of the IVR-PRIME-MD were compared to the PRIME-MD administered by a primary care physician during a clinical encounter, and to the SCID administered by a trained mental health practitioner during a phone interview. Using the SCID as the diagnostic standard, the IVR-PRIME-MD demonstrated high specificity (79%), sensitivity (88%), and overall accuracy (κ = .66) for the presence of any diagnosis. These operating characteristics were comparable to those of the PRIME-MD administered face-to-face by a trained primary care clinician (83%, 88%, and .70, respectively). The IVR-PRIME-MD diagnosed more alcohol abuse in primary care patients than did the clinician-administered PRIME-MD. Comparative data on patient and physician acceptance and satisfaction with the three different procedures were not reported.

Disorders Included in the PRIME-MD

Although there may be some debate regarding the extent to which mood, anxiety, and alcohol disorders meet all the indications for screening (Schwenk, 1996; U.S. Preventive Services Task Force, 1996), their inclusion in the PRIME-MD was justified by their prevalence, associated morbidity, and potential for treatment.

Mood Disorders. Major depressive disorder, major depression in partial remission or recurrence, dysthymic disorder, bipolar disorder, and minor depression were selected for inclusion in PRIME-MD. A qualifying diagnosis of secondary mood disorder can be appended to any of the aforementioned conditions. Subtypes of major depressive disorder and bipolar disorder were not included based on a consensus that the two more general diagnoses were appropriate endpoints for a diagnostic process that would still require an evaluation extending beyond the objectives of the PRIME-MD as a case-finding and diagnostic tool.

Anxiety Disorders. Among the anxiety spectrum disorders, panic disorder, generalized anxiety disorder, anxiety not otherwise specified (NOS), and a qualifying diagnosis of secondary anxiety disorder were included. Posttraumatic stress disorder (PTSD) and obsessive-compulsive disorder (OCD) were not included in PRIME-MD because they are relatively less common, and should be detected during the evaluation of the already included mood and anxiety disorders that are typically comorbid with PTSD and OCD.

Alcohol Abuse. The endpoint selected for the alcohol abuse/dependence module was a conditional diagnosis, that is, "probable alcohol abuse or dependence." Whereas the administration of the PRIME-MD Alcohol module represents a more intensive evaluation than is typically employed by the small group of primary care providers who do ask more than one or two quantitative questions about alcohol consumption, the task
of reaching definitive conclusions about alcohol abuse or dependence and distinguishing between the two goes beyond what can be accomplished with the PRIME-MD. Alerting the physician and patient to the presence of a potential problem with alcohol, even if it is not known whether full criteria for dependence or abuse are present, is probably worthwhile despite the potential adverse effect of labeling the patient. It is now believed that even lower levels of alcohol use, so-called problem drinking, may warrant physician intervention in the form of counseling about the dangers of excess alcohol use and consideration of referral for self-help or other treatment.

Somatoform Disorders. Despite confusion regarding appropriate diagnostic criteria in the primary care setting and less consensus about effective management strategies, somatoform disorders were included in the PRIME-MD for several reasons. First, somatoform disorders are important in explaining many physical symptoms encountered in primary care patients for which there is no adequate organic explanation (Kroenke, Arrington, & Mangelsdorff, 1990; Schappert, 1992). Second, somatoform disorders, by some definitions such as Escobar's abridged somatization disorder, have been demonstrated to have a significant prevalence and impact on health-related quality of life (Escobar, Burnman, Karno, Forsythe, & Golding, 1987). Third, somatization has been demonstrated to present a serious challenge to the doctor-patient relationship and is one of the principal causes of physician-experienced distress (Hahn, Thompson, Stern, Budner, & Wills, 1994). Fourth, interventions with somatizing patients have been demonstrated to have an impact on health care costs, utilization of resources, and possibly functional status (Smith, Monson, & Ray, 1995; Smith, Rost, & Kashner, 1995). Fifth, an effective case-finding and diagnostic procedure for somatoform disorders would be of value in investigating the epidemiology, diagnosis, and management of these disorders, even in the absence of current consensus on management strategies.

The choice of which specific somatoform disorders to include was one of the most challenging in the development of the PRIME-MD. Somatization disorder is relatively rare in primary care (1% to 4%) and fails to capture many patients with clinically significant somatization. In addition, the diagnosis of somatization disorder requires the assessment of the lifetime prevalence of 35 symptoms and a complex decision algorithm, which was considered far too cumbersome, especially for so low a yield. On the other hand, undifferentiated somatoform disorder (and somatoform pain disorder, a minor variation) requires the presence of only one unexplained symptom and was therefore felt to be too inclusive. Escobar's abridged somatization disorder was closer to the kind of diagnosis felt to be useful, but it too required assessment of lifetime prevalence of many symptoms and had different diagnostic criteria for males and females (Escobar et al., 1987).

It was decided to use "multisomatoform disorder" despite the fact that it is not an official DSM diagnosis. A diagnosis of multisomatoform disorder is made when three or more current symptoms whose presence or severity is not explained by organic disease are associated with disability and a pattern of unexplained symptoms for 2 or more years. This diagnosis has the advantages of not requiring the assessment of lifetime incidence of symptoms from a lengthy list, identical criteria for male and female patients, and the likelihood of identifying the group of patients whose health-related quality of life and medical care is influenced by somatoform symptoms (Kroenke, Spitzer et al., 1997).
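
The three conditions in the multisomatoform definition can be written as a single decision rule. The following is a minimal sketch under stated assumptions: the class, field, and function names are hypothetical, and the rule captures only the criteria as worded in this chapter, not the full clinical judgment about whether a symptom is medically explained.

```python
from dataclasses import dataclass


@dataclass
class SomatoformFindings:
    """Clinician findings relevant to the rule (hypothetical field names)."""
    unexplained_current_symptoms: int     # current symptoms not explained by organic disease
    associated_disability: bool           # symptoms are associated with disability
    years_of_unexplained_symptoms: float  # duration of the pattern of unexplained symptoms


def meets_multisomatoform_criteria(findings: SomatoformFindings) -> bool:
    """Apply the chapter's wording: at least three unexplained current
    symptoms, associated disability, and a pattern lasting 2 or more years."""
    return (
        findings.unexplained_current_symptoms >= 3
        and findings.associated_disability
        and findings.years_of_unexplained_symptoms >= 2
    )


print(meets_multisomatoform_criteria(SomatoformFindings(4, True, 3.0)))
print(meets_multisomatoform_criteria(SomatoformFindings(4, True, 1.0)))
```

Because the rule uses only current symptoms and a duration, it needs no lifetime symptom inventory and is identical for male and female patients, which are the advantages the text attributes to this diagnosis.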
Somatoform pain disorder and somatoform disorder NOS were included in the original version of the PRIME-MD to assess the prevalence and impact of lower levels of somatization. Hypochondriasis as a distinct preoccupation with specific diseases
whose absence cannot be accepted by the patient despite adequate testing and evidence was also included. Because their prevalence was low, hypochondriasis and somatoform pain disorder are not included in the current version of the PRIME-MD.

Eating Disorders. Recent studies have indicated that eating disorders are common in the general population. Primary care providers have an important role to play in detecting and managing these conditions (National Institute of Mental Health, 1993; Spitzer et al., 1993). PRIME-MD includes the diagnoses of binge eating disorder (not a DSM diagnosis), bulimia nervosa, and eating disorder, not otherwise specified.

Validation: The PRIME-MD 1,000 Study

Purpose

The PRIME-MD 1,000 Study was designed to test the utility and validity of the PRIME-MD by answering the following questions:

Criterion Validity.
1. Are the frequencies of mental disorders found by PRIME-MD comparable to those observed in other primary care samples using structured but much longer diagnostic interview schedules administered by mental health professionals?
2. Do diagnoses made by primary care providers using PRIME-MD agree with those made independently by mental health professionals?

Convergent Validity.
3. Do patients with PRIME-MD mental disorders have significant functional impairment and greater health care utilization compared with patients without PRIME-MD diagnoses?
4. Is there a substantial relation between physician-generated PRIME-MD diagnoses of mood, anxiety, and somatoform disorders and the patient scores on corresponding self-rated symptom severity scales?

Utility.
5. What is the average amount of time required by the primary care provider to complete the PRIME-MD evaluation?
6. Does the use of PRIME-MD increase recognition of mental disorders by primary care providers?
7. Do primary care providers find the information obtained with the PRIME-MD of value in understanding and treating their patients, and does this affect their treatment?
8. Are patients comfortable having their primary care providers ask them questions about psychological symptoms, and do they believe their answers will be helpful to their physicians in understanding and treating their problems?

Methods

Sites and Selection of Subjects. The study was conducted at the four primary care sites shown in Table 28.2. To ensure accurate estimates of true prevalence for the rarer PRIME-MD diagnoses, whose frequencies were approximately 4% during the development phase, 1,000 subjects were enrolled in the study (this sample size would allow the prevalence estimates to have a narrow 95% confidence interval of approximately ± 1%). From January 1992 to March 1993, 1,360 patients were approached in order to obtain the desired sample size. Reasons for nonparticipation included: the patient had
already been evaluated with PRIME-MD during the developmental phase (n = 109), did not desire to participate (n = 89), was unable to speak English (n = 81), was too ill or frail (n = 53), or did not participate for other reasons (n = 28). The first 369 patients were selected by convenience, but independently of the participating physicians' suspecting or knowing that a patient had any psychopathology. The remaining 631 patients were selected using site-specific methods to avoid sampling bias (see Table 28.2). The convenience sample did not differ significantly from the systematically sampled subjects in age, sex, ethnicity, education, functional status, or the frequency of PRIME-MD diagnoses.

TABLE 28.2
PRIME-MD 1,000 Study Sites

| Site (N) | Description | Sampling Method* |
| New England Medical Center General Medical Associates, Boston, MA (228) | Hospital-based group practice | All patients during a selected clinic session |
| Jacobi Medical Center/Albert Einstein College of Medicine (formerly Bronx Municipal Hospital Center), Bronx, NY (293) | City hospital clinic | Every third patient until physician's quota reached |
| Walter Reed Army Medical Center General Medicine Clinic, Bethesda, MD (303) | Clinic for both active-duty and retired military personnel and their families | Consecutive patients during a clinic session until physician's quota reached |
| University of Alabama College of Medicine, Mobile, AL (176) | Family practice clinic | Two consecutive patients per session, the first one chosen at random from among the first five patients |

* First 369 subjects sampled by convenience, remaining 631 using method indicated to avoid sampling bias.

A total of 31 primary care providers participated in the study (including four of the authors). Their mean age was 40 years (SD = ± 9.1), 60% were male, and the average number of years of practice since residency was 10 (SD = ± 9.5). Physicians at three of the sites (76% of the 31 physicians) were trained in internal medicine. The remaining physicians, all at the University of South Alabama, were trained in family medicine. All physicians participated in a 1- to 3-hour training session on the use of PRIME-MD led by one of the authors.

Data Collection.
Prior to the clinical encounter, patients completed the PQ as well as measures of symptom severity, functional status, and utilization of health care services (see Table 28.3). Physicians received the PQ at the beginning of the visit, and examined it during the visit at a time of their choice. They administered any CEG modules triggered by positive responses. Before examining the PQ, physicians indicated whether they believed the patient to be currently suffering from a mood, anxiety, alcohol abuse/dependence, eating, or somatoform disorder based on their prior knowledge of and interaction with the patient up to that point. After completing any CEG modules to be administered, physicians indicated which of eight medical conditions (plus a write-in "other") the patient had, their assessment of the value of the PRIME-MD, and the time they began and ended the PRIME-MD assessment. After one third of the patients were entered, the 10-item Difficult Doctor Patient Relationship Questionnaire (DDPRQ-10) was added to the protocol and completed after the PRIME-MD CEG. The DDPRQ-10 evaluates the physician's subjective response to the patient and the process of care, assessing the extent to which the patient
is experienced as "difficult" by the provider (Hahn et al., 1994). Midway through the study, two items assessing current and planned treatment with psychotropic medications and referral were added.

TABLE 28.3
Data Collected in the PRIME-MD 1,000 Study

| Instrument/item | Description (ref) |
| Patient Completed | |
| PRIME-MD Patient Questionnaire (PQ) | 27-item* case-finding/screening self-report for mood, anxiety, alcohol, eating, and somatoform disorders |
| Medical Outcomes Study Short-form General Health Survey (SF-20) | 20-item health-related quality of life on six dimensions rated 0 to 100; 100 = best health (Stewart et al., 1988) |
| Zung Depression Scale | 20-item self-report depression screen (Zung, 1965) |
| Zung Anxiety Scale | 20-item self-report anxiety screen (Zung, 1971) |
| Somatic Symptom Inventory | 20-item self-report somatoform symptom assessment (Wyshak, Barsky, & Klerman, 1991) |
| Health care utilization | 3-item self-report: 2 on number of recent visits, 1 on satisfaction with care during last 3 months |
| Disability days | Number of days in last 3 months |
| Physician Completed | |
| The PRIME-MD Clinician Evaluation Guide (CEG) | Modular structured interview for diagnosis of mood, anxiety, alcohol, eating, and somatoform disorders |
| Familiarity with patient | 1-item: "not at all," "somewhat," and "fairly well" |
| Knowledge of patient's mental disorders prior to PRIME-MD | 5-item yes-no assessment of the presence of mood, anxiety, alcohol, eating, and somatoform disorders, completed before examining PQ |
| Time for the PRIME-MD | Time required to examine PQ and complete CEG |
| Comorbid physical disorders | 9-item yes-no assessment of hypertension, arthritis, cancer, and heart, diabetes, liver, renal, pulmonary, and "other" diseases |
| Perceived value of PRIME-MD | 1-item, 5-point Likert (1 = "not at all valuable" to 5 = "very valuable") |
| Treatment, current and planned | Added midway through the study |
| Difficult Doctor Patient Relationship Questionnaire-10 | 10-item self-report assessing physician's subjective response to patient and process of providing care. Administered to last 627 subjects (Hahn et al., 1994) |
| Self-assessed interest and training in psychiatric diagnosis | 2 self-report Likert items (collected once for each participating physician) |
| Mental Health Professional Telephone Interview | Telephone interview at 3 of the 4 study sites within 48 hours of PRIME-MD. Initiated midway through study; completed by 431 of the 539 eligible |
| Repeat of PRIME-MD CEG mood, anxiety, alcohol, and eating disorders modules | Modular structured interview for diagnosis of mood, anxiety, alcohol, and eating disorders. Somatoform module eliminated from phone assessment because data on medical conditions unavailable to phone interviewer |
| SCID open-ended probes | Open-ended questions assessing functioning, mood, psychosocial stressors. Ambiguous responses systematically explored (Spitzer et al., 1992) |
| Patient's comfort with and perceived value of the CEG | 2 items, administered to patients who were asked questions from the CEG by their provider |

* The version of PRIME-MD PQ used in the validation study had 26 symptom-related items and 1 self-report assessment of general health; the current version of the PRIME-MD PQ contains 25 symptom items and 1 general health item.

Approximately 8% of the items on the patient validation study questionnaires and 3% of the CEG nondiagnostic items were not completed. For each SF-20 scale and for the three symptom severity scales, total scores were estimated if information was available
for more than 60% of the scale items. After these estimates, results remained missing for 7% to 12% of subjects, depending on the scale.

To assess whether primary care providers using PRIME-MD make diagnoses that conform to DSM-III-R criteria, a telephone interview was conducted at three of the four study sites within 48 hours of the PRIME-MD visit. Telephone assessments were performed by a PhD clinical psychologist or senior psychiatric social worker who was unaware of the results of the PRIME-MD evaluation, using a semistructured interview that included the CEG questions and several open-ended questions about overall functioning, mood, recent stressors, and problems with work or family. The open-ended questions, taken from the SCID (Spitzer et al., 1992), were used to screen and probe for psychopathology that might not otherwise be elicited, and as in the standard administration of the SCID, interviewers were specifically instructed to explore ambiguous responses. Somatoform diagnoses were not assessed because the interviewer did not have access to information regarding possible physical illness as a cause of physical symptoms. Because the mental health practitioners had special training in evaluating psychopathology, and the interview, compared with the CEG, was more like a psychiatric clinical interview, the mental health practitioner assessment can be regarded as a diagnostic criterion standard for assessing the validity of the primary care providers' PRIME-MD diagnoses. The mental health practitioner telephone interview, initiated midway through the study, was completed by 431 of the 539 eligible subjects. Nonparticipation in the telephone reinterview was due to failure to reach the subject within 48 hours (n = 44, 8%), not providing informed consent for a reinterview (n = 29, 5%), being too ill, hard of hearing, or having another communication problem (n = 20, 4%), and having no phone (n = 15, 3%).

Results

Description of Patients.
Patients in the PRIME-MD 1,000 Study ranged in age from 18 to 91, with a mean age of 55 (SD ± 16.5 years). Mean age across the four sites ranged from 43 to 64 years. Sixty percent were female (site range 50% to 73% female); 58% were White (site range 30% to 75% White); 28% were college graduates (site range 4% to 45% college graduates); and 77% were established clinic patients, whereas the remainder were being seen for the first time. The most common types of physical disorders were hypertension (48%), arthritis (23%), diabetes (17%), heart disease (15%), and pulmonary disease (8%).

Prevalence of PRIME-MD Diagnoses. The prevalence rates for patients in four broad categories of severity were: 19% for those who had so few symptoms on the PQ that no CEG module was triggered (symptom screen negative); 42% for patients who had symptoms but who did not meet criteria for any diagnoses; 13% for those who met criteria for a mild or "subthreshold" mental disorder; and 26% for patients who met criteria for a DSM-III-R diagnosis. The prevalence of each of the 18 specific diagnoses assessed in the PRIME-MD 1,000 Study is displayed in Table 28.1. Most subjects with psychopathology had multiple disorders: Of the 386 patients with a disorder, 56% had more than one, and 29% had three or more.

Agreement with Mental Health Practitioner Diagnoses. Agreement between diagnoses made by primary care physicians using the PRIME-MD and mental health practitioners using the telephone interview protocol was generally good. Results for specific disorders and categories of disorders that were diagnosed at least 10 times by either group (regardless of agreement) are presented in Table 28.4. Sensitivity (the proportion of

TABLE 28.4
Indexes of Agreement Between PRIME-MD Diagnoses Made by Primary Care Physicians (PCPs) and Mental Health Professionals (MHPs) and Their Prevalence (n = 431)

| Diagnosis | Sensitivity % | Specificity % | Positive Predictive Value % | Overall Accuracy Rate % | κ | Prevalence % (PCP) | Prevalence % (MHP) |
| Any psychiatric diagnosis | 83 | 88 | 80 | 86 | .71 | 37 | 36 |
| Any mood disorder | 67 | 92 | 78 | 84 | .61 | 26 | 30 |
| Major depressive disorder | 57 | 98 | 80 | 92 | .61 | 10 | 14 |
| Partial remission or recurrence of major depressive disorder | 26 | 96 | 41 | 89 | .26 | 6 | 10 |
| Dysthymia | 51 | 96 | 56 | 92 | .49 | 8 | 9 |
| Minor depressive disorder | 22 | 94 | 19 | 89 | .15 | 7 | 6 |
| Any anxiety disorder | 69 | 90 | 60 | 86 | .55 | 21 | 19 |
| Panic disorder | 57 | 99 | 68 | 96 | .60 | 4 | 5 |
| Generalized anxiety disorder | 57 | 97 | 55 | 94 | .52 | 7 | 7 |
| Anxiety disorder not otherwise specified | 33 | 91 | 31 | 84 | .23 | 12 | 11 |
| Probable alcohol abuse/dependence | 81 | 98 | 65 | 98 | .71 | 5 | 4 |
| Any eating disorder | 73 | 99 | 80 | 98 | .73 | 5 | 5 |

Note. Adapted with permission from Spitzer, Williams, Kroenke et al. (1994). JAMA, 272, 1749-1756. Copyright © 1994, American Medical Association.

patients given a diagnosis by the mental health practitioner correctly identified by the primary care provider) was very good for any psychiatric diagnosis and at least satisfactory for the diagnostic modules. Sensitivity ranged from poor in subthreshold diagnoses to very good for more severe disorders. Specificity (the proportion of patients found to be free of the diagnosis by the mental health practitioner who were also noted to be free of that diagnosis by the primary care provider) was excellent for all diagnostic modules and for specific diagnoses, indicating a low frequency of false positive diagnoses by primary care providers using PRIME-MD. Overall accuracy rates across modules and specific categories were generally excellent. The κ coefficient for chance-corrected agreement was good for any diagnosis (.71), and satisfactory to good for specific modules. The κ values for both major depressive and panic disorders were also good (.61 and .60).

Operating Characteristics of the PQ. Though clinicians are explicitly instructed to follow their own clinical impressions in choosing to administer a CEG module even if the PQ screen item(s) are negative, the overall sensitivity of the PRIME-MD procedure will be limited by the sensitivity of the PQ. Similarly, the efficiency of the PRIME-MD will be determined to a great extent by the specificity of the PQ. The operating characteristics of the PQ are reported in Table 28.5 using both the PRIME-MD result obtained by the primary care provider and the mental health practitioner evaluation as criteria. Sensitivity (using the mental health practitioner's diagnoses) was good to excellent for the anxiety, eating, and alcohol modules. The sensitivity of the two PQ depression items for major depression was 86%, identical to that of the 20-item Zung depression scale (using the cutoff score of 50 on the Zung in the same sample; Magruder-Habib, Zung, & Feussner, 1990).
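The agreement indexes defined above (sensitivity, specificity, positive predictive value, overall accuracy, and chance-corrected agreement) can all be derived from a single two-by-two table that cross-classifies primary care and mental health practitioner diagnoses. The sketch below, in Python, shows one way to compute them. The four cell counts are illustrative assumptions chosen to be roughly consistent with the "any psychiatric diagnosis" row of Table 28.4; they are not reported in the source, and the function name is hypothetical. The κ formula is the standard Cohen's kappa.

```python
def agreement_indexes(tp, fp, fn, tn):
    """Compute agreement indexes from a 2 x 2 table.

    tp: both raters positive; fp: PCP positive, MHP negative;
    fn: PCP negative, MHP positive; tn: both raters negative.
    The MHP diagnosis is treated as the criterion standard.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the two raters' marginal rates.
    chance_yes = ((tp + fp) / n) * ((tp + fn) / n)
    chance_no = ((fn + tn) / n) * ((fp + tn) / n)
    chance = chance_yes + chance_no
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "overall_accuracy": accuracy,
        "kappa": (accuracy - chance) / (1 - chance),
    }


# Illustrative counts (assumed, not from the source); they sum to 431.
result = agreement_indexes(tp=129, fp=33, fn=26, tn=243)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With these assumed counts, the indexes round to .83, .88, .80, .86, and .71. The example also shows why κ is lower than overall accuracy: a little more than half of the observed agreement would be expected by chance alone given the two raters' base rates.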
Specificity for major depressive disorder was also virtually identical to that of the 20-item Zung (75% for the two-item PQ, and 74% for the Zung). Specificity was particularly good for mood, alcohol, and eating modules using either the mental health practitioner or primary care PRIME-MD diagnoses as the criterion standard. The PQ anxiety and somatoform disorders screens were the least specific, and thus the least efficient.

Symptom Severity and PRIME-MD Diagnoses. The correlation between PRIME-MD diagnoses and symptom severity as measured by symptom checklists was examined to assess convergent validity. The partial correlation between scores on the Zung depression scale (Zung, 1965) and any PRIME-MD mood disorder was .58; between the Zung anxiety scale (Zung, 1971) and any anxiety disorder, .53; and between the Somatic Symptom Inventory (Wyshak, Barsky, & Klerman, 1991) and any somatoform disorder, .44.

Health-Related Quality of Life. Mental disorders are associated with functional impairment and increased health care utilization. Evaluation of these health-related outcomes was therefore performed using the Medical Outcomes Study Short Form-20 (SF-20; Stewart et al., 1988) as an additional assessment of construct or convergent validity of PRIME-MD diagnoses. Results of the SF-20 confirm that health-related quality of life (HRQL) is impaired proportionally to the severity of psychopathology diagnosed on the PRIME-MD (Spitzer et al., 1994, 1995). Figure 28.3 shows the means of the six SF-20 scales, grouped by severity of psychopathology into four groups: patients who were symptom screen-negative, patients who had symptoms but no diagnosed disorder, patients with a subthreshold diagnosis, and patients with a threshold mental disorder. The mean SF-20 scores have been adjusted for number of physical disorders, gender, age, minority status, educational level, and study site.
Group main effects for severity of pathology were all significant (p < .001), and all paired comparisons among the four groups were significant at p less than .05 with the exception of differences

TABLE 28.5
Operating Characteristics of the Patient Questionnaire (PQ)

Criterion standard for the first four index columns: diagnoses by primary care physician (PCP; N = 1,000). Criterion standard for the last four index columns: diagnoses by mental health professional (MHP; N = 431).

| PQ Module | No. Screen Positive* | PCP: Sensitivity %** | PCP: Specificity % | PCP: Positive Predictive Value % | PCP: Overall Accuracy Rate % | MHP: Sensitivity % | MHP: Specificity % | MHP: Positive Predictive Value % | MHP: Overall Accuracy Rate % |
| Any module | 805 | 100 | 32 | 48 | 58 | 92 | 48 | 50 | 64 |
| Mood | 325 | 100 | 91 | 80 | 94 | 69 | 82 | 62 | 78 |
| Anxiety | 486 | 100 | 63 | 37 | 59 | 94 | 53 | 31 | 60 |
| Alcohol | 124 | 100 | 92 | 41 | 93 | 81 | 91 | 27 | 91 |
| Eating | 139 | 100 | 89 | 23 | 89 | 86 | 88 | 28 | 88 |
| Somatoform | 681 | 100 | 37 | 20 | 46 | | | | |

Note. Adapted with permission from Spitzer, Williams, Kroenke et al. (1994). JAMA, 272, 1749-1756. Copyright © 1994, American Medical Association.
* Number of subjects from the total sample of 1,000 who screened positive for that module.
** Sensitivity is 100% because a primary care physician module diagnosis was made only when the PQ was screen positive for that module.

between symptom screen negative patients and those with symptoms but no diagnosis on the role and social functioning scales. The four groups of patients already described were kept from their usual activities because of not feeling well for 2.2, 3.2, 4.3, and 11.0 days, respectively. Overall group effect, as well as pairwise differences between those with threshold diagnoses and each of the other three groups, were all significant at p < .001 (values adjusted for number of physical disorders, gender, age, minority status, educational level, and study site).

Fig. 28.3. Relation of PRIME-MD results to functional status. All paired comparisons between the four groups were significant at p < .05, using Bonferroni's correction for type I errors, with the exceptions of the differences between the symptom screen negative patients and patients with symptom screen positive but no psychiatric diagnoses on the role and social functioning scales (both p < .10). SF-20 indicates Short-form General Health Survey. Adapted with permission from Spitzer, Williams, Kroenke et al. (1994). JAMA, 272, 1749-1756. Copyright © 1994, American Medical Association.

In order to isolate the unique contribution of each disorder or class of disorders, the decrements in individual SF-20 scales were adjusted for the effects of demographic characteristics and the presence or absence of other mental and physical disorders by multiple regression analysis (Spitzer et al., 1995). To permit comparisons across all six scales, results were expressed as an effect size (difference between those with and without the disorder on each scale divided by the standard deviation of the scale in the total sample). The PRIME-MD 1,000 Study confirmed the finding of previous studies that mood disorders are associated with substantial impairment in HRQL that exceeds that associated with nearly all physical disorders across all domains of HRQL, including pain and physical functioning.
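The effect size defined above (the difference between patients with and without a disorder on a scale, divided by the standard deviation of that scale in the total sample) can be illustrated with a brief sketch. This Python example is a simplification: it uses raw group means, whereas the study adjusted the differences by multiple regression, and the scores shown are hypothetical values invented for the illustration. The function name is likewise an assumption.

```python
from statistics import mean, stdev


def effect_size(scores_with, scores_without):
    """Difference in group means divided by the SD of the total sample."""
    total_sample = list(scores_with) + list(scores_without)
    difference = mean(scores_with) - mean(scores_without)
    return difference / stdev(total_sample)


# Hypothetical SF-20 scale scores (0 to 100; 100 = best health).
with_disorder = [40, 50, 60]
without_disorder = [70, 80, 90]
print(round(effect_size(with_disorder, without_disorder), 2))
```

A negative value indicates poorer functioning in the group with the disorder. Dividing by a common standard deviation puts all six SF-20 scales on the same unit-free footing, which is what permits comparison across scales.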
The PRIME-MD 1,000 Study also established that significant and unique patterns of functional impairment are associated with other mental disorders. In contrast to mood disorders, which produce impairment in all dimensions of HRQL, anxiety disorders were found to be associated specifically with impaired social functioning and mental health. Somatoform disorders have a profound effect on role functioning, bodily
pain, and general perception of health, but no impairment in mental health. (Somatizers reported nonsignificantly better mental health functioning than did patients with no mental disorder. This finding is both interesting and consistent with the theory that somatizing patients experience psychological distress as physical symptoms.) It is important for primary care providers to appreciate, as an incentive for improving the detection and treatment of mental disorders in their practices, that much of the suffering borne by their patients is secondary to mental disorders rather than the physical conditions that all too often are exclusively the focus of physician attention. Mental disorders accounted for a substantially larger percentage of the variance in all domains of HRQL, including physical functioning and bodily pain, than did all of patients' physical disorders combined. Results from the PRIME-MD 1,000 Study demonstrated that even minor mental disorders such as minor depression, anxiety disorder NOS, and somatoform disorder NOS are associated with significant impairment.

Utilization of Health Care Services. Observed differences in self-reported health care utilization between the four groups of patients distinguished by the presence and severity of PRIME-MD diagnosed mental disorders conform to the expected positive correlation between psychopathology and utilization. Figure 28.4 shows self-reported visits to physicians and the emergency department during the previous 3 months. The group main effect was significant for both number of visits to the physician (p < .005) and for visits to the emergency room (p < .001). Pairwise differences between those with threshold disorders and all three other groups were also significant with the exception of the difference in physician visits between those with threshold and those with only subthreshold diagnoses.

Time, Perceived Value, and Acceptability of PRIME-MD.
The PRIME-MD CEG took an average of 8.4 minutes, and less than 20 minutes in 95% of cases. In patients without a PRIME-MD diagnosis, the CEG required an average of 5.6 minutes, and 95% of those cases required less than 11 minutes. The PRIME-MD procedure took an average of 11.4 minutes for those with a PRIME-MD diagnosis, and less than 24 minutes in 95% of cases. As expected, the PRIME-MD CEG took longer to administer in patients who had more mental disorders.

After administering the CEG, physicians rated the value of the PRIME-MD using a 5-point Likert scale to respond to the following question: "Considering the time that you spent doing the CEG, how valuable was the information that you obtained in helping you understand and treat this patient?" Physicians found the PRIME-MD to be somewhat or very helpful for 61% of patients given the CEG, and for 83% of those who met criteria for a PRIME-MD diagnosis. Surprisingly, the PRIME-MD was perceived to be more valuable the longer it took to administer (partial r = .36, controlling for number of physical disorders, gender, age, minority status, educational level, and study site), undoubtedly because longer interviews produced more diagnoses. Of equal interest is the observation that physicians found the PRIME-MD to be just as valuable in evaluating patients whose psychopathology had already been suspected as in diagnosing previously unknown cases. These results suggest that use of the PRIME-MD enhanced diagnostic clarity or certainty even when patients had recognized psychiatric caseness, supporting the contention that a procedure resulting in a diagnosis will be more valuable than one limited to case finding as its endpoint. It was rare for physicians to find that the PRIME-MD was of no value; they did so for only 7% of patients administered the CEG and only 1% of patients who received a PRIME-MD diagnosis.
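The partial correlation reported above expresses the association between administration time and perceived value after the influence of the control variables has been removed. As a simplified sketch, the Python example below computes a first-order partial correlation that controls for a single covariate using the standard formula. The study controlled for several covariates at once, which would require a regression-based approach instead, and all data values and function names here are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y, controlling for a single covariate z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: minutes to complete the CEG, physician value rating (1 to 5),
# and number of physical disorders (the single covariate in this sketch).
minutes = [4, 6, 8, 10, 12, 15]
value_rating = [2, 3, 3, 4, 4, 5]
physical_disorders = [1, 0, 2, 1, 3, 2]
print(round(partial_r(minutes, value_rating, physical_disorders), 2))
```

When the covariate is unrelated to both variables, the partial correlation equals the simple correlation; it departs from the simple correlation only to the extent that the covariate is associated with time, with perceived value, or with both.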
During the mental health specialist telephone interview, those patients who had been asked questions from the CEG during their PRIME-MD assessment were asked how
comfortable they were answering those questions. Almost all (96% of the 252) were "very" or "somewhat" comfortable. Almost all (90%) also said they believed that the CEG questions were "very" or "somewhat" valuable in helping their physicians better understand or treat the problems they had been having.

Fig. 28.4. Number of visits to physicians and the emergency room in patients with DSM-III-R mental disorders, subthreshold mental disorders, positive Patient Questionnaire (PQ) results but no mental disorder, and negative PQ results.

Detection of Mental Disorders. Despite the high self-assessed interest in psychiatric diagnosis reported by the study physicians, previously undetected psychiatric cases were diagnosed in approximately one half (48%) of the 287 patients with mental disorders who were previously known to their physicians. In about a third of patients who had been recognized as psychiatric cases, the PRIME-MD result indicated a category of disorder that differed from the physicians' pre-PRIME-MD assessment. Administration of PRIME-MD generated plans for new treatments in 60 of the 105 patients with mental disorders who were currently receiving no treatment. Planned treatments were as common in previously detected as they were in newly diagnosed patients.

Conclusions of the PRIME-MD 1,000 Study

The validity and utility of the PRIME-MD are supported by the results of the PRIME-MD 1,000 Study. The prevalence of threshold mental disorders diagnosed by the PRIME-MD is similar to that found in previous studies that used longer structured interviews administered by mental health providers (J.E. Barrett, J.A. Barrett, Oxman, & Gerber, 1988; Katon & Schulberg, 1992; Schulberg & Burns, 1988; Von Korff et al., 1987). The agreement between PRIME-MD diagnoses made by primary care providers
and those made by the mental health provider during the blinded telephone interview approaches that observed among mental health practitioners using diagnostic interview schedules (Andreasen, Flaum, & Arndt, 1992; Williams et al., 1992). The high specificities observed across CEG modules indicate that physicians using the PRIME-MD seldom made false positive diagnoses. Whereas PRIME-MD sensitivity is modest for some disorders compared to mental health practitioners using a longer interview, it is double what the physicians themselves achieve when unaided by the PRIME-MD procedure. The sensitivity of primary care physicians using PRIME-MD is also better than that achieved by lay interviewers using a longer structured diagnostic interview in a study that evaluated that approach compared to psychiatrists as the standard (Helzer et al., 1985). The observed level of agreement between primary care provider PRIME-MD and mental health practitioner diagnoses was achieved despite the difficulties inherent in the diagnosis of the relatively mild cases of psychopathology encountered in the primary care setting (Robins, 1985). Measures of psychiatric symptoms, health-related quality of life, satisfaction with care, and utilization of health care services all demonstrated the adverse effect of psychopathology on these health-related outcomes, strongly supporting the convergent validity of PRIME-MD diagnoses.

The component of the PRIME-MD for which the best comparative data exist, the PQ depression screen, measures up extremely well against other screening approaches. Using the PRIME-MD 1,000 Study database, the two-item PQ depression screen had essentially the same sensitivity and specificity as the 20-item Zung depression scale. In a meta-analysis comparing nine depression case-finding instruments, the two-item PQ depression screen compared favorably to its much longer predecessors (Mulrow et al., 1995). Whooley et al.
conducted a head-to-head comparison of the two-item PRIME-MD PQ depression screen with six other depression case-finding instruments, including the CES-D, the Beck Depression Inventory, and the SDDS-PC (Whooley, Avins, Miranda, & Browner, 1997). Their results indicated that the all of these instruments had comparable ROC curves (range .82 to .89), and the PRIME-MD PQ two-item depression screen had a sensitivity of 96%. The relatively brief time required to administer the PRIME-MD and physicians' rating of its value in management support the conclusion that the PRIME-MD can be integrated into the time-pressured primary care environment in some settings. The multidiagnostic scope of the PRIME-MD is congruent with the demonstrated high levels of comorbidity found in patients with mental disorders in primary care, and provides information important in the evaluation and management of patients with multiple disorders. At the same time, the modular construction of the physician interview enhances efficiency by focusing the physician's attention on disorders that have a high probability of being present. Changing Behavior in the Detection of Mental Disorders: The Role of PRIME-MD Competing Demands Model of Detection The PRIME-MD was developed to remedy a perceived deficit in physicians' detection and treatment of mental disorders. However, in critiquing the performance of primary care clinicians, it is only fair to point out the many barriers that interfere with detection and management, including limited time, somatic presentations of depressed patients,

stigmatization that can inhibit open discussion, competing medical problems, and inadequate reimbursement (Kroenke, 1997). Klinkman (1997) offered a "competing demands model," which argues that given the complex and multiple agendas patients and physicians bring to each clinical encounter, nondetection is neither surprising nor necessarily "inappropriate." He suggested that four domains will influence the priority of those agendas and the likelihood that detection of mental disorders will get adequate attention: physician characteristics, including skills, knowledge, and attitudes about psychosocial issues; patient characteristics; the structure of the health care "ecosystem"; and the public policy environment. The ability of a case-finding and diagnostic tool such as the PRIME-MD to change clinical practice will depend on its effect in all four of these domains. The PRIME-MD must correct deficiencies in physicians' clinical skills and knowledge, and be instrumental in changing attitudes that influence the priority of detection and diagnosing mental disorders. The PRIME-MD should detect disorders despite patient characteristics that contribute to nondetection and be useful in educating patients and enhancing their disposition to accept treatment for mental disorders. The PRIME-MD must be consistent with the structural elements of the primary care setting, such as time constraints and reimbursement policies, that constitute barriers to detection. Ideally, a detection and diagnostic procedure should be instrumental in changing the practice systems' incentive structure in favor of giving diagnosis of mental disorders a higher priority. Finally, a case-finding diagnostic tool can be instrumental in influencing public policy by generating population-based data about unmet needs and structural barriers to detection and treatment. 
Previous Efforts to Change Physician Practice in Detection of Mental Disorders

The management of mental disorders can be described as a sequence of steps that constitute a continuum of care common to virtually all medical problems. Clinicians must maintain an appropriate index of suspicion, screen (asymptomatic) or case-find (symptomatic but hidden) patients to identify patients at risk, diagnose the condition when present, evaluate patients with the diagnosis, and initiate and monitor treatment. Four strategies have been employed to improve management of mental disorders that can be distinguished by the stages of the continuum of care they address and the extent to which they alter the structure of the health care delivery system. These include educational interventions to enhance skills and knowledge in any or all of the stages, with no structural interventions; structured use of instruments for case finding alone, without diagnostic, evaluation, or treatment interventions; liaison psychiatry collaborative care and referred care interventions that include the diagnostic, evaluation, and treatment stages of the continuum, with or without a case-finding intervention; and structured use of a two-stage case-finding and diagnostic instrument such as the PRIME-MD.

Impact of Educational Interventions on Detection, Treatment, and Outcome. Several studies have evaluated the effect of educational interventions on physicians' ability to make accurate diagnoses of mental disorders. Most have assessed the impact of the educational intervention using written case vignettes or videotaped cases and have

demonstrated improvement in physician skill and knowledge (Andersen & Harthorn, 1990; Bowman, Goldberg, Millar, Gask, & McGrath, 1992; Gask, Goldberg, Lesser, & Millar, 1988; Penn et al., 1997). Goldberg, Steele, Smith, and Spivey (1980) demonstrated improved diagnostic accuracy in actual clinical practice in the poorest performing physicians. Other assessments of the impact of physician education on actual clinical practice have yielded more discouraging results (Shapiro et al., 1987). In a follow-up study of collaborative care intervention that included intensive physician education, Lin et al. (1997) found that rates of antidepressant medication prescription and patient adherence declined to preintervention levels after termination of the structural changes implemented during the active study period. Impact of Case Finding on Detection, Treatment, and Outcome. Structured methods to aid the physician are an obvious solution to the underdiagnosis of depression and other mental disorders in the primary care setting. The first generation of instruments designed to enhance care of mental disorders in the primary care setting focused on the second step of the continuum of care: case finding. The General Health Questionnaire (Goldberg & Hillier, 1978), for example, was designed to measure the symptoms and distress associated with any mental disorder. The Zung Depression Inventory (Zung, 1965), the Beck Depression Inventory (Beck, Ward, Mendelson, Mack, & Erbaugh, 1961), and the Center for Epidemiological Studies Depression Scale (Radloff, 1977) were designed to detect a population of patients with a high probability of having a mood disorder, but did not purport to make specific DSM criteria-based diagnoses. 
For the most part, although sensitive for psychiatric "caseness," these self-report questionnaires are not very specific for particular categories of disorder; rather, they measure the generalized distress, or "demoralization," associated with all mental disorders (B.P. Dohrenwend & B.S. Dohrenwend, 1965; Katon & Von Korff, 1990; Mulrow et al., 1995). A number of studies have evaluated the effect of providing feedback from case-finding instruments to physicians on the detection and management of depression, and the results are inconclusive. Several case-finding/feedback studies have demonstrated increased notation of depression and levels of treatment (Linn & Yager, 1980; Magruder-Habib et al., 1990; Moore, Silimperi, & Bobula, 1978). Two studies showed that disclosure of depression had some beneficial effect on severity and duration of depression (Johnstone & Goldberg, 1976; Zung, Magill, Moore, & George, 1983). Other studies failed to show significant changes in detection (Hoeper, Nycz, Kessler, Burke, & Pierce, 1984; Shapiro et al., 1987), and in one study that examined depression status as an outcome, disclosure of undetected depression had no impact on patients' depression status at 12 months (Dowrick & Buchan, 1995). Impact of Collaborative Care and Referred Care Interventions. The failure of enhanced case finding alone to consistently improve outcomes led to interventions that enhance or supplement physician performance in the remaining stages of the continuum of care (i.e., diagnosis, evaluation, and treatment). Two series of studies have demonstrated that supplementing primary care providers' usual care approach to post-case-finding management can lead to better depression-related outcomes than interventions that stop at the case-finding stage. Schulberg et al.
(1996) conducted a study in which the benefit of feedback from case finding was compared to on-site pharmacotherapy administered by specially trained primary care physicians, or to on-site psychotherapy delivered by mental health specialists. In this study, the CES-D was used as a case-finding tool with 7,652 patients in clinic waiting rooms, the DIS was used to establish a diagnosis, and patients with

depression were evaluated by a consultation-liaison psychiatrist. Ultimately, 276 subjects agreed to randomization to one of three treatments: pharmacotherapy, interpersonal therapy, or feedback to and care from their primary provider (i.e., "usual care" plus feedback). Patients randomized to either pharmacotherapy or interpersonal therapy (standardized treatment) did significantly better over the 8 months of the study in terms of severity of depressive symptoms and recovery from depression. Among treatment completers, 70% of the patients receiving standardized treatment had recovered at 8 months compared to 20% of usual care patients. Among all patients (intent-to-treat analysis), recovery was seen at 8 months in approximately half of those receiving standardized treatment and in 18% of those in the usual care arm. Schulberg et al.'s (1996) study used a referral model in which pharmacotherapy as well as psychotherapy were taken out of the hands of the primary care provider, although the services were provided in the primary care site. In contrast, Katon et al. (1995) evaluated a collaborative model in which patients were seen by both primary care provider and consultation-liaison (C/L) psychiatrist, the C/L psychiatrist actively consulted with the primary care provider, and both primary care providers and patients received educational interventions. Significant improvements in depression-related outcomes were observed. Katon et al.'s study differs from the Schulberg et al. study in that no case-finding intervention was included. Subjects in the study had all been recognized as depressed by their provider without the aid of a protocol for detecting previously unrecognized depression.
Thus, whereas this study, and a subsequent one that added psychotherapy to the treatment program (Katon et al., 1996), demonstrated the utility of a "collaborative" intervention for patients with recognized depression, the potential impact on the 40% to 60% of depressed patients that probably remained undetected cannot be assessed.

Structured Use of PRIME-MD: Case Finding and Diagnosis. The positive impact of PRIME-MD on rates of detection and treatment demonstrated in the PRIME-MD 1,000 Study supports the conclusion that a two-stage, case-finding and diagnostic instrument may have more impact on diagnosis and the initiation of treatment than case finding alone. A study performed by Valenstein et al. (1997) further supported the conclusion that use of PRIME-MD will result in improved detection and higher rates of treatment. In contrast to the PRIME-MD 1,000 Study, in which the use of PRIME-MD was initiated and monitored by research coordinators and incentives for participation were provided to a group of self-selected physicians, Valenstein et al. examined rates of PQ and CEG use by unselected physicians under three "realistic (i.e., economically and logistically feasible) support conditions," and a no-support control:

1. Nonclinical staff support condition (NCSS): The PQ was distributed to patients, collected, and placed in the patient's chart prior to the visit by nonclinical research staff.
2. Nursing support condition (RN): The PQ was distributed, collected, and placed in the chart by nursing staff already working in the clinic (i.e., no additional personnel were added to the clinic staff).
3. Physician "prompt" condition (Prompt): PQs were distributed by both RNs and nonclinical staff and reviewed by research staff who placed brightly colored written prompts in the chart, indicating which modules were triggered by the PQ.
4. No-support condition: Physicians were made aware that the PRIME-MD was available but initiation and completion were left entirely up to the provider.

After instructing all providers in the use of the PRIME-MD, the three active support conditions and no-support condition were employed during rotating weeks of the 15-week study period. The study was performed in a Veterans Administration (VA) clinic and 2,263 patients were enrolled and eligible for evaluation. All three active

support conditions significantly increased the use of the PQ (11% in no-support vs. 80% to 81% in the active support conditions). Whereas physicians were selective in which PQ-positive patients received a CEG interview, all active support conditions produced much higher rates of CEG use than the no-support condition. Compared to no-support, the odds ratio (OR) for CEG use in the NCSS condition was 9.7, 95% confidence interval (CI) was 2.8 to 33.3; for RN support, OR = 14.2, CI was 3.9 to 52.6; and for Prompt, OR = 24.7, CI was 7.5 to 81.7. The use of written physician prompts was significantly more likely to result in CEG completion than the NCSS or RN support conditions. Compared to no-support, all three active support conditions resulted in significantly more new diagnoses (ORs ranged from 3.2 to 4.1), and the RN and Prompt conditions resulted in significantly more provider actions (OR = 1.7, CI was 1.1 to 2.6, and OR = 1.8, CI was 1.2 to 2.7, respectively).

Differential Impact of Education, Case-Finding and Diagnostic Tools, and Structural Intervention. One conclusion that can be drawn from evaluation of these four kinds of interventions (i.e., physician education, feedback from case finding alone, collaborative care or referred care, and structured use of two-stage case-finding and diagnostic tools) is that physician education and/or case-finding interventions that fall short of diagnosis are not enough to change care patterns. On the other hand, in the two other intervention strategies where changes in patterns of care were demonstrated, the task of diagnosis (as opposed to case finding) was targeted with a structural intervention. The collaborative care or referred care strategy has the best evidence for change in outcome, is the most effective, and makes the greatest change in the structure of the delivery system.
In Schulberg's study, initiation and delivery of care along the entire continuum from case finding to treatment monitoring was taken out of the hands of the primary care provider and became an independent part of the structure of the health care delivery system. In Katon's studies, the diagnosis, evaluation, and treatment phases of care were structurally changed in the treatment environment. The conclusion that change in system structure, not improvement in physician skills and knowledge, is the necessary condition for improvement in mental disorders-related care is consistent with the "competing demands model" discussed earlier. That model anticipates that physician skill and knowledge is easily overshadowed, for better or for worse, by other and more potent determinants of physician behavior. The Schulberg and Katon studies evaluate the impact of interventions that include all stages of the continuum of care. The PRIME-MD 1,000 Study and the Valenstein et al. study are early efforts to assess the impact of the PRIME-MD's ability to help the physician make specific diagnoses. It remains to be seen whether the change in process observed in these studies results in commensurate change in outcome, and how the type of intervention studied by Valenstein et al. (1997) compares in cost and effect to the collaborative care and referred care model. As is described later, the PRIME-MD can also serve as a tool in collaborative care or referred care interventions.

Use of the PRIME-MD in Treatment Planning

Candidates for Evaluation with the PRIME-MD: Clinical Setting

In Primary Care. The recommendations of expert panels on case finding, screening, and prevention in the primary care management of mental disorders can be taken as a

point of reference for consideration of appropriate use of the PRIME-MD in clinical practice. The U.S. Preventive Services Task Force is perhaps the most influential of consensus groups, and has developed recommendations regarding mood disorders, alcohol abuse, and suicide risk. The task force has not recommended the routine screening of persons who are "asymptomatic" for mood disorder, but has encouraged case finding through maintaining a high index of suspicion among patients with risk factors that they list as "adolescents and young adults, persons with a family or personal history of depression, those with chronic illnesses, those who perceive or have experienced a recent loss, and those with sleep disorders, chronic pain, or multiple unexplained somatic complaints" (U.S. Preventive Services Task Force, 1996, pp. 544-545). The Agency for Health Care Policy and Research (AHCPR) clinical practice guidelines for depression in primary care are more explicit in their recommendation to use case-finding self-report instruments, but also favor their administration to a subset of patients with risk factors (Depression Guideline Panel). The U.S. Preventive Services Task Force reached the conclusion that there was "insufficient evidence to conclude that routine depression screening is indicated in unselected patients, because it has not been shown that the early detection and treatment of depression in primary care leads to improved outcome when compared to routine diagnosis and treatment of this disorder when symptoms appear and are detected" (p. 543). Although the PRIME-MD 1,000 Study does not provide the direct outcome evidence that the Preventive Services Task Force requires to recommend universal case finding, it does provide presumptive evidence that such a recommendation may be appropriate.
Furthermore, the task force recommendations were based on the assumption that the first generation of long depression screening instruments, rather than a two-item screen, were the only methods available for case finding. Indeed the task force, specifically citing the PRIME-MD as an example, advised that approaches such as that embodied in the PRIME-MD may change their assessment of the utility of "routine use of screening tools." Another caveat that the task force applied to its own recommendations is the fact that the clinical action being considered is not, strictly speaking, "screening" because the disorders being searched for cannot be "asymptomatic." From this perspective, the PRIME-MD PQ is not so much a screening tool that detects asymptomatic patients as it is a procedure that corrects a deficiency in physicians' clinical practice. Whereas the risk factors mentioned in the task force recommendations should raise the providers' index of suspicion, the presence of a mood disorder in 26% and of some mental disorder in 39% of unselected patients suggests that the clinician's index of suspicion should already be raised to a level sufficient to trigger case finding when the patient walks in the door. The cost-benefit of such an approach seems justified by the operating characteristics and efficiency of the PRIME-MD Patient Questionnaire. It is hard to argue that physicians should not ask two questions to determine whether a patient is depressed or anhedonic if they do not already know. In the case of depression in particular, even without the added efficiency of self-administration featured by the PRIME-MD PQ, answers to the two PQ depression items can be ascertained faster and have better predictive power than an assessment of the risk factors that the prevention task force and the AHCPR recommend be used to trigger case finding. These observations suggest that the PRIME-MD might play a useful role in the routine care of primary care patients.
When the PRIME-MD is administered to an unselected group of primary care patients, 80% will trigger at least one module of the Clinician Evaluation Guide. In half of those evaluations, the physician will be rewarded

by the confirmation of a mental disorder. Two thirds of these disorders will meet criteria for a DSM-IV diagnosis and the remaining third will have a minor or "subthreshold" disorder. If the physician is familiar with the patient, the yield of new diagnoses will still double the number of patients whose psychopathology is detected. Finally, there is strong evidence that even previously detected disorders will be more specifically and precisely identified. The timing of administration of the PRIME-MD to established patients and readministration to previously screened patients should be guided by clinical judgment. In this regard, risk factors such as those identified by the task force should trigger use of the PRIME-MD. Other characteristics of patients that should prompt evaluation with the PRIME-MD are discussed later. In Episodic Care. The investment of time required to administer the PRIME-MD, however brief, is nevertheless easier to justify in a long-term primary care relationship with the physician. The utility of the PRIME-MD in episodic care deserves separate consideration. In the episodic, or "walk-in," encounter, patients may be seen by their own physicians, someone other than their own physician, or they may not (yet) have a personal physician. Patients may be seen for relatively simple administrative problems such as medication renewals, or for relatively straightforward medical problems, such as upper respiratory infections. Although PRIME-MD might still reveal undiagnosed mental disorders, its addition to an unscheduled episodic visit is usually not consistent with the time allocated for episodic visits. On the other hand, episodic visits are often the mode of presentation for the mental disorders that the PRIME-MD was designed to detect (Kroenke et al., 1990, 1994; Kroenke & Mangelsdorf, 1989). 
In these presentations the PRIME-MD will facilitate greater understanding of the patient's true needs, and may lead to greater patient and physician satisfaction. In a recent study employing the PRIME-MD with 500 patients presenting for evaluation of a physical symptom, Jackson et al. demonstrated that use of the PRIME-MD coupled with a means to quickly identify symptom-related expectations decreased patients' unmet expectations and residual illness concerns, as well as physicians' experience of patients as difficult (Jackson, Chamberlin, & Kroenke, 1996). In Medical Subspecialty Care. Every medical subspecialist frequently confronts one or more conditions that are recognized to be caused in part by or reactive to psychological factors. Pulmonologists treat asthmatics whose symptoms vary dramatically with current life stressors, as do symptoms in irritable bowel disease, various headache syndromes, low back pain, and other medical disorders. In some cases, underlying organic disease treated by the subspecialist may be absent altogether despite the presence of highly suggestive symptoms. For example, the most common medical presentation of panic disorder in the primary care setting is chest pain, and panic disorder is one of the most common diagnoses eventually made in patients who undergo cardiac catheterization and are found to have normal coronary arteries (Beitman et al., 1989). Ideally, patients should have been evaluated for mental disorders by a primary care physician prior to referral. However, evaluation of psychopathology prior to referral for subspecialty care is currently not the rule for several reasons. In fee-for-service systems patients often self-refer to specialists. In the primary care setting, 40% to 60% of mental disorders are undetected unless a procedure like the PRIME-MD is routinely employed.
For these reasons, it would be useful for subspecialists to incorporate the PRIME-MD into their evaluation of some of their patients, particularly those with unexplained symptoms.

Use in Consultation-Liaison Psychiatry. The PRIME-MD was designed for use by primary care providers and tested in the outpatient primary care setting. Its raison d'etre was the documented insensitivity of unaided primary care providers to the mental disorders present in their patients. Presumably, liaison psychiatrists would not suffer from this deficit. However, the growing familiarity of primary care providers with the PRIME-MD, and the PRIME-MD's capacity to make the diagnostic criteria of mental disorders explicit in an easy-to-understand format, suggest that it could be used by mental health specialists consulting on medical patients as a method of communicating the results of the psychiatric evaluation.

Candidates for Administration: Patient Characteristics

Among those characteristics that should raise the clinician's index of suspicion and trigger administration of the PRIME-MD, the three most strongly substantiated by the PRIME-MD 1,000 Study were: the presence of multiple unexplained physical symptoms, functional impairment out of proportion to the patient's nonpsychiatric medical problems, and physician-experienced difficulty in caring for the patient.

Physical Symptoms. The PRIME-MD 1,000 Study confirmed previous studies demonstrating that physical symptoms in medical patients are associated with mental disorders and frequently cannot be accounted for by medical diagnoses (Kroenke et al., 1994). Patients in the study endorsed a mean of 4 out of the 15 physical symptoms on the PRIME-MD PQ. Physicians determined that 16% to 33% of those symptoms were somatoform. The presence of any physical symptom more than doubled the likelihood of an anxiety or mood disorder, and somatoform symptoms had a particularly strong association with psychopathology. Figure 28.5 shows the dramatic increase in the

Fig. 28.5. Relation between the number of physical symptoms and the prevalence of mood and anxiety disorders.

prevalence of mood and anxiety disorders associated with increasing numbers of physical symptoms. These data strongly support the argument that any patient with multiple physical complaints or somatoform symptoms (i.e., symptoms that cannot be accounted for by physical disease) should be assessed for mental disorders. Functional Impairment. As previously discussed, the impairment in health-related quality of life produced by mental disorders is so substantial that functional impairment should trigger an evaluation for mental disorders in any patient who has not already been assessed. The PRIME-MD 1,000 Study demonstrated that mental disorders produced more degradation of HRQL than did physical conditions, even in those patients who had significant physical disorders. Physician-Experienced Difficulty in the Doctor-Patient Relationship. One dramatically notable characteristic of patients that should trigger evaluation with the PRIME-MD is the physician's subjective experience of difficulty in caring for the patient. Variously referred to as difficult or frustrating patients, or with more pejorative terms such as "crock" or "heart-sink" patients (i.e., your heart sinks when you know you have to see them), these patients have long been thought to be difficult to care for in part because of the presence of mental disorders. The recent development of the Difficult Doctor Patient Relationship Questionnaire (DDPRQ), the 10-item version of which was used in the PRIME-MD 1,000 Study, has for the first time enabled empirical study of patients who are experienced as difficult (Hahn et al., 1994, 1996). The DDPRQ is a self-report instrument designed to detect and measure difficulties in the doctor-patient relationship as perceived by the physician. It is completed by medical providers after seeing patients. 
The 10-item DDPRQ-10 takes less than 1 minute to complete, has high internal consistency reliability (Cronbach's alpha = .88), and results in a continuous "difficulty score" or classifies patients dichotomously as difficult or not-difficult based on a cutoff point established using the distribution of difficulty scores and factor analysis of the original 30-item version (Hahn et al., 1994, 1996). Some DDPRQ questions probe the physician's subjective response to the patient, for example: Do you find yourself secretly hoping that this patient will not return? How "frustrating" do you find this patient? Other items assess the physician's perception of patient behavior typically thought to be difficult, for example: To what extent are you frustrated by this patient's vague complaints? How time consuming is caring for this patient? The DDPRQ-10 was administered to 627 subjects in the PRIME-MD 1,000 Study and 15% were classified as difficult. Difficult patients were much more likely than not-difficult patients to have a mental disorder (67% vs. 35%, p < .0001). Difficult patients also had a higher number of mental disorders (1.8 vs. 0.7, p < .001). Difficulty scores increased with the total number of mental disorders (r = .42, p < .001). Physicians experienced 25% of patients with mental disorders as difficult, compared with only 8.5% of patients with no disorder (p < .001). The prevalence of patients experienced as difficult ranged from 11% to 20% at the four sites, but this variance was entirely accounted for by differences in the prevalence of mental disorders and functional status at the four sites.
Six psychiatric disorders had particularly strong associations with difficulty: the odds ratio for multisomatoform disorder was 12.3 (95% confidence interval = 5.9-25.8); for panic disorder, 6.9 (2.6-18.1); for dysthymia, 4.2 (2.0-8.7); for generalized anxiety disorder, 3.4 (1.7-7.1); for major depressive disorder, 3.0 (1.8-5.3); and for probable alcohol abuse or dependence, 2.6 (1.01-6.7).
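
As a reading aid, the arithmetic behind figures of this kind can be sketched in a few lines of Python. The sketch is illustrative only and is not part of the PRIME-MD or DDPRQ materials. It makes two assumptions that the text does not state: that the 25% and 8.5% difficulty rates can be treated as simple unadjusted proportions, and that the published confidence intervals were constructed symmetrically on the log-odds scale, as is conventional. The function names are hypothetical.

```python
import math

def odds_ratio(p_exposed, p_unexposed):
    """Unadjusted odds ratio from two proportions (each strictly between 0 and 1)."""
    for p in (p_exposed, p_unexposed):
        if not 0.0 < p < 1.0:
            raise ValueError("proportions must lie strictly between 0 and 1")
    return (p_exposed / (1.0 - p_exposed)) / (p_unexposed / (1.0 - p_unexposed))

def log_scale_midpoint(lower, upper):
    """Geometric mean of the CI limits; this equals the point estimate when
    the interval was built symmetrically on the log-odds scale (assumption)."""
    return math.sqrt(lower * upper)

# Figures reported in the text: 25% of patients with a mental disorder were
# experienced as difficult, versus 8.5% of patients with no disorder.
crude_or = odds_ratio(0.25, 0.085)

# Reported interval for multisomatoform disorder: OR 12.3 (5.9-25.8).
midpoint = log_scale_midpoint(5.9, 25.8)

print(f"crude OR for difficulty, any disorder vs. none: {crude_or:.2f}")
print(f"log-scale midpoint of 5.9-25.8: {midpoint:.1f}")
```

Under these assumptions the crude odds ratio is about 3.6, and the midpoint of the multisomatoform interval is about 12.3, which agrees with the reported point estimate. The crude figure is not one of the disorder-specific odds ratios reported above and should not be read as a published result.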

Patients rated as difficult also experienced worse health-related quality of life, higher utilization of health care, and decreased satisfaction. Psychopathology appeared to have a strong relation to these adverse outcomes because many of the differences between difficult and not-difficult patients were eliminated after adjusting for coexisting mental disorders. Assessing patients who cause problems in the doctor-patient relationship with the PRIME-MD may be warranted because:

1. The physician's subjective experience of difficulty is readily available data that does not require special history taking, examinations, or a test to detect. However, it does require modest introspective skills and the self-confidence necessary to consciously acknowledge a presumably nonaltruistic or negative response to the patient.
2. Difficult patients have poorer health-related outcomes and therefore are especially in need of effective intervention.
3. More than half of difficult patients have mental disorders, and the proportion of those disorders detected is as low as in not-difficult patients.
4. The adverse health-related outcomes associated with difficult patients seem to be strongly associated with their burden of mental disorders, making the treatment of those disorders a logical target for efforts to both help the patient and decrease the physician's distress.

Method of Administration

The PRIME-MD PQ. In standard practice, the PRIME-MD PQ is given to the patient to complete prior to seeing the physician. During the PRIME-MD 1,000 Study, it was given to patients in the waiting room by clerical staff. It might also be mailed to the patient's home. Although the PQ requires minimal reading skills, a few patients will have difficulty due to problems with visual acuity or literacy. Patients have accepted having the PQ read to them, though the loss of confidentiality may influence response bias, particularly with the alcohol abuse section.
The decision to give the patient the PRIME-MD PQ may be made by clerical staff following a protocol; for example, it may be automatically administered to all patients who are new to the office or provider, scheduled for periodic health maintenance, or making unscheduled visits with particular presenting complaints. Alternatively, the physician may decide to ask a patient to complete the PQ before seeing them because mental disorders are suspected, or for any of the range of problems associated with mental disorders (e.g., the patient seems dissatisfied or is experienced as difficult, has multiple physical symptoms, is a "high utilizer" of health care services, etc.). Clinicians might also elect to administer the PQ orally to the patient during the visit. The PRIME-MD's modular design allows the physician to administer only those PQ items that seem relevant, though the high rates of undetected comorbid mental disorders from different categories of mental disorder should lead to caution about skipping sections of the PQ. In practice, when physicians are familiar with certain patients, they may already know whether the patient has multiple physical complaints. In fact, the presence of multiple physical symptoms should prompt administration of the nonsomatoform PQ items and subsequently triggered CEG modules. Therefore, it is common for physicians to administer the 10 PQ items that address mood, anxiety, eating, and alcohol disorders during the clinical encounter. When the PQ is administered orally, the physician can move directly to the corresponding CEG module as soon as one mood, generalized anxiety, or alcohol item is endorsed because the remaining symptoms on

the PQ in these sections are also addressed by the CEG. The eating disorders and panic disorder PQ sections already consist of only one item. The PRIME-MD CEG. If patients have completed the PQ, they typically hand it to the physician at the beginning of the visit. A quick glance will inform the physician whether or not any modules have been triggered by the patient's responses. Even if CEG modules have been triggered, the physician should nonetheless begin the encounter with an appropriate "patient-centered" assessment of the patient's concerns and negotiate an agenda for the visit (Cohen-Cole, 1991; Putnam & Lipkin, 1995; Smith & Hoppe, 1991). The decision as to when or even whether to administer the CEG should be made in the context of that negotiated agenda. The CEG modules should be administered in the order in which they are arranged in the PRIME-MD, that is, mood, anxiety, eating, alcohol, and finally somatoform. This sequence maximizes efficiency and allows symptoms secondary to mood and anxiety disorders to be identified prior to administering the somatoform module so that those symptoms are not also counted toward a somatoform diagnosis. When responding to CEG items, it is important to ensure that the patient understands the duration and severity criteria that are the "frame" for each symptom. For example, symptoms of depression are counted toward a diagnosis only if they have been present nearly every day for at least 2 weeks. Within each module, the items are addressed sequentially until an instruction to skip to a subsequent item, or exit the module (and begin the next module triggered by the PQ), is encountered. The CEG indicates the diagnoses present as the criteria are met. Documentation. The final page of the PRIME-MD CEG package is a checklist of PRIME-MD diagnoses with ICD-9-CM codes that can be used as a summary of the diagnostic findings and placed in the chart.
The summary sheet also includes the CPT and EM codes that can be used for visits during which the PRIME-MD has been administered. Alternatively, the entire PRIME-MD CEG and/or PQ can be placed in the patient's record. Patient-Centered Versus Symptom-Driven Interviewing. The PRIME-MD CEG provides the structure for a "physician-centered," or symptom-driven, assessment of the patient's symptoms. The issues addressed by the PRIME-MD typically have great significance to the patient and the PRIME-MD interview often evokes strong emotional responses. It is important to address the patient's emotional response to the diagnoses and underlying psychosocial issues, and the proper method for doing so is a patient-centered interaction. Patient-centered interviewing is characterized by the use of open-ended questions that elicit the patient's emotional response as well as factual information (Cohen-Cole, 1991; Putnam & Lipkin, 1995; Smith & Hoppe, 1991). Open-ended questions are followed by "directive" questions that focus the patient's attention on more circumscribed areas while leaving decisions about how to discuss those areas primarily up to the patient. In the overall process of the medical encounter, closed-ended questions used in the PRIME-MD CEG would ideally be integrated in a patient-centered interview. The physician should be sensitive to emotional reactions evidenced by the patient's tone of voice, facial expression, or additional comments made to elaborate on the yes-no responses that the CEG asks for. In such instances, the physician should acknowledge the patient's emotional responses and promise to return to them shortly, while suggesting that completion of a few more questions might lead to a better understanding of the patient's mood or anxiety disorder. This approach offers the advantage of allowing the

physician to formulate therapeutic recommendations, such as medication or referral for therapy based on specific diagnoses, when talking to the patient about the life situations that underlie their emotional distress and mental disorders. On the other hand, the physician may want to move to a patient-centered interviewing style as soon as an emotional response is evident, returning to the remaining PRIME-MD CEG items later to complete the assessment of the patient's mental disorders. The PRIME-MD item inquiring about suicidal ideation is so likely to evoke the kind of emotional distress that requires attention that the PRIME-MD itself employs a patient-centered "directive question" as a follow-up to a positive response. The PRIME-MD CEG alcohol abuse and dependence module requires the greatest skill in administration. Whereas the agreement between mental health practitioner interview and primary care provider assessment using the PRIME-MD was good for alcohol disorders (κ = .71), a recent study of a computer-driven, interactive voice recognition PRIME-MD telephone interview demonstrated twice the prevalence of alcohol problems discovered by face-to-face interview with either the PRIME-MD administered by a primary care provider or the SCID administered by a mental health practitioner (Kobak et al., 1997; Spitzer et al., 1994). Detection of alcohol abuse in patients is complicated by the patient's desire to conceal a socially undesirable behavior, physicians' collusion in avoiding detection of a problem that is difficult to treat and evokes strong reactions in the caregiver, and the denial that is a cardinal feature of alcoholism (Clark, 1995; Moore et al., 1989; Ness & Ende, 1994). Who Should Administer the PRIME-MD? The PRIME-MD was developed for and validated with primary care physicians.
Experience with more extensive structured psychiatric interviews in the hands of trained lay-interviewers has demonstrated adequate validity and reliability compared to mental health professionals. Therefore, it is reasonable to expect that, with the exception of the somatoform module, which requires reaching conclusions regarding the relation between symptoms and physical illness, nurses or midlevel providers such as nurse practitioners, physician assistants, and social workers could administer the PRIME-MD accurately on behalf of physician providers. Adoption of this strategy could result in a significant increase in the efficiency of the PRIME-MD for the primary provider because half of patients who are interviewed with the CEG do not have a PRIME-MD mental disorder. Instead of the physician interviewing 80% of patients and achieving a diagnosis in only half of those cases, the midlevel provider would interview the 80% of patients who trigger at least one module of the CEG and report the results to the primary care physician. The primary care physician could then devote more time to reviewing the PRIME-MD results and evaluating and treating the diagnosed conditions, and less time performing interviews that do not alter the course of treatment. Providing Feedback of Results. The administration of the PRIME-MD CEG is intrinsically a form of feedback to the patient. The physician's administration of each module is cast as an explicit response to the patient's responses on the PQ: "I see that you indicated that you have been bothered by feeling down or depressed during the last month. Let me ask you some questions about other problems you might have that could be related to feeling depressed." As patients observe the physician administering the PRIME-MD CEG, they become aware that a series of symptoms that go together and constitute a syndrome is being evaluated.
Though patients may have no more sophisticated an understanding of the syndrome than they do of many of the laboratory results that physicians show them or report to them, such as a "spot" on a chest x-ray or the value of their thyroid stimulating hormone (TSH), the fact that there is a recognized

coherence to the pattern of responses to the questions being asked can make a powerful impression. As the PRIME-MD CEG is administered, physicians can report diagnoses both as they are made and when all CEG modules have been administered. Although additional data needs to be gathered to complete evaluation of the diagnoses made, giving feedback to the patient about those diagnoses is the first stage of intervention. Patients' knowledge of and attitudes about the diagnosed mental disorders should be ascertained, as well as their preferences regarding different treatment options. Whereas the results of a PRIME-MD evaluation do require explanation, the fact that they are diagnoses rather than scale or symptom scores makes the physician's efforts to educate the patient more straightforward. Use with Other Evaluation Data. All of the diagnoses obtained by the PRIME-MD require further evaluation before treatment can be planned and initiated. Methods used to accomplish this evaluation can be structured or unstructured, but in actual practice the PRIME-MD has not commonly been followed by secondary structured protocols. The PRIME-MD has implicit indications for such evaluation in several places. For example, if the mood module reveals suicidal ideation, the physician is directed to ask the open-ended question, "Tell me about it." Several approaches to the assessment of suicide risk based on demographic characteristics and the presence, lethality, and availability of a suicide plan have been described (Brody et al., 1995; Depression Guideline Panel, 1993a; Depression Guideline Panel, 1993b). The presence of significant suicidal ideation or intent is generally considered to be an urgent indication for consultation with a mental health specialist. 
Although 75% of patients with an anxiety disorder and half of those with alcohol abuse have a comorbid mood disorder, and thus will be assessed for current suicidal ideation when the mood module is administered, patients with anxiety and alcohol-related disorders who did not trigger the mood module may also benefit from assessment for suicidal ideation. Both the mood and anxiety modules direct the physician to assess whether disorders diagnosed in those modules are secondary to "physical disorder, medication, or other drug." The PRIME-MD does not provide guidance regarding specific evaluations to rule out a secondary disorder, but strategies for accomplishing this are well discussed in the medical literature. The essential task of the somatoform module is for the primary provider to "rule out" physical explanations for the presence or severity of the patient's symptoms. The PRIME-MD does not provide specific guidance for this task either, relying instead on the knowledge and expertise of physicians to accomplish this process. One of the basic steps in the evaluation of patients with any one mental disorder is to determine whether other mental disorders are present. Because the PRIME-MD assesses the most common of mental disorders in primary care, this important step is intrinsic to the PRIME-MD. Assessment of mental status, psychosis, and drug abuse other than alcohol are not included in the PRIME-MD and should be addressed in any patient with a PRIME-MD diagnosis. Other mental disorders such as obsessive-compulsive disorder, posttraumatic stress disorder, and social phobia might also be considered in selected patients in whom suggestive symptoms are detected during the PRIME-MD evaluation. One measure of the severity of mood disorders, the number of the nine symptoms of depression present, is intrinsic to the PRIME-MD. Severity of mood and other disorders should also be assessed in terms of associated functional impairment.
The simplest classification scheme would characterize the disorder as mild if activities have

become more difficult for the patient but are still being performed in an acceptable fashion. Patients with moderate disorders will have lost the ability to perform some important activities, and those with severe disorders will have substantial loss of functional capacity. Patients with mild disorders will rarely need to be hospitalized, those with moderate disorders may if their support system is not robust, and those with severe disorders will generally require hospitalization unless they have a very strong social support system. Although structured methods for assessing functional capacity exist, primary care clinicians generally perform an unstructured assessment. Self-administered HRQL instruments that can be computer scored are available for use in outpatient settings (Parkerson, Broadhead, & Tse, 1995; Ware, 1993). Data obtained using structured assessments of patients' health-related quality of life can assist physicians in managing mental disorders, and they could be used to target the specific areas of deficit revealed on multidimensional HRQL instruments such as the SF-20 or SF-36 (Rubenstein et al., 1995). Assessment of the patient's social system (i.e., family, job, friends, and other sources of support and causes of life stress) is an important component of the evaluation of patients with mental disorders. Primary care providers usually perform an unstructured assessment of these factors. There is a growing body of literature describing semistructured, efficient, and sophisticated methods of assessing the patient's social system and applying that perspective to primary care (Hahn, 1997).

Use of the PRIME-MD in Treatment Monitoring

The purpose of monitoring the treatment of conditions diagnosable with the PRIME-MD is to assess symptom severity and the persistence of the diagnosis. The PRIME-MD is designed to detect when patients have the criteria required for diagnosis and can be used to determine whether conditions persist.
However, it is not designed for the graduated assessment of symptom severity or response to treatment. Nonetheless, the PRIME-MD can be used at any time to determine if the criterion symptoms are still present, and the PQ physical symptom checklist and the symptom lists in the mood and anxiety modules can be administered at sequential visits as part of the process of monitoring treatment.

Case Study

The following case study is based on the management of representative patients seen in the clinic. It illustrates the use of the PRIME-MD for problem identification and persistence of diagnoses. A.G. was seen for the first time by a third-year medical resident who presented the patient to a supervising attending physician. A.G. had been a patient of the clinic for 10 years and had made an average of 15 visits per year for the last several years to her previous primary care provider, to the walk-in clinic, and the emergency room. In the previous year alone she had made 40 visits. Her record was filled with notes recording multiple physical complaints that had been vigorously investigated with persistently negative results. Although she complained of shortness of breath, and asthma was on

her problem list, her lung exams and pulmonary function tests were consistently normal. She also reported diarrhea, but stool tests and colonoscopy were normal. Symptoms suggestive of an ulcer led to gastroscopy, which was unremarkable. She complained of severe pain in multiple joints, but on examination there was no evidence of inflammation, and x-rays and serological tests were negative. The resident described A.G. as presenting her current symptoms (i.e., "a flare up of her diarrhea, and joint pains") with an innocent but earnest request that the physician would resolve the problem. The resident found it striking that the history of past failures to diagnose and treat her problems had no effect on her current expectations. The resident was at a loss as to how to manage the situation, and more than a little frustrated and annoyed by the perceived unreasonableness of the patient's expectations. Although the resident reported that the patient seemed somewhat anxious and agitated, she could not characterize the patient's affect or psychiatric symptoms more specifically. The attending physician pointed out that the patient's pattern of multiple physical complaints was consistent with a somatoform disorder, that mood and anxiety disorders commonly occurred in association with somatoform disorders, and that all of these diagnoses were commonly undetected unless formally assessed. The attending requested that the resident use the PRIME-MD. On the Patient Questionnaire (PQ), A.G. endorsed 10 of the 16 physical symptoms, triggering the somatoform module of the Clinician Evaluation Guide (CEG); the "feeling depressed" item triggered the mood module; and the "panic attack" and "worried about a lot of things" items triggered the anxiety module. The eating disorders and alcohol abuse and dependence screens on the PQ were negative. She rated her health as "poor."
The resident then administered the PQ-triggered modules in their sequence in the CEG: the mood module, the anxiety module, and the somatoform module. In the mood module, the patient endorsed five symptoms (impaired sleep, low energy, poor concentration, depressed mood, and low self-esteem), which fulfills the criteria for major depressive disorder. Confirming 2 years of impairment-causing depressed mood at least half the time, her presentation also fulfilled the criteria for dysthymia. In the panic section of the anxiety module she initially admitted to rare attacks of "nerves." However, when the panic questions were applied to what the patient called "asthma attacks," which always had a prominent component of anxiety, the patient's symptoms met criteria for panic disorder. In the remaining sections of the anxiety module, the patient admitted to being nervous, anxious, and on edge, had many physical symptoms associated with generalized anxiety disorder (some of which had already been detected on the mood module), and acknowledged that these symptoms caused functional impairment. However, she said that her only worries were her medical conditions; otherwise, she did not worry about many different things. Her symptoms therefore met the criteria for anxiety disorder not otherwise specified but not those for generalized anxiety disorder. In the somatoform module, her symptoms met the criteria for multisomatoform disorder by confirming many years of multiple, impairment-causing physical symptoms that doctors could not adequately explain or treat. The mood and anxiety disorders diagnosed using the CEG were presented to the patient, who accepted them as an accurate description of her emotional state and as among her many medical problems. She was interested in trying medication to relieve her emotional distress and a tricyclic antidepressant was begun. As anticipated, A.G.
did not believe that her physical symptoms were due to a mood or anxiety disorder, nor was the diagnosis of multisomatoform disorder used explicitly

in offering a new explanatory model for her illness experience. Instead, a family systems assessment was performed (Hahn, 1997), which revealed lifelong failure and frustration in establishing a marital relationship, and a compensatory and overly close relationship with the patient's mother. The relationship between A.G. and her mother was complicated by the mother's unusually great need for assistance in caring for A.G.'s chronically ill and dependent 35-year-old sister. The patient's physical symptoms enabled her, on the one hand, to compete with her sister's illness in obtaining her mother's attention and, on the other hand, to excuse herself from obligation with minimal guilt when her mother's requests for help with the sister became excessive. In the course of several family and individual meetings with the patient, the resident and attending physician empathically acknowledged the distress caused by the patient's family problems and difficulty establishing a satisfying marital relationship. It was suggested to A.G. that she would be entitled to impose limits on her involvement with her mother and sister "even if she was in perfect health." Finally, A.G. was encouraged to accept that these family issues were as worthy of attention as her physical symptoms (not the cause of them), and individual and/or family therapy was recommended. The patient accepted the rationale for referral but was skeptical about its usefulness and more skeptical about her family's willingness to participate. On subsequent visits, the patient's antidepressant medication was gradually increased to a full dose, and her symptoms of depression and the frequency of panic attacks decreased. Repeat administration of the PRIME-MD revealed that she no longer had current dysthymia, and her major depression was in partial remission. Her unscheduled visits decreased over the next year and the intensity of her somatization somewhat diminished.
When she did complain of physical symptoms, the physician responded by inquiring about the situation with her mother and sister and the patient was able to ventilate her frustration and receive some empathic support. When the resident completed her training, the attending physician became the primary care provider. Initially, the patient did not follow through with the referral for psychotherapy, but 2 years later when she initiated another relationship, she began weekly psychotherapy. The patient's somatization decreased further, and on the occasions that she did report a "flare-up" of one of her complaints, a recent perturbation in the family system could usually be easily identified. She has chosen to remain on the antidepressant medication and has been free of all but brief episodes of depressed mood. This case demonstrates a number of important features related to the utility of the PRIME-MD. The patient had severe psychopathology, with four concurrent distinguishable mental disorders. Mean health-related quality of life scores for patients with this combination of diagnoses are in the lowest decile, and health care utilization rates and physician-experienced difficulty are in the highest decile of patients in the PRIME-MD 1,000 Study. Although physicians had previously suspected that mental disorders were present in this patient, they had never made a specific diagnosis or initiated diagnosis-specific treatment. The PRIME-MD enabled the physician to identify specific diagnoses and reframe the patient's and her own understanding of the cause of the patient's morbidity (although their respective explanatory models were not identical). Establishing specific diagnoses permitted the physician to initiate treatment and substantially improve the patient's health-related quality of life while decreasing inappropriate utilization of health care services and the physician's frustration. 
Moreover, repeated administration of the PRIME-MD enabled the physician to track the patient's illness over time. This case also illustrates the utility of the PRIME-MD as an educational instrument facilitating supervision and training of clinicians in the clinical setting.
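The administration order stated earlier for the CEG (mood, anxiety, eating, alcohol, and finally somatoform), which the resident followed in A.G.'s case, can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `order_triggered_modules` and the string labels for the modules are hypothetical and are not part of the PRIME-MD, and the sketch takes the set of triggered modules as given rather than modeling the PQ trigger rules themselves.

```python
# CEG administration order as stated in the text: mood, anxiety, eating,
# alcohol, and finally somatoform. The labels are illustrative only.
CEG_ORDER = ("mood", "anxiety", "eating", "alcohol", "somatoform")


def order_triggered_modules(triggered):
    """Return the PQ-triggered modules in CEG administration order.

    `triggered` is any iterable of module labels flagged by the PQ.
    This helper is hypothetical; it is not part of the PRIME-MD forms.
    """
    triggered = set(triggered)
    unknown = triggered - set(CEG_ORDER)
    if unknown:
        raise ValueError(f"unknown module(s): {sorted(unknown)}")
    # Walk the fixed order so somatoform is always administered last.
    return [module for module in CEG_ORDER if module in triggered]


# Modules triggered by A.G.'s PQ responses in the case study.
ag_modules = order_triggered_modules({"somatoform", "mood", "anxiety"})
print(ag_modules)  # ['mood', 'anxiety', 'somatoform']
```

Keeping the order in one fixed tuple reflects the rationale given in the text: symptoms secondary to mood and anxiety disorders are identified before the somatoform module is reached.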

Use of the PRIME-MD in Treatment Outcome Assessment

Evaluation of the PRIME-MD as an Outcome Measure

The PRIME-MD measures up well to many of the criteria established for evaluating outcome measures. Relevance. Epidemiologic data from the PRIME-MD 1,000 Study and previous studies of psychopathology in primary care clearly establish the relevance of the disorders targeted by the PRIME-MD. The prevalence, associated morbidity, and low rates of unaided detection in the primary care setting all support the conclusion that results obtained from the PRIME-MD will be relevant to the intended population. The use of situational or patient characteristic triggers for the administration of the PRIME-MD might further increase the likelihood that the result will be relevant to patient care. Procedures. Results of the PRIME-MD 1,000 Study demonstrate the feasibility of training primary care physicians to use the PRIME-MD. Both the PQ and CEG are self-guiding instruments that require little additional documentation or instruction to use. Aside from the PRIME-MD forms, no other equipment is needed. Minimal clinical support staff effort is required to initiate the PRIME-MD procedure with literate patients, and the clerical staff time required to assist patients with the PQ who cannot read is brief. The Patient Problem Questionnaire, an entirely self-report form of the PRIME-MD, requires greater patient sophistication but is easy for the majority of patients to complete. The computer-driven interactive voice recognition PRIME-MD will require investment in suitable telephone and computer systems. Objective Referents. Individual items and the diagnoses evaluated by the PRIME-MD are scored dichotomously as present or absent. Each item is scored according to specific intensity and duration of the symptom as relevant for the specific diagnosis. Results of the PRIME-MD are thus comparable between patients and populations. Use of Multiple Respondents.
The PRIME-MD was not designed to be completed by evaluators or respondents other than the patient and provider. Certain symptoms, such as psychomotor agitation or retardation, can be judged by the physician as well as the patient's self-report. It is not unusual for other family members to be present during primary care encounters, and their input to the questions being addressed can contribute to the assessment. The PRIME-MD has no specific mechanism for identifying the source of data when other informants are used, and relies on the clinician's judgment in incorporating the data into the assessment. Data from the PRIME-MD 1,000 Study would suggest that obtaining information regarding alcohol use from other informants might be particularly useful in that subset of patients whose self-report may be distorted by denial and social desirability reporting bias. Treatment Linkage. Compared to unaided clinicians' general impressions that patients with mental disorders have some kind of psychological problem, the specific diagnoses generated by PRIME-MD are an essential step in determining the range of treatments from which the patient may benefit. However, the PRIME-MD is not designed to evaluate the severity of diagnoses in a way that can be used to identify which of the treatments available for the diagnoses is more likely to be useful. To the

limited extent that the instrument can be used to monitor the patient during treatment, it can be of assistance in determining if a chosen treatment is working. Psychometric Strength. The PRIME-MD 1,000 Study has established that the PRIME-MD is a reliable and valid instrument when used in the primary care setting. Agreement between PRIME-MD and standardized diagnostic interviews administered by mental health professionals approaches that observed between mental health professionals in psychiatric research studies. Construct validity is strongly supported by convergence with other validated measures of health-related quality of life and psychiatric symptomatology. The greater sensitivity of the PRIME-MD to alcohol abuse when administered by phone using a computer-guided interactive voice response interview suggests that mode of administration may introduce some social desirability and other response biases. The PQ screen for alcohol problems is as susceptible to conscious withholding and unconscious denial as are other alcohol screens. It remains to be seen how new self-report versions of the PRIME-MD vary in performance from the physician-administered version and vary depending on whether administration is by paper-and-pencil, interactive computer in the office, or interactive voice response methods. It is well known that prevalence of psychopathology is strongly and negatively correlated with socioeconomic status (SES). This association was observed in the PRIME-MD 1,000 Study where intrasite variability in rates of psychiatric caseness could be explained for three of the four sites using the limited demographic indicators of SES available. It is therefore likely that the physician-administered PRIME-MD will give stable results across different medical practices. A great deal still needs to be learned regarding the utility of the PRIME-MD as a treatment planning and monitoring tool.
In particular, sensitivity to treatment-related change in patient status has not yet been evaluated. Low Measure Costs Relative to Its Uses. Costs for administration of the PRIME-MD are limited to the cost of the forms and the time required for distribution and administration. No additional costs are incurred in analyzing or processing the results. As a structured and validated psychiatric assessment, the PRIME-MD may qualify for reimbursement from third-party insurance payers, covering the cost of administration and generating revenue. The PRIME-MD has been cited as an example of the kind of case-finding diagnostic tool appropriate for programs targeted to improve the health and cost of care of populations (Katon et al., 1997). It can be used for cost containment to determine or justify eligibility for treatments or services. At one clinic site, prescriptions for the SSRI that is supplied at low cost to patients require completion of the PRIME-MD mood module. Administration to high utilizers of medical care may identify many patients with otherwise unidentified mental disorders, enhance the diagnostic precision in those who were previously known to have psychopathology, and lead to lower long-term costs of care. Understanding by Nonprofessional Audiences. PRIME-MD results are specific diagnoses and thus are relatively easy for the patient or lay audience to understand. Educating patients about their diagnosis is an essential first step in the process of treatment and evaluation, and explaining the results of the PRIME-MD thus does not add to the burden of patient education, but rather facilitates it. In presenting PRIME-MD results, there is no need to explain psychological constructs that are secondary to

the central diagnostic description of the patient's problem, nor do the ranges, anchors, and significance of particular values of score scales need to be explained. One of the challenges inherent in presenting psychiatric diagnoses to the many patients who are reluctant to accept that they have a mental disorder is the absence of a true laboratory test that can "prove scientifically" that the disorder is present. An exception is the powerful effect often observed when abnormal laboratory tests indicating pathology produced by alcohol abuse are presented to patients. In some respects, the PRIME-MD resembles a "test." In contrast to an interview gathering the same data but without use of the PRIME-MD forms, when the PRIME-MD is administered patients are impressed that the questions they are being asked constitute a recognized diagnostic pattern because the instrument exists as a physical form. Indeed, the PRIME-MD has been compared to an electrocardiogram or chest X-ray: In this case, it is a "physical" tracing of the psyche rather than the heart or the lungs. The PRIME-MD's existence reinforces the point that mental disorders are well understood by the medical community and common enough to warrant the development of a standardized procedure. The effect of the procedure not only enhances patient understanding but contributes to the kind of confidence that individual patients need to adhere to regimens and that families need in order to be supportive. Patients themselves can refer to the PRIME-MD procedure when explaining the nature of their problems to others in their social system. For the same reasons that the PRIME-MD may facilitate communication with patients and their families, data about populations can be easily communicated to funding agencies, administrators, and other clinical personnel. 
The PRIME-MD is ideal for efficiently determining the prevalence and persistence of indicator psychiatric diagnoses in populations in program outcomes research and evaluation (Katon et al., 1997). In the short period of time that the PRIME-MD has been available, it has already become one of the most popular instruments for epidemiological and outcomes studies in the primary care setting.

Easy Feedback and Uncomplicated Interpretation. The PRIME-MD requires no further "interpretation" after administration because the results are presented as diagnoses. Though the diagnoses require explanation in their application, no scale constructs or values need to be interpreted. The summary sheet for diagnoses made during the procedure is a convenient and readily understood reporting format.

Usefulness in Clinical Services. The diagnoses derived from the PRIME-MD serve as the foundation for clinical action and communication, including the choice of further evaluations, treatments, and services. The PRIME-MD results can be used to justify reimbursement for assessment and treatment of mental disorders. The PRIME-MD was designed to accomplish these objectives in the most efficient, least costly, and least burdensome fashion. The high perceived benefit of the PRIME-MD reported by physicians, and especially the positive correlation between perceived benefit and the amount of time required to administer it, support the conclusion that the goal of cost-effectiveness has been adequately achieved.

Compatibility with Clinical Theories and Practices. The theoretical foundation of the PRIME-MD is the almost universally accepted DSM-IV diagnostic model. The PRIME-MD has utility across a broad spectrum of clinical theories and practices because it makes no presumptions, other than in choice of included conditions, regarding treatment approach or etiological model. The PRIME-MD detects and diagnoses conditions

that all primary care settings need to attend to regardless of the services they offer or their approach to treatment.

PRIME-MD Related Literature

The original PRIME-MD 1000 Study (Spitzer et al., 1994) provided a rich data set from which much has been learned about mental disorders in primary care, including health-related quality of life (Linzer et al., 1996; Spitzer et al., 1995), gender differences (Kroenke & Spitzer, 1998; Linzer et al., 1996; Williams et al., 1995), patient difficulty (Hahn et al., 1996), somatization (Kroenke et al., 1994; Kroenke, Spitzer, et al., 1997; Kroenke & Spitzer, 1998; Kroenke, Spitzer, deGruy, & Swindle, 1998), and alcohol disorders (Johnson et al., 1995). A literature review (Mulrow et al., 1995) and prospective study (Whooley et al., 1997) show that the PRIME-MD is as good as or better than other measures designed to detect depression. Two studies have demonstrated that the PRIME-MD performs as well as a longer structured, criteria-based diagnostic interview (Kobak et al., 1997; Spitzer et al., 1994). Valenstein et al. (1997) investigated the use of the PRIME-MD by unselected physicians under several realistic clinical support conditions. Increasingly, the PRIME-MD is becoming a criterion standard itself for primary care research, including recent studies in special populations (Parker et al., 1997; Philbrick, Connelly, & Wofford, 1996) and primary care patients with physical complaints (Kroenke, Jackson, & Chamberlin, 1997; O'Malley, Wong, Kroenke, Roy, & Wong, 1998).

Conclusions

The PRIME-MD is a two-stage psychiatric case-finding and diagnostic tool designed specifically for use in primary care. Unlike older "first generation" mental disorder case-finding instruments, the PRIME-MD takes the critical step of determining a specific psychiatric diagnosis after finding an undifferentiated psychiatric case; it was the first primary care instrument to do so. 
The PRIME-MD contains separate modules addressing the five most common categories of psychopathology seen in primary care: mood disorders, anxiety disorders, alcohol abuse and dependence, eating disorders, and somatoform disorders. The PRIME-MD is valid and reliable when used by primary care providers, acceptable to patients, and perceived as valuable by primary care providers (Spitzer et al., 1994). The PRIME-MD has increasingly been selected as a research tool by clinical investigators. When the PRIME-MD is initiated by clerical or nursing staff, primary care providers will in fact employ the PRIME-MD, make new psychiatric diagnoses, and initiate new treatments without additional incentive (Valenstein et al., 1997). The PRIME-MD's use of a preliminary case-finding screen, separate modules for each category of disorder, and branching logic with timely exits in the clinician-administered diagnostic component accounts for its demonstrated temporal efficiency. The recently developed and entirely self-administered Patient Problem Questionnaire and interactive voice response versions of the PRIME-MD promise even greater efficiency. The central function of the PRIME-MD is in detection and treatment planning. It has an obvious and prominent role to play in continuous care settings, but can also be of benefit when applied judiciously in episodic care, in subspecialty consultations, and

as an educational tool and facilitator of communication in consultation-liaison psychiatry. Whereas evidence suggests some role for the PRIME-MD as part of routine health maintenance with most if not all primary care patients, those with many physical symptoms or unexplained functional impairment, and those who are experienced as difficult by their provider, are most certain to benefit from its use. The PRIME-MD has proven easy to use. Administration of the PRIME-MD is flexible: It may be used in its entirety or in parts. Use of the PQ may be initiated by the provider or clerical staff in the waiting room or begun during the visit by administering the PQ orally. The CEG is designed to be administered by the physician, but may also be administered by a nurse or midlevel provider, by phone using an interactive voice response system, or self-administered as the Patient Problem Questionnaire. The biggest challenge in administering the PRIME-MD is to incorporate its symptom-driven questions into a patient-centered clinical encounter. The PRIME-MD is unique among psychological tools in that it was designed to change clinicians' behavior in a domain of practice that has been refractory to modification despite numerous previous efforts. An understanding of this dilemma has been enhanced by the "competing demands" model described by Klinkman (1997), which argues that because there are always more agendas than can be addressed, attention to those that are not obvious, immediately compelling, and/or reinforced by structural mandate or incentive will always be deferred in favor of agendas that are. Undetected mental disorders remain undetected because they are not obvious and immediately compelling, and the health care delivery system currently provides little structural incentive or mandate for detection. 
Initiating the PRIME-MD procedure by protocol immediately changes the priority given to mental disorders because the PRIME-MD case-finding Patient Questionnaire is designed to make mental disorders obvious, and the diagnostic Clinician Evaluation Guide is designed to make them compelling. Although individual clinicians may choose to use the PRIME-MD with individual patients or more generally in their own practices, changing the mental health of primary care populations will undoubtedly require more general modification of the health care system. The PRIME-MD is ideally suited for this task. However, further study of patient and encounter characteristics that should trigger the PRIME-MD, the resources required to support its administration and manage resulting diagnoses, and evaluation of changes in health outcomes remains to be done.

Acknowledgment

The development of the PRIME-MD was underwritten by an unrestricted educational grant from the Roerig and Pratt Pharmaceuticals divisions of Pfizer Inc., New York.

References

Andersen, S.M., & Harthorn, B.H. (1990). Changing psychiatric knowledge of primary care physicians: Effects of a brief intervention on clinical diagnosis and treatment. General Hospital Psychiatry, 12, 177-190. Andreasen, N.C., Flaum, M., & Arndt, S. (1992). The Comprehensive Assessment of Symptoms and History (CASH): An instrument for assessing diagnosis and psychopathology. Archives of General Psychiatry, 49, 615-623. Badger, L., deGruy, F., Hartman, J., Plant, M. A., Leeper, J., Anderson, R., Ficken, R., Gaskins, S., Maxwell, A., Rand, E., & Tietze,

P. (1994a). Patient presentation, interview content, and the detection of depression by primary care physicians. Psychosomatic Medicine, 56, 128-135. Badger, L., deGruy, F., Hartman, J., Plant, M. A., Leeper, J., Ficken, R., Maxwell, A., Rand, E., Anderson, R., & Templeton, B. (1994b). Psychosocial interest, medical interviews, and the recognition of depression. Archives of Family Medicine, 3, 899-907. Barrett, J.E., Barrett, J.A., Oxman, T.E., & Gerber, P.D. (1988). The prevalence of psychiatric disorders in a primary care practice. Archives of General Psychiatry, 45, 1100-1106. Beck, A.T., Ward, C., Mendelson, M., Mack, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571. Beitman, B.D., Mukerji, V., Lamberti, J.W., Schmid, L., DeRosear, L., Kushner, M., Flaker, G., & Basha, I. (1989). Panic disorder in patients with chest pain and angiographically normal coronary arteries. American Journal of Cardiology, 63, 1339. Bowman, F.M., Goldberg, D.P., Millar, T., Gask, L., & McGrath, G. (1992). Improving the skills of established general practitioners: The long-term benefits of group teaching. Medical Education, 26, 63-68. Bridges, K., & Goldberg, D. (1987). Somatic presentation of depressive illness in primary care. In P. Freeling, L.J. Downey, & J.C. Malkin (Eds.), The presentation of depression: Current approaches (pp. 9-11). London: Royal College of General Practitioners. Brody, D.S., Thompson, T.L., Larson, D.B., Ford, D.E., Katon, W.J., & Magruder, K. M. (1995). Recognizing and managing depression in primary care. General Hospital Psychiatry, 17, 93-107. Campbell, T.L. (1987). Is screening for mental health problems worthwhile in family practice? An opposing view. Journal of Family Practice, 25, 184-187. Cherkin, D.C., Deyo, R.A., Street, J.H., & Barlow, W. (1996). Predicting poor outcomes for back pain in primary care using patients' own criteria. Spine, 21, 2900-2907. Clark, W. (1995). 
Effective interviewing and intervention for alcohol problems. In M. Lipkin Jr., S.M. Putnam, & A. Lazare (Eds.), The Medical Interview: Clinical care, education and research (pp. 284-293). New York: Springer-Verlag. Cleary, P.D., Burns, B.J., & Nycz, G.R. (1990). The identification of psychiatric illness by primary care physicians: The effect of patient gender. Journal of General Internal Medicine, 5, 355-360. Cohen-Cole, S.A. (1991). The Medical interview: The three function approach. St. Louis: Mosby-Yearbook. Cohen-Cole, S.A., Bird, J., Freeman, A., Boker, J., Hain, J., & Shugerman, A. (1982). An oral examination of the psychiatric knowledge of medical housestaff: Assessment of needs and evaluation baseline. General Hospital Psychiatry, 4, 103-111. Coyne, J.C., Fechner-Bates, S., & Schwenk, T.L. (1994). Prevalence, nature and comorbidity of depressive disorders in primary care. General Hospital Psychiatry, 16, 267-274. Coyne, J.C., Schwenk, T.L., & Fechner-Bates, S. (1995). Nondetection of depression by primary care physicians reconsidered. General Hospital Psychiatry, 17, 3-12. Depression Guideline Panel. (1993a). Depression in primary care: Vol. 1. Detection and diagnosis. Clinical Practice Guideline no. 5. Rockville, MD: Department of Health and Human Services, Public Health Service, Agency for Health Care Policy and Research. (Publication no. 93-0550) Depression Guideline Panel. (1993b). Depression in primary care: Vol. 2. Treatment of major depression. Clinical Practice Guideline no. 5. Rockville, MD: Department of Health and Human Services, Public Health Service, Agency for Health Care Policy and Research. (Publication no. 93-0551) Dew, M.A., Dunn, L.O., Bromet, E.J., & Schulberg, H.C. (1988). Factors affecting help-seeking during depression in a community sample. Journal of Affective Disorders, 14, 223-234. Dohrenwend, B.P., & Dohrenwend, B.S. (1965). The problem of validity in field studies of psychological disorders. 
Journal of Abnormal Psychology, 70, 52-69. Dowrick, D., & Buchan, I. (1995). Twelve month outcome of depression in general practice: Does detection or disclosure make a difference? British Medical Journal, 311, 1274-1276. Escobar, J.I., Burnam, M.A., Karno, M., Forsythe, A., & Golding, J.M. (1987). Somatization in the community. Archives of General Psychiatry, 44, 713-718.

Ewing, J.A. (1984). Detecting alcoholism: The CAGE questionnaire. Journal of the American Medical Association, 252, 1905-1907. Frame, P.S. (1986). A critical review of adult health maintenance. Part 4. Prevention of metabolic, behavioral and miscellaneous conditions. Journal of Family Practice, 23, 29-39. Freeling, P., Rao, B.M., Paykel, E.S., Sireling, L.I., & Burton, R.H. (1985). Unrecognized depression in general practice. British Medical Journal, 290, 1880-1883. Gask, L., Goldberg, D., Lesser, A.L., & Millar, T. (1988). Improving the psychiatric skills of the general practice trainee: An evaluation of a group training course. Medical Education, 22, 132-138. Glass, R.M. (1995). Mental disorders: Quality of life and inequality of insurance coverage (editorial). Journal of the American Medical Association, 274, 1557. Goldberg, D., Jenkins, L., Millar, T., & Faragher, E. (1993). The ability of trainee general practitioners to identify psychological distress among their patients. Psychological Medicine, 23, 185-193. Goldberg, D.P., & Hillier, V.F. (1978). A scaled version of the General Health Questionnaire. Psychological Medicine, 9, 139-145. Goldberg, D.P., Steele, J.J., Smith, C., & Spivey, L. (1980). Training family doctors to recognize psychiatric illness with increased accuracy. Lancet, 2, 521-523. Hahn, S.R. (1997). Working with specific populations: Families. In M.D. Feldman & J.F. Christensen (Eds.), Behavioral medicine in primary care: A practical guide (pp. 57-71). Stamford, CT: Appleton & Lange. Hahn, S.R., Kroenke, K., Spitzer, R.L., Brody, D., Williams, J.B.W., Linzer, M., & deGruy III, F.V. (1996). The difficult patient: Prevalence, psychopathology, and functional impairment. Journal of General Internal Medicine, 11, 1-8. Hahn, S.R., Thompson, K.S., Stern, V., Budner, N.S., & Wills, T.A. (1994). The difficult doctor-patient relationship: Somatization, personality and psychopathology. Journal of Clinical Epidemiology, 47, 647-658. 
Hansson, L., Borgquist, L., Nettelbladt, P., & Nordstrom, G. (1994). The course of psychiatric illness in primary care patients: A 1-year follow-up. Social Psychiatry and Psychiatric Epidemiology, 29, 1-7. Helzer, J.E., Robins, L.N., McEvoy, L.T., Spitznagel, R.L., Stoltzman, R.K., Farmer, A., & Brockington, I.F. (1985). A comparison of clinical and diagnostic interview schedule diagnoses: Physician reexamination of lay-interviewed cases in the general population. Archives of General Psychiatry, 42, 657-666. Henk, H.J., Katzelnick, D.J., Kobak, K.A., Greist, J.H., & Jefferson, J.W. (1996). Medical costs attributed to depression among patients with a history of high medical expenses in a health maintenance organization. Archives of General Psychiatry, 53, 899-904. Hirschfeld, R.M.A., Keller, M.B., Panico, S., Arons, B.S., Barlow, D., Davidoff, F., Endicott, J., Froom, J., Goldstein, M., Gorman, J.M., Marek, R.G., Maurer, T.A., Meyer, R., Phillips, K., Ross, J., Schwenk, T.L., Sharstein, S.S., Thase, M.E., & Wyatt, R.J. (1997). Consensus statement: The National Depressive and Manic-Depressive Association consensus statement on the undertreatment of depression. Journal of the American Medical Association, 277, 333-340. Hoeper, E.W., Nycz, G.R., Kessler, L.G., Burke, J.D., & Pierce, W.E. (1984). The usefulness of screening for mental illness. Lancet, 1, 33-35. Hueston, W.J., Mainous, A.G., & Schilling, R. (1996). Patients with personality disorders: Functional status, health care utilization, and satisfaction with care. Journal of Family Practice, 42, 54-60. Jackson, J.L., Chamberlin, J., & Kroenke, K. (1996). A controlled trial to assess the value of recognizing mental disorders and concerns and expectations. Journal of General Internal Medicine, 11 (Suppl. 1), 134. Jenks, S.F. (1985). Recognition of mental distress and diagnosis of mental disorder in primary care. Journal of the American Medical Association, 253, 1903-1907. Johnstone, A., & Goldberg, D. (1976). 
Psychiatric screening in general practice. Lancet, 1, 605-608. Johnson, J.G., Spitzer, R.L., Williams, J.B.W., Kroenke, K., Linzer, M., Brody, D., deGruy, F., & Hahn, S.R. (1995). Psychiatric comorbidity, health status, and functional impairment associated with alcohol abuse and dependence in primary care patients:

Results from the PRIME-MD 1000 study. Journal of Consulting and Clinical Psychology, 63, 133-140. Karlsson, H., Lehtinen, V., & Joukama, M. (1995). Psychiatric morbidity among frequent attender patients in primary care. General Hospital Psychiatry, 17, 19-25. Katon, W., Robinson, P., Von Korff, M., Lin, E., Bush, T., Ludman, E., Simon, G., & Walker, E. (1996). A multifaceted intervention to improve treatment of depression in primary care. Archives of General Psychiatry, 53, 924-932. Katon, W., & Schulberg, H.C. (1992). Epidemiology of depression in primary care. General Hospital Psychiatry, 14, 237-247. Katon, W., & Von Korff, M. (1990). Caseness criteria for major depression: The primary care clinician and the psychiatric epidemiologist. In C. C. Attkinson & J.M. Zich, (Eds.), Depression in primary care: Screening and detection (pp. 43-62). New York: Routledge. Katon, W., Von Korff, M., Lin, E., Walker, E., Simon, G.E., Bush, T., Robinson, P., & Russo, J. (1995). Collaborative management to achieve treatment guidelines: Impact on depression in primary care. Journal of the American Medical Association, 273, 1026-1031. Katon, W., Von Korff, M., Lin, E., Unützer, J., Simon, G., Walker, E., Ludman, E., & Bush, T. (1997). Population-based care of depression: Effective disease management strategies to decrease prevalence. General Hospital Psychiatry, 19, 169-178. Kessler, L.G., Cleary, P.D., & Burke, J.D. (1985). Psychiatric disorders in primary care: Results of a follow-up study. Archives of General Psychiatry, 42, 583-587. Kessler, R.C., McGonagle, K.A., Zhao, S., Nelson, C.B., Hughes, M., Eshleman, S., Wittchen, H.U., & Kendler, R.S. (1994). Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the U.S. Archives of General Psychiatry, 51, 8-19. Kirmayer, L.J., Robbins, J.M., Dworkind, M., & Yaffe, M.J. (1993). Somatization and the recognition of depression and anxiety in primary care. American Journal of Psychiatry, 150, 734-741. Klinkman, M.S. 
(1997). Competing demands in psychosocial care: A model for the identification and treatment of depressive disorders in primary care. General Hospital Psychiatry, 19, 98-111. Kobak, K.A., Taylor, L.H., Dottl, L., Greist, J.H., Jefferson, J.W., Burroughs, D., Mantel, J.M., Katzelnick, D.J., Norton, R., Henk, H.J., & Serlin, R.C. (1997). A computer-administered telephone interview to identify mental disorders. Journal of the American Medical Association, 278, 905-910. Kroenke, K. (1997). Discovering depression in medical patients: Reasonable expectations. Annals of Internal Medicine, 126, 463-465. Kroenke, K., Arrington, M.E., & Mangelsdorff, A.D. (1990). The prevalence of symptoms in medical outpatients and the adequacy of therapy. Archives of Internal Medicine, 150, 1685-1689. Kroenke, K., Jackson, J.L., & Chamberlin, J. (1997). Depressive and anxiety disorders in patients presenting with physical complaints: Clinical predictors and outcome. American Journal of Medicine, 103, 339-347. Kroenke, K., & Mangelsdorff, A.D. (1989). Common symptoms in ambulatory care: Incidence, evaluation, therapy, and outcome. American Journal of Medicine, 86, 262-266. Kroenke, K., & Spitzer, R.L. (1998). Gender differences in the reporting of physical and somatoform symptoms. Psychosomatic Medicine, 60, 150-155. Kroenke, K., Spitzer, R.L., deGruy, F.V., Hahn, S.R., Linzer, M., Williams, J.B., Brody, D., & Davies, M. (1997). Multisomatoform disorder: An alternative to undifferentiated somatoform disorder for the somatizing patient in primary care. Archives of General Psychiatry, 54, 352-358. Kroenke, K., Spitzer, R.L., deGruy, F.V., & Swindle, R. (1998). A symptom check list to screen for somatoform disorders in primary care. Psychosomatics, 39, 263-272. Kroenke, K., Spitzer, R.L., Williams, J.B.W., Linzer, M., Hahn, S.R., deGruy, F.V., III, & Brody, D. (1994). Physical symptoms in primary care: Predictors of psychiatric disorders and functional impairment. 
Archives of Family Medicine, 3, 774-779. Lin, E.H., Katon, W., Simon, G., Von Korff, M., Bush, T.M., Rutter, C.M., Saunders, K. W., & Walker, E.A. (1997). Achieving guidelines for the treatment of depression in primary care: Is physician education enough? Medical Care, 35, 831-842. Linn, L.S., & Yager, J. (1980). The effect of screening, sensitization, and feedback on notation

of depression. Journal of Medical Education, 55, 942-949. Linzer, M., Spitzer, R., Kroenke, K., Williams, J.B., Hahn, S., Brody, D., & deGruy, F. (1996). Gender, quality of life, and mental disorders in primary care: Results from the PRIME-MD 1000 study. American Journal of Medicine, 101, 526-533. Magruder-Habib, K., Zung, W.W.K., & Feussner, J.R. (1990). Improving physicians' recognition and treatment of depression in general medical care: Results from a randomized clinical trial. Medical Care, 28, 239-250. Main, D., Lutz, L., Barrett, J., Matthew, J., & Miller, R.S. (1993). The role of primary care clinician attitudes, beliefs, and training in the diagnosis and treatment of depression. Archives of Family Medicine, 2, 1061-1066. Moore, R.D., Bone, L.R., Geller, G., Mamon, J.A., Stokes, E.J., & Levine, D.M. (1989). Prevalence, detection and treatment of alcoholism in hospitalized patients. Journal of the American Medical Association, 261, 403-407. Moore, J.T., Lilimperi, D., & Bobula, J.A. (1978). Recognition of depression by family medicine residents: The impact of screening. Journal of Family Practice, 7, 509-513. Mulrow, C.D., Williams, J.W., Jr., Gerety, M.B., Ramirez, G., Montiel, O.M., & Kerber, C. (1995). Case-finding instruments for depression in primary care settings. Annals of Internal Medicine, 122, 913-921. National Institute of Mental Health. (1993). Eating disorders. Bethesda, MD: National Institutes of Health. Ness, D.E., & Ende, J. (1994). Denial in the medical interview: Recognition and management. Journal of the American Medical Association, 272, 1777-1781. O'Malley, P.G., Wong, P.W.K., Kroenke, K., Roy, M.J., & Wong, R.K.H. (1998). The value of screening for psychiatric disorders prior to upper endoscopy. Journal of Psychosomatic Research, 44, 279-287. Olfson, M. (1991). Primary care patients who refuse specialized mental health services. Archives of Internal Medicine, 151, 129-132. 
Olfson, M., Gilbert, T., Weissman, M., Blacklow, R.S., & Broadhead, W.E. (1995). Recognition of emotional distress in physically healthy primary care patients who perceive poor physical health. General Hospital Psychiatry, 17, 173-180. Orleans, C.T., George, L.K., Houpt, J.L., & Brodie, H.K.H. (1985). How primary care physicians treat psychiatric disorders: A national survey of family practitioners. American Journal of Psychiatry, 142, 52-57. Ormel, J., Von Korff, M., Ustun, T.B., Pini, S., Korten, A., & Oldehinkel, T. (1994). Common mental disorders and disability across cultures: Results from the WHO collaborative study on psychological problems in general health care. Journal of the American Medical Association, 272, 1741-1748. Pang, A.H.T., Chao, D.V.K., Fabb, W.E., Lai, K.Y.C., & Leung, T. (1997). Validation study of the Chinese version of the Primary Care Evaluation of Mental Disorders (cPRIME-MD) Part I: Translation and reliability. Supplement to the Journal of the American Medical Association, Southeast Asia, 13, 16-18. Pang, A.H.T., Chao, D.V.K., Fabb, W.E., Leung, T., Ng, F.S., & Yeung, O.C.Y. (1997). Validation study of the Chinese version of the Primary Care Evaluation of Mental Disorders (cPRIME-MD) Part II: Validity. Supplement to the Journal of the American Medical Association, Southeast Asia, 13, 19-22. Parker, T., May, P.A., Maviglia, M.A., Petrakis, S., Sunde, S., & Gloyd, S.V. (1997). PRIME-MD: Its utility in detecting mental disorders in American Indians. International Journal of Psychiatry and Medicine, 27, 107-128. Parkerson, G.R., Broadhead, W.E., & Tse, C.K. (1995). Health status and severity of illness as predictor of outcomes in primary care. Medical Care, 33, 53-56. Paykel, E.S., & Priest, R.G. (1992). Recognition and management of depression in general practice: Consensus statement. British Medical Journal, 305, 1198-1202. Penn, J.V., Boland, R., McCartney, J.R., Kohn, R., & Mulvey, T. (1997). 
Recognition and treatment of depressive disorders by internal medicine attendings and housestaff. General Hospital Psychiatry, 19, 179-184. Philbrick, J.T., Connelly, J.E., & Wofford, A. B. (1996). The prevalence of mental disorders in rural office practice. Journal of General Internal Medicine, 11, 9-15. Putnam, S.M., & Lipkin, Jr., M. (1995). The patient-centered interview: Research support. In M. Lipkin, Jr., S.M. Putnam, & A.

Lazare (Eds.), The Medical Interview: Clinical care, education and research (pp. 530-537). New York: Springer-Verlag. Radloff, L.S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385-401. Regier, D.A., Hirschfeld, R.M.A., Goodwin, F.K., Burke, J.D., Jr., Lazar, J.B., & Judd, L.L. (1988). The NIMH depression awareness, recognition and treatment program: Structure, aims and scientific bases. American Journal of Psychiatry, 145, 1351-1357. Robbins, J.M., Kirmayer, L.F., Cathébras, P., Yaffe, M.J., & Dworkind, M. (1994). Physician characteristics and the recognition of depression and anxiety in primary care. Medical Care, 32, 795-812. Robins, L.N. (1985). Epidemiology: Reflections on testing the validity of psychiatric interviews. Archives of General Psychiatry, 42, 918-924. Robins, L.N., Helzer, J.E., Orvaschel, H., Anthony, J.C., Blazer, D.G., Burnam, A., & Burke, J.D., Jr. (1985). The Diagnostic Interview Schedule. In W.W. Eaton & L.G. Kessler (Eds.), Epidemiologic field methods in psychiatry: The NIMH Epidemiologic Catchment Area Program (pp. 285-308). New York: Academic Press. Rodin, G., Craven, J., & Littleford, C. (1991). Depression in the medically ill: An integrated approach. New York: Brunner/Mazel. Rost, K., Humphrey, J., & Kelleher, K. (1994). Physician management preferences and barriers to care for rural patients with depression. Archives of Family Medicine, 3, 409-414. Rost, K., Smith, R., Matthews, D.B., & Guide, B. (1994). The deliberate misdiagnosis of major depression in primary care. Archives of Family Medicine, 16, 267-276. Roter, D.L., Hall, J.A., Kern, D.E., Barker, L.R., Cole, K.A., & Roca, R.P. (1995). Improving physicians' interviewing skills and reducing patients' emotional distress: A randomized clinical trial. Archives of Internal Medicine, 155, 1877-1884. 
Rubenstein, L.V., McCoy, J.M., Cope, D.W., Barrett, P.A., Hirsch, S.H., Messer, K.S., & Young, R.T. (1995). Improving patient quality of life with feedback to physicians about functional status. Journal of General Internal Medicine, 10, 607-614. Sartorius, N., Ustun, T.B., Costa e Silva, J.A., Goldberg, D., Lecrubier, Y., Ormel, J., Von Korff, M., & Wittchen, H.V. (1993). An international study of psychological problems in primary care: Preliminary report from the WHO Collaborative Project on Psychological Problems in General Health Care. Archives of General Psychiatry, 50, 819-824. Schappert, S.M. (1992). National Ambulatory Medical Care Survey: 1989 summary. Vital Health Statistics, 13(110). Schulberg, H.C., Block, M.R., Madonia, M. J., Scott, C.P., Rodriguez, E., Imber, S. D., Perel, J., Lave, J., Houck, P.R., & Coulehan, J.L. (1996). Treating major depression in primary care practice: Eight-month clinical outcomes. Archives of General Psychiatry, 53, 913-919. Schulberg, H.C., & Burns, B.J. (1988). Mental disorders in primary care: Epidemiologic, diagnostic, and treatment research directions. General Hospital Psychiatry, 10, 79-87. Schwenk, T.L. (1996). Screening for depression in primary care: A disease in search of a test. Journal of General Internal Medicine, 11, 437-439. Schwenk, T.L., Coyne, J. C., & Fechner-Bates, S. (1996). Differences between detected and undetected patients in primary care and depressed psychiatric patients. General Hospital Psychiatry, 18, 407-415. Shapiro, S., German, P.S., Skinner, E.A., Von Korff, M., Turner, R.W., Klein, L.E., Teitelbaum, M.L., Kramer, M., Burke, J. D. Jr., & Burns, B.J. (1987). An experiment to change detection and management of mental morbidity in primary care. Medical Care, 25, 327-339. Shapiro, S., Skinner, E.A., Kessler, L.G., Von Korff, M., German, P.S., Tischler, G.L., Leaf, P.J., Benham, L., Cottler, L., & Regier, D.A. (1984). 
Utilization of health and mental health services: Three epidemiologic catchment sites. Archives of General Psychiatry, 41, 971-978. Simon, G.E., & von Korff, M. (1995). Recognition, management, and outcomes of depression in primary care. Archives of Family Medicine, 4, 99-105. Smith, G.R., Monson, R.A., & Ray, D.C. (1986). Psychiatric consultation in somatization disorder. New England Journal of Medicine, 314, 1407-1413.

Smith, G.R., Rost, K., & Kashner, T.M. (1995). A trial of the effect of a standardized psychiatric consultation on health outcomes and costs in somatizing patients. Archives of General Psychiatry, 52, 238-243. Smith, R.C., & Hoppe, R.B. (1991). The patient's story: Integrating the patient and physician centered approaches to interviewing. Annals of Internal Medicine, 115, 470-477. Spitzer, R.L., Gibbon, M., Williams, J.B.W., & Endicott, J. (1996). Global assessment of functioning (GAF) scale. In L.I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (chap. 11, pp. 76-78, and Appendix B, p. 182). Baltimore: Williams & Wilkins. Spitzer, R.L., Kroenke, K., Linzer, M., Hahn, S.R., Williams, J.B.W., deGruy, F.V., III, Brody, D., & Davies, M. (1995). Health-related quality of life in primary care patients with mental disorders: Results from the PRIME-MD 1000 Study. Journal of the American Medical Association, 274, 1511-1517. Spitzer, R.L., Williams, J.B.W., Gibbon, M., & First, M.B. (1992). The Structured Clinical Interview for DSM-III-R (SCID): I. History, rationale, and description. Archives of General Psychiatry, 49, 624-629. Spitzer, R.L., Williams, J.B.W., & Kroenke, K. (1997). Quick guide to the Patient Problem Questionnaire. Biometrics Research, NY State Psychiatric Institute. Available from R.L. Spitzer. Spitzer, R.L., Williams, J.B.W., Kroenke, K., Linzer, M., deGruy, F.V., III, Hahn, S.R., Brody, D., & Johnson, J.G. (1994). Utility of a new procedure for diagnosing mental disorders in primary care: The PRIME-MD 1000 Study. Journal of the American Medical Association, 272, 1749-1756. Spitzer, R.L., Yanovski, S., Wadden, T., Wing, R., Marcus, M.D., Stunkard, A., Devlin, M., Mitchell, J., Hasin, D., & Horne, R.L. (1993). Binge eating disorder: Its further validation in a multisite study. International Journal of Eating Disorders, 13, 137-153. Stewart, A.L., Hays, R.D., & Ware, J.E. (1988). 
The MOS Short-Form General Health Survey: Reliability and validity in a patient population. Medical Care, 26, 724-732. Turner, R.J., & Noh, S. (1988). Physical disability and depression: A longitudinal analysis. Journal of Health and Social Behavior, 29, 23-37. U.S. Preventive Services Task Force. (1996). Guide to clinical preventive services (2nd ed.). Baltimore: Williams & Wilkins. Ustun, T.B., & Sartorius, N. (Eds.). (1996). Mental illness in primary care: An international study. New York: Wiley. Valenstein, M., Dalack, G., Blow, F., Figueroa, S., Standiford, C., & Douglas, A. (1997). Screening for psychiatric illness with a combined screening and diagnostic instrument. Journal of General Internal Medicine, 12, 679-685. Von Korff, M., Ormel, J., Katon, W., & Lin, E.H.B. (1992). Disability and depression among high utilizers of health care: A longitudinal analysis. Archives of General Psychiatry, 49, 91-100. Von Korff, M., Shapiro, S., Burke, J.D., Teitlebaum, M., Skinner, E.A., German, P., Turner, R.W., Klein, L., & Burns, B. (1987). Anxiety and depression in a primary care clinic: Comparison of Diagnostic Interview Schedule, General Health Questionnaire, and practitioner assessments. Archives of General Psychiatry, 44, 152156. Ware, J.E. (1993). SF-36 Health Survey: Manual and interpretation guide. Boston: The Health Institute, New England Medical Center. Wells, K.B., Stewart, A., Hays, R.D., Burnam, M.A., Rogers, W., Daniels, M., Berry, S., Greenfield, S., & Ware, J. (1989). The functional status and well-being of depressed patients: Results from the Medical Outcomes Study. Journal of the American Medical Association, 262, 907-913. Whooley, M.A., Avins, A.L., Miranda, J., & Browner, W.S. (1997). Case-finding instruments for depression: Two questions are as good as many. Journal of General Internal Medicine, 12, 439-445. Williams, J.B.W., Gibbon, M., First, M.B., Spitzer, R.L., Davies, M., Borus, J., Howes, M.J., Kane, J., & Pope, H.G., Jr. (1992). 
The Structured Clinical Interview for DSM-III-R (SCID): II. Multisite test-retest reliability. Archives of General Psychiatry, 49, 630-636. Williams, J.B.W., Spitzer, R.L., Linzer, M., Kroenke, K., Hahn, S.R., deGruy, F.V., & Lazev, A. (1995). Gender differences in depression in primary care. American Journal of Obstetrics and Gynecology, 173, 654-659. Wyshak, G., & Barsky, A. (1995). Satisfaction with and effectiveness of medical care in

< previous page

page_919

next page >

< previous page

page_920

next page > Page 920

relations to anxiety and depression: Patient and physician ratings compared. General Hospital Psychiatry, 17, 108-114. Wyshak, G., Barsky, A.J., & Klerman, G.L. (1991). Comparison of psychiatric screening tests in a General medical setting using ROC analysis. Medical Care, 29, 775-785. Zimmerman, M., Lish, J.D., Farber, N.J., Hartung, J., Lush, D., Kuzma, M.A., & Plescia, G. (1994). Screening for depression in medical patients: Is the focus too narrow. General Hospital Psychiatry, 16, 388-396. Zung, W.W.K. (1965). A self-rating depression scale. Archives of General Psychiatry, 12, 63-70. Zung, W.W.K. (1971). A rating instrument for anxiety disorders. Psychosomatics, 12, 164-167. Zung, W.W.K., Magill, M., Moore, J.T., & George, D.T. (1983). Recognition and treatment of depression in a family medicine practice. Journal of Clinical Psychiatry, 44, 3-6.

Chapter 29
Beck Depression Inventory and Hopelessness Scale

Randy Katz
Joel Katz
Brian F. Shaw
University of Toronto

The Beck Depression Inventory (BDI)

The Beck Depression Inventory (BDI) is a self-administered inventory designed to assess the current severity of depression. It was developed from clinical observations of depressed and nondepressed psychiatric patients. The attitudes and symptoms characteristic of depressed patients are represented in a 21-item, multiple-choice questionnaire. Each item consists of several statements varying in the degree to which they reflect specific depressive symptoms and attitudes. Each BDI item requires a rating response on an ordinal scale from 0 to 3, where 0 represents the total absence of the symptom or attitude and 3 indicates the most severe level. The following 21 symptoms and attitudes were established from clinical observation: (a) mood, (b) pessimism, (c) sense of failure, (d) lack of satisfaction, (e) guilt feelings, (f) sense of punishment, (g) self-dislike, (h) self-accusation, (i) suicidal wishes, (j) crying, (k) irritability, (l) social withdrawal, (m) indecisiveness, (n) distortion of body image, (o) work inhibition, (p) sleep disturbance, (q) fatigability, (r) loss of appetite, (s) weight loss, (t) somatic preoccupation, and (u) loss of libido.

Summary of Development

The BDI originally was designed for use as a semistructured interview administered by trained interviewers. However, it was developed and refined further to be used as a self-rating instrument taking only 10-15 minutes for administration and scoring. When self-administered, the individual selects one or more of the choices from each item that best reflect how he or she feels. The BDI score is the sum, across the 21 items, of the rank value of the highest ranked statement endorsed on each item.
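The scoring rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bdi_total` and the input layout (one collection of endorsed rank values per item) are hypothetical conveniences, not part of the published instrument.

```python
from typing import Iterable, Sequence

N_ITEMS = 21                # number of BDI items, as stated in the text
VALID_RANKS = {0, 1, 2, 3}  # ordinal rank values available for each item


def bdi_total(endorsed: Sequence[Iterable[int]]) -> int:
    """Sum the highest endorsed rank value over the 21 items.

    `endorsed` holds, for each item, the rank values of every statement
    the respondent selected (one or more per item). This input layout
    is an assumption made for illustration.
    """
    if len(endorsed) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(endorsed)}")
    total = 0
    for number, ranks in enumerate(endorsed, start=1):
        ranks = set(ranks)
        if not ranks or not ranks <= VALID_RANKS:
            raise ValueError(f"item {number}: need at least one rank in 0-3")
        # Only the highest ranked statement endorsed counts toward the score.
        total += max(ranks)
    return total


# Hypothetical respondent: 18 items rated 0, one item with two statements
# endorsed (ranks 1 and 2), one item rated 3, and one item rated 1.
responses = [{0}] * 18 + [{1, 2}, {3}, {1}]
print(bdi_total(responses))  # 2 + 3 + 1 = 6
```

Taking the maximum per item is what keeps a respondent who endorses several statements on one item from being counted more than once.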
The original BDI was developed by Beck and his colleagues in 1961 (Beck, Ward, Mendelson, Mock, & Erbaugh, 1961) and revised by Beck for publication in Beck, Rush, Shaw, and Emery (1979). In refining the psychometric characteristics of the instrument, modifications have been made, including reducing the number of response possibilities and rewording certain items.

There currently are two paper-and-pencil forms of the BDI. One is a short 13-item format that mainly measures a cognitive dimension of depression; the other is a longer 21-item format that also measures noncognitive dimensions of depressive disorder, including somatic concerns (Beck, Steer, & Garbin, 1988). Correlations between the two forms are acceptably high, ranging from .89 to .97 (Beck & Beck, 1972; Beck, Rial, & Rickels, 1974; Reynolds & Gould, 1981). Despite the differences between versions, the two instruments have been found to be comparable in psychiatric patients (Beck & Steer, 1984). A card format (May, Urquart, & Tarran, 1969) and a number of computer-administered forms have been developed, but there are no data on the reliability and validity of these methods of administration (Beck et al., 1988).

Basic Validity and Reliability Information

Content Validity. Over the past 30 years, advances in the classification and diagnostic practices of psychiatric disorders have led to the development of DSM in North America and the ICD system in Europe. Although both systems have progressed along similar paths, in recent years the DSM has received more international attention and is perhaps the more widely used and accepted diagnostic system. As noted earlier, the BDI originally was developed from an atheoretical model derived from observations by trained clinicians of patients suffering from depressive illness. Although the BDI is a useful tool for assessing many features of clinical depression, it does not provide enough information to establish a DSM-III-R diagnosis of major depressive episode and must be supplemented with additional material. For instance, the BDI focuses on the 1-week period preceding administration, whereas DSM-III-R requires the presence of symptoms over a minimum of 2 weeks.
The BDI does not assess symptoms relevant to weight gain, hypersomnia, psychomotor agitation, or retardation (Moran & Lambert, 1983; Vredenburg, Krames, & Flett, 1985). Finally, the BDI does not assess for change from a previous level of functioning, which is a critical criterion for the diagnosis of DSM-III-R major depressive disorder. Overall, the content validity is good for six of the nine DSM-III criteria for depressive episode (Moran & Lambert, 1983), but the BDI does not address the remaining three criteria satisfactorily.

Concurrent Validity. Beck et al. (1988) cited 35 studies where correlations were reported between the BDI and other well-established instruments that measure depression, including (a) Hamilton Psychiatric Rating Scale for Depression (HRSD; Hamilton, 1960), (b) Zung Self-Rating Depression Scale (Zung SDS; Zung, 1965), (c) Minnesota Multiphasic Personality Inventory Depression Scale (MMPI-D; Hathaway & McKinley, 1943), (d) Multiple Affect Adjective Checklist Depression Scale (MAACL-D; Zuckerman & Lubin, 1965), and (e) clinicians' ratings of depth of depression (Beck et al., 1974; Salkind, 1969; Strober, Green, & Carlson, 1981; Wittig, Hanlon, & Kurland, 1963) (see Table 29.1). The correlation coefficients between the BDI and these measures ranged from a relatively modest .33 with DSM-III major depression (Hesselbrock, Hesselbrock, Tenmen, Meyer, & Workman, 1983) to a more substantial .86 with the Zung SDS (Turner & Romano, 1984) and HRSD (Steer, McElroy, & Beck, 1982). However, the strongest relationship was found between clinicians' ratings and the BDI, where the correlation coefficient was reported at .96 (Beck et al., 1974). This is not surprising, because the BDI was developed on the basis of clinical observation of patients suffering with depression. Taken together, the data show that the BDI correlates well with most other self-report measures of depression.

TABLE 29.1
Correlations Between the Beck Depression Inventory and Other Measures of Depression

Comparison measures: Clinical ratings, Hamilton, Zung, MMPI-D, MAACL-D

Psychiatric
Bailey and Coopen (1976): .68
Beck et al. (1961): .66
Blatt et al. (1982): .81M, .44, .77F
Bloom and Brady (1968): .66
Davies et al. (1975): .73, .73
Hesselbrock et al. (1983): .59
May et al. (1969): .65
Mendels, Secunda, and Dyson (1972): .79, .70, .59
Metcalfe and Goldman (1965): .62
Reynolds and Gould (1981): .57
Rounseville et al. (1979): .60, .71
Schnurr et al. (1976): .61, .70, .55
Seitz (1970): .83, .41
Steer et al. (1982): .86
Strober et al. (1981): .67

Nonpsychiatric
Atkeson et al. (1982): .73
Campbell et al. (1984): .56
Clarke and Williams (1979): .67
Coleman and Miller (1975): .55
Giambra (1977): .66
Hammen (1980): .80
Hatzenbuchler et al. (1983): .78
Marsella et al. (1975): .62
Salkind (1969): .73
Schwab et al. (1967): .75
Scogin and Merbaum (1983): .63
Scott et al. (1982): .63
Tanaka-Matsumi and Kameoka (1986): .68
Turner and Romano (1984): .86, .75

Combined
Beck et al. (1974): .55, .89, .56, .96, .67
Carroll et al. (1973): .41
Schaeffer et al. (1985): .65, .81, .59, .67, .76, .57

Reprinted from Clinical Psychology Review, 8, by A.T. Beck, R.A. Steer, and M.A. Garbin, "Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation," pp. 77-100, Copyright (1988), with kind permission from Pergamon Press Ltd, Headington Hill Hall, Oxford OX3 0BW, UK.

Discriminant Validity. Although the BDI was developed to assess the severity or depth of depression in psychiatric patients (Beck et al., 1961), a number of authors have investigated the discriminant validity of the BDI in relation to psychiatric and nonpsychiatric populations (Akiskal, Lemmi, Yetevanian, King, & Belluomini, 1982; Byerly & Carlson, 1982; Clark, Cavanaugh, & Gibbons, 1983; Gallagher, Nies, & Thompson, 1982). These studies demonstrated significantly lower scores on the BDI among nondepressed normals than among depressed psychiatric patients and patients with other nonpsychiatric clinical disorders.

Evidence that the BDI can discriminate between subtypes of depression is limited. Studies addressing this question generally have failed to show any significant effect (Delay, Pachot, Lemperiere, & Mirouze, 1963; Schnurr, Hoaken, & Jarrett, 1976). However, Beck et al. (1988) reported that outpatients with a recurrent episode of major depression showed higher mean BDI scores than patients suffering with a dysthymic disorder.

Reliability. Beck et al. (1988) examined the reliability of the BDI by conducting a meta-analysis of 25 published papers using the BDI. The samples in these studies consisted of schizophrenics, substance abusers, college students, and depressed patients. Regardless of the population sampled, internal consistency estimates were high (ranging from .73 to .95). In addition, Beck et al. (1988) presented information on the stability of the BDI from 10 studies that administered the inventory to the same patients on two occasions. As expected, stability estimates were higher for nonpsychiatric patients (.60 to .83) than for psychiatric patients (.48 to .86), reflecting the sensitivity of the BDI to changes in psychiatric symptomatology.

Research Applications and Findings

One of the most important applications of the BDI has to do with its sensitivity in measuring change in depressive symptoms and severity. The BDI has been used extensively in research studies designed to assess the efficacy of pharmacological interventions (Bellack & Rosenberg, 1966; Broadhurst, 1970; Burrows, Foenander, Davies, & Scoggins, 1976; Coppen, Whybrow, Nuguera, Maggs, & Prange, 1972; Lipsedge & Rees, 1971; Mendels, Secunda, & Dyson, 1972), electroconvulsive therapy (ECT; Green & Statduhat, 1966), psychotherapy (Blackburn, Bishop, Glen, Whalley, & Cristie, 1981; Kovacs, Rush, Beck, & Hollon, 1981; Rush, Beck, Kovacs, & Hollon, 1977), and group therapy (Antonuccio, Lewinsohn, & Steinmetz, 1982).
Overall, these studies have shown the BDI to be a sensitive and valuable instrument in detecting statistically significant changes in symptoms and their severity as a result of these various treatment approaches. The value of using symptom-based research tools such as the BDI recently was advocated by Costello (1992).

Recent studies have highlighted the importance of defining a significant change in BDI scores from a clinical as opposed to a statistical perspective. One approach, advocated by Jacobson, Follette, and Revenstorf (1984), aims to determine whether the observed changes exceed measurement error of the particular psychometric instrument taking into account correctional factors. Alternatively, Steer, Beck, and Garrison (1986) suggested that at least a 10-point drop in BDI scores from pre- to posttreatment would indicate a clinically significant change, but there are no specific studies on this important decision.

Limitations/Potential Problems in Use

The BDI was developed as a symptom inventory, not as a diagnostic instrument. Therefore, inappropriate use of the BDI as a diagnostic instrument can yield misleading information and may overestimate the prevalence of depressive illness. For instance, Ennis, Barnes, Kennedy, and Trachtenberg (1989) examined a series of 71 consecutive admissions to an inpatient psychiatric crisis service following the patients' deliberate attempts at self-harm. Although 80% of those admitted to hospital scored within the moderate to severe ranges of depression as measured by the BDI, only 31% met DSM-III criteria for major depressive episode. Ennis and his colleagues reported a dramatic reduction in BDI scores within a few days following admission, even though these patients did not receive any significant treatment for depression. Similar findings were reported by Newson-Smith and Hirsch (1979), using the General Health Questionnaire (GHQ) and the Present State Examination (PSE), and by van Praag and Plutchik (1985), using subjective recollection of distress. These findings suggest that for patients in a current state of acute emotional distress, high BDI scores may not necessarily reflect clinical depression, but may be interpreted as general psychological distress.

Beck Hopelessness Scale (BHS)

Two opposing views tended to dominate the literature on depression in the early 1960s. One view held that hopelessness represents an amorphous emotional experience that does not lend itself to measurement or systematic quantification. The opposing view proposed that, although the emotional component is prominent in the experience of hopelessness, the construct nevertheless can be defined, measured, and objectified in terms of a system of negative statements and attitudes concerning an individual's current view of self and future expectations (Stotland, 1969). Although difficult to define, hopelessness may be seen as the degree to which an individual has a general negative expectancy about events in his or her future, and is one component of Beck's (1967) cognitive triad of negative cognition (i.e., the depressed person's experiences regarding the self, the world, and the future). The relationship between an individual's specific goals and his or her expectations about the likelihood of achieving them plays a major role in determining the degree of hopelessness experienced (Melges & Bowlby, 1969).
The Beck Hopelessness Scale (BHS) was designed to define and quantify operationally the concept of hopelessness and to facilitate the study of negative expectations and their relationship to psychopathology. The BHS is a 20-item self-administered inventory constructed in a forced-choice (true/false) format to assess the respondent's negative expectations and pessimistic outlook. Each of the 20 items is scored either 1 or 0. A score of 1 is assigned to 11 items for a true response and to the remaining 9 items when a false response is endorsed. The total score is the sum of the scores on all 20 items (possible scores range from 0 to 20).

Summary of Development

The BHS (Beck, Weissman, Lester & Trexler, 1974) was developed to advance the study of those psychopathological states in which a pervasive sense of personal hopelessness dominated the clinical picture. For instance, hopelessness is a core characteristic of depressive disorder (Beck, 1963, 1967; Melges & Bowlby, 1969), a defining feature of suicidal intent (Beck, 1963; Beck, Brown, Berchick, Stewart, & Steer, 1990; Beck, Steer, Kovacs, & Garrison, 1985; Hill, Gallagher, Thompson, & Ishida, 1988), and is associated strongly with certain physical illnesses (Schmale, 1958). In its development, items were selected from two main sources. Nine items were selected from Heimberg's (1961) test regarding attitudes about the future, and 11 items were drawn from a series of statements made by psychiatric patients, reflecting the clinical characteristics of hopelessness or negative expectations about the future (Beck et al., 1974).

Basic Validity and Reliability Information

Content Validity. Content validity initially was assessed by several clinicians who reviewed the BHS for depressive content and comprehensibility (Beck et al., 1974). It subsequently was administered concurrently with the BDI. The BHS has a moderately high correlation with the BDI (e.g., r = .68; Minkoff, Bergman, Beck, & Beck, 1973) and with clinical ratings of hopelessness (Ammerman, 1988).

Concurrent Validity. Concurrent validity was assessed by comparing BHS scores with general clinical ratings of hopelessness, which included the negative expectancies and observable behaviors of (a) outpatients in a general medical practice and (b) patients who had been hospitalized for attempting suicide. Correlations between BHS scores and clinical ratings of hopelessness for the general practice patients and the attempted suicide sample were .74 and .62, respectively (Beck et al., 1974); the BHS also correlated .60 with the Stuart Future test. In addition, BHS ratings have been shown to be related significantly to expressed suicidal intent (Beck, Kovacs, & Weissman, 1975).

Predictive Validity. Beck et al. (1985) carried out a prospective study in which 165 patients initially hospitalized for significant suicidal ideation were followed up over a 10-year period. The data were analyzed to determine the cutoff score that would maximize the predictive power of the BHS. Ninety-one percent of the patients who eventually completed suicide had obtained a BHS score of 10 or more, whereas only 9% (one patient) had a score under 10. More recently, Beck et al. (1990) confirmed the predictive power of the BHS in its ability to identify suicide completers from among a large sample (n = 1,958) of psychiatric outpatients. A scale cutoff score of nine or higher identified 94% (n = 16) of the 17 patients who eventually committed suicide.
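The 94% figure is a sensitivity: the share of eventual suicide completers whom the cutoff flagged. As a minimal sketch of that arithmetic, using only the counts quoted from Beck et al. (1990) above (the function and variable names are hypothetical, chosen for illustration):

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of actual cases that the cutoff correctly flags."""
    cases = true_positives + false_negatives
    if cases == 0:
        raise ValueError("at least one actual case is required")
    return true_positives / cases


# Counts quoted in the text for the outpatient sample (Beck et al., 1990):
completers = 17                # patients who eventually committed suicide
flagged = 16                   # of those, BHS score of nine or higher
missed = completers - flagged  # completers scoring under nine

print(f"{sensitivity(flagged, missed):.0%}")  # 94%
```

Specificity cannot be computed the same way from this passage, because the number of non-completers scoring at or above the cutoff is not reported here.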
The high-risk group identified by this cutoff score was 11 times more likely to commit suicide compared with low-risk patients with BHS scores under nine. These findings support the view that the BHS can be an important instrument in correctly identifying psychiatric patients who ultimately commit suicide. However, this sensitivity in detecting suicide risk occurs at the expense of incorrectly classifying a high proportion of patients who will not commit suicide (i.e., low specificity). Nevertheless, given the importance of correctly identifying high-risk patients, a high rate of false positives is acceptable.

Construct Validity. Perhaps the most convincing evidence for the construct validity of the BHS comes from its strong association with suicidal intent and actual suicide completion (Beck et al., 1985, 1990). Hopelessness as measured by the BHS has a stronger association with suicidal intent than do measures of clinical depression (Beck et al., 1985, 1990; Weissman, Beck, & Kovacs, 1979). Indeed, Beck et al. (1975) found that the relationship between depression and suicidality is reduced when the effect of hopelessness is partialled out statistically. Further evidence for the construct validity of the BHS comes from two factor analytic studies, in which three similar main factors emerged (Beck et al., 1974; Hill et al., 1988). These studies suggested that the three factors with the most clinical relevance represented affective, motivational, and cognitive aspects of hopelessness. Factor 1, labeled "feelings about the future" (Beck et al., 1974) or "hope" (Hill et al., 1988), loaded on affect-laden associations such as hope, enthusiasm, happiness, faith, and good times. Factor 2, labeled "loss of motivation" (Beck et al., 1974) or "giving up" (Hill et al., 1988), loaded heavily on constructs associated with giving up and deliberate self-denial. Factor 3, labeled "future expectations" (Beck et al., 1974) or "plans about the future" (Hill et al., 1988), included items related to a dark future, negative expectations, and a vague and uncertain outlook.

Reliability. Overall, the BHS has been shown to be a reliable measure of hopelessness reflecting a negative expectation for positive future outcomes. Beck et al. (1974) examined the reliability of the BHS in a population of 294 hospitalized patients who had attempted suicide. The internal consistency coefficient of the scale, calculated using the Kuder-Richardson formula, was .93. Correlations between individual scale items and the total scale score were within an acceptable range, from .39 to .76. Further evidence for the reliability of the BHS was obtained by Hill et al. (1988) in their examination of hopelessness as a measure of suicidal intent in the depressed elderly. An examination of the internal consistency of the BHS indicated a coefficient alpha of .84 and a Spearman-Brown split-half reliability of .82.

Interpretative Strategies and Treatment Planning

The total score on the BDI can range from 0, suggesting no depression, to a maximum score of 63, indicating a severe state of clinical depression. Although there are no specific cutoff scores designed to reflect clinical caseness, the following ranges, suggested by Beck et al. (1988), typically have been used to guide decision making in clinical and research settings: 0-9, absence of, or minimal, depression; 10-18, mild to moderate depression; 19-29, moderate to severe depression; 30-63, severe depression.
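The score ranges above translate directly into a lookup. The sketch below assumes a hypothetical helper named `bdi_severity`; the band boundaries are the ones attributed to Beck et al. (1988) in the text, and, as the text notes, they are guides rather than diagnostic cutoffs.

```python
def bdi_severity(total: int) -> str:
    """Map a BDI total score (0-63) to the descriptive range quoted in the text."""
    if not 0 <= total <= 63:
        raise ValueError("BDI total scores range from 0 to 63")
    if total <= 9:
        return "absence of, or minimal, depression"
    if total <= 18:
        return "mild to moderate depression"
    if total <= 29:
        return "moderate to severe depression"
    return "severe depression"


for score in (4, 18, 19, 42):
    print(score, "->", bdi_severity(score))
```

Checking the upper bound of each band in ascending order keeps the boundaries (9, 18, 29) in one place and makes an off-by-one error at the edges easy to spot.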
In addition to using the total BDI score as a general index of severity in assessing depressive symptoms, an examination of individual items endorsed with a rank score of 2 or 3 on the questionnaire may point the clinician to further investigation. For example, when patients endorse Item 9 (concerned with suicide) with a response of 2 or 3, it is imperative that the clinician carry out a thorough assessment of the risk of suicide. There also is evidence that the pessimism item on the BDI differentiates suicide completers from noncompleters (Beck et al., 1985), and therefore should alert the clinician to the possible danger of suicide ideation or behavior, and hence to further investigation. Likewise, an affirmative response to the item related to concerns about health or somatic preoccupation might lead one to consider further medical investigation, and affirmative responses on the cognitive-affective items might lead to further psychological investigation.

The BDI can be used to guide treatment planning from the initial stages of therapy. High scores on items related to motivational deficits, such as social withdrawal and work inhibition, would suggest a treatment plan emphasizing behaviorally oriented strategies focused on helping the patient to increase his or her activities. In contrast, high scores on items related to cognitive deficits, such as pessimism, self-dislike, and self-blame/criticism, would suggest a treatment plan with greater emphasis on identifying and addressing hopelessness, negative thinking, and cognitive distortions.

The BDI is sensitive to changes in depressive symptoms, and therefore can be used to track variations in these symptoms on a session-by-session basis. A number of studies using the BDI as a pre- and posttreatment measure have demonstrated significant reductions in mean BDI scores as a consequence of various types of pharmacological treatments. For instance, mean BDI scores were found to be reduced in depressed patients treated with tricyclic medications (Bellack & Rosenberg, 1966; Lipsedge & Rees, 1971), lithium carbonate (Mendels et al., 1972), and ECT (West, 1981). The BDI also has been found to be sensitive to psychologically oriented therapeutic interventions. The mean BDI score was lower following cognitive behavior therapy (CBT) in several studies (Blackburn et al., 1981; Kovacs et al., 1981; Rush et al., 1977), and comparable results have been found with interpersonal therapy.

The importance of the BHS lies in its clinical utility. It has been successful in identifying patients experiencing such intense hopelessness that they are at high risk for suicide. As mentioned earlier, the total score on the BHS can range from 0, suggesting no hopelessness, to a maximum score of 20, indicating the absence of all hope. Although there are no specific cutoff scores designed to reflect caseness with respect to hopelessness, a score of 9 or more has been associated with a significant risk of suicide (Beck et al., 1985, 1990). High-risk psychiatric outpatients with a score of 9 or more were 11 times more likely to commit suicide than low-risk patients with scores below 9 (Beck et al., 1990). When interpreting scores on the BHS, clinicians should be mindful that scores above 10 may signal immediate or long-term suicide potential. It must be emphasized that a comprehensive assessment of suicide should include other clinical indices, including a history of suicide attempts, family history of suicide, alcohol and drug abuse, and the presence of an affective disorder (Beck et al., 1990).

Case Report

Mr. A is a 43-year-old married man (second marriage) with three children (from his current marriage). He presented to the clinic with severe anxiety and sleep difficulty that he attributed to concerns about his job.
He also reported increasing his alcohol consumption from being a "business drinker" (he was in sales/marketing) to drinking for stress relief (average of four drinks per day for 7 weeks). He denied feeling depressed and denied having suicidal ideation. Mr. A had concerns about how the clinician would respond to him, and on several occasions commented that he must seem like a real "baby" for being so "stressed out." He had no family history of depression or alcohol abuse. He described his father as an "Iron John" type and his mother as "loving, but a worrier."

The major precipitants for his recent symptoms involved both financial and work stresses. His company was going through a major restructuring, and it appeared that he would be under extreme pressure to produce or be fired. Two years ago, he moved into a new house with a large mortgage, a decision that had worried him. He was very concerned that his friends who were "fun loving jocks" would see through him and ridicule him. In fact, his best friend had commented that Mr. A seemed "off in space" at their last lunch. Mr. A reported that his wife was understanding and supportive. She had been in the health-care field and encouraged him to get a psychological consultation. Mr. A felt considerable responsibility toward his family and was moved to tears in the interview when he thought about "letting them down."

As part of our standard intake assessment, Mr. A completed the BDI and the BHS. His BDI score was 18, with notable items (scored 2 or 3) being sleep disturbance, guilt, failure, and decreased interest. This score was notable given Mr. A's general comments that he wasn't depressed (he endorsed the BDI statement No. 1, sadness, as 0). Mr. A's BHS score was 14, a score that was concerning, given his clinical presentation. Mr. A had considerable pessimism about his situation. In the second interview, he minimized his report, stating that work might "turn around." The clinician took careful note of his hopelessness and, consistent with cognitive therapy, related it to his degree of helplessness and his self-criticism (worthlessness). The risk of suicidal behavior was considered. Mr. A denied any intent to attempt suicide and any previous attempts, and reported only fleeting thoughts about suicide and escape. He began a treatment regimen including antidepressant medication and cognitive behavior therapy.

Three weeks later, Mr. A, during his therapy session, acknowledged that he had, in fact, bought ammunition for his rifle just 1 week before his initial evaluation. He reported feeling positive about his therapy. Following this disclosure, the therapist arranged to dispose of the gun and ammunition. Mr. A maintained that he did not intend to harm himself, but acknowledged that his feelings of despondency were greater than he had expressed initially.

It was clear that his concerns about his job were going to be ongoing. The company was not doing well and the marketing efforts in the recession were having limited effects. Therapy focused on his perceived helplessness and his attributional style (significant self-blame and tendency to take excessive responsibility for failure). Interestingly, his BDI score remained relatively stable at 18 to 20 for 11 weeks. Mr. A's sleep improved, but other symptoms (guilt, sense of failure) were very persistent. By 16 weeks of therapy, his score was 11; and by 20 weeks, it was 9. His BHS score dropped from 14 to 7 by week 11 and was 3 at the end of 20 weeks.
In sum, this case illustrates how a psychological assessment utilizing the BDI and BHS may help to alert the clinician to issues as a function of their discrepancy with self-report in the clinical interview. The BDI and BHS are both sensitive to change over the course of therapy and may be used to determine the severity of depression and hopelessness, respectively. In addition, it may be useful to consider higher (or lower) than expected scores to pursue in the interview and/or over time. The self-report scales are both prone to social desirability, and unfortunately it may be that significant clinical symptoms are not reported. On the other hand, as in the case of Mr. A, important symptoms or a state of mind like hopelessness may be detected when clinically the patient minimizes his or her distress. Self-report instruments are not infallible, but they do provide information that is clinically useful. Summary and Conclusions The Beck Depression Inventory (BDI) and the Beck Hopelessness Scale (BHS) are 20-item, self-report inventories designed to measure depression and hopelessness, respectively, in a variety of clinical and research settings. Both questionnaires are easily understood and administered, and require approximately 5-10 minutes to complete and score. The BDI has been the subject of extensive psychometric evaluation and has been demonstrated to have high content, concurrent, predictive, and construct validity, and also to be highly internally consistent. It is especially useful in treatment planning with high and low scores suggesting different psychotherapeutic strategies. The BHS was designed to define and measure operationally the concept of hopelessness and its relationship to psychopathology. Although the BHS has not been studied as extensively as the BDI, the available literature indicates that it, too, has high validity and internal consistency. In particular, the BHS is useful in identifying


patients at high risk for attempted or completed suicide, but it also has low specificity. The resulting high rate of false positives can be overlooked in view of the importance of correctly identifying patients at high risk for suicide.

References

Akiskal, H.S., Lemmi, H., Yerevanian, B., King, D., & Belluomini, J. (1982). The utility of the REM latency in psychiatric diagnosis: A study of 81 depressed outpatients. Psychiatry Research, 7, 101-110.
Ammerman, R.T. (1988). Hopelessness scale. In M. Mersen (Ed.), Dictionary of behavioural assessment techniques (pp. 251-252). University of Pittsburgh: Pergamon.
Antonuccio, D.O., Lewinsohn, P.M., & Steinmetz, J.L. (1982). Identification of therapist differences in a group treatment for depression. Journal of Consulting and Clinical Psychology, 50, 433-435.
Atkeson, B.M., Calhoun, K.S., Resnick, P.A., & Ellis, E.M. (1982). Victims of rape: Repeated assessment of depressive symptoms. Journal of Consulting and Clinical Psychology, 50, 96-102.
Bailey, J., & Coppen, A. (1976). A comparison between the Hamilton Rating Scale and the Beck Inventory in the measurement of depression. British Journal of Psychiatry, 128, 486-489.
Beck, A.T. (1963). Thinking and depression. 1: Idiosyncratic content and cognitive distortions. Archives of General Psychiatry, 9, 324-335.
Beck, A.T. (1967). Depression: Clinical, experimental, and theoretical aspects. New York: Harper & Row.
Beck, A.T., & Beck, R.W. (1972). Screening depressed patients in family practice: A rapid technique. Postgraduate Medicine, 52, 81-85.
Beck, A.T., Brown, G., Berchick, R.J., Stewart, B.L., & Steer, R.A. (1990). Relationship between hopelessness and ultimate suicide: A replication with psychiatric outpatients. American Journal of Psychiatry, 147, 190-195.
Beck, A.T., Kovacs, M., & Weissman, A. (1975). Hopelessness and suicidal behavior: An overview. Journal of the American Medical Association, 234, 1146-1149.
Beck, A.T., Rial, W.Y., & Rickels, K. (1974). Short form of depression inventory: Cross-validation. Psychological Reports, 34, 1184-1186.
Beck, A.T., Rush, A.J., Shaw, B.F., & Emery, G. (1979). Cognitive therapy of depression. New York: Guilford.
Beck, A.T., & Steer, R.A. (1984). Internal consistencies of the original and revised Beck depression inventories. Journal of Clinical Psychology, 40, 1365-1367.
Beck, A.T., Steer, R.A., Kovacs, M., & Garrison, B. (1985). Hopelessness and eventual suicide: A 10-year prospective study of patients hospitalized with suicidal ideation. American Journal of Psychiatry, 142(5), 556-563.
Beck, A.T., Steer, R.A., & Garbin, M.A. (1988). Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation. Clinical Psychology Review, 8, 77-100.
Beck, A.T., Ward, C.H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.
Bellack, L., & Rosenberg, S. (1966). Effects of antidepressant drugs on psychodynamics. Psychosomatic Medicine, 7, 106-114.
Berndt, S.M., Berndt, D.J., & Byars, W.D. (1983). A multi-institutional study of depression in family practice. Journal of Family Practice, 16, 83-87.
Blackburn, I.M., Bishop, S., Glen, A.I.M., Whalley, I.J., & Christie, W. (1981). The efficacy of cognitive therapy in depression: A treatment trial using cognitive therapy and pharmacotherapy each alone and in combination. British Journal of Psychiatry, 139, 181-189.
Blatt, S.J., Quinlan, D.M., Chevron, E.S., McDonald, C., & Zuroff, D. (1982). Dependency and self-criticism: Psychological dimensions of depression. Journal of Consulting and Clinical Psychology, 50, 113-115.


Broadhurst, A.D. (1970). L-tryptophan vs. ECT (letter). Lancet, 1, 1392.
Burrows, G.D., Foenander, G., Davies, H., & Scoggins, B.A. (1976). Rating scales as predictors of response to tricyclic antidepressants. Australian and New Zealand Journal of Psychiatry, 10, 53-56.
Byerly, F.C., & Carlson, W.A. (1982). Comparison among inpatients, outpatients, and normals on three self-report depression inventories. Journal of Clinical Psychology, 38, 797-804.
Campbell, M.M., Burgess, P.M., & Finch, S.J. (1984). A factorial analysis of BDI scores. Journal of Clinical Psychology, 40, 992-996.
Carroll, B.J., Fielding, J.M., & Blashki, T.G. (1973). Depression rating scales: A critical review. Archives of General Psychiatry, 28, 361-366.
Christenfeld, R., Lubin, B., & Satin, M. (1978). Concurrent validity of the Depression Adjective Check List in a normal population. American Journal of Psychiatry, 135, 582-584.
Clarke, D.C., Cavanaugh, S.V., & Gibbons, R.D. (1983). The core symptoms of depression in medical and psychiatric patients. Journal of Nervous and Mental Disease, 171, 705-713.
Clarke, M., & Williams, A.J. (1979). Depression in women after perinatal death. Lancet, 1, 916-917.
Coleman, R.E., & Miller, A.G. (1975). The relationship between depression and marital maladjustment in a clinic population: A multitrait-multimethod study. Journal of Consulting and Clinical Psychology, 43, 647-651.
Coppen, A., Whybrow, P.C., Nuguera, R., Maggs, R., & Prange, A.J. (1972). The comparative antidepressant value of L-tryptophan and imipramine with and without attempted potentiation by liothyronine. Archives of General Psychiatry, 26, 474-478.
Costello, C.G. (1992). Research on symptoms versus research on syndromes: Arguments in favor of allocating more research time to the study of symptoms. British Journal of Psychiatry, 160, 304-308.
Davies, B., Burrows, G., & Poyton, C. (1975). A comparative study of four depression rating scales. Australian and New Zealand Journal of Psychiatry, 9, 21-24.
Delay, J., Pichot, P., Lemperiere, T., & Mirouze, R. (1963). La nosologie des etats depressifs: Rapports entre l'etiologie et la semiologie: 2. Resultats du Questionnaire de Beck [Classification of depressive states: Agreement between etiology and symptomatology: 2. Results of Beck's Questionnaire]. Encephale, 52, 497-505.
Ennis, J., Barnes, R.A., Kennedy, S., & Trachtenberg, D.D. (1989). Depression in self-harm patients. British Journal of Psychiatry, 154, 41-47.
Gallagher, D., Nies, G., & Thompson, L.W. (1982). Reliability of the Beck Depression Inventory with older adults. Journal of Consulting and Clinical Psychology, 50, 152-153.
Giambra, L.M. (1977). Independent dimension of depression: A factor analysis of three self-report depression measures. Journal of Clinical Psychology, 33, 928-935.
Green, W.J., & Statduhat, P.P. (1966). The effects of the ECT on the sleep dream cycle in a psychotic depression. Journal of Nervous and Mental Disease, 143, 123-134.
Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23, 56-62.
Hammen, C.L. (1980). Depression in college students: Beyond the Beck Depression Inventory. Journal of Consulting and Clinical Psychology, 48, 126-128.
Hatzenbuehler, L.C., Parpal, M., & Mathews, L. (1983). Classifying college students as depressed or nondepressed using the Beck Depression Inventory: An empirical analysis. Journal of Consulting and Clinical Psychology, 51, 360-366.
Heimberg, L. (1961). Development and construct validation of an inventory for the measurement of future time perspective. Unpublished master's thesis, Vanderbilt University, Nashville, TN.
Hesselbrock, M.M., Hesselbrock, V.M., Tennen, H., Meyer, R.E., & Workman, K.L. (1983). Methodological considerations in the assessment of depression in alcoholics. Journal of Consulting and Clinical Psychology, 51, 399-405.
Hill, R.D., Gallagher, D., Thompson, L.W., & Ishida, T. (1988). Hopelessness as a measure of suicidal intent in the depressed elderly. Psychology and Aging, 3, 230-232.
Jacobson, N.S., Follette, W.C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods of reporting variability and evaluating


clinical significance. Behavior Therapy, 15, 336-352.
Kovacs, M., Rush, A.J., Beck, A.T., & Hollon, S.D. (1981). Depressed outpatients treated with cognitive therapy or pharmacotherapy: A one-year follow-up. Archives of General Psychiatry, 38, 33-39.
Lipsedge, M.S., & Rees, W.I. (1971). A double-blind comparison of doxepin and amitriptyline for the treatment of depression with anxiety. Psychopharmacologia, 19, 153-162.
Marsella, A.J., Sanborn, K.O., Kameoka, V., Shiguru, L., & Brennan, J. (1975). Cross-validation of self-report measures of depression among normal populations of Japanese, Chinese, and Caucasian ancestry. Journal of Clinical Psychology, 31, 281-287.
May, A.E., Urquart, A., & Tarran, J. (1969). Self-evaluation of depression in various diagnostic and therapeutic groups. Archives of General Psychiatry, 21, 191-194.
Melges, F., & Bowlby, J. (1969). Types of hopelessness in psychopathological process. Archives of General Psychiatry, 20, 690-699.
Mendels, J., Secunda, S.K., & Dyson, W.L. (1972). A controlled study of the antidepressant effects of lithium carbonate. Archives of General Psychiatry, 26, 154-157.
Metcalfe, M., & Goldman, E. (1965). Validation of an inventory for measuring depression. British Journal of Psychiatry, 111, 240-242.
McKinley, J.C., & Hathaway, S.R. (1943). Identification and measurement of psychoneuroses in medical practice; Minnesota Multiphasic Personality Inventory. Journal of the American Medical Association, 122, 161-167.
Minkoff, K., Bergman, E., Beck, A.T., & Beck, R.W. (1973). Hopelessness, depression, and attempted suicide. American Journal of Psychiatry, 130, 455-459.
Moran, P.W., & Lambert, M.J. (1983). A review of current assessment tools for monitoring changes in depression. In M.J. Lambert, E.R. Christensen, & S.S. DeJulio (Eds.), The assessment of psychotherapy outcome. New York: Wiley.
Newson-Smith, J.G.B., & Hirsch, S.R. (1979). Psychiatric symptoms in self-poisoning patients. Psychological Medicine, 9, 493-500.
Reynolds, W.M., & Gould, J.W. (1981). A psychometric investigation of the standard and short form of the Beck Depression Inventory. Journal of Consulting and Clinical Psychology, 49, 306-307.
Rounsaville, B.J., Weissman, M.M., Rosenberger, P.H., Wilber, C.H., & Kleber, H.D. (1979). Detecting depressive disorders in drug abusers. Journal of Affective Disorders, 1, 255-267.
Rush, A.J., Beck, A.T., Kovacs, M., & Hollon, S. (1977). Comparative efficacy of cognitive therapy and pharmacotherapy in the treatment of depressed outpatients. Cognitive Therapy and Research, 1, 17-37.
Salkind, M.R. (1969). Beck Depression Inventory in general practice. Journal of the Royal College of General Practitioners, 18, 267-271.
Schaefer, A., Brown, J., Watson, C.G., Plemel, D., DeMotts, J., Howard, M.T., Petrik, N., Balleweg, B.J., & Anderson, D. (1985). Comparison of the validities of the Beck, Zung, and MMPI depression scales. Journal of Consulting and Clinical Psychology, 53, 415-418.
Schmale, A.H. (1958). Relationship of separation and depression to disease: A report on a hospitalized medical population. Psychosomatic Medicine, 20, 259-277.
Schnurr, R., Hoaken, P.C.S., & Jarrett, F.J. (1976). Comparison of depression inventories in a clinical population. Canadian Psychiatric Association Journal, 21, 473-476.
Schwab, J.J., Bialow, M., Brown, J.M., & Holzer, C.E. (1967). Diagnosing depression in medical inpatients. Annals of Internal Medicine, 67, 695-707.
Scogin, F.R., & Merbaum, M. (1983). Humorous stimuli and depression: An examination of Beck's premise. Journal of Clinical Psychology, 39, 165-169.
Scott, N.A., Hannum, T.E., & Christ, S.L. (1982). Assessment of depression among incarcerated females. Journal of Personality Assessment, 46, 372-379.
Seitz, F.C. (1970). Five psychological measures of neurotic depression: A correlation study. Journal of Clinical Psychology, 26, 504-505.
Steer, R.A., McElroy, M.G., & Beck, A.T. (1982). Structure of depression in alcoholic men: A partial replication. Psychological Reports, 50, 723-728.
Stotland, E. (1969). The psychology of hope. San Francisco, CA: Jossey-Bass.


Strober, M., Green, J., & Carlson, G. (1981). Utility of the Beck Depression Inventory with psychiatrically hospitalized adolescents. Journal of Consulting and Clinical Psychology, 49, 482-483.
Tanaka-Matsumi, J., & Kameoka, V.A. (1986). Reliabilities and concurrent validities of popular self-report measures of depression, anxiety, and social desirability. Journal of Consulting and Clinical Psychology, 54, 328-333.
Turner, J.A., & Romano, J.M. (1984). Self-report screening measures for depression in chronic pain patients. Journal of Clinical Psychology, 40, 909-913.
van Praag, H., & Plutchik, R. (1985). An empirical study on the "cathartic effect" of attempted suicide. Psychiatry Research, 16(2), 123-130.
van Praag, H., & Plutchik, R. (1987). Interconvertability of five self-report measures of depression. Psychiatry Research, 22(3), 243-256.
Vredenburg, K., Krames, I., & Flett, G.L. (1985). Reexamining the Beck Depression Inventory: The long and short of it. Psychological Reports, 36, 767-778.
Weissman, A., Beck, A.T., & Kovacs, M. (1979). Drug abuse, hopelessness, and suicidal behavior. International Journal of the Addictions, 14, 451-464.
West, E.D. (1981). Electric convulsion therapy in depression: A double-blind controlled trial. British Medical Journal, 282, 355-357.
Zuckerman, M., & Lubin, B. (1965). Manual for the Multiple Affect Adjective Checklist. San Diego, CA: Educational and Industrial Testing Service.
Zung, W.W.K. (1965). A self-rating depression scale. Archives of General Psychiatry, 12, 63-70.


For Abby, Katie, and Shelby


Chapter 30
Hamilton Depression Inventory

Kenneth A. Kobak
Dean Foundation

William M. Reynolds
University of British Columbia

The Hamilton Depression Inventory (Reynolds & Kobak, 1995a, 1995b) was developed as a self-report, paper-and-pencil and computer-administered measure that emulates and extends the Hamilton Depression Rating Scale (HAMD; Hamilton, 1960, 1967) clinical interview. The HAMD was one of the first symptom rating scales developed to quantify the severity of depressive symptomatology. First introduced by Max Hamilton in 1960, it has since become the most widely used and accepted outcome measure for the evaluation of depression. It was included in the National Institute of Mental Health's Early Clinical Drug Evaluation program (ECDEU) assessment manual (Guy, 1976) in an attempt to provide a uniform battery of assessments for use in psychotropic drug evaluation. As a result, it has become the standard outcome measure used in clinical trials of new antidepressant medications presented to the Food and Drug Administration (FDA) by pharmaceutical companies for approval of new drug applications. It was also the primary outcome measure in the National Institute of Mental Health (NIMH) collaborative studies comparing pharmacotherapy with psychotherapy for the treatment of depression (Elkin et al., 1989), and is often the standard against which other depression rating scales are validated (Carroll, Feinberg, Smouse, Rawson, & Greden, 1981; Montgomery & Asberg, 1979; Reynolds & Kobak, 1998). The scale has been translated into European (Fava, Kellner, Munari, & Pavan, 1982; Hamilton, 1986; Ramos-Brieva & Cordero-Villafafila, 1988) and Asian (Kim, 1977) languages.

The psychometric properties of the scale have been well documented (see Hedlund & Vieweg, 1979, for a review), and a meta-analysis found the scale more sensitive in measuring change due to treatment (both drug and psychotherapy) than several self-rated scales (Edwards et al., 1984; Lambert, Hatch, Kingston, & Edwards, 1986). Hamilton (1960, 1967) reported that the scale was not intended for use as a diagnostic instrument, but was designed for measuring symptom severity in patients who have already been diagnosed as depressed. As with all symptom rating scales, not all of the inclusion and exclusion criteria required for a diagnosis are necessarily evaluated. Nonetheless, using various cutoff scores, the scale has shown good properties as a "screener"; that is, it demonstrated high levels of sensitivity and specificity in identifying those likely to


have a diagnosis of major depression and thus to warrant further evaluation (Reynolds, Kobak, & Greist, 1992a). The sensitivity and specificity of the HAMD in distinguishing between diagnostic groups have also been examined (Reynolds et al., 1992a; Riskind, Beck, Brown, & Steer, 1987; Thase, Hersen, Bellack, Himmelhoch, & Kupfer, 1983). Others have incorporated the HAMD into, or have extracted HAMD scores from, structured diagnostic interviews such as the Diagnostic Interview Schedule (DIS; Whisman et al., 1989) and the Schedule for Affective Disorders and Schizophrenia (SADS; Endicott, Cohen, Nee, Fleiss, & Sarantakos, 1981).

The HAMD was designed to be administered by a trained clinician using a semistructured clinical interview. The standard version of the HAMD consists of 17 "items," each of which is rated on either a 5-point (0-4) or a 3-point (0-2) scale, the latter being used in cases where quantification was felt to be either difficult or impossible (see Table 30.1). In general, on the 5-point scale, a rating of 0 = absent, 1 = doubtful to mild, 2 = mild to moderate, 3 = moderate to severe, and 4 = very severe. A rating of 4 is usually reserved for extreme symptoms. On the 3-point scale, a rating of 0 = absent, 1 = probable or mild, and 2 = definite.

TABLE 30.1
HDI Item Congruence with 17-item Clinician HAMD and with DSM-IV Diagnostic Criteria, Melancholic Subtype, and Associated Features of Major Depressive Disorder

DSM-IV Diagnostic Criteria          HDI Question              Clinician HAMD Question
1. Depressed Mood                   1a, 1b, 1c, 1d            1
2. Loss of Interest/Pleasure        7a, 14                    7, 14
3. Weight/Appetite Loss             12, 17a, 17b, 17c         16
4a. Insomnia: Initial, Mid, Late    4a, 4b, 5a, 5b, 6a, 6b    4, 5, 6
4b. Hypersomnia                     18                        Not assessed
5a. Psychomotor Retardation         8                         8
5b. Psychomotor Agitation           9                         9
6. Fatigue, Loss of Energy          13a                       13
7a. Guilt/Worthlessness             2                         2
7b. Worthlessness                   21                        Not assessed
8. Indecisiveness                   23                        Not assessed
9. Suicidal Ideation/Behavior       3                         3

DSM-IV Melancholic Features         HDI Question              Clinician HAMD Question
1. Distinct depressed mood          1a, 1b                    1
2. Lack of Mood Reactivity          1c                        Not assessed
3. Diurnal Mood Variation           1e                        Not assessed
4. Excessive Guilt                  2                         2
5. Early Morning Awakening          6a, 6b                    6
6. Pervasive Loss of Interest       7a                        7
7. Marked Psychomotor Retardation   8                         8
8. Marked Psychomotor Agitation     9                         9
9. Significant Weight Loss          17a, 17b                  16

DSM-IV Associated Features          HDI Question              Clinician HAMD Question
1. Work Impairment                  7a                        7
2. Psychic Anxiety                  10a, 10b                  10
3. Somatic Anxiety                  11a, 11b, 11c, 11d        11
4. Somatic, General                 13b                       13
5. Hypochondriasis                  15a, 15b                  15
6. Helplessness                     19                        Not assessed
7. Hopelessness                     22                        1

The scale originally contained items for diurnal variation,


depersonalization/derealization, paranoia, and obsessional symptoms. Hamilton later dropped these items, as they did not adequately measure depression severity or occurred too infrequently to add to the scale's utility (Hamilton, 1974). Several of the items consist of a constellation of individual "symptoms," all of which should be considered in determining the item's final rating. For example, for Item 1, depressed mood, Hamilton suggested evaluating feelings of sadness, as well as pessimism, hopelessness, and a tendency to cry, as a way to gauge depressed mood. Both the frequency and severity of each symptom should be considered in rating each item, and ratings should be based on a change from the patient's "usual self" due to depression (e.g., if a person never had much interest in sex, current low interest would not be counted as present unless it is even lower than the person's usual baseline). The time frame on which the ratings are based is typically the past week, in order to assess current severity apart from temporary or minor fluctuations.

Hamilton provided only general guidelines as to the administration and scoring of the scale. No standardized probe questions were provided to elicit information from patients, and no behaviorally specific guidelines were developed for determining each item's rating. In order to improve the interrater reliability of the scale, several sets of guidelines have been developed by a diverse group of researchers (Bech, Kastrup, & Rafaelsen, 1986; Kobak, Schaettle, Katzelnick, & Simon, 1995; Miller, Bishop, Norman, & Maddever, 1985; Potts, Daniels, Burnham, & Wells, 1990; Whisman et al., 1989; Williams, 1988). As a result, raters trained at the same site using the same set of guidelines have achieved adequate interrater reliability. However, interrater reliability between raters at different sites has been difficult to achieve due to the use of different sets of guidelines and differences in clinical training and experience (Hooijer et al., 1991). In addition, training raters is a time-consuming process. Even when raters undergo extensive reliability training, adequate interrater reliability is often not obtained. For example, in a recent reliability training for a multisite clinical trial (Demitrack, Faries, DeBrota, & Potter, 1997), the difference between the maximum and minimum total HAMD scores (full-scale range 0 to 52) obtained from 86 raters at 32 sites evaluating the same patients on videotape ranged from 14 points in the best case to 21 points in the worst case. This occurred in spite of training that included an overview of interview methodology, a detailed review of anchor points, a review of the customized interview guide, and review and discussion of videotaped interviews. Even with fully structured interviews, problems with interrater reliability have been found. In one study, clinicians failed to ask up to 5% of the required questions (Fairbairn, Wood, & Fletcher, 1959), and in another, incorrect diagnoses were made on as many as 10% of patients because of clinician branching errors and misapplication of the scoring algorithm (Kobak et al., 1997).

To address these and other issues, a self-report version of the HAMD, called the Hamilton Depression Inventory (HDI; Reynolds & Kobak, 1995b), was developed. The HDI is designed to emulate the clinician-administered version and provide parallel scores and item content. The HDI is available in two formats: paper-and-pencil and desktop computer (PC) administration. The items in the paper-and-pencil and desktop computer-administered versions are identical. As is described later, the HDI also has been enhanced to reflect current diagnostic criteria and contemporary descriptions of depression. A score summary report, as well as a more detailed clinical interpretive report, is provided by the HDI software available from the publisher. Two versions of the HDI are available: the full-scale version, consisting of 23 items that evaluate symptoms of depression according to current diagnostic criteria and that also allows for the derivation


of a 17-item version that corresponds to the original HAMD described by Hamilton; and a 9-item "HDI Short Form" useful for screening purposes or where time or other constraints preclude the use of the longer version. Details of these and other features, the scale's use in treatment planning and monitoring treatment outcome, and a review of its psychometric data are presented next.

Overview of the Hamilton Depression Inventory

Goals of Development

There were four primary goals in developing the HDI. The first was to develop a self-report paper-and-pencil and computer-administered version of the HAMD that was consistent with (i.e., equivalent to) the clinician-administered version. The goal was to have both the item content and the scores obtained parallel those obtained with the clinician-administered version. One psychometric issue when adapting a clinician-administered test to another form of administration (e.g., computer or paper-and-pencil) is that factors associated with the mode of administration may result in the inability to generalize normative and validity data from the original test to the alternative version (Hofer & Green, 1985). The goal was to demonstrate the equivalence of the HDI with the clinician version from the perspective of cutoff scores, rank-order data, mean scores, variances, and item homogeneity.

Second, the HDI was to be updated to make it consistent with current definitions of depression. The original HAMD was developed 38 years ago, prior to the development of modern diagnostic criteria. Thus, the HDI was expanded to parallel current symptoms associated with major depression as outlined in the current Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; American Psychiatric Association, 1994).

Third, the scale was to reflect a more complete and more precise evaluation of depressive symptomatology than is typically obtained in paper-and-pencil self-report measures. In this way, the HDI differs from other self-report measures of depression, such as the Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961) and the Self-Rating Depression Scale (SDS; Zung, 1965), in that several questions (as opposed to just one) are asked in rating each item. For example, as previously discussed, in rating Item 1 (depressed mood), feelings of sadness, crying, and pessimism are evaluated, and both the frequency and severity of the symptom are assessed. Answers to all these questions are then processed by an empirically derived scoring algorithm to obtain the final score for that item. In this way, the HDI obtains a more thorough evaluation of each item and is more similar to how a clinician would probe, using branching logic as appropriate. For example, if the frequency of a symptom is reported as "not at all," then the interview skips over the severity probe for that symptom.

Fourth, an aim was to increase both the reliability and the clinical utility of the scale. The former was attained by establishing a consistent set of questions for obtaining information and a consistent and empirically based scoring algorithm for determining each item's rating. This ensures that the entire domain of each symptom is covered with all persons and that the basis for determining scores does not vary from rater to rater. The clinical utility of the scale was increased by the self-report format, which overcomes the problem of training interviewers, as well as the cost and effort of clinician time in


administering and scoring the scale; and by the identification of a subset of items to constitute a short form, which allows for use in clinical or research settings where time or other issues preclude the use of the full scale. The computer-administered version offers several additional advantages, such as the elimination of data entry and scoring, immediate availability of results, and easier utilization of branching logic. In addition, patients are often more honest with the computer or prefer it when disclosing information of a sensitive nature, such as alcohol and drug use (Lucas, Mullin, Luna, & McInroy, 1977; Skinner & Allen, 1983), suicidal ideation (Greist et al., 1973; Petrie & Abell, 1994), sexual functioning (Greist & Klein, 1980), and social anxiety (Katzelnick et al., 1995; Kobak et al., 1997).

Initial Development and Psychometric Data: Computer Version

The items for the preliminary form of the HDI were written to parallel the 17 item domains of the clinician HAMD. Originally, a total of 52 questions were developed to evaluate these 17 items; 10 of the 17 items utilize more than one question in determining the score for that item. These items were then pilot tested in a sample of 61 depressed outpatients via desktop computer (Kobak, Reynolds, Rosenfeld, & Greist, 1990). Items with a correlation of less than .50 with the corresponding item on the clinician version were revised or dropped, resulting in a final set of 37 questions that evaluated the 17 item domains specified by Hamilton.

The initial investigation was designed to evaluate the viability of a self-report version of the HAMD. Of particular interest was the determination of equivalence between the computer form and the clinician-administered HAMD. After the pilot study, the 17-item computer version was evaluated in a sample of 97 adults, who were given both the computer-administered HDI and the clinician-administered HAMD in a counterbalanced order (Kobak et al., 1990). The sample consisted of patients with a Research Diagnostic Criteria (RDC; Spitzer, Endicott, & Robins, 1978) diagnosis of major depression (n = 52), patients with minor depression (n = 20), and community controls (n = 25). Diagnoses were made using the Schedule for Affective Disorders and Schizophrenia (SADS; Endicott & Spitzer, 1978).

The desktop computer version begins by asking patients for their name and demographic data, and by giving some brief instructions on the computer interview. To answer each question, patients press a numbered key from 0 to 4 on the computer keyboard. The program has built-in error checking so that numbers out of range cannot be entered. Patients can change their answer or return to a previous question with different keystrokes. After completing the interview, patients are instructed to return to clinicians and inform them that they are finished. Numeric and graphical summaries of present and previous interviews are available immediately.

The clinician-administered HAMD was administered using training procedures and administration and scoring guidelines developed by William M. Reynolds. These guidelines have been used in clinical research for the past 12 years, and include scoring items to the half-point for greater reliability and accuracy.

The internal consistency reliability (coefficient alpha) was .91 for the computer version, and the median item-to-total scale correlation was .62, suggesting a high degree of internal consistency reliability. The correlation between the computer and clinician HAMD was high, r(95) = .96, p < .0001, suggesting a high degree of criterion-related validity. The mean score difference between the computer and clinician HAMD was nonsignificant (.10 of a


point), t(96) = .41, p=ns. The computer version correlated highly with the Beck Depression Inventory (BDI; Beck et al., 1961; r=.93), providing evidence of convergent validity. The computer-administered version was successful in discriminating between patients with major and minor depression, as well as discriminating patients with either major or minor depression from nonpsychiatric controls, F(2, 94) = 214.62, p < .0001. The clinical sensitivity of the computer HAMD was also examined using cutoff scores to differentiate patients with major depression from nonpsychiatric controls. Using a cutoff score of 17, the specificity (true negatives) of the computer HAMD was 100%, and the sensitivity was 94%. No order effects were found. Given the positive results of this study, the Hamilton Depression Inventory (Reynolds & Kobak, 1995a) was created by revising the original set of items and expanding the items to evaluate additional symptoms of depression specified by current definitions of depression. This measure was also developed to be administered in both paper-and-pencil and computer formats to provide scores on the expanded form that conforms to DSM-IV symptoms of Major Depression, as well as a score based on the 17 items corresponding to those on the clinician HAMD. In addition, a short form of the HDI was developed for use in screening, as well as a melancholia subscale that provides additional clinical information. Toward these efforts, the psychometric properties of these scales and subscales were evaluated. These research findings are summarized here. HDI Full Scale The 17 items described by Hamilton covered many of the DSM-IV symptoms associated features of major depressive episode (American Psychiatric Association, 1994, pp. 320-323, 345-347). 
However, the original 17 items did not cover all the DSM-IV symptoms and features of Major Depressive Episode and Dysthymia, having no specific items for indecisiveness, worthlessness, detachment, helplessness, hopelessness, or social isolation, or for the atypical symptoms of hypersomnia or increased appetite. In addition, the original 17 items did not provide information on DSM-IV subtypes, such as Major Depression with Melancholia, which is useful information for treatment planning. Thus, the HDI was developed to include items evaluating these domains. To evaluate these new domains, eight additional items were developed, resulting in a 25-item measure. Two of these new items, social isolation (an associated feature in DSM-IV) and weight gain, were dropped after statistical analyses found low item-to-total scale correlation coefficients (r < .20), low factor loadings in an exploratory factor analysis, and a low rate of endorsement. Thus, the final version of the HDI consisted of 23 items that are evaluated with 38 questions. The number of questions used in determining each item's rating varies, as do the content and number of response options for each question. Answers to individual questions are processed by an empirically derived scoring algorithm to yield scores for each of the 23 items. The scoring algorithm uses weighted means of the questions to derive each item's final score. The final score for each item is consistent with the range for that particular item described by Hamilton, that is, either 0-4 or 0-2. Table 30.1 contains a list of the DSM-IV symptoms of major depressive disorder and melancholic features and the corresponding HDI items evaluating these domains. This expanded version allows for a more comprehensive evaluation of depression consistent with modern diagnostic criteria while retaining the original 17 HAMD items.
The HDI requires a fifth-grade reading level, and takes about 10 minutes to complete.
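The idea of deriving an item score from a weighted mean of its questions can be illustrated with a minimal Python sketch. The actual HDI weights are empirically derived and are not given here, so the function name `item_score`, the example weights, and the choice to clamp the result to the item's range are all hypothetical and serve only to show the arithmetic.

```python
from typing import Sequence


def item_score(responses: Sequence[float], weights: Sequence[float], item_max: int) -> float:
    """Weighted mean of question responses, kept within the item's 0..item_max range.

    The weights are placeholders; the published scoring algorithm is not reproduced here.
    """
    if len(responses) != len(weights) or not responses:
        raise ValueError("responses and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_mean = sum(r * w for r, w in zip(responses, weights)) / total_weight
    # Clamp so the result stays inside the range Hamilton described for the item.
    return min(max(weighted_mean, 0.0), float(item_max))


# Hypothetical item rated 0-4 from two questions (e.g., frequency and severity),
# with made-up weights of 1 and 2.
print(round(item_score(responses=[3, 4], weights=[1, 2], item_max=4), 2))
```

Because item scores built this way need not be whole numbers, a total formed by summing them need not be an integer either.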


HDI-17

The items in the 17-item HDI scale are contained within the full scale, 23-item version; the 17-item score is thus obtained by calculating a subscore rather than through a separate administration. The 17-item version was retained as a separate subscale with separate validation data in order to make more accurate comparisons to the 17-item clinician HAMD, and to provide HDI scale scores consistent with and parallel to the 17-item clinician HAMD.

HDI-Melancholia Subscale (HDI-Mel)

As previously described, the HDI-Mel subscale consists of nine items that are part of the full scale HDI (Table 30.1). The melancholia scale evaluates the presence of melancholic features as defined in DSM-IV (American Psychiatric Association, 1994). Several investigators have examined subscales derived from the HAMD that are indicative of a melancholic subtype (Bech et al., 1986; Kovacs, Rush, Beck, & Hollon, 1981; Thase et al., 1983; Zimmerman, Coryell, Pfohl, & Stangl, 1986). With the advent of DSM-IV in 1994, these features and their significance have been further defined. The clinical implications of melancholic features are described in DSM-IV, and include an increased likelihood of responding to antidepressant medication, a decreased likelihood of having a premorbid personality disorder or a clear precipitant of the current depressive episode, and a decreased likelihood of responding to placebo medication. Melancholic features are more often associated with older patients, are more common in inpatients than in outpatients, and are frequently associated with several biological markers, such as dexamethasone nonsuppression, hyperadrenocorticism, reduced rapid eye movement latency, and abnormal tyramine challenge and dichotic listening tests (American Psychiatric Association, 1994, p. 384).
HDI Short Form

Whereas the full scale HDI offers a comprehensive evaluation of depressive symptomatology, clinical and research settings are often limited by time and other constraints, making use of the full scale HDI unfeasible. The HDI short form (HDI-SF) provides a valid and reliable depression screener for these situations. Several writers have performed factor analytic and logistic studies of the 17-item HAMD in order to identify a subset of items that provide a global measure of depression severity that is useful as a "unidimensional" index of overall severity (Bech et al., 1975, 1981; Bech, Allerup, Reisby, & Gram, 1984; Gibbons, Clark, & Kupfer, 1993; Maier, Philipp, & Gerken, 1985; Riskind et al., 1987). The criteria for selection of items for the HDI short form included high item-to-total scale correlation coefficients and the ability to distinguish between persons with major depression, other psychiatric disorders, and nonpsychiatric community controls. Based on a preliminary analysis, nine items were chosen: HDI items 1 (depressed mood), 3 (suicide), 7 (loss of interest/work impairment), 10 (psychic anxiety), 13 (fatigue/somatic-general), 19 (helplessness), 21 (worthlessness), 22 (hopelessness), and 23 (indecisiveness). An analysis of variance found these items had the highest F values in distinguishing between the three diagnostic categories, all F > 200.00, p < .0001. A discriminant function analysis between persons with major depression and community controls found eight of these items demonstrated Wilks's lambda values of .200 to .466 with F(1, 256) values ranging from 1024.8 to 293.8. Although the suicide item was somewhat less discriminating between groups, Wilks's lambda = .594, F(1, 256) = 174.3, it was included on the short form due to its clinical significance. All nine items showed high item-to-total correlation coefficients with the full scale HDI, ranging from .63 to .87. The HDI-SF items also show considerable overlap with the short-form items identified on the clinician HAMD by other researchers (Bech et al., 1975, 1981, 1984; Gibbons et al., 1993; Maier et al., 1985; Riskind et al., 1987).

Psychometric Properties of the HDI

Normative Data

Normative (standardization) data for the HDI were based on a nonreferred community sample of 510 adults (235 male and 271 female) between 18 and 89 years of age (18-24 = 16%; 25-39 = 38%; 40-64 = 34%; and over 65 = 12%). The sample was 84.6% White, 5.3% African American, 4.5% Asian, 4.5% Hispanic, and 1.1% of other ethnicity. Raw scores were linearly transformed into standard scores (i.e., T-scores) and percentile ranks. Normative data for the full scale HDI, as well as the 17-item, 9-item short form, and melancholia subscale, are available in the user manual for the entire standardization sample as well as separately for males and females (Reynolds & Kobak, 1995b). It was noted, however, that the absolute value of the HDI score is more meaningful than a normative comparison, and normative data should not be used as the sole basis for score interpretation. No significant differences were found in full scale HDI scores in comparisons involving age, F = 1.87, p > .05, or ethnicity, F = 1.65, p > .10. A small but statistically significant difference was found for gender, with females scoring 1.18 points higher, t = 2.37, p < .05. Similar results were found for the 17-item and 9-item short form.
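The linear transformation of raw scores into T-scores can be sketched as follows. This is a minimal illustration of the conventional T-score definition (mean 50, standard deviation 10 in the normative group), not the manual's norm tables; the function name `t_score` is hypothetical, and the mean and standard deviation used in the example are the community-sample values for the full scale HDI reported in Table 30.5.

```python
def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Linear T-score: 50 plus 10 times the z-score in the normative sample."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


# Community-sample mean and SD for the full scale HDI (Table 30.5).
print(round(t_score(19, norm_mean=7.29, norm_sd=5.64)))  # 71
```

Under these assumptions a raw score of 19 maps to a T-score of about 71, which agrees with the value given for the full scale cutoff in the section on interpretive strategy.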
A useful strategy for interpretive purposes is the examination of cutoff scores denoting clinically significant levels of depressive symptomatology. Cutoff scores for the full scale HDI, as well as the 17-item, 9-item short form, and melancholia subscale, were empirically derived from samples of normal community adults and psychiatric outpatients with major depression. As cutoff scores were chosen that maximized sensitivity (i.e., minimized false negatives), normative data may be used in conjunction with cutoff scores in score interpretation. Use of cutoff scores is described further in the section on interpretive strategies.

Reliability and Validity Data

Data on the reliability and validity of the HDI were based on a sample of 921 adults (396 males and 521 females), including a nonreferred community sample (n = 510), psychiatric outpatients (n = 313), and college students (n = 98). Participants were between 18 and 89 years of age (M = 38.28, SD = 15.38) and were distributed across a wide range of age groups (18-24 = 23%; 25-39 = 36%; 40-64 = 33%; and over 65 = 9%). The sample was 83.8% White, 4.6% African American, 6.9% Asian, 3.1% Hispanic, and 1.6% of other ethnicity. This sample is representative of a cross-section of individuals likely to be evaluated with the HDI.

Reliability. The reliability of the HDI was examined from several perspectives, including internal consistency (Cronbach's, 1951, coefficient alpha), test-retest, item homogeneity in the form of item-total score correlation coefficients, and estimates of the standard error of measurement. Coefficient alpha (roughly equivalent to the mean of all possible split-half coefficients) was chosen as the most appropriate measure of internal consistency, as the item content of the HDI is not necessarily randomly distributed. A summary of the reliability information for the entire development sample on all three forms of the HDI and the melancholia subscale is presented in Table 30.2. As shown in Table 30.2, internal consistency is high (over .90) for all forms. This is especially noteworthy for the 9-item short form, in that the lower number of items would typically tend to reduce the reliability coefficient. The internal consistency of the HDI was also examined separately for the psychiatric outpatient sample. Similar results were found, with a coefficient alpha of .89 for the full scale HDI. Overall, the internal consistency results support the item homogeneity of the HDI. Further support for the item homogeneity of the HDI is provided by the high median item-to-total correlation coefficients and mean interitem correlation coefficients found on all forms (Table 30.2). An examination of individual items in the whole development sample found high correlation coefficients for 20 of the 23 items (i.e., between .43 and .84), moderate correlation coefficients for two items (insight and hypersomnia at .34 and .39, respectively), and a low correlation for one item (weight loss, .26). The low correlation for weight loss can be partially explained by the low rate of endorsement for this item, as well as its low mean score (.18).
The latter resulted in restricted variance for the item, thus attenuating the correlation. Rehm and O'Hara (1985) found similar results on the clinician HAMD. An examination of individual items on the HDI short form found high item-to-total correlation coefficients for all the items (range = .59 to .85). This high degree of item homogeneity suggests the short form possesses psychometric characteristics similar to the 17- and 23-item versions. Overall, the results support the item homogeneity of the HDI, indicating it is a reliable measure of a relatively homogeneous construct of depression. This supports the use of a total HDI score as a reliable indicator of depression severity.

TABLE 30.2
HDI Reliability Estimates and Standard Error of Measurement (SEM) for the Total Development Sample (n = 921) and for Psychiatric Outpatients Only (n = 313)

Form      Sample                     rα(a)   rii(b)   Mdn rit(c)   Range rit   SEM
HDI-23    Total sample               .931    .358     .57          .26-.84     2.81
          Psychiatric outpatients    .890    .250     .49          .21-.79     3.47
HDI-17    Total sample               .897    .328     .53          .23-.82     2.41
          Psychiatric outpatients    .850    .246     .40          .28-.73     2.90
HDI-SF    Total sample               .924    .578     .74          .59-.85     1.66
          Psychiatric outpatients    .880    .448     .64          .41-.79     2.07
HDI-Mel   Total sample               .818    .314     .51          .22-.79     1.66
          Psychiatric outpatients    .755    .234     .45          .03-.71     2.32

Note. Reproduced by special permission of the Publisher, Psychological Assessment Resources, Inc., 16204 North Florida Avenue, Lutz, Florida 33549, from the Hamilton Depression Inventory Professional Manual by William M. Reynolds, Ph.D., and Kenneth A. Kobak, Ph.D. Copyright © 1995 by PAR, Inc. Further reproduction is prohibited without permission of PAR, Inc.
(a) rα = coefficient alpha reliability. (b) rii = mean interitem correlation. (c) Mdn rit = median item-total scale correlation.

In addition to internal consistency reliability, test-retest reliability was also examined in a subsample of 189 participants. This subsample included both community (n = 110) and psychiatric (n = 79) participants, and had demographic characteristics that were roughly similar to the development sample. The mean retest interval was 6.2 days (range 2 to 9 days, mode 7 days). All retesting was done prior to any treatment intervention. A test-retest reliability coefficient of .954 was found for both the full scale and 17-item HDI, .930 for the HDI short form, and .926 for the melancholia subscale. The results indicate a high degree of rank-order stability on all forms of the HDI. Mean score changes were small (1.14, .83, .61, and .51 for full scale, 17-item, 9-item, and melancholia HDI scales, respectively), but significant (t = 4.77, 4.69, 3.77, and 3.75, respectively; p < .001 for all comparisons). Although statistically significant due to the large sample size, these changes were not clinically significant (roughly a tenth of a standard deviation). Given the potential for random fluctuation when evaluating a state (vs. a trait) construct such as depression, the results are particularly strong. These findings support the use of the HDI as an outcome measure, as changes in scores associated with nonintervention factors (i.e., error variance) were minimal. Overall, the results support the reliability of the HDI. As reliability (i.e., the stability or consistency of a test measure) is a precondition for validity (i.e., how well a test actually measures what it purports to measure), the results provide a strong foundation for the examination of the validity of the HDI.

Validity.
According to the Standards for Educational and Psychological Testing (American Educational Research Association, 1985), validity refers to "the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores", and "test validation is the process of accumulating evidence to support such inferences" (p. 9). The validity of the HDI was examined from a number of perspectives, including content validity; criterion (i.e., concurrent) validity; construct validity, in the form of convergent and discriminant validity; and clinical validity, in the form of HDI scores differentiating between contrasted groups and in the sensitivity and specificity of HDI cutoff scores. Given the many research investigations documenting the validity of the clinician HAMD, demonstrating the equivalence of the HDI to the clinician version provides additional validation support. Content validity refers to the extent to which a test adequately represents or samples the domain it purports to measure. The current standard for the classification of depression is the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; American Psychiatric Association, 1994). The DSM-IV attempts to define and describe the symptoms of depression from an empirical basis while remaining neutral regarding etiology and atheoretical in nature (American Psychiatric Association, 1994, p. xviii). As shown in Table 30.1, the HDI evaluates the main symptoms of depression as defined by DSM-IV, as well as many of the associated features described in the manual. Content validity may also be inferred from item-to-total scale correlation coefficients. Although such a coefficient does not measure whether the entire domain of the construct of depression is evaluated, it does indicate how well each item covaries with the sum of the remaining items. Items that are a significant part of the construct they are measuring should covary with the overall score for that construct.
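The statistics discussed in this part of the chapter (coefficient alpha, the correlation of each item with the sum of the remaining items, and the standard error of measurement) follow standard textbook formulas, which the Python sketch below illustrates. It is not the authors' analysis code: the function names are hypothetical, the response matrix is made up, and the SEM is computed with the common formula SD x sqrt(1 - reliability), which is assumed here rather than stated in the text.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(rows: List[Sequence[float]]) -> float:
    """Coefficient alpha for a respondents-by-items matrix of item scores."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def corrected_item_total(rows: List[Sequence[float]], item: int) -> float:
    """Correlation of one item with the sum of the remaining items."""
    item_scores = [row[item] for row in rows]
    rest_scores = [sum(row) - row[item] for row in rows]
    return pearson_r(item_scores, rest_scores)


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


# Made-up item scores for five respondents on four items (illustration only).
demo = [[0, 1, 0, 1], [1, 1, 2, 1], [2, 3, 2, 2], [3, 2, 4, 3], [4, 4, 3, 4]]
print(round(cronbach_alpha(demo), 3))
print([round(corrected_item_total(demo, i), 3) for i in range(4)])
```

Correlating each item with the sum of the remaining items, rather than with the full total, keeps an item's own variance from inflating its coefficient.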
Given the high internal consistency of the HDI, each of the items thus contributes in a meaningful way to the total score. From this perspective, item-to-total scale correlation coefficients provide additional statistical support of content validity. As described in the Standards for Educational and Psychological Testing (American Educational Research Association, 1985), types of validity are not mutually exclusive, but tend to overlap conceptually. Thus, further evidence of content validity is also provided in the following sections on contrasted group validity and the ability of HDI items to differentiate between persons with major depression, other psychiatric disorders, and nonreferred community adults.

Criterion-related validity refers to how well a scale's score predicts performance on an outside measure of the same construct. Concurrent validity, a type of criterion validity, refers to how well the scale predicts scores on a related measure given at the same time (i.e., concurrently). For test validation purposes, there is typically a "gold standard" measure against which the new scale is compared, one that is widely used and accepted, and whose psychometric properties have been well established. Given that the clinician HAMD is one of the standards for the evaluation of depressive symptomatology and that the purpose was to demonstrate the equivalence between the self-report HDI and the clinician HAMD, the clinician HAMD was chosen as the criterion for the criterion-related validity studies that follow. To examine the criterion-related concurrent validity of the HDI, a subsample of 403 adults (male = 174, female = 229) was given both the HDI and the clinician-administered HAMD in a counterbalanced order in a single session. Participants ranged in age from 18 to 89 years (M = 38.43, SD = 13.04) and were from diverse ethnic backgrounds (Caucasian = 86.4%; African American = 7.1%; Asian = 2.3%; Hispanic = 2.8%; other ethnicity = 1.4%).
Participants had a DSM-III-R diagnosis of major depression (n = 135), other psychiatric disorder (n = 151), or were community controls with no current psychopathology (n = 117). Diagnoses were confirmed using the Structured Clinical Interview for DSM-III-R (SCID; Spitzer, Williams, Gibbon, & First, 1988) modified to assess for current psychopathology only, except for those disorders that carry a lifetime diagnosis (e.g., bipolar disorder). Both the SCID and clinician HAMD interviews were conducted by one of nine research coordinators who received extensive training on the administration and scoring of the interviews, or by the first author, who conducted the majority (58%) of the interviews. Interviewers were blind to participants' HDI scores. Participants were also given a number of other self-report measures to examine convergent and discriminant validity, which are discussed in a later section. The correlation coefficients between the 17-item clinician HAMD and all versions of the HDI were very high (.941, .945, .910, and .912 for the full scale, 17-item, 9-item short form, and HDI-Mel, respectively; all p < .001), providing strong evidence for the criterion-related validity of the HDI. Of particular interest is the high validity coefficient for the 9-item short form, providing support for its utility for screening and research purposes. Overall, the results indicate that the HDI and HAMD share a high degree of score variance and provide supportive evidence for criterion-related validity of the HDI. In addition to the correlation between scores on the HDI and HAMD, the mean score difference between the two instruments was examined. As the HDI was developed to provide scores that parallel the clinician HAMD, this examination is of particular importance.
The mean score obtained on the entire sample on the 17-item HAMD (M = 12.83, SD = 8.60) differed by only .33 of a point from the mean score obtained with the 17-item HDI (M = 13.16, SD = 8.75), t(402) = 2.28, p < .05. Again, although the comparison was statistically significant due to the large sample size associated with 402 degrees of freedom, the magnitude of the effect was small, and the difference was clinically insignificant. The variances associated with each of the measures also were similar.

The construct validity of the HDI was examined from the perspective of convergent and discriminant validity, and the diagnostic efficacy of HDI cutoff scores. Convergent validity was established by examining the relation between the HDI and scores on other measures of the same construct (i.e., depression), as well as related constructs (i.e., suicidal ideation, hopelessness, self-esteem, anxiety). High correlation coefficients would be expected with scales measuring the same construct, and moderate correlation coefficients with similar constructs. As noted, the criterion validity data presented earlier comparing the HDI to the HAMD (the current "gold standard") also provide evidence for construct validity. As further evidence of construct validity, the relation between HDI scores and scores on the Beck Depression Inventory (BDI; Beck et al., 1961) was examined, as was the relation between scores on the HDI and scores on scales measuring related constructs. These included the Beck Hopelessness Scale (BHS; Beck, Weissman, Lester, & Trexler, 1974), Adult Suicidal Ideation Questionnaire (ASIQ; Reynolds, 1991), Beck Anxiety Inventory (BAI; Beck, Epstein, Brown, & Steer, 1988), and the Rosenberg Self-esteem Scale (RSES; Rosenberg, 1965). As a validity check of the data, the short form of the Marlowe-Crowne Social Desirability Scale (Reynolds, 1982) was also administered to evaluate the extent to which respondents responded in a socially desirable manner. The correlation coefficients between the HDI and convergent validity measures are presented in Table 30.3. The correlations between the BDI and all forms of the HDI were high, ranging from .91 to .93, p < .0001; the correlation coefficient between the BDI and the HDI-Mel was .89, p < .0001. The results presented in Table 30.3 support the convergent validity of the HDI as a measure of depression severity.
Also shown in Table 30.3 are the correlation coefficients between the HDI and measures of constructs that are associated with depression. As would be expected, moderately high correlation coefficients were found between the HDI and these measures, providing further support for the construct validity of the HDI. In order to evaluate whether the high correlation coefficients between these measures are specific to depression as opposed to a general level of emotional distress, a multiple regression analysis was performed with the HDI as the dependent variable. The standardized beta coefficients indicate the amount of variance associated with each of the independent variables. Results found the largest standardized beta coefficient was with the BDI (.582), confirming that the majority of the variance was attributable to the relation between the HDI and BDI, a measure of depression. Only a small amount of variance was attributable to the other measures, with betas ranging from .01 (ASIQ) to .26 (BAI). Similar results were found when the clinician HAMD was substituted in the regression equation for the BDI, with an even larger beta of .68, p < .0001, found for amount of variance attributed to the HAMD.

TABLE 30.3
Correlation Coefficients Between HDI and Related Measures of Psychological Distress for the Total Sample

Measure                                                            N     HDI      HDI-17   HDI-SF   HDI-Mel
Beck Depression Inventory (BDI)                                    764   .93***   .91***   .92***   .89***
Beck Hopelessness Scale (BHS)                                      482   .78***   .72***   .81***   .73***
Adult Suicidal Ideation Questionnaire (ASIQ)                       895   .66***   .63***   .69***   .61***
Beck Anxiety Inventory (BAI)                                       483   .77***   .77***   .72***   .69***
Rosenberg Self-esteem Scale (RSES)                                 625   .68***   .63***   .73***   .65***
Marlowe-Crowne Social Desirability Scale-Short Form (MCSDS-SF)     486   .37***   .35***   .37***   .39***

Note. Reproduced by special permission of the Publisher, Psychological Assessment Resources, Inc., 16204 North Florida Avenue, Lutz, Florida 33549, from the Hamilton Depression Inventory Professional Manual by William M. Reynolds, Ph.D., and Kenneth A. Kobak, Ph.D. Copyright © 1995 by PAR, Inc. Further reproduction is prohibited without permission of PAR, Inc.
*** p < .001.

Discriminant validity was examined by the relation between HDI scores and scores on the Marlowe-Crowne Social Desirability Scale. Theoretically, low correlations would be expected between social desirability and self-reported depression. The scale also served as a methodological check, as high scores would confound the interpretations of the data. Results found a low correlation (-.37) with the HDI, with a small coefficient of determination (r² < .14). These correlation coefficients suggest a minimal, insignificant relation between these two variables.

Construct validity was also determined by examining the clinical efficacy of HDI cutoff scores. These cutoff scores are used as a rough threshold to determine a "clinically relevant" level of depressive symptomatology (i.e., symptoms that result in some degree of impairment in the person's life). The degree to which cutoff scores accurately place individuals into correct diagnostic categories is another measure of the clinical validity of the HDI. Using a cutoff score of 19, the full (23-item) scale HDI demonstrated a sensitivity of 99.3%, that is, the HDI correctly identified 99.3% of persons who had been diagnosed with major depression on the SCID.
The same cutoff score demonstrated a specificity of 95.9%, that is, the HDI correctly identified 95.9% of the persons who did not have any diagnosis on the SCID. The positive predictive value (PPV; i.e., the percentage of persons identified by a test as having a specific characteristic who actually have that characteristic) was also examined. This is important because it provides an indication of the clinical utility of the test (i.e., a test that has a high rate of false positives may still be highly sensitive, but of little practical value). The PPV of the full scale HDI using a cutoff of 19 was 86.9%, which is high, and addresses the problem of poor PPV that has been reported in the literature with past depression screeners (Campbell, 1987). The kappa and phi coefficients associated with a cutoff score of 19 were also high (.905 and .908, respectively). Similar analyses were performed on the 17-item HDI and the 9-item HDI short form using cutoff scores of 15 and 10, respectively. Results are presented in Table 30.4. High levels of sensitivity and specificity were found across HDI forms, with all values greater than 95%. High values were also found for PPV, chi-square, phi, and kappa coefficients. Overall, the results demonstrate a high degree of association between HDI cutoff scores and the diagnosis of major depression, supporting the clinical efficacy of the HDI. Although the HDI is not intended as a diagnostic instrument, HDI cutoff scores have been shown to be valuable in identifying persons with a significant (i.e., clinical) level of depressive symptomatology. The clinical validity of the HDI was examined from the perspective of contrasted groups validity (Wiggins, 1973), also known as criterion group validity (A. L. Edwards, 1970). This refers to the ability of a test to differentiate between groups of people known to have different levels of the construct under examination.
The contrasted groups validity for all three forms of the HDI and the HDI-Mel was examined by comparing mean score differences between persons with major depression, persons with other psychiatric disorders, and community controls. This comparison is a rigorous test of contrasted groups validity in that persons with other psychiatric disorders often have some degree of comorbid depressive symptomatology. The ability of the HDI to differentiate between these groups is strong evidence for the validity and clinical utility of the scale. As shown in Table 30.5, highly significant differences were found between the three diagnostic groups on all forms of the HDI. Persons with major depression had HDI scores nearly double those of persons with other psychiatric diagnoses, and roughly four standard deviations above those of the community controls. The results provide strong support for the contrasted groups validity and clinical utility of the HDI as a measure of the severity of depressive symptomatology.

TABLE 30.4
Clinical Utility of HDI Cutoff Scores to Differentiate Among Psychiatric Outpatients with Major Depression and Normal Community Controls

HDI Version       Cutoff Score   Sensitivity (%)   Specificity (%)   PPP (%)   χ²          φ         κ
HDI Full Scale    19             99.3              95.9              86.9      536.14***   .908***   .905
HDI-17            15             95.7              96.7              88.7      525.65***   .899***   .898
HDI-SF            10             97.1              97.1              90.1      546.58***   .917***   .916

Note. PPP = Positive predictive power: the proportion of persons who are identified by the HDI as clinically depressed (i.e., who score at or above the cutoff score) who actually have a diagnosis of major depression. Reproduced by special permission of the Publisher, Psychological Assessment Resources, Inc., 16204 North Florida Avenue, Lutz, Florida 33549, from the Hamilton Depression Inventory Professional Manual by William M. Reynolds, Ph.D., and Kenneth A. Kobak, Ph.D. Copyright © 1995 by PAR, Inc. Further reproduction is prohibited without permission of PAR, Inc.
*** p < .001.
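The indices reported for HDI cutoff scores (sensitivity, specificity, positive predictive power, chi-square, phi, and kappa) can all be derived from a 2 x 2 table of screening result by diagnosis. The Python sketch below shows the standard formulas. The cell counts are not reported in the text; they are inferred here from the reported percentages and the group sizes (140 outpatients with major depression, 510 community controls) and should be read as an assumption. The function name `screening_metrics` is hypothetical, and the chi-square is computed without a continuity correction.

```python
from math import sqrt
from typing import Dict


def screening_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard 2 x 2 screening indices.

    tp/fn: depressed persons scoring at or above / below the cutoff.
    fp/tn: controls scoring at or above / below the cutoff.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    # Phi is the Pearson correlation between two dichotomous variables.
    phi = (tp * tn - fp * fn) / sqrt((tp + fp) * (fn + tn) * (tp + fn) * (fp + tn))
    chi_square = n * phi ** 2  # uncorrected chi-square with 1 degree of freedom
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "chi_square": chi_square,
        "phi": phi,
        "kappa": kappa,
    }


# Assumed counts for the full scale HDI at a cutoff of 19 (inferred, not reported).
for name, value in screening_metrics(tp=139, fn=1, fp=21, tn=489).items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the output agrees, to rounding, with the values reported for the full scale HDI: sensitivity 99.3%, specificity 95.9%, positive predictive power 86.9%, chi-square 536.14, phi .908, and kappa .905.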
TABLE 30.5
Contrasted Groups Validity of the HDI, HDI-17, HDI-SF, and HDI-Mel

HDI Form           Community    Other Psychiatric   Major Depression   F           Group
                   (n = 510)    Diagnoses           (n = 140)                      Comparison
                   (1)          (n = 173) (2)       (3)
HDI       Mean     7.29         16.66               30.93              747.64***   3 > 2 > 1
          SD       5.64         8.28                7.13
HDI-17    Mean     5.71         12.29               22.13              664.07***   3 > 2 > 1
          SD       4.22         6.09                5.10
HDI-SF    Mean     3.14         8.30                16.70              838.62***   3 > 2 > 1
          SD       3.00         4.54                3.88
HDI-Mel   Mean     3.34         7.03                13.10              548.64***   3 > 2 > 1
          SD       2.82         3.94                3.12

Note. Scheffe post hoc comparisons computed with p < .01. Reproduced by special permission of the Publisher, Psychological Assessment Resources, Inc., 16204 North Florida Avenue, Lutz, Florida 33549, from the Hamilton Depression Inventory Professional Manual by William M. Reynolds, Ph.D., and Kenneth A. Kobak, Ph.D. Copyright © 1995 by PAR, Inc. Further reproduction is prohibited without permission of PAR, Inc.
*** p < .001.
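The size of the group differences in Table 30.5 can be expressed in standard deviation units with a few lines of Python. The sketch below divides the difference between the major depression mean and the community mean by the community standard deviation; choosing the community SD as the reference is an assumption made for illustration (the text does not say which SD underlies its description), and the dictionary and function names are hypothetical.

```python
# (community mean, community SD, major depression mean), taken from Table 30.5
TABLE_30_5 = {
    "HDI": (7.29, 5.64, 30.93),
    "HDI-17": (5.71, 4.22, 22.13),
    "HDI-SF": (3.14, 3.00, 16.70),
    "HDI-Mel": (3.34, 2.82, 13.10),
}


def sd_units_above(reference_mean: float, reference_sd: float, group_mean: float) -> float:
    """Distance of a group mean from the reference mean, in reference SD units."""
    return (group_mean - reference_mean) / reference_sd


sd_units = {form: sd_units_above(*values) for form, values in TABLE_30_5.items()}
for form, value in sd_units.items():
    print(f"{form}: {value:.2f} community SDs above the community mean")
```

On this basis the major depression group falls roughly 3.5 to 4.5 community standard deviations above the community mean, depending on the form.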


Basic Interpretive Strategy

Interpretation of the HDI consists of examination of the following elements: validity check, HDI raw score and cutoff score, HDI Melancholia Checklist, comparisons with normative data, HDI critical items, major depression checklist, and examination of individual items. In clinical applications, assessment of depressive symptomatology is typically made from the perspective of current definitions of depression and diagnostic criteria. As such, the full scale HDI is recommended as the standard format. In situations (typically research settings) where there is a need for scores consistent with the clinician HAMD, the 17-item scores may be extracted from the full scale HDI format. In situations where time is limited, or when a brief screener is desired, the HDI short form may be used. In general, the full scale HDI provides the most complete evaluation of depressive symptomatology by virtue of assessing the widest range of symptoms.

Validity Check

Before making any clinical decisions or recommendations based on HDI results, the HDI protocol should be checked for signs of invalid responding. Reasons for invalid responding include attempts to minimize or exaggerate symptoms, or a lack of compliance with the evaluation process resulting in cursory responding. The latter may be examined on the computer-administered version by examining the time it took to administer the HDI. A very short response time may indicate that the subject answered the questions without reading the items. Some individuals may have trouble reading or understanding the questions, but may be reluctant to inform the examiner. In some cases, an invalid response set may be due to extreme distress or psychological disorganization. At least 19 of the 23 HDI items should be completed for the administration to be considered valid (this is not applicable for the computer-administered HDI, as items cannot be skipped).
The clinician should also check the HDI answer sheet for unusual patterns of responding, such as endorsing the same response to all items (with the exception of 0), or a consistent pattern of responses, such as alternating responses (e.g., 1, 2, 1, 2, 1, 2). Such response patterns are rare in valid protocols and indicate the likelihood of an invalid response set. Comparing items that evaluate opposite symptoms is another check for an invalid protocol. For example, Items 4 through 6 (insomnia) can be compared with Item 18 (hypersomnia), and Item 8 (psychomotor retardation) can be compared with Item 9 (psychomotor agitation). In most cases, high scores on both members of these pairs of opposite symptoms are unlikely and may suggest an invalid response to the HDI. Another indication of invalid responding on the paper-and-pencil version is the completion of items that should have been skipped. Unlike the computer version, where the computer does the branching and only administers items that are appropriate, the paper-and-pencil version instructs respondents to skip follow-up questions when the response to the initial question was negative (e.g., if the response to Item 1a, frequency of depressed mood, is "not at all," the person is instructed to skip over the follow-up question regarding severity). A consistent pattern of failure to do this may indicate an invalid response set. Finally, blank items, particularly the suicide item, should be checked as an indication of potential difficulty. The clinician should ask the client to complete the missing item,
and follow up by exploring the reasons for skipping the item. This process is facilitated by the HDI critical items form discussed later.

Raw Score and Cutoff Scores
The first step in interpreting the HDI scores should involve comparing the raw score to the HDI cutoff score. Cutoff scores are used to indicate the presence of a clinically significant level of depressive symptomatology. Cutoff scores were derived from a number of psychometric perspectives, including frequency distributions of the community sample, sensitivity, specificity, predictive power, and hit rate. Statistical analyses (i.e., chi-square, phi coefficient, and kappa) were computed to identify cutoff scores that differentiate between depressed outpatients and nondepressed community controls. Because one use of the HDI is to screen for clinical levels of depression, the HDI cutoff scores were chosen to maximize sensitivity (i.e., minimize false negatives) while retaining acceptable specificity, and are thus conservative in this regard. For the full scale HDI, the range of possible scores is 0 to 73. In actual clinical use, scores above 50 are rare. As many of the HDI items involve several questions that are then averaged according to a weighted scoring algorithm, total raw scores may not be integers. These raw scores were retained and rounded to the nearest half-point in order to retain accuracy. The average HDI score in the community sample was 7, and the mean score for outpatients with a diagnosis of major depression was 31. A cutoff score of 19 is suggested to denote a clinical level of depressive symptomatology on the full scale HDI. In the community sample, this corresponds to the 96th percentile and a T-score of 71, and is about 2 standard deviations above the mean. Persons scoring at or above the cutoff should be referred for further evaluation and possible treatment.
Clinicians desiring higher levels of sensitivity may adjust the cutoff scores according to the normative data provided in the test manual (Reynolds & Kobak, 1995b). For the HDI-17, scores may range from 0 to 52, with scores above 35 being relatively rare. The recommended cutoff score on the HDI-17 is 15. This corresponds to the 97th percentile of the community sample and a T-score of 71. The mean score for psychiatric outpatients with major depression was 22, with only 4.3% scoring below the cutoff score of 15. The mean score in the community sample on the HDI-17 was approximately 6. Again, the cutoff score was intended to maximize sensitivity for screening purposes, and thus is slightly less than the cutoff of 17 used in some, but not all, antidepressant outcome studies (Dunlop, Dornseif, Wernicke, & Potvin, 1990; Paykel, 1979). The HDI-SF score range is 0 to 33. As a brief screener in clinical and research settings, a cutoff score of 10 is recommended. This corresponds to the 97th percentile of the community sample and a T-score of 72. Although the HDI-SF is not intended to replace the full scale HDI, it does provide a valid and reliable screening tool for clinical and research applications. After examination of cutoff scores, raw scores may also be interpreted for levels of severity. Table 30.6 provides a general guide for the interpretation of raw scores. Scores may be classified as "not depressed," "subclinical," "mild," "moderate," "moderate to severe," and "severe." These score ranges are provided as general interpretive guidelines, and are not intended to provide formal classifications or diagnostic groupings.
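The cutoff comparison described above can be sketched in a few lines. The cutoff values (19, 15, and 10) and the half-point rounding come from the text; the function names, the form labels used as dictionary keys, and the choice to round exact ties upward are assumptions of this illustration, not the published scoring algorithm.

```python
import math

# Recommended cutoff scores quoted in the text for each form.
CUTOFFS = {"HDI": 19.0, "HDI-17": 15.0, "HDI-SF": 10.0}


def round_to_half(raw: float) -> float:
    """Round a weighted raw score to the nearest half-point.

    Exact ties are rounded upward, which is an assumption of this sketch.
    """
    return math.floor(raw * 2 + 0.5) / 2


def at_or_above_cutoff(form: str, raw: float) -> bool:
    """Return True when the rounded score is at or above the form's cutoff."""
    return round_to_half(raw) >= CUTOFFS[form]
```

A result of True indicates only that further evaluation is suggested; it does not establish a diagnosis. Clinicians who adjust the cutoff using the normative data in the manual would change the values in `CUTOFFS` accordingly.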

TABLE 30.6
Descriptions of Clinical Severity Levels of Depressive Symptomatology Associated with HDI Scores

Form      Range of Scores    Clinical Description
HDI       0-13.5             Not depressed
          14.0-18.5          Subclinical
          19.0-25.5          Mild
          26.0-32.5          Moderate
          33.0-39.5          Moderate to severe
          40.0 and above     Severe
HDI-17    0-9.5              Not depressed
          10.0-14.5          Subclinical
          15.0-19.5          Mild
          20.0-24.5          Moderate
          25.0-29.5          Moderate to severe
          30.0 and above     Severe
HDI-SF    0-6.0              Not depressed
          6.5-8.5            Subclinical
          9.0-12.5           Mild
          13.0-16.5          Moderate
          17.0-20.5          Moderate to severe
          21.0 and above     Severe

Note. Descriptions associated with HDI score ranges are general guidelines to suggest levels of clinical severity. These descriptions should not be considered a formal classification of HDI scores or diagnostic groupings. Reproduced by special permission of the Publisher, Psychological Assessment Resources, Inc., 16204 North Florida Avenue, Lutz, Florida 33549, from the Hamilton Depression Inventory Professional Manual by William M. Reynolds, Ph.D., and Kenneth A. Kobak, Ph.D. Copyright © 1995 by PAR, Inc. Further reproduction is prohibited without permission of PAR, Inc.

Melancholic Features
Examination of the HDI-Mel score provides an indication of the extent to which the client reports features of melancholia as outlined in DSM-IV. A cutoff score of 16 is suggested as an indication of a clinical level of melancholic symptoms. Although the HDI-Mel scale was not intended to provide a diagnosis of the melancholic subtype, it does provide a measure of the extent to which the person's current depression is associated with melancholic features.

Normative Data
A secondary interpretive perspective may be gained by comparison of raw scores with normative data. This provides information as to the significance of the raw score, particularly in clients who have clinical levels of depression. Normative data allow comparisons using percentile ranks and standard scores.
This evaluation should be considered secondary to the comparison of raw scores to cutoff scores, as depression assessment is more similar to a criterion-referenced orientation than to a norm-referenced orientation.
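The score ranges in Table 30.6 lend themselves to a simple lookup. The sketch below encodes the lower bound of each range; the name `describe_severity` is hypothetical, and treating the top band of each form as open-ended is an assumption of this example. As the table note states, the labels are general guidelines and not diagnostic groupings.

```python
from bisect import bisect_right

# Lower bound of each severity band, taken from Table 30.6.
SEVERITY_BANDS = {
    "HDI": [(0.0, "Not depressed"), (14.0, "Subclinical"), (19.0, "Mild"),
            (26.0, "Moderate"), (33.0, "Moderate to severe"), (40.0, "Severe")],
    "HDI-17": [(0.0, "Not depressed"), (10.0, "Subclinical"), (15.0, "Mild"),
               (20.0, "Moderate"), (25.0, "Moderate to severe"), (30.0, "Severe")],
    "HDI-SF": [(0.0, "Not depressed"), (6.5, "Subclinical"), (9.0, "Mild"),
               (13.0, "Moderate"), (17.0, "Moderate to severe"), (21.0, "Severe")],
}


def describe_severity(form: str, raw_score: float) -> str:
    """Return the descriptive label for a raw score on the given form."""
    if raw_score < 0:
        raise ValueError("raw scores cannot be negative")
    bands = SEVERITY_BANDS[form]
    lower_bounds = [lower for lower, _ in bands]
    # Index of the last band whose lower bound does not exceed the score.
    index = bisect_right(lower_bounds, raw_score) - 1
    return bands[index][1]
```

Because raw scores are reported in half-point steps, keying each band by its lower bound leaves no gaps between adjacent ranges.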

Critical Items
Seven HDI items have been identified as particularly useful in interpreting the HDI, as they demonstrate particular utility in differentiating persons with major depression from other diagnostic groups, or are particularly important due to the serious nature of the item. These consist of Item 1 (depressed mood), Item 3 (suicide), Item 7 (loss of interest), Item 13 (fatigue, general somatic), Item 21 (worthlessness), Item 22 (hopelessness), and Item 23 (indecisiveness). A score of 2 or greater on these items should be considered significant. Persons who endorse three or more critical items but score below the cutoff should nonetheless receive further evaluation. Conversely, persons who score above the cutoff but low on most of the critical items should be evaluated as possible false positives. Occasionally, persons without clinical depression but with other psychiatric disorders, or with certain medical conditions involving many somatic complaints, exceed the cutoff score. Examination of the critical items, particularly Item 1 (depressed mood), can help identify these individuals. In general, persons endorsing three or more critical items should receive further evaluation regardless of their HDI raw score. A score of 1 or greater on Item 3 (suicide) should always be followed up, given the serious nature of suicidal ideation and behavior.

HDI Major Depression Checklist
The HDI contains items that evaluate the nine symptoms that constitute Criterion A of a major depressive episode as currently defined by DSM-IV diagnostic criteria. These symptoms include depressed mood, loss of interest/pleasure, weight loss, insomnia/hypersomnia, psychomotor agitation/retardation, fatigue, worthlessness/guilt, indecisiveness, and suicidal ideation/behavior. DSM-IV requires a minimum of five of the nine symptoms to be present for a diagnosis of Major Depressive Episode, one of which must be depressed mood or loss of interest/pleasure.
A score of 2 or greater on an item is considered above the threshold. The checklist is provided for descriptive purposes, and is not intended to constitute a formal diagnosis of depression, as the latter involves other inclusion and exclusion criteria not evaluated by the HDI (or any symptom rating scale). However, cases where either Item 1 or Item 7 is endorsed, and four or more other symptoms are endorsed, clearly warrant further evaluation of the remaining diagnostic criteria to confirm the presence of a diagnosis of Major Depressive Episode or another affective disorder.

Examination of Individual Items
Examination of individual items provides useful information for the clinician. For example, clients who endorse several of the endogenous symptoms of depression may be particularly appropriate referrals for antidepressant treatment, whereas clients who more heavily endorse the more cognitive items such as hopelessness, helplessness, and low self-esteem may be appropriate candidates for cognitive interventions. Item 3 (suicide) should always be examined. This item on the clinician HAMD has been shown to correlate highly with other measures of suicidal ideation and behavior (Bulik, Carpenter, Kupfer, & Frank, 1990; Reynolds, 1991; Reynolds, Kobak, & Greist, 1993; Reynolds, Kobak, Greist, Jefferson, & Tollefson, 1993).
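The critical-item and checklist rules described above can be sketched as follows. The item numbers and thresholds come from the text. Several things are assumptions of this illustration: the function names, the symptom keys, supplying one score per checklist symptom (the text does not give the full item-to-symptom mapping), reading the checklist rule as the DSM-IV count of five of nine symptoms including a core symptom, and operationalizing "low on most of the critical items" as fewer than three endorsed.

```python
from typing import Mapping

CRITICAL_ITEMS = (1, 3, 7, 13, 21, 22, 23)  # item numbers listed in the text
CORE_SYMPTOMS = ("depressed_mood", "loss_of_interest")  # hypothetical keys


def review_critical_items(item_scores: Mapping[int, float],
                          raw_score: float, cutoff: float = 19.0) -> dict:
    """Apply the critical-item rules to one protocol."""
    endorsed = [i for i in CRITICAL_ITEMS if item_scores.get(i, 0) >= 2]
    above_cutoff = raw_score >= cutoff
    return {
        "endorsed_items": endorsed,
        # Three or more critical items warrant evaluation regardless of score.
        "further_evaluation": above_cutoff or len(endorsed) >= 3,
        # Assumption: fewer than three endorsed counts as "low on most".
        "possible_false_positive": above_cutoff and len(endorsed) < 3,
        # Any score of 1 or greater on Item 3 (suicide) requires follow-up.
        "suicide_follow_up": item_scores.get(3, 0) >= 1,
    }


def checklist_warrants_evaluation(symptom_scores: Mapping[str, float]) -> bool:
    """True when five or more symptoms score 2+, including a core symptom."""
    endorsed = {name for name, score in symptom_scores.items() if score >= 2}
    has_core = any(core in endorsed for core in CORE_SYMPTOMS)
    return has_core and len(endorsed) >= 5
```

Neither function yields a diagnosis; both only flag protocols for the further clinical evaluation the text recommends.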

Clinical Applications of the HDI
The HDI may be used in a wide range of clinical and research applications. The utility of the HDI within clinical domains has been discussed, in particular the use of the HDI short form for screening groups of individuals who may be at risk for a depressive disorder. Such screening may take place in a range of community or clinical settings. The HDI full scale is particularly useful in mental health and other treatment-oriented settings, including general medical practice. There are several reasons for this recommendation. First, the HDI covers symptom domains of depression consistent with specifications for major depressive disorder in DSM-IV. Although it does not provide a diagnosis, the cutoff score is reasonably accurate in identifying individuals who may demonstrate a depressive disorder. Second, the flexible response format, either paper-and-pencil or computer administration with computer scoring options available for both formats, allows each setting to select the format most appropriate for its clients. Third, the HDI includes several computer scoring programs that provide either a brief scoring report or a detailed interpretive report of between 7 and 10 pages. Examples of the use of the HDI for treatment planning, treatment monitoring, and treatment evaluation are provided next.

Use of the HDI for Treatment Planning
The cornerstone of an effective treatment plan is a comprehensive and accurate evaluation. Although a clinician's theoretical orientation may determine areas of particular interest for evaluation (e.g., evaluation of irrational beliefs for those whose approach is primarily cognitive-behavioral), current diagnostic criteria focus on the description of signs and symptoms, and are atheoretical in terms of etiology. Thus, an accurate review of symptoms provides a basis from which clinicians can then turn to theoretical orientations for an appropriate intervention strategy.
Three types of scales are typically used in an initial assessment process: screeners, diagnostic instruments, and symptom rating scales. A screener identifies persons who are likely to have a disorder and thus warrant further evaluation. Diagnostic instruments, such as the SCID and SADS, provide such an evaluation in order to confirm the diagnosis. Symptom rating scales provide an indication as to the severity of symptoms associated with the disorder, as well as a general accounting of which symptoms are present. The HDI has psychometric properties that enable it to serve the dual functions of a screener and a symptom rating scale. As a screener, it identifies individuals who have a clinically significant level of depressive symptomatology and are thus likely to have a diagnosis of depression. Such individuals warrant more in-depth evaluation to confirm the diagnosis. As a symptom rating scale, it provides an in-depth evaluation of both the type and severity of symptoms currently present. The HDI may be included as part of a battery of assessments given to new clients seeking treatment. Ideally, the HDI (and other scales evaluating symptom severity) should be used in conjunction with a careful diagnostic interview, as the latter confirms the presence of a disorder, whereas the former indicates the severity of the disorder. Clients typically present with an identified problem, and the focus of assessment is often on exploring the dimensions of this presenting problem. However, depression is often comorbid with other psychiatric disorders, such as anxiety disorders, eating disorders, alcohol and drug abuse, and personality disorders (American Psychiatric Association, 1994, p. 340). Thus, evaluation of depressive symptomatology is warranted
even for those patients whose primary presenting problem is not depression. From a treatment standpoint, unidentified comorbid symptoms of depression can hamper treatment efforts. For example, patients with obsessive-compulsive disorder and significant comorbid depression have been found to fail to habituate during exposure therapy (Buchanan, Meng, & Marks, 1996; Foa, Steketee, Grayson, & Doppelt, 1983). The identification and treatment of depression in primary and managed care is of particular importance. The NIMH multisite Epidemiologic Catchment Area study found the majority of persons (68%) with depression were not diagnosed or treated, yet 45% presented themselves to primary care physicians for treatment of a nonpsychiatric medical condition (Shapiro et al., 1984). Depressed primary care patients are also more likely to be high utilizers of nonpsychiatric medical services (Katon et al., 1990; Widmer & Cadoret, 1979). Identification and treatment of these individuals would relieve suffering, increase quality of life, and provide a cost-offset from saved utilization (Katzelnick, Kobak, Greist, Jefferson, & Henk, 1997). The HDI, particularly the short form, provides a screening instrument that can be used in the primary care setting. Patients who exceed the cutoff score on initial evaluation should be evaluated further to confirm the presence of an affective disorder, and to determine if depression is a primary or secondary problem. For example, patients with social phobia often develop secondary depression due to limitations and failures resulting from their phobic avoidance. In such cases, the focus of treatment may be on the social phobia, although the severity of the comorbid depression may require concurrent treatment of the affective symptoms, particularly if vegetative and motivational symptoms are present. Examination of the HDI-Mel score, as well as the HDI critical items, provides information that will help in this determination.
Occasionally, patients with an anxiety disorder or a medical disorder with pronounced somatic symptoms may exceed the cutoff score and not have primary depressive illness. This can be evaluated by examination of the HDI critical items. A high raw score combined with low scores on the critical items (particularly Item 1, depressed mood) is an indication that the HDI score is elevated by these other factors. Conversely, patients may occasionally fall below the cutoff and have significant depressive symptomatology. Patients who score 2 or greater on three or more critical items should receive further evaluation regardless of the HDI raw score. Item 3 (suicide) should always be examined, and persons with a score of 1 or greater should always receive follow-up evaluation. Not all persons who feel suicidal are depressed; thus, this item may be elevated even in those without a high level of depressive symptomatology. Suicidal ideation has been found to be prominent in patients with other psychiatric disorders, such as panic disorder (Weissman, Klerman, Markowitz, & Ouellette, 1989), social phobia (Cox, Direnfeld, Swinson, & Norton, 1994), and obsessive-compulsive disorder (Reynolds, Kobak, & Greist, 1992b), often when there is no comorbid diagnosis of depression. Thus, careful examination of this item is warranted. Item 22 (hopelessness) should also be examined in conjunction with Item 3, as the relation between hopelessness and suicide has been well established, and hopelessness is often a better predictor of suicidal intent than depression (Beck, Brown, Berchick, Stewart, & Steer, 1990; Beck, Kovacs, & Weissman, 1975). The HDI can help in determining the appropriate level of care. A score of 3 on Item 3 (suicide) indicates the person is thinking about suicide and has a plan, and a score of 4 indicates a recent suicide attempt. Careful follow-up is warranted to determine if individuals are currently at risk of harming themselves and if hospitalization is required.
A score of 3 on Item 7b (work performance) indicates impairment at a level where
simple self-care, such as washing and bathing, is difficult, and a score of 4 indicates that individuals report being unable to care for themselves at all. In such cases, partial or full hospitalization may be indicated. Delusional thinking in the form of somatic delusions is captured on Item 15a, and the presence of psychosis should be evaluated. The HDI total scale score as a reflection of overall symptom severity may also be considered in choosing the appropriate level of care. The HDI may also be useful in determining the appropriate therapeutic approach. Persons scoring above the cutoff on the HDI-Mel scale demonstrate a clinical level of melancholic features. The DSM-IV reports that persons with these features are more likely to respond to antidepressant medications, and are thus good candidates for this type of treatment intervention (American Psychiatric Association, 1994, p. 384). They are also less likely to have a clear precipitant to their current episode and less likely to have a premorbid personality disorder, further indicating a somatic approach to treatment. Traditionally, the concept of melancholia has been used to indicate a more "endogenous" or biologically based depression, although the concept has been the focus of much debate (Nelson, Mazure, & Jatlow, 1990; Nelson, Mazure, Quinlan, & Jatlow, 1984; Price, Nelson, Charney, & Quinlan, 1984; Zimmerman, Black, & Coryell, 1989). DSM-IV focuses on the clinical implications of melancholia and does not provide any etiological interpretations. Among patients with a diagnosis of depression, patients with atypical features (i.e., hypersomnia, increased appetite, mood reactivity, leaden paralysis, and rejection sensitivity) have shown preferential response to certain classes of drugs, such as monoamine oxidase inhibitors (Liebowitz et al., 1988; Thase, Carpenter, Kupfer, & Frank, 1991). These features have been recognized as important clinical information, and have been included in the DSM-IV.
HDI Item 1c (mood reactivity), Item 8 (psychomotor retardation), and Items 18a and 18b (hypersomnia) evaluate these domains, and warrant examination for treatment planning. Patients with atypical features typically have an earlier age of onset of their first depressive episodes, and their episodes tend to follow a more chronic, less episodic course, with only partial recovery between major episodes (American Psychiatric Association, 1994, p. 385). As such, patients with atypical features may be candidates for long-term, supportive treatment, or interventions aimed at preventing relapse. Cognitive-behavioral therapy is one of the most well-validated treatments for depression (Beck, 1991; Hollon, Shelton, & Loosen, 1991). Persons scoring high on HDI items associated with the cognitive symptoms of depression are particularly appropriate candidates for this type of therapy. In particular, Items 22 (hopelessness), 21 (worthlessness), 19 (helplessness), and 2 (guilt) are symptom domains that are amenable to a cognitive intervention. As recommended by Beck and colleagues (1979), patients with severe behavioral or motivational deficits might benefit more from an initial treatment approach that focuses on behavioral interventions, in order to restore the patient's functioning (p. 117). Elevated scores on Items 7a (loss of interest/pleasure), 7b (work difficulty), 8 (psychomotor retardation), and 13a (fatigue) indicate that a behavioral approach might be indicated initially. Beck, Rush, Shaw, and Emery (1979) and others (Lewinsohn, Antonuccio, Steinmetz, & Teri, 1984; Teri & Lewinsohn, 1982) suggested such interventions as activity scheduling and pleasant and unpleasant activities monitoring as techniques that are useful in this situation. Results of the HDI may be shared with the patient as part of working collaboratively with the patient in developing a treatment plan.
This collaborative approach to treatment planning helps foster the "therapeutic alliance," an important component for successful
treatment outcome. Reviewing results may be used as a way of building rapport with the patient, and as a springboard into a fuller discussion of symptoms and issues from the patient's perspective.

Use of the HDI for Treatment Monitoring
Once a treatment plan has been established, the HDI may be used as a gauge to monitor the effectiveness of the treatment intervention. Whereas the standard time frame of the HDI is the past 2 weeks, patients may be instructed to evaluate their symptoms over the past week in order to evaluate changes more precisely. Given the high 1-week test-retest reliability of the HDI, changes found are likely to be associated with the treatment intervention rather than measurement error. Clinical practice guidelines for the treatment of major depression in primary care have recently been developed by the U.S. Department of Health and Human Services Agency for Health Care Policy and Research (AHCPR; Depression Guideline Panel, 1993). These guidelines identify three phases in the treatment of depression. The acute phase is aimed at removing all depressive symptoms. If symptoms return within 6 months of remission, a relapse is declared. The continuation phase is aimed at preventing this relapse. Once a patient has been asymptomatic for 6 months, a recovery is declared. Once a recovery is declared, treatment for most patients may be stopped. The maintenance phase follows recovery, and is aimed at preventing a recurrence of depression. Recurrences occur in 50% of cases within 2 years after continuation treatment (NIMH, 1985). Thus, for some patients, continued monitoring and relapse prevention intervention may be warranted during this phase. Given these guidelines, it is recommended that the HDI be administered weekly during the acute phase of treatment, and monthly during the continuation and maintenance phases. Clinicians may adjust the frequency of administration up or down as warranted by clinical judgment.
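The recommended administration schedule can be expressed as a small lookup, shown here as a sketch. The weekly and monthly intervals come from the text; approximating "monthly" as 28 days, the phase labels, and the function name `next_administration` are assumptions of this example.

```python
from datetime import date, timedelta

# Days between administrations per treatment phase.
# "Monthly" is approximated as 28 days, which is an assumption of this sketch.
INTERVAL_DAYS = {"acute": 7, "continuation": 28, "maintenance": 28}


def next_administration(last_administered: date, phase: str) -> date:
    """Return the suggested date of the next HDI administration."""
    return last_administered + timedelta(days=INTERVAL_DAYS[phase])
```

In practice the interval would be overridden whenever clinical judgment calls for more or less frequent monitoring.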
Symptoms of depression may improve at different rates, depending on the type of treatment intervention. For example, DiMascio et al. (1979) found that antidepressant therapy had its effect mainly on vegetative symptoms, such as sleep (HDI Items 4, 5, 6, and 18) and appetite (HDI Items 12 and 17), with improvements occurring early in treatment, often within the first week. Interpersonal psychotherapy, on the other hand, had its effect mainly on mood (HDI Item 1), suicidal ideation (HDI Item 3), and work and interests (HDI Item 7), with these effects occurring later in treatment, usually at 4 to 8 weeks. Similarly, Rush, Kovacs, Beck, Weissenburger, and Hollon (1981) found that, in patients treated with cognitive therapy, improvements in hopelessness (HDI Item 22) and mood (HDI Item 1) generally preceded improvements in vegetative symptoms. Monitoring of differential symptom change by examination of these items, as well as scores on the HDI-Mel scale and HDI critical items, can serve as a guide for treatment focus. Improvement in cognitive symptoms without a similar improvement in vegetative symptoms after a course of cognitive therapy of adequate duration may indicate the need for the addition of antidepressant therapy. The reverse may also be true. Ongoing monitoring of treatment informs the clinician as to whether the interventions chosen are effective. When no progress is being made, the therapist may wish to reevaluate the treatment plan and identify reasons for the lack of progress. Depression often follows a fluctuating course, and patients sometimes get worse during
the course of treatment. Particular attention should be paid to Item 3 (suicide), and appropriate follow-up should be taken any time this item is endorsed. The computer-administered HDI is particularly useful for ongoing clinical assessment. Patients may take the computer interview in the waiting room while waiting to see the clinician. The results are scored automatically and a report is available to the clinician to review prior to the session. Such a situation was set up by Kobak et al. (1997) at an outpatient community mental health clinic in conjunction with a study of a computerized diagnostic screener. A desk with a desktop computer and printer was set up in the patient waiting room. Patients were instructed to arrive a few minutes prior to the start of each session and take the computer HDI. Results were reviewed and filed in the patient's chart (APA ethical guidelines require that computer-administered assessments filed in charts be clearly labeled to show that the data were obtained by computer administration; American Psychological Association, 1986). Objective data on positive changes over time served as reinforcers in therapy, and often helped to counteract patients' negative thinking that they would never get any better. Patients enjoyed taking the computer interview, and objected on the few occasions when it was not administered due to time or other constraints.

Use of the HDI for Treatment Outcomes Assessment
With the advent of managed care, increased attention is being paid to treatment outcomes. Consumers, health care providers, employers, and managed care organizations all have an interest in documenting treatment outcomes. These interests range from evaluating the cost-effectiveness of treatment to making informed choices as a consumer. From the clinician's perspective, valid and reliable outcome measures provide information on patients' status that enables more informed treatment decisions.
Patients are the ultimate beneficiaries of this information, in the form of increased quality of care. In research applications, clinician-administered symptom rating scales are the standard outcome measures, and have been used in pharmacological clinical trials for several decades. However, their use in mental health and clinical settings has been limited. Reasons for this include clinicians' lack of expertise and training in the administration and scoring of these scales, and the time and cost involved. The HDI has the benefit of providing a score consistent with a clinician-administered scale without the time and cost of clinician involvement. The computer-administered version can be given directly to patients by the computer, eliminating the time and costs involved in administration and scoring by staff members. The computer-administered form of the HDI has been used as an outcome measure in several clinical drug trials (Kobak, Greist, Jefferson, Katzelnick, & Schaettle, 1996; Kobak, Reynolds, Greist, Jefferson, & Tollefson, 1994) and was the primary outcome measure in a study of the treatment of depressed high utilizing medical patients by primary care physicians in a large health maintenance organization (HMO; Katzelnick et al., 1997).

Determining Clinically Significant Treatment Change
The determination of whether changes in HDI scores are clinically meaningful is a complex undertaking that is a function of the original level of depressive symptomatology, the nature of the depressive disorder, and the nature and extent of the treatment regime. There is no hard and fast single rule for specifying an absolute change score criterion
for significant change. One criterion that has been used in many pharmacological treatment outcome studies with the HAMD is a reduction in scores of 50% over the course of treatment. Given the similarities in basic content and assessment metric between the HDI and HAMD, it is reasonable to view a 50% reduction in score as clinically meaningful. In most cases of individuals with depression, such a reduction will result in the individual demonstrating a posttreatment score below the cutoff score on the HDI or HDI-17. Another perspective on change would be to view a change in standard scores (T-scores) of one and one-half standard deviations, or 15 T-score points, as clinically meaningful. Thus, an individual with a T-score of 90 on the HDI who posttreatment manifests a standard score of 75T, although still demonstrating a mild clinical level of depressive symptomatology, may be viewed as having shown a clinically significant reduction in HDI scores. It is important to note that the previous guidelines are broad suggestions for the evaluation of changes in HDI and HDI-17 scores. A similar perspective may be taken for the HDI-SF, although this measure should be used with caution for the evaluation of treatment outcome because of its reduced item coverage. The previous criterion of 50% reduction in score may also be useful when applied to the HDI-Mel subscale, particularly in cases of more endogenous depression where antidepressant medications are the primary mode of treatment. The examination of change in specific symptom content domains, such as cognitive or somatic symptoms, is not suggested given the more limited reliability of such scores.

Evaluation Against NIMH Criteria
Newman and Ciarlo (1994) discussed the 11 criteria identified by the National Institute of Mental Health as important to consider when choosing an outcome measure. These criteria fall under five general headings: applications, methods and procedures, psychometric features, cost, and utility.
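The two change guidelines given under Determining Clinically Significant Treatment Change (a 50% reduction in raw score, or a drop of 15 T-score points) can be sketched as follows. The thresholds come from the text; the function names and the returned dictionary layout are assumptions of this illustration, and the text itself cautions that these are broad suggestions, not fixed rules.

```python
from typing import Optional


def percent_reduction(pre: float, post: float) -> float:
    """Percentage drop from the pretreatment to the posttreatment score."""
    if pre <= 0:
        raise ValueError("pretreatment score must be positive")
    return (pre - post) / pre * 100.0


def meaningful_change(pre_raw: float, post_raw: float,
                      pre_t: Optional[float] = None,
                      post_t: Optional[float] = None) -> dict:
    """Check both change guidelines; T-scores are optional inputs."""
    t_drop = (pre_t is not None and post_t is not None
              and (pre_t - post_t) >= 15)  # 1.5 SD on the T-score metric
    return {
        "reduced_50_percent": percent_reduction(pre_raw, post_raw) >= 50.0,
        "t_score_drop_15": t_drop,
    }
```

For the example in the text, a T-score falling from 90 to 75 satisfies the second guideline even if the raw score reduction is under 50%.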
What follows is a brief overview of how the HDI addresses these concerns.

Criterion 1: Applications. This addresses the concern that the measure used is appropriate for the group being studied, that it adequately evaluates the symptom domain of that group, and that it is independent of the treatment provided. The evidence presented earlier in this chapter supports the HDI as an appropriate outcome measure of depressive symptomatology. Its construct and content validity document that it adequately samples the domain of depressive symptomatology. The validation sample adequately represents both genders and covers a wide range of ages and ethnicities. It is appropriate for the evaluation of severity and change in depressive symptomatology independent of treatment modality, and the clinician version has been used to evaluate both pharmacological and psychological interventions (Elkin et al., 1989).

Criterion 2: Simple, Teachable Methods and Procedures. The HDI provides an in-depth user manual with explicit instruction on the administration and interpretation of the scale. The computer-administered version provides a separate manual containing software documentation and a number for technical support.


Criterion 3: Psychometric Features. The psychometric features of the HDI are well documented from several perspectives (Reynolds & Kobak, 1995a, 1995b). In addition, the psychometric data on the clinician HAMD may be inferred to also apply to the HDI, by virtue of the demonstration of equivalence of the two forms. Newman and Ciarlo (1994) discussed the value of "objective referents," that is, standardized, concrete examples of each level of a symptom evaluated by the scale. The HDI addresses the limitations of the clinician version in this regard by providing behavioral descriptors whenever possible. For example, in rating insomnia, both the number of hours it takes to fall asleep and the number of days per week the problem occurred are evaluated. Regarding the use of multiple respondents, the HDI is more limited than the clinician HAMD, in that the clinician may use information from other sources, such as family, in determining ratings. Newman and Ciarlo (1994) discussed the advantages and disadvantages of using multiple perspectives. Persons using the HDI should incorporate additional information obtained from other perspectives in making clinical decisions.

Criterion 4: Costs. The HDI is cost-effective compared to the clinician-administered version. The computer-administered version of the HDI provides outcomes data without requiring any clinician involvement in the administration or scoring of the test. Furthermore, a detailed interpretive report is also available from the HDI computer scoring program. The report also saves clinician time and can be easily integrated into a word processing file for editing and inclusion into the clinician's case report.
Newman and Ciarlo (1994) speculated that if the costs of obtaining outcomes data were limited to the costs of purchasing the instrument and processing the data (without the use of a professional's time), the costs may meet the NIMH estimates of the percentage of an agency's budget that is reasonable to allocate for outcomes research (i.e., 0.5%). The computer-administered HDI provides an instrument to accomplish this standard.

Criterion 5: Utility. The NIMH guidelines cite as an asset the ability of test results to be understood by a nonprofessional audience, enabling all who have an interest in outcomes to take advantage of the information. Whereas the HDI is meant to be interpreted by a qualified professional, general descriptions of score values are provided in terms understandable to the nonprofessional (see Table 30.6). The HDI also provides graphic and narrative reports and computerized scoring as recommended by the panel. The HDI has clinical utility in case planning, ongoing treatment monitoring, and outcomes evaluation. Computer administration enables the collection and processing of data without burden to the clinician or support staff. As previously discussed, the HDI measures the construct of depression consistent with current definitions, and it is thus suitable for use in evaluating outcomes from a variety of treatment approaches. The clinician HAMD has been used as an outcome measure to evaluate a diverse range of treatment interventions utilized for treating depression, including medication, cognitive therapy, and interpersonal psychotherapy (Elkin et al., 1989).

One final note is warranted on the use of the HDI for outcomes assessment. As previously mentioned, different symptoms of depression may respond at different rates, depending on the treatment and symptom (DiMascio et al., 1979; Rush et al., 1981).
As such, before drawing conclusions as to the efficacy of an intervention, clinicians should wait until the standard course of treatment recommended for the intervention has been completed. For example, Rush et al. (1981) found that with cognitive therapy, changes in cognitive symptoms such as hopelessness preceded improvements in vegetative symptoms. Clinicians should therefore neither conclude that a cognitive intervention has been ineffective for vegetative symptoms nor perform the final outcome measurement until the proper
duration of treatment has been administered, according to the guidelines of the intervention. The HDI includes items evaluating both of these domains, allowing for the examination of vegetative and cognitive symptoms separately. In addition, some researchers have asserted that certain depression interventions may be effective for only certain classes of symptoms. For example, DiMascio et al. (1979) found that pharmacologic intervention with amitriptyline (one of the older generation tricyclic antidepressants) had its most profound impact on vegetative symptoms, with less efficacy on other domains. In cases where an intervention is quite effective on one domain (e.g., vegetative symptoms) and not another (e.g., cognitive symptoms), examination of the total score may not reveal the efficacy on the particular symptom group (Gibbons et al., 1993). Thus, separate examination of change scores for different classes of symptoms is warranted in order to draw more accurate conclusions as to the efficacy of treatment.

Case Study

The following case study is provided as an example for interpreting the HDI. It is an actual clinical case, with a pseudonym and minor changes in demographic data to ensure anonymity. The HDI answer and summary sheets for this case are included in Fig. 30.1 for illustration purposes. A complete interpretive report is also generated, but is not reproduced here.

Case Illustration: Howard

The case illustration is that of Howard, a 32-year-old married plumber with a high school education. Howard has a history of recurrent episodes of major depression extending back to age 18. These episodes generally follow a pattern of occurring about every 6 months, although for the past 5 years they have been occurring about every 3 months. Howard reported that his depressive episodes generally last about 2 to 4 weeks; however, the current episode has lasted at least 6 months, and is the most severe episode to date.
Howard was evaluated in a university-based department of psychiatry as a potential candidate for participation in a research trial of a new antidepressant medication. He was interviewed by a research psychiatrist. Based on a clinical interview with the SCID, Howard received a DSM-III-R diagnosis of Major Depression, Recurrent and Severe. He also met criteria for Melancholic Subtype, due to symptoms of pervasive loss of interest or pleasure in almost all activities, lack of reactivity to usually pleasurable stimuli, diurnal variation with depression regularly worse in the morning, early morning awakening, psychomotor retardation, and significant anorexia and weight loss. There was no evidence of any significant personality disturbance before his first episode of major depression, and he has never been treated with psychiatric medications.

Fig. 30.1. Howard's Form HS Summary Sheet. Reproduced by special permission of the Publisher, Psychological Assessment Resources, Inc., 16204 North Florida Avenue, Lutz, Florida 33549, from the Hamilton Depression Inventory by William M. Reynolds, Ph.D., and Kenneth A. Kobak, Ph.D. Copyright © 1995 by PAR, Inc. Further reproduction is prohibited without permission of PAR, Inc.

Howard's Form HS Summary Sheet is presented in Fig. 30.1. Howard obtained a score of 49.5 on the full scale HDI, which is extremely high and equivalent to a T-score of 125. This raw score is well above the cutoff score of 19.0 on the HDI and suggests a very severe level of depressive symptomatology. On the HDI-17, Howard obtained a score of 37.5, significantly above the cutoff of 15.0 used to indicate a clinical level of depressive symptomatology. Howard received a score of 22.0 on the HDI-Mel. This
score is well above the cutoff score of 16.0 for this subscale and consistent with the psychiatric diagnosis of major depression with melancholic features. Howard's self-report on the HDI was consistent with his psychiatric diagnosis. On the HDI Major Depression Checklist, which lists the nine primary symptoms of major depressive disorder, Howard received a criterion-level score of 2.0 or higher on eight of the nine criteria (depressed mood, loss of interest, weight loss, insomnia, psychomotor retardation and agitation, fatigue, worthlessness and guilt, and indecisiveness). For a diagnosis of
major depression, five of the nine DSM-IV symptoms must be evident nearly every day for at least a 2-week period. The Major Depression Checklist is not a diagnostic indicator, but rather a guide to suggest the possibility of a diagnosis of depression.

Howard received a score of 2.9 on HDI Item 1, which is high, with scores of 3 on Questions 1a and 1b, which evaluate depressed mood. He also indicated that his depressed mood was worse in the mornings, a symptom scored on the HDI-Mel. His score on HDI Item 7 (loss of interest and pleasure/poor work performance) was 3.5, indicating significant impairment in social or occupational performance. His score of 4 on Question 7a indicated a total loss of interest in and pleasure from usual activities. This is complicated by persistent problems with indecision, as reflected in a score of 3.0 on Item 23.

Howard's score on HDI Item 3 (suicide) was 1.0, indicating that even though his overall level of depression was severe, he had only mild suicidal ideation and currently had neither active suicidal ideation nor active suicidal intentions (e.g., thoughts of taking one's life with specific details as to time, method, or place). His score does indicate more general feelings that life is not worth living and thoughts about death in a more general way.

His scores on the three insomnia items (i.e., Items 4, 5, and 6) were all 2.0, indicating persistent and severe troubles with falling asleep, waking during the night, and early morning awakening. According to guidelines described by Hamilton (1960, 1967), items rated on a 0- to 2-point scale represent symptoms that are difficult to quantify. Thus, a score of 2 indicates both the certainty that the symptom exists and that it is significantly severe. Howard's scores of 2 on the insomnia items indicated his insomnia was persistent (e.g., almost every night) and significantly intense (e.g., for more than an hour every night).
Items 12 (appetite) and 17 (weight loss) were other items using a 0- to 2-point scale on which Howard obtained scores of 2.0. These scores indicate an almost total loss of appetite and significant weight loss associated with the current episode. His psychiatric history revealed a 15-pound weight loss (dropping from 170 to 155 pounds) since the onset of his current symptoms.

Howard's HDI protocol also indicated his depressive episode was complicated by significant comorbid anxiety. His score on the HDI Item 10 (psychic anxiety) was 3.5, indicating a severe level of anxiety that was present almost all the time. In a related manner, his score of 3.0 on Item 15 (hypochondriasis) reflected constant worry about health problems for which no medical explanation was established. He also reported pacing and feelings of restlessness, which are reflected in a score of 3.0 on Item 9 (psychomotor agitation). A score of 3 on this item reflects a degree of restlessness severe enough to interfere with a person's functioning. Howard also reported some degree of psychomotor retardation, as indicated by his score of 2.0 on Item 8. Although somewhat inconsistent with psychomotor agitation, his score on Item 8 may reflect his decreased ability to work and some slowness of speech.

Other significant aspects of the current episode include significant feelings of hopelessness (as indicated by a score of 2.0 on Item 22), strong feelings of worthlessness (as indicated by a score of 3.0 on Item 21), and almost constant feelings of helplessness (as reflected by a score of 3.0 on Item 19). In spite of the severity and longstanding nature of Howard's problems with depression, his insight into his problems is limited, as indicated by a score of 1.5 (out of a possible score range of 0-2) on Item 16. This suggests that although he accepts the possibility that his symptoms occur because he is depressed, he does not really think this to be the case.
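The cutoff comparisons and the checklist result reported earlier in this case can be restated as a small Python sketch. The cutoffs (19.0, 15.0, and 16.0), Howard's scores, and the checklist figures (eight of nine criteria scored 2.0 or higher, with five required) are taken from the text. The function names, the dictionary layout, and the choice to count a score equal to the cutoff as elevated are assumptions made for illustration only.

```python
# Cutoff scores quoted in the text for each HDI form.
CUTOFFS = {"HDI": 19.0, "HDI-17": 15.0, "HDI-Mel": 16.0}

# Checklist criteria on which Howard scored 2.0 or higher, as listed in the text.
HOWARD_CRITERIA_MET = [
    "depressed mood",
    "loss of interest",
    "weight loss",
    "insomnia",
    "psychomotor retardation and agitation",
    "fatigue",
    "worthlessness and guilt",
    "indecisiveness",
]


def flag_elevations(scores, cutoffs=CUTOFFS):
    """Map each scale to True when its score reaches the cutoff (assumed inclusive)."""
    return {scale: score >= cutoffs[scale] for scale, score in scores.items()}


def checklist_suggests_major_depression(criteria_met, required=5):
    """True when at least `required` of the nine checklist criteria are met.

    The checklist is a guide only, not a diagnostic indicator.
    """
    return len(criteria_met) >= required


howard_scores = {"HDI": 49.5, "HDI-17": 37.5, "HDI-Mel": 22.0}
howard_flags = flag_elevations(howard_scores)
howard_checklist = checklist_suggests_major_depression(HOWARD_CRITERIA_MET)
```

All three of Howard's scale scores exceed their cutoffs and the checklist count exceeds five, which matches the narrative interpretation; the sketch does not attempt to reproduce the HDI's item weighting or scoring algorithm, which the text does not specify.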
Howard's interview using the Suicidal Behavior History Form (SBHF; Reynolds & Mazza, 1992) indicated no prior suicide attempts. His score on the Adult Suicidal
Ideation Questionnaire (ASIQ; Reynolds, 1991) was 35, which is slightly above the cutoff level of 31, indicating clinical levels of suicidal ideation when compared to normative data from a community sample. This is balanced by the fact that none of the ASIQ critical items were endorsed as having occurred within the past month.

Howard was subsequently treated with an antidepressant medication and showed significant improvement. It should be noted that the melancholic subtype of major depression, for which Howard met the criteria, is believed to be particularly responsive to somatic therapy (American Psychiatric Association, 1987, 1994). Howard's HDI profile indicated elevated scores on all HDI items associated with a melancholic subtype (i.e., Questions 1a and 1b, depressed mood; Question 1c, reactivity; Question 1e, worse in the mornings; Item 2, feelings of guilt; Questions 6a and 6b, early morning awakening; Question 7a, loss of interest or pleasure; Item 8, psychomotor retardation; Item 9, psychomotor agitation; and Questions 17a and 17b, weight loss).

Overall, Howard's scores of 37.5 on the HDI-17 and 49.5 on the full scale HDI indicate a very severe clinical depression. It is relatively rare to see full scale HDI scores over 40 in outpatients. Patients with scores in this range should be followed very carefully and either treated or promptly referred for appropriate treatment.

Conclusions

The HDI, as a patient-completed computer-administered and paper-and-pencil version of the HAMD, builds on a strong foundation for the assessment of depression in adults. The HDI may be used in a range of clinical applications: for screening, treatment planning, treatment monitoring, and measuring treatment outcomes. The various forms of the HDI increase its utility, making it an appropriate instrument for both clinical and research purposes. The full scale form of the HDI evaluates domains consistent with current diagnostic symptoms of depression.
The HDI differs from traditional self-report measures in that it emulates a clinical interview by asking several questions in evaluating each symptom domain, weighs the answers to the questions in arriving at a final score, and uses branching logic. Unlike clinicians, the HDI is consistent and does not vary from person to person in terms of the questions asked, nor in the scoring algorithm used to determine item ratings.

Depression is a serious illness, associated with significant disability and decreased quality of life (Broadhead, Blazer, George, & Tse, 1990; Hays, Wells, Sherbourne, Rogers, & Spritzer, 1995). Estimates of those with severe depression who die of suicide are as high as 15% (American Psychiatric Association, 1994, p. 340). The good news is that great advances have been made in both the treatment and public awareness of depression and its impact. Increased efforts are being made in the screening of depressed individuals (Baer et al., 1995), and in the empirical evaluation of treatment interventions (Elkin et al., 1989). The HDI provides a psychometrically sound and clinically useful tool for both of these purposes.

The rise in HMOs has resulted in the inclusion of marketplace factors into the clinical care of patients. Although this has caused some concern and resistance, the measurement of treatment outcomes may provide new opportunities for improved clinical care. The measurement of outcomes provides clinicians and patients with more information on which they can mutually make more informed treatment decisions. Systematic data gathering can inform which treatments work for which patients under what conditions. Psychometrically sound outcome measures such as the
HDI can provide a tool by which patients, clinicians, and managed care providers can search together for what Minichiello and Baer (1994) referred to as "the bottom line: what works."

Acknowledgments

The case study in this chapter is reproduced by special permission of the publisher, Psychological Assessment Resources, Inc., 16204 North Florida Avenue, Lutz, Florida 33549, from the Hamilton Depression Inventory Professional Manual by William M. Reynolds, Ph.D., and Kenneth A. Kobak, Ph.D. Copyright © 1995 by PAR, Inc. Further reproduction is prohibited without permission of PAR, Inc.

References

American Educational Research Association. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., rev.). Washington, DC: Author.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
American Psychological Association. (1986). Guidelines for computer-based tests and interpretations. Washington, DC: Author.
Baer, L., Jacobs, D.G., Cukor, P., O'Laughren, J., Coyle, J.T., & Magruder, K.M. (1995). Automated telephone screening for depression. JAMA, 273, 1943-1944.
Bech, P., Allerup, P., Gram, L.F., Reisby, N., Rosenberg, R., Jacobsen, O., & Nagy, A. (1981). The Hamilton Depression Scale: Evaluation of objectivity using logistic models. Acta Psychiatrica Scandinavica, 63, 290-299.
Bech, P., Allerup, P., Reisby, N., & Gram, L.F. (1984). Assessment of symptom change from improvement curves on the Hamilton Depression Scale in trials with antidepressants. Psychopharmacology, 84, 276-281.
Bech, P., Gram, L.F., Dein, E., Jacobsen, O., Vitger, J., & Bolwig, T.G. (1975). Quantitative rating of depressive states. Acta Psychiatrica Scandinavica, 51, 161-170.
Bech, P., Kastrup, M., & Rafaelsen, O.J. (1986).
Mini-compendium of rating scales for states of anxiety, depression, mania, schizophrenia with corresponding DSM-II syndromes. Acta Psychiatrica Scandinavica, 73, 5-37.
Beck, A.T. (1991). Cognitive therapy: A 30-year retrospective. American Psychologist, 46, 368-375.
Beck, A.T., Brown, G., Berchick, R.J., Stewart, B.L., & Steer, R. (1990). Relationship between hopelessness and ultimate suicide: A replication with psychiatric outpatients. American Journal of Psychiatry, 147, 190-195.
Beck, A.T., Epstein, N., Brown, G., & Steer, R.A. (1988). An inventory for measuring clinical anxiety: Psychometric properties. Journal of Consulting and Clinical Psychology, 56, 893-897.
Beck, A.T., Kovacs, M., & Weissman, A. (1975). Hopelessness and suicidal behavior: An overview. Journal of the American Medical Association, 234, 1146-1149.
Beck, A.T., Rush, A.J., Shaw, B., & Emery, G. (1979). Cognitive therapy of depression. New York: Guilford.
Beck, A.T., Ward, C., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.
Beck, A.T., Weissman, A., Lester, D., & Trexler, M. (1974). The measurement of pessimism: The Hopelessness Scale. Journal of Consulting and Clinical Psychology, 42, 861-865.
Broadhead, W.E., Blazer, D.G., George, L.K., & Tse, C.K. (1990). Depression, disability
days, and days lost from work in a prospective epidemiologic survey. JAMA, 264, 2524-2528.
Buchanan, A.W., Meng, K.S., & Marks, I.M. (1996). What predicts improvement and compliance during the behavioral treatment of obsessive compulsive disorder? Anxiety, 2, 22-27.
Bulik, C.M., Carpenter, L.L., Kupfer, D.J., & Frank, E. (1990). Features associated with suicide attempts in recurrent major depression. Journal of Affective Disorders, 18, 29-37.
Campbell, T.L. (1987). Is screening for mental health problems worthwhile in family practice? An opposing view. Journal of Family Practice, 25, 184-187.
Carroll, B.J., Feinberg, M., Smouse, P.E., Rawson, S.G., & Greden, J.F.F. (1981). The Carroll Rating Scale for Depression: I. Development, reliability and validation. British Journal of Psychiatry, 138, 194-200.
Cox, B.J., Direnfeld, D.M., Swinson, R.P., & Norton, G.R. (1994). Suicidal ideation and suicide attempts in panic disorder and social phobia. American Journal of Psychiatry, 151, 882-887.
Demitrack, M.A., Faries, D., DeBrota, D., & Potter, W.Z. (1997). The problem of measurement error in multisite clinical trials. Psychopharmacology Bulletin, 33, 513.
Depression Guideline Panel. (1993). Depression in Primary Care: Vol. 2. Treatment of major depression. Clinical practice guidelines, number 5 (AHCPR Publication No. 93-0551). Rockville, MD: U.S. Department of Health and Human Services.
DiMascio, A., Weissman, M.M., Prusoff, B.A., Neu, C., Zwilling, M., & Klerman, G.L. (1979). Differential symptom reduction by drugs and psychotherapy in acute depression. Archives of General Psychiatry, 36, 1450-1456.
Dunlop, S.R., Dornseif, B.E., Wernicke, J.F., & Potvin, J.H. (1990). Pattern analysis shows beneficial effects of fluoxetine treatment in mild depression. Psychopharmacology Bulletin, 26, 173-180.
Edwards, A.L. (1970). The measurement of personality traits by scales and inventories. New York: Holt, Rinehart & Winston.
Edwards, B.C., Lambert, M.J., Moran, P.W., McCully, T., Smith, K.C., & Ellingson, A.G. (1984). A meta-analytic comparison of the Beck Depression Inventory and the Hamilton Rating Scale for Depression as measures of treatment outcome. British Journal of Clinical Psychology, 23, 93-99.
Elkin, I., Shea, M.T., Watkins, J.T., Imber, S.D., Sotsky, S.M., Collins, J.F., Glass, D.R., Pilkonis, P.A., Leber, W.R., Docherty, J.P., Fiester, S.J., & Parloff, M.B. (1989). National Institute of Mental Health treatment of depression collaborative research program. Archives of General Psychiatry, 46, 971-983.
Endicott, J., Cohen, J., Nee, J., Fleiss, J., & Sarantakos, S. (1981). Hamilton Depression Rating Scale: Extracted from regular and change versions of the schedule for affective disorders and schizophrenia. Archives of General Psychiatry, 38, 98-103.
Endicott, J., & Spitzer, R.L. (1978). A diagnostic interview: The schedule for affective disorders and schizophrenia. Archives of General Psychiatry, 35, 837-844.
Fairbairn, A.S., Wood, C.H., & Fletcher, C.M. (1959). Variability in answers to a questionnaire on respiratory symptoms. British Journal of Preventive and Social Medicine, 13, 175-193.
Fava, G.A., Kellner, R., Munari, F., & Pavan, L. (1982). The Hamilton Depression Rating Scale in normals and depressives: A cross cultural validation. Acta Psychiatrica Scandinavica, 66, 27-32.
Foa, E.B., Steketee, G.S., Grayson, J.B., & Doppelt, H.G. (1983). Treatment of obsessive-compulsives: When do we fail? In E. Foa & P.M.G. Emmelkamp (Eds.), Failures in behavior therapy (pp. 10-34). New York: Wiley.
Gibbons, R.D., Clark, D.C., & Kupfer, D.J. (1993). Exactly what does the Hamilton Depression Rating Scale measure? Journal of Psychiatric Research, 27, 259-273.
Greist, J.H., Gustafson, D.H., Stauss, F.F., Rowse, G.L., Laughren, T.P., & Chiles, J.A. (1973). A computer interview for suicide-risk prediction. American Journal of Psychiatry, 130, 1327-1332.
Greist, J.H., & Klein, M.H.
(1980). Computer programs for patients, clinicians, and researchers in psychiatry. In J.B. Sydowski, J.H. Johnson, & T.A. Williams (Eds.), Technology in mental health care delivery systems (pp. 161-181). Norwood, NJ: Ablex.
Guy, W. (1976). ECDEU assessment manual for psychopharmacology (rev. ed.). (Publication
No. ADM 76-338). Rockville, MD: National Institute of Mental Health, U.S. Department of Health, Education, and Welfare.
Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery and Psychiatry, 23, 56-62.
Hamilton, M. (1967). Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6, 278-296.
Hamilton, M. (1974). General problems of psychiatric rating scales (especially for depression). In P. Pichot (Ed.), Modern problems of pharmacopsychiatry: Vol. 7. Psychiatric measurements in psychopharmacology (pp. 125-138). Basel: Karger.
Hamilton, M. (1986). The Hamilton Rating Scale for Depression. In N. Sartorius & T.A. Ban (Eds.), Assessment of depression (pp. 143-152). Berlin: Springer-Verlag.
Hays, R.D., Wells, K.B., Sherbourne, C.D., Rogers, W., & Spritzer, K. (1995). Functioning and well-being outcomes of patients with depression compared with chronic general medical illnesses. Archives of General Psychiatry, 52, 11-19.
Hedlund, J.L., & Vieweg, B.W. (1979). The Hamilton Rating Scale for Depression: A comprehensive review. Journal of Operational Psychiatry, 10, 149-161.
Hofer, P.J., & Green, B.F. (1985). The challenge of competence and creativity in computerized psychological testing. Journal of Consulting and Clinical Psychology, 53, 826-838.
Hollon, S.D., Shelton, R.C., & Loosen, P.T. (1991). Cognitive therapy and pharmacotherapy for depression. Journal of Consulting and Clinical Psychology, 59, 88-99.
Hooijer, C., Zitman, F.G., Griez, E., van Tilburg, W., Willemse, A., & Dinkgreve, M.A.H.M. (1991). The Hamilton Depression Rating Scale (HDRS): Changes in scores as a function of training and version used. Journal of Affective Disorders, 22, 21-29.
Katon, W., Von Korff, M., Lin, E., Lipscomb, P., Russo, J., Wagner, E., & Polk, E. (1990). Distressed utilizers of medical care: DSM-III-R diagnoses and treatment needs. General Hospital Psychiatry, 12, 355-362.
Katzelnick, D.J., Kobak, K.A., Greist, J.H., Jefferson, J.W., & Henk, H.J. (1997). Effect of primary care treatment of depression on service use by patients with high medical expenditures. Psychiatric Services, 48, 59-64.
Katzelnick, D.J., Kobak, K.A., Greist, J.H., Jefferson, J.W., Mantle, J.M., & Serlin, R.C. (1995). Sertraline for social phobia: A double-blind, placebo-controlled crossover study. American Journal of Psychiatry, 152, 1368-1371.
Kim, K.I. (1977). Clinical study of primary depressive symptoms: Part 1. Adjustment of Hamilton's rating scale for depression. Neuropsychiatry, 16, 36-60.
Kobak, K.A., Greist, J.H., Jefferson, J.W., Katzelnick, D.J., & Schaettle, S.C. (1996, May). Computerized assessment in clinical drug trials. Paper presented at the National Institute of Mental Health, New Clinical Drug Evaluation Unit, 36th Annual Meeting, Boca Raton, FL.
Kobak, K.A., Greist, J.H., Jefferson, J.W., Reynolds, W.M., & Tollefson, G.D. (1994). The computer administered Hamilton Depression Rating Scale in a double-blind study of fluoxetine vs. imipramine in agitated depression. Unpublished manuscript.
Kobak, K.A., Reynolds, W.R., Rosenfeld, R., & Greist, J.H. (1990). Development and validation of a computer administered Hamilton Depression Rating Scale. Psychological Assessment, 2, 56-63.
Kobak, K.A., Schaettle, S., Katzelnick, D.J., & Simon, G. (1995). Guidelines for the Hamilton Depression Rating Scale: Modified for the depression in primary care study. Madison, WI: Dean Foundation.
Kobak, K.A., Taylor, L. vH., Dottl, S.L., Greist, J.H., Jefferson, J.W., Burroughs, D., Mantle, J.M., Katzelnick, D.J., Norton, R., Henk, H.J., & Serlin, R.C. (1997). A computer-administered telephone interview to identify mental disorders. Journal of the American Medical Association, 278, 905-910.
Kovacs, M., Rush, A.J., Beck, A.T., & Hollon, S.D. (1981). Depressed outpatients treated with cognitive therapy or pharmacotherapy: A 1-year follow-up.
Archives of General Psychiatry, 38, 33-39.
Lambert, M.J., Hatch, D.R., Kingston, M.D., & Edwards, B.C. (1986). Zung, Beck, and Hamilton Rating Scales as measures of treatment outcome: A meta-analytic comparison. Journal of Consulting and Clinical Psychology, 54, 54-59.
Lewinsohn, P.M., Antonuccio, D.O., Steinmetz, J.L., & Teri, L. (1984). The coping
with depression course: A psychoeducational intervention for unipolar depression. Eugene, OR: Castalia.
Liebowitz, M.R., Quitkin, F.M., Stewart, J.W., McGrath, P.J., Harrison, W., Markowitz, J.S., Rabkin, J.G., Tricamo, E., Goetz, D.M., & Klein, D.F. (1988). Antidepressant specificity in atypical depression. Archives of General Psychiatry, 45, 129-137.
Lucas, R.W., Mullin, P.J., Luna, C.B.X., & McInroy, D.C. (1977). Psychiatrists and a computer as interrogators of patients with alcohol-related illnesses: A comparison. British Journal of Psychiatry, 131, 160-167.
Maier, W., Philipp, M., & Gerken, A. (1985). Dimensions of the Hamilton Depression Scale. European Archives of Psychiatry and Neurological Sciences, 234, 417-422.
Miller, I.W., Bishop, S., Norman, W.H., & Maddever, H. (1985). The modified Hamilton Rating Scale for Depression: Reliability and validity. Psychiatry Research, 14, 131-142.
Minichiello, W.E., & Baer, L. (1994). Managed care: Our behavioral imperative. Behavior Therapist, 17, 22.
Montgomery, S.A., & Asberg, M. (1979). A new depression scale designed to be sensitive to change. British Journal of Psychiatry, 134, 382-389.
Nelson, J.C., Mazure, C., Quinlan, D.M., & Jatlow, P.I. (1984). Drug-responsive symptoms in melancholia. Archives of General Psychiatry, 41, 663-668.
Nelson, J.C., Mazure, C.M., & Jatlow, P.I. (1990). Does melancholia predict response in major depression? Journal of Affective Disorders, 18, 157-165.
Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcomes assessment (pp. 98-108). Hillsdale, NJ: Lawrence Erlbaum Associates.
NIMH Consensus Development Conference Statement. (1985). Mood disorders: Pharmacologic prevention of recurrences. American Journal of Psychiatry, 142, 469-476.
Paykel, E.S. (1979). Predictors of treatment response. In E.S. Paykel & A.
Coppen (Eds.), Psychopharmacology of affective disorders (pp. 193-220). Oxford, England: Oxford University Press. Petrie, K., & Abell, W. (1994). Responses of parasuicides to a computerized interview. Computers in Human Behavior, 10, 415-418. Price, L.H., Nelson, J.C., Charney, D.S., & Quinlan, D.M. (1984). The clinical utility of family history for the diagnosis of melancholia. Journal of Nervous and Mental Disease, 172, 5-11. Potts, M.K., Daniels, M., Burnam, M.A., & Wells, K.B. (1990). A structured interview version of the Hamilton Depression Rating Scale: Evidence of reliability and versatility of administration. Journal of Psychiatry Research, 24, 335-350. Ramos-Brieva, J.A., & Cordero-Villafafila, A. (1988). A new validation of the Hamilton Rating Scale for Depression. Journal of Psychiatric Research, 22, 21-28. Rehm, L.P., & O'Hara, M.W. (1985). Item characteristics of the Hamilton Rating Scale for Depression. Journal of Psychiatric Research, 19, 31-41. Reynolds, W.M. (1982). Development of reliable and valid short forms of the Marlowe-Crowne Social Desirability Scale. Journal of Clinical Psychology, 38, 119-125. Reynolds, W.M. (1991). Adult Suicidal Ideation Questionnaire: Professional manual. Odessa, FL: Psychological Assessment Resources. Reynolds, W.M., & Kobak, K.A. (1995a). Development and validation of the Hamilton Depression Inventory: A self-report version of the Hamilton Depression Rating Scale. Psychological Assessment, 7, 472-483. Reynolds, W.M., & Kobak, K.A. (1995b). Hamilton Depression Inventory: A self-report version of the Hamilton Depression Rating Scale. Professional manual. Odessa, FL: Psychological Assessment Resources. Reynolds, W.M., & Kobak, K.A. (1998). Reynolds Depression Screening Inventory: Professional manual. Odessa, FL: Psychological Assessment Resources. Reynolds, W.R., Kobak, K.A., & Greist, J.H. (1992a, August). Diagnostic utility of the Hamilton Depression Rating Scale. 
Paper presented at the annual meeting of the American Psychological Association, Washington, D.C. Reynolds, W.R., Kobak, K.A., & Greist, J.H. (1992b, June). Suicidal behavior in outpatients with panic disorder, obsessive compulsive disorder and major depression. Paper presented

< previous page

page_969

next page > Page 969

at the International Conference on Suicidal Behavior, Pittsburgh, PA. Reynolds, W.M., Kobak, K.A., & Greist, J.H. (1993, March). The Adult Suicidal Ideation Questionnaire: Psychometric characteristics with psychiatric outpatients. Paper presented at the annual meeting of the Society for Personality Assessment, San Francisco, CA. Reynolds, W.M., Kobak, K.A., Greist, J.H., Jefferson, J.W., & Tollefson, G.D. (1993, May). Fluoxetine versus imipramine: Changes in suicidal ideation. Paper presented at the annual meeting of the American Psychiatric Association, San Francisco, CA. Reynolds, W.M., & Mazza, J.J. (1992). Suicidal Behavior History Form: Clinician's guide. Odessa, FL: Psychological Assessment Resources. Riskind, J.H., Beck, A.T., Brown, G., & Steer, R. A. (1987). Taking the measure of anxiety and depression: Validity of the reconstructed Hamilton scales. Journal of Nervous and Mental Disease, 175, 474-479. Rosenberg, M. (1965). Society and the adolescent self-image. Princeton, NJ: Princeton University Press. Rush, A.J., Kovacs, M., Beck, A.T., Weissenburger, J., & Hollon, S.D. (1981). Differential effects of cognitive therapy and pharmacotherapy on depressive symptoms. Journal of Affective Disorders, 3, 221-229. Shapiro, S., Skinner, E.A., Kessler, L.G., Von Korff, M., German, P.S., Tischler, G.L., Leaf, P.J., Benham, L., Cottler, L., & Regier, D.A. (1984). Utilization of health and mental health services. Three epidemiologic catchment area sites. Archives of General Psychiatry, 41, 971-978. Skinner, H.A., & Allen, B.A. (1983). Does the computer make a difference? Computerized versus face-to-face versus self-report assessment of alcohol, drug, and tobacco use. Journal of Consulting and Clinical Psychology, 51, 267-275. Spitzer, R.L., Endicott, J., & Robins, E. (1978). Research diagnostic criteria: Rationale and reliability. Archives of General Psychiatry, 35, 773-782. Spitzer, R.L., Williams, J.B., Gibbon, M., & First, M.B. (1988). 
Structured Clinical Interview for DSM-III-R. New York: New York Psychiatric Institute. Teri, L., & Lewinsohn, P.M. (1982). Modification of the pleasant and unpleasant events schedules for use with the elderly. Journal of Consulting and Clinical Psychology, 50, 444-445. Thase, M.E., Carpenter, L., Kupfer, D.J., & Frank, E. (1991). Atypical depression: Diagnostic and pharmacologic controversies. Psychopharmacology Bulletin, 27, 17-22. Thase, M.E., Hersen, M., Bellack, A.S., Himmelhoch, J. M., & Kupfer, D.J. (1983). Validation of a Hamilton subscale for endogenomorphic depression. Journal of Affective Disorders, 5, 267-278. Weissman, M.M., Klerman, G.L., Markowitz, J. S., & Ouellette, R. (1989). Suicidal ideation and suicide attempts in panic disorder and attacks. New England Journal of Medicine, 321, 1209-1214. Whisman, M.A., Strosahl, K., Fruzzetti, A.E., Schmaling, K.B., Jacobson, N.S., & Miller, D. M. (1989). A structured interview version of the Hamilton Rating Scale for Depression: Reliability and validity. Psychological Assessment, 1, 238-241. Widmer, R.B., & Cadoret, R.J. (1979). Depression in family practice: Changes in pattern of patient visits and complaints during subsequent developing depression. Journal of Family Practice, 9, 1017-1021. Wiggins, J.S. (1973). Personality and prediction: Principles of personality assessment. Reading, MA: AddisonWesley. Williams, J.B.W. (1988). A structured interview guide for the Hamilton Depression Rating Scale. Archives of General Psychiatry, 45, 742-747. Zimmerman, M., Black, D.W., & Coryell, W. (1989). Diagnostic criteria for melancholia: The comparative validity of DSM-III and DSM-III-R. Archives of General Psychiatry, 46, 361-368. Zimmerman, M., Coryell, W., Pfohl, B., & Stangl, D. (1986). Validity of the Hamilton Endogenous Subscale: An independent replication. Psychiatry Research, 18, 209-215. Zung, W.W.K. (1965). A self-rating depression scale. Archives of General Psychiatry, 12, 63-70.

For Abby, Katie, and Shelby

Chapter 31
Beck Anxiety Inventory

Kimberly A. Wilson
Edwin De Beurs
Carleton A. Palmer
Dianne L. Chambless
University of North Carolina at Chapel Hill

Including a discussion of the Beck Anxiety Inventory (BAI; Beck, Epstein, Brown, & Steer, 1988) in a book devoted to psychological testing's contribution to clinical practice makes sense for several reasons. First, anxiety represents one of the most pervasive sets of symptoms seen in clinical work (Williams & Poling, 1989), and literature reviews demonstrate that the BAI is among the most common measures used in current research to assess this construct. Second, the BAI is a quick, inexpensive self-report measure, features that make its use particularly appealing in a field increasingly focused on cost-effectiveness. Third, although the BAI enjoys widespread use in research protocols, practical applications of the measure outside of this realm remain largely unstudied in the literature, and coherent examinations dedicated specifically to this topic are lacking. The goal of this chapter is to review relevant information on the BAI that addresses its ability to contribute effectively to the care of those who seek treatment for psychological problems.

Overview

Development

The BAI is a 21-item self-report measure designed by Beck and his colleagues to assess the severity of anxiety symptoms (Beck et al., 1988). It is composed of items drawn from the Anxiety Check List (ACL; Beck, Steer, & Brown, 1985), the Physicians' Desk Reference Check List (PDR; Beck, 1978), and the Situational Anxiety Check List (SAC; Beck, 1982). In developing this scale, the authors paid particular attention to minimizing any relation to depressive symptoms, because the overlap in scores found in previous measures of anxiety and depression has been problematic (Gotlib & Cane, 1989; Lipman, 1982; Snaith & Taylor, 1985).

The initial pool of 86 items was reduced by administering the ACL, PDR, and SAC to a sample of 810 depressed and anxious outpatients from the Center for Cognitive Therapy at the University of Pennsylvania. Items were eliminated if they were duplicates, were judged to be excessively similar, or were not supported by the principal factor analysis conducted following initial administration. The scale composed of the surviving 37 items was then completed by a new group of 116 outpatients. Data collected from this administration were subjected to correlational analysis with criterion measures and to comparisons between different diagnostic groups, and the results were used to further reduce the number of items. The resulting 21 items were then tested on a new sample of 160 outpatients and now constitute the BAI.

Available Norms

Individual norms for patients with panic disorder with and without agoraphobia, social phobia, obsessive-compulsive disorder, and generalized anxiety disorder are presented in the BAI manual (Beck & Steer, 1990). Results indicate that those with panic disorder tend to score higher than those with other anxiety disorders. Data from other studies support this finding (e.g., Fydrich, Dowdall, & Chambless, 1992). Later investigations also demonstrate that patients with anxiety disorders obtain higher means than those with mood disorders (Steer, Rissmiller, Ranieri, & Beck, 1993; but see Steer, Ranieri, Beck, & Clark, 1993) and those with no disorder (Borden, Peterson, & Jackson, 1991; Dent & Salkovskis, 1986; Gillis, Haaga, & Ford, 1995; A. Osman, Barrios, Aukes, J.R. Osman, & Markway, 1993) (see Table 31.1).

TABLE 31.1
BAI Means of Groups with and without Anxiety Disorders

Group                                M       SD      n     Notes     Citation
Panic Disorder with Agoraphobia      27.27   13.11   95    a         Beck & Steer (1990)
                                     28.29   14.29   49    a         Fydrich et al. (1992)
Panic Disorder without Agoraphobia   28.81   13.46   93    a         Beck & Steer (1990)
                                     22.22   12.21   11    a         Fydrich et al. (1992)
Social Phobia                        17.77   11.64   44    a         Beck & Steer (1990)
Obsessive-Compulsive Disorder        21.69   12.42   26    a         Beck & Steer (1990)
Generalized Anxiety Disorder         18.83    9.08   90    a         Beck & Steer (1990)
Mood Disorders                       19.44   11.43   118   b,c,d,e   Steer, Rissmiller, Ranieri, & Beck (1993)
                                     23.89   12.39   68    a,c,e     Hewitt & Norton (1993)
                                     12.81    8.39   16    a,d,e     Hewitt & Norton (1993)
No Disorder                          10.5     9.12   293   f         Borden, Peterson, & Jackson (1991)
                                      6.6     8.10   242   g         Gillis, Haaga, & Ford (1995)
                                     11.08    9.10   65    f         Dent & Salkovskis (1986)
                                      7.78    5.65   65    g         Dent & Salkovskis (1986)
                                     11.54   10.26   225   g         Osman et al. (1993)
                                     13.1     9.6    326   f,h       Creamer et al. (1995)
                                     11.8     9.2    326   f,i       Creamer et al. (1995)

Notes: a = Outpatient sample. b = Computer-assisted administration. c = Unipolar. d = Bipolar. e = Inpatient sample. f = Undergraduate student sample. g = Non-student community sample. h = Time 1 administration, low stress. i = Time 2 administration, high stress.
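
One way to use these norms is to express a patient's total score as a distance from a reference group's mean in standard deviation units. The Python sketch below illustrates that idea only. The means, SDs, and sample sizes are copied from Table 31.1, but the names `NormGroup`, `NORMS`, and `z_score`, and the choice of three groups, are hypothetical. A z-score is offered here as a rough descriptive aid under the assumption that the group SD is a reasonable yardstick; it is not a procedure taken from the BAI manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormGroup:
    """Summary statistics for one reference sample (hypothetical container)."""
    label: str
    mean: float
    sd: float
    n: int


# Values copied from Table 31.1; the dictionary keys are illustrative names.
NORMS = {
    "panic_with_agoraphobia": NormGroup(
        "Panic disorder with agoraphobia (Beck & Steer, 1990)", 27.27, 13.11, 95),
    "generalized_anxiety": NormGroup(
        "Generalized anxiety disorder (Beck & Steer, 1990)", 18.83, 9.08, 90),
    "community": NormGroup(
        "No disorder, community sample (Gillis et al., 1995)", 6.6, 8.10, 242),
}


def z_score(total: float, group: NormGroup) -> float:
    """Distance of a BAI total from the group mean, in group SD units."""
    return (total - group.mean) / group.sd


if __name__ == "__main__":
    patient_total = 20  # hypothetical patient score
    for group in NORMS.values():
        print(f"{group.label}: z = {z_score(patient_total, group):+.2f}")
```

For a hypothetical total of 20, the score sits roughly 1.65 SDs above the community mean reported by Gillis et al. (1995) and slightly above the mean for generalized anxiety disorder, which is the kind of statement a clinician might make when describing where a patient falls relative to the groups in the table.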

Psychometric Properties

Reliability. Several studies have demonstrated that the BAI serves as a reliable measure (e.g., Fydrich et al., 1992). The internal consistency of the instrument appears to be quite high, with alphas ranging from .90 to .94 in both clinical and nonclinical samples at a variety of developmental stages (Beck et al., 1988; Borden et al., 1991; de Beurs, Wilson, Chambless, Goldstein, & Feske, 1997; Fydrich et al., 1992; Hewitt & Norton, 1993; Jolly, Aruffo, Wherry, & Livingston, 1993; Kabacoff, Segal, Hersen, & Van Hasselt, 1997; Osman et al., 1993; Steer, Kumar, Ranieri, & Beck, 1995; Steer, Ranieri et al., 1993). Analysis of test-retest reliability over a 7- to 11-day period yielded correlations of .67 (Fydrich et al., 1992) and .75 (Beck et al., 1988); de Beurs et al. (1997) reported high test-retest reliability (r = .83) and stability over a 1-month period.

Validity. Support for the convergent validity of the BAI has been demonstrated in adult clinical populations (e.g., Beck et al., 1988; Beck & Steer, 1991; de Beurs et al., 1997; Fydrich et al., 1992; Steer et al., 1995; Steer, Ranieri et al., 1993), adolescent psychiatric patients (Jolly et al., 1993; Steer et al., 1995), older adult psychiatric patients (Kabacoff et al., 1997), and community samples (Borden et al., 1991; Creamer, Foran, & Bell, 1995). Results from studies of adult clinical populations produced moderate to high correlations between the BAI and anxiety diaries (r = .54-.56; de Beurs et al., 1997; Fydrich et al., 1992), the Hamilton Rating Scale for Anxiety (Hamilton, 1959; r = .51; Beck et al., 1988), the Cognition Checklist Anxiety Subscale (Beck, Brown, Steer, Eidelson, & Riskind, 1987; r = .51; Beck et al., 1988), the State-Trait Anxiety Inventory (STAI; Spielberger, 1983; r = .47-.58; Fydrich et al., 1992; Kabacoff et al., 1997), the anxiety subscale of the SCL-90-R (Derogatis, 1983; r = .81; Steer, Ranieri et al., 1993), and the anxiety subscale of the Brief Symptom Inventory (Derogatis, 1975; r = .78; de Beurs et al., 1997).

The content validity of the BAI is more questionable. Although the included items (e.g., nervous, shaky, unable to relax) accurately depict the construct of anxiety, they do not cover it fully. For example, cognitive components of anxiety, such as heightened self-focus and fear of social harm, are not substantially represented in the scale. Additionally, symptoms of anxiety such as restlessness and fatigue are not included in the inventory, possibly to avoid overlap with depressive symptoms.

Support for the divergent validity of the measure has been equivocal, particularly as it relates to depression. This limitation appears especially problematic because the BAI was designed to assess the severity of anxious symptoms in a way that would minimize any relation to depression. Although the BAI is generally more successful than other self-report anxiety instruments (e.g., the STAI; Fydrich et al., 1992) in discriminating between anxious and depressed patients, correlations between the BAI and measures of depression often run quite high. For example, Steer, Ranieri et al.
(1993) found that, in a sample of outpatients with various diagnoses, the BAI correlated .61 with the Beck Depression Inventory (BDI; Beck & Steer, 1987) and .62 with the depression subscale of the SCL-90-R (Derogatis, 1983), although the correlation with the anxiety subscale of the SCL-90-R was even higher (r = .81). Furthermore, the mean BAI score for the anxious group (M = 20.66, SD = 12.65) did not significantly differ from the mean BAI score for the depressed group (M = 20.78, SD = 12.55). However, comorbidity of anxiety and depression is extensive, and the authors suggest that insufficient attention to diagnostic overlap may account for these findings. Other studies have generally yielded lower yet substantial correlations between the BAI and the BDI, with indices ranging from .45 to .63 (Beck et al., 1988; de Beurs et al., 1997; Fydrich et al., 1992; Hewitt & Norton, 1993).

Although high correlations between measures of anxiety and depression initially appear troubling, this consistent finding may result from a genuine overlap between anxiety and depression. According to some researchers (Clark, Steer, & Beck, 1994; Watson & Kendall, 1989), both mood states are characterized by the same underlying construct of negative affectivity. This concept is defined as the tendency to feel worry, disgust, sadness, and self-dissatisfaction, characteristics that consistently load onto a single factor when analyzed within the full spectrum of emotional states. The difference between anxiety and depression lies in the presence of neurophysiological arousal in anxious patients and the absence of positive affectivity (e.g., joy, delight, enthusiasm, and energy) in depressed patients. Because current measures of depression do not include components of positive affectivity, and thus omit items that might capture a depressive state more distinctly, high correlations between instruments designed to assess anxiety and depression are to be expected. Despite these high correlations, items from the BAI and items from the BDI consistently load on two separate factors (Beck et al., 1988; Hewitt & Norton, 1993), indicating that the two measures do indeed assess unique constructs to some degree. Therefore, although evidence exists for both conceptual and measurement overlap, anxiety and depression as measured by the BAI and BDI are distinct entities.

Interpretive Strategy

Each of the 21 items of the BAI is scored from 0 ("not at all bothered") to 3 ("severely bothered"), yielding a total sum score that can range from 0 to 63.
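
The scoring rule just described is simple enough to sketch in a few lines of Python. The fragment below is a minimal illustration, not an official scoring program: the function name `bai_total` and the constants are hypothetical, and the decision to reject incomplete or out-of-range protocols (rather than prorate missing items) is an assumption made here for simplicity.

```python
N_ITEMS = 21                    # the BAI has 21 items
MIN_RATING, MAX_RATING = 0, 3   # each item is rated from 0 to 3


def bai_total(ratings):
    """Sum 21 item ratings (each an integer 0-3) into a total from 0 to 63.

    Assumption: a protocol with missing or invalid ratings is rejected
    instead of being prorated.
    """
    ratings = list(ratings)
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    for position, rating in enumerate(ratings, start=1):
        is_int = isinstance(rating, int) and not isinstance(rating, bool)
        if not is_int or not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"item {position}: invalid rating {rating!r}")
    return sum(ratings)


if __name__ == "__main__":
    example = [1] * 10 + [2] * 11   # hypothetical response pattern
    print(bai_total(example))
```

Validating the number of items and the range of each rating before summing guards against the most common clerical errors when questionnaires are keyed in by hand.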
According to the manual (Beck & Steer, 1990), this score can be interpreted as follows: 0 to 9 reflects normal anxiety, 10 to 18 indicates mild to moderate anxiety, 19 to 29 is considered moderate to severe anxiety, and 30 to 63 suggests severe anxiety. Although this interpretive strategy is recommended, it is unclear how these cutoff scores were derived and to what extent they accurately reflect the intensity levels described. Clinical experience indicates that anxious patients scoring in the 20 to 29 range are quite intensely anxious and that scores of 30 or above are seldom seen.

The BAI manual recommends that additional factors be considered when evaluating sum scores of the BAI. Some studies (Borden et al., 1991; Hewitt & Norton, 1993; Osman et al., 1993; Steer, Ranieri et al., 1993) have found that means for women are up to five points higher than means for men, a finding that suggests that gender might serve as a relevant factor in the evaluation of BAI scores. However, this finding is likely neither artifactual nor limited to the BAI, given that women generally report more anxiety and have higher rates of anxiety disorders than men (Kessler et al., 1994). Age is also listed as an important demographic characteristic to consider when evaluating the BAI. Older patients do tend to earn lower scores than younger patients (Gillis et al., 1995), but this trend is also consistent with findings on the prevalence of anxiety disorders across the life span (Kessler et al., 1994).

In addition to examination of sum scores of the BAI, factor-analyzed clusters of items may be worth evaluating to obtain more specific information about the patient's anxiety symptoms. For example, Beck et al. (1988) reported that the instrument consisted of four factors: neurophysiological, subjective, panic, and autonomic symptoms. This finding has been replicated in at least two studies (Osman et al., 1993; Steer, Ranieri et al., 1993). Other investigations (Beck et al., 1988; Hewitt & Norton, 1993; Kabacoff et al., 1997; Steer, Rissmiller et al., 1993) have yielded a two-factor solution, comprising subjective/cognitive complaints and somatic complaints. The breakdown of these factors is presented in Table 31.2.

Use of the BAI for Treatment Planning

General Issues in Treatment Planning

The BAI was designed to assess the severity of self-reported anxiety among patients seeking treatment from mental health care providers. Therefore, use of the instrument for treatment planning could potentially include: initial screening to identify whether anxiety is a relevant problem area for the patient; assessment of whether the severity is sufficiently above normal to warrant focusing on anxiety treatment and, if so, the extent of the severity; identification of the predominant features of the person's experience of anxiety (e.g., subjective, such as fear of losing control and being terrified, vs. somatic, such as heart pounding and hands trembling); and recognition of the presence of comorbid anxiety if other disorders are suspected to be primary.

The BAI may serve as a cost-effective method to accomplish these goals. It is a short measure that requires approximately 8 minutes to complete (Demos & Prout, 1994). Therefore, the instrument could be used as a quick screen to identify whether anxiety is a problem worth evaluating further. In addition, given the strong support for the measure's convergent validity and reliability, clinicians need not sacrifice psychometric soundness for this efficiency. Another example of the cost-effectiveness of the BAI involves the time required of the clinician. Although trained professionals must interpret the results, clerical staff can easily administer and score the BAI with minimal supervision and training. Another strong point of the BAI is its simplicity.
The instructions and items are short and easily understood, and the format is straightforward for the respondent. These strengths are particularly important for clients who are in distress and are asked to complete long questionnaires about their complaints. Use of the BAI eases this burden and increases the likelihood that information will be gathered with sufficient accuracy.

Consideration of factor scores from the BAI may assist the clinician in planning effective treatment. Although the literature does not provide consistent empirical support for the therapeutic value of matching treatment to symptom pattern (e.g., using relaxation to treat a patient whose somatic symptoms are most salient), this approach may have superior face validity in the clients' view and thus enhance their confidence in the treatment.

Potential Problems in the Use of the BAI for Treatment Planning

Although the BAI purports to assess anxiety, and convergent validity initially appears to be more than adequate, questions remain about precisely what the BAI measures. As described earlier, conceptual and measurement overlap with depression instruments suggests that the BAI would be an unsuitable measure for differential diagnosis with depression. In addition, debate has recently ensued on whether the BAI fails to assess symptoms beyond those of panic disorder. Potentially, then, use of the BAI to plan a treatment strategy may be misleading if the clinician assumes that the results of the administration accurately describe the construct of anxiety for all groups.

TABLE 31.2
Factor Solutions

Osman et al. (1993) and Steer, Ranieri, et al. (1993)
Factor 1, Panic: Choking; Breathing; Fear of dying; Heart pounding¹
Factor 2, Subjective: Nervous; Scared; Worst happening; Losing control; Unable to relax; Terrified
Factor 3, Neurophysiological: Shaky; Unsteady; Wobbliness; Faint; Dizzy; Numbness; Trembling
Factor 4, Autonomic: Face flushed; Feeling hot; Sweating; Indigestion¹

Kabacoff et al. (1997)
Factor 1, Somatic: Numbness; Feeling hot; Wobbliness; Dizzy; Heart pounding; Unsteady; Choking; Trembling; Shaky; Difficulty breathing; Indigestion; Faint; Face flushed; Sweating
Factor 2, Subjective: Unable to relax; Terrified; Nervous; Fear of losing control; Fear of dying; Scared

Hewitt & Norton (1993)
Factor 1, Cognitive: Terrified; Scared; Losing control; Worst happening; Nervous; Unable to relax; Shaky; Hands trembling; Fear of dying
Factor 2, Somatic: Face flushed; Feeling hot; Numbness or tingling; Dizzy or lightheaded; Wobbliness in legs; Sweating; Heart pounding; Unsteady; Difficulty breathing; Faint; Indigestion; Choking

Beck et al. (1988)
Factor 1, Cognitive: Terrified; Losing control; Worst happening; Nervous; Unable to relax; Shaky; Choking; Fear of dying
Factor 2, Somatic: Face flushed; Feeling hot; Numbness or tingling; Dizzy or lightheaded; Wobbliness in legs; Sweating; Heart pounding; Unsteady
Listed without a clear factor assignment: Difficulty breathing; Scared; Indigestion; Faint; Hands trembling; Shaky

¹ "Heart pounding" and "Indigestion" did not significantly load onto any of the four factors in Steer, Ranieri, et al. (1993).

Recently, some researchers have begun to critique the BAI for its excessive emphasis on panic symptoms to the exclusion of general anxiety. Specifically, a debate developed in a recent issue of Behaviour Research and Therapy in which Cox and colleagues proposed that the BAI is excessively "panic-centric" (Cox, Cohen, Direnfeld, & Swinson, 1996). In this discussion, they claimed that most of the items in the BAI directly mimic the criteria for panic attacks outlined in the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; American Psychiatric Association [APA], 1994), neglect other relevant symptoms of anxiety such as fatigue and restlessness, and fail to capture the class of anxiety symptoms experienced by those with other anxiety disorders such as generalized anxiety disorder and specific phobias. As partial support for their contention, Cox and colleagues cited the finding that panic disorder patients consistently score higher on the BAI than patients with other anxiety disorders. They also provided results from a factor analysis that found subsets of items from the BAI and the Panic Attack Questionnaire (PAQ; Norton, Dorward, & Cox, 1986) consistently loading onto the same factors.

Steer and Beck (1996) countered these arguments by demonstrating that panic disorder samples also obtain higher ratings of generalized anxiety when the Revised Hamilton Anxiety Rating Scale (Riskind, Beck, Brown, & Steer, 1987) is used to assess these symptoms. They concluded from this finding that the higher scores obtained on the BAI by patients with panic disorder are not spurious, but reflect genuinely heightened anxiety experienced by patients in this group compared to those with other anxiety disorders.
In addition, they explained that although many of the items in the BAI are identical to criteria for panic attacks, several of the items are also duplicates of criteria for generalized anxiety disorder (GAD). It should be noted, however, that the GAD criteria they cited derive from the revised third edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-III-R; APA, 1987) rather than the more current DSM-IV (APA, 1994). In fact, the latest version of the DSM has deleted most of the items from the GAD category that were represented in the BAI in order to improve differential diagnosis between panic disorder and GAD.

Examination of the instructions for the BAI also supports the conclusions of Cox et al. (1996) and suggests that the BAI may excessively tap symptoms of panic disorder. Respondents are directed to indicate the degree to which they "have been bothered by each symptom," rather than the degree to which they have experienced each symptom. This distinction appears subtle, but the difference may be an important one. A critical feature of patients with panic disorder, one that sets them apart from those with other anxiety disorders, is fear of fear. Data indicate that patients with panic disorder score higher on measures that assess fear of body sensations than do those with GAD, obsessive-compulsive disorder (OCD), and social phobia (Chambless & Gracely, 1989). Therefore, querying the extent to which respondents are bothered by various symptoms (rather than the extent to which they experience those symptoms) may tap fear of fear and, as a result, be most applicable to patients with panic disorder. Only one study to date appears to address this question (de Beurs et al., 1997), though indirectly.
In a principal components factor analysis, de Beurs and colleagues found that the BAI loaded on a separate factor from the Body Sensations Questionnaire (Chambless, Caputo, Bright, & Gallagher, 1984), a measure designed to assess fear of somatic complaints frequently associated with panic. These results imply that the instructions of the BAI do not lead respondents to anchor their answers to fear of the symptoms but, rather, to the presence of the symptoms themselves. Nevertheless, this question may deserve further exploration given that this study is the only investigation to address the issue.

In sum, it is possible that results from the BAI better represent the degree of panic-related anxiety experienced by the patient than the degree of anxiety in general. As a consequence, patients without panic disorder may appear less symptomatic than they truly are. Furthermore, regardless of diagnostic category, the specific problem experienced by the patient may be misidentified if the clinician interprets the results to indicate general anxiety as opposed to symptoms experienced by those with panic disorder. Future research will, it is hoped, address this issue, because data are inconclusive and the debate is currently far from resolved.

Use of the BAI with Other Evaluation Data

Given the narrow focus of the BAI, additional scales ought to be considered to add to data obtained from the BAI and to define more thoroughly the presenting problems of the patient. When the patient's primary problem is an anxiety disorder, the BAI should be supplemented by other measures, for example: the Mobility Inventory for agoraphobia (Chambless, Caputo, Jasin, Gracely, & Williams, 1985); the Padua Inventory for obsessive-compulsive disorder (Sanavio, 1988); the Penn State Worry Questionnaire for generalized anxiety disorder (Meyer, Miller, Metzger, & Borkovec, 1990); the Social Phobia and Anxiety Inventory for social phobia (Turner, Beidel, Dancu, & Stanley, 1989); the PTSD checklist for posttraumatic stress disorder (Blanchard, Jones-Alexander, Buckley, & Forneris, 1996); or target ratings for specific phobias (Marks & Mathews, 1979). Structured interviews, though more time consuming, also can augment questionnaire findings.
These include the Anxiety Disorders Interview Schedule-Revised (DiNardo & Barlow, 1988), the Hamilton Anxiety Rating Scale-Revised (Hamilton, 1959), the Panic Disorder Symptom Severity Scale (Shear et al., 1997), the Yale-Brown Obsessive-Compulsive Scale (Goodman, Price, Rasmussen, & Mazure, 1989), and the clinician-administered PTSD scale (Blake, Weathers, Nagy, & Kaloupek, 1995).

Provision of Feedback

Findings from the BAI can be used to provide feedback to both the patient and third-party payers. Specifically, clinicians can compare scores obtained on the measure with normative means from community samples or from those with anxiety disorders (see Table 31.1) to describe to patients where their scores fall within the full range of anxiety severity. Furthermore, patients may benefit from analysis of the factors within the scale to clarify the patterns of their symptoms along with the subsequent implications for the proper course of treatment. Additionally, data from the BAI can help justify treatment to insurance providers by documenting the presence and severity of a treatable problem.

Use of the BAI for Treatment Monitoring

The BAI may serve as a quick and effective method to track progress throughout treatment. Because the measure requires little time to complete, it can be administered at the beginning of each session without demanding much effort from the patient or clinician.

In addition, the 1-week time frame referenced in the instructions is an appropriate and useful period to query, which is a further advantage of the measure. Data gathered from periodic completion of the questionnaire, in conjunction with scores from other pertinent measures, can inform not only the patient and clinician of the gains and setbacks over time, but also third-party payers, who often request documentation of the effects of treatment. Scores from the BAI along with those from other instruments may serve as justification for why continued therapy is recommended or why the treatment is expected to lead to positive change. For example, a significant and stable reduction in a given patient's scores on the subjective subscale may indicate progress made in the cognitive domain, and her continued heightened scores on the somatic factor may suggest that further treatment is warranted to specifically address that problem area. This message becomes more compelling if, for instance, the Body Sensations Questionnaire (Chambless et al., 1984) is used to corroborate the finding that the patient continues to struggle with a fear of physical sensations associated with panic. Furthermore, consistent use of the BAI to track the patient's fluctuations throughout therapy not only may suggest the appropriate content area to focus on, but also can indicate when a particular treatment approach is not working and requires rethinking (see Case Study later).

Use of the Instrument for Treatment Outcomes Assessment

An expert panel from the National Institute of Mental Health devised a set of 11 criteria for clinicians to evaluate the merit of instruments in assessing treatment outcome (Newman & Ciarlo, 1994). Here, those criteria are followed in an analysis of the strengths and weaknesses of the BAI in effectively fulfilling that role.
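
Session-by-session monitoring of the kind described under Treatment Monitoring can be illustrated with a short Python sketch. It maps each total score onto the interpretive ranges given in the manual (0 to 9, 10 to 18, 19 to 29, and 30 to 63; Beck & Steer, 1990) and reports the change from the first administration. The function names `severity_band` and `summarize_progress` and the session totals are hypothetical, and a raw difference from baseline is only a descriptive summary, not a test of whether a change is statistically or clinically significant.

```python
# Interpretive ranges as stated in the BAI manual (Beck & Steer, 1990).
BANDS = [
    (0, 9, "normal"),
    (10, 18, "mild to moderate"),
    (19, 29, "moderate to severe"),
    (30, 63, "severe"),
]


def severity_band(total):
    """Return the manual's descriptive label for a BAI total score (0-63)."""
    for low, high, label in BANDS:
        if low <= total <= high:
            return label
    raise ValueError(f"total score out of range: {total!r}")


def summarize_progress(session_totals):
    """List each session's total, band, and change from the first session."""
    if not session_totals:
        raise ValueError("at least one session total is required")
    baseline = session_totals[0]
    return [
        {
            "session": number,
            "total": total,
            "band": severity_band(total),
            "change_from_baseline": total - baseline,
        }
        for number, total in enumerate(session_totals, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical totals from three consecutive administrations.
    for row in summarize_progress([31, 24, 17]):
        print(row)
```

A table of this kind, kept alongside scores from other measures, is one simple way to document gains and setbacks for the patient, the clinician, and third-party payers.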
Criterion 1: Relevance to Target Group

The extent to which the BAI effectively measures relevant constructs for various groups remains in question. As already described, items on the BAI may overrepresent symptoms experienced in panic disorder and omit items that are more relevant for those with other anxiety disorders. Therefore, the measure appears to be best suited for clients diagnosed with panic disorder, particularly if the assessment question focuses on the panic itself rather than the accompanying general anxiety. However, norms are available for various samples representing a variety of anxiety disorder and nonclinical populations, facilitating use with a broader range of clients. Administration of the BAI need not be modified to accommodate different groups. The procedure for completing the questionnaire is straightforward and standardized, enhancing the consistency with which scores can be compared. The measure is brief enough that it could be administered orally to clients whose ability to read is limited.

Criterion 2: Simple, Teachable Methods

The simplicity of administration of the BAI is a great strength of the measure. The format for completing the questionnaire is clear and simple, and the individual items are short, as is the instrument as a whole. The answer key remains the same for every item, facilitating comparison among items as well as the ease and speed with which the measure can be filled out. Minimally trained clerical staff can score the instrument in less than 5 minutes. The manual is comprehensive and thoroughly explains the process of interpretation. The BAI could easily be administered by computer if one wanted to develop a program (e.g., Steer, Rissmiller et al., 1993).

Criterion 3: Use of Measures with Objective Referents

Items on the BAI query how much the respondent has been bothered by specific behavioral and cognitive symptoms in the past week. The content and presentation of these items are the same for all those who complete the questionnaire and are not modified to accommodate the specific complaints of a given client. Therefore, standardization of the measure and generalizability of the findings are enhanced. This criterion is fully met by the BAI.

Criterion 4: Use of Multiple Respondents

The BAI is designed to be completed by individuals presenting with problems of anxiety. Many of the items tap constructs that would be difficult for outside observers to rate (e.g., fear of losing control, heart pounding), necessitating the self-report format. Clinician ratings and interviews can be used to supplement findings from the BAI and serve as concurrent validation of the client's status.

Criterion 5: More Process-Identifying Outcome Measures

The BAI has limited utility as a measure of the process of change with treatment.

Criterion 6: Psychometric Strengths

As already described, the BAI possesses strong psychometric qualities. Indices of test-retest reliability and internal consistency are high, and have been demonstrated in several studies with various samples from clinical and nonclinical populations. In addition, convergent validity has been well established with high correlations between the BAI and other measures of anxiety.
In support of divergent validity, these correlations consistently exceed those between the BAI and measures of depression, though the latter correlations still run high. One study has demonstrated the BAI's sensitivity to treatment effects, with a pre- to posttest change effect size comparable to other measures of anxiety such as the Brief Symptom Inventory (BSI) Anxiety subscale and panic diaries (de Beurs et al., 1997).

Criterion 7: Low Measure Costs Relative to Its Uses

The BAI appears to serve as a cost-effective instrument for treatment outcome evaluation. The materials are inexpensive, administration and scoring are quick, and the requisite training for scoring and interpretation is minimal. The economic benefits are numerous. For example, use of the BAI to monitor progress can indicate when problems arise, reduce session time wasted on ineffective treatment, and allow for shifts in focus that may be more therapeutic. Furthermore, discussion of findings can serve as an efficient way to communicate the results of therapy to clinicians, clients, families, and third-party payers.

Criterion 8: Understanding by Nonprofessional Audiences

Results produced by the BAI are easily interpretable by consumers, families, and larger systems such as insurance companies and regulating agencies. The manual provides ranges of scores that represent normal, moderate, and severe anxiety, although the reasoning behind the cutoff decisions is unclear. To further evaluate the meaning of BAI scores, norms are available to facilitate comparison among various clinical and nonclinical groups.

Criterion 9: Easy Feedback and Uncomplicated Interpretation

Unlike other assessment measures, the BAI has no accompanying scoring program that automatically produces reports. However, the simplicity of the procedure for scoring and interpreting the results may make this luxury unnecessary.

Criterion 10: Useful in Clinical Services

Use of the BAI facilitates clinical communication and decision making. Test results help to assess the severity of specific complaints and can identify those in need of treatment for anxiety. Furthermore, with frequent administration, findings can assist in treatment planning and monitoring and serve as documentation of treatment outcome that can be communicated easily to patients, their families, insurance companies, and other service providers.

Criterion 11: Compatibility with Clinical Theories and Practices

The BAI was developed using an outpatient sample, making the instrument well-suited to this population. It has since been normed on a number of diverse populations, with samples that include inpatients, outpatients, and community groups that range in age from adolescence to older adulthood. This broad distribution enhances the generalizability of the use of the instrument.
Using the BAI to Evaluate Endstate Functioning

Jacobson and colleagues (Jacobson & Revenstorf, 1988; Jacobson & Truax, 1991) proposed a system for evaluating endstate functioning in a clinically meaningful way. Patients can be categorized into one of four levels of posttreatment functioning: reliably deteriorated, unchanged, reliably improved, and recovered. Allocation is based on calculations using statistical criteria that include reliability coefficients of valid instruments as well as established norms and variance. To be labeled reliably improved, a patient must show change on a measure beyond what could be expected by chance alone, given the measurement error of the instrument. For recovery, the criteria are even more rigorous. For a patient to be considered recovered, two criteria must be met: The patient must be reliably improved and must also score below a cutoff point that distinguishes dysfunctional from functional populations (Criterion c). The cutoff point is the score halfway between the variance-corrected means for the normal population and patients.

The following analysis examines the utility of the BAI in applying Jacobson's criteria for assessment of patients with panic disorder. Because men consistently score lower on the BAI than women, reliable improvement and recovery indices were calculated separately for each sex. Using the data on reliability from Beck and Steer (1990) and variance scores from de Beurs et al. (1997), the required reliable change index score of 1.96 (p < .05) translates into a required reduction of at least 15 points for women and 17 points for men for demonstrating change. On average, this change necessitates nearly one scale point reduction per item, a stringent criterion to pass.

Data from de Beurs et al.'s (1997) study with panic disorder patients and community norms from Osman et al. (1993) were used to calculate the cutoff points for recovery. The cutoff point c on the BAI for patients with panic disorder with agoraphobia is 18.4 for women and 15.3 for men. Thus, female patients (formerly) diagnosed with panic disorder with agoraphobia are more likely to come from the functional population than the dysfunctional population if they obtain scores on the BAI of 18 or less. For male patients, this score is 15 or less.
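The two statistics described above can be sketched in a few lines of Python. This is a minimal illustration of the Jacobson and Truax (1991) formulas, not a reproduction of the chapter's calculations: the function names are invented for this sketch, and the means, standard deviations, and reliability passed in at the bottom are placeholder values, not the Beck and Steer (1990), de Beurs et al. (1997), or Osman et al. (1993) figures, which are not reported in this passage.

```python
import math


def reliable_change_threshold(sd_pre, reliability, z=1.96):
    """Smallest raw-score change that exceeds measurement error at the given z.

    Standard error of measurement: sd_pre * sqrt(1 - reliability).
    Standard error of a difference score: sqrt(2) * standard error of measurement.
    """
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)
    se_difference = math.sqrt(2.0) * se_measurement
    return z * se_difference


def cutoff_c(mean_clinical, sd_clinical, mean_normal, sd_normal):
    """Cutoff c: the point between the two means, weighted by the two SDs."""
    return (sd_normal * mean_clinical + sd_clinical * mean_normal) / (
        sd_clinical + sd_normal
    )


# Placeholder inputs chosen only to exercise the formulas (assumptions,
# not values taken from the BAI manual or the cited studies).
threshold = reliable_change_threshold(sd_pre=10.0, reliability=0.75)
cutoff = cutoff_c(mean_clinical=30.0, sd_clinical=12.0,
                  mean_normal=10.0, sd_normal=8.0)

print(f"required change: {threshold:.1f} points")
print(f"cutoff c: {cutoff:.1f}")
```

Because the threshold grows with the pretreatment standard deviation and shrinks as reliability rises, a wide score distribution combined with moderate retest reliability produces large required reductions of the kind reported above.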
Figures 31.1 and 31.2 depict the possible outcomes of endstate functioning when various combinations of pretreatment and posttreatment BAI scores are considered for women and men, respectively, diagnosed with panic disorder with agoraphobia. The diagonal area represents uncertainty due to the measurement error of the instrument. Combinations of scores above or below that area demonstrate change (for better or worse) beyond that which can be expected from chance fluctuations. If the posttest score falls to the right of the uncertainty band, below the diagonal, reliable improvement has been attained. Furthermore, scores that fall within this area and below the cutoff point that distinguishes panic disorder populations from normal populations (the horizontal line) indicate that the patient may be labeled recovered.

Application of the Jacobson criteria with the BAI proves problematic. The standard error of measurement of the BAI is quite large, which makes the instrument imprecise for assessment of clinically meaningful change in individual cases. This point is illustrated using data from an ongoing study in the clinic. Patients who suffered from panic disorder with agoraphobia received six sessions of an experimental treatment, either eye movement desensitization and reprocessing (EMDR; Shapiro, 1995) or associative therapy and relaxation treatment (ART). Patients completed a battery of self-report measures, which included the BAI, before and after treatment. Gains achieved on the BAI through these brief experimental treatments are depicted in Table 31.3, which shows the number and proportion of patients who were unchanged, reliably changed, and recovered.

The first criterion of reliable improvement appears to be particularly stringent. A shift in BAI score of at least 15 points is attained by only 21% of the sample, even though, following treatment, a third of the sample earned scores below 10 on the BAI, that is, normal levels of anxiety according to the BAI manual (Beck & Steer, 1990). This failure to meet criterion results when pretreatment scores are not sufficiently high to leave room for the dramatic change required to support reliable improvement, given the wide band of measurement error. Five brief case presentations illustrate these problems:

Fig. 31.1. Endstate functioning according to Jacobson criteria for women diagnosed with panic disorder with agoraphobia.

Case 1: This patient had a pretreatment score of 35 on the BAI, showing that she was quite anxious before treatment, a finding corroborated by high scores on other measures included in the assessment battery. Furthermore, her pretest score was sufficiently high to leave room for improvement. She responded quite well to EMDR and obtained a posttest score of 8 on the BAI. The reduction of 27 points indicated she had reliably improved, and her low posttest score was well within the range of the functional population. She met criteria for clinically significant change.

Case 2: This patient earned a pretest score of 47 on the BAI, and a posttest score of 27 following ART. The reduction of 20 scale points documented statistically reliable improvement. However, her posttest score of 27 was well within the range of the dysfunctional population, suggesting she was still quite anxious. She met criterion for reliable change but is not considered to be recovered.

Case 3: This patient scored 26 on the BAI at pretest, and 12 after ART. Although her posttest score was well within the range of the functional population, the reduction fell one point short of reliable improvement, and thus she was not deemed reliably changed. In Fig. 31.1, her score falls within the middle band, indicating uncertainty due to the measurement error of the BAI.

Case 4: This patient obtained a pretest BAI score of 27 and a posttest score of 13. Her pretest score is very close to the average score of PDA patients and can therefore be considered typical. Her posttest score shows some residual anxiety, but is well within the range of the normal population. However, the Jacobson criteria label her unchanged because her reduction in anxiety falls one point short of that required by the reliable change index. Like Case 3, her score falls within the middle band of uncertainty in Fig. 31.1.
Fig. 31.2. Endstate functioning according to Jacobson criteria for men diagnosed with panic disorder with agoraphobia.

TABLE 31.3
Proportion of Panic Disorder with Agoraphobia Patients with Various Levels of Endstate Functioning Following Experimental Treatments

                         Men            Women
                       n     %        n     %
Unchanged              4    80       23    79
Reliably changed       1    20        2     7
Recovered              0     0        5    14
Total                  5   100       29   100

Case 5: This patient had a surprisingly low score on the BAI, both at pretest and at posttest, 7 and 1, respectively, in spite of being severely distressed according to the data from the structured diagnostic interview. After the first administration of the BAI, she was asked why she had completed the questionnaire in this manner. She indicated that she only experienced feelings of anxiety when she was in situations where she expected panic attacks. This was mainly in large open spaces, where she experienced feelings of being lost and depersonalization. She had organized her life around not exposing herself to such situations, and was quite successful in her avoidance. Hence, she had experienced no anxiety in the week before administration of the BAI, although she was severely dysfunctional. Obviously, the BAI is not well-suited for measuring any progress in treatment for this patient. Only a measure of avoidance would be sensitive to treatment.
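The five cases can be run through the Jacobson criteria mechanically. The sketch below assumes the thresholds reported earlier for women (a reliable change of 15 points and a cutoff c of 18.4). The function name, the category strings, and the choice to apply the same 15-point threshold to increases in score when flagging deterioration are illustrative assumptions rather than part of the chapter's procedure.

```python
def classify_endstate(pre, post, rc_points, cutoff):
    """Assign one of the four endstate categories from pre/post scores.

    A positive change means the score (and hence reported anxiety) dropped.
    """
    change = pre - post
    if change <= -rc_points:
        return "reliably deteriorated"
    if change < rc_points:
        return "unchanged"
    if post < cutoff:
        return "recovered"
    return "reliably improved"


# Thresholds for women as reported in the text.
RC_POINTS_WOMEN = 15
CUTOFF_WOMEN = 18.4

# (case number, pretest BAI, posttest BAI) taken from the case descriptions.
cases = [(1, 35, 8), (2, 47, 27), (3, 26, 12), (4, 27, 13), (5, 7, 1)]

results = {
    case: classify_endstate(pre, post, RC_POINTS_WOMEN, CUTOFF_WOMEN)
    for case, pre, post in cases
}

for case, label in results.items():
    print(f"Case {case}: {label}")
```

Cases 3 and 4 both show a 14-point drop and land in the unchanged category despite posttest scores below the cutoff, which is the one-point shortfall the case descriptions highlight.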

As these case studies illustrate, fairly high BAI pretest scores are required before patients can meet criteria for reliable change but not recovery. Female patients must earn pretest scores of at least 33 (15 [index of reliable change] + 18 [the cutoff point for recovery]) and male patients must obtain scores of at least 32 (17 [index of reliable change] + 15 [the cutoff point for recovery]) to end up in the category of reliably changed, but not recovered. Because the mean score for PDA patients on the BAI is 27.27 (SD = 13.1; Beck & Steer, 1990), considerably less than half of a PDA sample will have a score of at least 33 or 32 on the BAI. Consequently, although many of the patients in this sample obtained posttest scores that are quite low, they are not considered recovered according to the Jacobson criteria because they show no reliable change, which is required for recovery. In some cases, this reflects reality (e.g., Case 5) whereas in others (e.g., Case 4) a patient who has made considerable improvement falls short of criterion because of the broad band of measurement error. This problem is not unique to the BAI but typical of anxiety measures if the patient sample has not been selected to be highly anxious before treatment. Thus, despite the attraction of using a system like Jacobson's (Jacobson & Truax, 1991) that allows the clinician to assert systematically reliable improvement, the clinician needs to be wary of applying this approach unless pretreatment scores are very high (e.g., above 33 for panic patients). Lest the results of treatment appear overly pessimistic with patients suffering from more moderate levels of anxiety, alternative approaches to presentation of outcome results might be considered. For example, as described by Kendall and Grove (1988), patients who score within one standard deviation of the mean of the normal population might be considered to be substantially improved. 
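Two of the quantities in this passage lend themselves to a quick check. The sketch below first estimates what share of a PDA sample would reach the minimum pretest scores, using the mean and standard deviation quoted from Beck and Steer (1990) and assuming, purely for illustration, that pretest scores are roughly normally distributed (BAI scores are bounded and may be skewed, so this is only an approximation). It then expresses the Kendall and Grove (1988) rule as a one-sided check, which is an interpretive assumption; the normal-population mean and standard deviation used there are placeholders, not published norms.

```python
import math


def share_at_or_above(score, mean, sd):
    """Upper-tail proportion of a normal distribution at or above `score`."""
    z = (score - mean) / sd
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


def within_normal_range(post, normal_mean, normal_sd):
    """One-sided reading of the rule: no more than 1 SD above the normal mean."""
    return post <= normal_mean + normal_sd


# Mean and SD for PDA patients as quoted in the text.
PDA_MEAN, PDA_SD = 27.27, 13.1

# Minimum pretest scores: reliable change index plus cutoff for recovery.
min_pre_women = 15 + 18
min_pre_men = 17 + 15

share_women = share_at_or_above(min_pre_women, PDA_MEAN, PDA_SD)
share_men = share_at_or_above(min_pre_men, PDA_MEAN, PDA_SD)
print(f"share with pretest >= {min_pre_women}: {share_women:.2f}")
print(f"share with pretest >= {min_pre_men}: {share_men:.2f}")

# Placeholder norms (assumptions, not values from any cited study).
NORMAL_MEAN, NORMAL_SD = 10.0, 8.0
print(within_normal_range(13, NORMAL_MEAN, NORMAL_SD))
```

Under the normal approximation, roughly a third of patients would start at or above either minimum, which is in line with the statement that considerably less than half of a PDA sample would qualify.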
Although no studies to date have empirically evaluated the validity of this method, use of the proposed criterion may serve as a relatively conservative way to convey important information about patient status to clinicians, third-party payers, and the patients themselves. Unfortunately, this approach may be less convincing to informed audiences than the Jacobson (Jacobson & Truax, 1991) procedure's more statistically based criterion for change. Application of this alternative criterion to the PDA sample described earlier indicates that 68% fall within the normal range after treatment.

Also, because of the large measurement error of anxiety instruments such as the BAI, Jacobson's criteria (Jacobson & Truax, 1991) for reliable deterioration are stringent and may yield particularly low and misleading rates of patients whose functioning is declining. Clinicians should consider an increase in BAI score that is equal to one standard deviation from the mean to be indicative of decline warranting serious attention. Although deterioration is rare in the treatment of adult populations when appropriate treatment is used, use of the preceding criteria may inform the clinician when the direction of treatment ought to be rethought (see Case Study section) or when clients are experiencing a noticeable exacerbation of symptoms they might not report during the session.

Case Study

Recently, cognitive-behavioral therapy was applied to a somewhat unusual case involving panic disorder. Over the course of treatment, the BAI was used to monitor the client's treatment progress and general level of anxiety. At several points during the course of treatment, the treatment plan was altered on the basis of the observed pattern of BAI scores to better meet the needs of this particular client. By the end of treatment, the pattern of BAI scores nicely illustrated the ups and downs of therapy (see Fig. 31.3).

Fig. 31.3. Progression of BAI scores throughout treatment of a patient with panic disorder.

In addition to its use in monitoring treatment progress, the BAI was administered at the initial interview prior to treatment and following the last session as a measure of treatment outcome. In this capacity, the BAI also provided reasonably good indications of the client's pre- and posttreatment levels of anxiety.

Course of Treatment

T.C. is a middle-aged woman who presented for treatment at a university-based clinic. When she first contacted the clinic, T.C. was experiencing frequent panic attacks and complained of high general anxiety and tension. T.C. had been experiencing panic attacks for several months, beginning with a conversation with a friend who mentioned the death of a relative. The mention of a relative's death had set off fears of her own parents' possible move to a retirement home (a recently discussed topic) and ultimate death, which led to high anxiety and panic. That experience of panic was T.C.'s first since she had overcome panic disorder and agoraphobia approximately 20 years earlier. T.C. had been participating in psychodynamic psychotherapy focusing on her upbringing and familial relationships for several months prior to her arrival at the university clinic, but she decided to discontinue this treatment and pursue time-limited cognitive-behavioral treatment that more specifically targeted her panic disorder.

During the first two sessions, which were devoted to an intake assessment, T.C. scored 22 and 19, respectively, on the BAI. At that time, T.C. reported frequent panic attacks in response to cues of death or loss. She also reported chronic tension that produced pain in her arms and shoulders, and stomach discomfort associated with Irritable Bowel Syndrome. Thus, T.C.'s scores on the BAI during this period reflected severe anxiety and tension accompanied by frequent panic attacks. In contrast to typical panic disorder presentations, T.C. did not express much fear of bodily sensations, many concerns about dying or going crazy, or much phobic avoidance. Instead, she consistently reported fears of being "vulnerable," such that she could not handle feelings of anxiety. As a result, treatment focused primarily on reducing her generalized anxiety and the frequency of her panic attacks rather than on changing her cognitions about the feared consequences of panic, as is more typical for panic patients.

Once the therapist constructed a conceptualization of the case and a treatment plan, treatment began with two primary aims. The first aim was to reduce T.C.'s level of chronic tension and anxiety with progressive muscle relaxation training. This portion of the treatment began in Session 3 and proceeded relatively effectively over the course of treatment. The second aim was to reduce T.C.'s fear of emotional events, particularly those involving death or loss. Goals associated with this aim were to help T.C. establish a connection between emotional deprivation that she experienced as a child and her current fear of emotional reactions, such as sadness, and to guide her through exposure to emotional stimuli. In contrast to the first aim of treatment, the goals associated with the second treatment aim were altered several times over the course of treatment.

Sessions 3 through 5 were devoted to beginning progressive muscle relaxation training and to establishing the framework for exposure to emotional stimuli. During this time, the therapist and T.C. discussed distressing experiences with her parents involving emotions and her tendency to disconnect herself from emotional experiences. The goal of these sessions was to establish a framework through which exposure to emotional stimuli could proceed.
Although she responded with understanding and interest to this conceptualization of her anxiety and panic, T.C. became increasingly negative about discussing issues involving her parents and her style of dealing with emotions over the course of Sessions 3 through 5. In addition, her anxiety level remained high, and she continued to experience occasional panic attacks. Furthermore, T.C. commented on several occasions that the material being dealt with was too difficult to address. Over these first three sessions, T.C.'s anxiety was reflected in high scores on the BAI. T.C. scored 16 at Session 3 and 22 at Session 5. Session 4 was preceded by 2 weeks during which the therapist was on vacation. T.C.'s BAI score at this session was a surprisingly low 6. When asked about this relatively low level of anxiety, T.C. remarked that these 2 weeks had been especially peaceful and relaxing: the demands of her schedule had lessened, and she had not had to think about therapy or the issues being discussed. Thus, in contrast to typical clients with panic disorder who experience heightened and more frequent anxiety in the therapist's absence, T.C. seemed to be relieved by the break in therapy. Such an explanation for the pattern of scores observed across these three sessions helped the therapist decide that the initial approach provoked too much anxiety in the client to be therapeutic. Therefore, the exposure experiences were reframed in a way that was more accessible to the client.

For Sessions 6 through 10, the therapist altered the conceptualization of T.C.'s panic to place more emphasis on her tendency to become tense and "fight off" anxiety in fearful situations. During this time, T.C. was given readings and instruction concerning the method of "floating" with the anxiety until it dissipated. Initially, the notion of "floating" led T.C. to feel passive and vulnerable, which produced high anxiety. This anxiety was reflected in her BAI scores of 17 and 16 for Sessions 6 and 7.
At Session 8, T.C.'s score on the BAI was 6. During this session, she reported making a connection to a time when she had used the floating technique successfully in her own life. As a result of this insight, T.C. appeared much less anxious and much more confident in her ability to cope with anxiety. After a week in which she missed therapy, however, T.C. returned for Sessions 9 and 10 with BAI scores of 14 and 16. During these sessions, she reported a loss of confidence in her ability to use the floating technique, because she was no longer experiencing panic attacks and therefore could not effectively practice the technique. Thus, whereas most clients with panic disorder feel less anxious when their panic attacks subside, T.C. paradoxically responded with anxiety to this improvement. During Sessions 9 and 10, T.C. also reported an increase in painful gastrointestinal symptoms, which the therapist viewed as related to her high levels of chronic anxiety. Once again, the therapist interpreted T.C.'s pattern of BAI scores as evidence that therapy was not effectively addressing her concerns. Therefore, the therapist decided to again reframe the exposure work in a way that was more accessible to the client.

At Session 11, however, T.C. reported having just found out that afternoon that her father had died suddenly several days before. As fearing such a death had been a preoccupation of the client, the therapist decided that grief work and exploration of T.C.'s feelings around this issue took precedence over other therapeutic goals. This work continued for Sessions 11 and 12, during which time T.C. consistently scored 11 on the BAI. The lessening of T.C.'s anxiety during this period seemed related to her realization that one of her worst fears had come true, and she had survived. Whereas much of her fear associated with anxiety and panic had been vague and amorphous, the death of her father was a concrete event with which she found herself able to cope.

T.C. presented for Sessions 13 through 16 with increased concern and frustration over her perceived inability to cope with chronic anxiety. Her BAI scores of 15 and 14 for Sessions 13 and 14, respectively, reflected her anxiety over this issue as well as a more concrete problem.
During this time, T.C. recalled that her first panic attack in this most recent episode of panic disorder was set off by the death of her friend's relative, whose name was June. She found that she felt panicky every time she heard someone say "June," and, as it was now May, she feared that the following month would be intolerable. At this time, the therapist and T.C. discussed the need for exposure to the name of the month and ways that T.C. could use the skills she had developed (relaxation and floating, plus diaphragmatic breathing and a present-focus mindset) to cope with the anxiety-provoking situations. The use of these skills and engagement in exposure exercises led to a decrease in her anxiety over the next couple of weeks. T.C.'s BAI score was 9 for both Session 15 and Session 16 even as June approached. During these sessions, T.C. expressed greater confidence in her ability to cope with anxiety as well as greater success in using relaxation techniques.

Treatment Outcome

After 17 sessions, the therapist's training schedule required termination of therapy, and T.C. was referred to another therapist. T.C.'s posttreatment BAI score was 11, as compared to her pretreatment score of 22. At the time of termination, T.C. had become more skilled in her use of relaxation techniques and her levels of chronic tension and anxiety had been reduced. As a result, she had experienced only one panic attack in the previous 15 weeks. In addition, she was no longer complaining of gastrointestinal symptoms, and she was experiencing less chronic tension and pain in her shoulders and arms. These gains were reflected in her 50% score reduction on the BAI.

Despite these gains, T.C. maintained a fear of the upcoming month of June and a belief that she could not deal effectively with episodes of extreme anxiety in the future. Although she had shown progress in her ability to cope with anxiety, T.C. believed that she would not be strong enough to handle anxiety in the future, which left her feeling anxious. In general, she continued to feel moderate levels of anxiety and tension on a daily basis.

The limitations in T.C.'s therapeutic gains were reflected by a comparison of her pretreatment and posttreatment BAI scores. Although her posttreatment BAI score of 11 was half of her pretreatment score of 22, this change was not reliably or clinically significant according to the Jacobson criteria presented earlier. Her posttreatment score falls below Criterion c (the horizontal line in Fig. 31.1), indicating that her score was more likely to be in the normal than the panic disorder range, but within the indeterminate band of measurement error (see T.C. in Fig. 31.1). Thus, despite an apparently large decrease in her level of anxiety, T.C.'s BAI scores were consistent with the clinical decision that she would benefit from further treatment.

Conclusions

Use of the BAI to screen, plan, and monitor treatment and evaluate outcome is recommended. As already outlined, the measure is efficient, cost-effective, and easy to administer, complete, and interpret. Furthermore, the sound psychometric properties of the instrument have been supported in numerous studies with a wide range of clinical and nonclinical populations. The limitations of the BAI also warrant consideration and have been described in this chapter. First, the measure potentially overrepresents symptoms of panic disorder to the exclusion of symptoms of general anxiety; however, this issue requires further examination because studies to date that address this question are rather limited. Currently, the narrow focus of the items suggests that additional measures ought to be included in assessment batteries to augment findings from the BAI and thoroughly assess patients throughout treatment.
Second, although the BAI distinguishes anxiety from depression better than some other measures, its overlap with depression is substantial. Thus, in making the common differential assessment of anxiety versus depression, the clinician needs to supplement the BAI with a careful history and interview. References American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd rev. ed.). Washington, DC: Author. American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author. Beck, A.T. (1978). PDR Check List. Unpublished questionnaire, University of Pennsylvania, Center for Cognitive Therapy, Philadelphia. Beck, A.T. (1982). Situational Anxiety Check List. Unpublished questionnaire, University of Pennsylvania, Center for Cognitive Therapy, Philadelphia. Beck, A.T., Brown, G., Steer, R.A., Eidelson, J.I., & Riskind, J.H. (1987). Differentiating anxiety from depression: A test of the cognitive content-specificity hypothesis. Journal of Abnormal Psychology, 96, 179183. Beck, A.T., Epstein, N., Brown, G., & Steer, R.A. (1988). An inventory for measuring anxiety: Psychometric properties. Journal of Consulting and Clinical Psychology, 56, 893-897. Beck, A.T., & Steer, R.A. (1987). Manual for the Revised Beck Depression Inventory. San Antonio, TX: Psychological Corporation. Beck, A.T., & Steer, R.A. (1990). Beck Anxiety Inventory Manual. San Antonio, TX: Psychological Corporation. Beck, A.T., & Steer, R.A. (1991). Relationship between the Beck Anxiety Inventory and the Hamilton Anxiety Rating Scale with anxious patients. Journal of Anxiety Disorders, 5, 213-223.


Beck, A.T., Steer, R.A., & Brown, G. (1985). Beck Anxiety Check List. Unpublished manuscript, University of Pennsylvania, Center for Cognitive Therapy, Philadelphia.
Blake, D.D., Weathers, F.W., Nagy, L.M., & Kaloupek, D.G. (1995). The development of a clinician-administered PTSD Scale. Journal of Traumatic Stress, 8, 75-90.
Blanchard, E.B., Jones-Alexander, J., Buckley, T.C., & Forneris, C.A. (1996). Psychometric properties of the PTSD Checklist (PCL). Behaviour Research and Therapy, 34, 669-673.
Borden, J.W., Peterson, D.R., & Jackson, E.A. (1991). The Beck Anxiety Inventory in nonclinical samples: Initial psychometric properties. Journal of Psychopathology and Behavioral Assessment, 13, 345-356.
Chambless, D.L., Caputo, G.C., Bright, P., & Gallagher, R. (1984). Assessment of fear of fear in agoraphobics: The Body Sensations Questionnaire and the Agoraphobic Cognitions Questionnaire. Journal of Consulting and Clinical Psychology, 52, 1090-1097.
Chambless, D.L., Caputo, G.C., Jasin, S.E., Gracely, E.J., & Williams, C. (1985). The Mobility Inventory for Agoraphobia. Behaviour Research and Therapy, 23, 35-44.
Chambless, D.L., & Gracely, E.J. (1989). Fear of fear and the anxiety disorders. Cognitive Therapy and Research, 13, 9-20.
Clark, D.A., Steer, R.A., & Beck, A.T. (1994). Common and specific dimensions of self-reported anxiety and depression: Implications for the cognitive and tripartite models. Journal of Abnormal Psychology, 103, 645-654.
Cox, B.J., Cohen, E., Direnfeld, D.M., & Swinson, R.P. (1996). Does the Beck Anxiety Inventory measure anything beyond panic attack symptoms? Behaviour Research and Therapy, 34, 949-954.
Creamer, M., Foran, J., & Bell, R. (1995). The Beck Anxiety Inventory in a nonclinical sample. Behaviour Research and Therapy, 33, 477-485.
de Beurs, E., Wilson, K.A., Chambless, D.L., Goldstein, A.J., & Feske, U. (1997). Convergent and divergent validity of the Beck Anxiety Inventory for patients with panic disorder and agoraphobia. Depression and Anxiety, 6, 140-145.
Demos, V.C., & Prout, M.F. (1994). A review of scales for the brief assessment of anxiety. In L.V. Creek, S. Knapp, & T.L. Jackson (Eds.), Innovations in clinical practice: A source book (Vol. 13, pp. 167-178). Sarasota, FL: Professional Resource Press/Professional Resource Exchange.
Dent, H.R., & Salkovskis, P.M. (1986). Clinical measures of depression, anxiety, and obsessionality in nonclinical populations. Behaviour Research and Therapy, 24, 689-691.
Derogatis, L.R. (1975). The Brief Symptom Inventory. Baltimore, MD: Clinical Psychometric Research.
Derogatis, L.R. (1983). The SCL-90-R administration, scoring, and procedures manual-II. Towson, MD: Clinical Psychometric Research.
DiNardo, P.A., & Barlow, D.H. (1988). Anxiety Disorders Interview Schedule-Revised (ADIS-R). Albany, NY: Phobia and Anxiety Disorders Clinic, State University of New York at Albany.
Fydrich, T., Dowdall, D., & Chambless, D.L. (1992). Reliability and validity of the Beck Anxiety Inventory. Journal of Anxiety Disorders, 6, 55-61.
Gillis, M.M., Haaga, D.A.F., & Ford, G.T. (1995). Normative values for the Beck Anxiety Inventory, Fear Questionnaire, Penn State Worry Questionnaire, and Social Phobia and Anxiety Inventory. Psychological Assessment, 7, 450-455.
Goodman, W.K., Price, L.H., Rasmussen, S.A., & Mazure, C. (1989). The Yale-Brown Obsessive Compulsive Scale: I. Development, use, and reliability. Archives of General Psychiatry, 46, 1006-1011.
Gotlib, I.H., & Cane, D.B. (1989). Self-report assessment of depression and anxiety. In P.C. Kendall & D. Watson (Eds.), Anxiety and depression: Distinctive and overlapping features (pp. 131-169). New York: Academic Press.
Hamilton, M. (1959). The assessment of anxiety states by rating. British Journal of Medical Psychology, 32, 50-55.
Hewitt, P.L., & Norton, G.R. (1993). The Beck Anxiety Inventory: A psychometric analysis. Psychological Assessment, 5, 408-412.
Jacobson, N.S., & Revenstorf, D. (1988). Statistics for assessing the clinical significance of psychotherapy techniques: Issues, problems and new developments. Behavioral Assessment, 10, 133-145.
Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
Jolly, J.B., Aruffo, J.F., Wherry, J.N., & Livingston, R. (1993). The utility of the Beck Anxiety Inventory with inpatient adolescents. Journal of Anxiety Disorders, 7, 95-106.
Kabakoff, R.I., Segal, D.L., Hersen, M., & Van Hasselt, V.B. (1997). Psychometric properties and diagnostic utility of the Beck Anxiety Inventory and the State-Trait Anxiety Inventory with older adult psychiatric outpatients. Journal of Anxiety Disorders, 11, 33-47.
Kendall, P.C., & Grove, W.M. (1988). Normative comparisons in therapy outcome. Behavioral Assessment, 10, 147-158.
Kessler, R.C., McGonagle, K.A., Zhao, S., Nelson, C.B., Hughes, M., Eshleman, S., Wittchen, H.U., & Kendler, K.S. (1994). Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: Results from the National Comorbidity Study. Archives of General Psychiatry, 51, 8-19.
Lipman, R.S. (1982). Differentiating anxiety and depression in anxiety disorders: Using rating scales. Psychopharmacology Bulletin, 18, 69-77.
Marks, I.M., & Mathews, A.M. (1979). Brief standard self-rating for phobic patients. Behaviour Research and Therapy, 17, 263-267.
Meyer, T.J., Miller, M.L., Metzger, R.L., & Borkovec, T.D. (1990). Development and validation of the Penn State Worry Questionnaire. Behaviour Research and Therapy, 28, 487-495.
Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates.
Norton, G.R., Dorward, J., & Cox, B.J. (1986). Factors associated with panic attacks in nonclinical subjects. Behavior Therapy, 17, 239-252.
Osman, A., Barrios, F.X., Aukes, D., Osman, J.R., & Markway, K. (1993). The Beck Anxiety Inventory: Psychometric properties in a community population. Journal of Psychopathology and Behavioral Assessment, 15, 287-297.
Riskind, J.H., Beck, A.T., Brown, G., & Steer, R.A. (1987). Taking measure of anxiety and depression: Validity of the reconstructed Hamilton Scales. Journal of Nervous and Mental Disease, 175, 474-479.
Sanavio, E. (1988). Obsessions and compulsions: The Padua Inventory. Behaviour Research and Therapy, 26, 169-177.
Shapiro, F. (1995). Eye movement desensitization and reprocessing: Basic principles, protocols and procedures. New York: Guilford.
Shear, M., Brown, T., Barlow, D., Money, M., Sholomkas, D., Woods, S., Gorman, J., & Papp, L. (1997). The Multi-Center Collaborative Panic Disorder Severity Scale. American Journal of Psychiatry, 154, 1571-1575.
Snaith, R.P., & Taylor, C.M. (1985). Rating scales for depression and anxiety: A current perspective. British Journal of Clinical Pharmacology, 19, 17S-20S.
Spielberger, C.D. (1983). Manual for the State-Trait Anxiety Inventory. Palo Alto, CA: Consulting Psychologists Press.
Steer, R.A., & Beck, A.T. (1996). Generalized anxiety and panic disorders: Response to Cox, Cohen, Direnfeld, and Swinson (1996). Behaviour Research and Therapy, 34, 955-957.
Steer, R.A., Kumar, G., Ranieri, W.F., & Beck, A.T. (1995). Use of the Beck Anxiety Inventory with adolescent psychiatric outpatients. Psychological Reports, 76, 459-465.
Steer, R.A., Ranieri, W.F., Beck, A.T., & Clark, D.A. (1993). Further evidence for the validity of the Beck Anxiety Inventory with psychiatric outpatients. Journal of Anxiety Disorders, 7, 192-205.
Steer, R.A., Rissmiller, D.J., Ranieri, W.F., & Beck, A.T. (1993). Structure of the computer-assisted Beck Anxiety Inventory with psychiatric inpatients. Journal of Personality Assessment, 60, 532-542.
Turner, S.M., Beidel, D.C., Dancu, C.V., & Stanley, M.A. (1989). An empirically derived inventory to measure social fears and anxiety: The Social Phobia and Anxiety Inventory. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 1, 35-40.
Watson, D., & Kendall, P.C. (1989). Understanding anxiety and depression: Their relation to negative and positive affective states. In P.C. Kendall & D. Watson (Eds.), Anxiety and depression: Distinctive and overlapping features (pp. 3-22). San Diego, CA: Academic Press.


Williams, C.L., & Poling, J. (1989). An epidemiological perspective on the anxiety and depressive disorders. In P.C. Kendall & D. Watson (Eds.), Anxiety and depression: Distinctive and overlapping features (pp. 317-337). San Diego, CA: Academic Press.


Chapter 32
Measuring Anxiety and Anger with the State-Trait Anxiety Inventory (STAI) and the State-Trait Anger Expression Inventory (STAXI)

Charles D. Spielberger
Sumner J. Sydeman
Ashley E. Owen
Brian J. Marsh
University of South Florida

Early studies of emotion endeavored to discover, from an analysis of the introspective reports of trained observers, the qualitative feeling-states, or "mental elements," that comprised different emotions (Titchener, 1897; Wundt, 1896). Unfortunately, this phenomenological approach generated findings that were obviously artificial, and contributed to a discouraging degree of conceptual ambiguity and empirical inconsistency (Plutchik, 1962; Young, 1943). Moreover, subjective reports about emotional states came to be viewed with extreme suspicion because they were unverifiable and easily falsified (Duffy, 1941). Distrust of verbal reports was further intensified by psychoanalytic formulations emphasizing the distortions in mood and thought that may be produced by unconscious mental processes.

With the advent of behaviorism shortly after the turn of the century, together with psychology's acceptance of the physicalistic assumptions of logical positivism, research on emotion shifted from the investigation of subjective feeling-states to the evaluation of behavioral and physiological variables. Because early phenomenological conceptions of emotion did not readily fit with positivistic scientific methods, the typical behaviorist paradigm employed in research on emotion involved the manipulation of experimental conditions designed to influence a particular emotional state, and observation of the effects of these manipulations on behavioral and/or physiological responses.
The epistemology and methodology of stimulus-response (S-R) psychology and, especially, the prevailing bias against subjective experience as a desideratum for the science of psychology, required investigators to evaluate the impact of carefully defined and manipulated antecedent (stimulus) conditions on specific physiological and behavioral responses. Thoughts and feelings were rarely investigated because they were considered to be beyond the limits of scientific inquiry.

Since the 1960s, there has been growing recognition and acceptance of the unique importance of the experiential component of emotions. Clearly, emotional states in humans cannot be meaningfully defined by stimulus and response operations alone. Most authorities now regard human emotions as complex psychobiological states or conditions, that is, reactions characterized by specific feeling qualities and widespread bodily changes, particularly in the autonomic nervous system. It is also now generally accepted that appraisal of a particular event or situation will greatly influence an individual's emotional reactions, and that differences in personality and past experience may dispose people to respond to similar stimulus circumstances in radically different ways (Lazarus & Folkman, 1984; Lazarus & Opton, 1966).

In the present context, the concept of emotion is used much as this term is currently employed in common language: to refer to complex, qualitatively different, psychobiological states or conditions of the human organism that have both phenomenological and physiological properties. The quality and intensity of the feelings that characterize emotional states seem to be their most distinctive features. Therefore, the scientific study of emotional phenomena requires the development of appropriate methods for distinguishing between qualitatively different emotional states, assessing the intensity of such states at a particular time, and measuring individual differences in how often the emotion is experienced by different people.

The nature of anxiety and anger as emotional states and personality traits, and the procedures employed in their measurement, are briefly reviewed here. Measures of state and trait anxiety, and the development of the State-Trait Anxiety Inventory (STAI), are discussed first. The next part examines conceptual ambiguities in the constructs of anger, hostility, and aggression, briefly evaluates several instruments developed to assess anger and hostility, and describes the construction and validation of the State-Trait Anger Scale (STAS). The development and validation of the Anger Expression (AX) Scale and the State-Trait Anger Expression Inventory (STAXI) for assessing the experience, expression, and control of anger are considered next.
The chapter concludes with a discussion of the utilization of measures of anxiety and anger in treatment planning and evaluation.

Nature and Measurement of Anxiety

The importance of fear (anxiety) and rage (anger) as scientific constructs is reflected in the writings of Charles Darwin (1872/1965), who considered the expression of these emotions to be universal and adaptive characteristics of both humans and animals that had evolved over countless generations through a process of natural selection. Noting that both fear and rage varied in intensity, Darwin observed that fear increased from mild apprehension or surprise, to an extreme "agony of terror." Manifestations of fear included trembling, dilation of the pupils, increased perspiration, changes in voice quality, erection of the hair, and peculiar facial expression.

For Freud (1924), fear and anxiety referred to "something felt," a specific unpleasant emotional state or condition that included experiential, physiological, and behavioral components. Freud equated fear with objective anxiety, an emotional reaction that was proportional in intensity to a real danger in the external world. He regarded anxiety as the "fundamental phenomenon and the central problem of neurosis" (Freud, 1936, p. 85), and used the term neurotic anxiety to describe emotional reactions that were greater in intensity than would be expected on the basis of the objective danger. The source of the danger for neurotic anxiety was the individual's own sexual or aggressive impulses that had been repressed because they had been punished and were now considered unacceptable.

Freud initially believed that anxiety resulted from the discharge of repressed, somatic sexual tensions (libido). When blocked from normal expression, libidinal energy accumulated and was automatically discharged as free-floating anxiety. This view was subsequently modified in favor of a more general conception of anxiety as a signal indicating the presence of a danger situation. The perceived presence of danger evokes an unpleasant emotional state, which serves to warn the individual that some form of adjustment is necessary and motivates escape behavior or repression. In emphasizing the adaptive utility of anxiety as a motivator of behavior that helps individuals to avoid or cope with danger, Freud's Danger Signal Theory is quite consistent with Darwin's evolutionary perspective.

For nearly a century, clinical studies of anxiety have appeared in the psychiatric and psychoanalytic literature with increasing regularity, but prior to 1950 there was relatively little experimental research on human anxiety (May, 1950; Spielberger, 1966). The complexity of anxiety phenomena, the lack of appropriate measuring instruments, and ethical problems associated with inducing anxiety in laboratory settings all contributed to the paucity of research. Since 1950, however, research on human anxiety has been facilitated on two fronts: Conceptual advances have clarified the nature of anxiety as a theoretical construct, and a number of scales have been created for measuring this construct.

Cattell and Scheier (1963) pioneered the application of multivariate techniques to defining and measuring anxiety. A variety of self-report and physiological measures of anxiety were included in their factor analytic studies in which relatively independent "state" and "trait" anxiety factors have consistently emerged (Cattell, 1966). Physiological measures that fluctuated over time, such as respiration rate and blood pressure, had strong loadings on the state anxiety factor, but only slight loadings on trait anxiety. In contrast, most psychometric measures of anxiety were relatively stable over time, and had strong loadings on the trait anxiety factor and weak loadings on state anxiety.
Thus, Cattell's research identified two related, yet logically quite different anxiety constructs: an unpleasant emotional state or condition that varies in intensity and fluctuates over time, and relatively stable individual differences in anxiety proneness as a personality trait.

The concept of anxiety as an emotional state (S-Anxiety) is comparable to the conceptions of fear and objective anxiety formulated by Freud (1936). S-Anxiety consists of unpleasant, consciously perceived feelings of tension, apprehension, nervousness and worry, with associated activation or arousal of the autonomic nervous system, that can be most meaningfully and unambiguously operationally defined by a combination of introspective verbal reports and physiological-behavioral signs (Spielberger, 1972a).

Trait anxiety (T-Anxiety) has the characteristics of a class of constructs that D.T. Campbell (1963) referred to as acquired behavioral dispositions, and which Atkinson (1964) labeled as motives. Measures of T-Anxiety assess individual differences in the tendency to perceive a wide range of situations as dangerous or threatening, especially situations that involve evaluation by other persons or threats to self-esteem. Individuals high in T-Anxiety respond to perceived threats with more frequent and intense elevations in S-Anxiety than persons low in T-Anxiety.

Instruments for Measuring Anxiety

A variety of questionnaires, rating scales, and psychometric tests are currently employed to measure state and trait anxiety in research and clinical practice. Self-report psychometric questionnaires are by far the most popular procedures for assessing anxiety. Among these, the Taylor (1953) Manifest Anxiety Scale (MAS) has been used extensively in experimental research.
The MAS consists of 50 true-false statements, which were selected from the Minnesota Multiphasic Personality Inventory (MMPI) on the basis of item content reflecting symptoms of anxiety that are generally observed in individuals with anxiety neuroses.


Cattell and Scheier (1963) developed the Anxiety Scale Questionnaire (ASQ) to assess anxiety in clinical situations. They assembled a large pool of multiple-choice items presumed to be related to anxiety phenomena, and employed factor analytic procedures as the primary basis for final item selection. Despite major differences in the conceptions of anxiety that guided the construction of the ASQ and the MAS, and in item format and the method of test construction, correlations between these measures are typically .80 or higher. Because these correlations approach the reliability of the individual scales, the MAS and the ASQ may be considered equivalent measures. Although constructed before the importance of the state-trait distinction was established, the MAS and the ASQ require respondents to report how often they generally experience symptoms of anxiety, suggesting that these scales measure T-Anxiety.

A number of questionnaires, rating scales, psychometric inventories, and physiological measures that have been used to assess anxiety are described by Levitt (1980). Many of these measures have also been reviewed and evaluated by McReynolds (1968) and Borkovec, Weerts, and Bernstein (1977). In early studies, S-Anxiety was most often measured by assessing physiological changes associated with activation (arousal) of the autonomic nervous system. Among the physiological measures that have been used as indicators of S-Anxiety, changes in heart rate appear to be the most popular (see Borkovec et al., 1977; Hodges, 1976; Lader, 1975; Levitt, 1980; Martin, 1973; McReynolds, 1968). Projective techniques such as the Rorschach Inkblots and the Thematic Apperception Test have also been used extensively in the clinical evaluation of anxiety, but there is relatively little empirical evidence of the validity of these instruments as measures of S-Anxiety.
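The equivalence argument made above for the MAS and the ASQ (an observed correlation that approaches the reliabilities of the two scales) can be illustrated with the classical correction for attenuation. The following is a minimal sketch and is not taken from the chapter: the function name is hypothetical, and the reliability values of .85 are assumed purely for illustration; only the observed correlation of .80 comes from the text.

```python
import math


def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Classical correction for attenuation.

    Estimates the correlation between the true scores of two scales from
    their observed correlation (r_xy) and their reliabilities (rel_x, rel_y).
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Observed MAS-ASQ correlation reported in the text: .80.
# The reliabilities below are hypothetical values chosen for illustration.
estimate = disattenuated_correlation(0.80, 0.85, 0.85)
print(f"Estimated true-score correlation: {estimate:.2f}")
```

Under these assumed reliabilities, the corrected estimate is well above the observed value; an estimate near 1.0 would suggest that the two scales order respondents almost identically once measurement error is taken into account.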
The Hamilton (1959) Rating Scale is widely used for evaluating symptoms of anxiety observed in clinical interviews and psychotherapy sessions. Specific symptoms assessed with the Hamilton scale include: anxious mood (worry, apprehension); tension (inability to relax, trembling, restlessness); and fears of strangers, animals, traffic, and crowds. The severity of each symptom is rated on a 5-point scale, from "none" to "very severe, grossly disabling." Given the symptoms observed and the nature of the ratings procedure, the Hamilton scale provides information about the intensity of state anxiety that a patient is experiencing during a specific time period.

The Affect Adjective Check List (AACL) developed by Zuckerman (1960) and his associates (Zuckerman & Biase, 1962; Zuckerman & Lubin, 1965) was the first instrument that provided measures of both S-Anxiety and T-Anxiety. Although respondents are instructed to report how they feel "today" rather than at a particular moment, evidence of the validity of the AACL-Today form as a measure of S-Anxiety is nevertheless impressive. However, because respondents only check adjectives that describe them and do not report the intensity of their feelings, this response format makes the AACL somewhat insensitive for assessing anxiety as an emotional state. Relatively low correlations of the AACL General Form with the MAS and the ASQ raise questions about the concurrent validity of this form as a measure of T-Anxiety.

The State-Trait Anxiety Inventory (STAI)

The STAI was developed by Spielberger, Gorsuch, and Lushene (1970) to provide reliable, relatively brief, self-report scales for assessing both state and trait anxiety in research and clinical practice. Freud's (1936) Danger Signal Theory and Cattell's concepts of state and trait anxiety (Cattell, 1966; Cattell & Scheier, 1958, 1961, 1963), as refined and elaborated by Spielberger (1966, 1972a, 1972b, 1976, 1977, 1979b, 1983), provided the conceptual framework that guided the STAI test construction process. State anxiety (S-Anxiety) was defined as a temporal cross-section in the emotional stream-of-life of a person, consisting of subjective feelings of tension, apprehension, nervousness, and worry, and activation (arousal) of the autonomic nervous system. It was further assumed that S-Anxiety varies in intensity and fluctuates over time as a function of perceived threat. Trait anxiety (T-Anxiety) was defined in terms of relatively stable individual differences in anxiety proneness (i.e., differences between people in the tendency to perceive stressful situations as dangerous or threatening), and in the disposition to respond to such situations with more frequent and intense elevations in S-Anxiety. Differences in T-Anxiety are reflected in the frequency with which anxiety states have been experienced in the past, and in the probability that S-Anxiety reactions will be manifested in the future.

When test construction for the STAI began in 1964, the initial goal was to develop an inventory consisting of a single set of items that could be administered with different instructions to assess both state and trait anxiety. A large pool of items was selected and adapted from existing trait anxiety measures and a number of new items were written, which included adjectives from the AACL that were considered appropriate for assessing the absence of anxiety. For each adapted and new item, the essential psychological content was retained, but the format was modified so that the item could be given with different instructions to assess either S-Anxiety or T-Anxiety. This pool of more than 60 items was administered to large samples of university students and psychiatric patients, first with state and then with trait instructions. The state instructions required respondents to report the intensity of their feelings of anxiety, "right now, at this moment."
The trait instructions asked subjects to report how they generally feel by indicating the frequency of occurrence of anxiety-related feelings or symptoms.

In selecting items for the preliminary form of the STAI, each item that, when given with trait instructions, correlated significantly with the students' scores on three well-known T-Anxiety scales was retained for further study. The three criterion measures were the Taylor (1953) MAS and Cattell and Scheier's (1963) ASQ (the two most widely used anxiety measures at the time test construction began), and Welsh's (1956) "Factor A" Anxiety Scale (which was derived from a factor analysis of the 566 MMPI items). The internal consistency and stability of each STAI item were also evaluated when given with either trait or state instructions. On the basis of extensive item-validity research with more than 2,000 university students, comprising 10 independent samples, a final set of 20 items was selected for Form A, the preliminary form of the STAI.

Although the items comprising the STAI (Form A) were intended to be administered with different instructions to measure both S-Anxiety and T-Anxiety, research with the inventory revealed that altering the instructions could not overcome the strong state or trait psycholinguistic connotations of key words in some items (Spielberger et al., 1970). For example, "I feel upset," was a highly sensitive measure of S-Anxiety. Scores on this item increased markedly under stressful conditions and were lower under relaxed conditions, as compared with a neutral condition. However, when given with trait instructions, the scores on this item were unstable over time, and correlations with other T-Anxiety items were relatively low. In contrast, "I worry too much," was stable over time and correlated highly with other T-Anxiety items. But scores on this item did not reliably increase in response to stressful circumstances, nor did they decrease under relaxed conditions, as was required for the construct validity of an S-Anxiety item.

Given the difficulties encountered in measuring state and trait anxiety with the same items, the test construction strategy was modified and separate sets of items were selected for assessing S-Anxiety and T-Anxiety. Those items with the best concurrent validity as measures of T-Anxiety (i.e., highest correlations with the MAS, ASQ and Welsh Anxiety Scale) and that were most stable over time were selected for the 20-item STAI (Form X) T-Anxiety Scale. In addition, the construct validity of each S-Anxiety item when given with state instructions was evaluated under high and low stress conditions. Those items with the best construct validity and highest internal consistency as measures of S-Anxiety were selected for the STAI (Form X) S-Anxiety Scale. Only 5 items met the validity criteria for both scales. The remaining 30 items were sufficiently different in content to be regarded as unique measures of either state or trait anxiety.

In responding to the STAI T-Anxiety items, subjects rate themselves on the following 4-point frequency scale: 1 = "Almost never," 2 = "Sometimes," 3 = "Often," 4 = "Almost always." The following are representative T-Anxiety items, reflecting the presence or absence of trait anxiety:

Anxiety Present: I worry too much over something that really doesn't matter; I get in a state of tension or turmoil as I think over recent concerns and interests.
Anxiety Absent: I am content; I am a steady person.

The STAI S-Anxiety scale was constructed to measure the intensity of anxiety as an emotional state, with low scores indicating feeling calm and serene, intermediate scores indicating moderate levels of tension and worry, and high scores reflecting intense anxiety, approaching terror and panic. In responding to the S-Anxiety items, subjects rate the intensity of their feelings on the following 4-point scale: 1 = "Not at all," 2 = "Somewhat," 3 = "Moderately so," 4 = "Very much so." The following are representative S-Anxiety present and absent items:

Anxiety Present: I am tense; I am worried.
Anxiety Absent: I feel calm; I feel secure.
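The 4-point rating format described above can be sketched in a few lines of code. This is an illustrative sketch only: it assumes that anxiety-absent items are reverse-keyed (5 minus the rating) so that higher totals always indicate more anxiety, which is consistent with the score interpretation given above. The function name is hypothetical, and the four items are the representative S-Anxiety items quoted in the text, not the full 20-item scale.

```python
def score_scale(responses, anxiety_absent_items):
    """Sum 1-4 ratings into a scale score.

    Ratings on anxiety-absent items are reversed (1<->4, 2<->3) so that a
    higher total always reflects more anxiety. This keying is an assumption
    made for illustration.
    """
    total = 0
    for item, rating in responses.items():
        if rating not in (1, 2, 3, 4):
            raise ValueError(f"rating for {item!r} must be 1, 2, 3, or 4")
        total += (5 - rating) if item in anxiety_absent_items else rating
    return total


# Hypothetical responses to the four representative S-Anxiety items.
responses = {
    "I am tense": 3,      # "Moderately so"
    "I am worried": 2,    # "Somewhat"
    "I feel calm": 1,     # "Not at all" -> reversed to 4
    "I feel secure": 2,   # "Somewhat"   -> reversed to 3
}
absent = {"I feel calm", "I feel secure"}
total = score_scale(responses, absent)
print(total)  # 3 + 2 + 4 + 3 = 12
```

With four items the possible totals run from 4 to 16; on a 20-item scale scored the same way the range would be 20 to 80.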
Insights gained in a decade of research following the publication of the STAI (Form X) stimulated a major revision of the inventory (Spielberger, 1983). The primary goal in revising the scale was to develop "purer" measures of state and trait anxiety in order to provide a more valid basis for differentiating between patients suffering from anxiety and depressive disorders in clinical diagnosis. Careful scrutiny of the content of the STAI items with the best psychometric properties resulted in clearer conceptual definitions of the constructs of state and trait anxiety, which then guided the construction of potential replacement items for the revised STAI (Form Y). Selection of replacement items was based on item analyses and factor analyses of responses to the original and replacement items; 30% of the original items were replaced. The item replacement procedures are described in detail in the revised STAI (Form Y) test manual (Spielberger, 1983).

In the construction and standardization of the STAI (Form Y), more than 5,000 additional subjects were tested. Factor analyses of the STAI (Form Y) items (Spielberger, Vagg, L.R. Barker, Donham, & Westberry, 1980; Vagg, Spielberger, & O'Hearn, 1980) identified clear-cut trait and state anxiety factors, which were generally consistent with the results of previous factor studies of Form X (B.M. Barker, H.R. Barker, & Wadsworth, 1977; Gaudry & Poole, 1975; Gaudry, Spielberger, & Vagg, 1975; Kendall, Finch, Auerbach, Hooke, & Mikulka, 1976; Spielberger et al., 1980). Distinctive state and trait anxiety-absent and anxiety-present factors emerged in the four-factor solutions for Form Y, which were similar to those reported in previous factor studies of Form X. However, Form Y had better simple structure, and the factors were more differentiated and more stable than in Form X, reflecting a better balance between T-Anxiety present and absent items (Spielberger et al., 1980).

Reliability, Stability and Internal Consistency of the STAI. Detailed reliability data for the STAI (Form Y) are reported in the test manual (Spielberger, 1983). Test-retest stability coefficients for the STAI (Form Y) T-Anxiety Scale are reasonably high for college students, ranging from .73 to .86, but somewhat lower for high school students, ranging from .65 to .75. The median stability coefficients for a number of different samples of college and high school students were .77 and .70, respectively. In contrast, the stability coefficients for the S-Anxiety scale were relatively low, with a median of only .33. However, this lack of stability was both expected and considered essential because a valid measure of state anxiety should reflect the influence of unique situational factors that exist at the time of testing.

Because anxiety states are expected to vary in intensity as a function of perceived stress, measures of internal consistency such as alpha coefficients provide a more meaningful index of the reliability of state measures than test-retest correlations. Alpha coefficients for the STAI (Form Y) S-Anxiety Scale, computed by Formula KR20 as modified by Cronbach (1951), are uniformly high. The S-Anxiety alphas were .90 or higher for large, independent samples of students, working adults, and military recruits, with a median coefficient of .93. The alpha coefficients for the STAI (Form Y) T-Anxiety Scale were also uniformly high for these groups, with a median coefficient of .90. The S-Anxiety and T-Anxiety alpha coefficients for younger, middle-age, and older working adults were also very high over the entire age range. The distribution of scores on the STAI S-Anxiety scale when given under neutral conditions is positively skewed and approaches a normal distribution under stressful conditions.
Consequently, alpha reliability coefficients are generally somewhat higher when this scale is given under conditions of psychological stress. For example, the alpha reliability was .92 for the S-Anxiety scale when it was administered to college males immediately after a difficult intelligence test, and .94 when it was given immediately after a distressing film. For the same subjects, the alpha reliability was .89 when the scale was given following a brief period of relaxation training. Further evidence of the internal consistency of the STAI scales is provided by item-remainder correlations. In the normative samples, the item-remainder correlations for all 20 T-Anxiety items, and for 19 of the 20 S-Anxiety items, were .30 or higher for both sexes, and were .50 or higher for more than half of the items. In summary, the internal consistency of the STAI (Form Y) S-Anxiety and T-Anxiety Scales is quite high as measured by alpha coefficients and item-remainder correlations. Test-retest stability is also relatively high for the STAI T-Anxiety scale as required for a valid trait measure, but low for the S-Anxiety scale as would be expected for a measure designed to assess transitory changes in anxiety as an emotional state.

Content, Concurrent, and Construct Validity of the STAI

Individual STAI items were required to meet stringent validity criteria at each stage of the test development process (Spielberger, 1983; Spielberger & Gorsuch, 1966; Spielberger et al., 1970). As previously noted, most items in the initial item pool were adapted from other anxiety scales, and selected for the STAI on the basis of significant correlations with both the Taylor (1953) MAS and Cattell and Scheier's (1963) ASQ, the two most widely used measures of trait anxiety at the time the STAI was developed (Spielberger et al., 1970).
It was subsequently recognized that the MAS contained a number of items that reflected depression rather than anxiety (e.g., "I cry easily," "I feel useless at times," "I feel
blue"). In revising the STAI, the conceptual definitions of state and trait anxiety were improved, and new items were constructed in keeping with these definitions. The items with depressive content were found to have weaker psychometric properties than the new items, and were therefore eliminated from the revised STAI (Form Y) (Spielberger, 1983). Although the content of several ASQ items is more closely related to anger than anxiety (e.g., "Often I get angry with people too quickly"), items with anger content were not included in the original STAI item pool. Relatively high correlations of scores on the STAI T-Anxiety scale with the ASQ and the MAS, ranging from .73 to .85, indicate a high degree of concurrent validity, suggesting that the three inventories can be considered, essentially, as equivalent measures of trait anxiety. However, a major advantage of the STAI T-Anxiety scale is that it is less contaminated with depression and anger than the MAS and the ASQ. A second advantage is that the STAI T-Anxiety scale comprises only 20 items, as compared with the 43-item ASQ and the 50-item MAS, and thus requires only about half as much time to administer. Evidence of the construct validity of the T-Anxiety scale is also reflected in the relatively high mean scores of various neuropsychiatric (NP) patient groups for whom anxiety is a major symptom (Spielberger, 1983). Except for character disorders, all NP groups have substantially higher T-Anxiety scores than normal subjects. General medical and surgical (GMS) patients with psychiatric complications also have higher T-Anxiety scores than GMS patients without such complications, indicating that the T-Anxiety scale can identify nonpsychiatric patients with emotional problems. The lower T-Anxiety scores of patients with character disorders, for whom the absence of anxiety is an important defining condition, provide further evidence of the discriminant validity of the STAI.
In order to demonstrate construct validity in assessing S-Anxiety, the score for each STAI S-Anxiety item had to increase significantly in a stressful situation when compared with a neutral situation, and to decline in relaxing situations. Evidence of the construct validity of the S-Anxiety scale was demonstrated in findings that the S-Anxiety scores of college students were significantly higher during classroom examinations, and lower after relaxation training, than when they were tested in a relatively nonstressful class period (Spielberger, 1983). Further evidence of the construct validity of the S-Anxiety scale was observed in military recruits tested shortly after they began a highly stressful training program. The S-Anxiety scores of the recruits were much higher than those of high school and college students of about the same age who were tested under relatively nonstressful classroom conditions, suggesting that the recruits were experiencing a high state of emotional turmoil when they were tested. In constructing and validating the STAI, more than 10,000 adolescents and adults have been tested. Norms for high school and college students, working adults, military personnel, prison inmates, and psychiatric, medical, and surgical patients are reported in the revised STAI (Form Y) Test Manual (Spielberger, 1983). The State-Trait Anxiety Inventory for Children (STAIC) measures anxiety in 9- to 12-year-old elementary school children (Spielberger, 1973), with extensive norms for fourth-, fifth-, and sixth-grade students. The STAIC has been used in numerous studies of normal children, as well as children who have emotional or physical problems, and has also been used with adolescents. Since first introduced more than a quarter century ago (Spielberger & Gorsuch, 1966), the STAI and the STAIC have been used extensively in more than 8,000 studies.
These measures have also been adapted for cross-cultural research in 58 different languages and dialects (Spielberger, 1989). Psychological research in which the STAI has been used includes experimental investigations and clinical studies of stress-related
psychiatric, psychosomatic, and medical disorders; investigations of general psychological processes, such as attention, memory, learning, and academic achievement; research on situation-specific anxiety phenomena, such as test anxiety, anxiety in sports, and speech anxiety; studies of depression, schizophrenia, sociopathy, and substance abuse; and as an outcome measure in research on the effectiveness of biofeedback, psychotherapy, and various forms of behavioral and cognitive treatment.

Anger, Hostility, and Aggression

The maladaptive effects of anger are traditionally emphasized as important contributors to the etiology of the psychoneuroses, depression, and schizophrenia. Much has also been written about the negative impact of anger and hostility on physical health and psychological well-being, but the definitions of these constructs are ambiguous and sometimes contradictory. Moreover, the terms anger, hostility, and aggression are often used interchangeably in the research literature, and this conceptual confusion is reflected in a diversity of measurement operations of questionable validity (Biaggio, Supplee, & Curtis, 1981). Given the substantial overlap in prevailing conceptual definitions of anger, hostility, and aggression, we have referred collectively to these constructs as the AHA! Syndrome (Spielberger, Jacobs, Russell, & Crane, 1983), and have proposed the following working definitions:

The concept of anger usually refers to an emotional state that consists of feelings that vary in intensity, from mild irritation or annoyance to intense fury and rage. Although hostility usually involves angry feelings, this concept has the connotation of a complex set of attitudes that motivate aggressive behaviors directed toward destroying objects or injuring other people. . . . While anger and hostility refer to feelings and attitudes, the concept of aggression generally implies destructive or punitive behavior directed towards other persons or objects. (p. 16)

Anger is clearly at the core of the AHA! Syndrome, and different aspects of this emotion are typically emphasized in various definitions of hostility and aggression. Moreover, ambiguities and inconsistencies in the definitions of these constructs are reflected in the procedures that have been developed to assess them. The earliest efforts to assess anger and hostility were based on clinical interviews, behavioral observations, and projective techniques, such as the Rorschach Inkblots and the Thematic Apperception Test. Physiological and behavioral correlates of anger and hostility, and various manifestations of aggression have also been investigated in numerous studies. In contrast, the phenomenology of anger (i.e., the experience of angry feelings) has been largely neglected in psychological research. Moreover, most psychometric measures of anger and hostility confound angry feelings with the mode and direction of the expression of anger.

Measures of Hostility and Anger

Beginning in the 1950s, a number of self-report psychometric scales were constructed to measure hostility (e.g., Buss & Durkee, 1957; Caine, Foulds, & Hope, 1967; Cook & Medley, 1954; Schultz, 1954; Siegel, 1956). A rational-empirical strategy was employed in developing the Buss-Durkee (1957) Hostility Inventory (BDHI), which is generally regarded as the most carefully constructed of the early psychometric measures of hostility.

Conceptualizing hostility as multidimensional, Buss (1961) developed items to assess seven facets of this construct, each defined by a BDHI subscale. The dimensionality of the BDHI was investigated in two studies in which responses to the individual BDHI items were factored. In contrast to the seven hostility dimensions presumed to be assessed by the BDHI subscales, Bendig (1962) found only two major underlying factors, which he labeled overt and covert hostility. Russell (1981) identified three meaningful BDHI factors, which he described as neuroticism, general hostility, and expression of anger. The importance of distinguishing between anger and hostility was explicitly recognized in the early 1970s, marked by the appearance in the psychological literature of three anger measures: the Reaction Inventory (RI), the Anger Inventory (AI), and the Anger Self-report (ASR). The RI was developed by Evans and Stangeland (1971) to assess the degree to which anger was evoked by specific situations (e.g., "People pushing into line"). Similar in conception and format to the RI, Novaco's (1975) AI inquires about reactions to anger-provoking incidents (e.g., "Being called a liar," "Someone spits at you"). In responding to the RI and the AI, examinees rate the degree to which each situation or incident would make them feel angry. Zelin, Adler, and Myerson (1972) designed the ASR to assess both "awareness of anger" and different modes of anger expression. In validating this scale, the scores of psychiatric patients were found to correlate significantly with psychiatrists' ratings of anger. Because the ASR and the RI have each been used in only one or two published studies over the past 25 years, the construct validity of these scales has yet to be established. Although the AI has been used more often in research than the other anger measures, Biaggio et al. 
(1981) found no significant correlations of this scale with either self or observer ratings of anger and hostility, and reported that the test-retest stability of the AI over a brief 2-week interval was only .17. In a series of studies, Biaggio and her colleagues (Biaggio et al., 1981; Biaggio & Maiuro, 1985) examined and compared the reliability, concurrent and predictive validity, and the correlates of the BDHI, RI, AI, and ASR. On the basis of their findings, Biaggio (1980) concluded that evidence of the validity of psychometric measures of anger and hostility was both fragmentary and limited. A common problem with existing measures of anger and hostility is that, in varying degrees, these scales confound the experience and expression of anger with situational determinants of angry reactions. Furthermore, none of these measures explicitly takes the state-trait distinction into account. The ASR Awareness subscale comes closest to examining the extent to which subjects experience angry feelings, but this instrument does not assess the intensity of these feelings at a particular time. A number of BDHI items specifically inquire about the frequency that anger is experienced or expressed (e.g., "I sometimes show my anger"; "Almost every week, I see someone I dislike"; "I never get mad enough to throw things"; italics added). Although these items implicitly assess individual differences in anger as a personality trait, most BDHI items seem to evaluate hostile attitudes (e.g., resentment, negativism, suspicion), rather than angry feelings. A coherent theoretical framework that distinguishes between anger, hostility, and aggression as psychological concepts, and that takes the state-trait distinction into account is essential for constructing and validating psychometric measures of anger and hostility.

The State-Trait Anger Scale (STAS)

Anger, as a psychological construct, refers to phenomena that are both more fundamental and less complex than hostility and aggression.
The State-Trait Anger Scale (STAS), which is analogous in conception and similar in format to the State-Trait
Anxiety Inventory (STAI) (Spielberger, 1983; Spielberger et al., 1970), was constructed to measure the intensity of anger as an emotional state and individual differences in anger proneness as a personality trait. Prior to constructing the STAS, working definitions of state and trait anger were formulated. State anger (S-Anger) was defined as a psychobiological state or condition consisting of subjective feelings that vary in intensity, from mild irritation or annoyance to intense fury and rage, with concomitant activation or arousal of the autonomic nervous system. It was further assumed that S-Anger would fluctuate over time as a function of perceived affronts, injustice, or frustration. Trait anger (T-Anger) was defined in terms of individual differences in the frequency that S-Anger was experienced over time, assuming that persons high in T-Anger perceive a wider range of situations as anger provoking (e.g., annoying, irritating, frustrating) than those low in T-Anger and more frequently experience elevations in S-Anger whenever such conditions are encountered. With these working definitions of S-Anger and T-Anger, a pool of items was assembled to assess the intensity of angry feelings (S-Anger) and individual differences in anger proneness (T-Anger). The following are examples of S-Anger items: "I feel angry," "I am furious," "I feel irritated." Examinees report the intensity of their angry feelings at a particular time by rating themselves on the following 4-point scale: "Not at all," "Somewhat," "Moderately so," "Very much so." Examples of T-Anger items are: "I have a fiery temper," "I fly off the handle," "It makes me furious when I am criticized in front of others." In responding to the T-Anger items, examinees indicate how they generally feel by rating themselves on the following frequency scale: "Almost never," "Sometimes," "Often," "Almost always."
Fifteen S-Anger and 15 T-Anger items were selected from the item pool for the preliminary form of the STAS and administered to a large sample of university students. Alpha coefficients for the preliminary STAS S-Anger scale were .93 for both males and females, indicating a high degree of internal consistency. Alpha coefficients for the T-Anger scale were .87 for both sexes, providing similar strong evidence of the internal consistency of this scale. The item-remainder correlations for the individual S-Anger and T-Anger items were also uniformly high (median r = .68). Jacobs, Latham, and Brown (1988) examined the stability of the STAS for a large group of university students. Test-retest reliability coefficients for the STAS T-Anger scale over a 2-week interval were .70 and .77 for males and females, respectively. In contrast, the stability coefficients for the STAS S-Anger subscale were much lower (.27 for males, .21 for females), as would be expected for a measure of transitory anger. Given the high internal consistency of the preliminary STAS scales, it was possible to reduce the length of these scales without unduly weakening their psychometric properties. In revising the STAS, it was also considered desirable to develop internally consistent measures of S-Anger and T-Anger that were relatively independent of anxiety. Therefore, in selecting the final set of 10 S-Anger and 10 T-Anger items, those items with the highest item-remainder correlations for each scale and the lowest correlations with measures of anxiety were identified (L.R. Barker, 1979; Westberry, 1980). With only two exceptions, the item-remainder correlations for the 15 S-Anger items were .50 or higher. These two items (i.e., "I am annoyed," "I am resentful") and the three items that had the highest correlations with the STAI S-Anxiety scale (i.e., "I feel like I'm about to explode," "I feel frustrated," "I feel aggravated") were eliminated, reducing the number of S-Anger items from 15 to 10.
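The two internal-consistency indices used in the item analysis above, the alpha coefficient and the item-remainder correlation, can be illustrated with a short computation. The sketch below is not taken from the STAS manual: the function names are hypothetical, and the six rows of 4-point ratings are invented solely to show the arithmetic. Alpha is computed in its standard form, k/(k - 1) multiplied by (1 - sum of item variances / variance of total scores), and each item-remainder correlation is the Pearson correlation between an item and the sum of the remaining items.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Alpha for a list of respondents, each a list of k item ratings."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_remainder(rows):
    """Correlate each item with the sum of all the other items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [r[j] for r in rows]
        remainder = [sum(r) - r[j] for r in rows]
        result.append(pearson(item, remainder))
    return result


# Invented ratings (1 = "Not at all" ... 4 = "Very much so"); not real data.
ratings = [
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 3, 4, 3],
    [4, 3, 4, 4],
    [2, 1, 1, 2],
    [3, 4, 3, 3],
]

alpha = cronbach_alpha(ratings)
r_ir = item_remainder(ratings)
print(f"alpha = {alpha:.2f}")
print("item-remainder r:", [round(r, 2) for r in r_ir])
```

In an item analysis of the kind described here, items with low item-remainder values would be candidates for removal, and alpha would then be recomputed for the shortened scale.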
The number of T-Anger items was also reduced from 15 to 10 by eliminating two items with the lowest item-remainder correlations (i.e., "People who think they are
always right irritate me," "I get annoyed when I am singled out for correction"), and three items for which the correlations with the STAI T-Anxiety scale were relatively high (i.e., "I feel irritated," "It makes my blood boil when I am pressured," "I feel angry"). It is interesting to note that one of the T-Anger items that was eliminated (i.e., "I feel angry") was included in the 10-item S-Anger scale and had excellent content validity as a measure of anger. However, the correlation of this item with T-Anxiety was almost as high as its item-remainder coefficient, suggesting that feelings of anger are frequently associated with symptoms of anxiety. The STAS S-Anger and T-Anger items were generated primarily on a rational basis. The high internal consistency of these scales, as reflected in item-remainder correlations and alpha coefficients, provides impressive evidence of the utility of the working definitions that guided the item-selection process. Correlations between the 10- and 15-item S-Anger and T-Anger scales ranged from .95 to .99 for Navy recruits and college students, indicating that the 10-item scales provide essentially the same information as the longer forms (Spielberger, 1988). After the items with the highest correlations with anxiety were eliminated, the correlations of the 10-item S-Anger and T-Anger scales with the STAI scales were substantially lower. Factor analyses of the STAS S-Anger items identified only a single underlying factor for both males and females, indicating that the S-Anger scale measures a unitary emotional state that varies in intensity. In contrast, factor analyses of the T-Anger items identified two correlated factors, which were labeled Angry Temperament (T-Anger/T) and Angry Reaction (T-Anger/R). The T-Anger/T items describe individual differences in the disposition to express anger, without specifying any provoking circumstance (e.g., "I am a hotheaded person").
The T-Anger/R items describe angry reactions in situations that involve frustration and/or negative evaluations (e.g., "It makes me furious when I am criticized in front of others."). The results of a study of hypertensive patients clearly demonstrated that the two T-Anger subscales measure different facets of trait anger (Crane, 1981). The T-Anger scores of hypertensive patients were significantly higher than those of medical and surgical patients with normal blood pressure, but this difference was due entirely to the substantially higher Angry Reaction (T-Anger/R) scores of the hypertensives. The Angry Temperament (T-Anger/T) scores of the hypertensives were actually slightly lower than those of the control patients, but this difference was not statistically significant. Crane (1981) also found that hypertensives had significantly higher T-Anxiety scores than the control patients, and that the hypertensives' S-Anger and S-Anxiety scores were higher while performing a mildly frustrating task than the corresponding scores for patients with normal blood pressure.

Concurrent and Construct Validity of the STAS

The concurrent validity of the STAS T-Anger scale was evaluated by computing correlations with scores on the Buss-Durkee (1957) Hostility Inventory (BDHI), and the Hostility (Ho; Cook & Medley, 1954) and Overt Hostility (Hv; Schultz, 1954) scales of the Minnesota Multiphasic Personality Inventory, which were administered along with the STAS to large samples of college students and Navy recruits. Moderately high correlations of the STAS T-Anger scale with the three hostility measures were found for males and females in both samples, providing evidence of a strong relation between T-Anger and hostility.

Moderate correlations of the STAS T-Anger scale were also found with the Neuroticism scale of the Eysenck Personality Questionnaire (EPQ; H.J. Eysenck & S.B.G. Eysenck, 1975) and the T-Anxiety scale of the State-Trait Personality Inventory (STPI; Spielberger, 1979a). These findings are consistent with the clinical observation that neurotic individuals frequently experience angry feelings that they cannot readily express (Spielberger, 1988). Small negative correlations of T-Anger with the EPQ Lie scale suggested that trait anger scores might be slightly reduced by test-taking attitudes that lead people with high Lie scores to inhibit reports of negative characteristics such as anger. Essentially zero correlations of the T-Anger scale with the EPQ Extraversion and STPI Curiosity scales indicate that T-Anger is unrelated to these personality dimensions. Although STAS T-Anger scores correlate significantly with measures of hostility, there are important differences in the meaning of anger and hostility as personality constructs. Hostile people frequently experience anger, but they also exhibit other behaviors that go beyond angry feelings, for example, acts of meanness, viciousness, vindictiveness, and cynicism. The empirical relation between anger and hostility was further explored in factor analyses of responses to the 10 T-Anger items in which the BDHI total and subscale scores, and scores on the MMPI Ho and Hv scales were included. In order to evaluate the discriminant validity of the anger and hostility measures, the item scores for the STPI T-Anxiety and T-Curiosity scales were also included, and three- and four-factor solutions were extracted, with promax rotation (Spielberger, 1980; Westberry, 1980). In the three-factor solutions, which were similar for both males and females, the very strong first factor clearly measured an Anger/Hostility dimension. The second and third factors were defined primarily by anxiety and curiosity items.
The STAS T-Anger and Buss-Durkee total scores had the highest loadings on the Anger/Hostility factor. All 10 T-Anger items, the Ho and Hv scale scores, and all of the BDHI subscales except Guilt also had salient loadings on this factor. Interestingly, the BDHI Guilt, Suspicion, and Resentment subscales had higher loadings on the Anxiety factor than on the Anger/Hostility factor. Separate Anger and Hostility factors emerged for both males and females in the four-factor solutions, along with Anxiety and Curiosity factors that were similar to those obtained in the three-factor solutions. The T-Anger scale and all but one of the 10 T-Anger items had their highest loadings on the Anger factor. The Hostility factor was defined by high loadings of the Buss-Durkee Total and the Ho scales, and by salient loadings for all of the Buss-Durkee subscales, except Guilt. Several BDHI subscales also had salient secondary loadings on the Anger factor; the BDHI Suspicion and Resentment subscales and the Ho scale had higher secondary loadings on the Anxiety factor than on the Anger factor. Thus, the results of the factor analyses indicated that anger and hostility measures assess different but related constructs, and that three of the hostility measures were more strongly related to anxiety than anger. In a series of studies at Colorado State University, Deffenbacher (1992) and his colleagues used the STAS T-Anger Scale to assess multiple aspects of anger. Individuals with high T-Anger scores reported that they experienced greater intensity and frequency in day-to-day anger across a wide range of provocative situations than persons low in T-Anger. High T-Anger individuals also reported anger-related physiological symptoms two to four times more often than low anger subjects. When provoked, persons with high T-Anger scores were also characterized by stronger tendencies to both express and suppress anger, and by less constructive and more dysfunctional coping, as manifested
in physical and verbal antagonism. In a study of trait anger and self-concept, Stark and Deffenbacher (1986) found that high T-Anger students did not like themselves as much as those low in T-Anger, nor did they feel as worthwhile or confident. Negative events such as failure also seemed to have a more devastating (catastrophizing) impact on high T-Anger individuals (Story & Deffenbacher, 1985), who reported that they experienced high levels of anxiety more frequently than students with low T-Anger scores. As our research on anger has progressed, the critical importance of differentiating between the experience and expression of anger has become increasingly apparent (Spielberger et al., 1985). It seemed essential not only to distinguish, both conceptually and empirically, between the experience of anger as an emotional state (S-Anger) and individual differences in anger proneness as a personality trait (T-Anger), but also to identify and measure the characteristic ways in which people express their anger. Theory and research on anger expression are briefly reviewed in the following section. The development of a scale to assess the expression and control of anger is then described in some detail.

The Expression and Control of Anger

The conceptual distinction between "anger-in" and "anger-out" as major modes of anger expression has long been recognized in psychophysiological research. The effects of anger expression on the cardiovascular system were a major focus of the classic studies of Funkenstein and his coworkers more than 40 years ago (Funkenstein, King, & Drolette, 1954). These researchers exposed healthy college students to anger-inducing laboratory conditions, and measured their pulse rate and blood pressure to determine the physiological effects of how they handled their anger.
Students who became angry during the experiment and directed their anger toward the investigator or the laboratory situation were classified as anger-out; those who suppressed their anger and/or directed it at themselves were classified as anger-in. Typically, the increase in pulse rate for students classified as anger-in was three times greater than for the anger-out group. Following the procedures used by Funkenstein et al. (1954), individuals are generally classified as anger-in if they suppress their anger or direct it inward, toward the ego or self (Averill, 1982; Tavris, 1982). When held in or suppressed, anger may be subjectively experienced as an emotional state, S-Anger, which varies in intensity. Defining anger-in as suppressed anger differs from the psychoanalytic conception of anger turned inward toward the ego or self (Alexander, 1939, 1948). In psychoanalytic conceptions, angry feelings often result in guilt and depression (Alexander & French, 1948), whereas the thoughts and memories relating to an anger-provoking situation may be repressed, and are thus not directly experienced. Anger directed outward generally involves both the experience of S-Anger and its manifestation in some form of aggressive behavior. Those who express their anger in aggressive behavior, directing it toward other persons or objects in the environment, are classified as anger-out. Anger-out may be expressed in physical acts such as slamming doors, destroying objects, and assaulting other persons, or in verbal behavior in the form of criticism, threats, insults, or the extreme use of profanity. These physical and verbal manifestations of anger may be directed toward the source of provocation or frustration, or expressed indirectly toward persons or objects associated with or symbolic of the provoking agent.

Harburg and his associates reported impressive relations between anger expression, elevated blood pressure (BP), and hypertension, demonstrating that anger-in and anger-out have different effects on the cardiovascular system (Harburg, Blakelock, & Roeper, 1979; Harburg et al., 1973; Harburg & Hauenstein, 1980; Harburg, Schull, Erfurt, & Schork, 1970). These investigators classified individuals as anger-in or anger-out on the basis of their self-ratings of how they would express anger if treated unfairly by a supervisor, a landlord, or a police officer. Gentry (1972) and his colleagues (Gentry, Chesney, Gary, Hall, & Harburg, 1982; Gentry, Chesney, Hall, & Harburg, 1981) subsequently corroborated and extended Harburg's findings. It should be noted, however, that the procedures used by Harburg and Gentry for classifying individuals as anger-in raise important conceptual issues because individuals who do not experience anger are equated with those who experience and suppress their angry feelings. Very different personality dynamics have been attributed by Rosenzweig (1976, 1978) to "impunitive" persons, who do not experience anger in anger-provoking situations, and "intrapunitive" persons, who turn anger in when provoked, often blaming themselves for the anger directed toward them by others.

The Anger Expression (AX) Scale

Differentiating between the experience of angry feelings and how these feelings are expressed can be accomplished by measuring the intensity of S-Anger, individual differences in T-Anger, and the frequency that anger is suppressed (anger-in) or overtly expressed in behavior (anger-out). Anger expression was implicitly defined by Funkenstein et al. (1954), Harburg et al. (1973), and Gentry et al. (1982) as comprising a single dimension, varying from extreme suppression or inhibition of anger to the expression of anger in assaultive or destructive behavior. Spielberger et al.
(1985) attempted to construct a unidimensional, bipolar measure, the Anger Expression (AX) Scale, to assess this dimension. As a first step in constructing the AX Scale, working definitions of anger-in and anger-out were formulated on the basis of a review of the relevant research literature. Anger-in was defined in terms of how often an individual experiences, but holds in (suppresses) angry feelings, rather than on the basis of the more ambiguous psychoanalytic construct of anger turned against the ego. Anger-out was defined as the frequency that an individual expresses angry feelings in verbal or physically aggressive behavior. In contrast to the procedure used by Funkenstein and Harburg of assigning subjects to dichotomous anger-in or anger-out categories, the AX Scale was designed to measure a continuum of individual differences in how often anger was held in or expressed. Consistent with working definitions of anger-in and anger-out, the content of the items for the AX Scale ranged from strong inhibition or suppression of angry feelings (AX/In) to extreme expression of anger toward other persons or objects in the environment (AX/Out). The rating format for the AX Scale was the same as that used with the STAS T-Anger scale (Spielberger, 1980), but the instructions differed markedly from those used in assessing T-Anger. Rather than asking respondents to indicate how they generally feel, they were instructed to report "how often you generally react or behave in the manner described when you feel angry or furious." In responding, subjects rate themselves on the following 4-point frequency scale: 1 = "Almost never," 2 = "Sometimes," 3 = "Often," 4 = "Almost always." Examples of AX Scale items are ("When angry or furious"):

AX/In: I keep things in; I boil inside, but I don't show it.
AX/Out: I lose my temper; I strike out at whatever infuriates me.

In a study of the relation between anger expression and blood pressure, Johnson (1984) administered a 33-item preliminary version of the AX Scale to 1,114 high school students. Three items with poor psychometric properties that were judged to be ambiguous were subsequently discarded. To verify that the AX Scale items were measuring a unitary psychological construct, students' responses to the individual items were evaluated in separate factor analyses for males and females. Although originally the intention was to develop a unidimensional, bipolar measure of anger expression, the results of the factor analyses clearly indicated that the AX items were tapping two independent dimensions. These dimensions were labeled Anger/In and Anger/Out on the basis of the content of the items with strong loadings on one of the factors and negligible loadings on the other. Given the strength and clarity of the Anger/In and Anger/Out factors, the striking similarity (invariance) of these factors for males and females, and the large samples on which the factor analyses were based, the test construction strategy for developing the AX Scale was modified to identify homogeneous subsets of items to comprise scales for measuring anger-in and anger-out. Of the 30 items from the original item pool on which the identification of the Anger/In and Anger/Out factors was based, 8 items with relatively small loadings (below .35) on both factors were eliminated. The selection of the final subsets of AX Scale items for measuring anger-in and anger-out was based on further factor analyses and subscale item-remainder correlations (Spielberger et al., 1985).
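The screening rule described above, in which items with small loadings on both factors were dropped, can be illustrated with a minimal Python sketch. The function name `screen_items`, the item labels, and the loading values are hypothetical and invented for illustration; only the .35 cutoff comes from the text. The sketch also assumes that "small" refers to the absolute value of a loading, which the text does not state explicitly.

```python
def screen_items(loadings, threshold=0.35):
    """Split items into kept and dropped lists.

    loadings maps an item label to a pair:
    (loading on Anger/In, loading on Anger/Out).
    An item is dropped only when BOTH absolute loadings fall below
    the threshold (assumption: absolute values are compared).
    """
    kept, dropped = [], []
    for item, (anger_in, anger_out) in loadings.items():
        if abs(anger_in) < threshold and abs(anger_out) < threshold:
            dropped.append(item)
        else:
            kept.append(item)
    return kept, dropped


# Hypothetical loadings, invented for illustration only.
example_loadings = {
    "item_a": (0.66, -0.05),  # loads mainly on Anger/In
    "item_b": (-0.01, 0.59),  # loads mainly on Anger/Out
    "item_c": (0.21, 0.30),   # small on both factors
    "item_d": (0.45, 0.48),   # substantial on both factors
}

kept, dropped = screen_items(example_loadings)
print("kept:", kept)
print("dropped:", dropped)
```

Note that an item such as the hypothetical `item_d` survives this first screen even though it loads on both factors; in the procedure described in the text, the final subscale items were chosen in a later step from items with high loadings on one factor and negligible loadings on the other.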
Eight items with uniformly high loadings for both sexes on the Anger/In factor and negligible loadings on the Anger/Out factor were selected for the AX/In Scale; the median loadings of these items on Anger/In and Anger/Out were .665 and -.045, respectively. Similarly, eight items with uniformly high loadings for both sexes on the Anger/Out factor and negligible loadings on Anger/In were selected for the AX/Out Scale; the median loading of these items was .59 on Anger/Out and -.01 on Anger/In. The internal consistency of the 8-item AX/In and AX/Out subscales was evaluated by computing alpha coefficients and item-remainder correlations. The alphas ranged from .73 to .84, and were somewhat higher for the AX/In Scale. All but one of the item-remainder correlations for the AX/In and AX/Out items were .37 or greater. Jacobs et al. (1988) examined the test-retest reliability of the Anger Expression (AX) Scale and found stability coefficients that ranged from .64 to .86. Johnson (1984) and Pollans (1983) found essentially zero correlations between the AX/In and AX/Out scales for both males and females in large samples of high school and college students; similar findings have also been reported for other populations (Knight, Chisholm, Paulin, & Waal-Manning, 1988; Spielberger, 1988). Thus, the AX/In and AX/Out subscales are internally consistent, factorially orthogonal, and empirically independent. Clearly, these scales assess two independent anger-expression dimensions that are relatively stable over time.

Measurement of Anger Control

The original AX Scale item pool included several items intended to measure the middle range of the anger-in/anger-out continuum. Three of these items (e.g., "Control my temper," "Keep my cool," "Calm down faster") were retained in the final set of 20 AX Scale items because each item had substantial loadings on both the Anger/In and

Anger/Out factors. In subsequent research with the AX Scale, these items tended to coalesce to form the nucleus of an anger control factor (Pollans, 1983), which stimulated the development of the AX Anger Control (AX/Con) Scale. The first step in constructing the AX/Con Scale was to assemble a pool of items with appropriate content. Using the three anger control items as a guide, dictionary and thesaurus definitions of "control" and idioms pertaining specifically to the control of anger were consulted in writing additional anger control items. The new AX/Con items were administered to a large sample of university students along with the 20 AX Scale items. In separate factor analyses for males and females of the AX/Con items, a large anger control (Anger/Con) factor and several very small factors were found for both sexes. The items with the strongest loadings on this factor for both males and females were added to the three original AX/Con items to form the 8-item AX/Con Scale. To evaluate the independence of the Anger/Con factor, and to determine its relation to the Anger/In and Anger/Out factors, the items comprising the 8-item AX/Con, AX/Out, and AX/In scales were administered to a large sample of university students (Spielberger, Krasner, & Solomon, 1988). In factor analyses of responses to the 24 AX items, an Anger/Con factor was the strongest to emerge for both males and females; all 8 AX/Con items had dominant salient loadings on this factor. Well-defined Anger/In and Anger/Out factors were also found; all AX/In and AX/Out items had dominant salient loadings on the appropriate factor. For both sexes, the AX/Con Scale correlated negatively with AX/Out (r = -.59 and -.58 for males and females, respectively). Correlations of the AX/In and AX/Out scales were essentially zero for both sexes. 
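The internal consistency and scale intercorrelation statistics reported for the AX subscales can be made concrete with a short Python sketch. It assumes that a subscale score is the simple sum of its item ratings on the 1-to-4 frequency scale, computes coefficient alpha (Cronbach, 1951) from a respondents-by-items table, and computes a Pearson correlation between two sets of scale scores. All function names and the toy ratings are hypothetical; none of the numbers are data from the studies cited.

```python
from statistics import mean, pvariance


def scale_score(ratings):
    """Sum one respondent's item ratings (each 1-4) into a scale score.

    Assumption: the scale score is an unweighted sum of the ratings.
    """
    if any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("each rating must be an integer from 1 to 4")
    return sum(ratings)


def cronbach_alpha(table):
    """Coefficient alpha for a list of rows (respondents) of item ratings."""
    k = len(table[0])  # number of items
    item_variances = [pvariance([row[j] for row in table]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in table])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Toy ratings for four hypothetical respondents on a three-item scale.
toy_ratings = [
    [1, 1, 2],
    [2, 2, 2],
    [3, 4, 3],
    [4, 4, 4],
]
alpha = cronbach_alpha(toy_ratings)
print("alpha:", round(alpha, 3))
```

With eight items rated 1 to 4, summed scores under this assumption range from 8 to 32. Applying `pearson_r` to respondents' AX/In and AX/Out scores would give the kind of scale intercorrelation the text describes as essentially zero.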
The independence of the AX/In and AX/Out subscales and moderately high negative correlations of AX/Con with AX/Out have been consistently demonstrated (Pollans, 1983; Spielberger, 1988; Spielberger et al., 1985). Evidence of the concurrent (convergent) and discriminant validity of the AX Scale is reflected in the correlations of these scales with other anger and personality measures (Spielberger, 1988). Moderately high correlations of AX/Out scores with T-Anger and T-Anger/T scores, and smaller correlations of both AX/Out and AX/In scores with T-Anger/R, suggest that persons with an angry temperament are more likely to express their anger outwardly than to suppress it, whereas individuals who frequently experience anger when they are frustrated or treated unfairly are equally likely to suppress or outwardly express their anger. Small but highly significant correlations of both AX/In and AX/Out with the STPI T-Anxiety Scale suggested that individuals who suppress or express anger more often are also likely to experience anxiety more frequently than individuals with low anger expression scores. Correlations of all three anger expression measures with the STPI T-Curiosity scale were essentially zero, providing evidence of discriminant validity. A major reason for constructing the AX Scale was to develop an instrument to facilitate investigation of how various components of anger contribute to the etiology of hypertension and coronary heart disease. Harburg et al. (1973, 1979) and Gentry et al. (1981, 1982), as previously noted, have reported that individuals who tend to suppress anger have higher systolic and diastolic blood pressure, and Williams et al. (1980) found that patients with high scores on the MMPI Ho scale were more likely to develop coronary artery disease.
Similarly, Dembroski, MacDougall, Williams, and Haney (1985) found that high ratings of potential for hostility and anger-in were positively associated with angiographically documented severity of coronary atherosclerosis. Johnson (1984) administered the AX Scale to 1,114 high school students in an investigation of the relation between anger expression and blood pressure (BP). Measures

of systolic (SBP) and diastolic (DBP) blood pressure were obtained during the same class period in which these students responded to the psychological tests. The correlations of AX/In scores with SBP and DBP were positive, curvilinear, and highly significant for both sexes. Although no relation was found between suppressed anger and BP over 60% to 80% of the range of AX/In scores, students with very high AX/In scores had much higher BP. Because the correlations of AX/Out scores with BP were quite small, the overall pattern of correlations indicated that higher blood pressure was associated with holding anger in. Johnson (1984) also examined the influence of a number of variables found to be related to BP in previous research. Height, weight, dietary factors (salt intake), racial differences, and family history of hypertension correlated significantly with BP, but even after partialing out the influence of these variables, AX/In scores were still positively and significantly associated with elevated SBP and DBP. Indeed, in separate multiple regression analyses for males and females, AX/In scores were found to be better predictors of blood pressure than any other measure (i.e., AX/In scores were first to enter stepwise multiple discriminant equations for both sexes).

The State-Trait Anger Expression Inventory

The STAS and the AX Scale were combined to form the State-Trait Anger Expression Inventory (STAXI), which provides relatively brief, objectively scored measures of the experience, expression, and control of anger (Spielberger, 1988). The STAXI consists of 44 items, which form five primary scales and two subscales. The components of anger assessed by each STAXI scale are described in Table 32.1. Fuqua et al. (1991) administered the STAXI to a large sample of college students and factor analyzed their responses to the 44 individual items.
TABLE 32.1
Definitions of the Components of Anger Assessed by the Subscales of the State-Trait Anger Expression Inventory

Scale: Anger Component Measured by Each STAXI Scale
S-Anger (10 items): An emotional state marked by subjective feelings that vary in intensity, from mild annoyance or irritation to intense fury and rage, accompanied by activation of the autonomic nervous system. The intensity of S-Anger varies as a function of perceived injustice, being attacked or treated unfairly by others, or frustration resulting from barriers to goal-directed behavior.
T-Anger (10 items): Individual differences in anger proneness, i.e., the tendency to perceive a wide range of situations as annoying or frustrating, and to respond with elevations in S-Anger. High T-Anger individuals experience S-Anger more often, and with greater intensity, than persons low in T-Anger.
T-Anger/T (4 items): Individual differences in a general disposition to experience anger with little or no specific provocation.
T-Anger/R (4 items): Individual differences in the disposition to feel angry when criticized or treated unfairly.
AX/In (8 items): Individual differences in the frequency that angry feelings are experienced, but held in or suppressed.
AX/Out (8 items): Individual differences in the frequency that feelings of anger are expressed in aggressive behavior directed toward other people or objects in the environment.
AX/Con (8 items): Individual differences in the frequency that an individual attempts to control the outward expression of angry feelings.
AX/EX (24 items): This measure provides a general index of the frequency that anger is experienced and expressed, irrespective of the direction of expression (expressed inwardly or outwardly).

The results of this analysis led

these investigators to conclude "that seven factors provided the best fit of the data to the instrument and its theoretical foundations" (p. 442). The first six factors identified by Fuqua et al. (1991) corresponded almost exactly to the five primary STAXI scales, except that items from the STAXI T-Anger Scale loaded on separate T-Anger Temperament and Reaction factors. Almost all of the items corresponding to each STAXI scale had salient loadings on the appropriate factor and negligible loadings on the other factors. In addition to providing confirmation of the factor structure of the STAXI, these findings provide strong evidence that the STAXI scales and subscales measure meaningful, relatively independent components of the experience, expression, and control of anger. The seventh factor identified by Fuqua et al. (1991) was defined by secondary but salient loadings for three of the 10 STAXI S-Anger items (i.e., "Feel like . . . breaking things," ". . . banging on the table," ". . . hitting someone"). Although these items had higher loadings on Factor I, the original S-Anger factor, which was defined by dominant salient loadings for all 10 S-Anger items, the findings of Fuqua et al. nevertheless suggested that there might be a second S-Anger factor. The content of the items with salient loadings on this factor seemed to reflect high levels of S-Anger associated with a strong instigation to express angry feelings in aggressive behavior. Van der Ploeg (1988) administered a Dutch adaptation of the 20-item State-Trait Anger Scale (STAS) to male military draftees in The Netherlands. In a separate factor analysis of the 10 S-Anger items, he also found two S-Anger factors that were quite similar to those reported by Fuqua et al. (1991). Factor analyses of the 44-item STAXI provide further evidence of two distinctive, yet highly correlated S-Anger factors (D.G. Forgays, D.K. Forgays, & Spielberger, 1997).
The two S-Anger factors that emerged in these analyses have been defined as Feeling Angry (e.g., "I feel angry") and Feel Like Expressing Anger (e.g., "I feel like hitting someone"). Recent research findings with an expanded version of the STAXI identified three S-Anger factors: the Feeling Angry factor that was identified in previous research, and two components of the Feel Like Expressing Anger factor that formed separate factors. These components involve verbal expression (e.g., "feel like screaming") and physical expression (e.g., "I feel like hitting someone"). In the expanded and revised STAXI, all three S-Anger factors will be assessed with 5-item subscales. The revised STAXI will also contain two anger control scales for measuring the control of anger-out (e.g., "Keeping the lid on") and controlling suppressed anger by reducing its intensity (e.g., "calming down," "cooling off").

Guidelines for Interpreting STAXI Scores and Research Applications

The STAXI has proved useful for assessing the experience, expression, and control of anger in normal and abnormal individuals (Deffenbacher, 1992; Moses, 1992), and for evaluating the role of these anger components in a variety of disorders, including alcoholism, hypertension, coronary heart disease, and cancer (Spielberger, 1988). Comparing STAXI test scores with appropriate scale norms is an important first step in test interpretation. Norms for the STAXI scales are reported in the test manual for male and female high school and college students, and working adults (Spielberger, 1988). In addition, there are norms for the following special interest groups: general medical and surgical patients, prison inmates, and military recruits. The distributions of scores on the S-Anger and T-Anger/T scales are positively skewed, which prevents these scales from effectively discriminating among respondents

with low scores. However, low scores on the other STAXI scales may provide useful information that contributes to understanding the personality dynamics of an individual with such scores. Individuals who score below the 25th percentile on the T-Anger, AX/In, and AX/Out scales generally experience, express, or suppress relatively little anger. However, low scores on these scales when AX/Con scores are very high may indicate excessive use of the defenses of denial and repression to protect an individual from experiencing unacceptable angry feelings. General guidelines for interpreting high scores for each of the STAXI scales are provided in Table 32.2.

TABLE 32.2
Guidelines for Interpreting High STAXI Scores

Scale: Characteristics of Persons with High Scores
S-Anger: Individuals with high scores are experiencing relatively intense angry feelings at the time the test was administered. If S-Anger is elevated relative to T-Anger, the individual's angry feelings are likely to be situationally determined. Elevations in S-Anger are more likely to reflect chronic anger if T-Anger and AX/In scores are also high.
T-Anger: High T-Anger individuals frequently experience angry feelings, especially when they feel they are treated unfairly by others. Whether persons high in T-Anger suppress, express, or control their anger can be inferred from their scores on the AX/In, AX/Out, and AX/Con scales.
T-Anger/T: Persons with high T-Anger/T scores are quick-tempered and readily express their anger with little provocation. Such individuals are often impulsive and lacking in anger control. High T-Anger/T individuals who have high AX/Con scores may be strongly authoritarian and use anger to intimidate others.
T-Anger/R: Persons with high T-Anger/R scores are highly sensitive to criticism, perceived affronts, and negative evaluation by others. They frequently experience intense feelings of anger under such circumstances.
AX/In: Persons with high AX/In scores frequently experience intense angry feelings, but tend to suppress these feelings rather than to express them either physically or in verbal behavior. Persons with high AX/In scores who also have high AX/Out scores may express their anger in some situations, while suppressing it in others.
AX/Out: Persons with high AX/Out scores frequently experience anger which they express in aggressive behavior. Anger-out may be expressed in physical acts such as assaulting other persons or slamming doors, or verbally in the form of criticism, sarcasm, insults, threats, and the extreme use of profanity.
AX/Con: Persons with high scores on the AX/Con scale tend to invest a great deal of energy in monitoring and preventing the expression of anger. Although controlling anger is certainly desirable, the overcontrol of anger may result in passivity and withdrawal. Persons with high AX/Con and high T-Anger scores may also experience anxiety and depression.

Percentile ranks reported in the STAXI manual corresponding to STAXI scale scores (Spielberger, 1988) indicate how a particular person compares with other individuals who are similar in age and gender. Scores between the 25th and 75th percentile on individual STAXI scales fall in what may be considered the normal range. Although individuals with scale scores that approach the 75th percentile are generally more prone to experience, outwardly express, or suppress anger than those with scores below the median, such differences are generally not sufficient to detect persons whose anger problems may predispose them to develop physical or psychological disorders (Spielberger, 1988). Individuals with anger scores above the 75th percentile are likely to experience and/or express angry feelings to a degree that may interfere with optimal functioning. The anger of these individuals may contribute to difficulties in interpersonal relationships, or dispose them to develop psychological or physical disorders. High AX/In scores,

especially when associated with low AX/Out scores and high levels of anxiety, have been found to be associated with elevated blood pressure (Johnson, 1984) and hypertension (Crane, 1981). Very high scores on both the AX/In and AX/Out scales (above the 90th percentile) may place an individual at risk for coronary artery disease and heart attacks, especially for persons who are high in anger control. The STAS and the AX scales have been used extensively in research on the relation between anger and health (Brooks, Walfish, Stenmark, & Canger, 1981; Cavanaugh, Kanonchoff, & Bartels, 1987; Johnson & Broman, 1987; Johnson-Saylor, 1984; Schlosser, 1986; Vitaliano, 1984; Vitaliano et al., 1986). With the development of the improved STAXI measures for assessing the experience and expression of anger, suppressed anger has been consistently identified as an important factor in elevated BP and hypertension (e.g., Deshields, 1986; Gorkin, Appel, Holroyd, Saab, & Stauder, 1986; Hartfield, 1985; Johnson, Spielberger, Worden, & Jacobs, 1987; Kearns, 1985; Schneider, Egan, & Johnson, 1986; Spielberger et al., 1985, 1988; van der Ploeg, van Buuren, & van Brummelen, 1988). McMillan (1984) used the STAXI scales to assess the anger experienced by patients undergoing treatment for Hodgkin's disease and lung cancer. The STAXI scales have also been used to examine relations between hardiness, well-being, and coping with stress (Schlosser & Sheeley, 1985a, 1985b), and to investigate the role of anger in Type-A behavior (Booth-Kewley & Friedman, 1987; Croyle, Jemmott, & Carpenter, 1988; Goffaux, Wallston, Heim, & Shields, 1987; Herschberger, 1985; Janisse, Edguer, & Dyck, 1986; Krasner, 1986; Spielberger et al., 1988).
Kinder and his colleagues (Curtis, Kinder, Kalichman, & Spana, 1988; Kinder, Curtis, & Kalichman, 1986) used the STAXI scales in a series of studies of psychological factors that contribute to chronic pain, and Stoner (1988) investigated the effects of marijuana use on the experience and expression of anger. The STAXI scales have also been used in research on the effects of situational factors on the experience and expression of anger (Aragona, 1983; Buck, 1987; Pape, 1986).

Assessment of Emotions in Treatment Planning

The DSM-IV provides criteria for diagnosing anxiety disorders (American Psychiatric Association, 1994), but little attention has been given to the classification of problems with anger (Deffenbacher, 1992). Nevertheless, the assessment of both anger and anxiety is essential in planning an effective treatment program, and in evaluating the relative efficacy of different forms of behavioral and pharmacological interventions. Because the management of anxiety and anger during treatment is among the chief concerns of most psychotherapists and counselors, the valid assessment of these emotions can facilitate the treatment process (Deffenbacher, Demm, & Brandon, 1986). Consequently, obtaining reliable and valid measures of state and trait anxiety, and carefully assessing the experience, expression, and control of anger are essential in selecting an optimal form of treatment, monitoring the treatment process, and in evaluating treatment outcome.

Assessing Anxiety in Treatment Planning and Evaluation

Symptoms of anxiety are typically found in almost all emotional disorders. From a psychoanalytic perspective, as was previously noted, Freud (1936, p. 85) regarded anxiety as the "fundamental phenomenon and the central problem of neurosis." According to

de la Torre (1979), dealing with transitory anxiety (S-Anxiety) must also be a major priority in all forms of short-term psychotherapy, including crisis intervention and dynamic treatments that focus on specific problems of the patient or client, such as test anxiety (Spielberger & Vagg, 1995). Diverse manifestations of anxiety in various physical and psychological disorders generally require different forms of treatment. As de la Torre (1979) noted, "The ubiquitousness of anxiety among psychiatric patients demands a careful assessment and diagnosis. The transitory anxiety in a well-compensated individual differs considerably from the intense anxiety that heralds psychotic decompensation. Both situations require different kinds of interventions and will have different prognostic outcomes" (p. 379). The STAI has been used to assess state and trait anxiety in more than 9,000 investigations, including psychological and pharmacological studies of the treatment of psychiatric, psychosomatic, and medical patients (Spielberger, 1989). The assessment of anxiety as a personality trait (T-Anxiety) is especially important in evaluating treatment outcomes in phobias (Foa & Kozak, 1985), and in panic and generalized anxiety disorders (Barlow, 1985). Careful assessment of anxiety is also essential in applications of systematic desensitization to the treatment of phobic patients, and in clients with conditioned aversion reactions (Suinn & Deffenbacher, 1988). The STAI has also been used extensively in test anxiety treatment studies. Test anxious individuals manifest high levels of S-Anxiety during examinations, which contributes to impaired test performance (Spielberger, Anton, & Bedell, 1976). It has been demonstrated that systematic desensitization, rational-emotive therapy, cognitive-behavioral interventions, and even relaxation training are all successful in reducing S-Anxiety in testing situations. 
However, cognitive treatment strategies appear to be more effective for reducing both test anxiety and level of T-Anxiety in test anxious students (Spielberger et al., 1976).

Assessing Anger in Treatment Planning and Evaluation

Deffenbacher (1992) reported research findings from a series of studies that have important implications for the clinical assessment and treatment of anger-related problems. In these studies, high T-Anger subjects experienced heightened S-Anger and physiological arousal on a daily basis, which could be targeted for behavioral treatment such as relaxation training and coping skills programs (Deffenbacher et al., 1986; Deffenbacher & Stark, 1992; Hazaleus & Deffenbacher, 1986). By helping clients learn to lower anger by engaging in self-initiated relaxation exercises, successful treatment frees them to use more effective problem-solving and social skills that were previously disrupted by anger-related cognitions, and unpleasant and distracting physiological arousal associated with heightened states of anger. Deffenbacher's (1992) consistent finding that high anger individuals experience anger across a wide range of ongoing daily situations has important implications for clinical treatment. His research suggests that emotional states of anger can be conceptualized as a complex cognitive-psychophysiological phenomenon embedded in a specific situational context. Effective treatment requires that all aspects of this phenomenon be carefully assessed, along with the behaviors triggered by or associated with anger. Deffenbacher recommended that a number of different measurement strategies be used in assessing anger, such as interviewing, role play, and self-monitoring, so that the range of real and potential sources of anger may be mapped. He further suggested that, in the latter stages

of therapy, it may be appropriate to use self-monitoring measures of S-Anger, to provide opportunities for assessment, rehearsal, and transfer of skills and insights. Assessment of when, where, and why clients employ different anger expression strategies will not only contribute to clarifying the nature of anger and its expression, but will also help to identify adaptive strategies that can be used effectively in anger-provoking situations. High trait anger individuals seem to interpret many situations as insulting and frustrating (Beck, 1976). Maladaptive anger is related to serious personality problems, including difficulties in interpersonal relations and many health-related disorders (Hazaleus & Deffenbacher, 1985; Hogg & Deffenbacher, 1986; Story & Deffenbacher, 1985; Zwemer & Deffenbacher, 1984). Therefore, effective strategies for controlling anger are urgently needed in treatment planning (Deffenbacher, 1992). Effective treatment of anger-related problems also requires detailed knowledge concerning individuals' experience of both state and trait anger and their modes of anger expression (Sharkin, 1988). Careful assessment of the experience, expression, and control of anger is not only essential for understanding problems rooted in anger, but assessment is also a necessary first step in treatment planning. Because of the multidimensionality of anger, multifaceted interventions are likely to be required in order to produce beneficial treatment outcomes (Deffenbacher, 1991; Novaco, 1979). According to Deffenbacher (1992), therapeutic strategies for dealing with anger and anxiety should include psychodynamic, self-explorative, behavioral, and cognitive interventions to help patients perceive the world as less threatening. If successful, such interventions would help patients feel less vulnerable, thereby reducing personal frustration and decreasing the intensity and frequency of angry reactions. 
Relaxation exercises, social skills training, and cognitive-behavioral interventions have proved effective in decreasing levels of anxiety and anger (Deffenbacher et al., 1986; Deffenbacher, Story, Stark, Hogg, & Brandon, 1987). Research with the STAI and, more recently, with the STAXI and its subscales provides encouraging evidence of the utility of these inventories in treatment planning, and in the evaluation of treatment process and outcome. In a recent comprehensive evaluation and critique, Moses (1992) concluded that the STAXI is a "specific, sensitive, psychometric instrument," and that "If future applications of the STAXI are as experimentally rigorous as the development of this measure, there is great potential for its use to significantly further our understanding of important stress-based and stress-influenced syndromes and to help in identifying effective means by which such disorders may be reversed and prevented" (p. 524).

Conclusions

Recent advances in the conceptualization of anxiety and anger have stimulated the development of improved instruments for the measurement of these emotions. Early theories of anxiety, and conceptual ambiguity and confusion in current theoretical interpretations of anger, hostility, and aggression were examined. A number of techniques and procedures that have been developed to assess anxiety were discussed, and the construction and validation of a psychometric inventory designed to assess state and trait anxiety was reviewed. Research on anger and hostility was examined, and the procedures employed in developing and validating a psychometric instrument for measuring the experience, expression, and control of anger were described in detail. The chapter concluded with a discussion of issues concerning the utilization of measures of

anxiety and anger in treatment planning, and in the evaluation of therapeutic interventions with individuals experiencing anxiety and anger-related problems.

References

Alexander, F.G. (1939). Emotional factors in essential hypertension: Presentation of a tentative hypothesis. Psychosomatic Medicine, 1, 175-179.
Alexander, F.G. (1948). Emotional factors in hypertension. In F. Alexander & T.M. French (Eds.), Studies in psychosomatic medicine: An approach to the cause and treatment of vegetative disturbances. New York: Ronald. (Original work published 1939)
Alexander, F.G., & French, T.M. (Eds.). (1948). Studies in psychosomatic medicine: An approach to the cause and treatment of vegetative disturbances. New York: Ronald Press.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed., rev.). Washington, DC: American Psychiatric Association.
Aragona, J.C. (1983). Physical child abuse: An interactional analysis. (Doctoral dissertation, University of South Florida, Tampa, 1983). Dissertation Abstracts International, 44, 1225B.
Atkinson, J. (1964). An introduction to motivation. Princeton, NJ: Van Nostrand-Reinhold.
Averill, J.R. (1982). Anger and aggression: An essay on emotion. New York: Springer-Verlag.
Barker, B.M., Barker, H.R., & Wadsworth, A.P., Jr. (1977). Factor analysis of the items of the State-Trait Anxiety Inventory. Journal of Clinical Psychology, 33, 450-455.
Barker, L.R. (1979). Personality variables as determinants of performance problems of recruits in the U.S. armed forces. Unpublished master's thesis, University of South Florida, Tampa.
Barlow, D.H. (1985). The dimensions of anxiety disorders. In A.H. Tuma & J.D. Maser (Eds.), Anxiety and the anxiety disorders (pp. 479-500). Hillsdale, NJ: Lawrence Erlbaum Associates.
Beck, A. (1976). Cognitive therapy and the emotional disorders. New York: International Universities Press.
Bendig, A.W. (1962). Factor analytic scales of covert and overt hostility. Journal of Consulting Psychology, 26, 200.
Biaggio, M.K. (1980). Assessment of anger arousal. Journal of Personality Assessment, 44, 289-298.
Biaggio, M.K., & Maiuro, R.D. (1985). Recent advances in anger assessment. In C.D. Spielberger & J.N. Butcher (Eds.), Advances in personality assessment (Vol. 5, pp. 71-111). Hillsdale, NJ: Lawrence Erlbaum Associates.
Biaggio, M.K., Supplee, K., & Curtis, N. (1981). Reliability and validity of four anger scales. Journal of Personality Assessment, 45, 639-648.
Booth-Kewley, S., & Friedman, H.S. (1987). Psychological predictors of heart disease: A quantitative review. Psychological Bulletin, 101, 343-362.
Borkovec, T.D., Weerts, T.C., & Bernstein, D.A. (1977). Assessment of anxiety. In A.R. Ciminero, K.S. Calhoun, & H.E. Adams (Eds.), Handbook of behavioral assessment (pp. 367-428). New York: Wiley.
Brooks, M.L., Walfish, S., Stenmark, D.E., & Canger, J.M. (1981). Personality variables in alcohol abuse in college students. Journal of Drug Education, 11, 185-189.
Buck, D.K. (1987). The impact of parental divorce on levels of anger and anxiety in young adults in comparison to young adults whose parents are not divorced. Unpublished master's thesis, Ohio University, Athens.
Buss, A.H. (1961). The psychology of aggression. New York: Wiley.
Buss, A.H., & Durkee, A. (1957). An inventory for assessing different kinds of hostility. Journal of Consulting Psychology, 21, 343-349.
Caine, T.M., Foulds, G.A., & Hope, K. (1967). Manual of the hostility and direction of hostility questionnaire (HDHQ). London: University of London Press.
Campbell, D.T. (1963). Social attitudes and other acquired behavioral dispositions. In S.

< previous page

page_1016

next page >

< previous page

page_1017

next page > Page 1017

Koch (Ed.), Psychology: A study of a science (Vol. 6, pp. 94-172). New York: McGraw-Hill. Cattell, R.B. (1966). Patterns of change: Measurement in relation to state-dimension, trait change, liability, and process concepts. In Handbook of multivariate experimental psychology. Chicago: Rand McNally. Cattell, R.B., & Scheier, I.H. (1958). The nature of anxiety: A review of thirteen multivariate analyses comprising 814 variables. Psychological Reports, 4, 351. Cattell, R.B., Scheier, I.H. (1961). The meaning and measurement of neuroticism and anxiety (pp. 57, 182). New York: Ronald. Cattell, R.B., & Scheier, I.H. (1963). Handbook for the IPAT Anxiety Scale (2nd ed.). Champaign, IL: Institute for Personality and Ability Testing. Cavanaugh, D.J., Kanonchoff, A.D., & Bartels, R.L. (1987). Menstrual irregularities in athletic women may be predictable based on pre-training menses. Unpublished manuscript, Ohio State University, Department of Work Psychology, Columbus. Cook, W.W., & Medley, D.M. (1954). Proposed hostility and pharisaic-virtue scales for the MMPI. Journal of Applied Psychology, 38, 414-418. Crane, R.S. (1981). The role of anger, hostility, and aggression in essential hypertension. (Doctoral dissertation, University of South Florida, Tampa, 1981). Dissertation Abstracts International, 42, 2982B. Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-335. Croyle, R.T., Jemmott, J.B., III, & Carpenter, B. D. (1988). Relations between four individual difference measures associated with cardiovascular dysfunction and anger coping style. Psychological Reports, 63, 779786. Curtis, G., Kinder, B., Kalichman, S., & Spana, R. (1988). Affective differences among subgroups of chronic pain patients. Anxiety Research: An International Journal, 1, 65-73. Darwin, C. (1965). The expression of emotions in man and animals. Chicago: University of Chicago Press. (Original work published 1872) Deffenbacher, J.L. (1991, July). 
Cognitive-behavioral approaches to general anger reduction. In J.F.A. Cruz (Ed.), Proceedings of the International Congress on Stress, Anxiety, and the Emotional Disorders. Braga, Portugal. Deffenbacher, J.L. (1992). Trait anger: Theory, findings, and implications. In C.D. Spielberger & J.N. Butcher (Eds.), Advances in Personality Assessment (Vol. 9, pp. 177-201). Hillsdale, NJ: Lawrence Erlbaum Associates. Deffenbacher, J.L., Demm, P.M., & Brandon, A.D. (1986). High general anger: Correlates and treatment. Behavior Research and Therapy, 24, 480-489. Deffenbacher, J.L., & Stark, R.S. (1992). Cognitive and cognitive-relaxation treatments of general anger. Journal of Counseling Psychology, 39, 158-167. Deffenbacher, J.L., Story, D.A., Stark, R.S., Hogg, J.A., & Brandon, A.D. (1987). Cognitive-relaxation and social skills interventions in the treatment of general anger. Journal of Counseling Psychology, 34, 171-176. de la Torre, J. (1979). Anxiety states and short-term psychotherapy. In W.E. Fann, I. Karacan, A.D. Polorny, & R.L. Williams (Eds.), Phenomenology and treatment of anxiety (pp. 377-388). Jamaica, NY: Spectrum. Dembroski, T.M., MacDougall, J.M., Williams, R.B., & Haney, T.L. (1985). Components of Type A, hostility, and anger-in: Relationship to angiographic findings. Psychosomatic Medicine, 47, 219-233. Deshields, T.L. (1986). Anger and assertiveness in essential hypertension. Dissertation Abstracts International, 46, 3212B. (University Microfilms No. 85-24, 330.) Duffy, F. (1941). An explanation of "emotional" phenomena without the use of the concept "emotion." Journal of General Psychology, 25, 283-293. Evans, D.R., & Stangeland, M. (1971). Development of the reaction inventory to measure anger. Psychological Reports, 29, 412-414. Eysenck, H.J., & Eysenck, S.B.G. (1975). Manual of the Eysenck Personality Questionnaire. London: Hodder & Stroughton. Foa, E.B., & Kozak, M.J. (1985). Treatment of anxiety disorders: Implications for psychopathology. In A.H. 
Tuma & J.D. Maser (Eds.), Anxiety and the anxiety disorders (pp. 421-452). Hillsdale, NJ: Lawrence Erlbaum Associates. Forgays, D.G., Forgays, D.K., & Spielberger, C.D. (1997). Factor structure of the State-Trait Anger Expression Inventory for young adults. Journal of Personality Assessment, 69, 497-507.

< previous page

page_1018

next page > Page 1018

Freud, S. (1924). Collected papers (Vol. 1). London: Hogarth. Freud, S. (1936). The problem of anxiety. New York: Norton. Funkenstein, D.H., King, S.H., & Drolette, M. E. (1954). The direction of anger during a laboratory stressinducing situation. Psychosomatic Medicine, 16, 404-413. Fuqua, D.R., Leonard, E., Masters, M.A., Smith, R.J., Campbell, J.L., & Fischer, P.C. (1991). A structural analysis of the State-Trait Anger Expression Inventory (STAXI). Educational and Psychological Measurement, 51, 439-446. Gaudry, E., & Poole, C. (1975). A further validation of the state-trait distinction in anxiety research. Australian Journal of Psychology, 27, 119-125. Gaudry, E., Spielberger, C.D., & Vagg, P.R. (1975). Validation of the state-trait distinction in anxiety research. Multivariate Behavior Research, 10, 331-341. Gentry, W.D. (1972). Biracial aggression: 1. Effect of verbal attack and sex of victim. Journal of Social Psychology, 88, 75-82. Gentry, W.D., Chesney, A.P., Gary, H.G., Hall, R.P., & Harburg, E. (1982). Habitual anger-coping styles: 1. Effect on mean blood pressure and risk for essential hypertension. Psychosomatic Medicine, 44, 195-202. Gentry, W.D., Chesney, A.P., Hall, R.P., & Harburg, E. (1981). Effect of habitual anger-coping pattern on blood pressure in black/white, high/low stress area respondents. Psychosomatic Medicine, 43, 88. Goffaux, J., Wallston, B.S., Heim, C.R., & Shields, S.L. (1987, March). Type A behaviors, hostility, anger and exercise adherence. Paper presented at the Eighth Annual Session of the Society of Behavioral Medicine, Washington, DC. Gorkin, L., Appel, M., Holroyd, K.A., Saab, P. G., & Stauder, L. (1986). Anger management style and family history status as risk factors for essential hypertension. Unpublished manuscript, Ohio University, Athens. Hamilton, M. (1959). The assessment of anxiety states by rating. British Journal of Medical Psychology, 32, 50. Harburg, E., Blakelock, E.H., & Roeper, P.J. (1979). 
Resentful and reflective coping with arbitrary authority and blood pressure: Detroit. Psychosomatic Medicine, 3, 189-202. Harburg, E., Erfurt, J.C., Hauenstein, L.S., Chape, C., Schull, W.J., & Schork, M.A. (1973). Socio-ecological stress, suppressed hostility, skin color, and Black-White male blood pressure: Detroit. Psychosomatic Medicine, 35, 276-296. Harburg, E., & Hauenstein, L. (1980). Parity and blood pressure among four race-stress groups of females in Detroit. American Journal of Epidemiology, 111, 356-366. Harburg, E., Schull, W.J., Erfurt, J.C., & Schork, M.A. (1970). A family set method for estimating heredity and stress-I. Journal of Chronic Disease, 23, 69-81. Hartfield, M.T. (1985). Appraisals of anger situations and subsequent coping responses in hypertensive and normotensive adults: A comparison. (Doctoral dissertation, University of California, 1985). Dissertation Abstracts International, 46, 4452B. Hazaleus, S.L., & Deffenbacher, J.L. (1985). Irrational beliefs and anger arousal. Journal of College Student Personnel, 26, 47-52. Hazaleus, S.L., & Deffenbacher, J.L. (1986). Relaxation and cognitive treatments of anger. Journal of Consulting and Clinical Psychology, 54, 222-226. Herschberger, P. (1985). Type A behavior in non-intensive and intensive care nurses. Unpublished master's thesis, University of South Florida, Tampa. Hodges, W.F. (1976). The psychophysiology of anxiety. In M. Zuckerman & C.D. Spielberger (Eds.), Emotions and anxiety: New concepts, methods, and applications (pp. 175-194). Hillsdale, NJ: Lawrence Erlbaum Associates. Hogg, J.A., & Deffenbacher, J.L. (1986). Irrational beliefs, depression and anger in college students. Journal of College Student Personnel, 27, 349-353. Jacobs, G.A., Latham, L.E., & Brown, M.S. (1988). Test-retest reliability of the State-Trait Personality Inventory and the Anger Expression Scale. Anxiety Research, 1, 263-265. Janisse, M.P., Edguer, N., & Dyck, D.G. (1986). 
Type A behavior, anger expression, and reactions to anger imagery. Motivation and Emotion, 10, 371-385. Johnson, E., Spielberger, C., Worden, T., & Jacobs, G. (1987). Emotional and familial determinants of elevated blood pressure in

< previous page

page_1019

next page > Page 1019

black and white adolescent males. Journal of Psychosomatic Research, 31, 287-300. Johnson, E.H. (1984). Anger and anxiety as determinants of elevated blood pressure in adolescents. Unpublished doctoral dissertation, University of South Florida, Tampa. Johnson, E.H., & Broman, C.L. (1987). The relationship of anger expression to health problems among Black Americans in a national survey. Journal of Behavioral Medicine, 10, 103-169. Johnson-Saylor, M.T. (1984). Relationships among anger expression, hostility, hardiness, social support, and health risk. Unpublished doctoral dissertation, University of Michigan, Ann Arbor. Kearns, W.D. (1985). A laboratory study of the relationship of mode of anger expression to blood pressure. Unpublished master's thesis, University of South Florida, Tampa. Kendall, P.C., Finch, A.J., Jr., Auerbach, S. M., Hooke, J.F., & Mikulka, P.J. (1976). The State-Trait Anxiety Inventory: A systematic evaluation. Journal of Consulting and Clinical Psychology, 44, 406-412. Kinder, B., Curtis, G., & Kalichman, S. (1986). Anxiety and anger as predictors of MMPI elevations in chronic pain patients. Journal of Personality Assessment, 50, 651-661. Knight, R.G., Chisholm, B.J., Paulin, J.M., & Waal-Manning, H.J. (1988). The Spielberger Anger Expression Scale: Some psychometric data. Journal of Clinical Psychology, 27, 279-281. Krasner, S.S. (1986). Anger, anger control, and the coronary prone behavior pattern. Unpublished master's thesis, University of South Florida, Tampa. Lader, M. (1975). Psychophysiological parameters and methods. In L. Levi (Ed.), Emotions: Their parameters and measurement (pp. 341-367). New York: Raven. Lazarus, R.S., & Folkman, S. (1984). Stress, appraisal, and coping. New York: Springer. Lazarus, R.S., & Opton, E.M., Jr. (1966). The study of psychological stress. In C.D. Spielberger (Ed.), Anxiety and behavior (pp. 225-262). New York: Academic Press. Levitt, E.E. (1980). The psychology of anxiety (2nd ed.). 
Hillsdale, NJ: Lawrence Erlbaum Associates. Martin, I. (1973). Somatic reactivity: Methodology. In H.J. Eysenck (Ed.), Handbook of abnormal psychology (2nd ed., pp. 417-456). San Diego: Knapp. May, R. (1950). The meaning of anxiety. New York: Ronald. McMillan, S.C. (1984). A comparison of levels of anxiety and anger experienced by 2 groups of cancer patients during therapy for Hodgkin's disease and small cell lung cancer. Unpublished master's thesis, University of South Florida, Tampa. McReynolds, P. (1968). The assessment of anxiety: A survey of available techniques. In P. McReynolds (Ed.), Advances in psychological assessment (Vol. 1, pp. 244-264). Palo Alto, CA: Science and Behavior Books. Moses, J.A. (1992). State-Trait Anger Expression Inventory (research ed.). In D.J. Keyser & R.C. Sweetland (Eds.), Test critiques (Vol. 9, pp. 510-525). Austin, TX: PRO-ED. Novaco, R.W. (1975). Anger control: The development and evaluation of an experimental treatment. Lexington, MA: Lexington Heath. Novaco, R.W. (1979). The cognitive regulation of anger and stress. In P.C. Kendall & S.D. Hollon (Eds.), Cognitive behavioral interventions, theory, research, and procedures (pp. 241-285). New York: Academic Press. Pape, N.E. (1986). Emotional reactions and anger coping strategies of anger suppressors and expression. (Doctoral dissertation, University of South Florida, Tampa, 1986). Dissertation Abstracts International, 47, 2627B. Plutchik, R. (1962). The emotions. New York: Random House. Pollans, C.H. (1983). The psychometric properties and factor structure of the Anger EX-pression (AX) Scale. Unpublished master's thesis, University of South Florida, Tampa. Rosenzweig, S. (1976). Aggressive behavior and the Rosenzweig picture frustration study. Journal of Clinical Psychology, 32, 885-891. Rosenzweig, S. (1978). The Rosenzweig Picture-Frustration (P-F) Study basic manual and adult form supplement. St. Louis: Rana. Russell, S.F. (1981). 
The factor structure of the Buss-Durkee Hostility Inventory. Unpublished master's thesis, University of South Florida, Tampa. Schlosser, M.B. (1986, August). Anger, crying, and health among females. Paper presented at the 94th annual

convention of the American Psychological Association, Washington, DC.

< previous page

page_1019

next page >

< previous page

page_1020

next page > Page 1020

Schlosser, M.B., & Sheeley, L.A. (1985a, August). The hardy personality: Females coping with stress. Paper presented at the 93rd annual convention of the American Psychological Association, Los Angeles, CA. Schlosser, M.B., & Sheeley, L.A. (1985b, August). Subjective well-being and the stress process. Paper presented at the 93rd annual convention of the American Psychological Association, Los Angeles, CA. Schneider, R.H., Egan, B., & Johnson, E.H. (1986). Anger and anxiety in borderline hypertension. Psychosomatic Medicine, 48, 242-248. Schultz, S.D. (1954). A differentiation of several forms of hostility by scales empirically constructed from significant items on the MMPI. Dissertation Abstracts, 17, 717-720. Sharkin, B.S. (1988). Treatment of client anger in counseling. Journal of Counseling and Development, 66, 361365. Siegel, S. (1956). The relationship of hostility to authoritarianism. Journal of Abnormal and Social Psychology, 52, 368-373. Spielberger, C.D. (1966). Theory and research on anxiety. In C.D. Spielberger (Ed.), Anxiety and behavior (pp. 3-20). New York: Academic Press. Spielberger, C.D. (1972a). Anxiety as an emotional state. In C.D. Spielberger (Ed.), Anxiety: Current trends in theory and research (Vol. 1, pp. 24-49). New York: Academic Press. Spielberger, C.D. (1972b). Current trends in theory and research on anxiety. In C.D. Spielberger (Ed.), Anxiety: Current trends in theory and research (Vol. 1, pp. 3-19). New York: Academic Press. Spielberger, C.D. (1973). Manual for the State-Trait Anxiety Inventory for Children. Palo Alto, CA: Consulting Psychologists Press. Spielberger, C.D. (1976). Stress and anxiety and cardiovascular disease. Journal of the South Carolina Medical Association (Suppl. 15), 15-22. Spielberger, C.D. (1977). Anxiety: Theory and research. In B.B. Wolman (Ed.), International encyclopedia of neurology, psychiatry, psychoanalysis, and psychology (pp. 81-84). New York: Human Sciences Press. Spielberger, C.D. (1979b). 
Understanding stress and anxiety. London: Harper & Row. Spielberger, C.D. (1979a). Preliminary manual for the State-Trait Personality Inventory (STPI). Unpublished manuscript, University of South Florida, Tampa. Spielberger, C.D. (1980). Preliminary manual for the State-Trait Anger Scale (STAS). Tampa, FL: University of South Florida, Human Resources Institute. Spielberger, C.D. (1983). Manual for the State-Trait Anxiety Inventory: STAI (Form Y). Palo Alto, CA: Consulting Psychologists Press. Spielberger, C.D., (1988). Manual for the State-Trait Anger Expression Inventory (STAXI). Odessa, FL: Psychological Assessment Resources. Spielberger, C.D. (1989). State-Trait Anxiety Inventory: A comprehensive bibliography (2nd ed.). Palo Alto, CA: Consulting Psychologists Press. Spielberger, C.D., Anton, W.D., & Bedell, J. (1976). The nature and treatment of test anxiety. In M. Zuckerman & C.D. Spielberger (Eds.), Emotions and anxiety: New concepts, methods and applications (pp. 317-345). New York: Lawrence Erlbaum Associates/Wiley. Spielberger, C.D., & Gorsuch, R.L. (1966). The developoment of the State-Trait Anxiety Inventory. In C.D. Spielberger & R.L. Gorsuch, Mediating processes in verbal conditioning (Final report to the National Institutes of Health, U.S. Public Health Service on Grants MH-7229, MH-7446, and HD-947). Tampa: University of South Florida. Spielberger, C.D., Gorsuch, R.L., & Lushene, R. D. (1970). STAI: Manual for the State-Trait Anxiety Inventory. Palo Alto: Consulting Psychologists Press. Spielberger, C.D., Jacobs, G., Russell, S., & Crane, R. (1983). Assessment of anger: The State-Trait Anger Scale. In J.N. Butcher & C.D. Spielberger (Eds.), Advances in personality assessment (Vol. 2, pp. 159-187). Hillsdale, NJ: Lawrence Erlbaum Associates. Spielberger, C.D., Johnson, E.H., Russell, S. F., Crane, R.J., Jacobs, G.A., & Worden, T. J. (1985). The experience and expression of anger: Construction and validation of an anger expression scale. In M.A. 
Chesney & R.H. Rosenman (Eds.), Anger and hostility in cardiovascular and behavioral disorders (pp. 5-30). New York: Hemisphere/McGraw-Hill. Spielberger, C.D., Krasner, S.S., & Solomon, E. P. (1988). The experience, expression and control of anger. In M.P. Janisse (Ed.), Health psychology: Individual differences and stress (pp. 89-108). New York: Springer-

Verlag.

< previous page

page_1020

next page >

< previous page

page_1021

next page > Page 1021

Spielberger, C.D., & Vagg, P.R. (Eds.) (1995). Test anxiety: theory, assessment, and treatment. Washington, DC: Taylor & Francis. Spielberger, C.D., Vagg, P.R., Barker, L.R., Donham, G.W., & Westberry, L.G. (1980). The factor structure of the State-Trait Anxiety Inventory. In I.G. Sarason & C.D. Spielberger (Eds.), Stress and anxiety (Vol. 7, pp. 95109). Washington, DC: Hemisphere. Stark, R.S., & Deffenbacher, J.L. (1986, April). General anger and self-concept. Paper presented at Rocky Mountain Psychological Association, Denver, CO. Stoner, S.B. (1988). Undergraduate marijuana use and anger. Journal of Psychology, 122, 343-347. Story, D., & Deffenbacher, J.L. (1985, April). General anger and personality. Paper presented at Rocky Mountain Psychological Association, Tucson, AZ. Suinn, R.M., & Deffenbacher, J.C. (1988). Anxiety management training. The Counseling Psychologist, 16, 3149. Tavris, C. (1982). Anger, the misunderstood emotion. New York: Simon & Schuster. Taylor, J.A. (1953). A personality scale of manifest anxiety. Journal of Abnormal Social Psychology, 48, 285. Titchener, E.B. (1897). An outline of psychology. New York: MacMillan. Vagg, P.R., Spielberger, C.D., & O'Hearn, T. P., Jr. (1980). Is the State-Trait Anxiety Inventory multidimensional? Personality and Individual Differences, 1, 202-214. van der Ploeg, H.M. (1988). The factor structure of the State-Trait Anger Scale. Psychological Reports, 63, 978. van der Ploeg, H.M., van Buuren, E.T., & van Brummelen, P. (1988). The role of anger in hypertension. Psychotherapy and Psychosomatics, 43, 186-193. Vitaliano, P.P. (1984). Identification and intervention with students at high risk for distress in medical school. Unpublished doctoral dissertation, University of Washington, Seattle. Vitaliano, P.P., Maiuro, R.D., Russo, J., Mitchell, E.S., Carr, J.E., & van Citters, R. L. (1986). A biopsychosocial model to explain personal sources of medical student distress. 
Proceedings of the 26th Annual Conference on Research in Medical Education, 26, 228-234. Welsh, G.S. (1956). Factor dimensions A and R. In G.S. Welsh & W.G. Dahlstrom (Eds.), Basic readings on the MMPI in psychology and medicine (pp. 264-281). Minneapolis: University of Minnesota Press. Westberry, L.G. (1980). Concurrent validation of the Trait-Anger Scale and its correlation with other personality measures. Unpublished master's thesis, University of South Florida, Tampa. Williams, R.B., Haney, T.L., Lee, K.L., Kong, Y., Blumenthal, J., & Whalen, R.E. (1980). Type A behavior, hostility, and coronary atherosclerosis. Psychosomatic Medicine, 42, 539-549. Wundt, W. (1896). Outlines of psychology. New York: Dustav E. Stechert. Young, P.T. (1943). Emotion in man and animal. New York: Wiley. Zelin, M.L., Adler, G., & Myerson, P.G. (1972). Anger self-report: An objective questionnaire for the measurement of aggression. Journal of Consulting and Clinical Psychology, 39, 340. Zuckerman, M. (1960). Development of an Affect Adjective Check List for the measurement of anxiety. Journal of Consulting Psychology, 26, 291. Zuckerman, M., & Biase, D.V. (1962). Replication and further data on the Affect Adjective Check List measures of anxiety. Journal of Consulting Psychology, 26, 291. Zuckerman, M., & Lubin, B. (1965). Manual for the Multiple Affect Adjective Checklist. San Diego: Educational and Industrial Testing Service. Zwemer, W.A., & Deffenbacher, J.L. (1984). Irrational beliefs, anger and anxiety. Journal of Counseling Psychology, 31, 391-393.

< previous page

page_1021

next page >

< previous page

page_xi

next page > Page xi

For Abby, Katie, and Shelby


Chapter 33
Minnesota Multiphasic Personality Inventory-2 (MMPI-2)

Roger L. Greene
Pacific Graduate School of Psychology

James R. Clopton
Texas Tech University

The Minnesota Multiphasic Personality Inventory (MMPI) was developed to provide an objective means of diagnosing psychopathology (Hathaway & McKinley, 1940), and it quickly became the most widely used and researched objective personality inventory (Lubin, Larsen, Matarazzo, & Seever, 1985). The restandardization of the MMPI resulted in the MMPI-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989), which is the focus of this chapter. The major change within the items on the standard validity and clinical scales in the restandardization of the MMPI-2 was the deletion of 13 items with objectionable or outdated content (Butcher, Dahlstrom et al., 1989). New content scales (Butcher, Graham, Williams, & Ben-Porath, 1989) were developed for the MMPI-2 so that clinicians have empirically and rationally derived scales available for interpretation.

The MMPI-2 consists of 567 true-false items. Scoring proceeds by counting the client's "deviant" responses to each of the items on a particular scale. The items are not weighted in the scoring process, that is, each deviant response is simply counted.

The normative group for the MMPI-2 consisted of 2,600 individuals who were selected to be representative of the United States (Butcher, Dahlstrom et al., 1989). The normative group matched U.S. census data for age, ethnicity, and marital status, but it had a higher level of education and occupation. The potential impact of this higher level of education and occupation in the MMPI-2 normative sample on codetype and scale interpretation has been a focus of ongoing concern (Caldwell, 1990; Helmes & Reddon, 1993).
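The unweighted scoring rule described above can be illustrated with a brief sketch. The Python below is a minimal illustration under stated assumptions, not the published scoring procedure: the item numbers and keyed directions in EXAMPLE_KEY are invented for the example and are not actual MMPI-2 scale keys, and the treatment of omitted items (they are simply not counted) is likewise an assumption.

```python
from typing import Dict

# Hypothetical scoring key: item number -> the response counted as "deviant".
# These item numbers are invented for illustration; they are NOT real MMPI-2 keys.
EXAMPLE_KEY: Dict[int, bool] = {3: True, 8: False, 15: True, 22: True, 41: False}


def raw_score(responses: Dict[int, bool], key: Dict[int, bool]) -> int:
    """Count keyed ("deviant") responses; each item contributes 0 or 1, unweighted."""
    return sum(
        1
        for item, deviant_answer in key.items()
        # Assumption: an omitted item (absent from responses) is not counted.
        if item in responses and responses[item] == deviant_answer
    )


# Hypothetical answer sheet for the five keyed items (True = "true", False = "false").
responses = {3: True, 8: True, 15: True, 22: False, 41: False}
print(raw_score(responses, EXAMPLE_KEY))
```

Because every keyed item adds at most one point, the raw score for a scale cannot exceed the number of items on that scale; converting the raw score to a T-score requires the normative tables, which this sketch does not attempt to reproduce.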
On the question of education and occupation, however, Schinka and LaLone (1997) developed a census-matched subsample within the MMPI-2 restandardization sample and found only one difference that exceeded 3 T-score points between these two samples on the standard validity and clinical scales, content scales, and supplementary scales.

Types of Available Norms

The MMPI-2 normative group consisted of adults who ranged in age from 18 to 89. The person's scores on the MMPI-2 scales are compared to those of either men or women in the normative group. Separate norms for men and women were developed because there are gender differences on several scales, with women scoring higher on Scales 2 (Depression), 3 (Hysteria), 7 (Psychasthenia), and 0 (Social Introversion). There has been some discussion of developing unigender norms for the MMPI-2 (Tellegen, Butcher, & Hoeglund, 1993) because of the potential problems of gender-specific norms in personnel settings in which gender cannot be used as a criterion for selection, but there has not been much movement on this front.

Specific norms are not provided by age on the MMPI-2, even though it is well known that there are substantial effects of age below the age of 20. These age effects are reflected in the development of separate sets of adolescent norms for the original MMPI (Marks & Briggs, 1972), and the restandardization of a different form of the MMPI for adolescents (MMPI-A; Butcher et al., 1992). Colligan, Osborne, Swenson, and Offord (1983, 1989) found substantial effects of age on MMPI performance in their contemporary normative sample, with differences of 10 or more T points between 18- and 19-year-olds and 70-year-olds on Scales L and 9 (Hypomania). Several MMPI-2 scales demonstrate differences of nearly 5 T points between 20-year-olds and 60-year-olds (Butcher, Dahlstrom et al., 1989; Caldwell, 1997; Greene & Schinka, 1996), with scores on Scales L (women only), 1 (Hypochondriasis), and 3 (Hysteria) increasing and scores on Scales 4 (Psychopathic Deviate) and 9 (Hypomania) decreasing with age. Given that these age comparisons involve different cohorts, it is not possible to know whether these effects actually reflect the influence of age or simply differences between the cohorts. Butcher et al. (1991) found few effects of age in older (> 60) men and they saw no reason for age-related norms in these men.

The potential effects of education have not been investigated in any systematic manner either on the MMPI or the MMPI-2, although such research clearly is needed. When the men and women in the MMPI-2 normative group with less than a high school education were contrasted with men and women with postgraduate education (Dahlstrom & Tellegen, 1993, pp. 58-59), the differences on the following scales exceeded 5 T points: L (women only), F, K, 5 (Masculinity-Femininity), and 0 (Social Introversion). Men and women with less than a high school education had higher scores in all of these comparisons except for Scales K and 5. When psychiatric patients with 8 years or less of education were contrasted with patients with 16 or more years of education (Caldwell, 1997), the differences ranged from 4 to 8 T points on all of the scales except 3 (Hysteria). The patients with less education had higher scores in all of these comparisons except for Scales K and 5.

There do not appear to be any systematic effects of occupation or income within the MMPI-2 normative group (Dahlstrom & Tellegen, 1993; Long, Graham, & Timbrook, 1994). There have been no studies of the effects of these two factors in psychiatric patients.

The effects of ethnicity on MMPI performance have been reviewed by Dahlstrom, Lachar, and Dahlstrom (1986) and Greene (1987), and they concluded that there is not any consistent pattern of scale differences between any two ethnic groups. Timbrook and Graham (1994) and Zalewski and Greene (1996) reached a similar conclusion in their reviews of the effects of ethnicity on MMPI-2 performance.

Multivariate regressions of age, education, gender, ethnicity, and occupation on the standard validity and clinical scales in the MMPI-2 normative group (Dahlstrom & Tellegen, 1993) and psychiatric patients (Caldwell, 1997: age, education, and gender only; Schinka, LaLone, & Greene, 1997) have shown that the percentage of variance accounted for by these factors does not exceed 10%. The exception is Scale 5 (Masculinity-Femininity), in which slightly over 50% of the variance is accounted for by gender.

Such small percentages of variance are unlikely to impact the interpretation of most MMPI-2 profiles. In summary, it appears that demographic variables will have minimal impact on the MMPI-2 profile in most individuals. It may be important to monitor the validity of the MMPI-2 profile more closely in persons with limited education and lower occupational levels. A major reason that demographic effects are seen in these persons may simply be the eighth-grade reading level of the MMPI-2.

Basic Reliability and Validity Information

Test-retest reliabilities for the individual validity and clinical scales on the MMPI-2 range from .68 to .92 for a 2-week interval (Butcher, Dahlstrom et al., 1989). Test-retest reliabilities for the MMPI-2 content scales are comparable, ranging from .78 to .91 (Butcher, Graham et al., 1989).

The research on the validity of the original MMPI is so prolific that it almost defies summarization; it has been estimated that there are over 10,000 studies on the MMPI (Dahlstrom, Welsh, & Dahlstrom, 1975). Although only 13 items were deleted on the standard validity and clinical scales of the MMPI-2, the development of uniform T-scores and a larger and more representative normative group have raised questions about codetype comparability between the MMPI and MMPI-2 and about whether the validity research on the original MMPI can be generalized directly to the MMPI-2.

Codetype comparability between the MMPI and MMPI-2 ranges from 50% to 90% using well-defined codetypes, and from 40% to 70% for nonrestrictive codetypes (cf. Dahlstrom, 1992; Edwards, Morrison, & Weissman, 1993a, 1993b; Graham, Timbrook, Ben-Porath, & Butcher, 1991; Morrison, Edwards, Weissman, Allen, & DeLaCruz, 1995). (A well-defined codetype requires at least 5 T points between the scale(s) defining the codetype and the next highest clinical scale, whereas nonrestrictive codetypes place no restrictions on the relative elevations of any of the clinical scales.)
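The distinction between well-defined and nonrestrictive codetypes can be made concrete with a short sketch. The Python function below is an illustration only, not a procedure taken from the MMPI-2 manual. It assumes that T-scores for the clinical scales have already been obtained, takes the two highest scales at or above a T-score of 65 as the codetype (the cutoff this chapter uses in its basic interpretive strategy), and calls the codetype well-defined when the lower of those two scales exceeds the next highest scale by at least 5 T points. The function name, the constants, and the example profile are hypothetical, and which scales are eligible to define a codetype is left to whatever the caller passes in.

```python
from typing import Dict, Optional, Tuple

ELEVATION_CUTOFF = 65  # T-score cutoff the chapter gives for codetype scales
DEFINITION_GAP = 5     # minimum T-point gap for a "well-defined" codetype


def two_point_codetype(
    t_scores: Dict[str, float],
) -> Optional[Tuple[Tuple[str, str], bool]]:
    """Return ((highest, second), well_defined), or None when fewer than two
    scales reach the elevation cutoff."""
    # Rank scales from highest to lowest T-score; ties keep their input order.
    ranked = sorted(t_scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2 or ranked[1][1] < ELEVATION_CUTOFF:
        return None
    (first, _), (second, second_t) = ranked[0], ranked[1]
    # Compare the lower codetype scale with the next highest scale, if any.
    next_t = ranked[2][1] if len(ranked) > 2 else float("-inf")
    return (first, second), (second_t - next_t) >= DEFINITION_GAP


# Hypothetical profile, keyed by clinical scale number.
profile = {"1": 58, "2": 78, "3": 60, "4": 55, "6": 52,
           "7": 72, "8": 66, "9": 48, "0": 61}
print(two_point_codetype(profile))
```

In this hypothetical profile, Scales 2 and 7 define the codetype, and Scale 7 stands 6 T points above Scale 8, so the codetype meets the well-defined criterion. Had Scale 8 been at 70, the same two scales would still define the codetype, but only under the nonrestrictive definition.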
None of the studies of the comparability of the MMPI and MMPI-2 actually has examined the empirical correlates of the two instruments despite calls for such research (Dahlstrom, 1992). The extant literature that has examined the empirical correlates of MMPI-2 scales and codetypes has been very consistent with the correlates reported for their MMPI counterparts (Archer, Griffin, & Aiduk, 1995; Ben-Porath, Butcher, & Graham, 1991; Bence, Sabourin, Luty, & Thackrey, 1995; Boone, 1994; Moser, 1996; Sieber & Meyers, 1992; Wetzler, Khadivi, & Oppenheim, 1995). It appears safe to assume that the correlates of well-defined MMPI-2 codetypes and the individual validity and clinical scales will be very similar to those for the MMPI. The data are less clear for MMPI-2 codetypes that are not well-defined, although it still will be safe to interpret the individual scales in these codetypes using MMPI correlates.

Basic Interpretive Strategy

Interpretation of the MMPI-2 is based on codetypes, that is, the two highest clinical scales elevated to a T-score of 65 or higher. This interpretation is then supplemented by the examination of specific scales, such as the content and supplementary scales, as well as individual "critical" items. Butcher and Williams (1992), Friedman, Webb, Lewak, and Nichols (1999), Graham (1993), and Greene (1991) provided examples of the general interpretive strategy for the MMPI-2. Computer interpretations of the MMPI-2, which can provide the clinician with the general dimensions of the patient's psychopathology, are available from several commercial sources (Butcher, 1989; Caldwell, 1989; Greene & Brown, 1998). Butcher (1987) provided an overview of the issues that arise in computerized psychological assessment.

Use of the MMPI-2 for Treatment Planning

General Issues

The MMPI-2, as an overall measure of psychopathology, will identify the patient's specific psychopathology, which can then be used to plan treatment interventions and techniques. Thus, if the MMPI-2 indicates that the patient is depressed, treatment planning will be different than if the patient has a number of physical symptoms and is not depressed, or if the patient has numerous bizarre symptoms. This level of analysis of the MMPI-2 will not be explored here because it is self-evident to most clinicians.

Research Applications and Findings

A number of specific studies that have examined the relation between the MMPI/MMPI-2 and treatment planning or outcome are reviewed later. The interested reader should consult Dahlstrom et al. (1975), who devoted an entire chapter to the MMPI and treatment evaluation that covers the literature through 1973. Butcher (1990) provided a more recent summary of the MMPI-2 in psychological treatment.

Clinical Applications

The validity, clinical, and content scales of the MMPI-2 are examined in turn as they affect treatment planning. Rather than review all of the specific statements that could be made within each set of scales, only summary statements about major issues are made.
The interested reader can refer to the various MMPI-2 sources mentioned earlier for more specific statements about treatment planning with a given scale or set of scales and for the research that validates it. The issues for treatment planning related to the consistency of item endorsement on the MMPI-2 are outlined in Table 33.1. The actual scales/indexes used by the clinician to identify consistency of item endorsement, for example, are not specified because consistency can be assessed in a number of ways (e.g., F scale; Variable Response Inconsistency Scale, VRIN; the difference between F and FB). The focus in Table 33.1 is on the implications of the consistency of item endorsement for treatment planning, not how it is to be assessed. Even a brief perusal of Table 33.1 should reveal that the specific issues within a given section have a number of important implications for treatment planning. For example, if a patient is unwilling to comply with completing the MMPI-2 in a consistent and accurate manner, then there is little reason to expect that the patient will be more compliant with other tasks in treatment. Consequently, clinicians should address such issues directly with the patient rather than assume that the reasons for giving the MMPI-2 are no longer important. If the clinician believes that the information to be provided by the MMPI-2 warranted administration of the test, then the patient's noncompliance should not be overlooked or dismissed lightly. (There are some aspects of the medical model that clinicians might keep in mind, because physicians rarely omit collecting a blood sample or a urine specimen simply because patients do not want to provide them.) It is important to determine the reason that the patient has endorsed the items inconsistently. For example, if patients have limited intellectual ability or limited educational opportunity, taped administration should be used to avoid these problems. These factors also will have to be considered in planning treatment. If patients are too toxic or neuropsychologically or psychiatrically impaired to complete the MMPI-2 consistently, further assessment or any educational or psychotherapy interventions will need to be delayed until they can function appropriately.

TABLE 33.1
Consistency of Item Endorsement in Treatment Planning

Issue: Implications

Items endorsed consistently: Patients have reading comprehension skills at the eighth-grade level or higher and should be able to read any treatment materials that are provided. Patients are compliant with the assessment process, which bodes well for all interventions.

Items endorsed inconsistently: Patients either lack the necessary reading skills or intellectual ability to endorse the items consistently or they are unwilling to engage in the assessment process. In the former instance, any reading within the treatment process needs to be deemphasized. In the latter instance, patients who are noncompliant with assessment are very likely to be noncompliant with other aspects of treatment. Their noncompliance should become the first focus of treatment.

There are a number of general issues that arise based on the accuracy of item endorsement (see Table 33.2). Patients who are willing and able to provide an accurate self-description have good insight into their behavior and can share it openly, which augurs well for any therapeutic intervention. Content-based interpretations of the MMPI-2 should reflect accurately the patients' current psychological status in these circumstances. Patients who overreport psychopathology are at high risk to terminate treatment prematurely, despite the seemingly pervasive psychopathology being reported. Alternatively, they may have some reason for exaggerating the severity of their psychopathology, which consequently will interfere with their ability to participate in treatment in a meaningful manner. Patients who underreport psychopathology have little internal motivation for treatment because they are not experiencing emotional distress. These patients' psychopathology and deviant behaviors are very chronic and ego-syntonic (not distressing to them), which is reflected by the lack of elevation in the MMPI-2 profile, and consequently their behaviors are very difficult to change in short-term treatment, if they can be changed at all. Also, if the person taking the MMPI-2 is not the identified patient, then he or she is not bothered by the presence of psychopathology in significant others, which does not bode well for any intervention or treatment.

TABLE 33.2
Accuracy of Item Endorsement in Treatment Planning

Issue: Implications

Overreporting of psychopathology: Patients are reporting more severe and extensive psychopathology than would be expected given their background and clinical history. They are likely to terminate treatment prematurely despite their reporting severe problems. Their reasons for being evaluated should be reviewed to determine whether there is any apparent motivation to overreport.

Accurate reporting of psychopathology: Patients have good insight into their behavior and are able and willing to provide an accurate evaluation of themselves. They are engaged in the assessment process and would be expected to engage in treatment. Content-based interpretations of the MMPI-2 will describe their problems accurately.

Underreporting of psychopathology: Patients are not reporting any form of psychopathology despite their presence in a treatment setting. Their problems are likely to be chronic and not producing distress for them. They are experiencing little motivation for any type of intervention.

The use of the standard validity and clinical scales of the MMPI-2 in treatment planning is outlined in Tables 33.3 and 33.4, respectively. Probably the most important caveat is that clinicians should note carefully clinical scales that are not elevated. The emphasis on the specific clinical scales that are elevated can lead clinicians to ignore low point scales or scales within the normal range. For example, patients who have T-scores at or below 50 on Scales 2 (Depression) and 7 (Psychasthenia) are not reporting or experiencing any distress over whatever behaviors/symptoms brought them to treatment.
Similarly, patients who have low scores on Scales 1 (Hypochondriasis), 2 (Depression), and 3 (Hysteria) have few psychological defenses preventing their behaviors/symptoms from being expressed overtly. Patients who have low scores on Scale 9 (Hypomania) have little or no energy to invest in the treatment process. The implications of these low scores for treatment planning should be apparent.

TABLE 33.3
Use of MMPI-2 Validity Scales/Indexes for Treatment Planning

Scale: Potential Issues

L scale (T > 64): Patients are likely to be naive, psychologically unsophisticated, defensive, and controlled. In an inpatient setting, patients who have a Within-Normal Limit profile (all clinical scales below a T-score of 65) are likely to be psychotic or seriously emotionally disturbed.

L scale (T < 50): No specific interpretations can be made.

F scale (T > 80): Patients are experiencing rather severe psychopathology, which should be readily apparent, if they are not overreporting psychopathology. It may be necessary to lower their level of distress before making any specific treatment interventions.

F scale (T < 50): Patients are not reporting and/or experiencing any form of discomfort or psychological distress. They probably are underreporting the full extent and severity of their problems.

K scale (T > 60): Patients are very defensive and guarded. They are reluctant to acknowledge that they have any psychological problems. They will be resistant to any type of treatment intervention.

K scale (T < 40): Patients see themselves as having few resources for coping with their problems and they are fearful of being overwhelmed by them. Supportive interventions will be needed initially.

TABLE 33.4
Use of the MMPI-2 Clinical Scales in Treatment Planning

Scale: Potential Issues

1 (Hs) > 64: Patients focus on vague physical ailments. They are very resistant to considering that they might have psychological problems. They are pessimistic about being helped. They are argumentative with staff. Treatment will need to reassure them that their ailments will not be ignored. Conservative interventions should be used whenever possible.

1 (Hs) < 45: No specific interpretations can be made.

2 (D) > 64: Patients are experiencing distress and likely to be depressed. Their depressive mood should be readily apparent. It is important to determine whether internal or external factors are producing the negative mood state and to plan treatment accordingly.

2 (D) < 45: Patients are not reporting any type of emotional distress either as a result of their presence in treatment or the behaviors/symptoms that led them to consider treatment. The possibility of acting out in an impulsive manner should be evaluated. There is little internal motivation for any type of treatment or intervention.

3 (Hy) > 64: Patients are naive, suggestible, and lack insight into their own and others' behavior. They deny any type of psychological problems. Under stress specific physical ailments will be seen. They look for simplistic, concrete solutions to their problems. Treatment should focus on short-term goals because there is limited motivation. They initially may be enthusiastic about treatment, then later resist treatment or fail to cooperate.

3 (Hy) < 45: Patients are caustic, sarcastic, and socially isolated. They have few defenses for coping with any problems that they encounter. Well-structured, behavioral interventions should be used whenever possible.

4 (Pd) > 64: Patients are in conflict either with family members and/or persons in positions of authority. They may make a good initial impression, but more long-term contact will reveal that they are egocentric and have little concern for others. Any treatment should focus on short-term goals with emphasis on behavior change rather than their verbalized intent to change, no matter how sincere they may sound. Low scores on Scales 2 (Depression) and 7 (Psychasthenia) make elevations on Scale 4 particularly pathognomonic.

4 (Pd) < 45: Patients are rigid, conventional, and have little psychological insight into themselves or others. Explicit, behavioral directives to change will be most productive if there is motivation to follow them.

5 (Mf) > 64: Patients do not identify with their traditional gender role and are concerned about sexual issues. Male patients frequently worry and their feelings are easily hurt. Female patients are confident and satisfied with themselves.

5 (Mf) < 40: Patients identify with their traditional gender role. Male patients are confident and self-assured. Female patients are trusting of and depend on others and lack self-confidence. Their feelings are easily hurt and they cry easily.

6 (Pa) > 64: Patients are suspicious, hostile, and overly sensitive, which is readily apparent to everyone. Any treatment is problematic because of the difficulty in developing a therapeutic relationship based on trust. Any intervention must be instituted slowly.

6 (Pa) < 45: Patients have narrow interests and they tend to be insensitive to and unaware of the motives of others. Explicit, behavioral directives to change will be most productive if there is motivation to follow them.

7 (Pt) > 64: Patients are worried, tense, and indecisive, which is readily apparent to everyone. Ruminative and obsessive behaviors may be seen. It may be necessary to lower their level of anxiety before implementing treatment of other symptoms.

7 (Pt) < 45: Patients are secure and comfortable with themselves, which augurs poorly for any type of intervention in a clinical setting.

8 (Sc) > 64: Patients feel alienated and remote from the environment and others. At higher elevations (> 79), difficulties in logic and judgment may become evident. Interventions should be directive and supportive. Psychotropic medications may be needed.

8 (Sc) < 45: Patients are conventional, concrete, and unimaginative. Any intervention should be behavioral, directive, and focus on short-term goals.

9 (Ma) > 64: Patients are overactive, impulsive, emotionally labile, and euphoric with occasional outbursts of anger. They may need to be evaluated for a manic mood disorder. Short-term behavioral goals should be pursued.

9 (Ma) < 45: Patients have a low energy and activity level. They may have a serious depressive disorder, which should be evaluated carefully. Suicide potential should be reviewed particularly as they start to feel better.

0 (Si) > 64: Patients are introverted, shy, and socially insecure. They withdraw from and avoid significant others that exacerbate their distress. Interventions need to address specifically their tendency to withdraw and avoid others.

0 (Si) < 45: Patients are extroverted, gregarious, and socially poised. They may have difficulty in forming intimate relationships with others at very low scores (T < 35). They are unlikely to have a thought disorder. The probability of acting out is increased. Group therapies are particularly useful with these patients.

Elevation of the clinical scales indicates that patients are distressed over the existence of behaviors and/or symptoms of psychopathology, not whether patients actually have psychopathology. That is, patients with chronic and/or ego-syntonic behaviors and symptomatology may not elevate any of the clinical scales above a T-score of 64, which makes it difficult to distinguish between a normal individual and a severely disturbed patient on the MMPI-2 without access to additional information. Scales 5 (Masculinity-Femininity) and 0 (Social Introversion) moderate how patients will express the psychopathology that is being tapped by a specific clinical scale. Men who elevate Scales 5 and 0 above T-scores of 64 will be passive, introverted, and avoid social interactions, and these characteristics decrease the probability of their acting out and increase the probability of their obsessing, ruminating, and fantasizing. Conversely, men who have T-scores below 40 on Scales 5 and 0 will be active, outgoing, and extraverted, and these characteristics increase the probability of their acting out and decrease the probability of their obsessing, ruminating, and fantasizing. (These same statements will hold for women if their T-score on Scale 5 is the opposite of what has been indicated for men.)
For example, the treatment plan for a patient with a T-score above 64 on Scale 0 should encourage the patient to interact with friends and small groups of acquaintances and to avoid isolating and withdrawing from others. Group treatment may be particularly helpful in such patients if they are supported through the initial stages of becoming comfortable with others. The clinician should look for consistency among the clinical scales that are elevated in deciding the importance of particular areas in treatment planning. If Scale 1 (Hypochondriasis) is elevated and it is to be interpreted as reflecting the presence of somatization, other clinical scales (3, Hysteria, or 7, Psychasthenia) or content scales (HEA, Health Concerns) suggestive of somatization should be elevated. If such concordance is not found among somatization scales, some other interpretation of Scale 1 that is consistent with the other elevations or lack thereof must be considered. The more concordance that is found among scales that have the same correlates and/or scale content, the more treatment planning should emphasize these particular areas.

The specific uses of the content scales of the MMPI-2 in treatment planning are outlined in Table 33.5. There are several caveats to keep in mind, however, when interpreting the content scales. First, the clinician must administer all 567 items so that all of the items on these scales can be scored. Clinicians are well advised to use these scales routinely because they provide valuable information about patients with little additional time required for administration. Second, it is mandatory that patients be able and willing to provide an accurate self-description because the content scales are very susceptible to overreporting or underreporting of psychopathology due to the face valid or obvious nature of the items. The implications of these two response styles for treatment planning were described in Table 33.2. Third, elevation of the content scales reflects that patients are aware of and willing to report the behaviors being assessed by the specific scale. When patients have insight into their behavior and are willing to report it accurately, these scales provide a quick overview of how patients are viewing and responding to their current circumstances. Fourth, the absence of elevation of the content scales can reflect either that the behaviors are not characteristic of the patient or that the patient is unaware of or unwilling to acknowledge these behaviors. When the content scales are not elevated, clinicians should determine which of these two alternative interpretations is more appropriate. However, clinicians are cautioned about making specific interpretations of low scores on the content scales, because no research has validated their correlates. Finally, the relative elevation of the content scales can be used as an index of the importance of that specific content area to the patient because of the use of uniform T-scores that make the percentiles equivalent across these scales.

TABLE 33.5
Use of the MMPI-2 Content Scales in Treatment Planning

Scale: Potential Issues

ANX (Anxiety) > 64: Patients report general symptoms of anxiety, nervousness, worries, and sleep and concentration difficulties. Depending on the level of anxiety, psychotropic medications or other anxiety-reducing techniques may be needed before implementing other interventions.

FRS (Fears) > 64: Patients report a large number of specific fears (FRS1), as well as generalized fearfulness (FRS2). These specific fears respond well to systematic desensitization if they are not part of a larger set of fear and anxiety symptoms.

OBS (Obsessiveness) > 64: Patients have great difficulty making decisions, ruminate excessively, worry excessively, and have intrusive thoughts. They are good candidates for most insight-oriented therapies.

DEP (Depression) > 64: Patients have difficulty getting going and getting things done in their life (DEP1). They have a depressive mood and thoughts (DEP2) and a negative self-concept (DEP3). Suicide potential should be evaluated (DEP4). Their depression has an angry component that involves blaming others, particularly when DEP is higher (+15 T points) than Scale 2 (Depression).

HEA (Health Concerns) > 64: Patients report gastrointestinal symptoms (HEA1) and symptoms associated with neurological functioning (HEA2), as well as general concerns about their health (HEA3). Their physical symptoms may be another manifestation of their emotional distress. They need to be reassured that their symptoms are being taken seriously.

BIZ (Bizarre Mentation) > 64: Patients report overtly psychotic symptoms such as paranoid ideation and hallucinations (BIZ1) and various peculiar and strange experiences (BIZ2). Psychotropic medications may be indicated, as well as hospitalization.

ANG (Anger) > 64: Patients report displaying a number of explosive tendencies such as hitting and smashing things (ANG1), as well as being irritable, grouchy, and impatient (ANG2). Assertiveness training and/or anger-control techniques should be implemented as part of treatment.

CYN (Cynicism) > 64: Patients expect others are only interested in their own welfare (CYN1). They also doubt and are suspicious of others' motives (CYN2). Establishing a trusting relationship is imperative if any progress is to be made in therapy.

ASP (Antisocial Practices) > 64: Patients have attitudes similar to individuals who break the law (ASP1), even if they do not actually engage in antisocial behavior. They report stealing things and other problem behaviors and antisocial practices during their school years (ASP2). It is important to determine whether these behaviors are still being displayed. Group interventions with similar patients will be most productive.

TPA (Type A) > 64: Patients frequently become impatient, grouchy, irritable, and annoyed (TPA1). They are hard-driving, fast-moving, and competitive individuals (TPA2). The possibility of a manic mood disorder should be considered.

LSE (Low Self-Esteem) > 64: Patients have very low opinions of themselves (LSE1), and they are uncomfortable if people say nice things about them. They give in easily to others (LSE2). Interventions need to be very supportive and allow ample time for change.

SOD (Social Discomfort) > 64: Patients are very uneasy around others and are happier by themselves (SOD1). They see themselves as shy and uncomfortable in social situations (SOD2). They need to be supported and encouraged to participate in treatment until they are comfortable interacting with others.

FAM (Family Problems) > 64: Patients report considerable familial discord (FAM1). Their families are reported to lack love, support, and companionship. They feel alienated from and unattached to their family (FAM2). Involvement of the family system in treatment may be important unless the patient needs to be emancipated from them.

WRK (Work Interference) > 64: Patients report that they are not as able to work as they once were, and that they work under a great deal of tension. They are tired, lack energy, and are sick of what they have to do. It is important to determine specifically whether the reported symptoms and behaviors actually interfere with their work because WRK is primarily a measure of general distress.

TRT (Negative Treatment Indicators) > 64: Patients are unmotivated and feel unable to help themselves (TRT1). They dislike going to doctors and they believe that they should not discuss their personal problems with others (TRT2). They prefer to take drugs or medicine because talking about problems does not help them. Patients with depressive mood disorders will elevate TRT because it is primarily a measure of general distress, so clinicians need to be cautious about interpreting TRT in a characterologic manner.

When a content scale is elevated to a T-score of 60 or higher, the clinician should review the content component scales (Ben-Porath & Sherwood, 1993) to determine the salient components producing the elevation. The content component scales are particularly useful in those cases where an elevation can reflect two very disparate areas of content within the scale that have very different implications for treatment. The best example of this circumstance is an elevation to a T-score of 60 or higher on ANG (Anger) that could reflect either ANG1 (Explosive Behavior) or ANG2 (Irritability). The former component scale reflects angry behaviors, whereas the latter reflects an angry mood. A similar situation occurs on ASP (Antisocial Practices), which is composed of ASP1 (Antisocial Attitudes) and ASP2 (Antisocial Behavior).
Finally, it is imperative that clinicians know that patients can have raw scores of zero (T = 45) on DEP4 (Suicidal Ideation) and still endorse items with suicidal content (e.g., 150, 505, 524), because not all of these items are found within DEP4.

The specific uses of the factor scales (Welsh, 1956; A, Anxiety, and R, Repression) of the MMPI-2 in treatment planning are outlined in Table 33.6. Clinicians should score and interpret the factor scales routinely because they provide valuable information for treatment planning. Low scores on A should be interpreted in a similar manner as low scores on Scales 2 (Depression) and 7 (Psychasthenia) in that they have the same implications for treatment planning as were described earlier. Low scores on both Factors A and R are particularly significant, because patients' psychopathology is well ingrained and not distressing to them, which limits motivation for any short-term treatment.

Finally, clinicians should check a number of the specific items on the MMPI-2 that have been identified as being potentially important for treatment (see Table 33.7). Clinicians should be cautious about attaching too much significance to the response to any single MMPI-2 item, because an item can be thought of as a scale with only one item, which obviously has limited psychometric qualities. However, when patients endorse a number of items within a specific area, clinicians would be well advised to review them to determine their implications for treatment planning. The items relating to dangerousness to self (150, 303, 505, 506, 520, 524, and 546) or others (150, 540, 542, 548) must be examined every time the MMPI-2 is administered, because these areas are an integral part of any treatment plan. Clinicians should check the patient's answer sheet to determine the responses to these specific items and decide whether they are worthy of being pursued via an interview.
Omitting any of these items also warrants careful review of the patient's rationale for not answering them. In addition, clinicians need to document that they have reviewed the patient's responses to these items because they could be integral to any litigation that might arise around standards of care.
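The answer-sheet check described above can be sketched in a few lines of Python. This is a minimal illustration, not a scoring tool: the item numbers are the ones listed in this section, the keyed direction (True) is taken from Table 33.7, and the function name, the answer-sheet representation, and the sample responses are assumptions made for the example.

```python
from typing import Dict, List, Optional

# Item numbers as listed in this section; all are keyed True in Table 33.7.
DANGER_TO_SELF = [150, 303, 505, 506, 520, 524, 546]
DANGER_TO_OTHERS = [150, 540, 542, 548]


def review_items(answers: Dict[int, Optional[bool]], items: List[int]) -> Dict[str, List[int]]:
    """Split the listed items into those answered True and those left unanswered."""
    endorsed = [n for n in items if answers.get(n) is True]
    omitted = [n for n in items if answers.get(n) is None]
    return {"endorsed": endorsed, "omitted": omitted}


# Hypothetical answer sheet: True/False responses, None (or absent) for an omitted item.
answers = {150: False, 303: True, 505: None, 506: False, 520: False,
           524: False, 546: False, 540: False, 542: True, 548: False}
print(review_items(answers, DANGER_TO_SELF))    # {'endorsed': [303], 'omitted': [505]}
print(review_items(answers, DANGER_TO_OTHERS))  # {'endorsed': [542], 'omitted': []}
```

Both lists are returned because the text treats an omitted item as needing the same follow-up as an endorsed one; the output only flags items for review with the patient and is not an interpretation.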

TABLE 33.6
Use of the MMPI-2 Factor Scales in Treatment Planning

Scale: Potential Issues

A (Anxiety) > 69 and R (Repression) > 59: Patients are reporting general distress and maladjustment, which may be arising either from internal or external sources. They are aware that they are distressed and they try to control its overt expression. They are motivated for most types of psychological intervention.

A (Anxiety) > 69 and R (Repression) < 40: Patients are reporting general distress and maladjustment. However, they are not particularly concerned about these problems, which they are likely to attribute to causes outside themselves. Once the immediate distress has passed, these patients have little motivation for treatment. Consequently, treatment should focus on short-term goals.

A (Anxiety) < 50 and R (Repression) > 59: Patients are not reporting general distress and they are confident in their own abilities. They are denying and repressing any awareness that they might have problems, and they are reluctant to examine their own behavior. Short-term, behaviorally oriented interventions are indicated.

A (Anxiety) < 50 and R (Repression) < 40: Patients are not reporting general distress and they see themselves as being confident in their own abilities. In a clinical setting, they have little awareness that they have any problems that need to be repressed and denied. They have very chronic, ego-syntonic behaviors, which makes any type of treatment or intervention difficult.
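The four cells of Table 33.6 amount to a two-way classification on the A and R T-scores, which the following Python sketch makes explicit. The cutoffs are those printed in the table; the function name, the "high"/"low" labels, and the one-line summaries are hypothetical paraphrases added for illustration, and scores falling between the cutoffs return no pattern because the table offers no interpretation for them.

```python
from typing import Optional, Tuple

# Brief paraphrases of the four patterns in Table 33.6 (not the table's full wording).
PATTERNS = {
    ("high", "high"): "distressed and trying to control it; motivated for most interventions",
    ("high", "low"): "distressed but attributes problems externally; focus on short-term goals",
    ("low", "high"): "denies and represses problems; short-term behavioral interventions",
    ("low", "low"): "chronic, ego-syntonic behaviors; any intervention is difficult",
}


def factor_pattern(a_t: float, r_t: float) -> Optional[Tuple[str, str]]:
    """Classify A and R T-scores; None when either falls between the table's cutoffs."""
    if a_t > 69:
        a_level = "high"
    elif a_t < 50:
        a_level = "low"
    else:
        return None
    if r_t > 59:
        r_level = "high"
    elif r_t < 40:
        r_level = "low"
    else:
        return None
    return a_level, r_level


pattern = factor_pattern(45, 35)  # hypothetical scores
print(pattern, PATTERNS.get(pattern))
```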

TABLE 33.7
Use of MMPI-2 Items in Treatment Planning

Content Area: Item Numbers

Anger: 37(T), 134(T), 150(T), 372(F), 389(T), 478(T), 513(T), 540(T), 542(T), 548(T)

Depression: 38(T), 56(T), 65(T), 95(F), 143(F), 234(T), 273(T), 388(F), 450(T), 463(T), 526(T)

Family Problems: 21(T), 83(F), 379(T), 455(F), 478(T)

Hopelessness: 22(T), 71(T), 75(F), 92(T), 130(T), 306(T), 454(T), 516(T), 539(T), 554(T)

Poor Impulse Control: 23(T), 85(T), 240(T), 266(F), 530(T), 564(F)

Paranoia: 99(T), 138(T), 144(T), 162(T), 216(T), 228(T), 259(T), 314(F), 333(T), 424(T)

Physical Ailments: 18(T), 36(T), 40(T), 47(F), 117(F), 142(F), 295(F)

Psychoticism: 24(T), 60(T), 72(T), 96(T), 198(T), 298(T), 319(T), 336(T), 355(T), 361(T), 551(T)

Sexuality: 12(F), 34(F), 121(F), 268(T), 371(T), 470(T)

Sleep Disturbance: 3(F), 39(T)

Substance Abuse: 264(T), 387(T), 429(F), 487(T), 489(T), 511(T), 527(T), 544(T)

Suicidality: 150(T), 303(T), 505(T), 506(T), 520(T), 524(T), 546(T)

Note. Clinicians will need to consult the MMPI-2 booklet for the actual content of the indicated items. Clinicians also must realize that patients' responses to all of these items are not reproduced in any listing of critical items. Consequently, clinicians will need to check the patient's answer sheet to determine the responses to these items. Any of these items that are omitted also should be reviewed carefully with the patient.

Clinicians frequently are confronted with MMPI-2s in which the interpretive information for a given scale is or may seem to be contradictory to the information provided by another scale. There are several procedures that can be followed to resolve such inconsistencies. Probably the best method for resolving such discrepancies involves exploring the issue with the patient directly. If the patient is not available for some reason and the patient has endorsed the items accurately, then the MMPI-2 content scales and the specific items endorsed should provide a quick means of resolving any discrepancies that may exist with the empirically derived clinical scales that may have a number of different correlates. Clinicians should realize that most, if not all, MMPI-2s will have some minor discrepancies among a few scales and they should not expect perfect concordance.

If the MMPI-2 is to be used in repeat administrations to monitor change in the patient across treatment or to assess the outcome of treatment, clinicians should realize that a number of the scales (1, Hypochondriasis; 4, Psychopathic Deviate; 8, Schizophrenia; 0, Social Introversion; MAC-R, MacAndrew Alcoholism Scale-Revised) and items are designed to assess characterologic qualities and past behaviors that will not change over time. Other scales (2, Depression; 7, Psychasthenia; A, Anxiety) and items are more reactive and would be expected to reflect the patients' changes. Consequently, clinicians should not expect to have consistent changes across all of the scales. Also, it should be remembered that the MMPI-2 is designed to be an initial screening instrument to assess the types of psychopathology that are being manifested in a particular patient, and the norms reflect the typical defensiveness that is to be expected on initial screening. Finally, the length of the MMPI-2 precludes giving it repeatedly over a short time interval (such as weekly) to monitor the course of treatment. It would be feasible to readminister the MMPI-2 on a monthly basis. If the MMPI-2 were going to be used as a dependent variable to assess the changes across the course of treatment or the outcome of treatment, clinicians should treat these changes as differences in scale scores instead of differences in clinical status based on the standard profile. That is, it probably is more accurate to say that the patient's score on Scale 2 decreased 24 T points across treatment than to say that the patient's T-score of 86 at the start of treatment was in the clinical range, and the T-score of 62 at the end of treatment is now within the normal range.
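Treating change as a difference in scale scores, rather than as a shift across a clinical cutoff, can be illustrated with a short Python sketch. The function name and the sample scores are hypothetical; the default threshold of 10 T points follows the guideline this section gives for Scales F, 2, and 7, and is exposed as a parameter because it is a rule of thumb and not a statistical test of reliable change.

```python
from typing import Dict, Tuple


def t_score_changes(
    pre: Dict[str, float],
    post: Dict[str, float],
    min_drop: float = 10.0,  # decrease treated as notable; an adjustable assumption
) -> Dict[str, Tuple[float, bool]]:
    """Map each scale scored at both times to (post - pre, decreased by >= min_drop)."""
    changes = {}
    for scale, before in pre.items():
        if scale in post:
            delta = post[scale] - before
            changes[scale] = (delta, -delta >= min_drop)
    return changes


# Hypothetical T-scores at intake and at the end of treatment.
intake = {"F": 65, "2": 86, "7": 70}
followup = {"F": 60, "2": 62, "7": 64}
for scale, (delta, notable) in t_score_changes(intake, followup).items():
    print(f"Scale {scale}: {delta:+} T points, notable decrease: {notable}")
```

With these sample values, Scale 2 is reported as a decrease of 24 T points, mirroring the wording recommended above, without any statement about clinical versus normal range.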
It would be expected that scores on Scales F, 2 (Depression), and 7 (Psychasthenia), as well as the general elevation of the entire profile, which are measures of the general level of distress being reported by the patient, should decrease 10 or more T-score points if the treatment has been effective regardless of the nature of the intervention. The MMPI-2 content scales that were elevated, reflecting the specific concerns of the patient, and that were the focus of treatment also should decrease 10 or more T-score points. These caveats about using the MMPI-2 to monitor change across treatment do not apply to those circumstances in which the MMPI-2 is used as an independent variable. Clinicians should find it very profitable to determine what codetypes and patterns of MMPI-2 scales at the initiation of treatment are related to outcome, particularly within very homogeneous subgroups of patients.

Use with Other Evaluation Data

It is necessary to supplement the MMPI-2 with other evaluation data such as a clinical interview to enhance the accuracy of any clinical predictions that will be made. Because the MMPI was developed long before there was any widespread acceptance of the multitude of personality disorders, the MMPI-2 has limited success in this area. (The MMPI Personality Disorder scales developed by Morey, Waugh, and Blashfield (1985), which are essentially intact on the MMPI-2, may provide additional information in this area, but to date the research has been too limited to provide much specific direction; cf. Morey & Smith, 1988.) Consequently, it is helpful to supplement the MMPI-2 with an instrument, such as the Millon Clinical Multiaxial Inventory-III

(MCMI-III; Millon, 1994), that is specifically designed to assess personality disorders; but there has been substantial debate over how well Millon's characterization of personality disorders fits the DSM classification system (McCann, 1991; Widiger & Sanderson, 1987). The MCMI-III does not identify Axis I disorders as well as the MMPI-2 does, so the routine use of both instruments would seem to be indicated anytime there is reason to suspect that the patient may have both Axis I and Axis II disorders. The MMPI-2 also has difficulty in identifying patients who have "well-intact" psychotic or characterologic processes. In these cases, a Rorschach or some other projective technique can provide useful information on the intactness of the patient's cognitive processes (Exner, 1993). Finally, it would seem advisable that clinicians have some estimate of the patient's level of intellectual functioning. There is a substantial line of research indicating that the correlates of specific MMPI-2 codetypes or scales may change based on the patient's level of intelligence, particularly when trying to predict violent or acting-out behavior.

Provision of Feedback Regarding Assessment Findings

Patients should be provided with the results of their MMPI-2 routinely so that they understand how it is being used in planning treatment. This sharing of information with patients helps to ensure that they will take the MMPI-2 appropriately without distorting their responses. It also makes them meaningful participants in the treatment process. Finn (1996) provided an excellent overview of how the assessment process can be therapeutic for patients, and his procedure warrants use in all clinical settings. When patients have insight into their behavior and are willing to report it accurately, the MMPI-2 content scales provide a summary of how patients are viewing and responding to their current circumstances, all of which can be shared directly with them.
It probably is better not to share the standard profile for the basic validity and clinical scales with patients because of the attributions that they may make to the scale names. Lewak, Marks, and Nelson (1990) devoted an entire book to providing feedback to patients that should be consulted by the interested reader.

Limitations/Potential Problems in Use

In one sense, the greatest limitation of the MMPI-2 has been its success, which has created an impression that the instrument can be used in any setting to evaluate any type of problem. Frequently, clinicians' expectations of the MMPI-2 far exceed reality for any psychometric instrument. The importance of ensuring that patients have sufficient intellectual ability and reading skills to complete the MMPI-2 appropriately cannot be overemphasized. One of the primary causes of invalid MMPI-2s is the inability to read and comprehend the items, which require an eighth-grade reading level. Standard, cassette-tape administrations of the MMPI-2 should be used any time that there is reason to suspect that the patient's intellectual or reading ability may be inadequate for standard administration in a paper-and-pencil format.

Use of the MMPI-2 for Treatment Outcome Assessment

General Issues

The MMPI-2 has been used less frequently to assess treatment outcome, because the instrument was developed and used primarily to provide an initial assessment for treatment planning. There have been two common themes in the use of the original MMPI to assess treatment outcome. The most frequent research examines the relation between MMPI codetypes or scales assessed at the onset of treatment and whatever outcome measure is being used (i.e., the MMPI has been used as an independent variable). There is a smaller group of studies that have used the MMPI as the dependent variable and examined the changes that occurred in MMPI scales as a result of treatment. In these latter studies, the sensitivity of the MMPI to changes in the patients' status may be limited because the items frequently are worded in the past tense and ask about past rather than current behaviors (Scapinello & Blanchard, 1987). For example, patients would not be expected to change their response to the item, "I have used alcohol excessively," regardless of the effectiveness of their alcohol treatment. It is interesting to speculate that the finding that the MacAndrew Alcoholism scale (MAC; MacAndrew, 1965) does not change as a result of treatment (Gallucci, Kay, & Thornby, 1989; Huber & Danahy, 1975; Rohan, Tatro, & Rotman, 1969) may reflect that its items are predominantly written in the past tense and ask about past behaviors.

Evaluation Against Criteria for Outcome Measures

The MMPI-2 clearly meets most of the NIMH ideal criteria for outcome measures, each of which is examined briefly in turn. The MMPI-2 is relevant and appropriate for assessing treatment outcomes in patient samples where psychopathology is being evaluated, particularly if the emphasis is being placed on DSM-IV Axis I disorders.
The methodology for administering, scoring, and interpreting the MMPI-2 is straightforward and easily implemented across treatment settings. MMPI-2 scores on the various scales and codetypes have clear and objective referents that are consistent across clients. The MMPI-2 is not constructed so that clinicians and/or significant others can have their perspective of the patient directly measured. However, clinicians and/or significant others can report the patient's anticipated score as being high or low (elevated or not) on the various scales such as depression, anxiety, and so on to obtain such information. The MMPI-2 has adequate to good psychometric characteristics, and it is particularly sensitive to any attempt by the patient to distort responses to items. The MMPI-2 is relatively inexpensive, with the cost primarily dependent on the degree of computer-based assistance desired in its administration, scoring, and interpretation. The long history of usage of the MMPI, which has been extended with the MMPI-2, makes it easily understandable by most clinicians. The patient's MMPI-2 profile can be plotted quickly and provides an easy basis for providing feedback to the patient, other clinicians, and significant others. The MMPI-2 content scales are particularly good for direct feedback, because they provide a description of how patients report their psychopathology.

The MMPI-2 is very useful in making clinical diagnoses, assessments, and treatment recommendations for a broad range of patients. Because the MMPI was developed in an empirical manner, the MMPI-2 (and MMPI) scales are compatible with a wide range of theories of psychopathology and the goals and procedures of various treatment approaches. In many respects, the MMPI-2 will be the standard against which other tests are evaluated in meeting these criteria.

Research Applications and Findings

Before the MMPI-2 is used to predict treatment outcome, several conclusions that are apparent from the research with the original MMPI in this area should be noted. First, the original MMPI is not related to treatment outcome in any setting when the patients are examined as a single heterogeneous group. Researchers frequently assume that there is a "typical" patient within a given diagnostic group or setting, and they do not seem to consider that there may be an interaction between type of patient and the outcome of treatment. Second, background and demographic variables contribute more variance than personality variables when they are examined within the same study (Hoffmann & Jansen, 1973; Lin, 1975; Nathan & Skinstad, 1987). Thus, it is important not to attribute too much significance to those studies that only report MMPI variables. Third, the original MMPI may be related to treatment outcome when specific subgroups are identified within a particular diagnostic group, but these findings are inconsistently replicated across studies. Finally, a number of these studies have used cluster analyses of the original MMPI data, seemingly with little awareness of the multitude of problems that exist with these sets of procedures (cf. Blashfield, 1980). The MMPI and MMPI-2 research findings are summarized within three primary groupings: alcohol/drug/substance abuse, chronic pain, and other specific psychiatric diagnoses.
These groups encompass most of the systematic data, and the results are germane to a number of different clinical groups. There are several reviews of this literature that should be consulted by the interested reader. Graham and Strenger (1988) and Greene and Garvin (1988) reviewed the MMPI research in alcoholism, and Stark (1992) reviewed the entire literature on attrition from substance abuse treatment. Nathan and Skinstad (1987) reviewed the problems of assessing outcomes of treatment in alcoholics. Their work should be read by anyone who is interested in doing research on this topic. Love and Peck (1987), Snyder (1990), and Keller and Butcher (1991) reviewed the MMPI research in chronic pain. Keller and Butcher (1991) also provided specific MMPI-2 data on a large sample of chronic pain patients that should be read by any clinician working in this area. Substance Abuse. Several studies found that alcoholics and drug addicts who have codetypes involving Scales 4 (Psychopathic Deviate) and 9 (Hypomania) are more likely to drop out of treatment or to have poorer outcomes than alcoholics with other codetypes (Aaronson, Dent, & Kline, 1996; Beasley et al., 1991; Huber & Danahy, 1975; Lin, 1975; Lurie, 1995; Marshall & Roiger, 1996; Pekarik, Jones, & Blodgett, 1986; Pettinati, Sugerman, & Maurer, 1982; Rounsaville, Dolinsky, Babor, & Meyer, 1987; Sheppard, Smith, & Rosenbaum, 1988). However, numerous other studies have not been able to replicate these findings in alcoholics (Douglas, 1994; Filstead, Drachman, Rossi, & Getsinger, 1983; McWilliams & Brown, 1977; Wilkinson, Prado, Williams, & Schnadt, 1971) or drug addicts (Craig, 1984). Some studies have reported that alcoholics who are characterized

by denial and minimalization on the MMPI are more prone to drop out of treatment (Hoffmann & Jansen, 1973; Mozdzierz, Macchitelli, & Conway, 1973), although others have not been able to replicate these results (Belter, 1993; Krasnoff, 1977). Finally, a number of investigators have reported that alcoholics who have the highest profile elevations on the MMPI are more likely to drop out of treatment or have poorer outcomes (Albott, 1982; Knapp, Templer, Cannon, & Dobson, 1991; Pettinati et al., 1982; Svanum & Dallas, 1981; Zuckerman, Sola, Masterson, & Angelone, 1975). It is not clear whether these higher elevations reflected the presence of more psychopathology or the overreporting of psychopathology. It is important to delineate which of these alternative explanations is accurate because they would have different implications for treatment. One group of investigators (Hoffmann, Loper, & Kammeier, 1974; Kammeier, Hoffmann, & Loper, 1973; Loper, Kammeier, & Hoffmann, 1973) examined the MMPI scores of male college students for whom an average of 13 years had elapsed between college admission and entrance into an alcoholism treatment program. These investigators compared the alcoholics' MAC scale scores on admission to college and at entrance into treatment with the scores of a control group of students who were admitted to college at the same time. The alcoholics had higher MAC scale scores both at college admission and at entrance into treatment than the control group of students. Using a cutting score of 26, the MAC scale correctly classified 72% of the alcoholic sample both at college admission and at entrance into treatment. The consistency of classification by the MAC scale across such an extensive time interval suggests that the MAC scale is tapping a dimension of behavior that is resistant to change. 
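The logic of applying a cutting score at two points in time can be sketched briefly. The Python below is a minimal illustration, not a reconstruction of the original analyses: the raw scores are invented, the function names are hypothetical, and the rule that scores at or above the cutting score count as a positive classification is an assumption, because the passage gives only the cutting score of 26.

```python
from typing import Sequence

MAC_CUTTING_SCORE = 26  # cutting score reported in the passage


def classified_positive(raw_score: int, cutoff: int = MAC_CUTTING_SCORE) -> bool:
    # Assumption: a raw score at or above the cutting score is a positive classification.
    return raw_score >= cutoff


def hit_rate(raw_scores: Sequence[int], cutoff: int = MAC_CUTTING_SCORE) -> float:
    """Proportion of a known-alcoholic sample that the cutting score classifies as positive."""
    if not raw_scores:
        raise ValueError("raw_scores must not be empty")
    hits = sum(classified_positive(score, cutoff) for score in raw_scores)
    return hits / len(raw_scores)


def classification_stability(first: Sequence[int], second: Sequence[int],
                             cutoff: int = MAC_CUTTING_SCORE) -> float:
    """Proportion of people classified the same way on both occasions."""
    if not first or len(first) != len(second):
        raise ValueError("both occasions need the same non-empty set of people")
    same = sum(classified_positive(a, cutoff) == classified_positive(b, cutoff)
               for a, b in zip(first, second))
    return same / len(first)


# Invented raw scores for five hypothetical patients, in the same order on both occasions.
at_college_admission = [28, 24, 30, 27, 22]
at_treatment_entry = [29, 27, 31, 26, 23]

print(hit_rate(at_college_admission))
print(hit_rate(at_treatment_entry))
print(classification_stability(at_college_admission, at_treatment_entry))
```

An equal hit rate on two occasions does not by itself show that the same individuals were classified the same way each time, which is why the sketch computes stability separately.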
This conclusion also is supported by the finding that MAC scores in alcoholics remain elevated after treatment (Huber & Danahy, 1975; Gallucci et al., 1989; Rohan et al., 1969), as was noted earlier. However, Schuckit, Klein, Twitchell, and Smith (1994) found that the MAC did not predict prospectively men who later became alcoholic. There is no easy way to reconcile the diametrically opposed results of these groups of investigators. Allen (1991) and Greene (1994) suggested that patients who have high versus low scores on the MAC may need different types of treatment. These suggestions are consistent with MacAndrew's (1981) formulation of the differences between high and low scorers on the MAC. In substance abuse settings, clinicians should be aware that patients who display psychopathic tendencies or who are characterized by denial and minimalization may be more prone to drop out of treatment and they should confront these issues directly. High scorers on the MAC are more likely to be risk-takers who are extraverted and impulsive, and they may have better treatment outcomes in a group-oriented and confrontational program. On the other hand, low scorers are more likely to be risk-avoiders who are introverted, withdrawn, and depressed, and they may have better treatment outcomes in a less confrontational and more supportive program. Chronic Pain. The research on chronic pain patients and treatment outcome is very similar to that already cited on substance misuse. 
A number of investigators have reported that elevations on Scales 1 (Hypochondriasis) and/or 3 (Hysteria) (Barnes, Smith, Gatchel, & Mayer, 1989; Bieliauskas, Graziano, Kullgren, & Roper, 1994; Herron & Pheasant, 1982; Long, 1981; McCreary, Turner, & Dawson, 1977; Sternbach, Wolf, Murphy, & Akeson, 1973; Turner, Herron, & Weiner, 1986) or more elevated profiles in general (Bombardier, Divine, Jordan, Brooks, & Neelon, 1993; Costello, Hulsey, Schoenfeld, & Ramamurthy, 1987; Gallagher et al., 1989; Naliboff, McCreary, McArthur, Cohen, & Gottlieb, 1988) are related to poorer outcomes, but others have

been unable to replicate these findings (King & Snow, 1989; Kleinke & Spangler, 1988). Other investigators (Costello et al., 1987; Long, 1981; Strassberg, Reimherr, Ward, Russell, & Cole, 1981) have reported that pain patients with normal limit profiles (no clinical scale at or above a T-score of 70 on the MMPI) have better outcomes. Few differences in treatment outcome are found when groups of pain patients formed by cluster analysis are contrasted with each other (Guck, Meilman, Skultety, & Poloni, 1988; McArthur, Cohen, Gottlieb, Naliboff, & Schandler, 1987; Moore, Armentrout, Parker, & Kivlahan, 1986). These findings could indicate that multidisciplinary pain treatment programs work equally well with a variety of patients, eliminating any differences among the groups, or that the MMPI and MMPI-2 simply are unrelated to treatment outcome. Bieliauskas et al. (1994) found no relation between any MMPI scale and the number of previous back surgeries; that is, there were no signs of increased psychological distress with more back surgeries. It does appear that elevations on Scales 1 and/or 3 frequently are related to poorer outcomes both at the end of treatment and at long-term follow-up, although the effect sizes are modest at best. Other Diagnoses. Scales 1 (Hypochondriasis) and 3 (Hysteria) also appear to be related to poorer outcomes in psychotherapy. DuBrin and Zastowny (1988) found that elevations on Scales 1 and 3 were related to dropping out of long-term psychotherapy, and Barth et al. (1988) found that these same two scales did not change in a 2-year follow-up of short-term psychotherapy. Barth et al. did find that Scale 7 (Psychasthenia) decreased across this 2-year period. Fals-Stewart and Schafer (1993) reported that Scales 2 (Depression), 8 (Schizophrenia), and 0 (Social Introversion) were related to attending behavior therapy sessions in patients with obsessive-compulsive disorder, whereas Scales 1 and 3 were not.
High scores on Cook and Medley's (1954) Hostility (Ho) scale have been implicated as a risk factor in coronary heart disease in three prospective studies (Barefoot, Dahlstrom, & Williams, 1983; Shekelle, Gale, Ostfeld, & Paul, 1983; Williams et al., 1990). However, Maruta et al. (1993) and Hearn, Murray, and Luepker (1989) were unable to replicate these findings in long-term (20 and 33 years, respectively) follow-up studies. Maruta et al. did find that Ho was related to the development of coronary heart disease, but only when the risk factors of age and gender were ignored. Several studies have examined treatment outcome in patients with eating disorders (Gundersen, 1989; Schork, Eckert, & Halmi, 1994; Sunday, Reeman, Eckert, & Halmi, 1996), sexual abusers (Chaffin, 1992; Miner & Dwyer, 1995), and patients with PTSD (Munley, Bains, Frazee, & Schwartz, 1994; Schnurr, Friedman, & Rosenberg, 1993), with higher elevations on the clinical scales being associated with poorer outcomes. Researchers also have investigated the ability of the MMPI to predict treatment outcome in patients with sleep disturbances (Edinger, Stout, & Hoelscher, 1988; Klonoff, Fleetham, Taylor, & Clark, 1987) and headaches (Evans & Blanchard, 1988; Onorato & Tsushima, 1983; Williams, Thompson, Haber, & Raczynski, 1986). No clear pattern of results was found within or across these various groups of patients.

Clinical Applications

It appears that patients who have the most elevated MMPI-2 profiles are likely to have poorer outcomes regardless of the setting. It is important to assess whether these elevated profiles are reflecting more severe psychopathology or the overreporting of psychopathology, and then plan the course of treatment accordingly.

Use with Other Evaluation Data

As already noted, the better predictors of treatment outcome tend to be background and demographic variables such as social support systems, employment status, and so on. Thus, it is important to consider the role of these variables in conjunction with the MMPI-2 when assessing the outcome of treatment. It would be particularly important to see what additional variance is accounted for by the MMPI-2 in such assessments.

Provision of Feedback Regarding Assessment Findings

Because the comments about feedback of the assessments of outcome are the same as those about planning treatment, they will not be repeated here. In addition, it is important to inoculate patients against specific negative outcomes such as a high probability of dropping out so that they are aware of these issues prior to their occurrence.

Limitations/Potential Problems in Use

The primary problem in using the MMPI-2 in assessing outcome of treatment is the fact that background and demographic variables tend to be better predictors (as noted earlier). Consequently, clinicians need to be cautious about relying too heavily on the MMPI-2 and not giving adequate weight to such variables.

Case Study

The patient is a 20-year-old White woman who has a 5-year history of insulin-dependent diabetes. She was referred for a psychiatric evaluation of depression by her family physician. She separated from her husband about 6 months earlier and moved across country to attend nursing school. She reported a depressed mood related to her separation and move, with crying spells and decreased energy when home alone. She feels lonely because she moved and she misses her husband, although neither of them has instituted any attempt toward reconciliation. She had some sleep problems while she was working a night shift as a nurse's aide. These sleep problems abated once she changed to a daytime shift. She performs well on and likes her job. She is well-liked by her supervisor and colleagues.
She has not lost any weight. She did not report suicidal ideation in the initial assessment with the psychiatry resident, whose diagnostic impression was Adjustment Disorder with Depressed Mood. Figures 33.1 to 33.3 provide the standard validity and clinical scales, supplementary scales, and content scales for this patient, respectively. The standard profile (Fig. 33.1) is consistent with the history reported earlier. The patient took the MMPI-2 in a consistent (VRIN = 58T) and accurate (F = 58T, K = 43T) manner, which indicates that she is well-motivated and likely to be compliant with suggested treatment plans. The 2-7 codetype along with the low score on Scale 9 (Hypomania) reflects her depressive mood with little energy. The high score on Scale 0 (Social Introversion) and Si1 (Shyness/Self-consciousness) reflects her introversion, shyness, isolation, and tendency to withdraw from others that only serve to exacerbate her loneliness and depression. It will be important for her treatment plan to incorporate procedures for getting her involved with others and counteracting her isolation. Her elevated score on the L (Lie) scale suggests that she is not very psychologically minded, which is the only indicator

that she will not be a good candidate for insight-oriented psychotherapies that are suggested by her codetype. Because Scale 1 (Hypochondriasis) is not elevated above a T-score of 64, she does not report any physical symptoms associated with diabetes, so these issues can essentially be deemphasized in her treatment.

Fig. 33.1. MMPI-2 Profile for Basic Scales. Reproduced by permission of the University of Minnesota Press.

The supplementary scales (Fig. 33.2) also fit her clinical picture very well. Her simultaneous elevation of both the first factor (A, Welsh Anxiety) and the second factor (R, Welsh Repression) indicates that she is experiencing general emotional distress and she is trying to control or deal with it to the best of her abilities. Her score of 15 on MAC-R (MacAndrew Alcoholism) is somewhat lower than would be expected for her codetype (see Greene, 1991, p. 415), and essentially indicates that she is depressed, introverted, inhibited, and overcontrolled. The potential of misusing the MAC to predict the absence of a problem with substance abuse in this type of patient should be kept in mind once her responses to the MMPI-2 specific substance abuse items are described later. The other scales (FB, Back F; Mt, College Maladjustment; PK, PTSD-Keane; PS, PTSD-Schlenger) that are elevated on the supplementary scale profile also are first-factor scales that correlate highly (> .80) with A. The content scales (Fig. 33.3) are generally consistent with her clinical picture, although there are some notable exceptions. The elevations on ANX (Anxiety), OBS (Obsessionality), DEP (Depression), LSE (Low Self-esteem), and SOD (Social Discomfort) are redundant with her scores that were seen on the standard clinical scales and the supplementary scales, providing a solid indication that she is depressed, worried,

guilty, introverted, and uncomfortable in social situations. Her T-score of 45 (raw score of zero) on DEP4 (Suicidal Ideation) would suggest that she is not suicidal, which is contradicted directly by her response to some of the specific suicidal items not found on this scale. As already noted, low scores on DEP4 cannot be relied on exclusively because not all of the suicidal items are found on this scale. Her mild elevation of HEA (Health Concerns), similar to Scale 1 (Hypochondriasis), again suggests that she does not emphasize physical symptoms even though she is diabetic. The significant elevations on WRK (Work Interference) and TRT (Negative Treatment Indicators) are directly contradictory to her clinical history and outcome, which indicates that these two scales are better measures of the first factor of general distress, with which they are correlated highly (> .80), than of their intended content (Greene, 1991; Nichols & Greene, 1995). The mild elevation of ANG (Anger) also is somewhat unexpected until it is recalled that this scale has two sets of items; the first set (ANG1 = 61T) indicates that the person physically expresses anger, and the second set (ANG2 = 65T) indicates that the person is moody, irritable, and grouchy. It is somewhat surprising that the difference between these two scales is not larger.

Fig. 33.2. MMPI-2 Profile for Supplementary Scales. Reproduced by permission of the University of Minnesota Press.

The patient endorsed only a small number of specific items outlined in Table 33.7. However, several of these items warrant serious attention. First of all, she endorsed Item 524, "No one knows it but I have tried to kill myself" (True), even though she did not report suicidal ideation or attempts in the psychiatric interview. She also endorsed several items (429, 511) specific to substance abuse, "Once a week or more I get high or drunk" (True) and "Except by doctor's orders I never take drugs or sleeping

pills" (False), which could have been overlooked if the clinician relied solely on the MAC-R. These items are particularly important given that she is diabetic.

Fig. 33.3. MMPI-2 Profile for Content Scales. Reproduced by permission of the University of Minnesota Press.

Conclusions

The MMPI-2 can provide valuable information for the clinician both in planning treatment and assessing the outcome of treatment. Clinicians need to realize the complexity of the questions that they are asking and not expect simple answers to them. Clinicians must start looking for significant subgroups within the patients in a given setting rather than assuming that all patients are alike. In addition, the results of the studies of such subgroups must consider how these groupings compare with those found in other settings and evaluate the role of background and demographic variables in the pattern of scores that are found.

References

Aaronson, A.L., Dent, O.B., & Kline, C. (1996). Cross-validation of MMPI and MMPI-2 predictor scales. Journal of Clinical Psychology, 52, 311-315.
Albott, W.L. (1982). Drop outs from an inpatient treatment program for alcoholics. International Journal of the Addictions, 17, 199-204.

Allen, J.P. (1991). Personality correlates of the MacAndrew Alcoholism Scale: A review of the literature. Psychology of Addictive Behaviors, 5, 59-65.
Archer, R.P., Griffin, R., & Aiduk, R. (1995). MMPI-2 clinical correlates for ten common codes. Journal of Personality Assessment, 65, 391-407.
Barefoot, J.C., Dahlstrom, W.G., & Williams, Jr., R.B. (1983). Hostility, CHD incidence, and total mortality: A 25-year follow-up study of 255 physicians. Psychosomatic Medicine, 45, 59-63.
Barnes, D., Smith, D., Gatchel, R.J., & Mayer, T.G. (1989). Psychosocioeconomic predictors of treatment success/failure in chronic low-back pain patients. Spine, 14, 427-430.
Barth, K., Nielsen, G., Haver, B., Havik, O.E., Molstad, E., Rogge, H., & Skatun, M. (1988). Comprehensive assessment of change in patients treated with short-term dynamic psychotherapy: An overview. Psychotherapy & Psychosomatics, 50, 141-150.
Beasley, J.D., Grimson, R.C., Bicker, A.A., Closson, W.J., Heusel, C.A., & Faust, F.I. (1991). Follow-up of a cohort of alcoholic patients through 12 months of comprehensive biobehavioral treatment. Journal of Substance Abuse Treatment, 8, 133-142.
Belter, K.S. (1993). MMPI-2 assessment in psychiatric and substance abuse inpatients and as a predictor of program attrition for substance users. Unpublished doctoral dissertation, Texas A&M University.
Bence, V.M., Sabourin, C., Luty, D.T., & Thackrey, M. (1995). Differential sensitivity of the MMPI-2 depression scales and subscales. Journal of Clinical Psychology, 51, 375-377.
Ben-Porath, Y.S., Butcher, J.N., & Graham, J.R. (1991). Contribution of the MMPI-2 content scales to the differential diagnosis of schizophrenia and major depression. Psychological Assessment, 3, 634-640.
Ben-Porath, Y.S., & Sherwood, N.E. (1993). The MMPI-2 content component scales: Development, psychometric characteristics, and clinical application. Minneapolis: University of Minnesota Press.
Bieliauskas, L.A., Graziano, G.P., Kullgren, K., & Roper, B.L. (1994). Failed back surgeries and MMPI profiles. Journal of Clinical Psychology in Medical Settings, 1, 161-166.
Blashfield, R.K. (1980). Propositions regarding the use of cluster analysis in clinical research. Journal of Consulting and Clinical Psychology, 48, 456-459.
Bombardier, C.H., Divine, G.W., Jordan, J.S., Brooks, W.B., & Neelon, F.A. (1993). MMPI cluster groups among chronically ill patients: Relationship to illness adjustment and treatment outcome. Journal of Behavioral Medicine, 16, 467-484.
Boone, D.R. (1994). Validity of the MMPI-2 depression content scale with psychiatric inpatients. Psychological Reports, 74, 159-162.
Butcher, J.N. (Ed.). (1987). Computerized psychological assessment: A practitioner's guide. New York: Basic Books.
Butcher, J.N. (1989). Adult clinical system user's guide for the MMPI-2. Minneapolis: University of Minnesota Press.
Butcher, J.N. (1990). MMPI-2 in psychological treatment. New York: Oxford University Press.
Butcher, J.N., Aldwin, C.M., Levenson, M.R., Ben-Porath, Y.S., Spiro, A., & Bosse, R. (1991). Personality and aging: A study of the MMPI-2 among older men. Psychology and Aging, 6, 361-370.
Butcher, J.N., Dahlstrom, W.G., Graham, J.R., Tellegen, A.M., & Kaemmer, B. (1989). MMPI-2: Manual for administration and scoring. Minneapolis: University of Minnesota Press.
Butcher, J.N., Graham, J.R., Williams, C.L., & Ben-Porath, Y. (1989). Development and use of the MMPI-2 content scales. Minneapolis: University of Minnesota Press.
Butcher, J.N., & Williams, C.L. (1992). Essentials of MMPI-2 and MMPI-A interpretation. Minneapolis: University of Minnesota Press.
Butcher, J.N., Williams, C.L., Graham, J.R., Archer, R.P., Tellegen, A., Ben-Porath, Y.S., & Kaemmer, B. (1992). MMPI-A (Minnesota Multiphasic Personality Inventory-Adolescent): Manual for administration, scoring, and interpretation. Minneapolis: University of Minnesota Press.
Caldwell, A.B. (1989). Caldwell report. Los Angeles: Author.
Caldwell, A.B. (1990, August). Measurement of the human condition. Paper presented at the annual meeting of the American Psychological Association, Boston.
Caldwell, A.B. (1997). [MMPI-2 data for clinical outpatients and personnel applicants.] Unpublished raw data.


Chaffin, M. (1992). Factors associated with treatment completion and progress among intrafamilial sexual abusers. Child Abuse & Neglect, 16, 251-264.
Colligan, R.C., Osborne, D., Swenson, W.M., & Offord, K.P. (1983). The MMPI: A contemporary normative study. New York: Praeger.
Colligan, R.C., Osborne, D., Swenson, W.M., & Offord, K.P. (1989). The MMPI: A contemporary normative study of adults (2nd ed.). Odessa, FL: Psychological Assessment Resources.
Cook, W.W., & Medley, D.M. (1954). Proposed hostility and pharisaic-virtue scales for the MMPI. Journal of Applied Psychology, 38, 414-418.
Costello, R.M., Hulsey, T.L., Schoenfeld, L.S., & Ramamurthy, S. (1987). P-A-I-N: A four cluster MMPI typology for chronic pain. Pain, 29, 1-11.
Craig, R.J. (1984). Personality dimensions related to premature termination from an inpatient drug abuse treatment program. Journal of Clinical Psychology, 40, 351-355.
Dahlstrom, W.G. (1992). Comparability of two-point high-point code patterns from original MMPI norms to MMPI-2 norms for the restandardization sample. Journal of Personality Assessment, 59, 153-164.
Dahlstrom, W.G., Lachar, D., & Dahlstrom, L.E. (1986). MMPI patterns of American minorities. Minneapolis: University of Minnesota Press.
Dahlstrom, W.G., & Tellegen, A. (1993). Socioeconomic status and the MMPI-2: The relation of MMPI-2 patterns to levels of education and occupation. Minneapolis: University of Minnesota Press.
Dahlstrom, W.G., Welsh, G.S., & Dahlstrom, L.E. (1975). An MMPI handbook: Vol. II. Research applications (rev. ed.). Minneapolis: University of Minnesota Press.
Douglas, A. (1994). Typologies and treatment outcome in alcoholics/addicts: The MMPI, MAC, and Type 1-Type 2 subgroups. Unpublished doctoral dissertation, California Institute of Integral Studies, San Francisco.
DuBrin, J.R., & Zastowny, T.R. (1988). Predicting early attrition from psychotherapy: An analysis of a large private-practice cohort. Psychotherapy, 25, 393-408.
Edinger, J.D., Stout, A.L., & Hoelscher, T.J. (1988). Cluster analysis of insomniacs' MMPI figures: Relation of subtypes to sleep history and treatment outcome. Psychosomatic Medicine, 50, 77-87.
Edwards, D.W., Morrison, T.L., & Weissman, H.N. (1993a). The MMPI and MMPI-2 in an outpatient sample: Comparisons of code types, validity scales and clinical scales. Journal of Personality Assessment, 61, 1-18.
Edwards, D.W., Morrison, T.L., & Weissman, H.N. (1993b). Uniform versus linear T-scores on the MMPI-2/MMPI in an outpatient psychiatric sample: Differential contributions. Psychological Assessment, 5, 499-500.
Evans, D.E., & Blanchard, E.B. (1988). Prediction of early termination from the self-regulatory treatment of chronic headache. Biofeedback and Self-Regulation, 13, 245-256.
Exner, J.E., Jr. (1993). The Rorschach: A comprehensive system: Vol. I. Basic foundations (3rd ed.). New York: Wiley.
Fals-Stewart, W., & Schafer, J. (1993). MMPI correlates of psychotherapy compliance among obsessive-compulsives. Psychopathology, 26, 1-5.
Filstead, W.J., Drachman, D.A., Rossi, J.J., & Getsinger, S.H. (1983). The relationship of MMPI subtype membership to demographic variables and treatment outcome among substance misusers. Journal of Studies on Alcohol, 44, 917-922.
Finn, S. (1996). Using the MMPI-2 as a therapeutic intervention. Minneapolis: University of Minnesota Press.
Friedman, A.F., Webb, J.T., Lewak, R., & Nichols, D.S. (1999). Psychological assessment with the MMPI-2 (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Gallagher, R.M., Rauh, V., Haugh, L.D., Milhous, R., Callas, P.W., Langelier, R., McClallen, J.M., & Frymoyer, J. (1989). Determinants of return-to-work among low back pain patients. Pain, 39, 55-67.
Gallucci, N.T., Kay, D.C., & Thornby, J.I. (1989). The sensitivity of 11 substance abuse scales from the MMPI to change in clinical status. Psychology of Addictive Behaviors, 3, 29-33.
Graham, J.R. (1993). MMPI-2: Assessing personality and psychopathology (2nd ed.). New York: Oxford University Press.
Graham, J.R., & Strenger, V.E. (1988). MMPI characteristics of alcoholics: A review. Journal of Consulting and Clinical Psychology, 56, 197-205.


Graham, J.R., Timbrook, R.E., Ben-Porath, Y.S., & Butcher, J.N. (1991). Code-type congruence between MMPI and MMPI-2: Separating fact from artifact. Journal of Personality Assessment, 57, 205-215.
Greene, R.L. (1987). Ethnicity and MMPI performance: A review. Journal of Consulting and Clinical Psychology, 55, 497-512.
Greene, R.L. (1991). The MMPI-2/MMPI: An interpretive manual. Boston: Allyn & Bacon.
Greene, R.L. (1994). Relationships among MMPI codetype, gender, and setting and the MacAndrew Alcoholism Scale. Assessment, 1, 39-46.
Greene, R.L., & Brown, R.C. (1998). MMPI-2 adult interpretive system (2nd ed.). Lutz, FL: Psychological Assessment Resources.
Greene, R.L., & Garvin, R.D. (1988). Substance abuse/dependence. In R.L. Greene (Ed.), The MMPI: Use with specific populations (pp. 159-197). Philadelphia: Grune & Stratton.
Greene, R.L., & Schinka, J.A. (1996). [MMPI-2 data for alcoholic inpatients and psychiatric inpatients and outpatients.] Unpublished raw data.
Guck, T.P., Meilman, P.W., Skultety, F.M., & Poloni, L.D. (1988). Pain-patient MMPI subgroups: Evaluation of long-term treatment outcome. Journal of Behavioral Medicine, 11, 159-169.
Gunderson, J.H.R. (1989). Psychometric features related to the acute phase and final outcome of anorexia nervosa. Scandinavian Journal of Psychology, 30, 81-89.
Hathaway, S.R., & McKinley, J.C. (1940). A multiphasic personality schedule (Minnesota): I. Construction of the schedule. Journal of Psychology, 10, 249-254.
Hearn, M.D., Murray, D.M., & Luepker, R.V. (1989). Hostility, coronary heart disease, and total mortality: A 33-year follow-up study of university students. Journal of Behavioral Medicine, 12, 105-121.
Helmes, E., & Reddon, J.R. (1993). A perspective on developments in assessing psychopathology: A critical review of the MMPI and MMPI-2. Psychological Bulletin, 113, 453-471.
Herron, L.D., & Pheasant, H.C. (1982). Changes in MMPI figures after low-back surgery. Spine, 7, 591-597.
Hoffmann, H., & Jansen, D.G. (1973). Relationships among discharge variables and MMPI scale scores of hospitalized alcoholics. Journal of Clinical Psychology, 29, 475-477.
Hoffmann, H., Loper, R.G., & Kammeier, M.L. (1974). Identifying future alcoholics with MMPI alcoholism scales. Quarterly Journal of Studies on Alcohol, 35, 490-498.
Huber, N.A., & Danahy, S. (1975). Use of the MMPI in predicting completion and evaluating changes in a long-term alcoholism treatment program. Journal of Studies on Alcohol, 36, 1230-1237.
Kammeier, M.L., Hoffmann, H., & Loper, R.G. (1973). Personality characteristics of alcoholics as college freshmen and at time of treatment. Quarterly Journal of Studies on Alcohol, 34, 390-399.
Keller, L.S., & Butcher, J.N. (1991). Assessment of chronic pain with the MMPI-2. Minneapolis: University of Minnesota Press.
King, S.A., & Snow, B.R. (1989). Factors for predicting premature termination from a multidisciplinary chronic pain program. Pain, 39, 281-287.
Kleinke, C.L., & Spangler, Jr., A.S. (1988). Predicting treatment outcome of chronic back pain patients in a multidisciplinary pain clinic: Methodological issues and treatment implications. Pain, 33, 41-48.
Klonoff, H., Fleetham, J., Taylor, D.R., & Clark, C. (1987). Treatment outcome of obstructive sleep apnea: Physiological and neuropsychological concomitants. Journal of Nervous and Mental Disease, 175, 208-212.
Knapp, J.E., Templer, D.I., Cannon, W.G., & Dobson, S. (1991). Variables associated with success in an adolescent drug treatment program. Adolescence, 26, 305-317.
Krasnoff, A. (1977). Failure of MMPI scales to predict treatment completion. Journal of Studies on Alcohol, 38, 1440-1442.
Lewak, R.W., Marks, P.A., & Nelson, G.E. (1990). Therapist guide to the MMPI and MMPI-2: Providing feedback and treatment. Muncie, IN: Accelerated Development.
Lin, T. (1975). Use of demographic variables, WRAT, and MMPI scores to predict addicts' types of discharge from a community-like hospital setting. Journal of Clinical Psychology, 31, 148-151.
Long, C.J. (1981). The relationship between surgical outcome and MMPI figures in chronic pain patients. Journal of Clinical Psychology, 37, 744-749.
Long, K.A., Graham, J.R., & Timbrook, R.E. (1994). Socioeconomic status and MMPI-2 interpretation. Measurement and Evaluation in Counseling and Development, 27, 158-177.
Loper, R.G., Kammeier, M.L., & Hoffmann, H. (1973). MMPI characteristics of college freshman males who later became alcoholics. Journal of Abnormal Psychology, 82, 159-162.
Love, A.W., & Peck, C.L. (1987). The MMPI and psychological factors in chronic low back pain: A review. Pain, 28, 1-12.
Lubin, B., Larsen, R.M., Matarazzo, J.D., & Seever, M. (1985). Psychological test usage patterns in five professional settings. American Psychologist, 40, 857-861.
Lurie, J. (1995). The relationship between MMPI subtypes and relapse for inpatient chemical dependency patients three months after graduation from treatment. Unpublished doctoral dissertation, California School of Professional Psychology, Alameda.
MacAndrew, C. (1965). The differentiation of male alcoholic outpatients from nonalcoholic psychiatric outpatients by means of the MMPI. Quarterly Journal of Studies on Alcohol, 26, 238-246.
MacAndrew, C. (1981). What the MAC scale tells us about men alcoholics: An interpretive review. Journal of Studies on Alcohol, 42, 604-625.
Marks, P.A., & Briggs, P.F. (1972). Adolescent norm tables for the MMPI. In W.G. Dahlstrom, G.S. Welsh, & L.E. Dahlstrom, An MMPI handbook: Vol. I. Clinical interpretation (rev. ed., pp. 388-399). Minneapolis: University of Minnesota Press.
Marshall, L.L., & Roiger, R.J. (1996). Substance user MMPI-2 profiles: Predicting failure in completing treatment. Substance Use and Misuse, 31, 197-206.
Maruta, T., Hamburgen, M.E., Jennings, C.A., Offord, K.P., Colligan, R.C., Frye, R.L., & Malinchoc, M. (1993). Keeping hostility in perspective: Coronary heart disease and the Hostility scale on the MMPI. Mayo Proceedings, 68, 109-114.
McArthur, D.L., Cohen, M.J., Gottlieb, H.J., Naliboff, B.D., & Schandler, S.L. (1987). Treating chronic low back pain: II. Long-term follow up. Pain, 29, 23-38.
McCann, J.T. (1991). Convergent and discriminant validity of the MCMI-II and MMPI personality disorder scales. Psychological Assessment, 3, 9-18.
McCreary, C., Turner, J., & Dawson, E. (1977). Differences between functional versus organic low back pain patients. Pain, 4, 73-78.
McWilliams, J., & Brown, C.C. (1977). Treatment termination variables, MMPI scores and frequencies of relapse in alcoholics. Journal of Studies on Alcohol, 38, 477-486.
Millon, T. (1994). Manual for the Millon Clinical Multiaxial Inventory-III (MCMI-III). Minneapolis: National Computer Systems.
Miner, M.H., & Dwyer, S.M. (1995). Analysis of drop outs from outpatient sex offender treatment. Journal of Psychology & Human Sexuality, 7, 77-93.
Moore, J.E., Armentrout, D.P., Parker, J.C., & Kiviahan, D.R. (1986). Empirically derived pain-patient MMPI subgroups: Prediction of treatment outcome. Journal of Behavioral Medicine, 9, 51-63.
Morey, L.C., & Smith, M.R. (1988). Personality disorders. In R.L. Greene (Ed.), The MMPI: Use with specific populations (pp. 110-158). Philadelphia: Grune & Stratton.
Morey, L.C., Waugh, M.H., & Blashfield, R.K. (1985). MMPI scales for DSM-III personality disorders: Their derivation and correlates. Journal of Personality Assessment, 49, 245-251.
Morrison, T.L., Edwards, D.W., Weissman, H.N., Allen, R., & DeLaCruz, D. (1995). Comparing MMPI and MMPI-2 profiles: Replication and integration. Assessment, 2, 39-46.
Moser, R.K. (1996). The use of the MMPI-2 in the diagnosis of depression and psychosis. Unpublished doctoral dissertation, New School for Social Research, New York.
Mozdzierz, G.J., Macchitelli, F.J., & Conway, J.A. (1973). Personality characteristic differences between alcoholics who leave treatment against medical advice and those who don't. Journal of Clinical Psychology, 29, 78-82.
Munley, P.H., Bains, D.S., Frazee, J., & Schwartz, L.T. (1994). Inpatient PTSD treatment: A study of pretreatment measures, treatment drop out, and therapist ratings of response to treatment. Journal of Traumatic Stress, 2, 319-325.
Naliboff, B.D., McCreary, C.P., McArthur, D.L., Cohen, M.J., & Gottlieb, H.J. (1988). MMPI changes following behavioral treatment of chronic low back pain. Pain, 35, 271-277.


Nathan, P.E., & Skinstad, A. (1987). Outcomes of treatment for alcohol problems: Current methods, problems, and results. Journal of Consulting and Clinical Psychology, 55, 332-340.
Nichols, D.S., & Greene, R.L. (1995). The MMPI-2 Structural Summary manual. Lutz, FL: Psychological Assessment Resources.
Onorato, V.A., & Tsushima, W.T. (1983). EMG, MMPI, and treatment outcome in the biofeedback therapy of tension headache and posttraumatic pain. American Journal of Clinical Biofeedback, 6, 71-81.
Pekarik, G., Jones, D.L., & Blodgett, C. (1986). Personality and demographic characteristics of drop outs and completers in a nonhospital residential alcohol treatment program. The International Journal of the Addictions, 21, 131-137.
Pettinati, H.M., Sugerman, A.A., & Maurer, H.S. (1982). Four year MMPI changes in abstinent and drinking alcoholics. Alcoholism: Clinical and Experimental Research, 6, 487-494.
Rohan, W.P., Tatro, R.L., & Rotman, S.R. (1969). MMPI changes in alcoholics during hospitalization. Quarterly Journal of Studies on Alcohol, 30, 389-400.
Rounsaville, B.J., Dolinsky, Z.S., Babor, T.E., & Meyer, R.E. (1987). Psychopathology as a predictor of treatment outcome in alcoholics. Archives of General Psychiatry, 44, 505-513.
Scapinello, K.F., & Blanchard, R. (1987). Historical items in the MMPI: Note on evaluating treatment outcomes for a criminal population. Psychological Reports, 61, 775-778.
Schinka, J.A., & LaLone, L. (1997). MMPI-2 norms: Comparisons with a census-matched subsample. Psychological Assessment, 9, 307-311.
Schinka, J.A., LaLone, L., & Greene, R.L. (1997, August). Effects of demographic characteristics on MMPI-2 scale scores. Paper presented at the annual meeting of the American Psychological Association, Chicago.
Schnurr, P.P., Friedman, M.J., & Rosenberg, S.D. (1993). Premilitary MMPI scores as predictors of combat-related PTSD symptoms. American Journal of Psychiatry, 150, 479-483.
Schork, E.J., Eckert, E.D., & Halmi, K.A. (1994). The relationship between psychopathology, eating disorder diagnosis, and clinical outcome at 10-year follow-up in anorexia nervosa. Comprehensive Psychiatry, 35, 113-123.
Schuckit, M.A., Klein, J., Twitchell, G., & Smith, T. (1994). Personality test scores as predictors of alcoholism almost a decade later. American Journal of Psychiatry, 151, 1038-1042.
Shekelle, R.B., Gale, M., Ostfeld, A.M., & Paul, O. (1983). Hostility, risk of coronary heart disease, and mortality. Psychosomatic Medicine, 45, 109-114.
Sheppard, D., Smith, G.T., & Rosenbaum, G. (1988). Use of MMPI subtypes in predicting completion of a residential alcoholism treatment program. Journal of Consulting and Clinical Psychology, 56, 590-596.
Sieber, K.O., & Meyers, L.S. (1992). Validation of the MMPI-2 social introversion subscales. Psychological Assessment, 4, 185-189.
Snyder, D.K. (1990). Assessing chronic pain with the MMPI. In T.W. Miller (Ed.), Chronic pain (pp. 215-257). Madison, CT: International Universities Press.
Stark, M.J. (1992). Dropping out of substance abuse treatment: A clinically oriented review. Clinical Psychology Review, 12, 93-116.
Sternbach, R.A., Wolf, S.R., Murphy, R.W., & Akeson, W.H. (1973). Traits of pain patients: The low-back "loser." Psychosomatics, 14, 226-229.
Strassberg, D.S., Reimherr, F., Ward, M., Russell, S., & Cole, A. (1981). The MMPI and chronic pain. Journal of Consulting and Clinical Psychology, 49, 220-226.
Sunday, S.R., Reeman, I.M., Eckert, E., & Halmi, K.A. (1996). Ten-year outcome in adolescent onset anorexia nervosa. Journal of Youth and Adolescence, 25, 533-544.
Svanum, S., & Dallas, C.L. (1981). Alcoholic MMPI types and their relationship to patient characteristics, polydrug abuse, and abstinence following treatment. Journal of Personality Assessment, 45, 278-287.
Tellegen, A., Butcher, J.N., & Hoeglund, T. (1993). Unisex norms for the MMPI-2 and MMPI-A: Are they needed? Would they work? Paper presented at the 28th Annual Symposium on Recent Advances in the MMPI (MMPI-2 and MMPI-A), St. Petersburg, FL.
Timbrook, R.E., & Graham, J.R. (1994). Ethnic differences on the MMPI-2? Psychological Assessment, 6, 212-217.


Turner, J.A., Herron, L., & Weiner, P. (1986). Utility of the MMPI Pain Assessment Index in predicting outcome after lumbar surgery. Journal of Clinical Psychology, 42, 764-769.
Welsh, G.S. (1956). Factor dimensions A and R. In G.S. Welsh & W.G. Dahlstrom (Eds.), Basic readings on the MMPI in psychology and medicine (pp. 264-281). Minneapolis: University of Minnesota Press.
Wetzler, S., Khadivi, A., & Oppenheim, S. (1995). The psychological assessment of depression: Unipolars versus bipolars. Journal of Personality Assessment, 65, 557-566.
Widiger, T.A., & Sanderson, C. (1987). The convergent and divergent validity of the MCMI as a measure of DSM-III personality disorders. Journal of Personality Assessment, 51, 228-242.
Wilkinson, A.E., Prado, W.M., Williams, W.O., & Schnadt, F.W. (1971). Psychological test characteristics and length of stay in alcoholism treatment. Quarterly Journal of Studies on Alcohol, 32, 1230-1237.
Williams, D.E., Thompson, J.K., Haber, J.D., & Raczynski, J.M. (1986). MMPI and headache: A special focus on differential diagnosis, prediction of treatment outcome, and patient-treatment matching. Pain, 24, 143-158.
Williams, Jr., R.B., Haney, T.L., Lee, K.L., Kong, Y-H, Blumenthal, J.A., & Whalen, R.E. (1990). Type A behavior, hostility, and coronary atherosclerosis. Psychosomatic Medicine, 42, 539-549.
Zalewski, C., & Greene, R.L. (1996). Multicultural usage of the MMPI-2. In L.A. Suzuki, P.J. Meller, & J.G. Ponterotto (Eds.), Handbook of multicultural assessment: Clinical, psychological, and educational applications (pp. 77-114). San Francisco: Jossey-Bass.
Zuckerman, M., Sola, S., Masterson, J., & Angelone, J.V. (1975). MMPI patterns in drug abusers before and after treatment in therapeutic communities. Journal of Consulting and Clinical Psychology, 48, 286-296.


For Abby, Katie, and Shelby


Chapter 34
Treatment Planning and Outcome in Adults: The Millon Clinical Multiaxial Inventory-III

Roger D. Davis
Institute for Advanced Studies in Personology and Psychopathology

Sarah E. Meagher
University of Miami and Institute for Advanced Studies in Personology and Psychopathology

Antonio Goncalves
Mark Woodward
University of Miami

Theodore Millon
University of Miami, Harvard University, and Institute for Advanced Studies in Personology and Psychopathology

Since its original publication in 1977, the Millon Clinical Multiaxial Inventory (MCMI) has become one of the most frequently used psychometric instruments among clinicians and researchers. Like the other Millon clinical inventories, the MCMI-III was constructed as a brief objective instrument consonant with the multiaxial format of the Diagnostic and Statistical Manual (DSM), assessing both the problematic behaviors and clinical conditions of Axis I and the personality variables of Axis II. Now in its third revision, the MCMI-III (Millon, 1994) consists of 175 true-false items grouped into 14 personality scales, 10 clinical syndrome scales, 3 modifying indices assessing response style, and 1 validity index.

Advances Since MCMI-II

The decision to revise the second edition of the MCMI (1987) was motivated by theoretical, professional, and empirical concerns.

Theoretical Progress

Diagnostic instruments are more useful when they are linked systematically to a comprehensive clinical theory. Unfortunately, as many have noted (Butcher, 1972), assessment techniques and personality theory have developed almost independently. As a result, few diagnostic measures have either been based on or have evolved from clinical theory. The


MCMI-III is different. Each of its Axis II scales is an operational measure of a syndrome derived from a theory of personality (Millon, 1969, 1981, 1986a, 1986b, 1990; Millon & Davis, 1996). Although the Axis I scales are not explicitly derived from the theory, they are nevertheless refined in terms of its generative framework. The scales and profiles of the MCMI-III thus measure these theory-derived and theory-refined variables directly and quantifiably. With a firm foundation in measurement, scale elevations and configurations can be used to suggest specific patient diagnoses and clinical dynamics, as well as testable hypotheses about social history and current behavior.

The theory on which the MCMI-I and MCMI-II were constructed has now undergone considerable development. It is no longer based primarily on the behavioral principles of reinforcement and conditioning (Millon, 1969; Millon & Everly, 1985), but is instead anchored broadly and firmly to evolutionary theory (Millon, 1990; Millon & Davis, 1996). With this change, personality disorders are seen as evolutionary constructs derived from the fundamental tasks that all organisms confront, namely, the struggle to exist or survive (pleasure vs. pain), the effort to either adapt to the environment or adapt the environment to oneself (passive vs. active), and the strategy to make large reproductive investments in a single or a few offspring versus the strategy of reproducing many offspring without much subsequent care (other vs. self). These three fundamental polarities form a foundation, based in the larger framework of evolutionary theory, that transcends any particular school or traditional perspective on personality.

Accordingly, the Axis II disorders are no longer seen as being derived principally from a single clinical data level, be it behavioral, phenomenological, intrapsychic, or biophysical, that is, within one of the four traditional approaches to psychological science.
Instead, personality disorders are seen as manifest across the entire matrix of the person, with expression throughout several clinical domains. This instrument has, consequently, articulated an expanding base of diagnostic criteria and personality concepts (e.g., Millon, 1984, 1990, 1994), a framework much more extensive than the DSM, including the DSM-IV. This growing body of clinical literature provides a substantial knowledge base for the MCMI-III. To the extent that DSM-IV reflects these advances, its correspondence to MCMI-III has been further strengthened.

Professional Progress

In addition to theoretical progress oriented toward the understanding of Axis II, the area of personality disorders itself now enjoys worldwide scientific interest. The growth of the Journal of Personality Disorders and the International Society for the Study of Personality Disorders illustrates the importance attached to these syndromes as a major component of the mental disorders. These two major forums both inform and reflect the renaissance in personality theory and assessment that began in the late 1970s and 1980s (Millon, 1984, 1990) and is continuing today.

Moreover, the clinical field generally has seen numerous professional developments. The most significant of these is, of course, the publication of the DSM-IV. Enriched by this knowledge, an increasingly solid base for making refined diagnostic decisions has been found, well beyond the earlier literature. To provide for additional scales and to optimize MCMI item to DSM-IV criteria correspondence, as well as to reflect generalization studies, 95 new MCMI-III items were introduced to replace 95 extant MCMI-II items. Few diagnostic instruments currently available are as fully consonant as the MCMI-III with the nosological format and conceptual terminology of this official


system. In addition, one personality disorder scale (i.e., Depressive Personality) and one clinical syndrome scale (i.e., Posttraumatic Stress Disorder) were added. Finally, a small set of items was added to strengthen the utility of the Noteworthy Responses section of the interpretive report in the areas of Child Abuse, Anorexia, and Bulimia.

Empirical Progress

Currently, over 400 research articles have been published that employ the MCMI-I and MCMI-II as a major assessment instrument. This substantial empirical base, though difficult to digest in its totality, led to several major refinements in the structure of the MCMI-III. Numerous cross-validation and cross-generalization studies have been and continue to be executed with the goal of evaluating and improving each of the several elements that make up the MCMI, that is, its items, scales, scoring procedures, algorithms, and interpretive text (see Choca, Shanley, & Van Denburg, 1992, 1997; Craig, 1993; Hsu & Maruish, 1992; Maruish, 1994). These ongoing investigations continue to provide an empirical grounding for further upgrading of each of these components.

With the preceding information as a base, a number of changes were introduced to create the MCMI-III. First, the influence of the item weighting system introduced in the MCMI-II was moderated. Previously, prototypal items were given a weight of three points. They now receive a weight of two points in the MCMI-III. Studies have generally shown very high correlations between scales composed of weighted and unweighted items. The authors continue to feel not only that the distinction between items more central and more peripheral to the definition of a construct is an essential one, but also that items should be weighted according to their demonstrated substantive, structural, and external characteristics (Loevinger, 1957); nevertheless, two points are now deemed adequate for capturing this distinction.
Thus, because the essential distinction has been sharpened, clinicians may still choose to inspect the prototypal items of each scale as so-called critical items when seeking support for particular criteria and when making diagnostic judgments. Abandonment of the item weighting system, although perhaps not empirically objectionable, would have produced scales composed exclusively of singly weighted items. This is not only incommensurate with the prototypal model that undergirds the official diagnostic system, but also inconsistent with the tripartite logic that guided the development of the test itself, one that holds empirical considerations to be only one basis on which the structural features of an instrument should rest.

Second, modifications were also made in the procedures for correcting distortion effects (e.g., random responding, faking, denial, complaining) that simplify the scoring procedures developed in the MCMI-II.

Distinguishing Features

Numerous features distinguish the MCMI-III from other inventories. These include the relatively brief length of the inventory, its theoretical anchoring, multiaxial format, construction through three stages of validation, use of base rate scores, and interpretive depth.


Inventory Length

Each generation of the MCMI has attempted to keep the total number of items small enough to encourage its use in all types of diagnostic and treatment settings, yet large enough to permit the assessment of a wide range of clinically relevant behaviors. At 175 items, the MCMI is much shorter than comparable instruments. Potentially objectionable candidate items have been screened out, and terminology has been geared to an eighth-grade reading level. As a result, the great majority of patients can complete the MCMI-III in 20 to 30 minutes, facilitating relatively simple and rapid administrations while minimizing patient resistance and fatigue.

Structural Characteristics

No less important than its link to theory is the coordination between a clinically oriented instrument and the official diagnostic system and its syndromal categories. Few diagnostic instruments currently available have been constructed to be as consonant with the official nosology as the MCMI-III. With the advent of the DSM-III, DSM-III-R, and DSM-IV, diagnostic categories were precisely specified and operationally defined. The structure of the MCMI-III parallels that of the DSM-IV at a number of levels.

First, the scales of the MCMI-III are grouped into the categories of personality and psychopathology, to reflect the DSM-IV distinction between Axis II and Axis I. Thus, separate scales distinguish the more enduring personality characteristics of patients (Axis II) from the acute clinical disorders they display (Axis I), a distinction judged to be of considerable use by both test developers and clinicians (Dahlstrom, 1972). Profiles based on all 24 clinical scales may be interpreted to illuminate the interplay between long-standing characterological patterns and the distinctive clinical symptoms currently manifest.
Beyond the simple DSM-IV distinction between psychiatric symptoms and enduring personality dispositions, the scales within each axis are further grouped according to their level of psychopathologic severity. Thus, the premorbid characterological pattern of a patient is assessed independently of its degree of pathology. The Schizotypal, Borderline, and Paranoid syndromes represent greater levels of personality pathology, and have been set off from the 11 basic personality scales, Schizoid through Masochistic. Similarly, the moderately severe or neurotic clinical syndromes are separated from, and assessed independently of, those with a presumably more psychotic nature: Thought Disorder, Major Depression, and Delusional Disorder.

Second, at a scale level, each axis is composed of dimensions reflecting its foremost syndromes. Thus, the Axis II scales comprise those personality dimensions that have been a part of the DSM or its appendix since its third revision; the Axis I scales reflect those syndromes that are most prominent and important in clinical work (see Table 34.1). The content of the MCMI-III Axis I and Axis II scales is found in Table 34.2.

Item Weighting and Item Selection

Although items in the Millon inventories are weighted primarily in terms of whether they represent more core or more peripheral features of the constructs they assess, they are also weighted on the strength of their validational evidence.
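The core-versus-peripheral weighting can be illustrated with a minimal Python sketch. It assumes, following the chapter, that prototypal items count two points and that all remaining items on a scale are singly weighted. The item identifiers, the example scale, and the function name `weighted_raw_score` are hypothetical, and the sketch leaves out true/false keying, validational weighting, and the later conversion of raw scores to base rate scores.

```python
from typing import Dict, Set

PROTOTYPAL_WEIGHT = 2  # weight of prototypal (core) items in the MCMI-III, per the text
PERIPHERAL_WEIGHT = 1  # assumed weight for the remaining, singly weighted items


def weighted_raw_score(endorsed: Set[str], item_weights: Dict[str, int]) -> int:
    """Sum the weights of the endorsed items that belong to one scale.

    Endorsed items that are not part of the scale are ignored.
    """
    return sum(weight for item, weight in item_weights.items() if item in endorsed)


# Hypothetical scale: two prototypal items and two peripheral items.
example_scale = {
    "item_a": PROTOTYPAL_WEIGHT,
    "item_b": PROTOTYPAL_WEIGHT,
    "item_c": PERIPHERAL_WEIGHT,
    "item_d": PERIPHERAL_WEIGHT,
}

# Hypothetical respondent: endorses one prototypal item, one peripheral item,
# and one item ("item_z") that does not belong to this scale.
endorsed_items = {"item_a", "item_c", "item_z"}

raw = weighted_raw_score(endorsed_items, example_scale)
print(raw)  # 2 (item_a) + 1 (item_c) = 3
```

Keeping the weights in a per-scale mapping makes the prototypal items easy to list on their own, which matches the suggestion that clinicians may inspect them as critical items.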


TABLE 34.1
The MCMI-III Scales

Clinical Personality Patterns
  1.   Schizoid
  2A.  Avoidant
  2B.  Depressive
  3.   Dependent
  4.   Histrionic
  5.   Narcissistic
  6A.  Antisocial
  6B.  Sadistic
  7.   Compulsive
  8A.  Negativistic (Passive-Aggressive)
  8B.  Masochistic (Self-defeating)

Severe Personality Pathology
  S.   Schizotypal
  C.   Borderline
  P.   Paranoid

Clinical Syndromes
  A.   Anxiety
  H.   Somatoform
  N.   Bipolar: Manic
  D.   Dysthymia
  B.   Alcohol Dependence
  T.   Drug Dependence
  R.   Posttraumatic Stress Disorder

Severe Clinical Syndromes
  SS.  Thought Disorder
  CC.  Major Depression
  PP.  Delusional Disorder

Modifying Indices
  X.   Disclosure
  Y.   Desirability
  Z.   Debasement
  V.   Validity

As with all the Millon inventories, item selection and scale development for the MCMI-III progressed through theoretical-substantive, internal-structural, and external-criterion stages. This tripartite model attempts to synthesize the strengths of each development phase by rejecting items found to be deficient in particular respects. This ensures that the final scales of an inventory do not consist of items that optimize one particular parameter of test construction, but instead conjointly satisfy multiple requirements, increasing the generalizability of the end product. By using different validation strategies, the MCMI-III upholds the standards of test developers committed to diverse methods of construction and validation (Hase & Goldberg, 1967).

In the theoretical-substantive stage, items for each syndrome were generated to conform both to theoretical requirements and to the substance of the DSM-IV criteria. In the internal-structural stage, these "rational" items were subjected to internal consistency analyses. Items having higher correlations with scales for which they were not intended were either dropped entirely or reexamined against theoretical criteria and reassigned or reweighted.
In the external-criterion phase, items were examined in terms of their ability to discriminate between clinical groups, rather than between clinical groups and normal subjects. Normals are not an appropriate reference or comparison group (Rosen, 1962).

TABLE 34.2
Clinical Code and Brief Description of the MCMI-III Axis I and II Scales

Axis II: Clinical Personality Patterns

1. Schizoid Personality. Noted by their lack of desire and incapacity to experience either pleasure or pain in depth, these individuals tend to be apathetic, listless, distant, and asocial. Because affectionate needs and emotional feelings are minimal, the individual functions as a passive observer detached from the rewards and affections of human relationships, as well as from their demands.

2A. Avoidant Personality. Basically fearful and vigilant, these individuals are perennially on guard, ever ready to distance themselves because of anxious anticipation of painful and humiliating experiences. By actively withdrawing they protect themselves in spite of deep desires to be close to others.

2B. Depressive Personality. These individuals believe that pain is a permanent and stable part of life, and that pleasure is no longer possible. A disconsolate family, a barren environment, and hopeless prospects can all shape the Depressive character style.

3. Dependent Personality. Turning primarily to others as a source of nurturance and security, these persons wait passively for others to provide affection, security, guidance, and leadership, while often submitting willingly to the wishes of others in order to maintain their affection. Lack of both initiative and autonomy is often a consequence of parental overprotection.

4. Histrionic Personality. Facile and manipulating, these individuals seek to maximize the amount of attention and favorable treatment they receive while minimizing the disinterest and disapproval of others. Their clever and often artful social behaviors give the appearance of an inner confidence and independent self-assurance. Beneath this guise, however, lies a fear of genuine autonomy and a need for repeated signs of acceptance and approval from every interpersonal source and in every social context.

5. Narcissistic Personality. Noted by their egotistic self-involvement, these individuals overvalue their self-worth, often maintaining confidence and superiority that is unsustainable by real or mature achievements. Nevertheless, they blithely assume that others will recognize their specialness and exhibit an air of arrogant self-assurance. A sublime confidence that things always work out provides them with little incentive to engage in the reciprocal give-and-take of social life.

6A. Antisocial Personality. Engaging in duplicitous or illegal behaviors designed to exploit their environment for self-gain, these individuals are irresponsible and impulsive, judge others to be unreliable and disloyal, and use insensitivity and ruthlessness to head off abuse and victimization.

6B. Sadistic Personality. Although deleted from the DSM-IV manual, this construct remains part of the MCMI-III. Subjects are generally hostile, pervasively combative, and appear indifferent to or pleased by the destructive consequences of their contentious, abusive, and brutal behaviors. Although many cloak their more malicious and power-oriented tendencies in publicly approved roles and vocations, they give themselves away in their dominating and antagonistic actions.

7. Compulsive Personality. Prudent, controlled, and perfectionistic, high scorers experience a conflict between hostility and fear of social disapproval, typically suppressing resentment by overconforming and by placing high demands on themselves. Their disciplined self-restraint controls intense, though hidden, oppositional feelings, resulting in an overt passivity and seeming public compliance.

8A. Negativistic Personality. These individuals struggle between loyalty to their own needs and those of others, vacillating between deference and obedience, and defiance and aggressive opposition. Behaviorally they display an erratic pattern of explosive anger or stubbornness intermingled with periods of guilt and shame.

8B. Masochistic Personality. Relating to others in an obsequious and self-sacrificing manner, these persons allow, and perhaps encourage, others to exploit or take advantage of them. Focusing on their very worst features, many assert that they deserve being shamed and humbled. Typically acting in an unassuming and self-effacing way, they often intensify their deficits and place themselves in an inferior light or abject position.

Axis II: Severe Personality Pathology

S. Schizotypal Personality. Socially isolated with minimal personal attachments and obligations, these persons are inclined to be either autistic or cognitively confused, tangential, self-absorbed, or ruminative. Their behavioral eccentricities cause others to perceive them as strange or different.

C. Borderline Personality. Experiencing intense moods punctuated by recurring periods of dejection and apathy and spells of anger and anxiety, borderlines are defined by a dysregulation of affect, most clearly seen in the instability and lability of their moods. Many have recurring self-mutilating and suicidal thoughts, appear overly preoccupied with securing affection, have difficulty maintaining a clear sense of identity, and display a cognitive-affective ambivalence evident in conflicting feelings of rage, love, and guilt toward others.

P. Paranoid Personality. Displaying a vigilant mistrust of others and an edgy defensiveness against anticipated criticism and deception, these persons evidence an abrasive irritability and a tendency to precipitate exasperation and anger in others. They fear losing independence and vigorously resist external influence and control.

Axis I: Clinical Syndromes

A. Anxiety. High scorers often report feeling either vaguely apprehensive or specifically phobic. They are typically tense, indecisive, and restless, and tend to complain of a variety of physical discomforts, such as tightness, excessive perspiration, ill-defined muscular aches, and nausea. Most give evidence of a generalized state of tension, manifested by an inability to relax, fidgety movements, and a readiness to react and be easily startled. Somatic discomforts (for example, clammy hands or upset stomach) are also characteristic. Also notable are worrisomeness and an apprehensive sense that problems are imminent, a hyperalertness to one's environment, edginess, and generalized touchiness.

H. Somatoform. High scorers express psychological difficulties through somatic channels, notably, persistent periods of fatigue and weakness, and a preoccupation with ill health and a variety of dramatic but largely nonspecific pains in different and unrelated regions of the body. Some give evidence of a primary somatization disorder that is manifested by recurrent, multiple somatic complaints, often presented in a dramatic, vague, or exaggerated way. Others have a history that may be best considered hypochondriacal, because they interpret minor physical discomforts or sensations as signifying a serious ailment. If realistic diseases are factually present, they tend to be overinterpreted, despite medical reassurance. Typically, somatic complaints are employed to gain attention.

N. Bipolar: Manic. High scorers evidence periods of superficial elation, inflated self-esteem, restless overactivity and distractibility, pressured speech, and impulsiveness and irritability. Also evident is an unselective enthusiasm; excessive planning for unrealistic goals; an intrusive, if not domineering and demanding, quality to interpersonal relations; decreased need for sleep; flights of ideas; and rapid and labile shifts of mood. Very high scores may signify psychotic processes, including delusions or hallucinations.

D. Dysthymia. High scorers remain involved in everyday life but have been preoccupied over a period of years with feelings of discouragement or guilt, lack initiative, possess low self-esteem, and frequently voice futile and self-deprecatory comments. During periods of dejection, there may be tearfulness, suicidal ideation, a pessimistic outlook toward the future, social withdrawal, poor appetite or overeating, chronic fatigue, poor concentration, a marked loss of interest in pleasurable activities, and a decreased effectiveness in fulfilling ordinary and routine life tasks.

B. Alcohol Dependence. High scorers probably have a history of alcoholism. They have made efforts to overcome this problem with minimal success, and, as a consequence, experience considerable discomfort in both family and work settings.

T. Drug Dependence. High scorers are likely to have had a recurrent or recent history of drug abuse, tend to have difficulty in restraining impulses or keeping them within conventional social limits, and display an inability to manage the personal consequences of these behaviors.

R. Posttraumatic Stress Disorder. High scorers have experienced an extremely threatening event involving a threat to life, together with intense fear and feelings of helplessness. Images and emotions associated with the trauma are reexperienced through distressing recollections and nightmares. Symptoms of anxious arousal may also be present, along with an avoidance of circumstances associated with the trauma.

Axis I: Severe Clinical Syndromes

SS. Thought Disorder. Depending on the length and course of the problem, these patients are often classified as "schizophrenic," "schizophreniform," or as "brief reactive psychosis." They may periodically exhibit incongruous, disorganized, or regressive behavior, often appearing confused and disoriented and occasionally displaying inappropriate affect, scattered hallucinations, and unsystematic delusions. Thinking may be fragmented or bizarre. Feelings may be blunted, and there may be a pervasive sense of being isolated and misunderstood by others. Withdrawn and seclusive or secretive behavior may be notable.

CC. Major Depression. High scorers are severely depressed and express a dread of the future, suicidal ideation, and a sense of hopeless resignation. They may be incapable of functioning in a normal environment. Some exhibit a marked motor retardation, whereas others display an agitated quality, incessantly pacing about and bemoaning their sorry state. Several somatic processes are often disturbed during these periods, notably, a decreased appetite, fatigue, weight loss or gain, insomnia, or early rising. Problems of concentration are common, as are feelings of worthlessness or guilt. Repetitive fearfulness and brooding are frequently in evidence.

PP. Delusional Disorder. High scorers are frequently considered acutely paranoid, may become periodically belligerent, and voice irrational but interconnected sets of delusions of a jealous, persecutory, or grandiose nature. Depending on the constellation of other concurrent syndromes, there may be clear-cut signs of disturbed thinking and ideas of reference. Moods usually are hostile, and feelings of being picked on and mistreated are expressed. A tense undercurrent of suspiciousness, vigilance, and alertness to possible betrayal is a typical concomitant.

Diagnostic Thresholds

An important feature that distinguishes the MCMI-III from other inventories is its use of actuarial base rate (BR) data, rather than normalized standard score transformations or percentile ranks. Because T-scores are developed so that a fixed sample percentage falls above a particular cutting score, they implicitly assume the prevalence rates of all disorders to be equal, that is, that there are equal numbers of depressives and schizophrenics, for example. In contrast, the MCMI-III seeks to diagnose the percentages of patients that are actually found to be disordered across diagnostic settings. The BR score was designed to anchor cutoff points to the prevalence of a particular disorder in the psychiatric population. These data not only provide a basis for selecting optimal differential diagnostic cutting lines, they also ensure that the frequency of MCMI-III-generated diagnoses and profile patterns will be comparable to representative clinical prevalence rates. Although local base rates and cutting lines must still be developed for special settings, validation data with a variety of populations (e.g., outpatients and inpatients, alcohol and drug centers) suggest that the MCMI-III can be used with a reasonable level of confidence in most clinical settings. Such scores define a continuum of pathology, representing the difference between a clinical disorder and normal functioning as one of degree rather than kind.

Interpretive Refinements

In addition to a program for rapid and convenient machine scoring, a computer-generated narrative report is available that integrates both personological and symptomatic features of the patient. The report is arranged in a style similar to that of reports prepared by clinical psychologists.
The report synthesizes data from both scale score elevations and profile configurations and is based on the results of actuarial research, the MCMI-III's theoretical schema (Millon, 1969, 1981, 1990), and relevant DSM-IV diagnoses within a multiaxial framework. Thus, beyond giving a complex description of syndrome dynamics, the report summarizes findings along several dimensions or axes: severity of disturbance, presenting clinical syndrome, basic personality pathology, psychosocial stressors, and therapeutic implications.

Theory, Structure, and Scales

The model on which the MCMI-III Axis II scales are based is grounded in the principles of evolution. In essence, it seeks to explicate the structure and styles of personality with reference to deficient, imbalanced, or conflicted modes of ecological adaptation and reproductive strategy, as most fully developed in Toward a New Personology: An Evolutionary Model (Millon, 1990), and briefly presented in the revised Disorders of Personality: DSM-IV and Beyond (Millon & Davis, 1996). Four domains, or spheres, in which evolutionary principles are demonstrated are labeled as Existence, Adaptation, Replication, and Abstraction. The first relates to the serendipitous transformation of random or less organized states into those possessing distinct structures of greater organization. The second refers to homeostatic processes employed to sustain survival in open ecosystems. The third pertains to reproductive
styles that maximize the diversification and selection of ecologically effective attributes. Finally, the fourth concerns the emergence of competencies that foster anticipatory planning and reasoned decision making. Polarities derived from the first three phases (pleasure-pain, passive-active, other-self) are used to construct a theoretically embedded classification system of personality disorders. Personalities termed pleasure-deficient lack the capacity to experience or to enact certain aspects of the three polarities. The interpersonally imbalanced lean strongly toward one or another extreme of a polarity. Finally, the intrapsychically conflicted struggle with ambivalences toward opposing ends of a bipolarity. Three additional pathological personality patterns (the Schizotypal, Borderline, and Paranoid) represent more advanced stages of personality pathology. Reflecting an insidious and slow deterioration of the personality structure, these differ from the basic personality disorders by several criteria, notably, deficits in social competence and frequent (but usually reversible) psychotic episodes. Less integrated in terms of personality organization and less effective in coping than their milder counterparts, they are especially vulnerable to the everyday strains of life. Figure 34.1 presents the personality disorders as derived from the evolutionary model.

Fig. 34.1. Polarity model and its personality disorder derivatives.

Psychometric Characteristics

Valid psychological measurement requires scales that are internally consistent and stable across time. Alpha and test-retest reliabilities for the MCMI-III scales are reported in Table 34.1. Millon (1987) examined the stability of MCMI-II two-scale, high-point configurations of the patient profiles for a sample of 168 subjects. Over 78% had at least one scale in the 2-point code at both administrations, and 45% had the same highest two-scale configuration in either the same or reverse order.
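The two stability figures just cited can be made concrete with a small sketch. It assumes each profile is available as a mapping from scale code to score at each administration; the function names and the sample values are hypothetical and are not taken from the MCMI manuals.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, float]  # scale code -> score at one administration


def two_point_code(profile: Profile) -> Tuple[str, str]:
    """Return the two highest-scoring scales, highest first (ties broken by scale code)."""
    ranked = sorted(profile, key=lambda scale: (-profile[scale], scale))
    return ranked[0], ranked[1]


def code_stability(pairs: List[Tuple[Profile, Profile]]) -> Tuple[float, float]:
    """Return (share with at least one scale in both codes, share with the same pair in any order)."""
    if not pairs:
        raise ValueError("at least one test-retest pair is required")
    shared = same = 0
    for test, retest in pairs:
        first, second = set(two_point_code(test)), set(two_point_code(retest))
        if first & second:
            shared += 1
        if first == second:
            same += 1
    return shared / len(pairs), same / len(pairs)


# Hypothetical test and retest profiles for three patients (illustrative values only).
sample = [
    ({"1": 80, "2A": 90, "3": 60}, {"1": 88, "2A": 85, "3": 55}),  # same pair, order reversed
    ({"1": 80, "2A": 90, "3": 60}, {"1": 50, "2A": 92, "3": 75}),  # one scale in common
    ({"1": 80, "2A": 90, "4": 10, "5": 20}, {"1": 10, "2A": 20, "4": 90, "5": 85}),  # nothing in common
]
at_least_one, same_pair = code_stability(sample)
```

In this toy sample two of three patients keep at least one scale in their code and one of three keeps the same pair, which mirrors the distinction between the 78% and 45% figures reported above.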
In the MCMI-II manual, diagnostic efficiency statistics were reported for the MCMI-I and the MCMI-II tests. These data were based on the expert judgments of clinicians who were well acquainted with the patients they rated. Additional validation work has just been completed for the MCMI-III test (diagnostic statistics are discussed in depth
in the revised manual). The results from this study supersede those reported in the first edition of the MCMI-III manual. In this new study, diagnostic judgments were obtained from clinicians who were familiar with their patients' attributes, the constructs of the personality disorders, the underlying Millon theory and its domains, and the diagnostic criteria of the DSM-IV. A total of 67 clinicians were asked to rate patients with whom they had substantial direct contact (defined as at least three therapeutic or counseling sessions). Seven sessions were modal, with contact time ranging from 3 hours to more than 60 hours. Clinicians received a detailed instruction booklet specifying DSM-IV criteria (Axis I and II) and Millon clinical domain descriptions (Axis II) across eight functional and structural domains of personality. Rating scales were anchored by descriptive paragraphs that operationalized severity and prominence of pathology at various levels. Prevalence and diagnostic efficiency statistics are presented in Table 34.3 (for Axis I results, readers are directed to the forthcoming revised manual). The frequency column shows that 38 patients were diagnosed as either primarily or secondarily schizoid by a clinician, 71 were diagnosed as primarily or secondarily avoidant, and so on. Avoidant, depressive, and dependent were the most common diagnoses (base rates follow raw frequencies in parentheses in Table 34.3). Approximately equal prevalences were obtained for most disorders by the clinicians and the MCMI-III. The sensitivity (SENS) statistic represents the proportion of patients who were clinically diagnosed with a particular disorder whose highest score was on the corresponding MCMI-III scale. For example, 56% of patients who were diagnosed as primarily schizoid by a clinician had their highest score on the Schizoid scale.
The overall results show moderate to high levels of sensitivity for most of the personality scales, with five of the Axis II scales having a sensitivity higher than 70%. A more modest result (44%) was obtained for the Negativistic scale. The sensitivity for this scale falls in a more acceptable range (59%) when both primary and secondary diagnoses are included.

TABLE 34.3
MCMI-III Clinical Personality Scale Diagnostic Efficiency Data (N = 322)

Scale          Frequency for     SENS(a)   PPP(b)   PPR(c)   SENS           PPP
               1st or 2nd        (1st)     (1st)    (1st)    (1st or 2nd)   (1st or 2nd)
               (Prevalence)
Schizoid       36 (6%)           56%       67%      11.9     68%            72%
Avoidant       71 (11%)          65%       73%       6.9     62%            63%
Depressive     69 (11%)          57%       49%       4.2     75%            61%
Dependent      69 (11%)          54%       81%       6.4     58%            78%
Histrionic     42 (7%)           74%       63%       8.8     75%            79%
Narcissistic   47 (7%)           59%       72%      10.5     72%            77%
Antisocial     43 (7%)           61%       50%       8.9     81%            76%
Sadistic       23 (4%)           71%       71%      32.8     74%            81%
Compulsive     39 (6%)           75%       79%       8.4     74%            76%
Negativistic   44 (7%)           44%       39%       7.8     59%            67%
Masochistic    54 (8%)           58%       30%       8.1     85%            73%
Schizotypal    22 (3%)           82%       60%      17.4     73%            67%
Borderline     58 (9%)           60%       71%       5.7     79%            81%
Paranoid       19 (3%)           92%       79%      21.0     69%            85%

(a) Sensitivity. (b) Positive Predictive Power. (c) Positive Predictive Ratio.

The positive predictive power (PPP) statistic (see Table 34.3) represents the percentage of patients who tested positive for a particular disorder who were diagnosed with that
disorder by a clinician. For example, as the third column of Table 34.3 shows, 67% of the individuals who scored highest on the Schizoid scale were also diagnosed as primarily schizoid by a clinician. Moderate to excellent correspondence was obtained for most of the personality scales. Lower levels of correspondence were found for most disorders in the DSM-III-R and DSM-IV appendixes (depressive, negativistic, and masochistic). This may suggest that clinicians find these categories harder to diagnose reliably and validly. Nevertheless, the PPPs for these scales move into the moderate range when calculated as either the first or second highest scale in the MCMI-III profile. The positive predictive ratio (PPR) statistic is a rough measure of incremental accuracy over what would be obtained by chance. For example, the PPR for the Avoidant scale as the highest scale is 6.9, meaning that avoidants are identified by the MCMI-III test at a rate almost seven times greater than what would result by chance alone. An impressive PPR was obtained even for the relatively common borderline personality, which is identified by the MCMI-III test at a rate of more than five times that which would be expected on the basis of chance alone, and for the depressive personality, at about four times. (Disorders with lower prevalence generally have much higher PPRs.) The diagnostic efficiency of clinically judged primary and secondary diagnoses is presented in Table 34.4 for three generations of the MCMI Axis II scales. Each version of the test achieves a satisfactory-to-high level of clinical accuracy. Changes in diagnostic efficiency from the MCMI-I test to the MCMI-III test are difficult to interpret because neither the content of the instrument nor the DSM has remained constant. Nevertheless, for any given version of the MCMI, the results are impressively consistent.
Sensitivity and PPP statistics are reorganized as frequency distributions in Table 34.5, which compares the MCMI-II and MCMI-III Axis II scales in terms of the highest scale in the MCMI-III Axis II profile. These distributions show an overall upward trend for the MCMI-III. Such increasing levels of diagnostic sensitivity and positive predictive power strongly argue that the MCMI-III test is at least equal if not superior to the MCMI-II test as a diagnostic clinical instrument. Because these findings are far superior to those published in the first edition of the MCMI-III manual, they should be of particular interest to psychologists involved in defending their forensic assessments.

TABLE 34.4
SENS and PPP of Primary and Secondary Diagnosis Over Three MCMI Generations

Scale          SENS-I(a)   SENS-II(b)   SENS-III(c)   PPP-I   PPP-II   PPP-III
Schizoid       88%         62%          68%           68%     71%      72%
Avoidant       88%         76%          62%           80%     79%      63%
Depressive     -           -            75%           -       -        61%
Dependent      80%         79%          56%           80%     76%      78%
Histrionic     79%         88%          75%           79%     65%      79%
Narcissistic   74%         74%          72%           59%     69%      77%
Antisocial     62%         71%          81%           61%     80%      76%
Sadistic       -           78%          74%           -       71%      61%
Compulsive     58%         73%          74%           67%     67%      78%
Negativistic   78%         72%          59%           73%     64%      67%
Masochistic    -           72%          85%           -       58%      73%
Schizotypal    74%         57%          73%           68%     59%      67%
Borderline     77%         72%          79%           71%     58%      81%
Paranoid       71%         50%          89%           68%     65%      65%

(a) MCMI-I. (b) MCMI-II. (c) MCMI-III.
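The three diagnostic efficiency statistics discussed above can be sketched in a few lines. The sketch assumes each case is recorded as a pair (clinician's primary diagnosis, scale with the highest score), and it assumes that PPR is PPP divided by the prevalence of the clinician diagnosis; the chapter describes PPR only as a rough measure of accuracy over chance, so that formula is an assumption here, as are the function name and the sample data.

```python
from typing import List, Optional, Tuple

# Each case: (clinician's primary diagnosis, scale with the highest inventory score).
Case = Tuple[str, str]
Stats = Tuple[Optional[float], Optional[float], Optional[float]]


def efficiency(cases: List[Case], disorder: str) -> Stats:
    """Return (SENS, PPP, PPR) for one disorder; None where a denominator is zero."""
    diagnosed = sum(1 for dx, _ in cases if dx == disorder)   # clinician says yes
    flagged = sum(1 for _, top in cases if top == disorder)   # test says yes
    hits = sum(1 for dx, top in cases if dx == top == disorder)
    sens = hits / diagnosed if diagnosed else None
    ppp = hits / flagged if flagged else None
    prevalence = diagnosed / len(cases) if cases else 0.0
    # Assumption: PPR compares PPP with the hit rate expected from prevalence alone.
    ppr = ppp / prevalence if ppp is not None and prevalence else None
    return sens, ppp, ppr


# Hypothetical sample of 20 patients (illustrative only, not data from Table 34.3).
cases = (
    [("avoidant", "avoidant")] * 3      # diagnosed and flagged
    + [("avoidant", "dependent")]       # diagnosed but missed by the test
    + [("dependent", "avoidant")]       # flagged but not diagnosed
    + [("dependent", "dependent")] * 15
)
sens, ppp, ppr = efficiency(cases, "avoidant")
```

With these made-up cases, three of four diagnosed avoidants are flagged (SENS = 0.75), three of four flagged patients are diagnosed avoidant (PPP = 0.75), and, under the stated assumption, the PPR is 0.75 / 0.20 = 3.75. The same arithmetic shows why low-prevalence disorders tend to have higher PPRs.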

TABLE 34.5
Frequency of Personality Scale SENS and PPP Levels for MCMI-II and MCMI-III by Primary Diagnosis

Size of Statistic   SENS II   SENS III   PPP II   PPP III
30%-39%   3 1 2
40%-49%   1 1 4
50%-59%   3 4 3 1
60%-69%   5 3 3 3
70%-79%   1 3 2 6
80%-89%   1 1
>= 90%    1

Interpretive Strategy

The MCMI-III is a multiaxial instrument derived from an integrated model of psychopathology and personality. The interpretive logic of the Millon clinical inventories follows largely from these two basic facts. Accordingly, although the inventory can be used for diagnostic purposes, clinicians should do so with the goal of achieving an understanding of the person as an integrated entity, not as an aggregation of disorders.

Philosophy of the Multiaxial Model

The MCMI-III is based on an integrative conception of personality and psychopathology. The movement toward integrationism in the conception of psychiatric illness is not just an ideal; it is also an empirical, historical fact, illustrated by the evolution of the health sciences through two paradigm shifts, neither of which has yet been completed in psychopathology. The series of concentric circles comprising Fig. 34.2 represents changes that have evolved in medicine over the past century. In the center is Axis I, which contains the so-called clinical syndromes (e.g., depression and anxiety). These parallel what characterized the state of medicine 100 or more years ago. In the early and mid-19th century, physicians defined their patients' ailments in terms of their manifest symptomatology (their sneezes and coughs and boils and fevers), labeling these "diseases" with terms such as "consumption" and "smallpox." In contrast, the outer ring of Fig. 34.2 parallels Axis IV of the DSM-IV. The related medical paradigm shift occurred approximately a century ago when illnesses began to be viewed as the result of intrusive microbes that infect and disrupt the body's normal functions.
In time, medicine began to assign diagnostic labels to reflect this new etiology, replacing its old descriptive terms. Dementia paralytica, for example, came to be known as neurosyphilis. Fortunately, medicine has progressed in the past decade or two beyond its turn-of-the-century "intrusion disease" model, an advance made most striking in the last 15 years by immunological diseases such as that caused by HIV. This progression reflects a growing awareness of the key role of the immune system, the body's intrinsic capacity to contend with the omnipresent multitude of potentially destructive infectious and carcinogenic agents that pervade the physical environment. Medicine has learned that it is not the symptoms (i.e., the sneezes and coughs) or the intruding infections (i.e., the viruses and bacteria) that are the key to health or illness. Rather, the ultimate determinant is the competence of the body's own intrinsic defensive capacities. So too, in psychopathology,
it is not anxiety or depression, or the stressors of early childhood or contemporary life, that are the key to psychological well-being. Rather, it is the mind's equivalent of the body's immune system: that structure and style of psychic processes representing people's overall capacity to perceive and to cope with their psychosocial world (i.e., the psychological construct termed personality). The multiaxial model has been specifically composed to encourage integrative conceptions of the individual's manifest symptoms in terms of the interaction between long-standing coping styles and psychosocial stressors. Clinicians must retrace the aforementioned historical progression within the individual person in order to achieve a conception of patients' psychopathology that does not merely diagnose or document their boils and sneezes (i.e., the Axis I disorders), but instead places these manifest disorders in the larger context of the individual's style of perceiving, thinking, feeling, and behaving. The interpretive process may be described in terms of several levels or orders that facilitate such integrated interpretations.

Fig. 34.2. Interactive nature of the multiaxial system.

Diagnostic Decisions

In the DSM-IV, personality disorders are diagnosed when a certain number of diagnostic criteria are fulfilled. For example, meeting five of eight criteria qualifies one for a diagnosis of histrionic personality disorder, whereas meeting five of nine qualifies one for a diagnosis of narcissistic personality disorder. This is the prototypal model of personality, wherein no one criterion is absolutely necessary to a diagnosis, and no one criterion is sufficient to produce a diagnosis. The prototypal model is often conflated with the categorical model, and the categorical
model is typically eschewed by psychologists who prefer to view everything in dimensional terms. Nevertheless, professionals continue to "diagnose" personality "disorders," and these unfortunate terms hail from the medical model with its categorical implications. In turn, the assumptions of the medical model pollute personality assessment practices with paradigmatic misconceptions, making assessment a diagnostic affair in which the goal is to determine whether the subject meets criteria for a personality disorder, all-or-nothing. The diagnostic paradigm is inconsistent with the personality construct on three counts. First, normality and pathology exist on a continuum. Thus, the line between normality and pathology, which might in fact exist discretely if the patient were diseased or infected, simply does not exist. Second, with the advent of the multiaxial model in DSM-III, personality was given a contextual role with respect to the classical and diseaselike psychopathologies of Axis I. Personality, then, is an immunological construct whose deficiencies and strengths must be understood as disposing toward, or immunizing against, the development of classical psychopathological symptoms. Yet, personality cannot simultaneously be the disease and an immunological protection against disease. Thus, the misconstruction of personality in the medical model is inconsistent with the multiaxial system. If the term diagnosis is to be preserved at all, it can only become a shorthand means of noting that the patient "requires intervention," or that the individual is functioning "in the clinical range," without referring to any particular content entity. Diagnosis, then, is not, as in the medical model, a determination of the presence or absence of a disease process. Instead, it is only concerned with whether the individual represents a "case," and how the individual's personality is tied up in the meaning of past and current problems.
In other words, for Axis II, diagnosis should be regarded as a pragmatic, not an ontological, issue. A systems usage simultaneously reports the existence of substantial limitations on personality functioning, and makes salient the idea of new possibilities for the person should these constraints be relaxed. Third, the all-or-nothing nature of diagnosis obscures the focus of the systems model on the internal differentiation of personality. The systems model maintains that pathology can exist to varying degrees in various domains of the system. Unlike the binary idea of a disorder, which must be either present or absent, on or off, constraints are explicitly stronger or weaker. Thus, the idea of a constraint pulls for a continuum. Finding and characterizing these constraints is the proper mission of assessment. Diagnosis is only an intermediate and often distracting goal. Nevertheless, the MCMI-III includes cutting scores that suggest diagnoses for both Axis II and Axis I. In MCMI-II, a BR of 75 or higher suggested the presence of a personality disorder, and a BR of 85 suggested the prominence of that disorder. In MCMI-III, a BR of 75 suggests problematic trait features, and a BR of 85 suggests personality disorder. For Axis I, a BR of 75 suggests the presence of disorder and BR of 85 or higher suggests the prominence of that disorder, for both MCMI-II and MCMI-III.

Configural Interpretation

The BR boundaries previously suggested are fuzzy and artificial. They are presented for practical purposes, in situations where labels must be assigned to persons, and do not exist in reality. The interpretation of a personality inventory should be congruent
with the nature of personality as a construct. Historically, the word "personality" derives from the Greek term persona, originally representing the theatrical mask used by dramatic players. Through history, the meaning of the term has shifted from external illusion to surface reality, and finally to opaque or veiled inner characteristics. Presumably, the dimensions of personality assessed by any instrument are intended to capture these veiled inner characteristics. Many clinicians complain that their patients receive three, four, or more personality disorder diagnoses. This has led many to express dissatisfaction with the DSM-IV schema. It has already been noted that if the term diagnosis is to make sense at all, it must be embedded in the systems model, not in the medical model; and it can only refer to a clinical range of functioning, and not to the quantity of a psychological construct or trait, but only to its functional and contextual consequences. At a deeper level, however, the complaint that patients receive too many personality disorder diagnoses can obscure a fundamental misconception concerning the purpose of a classification system and its relation to assessment, one that is just as valid for normal as for pathological personality. Just as nature was not meant to suit people's need for a tidy and well-ordered universe, patients are not intended to fit snugly into categories and dimensions. Often this reflects some shortcoming in the classification system itself, as with the DSM-IV. However, where the goal of an assessment is the understanding of the total person, the constructs of a classification system serve as reference points against which the individual should be compared. In the medical model, the question is which diagnoses the patient will receive. 
In the systems model, however, the questions are why the person receives these particular diagnoses or profile elevations rather than others, which is a developmental issue; how the individual's characteristics interact with family, job, and school contexts to produce symptom formation; and which domains of personality contain strengths and constraints on functioning. Answering the last question explicitly requires that individuals be compared against the prototypes they most resemble in order to discover exactly how there are similarities to the prototype and how there are differences. If an individual is characterized as narcissistic, then this is important information. However, if features of the depressive personality are also present, as with the Voguish Narcissist, then therapy must be modified away from that which would ordinarily be prescribed for the prototypal narcissist. This should be reflected in the therapeutic recommendations section of the clinical report. The constructs of a classification system are simply a point of departure for comparison and contrast in achieving a total understanding of the complexity of the total person. When, in the course of an assessment practitioners begin to feel that their understanding of the subject has reached the point that ordinary nosologic labels no longer adequately apply, but instead must be qualified and nuanced, then they have falsified the classification system relative to the subject and likely reached a truly idiographic understanding of the person. At this level, it is inventory scales that stand in place of the person as a hypothetical construct. 
To be truly idiographic, the interpretation must reach successively more integrated and particularized forms, progressing first from scale scores, then to profile patterns, then to the integration of historical and developmental data, and finally to the development of a coherent and total picture of the person that makes sense of symptoms and isolates directions of remediation. The final product, the clinical report, thus stands in place of the person, much like test items stand in place of a hypothetical construct. Only psychobiography affords a deeper level of idiographic explanation. In contrast, treatment plans based on diagnostic dispositions alone look distressingly superficial.

Domain Synthesis and Treatment Planning

Perhaps the most annoying misconception about abnormalities of personality, perpetuated in part by the medical disease model, and in part by a habit of language, is that personality is a substance that fills the vessel of the person. In the parlance of philosophy this is called reification, or the transformation of a thought into a thing. This misconception has broad implications for clinical assessment and therapy. If personality is a substance, then the purpose of the assessing clinician is to determine whether this substance is good or bad, normal or disordered, and the purpose of the therapist is to somehow achieve the wholesale transmutation of a bad personality into a normal one. In contrast, if personality is viewed as a structural-functional system, the purpose of assessment is to identify constraints on functioning operating within the system. The purpose of the therapist is then to address these constraints in order to make system functioning more flexible. In the substance view, it is necessary to empty and refill, or transmute, the entire individual. In the system view, it is necessary only to identify the most compelling constraints on functioning. These can then be prioritized and addressed in terms of their relative severity. For example, within a few sessions, a therapist determined that a client seemed disposed to the use of a small number of immature defense mechanisms. By modeling for the subject alternative interpretations of the interpersonal situations in which these defense mechanisms were frequently used, the client was not only able to deal with anxiety in ways less threatening to the self and less aversive to others, but was also able to see relationships more realistically. By dealing with the maladaptive use of defense mechanisms, then, the subject's interpersonal and cognitive functioning improved as well.
The domains of personality can be systematically organized in a manner similar to distinctions drawn in the biological realm, that is, by dividing them into structural and functional domains in accord with the four historic approaches that characterize the study of psychopathology, the biophysical, intrapsychic, phenomenological, and behavioral perspectives. Domain descriptors for each of the 14 personality disorders have been developed and are presented in the MCMI-III manual and in Millon and Davis (1996). The narcissistic personality domains are presented in Table 34.6. The proper interpretive use of these domains in achieving a more idiographic description of personality is detailed in the chapter on the Millon Adolescent Clinical Inventory (chap. 13). The functional and structural descriptors are intended to operationalize the entire matrix of the person who is the subject of assessment. Because the MCMI-III is intended to be a brief and practical instrument, and because it is coordinated with the DSM-IV personality disorder criteria that are often weighted toward some domains rather than others, this goal is only partially achieved. Accordingly, the descriptive paragraphs offered for the clinical domains for each of the personality disorders should be viewed as clinical hypotheses to be sustained on the basis of auxiliary evidence outside the MCMI-III, including the clinical interview, the reports of informants, other instruments, the therapist's own experience with the subject in session, and so on. Nevertheless, the essential principle is that every individual personality has structural and functional referents that are interpersonal, cognitive, psychodynamic, biophysical, and so on. Were personality disorders simply linear pathologies that emanated from a single domain, treatment could proceed on the basis of the medical disease model. 
However, it is precisely the interactive and reciprocally causal nature of the personality system that lends personality pathology its tenacious and self-perpetuating character, and makes it notoriously difficult to treat.

TABLE 34.6
Clinical Domains of the Narcissistic Prototype

Behavioral Level:
(F) Expressively Haughty (e.g., acts in an arrogant, supercilious, pompous, and disdainful manner, flouting conventional rules of shared social living and viewing them as naive or inapplicable to self; reveals a careless disregard for personal integrity and a self-important indifference to the rights of others).
(F) Interpersonally Exploitive (e.g., feels entitled, is unempathic, and expects special favors without assuming reciprocal responsibilities; shamelessly takes others for granted and uses them to enhance self and indulge desires).

Phenomenological Level:
(F) Cognitively Expansive (e.g., has an undisciplined imagination and exhibits a preoccupation with immature and self-glorifying fantasies of success, beauty, or love; is minimally constrained by objective reality, takes liberties with facts and often lies to redeem self-illusions).
(S) Admirable Self-image (e.g., believes self to be meritorious, special, if not unique, deserving of great admiration, and acting in a grandiose or self-assured manner, often without commensurate achievements; has a sense of high self-worth, despite being seen by others as egotistic, inconsiderate, and arrogant).
(S) Contrived Objects (e.g., internalized representations are composed far more than usual of illusory and changing memories of past relationships; unacceptable drives and conflicts are readily refashioned as the need arises, as are others often simulated and pretentious).

Intrapsychic Level:
(F) Rationalization Mechanism (e.g., is self-deceptive and facile in devising plausible reasons to justify self-centered and socially inconsiderate behaviors; offers alibis to place oneself in the best possible light, despite evident shortcomings or failures).
(S) Spurious Organization (e.g., morphologic structures underlying coping and defensive strategies tend to be flimsy and transparent, appear more substantial and dynamically orchestrated than they are in fact, regulating impulses only marginally, channeling needs with minimal restraint, and creating an inner world in which conflicts are dismissed, failures are quickly redeemed, and self-pride is effortlessly reasserted).

Biophysical Level:
(S) Insouciant Mood (e.g., manifests a general air of nonchalance, imperturbability, and feigned tranquility; appears coolly unimpressionable or buoyantly optimistic, except when narcissistic confidence is shaken, at which time either rage, shame, or emptiness is briefly displayed).

Note. (F) = Functional domain. (S) = Structural domain.

Accordingly, therapies that conceptualize and treat personality pathology from a single perspective, be it psychodynamic, cognitive, behavioral, or physiological, may be viewed as necessary but not sufficient for a therapy of the person. Unfortunately, the practice of applying a single modality (exclusively cognitive therapy, exclusively behavioral therapy, exclusively pharmacological therapy, and so on) to every patient encountered is not yet extinct. Even therapists identifying themselves as eclectic typically lean toward only a few perspectives, often to the exclusion of others. If personality disorders were anchored exclusively to one particular structural or functional domain (as phobias are thought of as being primarily behavioral), domain-bound psychotherapy would be appropriate and desirable. The etiology of the personality disorders would be monocausal, and the assumptions of the medical disease model would be valid for Axis II as well as Axis I. In that case, of course, the personality disorders would not be disorders of personality at all, but would instead be better thought of as cognitive disorders, or psychodynamic disorders, or behavioral disorders.
Rather than apply behavioral or cognitive or psychodynamic therapy to every subject met in clinical practice, insight into the essential difference between Axis I and Axis II lets clinicians specify two prototypal or ideal forms of therapy for the personality disorders. Just as a configural interpretation of the Axis II scales is not a convenient clinical practice, but is instead explicitly required by the nature of personality itself, so

too does the nature of personality pathology explicitly require forms of therapy derived from personality as a construct. These should parallel the two essential features of personality pathology already presented, adaptive inflexibility and vicious circles. Because personality pathologies are not medical diseases, how could they possibly be diagnosed and treated effectively as such? Because personality regards the entire matrix of the person, how could pathologies of personality possibly be treated effectively through an exclusively behavioral or cognitive or interpersonal approach? Instead, the key to treating personality lies in constructing, for each individual subject, therapies that not merely combine, but synergize, various interventions that then become more than the sum of their parts. Such synergistic forms of therapy achieve an efficacy beyond what would have been possible were each applied separately, and may be thought of as idiographic therapies, based on the logic of the individual case as derived from the assessment. That is why the MCMI-III is a theoretical and multiaxial instrument. The first cardinal characteristic of personality pathology, adaptive inflexibility, should be countered through what are termed potentiated pairings. Treatment methods are simultaneously combined to overcome problematic characteristics that might be refractory to each technique if administered separately. These composites pull and push for change on many different fronts, so that the therapy becomes as multioperational and as tenacious as personality pathology itself. A currently popular illustration of these treatment pairings is found in what has been referred to as cognitive-behavioral therapy. Adaptive inflexibility is manifested in its more active form in problems the personality disordered create for themselves through their inability to constructively engage a diverse range of psychosocial circumstances. 
In its more passive expression, this characteristic is seen in the attempts of the personality disordered to narrow the range of psychosocial environments to which they must adapt. Thus, the antisocial personality encounters difficulties because of what is done, an inappropriate use of instrumental behavior, whereas the dependent encounters difficulties because of what cannot be done, a failure to engage in instrumental behavior. Where logically applied then, these therapeutic composites draw from the nature of the constructs through which an idiographic understanding of the individual has been derived. The second cardinal characteristic of personality pathology is a consequence of the first, a tendency to foster vicious circles. Its therapeutic counterpart is termed a catalytic sequence. Here, the order in which treatments are executed is planned to optimize the impact of changes that would be less effective if the sequential combination were otherwise arranged or not previously thought out. In a catalytic sequence, for example, clinicians might seek first to alter a patient's stuttering by direct behavioral modification procedures that, if achieved, would facilitate the use of cognitive methods in producing self-image changes in confidence. This, in turn, would foster the utility of interpersonal techniques in effecting improvements in social relationships. There are, of course, no discrete boundaries between potentiating pairings and catalytic sequences. Instead, they are intrinsically interdependent. Their application is intended to foster increased flexibility and, hopefully, beneficent rather than vicious circles. Potentiated pairings and catalytic sequences represent only a first order of therapeutic synergism. A therapist might, for example, decide that a "potentiated pair" of cognitive-behavioral techniques works well together, to be followed by another pair of techniques combining elements of the interpersonal and self-image domains. 
This "potentiated sequence," or "catalytic pairing," recognizes that the two fundamental synergistic procedures may be built on each other, depending on the ingenuity of the therapist and the tenacity of the disorder.

Perhaps it is easiest to grasp the integrative process of synergistic therapy when thinking of personality domains as analogous to the sections of an orchestra, and the pathological characteristics of the subject as a clustering of discordant instruments. Therapists, then, may be seen as conductors whose task is to bring forth a harmonious balance among the players, muting some here, accentuating others there, all to the end of fulfilling their knowledge of how "the composition" can best be made consonant. The task is not that of altering just one instrument, but of altering all, in concert. Just as music requires a balanced score, one composed of harmonic counterpoints, rhythmic patterns, and melodic combinations, what is needed in personologic therapy is a likewise balanced and synergistic program, a coordinated strategy of counterpoised techniques designed to optimize treatment effects in an idiographically combinatorial and sequential manner.

Treatment Outcome Assessment

Evaluation of the MCMI-III as an Outcome Measure

In the first edition of this text, Newman and Ciarlo (1994) specified criteria for evaluating psychological instruments as outcome measures. The MCMI-III fares well when evaluated against these criteria. The MCMI-III is explicitly intended for use with clinical populations (normal subjects are inappropriate). The MCMI-III was constructed as a multiaxial instrument coordinated with both a coherent clinical theory and with the DSM-IV nosology. In addition, its 175-item length and eighth-grade reading level make it basically self-administering. The inventory requires less than a half hour to complete. Scale scores are based on national samples, and prevalence rates are informed by clinical ratings on the normative population, external validity studies, and clinical wisdom. Correction factors are available to mitigate the influence of response biases.
Assessments of the reliability and validity of the instrument were an integral part of the test construction process. Postconstruction studies and studies conducted by independent researchers have generally found previous generations of the MCMI to have good reliability and validity. Computer scoring is available and provides either a profile report, or the more comprehensive interpretive report written in easy-to-understand language. The scale names are descriptive, and scale elevations beyond the BR cutoff scores indicate the relative prominence of the personality features or the relative severity of clinical syndrome scores.

General Considerations

A discussion of therapeutic outcome, however, should go beyond matters of psychometrics and convenience, and consider the substantive nature of personality and the relation between construct systems and the individuals assessed within them. Personality theorists distinguish between two levels of description, a nomothetic or construct-centered level, and an idiographic or person-centered level. Ideally, these two levels should converge, in that the foundation offered by clinical theory provides a point of departure for understanding the total person and their symptoms and broad pathologies. An analogy may be drawn between the items of a scale and the profile of a multiaxial instrument; just as a set of items stands in place of, and operationalizes, the construct it measures, the personality and symptom profile stands in place of, and operationalizes, the total

person. In an ideal world, the profile would literally be a complete schematic of the subject. Much like a map or diagram, it would be a simplification, but nevertheless leave no necessary aspect omitted. Personality is more than the sum of its parts, and likewise, if personality profiles are to have any meaning beyond that conveyed by the scale scores alone, some information must be derived from the profile that the scale scores cannot singly contain. Although this point is generic to any personality instrument, it complicates any straightforward discussion of outcome assessment; if a personality profile is to have genuine ontological teeth, then it should contain emergent information not predicable to the scales themselves. Combining the focus of outcome assessment, idiographic or nomothetic, by the unit of analysis, group or individual, yields four combinations. Issues related to outcomes assessment are discussed for each combination later. First, a practitioner may take a construct-centered attitude toward a particular group. Classic psychotherapy outcome studies and dose-response studies would fit here. For example, a group of personality disorder patients receiving psychodynamic therapy might be contrasted with a group receiving cognitive therapy and a waiting list control. Outcome questions might include: To what extent does the overall profile elevation contribute to psychotherapy outcome? Does the number of personality disorders suggested by the test interact with the kind of psychotherapy administered? Might there be an interaction between the kind of therapy and its utility for Axis I and Axis II disorders? Perhaps cognitive therapy is more effective for treating symptoms, and short-term psychodynamic therapy is more effective for personality problems, for example. Here, the focus is on change on the MCMI-III scales at multiple points in time. 
Outcome for most psychological inventories may be quantified directly by comparing scores at Timepoint A to scores at Timepoint B. If therapy is successful, scores should become lower across time, indicating less depression, less thought disorder, and so on. In the era of managed care, Axis I problems are more likely to be the focus of therapy, because they are almost certain to be dramatic and ego-dystonic, and readily lend themselves to the formulation of treatment plans, whereas the more subtle personality features of Axis II do not. In contrast, the nature of certain personality disorders (notably, the Narcissistic, Histrionic, and Compulsive) often makes straightforward change scores difficult to interpret. For other personality disorders, more is worse: Being less Schizoid is still somewhat Schizoid, for example, which is still undesirable. However, less narcissism may equate with a positive and healthy level of self-confidence. Likewise, in its less severe form, the Histrionic is sociable, and the Compulsive is simply respectful. For individuals who obtain low scores on one of these scales, successful treatment may actually result in increasing BR scores on that scale. Thus, an individual with low self-worth may actually obtain a somewhat low score on the Narcissistic personality scale, which may increase as self-confidence increases. In a mixed sample of subjects with both high and low pretreatment scores on one of these scales, mean difference scores may wash out when averaged as group effects. Second, one may take a construct-centered attitude toward a particular person. This approach is simplistic and actuarial in that only a few variables are examined, and it is not believed that the score of any one variable necessarily changes the interpretive significance of another in a way that forces the entire profile to be considered.
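The wash-out of mean difference scores in a mixed sample, noted above for scales such as the Narcissistic, can be made concrete with a small arithmetic sketch. The pre/post BR values below are invented purely for illustration and are not drawn from any MCMI-III data; the variable and function names are likewise hypothetical.

```python
from statistics import mean

# Hypothetical (pre, post) BR scores on a single scale; values are invented.
# Subgroup 1: elevated at intake, scores fall with successful treatment.
high_pre = [(88, 72), (92, 75), (85, 70)]
# Subgroup 2: low at intake (e.g., low self-worth), scores rise with treatment.
low_pre = [(35, 52), (30, 46), (40, 55)]


def change_scores(pairs):
    """Return post-minus-pre difference scores for each subject."""
    return [post - pre for pre, post in pairs]


high_change = change_scores(high_pre)
low_change = change_scores(low_pre)
pooled = high_change + low_change

mean_high = mean(high_change)             # average change, elevated subgroup
mean_low = mean(low_change)               # average change, low subgroup
mean_pooled = mean(pooled)                # average change, whole sample
mean_abs = mean(abs(c) for c in pooled)   # average size of change, ignoring sign

print(mean_high, mean_low, mean_pooled, mean_abs)
```

In this contrived sample every subject changes by roughly 16 BR points in the clinically desirable direction, yet the pooled mean change is zero. Examining subgroups separately, or the magnitude of change, is one way such an effect might be detected.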
Once again, in the era of managed care, where short-term goals are highly focal and operationalized, the Axis I variables are likely to be examined against previous scores. Third, one may take an idiographic approach toward a particular group. The purpose here is to understand a particular group of subjects as unified by common underlying

themes. Just as understanding individual patients involves comparison and contrast with the diagnostic prototypes of Axis I and Axis II, psychological tests may be administered to specific subgroups in order to develop some understanding about their unifying characteristics and dynamics. A psychologist with an interest in cross-cultural issues, for example, might wonder how personality pathology is manifested among individuals assimilating into the larger culture, and how this affects therapeutic outcome. Such subjects have a foot in each culture, but nevertheless form a cohesive subgroup that can only be understood, from the outsider perspective of the psychologist, through established diagnostic reference points. Obviously, such groups are most often samples of convenience seen in a specific practice or research setting. The question is: How does the current subgroup differ from representative groups, and how does this affect treatment and outcome assessment options? Answering this question rigorously is difficult, because the standard of reference against which the clinician assesses outcome is no longer all depressives or all narcissists, but instead becomes the local norms of the particular practice setting or the characteristics of a particular patient group. Following the example, the outcome of assimilating subjects might be poor when compared against cultural natives, but nevertheless be remarkable when compared to their own subgroup. Because the examiner's initial goal is to understand the group, psychological inventories are likely to be used in the context of hypothesis generation and testing with all available data. As a qualitatively sophisticated understanding is achieved, outcome becomes difficult to quantify in any straightforward manner, because scale scores and profile characteristics take on a different meaning for the subgroup than for the sample on which the test was normed. This issue is generic to all psychological tests.
Fourth, an idiographic approach may be taken to a particular person. This is the ordinary, everyday clinical situation, so it requires extended comment. In essence, the issue concerns what should constitute the baseline against which outcome is judged, and goes directly to the boundary between clinical psychology as an art and a science, and the ontological versus epistemological nature of so-called error variance. As noted, the ideal taxonomy is one that "carves nature at its joints." Essentially, all taxonomies seek to account for the particular characteristics of items to be classified in terms of more general laws and principles. To the extent that it does so, a taxonomy is successful. In an ideal world, all of the particular characteristics of every subject would be understood through a single set of principles, a unified science. In clinical psychology, it might be said that the characteristics of each person would fit perfectly into the classification. Idiographic knowledge would not exist in such a world, because once classified, the sum total of an individual's phenomenology, development, and future would be knowledge forever determinable on the basis of psychological laws alone. No contingent historical facts (e.g., the fact that one was mugged in Sandusky and lost hope in humanity) would exist outside the scope of the taxonomy. Although many problematic philosophical issues beset factor analysis, this is definitely one of its attractions. Factor analysis extracts latent dimensions from correlational data until nothing of any consequence remains. Loosely speaking, by comparing the amount of variance accounted for by the factor model to the amount of residual variance, the extent to which a particular domain is comprehensible in terms of more general constructs can be assessed. 
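The comparison of accounted-for and residual variance mentioned above can be sketched numerically. The example assumes the simplest possible case, a single common factor and standardized variables, so that each variable contributes one unit of total variance and the factor accounts for the sum of squared loadings. The loadings are hypothetical and the function name is an illustrative choice, not part of any factor-analysis package.

```python
# Hypothetical standardized loadings of six scales on one common factor.
loadings = [0.8, 0.7, 0.6, 0.5, 0.4, 0.3]


def variance_split(loadings):
    """Return (explained, residual) proportions of total variance.

    Assumes standardized variables and a single common factor, so total
    variance equals the number of variables and the variance accounted
    for by the factor equals the sum of squared loadings.
    """
    total = len(loadings)
    common = sum(l * l for l in loadings)
    return common / total, (total - common) / total


explained, residual = variance_split(loadings)
print(f"explained: {explained:.3f}, residual: {residual:.3f}")
```

With these invented loadings the residual proportion is roughly twice the explained proportion, which is the situation in which the interpretive question about the meaning of residual variance arises.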
If residual variance dominates, a solid researcher is likely to argue that the original correlations were highly unreliable, and that this can be taken care of by obtaining a more reliable criterion or by larger samples. However, there is an equally plausible rival hypothesis. The development of the particular entities factored within the subject domain may be multiply informed by so

many interacting contingent biographical facts that, after a time, emergent characteristics dominate. As a result, the science becomes more interpretive, more "hermeneutic." Thus, a subject with a biologically irritable temperament is adopted by considerate parents who instill within the child a degree of conscientiousness, which inhibits and controls otherwise socially troublesome expressions of anger through the preschool years. However, the breadwinner of the family becomes the victim of political machinations at work, and loses her job. The family becomes impoverished and struggles to make ends meet. Complicating matters further, the national economy falls into recession, and the family's financial outlook becomes more bleak. The father turns to drinking, and the child enters adolescence and watches other students grow up with more opportunities, and more intact and happier family systems. Eventually resentment builds, and the child begins to run into social problems within the peer group. A vicious circle begins in which she is teased and ridiculed by peers, and she begins to shrink from social contact. Simultaneously, she begins to project her own anger onto others, and they become her persecutors. The cycle of alienation intensifies, and years later, psychological testing reveals an Avoidant-Paranoid pattern. The facts of the case hang together, but they are not completely accounted for by the diagnosis. Innumerable different developmental courses to the same diagnosis could be created, for which therapy might begin in a different way, have a different course, and a different outcome. A good clinician, beginning therapy with a psychological assessment that integrates the test findings with all available data, from interviews with the client and with significant others, and even from other tests, begins therapy with a different psychological baseline than does the researcher.
The researcher begins with a set of scale scores alone, because these are concrete enough to fit into the regression methodology that will be required to publish after all the data are collected. Moreover, the researcher may be interested only in change scores derived from a few scales. In contrast, the clinician views the obtained profile pattern as a substantive substitute for the whole person. If the instrument's scales are linked to a generative theory, these suggest clinical hypotheses that can be fruitfully explored in connection with data outside the test itself. Eventually, the clinician constructs a complete clinical report that ideally synthesizes the world of principled science and contingent historical fact in order to make specific therapeutic recommendations for the current case. This report becomes a qualitatively sophisticated baseline that is inscrutable to almost any multivariate methodology. Whereas researchers assume that the constructs their instruments measure are real and quantifiable, and that everything else is error variance, clinicians view the construct system as a network of reference points against which the person should be compared and contrasted in order to develop a more sophisticated understanding of the total person. The purpose of an assessment is to understand the person as an individual, and the clinician's goal is to reach a point in understanding the person that is so sophisticated that it seems to falsify the system of generalities on which the taxonomy is based (good clinicians are thus likely to be chronically dissatisfied with official taxonomies). For the clinician, material not amenable to the construct system does not reflect residual error, but instead is the interactive and emergent product of history, and is comprehensible given adequate clinical experience and insight.
Much of what was interpreted to be residual variance by the researcher is thus ontological, or real, information available to clinical work as an art. Obviously, the more qualitatively sophisticated the clinical baseline, the more difficult change scores are to obtain in any straightforward manner. This discussion is relevant to the MCMI-III because of the theory on which the instrument is based, its method of construction, and the form of therapy dictated by the

nature of personality as a construct. As already noted, personality is a substantive system that specifically requires profile interpretation of results; this represents the first step from a simple, straightforward, quantitative assessment to one that is qualitative and idiographic. The section dealing with domain synthesis represents yet another step, one that allows numerous clinical hypotheses to be advanced across diverse domains of personality corresponding to historical approaches to the field. In the context of all available data, including biographical history, information from other tests, and so on, yet another step is taken toward an idiographic understanding of the person as a unique entity.

Case Study

The authors thank James Choca of the Department of Veterans Affairs Medical Center in Chicago, Illinois, for providing the following case study. The material excerpted derives originally from a psychological evaluation in which the MCMI-II was utilized in conjunction with other psychodiagnostic instruments. The study has been modified and updated from MCMI-II to MCMI-III for illustrative purposes.

Presenting Complaints

Sally is a 21-year-old White female with several somatic complaints, including irregularities with her menstrual cycle and gastrointestinal problems. She explained that she has nervous episodes during which she feels hot and sweaty and spontaneously becomes sick to her stomach. These episodes happen quite often, as evidenced by the fact that she vomited every day in the previous week. Although medication is taken only when needed, she reported taking this medication four times during the previous week. In addition, Sally states that she has been feeling sad and depressed, which she deals with by becoming irritable at home and picking fights with her husband. She also notes a loss of energy and a loss of interest in formerly interesting activities, such as exercise, which she has now discontinued. 
Her most recent major stressor was the unexpected death of her sister, about which she reports frequent nightmares. The patient also had an abortion to terminate an unwanted pregnancy. Finally, she is unhappy with her present employment.

Psychiatric History

Sally has been under psychiatric care most of her life. From age 6 to 8 she was repeatedly molested by a family friend. Although the abuse was distasteful and traumatic, she continued to go willingly into the furnace room of the church where it occurred. Though she did not make anything out of these incidents at the time, after she eventually overheard her mother refer to her as having been molested, she became aware of the importance of what had taken place. In any event, the parents felt she needed some counseling after the abuse and took Sally to see a number of professionals. The patient denied experiencing any symptoms until 1985, when she came under the care of Dr. Geosits. Although she minimized the problems she was experiencing, she admitted having been depressed from 1985 until 1989, when she was treated with Desyrel. She also acknowledged that she was drinking excessively and had become promiscuous. Other forms of substance abuse were denied. From 1989 until most recently, the patient did not receive any therapy and felt she

was doing well. For the last few weeks, she has been seen by Dr. Gonleoski as an outpatient.

Medical History

Sally has an involved medical history that includes multiple visits to the emergency room and several hospital admissions. For instance, she explained that she has suffered from stomach ulcers since age 10, and has been in the hospital several times as a result of this problem.

Social History

Sally was born and raised in "Anytown USA." She talked about her parents as being perfect, and explained that they never even argued with one another. In fact, the only issue she had during her childhood was the already mentioned molestation. Now 55 years old, the patient's mother was a psychologist who worked with impaired children. The mother recently left her job to raise the two children left behind by the patient's deceased sister. Sally was resentful that her mother had taken over this task, noting that she had lost her role as "the baby" of the family when this happened. Sally's father is also 55 years old and is now working as a photographer. The patient described her father as "laid back" and "very caring." She claimed to have been his favorite child.

Sally was the youngest of three siblings. Her 28-year-old brother is divorced and has no children. He was said to be addicted to drugs and to be "sort of messed up." The second sibling was the sister who died in 1991 at the age of 22. This sister was married and had two children. The patient explained that the sister had been her best friend and that her death had been a very significant loss.

Sally married her present husband last July. She described him as a wonderful person. He is very supportive and the two of them have a great relationship. She noted that she had been engaged to someone else when the two of them met, but she felt immediately attracted to him. Sally became pregnant in January and had an elective abortion. 
She is still questioning this decision, but noted that neither she nor her husband was ready to assume the responsibility of raising a family.

The patient denied ever having difficulties making friends. However, her social life has been unique. For instance, she noted that she never dated anyone who was not 21 years old or older. She reported being very promiscuous for several years, and acknowledged having destructive relationships with several boyfriends. For instance, one of them often was physically abusive toward her prior to having sex. She also talked about an incident in which two guys locked her in an apartment and proceeded to have sex with her. The patient described these occurrences in a matter-of-fact manner and volunteered that she had been a somewhat willing participant.

Educational History

Sally was apparently an excellent student during high school. She attended a program that only took students in the top 10% of the public school system. She feels that she worked so hard during those years that she burned out, and after that was not ready to be a serious student. The patient did attend Morine Community College for 2 years,

but discontinued her education without obtaining a diploma. She plans to return to school some time in the future.

Occupational History

Part of what distracted Sally from pursuing a college degree was her success as a model. She claims to have made a lot of money with this work before she broke her foot in a basketball accident a few months ago. For the last 3 months, she has worked as a receptionist. Although her present job has been all right, now that her foot is healed she is hoping to return to modeling.

The previous description of Sally's presenting complaints and history provides information that can be used to begin formulating the case. According to one hypothesis, Sally has some difficulty regulating and expressing particular forms of strong affect. By her own admission, Sally feels uncomfortable with feelings of sadness, preferring to cope by getting angry. Sally also characterized her parents as perfect, stating that they never argued with each other. This assertion seems to be rather unrealistic, and may be viewed as an attempt to mask some underlying issues. Similarly, her somatic complaints may be a mechanism through which she releases negative emotion. This is not to discount the real possibility that some of her somatic symptoms may have definite physical underpinnings. Rather, her inability to regulate negative affect may exacerbate the problems.

There are several other aspects of Sally's history and presenting complaints that should serve as red flags for the clinician. Clearly, Sally's feelings about her recent abortion and the death of her sister need to be explored. The sexual molestation and the incident in which two men locked her in an apartment and had sex with her clearly must have had an impact on her emotionally, despite her assertions to the contrary. Although Sally states that her relationship with her husband is a good one, the fact that she met him while engaged to someone else bears investigation. 
It is possible that she has difficulty forming stable attachments and perpetuates a pattern of moving from one relationship to the next.

Mental Status Examination

Sally came to the office impeccably dressed in a business suit and carrying a briefcase; she looked more like a lawyer than a receptionist or a model. At the time of the examination she was alert, oriented, verbal, and coherent. The affective response was generally appropriate to the content of the conversation. However, there was a certain "belle indifference" so that the abuses she reportedly sustained, or the many somatic symptoms she has been experiencing, were talked about in a very matter-of-fact manner, without the kind of feeling they would normally produce. The mood was within normal limits and she demonstrated a good range of emotions. No suicidal or homicidal ideation was verbalized.

As noted when discussing her social history, it is possible that Sally has some difficulty expressing and regulating strong negative affect. This hypothesis is buttressed by the clinician's observations during the mental status exam. Specifically, she exhibited a "belle indifference" when discussing events that would ordinarily produce some display of emotion.

Tests Administered

Shipley Institute of Living Scale (SILS)
Adaptive Category Test (ACat)
Millon Clinical Multiaxial Inventory-III (MCMI-III)
Rorschach Inkblot Test
Thematic Apperception Test (TAT)

Intellectual Assessment

The mental status examination gave no indications of cognitive or memory deficits. The scores from the SILS suggested that Sally has average intellectual abilities. The score on the ACat was excellent. This test is a demanding problem-solving task that often is performed poorly by persons suffering from either cognitive deficits or emotional impairment. The scores on the SILS and on the ACat, in conjunction with her academic performance in high school, indicate that Sally's discontinuance of her college education most likely was not a result of academic difficulty. Sally's excellent performance on the ACat indicates that she may not be experiencing a high level of emotional turmoil, despite the traumatic events in her life. This lends support to the notion that her emotional energy is being redirected in such a way that it manifests in the form of irritability and somatic complaints. The following section discusses the personality assessment. Sally's MCMI-III scores are presented in Table 34.6.

Personality Assessment

The scores that Sally obtained on the MCMI-III suggested a histrionic personality style with competitive and narcissistic traits. The data suggested that these traits may be organized at a borderline level. For instance, the Rorschach showed low developmental quality and her associations were full of anatomic responses and carcasses, seemingly suggesting anger and destructive inclinations or fears. However, given the patient's age, some caution has to be exercised. If she were to continue demonstrating the kind of destructive acting out that reportedly has characterized her past life, she undoubtedly would meet criteria for a borderline personality disorder. 
The remainder of this section offers a more detailed description of the personality makeup. Sally's MCMI-III scores showed a predominance of histrionic traits in her basic personality structure. Histrionics are colorful and emotional individuals. They are people who seek stimulation, excitement, and attention. They react very readily to situations around them, often becoming involved in them, but typically the involvement does not last. This pattern of getting involved and ending up bored is repeated one time after another. The histrionic person is good at making positive first impressions. Their ability to react to unexpected situations, their alertness, and their search for attention make them colorful and charming socialites in parties or other social gatherings. However, they often can be too loud, exhibitionistic, and overly dramatic. They can be demanding and uncontrollable, especially on occasions when they are highly involved. They may have intense emotional moments in friendships, but these friendships may be short lived and replaced when boredom sets in.

TABLE 34.7
MCMI-III Scores for Sally

Variable                            Base Rate Score

Modifier Indices
  V   Validity                        0
  X   Disclosure                     75
  Y   Desirability                   57
  Z   Debasement                     69

Basic Personality
  1   Schizoid                       30
  2A  Avoidant                       61
  2B  Depressive                     60
  3   Dependent                      30
  4   Histrionic                    115
  5   Narcissistic                  115
  6A  Antisocial                    115
  6B  Aggressive & Sadistic         115
  7   Compulsive                     72
  8A  Passive-Aggressive            113
  8B  Self-defeating                 64

Severe Personality Pathology
  S   Schizotypal                    48
  C   Borderline                     82
  P   Paranoid                       62

Clinical Syndromes
  A   Anxiety Disorder               93
  H   Somatoform Disorder            65
  N   Bipolar: Manic Disorder        80
  D   Dysthymic Disorder             59
  B   Alcohol Dependence             55
  T   Drug Dependence                64
  R   PTSD                           70

Severe Clinical Syndromes
  SS  Thought Disorder               58
  CC  Major Depression               62
  PP  Delusional Disorder            69
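
The case discussion reads the modifier indices in two ways: a Scale V (Validity) score of zero is taken to indicate a valid profile, and Scale X (Disclosure) and Scale Z (Debasement) both exceeding Scale Y (Desirability) is taken as part of a self-disclosing pattern. The short Python sketch below only restates those two comparisons using the scores listed for Sally. The function name and the dictionary layout are hypothetical illustrations, not part of any MCMI-III scoring software, and no cutoff is applied to the magnitude of Scale X because the text does not state one.

```python
# Base rate scores for the four modifier indices, as listed for Sally.
MODIFIER_INDICES = {"V": 0, "X": 75, "Y": 57, "Z": 69}


def read_modifier_indices(scores):
    """Restate the two comparisons described in the case discussion.

    Hypothetical helper for illustration only; it is not an official
    MCMI-III validity algorithm.
    """
    return {
        # The text treats a Validity score of zero as a valid profile.
        "profile_valid": scores["V"] == 0,
        # The text notes that Disclosure and Debasement both exceed Desirability.
        "x_and_z_exceed_y": scores["X"] > scores["Y"] and scores["Z"] > scores["Y"],
    }


reading = read_modifier_indices(MODIFIER_INDICES)
print(reading)
```

In practice these comparisons would be only one input among many; the clinical interpretation still depends on the history and the other test data.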

Individuals with similar scores are inclined to see their environment as primarily competitive. To function in it, they feel that they have to fend for themselves. Most individuals with this view are, as a result, somewhat distant, distrusting, or suspicious of others. They see themselves as assertive, energetic, self-reliant, strong, and realistic. They feel they have to be tough to make it in this dog-eat-dog world. Sally also exhibits a tendency toward an inflated self-image. Sally probably sees herself as more capable, interesting, and worthwhile than the people around her. This tendency often is externalized through an air of conviction, independent security, and self-assurance. These individuals tend to be argumentative and contentious, and may even be abusive, cruel, or malicious at times. When matters go their way, they may act in a gracious, cheerful, and friendly manner. More characteristically, however, their behavior is guarded, reserved, and resentful. When crossed, pushed on personal matters, or faced with embarrassment, they may respond quickly and become angry, revengeful, and vindictive. In fact, the testing gave indications of a tendency to be at least aggressive, if not hostile, in her interactions with others. Similar individuals emphasize the ability to remain

independent and are not inclined to do what others tell them to do. They are competitive by nature, and may be seen as behaving in a callous manner in the struggle to be ahead of everyone else. They are likely to be distrusting, to question the motives that others may have for their actions, and to assume that they have to be vigilant and on guard if they are to protect themselves. Projection typically is used as a defense, so that Sally would be inclined to blame others for anything that goes wrong. People obtaining similar scores on the aggressive scale of the MCMI-III are likely to be touchy people: Excitable and irritable, they often have a history of treating others in a rough or mean manner, and of angrily flying off the handle whenever they are confronted or opposed.

As can be seen from this discussion, the clinician generates a description of the individual's personality based on the MCMI-III scales. Several aspects of the description of Sally's personality are supported by the social history, clinical observations, and other test data. For example, evidence for Sally's anger can be found not only in her MCMI-III Scale 6B (Aggressive) elevation, but also in the data from the Rorschach Inkblot Test. Sally's MCMI-III profile configuration also suggests a pattern of unstable relationships. This is supported by her history of promiscuity and the fact that she met her current husband while engaged to another man. However, the purpose of objective personality testing is not just to support or confirm clinical impressions. Rather, personality tests attempt to go beyond the data that can be gathered from just a clinical interview. The MCMI-III seeks to provide clinicians with a sense of what they can expect from the client. Furthermore, the MCMI-III profile can be used to generate a set of working hypotheses, which can be investigated in future sessions. 
Two aspects of Sally's MCMI-III profile that were not discussed earlier are related to the pattern of scores she obtained on the modifier indices. First, Scale V (Validity) received a score of zero, indicating that the profile is valid. Second, Sally scored higher on both Scale X (Disclosure) and Scale Z (Debasement) than she did on Scale Y (Desirability). This pattern, in conjunction with the magnitude of her Scale X score, indicates that Sally is highly self-disclosing. Thus, it can be said that she is willing to discuss those things that are within her awareness. However, her statements that her parents are perfect and that they never fought would lead to the belief that she makes ample use of repression as a defense mechanism. Although Sally is willing to discuss personal issues and feelings, such a discussion is limited by the degree to which she is unaware of the processes influencing her life.

Emotional Assessment

One way to understand Sally and her emotional problems may be to develop some appreciation for her anger, and for the ways in which it is repressed or redirected. How the patient came to have that anger can only be the subject of speculation. However, it may be that her parents placed such high value on her, and gave her so much attention, that the narcissistic and histrionic needs she developed could not be fulfilled outside of the home. Her anger could possibly result from her recognition that most people do not value her at the level that she has come to expect, and do not pay as much attention to her as she would wish. Although Sally may have higher potential than her average IQ would indicate, she may not be able to play the submissive student role long enough to obtain a degree. If so, the most common way to achieve status in the community may not be available to

her. Modeling apparently has helped her meet some of her needs for a few years, a contention supported by the fact that she did not need psychotherapy for a while. In this light, it is not surprising that she returned to therapy at the time when modeling was not available to her. Although she planned to go back to modeling in the near future, it must be obvious to her that this career will only be feasible for a limited number of years.

Sally revealed that her perfect parents never fought and would never get angry at anybody. It could be assumed that she learned to express much of her anger through the hysterical mechanisms that may have flowed out of the histrionic aspects of her personality. Specifically, Sally appears to use somatization as a way to deal with some of the tension she experiences. Both the history and the testing showed a preoccupation with medical problems. There also was the presence of illnesses that have an emotional substrate, such as ulcers and migraines. Her vomiting clearly would meet criteria for the conversion disorder because this symptom actually is mentioned in the DSM-III-R. The patient also exemplified the associated features of the conversion disorder noted by the DSM because her personality had histrionic traits and she demonstrated the "belle indifference" inappropriate affect.

On the positive side, contact with reality was good. The patient did not appear to suffer from an affective disorder at the time of the testing. There were some signs of both depression and hypomania in both the history and the test protocol, but the affective symptoms she was experiencing did not seem to be significant enough to meet criteria for an affective disorder.

Diagnostic Impressions

I. Conversion disorder 300.11 (vomiting)
II. Borderline, histrionic, narcissistic, antisocial, and negativistic personality elements
III. Migraines, ulcers

Recommendations

Sally would benefit from a period of psychotherapy. 
Given her personality style, some ideas can be offered about the kind of therapeutic relationship that she would find most comfortable. For instance, an emphasis on formalities, such as being on time for the session or keeping an interpersonal distance during the session, is likely to feel unfriendly and dissatisfying to her. The therapist may need to be tolerant of emotionality on the part of the patient and maybe even a certain amount of conflict. The type of relationship that would feel egosyntonic to her would be one where she is very much the center of attention and one where demonstrations of affection and support flow readily, especially from the therapist to the patient. Sally can be expected to be most comfortable in situations where she feels looked up to, admired, or at least respected. If confrontation is used in therapy, much tact has to be exercised so as not to injure her narcissism more than she can tolerate. Once the therapeutic relationship has been established, the treatment plan may include the goal of making the patient aware of her histrionic and narcissistic needs, and the anger that is generated when those needs are not fulfilled. Explorations into the unproductive ways in which she has acted out her anger may be useful in controlling the

borderline-like behaviors. Sally may be well advised to find a long-lasting career that allows her to meet some of her needs. Working with her to decrease the level of her emotional needs also may be necessary.

Conclusions

The MCMI-III is a brief, easy-to-administer personality inventory designed to be used with clinical populations. Developed within a strong theoretical perspective, it demonstrates good reliability and criterion validity. Moreover, the MCMI-III content parallels the DSM-IV and DSM-III-R classification schemes and is well suited to multiaxial diagnosis. The use of the MCMI-III, in conjunction with an understanding of its underlying theory, provides the clinician with information crucial to treatment planning and outcome assessment.

References

Butcher, J.N. (Ed.). (1972). Objective personality assessment. New York: Academic Press.
Choca, J.P., Shanley, L.A., & Van Denberg, E. (1992). Interpretive guide to the Millon Clinical Multiaxial Inventory (MCMI). Washington, DC: American Psychological Association.
Choca, J.P., Shanley, L.A., & Van Denberg, E. (1997). Interpretive guide to the Millon Clinical Multiaxial Inventory (2nd ed.). Washington, DC: American Psychological Association.
Craig, R.J. (Ed.). (1993). The Millon Clinical Multiaxial Inventory: A clinical research information synthesis. Hillsdale, NJ: Lawrence Erlbaum Associates.
Dahlstrom, W.G. (1972). Whither the MMPI? In J.N. Butcher (Ed.), Objective personality assessment (pp. 85-116). New York: Academic Press.
Hase, H.D., & Goldberg, L.R. (1967). Comparative validity of different strategies of constructing personality inventory scales. Psychological Bulletin, 67, 231-248.
Hsu, L.M., & Maruish, M.E. (1992). Conducting publishable research with the MCMI-II: Psychometric and statistical issues. Minneapolis: National Computer Systems.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635-694.
Maruish, M. (Ed.). (1994). The use of psychological testing for treatment planning and outcome assessment. Hillsdale, NJ: Lawrence Erlbaum Associates.
Millon, T. (1969). Modern psychopathology. Philadelphia: Saunders.
Millon, T. (1981). Disorders of personality: DSM-III, Axis II. New York: Wiley.
Millon, T. (1984). On the renaissance of personality assessment and personality theory. Journal of Personality Assessment, 8, 450-466.
Millon, T. (1986a). A theoretical derivation of pathological personalities. In T. Millon & G.L. Klerman (Eds.), Contemporary directions in psychopathology: Toward the DSM-IV (pp. 639-670). New York: Guilford.
Millon, T. (1986b). Personality prototypes and their diagnostic criteria. In T. Millon & G.L. Klerman (Eds.), Contemporary directions in psychopathology: Toward the DSM-IV (pp. 671-712). New York: Guilford.
Millon, T. (1987). Manual for the MCMI-II (2nd ed.). Minneapolis: National Computer Systems.
Millon, T. (1990). Towards a new personology: An evolutionary model. New York: Wiley-Interscience.
Millon, T. (1994). MCMI-III manual. Minneapolis: National Computer Systems.
Millon, T., & Davis, R. (1996). Disorders of personality: DSM-IV and beyond. New York: Wiley-Interscience.

Millon, T., & Everly, G. (1985). Personality and its disorders. New York: Wiley.
Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates.
Rosen, A. (1962). Development of the MMPI scales based on a reference group of psychiatric patients. Psychological Monographs, 76(527).

For Abby, Katie, and Shelby

Chapter 35
Personality Assessment Inventory

Leslie C. Morey
Vanderbilt University

The Personality Assessment Inventory (PAI; Morey, 1991) is a self-administered, objective test of personality and psychopathology designed to provide information on critical client variables in behavioral health care settings. From its inception, it was constructed to provide measures of constructs that are central in treatment planning, implementation, and evaluation. Although it was introduced fairly recently, the PAI has already generated considerable attention from clinicians and researchers, and the test has been described as a "substantial improvement from a psychometric perspective over the existing standard in the area" (Helmes, 1993, p. 417), and as "one of the most exciting new personality tests" (Schlosser, 1992, p. 12). This chapter provides an overview of the procedures employed in developing the inventory, key reliability studies, and studies of the construct validity for the various scales. It also offers specific information about the use of the PAI in the planning and evaluation of treatment.

Overview

Rationale and Development

The development of the PAI was based on a construct validation framework that emphasized a rational as well as quantitative method of scale development. This framework places a strong emphasis on a theoretically informed approach to the development and selection of items, as well as on the assessment of their stability and correlates. The theoretical articulation of the constructs to be measured is critical, because this articulation must serve as a guide to the content of information to be sampled and to the subsequent assessment of content validity. In this process, both the conceptual nature and empirical adequacy of the items play an important role in their inclusion in the final version of the inventory. The development of the test went through four iterations in a sequential

construct validation strategy similar to that described by Loevinger (1957) and Jackson (1970), although a number of item parameters were considered in addition to those described here. Of paramount importance in the development of the test was the assumption that no single quantitative item parameter should be used as the sole criterion for item selection. An overreliance on a single parameter in item selection typically leads to a scale with one desirable psychometric property and numerous undesirable ones. As an example, each PAI scale was constructed to include items addressing the full range of severity of the construct, including both its milder and most severe forms. Such coverage would not be possible if a single item selection criterion were applied; "milder" items would be most effective in distinguishing clinical subjects from normals, whereas items reflecting more severe pathology would be more useful in discriminating among different clinical groups. Also, item-total correlations for such different items would be expected to vary as a function of the composition of the sample because of restriction of range; milder items would display higher biserial correlations in a community sample, whereas more severe items would do so in an inpatient psychiatric sample. Thus, items selected according to a single criterion (such as discrimination between groups or item-total correlation) are doomed to provide limited coverage of the full range of severity of a clinical construct. The PAI sought to include items that struck a balance between different desirable item parameters, including content coverage as well as empirical characteristics, so that the scales could be useful across a number of different applications. The clinical syndromes assessed by the PAI were selected on the basis of two criteria: the stability of their importance within the nosology of mental disorder and their significance in contemporary diagnostic practice. 
These criteria were assessed through a review of the historical and contemporary literature, as well as through a survey of practicing diagnosticians. In generating items for these syndromes, the literature on each clinical syndrome was examined to identify those components most central to the definition of the disorder, and items were written so as to provide an assessment of each component of the syndrome in question.

The test contains 344 items that are answered on a four-alternative scale, with the anchors "Totally False," "Slightly True," "Mainly True," and "Very True." Each response is weighted according to the intensity of the feature that the different alternatives represent; thus, clients who answer "Very True" to the question "Sometimes I think I'm worthless" add 3 points to their raw score on the Depression scale, whereas those who respond "Slightly True" to the same item add only 1 point. The 344 items comprise 22 nonoverlapping full scales: 4 validity, 11 clinical, 5 treatment consideration, and 2 interpersonal scales. Ten of the full scales contain conceptually derived subscales designed to facilitate interpretation and coverage of the full breadth of complex clinical constructs. A brief description of the full scales is provided in Table 35.1, and Table 35.2 presents a description of the PAI subscales.

Normative Data

The PAI was developed and standardized for use in the clinical assessment of individuals in the age range of 18 through adulthood. The initial reading level analyses of the PAI test items indicated that reading ability at the fourth-grade level was necessary to complete the inventory. Subsequent studies of this issue (e.g., Schinka & Borum, 1993) have supported the conclusion that the PAI items are written at a grade equivalent lower than estimates for comparable instruments.
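
The weighted response format described above for the 344 items can be illustrated with a minimal Python sketch. The text states only that "Very True" adds 3 points and "Slightly True" adds 1; the values of 0 for "Totally False" and 2 for "Mainly True" are assumptions that fill in an evenly spaced scale. The item identifiers and the helper function are hypothetical, and the sketch ignores any reverse-keyed items, so it should not be read as the published PAI scoring key.

```python
# Response weights. Only the weights for "Very True" (3) and
# "Slightly True" (1) are stated in the text; the other two are assumed.
RESPONSE_WEIGHTS = {
    "Totally False": 0,  # assumed
    "Slightly True": 1,  # stated in the text
    "Mainly True": 2,    # assumed
    "Very True": 3,      # stated in the text
}


def raw_scale_score(responses, scale_items):
    """Sum the response weights over the items assigned to one scale."""
    return sum(RESPONSE_WEIGHTS[responses[item]] for item in scale_items)


# Hypothetical item identifiers; actual PAI item numbers are not given here.
depression_items = ["dep_1", "dep_2", "dep_3"]
responses = {
    "dep_1": "Very True",
    "dep_2": "Slightly True",
    "dep_3": "Totally False",
}

# 3 + 1 + 0 under the assumed weights
print(raw_scale_score(responses, depression_items))
```

Because the full scales are described as nonoverlapping, each item would contribute to exactly one full-scale total in a scheme like this one.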

TABLE 35.1 The 22 Full Scales of the PAI

Scale (Designation): Description

Validity Scales
Inconsistency (ICN): Determines if client is answering consistently throughout inventory. Each pair consists of highly correlated (positively or negatively) items.
Infrequency (INF): Determines if client is responding carelessly or randomly. Items are neutral with respect to psychopathology and have extremely high or low endorsement rates.
Negative Impression (NIM): Suggests an exaggerated unfavorable impression or malingering. Items have relatively low endorsement rates among clinical subjects.
Positive Impression (PIM): Suggests the presentation of a very favorable impression or reluctance to admit minor flaws.

Clinical Scales
Somatic Complaints (SOM): Focuses on preoccupation with health matters and somatic complaints associated with somatization and conversion disorders.
Anxiety (ANX): Focuses on phenomenology and observable signs of anxiety with an emphasis on assessment across different response modalities.
Anxiety-related Disorders (ARD): Focuses on symptoms and behaviors related to specific anxiety disorders, particularly phobias, traumatic stress, and obsessive-compulsive symptoms.
Depression (DEP): Focuses on symptoms and phenomenology of depressive disorders.
Mania (MAN): Focuses on affective, cognitive, and behavioral symptoms of mania and hypomania.
Paranoia (PAR): Focuses on symptoms of paranoid disorders and more enduring characteristics of paranoid personality.
Schizophrenia (SCZ): Focuses on symptoms relevant to the broad spectrum of schizophrenic disorders.
Borderline Features (BOR): Focuses on attributes indicative of a borderline level of personality functioning, including unstable and fluctuating interpersonal relations, impulsivity, affective lability and instability, and uncontrolled anger.
Antisocial Features (ANT): Focuses on history of illegal acts and authority problems, egocentrism, lack of empathy and loyalty, instability, and excitement seeking.
Alcohol Problems (ALC): Focuses on problematic consequences of alcohol use and features of alcohol dependence.
Drug Problems (DRG): Focuses on problematic consequences of drug use (both prescription and illicit) and features of drug dependence.

Treatment Scales
Aggression (AGG): Focuses on characteristics and attitudes related to anger, assertiveness, hostility, and aggression.
Suicidal Ideation (SUI): Focuses on suicidal ideation, ranging from hopelessness to thoughts and plans for the suicidal act.
Stress (STR): Measures the impact of recent stressors in major life areas.
Nonsupport (NON): Measures a lack of perceived social support, considering both the level and quality of available support.
Treatment Rejection (RXR): Focuses on attributes and attitudes theoretically predictive of interest and motivation in making personal changes of a psychological or emotional nature.

Interpersonal Scales
Dominance (DOM): Assesses the extent to which a person is controlling and independent in personal relationships. A bipolar dimension with a dominant style at the high end and a submissive style at the low end.
Warmth (WRM): Assesses the extent to which a person is interested in supportive and empathic personal relationships. A bipolar dimension with a warm, outgoing style at the high end and a cold, rejecting style at the low end.

TABLE 35.2 PAI Subscales and Their Descriptions

Subscale (Designation): Description

Somatic Complaints
Conversion (SOM-C): Focuses on symptoms associated with conversion disorder, particularly sensory or motor dysfunctions.
Somatization (SOM-S): Focuses on the frequent occurrence of various common physical symptoms and vague complaints of ill health and fatigue.
Health Concerns (SOM-H): Focuses on a preoccupation with health status and physical problems.

Anxiety
Cognitive (ANX-C): Focuses on ruminative worry and concern about current issues that results in impaired concentration and attention.
Affective (ANX-A): Focuses on the experience of tension, difficulty in relaxing, and the presence of fatigue as a result of high perceived stress.
Physiological (ANX-P): Focuses on overt physical signs of tension and stress, such as sweaty palms, trembling hands, complaints of irregular heartbeats, and shortness of breath.

Anxiety-related Disorders
Obsessive-Compulsive (ARD-O): Focuses on intrusive thoughts or behaviors, rigidity, indecision, perfectionism, and affective constriction.
Phobias (ARD-P): Focuses on common phobic fears, such as social situations, public transportation, heights, enclosed spaces, or other specific objects.
Traumatic Stress (ARD-T): Focuses on the experience of traumatic events that cause continuing distress and that are experienced as having left the client changed or damaged in some fundamental way.

Depression
Cognitive (DEP-C): Focuses on thoughts of worthlessness, hopelessness, and personal failure, as well as indecisiveness and difficulties in concentration.
Affective (DEP-A): Focuses on feelings of sadness, loss of interest in normal activities, and anhedonia.
Physiological (DEP-P): Focuses on level of physical functioning, activity, and energy, including disturbance in sleep pattern and changes in appetite and/or weight loss.

Mania
Activity Level (MAN-A): Focuses on overinvolvement in a wide variety of activities in a somewhat disorganized manner and the experience of accelerated thought processes and behavior.
Grandiosity (MAN-G): Focuses on inflated self-esteem, expansiveness, and the belief that one has special and unique skills or talents.
Irritability (MAN-I): Focuses on the presence of strained relationships due to the respondent's frustration with the inability or unwillingness of others to keep up with their plans, demands, and possibly unrealistic ideas.

Paranoia
Hypervigilance (PAR-H): Focuses on suspiciousness and the tendency to monitor the environment for real or imagined slights by others.
Persecution (PAR-P): Focuses on the belief that one has been treated inequitably and that there is a concerted effort among others to undermine one's interests.
Resentment (PAR-R): Focuses on a bitterness and cynicism in interpersonal relationships, and a tendency to hold grudges and externalize blame for any misfortunes.

Schizophrenia
Psychotic Experiences (SCZ-P): Focuses on the experience of unusual perceptions and sensations, magical thinking, and/or other unusual ideas that may involve delusional beliefs.
Social Detachment (SCZ-S): Focuses on social isolation, discomfort, and awkwardness in social interactions.
Thought Disorder (SCZ-T): Focuses on confusion, concentration problems, and disorganization of thought processes.

Borderline Features
Affective Instability (BOR-A): Focuses on emotional responsiveness, rapid mood changes, and poor emotional control.
Identity Problems (BOR-I): Focuses on uncertainty about major life issues and feelings of emptiness, unfulfillment, and an absence of purpose.
Negative Relationships (BOR-N): Focuses on a history of ambivalent, intense relationships in which one has felt exploited and betrayed.
Self-harm (BOR-S): Focuses on impulsivity in areas that have high potential for negative consequences.

Antisocial Features
Antisocial Behaviors (ANT-A): Focuses on a history of antisocial acts and involvement in illegal activities.
Egocentricity (ANT-E): Focuses on a lack of empathy or remorse and a generally exploitive approach to interpersonal relationships.
Stimulus Seeking (ANT-S): Focuses on a craving for excitement and sensation, a low tolerance for boredom, and a tendency to be reckless and risk taking.

Aggression
Aggressive Attitude (AGG-A): Focuses on hostility, poor control over anger expression, and a belief in the instrumental utility of aggression.
Verbal Aggression (AGG-V): Focuses on verbal expressions of anger ranging from assertiveness to abusiveness, and a readiness to express anger to others.
Physical Aggression (AGG-P): Focuses on a tendency to physical displays of anger, including damage to property, physical fights, and threats of violence.

PAI scale and subscale raw scores are transformed to T-scores in order to provide interpretation relative to a standardization sample of 1,000 community-dwelling adults. This sample was carefully selected to match 1995 U.S. census projections on the basis of gender, race, and age; the educational level of the standardization sample was selected to be representative given the required fourth-grade reading level; more than half of the sample had a high school education or less. The only stipulation for inclusion in the standardization sample (other than stratification fit) was that subjects had to endorse more than 90% of PAI items; in other words, no more than 33 items could be left blank. No other restrictions based on PAI data were applied in creating the census-matched standardization sample. The PAI T-scores are calibrated to have a mean of 50 and a standard deviation of 10, using a standard linear transformation from the community sample norms. Thus, a T-score value greater than 50 lies above the mean in comparison to the scores of subjects in the standardization sample. Roughly 84% of nonclinical subjects will have a T-score below 60 (one standard deviation above the mean) on most scales, whereas 98% of nonclinical subjects will have scores below 70 (two standard deviations above the mean). Thus, a T-score at or above 70 represents a pronounced deviation from the typical responses of adults living in the community. For each scale and subscale, the T-scores were linearly transformed from the means and standard deviations derived from the census-matched standardization sample. Unlike many other similar instruments, the PAI does not calculate T-scores differently for men and women; instead, the same (combined) norms are used for both genders. Combined norms are used because separate norms distort natural epidemiological differences between genders. For example, women are less likely than men to receive a diagnosis of antisocial personality, and this is reflected in lower mean scores for women on the Antisocial Features (ANT) scale. A separate normative procedure for men and women would result in similar numbers of each gender scoring in the clinically significant range, a result that does not reflect the established gender ratio for this disorder.

The PAI included several procedures designed to eliminate items that might be biased due to demographic features such as gender, race, or age, and items that displayed any signs of being interpreted differently as a function of these features were eliminated in the course of selecting the final items for the test. As it turns out, with relatively few exceptions, differences as a function of demography were negligible in the community sample. The most noteworthy effects that have been observed involve the tendency for younger individuals to score higher on the Borderline Features (BOR) and ANT scales, and the tendency for men to score higher on ANT and on Alcohol Problems (ALC) relative to women.

T-scores are derived from a representative community sample; they therefore provide a useful means for determining whether certain problems are clinically significant, because relatively few normal adults will obtain markedly elevated scores. However, other comparisons are often of equal importance in clinical decision making. For example, nearly all patients report depression at their initial evaluation; the question confronting the clinician considering a diagnosis of major depression is one of relative severity of symptomatology. Knowing that an individual's score on the PAI Depression scale is elevated in comparison to the standardization sample is of value, but a comparison of the elevation relative to a clinical sample may be more critical in forming diagnostic hypotheses. To facilitate these comparisons, the PAI profile form (shown in Fig. 35.1) also indicates the T-scores that correspond to marked elevations when referenced against a representative clinical sample.
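The linear T-score conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the published scoring procedure: the function names and the raw-score means and standard deviations are hypothetical placeholders (the actual values appear in the PAI manual), and the percentile figures assume an approximately normal score distribution. The clinical reference point is taken here as the raw score two standard deviations above a clinical-sample mean, re-expressed on the community T-score metric.

```python
from statistics import NormalDist


def t_score(raw, ref_mean, ref_sd):
    """Linear T-score: mean 50, SD 10 relative to a reference sample."""
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    return 50.0 + 10.0 * (raw - ref_mean) / ref_sd


def clinical_reference_t(clin_mean, clin_sd, comm_mean, comm_sd):
    """T-score (community metric) of the raw score lying two standard
    deviations above the mean of a clinical sample."""
    raw_cutoff = clin_mean + 2.0 * clin_sd
    return t_score(raw_cutoff, comm_mean, comm_sd)


# Hypothetical raw-score statistics for one scale (NOT values from the manual).
COMMUNITY_MEAN, COMMUNITY_SD = 15.0, 8.0
CLINICAL_MEAN, CLINICAL_SD = 27.0, 10.0

client_t = t_score(31, COMMUNITY_MEAN, COMMUNITY_SD)
reference_t = clinical_reference_t(
    CLINICAL_MEAN, CLINICAL_SD, COMMUNITY_MEAN, COMMUNITY_SD
)

# Proportion of a normal distribution falling below 60T and 70T
# (one and two standard deviations above the mean).
t_distribution = NormalDist(mu=50, sigma=10)
below_60 = t_distribution.cdf(60)
below_70 = t_distribution.cdf(70)

print(f"client T = {client_t:.0f}, clinical reference T = {reference_t:.0f}")
print(f"below 60T: {below_60:.1%}, below 70T: {below_70:.1%}")
```

With these placeholder values, a raw score of 31 lies two community standard deviations above the community mean and therefore converts to 70T, while the clinical reference point falls at 90T. The normal-curve proportions (about 84% and 98%) match the figures quoted in the text only to the extent that a scale's scores are roughly normally distributed.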
The profile "skyline" indicates, for each scale and subscale, the score that lies two standard deviations above the mean for a clinical sample of 1,246 patients selected from a wide variety of different professional settings. Thus, roughly 98% of clinical patients will obtain scores below the skyline on the profile form, and scores above the skyline represent a marked elevation relative to those of patients in clinical settings. Interpretation of PAI profiles can therefore be accomplished in comparison to both normal and clinical samples. The PAI manual provides normative transformations for a number of different comparisons. Various appendices provide T-score transformations referenced against the clinical sample and a large sample of college students, as well as for various demographic subgroups of the community standardization sample. Although the differences between demographic groups were generally quite small, there are occasions where it may be useful to make comparisons with reference to particular groups. Thus, the raw score means and standard deviations needed to convert raw scores to T-scores with reference to normative data provided by particular groups (men, women, Blacks, and people over age 60) are provided in the manual for this purpose. However, for most clinical and research applications, the use of the T-scores derived from the full normative data is strongly recommended because of its representativeness and larger sample size.

Reliability of the PAI

The reliability of the PAI has been examined in a number of studies addressing the internal consistency, test-retest reliability, and configural stability of the instrument.

Fig. 35.1. A sample PAI modal profile. Reproduced by special permission of the Publisher, Psychological Assessment Resources, Inc., Odessa, FL 33556, from the Personality Assessment Inventory by Leslie Morey, PhD, Copyright © 1991 by PAR, Inc. Further reproduction is prohibited without permission of PAR, Inc.

The internal consistency reliability of the PAI has been examined in a number of different populations (Alterman et al., 1995; Boyle & Lennon, 1994; Morey, 1991; Rogers, Flores, Ustad, & Sewell, 1995; Schinka, 1995). This has involved the use of coefficient alpha (Cronbach, 1951), which can be interpreted as an estimate of the mean of all possible split-half combinations of items. The internal consistency alphas for the PAI full scales are satisfactory, with median alphas reported in the manual (Morey, 1991) for the full scales of .81, .82, and .86 for normative, college, and clinical samples, respectively. As expected, the scales tend to appear more internally consistent in more heterogeneous samples. Alterman et al. (1995) found a median alpha of .78 in a sample of methadone maintenance patients, whereas Schinka (1995) found a median alpha of .86 for full scales and .77 for the subscales in an alcoholic sample. Boyle and Lennon (1994) reported a median alpha of .84 in a mixed clinical/normal sample. Internal consistency estimates for the Inconsistency (ICN) and Infrequency (INF) scales are consistently lower than those for other scales, because these scales do not measure theoretical constructs but rather the care with which the respondent completed the test. Lower alphas for such scales would be anticipated because carelessness might vary within a given sitting; for example, a subject might complete the first half of the test accurately but complete the last half haphazardly. The lowest internal consistency estimates for the PAI reported in the literature were obtained using the Spanish version of the instrument (Rogers et al., 1995), where an average alpha of .63 was obtained. Rogers and colleagues concluded that the internal consistency of the treatment consideration scales seemed to be most affected by the translation of the test. 
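As a rough illustration of the statistic discussed above, coefficient alpha can be computed from a respondents-by-items score matrix as k/(k - 1) times one minus the ratio of the summed item variances to the variance of the total scores. The sketch below uses only the Python standard library; the function name and the small response matrix are hypothetical and are not PAI data.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    if any(len(row) != k for row in item_scores):
        raise ValueError("every respondent must answer the same number of items")

    # Variance of each item across respondents.
    item_variances = [
        pvariance([row[j] for row in item_scores]) for j in range(k)
    ]
    # Variance of the total (summed) score across respondents.
    total_variance = pvariance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses (0-3 weights) of five respondents to a four-item scale.
responses = [
    [3, 2, 3, 2],
    [1, 1, 0, 1],
    [2, 2, 1, 3],
    [0, 1, 0, 0],
    [3, 3, 2, 3],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Because alpha depends on the spread of total scores relative to item-level variance, the same items can yield different values in samples with different ranges of severity, which is consistent with the sample-to-sample differences reported above.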
A subsequent study by this group indicated that the translated PAI demonstrated moderate convergent validity that was at least equal to, and in some respects superior to, that of a Spanish translation of the MMPI-2 (Fantini-Salvador & Rogers, 1997). Examination of internal consistency estimates for the PAI full scales for groups defined by various demographic characteristics (Morey, 1991) suggests that there is little variability in internal consistency as a function of race (median scale alpha for Whites = .77, non-Whites = .78), gender (men = .79, women = .75), or age (under 40 = .79, 40 and over = .75). The Fantini-Salvador and Rogers (1997) study already cited also found no effect of ethnicity after controlling for symptom status.

The temporal stability of PAI scales has been examined by administering the test to subjects on two different occasions (Boyle & Lennon, 1994; Morey, 1991; Rogers et al., 1995). For the standardization studies, the median test-retest reliability value over a 4-week interval for the 11 full clinical scales was .86 (Morey, 1991), leading to standard error of measurement estimates for these scales on the order of three to four T-score points, with 95% confidence intervals of ± six to eight T-score points. Examination of the mean absolute T-score change values for scales also revealed that the absolute changes over time were quite small, on the order of two to three T-score points for most of the full scales (Morey, 1991). Boyle and Lennon (1994) reported a median test-retest reliability of .73 in their normal sample over 28 days. Rogers et al. (1995) found an average stability of .71 for the Spanish version of the PAI, administered over a 2-week interval. Because multiple-scale inventories are often interpreted configurally, additional questions should be asked concerning the stability of configurations on the 11 PAI clinical scales.
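The standard error of measurement figures quoted above are consistent with the classical formula SEM = SD × sqrt(1 − r), applied to the T-score standard deviation of 10 and the median retest reliability of .86. The chapter does not state the formula it used, so the following is a sketch under that assumption; the function names are illustrative only.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def ci95_half_width(sd, reliability):
    """Half-width of an approximate 95% confidence interval (1.96 * SEM)."""
    return 1.96 * standard_error_of_measurement(sd, reliability)


T_SCORE_SD = 10.0          # T-scores have a standard deviation of 10
MEDIAN_RETEST_R = 0.86     # median retest value reported for the clinical scales

sem = standard_error_of_measurement(T_SCORE_SD, MEDIAN_RETEST_R)
half_width = ci95_half_width(T_SCORE_SD, MEDIAN_RETEST_R)
print(f"SEM = {sem:.2f} T-score points, 95% CI = +/- {half_width:.2f}")
```

Under this assumption the SEM comes to a little under four T-score points and the 95% interval to a little over seven, which falls inside the ranges given in the text.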
One such analysis involved determining the inverse (or Q-type) correlation between each subject's profile at Time 1 and the profile at Time 2. Correlations were obtained for each of the 155 subjects in the full retest sample, and a distribution of these within-subject profile correlations was obtained. Conducted in this manner, the median correlation over time of the clinical scale configuration was .83, indicating a substantial degree of stability in profile configurations over time (Morey, 1991).

Validity of the PAI

The validation of measures of clinical constructs is a process that requires the accumulation of data concerning convergent and discriminant validity correlates. In attempting to establish the validity of the PAI, a number of the best available clinical indicators have been administered concurrently to various samples to determine their convergence with corresponding PAI scales. Furthermore, diagnostic and other clinical judgments concerning clinical behaviors (as rated by the treating clinician) have also been examined to determine if their PAI correlates were consistent with hypothesized relations. Finally, a number of simulation studies have been performed to determine the efficacy of the PAI validity scales in identifying response sets. To date, a number of studies have been conducted examining correlates of various PAI scales; the PAI manual alone contains information about correlations of individual scales with over 50 concurrent indices of psychopathology (Morey, 1991). Some of the more noteworthy findings from these studies are summarized next.

The PAI validity scales were developed to provide an assessment of the potential influence of certain response tendencies on PAI test performance. Two of these scales, ICN and INF, were developed to assess deviations from conscientious responding, whereas the other two validity scales, Negative Impression (NIM) and Positive Impression (PIM), were developed to provide an assessment of efforts at impression management by the respondent. To model the performance of subjects completing the PAI in a random fashion, computer-generated profiles were created by generating random responses to individual PAI items and then scoring all scales according to their normal scoring algorithms.
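The random-response procedure just described can be sketched as follows. This is only an illustration of the general idea: the response weights of 0 to 3, the uniform choice among the four alternatives, and the item positions assigned to the example scale are all assumptions, since the actual PAI scoring keys are not reproduced in this chapter.

```python
import random

N_ITEMS = 344                     # number of PAI items, as stated in the text
RESPONSE_WEIGHTS = (0, 1, 2, 3)   # assumed weights for the four alternatives


def random_protocol(rng):
    """Return one simulated protocol: a random response to every item."""
    return [rng.choice(RESPONSE_WEIGHTS) for _ in range(N_ITEMS)]


def scale_raw_score(protocol, item_indices):
    """Sum the weighted responses on the items assigned to one scale."""
    return sum(protocol[i] for i in item_indices)


def simulate_random_scores(n_protocols, item_indices, seed=0):
    """Score one scale on n_protocols randomly generated protocols."""
    rng = random.Random(seed)
    return [
        scale_raw_score(random_protocol(rng), item_indices)
        for _ in range(n_protocols)
    ]


# Hypothetical eight-item scale; these item positions are placeholders,
# not the real key for any PAI scale.
HYPOTHETICAL_SCALE = [10, 50, 90, 130, 170, 210, 250, 290]

scores = simulate_random_scores(1000, HYPOTHETICAL_SCALE)
mean_score = sum(scores) / len(scores)
print(f"mean raw score over {len(scores)} random protocols: {mean_score:.2f}")
```

Under these assumed weights a random responder averages about 1.5 points per item, so on a scale whose items genuine respondents rarely endorse, random protocols would be expected to stand well apart from real ones. Comparing the simulated score distribution with those of actual respondents is the kind of separation the analysis was designed to examine.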
A total of 1,000 simulated protocols were generated for this analysis. Comparison of profiles derived from normal subjects, clinical subjects, and the random response simulations demonstrated a clear separation of scores of actual respondents from the random simulations, and 99.4% of these random profiles were identified as such by either ICN or INF (Morey, 1991). To model the performance of subjects attempting to manage their impressions in either a positive or negative direction, studies have been performed in which subjects were instructed to simulate such response styles. Comparison of profiles for normal subjects, clinical subjects, and the corresponding response-style simulation group demonstrated a clear separation between scores of the actual respondents and the simulated responses. Subjects scoring above the critical level of NIM were 14.7 times more likely to be members of the malingering group than of the clinical sample, whereas those scoring above threshold on PIM were 13.9 times more likely to be in the positive dissimulation sample than in a community sample (Morey, 1991). Subsequent studies have generally supported the ability of these scales to distinguish simulators from actual protocols under a variety of response set conditions (e.g., Cashel, Rogers, Sewell, & Martin-Cannici, 1995; Rogers, Ornduff, & Sewell, 1993). In addition to such simulation studies, a number of correlational studies have been performed to determine the convergent and discriminant validity of the PAI validity scales as measured against other commonly used measures of similar constructs (Ban, Fjetland, Kutcher, & Morey, 1993; Costa & McCrae, 1992; Morey, 1991). For example, NIM correlated significantly (r = .54) with the MMPI F scale; PIM was associated with the Marlowe-Crowne Social Desirability scale (.56), as well as with the MMPI K (.47) and L (.41) scales (Morey, 1991). The INF and ICN scales displayed negligible correlations with other measures, an expected result because these were designed as relatively pure indicators of measurement error.

The clinical scales of the PAI were assembled to provide information about critical diagnostic features of 11 important clinical constructs. A number of different validity indicators have been used to provide information on the convergent and discriminant validity of the PAI clinical scales; these indicators can be divided into measures of "neurotic features," "psychotic features," and "behavior disorder features." Within the neurotic spectrum, correlations with NEO-PI (Costa & McCrae, 1985), the MMPI clinical and research scales (Hathaway & McKinley, 1967; Morey, Waugh, & Blashfield, 1985; Wiggins, 1966), and several specialized assessment instruments have been examined.
These specialized instruments include the Wahler Physical Symptoms Inventory (Wahler Inventory; Wahler, 1983), a broad measure of somatic complaints; the Beck Depression Inventory (BDI; Beck & Steer, 1987), Beck Anxiety Inventory (BAI; Beck & Steer, 1990) and Beck Hopelessness Scale (BHS; Beck & Steer, 1988), widely used and well-validated measures of negative affect; the Hamilton Rating Scale for Depression (HAM-D; Hamilton, 1960), perhaps the most widely used measure of outcome in treatment studies of depression; the State-Trait Anxiety Inventory (STAI; Spielberger, 1983), a widely used measure of anxiety that distinguishes between its situational and more enduring elements; the Fear Survey Schedule (FSS; Wolpe & Lang, 1964), a comprehensive assessment of common fears; the Maudsley Obsessive-Compulsive Inventory (Maudsley Inventory; Rachman & Hodgson, 1980), a measure of severe obsessional ideation and contamination fears; and the Mississippi Scale for Combat-related Posttraumatic Stress Disorder (Mississippi PTSD; Keane, Caddell, & Taylor, 1988). Correlations between each of the full scale scores for the four PAI neurotic cluster scales and the validation measures already described follow hypothesized patterns, with strong associations with other measures of neuroticism (Costa & McCrae, 1992; Montag & Levin, 1994; Morey, 1991). The strongest correlates for Somatic Complaints (SOM) have been found to be the Wiggins Health Concerns (.80) and Organic Problems (.82) content scales, the Wahler Inventory (.72), and the MMPI Hypochondriasis (.60) scale. Each of these measures is a fairly straightforward assessment of complaints regarding physical functioning, so this pattern of correlations is consistent with expectations. The SOM scale also displays small to moderate relations with measures of distress, such as anxiety or depression. 
The SOM scale is generally the highest point of the PAI profile in a general medical population, although even in such populations the average score is typically below 70T (Osborne, 1994). The Anxiety (ANX) scale demonstrated substantial correlations with a number of measures of negative affect, including NEO-PI Neuroticism (.76) and Anxiety (.76), STAI Trait Anxiety (.73), and the Wiggins Depression content scale (.76). This finding is consistent with research results highlighting the prominent role of anxiety in many mental disorders; such a pattern should be anticipated because ANX was intended to be a general measure of anxiety rather than a specific diagnostic indicator. In contrast, the Anxiety-related Disorders (ARD) scale was designed to provide content relevant to more specific diagnostic differentiations, and hence the pattern of correlations tends to be more specific than that observed with ANX. The largest correlation for ARD was with the Mississippi PTSD scale (.81), and the second largest involved the FSS (.66); each of these scales directly parallels a disorder for which ARD was designed to provide coverage. The ARD scale has also been found to correlate with the probability of experiencing nightmares (.46), with ARD-T (.51) in particular being associated with night terrors (Greenstein, 1993). The ARD scale (particularly ARD-T) has also been found to differentiate women psychiatric patients who were victims of childhood abuse from other women patients who did not experience such abuse (Cherepon & Prinzhorn, 1994).

The Depression (DEP) scale demonstrates its largest correlations with various well-validated indicators of depression, such as the BDI (.81), the HAM-D (.78), and the Wiggins Depression content scale (.81). This is consistent with expectations, because these measures are widely used in the assessment of depression and related symptomatology. Other noteworthy correlates of the Depression scale include the MMPI D scale (.66), the Wiggins Poor Morale scale (.74), the NEO-PI Neuroticism (.69) and Depression (.70) scales, and the Beck Hopelessness scale (.67). Correlations with a number of other measures of related constructs have also been examined that can provide information relevant to the convergent and discriminant validity of the PAI "psychotic cluster" scales. For example, the MMPI, the NEO-PI, the Interpersonal Adjective Scale (IAS-R; Trapnell & Wiggins, 1990), and the clinician-rated Brief Psychiatric Rating Scale (Overall & Gorham, 1962) include scales that capture the cognitive and interpersonal abnormalities that characterize these disorders. Correlations between each of the three PAI psychotic spectrum scales and those validation measures described generally follow the expected pattern (Ban et al., 1993; Costa & McCrae, 1992; Morey, 1991).
The Mania (MAN) scale has demonstrated its strongest correlations with Wiggins Hypomania (.63), Psychoticism (.58), and Hostility (.55) content scales, with the BPRS clinical ratings of Grandiosity (.48) and Conceptual Disorganization (.40), and with the MMPI Ma scale (.53). The Paranoia (PAR) scale demonstrated its largest correlations with the MMPI Paranoid personality disorder scale (.70), the Wiggins Psychoticism scale (.60), and various measures of hostility such as the Wiggins Hostility content scale (.54) and the NEO-PI Hostility facet scale (.55). A moderate correlation with the MMPI Pa scale was also observed (.45). The Schizophrenia (SCZ) scale has been found to correlate with the Wiggins Psychoticism content scale (.76) and the MMPI Schizotypal (.67) and Paranoid (.66) personality disorder scales. The SCZ scale was also positively correlated with the MMPI Sc scale (.55), and negatively associated with indices of sociability and social effectiveness such as NEO-PI Agreeableness (-.49) and Gregariousness (-.57). This pattern indicates that scores on the SCZ scale reflect disruptions in both cognitive (e.g., delusions, hallucinations) and interpersonal (e.g., limited social competence) realms of functioning. Finally, the SCZ scale has been found to distinguish schizophrenic patients from controls (Boyle & Lennon, 1994). In that study, the schizophrenic sample did not differ significantly from a sample of alcoholics on SCZ scores, although from information presented it appeared that many of the alcoholic patients completed the PAI during detoxification, which might serve to complicate differential diagnosis based solely on SCZ scores. Information on the convergent and discriminant validity of the PAI scales in the behavior disorders cluster is also available. 
In addition to the NEO-PI, the IAS-R, and the MMPI, the scales have been correlated with a number of specialized assessment instruments, which included the Bell Object Relations Inventory (Bell Inventory; Bell, Billington, & Becker, 1985), a multifactorial questionnaire constructed to measure a variety of interpersonal attitudes and beliefs indicative of early pathological object relations thought to be at the core of the borderline syndrome (Bell, Billington, Cicchetti, & Gibbons, 1988); the Michigan Alcoholism Screening Test (MAST; Selzer, 1971), a widely used and well-validated measure of problem behaviors associated with drinking; the Drug Abuse Screening Test (DAST; Skinner, 1982), a measure patterned after the MAST that assesses the consequences of drug abuse; and the Self-report Psychopathy test designed by Hare (1985) to assess his model of psychopathy. Correlations between scores for the four PAI behavior disorder cluster scales and these validation measures follow expected patterns (Costa & McCrae, 1992; Kurtz, Morey, & Tomarken, 1993; Morey, 1991).

The strongest correlates of the Borderline Features (BOR) scale are the MMPI Borderline personality disorder scale (.77), the NEO-PI Neuroticism scale (.67), and several different measures of hostility, such as the NEO-PI Hostility facet (.70). The BOR scale also displayed substantial correlations with the Bell Inventory Insecure Attachment scale (.63), with the NEO-PI Impulsiveness facet (.52), and with the Wiggins Family Problems (.63) and Psychoticism (.63) content scales. This pattern of anger, impulsiveness, and interpersonal clashes is consistent with the core features of the borderline syndrome. Other studies have supported the validity and utility of this scale in a variety of clinical contexts. The BOR scale in isolation has been found to distinguish borderline patients from unscreened controls with an 80% hit rate, and successfully identified 91% of these subjects as part of a discriminant function (Bell-Pringle, 1994). Classifications based on the BOR scale have been validated in a variety of domains related to borderline functioning, including depression, personality traits, coping, Axis I disorders, and interpersonal problems (Trull, 1995). These BOR scale classifications were also found to be predictive of 2-year outcome on academic indices in college students, even controlling for academic potential and diagnoses of substance abuse (Trull, Useda, Conforti, & Doan, 1995).
The PAI Antisocial Features (ANT) scale demonstrated its largest correlations with the Hare Psychopathy Scale (.82) and the MMPI Antisocial personality disorder scale (.77). Other correlates included the Wiggins Hostility (.57) and Family Problems (.52) content scales, the NEO-PI Excitement Seeking facet (.56), and the IAS-R cold interpersonal octant (.45). This pattern suggests that the personality, interpersonal, and behavioral elements of psychopathy are addressed by the scale. The correlation with the MMPI Pd scale is positive but not impressive (.34), suggesting that the two scales represent the core features of the disorder somewhat differently.

The PAI Alcohol Problems (ALC) and Drug Problems (DRG) scales demonstrate a similar pattern of correlates: strong correlations with corresponding measures of substance abuse and moderate associations with indicators of antisocial personality. ALC yields a correlation of .89 with the MAST, whereas DRG correlates .69 with the DAST. The ALC scale has been found to differentiate patients in an alcohol rehabilitation clinic from patients with schizophrenia as well as normal controls (Boyle & Lennon, 1994). The DRG scale has also been found to successfully discriminate drug abusers and methadone maintenance patients from general clinical and community samples (Alterman et al., 1995).

The treatment consideration scales of the PAI were assembled to provide indicators of potential complications in treatment that would not necessarily be apparent from diagnostic information. There are five of these scales: two indicators of potential for harm to self or others, two measures of the subject's environmental circumstances, and one indicator of the subject's motivation for treatment. These scales have been compared with a number of measures of related constructs. In addition to the NEO-PI, IAS-R, and the MMPI, the scales have been correlated with numerous specialized assessment instruments.
The BDI, BAI, and BHS provide convergent correlates for suicidal ideation. Also, the Suicide Probability Scale (SPS; Cull & Gill, 1982) serves as a concurrent indicator of suicide potential. The SPS has four subscales that
assess hopelessness, suicidal ideation, negative self-evaluation, and hostility, in addition to yielding a total score for suicide probability. The State-Trait Anger Expression Inventory (STAXI; Spielberger, 1988) provides a marker for aggression, which is broken down into six scales and two subscales. The Perceived Social Support scales (Procidano & Heller, 1983) provide an assessment of the subjective impact of supportive transactions between the subject and the subject's social system; two separate scales assess support provided by the subject's family and by the subject's friends. Finally, the Schedule of Recent Events (SRE) is a unit-scoring adaptation of the widely used Holmes and Rahe (1967) checklist of recent stressors, where subjects are asked to indicate major life changes that have taken place in the 12 months prior to evaluation. Correlations between the PAI treatment consideration scales and such validation measures provide support for the construct validity of these scales (Costa & McCrae, 1992; Morey, 1991). Substantial correlations have been identified between the Aggression (AGG) scale and NEO-PI Hostility (.83) and STAXI Trait Anger (.75) scales. The AGG scale was also negatively correlated with the STAXI Anger Control scale (-.57). The Suicidal Ideation (SUI) scale was most positively correlated with the BHS (.64), the BDI (.61), the Suicidal Ideation (.56) and Total Score (.40) of the SPS, and also found to be negatively correlated with perceived social support measures. As expected, the Nonsupport (NON) scale was found to be highly (and inversely) correlated with the social support measures (-.67 with PSS-Family, -.63 with PSS-Friends). It was also moderately associated with numerous measures of distress and tension. The Stress (STR) scale displayed its largest correlations with the SRE (.50) and was also associated with various indices of depression and poor morale. 
Finally, the Treatment Rejection (RXR) scale is found to be negatively associated with Wiggins Poor Morale (-.78) and the NEO-PI Vulnerability (-.54) scales, consistent with the idea that distress can serve as a motivator for treatment. The Treatment Rejection scale has been shown to be positively associated with indices of social support (.26 to .49), implying that people are less likely to be motivated for treatment if they have an intact and available support system as an alternative. The interpersonal scales of the PAI were designed to provide an assessment of the interpersonal style of subjects along two dimensions: a warmly affiliative versus a cold rejecting axis, and a dominating, controlling versus a meekly submissive style. These axes provide a useful way of conceptualizing variation in normal personality as well as many different mental disorders, and persons at the extremes of these dimensions may present with a variety of disorders. The PAI manual describes a number of studies indicating that diagnostic groups differ on these dimensions; for example, spouse-abusers are relatively high on the Dominance (DOM) scale, and schizophrenics are low on the Warmth (WRM) scale (Morey, 1991). Correlations with related measures also provide support for the construct validity of these scales. For example, the correlations with the IAS-R vector scores are consistent with expectations, with PAI DOM associated with the IAS-R dominance vector (.61) and PAI WRM associated with the IAS-R love vector (.65). The NEO-PI Extroversion scale roughly bisects the high DOM/high WRM quadrant, as it is moderately positively correlated with both scales; this finding is consistent with previous research (Trapnell & Wiggins, 1990). The WRM scale was also correlated with the NEO-PI Gregariousness scale (.46), and DOM was associated with the NEO Assertiveness facet (.71). 
In summary, the scales of the PAI have been found to associate with most major instruments for the assessment of diagnosis and treatment efficacy in theoretically concordant ways. Strategies for the interpretation of the PAI profile and its use in treatment planning and evaluation are presented next.

Basic Interpretive Strategy

Because the development of the PAI emphasized the importance of both convergent and discriminant validity of the instrument, the interpretation of PAI protocols is relatively straightforward. For example, scales were designed to be generally pure measures of the constructs in question; thus, an elevation on the DEP scale may be interpreted as indicating that the respondent reports a number of experiences consistent with the symptomatology of clinical depression. Interpretive hypotheses may be generated at four different levels: the item level, the subscale level, the full scale level, and the configuration level.

Interpretation of PAI responses at the item level is meaningful because the content of each item was assumed to be critical in determining its relevance for the assessment of the construct. For example, each item was reviewed by a panel of experts to ensure that its content was directly relevant to the clinical construct in question. As a result, a review of item content can provide specific information about the nature of difficulties experienced by the respondent. In addition, 27 PAI items were identified as "critical items" based on two criteria: importance of their content as an indicator of potential crisis situations and very low endorsement rates in normal individuals. It is recommended that endorsement of any of these items be followed by more detailed questioning that can clarify the nature and severity of these concerns.

The PAI subscales were constructed as an aid in isolating the core elements of the different clinical constructs that the test measures. These subscales can serve to clarify the meaning of full scale elevations, and may be used configurally in diagnostic decision making. For example, many patients typically come to clinical settings with marked distress and dysphoria, often leading to elevations on most unidimensional depression scales.
However, unless other manifestations of the syndrome are present, this does not necessarily indicate that major depression is the likely diagnosis. In the absence of features such as vegetative signs, lowered self-esteem, and negative expectancies, the diagnosis may not be warranted even with a prominent elevation on a depression scale. On the PAI, such a pattern would lead to an elevation on DEP-A, representing the dysphoria and distress, but without elevations on DEP-P (the vegetative signs) and DEP-C (the cognitive signs). As a result, an overall elevation on DEP in this instance would not be interpreted as diagnostic of major depression because of the lack of supporting data from the subscale configuration. Interpretation of PAI full scale scores is aided by comparison to two referents: expected scores in the community and expected scores in clinical patients. As described earlier, the PAI profile form (Fig. 35.1) provides lines demarcating a two-standard deviation elevation with respect to each of these groups. The similarity of expected scores for these two populations varies a great deal across scales. For example, the interpersonal scales DOM and WRM have distributions that are quite similar in both community and clinical samples; thus, marked elevations (or very low scores) are noteworthy regardless of the nature of the client. On the other hand, the RXR scale (which was designed to identify risk for early treatment termination) has a markedly different distribution in clinical and community samples. The majority of clinical subjects (who are in treatment) obtain scores that are considerably below those of community subjects, who are typically not in psychological treatment and have little interest in it. Thus, although a T-score of 50 on RXR in a client presenting for psychotherapy is "average" for a community sample, it is actually considerably above the expected score for clients in clinical settings. 
Thus, in this instance, this score should be interpreted as
indicating potentially significant resistance to change for this client. In contrast, an RXR score of 50T in an individual who was administered the PAI for personnel selection purposes would be unremarkable. In these two cases, the differences in the assessment question lead to differences in the interpretation of the information yielded by a normative transformation.

The configuration of the PAI profile represents the highest interpretive level. Traditionally, the premise behind multidimensional inventories such as the PAI has been that the combination of information provided by the multiple scales is greater than any of its parts; hence, most previous research has focused on the profile yielded by such an inventory rather than on single-scale elevations. To date, there have been numerous research approaches to studying the configural use of PAI profile data. These approaches include the use of mean profiles, profile codetypes, cluster profiles, actuarial functions, and configural decision rules or indices. With respect to mean profiles, the PAI manual presents the average profiles derived from 24 different groups isolated on the basis of a particular diagnosis (e.g., major depression) or a particular problem behavior (e.g., recent suicide attempt). The frequency of PAI two-point codetypes in different diagnostic groups has been examined in various studies, and it is clear that specific codetypes are associated with certain diagnoses at levels far beyond that expected by chance (Morey, 1991). Studies using cluster analysis have also been performed that have identified profile clusters with external correlates that are both clinically and statistically significant. These clusters represent statistically common PAI profile configurations that can be used as a first step in interpretation; Figure 35.1 presents the full scale profile of one of these clusters (Morey, 1991).
Actuarial analyses have also been conducted to identify actuarial decision rules for diagnostic assignment and for ascertaining profile validity. These functions have been incorporated into the PAI computer interpretation program in an attempt to realize the promise of computerized actuarial interpretation. Finally, configural rules have also been developed for decisions about profile validity as well as for more than 40 DSM-IV diagnostic categories. These latter rules were designed to match DSM criteria with corresponding constructs on the PAI, and were also incorporated into the computer interpretation program.

Use of the PAI for Treatment Planning

Treatment planning is a critical issue for psychological assessment, yet it is a daunting one because there is little empirical evidence to definitively support specific treatments for specific problems or patient types. However, the PAI has particular promise for refining treatment-related decision making (Morey & Henry, 1994), as it provides important information relevant to the treatment process: choice of setting, need for medications, suitability for psychotherapy, selection of therapeutic targets, and assessment of change. This section offers guidelines to help the clinician use PAI data to make many commonly faced treatment-related decisions. Because of the subscale structure of the PAI, and its articulation with current diagnostic nomenclature, the PAI is useful in answering many common referral questions in the context of psychological testing. A number of guidelines for using the PAI to judge suitability and prognosis for psychotherapy, and suggestions for specific treatment approaches, are presented next. Although drawn from empirical data whenever possible, in many cases these guidelines are
presented as testable hypotheses, and research to test their validity and refine their use is encouraged.

Predicting Treatment Process: Impediments and Assets

For many years, it has been presumed that one of the most important determinants of treatment outcome is the person's motivation for treatment. Although different authors have somewhat differing views of the nature of this motivation, it is generally agreed that a dissatisfaction with current behavior patterns and a willingness to make an effort to change these patterns are important components of treatment motivation (Sifneos, 1987; Strupp & Binder, 1984). These components can serve as important determinants of treatment outcome, no matter what specific type of treatment is involved. Sifneos (1987) identified seven criteria for the evaluation of treatment motivation for his studies of short-term psychotherapy:

1. A willingness to participate actively in the diagnostic evaluation.
2. Honesty in reporting about oneself and one's difficulties.
3. Ability to recognize that the symptoms experienced are psychological in nature.
4. Introspectiveness and curiosity about one's own behavior and motives.
5. Openness to new ideas, with a willingness to consider different attitudes.
6. Realistic expectations for the results of treatment.
7. Willingness to make a reasonable sacrifice in order to achieve a successful outcome.

On the PAI, the Treatment Rejection (RXR) scale is the beginning point in the examination of treatment motivation. RXR items were written to indicate attitudes that were not consistent with the characteristics of treatment motivation described earlier. In other words, they were designed to identify individuals who would not be motivated for treatment, but rather would be at risk for noncompliance and early termination. Items were written to be applicable across different therapeutic modalities.
Broad content areas that were sampled included a refusal to acknowledge problems, a lack of introspectiveness, an unwillingness to participate actively in treatment, and an unwillingness to accept responsibility for change in one's life.

In interpreting scores on RXR it must be remembered that T-scores are referenced against a community sample, not a treatment sample; hence, scores that are typical of normals actually represent little motivation for treatment. Thus, even T-scores that appear to be within the average range can have quite negative implications for treatment motivation when working within a clinical setting. If working in other, nonclinical settings (say, a preemployment screening), scores of 50T may be typical, but they are not typical when working with clinical populations. In the standardization clinical sample, the mean score on RXR was 40T.

Another aspect of RXR that is critical in its interpretation is that it is related to treatment motivation, not prognosis. Motivation is perhaps a necessary but certainly not a sufficient condition for successful treatment. Merely because individuals recognize that they need to make changes does not mean that accomplishing those changes will be easy. In fact, very low scores on RXR are often an indication of a "cry for help," indicative of overwhelming distress and a plea to mental health professionals to do something to alleviate the respondent's suffering. For example, individuals with borderline personality who are in acute distress will often score quite low on this scale, indicating (presumably) very high motivation for treatment. And, in fact, such patients are experiencing such turmoil that they truly and desperately want their lives to change. However,
because such patients are extremely difficult to work with for other reasons, the prognosis for treatment is not necessarily favorable.

The scaling of RXR is such that low scores reflect high motivation for treatment, whereas elevations indicate little motivation for treatment. Low scores on RXR (below 43T) suggest individuals who acknowledge major difficulties in their functioning and perceive an acute need for help in dealing with these problems; scores below 20T indicate a desperate quality to these needs. Average scores on RXR (between 43T and 53T) reflect a person who acknowledges the need to make some changes, has a positive attitude toward the possibility of personal change, and accepts the importance of personal responsibility. However, scores in the upper portion of this range are higher than expected in subjects where available information (such as from the history or from other scales of the PAI) suggests some impairment; in such circumstances, the possibility of defensiveness, rigidity, or lack of insight must be considered. Scores between 53T and 63T are indicative of people who are generally satisfied with themselves as they are and see little need for major changes in their behavior. Individuals scoring in this range would generally have little motivation to enter into psychotherapy and might be at risk for early termination if they did enter treatment. RXR scores above 63T reflect people who admit to few difficulties and have no desire to change the status quo. Such individuals are not likely to seek therapy on their own initiative and will likely be resistant if they do begin treatment; they will probably dispute the value of therapy and have little, if any, involvement in any therapeutic attempts.

Although motivation for treatment is an important factor in determining treatment outcome, it is certainly not sufficient by itself to ensure that the treatment will be successful.
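The RXR ranges described above amount to a simple lookup, which can be sketched in code. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, and because the ranges as stated do not say which band a score of exactly 43T, 53T, or 63T belongs to, the boundary handling here is an assumption. The value of 40T is the clinical-sample mean reported for the standardization sample.

```python
# Illustrative sketch only; not part of the PAI scoring software.
# Band cutoffs (20T, 43T, 53T, 63T) come from the chapter text; assigning
# the exact boundary scores 43, 53, and 63 to a band is an assumption.

CLINICAL_MEAN_RXR = 40  # mean RXR T-score in the standardization clinical sample


def rxr_band(t_score: float) -> str:
    """Return the interpretive range into which an RXR T-score falls."""
    if t_score < 20:
        return "below 20T: acute need for help, with a desperate quality"
    if t_score < 43:
        return "below 43T: acknowledges major difficulties, perceives acute need for help"
    if t_score <= 53:
        return "43T-53T: acknowledges need for some change, accepts personal responsibility"
    if t_score <= 63:
        return "53T-63T: generally satisfied with self, little motivation for treatment"
    return "above 63T: admits few difficulties, no desire to change the status quo"


def points_above_clinical_mean(t_score: float) -> float:
    """T-score points by which a score exceeds the clinical-sample mean."""
    return t_score - CLINICAL_MEAN_RXR


if __name__ == "__main__":
    for score in (18, 40, 50, 58, 70):
        print(score, "->", rxr_band(score),
              "| points above clinical mean:", points_above_clinical_mean(score))
```

A score of 50T, for instance, falls in the average range relative to the community sample yet sits 10 T-score points above the clinical mean, which is why the same score reads differently in a clinical setting than in personnel selection.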
There are countless patient, treatment, and interaction variables that can potentially affect treatment outcome (Lambert, 1991). Patient predisposing variables, in isolation, will have a limited ability to predict outcome, because different types of patients can and do respond differently to diverse forms of treatment (Frances, Clarkin, & Perry, 1984). Some of these interactions and their implications for PAI interpretation are discussed in another section. Nonetheless, there are a number of patient features that suggest a difficult treatment process, regardless of the type of treatment offered. For example, a number of theorists have offered suggestions about factors influencing amenability to various types of therapeutic approaches. Table 35.3 presents a list of variables offered as predictors of suitability to exploratory therapy (Stone, 1985; Strupp & Binder, 1984; Waldinger & Gunderson, 1987). However, a close examination of these features reveals that patients with numerous indicators of "low suitability" for exploratory therapy are probably less likely to respond to any form of intervention than those who would be considered "high suitability" according to this table. For example, deceitful, impulsive, hostile patients from an unsupportive and abusive environment are less than ideal candidates for any treatment; they are unlikely to comply with pharmacotherapy, behavior therapy, or group therapy as well as exploratory therapy. Thus, this list of indicators is a reasonable starting point for estimating the degree of difficulty likely to be encountered as part of the treatment process. The following paragraphs describe the assessment of these treatment difficulty indicators. Friendliness. Individuals who are reasonably effective interpersonally are better able to make use of any form of helping relationship, regardless of the techniques used to achieve change. 
Individuals who are hostile are unlikely to cooperate with treatment, with the process of treatment constantly at risk for deteriorating into a struggle for control. For any individual to be considered amiable, some degree of warmth is essential.

TABLE 35.3
Indicators of Amenability to Therapy

Characteristic            Low Suitability      High Suitability  PAI Problem Indicators
1. Friendliness           Hostile              Amiable           PAR-R > 70T; AGG-A > 70T; WRM < 30T
2. Likableness            Unlikable            Likable           BOR > 70T; ANT > 70T
3. Motivation             Indifferent          Motivated         RXR > 60T; PIM > 60T
4. Psychological-minded   Low                  High              BOR-S > 70T; ANT-E > 70T; SOM > 70T; ANT-A > 70T
5. Conscience Factors     Deceitful            Moral Sense       ANT-E > 70T
6. Self-discipline        Chaotic              Disciplined       BOR > 70T; ANT > 70T; ALC > 70T; DRG > 70T; NIM > 70T
7. Impulse Control        Impulsive            Self-Control      BOR-S > 70T; AGG > 70T; ANT-A > 70T; ANT-S > 70T
8. Defensive Style        Autoplastic          Alloplastic       BOR > 70T; ANT > 70T; ALC > 70T; DRG > 70T
9. Internalization        Projecting           Admits Fault      PAR > 70T
10. Empathy               Entitlement          Empathy           MAN-G > 70T; DOM > 70T; ANT-E > 70T
11. Parental Factors      Abusive/Indifferent  Supportive        ARD-T > 70T; NON > 70T
12. Social Supports       Few                  Many              NON > 70T; STR > 70T

Hence, extremely low scores on WRM (less than 30T) would be a negative indicator of friendliness. Similarly, overt indicators of hostility are also negative signs, and are probably most directly gauged by PAR-R or AGG-A subscale elevations above 70T.

Likableness. Although friendliness and likableness are likely to be empirically related, they are distinct constructs. Some people can be friendly in an overbearing or disingenuous way, and hence not be well liked; others can be rather hostile but (e.g., because their hostility is expressed in a humorous way) still be reasonably likable. In general, individuals with personality disorders (particularly those in Cluster B) are the least likable of individuals presenting for treatment; they tend to be manipulative, disagreeable, and egocentric. Thus, scores on BOR and ANT, which tap the features of two of these disorders, are probably the best indicators of likability on the test; individuals scoring above 70T on either of these scales are not likely to be well liked by many other people.

Motivation.
As discussed previously, motivation for treatment is perhaps a necessary, although not sufficient, condition for successful interventions. The RXR scale was constructed to yield information relevant to this construct, and scores greater than 60T
are a sign of very low motivation for treatment. However, elevated scores on PIM can also indicate a level of rigidity and defensiveness suggesting that motivation for personal change will be lacking; scores above 60T on this scale should also be considered an indicator of inadequate interest in treatment.

Psychological-Mindedness. For most forms of psychological therapy, the patient must be willing to consider the psychological origin of problems, if only to participate willingly in such treatments. Even in pharmacotherapy, some capacity to self-monitor is necessary to enable the person to comply with the medication regimen. Several PAI scales are suggestive of difficulties with introspection and self-awareness. Marked impulsivity and acting-out tendencies are negative indicators of introspection; thus, scores on BOR-S or ANT-A above 70T suggest little capacity for reflection. If SOM exceeds 70T, then a patient's underlying conflicts are prone to be expressed somatically and the patient may be resistant to the idea of needing psychological intervention. If ANT-E is above 70T, then the patient may not have sufficient empathic capacity to consider others' experiences or viewpoints. Any of these features suggests limited psychological-mindedness.

Conscience Factors. In general, a clearly established system of values and a good moral sense are assets that are favorable prognostic features for therapy. In contrast, deceitful, vengeful, or antisocial types of individuals are likely to have considerable difficulties working within a therapy relationship. Scores on ANT-E that exceed 70T indicate a willingness to deceive others for personal gain, a characteristic that portends an arduous treatment.

Self-Discipline. Individuals with the capacity for order and discipline tend to have smoother courses of treatment than those who have little discipline, who act out behaviorally, and who lead chaotic and uncontrolled lives.
These problems may lie in the realm of substance abuse (ALC or DRG > 70T), behavioral indiscretions (BOR or ANT > 70T), or in a chaotic approach to life (NIM > 70T). Impulse Control. Most psychosocial treatments require some capacity for reflection and delay. Individuals who act-out rather than reflect on their emotional experience tend to have more difficulty with treatment in general. Impulsivity can lead to compliance problems, even with treatments in which insight and introspection are minimally important. On the PAI, elevations on BOR-S, ANT-A, ANT-S, or AGG are each signs of heightened impulsivity and poor capacity for delay that make treatment difficult. Defensive Style. Stone (1985) used Alexander's terms of alloplastic as opposed to autoplastic defensive styles to refer to the nature of the patient's approach to their symptoms and problems. This concept refers to whether the core problems experienced by the person are central to the self-structure and part of the ingrained personality (autoplastic), as opposed to those that are viewed as ego-alien and seen as a change from the person's normal functioning (alloplastic). Individuals with an autoplastic defensive style are often unable to identify the aspects of their life that cause them repeated difficulties, because these aspects are in their mind simply "the way they are" rather than a disorder that they have. The characterological aspects of personality represented by BOR and ANT represent the essence of this defensive style. Similar defensive strategies are often found among substance abusers as well, leading to concerns when ALC or DRG are elevated.
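Because the problem indicators in Table 35.3 are simple cutoff rules, checking a profile against them can be sketched in a few lines of code. The Python sketch below is only an illustration: the data structure and function names are hypothetical, the cutoffs are transcribed from Table 35.3, and treating a scale with no available score as not flagged is an assumption. A characteristic is counted as a problem area when at least one of its listed indicators is present.

```python
# Illustrative sketch only; names and structure are hypothetical.
from typing import Dict, List, Tuple

Rule = Tuple[str, str, int]  # (scale or subscale, comparison, T-score cutoff)

# Cutoffs transcribed from Table 35.3.
TABLE_35_3: Dict[str, List[Rule]] = {
    "Friendliness": [("PAR-R", ">", 70), ("AGG-A", ">", 70), ("WRM", "<", 30)],
    "Likableness": [("BOR", ">", 70), ("ANT", ">", 70)],
    "Motivation": [("RXR", ">", 60), ("PIM", ">", 60)],
    "Psychological-minded": [("BOR-S", ">", 70), ("ANT-E", ">", 70),
                             ("SOM", ">", 70), ("ANT-A", ">", 70)],
    "Conscience Factors": [("ANT-E", ">", 70)],
    "Self-discipline": [("BOR", ">", 70), ("ANT", ">", 70), ("ALC", ">", 70),
                        ("DRG", ">", 70), ("NIM", ">", 70)],
    "Impulse Control": [("BOR-S", ">", 70), ("AGG", ">", 70),
                        ("ANT-A", ">", 70), ("ANT-S", ">", 70)],
    "Defensive Style": [("BOR", ">", 70), ("ANT", ">", 70),
                        ("ALC", ">", 70), ("DRG", ">", 70)],
    "Internalization": [("PAR", ">", 70)],
    "Empathy": [("MAN-G", ">", 70), ("DOM", ">", 70), ("ANT-E", ">", 70)],
    "Parental Factors": [("ARD-T", ">", 70), ("NON", ">", 70)],
    "Social Supports": [("NON", ">", 70), ("STR", ">", 70)],
}


def rule_met(scores: Dict[str, float], rule: Rule) -> bool:
    """True if the T-score for the rule's scale is beyond its cutoff."""
    scale, comparison, cutoff = rule
    value = scores.get(scale)
    if value is None:  # assumption: a missing score is never flagged
        return False
    return value > cutoff if comparison == ">" else value < cutoff


def flagged_characteristics(scores: Dict[str, float]) -> List[str]:
    """Characteristics with at least one problem indicator present."""
    return [name for name, rules in TABLE_35_3.items()
            if any(rule_met(scores, rule) for rule in rules)]


def count_flagged(scores: Dict[str, float]) -> int:
    """Number of the twelve characteristics flagged for a profile."""
    return len(flagged_characteristics(scores))


if __name__ == "__main__":
    profile = {"BOR": 75, "RXR": 45, "WRM": 50}  # hypothetical T-scores
    print(flagged_characteristics(profile), count_flagged(profile))
```

In the hypothetical profile, BOR is the only elevated scale, yet three characteristics are flagged because BOR appears under Likableness, Self-discipline, and Defensive Style. A single characterological scale can therefore contribute to several rows of the table.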

Internalization. In many clients, the internalization of blame and fault is often excessive and a source of distress. However, it is generally considered to be a favorable prognostic sign within the context of psychotherapy. Individuals who externalize blame for all their troubles, projecting responsibility outward rather than accepting some role in their problems, often are unwilling to make the personal changes needed in therapy. The pattern of externalization is likely to repeat in the context of therapy, with the patient eventually coming to blame the therapist for treatment impasses due to the clinician's unwillingness to accept the patient's worldview. Such individuals often do not place sufficient trust in others to establish a helping relationship; eventually, they have difficulty with the treating professional as an authority figure, and may react to the therapist in a hostile or derogating manner. Scores above 70T on PAR are generally a sign that marked externalization is part of the clinical picture. Empathy. The establishment of an alliance with the treating professional is a critical ingredient in therapeutic success regardless of treatment modality, and the ability to care about and establish rapport with others is central in forging this alliance. Individuals who approach relationships with an entitled, exploitative, and contemptuous attitude tend to have difficulty working within the therapy context. Elevated scores on MAN-G or ANT-E are particularly related to problems in the empathic realm. If DOM is greater than 70T, then the patient's need for control over the therapist may also make collaboration difficult. Parental Factors. Individuals who come from a background where caretakers have been abusive, indifferent, or exploitative tend to have great difficulty placing trust in helping professionals. In particular, they will become resistant and may terminate treatment as issues become increasingly sensitive. 
Elevations on ARD-T and/or NON can serve as clues to difficulties in this area.

Social Supports. Research has shown that patients with an adequate social support network tend to make better and more rapid progress in psychotherapy. NON scores below 70T indicate that a patient's perceived social supports are generally within normal limits, whereas STR scores in that range suggest that the support system is reasonably stable and predictable. An adequate and predictable support system is considered a favorable sign, and elevations on NON and/or STR reflect problem areas that can serve as both an obstacle and a target for treatment.

The Treatment Process Index

Table 35.3 presents the operationalization of these predictors of treatment amenability into a cumulative index known as the Treatment Process Index. The features on this index tap a wide array of different psychological problems, and in general respondents with globally elevated profiles will obtain high scores. However, certain PAI scales appear repeatedly in the calculation of this index, and in general the greater the degree of characterological problems, the higher the predicted degree of disruptions in treatment process. The Treatment Process Index is scored by counting the number of positive features (which may be evidenced by the presence of one or more of the listed profile characteristics) in Table 35.3. Each feature in isolation is seen with reasonable frequency in a general clinical population, but in combination the features suggest a difficult treatment. Morey (1996) provided T-score conversions for the Treatment Process Index, standardized against the means for the community and clinical samples. Scores on this index
will be elevated in individuals with refractory problems that will tend to complicate treatment process, regardless of the specific modality used. Index scores below 4 indicate the presence of numerous personal assets that may assist the treatment process. If presenting for treatment, such people may be experiencing transient distress, perhaps associated with current circumstances rather than chronic difficulties. As the index begins to elevate (7 to 10 items positive), there are many and varied obstacles to smooth treatment process. Problems tend to be more refractory and chronic in nature, and therapy will likely be difficult and have many reversals. Marked elevations (11 or 12 items positive) suggest a very difficult treatment process. Because of the complexity of these problems and their enduring nature, considerable efforts will be needed to establish any form of alliance needed to maintain the person in treatment. Such individuals are likely to be among the most challenging of any patients to treat.

Differential Treatment Planning

Treatment selection is a difficult task in the mental health field; among psychosocial interventions alone, there are at least 130 different approaches from which to select (Smith, Glass, & Miller, 1980). There is also frustratingly little evidence to suggest that a specific treatment is unequivocally indicated for a particular disorder. Unfortunately, the realities of clinical practice dictate that many critical treatment selection decisions must be made despite the limited information that can be brought to bear on these questions. Obviously, making treatment recommendations based on the PAI is hampered by this limited database, but some conclusions can be drawn and this section offers some suggestions for this purpose. Perry, Frances, and Clarkin (1988) divided mental health treatments according to the following five parameters of the intervention:

1. Setting, such as inpatient hospitalization, outpatient therapy, or halfway house placements.
2. Format, referring to whether treatment should involve individual sessions, group therapy, and/or family or marital therapy.
3. Time, involving the length and frequency of sessions, and the total duration of treatment.
4. Approach, involving the use of different techniques based on different theoretical perspectives.
5. Somatic, involving the use of psychopharmacologic medications or other somatic forms of treatment.

The following sections are organized according to these five parameters of treatment and the resulting treatment decisions that the clinician often faces related to these parameters. Each common question for which the PAI may provide guidance is followed by a list of topics or areas that are important to assess in answering the question. Each area is in turn followed by the specific sources of PAI data most relevant to that area. It should be stressed that these suggestions are to be treated as guidelines to aid the clinical decision-making process, and are not offered as firm rules.

Choice of Treatment Setting

One frequent function of psychological assessment involves determining whether inpatient treatment is required and, if the patient is already in an inpatient setting, providing recommendations about the continued necessity of such treatment. Several areas should be considered.


Functional Impairment. Is patients' current level of overall functioning or their ability to meet role responsibilities impaired to such an extent that hospitalization is warranted? Such problems can be manifest in a number of areas tapped by the PAI, particularly with extreme scores on the clinical scales that are at or above the profile "skyline" in the absence of any indication of negative distortion of the profile due to malingering or exaggeration. Chronic and severe somatic complaints and accompanying dysfunction or fatigue can compromise functional capacity, and extreme scores on SOM can reflect such issues. Anxiety may be so overwhelming that the patient may be unable to meet daily tasks, and mild stressors might precipitate a major crisis in such people; ANX scores above the skyline would be expected in such people. Extreme scores on DEP are usually accompanied by a crippling level of fatigue, loss of motivation, social withdrawal, and helplessness that may make outpatient treatment unfeasible. Individuals with extreme MAN scores may display a level of impulsivity, inability to delay gratification, and flight of ideas that can render them unable to meet role expectations. With extreme PAR scores, particularly elevations on PAR-P, the possibility of paranoid delusions that interfere with social and occupational functioning should be explored. Similarly, extreme scores on SCZ are typically associated with an active schizophrenic episode requiring hospitalization, and even more moderate elevations on the SCZ-P subscale should be investigated as this subscale measures psychotic signs unique to schizophrenia.

Potential for Self-Harm. Are patients an imminent risk to themselves due to suicidality or impulsive self-damaging behaviors? Obviously, suicidality is a critical indication of the need for inpatient treatment, and the SUI scale is an important tool for such assessments. Individuals on suicidal precautions display an average score of 84T on that scale. As a potential supplement in this area, Morey (1996) provided information on the Suicide Potential Index, a set of 20 markers (indicators such as poor social support or marked situational stress) likely to exacerbate the risk of suicidal behavior. It should also be noted that impaired judgment and recklessness can place an individual at risk for self-harm in the absence of overt suicidal ideation. Scores above 75T on MAN represent a degree of behavioral impulsivity that may increase the risk of self-damaging behaviors. Elevations above 70T on either BOR-S or ANT-S represent long-standing characterological features that do not necessarily indicate suicidality, but do suggest impulsivity that heightens the risk for self-harm, particularly when combined with other clinical indicators. The BOR-S elevation suggests a pattern of impulsive behavior with high potential for negative consequences, such as reckless spending, sexual behavior, or substance abuse. An ANT-S elevation indicates a tendency toward reckless and dangerous behavior, and a craving for excitement and stimulation.

Danger to Others. Do patients require hospitalization because they are an immediate danger to others? Obviously, assaultive behavior indicates the need for inpatient treatment; the AGG scale (particularly the AGG-P subscale assessing physical aggression) is a useful beginning point for this assessment. As a potential supplement in this area, Morey (1996) provided information on the Violence Potential Index, a set of 20 markers (indicators such as hostility or low frustration tolerance) likely to exacerbate the risk of violence.

Chemical Dependency. The choice between an inpatient and outpatient setting for the treatment of chemical dependency is an increasingly common and important decision. Often this decision is based on whether or not the patient has the ability to control substance use on an outpatient basis, or can be detoxified safely as an outpatient. If


ALC > 84T or DRG > 80T, then the patient is increasingly likely to qualify for a diagnosis of substance dependence and may require detoxification in an inpatient setting, particularly if there are emotional complications such as suicidality or danger to others. It should be remembered that the PAI drug and alcohol scales are straightforward measures of what the patient reports; therefore, various PAI indicators (as described in Morey, 1996) should be checked for evidence of denial.

Traumatic Stress Reaction. Evidence of extreme preoccupation with past traumatic events on the ARD-T when accompanied by high levels of anxiety (ANX greater than 90T) may indicate the need for crisis hospitalization. In cases where no obvious stressors are known, this pattern has sometimes been observed to indicate the imminent emergence of suppressed memories of childhood abuse. On occasion, the ARD-T subscale may be elevated even in cases in which the patient cannot currently report specific traumatic memories. In extreme cases, the patient may be in temporary need of a protected environment. This is particularly true if there is evidence of recent passively self-damaging behaviors, such as car accidents. Signs of thought disturbance will also exacerbate such a clinical picture.

Choice of Treatment Format

Individual treatment remains the most prevalent format for mental health treatment, and it is difficult to imagine situations in which some individual contact with a client would be contraindicated. Nonetheless, the increasing acknowledgment of interpersonal factors in personal problems has led in recent years to a growing use of group and family/marital interventions.

Group-based treatments come in many forms, ranging from self-help groups to psychotherapy groups with heterogeneous members. The different forms share a number of critical mechanisms that emphasize the importance of interpersonal feedback, confrontation, and support within an environment of peers.
Such interventions are particularly effective for individuals with poor social skills, distortions in their view of others and themselves, problems with empathy, or social anxiety. A number of PAI scales are global indicators of social ineffectiveness of the type that might be amenable to group intervention, including low scores on WRM and high scores on SCZ-S (suggesting social awkwardness) and ARD-P (potentially indicating social anxiety). Other indicators of problems that may be helped with group interventions include marked distrust (elevated PAR scores or any of its subscales), rigid needs for interpersonal control (high scores on DOM), or failures in empathy (ANT-E). Although these latter problems present considerable hurdles for any form of therapy, group-based interventions may be helpful in diffusing the problems with authority (in the form of resistance or hostility toward the therapist) that such people often manifest.

Family and/or marital therapy are particularly effective in ameliorating issues that lie primarily within a family system, and even interventions focused on particular emotional problems may be more effective if made within a family therapy context. On the PAI, marital and family issues are most evident on NON, and to a lesser extent STR. Elevations on NON that are 10T points above any of the clinical scales are particularly indicative that the respondent views the primary concerns as existing within the marriage and/or the family. In interpreting the NON elevation in this manner, the clinician should pay particular attention to elevations on PAR and/or BOR, which may


indicate a generalized pattern of interpersonal bitterness, of which the reported family difficulties are merely an instance.

Choice of Treatment Length

As cost containment becomes an ever-increasing consideration in health care, efforts to predict and even limit length of treatment have become important concerns. Unfortunately, in the mental health field it is quite difficult to predict the necessary length of treatments in advance. Length of treatment is also confounded with treatment approach, with some treatments (e.g., certain behavioral treatments) tending to be briefer and others (e.g., psychoanalysis or maintenance medication) lasting for years. Finally, over the course of treatment, both patient and therapist will reconsider whether the frequency of sessions should change and if further treatment is necessary.

One rather global guide to the likely duration of treatment is the Treatment Process Index, which will be elevated in individuals with refractory problems that require treatments of greater intensity. Persons presenting for treatment with four or fewer items on this index are likely experiencing transient distress, perhaps associated with current circumstances. A relatively brief intervention with such individuals can have a significant impact, relative to other patients. As the index begins to elevate (7 to 10 items positive), the refractory nature of the problems makes it unlikely that a brief intervention will be effective in ameliorating the issues that are probably driving the observable level of distress, and treatments of greater duration and intensity may be required to effect lasting change. Marked elevations (11 or 12 items positive) suggest a need for highly intensive treatments.
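The index bands just described can be sketched as a small lookup. This is only an illustration, assuming the number of positive Treatment Process Index items is already available as an integer from 0 to 12; the function name and the label strings are hypothetical, and counts of 5 or 6 are returned as uncharacterized because the text does not describe that range.

```python
def tpi_band(items_positive: int) -> str:
    """Map a Treatment Process Index item count to the band described in the text.

    The 0-12 range and the wording of the labels are assumptions for illustration.
    """
    if not 0 <= items_positive <= 12:
        raise ValueError("expected a count between 0 and 12")
    if items_positive <= 4:
        # Four or fewer items: likely transient distress
        return "brief intervention may have a significant impact"
    if 7 <= items_positive <= 10:
        # Index beginning to elevate: refractory problems
        return "greater duration and intensity may be required"
    if items_positive >= 11:
        # Marked elevation
        return "highly intensive treatment suggested"
    # The text gives no guidance for 5 or 6 items
    return "not characterized in the text"
```

Such a lookup is a guideline in the same sense as the text's own suggestions; it does not replace clinical judgment.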
Because of the complexity of the problems and their enduring nature, brief interventions are likely to involve crisis intervention and considerable efforts will be needed to establish any form of alliance needed to maintain the person in more intensive treatment.

In the course of clinical practice, decisions about length of treatment are usually part of the treatment process rather than fixed at the beginning of treatment. As improvements are noted, the intensity of treatment may be lessened, or formal treatment may be terminated. The PAI's scale and subscale structure makes it particularly useful for charting patient changes and making decisions about changes in treatment intensity based on those changes. For example, in the inpatient treatment of severe depression, the relative changes in the affective, cognitive, and physiological components can be measured separately with a readministration of the test in order to better understand the specific effects of treatment. Also, the decision about need for continued inpatient care can be gauged. A reduction in suicidal ideation may be noted, and changes in the patient's openness to treatment (RXR), negativity of worldview (NIM), and the perceived balance of external stress (STR) versus available support (NON) may all be useful for judging the patient's progress and updating treatment plans as needed.

Multiple administrations of the PAI during treatment can be useful in identifying critical elements of the treatment process that might indicate need for alterations in treatment intensity. For example, for clients presenting with RXR scores suggestive of treatment rejection, it would be anticipated that initial efforts in treatment might need to be directed at potential resistance. Alternatively, clients receiving an interpersonally based treatment might be expected to show changes on the interpersonal scales as a prerequisite to addressing distress that would be evident from the clinical scales.
Similarly, clients receiving cognitive therapy for depression might be expected to show the most rapid improvements on the DEP-C subscale, with improvements in somatic and


affective aspects of the syndrome contingent on this change. If anticipated changes are not observed, revisions in treatment intensity or treatment approach might be needed.

Choice of Differential Treatment Approach

As noted earlier, the research literature provides little evidence to support the selection of specific therapies for specific problems. However, PAI data may be coupled with guidelines offered in the literature, as well as common "clinical wisdom," to provide some general guidance to treatment planning. For example, Karasu (1990a, 1990b) offered a comparison of psychodynamic, cognitive, and interpersonal approaches along a variety of theoretical and technical dimensions. Using the syndrome of depression as an example, Karasu delimited patient variables that would either call for or contraindicate each of these psychotherapeutic approaches. Although the model is presented in the context of depression, the concepts are equally applicable to many other clinical problems. Morey (1996) provided a detailed description of the operationalization of Karasu's selective patient variables for the psychodynamic, cognitive, and interpersonal strategies.

The psychodynamic or exploratory approach focuses on insight, understanding, and resolution of internal conflict, taking a developmental approach in understanding the individual's present difficulties. This approach is particularly suited for individuals with difficulties that are developmental in nature, and hence the issue of conflicts in past relationships (suggested by ARD-T, BOR-N, and BOR-I) is especially salient. However, use of this approach requires the individual to be reasonably psychologically minded (lower RXR), have the capacity for trust (lower AGG and PAR), and be able to handle the impulses resulting from a confrontation of their defenses (lower BOR-S).
Karasu (1990b) suggested that individuals with more focused interpersonal problems or social deficits (e.g., high SCZ-S, low WRM), particularly those pertaining to present-day relationships, might be better treated with an interpersonal approach. Finally, the cognitive approach is particularly suited to individuals with negative distortions of the self (high DEP-C or ANX-C), and perhaps less useful for individuals with impulsive acting-out behaviors (lower ANT-A, AGG-P, and BOR-S).

There are a variety of other approaches in addition to the three already described. For example, many treatments are supportive in nature, aiming to shore up a patient's defenses and restore them to a more functional level. Such treatments are particularly important when there is evidence that the patient is extremely overwhelmed, has highly disorganized thought processes, or is quite vulnerable due to traumatic stress reactions (see previous sections for relevant data sources from the PAI). Approaches utilizing behavioral or environmental manipulation procedures may be optimal for difficulties involving circumscribed phobias (look for ARD-P elevations), somatization (SOM-S or SOM-H), assertiveness (low DOM), or lack of impulse control (BOR-S, ANT-A). Conjoint family or marital therapy should be considered in cases of extreme functional impairment or when the patient reports a marked lack of support by others, as suggested by elevated scores on NON.

Choice of Somatic Treatments

In many outpatient settings, the clinician often has to make the important decision of whether or not to refer the patient for a medication consult. In inpatient settings, the test results can help the physician choose between medications based on the relative


prominence of depression, anxiety, mania, psychosis, or other symptomatology that is amenable to pharmacologic treatment. For example, Karasu (1990b), in addition to the indications for different psychotherapy approaches described earlier, offered a number of indications for pharmacotherapy of depression, including marked vegetative signs (DEP-P), motor retardation (suppressed MAN-A), loss of control over thinking (SCZ-T), and obsessive rumination (ARD-O, ANX-C). A variety of scale elevations can serve as general markers for medical evaluation and/or intervention for other disorders as well.

With respect to antianxiety medications, the ANX and ARD scales are particularly informative. Marked elevations on ANX suggest intense preoccupation and rumination that may be intrusive enough to place the patient at risk for inadequate occupational or social functioning, and sufficient to interfere with the progress of psychotherapeutic interventions. Also, very high STR scores suggest that nearly all major life areas are in turmoil and the patient feels surrounded by crises. Severe scores on ARD-P can indicate multiple phobias, panic disorder, and/or agoraphobia, which may benefit from a combination of medical and psychosocial treatment.

Various PAI markers can also indicate the need to consider antipsychotic medications. Marked elevations on PAR (particularly PAR-P) indicate a need to evaluate for systematic paranoid delusional systems that may benefit from antipsychotic medication. If the full SCZ scale is markedly elevated, or even if the SCZ-P subscale displays a more modest elevation, the patient may require neuroleptic medication. Noteworthy elevations on SCZ-T indicate marked confusion and concentration problems that may benefit from medication; however, without elevations on other SCZ subscales, SCZ-T may also reflect severe depression.
Finally, elevations on MAN above the profile skyline raise the possibility of a full-blown manic episode, meaning that medication should be considered.

Specifying Therapeutic Targets

The PAI can also be a useful source of data for isolating specific targets for therapeutic work (regardless of approach and/or diagnosis) and may help order the priorities for intervention. Morey and Henry (1994) described a number of such targets. The following list, expanding and updating the Morey and Henry (1994) guidelines, is not exhaustive but does cover some commonly observed areas of difficulty that cause people to seek treatment.

Poor Impulse Control

The most obvious priorities for intervention are impulsive, potentially dangerous behaviors: chemical dependency and maladaptive anger expression. Thus, elevations on any of the following are associated with poor impulse control: ALC, DRG, MAN, BOR (particularly BOR-A and BOR-S), ANT (particularly ANT-S), and AGG. Treatment may involve medical management in the case of a manic episode, or may require direct limit setting, therapeutic contracts (conditions under which therapy will or will not proceed), or anger management training. The more numerous the indicators, the greater the problem and the poorer the prognosis. There is some research evidence to suggest that behavioral approaches may be somewhat more effective with these types


of acting-out and antisocial problems (Sloane, Staples, Cristol, Yorkston, & Whipple, 1975).

Anger Repression

Some patients experience problems with overinhibition of impulses, such as an inability to appropriately express angry feelings, resulting in maladaptive strategies to contain anger. This may be due to a fear of rejection, fear of loss of control, the unacceptability of angry feelings, and so forth. Repressed anger may express itself as timidity and lack of assertion (very low AGG), compulsive rigidity (elevated ARD-O), or in physical symptoms (SOM elevations). Those patients with a history of abuse (observed on ARD-T) may also have difficulty expressing anger directly, even though there may be deep underlying anger. In these cases, encouragement of the more direct expression of anger may be useful as a first step. However, it should be noted that the mere expression of anger (e.g., "cathartic" treatment) has not usually been shown to be of lasting benefit in and of itself as the only therapeutic procedure.

Excessive Dependency

Excessive dependency may be a problem for a number of reasons. Patients may be unable to leave abusive relationships, may sacrifice their own needs for those of others, or they may be so eager to please and fearful of rejection that they are exploited. Above average emphasis on attachment relationships (high WRM), marked submissiveness (low DOM), and indications of borderline features (high BOR) are often associated with a pathological need for acceptance.

Interpersonal Distrust

Problems related to the ability to trust others, experience and tolerate genuine intimacy, and relinquish some control to others are among the most difficult to address therapeutically. The PAR scale is the most obvious indicator of such distrust, but there are many indicators that can be related to a self-protective stance and relational ambivalence or rejection based on minimal expectations of others and fears of exploitation.
Elevations on ARD (particularly ARD-T), SCZ-S, BOR (particularly BOR-N), ANT, AGG-A, and/or NON all raise the possibility that establishing trust should be considered a treatment goal as well as a treatment obstacle. Group therapy may be of particular benefit as a conjoint therapy for such patients.

Constriction/Rigidity

A rigid, inflexible, perfectionistic, or constricted style (such as that suggested by an elevated ARD-O) may cause a host of problems deserving therapeutic attention. These include overreaction or stress response to unexpected events and change in routine, inability to experience pleasure, disrupted interpersonal relationships, fear of loss of impulse control (which may manifest itself in panic disorder symptoms), inefficient work


habits, indecisiveness, and so on. These traits may also indicate the effects of an abusive or traumatic history. Problems related to these obsessional features are exacerbated by a high need for interpersonal control (suggested by an elevated DOM) that interferes with the ability to make necessary compromises and may lead others to see the individual as overbearing.

Lack of Self-Confidence/Assertiveness

Lack of self-confidence, difficulty having needs met in relationships, self-doubt, the inability to act assertively, excessive preoccupation with pleasing others, submissiveness, and inhibitions around expressing negative feelings to others may be associated with any number of pathological conditions. However, if these problems are not extreme and are not accompanied by a complex, polysymptomatic clinical picture, they are quite amenable to therapeutic intervention. Typically a behavioral deficit, rather than excess, is involved. Any variety of therapeutic approaches from behavioral to psychodynamic might be appropriate, and short-term therapy is often effective. Indicators include elevations on DEP-C and ARD-P, or suppression on scales such as AGG, DOM, or MAN-G, particularly when coupled with a relative lack of elevations on other scales.

Cognitive Distortions

Most psychopathology, almost by definition, involves some manifestation of cognitive distortion. However, certain extremely negative evaluations of self, others, and situations might profitably be explored and challenged as an early step in therapy. The PAI contains a number of indicators that suggest a worldview that might impede therapeutic efforts. These cognitions could be confronted with straight cognitive or rational-emotive therapy, or through cognitive techniques integrated into other theoretical approaches. A high NIM score indicates that an individual tends to think in extreme and categorical terms.
Substantial NIM elevations in the absence of malingering indicate that the patient is reporting a profoundly negative evaluation of themselves and their life. If this elevation is accompanied by elevated DEP-C and low DOM, the patient likely has a very long-standing, fixed negative self-image that is not likely to yield to brief therapy. ANX-C elevations indicate that patients are prone to experience considerable tension and worry over events they cannot control, but feel that they should be able to control. The DEP-C scale, when elevated, suggests unrealistic feelings of worthlessness, failure, self-blame, and hopelessness. PAR, or any of its subscales, can indicate a fixed belief system involving distorted views and expectations of others. Patients with such elevations may distort their experience in order to attribute their misfortune to the neglect of others and see others' successes as luck or favoritism.

The PAI in the Evaluation of Change

In addition to the applicability of the PAI for treatment planning, the instrument also has many characteristics that make it well suited for the evaluation of treatment efficacy. Newman and Ciarlo (1994) described 11 criteria for the selection and use of instruments as treatment outcome measures. These criteria are discussed as they pertain to the PAI.


The Outcome Measure Should be Relevant to the Target Group. The PAI contains numerous scales relevant to a wide variety of clinical conditions, and use of the test as a pre-post measure can provide information about client improvement in several critical areas. However, the utility of the PAI as a treatment outcome measure will obviously vary across different target populations; for example, little information about improvements in eating disorders or sexual dysfunction can be gleaned from the instrument. Nonetheless, the broad range of symptomatology tapped by the PAI would still provide useful information in studies with such groups. This information could assist in identifying potentially associated problems in such groups, such as depression, anxiety, or anger, and could allow for increased homogeneity of classification, for example by differentiating within such groups according to levels of depression, psychotic features, substance abuse, personality problems, and so forth.

The Method Should be Simple and Teachable. The implementation of the PAI as a treatment outcome instrument would be quite simple in most settings. The test is self-administered and also can be administered by computer. Hand scoring requires no templates and can be done by clerical personnel in 10 minutes, although optically scanned computer scoring is also available. The test is available for use by both English- and Spanish-speaking clients. Interpretation of the test is reasonably straightforward for any clinician trained in the basics of psychometric assessment as well as in descriptive psychopathology. PAI interpretation is aided by the information provided in the test manual, as well as the information presented in the PAI interpretive guide (Morey, 1996).

The Method Should have Objective Referents. The PAI provides numerous referents against which the clinician can compare a given client.
The T-scores are referenced against a census-matched community sample; additional transformations are available based on norms for clinical subjects, college students, African Americans, and older adults. In addition, profile data for many different diagnostic or evaluation groups are presented in the interpretive guide (Morey, 1996) or in the Professional Manual (Morey, 1991).

Use of Multiple Respondents is Encouraged. A number of writers have noted that different stakeholders (e.g., patient, therapist, spouse, independent evaluator) can give differing portrayals of treatment outcome. The PAI was designed as a self-report instrument intended to capture the experience of the client completing it; as such, it is primarily useful in capturing the client's perspective. The test includes validity scales that seek to identify any systematic distortions in self-representation, but such scales cannot substitute for the kind of information that can be obtained from collateral informants and from clinical impressions. Thus, self-reported improvements on the PAI (as gauged by reductions of clinical scale scores posttreatment) should be supplemented with information from other sources whenever possible.

Outcome Measures Should Ideally Identify the Processes by which Treatment is Producing Positive Effects. Newman and Ciarlo (1994) noted that this criterion is fairly controversial, as researchers often do not agree on the extent to which treatment processes and treatment outcomes should correspond. However, repeated administrations of the PAI could be useful in documenting the process of change associated with a particular treatment. For example, in treating depression with cognitive therapy, it is assumed that alterations in the attribution system of the client will produce effects on other types of depressive symptoms. This theoretically anticipated pattern of change could be mapped by repeated administrations of the DEP scale; initial changes on


DEP-C should be observed, with changes on DEP-A and DEP-P occurring later in the treatment process. Similarly, efforts at establishing interpersonal trust that might be leading to personal distress could be mapped by comparing the temporal pattern of changes observed on PAR and ANX.

The Measure Should Meet Minimum Criteria of Psychometric Adequacy. The psychometric characteristics of the PAI have been described in some detail earlier in this chapter, and these reflect one of the primary strengths of the instrument. The reliability of the instrument is very good, leading to standard errors of measurement that are sufficiently small to reliably detect even small changes that might be associated with treatment. The validity of the instrument has been documented with respect to widely used measures of treatment-associated changes, including self-administered (e.g., BDI, STAI) and clinician-rated (e.g., Hamilton Rating Scale for Depression, Brief Psychiatric Rating Scale) instruments.

The Measure Should have Low Costs Relative to Its Utility. The costs associated with a pre-post administration of the PAI for treatment outcome evaluation are relatively minor. As a self-report instrument, it requires no professional time to administer or score. Scoring can be accomplished by hand in 10 minutes; alternatively, an unlimited use computer scoring and interpretation program is available at a one-time cost. In addition, the PAI is unique in that it has a separate screener, the Personality Assessment Screener (PAS; Morey, 1997), that can be administered in under 5 minutes. The PAS provides an estimate of the likelihood that problems of various types will be identified in an administration of the full PAI. The combination of the PAS and the PAI makes it possible to provide a highly efficient sequential assessment that makes maximal use of both clinician and client time.

The Measure Should be Easily Understood by Nonprofessional Audiences.
The scale names and scaling procedures used in the PAI are easily understood by most individuals. PAI scales names such as Depression or Anxiety are straightforward descriptions of the types of questions contained on these scales, and the concurrent validity data support the conclusion that the scales measure what their names imply that they measure. The linear T-score is easily interpreted by nonprofessionals, and these scores can also be expressed as percentile scores referenced against a variety of different groups (e.g., census-matched community sample, clinical sample, or various demographic or diagnostic groups). Although the multiple dimensions assessed by the PAI often present a complex picture for a given client, the use of profiles in presenting these data often render them comprehensible, even to the client. The Instrument Should Provide Easy Feedback and Uncomplicated Interpretation. In many respects, this criterion is the result of meeting many of the criteria described previously. In particular, ease of interpretation is precisely what the concept of "psychometric strength" is designed to insure; a test that is reliable and valid is quite easy to interpret. In particular, the focus on discriminant validity in the construction of the PAI was designed to facilitate interpretation. Many of the difficulties in interpreting measures of psychopathology stem from inadequate discriminant validity; it can be quite challenging to interpret a scale that was intended to measure schizophrenia if there are dozens of other factors that can lead to scale elevations. Thus, interpreting the PAI is more straightforward than interpreting other instruments with lower discriminant validity. In addition, the computer interpretive report and accompanying graphical display of detailed profile information also assists interpretation of the PAI.
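As an illustration of the scaling described above, the following minimal Python sketch shows how a linear T-score is computed from a raw score and how it might be restated as an approximate percentile for a lay audience. The function names and the normative mean and standard deviation are hypothetical, and the percentile conversion assumes a roughly normal score distribution; in practice, percentiles for a given reference group would be read from published norm tables rather than computed this way.

```python
import math


def linear_t_score(raw, norm_mean, norm_sd):
    """Linear T-score: mean 50 and SD 10 in the reference sample."""
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


def approx_percentile(t_score):
    """Approximate percentile for a T-score, ASSUMING a normal distribution."""
    z = (t_score - 50.0) / 10.0
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical raw score and norms, for illustration only.
t = linear_t_score(raw=30, norm_mean=20, norm_sd=5)
print(f"T = {t:.0f}, roughly the {approx_percentile(t):.0f}th percentile")
```

With these made-up norms the raw score corresponds to 70T, that is, two standard deviations above the reference-sample mean.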


The Measure Should be Useful in Clinical Services. From its inception, the PAI was designed to be of maximum utility in a wide variety of clinical settings. As a pretreatment measure, the instrument provides a comprehensive assessment of different functional areas, as well as information critical in making diagnostic assignments. The treatment consideration scales provide information specifically geared to determining treatment intensity (e.g., inpatient vs. outpatient treatment) by providing an assessment of potential for immediate crisis (e.g., suicide or assaultive behavior), as well as the client's motivation for treatment and likelihood of compliance with treatment. As a posttreatment measure, the instrument provides empirically defined "normal ranges" for each scale. Also, scales such as those measuring environmental stress and social support levels provide valuable data for determining the risk of relapse.

The Instrument Should be Compatible with Clinical Theories and Practices. The development of the individual PAI scales was based on a systematic review of the extant theories and supportive empirical research surrounding each construct measured. Key theoretical elements that have received research support were included in scale construction; these elements included aspects from many different theories. Examples include cognitive mechanisms in depression (DEP-C), identity disturbance in borderline personality (BOR-I), and sensation seeking in antisocial personality (ANT-S). Thus, rather than adopting one theoretical approach and applying it to several different disorders, the PAI was constructed to tap specific theoretical elements that have received empirical support as they pertain to specific disorders.

Application of the PAI in Outcome Assessment

At a global level, a successful intervention should have the effect of moving the client's PAI scores in the direction of the norm for a community sample (i.e., 50T). For most scales, this improvement would be reflected by reductions in scores, although there are exceptions to this rule. For example, MAN-G is often abnormally low in clinical samples, revealing very poor self-esteem; thus, increases on MAN-G would be desirable if the score fell substantially below 50T. Increases on RXR would also be expected over the course of a successful treatment, because many of the motivating sources for treatment (e.g., distress or interpersonal difficulties) would be gradually ameliorated.

PAI scores have been found to be quite stable over 1-month periods in nontreatment samples (Morey, 1991); the reliability of the instrument would be expected to be even higher over shorter intervals. It should be noted that most of the scales represent constructs in a way that would not be expected to fluctuate from moment to moment; for example, the ANX scale demonstrates a somewhat greater correlation with "trait" anxiety than with "state" anxiety. Thus, researchers interested in measuring momentary mood states would be better served by instruments designed for that purpose. The PAI can profitably be used as a measure of change over periods of longer duration, and the instrument was designed to be able to detect changes that might occur from week to week.

Determining the significance of changes in PAI scores can be accomplished using the standard error of measurement (SEM) estimates calculated from various reliability studies. The SEM provides an index of variability in measurement that would be expected strictly from random fluctuations in scores; thus, changes in scores that are less than one SEM cannot be interpreted as reflecting true change with any confidence. For each of the PAI full scales, the SEM is three to four T-score points, meaning that the 95% confidence interval for these scale scores (about 1.96 SEMs on either side of the observed score) is typically plus or minus six to eight points. As a


result, changes in T-scores that are two SEMs (i.e., six to eight T-score points) in magnitude can serve as a conservative threshold for detecting statistically reliable change in a given client. For treatment studies where group comparisons are involved, the statistical significance of any group difference will obviously depend on sample size, and with large samples even small differences might attain statistical significance. When the PAI is used for such purposes, any group differences should certainly be larger than the SEM for the scale before being interpreted as clinically meaningful.

It should be recognized that although the test-retest reliability of the PAI is high (and hence scores tend to be stable), these reliability estimates were derived from untreated samples. High stability therefore does not imply that the PAI is insensitive to change. Its sensitivity was demonstrated in a study by Friedman (cited in Morey, 1996), who performed a pre-post administration of the PAI with 25 patients during outpatient psychotherapy that had a median duration of 3 months. Friedman reported that 19 of the 21 scales of the PAI (excluding ICN) demonstrated statistically significant changes. However, Friedman's study is also valuable in that it demonstrated that the PAI scales are differentially sensitive to the changes observed in psychotherapy, with some scales demonstrating changes that were quite substantial and others showing smaller changes. Friedman's results suggested that the largest impact of psychotherapy could be observed in reduction of negative affect (ANX, DEP, ARD), improvement of self-esteem (PIM, RXR, BOR), and reduction of interpersonal and environmental turmoil (STR, BOR). Although the changes in substance abuse scales ALC and DRG were statistically significant, only moderate effects were observed. This could be expected for two reasons: first, this was not a substance abuse treatment setting and there were few significant problems of this nature in the sample, and second, the historical nature of many of the ALC and DRG items makes these scales somewhat less sensitive to change. For example, if someone has ever lost a job due to alcohol abuse, this item may be endorsed even if the person has not had a drink in 10 years. Nonetheless, the significance of changes on the substance abuse scales demonstrates that ALC and DRG are sensitive to treatment effects.

In the Friedman (1995) study, the only PAI scale (other than INF, which would not be expected to change with treatment) that did not demonstrate a treatment effect was MAN. However, this result is somewhat misleading, because in fact significant changes on MAN subscales did take place. The MAN-G subscale increased .59 standard deviations on average, whereas MAN-I decreased .87 standard deviations (no significant changes were observed on MAN-A). Thus, the opposing changes in these two subscales canceled each other at the full scale level.

The Friedman (1995) study demonstrates that the PAI can be used to assess improvement in a group of patients. However, the test has also been used in the literature to study change in a particular patient. One interesting application of the PAI as an outcome measure was reported by Saper, Blank, and Chapman (1995), who described the treatment of a patient with visual and auditory hallucinations that were refractory to conventional pharmacotherapy. This patient had continuous auditory hallucinations (including command hallucinations) and intrusive visions occurring roughly 10 times per day. In addition, she reported experiencing flashbacks of traumatic events that included repeated rapes. This patient had been treated unsuccessfully with all classes of neuroleptic medication, as well as tricyclic antidepressants, serotonin reuptake inhibitors, lithium, carbamazepine, and ECT. Saper et al. described a treatment that combined an imaginal exposure (implosion) treatment for the posttraumatic stress symptoms with fluphenazine medication. They used the 11 clinical scales of the PAI and two treatment scales, SUI and AGG, as outcome measures. Two measures of treatment success were reported: number


of clinical scales reduced below 70T and number of scales that decreased following treatment. Significance testing was conducted in this case study by examining the binomial probability of each of these events occurring. In their study, 12 of the 13 scales examined displayed decreased scores, and none of the 7 scales that had been elevated pretreatment were elevated above 70T following the intervention. The binomial probability of either of these outcomes occurring by chance was less than .01. These PAI changes were corroborated by mental status examination and staff observations at discharge. This use of the PAI is a valuable demonstration of how decisions about outcome and improvement can be made using a solid empirical foundation, even in the context of a case study.

Supplemental Assessment Data

There is no single source of supplemental assessment data that is necessarily recommended for use with the PAI; as in all assessment situations, more information is better than less. In testing situations in which time or resources are limited, the PAI provides a good deal of information in a relatively short time. As with any other instrument, the use of other test measures will likely help deepen the clinician's understanding. Often, the most valuable supplemental data may be provided by a follow-up interview of the patient or the patient's family, using PAI results as a guide. For example, high scores on the suicide scale should always be cause for concern. Elevations on the traumatic stress scale might alert the clinician to explore for evidence of dissociative phenomena. Evidence of psychotic processes on the PAI that are not apparent clinically should receive further inquiry. The perception of the family might be particularly valuable in those cases in which the patient reports little support from the environment (NON), because information from the family might be at variance with the self-report data in these instances.
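The two single-case checks described under Application of the PAI in Outcome Assessment, the two-SEM threshold for change on one scale and the binomial test used by Saper et al., can be sketched in Python as follows. This is an illustrative sketch rather than the procedure from either source: the function names and the example scores are hypothetical, the SEM formula is the standard classical-test-theory expression (SEM = SD × sqrt(1 − reliability)), and the binomial test assumes that each scale is equally likely to rise or fall by chance (p = .5) and that scales are independent, neither of which is stated in the text.

```python
from math import comb, sqrt


def sem_from_reliability(reliability, sd=10.0):
    """Standard error of measurement; T-scores have SD 10 by construction."""
    return sd * sqrt(1.0 - reliability)


def is_reliable_change(pre_t, post_t, sem, n_sems=2.0):
    """True if the pre-post difference reaches the n_sems * SEM threshold."""
    return abs(post_t - pre_t) >= n_sems * sem


def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1.0 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical single-scale example: 75T before treatment, 66T after, SEM of 4.
print(is_reliable_change(75, 66, sem=4))  # 9-point drop vs. 8-point threshold

# Counts reported by Saper et al.; the chance rate of .5 is an assumption.
print(binomial_upper_tail(12, 13))  # 12 of 13 scales decreased
print(binomial_upper_tail(7, 7))    # all 7 elevated scales fell below 70T
```

Under these assumptions both tail probabilities fall below .01 (14/8192, about .002, and 1/128, about .008), which agrees with the result reported above.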
Provision of Feedback

No specific procedures are recommended for providing clients with feedback about PAI results. Sound clinical judgment should be used to time any feedback to patients, and to decide what information will be helpful. In most instances, it is not advisable to simply let the client read an automated report. Some statements might not be applicable, the technical language might not be understood, or the language may seem too pathologized and cause undue concern. However, usually the clinician can review the scale and subscale scores with the patient, using language the patient can understand. Words such as "normal" and "abnormal" should probably be avoided, and results are better framed as being above or below what the average person reports. Scores that indicate therapeutic targets or problems should be discussed as "areas to work on" rather than "what is wrong" with the patient. The PAI profile is a fairly straightforward reflection of what the patient has reported, and usually patients readily recognize themselves and report little discrepancy when given feedback.

Case Study

The following case, initially described by Morey and Henry (1994), demonstrates how the PAI may be used in planning and evaluating the course of treatment. The case shows how the PAI can be used to make midtreatment course corrections when the treatment does not seem to be progressing satisfactorily.


Ms. A was a 42-year-old divorced, white female with two adult children who presented for outpatient therapy complaining of constant suicidal ideation. She had seen several therapists before and had been hospitalized a number of times for severe depression. Ms. A carried a historical diagnosis of bipolar disorder and had been placed on a variety of medications simultaneously. Three months prior to the initial interview she had been hospitalized following a near-fatal overdose. At the time of her initial consultation she was working and had discontinued all of her medications, stating that they left her "in a fog." The PAI was administered at intake and she was seen again for therapy several days later (see Figs. 35.2 and 35.3 for profile).

Given Ms. A's presenting complaint of suicidal preoccupation with signs of depression, two immediate decisions had to be made: whether to refer her for a medication evaluation and whether to recommend hospitalization. Her SUI score (105T) confirmed her morbid preoccupation with death. The indicators of functional impairment were reviewed, and none were at a level that suggested her functioning might be impaired to an extent that would warrant hospitalization (which mirrored her self-report). Except for her long-standing suicidal preoccupation, few other signs of impulsivity or vulnerability to self-damaging behaviors were positive. Evidence of psychotic processes or thought disorder was also lacking, and relatively few of the indicators on the Treatment Process Index were suggestive of a particularly difficult treatment process. Therefore, because Ms. A agreed to contact the therapist if she felt at imminent risk of a suicidal gesture, it was decided to begin therapy on an outpatient basis. Because the subscales measuring physiological signs of depression and anxiety were not significantly elevated, a medication referral was also deferred in line with her wishes.

All of the positive indicators for psychotherapy were present and none of the negative indicators. It was noted, however, that she had three signs of interpersonal distrust or caution (moderate elevations on SCZ-S, BOR, and ARD-T), as well as some signs of limited motivation for treatment (a relatively high RXR), all of which suggested the potential for some problems in the therapeutic relationship. This was particularly interesting because Ms. A initially related in a very warm, relaxed, cooperative manner. There were no strong indications for a primarily supportive or behavioral approach, so the therapist chose to employ his accustomed psychodynamic/interpersonal therapy.

As therapy progressed, the therapist became increasingly convinced that Ms. A had likely been severely abused (either physically or sexually) starting at a young age, despite her lack of memory for any such abuse. The therapist shared his feelings with Ms. A and continued to probe for early memories that might help reconstruct her history. Ms. A became increasingly anxious and began to withdraw interpersonally from the therapeutic interaction. She also began to have intrusive somatic symptoms, episodes of panic and dissociation, and renewed suicidal impulses. At this point (approximately 4 months after therapy began), the therapist became concerned and readministered the PAI to assess changes (also presented in Figs. 35.2 and 35.3). Ms. A's worsening somatic symptomatology was reflected in a significant rise in her Somatic Complaints-Somatization (SOM-S) score. It is also interesting to note that as she began to absorb a radically changed understanding of her history, her Borderline Features-Identity Problems (BOR-I) subscale score rose considerably, her social detachment (SCZ-S) also increased, and her self-esteem (as reflected by MAN-G) plummeted. Most troubling were dramatic rises in anxiety and depression (including significant physiological signs) and signs of emergent thought disorder. At this point, the clinician discussed with her the changes on the PAI and encouraged her to accept a referral for a medication consult. Ms. A reluctantly agreed, and the therapist shared


Fig. 35.2. PAI full scale profile for Ms. A. Reproduced by special permission of the Publisher, Psychological Assessment Resources, Inc., Odessa, FL 33556, from the Personality Assessment Inventory by Leslie Morey, PhD, Copyright © 1991 by PAR, Inc. Further reproduction is prohibited without permission of PAR, Inc.


Fig. 35.3. PAI subscale profile for Ms. A. Reproduced by special permission of the Publisher, Psychological Assessment Resources, Inc., Odessa, FL 33556, from the Personality Assessment Inventory by Leslie Morey, PhD, Copyright © 1991 by PAR, Inc. Further reproduction is prohibited without permission of PAR, Inc.


the PAI results with the consulting psychiatrist. Because of the marked depression accompanied by a high level of anxiety and increased signs of underlying thought disorder, the psychiatrist chose an antidepressant with additional antipsychotic properties. The therapist also adopted a more supportive and less uncovering stance for a period of time. Ms. A responded well to the medication, and was able to resume and tolerate continued exploratory psychotherapy.

Conclusions

The PAI provides a comprehensive assessment of important clinical constructs that can be of great use in planning and evaluating treatment. Because of the instrument's psychometric strength and economy of use, it has great promise for increasing the precision with which different forms of treatment are implemented and examined for efficacy. The needs for future work with the PAI are similar to the needs for the field in general; to this point, the advantages of a careful evaluation in the construction of differential treatments have not received sufficient empirical demonstration. The PAI represents the increasing measurement sophistication of the assessment field in addressing critical differences among clients presenting for treatment. Ideally, as the critical differences among various mechanisms of treatment become increasingly well specified and measured, the disciplines of psychological assessment and treatment process research can combine to provide better maps through the labyrinth of clinical decision making.

References

Alterman, A.I., Zaballero, A.R., Lin, M.M., Siddiqui, N., Brown, L.S., Rutherford, M.J., & McDermott, P.A. (1995). Personality Assessment Inventory (PAI) scores of lower-socioeconomic African American and Latino methadone maintenance patients. Assessment, 2, 91-100.
Ban, T.A., Fjetland, O.K., Kutcher, M., & Morey, L.C. (1993). CODE-DD: Development of a diagnostic scale for depressive disorders. In I. Hindmarch & P. Stonier (Eds.), Human psychopharmacology: Measures and methods (Vol. 4, pp. 73-86). Chichester, England: Wiley.
Beck, A.T., & Steer, R.A. (1987). Beck Depression Inventory manual. San Antonio: The Psychological Corporation.
Beck, A.T., & Steer, R.A. (1988). Beck Hopelessness Scale manual. San Antonio: The Psychological Corporation.
Beck, A.T., & Steer, R.A. (1990). Beck Anxiety Inventory manual. San Antonio: The Psychological Corporation.
Bell, M.J., Billington, R., & Becker, B. (1985). A scale for the assessment of object relations: Reliability, validity, and factorial invariance. Journal of Clinical Psychology, 42, 733-741.
Bell, M.J., Billington, R., Cicchetti, D., & Gibbons, J. (1988). Do object relations deficits distinguish BPD from other diagnostic groups? Journal of Clinical Psychology, 44, 511-516.
Bell-Pringle, V.J. (1994). Assessment of borderline personality disorder using the MMPI-2 and the Personality Assessment Inventory. Unpublished doctoral dissertation, Georgia State University, Atlanta, GA.
Boyle, G.J., & Lennon, T.J. (1994). Examination of the reliability and validity of the Personality Assessment Inventory. Journal of Psychopathology and Behavioral Assessment, 16, 173-188.
Cashel, M.L., Rogers, R., Sewell, K., & Martin-Cannici, C. (1995). The Personality Assessment Inventory and the detection of defensiveness. Assessment, 2, 333-342.
Cherepon, J.A., & Prinzhorn, B. (1994). The Personality Assessment Inventory (PAI) profiles of adult female abuse survivors. Assessment, 1, 393-400.


Costa, P.T., & McCrae, R.R. (1985). The NEO Personality Inventory manual. Odessa, FL: Psychological Assessment Resources.
Costa, P.T., & McCrae, R.R. (1992). Normal personality in clinical practice: The NEO Personality Inventory. Psychological Assessment, 4, 5-13.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Cull, J.G., & Gill, W.S. (1982). Suicide Probability Scale manual. Los Angeles, CA: Western Psychological Services.
Fantini-Salvador, P., & Rogers, R. (1997). Spanish version of the MMPI-2 and PAI: An investigation of concurrent validity with Hispanic patients. Assessment, 4, 29-39.
Frances, A., Clarkin, J., & Perry, S. (1984). Differential therapeutics in psychiatry: The art and science of treatment selection. New York: Brunner/Mazel.
Friedman, P.H. (1995). Change in psychotherapy: Foundation for Well Being Research Bulletin 106. Plymouth Meeting, PA: Foundation for Well Being.
Greenstein, D.S. (1993). Relationship between frequent nightmares, psychopathology, and boundaries among incarcerated male inmates. Unpublished doctoral dissertation, Adler School of Professional Psychology, Chicago, IL.
Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23, 56-62.
Hare, R.D. (1985). Comparison of procedures for the assessment of psychopathy. Journal of Consulting and Clinical Psychology, 53, 7-16.
Hathaway, S.R., & McKinley, J.C. (1967). MMPI manual (rev. ed.). New York: Psychological Corporation.
Helmes, E. (1993). A modern instrument for evaluating psychopathology: The Personality Assessment Inventory professional manual. Journal of Personality Assessment, 61, 414-417.
Holmes, T.H., & Rahe, R.H. (1967). The social readjustment rating scale. Journal of Psychosomatic Research, 11, 213-218.
Jackson, D.N. (1970). A sequential system for personality scale development. In C.D. Spielberger (Ed.), Current topics in clinical and community psychology (Vol. 2, pp. 62-97). New York: Academic Press.
Karasu, T.B. (1990a). Toward a clinical model of psychotherapy for depression: I. Systematic comparison of three psychotherapies. American Journal of Psychiatry, 147, 133-147.
Karasu, T.B. (1990b). Toward a clinical model of psychotherapy for depression: II. An integrative and selective treatment approach. American Journal of Psychiatry, 147, 269-278.
Keane, T.M., Caddell, J.M., & Taylor, K.L. (1988). Mississippi scale for combat-related posttraumatic stress disorder: Three studies in reliability and validity. Journal of Consulting and Clinical Psychology, 56, 85-90.
Kurtz, J.E., Morey, L.C., & Tomarken, A.J. (1993). The concurrent validity of three self-report measures of borderline personality. Journal of Psychopathology and Behavioral Assessment, 15, 255-266.
Lambert, M.J. (1991). Introduction to psychotherapy research. In L.E. Beutler & M. Crago (Eds.), Psychotherapy research: An international review of programmatic studies (pp. 98-110). Washington, DC: American Psychological Association.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635-694.
Montag, I., & Levin, J. (1994). The five factor model and psychopathology in nonclinical samples. Personality and Individual Differences, 17, 1-7.
Morey, L.C. (1991). The Personality Assessment Inventory professional manual. Odessa, FL: Psychological Assessment Resources.
Morey, L.C. (1996). An interpretive guide to the Personality Assessment Inventory. Odessa, FL: Psychological Assessment Resources.
Morey, L.C. (1997). The Personality Assessment Screener professional manual. Odessa, FL: Psychological Assessment Resources.
Morey, L.C., & Henry, W. (1994). The Personality Assessment Inventory. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 185-216). Hillsdale, NJ: Lawrence Erlbaum Associates.
Morey, L.C., Waugh, M.H., & Blashfield, R.K. (1985). MMPI scales for DSM-III personality disorders: Their derivation and correlates. Journal of Personality Assessment, 49, 245-251.
Newman, F.L., & Ciarlo, J.A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M. Maruish (Ed.), The use of psychological testing


for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates.
Osborne, D. (1994, April). Use of the Personality Assessment Inventory with a medical population. Paper presented at the meetings of the Rocky Mountain Psychological Association, Denver, CO.
Overall, J.E., & Gorham, D.R. (1962). The Brief Psychiatric Rating Scale. Psychological Reports, 10, 799-812.
Perry, S., Frances, A., & Clarkin, J. (1988). A DSM-III-R casebook of treatment selection. New York: Brunner/Mazel.
Procidano, M.E., & Heller, K. (1983). Measures of perceived social support from friends and from family: Three validation studies. American Journal of Community Psychology, 11, 1-24.
Rachman, S.J., & Hodgson, R.J. (1980). Obsessions and compulsions. Englewood Cliffs, NJ: Prentice-Hall.
Rogers, R., Flores, J., Ustad, K., & Sewell, K.W. (1995). Initial validation of the Personality Assessment Inventory-Spanish Version with clients from Mexican American communities. Journal of Personality Assessment, 64, 340-348.
Rogers, R., Ornduff, S.R., & Sewell, K. (1993). Feigning specific disorders: A study of the Personality Assessment Inventory (PAI). Journal of Personality Assessment, 60, 554-560.
Saper, Z., Blank, M.K., & Chapman, L. (1995). Implosive therapy as an adjunctive treatment in a psychotic disorder: A case report. Journal of Behavior Therapy and Experimental Psychiatry, 26, 157-160.
Schinka, J.A. (1995). Personality Assessment Inventory scale characteristics and factor structure in the assessment of alcohol dependency. Journal of Personality Assessment, 64, 101-111.
Schinka, J.A., & Borum, R. (1993). Readability of adult psychopathology inventories. Psychological Assessment, 5, 384-386.
Schlosser, B. (1992). Computer assisted practice. The Independent Practitioner, 12, 12-15.
Selzer, M.L. (1971). The Michigan Alcoholism Screening Test: The quest for a new diagnostic instrument. American Journal of Psychiatry, 127, 1653-1658.
Sifneos, P.E. (1987). Short-term dynamic psychotherapy: Evaluation and technique (2nd ed.). New York: Plenum.
Skinner, H.A. (1982). The drug abuse screening test. Addictive Behaviors, 7, 363-371.
Smith, M.L., Glass, G.V., & Miller, T.I. (1980). The benefits of psychotherapy. Baltimore: Johns Hopkins University Press.
Sloane, R.B., Staples, F.R., Cristol, A.H., Yorkston, N.J., & Whipple, K. (1975). Short-term analytically oriented psychotherapy vs. behavior therapy. Cambridge, MA: Harvard University Press.
Spielberger, C.D. (1983). Manual for the State-Trait Anxiety Inventory. Palo Alto, CA: Consulting Psychologists Press.
Spielberger, C.D. (1988). State-Trait Anger Expression Inventory. Odessa, FL: Psychological Assessment Resources.
Stone, M. (1985). Schizotypal personality: Psychotherapeutic aspects. Schizophrenia Bulletin, 11, 576-589.
Strupp, H.H., & Binder, J. (1984). Psychotherapy in a new key. New York: Basic Books.
Trapnell, P.D., & Wiggins, J.S. (1990). Extension of the Interpersonal Adjective Scale to include the big five dimensions of personality. Journal of Personality and Social Psychology, 59, 781-790.
Trull, T.J. (1995). Borderline personality disorder features in nonclinical young adults: 1. Identification and validation. Psychological Assessment, 7, 33-41.
Trull, T.J., Useda, J.D., Conforti, K., & Doan, B.T. (1995, August). Two-year outcome of subjects with borderline features. Paper presented at the meetings of the American Psychological Association, New York.
Wahler, H.J. (1983). Wahler Physical Symptoms Inventory (1983 ed.). Los Angeles: Western Psychological Services.
Waldinger, R.J., & Gunderson, J.G. (1987). Effective psychotherapy with borderline patients: Case studies. New York: MacMillan.
Wiggins, J.S. (1966). Substantive dimensions of self-report in the MMPI item pool. Psychological Monographs, 80, 22 (Whole No. 630).
Wolpe, J., & Lang, P. (1964). A fear survey schedule for use in behavior therapy. Behaviour Research and Therapy, 2, 27-30.


For Abby, Katie, and Shelby


Chapter 36 Rorschach Inkblot Method Irving B. Weiner University of South Florida Through a series of events Hermann Rorschach, a Swiss psychiatrist and protege of Eugen Bleuler, found himself in unappealing professional circumstances during the second decade of the present century. Brilliant, creative, and energetic, he was working in a public mental hospital that was providing primarily long-term custodial care and offering little challenge or stimulation to his considerable talents. Looking for ways to exercise his mind, he became curious about how patients in the hospital might respond to a game he had played as a youth in which participants competed to see who could be most creative in describing what a series of inkblots might be. Rorschach thus innovated the utilization of inkblot perceptions to identify patterns of personality functioning and discriminate among people with various kinds of psychological disorder. The story of how Rorschach's exploratory efforts resulted in the 1921 publication of Psychodiagnostics, a monograph based on the responses of 288 patients and 117 nonpatient subjects to a set of 10 specific inkblots, has been told often and well (e.g., Ellenberger, 1954; Exner, 1993, chap. 1; Schwarz, 1996). The materials and methods described by Rorschach in Psychodiagnostics provide the basic foundation for the manner in which Rorschach assessment has been and still is most commonly practiced. The standard Rorschach comprises the same 10 inkblots that were published with Rorschach's original monograph. Five of these blots are in achromatic shades of grey and black, two contain red as well as grey-black elements, and the remaining three are in various chromatic hues. In what constitutes the ''free association" phase of administration, subjects are shown the blots one at a time and asked, "What might this be?" 
The examiner records subjects' responses verbatim and then proceeds with a second phase of the administration, the "Inquiry," in which the stated purpose is "To help me see what you saw." Each response is read back to the subject, who is then asked to indicate where the percept was seen and what made it look as it did.

Rorschach died at age 37, just 1 year after his monograph was published, and the further development of his instrument was thus left to other minds. Numerous variations of his method of administering the inkblots and coding responses to them were subsequently proposed. In the United States, five different Rorschach systems emerged in
the 1930s and 1940s, in the creative hands of Sam Beck, Bruno Klopfer, Marguerite Hertz, Zygmunt Piotrowski, and the tandem of David Rapaport and Roy Schafer. These Rorschach pioneers were gifted clinicians who used the Rorschach effectively and taught others to do likewise. However, differences among their approaches and many idiosyncratic embellishments of their methods by practitioners and researchers prevented the Rorschach from becoming a standardized assessment instrument and thereby delayed for many years the accumulation of systematic research concerning its psychometric properties and behavioral correlates.

The differences among these five Rorschach systems were analyzed in detail by Exner (1969), who then undertook an extensive research program to identify which elements of each approach were most likely to foster reliable coding and objective guidelines for interpretation. The result was the Rorschach Comprehensive System, which was introduced in 1974 (Exner, 1974). As subsequently elaborated by Exner and Weiner (Exner, 1991, 1993; Exner & Weiner, 1995; Weiner, 1997b), the Comprehensive System has since become by far the most widely used Rorschach method.

Accordingly, the Rorschach Inkblot Method is discussed in this chapter primarily in terms of the Comprehensive System, although not exclusively so. The first section reviews how Rorschach data are used to assess personality functioning and discusses the psychometric status of the instrument. The second and third sections discuss and illustrate the application of Rorschach findings in the planning and evaluation of treatment.

Overview of Rorschach Basics

In the Comprehensive System, the inkblots are administered as just described, which was Rorschach's recommended approach. Each response is then coded according to the following eight characteristics:

1. Location choice concerns which parts of the blots are used for a response.
Do subjects use the whole blot (W), for example, a commonly used detail (D), a rarely used detail (Dd), or the white space (S)? Do location choices involve a combination of discrete areas of the blot (DQ+), a single discrete area of the blot (DQo), or a vague impression of the blot that lacks any specific form demand (DQv)?

2. Determinants concern which features of the blots contribute to their looking as they do. Do subjects articulate the chromatic (C), achromatic (C'), shading (Y), textural (T), or shading/dimensional (V) features of the inkblots, for example, or do they attribute human movement (M), animal movement (FM), or inanimate movement (m) to their percepts?

3. Form quality concerns whether the shapes of the blots are perceived in a highly articulated fashion (FQ+), in fairly common, conventional ways (FQo), in idiographic but realistic ways (FQu), or in a distorted fashion (FQ-).

4. Content concerns what the subject perceives the blots to resemble. Specific codes are used for 26 specific content categories, such as whole human (H), animal detail (Ad), anatomy (An), nature (Na), and blood (Bl).

5. Pair is coded as (2) for responses in which the symmetrical nature of the blots results in the perception of "two" of some object being reported.

6. Popular is coded as P for responses that are very commonly given, as defined by their occurring in at least one of every three records. There are 13 such responses that are coded P in the Comprehensive System.

7. Organizational activity concerns the extent to which the blots are synthesized in formulating percepts. Specific numerical weights are prescribed on each card for responses that integrate the
whole blot, integrate adjacent or distant details with each other, or integrate the white space with inked areas of the cards.

8. Special scores comprise 14 codes for a variety of distinctive or unusual ways in which subjects may express or embellish their responses, as by using strange language (Deviant Verbalization, DV), stating logical non sequiturs (Autistic Logic, ALOG), describing morbid circumstances (Morbid, MOR), or attributing aggressive intent to their images (Aggressive, AG).

The Comprehensive System codes for these eight characteristics of individual Rorschach responses are summarized and combined in various ways to yield numerous ratios, percentages, and indices that constitute Rorschach scores and provide what is called the Structural Summary. It is the Structural Summary, rather than individual responses considered in isolation, that provides the primary basis for generating inferences about personality functioning. However, there is more to the interpretive process than is captured in a Structural Summary, and sophisticated utilization of Rorschach data requires appreciation of the interplay between objective and subjective features of subjects' responses.

Objective and Subjective Features of the Rorschach

As traditionally defined, objective tests are relatively structured measuring instruments in which the nature of the stimuli and the subject's task are both clearly specified, as in the case of a self-report inventory in which statements are endorsed as being True or False with respect to oneself. Projective tests, by contrast, are relatively unstructured measures in which the test stimuli are somewhat ambiguous and/or how the subject should respond is left more or less unspecified. Convention may have doomed the Rorschach to being categorized forever as a projective test, which to many people signifies its being an entirely subjective instrument.
However, as indicated by the preceding brief history and overview of coding, Rorschach tradition is grounded for the most part in an objective approach to the data it yields. Although Rorschach was well-versed in the psychoanalytic theories of his day, he regarded the inkblot method as an experimental, atheoretical procedure for identifying personality styles and determining the diagnostic implications of perceptual processes. In the introduction to his Psychodiagnostics, which he subtitled "A Diagnostic Test Based on Perception," he wrote that "all of the results are primarily empirical" and that "the conclusions drawn, therefore, are to be regarded more as observations than as theoretical deductions." As if to dispel any uncertainty about what he had in mind, he devoted just two pages of his monograph to the content of responses, and he stated specifically that "the test cannot be used to probe into the content of the subconscious" (Rorschach, 1921/1942, p. 123).

In the tradition of its originator, the structural analysis of Rorschach responses proceeds in an objective manner from the coding of perceptual style to conclusions concerning the implications of particular perceptual styles. Whether subjects are using the entire blot in formulating a response is an objective fact, for example, and the corollaries of W emphasis in a record can be examined as objectively as the corollaries of variables drawn from other tests that are commonly described as objective measures.

It is further significant with respect to objectivity that the Comprehensive System was designed to include only variables on which examiners who were familiar with the coding criteria could achieve substantial agreement. Subsequent research has indicated that pairs of raters can in fact achieve better than 90% agreement for Location Choice, Pair, Popular
and Organizational Activity Codes; more than 80% agreement on the other four response characteristics just listed; and an overall mean percentage interrater agreement of just under 90% (Exner, 1991, pp. 459-460; Exner, 1993, p. 138; McDowell & Acklin, 1996). Hence, although most Rorschach codes cannot be assigned with the same certainty as the score for a True or False response on a self-report inventory, coding in the Comprehensive System is a reliable and largely objective process. There is accordingly considerable objectivity involved in identifying personality and behavioral correlates of the formally scored dimensions of the perceptual style that subjects bring to bear in articulating what the Rorschach inkblots might be. On the other hand, Rorschach scholars have long recognized that Rorschach did not fully appreciate the potential of his method. Among the American systematizers, Klopfer and Schafer were particularly influential in using psychoanalytic perspectives to elaborate numerous ways in which the thematic content of Rorschach responses can in fact provide clues to a person's underlying feelings and concerns (B. Klopfer, Ainsworth, W.G. Klopfer, & Holt, 1954; Schafer, 1954). Whereas some Rorschach specialists continued in the tradition of concentrating on identifying correlates of formally scored perceptual style, others established a new and less objective tradition in which the inkblots are viewed less as a perceptual task than as a stimulus to fantasy (see Goldfried, Stricker, & Weiner, 1971, chap. 13). In this subjective tradition, the most important data in a Rorschach protocol are the subject's fantasy productions, and elaborations of content provide the major basis for drawing inferences about personality functioning. The distinction between objective and subjective features of Rorschach data revolves around the role of projection in the formulation of responses. 
In perceptual terms, projection occurs when people attribute specific characteristics to stimulus fields on the basis of their internal thoughts, feelings, or need states. Rorschach made no reference to projection in his work, nor did any of the other early Rorschach systematizers. However, Frank (1939) suggested that personality tests with relatively little structure lead subjects to "project upon that plastic field" their personal feelings and attitudes, and therefore constitute "projection measures" (p. 395). From Frank's suggestion came the now well-entrenched distinction between objective and projective tests and the customary but erroneous classification of the Rorschach as a projective test.

Schachtel (1966), a substantial scholar in psychoanalysis as well as an authority on the Rorschach, wrote many years ago that "only a small fraction of the many processes underlying Rorschach responses are of a projective nature" (p. 10). More recently, Exner (1989, 1996) elaborated the reasons why the Rorschach is not basically a projective test. Although some Rorschach responses may be determined in part by projection, subjects can comply with the instructions (i.e., "What might this be?" "Where do you see it?" "What makes it look like that?") and produce a valid protocol by responding to the stimulus properties of the blots without using projection in formulating their answers.

For example, consider a response to Card I of "The whole thing looks like a black bat." Card I does in fact look like a bat or butterfly to most people, and it is grey-black in color. Hence, this response does not attribute characteristics to the stimulus that are not already there. As a commonly seen percept using the entire blot and articulating achromatic color, the response may well contribute to meaningful inferences about personality style, especially if these characteristics appear in numerous other responses as well.
However, this contribution derives from the perceptual style that is in evidence, not from manifestations of projection.

By contrast, consider Card I seen as "A vulture swooping down to get its prey." This response involves an uncommon and inaccurate percept (vulture), the attribution of movement to a static inkblot (swooping down), and the fantasy of imminent victimization (to get its prey). Responses that are perceived inaccurately, involve movement, or are embellished in these ways usually involve projection, because what is being reported is not present in the stimulus and must therefore have emerged from attitudes and concerns internal to the subject. Responses that describe stimulus properties of the inkblots help to identify aspects of personality structure, whereas responses in which projection occurs provide information about underlying personality dynamics. Although such projected material is not essential to the production of an interpretable record, its presence typically enriches what can be learned about people from their Rorschach protocol.

Interpretive Strategies

Consistent with the preceding discussion, adequate interpretation of the Rorschach Inkblot Method involves attention to both the structural and thematic characteristics of subjects' responses. In addition, valuable information can be gleaned from close attention to how subjects interact with the examiner during the test administration.

Structural Characteristics. Interpretation of the structural characteristics of a Rorschach protocol is based on the assumption that the formulation of Rorschach responses constitutes a representative sample of behavior. Because the Rorschach stimuli are in fact only inkblots, the instructions to say what they might be and why they look that way present subjects with a problem-solving task. The manner in which subjects exercise perceptual and cognitive processes to comply with the instructions, and thereby deliver responses, can be expected to indicate how they are likely to cope with perceptual-cognitive tasks in their daily lives and how they are likely to think, feel, and act in problem-solving situations.
Because the manner in which subjects structure the inkblots is representative of how they are inclined to structure other kinds of perceptual-cognitive experience, identifiable correlates of structuring the Rorschach in certain ways provide the basis for numerous inferences concerning a subject's characteristic dispositions and current emotional and attitudinal states. For example, individuals who respond mainly to peripheral details and rarely use the whole blot in formulating their responses are likely to lack capacity or willingness to integrate aspects of their experience; such people tend to be the kinds of individuals of whom it is said they "don't see the big picture" or "lose sight of the forest for the trees." Similarly, people who report many perceptually inaccurate or distorted forms are likely in their everyday existence to misperceive what is happening around them and misjudge the consequences of their actions; those whose determinants include frequent articulation of the black and grey features of the blots tend to look on the dark side of things as they go through life experiencing sad and gloomy affect; and so on, through an extensive set of interpretive implications attached to structural aspects of how subjects respond to the Rorschach task.

Thematic Characteristics. Interpretation of the thematic characteristics of a Rorschach protocol is based on the assumption that responses are symbolic of behavior. It is in relation to thematic interpretation that the Rorschach is approached as a stimulus to fantasy, rather than as a perceptual task, and it is in thematic interpretation that attention is directed primarily to those responses in which projection seems to have occurred. More specifically, the thematic characteristics of a protocol consist of the manner in which Rorschach percepts are elaborated beyond the basic requirement merely to
indicate what the inkblots might be. Thus, a response of "Two people" has few thematic characteristics, although a subject who gives an unusually high frequency of "two people" responses may, in a fairly representative manner, be revealing some preoccupation with people or person-related events. By contrast, a response of "two people just standing there" has a definite thematic elaboration that begins to suggest a passive orientation to life or an impression of people as not readily becoming engaged with each other. A further elaboration of this response, into "two people just standing there, each waiting for the other to take the first step," leaves little doubt concerning the subject's passivity and interpersonal reticence, especially if the same theme appears in several different responses. In this way, thematic imagery provides the basis for interpretive hypotheses concerning a person's underlying attitudes and concerns, especially with respect to self-perceptions and interpersonal orientations. Like structural characteristics of Rorschach responses, thematic imagery can be coded in various ways. Some types of thematic imagery are categorized within the formal coding of the Comprehensive System. These include coding categories for morbid content elaborations (MOR), for whether movement is active (a) or passive (p), and for interactions that are described as aggressive (AG) or cooperative (COP). Numerous other categorizations of thematic elaborations that have been developed separately from the Comprehensive System, especially from an object relations perspective, similarly involve formal coding that makes them accessible to psychometric evaluation (see Aronow & Reznikoff, 1976; Lerner, 1991, chaps. 11-14; Stricker & Healey, 1990). In addition, however, thematic elaborations often provide clues to idiographic personality dynamics that are expressed solely in qualitative impressions without quantification. 
It seems reasonable to infer, for example, that subjects who consistently describe the objects they see as small and weak are harboring concerns about their own personal adequacy.

There has been some misconception that the Comprehensive System gives short shrift to thematic characteristics of a protocol and, by focusing narrowly on structural characteristics, fails to mine the full potential of Rorschach data to illuminate personality functioning. The Comprehensive System was developed primarily to strengthen the analysis of the structural data and to provide a psychometric foundation for the instrument, and its innovations lie mainly in these areas. However, this emphasis was never intended to dispense with well-reasoned inferences based on content elaborations. In fact, current developments in Comprehensive System interpretation call for a closely integrated consideration of structural and thematic characteristics of responses, much in the manner first proposed by Schafer (1954). Moreover, the strategies of interpretation now strongly recommended in the Comprehensive System require special attention to the content of those responses likely to involve projection, namely, those containing movement, form distortion, or various kinds of embellishment (Exner, 1991, chaps. 5-9).

Examiner Interaction Characteristics. Examiner interaction characteristics refer to the attitudes that subjects bring to the test-taking task and the manner in which they relate to the person administering the Rorschach to them. For example, through task-related comments, subjects can reveal themselves to be passive/deferential (e.g., "Is it all right for me to turn the card?") or hostile/assertive (e.g., "That's all you can get me to say about that one") in relation to authority.
Through extraneous comments about the testing or about themselves (e.g., "This seems like a waste of time," "I'm a little worried about what these tests will show"), subjects can provide considerable information about their personal style and their frame of mind.

For most people, taking the Rorschach is a less familiar and more ambiguous situation than being interviewed or taking a relatively structured test. Accordingly, the Rorschach examination may be particularly rich in generating interpretively meaningful examiner interactions. However, there is nothing unique about the Rorschach, aside from its novelty and ambiguity, with respect to the clinical utility of examiner interactions. When they occur, they can be understood by empathic clinicians independently of any specific knowledge about the Rorschach. Although Rorschach clinicians can enhance their understanding of the persons they examine by taking note of examiner interaction characteristics (see Schachtel, 1966, chap. 12; Schafer, 1954, chap. 2), this source of information is not discussed further in the present chapter.

Psychometric Status of the Rorschach

The psychometric status of the Rorschach, long a bone of contention between researchers who insisted it had few redeeming qualities and practitioners who knew in their hearts that it was a sound test but were unable to prove it, has now been fairly clearly established by extensive research conducted with the Comprehensive System. Data attesting the soundness of the instrument have been adduced with respect to its normative base, retest reliability, and criterion and construct validity.

Normative Base. The Comprehensive System includes extensive normative data for each of its individual codes, ratios, percentages, and indices. For adults, the published norms are based on 700 subjects, 350 males and 350 females, whose records were randomly selected from an available group of 1,332 records of nonpatient adults. These 700 persons range in age from 18 to 70, with a mean of 32.4, and they were drawn as nonpatient volunteers from all parts of the United States. The group is roughly comparable to U.S. census data with respect to education, marital status, race, socioeconomic level, and place of residence. The demography of this reference sample is described in detail by Exner (1991, chap. 2).
The normative data themselves comprise for each Rorschach variable its mean, standard deviation, range, median, mode, frequency (i.e., in how many of the nonpatients is it likely to appear at least once), and distribution characteristics (skewness and kurtosis). The distribution characteristics are used as a basis for indicating which variables satisfy usual requirements for application of parametric descriptive and inferential statistics, and which are more appropriately described and analyzed with non-parametric methods. All of this information on nonpatient adults is also presented separately for the males and females.

The same kinds of normative data are provided as well for 1,390 nonpatient children and adolescents from age 5 to 16. These data are presented separately for each age group, the sample sizes of which range from 80 to 140 subjects. Normative findings at age 16 closely resemble those obtained with adults. Hence, the Rorschach protocols of 17-year-olds can be compared with either the age 16 or the adult data set. Finally, information is available for reference purposes on four groups of adult psychiatric patients, including 320 schizophrenic inpatients, 315 depressed inpatients, 440 outpatients, and 180 patients with diagnosed character disorder.

Reliability. The reliability of the Comprehensive System has been documented in a series of retest studies with both children and adults over retest intervals ranging from 7 days to 3 years (Exner & Weiner, 1995, pp. 21-27). Almost all of the individual variables coded in the system that relate to trait characteristics of individuals have demonstrated substantial short- and long-term stability in adults. Most of these variables show retest correlations in the .80s, and the only variables falling below .72 in retest
studies with adults are two that relate to state rather than trait characteristics: Inanimate Movement (m) and Diffuse Shading (Y), both of which are indices of situational distress. Moreover, the various ratios and percentages in the structural summary show even greater temporal stability than the individual variables on which they are based. These ratios and percentages carry more interpretive weight than individual codes, and some of them approach the stability of a Wechsler IQ. For example, the 3-year retest correlations among adults are .90 for the Affective Ratio, .87 for the Egocentricity Ratio, and .85 for Experience Actual.

Among children, short-term retest studies (3 weeks) reveal stability correlations similar to those found in adults. However, as would be expected from the evolving nature of personality during the developmental years, nonpatient young people do not show much Rorschach stability over a 2-year period until they reach age 14 (Exner, Thomas, & Mason, 1985).

Validity. The complexity of the Rorschach precludes any global statements concerning its validity. Indeed, blanket or unqualified assertions that the Rorschach is or is not a valid assessment instrument say more about the naivety of the person making the statement than about the characteristics of the instrument. Like other multifaceted measures yielding numerous individual and summary scores, the Rorschach can be described as more or less valid only in relation to how its individual components fare on assessments of their criterion or construct validity. Additionally, conclusions about the validity of Rorschach components must be framed in terms of specific purposes (i.e., valid for what?) and particular contexts (i.e., valid under what circumstances?) associated with positive or negative findings.
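As a brief aside on the reliability figures reported above, a retest correlation can be illustrated with a small computation. The following is only a sketch under stated assumptions: it assumes the stability coefficients are ordinary Pearson correlations between scores from a first and a second administration, which the chapter does not specify, and the function name `pearson_r` and the five pairs of scores are hypothetical, made up purely for illustration rather than drawn from any study.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_first) * (b - mean_second)
                for a, b in zip(first, second))
    # Sums of squared deviations for each administration.
    ss_first = sum((a - mean_first) ** 2 for a in first)
    ss_second = sum((b - mean_second) ** 2 for b in second)
    if ss_first == 0 or ss_second == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cross / sqrt(ss_first * ss_second)


# Hypothetical summary-score values for five people tested twice.
# These numbers are invented for illustration only.
test_scores = [0.45, 0.60, 0.52, 0.70, 0.38]
retest_scores = [0.48, 0.58, 0.55, 0.66, 0.40]

retest_r = pearson_r(test_scores, retest_scores)
print(f"retest r = {retest_r:.2f}")
```

A coefficient near 1.0 means that people keep roughly the same rank order and relative spacing from one testing to the next, which is the sense in which a trait-related variable is described as stable; a state-related variable would be expected to yield a lower value.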
The fundamental question in assessing the validity of the Rorschach is whether the basic interpretive hypotheses associated with its structural and thematic characteristics are demonstrable in fact. Does the failure to use texture identify interpersonal distance, for example? Does a high frequency of distorted perceptions indicate poor judgment? Do numerous achromatic color and color-shading responses suggest dysphoric mood? Do responses involving reflection point to narcissistic characteristics? A vast array of focused research studies, summarized in the basic texts on the Comprehensive System (Exner, 1991, 1993) and published over the years in such journals as the Journal of Personality Assessment, Psychological Assessment, and Assessment, have in fact documented the empirical soundness of these and a great many other Rorschach characteristics as valid measures of both state and trait aspects of personality functioning. In response to critics who have without empirical justification denigrated the Rorschach as a psychometrically shoddy instrument (e.g., Dawes, 1994), Weiner (1996) provided an overview of relevant literature attesting the validity of this instrument for well-conceived purposes. With further respect to its practical applications, the validity of Rorschach variables has been assessed extensively in terms of how well they correlate with what people in a broad sense are likely to be (e.g., schizophrenic, suicidal, violent) or are likely to do (e.g., commit crimes, drop out of therapy, function effectively as a manager). Unlike the consistently positive demonstrations of relatedness between Rorschach characteristics and specific personality states and traits, Rorschach-based estimates of complex current conditions and future behavior patterns have yielded mixed results in validity studies. 
However, the majority of such studies in which Rorschach findings have failed to correlate with criterion variables have called on the instrument to achieve purposes for which it was neither designed nor intended. The Rorschach is a measure of personality functioning. As such, it can be expected to measure present conditions and predict future behavior only when these conditions and behaviors involve considerable personality variance. If the Rorschach is asked to correlate with phenomena determined largely by known personality characteristics, it is very likely to do so. On the other hand, the Rorschach has few prospects of correlating with phenomena that constitute nonpersonality characteristics of the individual (e.g., intelligence or creative talent) or circumstances external to the individual (e.g., having an adequate support network). Significantly for the focus of this chapter, the manner in which people respond to psychotherapy has a great deal to do with features of their personality, and treatment planning and evaluation are therefore clinical procedures on which Rorschach data have considerable bearing.

To conclude these observations on the psychometric foundations of the Rorschach before moving on to treatment planning, numerous scholars have adduced empirical evidence to demonstrate that the Rorschach method can and will demonstrate substantial validity when it is used appropriately for purposes for which it is intended and studied in research designed along appropriate conceptual or empirical lines (Atkinson, Quarrington, Alp, & Cyr, 1986; Blatt, 1975; Exner, 1993; Holt, 1967; Meyer, 1996; Parker, 1983; Weiner, 1977, 1995). Of particular note is a meta-analysis by Parker, Hanson, and Hunsley (1988) in which the Rorschach was found to be approximately equivalent to the Wechsler Adult Intelligence Scale (WAIS) and the Minnesota Multiphasic Personality Inventory (MMPI) in reliability and stability and equivalent to the MMPI in convergent validity. Parker et al. concluded from their research review that "both the MMPI and the Rorschach can be considered to have adequate psychometric properties if used for the purposes for which they were designed and validated" (p. 367).
Well-designed research continues with regularity to affirm the utility of the Rorschach in providing valid assessments of personality characteristics and facilitating not only differential diagnosis, but treatment planning and evaluation as well (see Ganellen, 1996; Hilsenroth, Fowler, & Padawer, 1998; Hilsenroth, Fowler, Padawer, & Handler, 1997; Weiner, 1997a). Treatment Planning Rorschach assessment contributes to treatment planning in three ways. First, data provided by the Rorschach Inkblot Method help to identify a prospective patient's levels of personality integration and subjectively felt distress, both of which have known implications for the intensity of psychotherapy people can tolerate and their likelihood of becoming active participants in a treatment relationship. Second, the Rorschach distinguishes among various styles of personality that make patients differentially responsive to particular kinds of treatment and approaches within psychotherapy. Third, Rorschach protocols assist in delineating the kinds of personality change that are likely to be most beneficial for an individual patient (treatment targets) and anticipating personality-based interference with such changes that might arise in the course of therapy (treatment obstacles). Personality Integration and Subjectively Felt Distress An extensive literature on patient variables associated with progress and outcome in psychotherapy indicates that, other things being equal, patients who enter treatment generally in good psychological health but acutely distressed in relation to current events in their lives are most likely to persist and improve in psychotherapy; conversely, patients with a long prior history of psychological disturbance and maladaptive functioning but


little current distress are at relatively high risk for making minimal progress in psychotherapy and being early dropouts from it (Garfield, 1994; Mohr, 1995). Both level of personality integration (ego strength) and level of subjectively felt distress (stress overload) are readily measured by Rorschach variables. With respect to measuring personality integration, the Rorschach Prognostic Rating Scale (RPRS) introduced by Klopfer and his colleagues (Klopfer, Kirkner, Wisham, & Baker, 1951) has long been known to demonstrate good construct validity as an index of ego strength and adjustment potential (Goldfried et al., 1971, chap. 12). A contemporary meta-analysis reported by Meyer and Handler (1997) indicates further that the RPRS is a valid predictor of psychotherapy outcome and foretells behavior change extremely well in both children and adult patients being treated in both hospital and outpatient settings. Regrettably, the RPRS has not been translated from the Klopfer system of coding Rorschach responses into the more widely used Comprehensive System and has thus not found much application in clinical practice. Nevertheless, attention in general to validated Rorschach indices of adaptive personality resources, such as adequate form level and good quality human movement resources, can serve to identify the kinds of personality integration that contribute to involvement and progress in psychotherapy. A formal index of ego strength developed within the Comprehensive System by Perry and Viglione (1991), called the Ego Impairment Index (EII), has shown some potential to serve usefully in this respect. In addition to identifying a favorable therapy prognosis, adequate assessment of ego strength as facilitated by Rorschach data can also reveal underlying fragility in a patient's ego that calls for a relatively supportive, rather than an uncovering, approach in treatment. 
People with limited ego strength whose personality resources have been overestimated are at considerable risk for premature termination of their treatment or deterioration during treatment when they are subjected to a less structured or more demanding form of psychotherapy than they can tolerate (Appelbaum, 1990; Mohr, 1995). As for stress overload, the Rorschach D-score provides a well-validated index of the extent to which the demands that people are facing in their lives are reasonably in balance with the adaptive resources they have available for meeting these demands. An excess of experienced demands (as reflected in the es summary score) over adaptive capacities (as reflected in the EA summary score) results in D < 0, the behavioral corollaries of which are anxiety, tension, nervousness, irritability, and limited frustration tolerance. Although unpleasant to experience, this state of stress overload constitutes the type of experienced distress that contributes to people seeking, remaining in, working hard at, and benefiting from psychotherapy. Consistent with this conceptual linkage between principles of psychotherapy and the meaning of certain Rorschach variables, specific research with the Rorschach confirms that test indices of insufficient resources to cope with experienced demands and a correspondingly high level of psychological distress predict continuation rather than early dropout from psychotherapy (Colson, Eyman, & Coyne, 1994; Hilsenroth, Handler, Toman, & Padawer, 1995).

Personality Styles

Researchers have only recently begun systematic exploration of relationships between Rorschach indices of personality style and differential treatment response. Nevertheless, data have emerged to indicate that specific personality traits as measured by the Rorschach


can predict the types of psychotherapy to which patients are most likely to respond positively. Blatt and Ford (1994), for example, used Rorschach variables to assist in categorizing patients as having primarily anaclitic problems, which involve difficulties in forming satisfying interpersonal relationships, or primarily introjective problems, which involve difficulties in self-definition, autonomy, self-worth, and identity. In the course of their subsequent psychotherapy, the anaclitic patients became more involved in and were more responsive to relational aspects of the treatment, whereas the introjective patients were more attuned to and influenced by their therapist's interpretive activity. As a further example, Exner (personal communication, December 6, 1996), is analyzing the pretreatment Rorschach data of 497 patients who entered various types of psychotherapy. Among 73 of these patients who dropped out of treatment within 8 weeks, those who were in behavioral therapy were especially likely to be ideational rather than expressive types of people (i.e., an introversive rather than extratensive Experience Balance), but they did not consistently show Rorschach indices of self-centeredness or interpersonal difficulty. By contrast, patients who had dropped out early from dynamic, cognitive, and experiential forms of therapy were not particularly likely to be either introversive or extratensive, but they did give evidence of narcissistic features (presence of Reflections) and problems in forming interpersonal attachments (absence of Texture). Prominent Rorschach evidence of narcissism in patients who terminate prematurely from dynamic psychotherapy has also been reported by Horner and Diamond (1996). Also of interest in the data analyzed by Exner thus far is a group of 117 patients who remained in therapy but were regarded by their therapist after 4 months as having made slow progress. 
Of these patients, those in behavioral therapy showed a high frequency of interpersonal neediness (as defined by having more than one Texture response), whereas those progressing slowly in other types of therapy appeared to be unusually insulated against experiencing subjectively felt distress (as defined by an EA substantially greater than their es). These preliminary indications of conceptually meaningful and empirically reliable relations between Rorschach indices of personality characteristics and treatment modality among patients differing in their early treatment course offer promise that the potential utility of the instrument in this regard has only begun to be tapped.

Treatment Targets and Obstacles

Rorschach examiners can best apply their data to treatment planning by asking the question, "What changes in the presently obtained protocol would probably be accompanied by the patient's feeling better, coping more effectively with interpersonal and achievement-related experiences, and realizing more fully his or her human potential?" From this perspective, any structural or thematic characteristic of a pretherapy Rorschach protocol that has known corollaries of felt distress, ineffective coping, or personal dissatisfaction can become a treatment target. Translated into the language of personality functioning, these Rorschach characteristics can tell therapists about the goals of their treatment in clear terms. Among the treatment targets that are identified in this way by Rorschach findings, some are likely to constitute obstacles to treatment as well. The known personality correlates of several Rorschach variables in particular constitute patient characteristics


that are widely believed by therapists to pose obstacles to progress in therapy. Four such characteristics of special note, although by no means an exhaustive list, are rigidity, self-satisfaction, nonintrospectiveness, and interpersonal distancing. Rigidity in personality functioning refers to being set in one's ways and unwilling or unlikely to consider changing one's perspectives. The active:passive (a:p) ratio on the Rorschach provides an excellent measure of such rigidity. People whose ratio of active to passive movements attributed to their percepts exceeds 2:1, in either direction, tend to be people who cling stubbornly to their beliefs and seldom consider the possibility that they might benefit from looking at their experiences in a different light. Self-satisfaction consists of feeling comfortable and satisfied with oneself and experiencing little need to change. These dimensions of self-satisfaction are reflected on the Rorschach in a D score equal to or greater than 0 (D ≥ 0), as contrasted with a minus D score (D < 0). People with D ≥ 0 usually have sufficient personality resources to meet the demands they encounter in their daily lives. Although such people may become distressed by situations outside of their control, they tend to believe that these situations, not themselves, need to be changed. Approximately 90% of nonpatient adults have D scores of 0 or more, and in these nonpatients this Rorschach finding is indicative of stable personality functioning. Whereas stability and self-satisfaction are personality assets in people who are functioning well, in individuals who become sufficiently disturbed to require treatment they become liabilities as obstacles to change. More specifically, stability and self-satisfaction in psychologically disturbed persons are usually hallmarks of chronic disorder, characterological difficulties, and ego-syntonic symptom formation.
This treatment obstacle stands in contrast to the previously mentioned implications of a stress overload, as measured by D < 0. Among people who need treatment, it is a D score of less than 0, an indicator of subjectively felt distress, that is likely to be associated with motivation to persist and make progress in psychotherapy. Nonintrospectiveness involves a disinclination to examine oneself. Psychological treatment can proceed effectively only when patients are able and willing to report their thoughts, feelings, and actions. Dynamic, behavioral, and experiential therapies alike depend for their impact on patients' readiness to observe and talk about themselves and their lives. Introspectiveness is indicated on the Rorschach by Dimensionality (FD) responses, in which subjects literally take some distance from their percepts by seeing them as from afar (e.g., "It's a long way off") or attributing some depth perspective to them (e.g., "This part is behind that part"). Among nonpatient adults, 79% have one or more FD responses in their protocol. The total absence of FD identifies a tendency toward nonintrospectiveness that can interfere with progress in treatment by limiting the amount and personal significance of the information that patients make available for discussion. Interpersonal distancing occurs when people who are disinclined to form close attachments to others hold themselves at a physical and psychological distance from interpersonal engagement. Such distancing is usually associated with having little anticipation of mutually supportive relationships with other people and little expectation that others will lend a helping hand or a caring heart. Interpersonal distancing poses a major obstacle to progress in psychotherapy by dissuading patients from trusting and confiding in their therapist and by derailing the treatment relationship as a vehicle for engaging patients in therapy and influencing their behavior. 
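The obstacle indicators discussed in this section can be sketched as simple threshold checks. The following is a minimal illustration in Python, not a scoring procedure from the Comprehensive System: the `StructuralData` container and its field names are hypothetical, the example values are invented, and the texture check relies on the absence of Texture that the chapter ties to problems in forming interpersonal attachments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StructuralData:
    """Hypothetical container for the variables named in the text."""
    active: int    # active movement responses (a)
    passive: int   # passive movement responses (p)
    d_score: int   # D score
    fd: int        # number of FD responses
    texture: int   # number of T responses


def treatment_obstacles(s: StructuralData) -> List[str]:
    """Return the obstacle labels whose stated threshold is met."""
    obstacles = []
    # Rigidity: the a:p ratio exceeds 2:1 in either direction.
    if s.active > 2 * s.passive or s.passive > 2 * s.active:
        obstacles.append("rigidity")
    # Self-satisfaction: a D score of 0 or more.
    if s.d_score >= 0:
        obstacles.append("self-satisfaction")
    # Nonintrospectiveness: total absence of FD.
    if s.fd == 0:
        obstacles.append("nonintrospectiveness")
    # Interpersonal distancing: total absence of Texture.
    if s.texture == 0:
        obstacles.append("interpersonal distancing")
    return obstacles


# Invented values for illustration only, not data from the chapter.
example = StructuralData(active=7, passive=2, d_score=0, fd=0, texture=1)
print(treatment_obstacles(example))
# ['rigidity', 'self-satisfaction', 'nonintrospectiveness']
```

A flag raised by such a check marks an issue to be addressed in therapy; it does not predict that treatment will fail.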
The Rorschach provides a powerful index of interpersonal distancing when records do not include texture (T) in the formulation of responses. Total absence of T occurs in just 11% of nonpatient adults, and T = 0 correlates significantly with behavioral


indices of taking a cautious, arm's-length stance in interpersonal relationships. Preliminary empirical confirmation that records without T increase the likelihood of dropout from many forms of therapy was noted earlier. As in the case of an imbalanced a:p ratio, D ≥ 0, and FD = 0, T = 0 does not preclude the possibility of effective treatment. However, each of these Rorschach variables identifies obstacles to treatment that must be addressed in therapy before personality or behavioral change can be expected to occur.

Providing Feedback

To provide feedback to patients who have been tested for purposes of treatment planning, examiners need first to translate undesirable and maladaptive features of the Rorschach protocol into easily understandable descriptions of the personality problems they reflect. Although it may seem difficult at times to capture the implications of complex test findings in simple language, it is only when examiners can do so effectively that they really understand the nature of the test. Moreover, a gifted clinician's most brilliant insights into personality functioning serve little purpose if they cannot be communicated clearly to the person whom they concern. Accordingly, the feedback process should proceed in a straightforward and down-to-earth manner. Patients should be told that their responses to the test identified several important aspects of what they are like as people and pointed to certain strengths and weaknesses in their current personality functioning.
Then, based on what the data show, and in some order of priority chosen to focus on the most salient findings, patients should be presented with comments of each type: For example, "You appear to be a thoughtful person who likes to think things over before showing how you feel" (nature); "The test findings indicate that you're someone who usually has good control of yourself" (strength); "There is some evidence here that you sometimes let your anger get the best of you and show bad judgment as a result" (weakness). In presenting such test inferences, the examiner should invite reactions and be prepared in some instances to reemphasize an observation (when it seems solidly based in the data and the subject is finding it somewhat difficult to accept) and in other instances to soft-pedal an observation (when the data seem soft and the subject's reaction is highly skeptical or defensive). The purpose is to achieve an agreement on at least some and hopefully many personality implications of the data, and this process can be facilitated by the examiner's expressing the more obvious findings with relative certainty and more speculative inferences as tentative possibilities. At the conclusion of the feedback process, there should be on the table some agreed-on areas of difficulty in the patient's personality functioning and some shared perceptions of how changes in these personality characteristics would be psychologically beneficial to the patient. These joint conclusions will then provide the initial goals of the treatment.

Treatment Outcome Assessment

As a continuation of treatment planning, Rorschach findings can contribute to treatment outcome assessment by identifying the number and nature of treatment targets that are present at a particular point in time. Each of the treatment targets identified in this assessment constitutes test evidence of some maladaptive personality characteristic.


Accordingly, the fewer such treatment targets appearing in the Rorschach of a patient receiving psychotherapy, the more likely it is that time has arrived to consider termination; conversely, substantial treatment targets still in evidence indicate a need for continuing therapy and suggest the directions that further treatment should take. Additionally, if baseline testing is available for comparison purposes, currently absent or present targets clarify the nature and amount of progress that has been made in the treatment and reveal whether and how the original treatment goals should be modified. Because of the practical difficulties of mounting large-scale and long-term Rorschach follow-up studies of progress and outcome in psychotherapy, research in this area has traditionally been limited. The previously mentioned Blatt and Ford and Exner projects provide good reason to expect that appropriately designed investigations will demonstrate the utility of the Rorschach for monitoring change in psychotherapy. In addition, Weiner and Exner (1991) and Exner and Andronikof-Sanglade (1992) successfully employed a Rorschach methodology that identified amount and rate of change in a variety of treatment targets over varying durations of psychotherapy. In this research, Weiner and Exner reported the results of sequential Rorschach examinations over a 4-year period of patients in long- and short-term psychotherapy who were examined on entering treatment and on three subsequent occasions, the last coming when all of the 88 short-term patients and two thirds of the 88 long-term patients in the study had terminated their treatment. For the purposes of this study, a conceptual analysis of probable Rorschach indices of adjustment difficulty was used to identify 27 treatment target variables as potential measures of improvement in treatment. 
Over the 4 years of the study, 24 of these 27 Rorschach indices of adjustment difficulty became significantly less frequent in the records of the long-term therapy patients, 15 of them within the first year. Among the short-term therapy patients, 20 of the indices became significantly less frequent, 18 within the first year. These 27 indices and their maladaptive corollaries are listed in Table 36.1. In the second study, Exner and Andronikof-Sanglade examined the same 27 adjustment indices in pretherapy and posttherapy Rorschachs of 35 short-term therapy patients (seen for an average of 47 weekly sessions) and 35 patients seen in brief therapy (averaging 14.2 sessions on a once-per-week basis). Like Weiner and Exner's short-term therapy subjects, the short-term group in this second study showed significant decline in the frequency of 20 of the 27 indices, and the brief therapy group had a significantly lower frequency in 12 of the indices. Decreasing frequency of these Rorschach indices has also been reported by Abraham, Lepisto, Lewis, Schultz, and Finkelberg (1994) to reflect positive personality changes over a 2-year period among 50 adolescents being treated in a residential treatment facility. Taken together, the significant diminution of these Rorschach indices of adjustment difficulty and corresponding treatment targets contributes to validating psychotherapy as an agent of positive change and the Rorschach as a measure of such change. Additionally, the more numerous changes demonstrated by the long-term patients (24) than by the short-term (20) or the brief (12) therapy patients are concordant with the general finding in psychotherapy research that the longer patients stay in treatment, the more they improve (Orlinsky, Grawe, & Parks, 1994, p. 352). In the individual case, clinicians assessing progress in psychotherapy with the Rorschach need to have in mind some guidelines for what constitutes significant change.
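The comparison of baseline and follow-up targets described above can be sketched as set arithmetic over threshold checks. This is a minimal illustration in Python under stated assumptions: it encodes only a handful of the Table 36.1 criteria, the record keys (`"D"`, `"AdjD"`, and so on) are hypothetical names chosen for the sketch, and the two records hold invented values rather than data from either study.

```python
from typing import Callable, Dict, Set

Record = Dict[str, float]

# Illustrative subset of the Table 36.1 criteria (not the full list of 27).
CRITERIA: Dict[str, Callable[[Record], bool]] = {
    "D < 0": lambda r: r["D"] < 0,
    "AdjD < 0": lambda r: r["AdjD"] < 0,
    "EA < 7": lambda r: r["EA"] < 7,
    "CDI > 3": lambda r: r["CDI"] > 3,
    "Afr < .50": lambda r: r["Afr"] < 0.50,
    "M- > 0": lambda r: r["M-"] > 0,
    "T = 0": lambda r: r["T"] == 0,
    "T > 1": lambda r: r["T"] > 1,
}


def targets_present(record: Record) -> Set[str]:
    """Names of the criteria that the record currently meets."""
    return {name for name, met in CRITERIA.items() if met(record)}


def compare_protocols(baseline: Record, followup: Record) -> Dict[str, Set[str]]:
    """Split targets into resolved, persisting, and newly appearing."""
    before = targets_present(baseline)
    after = targets_present(followup)
    return {
        "resolved": before - after,
        "persisting": before & after,
        "new": after - before,
    }


# Invented values for illustration only.
baseline = {"D": -1, "AdjD": 0, "EA": 6.0, "CDI": 4, "Afr": 0.45, "M-": 1, "T": 0}
followup = {"D": 0, "AdjD": 0, "EA": 7.5, "CDI": 3, "Afr": 0.60, "M-": 1, "T": 1}

change = compare_protocols(baseline, followup)
print(sorted(change["resolved"]))    # targets no longer present
print(sorted(change["persisting"]))  # targets still calling for work
```

The persisting set corresponds to the targets that would remain the focus of continued therapy; a count of targets alone says nothing about how far each variable has moved.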
TABLE 36.1
Selected Rorschach Structural Variables Indicative of Adjustment Difficulty*

Variable                        Maladaptive Corollary
 1. D < 0 [a, b]                Subjectively felt distress resulting from inadequate resources to meet experienced demands
 2. AdjD < 0 [a, b]             Persistently felt distress extending beyond transient or situational difficulties in meeting experienced demands
 3. EA < 7 [a, b]               Limited resources for implementing deliberate strategies of resolving problematic situations
 4. CDI > 3 [a, b]              General deficit in capacities for coping with demands of daily living
 5. Ambitence [a, b]            Lack of commitment to a cohesive coping style leading to a personal sense of uncertainty
 6. Zd < -3.0 [a, b]            Insufficient attention to the nuances of one's experience, with superficial scanning of environmental events and hastily drawn conclusions about their significance
 7. Lambda > .99 [a, b]         Narrow and limited frames of reference and an inclination to respond to situations in the simplest possible terms
 8. X+% < 70 [a, b]             Inability or disinclination to perceive objects and events as most people would
 9. X-% > 20 [a, b]             Inaccurate perception of one's circumstances and faulty anticipation of the consequences of one's actions
10. SumSh > FM + m [a, b]       Negative emotional experiences of dysphoria, loneliness, helplessness, and/or self-denigration
11. DEPI = 5 [a, b]             Depressive concerns
12. DEPI > 5                    Likelihood of diagnosable depressive disorder
13. Afr < .50 [a, b]            Avoidance of emotional interchange with the environment and reluctance to become involved in affect-laden situations
14. CF + C > FC + 1 [a, b]      Overly intense feelings and unreserved expression of affect
15. Sum 6 Sp Sc > 6 [a]         Tendency toward loose and arbitrary thinking
16. M- > 0 [a, b]               Strange conceptions of the nature of human experience
17. Mp > Ma [a, b]              Excessive use of escapist fantasy as a replacement for constructive planning
18. Intellect > 5 [a]           Excessive reliance on intellectualization as a defensive measure
19. Fr + rF > 0                 Narcissistic glorification of oneself and tendencies to externalize blame
20. 3r + (2)/R > .43 [a]        Excessive self-focusing and preoccupation with oneself
21. 3r + (2)/R < .33 [a, b]     Low regard for oneself in comparison with others
22. FD > 2                      Unusual extent of introspection
23. p > a + 1 [a, b]            Passivity in relation to other people and an inclination to avoid taking initiative and responsibility
24. T = 0 [a]                   Lack of expectation or reaching out for close, psychologically intimate, nurturant, and mutually supportive relationships with others
25. T > 1 [a, b]                Unmet needs for close and comforting relationships with other people, leading to feelings of loneliness and deprivation
26. Pure H < 2 [a, b]           Disinterest in and/or difficulty identifying with other people
27. H < (H) + Hd + (Hd) [a, b]  Uneasiness in contemplating relationships with real, live, and fully functional people

*Based on research reported by Weiner and Exner (1991).
[a] Became significantly less frequent among patients receiving long-term psychotherapy.
[b] Became significantly less frequent among patients receiving short-term psychotherapy.

The limited database in this regard precludes any fixed formulas for determining desirable magnitudes of change in Rorschach variables that are identified as treatment targets. However, sound judgments about whether progress is occurring and about readiness

for termination can usually be made on the basis of whether targeted Rorschach variables have been brought within the normative range. For example, for a parametrically distributed variable such as the Affective Ratio (Afr), this would involve falling within one standard deviation of the mean; thus a subject with an Afr of .45 on entering therapy who later shows an Afr of .60 has made clinically significant improvement (the adult nonpatient mean for Afr is .69, and the standard deviation is .16, so the one-standard-deviation range begins at .53). For a nonparametrically distributed variable, progress would consist of movement toward median or modal expectation; thus, a subject with baseline findings of T = 0 (the normative median and mode are 1) and M- = 3 (the normative median and mode are 0) who subsequently shows T = 1 and M- = 1 is definitely improved but still has some work to do on social perception. Feedback concerning outcome assessments with the Rorschach should be presented in this same way (i.e., as a commentary on what the test suggests about progress and continuing need for change, if any) on each of the original treatment targets. As is also true in presenting evaluation feedback prior to beginning treatment, there is nothing to be gained from referring to specific Rorschach findings. The focus instead should be on the implications of the test findings for the subject's present status with respect to each of the personality weaknesses that were initially agreed on as treatment targets. When treatment is continued, special emphasis can then be given to identifying those aspects of personality functioning in which there has been little or no change and that will accordingly constitute primary targets for the next phase of the therapy.

Case Study

Ms. A is a 23-year-old single woman referred for psychological evaluation by her physician, who was treating her for spastic colon and believed that her recurrences of this condition were stress related.
She comes from a middle-class background, has completed 2 years of college, and is currently employed as a paralegal. She enjoys her work and does it well, and she is considering returning to college as a pre-law student. For the last 2 years, Ms. A has been involved with a succession of men who have taken advantage of her and


misled her concerning their intentions. One of these unsatisfactory relationships was with a college student with whom she had begun a live-in relationship that dissolved in bitterness after 4 months. "I couldn't believe I misjudged him so much," she said. "He was a slob, always demanding and never helping out. After a while I felt like I was his maid, and, even though I knew that, it took me almost a month before I mustered the courage to throw him out." In another instance, she had invited a 37-year-old separated man she was dating to move in with her. She terminated this relationship after 3 months when she learned he had been visiting his wife to seek a reconciliation. For the past 6 months, she says, concurrently with developing the spastic colon, "My life has been a mess." She is distracted at work, has difficulty sleeping, and has been suffering bouts of considerable pain. Although she has several acquaintances, the only person with whom she feels able to discuss her problems is a sister, who lives at a distance and is available only once or twice a week by telephone. With respect to entering treatment she says, "I'll do anything I have to straighten myself out." Ms. A's Rorschach protocol follows in Tables 36.2, 36.3, and 36.4 and Fig. 36.1.

Interpretation of First Protocol

Interpretation of Rorschach protocols in the Comprehensive System proceeds through a search strategy in which particular clusters of structural and thematic variables are examined in a specific sequence. The sequence for each individual protocol is determined on the basis of which are the most salient features of the record, as outlined in detail by Exner (1991, chaps. 5-9). Applying this strategy to Ms. A's Rorschach yields in approximate order of importance the following 16 features of her protocol that identify problems in personality functioning and that could constitute treatment targets:

1. Coping Deficit Index (CDI) = 4.
An elevated CDI indicates marked difficulty coping effectively with everyday demands of living, particularly with respect to capable and comfortable management of interpersonal relationships. Ms. A's treatment should accordingly include a substantial, if not primary, focus on interpersonal coping skills, in the expectation that more effective social coping, reflected in a lower CDI, would be of considerable benefit to her.

2. Adj D = -2. This finding signifies persistent and long-standing distress related to inability to muster sufficient personality resources to meet the demands of her everyday life. This imbalance, which holds the key to the subjectively felt distress she reports, seems due not to deficient personality resources in general (her EA of 9.5 is average), but to her experiencing many more stressful demands than most people (es of 17). A search for the specific source of this excessive stress identifies the next three more specific treatment targets.

3. C' = 4. This unusually frequent use of achromatic color signifies a heavy burden of painful and dysphoric internalized affect. She will feel and function much better if her therapy pays special attention to ways of lifting her spirits, and thereby reducing her C'.

4. V = 1. The presence of even one Vista response points to distressing self-critical attitudes. People in therapy who can be helped to look on themselves more favorably, and thereby to get Vista out of their record, are likely to feel that a great burden has been lifted from their shoulders.

5. T = 3. The presence of more than one T gives evidence of unmet needs for closeness to other people and usually signifies the kind of loneliness that people experience when a rupture of previously enjoyed relationships deprives them of love and affection they had come to expect. It will be important for Ms.
A's therapist to employ whatever strategies seem appropriate to help her establish some new loving and supportive relationships, at which point her T would be expected to diminish.

6. FC:CF + C = 1:3. This pattern of color use is much more typical of young children than mature adults. It suggests that she is an emotionally immature person who experiences and


TABLE 36.2
Rorschach Protocol 1: Ms. A Pre-Therapy

Card I, Response 1
Response: Oh my, skin of wild animal, I'll say a wolf, his face, he's angry.
  E: If you look longer I think you'll find something else.
Inquiry:
  E: (Repeats subject's response)
  S: Well it's like he's growling, the way his mouth seems curled up. The eyes are here, cheeks, ears, the eyes are downward. He certainly does look angry.

2

(H)  = 0, 0    YES .. Col-Shd Bl > 0
Hd   = 1, 0    YES .. Ego < .31, > .44
(Hd) = 1, 0    NO  .. MOR > 3
Hx   = 0, 0    YES .. Zd > +-3.5
A    = 4, 1    YES .. es > EA
(A)  = 1, 0    YES .. CF+C > FC
Ad   = 3, 0    NO  .. X+% < .70
(Ad) = 0, 0    YES .. S > 3
An   = 0, 0    NO  .. P < 3 or > 8
Art  = 1, 0    NO  .. Pure H < 2
Ay   = 0, 0    NO  .. R < 17
Bl   = 0, 1    6 ..... TOTAL
Bt   = 2, 1
Cg   = 0, 4
Cl   = 0, 0
Ex   = 0, 0
Fd   = 0, 0
Fi   = 0, 0
Ge   = 0, 0
Hh   = 1, 3
Ls   = 0, 1
Na   = 0, 3
Sc   = 1, 0
Sx   = 0, 0
Xy   = 0, 0
Id   = 0, 1

(2) = 5

Special Scorings
         Lv1    Lv2
DV   =   0x1    0x2
INC  =   0x2    0x4
DR   =   0x3    0x6
FAB  =   0x4    0x7
ALOG =   0x5
CON  =   0x7
Raw Sum6 = 0    Wgtd Sum6 = 0
AB  = 0    CP  = 0
AG  = 1    MOR = 1
CFB = 0    PER = 2
COP = 2    PSV = 0

Ratios, Percentages, and Derivations
R = 19    L = 0.06
EB = 6:3.5    EA = 9.5    EBPer = 1.7
eb = 8:9      es = 17     D = -2
              Adj es = 17 Adj D = -2
FM = 7    C' = 4    T = 3
m  = 1    V  = 1    Y = 1

FC:CF+C = 1:3        Pure C = 0          SumC':WSumC = 4:3.5
Afr = 0.36           S = 7               Blends:R = 10:19        CP = 0

COP = 2              AG = 1              Food = 0
Isolate/R = 0.53     H:(H)Hd(Hd) = 4:2   (HHd):(AAd) = 1:1       H+A:Hd+Ad = 10:5

a:p = 5:9            Ma:Mp = 3:3         2AB+Art+Ay = 1          M- = 1
Sum6 = 0             Lv2 = 0             WSum6 = 0               Mnone = 0

P = 5                X+% = 0.79          F+% = 0.00
X-% = 0.11           S-% = 1.00          Xu% = 0.11

Zf = 15              Zd = +9.0           W:D:Dd = 9:6:4
W:M = 9:6            DQ+ = 12            DQv = 1

3r+(2)/R = 0.74      Fr+rF = 3           FD = 1
An+Xy = 0            MOR = 1

SCZI = 1    DEPI = 5*    CDI = 4*    S-CON = 6    HVI = No    OBS = YES

Fig. 36.1. Location choices for Protocol 1.

expresses feelings in an overly intense and dramatic fashion and is highly changeable in her moods. She would benefit from being helped to modulate her affects a bit more than has been customary for her. This would involve becoming a little more restrained emotionally and should be accompanied by more FC in her Rorschach.

7. Affective Ratio (Afr) = .36. Probably because her lack of emotional restraint gets her into difficulties when she has to confront emotionally charged situations, she is in light of this low Afr inclined to back away from situations in which people are likely to exchange strong feelings. She accordingly needs help to feel more comfortable when affects emerge in social situations, in order for her to become more rewardingly engaged in interpersonal relationships, and a higher Afr is an important treatment target in her case.

8. S = 7. Although not apparent from the clinical history, a substantial amount of underlying anger and resentment is revealed by her unusually frequent use of white space. The interpersonal implications of these angry feelings will become more apparent in a moment, as this analysis proceeds. At this point, however, there is little doubt that she is troubled by a maladaptive extent of resentment toward others or toward her circumstances that needs to be eased in her treatment.

9 & 10. Reflections = 3 and Egocentricity Ratio = .74. These unusual elevations on these two variables, which are typically examined together, identify a stylistic pattern of self-centeredness involving an inflated sense of self-worth, a predilection to externalize blame, and feelings of entitlement. Such personality features often cause adjustment difficulties, because high-reflection, high-egocentricity people tend to be seen by others as selfish, manipulative, and narcissistic. In Ms. A's case, however, these self-aggrandizing features have to be understood in light of the


distress, depression, and self-critical attitudes already documented by the Rorschach data. It is not unusual clinically for people who basically feel very bad about themselves and their circumstances to ward off deep depression by mechanisms of denial that produce superficial manifestations of cheerfulness, enthusiasm, optimism, and self-love. Efforts to disabuse these people of such hypomanic defenses can do more harm than good, by precipitating depressive reactions and sometimes suicidal behavior as well. To the extent that Ms. A's Reflections and Egocentricity Ratio represent some necessary even though not entirely desirable defenses, they probably should be regarded not as treatment targets, but as adjustment problems to be left alone for the time being.

11. Thematic content. Analysis of the content of those responses most likely to involve projection (i.e., the minus form, human movement, and embellished responses) reveals four themes related to probable underlying concerns and attitudes. The most dramatic response in the record is No. 3, which is a morbid M-. This response begins as "A face of someone who's been hurt," which appears to capture self-image concerns about suffering past and possible future hurts in interpersonal relationships. Later, however, she turns the response around so that the apparent object of the hurt is not herself, but instead a male figure. To leave little doubt that she harbors fantasies of retaliation against men who have hurt her, she says that the male in this percept who is hurt and has blood on his chin has the same kind of beard as someone "I used to go with." The other minus response in the record (No. 8) captures a theme of figures ("two ducks") looking in opposite directions, which may speak to her interpersonal concerns about people looking the other way instead of paying sufficient attention to her. Another of her M responses (No. 6, "Two waiters cleaning off a table") suggests a third theme, her concern about being placed in the role of cleaning up after other people, which she expressed clearly in complaining about the boyfriend who treated her as a maid. Finally, her remaining Ms (Nos. 5, 10, 13, 17) capture as a fourth theme her self-centered concerns with making an attractive appearance and being admired for it. She pays considerable attention to details of clothing and reports percepts of a Las Vegas showgirl in a flowing costume (No. 10) and a girl "primping in the mirror" (No. 13).

12. a:p = 5:9. Continuing with the prominent interpersonal difficulties identified by Ms. A's Rorschach protocol, this finding identifies a passive stance in relation to others. She prefers to follow rather than lead, to have other people make decisions for her, and to let others take on responsibilities that should be hers. This passivity is apparent in her interpersonal history, and it combines with her previously noted anger and resentment to indicate the likelihood of passive-aggressive rather than effectively assertive ways of coping with problems. Thus, an important treatment target is identified, namely, to help her become less passive and more assertive.

13. Isolation Index = .53. This finding further identifies a currently barren interpersonal life in which her previously noted feelings of loneliness are accompanied by a life space in which she in fact has very few people in whom to confide or with whom to keep pleasant company. Thus, the need to focus her treatment on expanding her interpersonal involvements is emphasized.

14. Lambda = .06. This unusually low Lambda identifies limited capacity to deal with experience in a simple and objective manner. Instead, she is likely to be the kind of person who does things the hard way, makes a major production of minor events, and gets wrapped up in thoughts and feelings that serve little purpose. She would be a more relaxed, less preoccupied, and less excitable person if therapy could get her Lambda up, which would involve helping her to deal with life experiences in a psychologically more economical manner.

15. Zd = +9.0. Ms. A, as one feature of her difficulty being economical, likes to examine situations and consider options carefully before coming to conclusions or making decisions. Her excessive tendencies in this regard can lead to inefficiencies and delays in bringing projects to fruition. On the other hand, in some vocations, an unusual thoroughness and determination not to overlook anything can prove valuable, provided that there are no pressing deadlines to meet, and Ms. A, as a paralegal, may be in one of those vocations. Hence, this particular index of adjustment difficulty may signify in her case a personality orientation that should be left in place rather than established as a treatment target.

16. Obsessive Index is elevated. This indication of obsessive-compulsive style fits closely with the evidence of unusual thoroughness just noted. Although being obsessive can cause adjustment difficulties and constitute in some cases a major treatment target, this seems not to be such a


case. Ms. A's obsessive style may well be conducive to her functioning well in her job, and her work is an aspect of her life that is going well. Hence, this feature of her style should probably be left in place.

Finally, Ms. A's Rorschach can be examined for indications of the obstacles to progress in treatment that were noted earlier. Her a:p of 5:9 does not demonstrate rigidity; her D of -2 does not demonstrate self-satisfaction; her FD of 1 does not demonstrate nonintrospectiveness; and her T of 3 does not demonstrate interpersonal distancing. To the contrary, she appears to be a reasonably flexible, self-concerned, introspective, and interpersonally interested individual with good prospects for becoming beneficially engaged in a treatment relationship.

The systematic manner in which the Rorschach Comprehensive System interpretive search strategy identifies treatment targets provides a detailed basis for giving feedback to prospective therapy patients concerning their psychological needs and the probable focus of their therapy. This was done in Ms. A's case, and she was referred for psychotherapy, the nature and course of which are discussed next to illustrate treatment evaluation with the Rorschach.

Following her initial examination, Ms. A entered individual psychotherapy on a weekly basis. She was reexamined 11 months later, at which time her therapist indicated that, in accord with the pretherapy test findings, the treatment focus had been on social interaction, with attention both to romantic relationships with men and friendships with women. The therapist reported good progress in both of these areas. She has been dating regularly without becoming involved in any problematic relationships, and she has become more at ease in forming friendships with her female acquaintances. She is more comfortable than before in making decisions, and the problem with the spastic colon has become much less severe. Ms. A's second Rorschach protocol follows in Tables 36.5, 36.6, and 36.7 and Fig. 36.2.

Interpretation of Second Protocol

The use of these Rorschach findings to evaluate progress in treatment can be approached in terms of both the 16 original treatment targets listed previously and the 27 adjustment indices studied by Weiner and Exner. Beginning with the treatment targets, there are four of these, including the two originally considered first and second in order of importance, on which Ms. A is now functioning in the normal range, as defined by the nonpatient norms. Specifically, her CDI has gone from 4 to 1 (Target 1), and her AdjD from -2 to 0 (Target 2), which taken together indicate that she is feeling much less distressed than before and much more capable of coping with daily life events, especially social interactions. Moreover, her a:p ratio has gone from 5:9 to an acceptable 8:7 (Target 12), and her Isolation Index has dropped from .53 to a clinically unremarkable .18 (Target 13). Hence, she now seems capable of avoiding maladaptive passivity in her interpersonal relationships, of making decisions on her own, and of populating her life with a sufficient number of friends and companions. The consistency among these dramatic Rorschach changes, the major goals of the therapy, and the therapist's report of behavioral change serves to validate both the accuracy of the personality assessment and the efficacy of the treatment.
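The four targets that moved into the normal range can be laid out as a simple pre/post comparison. The following is a minimal sketch in Python and is not part of the original chapter. The class and function names are hypothetical, and the values are the ones quoted in the paragraph above. No normative cutoffs are encoded, because the chapter does not state them at this point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetChange:
    """One treatment target with its pretherapy and 11-month retest value."""
    target: int      # target number used in the chapter
    variable: str    # Comprehensive System variable name
    pre: float
    post: float

    @property
    def delta(self) -> float:
        # Rounded so that decimal indices such as .53 -> .18 print cleanly.
        return round(self.post - self.pre, 2)


# Scalar targets reported as normalized at retest (values from the text).
CHANGES = [
    TargetChange(1, "CDI", 4, 1),
    TargetChange(2, "AdjD", -2, 0),
    TargetChange(13, "Isolation Index", 0.53, 0.18),
]

# Target 12 is a ratio (active:passive), so it is kept as a pair.
AP_PRE = (5, 9)
AP_POST = (8, 7)


def passive_excess(ap: tuple[int, int]) -> int:
    """Passive minus active movement; a positive value means passive predominates."""
    active, passive = ap
    return passive - active


if __name__ == "__main__":
    for change in CHANGES:
        print(f"Target {change.target} ({change.variable}): "
              f"{change.pre} -> {change.post} (change {change.delta:+})")
    print(f"Target 12 (a:p): passive excess {passive_excess(AP_PRE)} -> "
          f"{passive_excess(AP_POST)}")
```

Keeping a:p as a pair rather than a single quotient preserves the two counts the interpretation actually compares.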


On six other treatment targets, Ms. A shows some improvement, but is still responding in a clinically significant manner (Nos. 3, 5, 6, 7, 8, and 14). Thus she is less dysphoric than before (C' from 4 to 3); she is less lonely and emotionally deprived (T from 3 to 2); she is modulating her affect in a more mature way (FC:CF + C from 1:3 to 3:3); she feels more comfortable dealing with affective exchange (Afr from .36 to .57); she is less angry and resentful (S from 7 to 3); and she is more capable of dealing with situations in economical, uncomplicated ways (Lambda from .06 to .57). Whereas each of these changes indicates progress in treatment, her present levels of functioning still exceed normative expectation and identify needs for further treatment.

Ms. A also demonstrates some change on three structural variables that were initially regarded as potentially problematic, but also as features of her defensive style that might best be left in place (Targets 9, 10, and 15). Thus she is a little less self-centered and narcissistic than before (Reflections from 3 to 1; Egocentricity Ratio from .74 to .55) and a little less overincorporative (Zd from +9.0 to +7.0), although compared to adult nonpatients she remains elevated on all three variables. In this same vein, she still demonstrates

TABLE 36.5
Rorschach Protocol 2: Ms. A Post-Therapy

Card I

1. Response: I'll say a butterfly, are these the same ones from before? (E: Yes.)
   Inquiry:
   E: (Repeats subject's response)
   S: It has the wings and little antennae and the divided tail and it has white markings on it, the wings are spread like it is flying.

2. Response: Down here it looks like the outline of a woman, standing behind something, you only see her outline.
   Inquiry:
   E: (Repeats subject's response)
   S: Well, it's her outline, see the legs and her waist and her arms, her head isn't obvious, it's like she's standing behind something that you can see through, maybe a curtain or something.
   E: I'm not sure where the curtain is.
   S: It's this part here (Dd24), she's behind it.

Card II

3. Response: This is the one with the blood, I remember it looks like two people fighting and they've got blood around their heads and legs.
   Inquiry:
   E: (Repeats subject's response)
   S: It's like two men in a fist fight, see they have their fists colliding here (D4), they're big men and apparently they're both hurt because of the blood on them, see blood on their heads, the red and down here around their legs.

4. Response: Just this part looks like a dog's head, one on each side.
   Inquiry:
   E: (Repeats subject's response)
   S: Just the sketch of the head of a dog, see the nose and the ear, it just has the shape of the head of a dog, not real, just a sketch.

Card III

5. Response: This is the dancers, like in a musical. They're dressed the same.
   Inquiry:
   E: (Repeats subject's response)
   S: One on each side (D9), two women I'd say, the breasts are pretty apparent, see the head and they have rather large noses and their hair is pulled tight, and here's the leg and arm and they seem to have white belts or sashes and high-heels. They're dressed identically, probably because they're in the chorus.

6. Response: I don't remember the red, these two remind me of guitars.
   Inquiry:
   E: (Repeats subject's response)
   S: They're shaped like those crazy electric guitars that the punkers use. I went to one of those concerts not long ago and thought I'd go deaf, they all had crazy looking guitars like these.

Card IV

7. Response: This looks like the monster from the lagoon all covered with mud, like he's standing there and you're looking up at him.
   Inquiry:
   E: (Repeats subject's response)
   S: It's like you're looking up at him, he's got big feet and his arms are smaller and his head up here and he's covered with all that gooey mud, wet looking.
   E: Wet looking?
   S: He just looks wet, I suppose the shades give that impression, like mud, it creates a lot of irregularity to the outline too.

v8. Response: This way it's like a badge.
   Inquiry:
   E: (Repeats subject's response)
   S: Like a pilot's badge or someone who works for the airline, see these wb wings and then some other symbol in the middle.

Card V

9. Response: Another butterfly, this one is flying too.
   Inquiry:
   E: (Repeats subject's response)
   S: It has antenna, wings, and a tail, a split tail like some butterflies have those, the wings are spread out as if it is flying.

10. Response: I remember, a woman in a show, she has a long trailing costume.
   Inquiry:
   E: (Repeats subject's response)
   S: Her legs, she has a headdress on, her arms up, like a showgirl in a casino show, she has a costume that flows down as she walks, huge plumes, big fancy things, like the showgirls wear.

Card VI

11. Response: It looks Indian, like an Indian robe, or maybe a blanket, all fur.
   Inquiry:
   E: (Repeats subject's response)
   S: Not this top (D8), just the rest of it, it looks like some furry animal skin that has been cut so that it can be used as a blanket or robe.
   E: You said it's all fur?
   S: Oh yes, the shades definitely give that impression.

v12. Response: This way it can be a fancy mirror, an antique one, I'm not counting this on the handle (Dd22), or these out here (Dd24).
   Inquiry:
   E: (Repeats subject's response)
   S: This would be the handle (D6) and the rest is the frame of the mirror, you don't see the glass, it's like the back of it with the carving showing.
   E: The carving showing?
   S: It looks like it has grooves or designs carved into it, bumpy like, those fancy mirrors used to be like that on the back.

Card VII

13. Response: Oh, the little girl looking in the mirror, she's sitting on a cushion.
   Inquiry:
   E: (Repeats subject's response)
   S: She's looking in the mirror, here's her nose, kinda upturned, her forehead, and chin, she has a ponytail, she has a big bow on the back of her dress (Dd21) and she's sitting on this cushion down here (Dd23) and over here (right side) is her image in the mirror, just the same.

>14. Response: This looks like a terrier when I hold it this way, a little one just standing there.
   Inquiry:
   E: (Repeats subject's response)
   S: He has a long tail, see the flat nose and the little ear, that dark spot is his eye and here are his little feet, I have a friend who has one like this.

Card VIII

15. Response: It looks like two animals climbing up a tree.
   Inquiry:
   E: (Repeats subject's response)
   S: The two animals on the sides, like beavers, I don't know if they climb trees but I guess that they can, see their legs and the head and body and this is some kind of tree, see the top is up here and the branches go out, they're grabbing a branch and this center is the other branches and down here is the ground, they're just standing up to see if there is any food on top.

16. Response: The blue taken alone can be a kite.
   Inquiry:
   E: (Repeats subject's response)
   S: It's like rectangular wings, I've seen some kites like this, the wings spread out in flight and here you can see where the string connects to it, see this is the string and it goes down to here.

Card IX

17. Response: The pretty orange and white flower. I remember this one.
   Inquiry:
   E: (Repeats subject's response)
   S: The flower is all of the top and center, it has petals in the center, orange and white petals (points), here is the stem going up through the center and these big green leaves and the pot is down here, this red, you just see the top of it.

18. Response: If I just use the orange it looks like two clowns in the circus, like they're joshing with each other, doing an act.
   Inquiry:
   E: (Repeats subject's response)
   S: They have pointed hats and it looks like they're pointing these long sticks at each other, like pretending to duel, you know they sometimes carry those firecracker sticks (Dd34) chasing each other around, they're dressed in orange suits.

Card X

19. Response: The two pixies in their jammies and nightcaps, looking at each other.
   Inquiry:
   E: (Repeats subject's response)
   S: Here, the pink, the outline of their two heads is fairly precise, the forehead and nose and chin and their pointed nightcaps, but the rest of them is concealed like they're in those Dr. Denton kind of one piece pajama suits, pink, I bought them for my sister's little girls.

20. Response: I remember this part, it looks like the headset for the Walkman, that I use when I jog.
   Inquiry:
   E: (Repeats subject's response)
   S: Right here, the earpieces and this is the part that connects them.

v21. Response: This part looks like a person waving two big green things, like doing a Dragon dance, like for Chinese New Year.
   Inquiry:
   E: (Repeats subject's response)
   S: Right here (D5), you can see his outline, the legs and the head and body and he's holding up these two big green things like paper dragons, like doing a dance like they do when they celebrate the Chinese New Year, see the dragon heads here.

v22. Response: It all reminds me of a painting of a bouquet of flowers.
   Inquiry:
   E: (Repeats subject's response)
   S: Actually more like a floral arrangement with all different colored flowers arranged very nicely, two big pink ones in the center and then all the others around them, blues and yellows.

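TABLE 36.6
Sequence of Scores for Protocol 2

Card | No. | Location | Loc. No. | Determinant(s) | (2) | Content(s) | Pop | Z | Special Scores
I | 1 | WSo | 1 | FC'.FMao | | A | P | 3.5 |
I | 2 | Dd+ | 24 | Mp.FVo | | H,Hh | | 4.0 |
II | 3 | W+ | 1 | Ma.CFo | 2 | H,Bl | | 4.5 | AG,MOR
II | 4 | Ddo | 21 | Fo | | Art,(Ad) | | |
III | 5 | DS+ | 9 | Ma.FC'+ | 2 | H,Cg | P | 4.0 | COP
III | 6 | Do | | Fu | 2 | Sc | | | PER
IV | 7 | W+ | 1 | Mp.FD.FTo | | (H),Ls | P | 4.0 |
IV | 8 | Wo | 1 | Fo | | Art | | 2.0 |
V | 9 | Wo | 1 | FMao | | A | P | 1.0 |
V | 10 | W+ | 1 | Ma.mpo | | H,Cg | | 2.5 |
VI | 11 | Do | 1 | FTo | | Ad,Ay | P | | MOR
VI | 12 | Ddo | 99 | FVu | | Ay,Hh | | |
VII | 13 | W+ | 1 | Mp.Fr+ | | H,Cg,Hh | P | 2.5 |
VII | 14 | Do | 2 | FMpo | | A | | | PER
VIII | 15 | W+ | 1 | FMao | 2 | A,Bt | P | 4.5 | PER
VIII | 16 | Dd+ | 99 | mpu | | Sc | | 3.0 |
IX | 17 | WS+ | 1 | CF+ | | Bt,Hh | | 5.5 |
IX | 18 | D+ | 3 | Ma.FCo | 2 | (H),Cg,Sc | P | 4.5 | COP
X | 19 | D+ | 9 | Mp.FCo | 2 | (H),Cg | | 4.5 | PER,DV
X | 20 | Do | 3 | Fu | | Sc | | | PER
X | 21 | D+ | 5 | Ma.FCu | 2 | H,Art,(A) | | 4.0 |
X | 22 | W+ | 1 | CFo | 2 | Art,Bt | | 5.5 |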
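As a cross-check on the sequence of scores, the organizational-activity totals can be recomputed from the Z column. The sketch below is illustrative only and is not part of the original chapter. The assignment of each Z value to a response number is read from the flattened table and should be treated as an assumption, and ZEst = 52.5 is taken from the Structural Summary (Table 36.7). The function name is hypothetical.

```python
# Z scores keyed by response number, as read from Table 36.6.
# Assumption: the alignment of Z values to responses is reconstructed
# from a flattened table; responses without a Z score are simply omitted.
Z_SCORES = {
    1: 3.5, 2: 4.0, 3: 4.5, 5: 4.0, 7: 4.0, 8: 2.0, 9: 1.0, 10: 2.5,
    13: 2.5, 15: 4.5, 16: 3.0, 17: 5.5, 18: 4.5, 19: 4.5, 21: 4.0, 22: 5.5,
}

# Expected Z sum for this Zf, as reported in Table 36.7.
Z_EST = 52.5


def z_summary(z_scores: dict[int, float], z_est: float) -> tuple[int, float, float]:
    """Return (Zf, ZSum, Zd), where Zd = ZSum - ZEst."""
    zf = len(z_scores)               # number of responses given a Z score
    zsum = sum(z_scores.values())    # total organizational activity
    return zf, zsum, zsum - z_est


zf, zsum, zd = z_summary(Z_SCORES, Z_EST)

if __name__ == "__main__":
    print(f"Zf = {zf}, ZSum = {zsum}, Zd = {zd:+.1f}")
```

Under this alignment the totals agree with the Zf, ZSum, and Zd entries of the Structural Summary.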


TABLE 36.7
Structural Summary for Protocol 2

Location Features: Zf = 16; ZSum = 59.5; ZEst = 52.5; W = 10; (Wv = 0); D = 8; Dd = 4; S = 3

DQ, with FQ- count in parentheses: + = 13 (0); o = 9 (0); v/+ = 0 (0); v = 0 (0)

Determinants, Blends: FC'.FM; M.FV; M.CF; M.FC'; M.FD.FT; M.m; M.Fr; M.FC; M.FC; M.FC

Determinants, Single: M = 0; FM = 3; m = 1; FC = 0; CF = 2; C = 0; Cn = 0; FC' = 0; C'F = 0; C' = 0; FT = 1; TF = 0; T = 0; FV = 1; VF = 0; V = 0; FY = 0; YF = 0; Y = 0; Fr = 0; rF = 0; FD = 0; F = 4

(2) = 8

Form Quality (FQx, FQf, MQual, SQx):
+: 3, 0, 2, 2
o: 14, 2, 6, 1
u: 5, 2, 1, 0
-: 0, 0, 0, 0
none: 0, (not applicable), 0, 0

Contents (primary, secondary): H = 6, 0; (H) = 3, 0; Hd = 0, 0; (Hd) = 0, 0; Hx = 0, 0; A = 4, 0; (A) = 0, 1; Ad = 1, 0; (Ad) = 0, 1; An = 0, 0; Art = 3, 1; Ay = 1, 1; Bl = 0, 1; Bt = 1, 2; Cg = 0, 5; Cl = 0, 0; Ex = 0, 0; Fd = 0, 0; Fi = 0, 0; Ge = 0, 0; Hh = 0, 4; Ls = 0, 1; Na = 0, 0; Sc = 3, 1; Sx = 0, 0; Xy = 0, 0; Id = 0, 0

S-Constellation:
YES .. FV+VF+V+FD > 2
NO .. Col-Shd Bl > 0
YES .. Ego < .31, > .44
NO .. MOR > 3
YES .. Zd > +- 3.5
NO .. es > EA
NO .. CF+C > FC
NO .. X+% < .70
NO .. S > 3
NO .. P < 3 or > 8
NO .. Pure H < 2
NO .. R < 17
3 ..... TOTAL

Special Scorings (Lv1, Lv2): DV = 1x1, 0x2; INC = 0x2, 0x4; DR = 0x3, 0x6; FAB = 0x4, 0x7; ALOG = 0x5; CON = 0x7; Raw Sum6 = 1; Wgtd Sum6 = 1
AB = 0; AG = 1; CFB = 0; COP = 2; CP = 0; MOR = 2; PER = 5; PSV = 0

Ratios, Percentages, and Derivations

R = 22; L = 0.22
EB = 9:4.5; EA = 13.5; EBPer = 2.0
eb = 6:6; es = 12; D = 0; Adj es = 11; Adj D = 0
FM = 4; m = 2; C' = 2; V = 2; T = 2; Y = 0
FC:CF+C = 3:3; Pure C = 0; SumC':WSumC = 2:4.5; Afr = 0.57; S = 3; Blends:R = 10:22; CP = 0
COP = 2; AG = 1; Food = 0; Isolate/R = 0.18; H:(H)Hd(Hd) = 6:3; (HHd):(AAd) = 3:2; H+A:Hd+Ad = 14:2
a:p = 8:7; Ma:Mp = 5:4; 2AB+Art+Ay = 6; M- = 0; Mnone = 0; Sum6 = 1; Lv2 = 0; WSum6 = 1
P = 8; X+% = 0.77; F+% = 0.50; X-% = 0.00; S-% = 0.00; Xu% = 0.23
Zf = 16; Zd = +7.0; W:D:Dd = 10:8:4; W:M = 10:9; DQ+ = 13; DQv = 0
3r+(2)/R = 0.50; Fr+rF = 1; FD = 1; An+Xy = 0; MOR = 2
SCZI = 0; DEPI = 3; CDI = 1; S-CON = 3; HVI = No; OBS = Yes
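Several of the derived entries in the Structural Summary can be reproduced from the raw counts in the same table. The following Python sketch is not part of the original chapter. Only the egocentricity formula, 3r+(2)/R, is written out in the table itself. The other formulas are the usual Comprehensive System definitions as commonly given and should be read as assumptions here, as should the function names and the per-card response counts, which are taken from Table 36.5.

```python
# Raw counts from Table 36.7 (Protocol 2).
R = 22                      # total responses
PURE_F = 4                  # pure form responses
M = 9                       # human movement (left side of EB)
FC, CF, C = 3, 3, 0         # chromatic color determinants
REFLECTIONS, PAIRS = 1, 8   # Fr+rF and (2)
BT, CL, GE, LS, NA = 3, 0, 0, 1, 0   # primary + secondary content counts
FQ_PLUS, FQ_ORDINARY = 3, 14
EB_LEFT, EB_RIGHT = 6, 6    # eb = (FM + m) : (C' + T + V + Y)
Z_SUM, Z_EST = 59.5, 52.5

# Responses per card group, counted from Table 36.5 (assumption).
LAST_THREE_CARDS, FIRST_SEVEN_CARDS = 8, 14


def lambda_ratio(pure_f: int, r: int) -> float:
    """Pure F responses relative to all other responses (assumed definition)."""
    return pure_f / (r - pure_f)


def weighted_sum_c(fc: int, cf: int, c: int) -> float:
    """WSumC with the usual 0.5 / 1.0 / 1.5 weights (assumed definition)."""
    return 0.5 * fc + 1.0 * cf + 1.5 * c


def egocentricity(reflections: int, pairs: int, r: int) -> float:
    """3r+(2)/R, as labeled in the table."""
    return (3 * reflections + pairs) / r


def isolation(bt: int, cl: int, ge: int, ls: int, na: int, r: int) -> float:
    """Isolate/R with Cl and Na double-weighted (assumed definition)."""
    return (bt + 2 * cl + ge + ls + 2 * na) / r


results = {
    "L": round(lambda_ratio(PURE_F, R), 2),
    "WSumC": weighted_sum_c(FC, CF, C),
    "EA": M + weighted_sum_c(FC, CF, C),
    "es": EB_LEFT + EB_RIGHT,
    "Afr": round(LAST_THREE_CARDS / FIRST_SEVEN_CARDS, 2),
    "Ego": round(egocentricity(REFLECTIONS, PAIRS, R), 2),
    "Isolate/R": round(isolation(BT, CL, GE, LS, NA, R), 2),
    "X+%": round((FQ_PLUS + FQ_ORDINARY) / R, 2),
    "Zd": Z_SUM - Z_EST,
}

if __name__ == "__main__":
    for name, value in results.items():
        print(f"{name} = {value}")
```

These values reproduce the corresponding Table 36.7 entries. Where the running text quotes a different figure for the same variable, the sketch follows the table.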


Fig. 36.2. Location choices for Protocol 2.

the obsessiveness noted previously (Target 16). These slight changes suggest that some easing or relaxation of her defensive style may be taking place, but they are modest enough to be consistent with the initial impression of her need to retain these defenses and with the fact that her therapist has not been challenging them.

Ms. A's thematic content shows both changes and consistencies of interest over the 11-month retest interval. The most dramatic response in the first protocol, "A face of someone who's been hurt," has now become a response in which two men are in a fist fight, with blood around them, and both are hurt. The preoccupation with people hurting each other remains, but now both are suffering instead of just one party being hurt. The improved structural features of this response bear further witness to her improved capacity to deal comfortably with such interpersonal tensions: the previously inaccurate percept (minus form level) has now become a commonly seen percept (ordinary form level), and the partial object representation of just the face has been replaced by a percept involving whole people.

As for other responses in this second record that are likely to involve projection, there are no minus responses to examine, but there are eight human movement responses in addition to the men fighting. Five of these M responses continue the theme from the first record of admiring herself and showing herself off to others to receive their admiration. Thus, she sees dancers in a musical (No. 5), women in a show wearing costumes (No. 10), a girl looking at herself in a mirror (No. 13), a clown doing an act


in a circus (No. 18), and a person doing a dance in celebration (No. 21). Like her elevated Reflections and Egocentricity Ratio, this persistent thematic content appears to identify a characterological reliance on narcissistic or hypomanic mechanisms that is not likely to change very much and perhaps should not be disrupted by her therapy.

The one treatment target on which there has been a change for the worse involves Vista, which has increased from 1 to 2. Although having 2 V's in one's record indicates an undesirable and discomfiting state of affairs, an increase in painful self-examination during the course of psychotherapy is not unexpected and may even be an indication that patients are working hard to examine features of themselves or their circumstances that they would rather not have to confront. Although Weiner and Exner (1991) did not examine V in their study, they did find that Dimensionality (FD), an index of introspectiveness (see Table 36.4), increased among both of their samples during the first year of treatment before decreasing later on to its baseline frequency.

Finally, a brief examination of Ms. A's two protocols with respect to the adjustment indices listed in Table 36.4 elucidates both her improvement and her need for continued treatment. Her pretherapy Rorschach is positive for 12 of the 27 indices of adjustment difficulty (Nos. 1, 2, 4, 10, 11, 13, 14, 16, 19, 20, 23, and 25), with 7 of these relating to difficulties in managing stress and modulating affect. Her second protocol after 11 months of therapy is positive for only 5 of the indices (Nos. 10, 18, 19, 20, and 25), with none of these relating to stress management and only one to affect modulation. However, she does continue to show some noteworthy emotional distress (No. 10), the self-centeredness and self-glorification already noted (Nos. 19-20), and continuing unmet needs for closeness (No. 25). In addition, she is now demonstrating some excessive reliance on intellectualization as a defense (No. 18), which constitutes an addition to the defensive repertoire she displayed initially.

Conclusions

A review of this chapter reveals that it is relatively long on conceptual underpinnings and clinical applications, and relatively short on empirical findings. Only in recent years have standardized procedures for coding and examining Rorschach protocols been applied in studies of treatment planning and outcome evaluation. Well-conceived and carefully designed research has nevertheless begun to demonstrate the utility of Rorschach assessment in selecting treatment approaches to meet individual patients' needs and in monitoring progress and change in psychotherapy. At the same time, abundant data have documented the psychometric soundness of the Rorschach inkblot method and its validity as a measure of personality states and traits. To the extent that effective treatment planning and reliable assessment of treatment outcome can be based on aspects of personality functioning, there is every reason to believe that future research will provide further empirical confirmation of the clinical applications illustrated here.

Finally, with respect to the 11 criteria set forth by Newman and Ciarlo (1994) for selecting psychological instruments for treatment outcome assessment, the Rorschach Inkblot Method has few shortcomings. It is typically used to collect data from only a single subject in such assessments (the person being treated) rather than from multiple respondents, and it therefore provides only a portion of the information that can prove useful in these assessments. The Rorschach is also a complicated personality assessment


method that is relatively costly to utilize in terms of the professional time required to employ it adequately. On the other hand, the Rorschach method is readily teachable (see Weiner, 1997b) and can easily be understood by laypeople when it is properly explained to them by clinicians trained in doing so. The cost of the Rorschach relative to less complex measures is more than compensated by the breadth and depth of clinically relevant information that emerges from it. As for the remaining Newman and Ciarlo criteria, this chapter indicates that Rorschach assessment proceeds independently of psychological treatment and measures personality characteristics relevant to treatment concerns; that Rorschach variables have numerous objective referents and many implications for treatment-related processes, and the instrument as a whole has considerable psychometric strength; and that Rorschach findings lend themselves well to uncomplicated interpretation, effective feedback (see Finn, 1996), useful planning of clinical services, and formulation within a variety of theoretical and practical perspectives.

References

Abraham, P. P., Lepisto, B. L., Lewis, M. G., Schultz, L., & Finkelberg, S. (1994). An outcome study: Changes in Rorschach variables of adolescents in residential treatment. Journal of Personality Assessment, 62, 505-514.
Appelbaum, S. A. (1990). The relationship between assessment and psychotherapy. Journal of Personality Assessment, 54, 791-891.
Aronow, E., & Reznikoff, M. (1976). Rorschach content interpretation. New York: Grune & Stratton.
Atkinson, L., Quarrington, B., Alp, I. E., & Cyr, J. J. (1986). Rorschach validity: An empirical approach to the literature. Journal of Clinical Psychology, 42, 360-362.
Blatt, S. J. (1975). The validity of projective techniques and their clinical and research contributions. Journal of Personality Assessment, 39, 327-343.
Blatt, S. J., & Ford, R. Q. (1994). Therapeutic change. New York: Plenum.
Colson, D. R., Eyman, J. R., & Coyne, L. (1994). Rorschach correlates of treatment difficulty and of the therapeutic alliance in psychotherapy with female psychiatric patients. Bulletin of the Menninger Clinic, 58, 383-388.
Dawes, R. M. (1994). House of cards: Psychology and psychotherapy built on myth. New York: The Free Press.
Ellenberger, H. F. (1954). The life and work of Hermann Rorschach (1884-1922). Bulletin of the Menninger Clinic, 18, 173-219.
Exner, J. E., Jr. (1969). The Rorschach systems. New York: Grune & Stratton.
Exner, J. E., Jr. (1974). The Rorschach: A comprehensive system. New York: Wiley.
Exner, J. E., Jr. (1989). Searching for projection in the Rorschach. Journal of Personality Assessment, 53, 520-536.
Exner, J. E., Jr. (1991). The Rorschach: A comprehensive system: Vol. 2. Interpretation (2nd ed.). New York: Wiley.
Exner, J. E., Jr. (1993). The Rorschach: A comprehensive system: Vol. 1. Basic foundations (3rd ed.). New York: Wiley.
Exner, J. E., Jr. (1996). Critical bits and the Rorschach response process. Journal of Personality Assessment, 67, 464-477.
Exner, J. E., Jr., & Andronikof-Sanglade, A. (1992). Rorschach changes following brief and short-term therapy. Journal of Personality Assessment, 59, 59-71.
Exner, J. E., Jr., Thomas, E. A., & Mason, B. (1985). Children's Rorschachs: Description and prediction. Journal of Personality Assessment, 49, 13-20.
Exner, J. E., Jr., & Weiner, I. B. (1995). The Rorschach: A comprehensive system: Vol. 3. Assessment of children and adolescents (2nd ed.). New York: Wiley.
Finn, S. E. (1996). Assessment feedback integrating MMPI-2 and Rorschach findings. Journal of Personality Assessment, 67, 543-557.
Frank, L. K. (1939). Projective methods for the study of personality. Journal of Psychology, 8, 389-413.
Ganellen, R. J. (1996). Comparing the diagnostic efficiency of the MMPI, MCMI-II, and






Chapter 37
Butcher Treatment Planning Inventory (BTPI): An Objective Guide to Treatment Planning

Julia N. Perry
James N. Butcher
University of Minnesota

The traditions of objective personality assessment and psychotherapeutic intervention have been somewhat estranged from one another throughout much of their respective existences. This separation may be a byproduct of attitudes holding that the act of incorporating test data into the psychotherapeutic process may somehow unnecessarily bias therapists against their clients and thereby be detrimental to both treatment process and outcome. Only recently have assessment principles begun to be incorporated into the treatment planning process.

Over the past several decades, theorists have proposed the use of a variety of models for the purpose of treatment planning, such as Lazarus' Multimodal Therapy (1981, 1989), Beutler and Harwood's Systematic Treatment Planning (1995), and Makover's Hierarchical Treatment Planning (1992); however, the assessment processes these procedures necessitate are not always delineated in substantial detail, and there are often few provisions for how to obtain the needed information in a reliable manner. It is here that the benefit of objective assessment procedures becomes apparent, and there are several measures intended to expedite the treatment planning process. Some of these are specific to individual disorders or diagnostic groups. Others, such as the Butcher Treatment Planning Inventory (BTPI; Butcher, 1998), are intended to address a variety of clients and psychological problems. The BTPI was created for the purpose of "incorporating objectively derived, self-report information into the treatment process when a tactical therapeutic approach is being formulated and time is crucial" and is intended to "aid the therapist in obtaining, organizing, and employing relevant personality and symptomatic information early in the treatment process" (Butcher, 1998, p. xiii).
Overview

Summary of Development

The development and construction of the BTPI were initiated by two relatively straightforward observations: obtaining pertinent treatment-related information in the earliest


phases of therapy formulation can greatly benefit therapists in the process of designing treatment regimens that best suit the needs of their clients; and if one wants to obtain this type of information, the easiest way is simply to ask the clients themselves to provide it (Butcher, 1998). By intertwining the traditions of psychotherapeutic intervention and objective personality assessment, an instrument was created for the express purpose of aiding therapists in assessing the therapeutic climate and pinpointing possible impediments to the process of treatment, particularly as they relate to personality and symptomatology variables. The instrument was also informed by the concept of "assessment therapy" (Finn & Martin, 1997), a procedure in which the therapist uses a feedback model to review psychological test information with the client and thereby promote the process of behavioral change. Several key components of this model (e.g., that the therapist must obtain an accurate appraisal of the client's openness to and capacity for change and willingness to disclose information to others) were thereby incorporated into the BTPI. The BTPI is fundamentally atheoretical, as it is tied to no particular therapeutic orientation and instead is intended to highlight treatment-related variables that cut across the various theoretical models of psychological intervention. Nevertheless, as it is largely a product of clinical experience, it is informed by behavioral and cognitive-behavioral theoretical backgrounds. Scales in the inventory were both rationally and empirically developed in a manner that first involved identifying relevant constructs to assess, and the selection of those included in the inventory was based on clinical practice and the literature concerning treatment process and outcome (e.g., Beutler, 1995; Butcher & Herzog, 1982; Garfield, 1978; Koss & Butcher, 1986). 
Scale items were then rationally generated, with an eye toward creating a balance between brevity of completion time and adequate assessment of each construct of interest. A sample of normal individuals was administered the resulting inventory so that the initial scales could be refined and two empirically derived validity scales (discussed later) could be developed. The result of the test construction process was a behaviorally oriented, self-report measure consisting of 210 true-false items, which take approximately 30 minutes to complete under normal conditions. The BTPI items provide scores on 14 scales that fall into three clusters: validity indicators, treatment issues, and current psychological symptoms (see Table 37.1).

Scale Description

The four Cluster 1 scales assess the veracity of the respondent's self-report. Inconsistent Responding consists of 21 item pairs (Butcher, 1998). As one of the two empirically derived scales on the inventory, it assesses the degree to which the respondent has endorsed the item pairs in semantically inconsistent directions. For example, responses to the items "I feel much better now than I have in months" and "I feel much better now than I have in a long time" should either both be True or both be False. Individuals' raw scores on this scale indicate how many inconsistent responses they have made.

Overly Virtuous Self-Views is a 15-item scale measuring the tendency to present oneself in an unrealistically positive fashion, professing to be better adjusted than the average individual (Butcher, 1998). Clients for whom this scale's score is elevated are those who are attempting to present themselves as more dependable, capable, and trustworthy than the average individual. Thus, their self-reports are not likely to be credible depictions of their current functioning. "I have never met anyone I didn't like" (T) and "I always give 10% of my income to charity" (T) are items included on this scale.
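The Inconsistent Responding raw score described above is a count of item pairs answered in inconsistent directions, which can be illustrated with a minimal sketch. The item numbers and the `same_expected` flag are hypothetical placeholders, not the published scoring key; the text describes only pairs whose answers should match, so the opposite-keyed pair in the example is an assumption added for illustration.

```python
from typing import Dict, List, Tuple

# (item_a, item_b, same_expected): hypothetical item numbers, NOT the real BTPI key.
# same_expected=True means a consistent respondent answers both items the same way.
ItemPair = Tuple[int, int, bool]


def inconsistency_raw_score(responses: Dict[int, bool], pairs: List[ItemPair]) -> int:
    """Count the item pairs that were answered in inconsistent directions."""
    score = 0
    for item_a, item_b, same_expected in pairs:
        if item_a not in responses or item_b not in responses:
            continue  # assumption: pairs with an omitted item are not counted
        answered_same = responses[item_a] == responses[item_b]
        if answered_same != same_expected:
            score += 1
    return score


# Illustrative data only: True = "True" response, False = "False" response.
example_pairs: List[ItemPair] = [(3, 41, True), (12, 97, True), (20, 150, False)]
example_responses = {3: True, 41: True, 12: True, 97: False, 20: False, 150: False}

print(inconsistency_raw_score(example_responses, example_pairs))
```

In this made-up protocol the first pair is answered consistently, while the second and third pairs are not, so the raw score counts two inconsistent responses.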


TABLE 37.1
Summary Information for BTPI Scales

Scale                                     Number of Items   What It Measures

Cluster 1: Validity Indicators
Inconsistent Responding                   21 pairs          Consistency of response pattern
Overly Virtuous Self-Views                15                Tendency toward presenting an overly positive self-view
Exaggerated Problem Presentation          61                Likelihood of exaggerating symptom presentation
Closed-Mindedness                         19                Openness to new ways of thinking and behaving

Cluster 2: Treatment Issues
Problems in Relationship Formation        18                Vulnerability to developing relationship problems
Somatization of Conflict                  16                Channeling emotional conflicts into somatic symptoms
Low Expectation of Therapeutic Benefit    25                Attitudes toward therapy and its potential value
Self-Oriented/Narcissism                  19                Self-centered, narcissistic lifestyle
Perceived Lack of Environmental Support   17                Viewing environment as negative/punishing

Cluster 3: Current Symptoms
Depression                                18                Low mood and suicidality
Anxiety                                   15                Feelings of nervousness and tension
Anger-Out                                 16                Tendencies toward hostility, aggressiveness, and irritability
Anger-In                                  16                Tendencies toward internalization of anger and self-blame
Unusual Thinking                          15                Strange/delusional beliefs and unusual behaviors

The other empirically derived BTPI scale, Exaggerated Problem Presentation, is the longest scale on the inventory (Butcher, 1998). Containing 61 items, it gauges the degree to which the respondent is endorsing an unrealistically high number of symptoms that tend not to be endorsed by other psychotherapy clients. Clients with elevated scores on this scale are professing to experience an extreme number of mental health and physical complaints, and they consequently may feel completely overwhelmed by their difficulties at the present. Examples of items on this scale include "I never get irritated at other people even when they do something against me" (F) and "I feel very hopeless about other people I know" (T). The final Cluster 1 scale, Closed-Mindedness, evaluates the individual's tendency to resist self-disclosure and cognitive and behavioral change (Butcher, 1998). It is 19 items in length, and an elevated score on the scale is indicative of individuals who prefer not to talk about themselves and are reluctant to reveal personal information to others. In addition, high scorers on Closed-Mindedness report dislike for hearing others' views. They report feeling that others do not really know or understand them, and they prefer to keep it this way. Items on this scale include "It makes me uncomfortable to talk about myself" (T) and "I usually keep a good distance from others and do not express my feelings openly" (T). The second BTPI cluster consists of five scales assessing specific treatment-related issues. The 18-item Problems in Relationship Formation scale assesses lack of interpersonal trust, problems relating to other people, and a general vulnerability to developing relationship problems (Butcher, 1998). An elevated score on this scale is indicative of an individual who is unlikely to have many friends and may actually prefer being alone.


"I think most people make friends just to use them for their own benefit" (T) and "I am very hard to get to know" (T) are two representative items from this scale. Somatization of Conflict is 16 items long and measures the degree to which the respondent tends to develop somatic symptoms as a means of dealing with emotional conflict (Butcher, 1998). Individuals who score high on this scale are those who recount experiencing a high number of physical problems, such as headaches, stomach problems, and other bodily distress. Moreover, they report that their symptoms co-occur with their life difficulties. Examples of items on this scale include "I have been in poor health for some time now" (T) and "My worries are not affecting my physical health" (F). The 25-item Low Expectation of Therapeutic Benefit scale evaluates the respondent's skepticism about the appropriateness and value of therapy and of making substantive therapeutic changes (Butcher, 1998). High scorers on this scale report cynicism about the benefit of following the advice of others, including health care professionals, and a dislike for trying out new activities. As they are reluctant to receive feedback from others, they are not likely to comply with treatment suggestions. Items on this scale include "I don't really feel the need to make any changes in my life now" (T) and "I really like to try out new and different things" (F). Self-Oriented/Narcissism reflects self-centered and self-indulgent interpersonal styles, such as the tendency to be selfish in relationships (Butcher, 1998). Individuals who obtain elevated scores on this scale tend to view themselves very highly and report being very selfish in relationships, as others are viewed as being secondary to themselves. Coupled with this, there is likely to be a sense of entitlement and a feeling that individuals do not receive the attention they merit. 
"I deserve much better treatment than I usually get from other people" (T) and "I usually get a lot of compliments about how I look" (T) are items included on the Self-Oriented/Narcissism scale. The last treatment issues scale is the 17-item Perceived Lack of Environmental Support scale (Butcher, 1998). It assesses clients' impressions of their social environment, with a focus on the degree to which they feel lonely and emotionally distant from others. High scorers on this scale are acknowledging that they regard their lives as unpleasant and the people around them as nonsupportive. They may feel resentment toward others, whom they view as having let them down. Sample scale items are as follows: "I don't feel that my problems are understood by anyone I know" (T) and "My home life is filled with arguing and bickering" (T). Finally, the five Cluster 3 scales assess the client's current psychological symptoms. Depression, an 18-item measure, was developed to assess the client's current mood state (Butcher, 1998). An elevated Depression score indicates the presence of depressed affect and a lack of energy for daily activities. There may also be problems sleeping and a feeling that life is so unpleasant that it is difficult just to get by. It is recommended that high Depression scorers be evaluated for the presence of suicidal ideation. "I frequently find myself feeling sad these days" (T) and "I no longer enjoy living as I used to" (T) are two representative Depression scale items. The Anxiety scale is 15 items in length and measures feelings of fearfulness, nervousness, and tension and a proclivity for worrying (Butcher, 1998). Individuals who score high on Anxiety are likely to worry over even small matters, and their daily functioning may be impaired by the inability to make even minor decisions. Examples of items on this scale are "I am so tense at times that I can't sit still" (T) and "I do not have any worries or problems that I cannot solve myself" (F). 
The first of the two anger-related scales deals with the tendency to externalize these feelings. Specifically, Anger-Out reflects individuals' tendency to express anger in a


hostile and aggressive fashion and behave irritably toward others (Butcher, 1998). An elevated score on this scale is indicative of individuals who feel the world is filled with antagonism. High scorers are liable to feel very angry and resentful, which may lead them to strike out at others, particularly when they feel that they have been wronged. This sentiment is represented in endorsement of the item "I sometimes have thoughts of hitting or injuring someone else to get back at them" (T). Another item on this 16-item scale is "At times I am so tense that I feel I am going to explode" (T). Anger-In assesses the antithesis of the aforementioned anger response style (Butcher, 1998). Rather than measuring overt hostility, the 16 Anger-In items gauge the respondent's tendency to internalize ire. Here, an elevated score highlights a tendency toward passivity and intropunitiveness, as high scorers readily condemn themselves for problems, even when they are blameless. Low self-esteem is also liable to be present in an individual with an elevated Anger-In score, and this quality may lead to self-destructive and self-punishing behaviors. "I have gotten so angry with myself in the past that I have attempted to end my own life" (T) and "I am usually the person to blame when things go wrong" (T) are two representative items on this scale. Lastly, the 15-item Unusual Thinking scale (Butcher, 1998) gauges the degree to which the respondent exhibits strange behaviors and holds unusual beliefs. Individuals endorsing a high number of Unusual Thinking items report they have difficulty with accurate mental processing and recount that their minds are not working well. They may be very mistrustful and suspicious of those around them, and their thinking likely has a superstitious, magical quality to it. Extremely high scores probably indicate the presence of fully delusional belief systems. 
"I have proof that the government has spied on me in the past" (T) and "I sometimes hear voices or see visions that other people do not see" (T) are two items on the Unusual Thinking scale. In addition to the scores provided by the individual scales, two BTPI composite scores can also be calculated. The first of these is the General Pathology Composite, which is comprised of four of the current psychological symptoms scales: Depression, Anxiety, Anger-Out, and Anger-In. The nature of the symptoms included in the fifth Cluster 3 scale, Unusual Thinking, makes it more appropriate for inclusion in the Treatment Difficulty Composite, which is additionally made up of scores on Problems in Relationship Formation, Somatization of Conflict, Low Expectation of Therapeutic Benefit, Self-Oriented/Narcissism, and Perceived Lack of Environmental Support. Administration and Scoring The BTPI may be completed in the traditional paper-and-pencil manner and hand scored using a set of eight scoring templates. It may also be administered via on-screen test item presentation, whereby clients make their responses using a computer keyboard. A computer program then scores the inventory and plots scale and composite T-scores on a profile sheet. The same computer program that scores the instrument can also generate an interpretive report, further facilitating the inventory's use. The sources of the computer-based interpretive information include both empirical, externally validated data (based on the concurrent validity, predictive validity, and therapist rating data obtained in the normative and clinical samples discussed later) and theoretical hypotheses based on the assumption that endorsement of the BTPI items represents a direct communication between the client and therapist (Butcher, 1998). Three sets of linear T-scores (for men, women, and the combined male/female group) were developed to simplify the interpretive process for the 14 BTPI scales. The normative


group on which these scores were developed is discussed in a later section on the instrument's psychometric properties. Separate T-scores were also developed for interpretation of both of the composite scores.

Psychometric Properties

The Normative Study. The items on the BTPI were administered to a normative sample of 800 individuals (400 men and 400 women) from eight regions of the United States (Butcher, 1998). The sample was representative of the U.S. population on the bases of sex, age, education, and race/ethnicity.

A subset of 100 individuals in the normative sample were readministered the BTPI a median of 7 days after the initial testing. Test-retest reliability coefficients for the scales and composites ranged from .55 (for Inconsistent Responding, a measure of random answering patterns on which scores should not be constant over time) to .90, with most values being .80 or higher. Internal consistency reliabilities were in the high range with two exceptions: Inconsistent Responding, whose items are heterogeneous and therefore would not be expected to be psychometrically related to one another, and Low Expectation of Therapeutic Benefit, which is also theoretically complex and is not intended to be an internally consistent measure.

In addition to the BTPI, the participants in the normative study were administered questionnaires to gather information regarding life history, life functioning, and physical health, as well as a battery of other measures that included the Inventory of Interpersonal Problems (Horowitz, in press; Horowitz, Rosenberg, Baer, Ureño, & Villaseñor, 1988), the Beck Depression Inventory-II (Beck, Steer, & Brown, 1996), the Beck Anxiety Inventory (Beck & Steer, 1990), and the Symptom Checklist-90-R (Derogatis, 1994).
Concurrent validity analyses demonstrated that the BTPI scales were largely positively correlated with each of these and other measures of interpersonal relationships and psychological problems, with an expected configuration of convergent and discriminative validity.

The College Student Study. A separate study of the BTPI's psychometric properties was conducted using college students (Butcher, 1998; see also Butcher, Rouse, & Perry, 1998). This population of individuals was examined to investigate their attitudes with respect to psychotherapy and determine whether these young adults warranted their own set of norms on the instrument. The students comprising the sample were 379 undergraduates at the University of Minnesota who participated as part of an introductory-level psychology course. Linear T-scores were computed for the sample so that college students in psychotherapy could be compared with a similar population.

In an effort to examine the stability of the BTPI among college students, 100 of them were readministered the BTPI approximately 30 minutes after the initial administration. Test-retest reliability coefficients were uniformly high, with the majority of the scale values falling close to .90. Internal consistency estimates were also assessed within this sample, with values for all scales but Inconsistent Responding and Low Expectation of Therapeutic Benefit in the range of .70.

Two concurrent validity studies of the BTPI were conducted using 289 students from this college sample. In addition to the BTPI, 100 of them were administered the Inventory of Interpersonal Problems (Horowitz, in press; Horowitz et al., 1988), and another 150 completed the Minnesota Multiphasic Personality Inventory-2 (MMPI-2; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989). As was true of the normative sample, correlations between the BTPI scales and the Inventory of Interpersonal Problems octant and problem scales were largely positive.
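The linear T-scores mentioned for the normative and college samples follow the standard linear transformation T = 50 + 10(x - M)/SD, where M and SD are the raw-score mean and standard deviation of the chosen reference group. A minimal sketch follows; the function names and the norm values are hypothetical and are not the published BTPI norms.

```python
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def norms_from_sample(raw_scores: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and sample standard deviation of a reference group's raw scores."""
    return mean(raw_scores), stdev(raw_scores)


def linear_t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Convert a raw scale score to a linear T-score (reference mean 50, SD 10)."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


# Hypothetical (mean, SD) norms for ONE scale, keyed by reference group.
# These numbers are made up for illustration only.
example_norms: Dict[str, Tuple[float, float]] = {
    "men": (8.0, 4.0),
    "women": (9.0, 4.0),
    "combined": (8.5, 4.0),
}

raw_score = 12
for group, (m, sd) in example_norms.items():
    print(group, linear_t_score(raw_score, m, sd))
```

Because the transformation is linear, the same raw score maps to a different T-score under each reference group, which is why the choice among the men's, women's, combined, or college norms matters for interpretation.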
Examination of the relations between BTPI


scales and those on the MMPI-2 revealed several notable correlations. For example, Inconsistent Responding from the BTPI was positively and significantly correlated with the MMPI-2's Variable Response Inconsistency scale, which also assesses the degree of congruity of the response style. The BTPI's Unusual Thinking scale was significantly related to F, Fb, and Scale 8 (Schizophrenia) on the MMPI-2, demonstrating the overlap in these scales' assessment of unusual thinking patterns.

The 100 students who participated in the test-retest study were additionally asked to complete a brief "Therapy Experience Survey," and their responses to the seven items on this measure were examined in conjunction with the BTPI scores (Butcher et al., 1998). The items comprising the survey assessed such areas as whether the respondents had ever been in psychological treatment in the past, whether they had ever considered undergoing any psychological intervention, and whether they were willing to recommend psychotherapy to others whom they judged to be in need. This study's findings demonstrated the following: Students with no history of counseling had lower anticipations that therapy would be beneficial for them (i.e., higher scores on Low Expectation of Therapeutic Benefit) than those who had been in therapy in the past; endorsement of an unwillingness to engage in therapy was accompanied by elevated scores on Low Expectation of Therapeutic Benefit, Closed-Mindedness, and Problems in Relationship Formation; and women were more likely than men to have considered undertaking psychotherapy, actually undertaken psychotherapy, considered recommending psychotherapy to others, and gone through the process of recommending psychotherapy to others.

The Clinical Study. As the BTPI is intended to examine treatment-related issues among therapy clients, a study of individuals undergoing outpatient psychotherapy was also conducted (Butcher, 1998).
The participants in this study, the Minnesota Psychotherapy Assessment Project, were 460 clients undergoing therapy with 64 therapists in England and 13 states in a variety of regions around the United States. The most common diagnoses among the client participants were adjustment, affective, and anxiety disorders. The majority of the scales' internal consistency reliability estimates for the clinical sample ranged from .70 to .89. To further evaluate the potential usefulness of norms for therapy clients, a multivariate analysis of variance was performed to uncover scale score differences between the normative and clinical samples. For the most part, the clinical sample had higher ranging scores than the normative group, particularly on most of the psychological symptoms scales and on the General Pathology Composite; however, scores on the Overly Virtuous Self-Views and Low Expectation of Therapeutic Benefit scales were significantly lower for members of the clinical sample. These score differences reveal a greater willingness among the psychotherapy clients to admit to faults and a greater anticipation that therapy will be beneficial to their psychological adjustment. In addition to the BTPI, the participants in this study were asked to complete the MMPI-2, and their therapists generated a collection of external behavior correlates to describe their current functioning. Correlations between the BTPI and MMPI-2 scales generally reflected those found in the college sample. Significant correlations between therapist-rated behavior and BTPI scales included the following: dissociative symptoms with Problems in Relationship Formation; antisocial behavior with Low Expectation of Therapeutic Benefit; marital conflict with Perceived Lack of Environmental Support; self-mutilation and anorexia with Anger-In; and defensiveness with Anger-Out. The BTPI scales were factor-analyzed using the aggregate of the normative, college, and clinical samples (Butcher, 1998). 
After being submitted to a principal components analysis, followed by a normal varimax rotation, four factors were extracted. The first


factor was labeled Neuroticism, and it accounts for over 45% of the variance on the measure. The BTPI scales included on it are Exaggerated Problem Presentation, Perceived Lack of Environmental Support, Somatization of Conflict, Depression, Anxiety, Anger-Out, Anger-In, and Unusual Thinking. Factor II, Cynicism-Uncooperative, includes Problems in Relationship Formation, Inconsistent Responding, Closed-Mindedness, and Low Expectation of Therapeutic Benefit. Self-Oriented/Narcissism and Anger-Out (secondarily) load on Factor III, Self-Orientation/Anger. The sole scale loading on Factor IV (Defensiveness) is Overly Virtuous Self-Views.

Application of the BTPI for Treatment Planning

The BTPI is intended to be used at several possible points during the course of therapy. At the earliest stages of intervention, it can be administered to clients to assess the therapy climate and gauge potential aids and detriments to the psychotherapeutic process. For example, it can help therapists determine the extent to which their clients are motivated to make behavioral changes and the degree to which they believe that psychotherapy will be useful in assisting them to do so. The qualities assessed by the treatment issues scales (perception of the degree of environmental support, difficulties in forming and maintaining relationships, etc.) are particularly important to the initial process of treatment planning, as they highlight many of the areas that help determine the degree to which psychological intervention is likely to be beneficial. Additionally, the current psychological symptoms scales can point to key problem areas that should be addressed in treatment and thereby inform the treatment plan that subsequently is created. Importantly, clients' scores at the point of their initial testing can also serve as baseline measures to be compared with scores obtained later in the therapeutic process, providing a means by which to monitor the degree of change.
With the present influences of managed health care, the aforementioned benefits of using the BTPI are particularly important. As has been noted in the literature (e.g., Ben-Porath, 1997; Butcher, 1997), the use of psychological test data to quantify clients' intervention needs in a reliable and valid manner provides one solution to the problem of justifying the need for a particular set of therapeutic strategies and/or number of sessions prior to initiating treatment. With nonstructured clinical interviews as the most common alternative to objective data, there is obvious benefit in employing empirically guided decision-making strategies. Ben-Porath (1997) discussed other advantages to the use of psychological assessment methods like the BTPI in the treatment plan formulation stage. One of these is increased ease of automated test data interpretation, which has been shown to have greater validity than interpretation based on subjective clinical judgments (e.g., Dawes, Faust, & Meehl, 1989; Grove & Meehl, 1996; Meehl, 1954) and whose ultimate cost-effectiveness is apparent.

Treatment Monitoring with the BTPI

It is also increasingly important, in the age of managed care, for clinicians to be able to document progress and assess whether additional treatment is warranted at any given point in the therapeutic process. To the extent that changes in scale scores are indicative of reported attitudinal and behavioral transformations, reductions in score levels can be viewed as evidence of therapeutic progress. Similarly, a lack of substantial change in scores may be indicative of a failure to benefit from intervention thus far.
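The comparison of baseline and later scores described above can be sketched as a per-scale difference between two T-score profiles. The text does not state how large a change counts as substantial, so the `MIN_CHANGE` threshold of 5 T-score points is purely an illustrative assumption, as are the function name and the example profiles.

```python
from typing import Dict

# Assumed threshold (in T-score points) for calling a change "substantial".
# The source gives no such value; 5 is chosen only for illustration.
MIN_CHANGE = 5.0


def classify_change(baseline: Dict[str, float], followup: Dict[str, float]) -> Dict[str, str]:
    """Label each scale present in both profiles as reduced, increased, or little change."""
    labels: Dict[str, str] = {}
    for scale, before in baseline.items():
        if scale not in followup:
            continue  # e.g., an abbreviated form that omits this scale
        difference = followup[scale] - before
        if difference <= -MIN_CHANGE:
            labels[scale] = "reduced"
        elif difference >= MIN_CHANGE:
            labels[scale] = "increased"
        else:
            labels[scale] = "little change"
    return labels


# Made-up T-score profiles for a single client at intake and at retest.
intake = {"Depression": 72.0, "Anxiety": 68.0, "Anger-In": 55.0}
retest = {"Depression": 60.0, "Anxiety": 66.0, "Anger-In": 63.0}

for scale, label in classify_change(intake, retest).items():
    print(scale, label)
```

A label of this kind is only a starting point for discussion between client and therapist; it does not by itself establish whether a difference is reliable or clinically meaningful.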


Not only can a failure to produce lower scores on the BTPI scales be used as an objective justification for extending the therapy, but an examination of the scores themselves can facilitate discussion between client and therapist about why therapy may not have been more beneficial to this point. Perhaps new scale elevations have replaced or accompanied those that were present at the initial testing, possibly indicating the presence of new life problems that are impairing the client's ability to achieve maximum benefit from the treatment as it was previously formulated. The initial identified problem may be taking a back seat to formerly secondary issues that have moved to the forefront, resulting in the client's inability to engage fully in therapy as it is now progressing. In this scenario, comparison of the original and most recent BTPI profiles will provide a means by which to evaluate which necessary changes have been made and which are ongoing. To facilitate the ease with which the BTPI may be readministered at various points throughout treatment, abbreviated forms have been created and validated (Butcher, 1998). All of these have reduced the number of overall items while maintaining full scales for those they include. However, two of the alternate forms have eliminated the validity cluster scales and therefore should be used only with clients whose self-reports are trustworthy. The Symptom Monitoring Form is the briefest of the abbreviated forms. It consists of only 80 items, it typically can be administered in around 15 minutes, and it provides therapists with their clients' scores on the Depression, Anxiety, Anger-Out, Anger-In, and Unusual Thinking scales. This BTPI abbreviated form should be used when there is very little or no threat of the client providing invalid self-report test data and the therapist is confident that the treatment issues scales are not relevant. 
As its name implies, the Symptom Monitoring Form is most useful for the purpose of symptom tracking throughout the course of therapy, within the context of observing the changes that occur in clients' perceptions of their psychological problems as treatment progresses.

A longer abbreviated form is the Treatment Process/Symptom Form. Its 171 items provide therapists with scores on all of the treatment issues scales (i.e., Problems in Relationship Formation, Somatization of Conflict, Low Expectation of Therapeutic Benefit, Self-Oriented/Narcissism, and Perceived Lack of Environmental Support), as well as the five current symptom scales of Depression, Anxiety, Anger-Out, Anger-In, and Unusual Thinking. Because it includes the treatment issues scales, this form is best used when the therapist needs to monitor treatment-related variables, such as the client's present level of difficulty in creating and maintaining relationships with others, in addition to tracking current emotional status via evaluation of psychological symptomatology. As was true of the Symptom Monitoring Form, the Treatment Process/Symptom Form should only be used when the therapist is confident that the client is fully cooperative with the assessment process and will provide valid self-report data. Therefore, if the client has produced invalid records in the past (on either the BTPI or another self-report measure), then the full BTPI form should be administered (Butcher, 1998).

The final abbreviated BTPI option is provided by the Treatment Issues Form. At 174 items in length, it assesses the Cluster 1 (Inconsistent Responding, Overly Virtuous Self-Views, Exaggerated Problem Presentation, and Closed-Mindedness) and Cluster 2 (Problems in Relationship Formation, Somatization of Conflict, Low Expectation of Therapeutic Benefit, Self-Oriented/Narcissism, and Perceived Lack of Environmental Support) scales only.
This form is intended for use by therapists who wish to reassess validity and treatment issues in their clients but prefer to gather monitoring information about current psychological symptoms through means other than client self-report.
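
The choice among the full BTPI and its three abbreviated forms, as described above, amounts to a small decision rule. The following Python sketch is one illustrative reading of that rule and is not part of the BTPI materials: the function name `choose_form`, the cluster labels, and the "shortest adequate form" logic are assumptions introduced here, and the item count of the full form is left unspecified because this section does not state it.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

# Hypothetical labels for the three scale clusters discussed in the text.
VALIDITY = "validity"
TREATMENT = "treatment issues"
SYMPTOMS = "current symptoms"


@dataclass(frozen=True)
class Form:
    name: str
    items: Optional[int]          # None where the section gives no item count
    clusters: FrozenSet[str]      # scale clusters the form retains


# Ordered from shortest to longest; the full form is the fallback.
FORMS = [
    Form("Symptom Monitoring Form", 80, frozenset({SYMPTOMS})),
    Form("Treatment Process/Symptom Form", 171, frozenset({TREATMENT, SYMPTOMS})),
    Form("Treatment Issues Form", 174, frozenset({VALIDITY, TREATMENT})),
    Form("Full BTPI", None, frozenset({VALIDITY, TREATMENT, SYMPTOMS})),
]


def choose_form(needed: Iterable[str], self_report_trusted: bool) -> str:
    """Return the shortest form that covers every needed cluster.

    If the client's self-report cannot be trusted, the validity scales
    are treated as required, which rules out the two forms that omit them.
    """
    required = set(needed)
    if not self_report_trusted:
        required.add(VALIDITY)
    for form in FORMS:
        if required <= form.clusters:
            return form.name
    raise ValueError(f"unknown cluster(s): {sorted(required)}")


print(choose_form({SYMPTOMS}, self_report_trusted=True))
print(choose_form({TREATMENT, SYMPTOMS}, self_report_trusted=False))
```

Under these assumptions, a trusted client who needs only symptom tracking is routed to the Symptom Monitoring Form, whereas any doubt about validity combined with a need for symptom scores falls back to the full inventory, in line with the recommendation attributed to Butcher (1998).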


Outcomes Assessment with the BTPI

Assessment of the client's functioning after completion of therapy is important for the purpose of gauging the degree to which the client made progress in treatment. To this end, the BTPI can be readministered at the point of therapy termination and the scores from this assessment compared with those obtained in the initial testing. This comparison will allow both the therapist and the client to evaluate the efficacy of the therapy, and it can facilitate a discussion between both parties about the changes that have occurred. As alluded to previously, the BTPI can play an integral role in helping the therapist and client determine when to discontinue the intervention, as an examination of score differences can aid in the determination of when substantive therapeutic changes have been made by the client. As Ben-Porath (1997) noted, the types of assessment data provided by measures like the BTPI can allow for therapy termination decisions to be made empirically and objectively rather than subjectively, as is the case when only the impressions of the therapist and client (or, increasingly, the managed care administrator) are available.

Research Applications

The serviceability and practicality of the information provided by the BTPI extend beyond the purely clinical domain and into the realm of research as well. Through an examination of pre- and posttreatment scores, the inventory can aid in the determination of which intervention strategies are beneficial within different groups of clients and diagnostic problems and, consequently, what therapeutic changes may be indicated for future clients with similar difficulties. The data collected through the Minnesota Psychotherapy Assessment Project on the 460-member clinical sample discussed previously are intended for this purpose, and the other normative data also lend themselves to further evaluation through research.
Case Study

Suzette J., a 34-year-old Mexican American store manager, entered psychological treatment with her husband because of marital conflict they were experiencing. The J.s have been married for 8 years and have one child who is 7 years old. Mr. J. initiated the treatment referral after a period of intense arguments between them. The couple separated for 2 weeks following their last argument, but reconciled on condition that Ms. J. become involved in therapy. At first she was reluctant to participate in therapy but agreed to come in order to resolve some of their outstanding differences.

Therapist Impressions After the Initial Session

Following the initial treatment session, her therapist recorded the following impressions about Ms. J.'s problems. She was considered to be somewhat reluctant to enter into the discussion of their problems with her husband, although she was tearful through much of the session. She became more involved in the process near the end of the session when she complained angrily that she got little support from him. Ms. J.'s problems, as characterized by her husband during the session, included her moodiness, irritability, and tendency to become angry over relatively minor events. The computer-generated report of Ms. J.'s BTPI results is presented in Fig. 37.1.


Fig. 37.1. BTPI profile of a 34-year-old woman patient.


Three-Month Follow-up

Mr. and Ms. J. remained in therapy through the 3-month follow-up period. The treatment sessions were often stormy and after four sessions were interrupted for 2 weeks after Ms. J. became very angry at what she considered being "ganged up on." She resumed therapy with a more productive attitude. At the time of the follow-up, the therapist considered the couple to be making satisfactory progress and thought their prognosis was good for attaining better problem-solving strategies.

Conclusions

The BTPI is an objective personality and symptom questionnaire intended for use in treatment planning. The scales on the BTPI address the level of cooperation and the important treatment-destructive attitudes some patients possess. In addition, the BTPI provides an appraisal of several important symptom patterns that patients in therapy often manifest. The BTPI provides an extensive evaluation of many of the qualities that are decisive factors in treatment outcome. Although it is not intended to be exhaustive, its scales provide a thorough assessment of the degree to which the therapist can have confidence in the respondent's self-report; the presence or absence of a collection of psychological problems; and, arguably most important, the client characteristics that are likely to encourage or impede the treatment process. Several forms of the BTPI are available to fulfill different facets of treatment planning, such as pretreatment evaluation, symptom monitoring over the course of therapy, and posttreatment evaluation. The norms for the BTPI were based on a representative sample of men and women from across the United States. Norms are also available on a sample of college students for psychologists interested in assessing students in a college counseling setting. The BTPI has been validated on 460 patients in psychological treatment.
Finally, a computer-based interpretation system has been developed to provide an objective summary of the client's treatment amenability.

References

Beck, A. T., & Steer, R. A. (1990). Beck Anxiety Inventory: Manual. San Antonio, TX: The Psychological Corporation.

Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Beck Depression Inventory: Manual (2nd ed.). San Antonio, TX: The Psychological Corporation.

Ben-Porath, Y. S. (1997). Use of personality assessment instruments in empirically guided treatment planning. Psychological Assessment, 9, 361-367.

Beutler, L. E. (1995). Integrating and communicating findings. In L. E. Beutler & M. R. Berren (Eds.), Integrative assessment of adult personality (pp. 25-64). New York: Guilford.

Beutler, L. E., & Harwood, T. M. (1995). Prescriptive psychotherapies. Applied and Preventive Psychology, 4, 89-100.

Butcher, J. N. (1997). Assessment and treatment in the era of managed care. In J. N. Butcher (Ed.), Objective psychological assessment in managed health care: A practitioner's guide (pp. 3-12). New York: Oxford University Press.

Butcher, J. N. (1998). The Butcher Treatment Planning Inventory (BTPI): Test manual and interpretive guide. San Antonio, TX: Psychological Corporation.

Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). The Minnesota Multiphasic Personality Inventory-2: Manual for administration and scoring. Minneapolis, MN: University of Minnesota Press.

Butcher, J. N., & Herzog, J. G. (1982). Individual assessment in crisis intervention: Observation, life history, and personality approaches. In C. D. Spielberger & J. N. Butcher (Eds.), Advances in personality assessment (Vol. 1, pp. 115-168). Hillsdale, NJ: Lawrence Erlbaum Associates.

Butcher, J. N., Rouse, S. V., & Perry, J. N. (1998). Assessing resistance to psychological treatment. Measurement and Evaluation in Counseling and Development, 31, 95-108.

Dawes, R., Faust, D., & Meehl, P. E. (1989, March). Clinical versus actuarial judgment. Science, 243, 1668-1674.

Derogatis, L. R. (1994). Symptom Checklist-90-R: Administration, scoring, and procedures manual. Minneapolis, MN: National Computer Systems.

Finn, S. E., & Martin, H. (1997). Therapeutic assessment with the MMPI-2 in managed health care. In J. N. Butcher (Ed.), Personality assessment in managed health care: Using the MMPI-2 in treatment planning (pp. 131-152). New York: Oxford University Press.

Garfield, S. (1978). Research on client variables in psychotherapy. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of psychotherapy and behavior change: An empirical analysis (2nd ed., pp. 191-232). New York: Wiley.

Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293-323.

Horowitz, L. M. (in press). Inventory of Interpersonal Problems: Manual. San Antonio, TX: The Psychological Corporation.

Horowitz, L. M., Rosenberg, S. E., Baer, B. A., Ureño, G., & Villaseñor, V. S. (1988). Inventory of Interpersonal Problems: Psychometric properties and clinical applications. San Antonio, TX: The Psychological Corporation.

Koss, M. P., & Butcher, J. N. (1986). Research on brief psychotherapy. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of psychotherapy and behavior change: An empirical analysis (3rd ed., pp. 627-670). New York: Wiley.

Lazarus, A. A. (1981). The practice of multimodal therapy. New York: McGraw-Hill.

Lazarus, A. A. (1989). The practice of multimodal therapy: Systematic, comprehensive, and effective psychotherapy. Baltimore, MD: Johns Hopkins University Press.

Makover, R. B. (1992). Training psychotherapists in hierarchical treatment planning. Journal of Psychotherapy Practice and Research, 1, 337-350.

Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and review of the evidence. Minneapolis, MN: University of Minnesota Press.


For Abby, Katie, and Shelby


Chapter 38

Marital Satisfaction Inventory-Revised

Douglas K. Snyder
Grace G. Aikman
Texas A&M University

Couple therapy should be guided by comprehensive assessment, linkage of assessment observations to relevant theoretical frameworks, and empirical findings regarding the efficacy of competing treatment approaches (Snyder, Cavell, Heffer, & Mangrum, 1995; Snyder, Cozzi, Grich, & Luebbert, in press). An important modality for couple assessment comprises self-report methods evaluating partners' respective appraisals of their relationship. However, fewer than 40% of marital and family therapists report regularly using any standardized assessment instruments (Boughner, Hayes, Bubenzer, & West, 1994). Moreover, despite the proliferation of measures of marital and family functioning, most suffer from inadequate attention to psychometric features and few find widespread use (Schumm, 1990; Snyder, 1982; Snyder & Rice, 1996).

The Marital Satisfaction Inventory-Revised (MSI-R; Snyder, 1997) was developed to address both psychometric and clinical concerns in evaluating distressed couples' relationships. The MSI-R is a multidimensional, self-report measure of relationship satisfaction with over 20 years of empirical and clinical study supporting its reliability, validity, and utility in assessing and treating distressed couples. The inventory permits comparisons of partners' independent evaluations of their relationship across 13 profile scales, in addition to comparison of these evaluations to profiles typical of couples in therapy or from the general population. The MSI-R facilitates delineating and prioritizing partners' concerns in the development of initial therapeutic goals; it can be used throughout therapy and at termination in evaluating changes in the couple's relationship or planning additional interventions.
The discussion begins with an overview of the MSI-R, emphasizing scale structure and composition, administration and scoring, and empirical foundations of reliability and validity. Attention then shifts to clinical use in assessment and treatment of distressed relationships. A basic interpretive strategy for configural analysis integrating both partners' profiles is presented, along with recommendations for incorporating assessment data into different phases of therapy. Clinical use of the computer-based interpretive narrative is also emphasized. Third, use of the MSI-R as a measure for evaluating treatment outcome is addressed. Findings from various treatment studies highlight pre- to posttreatment profile changes typical of couples in treatment and the extent to which the MSI-R facilitates predicting couples' response to therapy. The chapter concludes with a clinical case highlighting the manner in which the MSI-R can be incorporated into the initial design and subsequent evaluation of treatment interventions.

Overview

Scale Development, Structure, and Composition

The MSI-R is a complete revision and restandardization of the Marital Satisfaction Inventory (MSI; Snyder, 1979a, 1981). The revision encompasses important improvements to the instrument, including a substantially larger and more representative standardization sample, a reduction in the instrument's length by nearly half, rewording of items to facilitate the MSI-R's use with nontraditional couples, and the addition of a scale assessing relationship aggression. The MSI-R includes 150 true-false items comprising 2 validity scales, 1 global distress scale, and 10 additional scales assessing specific areas of relationship distress. It is administered to individual partners separately, and requires approximately 25 minutes to complete. Individuals' responses are scored along the 13 profile scales and are plotted on a standard profile sheet using gender-specific norms. Profile scale content, abbreviations, and sample items are discussed next.¹

Inconsistency (INC). This new validity scale includes 20 pairs of items assessing the individual's consistency in responding to item content. The items comprising each pair are either similar in content and expected to be answered in the same direction, or dissimilar or nearly opposite in content and expected to be answered in opposite directions.

Conventionalization (CNV). This validity scale assesses individuals' tendencies to distort the appraisals of their relationship in a socially desirable direction.
Items reflect denial of minor, commonly occurring difficulties and efforts to describe the relationship in an unrealistically positive manner (e.g., "I have never regretted our relationship even for a moment." or "My partner and I understand each other completely.")

Global Distress (GDS). This global measure assesses respondents' overall dissatisfaction with the relationship. Items reflect general discontent, chronic disharmony, desire for couple therapy, and consideration of separation or divorce. (e.g., "Our relationship has been disappointing in several ways." or "At times I have very much wanted to leave my partner.")

Affective Communication (AFC). This scale focuses on the process of verbal and nonverbal communication and is the best single index of the affective quality of the couple's relationship. Items reflect respondents' dissatisfaction with the amount of affection and understanding expressed by their partner. (e.g., "My partner doesn't take me seriously enough sometimes." or "Sometimes I wonder just how much my partner really does love me.")

¹ Sample items listed are all scored in the True direction. A complete listing of item composition for each scale, scoring direction, and response characteristics is provided in the test manual (Snyder, 1997).

Problem-Solving Communication (PSC). This second communication scale assesses the couple's general ineffectiveness in resolving differences. Items measure overt conflict, rather than the underlying feelings of detachment or alienation. (e.g., "Minor disagreements with my partner often end up in big arguments." or "My partner and I seem able to go for days sometimes without settling our differences.")

Aggression (AGG). This new scale assesses the levels of intimidation and physical aggression experienced by respondents from their partner. Items range from verbal threats to severe physical abuse. (e.g., "My partner sometimes screams or yells at me when he or she is angry." or "My partner has left bruises or welts on my body.")

Time Together (TTO). Items on this scale reflect a lack of common interests and dissatisfaction with the quality and quantity of leisure time together. (e.g., "My partner and I don't have much in common to talk about." or "It seems that we used to have more fun than we do now.")

Disagreement About Finances (FIN). This scale assesses partners' discord regarding the management of shared finances. (e.g., "My partner buys too many things without consulting with me first." or "It is often hard for us to discuss our finances without getting upset with each other.")

Sexual Dissatisfaction (SEX). Items on this scale reflect dissatisfaction with both the frequency and quality of intercourse and other sexual activity. (e.g., "My partner sometimes shows too little enthusiasm for sex." or "My partner has too little regard sometimes for my sexual satisfaction.")

Role Orientation (ROR). This scale reflects the respondent's advocacy of a traditional versus nontraditional orientation toward marital and parental sex roles. Items are scored in the nontraditional direction.
(e.g., "There should be more daycare centers and nursery schools so that more mothers of young children could work." or "In a relationship the woman's career is of equal importance to the man's.")

Family History of Distress (FAM). Items reflect individuals' unhappy childhoods and disharmony in the marriages of respondents' parents and extended family. (e.g., "I was very anxious as a young person to get away from my family." or "My parents didn't communicate with each other as well as they should have.")

Dissatisfaction with Children (DSC). This scale assesses parental dissatisfaction or disappointment with children. Items reflect the parent-child relationship rather than the relationship between the partners. (e.g., "Having children has not brought all of the satisfactions I had hoped it would." or "I wish my children would show a little more concern for me.")

Conflict Over Child Rearing (CCR). Items assess the extent of conflict between partners regarding child rearing practices and parental responsibilities. (e.g., "My partner doesn't spend enough time with the children." or "My partner doesn't assume his or her fair share of taking care of the children.")

Except for the two validity scales and Role Orientation, each scale is scored so that high scores reflect high levels of distress. Items from the first 11 scales are presented in random order; items from the two child-related scales (DSC and CCR) appear last, so that couples without children complete only the first 129 items.

Administration and Scoring

Items comprising the Marital Satisfaction Inventory-Revised can be presented to individuals by means of an MSI-R AutoScore™ answer form, or interactively on a microcomputer using software distributed by the test publisher.² Automated scoring and computerized interpretation are also available by facsimile or mail-in service.

The MSI-R AutoScore™ form is a multipage form that presents individual items and a place for the respondent to indicate either True or False to each item on the same page. The individual's responses are automatically transferred to another scoring page on which items pertaining to the same scale are linked by a series of preprinted lines. Raw scores for each scale are obtained by simply tallying the number of marked responses along the line for that scale. When hand scored, one or both partners' scores on the MSI-R are transferred to a profile sheet on which raw scores are transformed into normalized T-scores, standardized separately by gender. On the profile sheet, different interpretive scale ranges are delineated by separate shaded areas, reflecting different degrees of distress in each problem area (see Figs. 38.1 and 38.2 accompanying the sample case presented later). Hand scoring and transfer of scale scores to the profile form requires approximately 10 minutes per individual.

Test reports generated by Western Psychological Services through a fax or mail-in service provide a graphic display of results on a comparable profile form, along with an automated interpretive narrative based on actuarial findings for this instrument (Snyder & Lachar, 1997). Microcomputer software permits individuals to view and respond to test items interactively, with subsequent computerized scoring and interpretation in the clinician's office.
For clinicians with limited microcomputer facilities, this same software permits transfer of previously recorded responses (e.g., from an administration booklet and separate answer form designed for this purpose) for computerized scoring and generation of an automated report.

² The Marital Satisfaction Inventory-Revised (MSI-R; including materials for administration and either hand or computer scoring and interpretation) is published and distributed by Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025.

Types of Norms Available

Respondents' raw scores on MSI-R scales are converted to normalized T-scores based on separate norms for men and women. The new standardization sample for the MSI-R represents a substantial improvement over the original normative group and included 2,040 persons (1,020 intact couples). This new sample was geographically diverse and had a balance that was consistent with the population of the U.S. census regions. The sample was also representative of the U.S. population for such demographic characteristics as ethnicity, educational level, and occupation. The broad age range of the sample ensured representation of persons in their late teens through those in their 70s and beyond (Snyder, 1997).

In addition to the standard English version, the MSI-R has been translated into Spanish (Snyder, 1996). Separate norms for the Spanish translation of the MSI-R have not been developed; consequently, interpretation of scores for this version should proceed with caution. However, a preliminary study of the MSI-R with Mexican American couples provided initial evidence supporting the appropriateness of using this instrument within the Hispanic community (Negy & Snyder, 1997).

The MSI-R profile form permits easy comparison of individuals' scores to those typical of men or women sampled at large from a nonclinic population; the median normalized T-score is 50, and approximately two thirds of individuals from a nondistressed sample could be expected to have scores in the range of 40T to 60T. It is often useful, however, to compare an individual's scores on the MSI-R to the average profile for specific criterion groups. The manual for the MSI-R (Snyder, 1997) includes group mean profiles for samples of individuals entering marital therapy, individuals completing marital therapy, couples seeking treatment at a sexual dysfunctions specialty clinic, physically battered women seeking refuge at a spouse abuse shelter, couples in which one or both spouses are in individual treatment for nonmarital emotional and behavioral disorders, and parents of psychiatrically hospitalized children or adolescents.

In drawing comparisons between an individual profile and the mean profile for any criterion group, it is essential to emphasize the high degree of profile variability across individuals comprising any specific group. Consequently, similarities between individuals' MSI-R profile and the mean MSI-R profile for some criterion group may suggest that respondents share important characteristics with that group in terms of their relationship, but these should be explored carefully in interview; similarly, discrepancies between individuals' MSI-R scores and the mean profile for some criterion group do not rule out the possibility that respondents share important common features of that group.
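
To make the T-score metric described above concrete, the following Python sketch shows one common way a normalized T-score can be derived from a raw score: the raw score's percentile rank in a norm group is mapped onto the normal curve and rescaled to a mean of 50 and a standard deviation of 10. This is an illustration only. The norm values are invented, the mid-rank percentile method is an assumption, and the function name `normalized_t` is hypothetical; the actual conversion tables are those in the test manual (Snyder, 1997).

```python
from statistics import NormalDist


def normalized_t(raw, norm_raw_scores):
    """Convert a raw score to a normalized T-score (mean 50, SD 10).

    norm_raw_scores stands in for the raw scores of a same-gender
    norm group; the values used below are made up for illustration.
    """
    n = len(norm_raw_scores)
    below = sum(s < raw for s in norm_raw_scores)
    equal = sum(s == raw for s in norm_raw_scores)
    # Mid-rank percentile: half of the tied scores count as "below".
    pr = (below + 0.5 * equal) / n
    # Keep the percentile strictly inside (0, 1) so inv_cdf is defined.
    pr = min(max(pr, 0.5 / n), 1 - 0.5 / n)
    z = NormalDist().inv_cdf(pr)
    return round(50 + 10 * z)


# Invented norm group: one respondent at each raw score from 0 to 20.
hypothetical_norms = list(range(21))
print(normalized_t(10, hypothetical_norms))   # median raw score

# Share of a normal T distribution falling between 40T and 60T.
t_dist = NormalDist(mu=50, sigma=10)
share_40_to_60 = t_dist.cdf(60) - t_dist.cdf(40)
print(round(share_40_to_60, 3))
```

Because the scores are normalized, the band from 40T to 60T spans one standard deviation on either side of the mean, which is where the "approximately two thirds" figure in the text comes from.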
Reliability of Scales

The MSI-R scales possess high levels of both internal consistency and temporal stability. Cronbach alpha coefficients of internal consistency range from .70 to .93 (M = .82). Test-retest reliability analyses over a 6-week interval for a sample of 210 respondents (105 couples) yield stability coefficients ranging from .74 to .88 (M = .79). Standard errors of difference based on test-retest correlations average approximately 6.0 T-score points.

These relatively high reliability indices have important implications for test interpretation. First, changes in clients' profiles across time most likely reflect genuine changes in the respondents' experience of their relationship rather than chance variations in test scores. Second, both variability in scores across scales for a given individual and discrepancies between partners on the same scale can more safely be interpreted as reflecting reliable, meaningful differences. Specifically, in evaluating the difference between partners' scores on any given scale or the difference between an individual's scores on any given scale across time, it can be concluded that a difference of approximately 6 T-score points or more is significant with roughly 68% confidence.

Validity of Scales

Evidence for the validity of the MSI-R scales derives from four sources: correlations between revised MSI-R scales and original MSI scales; studies of group discriminant validity; correlational studies of scales' convergent validity; and actuarial studies identifying the interpretive meaning of scores on each scale across distinct scale ranges.
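
The reliability figures reported above can be tied to the 6-point rule of thumb with a short calculation. The Python sketch below uses the standard psychometric formulas for the standard error of measurement and the standard error of a difference between two scores; whether the manual computed its value in exactly this way is an assumption, and the function names are hypothetical. Applied to the mean test-retest coefficient of .79 and a T-score standard deviation of 10, the formula gives roughly 6.5 points, the same order as the "approximately 6.0" reported for the average across scales.

```python
from math import sqrt
from statistics import NormalDist

T_SD = 10.0  # standard deviation of the T-score metric


def se_difference(retest_r, sd=T_SD):
    """Standard error of the difference between two scores on one scale."""
    sem = sd * sqrt(1 - retest_r)   # standard error of measurement
    return sqrt(2) * sem            # two independent measurement errors


def confidence_for(z_crit):
    """Two-sided coverage of a band of +/- z_crit standard errors."""
    return 2 * NormalDist().cdf(z_crit) - 1


def is_reliable_change(t_first, t_second, retest_r, z_crit=1.0):
    """True if the score difference is at least z_crit standard errors."""
    return abs(t_second - t_first) >= z_crit * se_difference(retest_r)


print(round(se_difference(0.79), 1))   # mean stability coefficient from the text
print(round(confidence_for(1.0), 2))   # one standard error, about 68%
print(is_reliable_change(55, 62, 0.79))
```

A threshold of one standard error corresponds to the roughly 68% confidence level cited in the text; a clinician wanting a stricter criterion could pass a larger `z_crit` (for example, 1.96 for about 95%), at the cost of requiring a correspondingly larger score difference.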


Comparison of Revised and Original Scales. In order to extend the empirical validity established for the original MSI across 15 years of research to the scales that constitute the revised instrument, correlations between the original scales and their corresponding revisions on the MSI-R were examined for a national sample of 646 individuals (323 couples). Correlations ranged from .94 to .98 (M = .95), lending strong support to the translation of previous validity findings for the scales of the original MSI to the corresponding scales of the MSI-R, incorporating revised norms derived from the new standardization sample using normalized T-scores.

Discriminant Validity. More than 20 studies have examined the discriminant validity of the original Marital Satisfaction Inventory across diverse clinical samples. These studies have generally demonstrated that the MSI discriminates between groups expected to differ on conceptual or theoretical bases. Several studies have confirmed the ability of the MSI to discriminate between couples in marital therapy and nondistressed couples from the general population. Snyder (1979b) administered the MSI to 30 couples in marital therapy at various private and community settings. Subsequent analyses indicated that couples in therapy differed significantly from matched control couples on each of the 11 MSI scales. Snyder, Wills, and Keiser (1981) administered the MSI to 50 couples entering marital therapy and obtained a mean profile for this clinical sample similar to the one reported earlier. More recently, the MSI-R was administered to a new sample of 50 couples entering marital therapy; scores for this clinical sample were compared to a subgroup of the standardization sample comprising couples from the same geographic community not in therapy. As shown in Table 38.1, each of the 13 scales comprising the MSI-R distinguished between couples in therapy and nonclinic community couples.
As a group, couples in marital therapy uniformly display low CNV scores, high GDS scores, and high elevations on the affective triad (AFC, PSC, and TTO). A high degree of within-group variability exists for the remaining scales, so that mean scores in these areas tend to be lower although they are still elevated above what is normally expected. In addition, individual profiles of distressed partners tend to exhibit a high degree of within-subject variability that facilitates specifying the relative sources of subjective distress. One of the most common clinical and research applications of the MSI has been as a measure for evaluating couples' change in response to clinical interventions (Frank, Dixon, & Grosz, 1993; Iverson & Baucom, 1990; Jacobson et al., 1989; Jacobson & Truax, 1991; Klann & Hahlweg, 1995; Snyder & Wills, 1989; Snyder, Wills, & Grady-Fletcher, 1991; Waring, Stalker, Carver, & Gitta, 1991; Whisman & Jacobson, 1992; G. L. Wilson, Bornstein, & L. J. Wilson, 1988). In general, studies have consistently shown the MSI scales to be sensitive indicators of couples' response to treatment; from a clinical perspective, midtreatment changes in couples' profiles serve both as an indicator of gains in specific domains as well as a marker of relationship areas in which further clinical interventions are warranted (Snyder, Lachar, & Wills, 1988). Pre- and posttreatment MSI profiles for 59 couples treated in a comparative therapy outcome study funded by the National Institute of Mental Health (NIMH; Snyder & Wills, 1989) are presented in Table 38.2. Findings confirmed significant changes on each of the MSI scales reflecting relationship distress, with the greatest changes occurring on those scales reflecting general relationship affect and communication and secondary reductions on remaining scales reflecting relationship conflict in specific areas. 
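Because T-scores are scaled to a mean of 50 and a standard deviation of 10 in the reference sample, the pre-to-post differences in Table 38.2 can be read as fractions of a normative standard deviation. The sketch below is only an illustrative reading of the tabled means, not an effect-size analysis reported by the authors; the dictionary and function names are hypothetical.

```python
T_SD = 10.0  # T-scores are defined with mean 50 and SD 10 in the reference sample

# (pretherapy mean, posttherapy mean) for selected distress scales, copied from Table 38.2
table_38_2 = {
    "GDS": (64.7, 55.1),
    "AFC": (60.9, 52.8),
    "PSC": (63.0, 53.0),
    "TTO": (58.8, 51.9),
    "FIN": (57.8, 52.9),
    "SEX": (57.8, 53.6),
}

def change_in_sd_units(pre, post):
    """Pre-to-post reduction expressed in normative SD units (T points / 10)."""
    return round((pre - post) / T_SD, 2)

changes = {scale: change_in_sd_units(pre, post)
           for scale, (pre, post) in table_38_2.items()}

# List scales from largest to smallest reduction
for scale, delta in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{scale}: {delta:.2f} SD")
```

On this reading the three largest reductions fall on PSC, GDS, and AFC, which matches the statement that the greatest changes occurred on scales reflecting general relationship affect and communication.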
Snyder, Wills, and Grady-Fletcher (1991) followed up on couples from this same outcome study 4 years after completing treatment. Couples who had divorced within 4

years after treatment were compared with couples still happily married after the same period had elapsed with regard to their MSI profiles obtained prior to marital therapy. At intake, couples eventually pursuing divorce demonstrated relatively higher scores on measures of global distress, difficulties in problem-solving communication, dissatisfaction with children, and family history of distress.

TABLE 38.1
Mean MSI-R Scale Scores for Couples in Marital Therapy and Matched Community Couples Not in Therapy

Scale    Therapy Couples    Community Couples    t
         M (SD)             M (SD)
INC      53.8 (7.5)         47.7 (8.2)            5.98*
CNV      38.6 (5.8)         50.7 (9.2)           11.72*
GDS      64.9 (6.9)         47.4 (7.8)           18.24*
AFC      61.3 (7.7)         47.6 (8.7)           12.78*
PSC      62.5 (7.8)         47.3 (9.4)           13.50*
AGG      56.9 (10.3)        49.8 (9.3)            5.73*
TTO      60.1 (8.2)         49.1 (8.9)           10.00*
FIN      57.4 (10.6)        50.3 (8.9)            5.79*
SEX      56.3 (10.3)        49.4 (9.6)            5.44*
ROR      54.0 (7.3)         50.4 (9.0)            3.33*
FAM      54.6 (9.6)         49.3 (9.3)            4.32*
DSC      54.6 (11.3)        49.4 (10.1)           3.38*
CCR      57.9 (11.2)        48.3 (7.5)            7.29*

Note. For marital therapy sample, n = 100 (78 for DSC and CCR). For community sample, n = 154 (122 for DSC and CCR). df = 198 for DSC and CCR; df = 252 for remaining scales.
*p < .001
Material from the Marital Satisfaction Inventory, Revised, copyright © 1997 by Western Psychological Services. Used by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, USA. Not to be reprinted in whole or in part for any additional purpose without the expressed, written permission of the publisher. All rights reserved.

Several studies have examined the usefulness of the MSI with couples in which one or both partners experience verbal or physical aggression from the other. In an early study, Snyder, Fruchtman, and Scheer (1980) collected MSI data from 66 women who had sought protection from their physically abusive partners at a residential shelter for battered women.
Compared to women entering marital therapy, physically battered women showed highly elevated MSI profiles with scores consistently 5 to 10 T points higher than those for women beginning marital therapy on measures of GDS, FIN, CCR, and the affective triad (AFC, PSC, and TTO). Physically abused women producing

somewhat lower MSI profiles reported during interview more satisfactory relationships, less pervasive abuse, and a stronger inclination to return to their partners following their residence at the shelter.

TABLE 38.2
Mean MSI Scale Score Comparisons for Couples Prior to Versus Completing Marital Therapy

Scale    Pretherapy      Posttherapy     t
         M (SD)          M (SD)
CNV      39.9 (4.5)      44.9 (8.5)       7.69**
GDS      64.7 (8.3)      55.1 (9.7)      11.99**
AFC      60.9 (9.2)      52.8 (10.9)      8.58**
PSC      63.0 (7.4)      53.0 (11.0)     10.07**
TTO      58.8 (9.1)      51.9 (9.4)       8.16**
FIN      57.8 (11.7)     52.9 (11.1)      5.77**
SEX      57.8 (9.1)      53.6 (9.7)       5.42**
ROR      56.9 (8.4)      57.3 (9.1)      -0.88
FAM      52.5 (9.9)      51.2 (10.4)      2.56*
DSC      53.4 (11.1)     50.3 (8.7)       3.51**
CCR      55.7 (11.6)     50.6 (9.7)       5.47**

Note. Based on Snyder and Wills (1989). Scores were obtained on original MSI using linear T-scores. For DSC and CCR, n = 94; for remaining scales, n = 118.
*p < .05 **p < .001
Material from the Marital Satisfaction Inventory, Revised, copyright © 1997 by Western Psychological Services. Used by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, USA. Not to be reprinted in whole or in part for any additional purpose without the expressed, written permission of the publisher. All rights reserved.

In a separate study of 60 couples reporting violence by the male against his female partner (Edleson, Eisikovits, Guttmann, & Sela-Amit, 1991), higher scores for men on the Conflict Over Child Rearing (CCR) scale predicted higher levels of physical violence. The need for a screening measure of relationship violence led to the addition of an Aggression (AGG) scale in the revised MSI-R. In an initial study of this scale's validity, Snyder and Snow (1995) found that scores on the AGG scale discriminated successfully between battered women and maritally distressed, nonabused women.
By contrast, aggressive male partners did not differ from nonabusing men in marital therapy, confirming the scale's specificity in identifying victims, rather than perpetrators, of relationship intimidation and physical aggression. The entanglement of conflict over financial concerns with generalized relationship distress represents a difficult assessment and intervention challenge confronting the couple therapist. A recent study (Aniol & Snyder, 1997) examined the role of financial conflicts,

problem-solving communication deficits, and global relationship distress among 25 couples in marital therapy and 32 couples seeking assistance at a nonprofit agency providing financial counseling services. Couples seeking financial counseling obtained their highest score on the Disagreement About Finances (FIN) scale, but also showed moderate difficulties in problem-solving communication. Results confirmed the importance of specifically evaluating both the nature and level of relationship distress and conflict concerning finances when intervening with couples reporting financial difficulties. Berg and Snyder (1981) administered the MSI to 45 couples having primary complaints of dissatisfaction with their sexual relationship and being seen in brief, directive conjoint therapy at a sexual dysfunctions specialty clinic. These couples were compared on the MSI with 45 couples entering marital therapy. As a group, sexually distressed couples were significantly higher than more generally maritally distressed couples on the Sexual Dissatisfaction (SEX) scale, with 75% of individuals in the former sample having SEX elevations exceeding 60T. Various studies have examined the correspondence of personality functioning and psychopathology to marital and other relationship distress. Although relationship difficulties and individual psychopathology clearly overlap, it is important for a measure of relationship functioning to be able to distinguish between the two. Toward this end, Snyder and Regts (1990) administered the MSI to three samples of 30 couples in marital therapy, 30 couples in which one partner was psychiatrically hospitalized or in individual outpatient psychotherapy, and 30 nonclinic couples from the general population. 
Although individuals in the psychiatric sample reported general marital discord and communication difficulties at a level higher than nonclinic community couples, the MSI still discriminated successfully between the psychiatric sample and couples in marital therapy, with the latter group obtaining significantly higher scores on MSI scales reflecting negative relationship affect (GDS and SEX) and communication difficulties (AFC and PSC). Results suggested the usefulness of the MSI in identifying individuals presenting with their own emotional or behavioral concerns for whom relationship difficulties comprise an additional significant issue warranting separate attention. Both the clinical and research literature points to a frequent linkage between emotional and behavioral disturbances in children or adolescents and disruption in the parents' relationship. In an investigation of this linkage, the MSI was administered to three samples of 23 couples in marital therapy, 28 couples identified on the basis of their child or adolescent having been psychiatrically hospitalized for emotional or behavioral problems, and 59 nonclinic couples from the general population (Snyder, Klein, Gdowski, Faulstich, & LaCombe, 1988). Results indicated that parents of psychiatrically hospitalized children or adolescents showed the highest rates of dissatisfaction with their children's adjustment (DSC) compared to both marital therapy and community samples, and lower rates of distress in the couple's own relationship compared to the marital therapy sample. These findings suggested the usefulness of the MSI in distinguishing distressed from nondistressed relationships among parents of children or adolescents with emotional or behavioral difficulties. Convergent Validity. Correlational studies of the MSI scales' convergent validity have established the relatedness of these scales to a broad range of affective and behavioral components of marital interaction. 
For example, as an overall measure of relationship accord, the Global Distress (GDS) scale has been found to correlate highly with both the Locke-Wallace (1959) Marital Adjustment Test (Snyder, 1979b) and Spanier's (1976) Dyadic Adjustment Scale (Snyder & Wills, 1989; Whisman & Jacobson,

1992; Wilson et al., 1988). Higher levels of relationship distress using the GDS scale or factor scales derived for the original MSI (Snyder & Regts, 1982) have been related to dysfunctional relationship beliefs (Jones & Stanton, 1988), gender-role orientation (House, 1986), lower levels of positive affect observed during conflict resolution (Morell & Apple, 1990), higher levels of depressive symptomatology (Heim & Snyder, 1991), Type A personality characteristics (Houston & Kelly, 1987), poorer response to anti-depressant medication (Bromberger, Wisner, & Hanusa, 1994), relationship appraisals among partners of cancer patients (Clipp & George, 1992), pain behavior reinforcement by partners of chronic pain patients (Kremer, Sieber, & Atkinson, 1985), independent reports of distress in the family of origin and partner's lack of support in the home (Blattberg & Hogan, 1994), and teacher ratings of emotional and behavioral difficulties for one or more of the respondent's children (Westerman & Schonholtz, 1993). Studies examining the MSI's two communication scales (AFC and PSC) have found one or both scales to relate to observational ratings of anger and contempt (Burman, Margolin, & John, 1993), to discriminate between depressed and nondepressed individuals (Basco, Prager, Pita, Tamir, & Stephens, 1992), and to reflect gains in communication skills associated with brief cognitive-behavioral couples therapy (Waring et al., 1991). Observational ratings of couples' communication during a conflict resolution task were related to the AFC and PSC scales in a study of 30 clinic and 12 nonclinic couples (Snyder, Trull, & Wills, 1987). 
Lack of emotional closeness and problem-solving difficulties reflected by scores on these two scales were linked to high negative affect and low positive affect of both speaker and listener, low rates of agreement, high rates of disagreement, attribution of unexpressed thoughts or feelings to the partner, and restatement of one's own views to the neglect of summarizing the partner's views. Similar evidence of convergent validity has been obtained for MSI scales assessing additional domains of couples' interaction. In a study of couples' use of leisure time (Smith, Snyder, Trull, & Monsma, 1988), high scores on the Time Together (TTO) scale were linked to the absence of leisure interaction with one's partner and, instead, allocation of discretionary time to individual pursuits or to interaction with others exclusive of the partner. Factor analytically derived dimensions of partners' time together accounted for nearly twice the variance in relationship satisfaction for women as for men. In a study of 32 couples seeking assistance at a nonprofit agency providing financial counseling services (Aniol & Snyder, 1997), scores on the Disagreement About Finances (FIN) scale related to individuals' reports of both their own and their partner's handling of finances. For both men and women, higher scores on the FIN scale were linked to reports of frequent relationship arguments around purchases and handling of money, as well as to complaints regarding the partner's irresponsibility in financial decisions, inability to follow a budget, excessive reliance on credit cards, and lack of sustained effort toward shared financial goals. Several studies have examined correlates of the Sexual Dissatisfaction (SEX) scale and its sensitivity as an indicator of couples' response to marital or premarital interventions (Markman, Floyd, Stanley, & Storaasli, 1988; Schroder, Hahlweg, Hank, & Klann, 1994). 
Snyder and Berg (1983a) administered the SEX scale and separate criterion checklists reflecting both sexual and other interpersonal concerns to 45 couples in sex therapy. For both men and women, high scores on SEX were linked to the partner's lack of response to sexual requests, infrequent intercourse, and lack of affection for the partner. In addition, men's SEX scores related to their wives' failure to reach orgasm during intercourse, whereas their wives' scores on SEX related more closely to concerns regarding their own sexual adequacy.

The Family History of Distress (FAM) scale on the MSI has been found to correlate highly with independent measures developed to assess psychological health and adjustment in the family of origin (Ryan, Kawash, Fine, & Powel, 1994). In a study of 49 families with a suicidal adolescent (Mitchell & Rosenthal, 1992), the MSI's Conflict Over Child Rearing (CCR) scale correlated with father-child unresolved conflict and discriminated between these families and a comparison sample of families with distressed but nonsuicidal adolescents. Finally, in a study of 173 young adults' attitudes toward marital and parental roles (Snyder, Velasquez, Clark, & Means, 1997), scores on the Role Orientation (ROR) scale correlated strongly with scores on this scale obtained from the respondents' mothers and fathers, suggesting a strong intergenerational linkage in relationship role dispositions. In the same study, scores on ROR were mostly unrelated to more global gender-role attitudes, indicating the specificity of the ROR scale to views regarding marital and parental roles and responsibilities.

In addition to correlational findings concerning individual scales and conceptually related measures already described, two studies have examined the relation of the MSI to broad-band multidimensional measures of psychopathology and personality functioning in adults and children or adolescents. The first study (Snyder & Regts, 1990) examined correlations between the MSI and the Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1967) in a sample of 90 couples. Relationship conflict assessed by the MSI related strongly to the MMPI's Pd scale (Psychopathic Deviate), as well as to other dimensions of psychological disturbance tapped by the MMPI including Depression (D) and scales comprising the psychotic triad (Pa: Paranoia; Sc: Schizophrenia; and Ma: Hypomania).
These findings were consistent with clinical and research literature suggesting that poor impulse control, hypersensitivity to perceived criticism, exaggerated self-appraisal, history of impaired interpersonal relationships, or experience of overt psychotic symptomatology may all predispose a person toward impaired relationship functioning. The second study (Snyder, Klein et al., 1988) examined correlations between the MSI and the Personality Inventory for Children (PIC; Wirt, Lachar, Klinedinst, & Seat, 1984) in a sample of 110 couples in which parents completed the PIC with reference to their "most problematic" child or adolescent. Results showed a strong relation between scales on the MSI assessing parental concerns about children (DSC) or partners' conflict over child rearing (CCR) and a broad range of measures on the PIC reflecting both internalizing and externalizing disorders in the child or adolescent. These findings suggested that high scores on either the DSC or CCR scales of the MSI should prompt careful inquiry into possible emotional or behavioral difficulties of one or more children in the home.

Actuarial Validity. Although conceptually overlapping with studies of both discriminant and convergent validity, actuarial studies of the MSI differ in the manner in which group differences or correlational findings are analyzed. The actuarial assessment of couples' relationships implies that, for a given set of test scores from one or both partners, the clinician draws on an extensive database in evaluating behaviors, feelings, and cognitions likely to be expressed by each partner in the relationship (Snyder & Costin, 1994; Snyder, Lachar et al., 1991). The assessment relies on probability estimates derived from previously identified relations between test scores and nontest criteria relevant to couples' interaction, rather than on clinical intuition or the content of individual test items to which partners have responded.

Snyder, Lachar et al. (1991) described four steps in the derivation of actuarially based interpretive systems for psychological measures. The first of these involves identifying statistically significant and reliable associations between predictors and respective criteria. The second requires construction of contingent-frequency tables delineating the likelihood that some external criterion (e.g., some affect, cognition, or behavior) will be observed given some range of scores on the predictor measure. A related third step requires identifying that score range at which some external criterion becomes significantly more or less likely than what would be expected by chance alone. The final step involves integration of these probability analyses both within and across scales to derive interpretive guidelines or narratives for various scale score ranges or across scale configurations. In addition to data accrued from studies of discriminant and convergent validity already described, three additional studies were conducted specifically to assist in the actuarial interpretation of profile scales on the Marital Satisfaction Inventory. An initial clinical validation study of the MSI (Snyder et al., 1981) examined the relation of individual MSI scales to clinicians' ratings of 50 couples entering marital therapy. Following an extensive conjoint interview, each husband and wife was rated separately on 61 clinical criteria assessing: general presentation of self and the marriage; specific areas of interaction between spouses (e.g., communication, leisure time, finances, sexual relationship); family history and role dispositions; psychiatric and physical distress; spousal interactions regarding children; and clinician-rated prognosis for response to marital therapy. For each item, the clinician rated the presence or absence of that criterion and, if present, whether the criterion was evident to a moderate or an extensive degree. 
Clinicians' ratings of partners in each of these domains were subsequently correlated with husbands' and wives' scores on the MSI. Results provided broad support for the validity of individual scales and the ability of the MSI to distinguish among levels and sources of relationship distress among couples entering marital therapy. Scheer and Snyder (1984) extended these results in a replication study using 50 nonclinic couples sampled from the general population. Following completion of the MSI and an extensive conjoint interview, each husband and wife was rated separately on a 76-item criterion checklist including items used in the Snyder et al. (1981) clinic study plus 16 new criteria reflecting only mild distress or unusually gratifying aspects of the marital relationship. Similar to results from the earlier clinical study, correlational findings with this nonclinic sample lent strong empirical support to the interpretive intent of MSI scales. In addition, these findings offered empirical support for use of the MSI with nonclinic couples and with couples presenting other than primary complaints of marital distress. Finally, Snyder and Lachar (1986) conducted a national validation study in collaboration with the test publisher, Western Psychological Services, based on a sample of 323 couples engaged in marital therapy with 161 therapists reflecting a broad range of professional backgrounds from all geographic regions of the continental United States. All spouses completed the MSI and an extensive checklist on which they rated the extent of relationship difficulties in specific areas, predicted the future course of the marriage and likely success in resolving marital difficulties, and rated both themselves and their partner on 43 descriptors of intrapersonal and interpersonal functioning. 
In addition, each therapist completed a checklist describing areas of relationship dysfunction, partners' individual emotional difficulties interfering with the marriage, and probable course of therapy and future of the couple's relationship. Overall, 766 descriptors of the marital relationship and individual partners were found to correlate with MSI scales across

mixed-gender samples; in addition, 90 correlates specific to either husbands or wives were identified (Snyder, Freiman, & Lachar, 1989). As in the Snyder et al. (1981) and Scheer and Snyder (1984) studies, scales' significant correlations with partners' and clinicians' ratings generally conformed to scales' interpretive intent and with nomological nets delineated in previous validational efforts. In each of these three studies of actuarial validity, contingent frequency tables were constructed for each of the significant scale-to-criterion correlations obtained. In all, more than 1,200 such tables were constructed. These tables delineated the probability that a given relationship characteristic would be observed as a function of partners' scores across each scale of the MSI. Findings across these tables were integrated to identify low, moderate, and high ranges for each of the MSI scales. (See Snyder, Lachar et al., 1991, for annotated examples of actuarial analyses.) Following revision of scales for the MSI-R, data from each of the three original studies of actuarial validity were completely reanalyzed. Scale-to-criterion correlations for the revised scales were identified, and new contingent frequency tables, again numbering over 1,200, were constructed for scale correlates. Actuarial findings for all external correlates across respective scales were reviewed to derive revised interpretive guidelines for scores in the low, moderate, or high ranges of each scale. These empirically derived interpretive narratives comprise the computer-based interpretive system for the MSI-R (Snyder & Lachar, 1997). Additional interpretive guidelines were developed to incorporate moderating effects on scale interpretation from configurations both within and between partners' profiles. An example of actuarial analyses is provided in Table 38.3. 
This table lists correlates of the Sexual Dissatisfaction (SEX) scale along with their prevalence associated with low, moderate, or high scores on this scale. Examination of these findings resulted in the following interpretive narrative for high scores on SEX:

This individual indicates extensive dissatisfaction with the sexual relationship. Disagreements regarding the frequency or variety of sexual behaviors are likely to be frequent; it is somewhat unlikely that sexual complaints arise entirely from more general difficulties, and specific interventions in this area may be warranted. [Men/Women] with similar scores often describe their partner as uncaring or uncommitted to a satisfying sexual relationship, and are likely to complain of deficits in nonsexual expressions of affection. The couple is likely to have substantial difficulty in discussing sexual issues openly and effectively.

The actuarial approach and empirical findings on which interpretation of the MSI-R is based distinguish this instrument from virtually every other measure of marital or family functioning reported in the literature, a feature noted in independent reviews of this instrument (Bascue, 1985; Boen, 1988; Burnett, 1987; Dixon, 1985; Fowers, 1990; Waring, 1985). The remainder of this chapter addresses means by which clinicians and researchers can incorporate these empirically based interpretive narratives in the assessment and treatment of distressed marital relationships.

A Basic Interpretive Strategy

Interpretation of the MSI-R profile proceeds systematically across scales to consider issues related to profile validity and global relationship affect, partners' communication and anger management, specific areas of interaction, concerns regarding children (if these items are completed), role orientation, and family history of distress. Interpretation

of individual profiles relies primarily on scale-by-scale interpretation and, to a lesser extent, on configural analysis.

TABLE 38.3
Correlate Frequency Across Score Ranges for the SEX Scale

Percentages
Low   Mod.   High   Descriptor
  6    43     78    Reports significant problems with overall quality of the sexual relationship.
 24    44     78    Expresses wish for improvement in sexual relationship.
 33    55     85    Is dissatisfied with frequency of sexual activity.
 11    47     84    Describes frequency of intercourse as significant problem.
  1    14     58    Describes partner as uninterested in sex.
  8    25     58    Dissatisfied with variety of sexual activity.
 11    42     50    Inability of partners to discuss sexual matters.
  2    12     53    Describes partner as sexually uncaring.
 95    67     43    Describes partner as sexually exciting.
 99    78     42    Describes partner as sexually satisfying.
 94    73     60    Describes self as sexually satisfying.
 22    36     75    Describes lack of love and affection from partner as a significant problem.
  8    43     54    States that partner does not show enough affection.
 17    57     67    Describes feeling emotionally distant from partner.
 31    55     60    Reports lack of affection among parents and siblings.

Note. Data indicate the percentage of individuals in a given range (low, moderate, or high) who exhibit the descriptor. Based on Snyder, Wills, and Keiser (1981), Scheer and Snyder (1984), and Snyder and Lachar (1986).
Material from the Marital Satisfaction Inventory, Revised, copyright © 1997 by Western Psychological Services. Used by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, USA. Not to be reprinted in whole or in part for any additional purpose without the expressed, written permission of the publisher. All rights reserved.
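The percentages in Table 38.3 are contingent frequencies: within each score range, the share of respondents for whom a descriptor was observed. A minimal sketch of that tabulation follows. The cutoffs (below 50T low, 50T to 60T moderate, above 60T high), the function names, and the sample data are assumptions made for illustration; they are not taken from the MSI-R manual or from the validation studies.

```python
from collections import defaultdict

def score_range(t_score, low_max=50, high_min=60):
    """Classify a T-score; the cutoffs are hypothetical, for illustration only."""
    if t_score < low_max:
        return "low"
    if t_score > high_min:
        return "high"
    return "moderate"

def contingent_frequencies(records):
    """records: iterable of (t_score, criterion_present) pairs.

    Returns, for each score range, the percentage of respondents
    in that range for whom the external criterion was present.
    """
    counts = defaultdict(lambda: [0, 0])  # range -> [criterion present, total]
    for t_score, present in records:
        cell = counts[score_range(t_score)]
        cell[0] += int(present)
        cell[1] += 1
    return {rng: round(100 * hits / total) for rng, (hits, total) in counts.items()}

# Made-up data: (SEX T-score, criterion rated present by the clinician)
sample = [(42, False), (47, False), (48, False), (49, True),
          (52, False), (55, True), (58, False), (60, False),
          (63, True), (66, True), (70, False), (72, True)]

table = contingent_frequencies(sample)
print(table)
```

Each row of a table such as Table 38.3 corresponds to one such tabulation, repeated for every criterion that correlated significantly with the scale.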
The interpretive strategy for MSI-R profiles obtained from both partners is similar to that used with individuals, but emphasizes configural analyses comparing partners' scores along each of the MSI-R scales, in addition to examining configural patterns for each partner separately. In particular, it is useful to consider both the absolute differences in partners' scores along each scale and their differences in rank order of scale elevations. Elevations on particular scales for one partner frequently contribute to an understanding of elevations on different scales for the other. The following steps are offered as an overall strategy for MSI-R profile analysis.

Step 1 Profile Validity. The MSI-R answer form should be examined for an excessive number of omitted items precluding scoring of the inventory. Examination of scores on Inconsistency (INC) and Conventionalization (CNV) provides an additional indication of profile validity and potentially problematic response sets. High scores on INC may indicate random responding or sporadic inattention to item content. High scores on CNV reflect distortion of relationship appraisals in an idealized direction and suggest a reluctance to engage in critical analysis of relationship processes and areas of potential concern.

Step 2 Overall Relationship Distress. Respondents' overall level of relationship distress should be assessed based both on scores on Global Distress (GDS), as well as the generalization of distress at moderate or higher levels across scales assessing specific

relationship domains. Although moderate levels of overall distress may prompt efforts to improve the relationship, high levels of distress or generalization across multiple aspects of the partners' interactions may contribute to negative expectations regarding the relationship's future, reduced resilience to specific conflict episodes, and excessive anger or emotional guardedness interfering with the challenges of couples therapy. Differences in partners' overall relationship affect should be acknowledged and their implications regarding motivations for therapy should be examined.

Step 3 Communication Effectiveness and Anger Management. Partners' descriptions of affective and problem-solving communication (AFC and PSC) should be assessed by examining relative distress in each of these domains for each partner separately as well as between partners conjointly. Differences in partners' concerns about emotional intimacy versus efficacy in conflict resolution should be explored in terms of potential stylistic differences in approaches to dealing with relationship distress. Scores on Aggression (AGG) should be examined to evaluate the extent to which difficulties in conflict resolution extend to verbal or physical intimidation and aggression. Indications of prior relationship aggression should prompt careful assessment of steps taken by either partner to disrupt the potential for its recurrence.

Step 4 Specific Domains of Relationship Conflict. Partners' descriptions of conflict in specific relationship domains should be evaluated based on their scores on Time Together (TTO), Disagreement About Finances (FIN), and Sexual Dissatisfaction (SEX). Differences in levels of conflict across specific domains facilitate prioritizing areas for initial treatment interventions. Areas of relative satisfaction may be emphasized as a means of building on relationship strengths and increasing resilience to distress in other domains.
The degree of similarity or divergence in partners' views of their relationship should be evaluated for the potential impact on establishing a collaborative alliance and partners' agreement on initial therapeutic goals.

Step 5: Concerns Regarding Children. If items on the two child-related scales (DSC and CCR) have been completed, partners' conflict over child rearing should be evaluated and compared with respondents' views regarding their children's emotional and behavioral adjustment and potential disruption in the parent-child relationship. Tensions between partners regarding children should be assessed in the context of partners' conflict and difficulties in collaborating across other domains of their relationship. Similarly, disruption of parent-child relationships should be compared to respondents' emotional distance from the partner to evaluate their overall level of isolation and disaffection within the family.

Step 6: Role Orientation and Family History of Distress. Partners' attitudes toward marital and parental roles (ROR) and history of distress in respective families of origin (FAM) should be evaluated for their potential contribution to current relationship difficulties. Discrepancies between espoused versus enacted roles should be explored with each partner, along with partners' differences in role attitudes. Similarly, styles of managing conflict and intimacy in each partner's family of origin and differences in this regard should be examined for their potential contribution to conflicts in these areas in the couple's own relationship.

Step 7: Additional Analysis at the Item Level. Analysis of partners' responses to the MSI-R at the item level may offer additional information regarding specific sources of relationship distress. For example, it may be useful to determine that a couple's elevated


scores on Sexual Dissatisfaction (SEX) result not from distress with the frequency of sexual relations but rather from dissatisfaction with the specific content of sexual exchanges. Second, an analysis of individual items may help to pinpoint specific situations leading to distress in an area; for example, a review of partners' responses to items comprising the Conflict Over Child Rearing (CCR) scale may highlight disagreements concerning how much time either parent spends with the children. Third, analysis of responses at the item level can sometimes serve a didactic function; for example, discussion of individual items on the Affective and Problem-solving Communication scales (AFC and PSC) can lead to important interventions emphasizing active listening or conflict resolution skills.

Computer-Based Test Interpretation of the MSI-R

Users of the MSI-R can obtain computer-generated narrative reports for individuals or couples by mailing or faxing special optically scanned answer sheets directly to the publisher, or by administering the inventory or recording partners' responses on their own personal computer using microcomputer software distributed by the test publisher. The MSI-R is one of a small number of instruments for which an automated interpretive system has been developed almost exclusively on the basis of actuarially based external validation studies. Similarly, the computer-based test interpretation (CBTI) for the MSI-R is one of the very few CBTIs to have been subjected to careful empirical scrutiny, including rigorous controls for various response sets influencing consumers' ratings of CBTI accuracy.
Hoover and Snyder (1991) conducted a national validation study of the computerized report for the MSI in which clinicians indicated for each separate interpretive section of the computer-generated report whether that section was concise, confirmed the therapist's own clinical impressions of the couple, omitted important information, was useful for diagnosis or treatment, was accurate, and included important new information. Overall, clinicians' ratings strongly supported the accuracy, clinical usefulness, and clarity of the MSI computerized report. The majority of clinicians (83%) judged the report to accurately describe the couple and their problems. In terms of clinical utility, the same percentage of clinicians (83%) rated the report as helpful for planning treatment and 63% responded that it pointed out things about the couple that were not noticed previously. In addition, almost all clinicians (98%) felt that the overall computerized MSI report was well organized and clear in its presentation. Because the new MSI-R computerized report retains the same overall structure and actuarial basis of interpretation, these findings may be expected to generalize to the revised report for the MSI-R as well.

Use of the MSI-R for Treatment Planning

General Considerations

In clinical applications, it is particularly important that respondents understand the rationale underlying assessment with the MSI-R and the intended use of test results. It is important to present the MSI-R as one component in a systematic effort to understand


the couple's relationship in its entirety and to plan interventions that will be most meaningful and effective for the couple. In most cases, the couple should be reassured that, although they initially complete the inventory separately and without collaboration, results will be presented and discussed conjointly. The MSI-R should be presented as an opportunity for each partner to describe their relationship with regard to both its strengths and limitations. The inventory may provide an opportunity to articulate concerns not yet assessed in interview, or it may allow the respondent to indicate thoughts or feelings that have thus far been difficult to convey verbally. It is often helpful to explain that the MSI-R results can be expected to prove useful in prioritizing each partner's relationship concerns and to provide a more objective basis for comparing differences and similarities in partners' views. Most importantly, test findings should be framed in the context of clinical hypotheses to be examined more fully in collaborative discussion between the couple and the therapist.

For a couple entering therapy for assistance with their relationship, the role of the MSI-R in assessment and planning useful interventions has an obvious and intuitive appeal. However, in other clinical contexts (e.g., the evaluation of couples in which one partner has been psychiatrically hospitalized, or the assessment of a couple's marriage when their child or adolescent has entered therapy for emotional or behavioral difficulties), presenting the MSI-R requires special sensitivity to respondents' potential anxiety or defensiveness regarding assessment of their relationship. In such situations, it is sometimes helpful to conceptualize relationship concerns as possibly co-occurring with or resulting from other stressors or difficulties.
Application of Clinical Findings to Treatment Planning In planning meaningful interventions with couples, it is crucial that the therapist assess each partner's subjective experience and appraisal of the relationship. As a multidimensional measure delineating specific sources of relationship distress, the MSI-R facilitates formulation of therapeutic interventions tailored to the specific concerns of the couple. The amount of information generated by the MSI-R profile is substantial and increases considerably when two profiles are assessed interactively. Identification of Relationship Concerns. A primary feature of the MSI-R is its usefulness in distinguishing areas of relative satisfaction or distress in the couple's relationship. Examples of treatment implications derived from differential scale elevations are noted in the systematic approach to profile interpretation described earlier. Similar implications can be drawn from related configural analysis both within and between partners' profiles. For example, an important interpretive consideration involves the extent to which high distress reflected elsewhere in the MSI-R profile accompanies only moderate scores on the Global Distress (GDS) scale. Such results often reflect the relative recency of specific conflicts or the couple's ability to sustain positive interactions in most areas despite distinct concerns. For individuals displaying this pattern, it can sometimes be helpful to note that although distress in some areas may have reached acute levels, the resilience noted in other domains and in overall relationship sentiment may provide a foundation for establishing a collaborative stance between partners for resolving more focused difficulties. It can sometimes be useful to contrast scores on Affective Communication (AFC) and Time Together (TTO) (reflecting emotional and behavioral intimacy) with scores


on Problem-solving Communication (PSC) and Aggression (AGG) (reflecting difficulties in resolving differences and anger dyscontrol). Individuals obtaining low or moderate scores on AFC and TTO compared to higher scores on PSC or AGG have retained feelings of affection and attachment in their relationship despite their differences and sometimes intense conflict; such persons tend to retain fairly strong motivation for engaging in couples therapy aimed at resolving disagreements and improving conflict resolution skills. By contrast, individuals obtaining low or moderate scores on PSC and AGG compared to higher scores on AFC and TTO often have contained conflict at the cost of intimacy through disengagement and isolation, and are sometimes more difficult to involve in the challenges of couples therapy. Receptiveness to Treatment. Indicators of an individual's preparedness to engage constructively in couples therapy include partners' scores on the two validity scales (INC and CNV), as well as scores on Global Distress (GDS). For example, high scores on the Inconsistency (INC) scale, suggesting careless or random responding, should prompt the examiner to discuss with the respondent their approach to completing the MSI-R and views toward the assessment process. If a collaborative stance toward assessing the couple's relationship can be established, then readministration of the MSI-R may be warranted. Moderate scores on Conventionalization (CNV) among individuals entering couples therapy reflect a level of idealistic distortion or sentimentality unusual for persons seeking assistance with their relationship, and often indicate a reluctance to take a more objective or critical view of relationship difficulties. 
It is particularly important with such persons to provide a supportive therapeutic atmosphere that encourages close examination of relationship difficulties while reassuring the individual that discussion of negative feelings or events can serve as a vehicle for relationship growth rather than alienation. High scores on CNV among individuals entering couples therapy are rare and invariably reflect a level of defensiveness and resistance to discussing relationship conflict seen when respondents have entered counseling reluctantly at their partner's urging. On occasion, such scores occur when respondents are engaged in frantic denial of relationship difficulties in the face of their partner's unhappiness or threats to leave the relationship. The respondents' level of anxiety regarding discussion of relationship conflict, and differences between the partners' views of the relationship, will need to be resolved before a more collaborative approach to couples therapy can be established.

High scores on Global Distress (GDS) reflect extensive relationship dissatisfaction; conflicts are likely to be of long duration and to have generalized across diverse areas of the couple's interactions. Individuals obtaining high GDS scores describe substantial disappointment in the relationship and are likely to have concerns or doubts regarding the relationship's future. When such scores are obtained by individuals at the beginning of couples therapy, careful assessment should be made of respondents' motivations for counseling and steps they may have taken toward separation or divorce. Information regarding the level of commitment required to modify long-standing negative relationship patterns should be provided, and some indication obtained of the respondent's willingness to suspend any steps toward relationship dissolution and engage in collaborative efforts toward change.

Selection of Treatment Approach.
A variety of relatively brief, focused interventions have been developed for couples experiencing difficulties in circumscribed domains. These include brief directive sex therapy, various parent-training protocols, financial counseling and debt management programs, and time-limited approaches to relationship enhancement emphasizing emotional expressiveness and shared activities. In each case, the effectiveness of these approaches may be compromised by more pervasive relationship


deficits in attachment or conflict resolution. Assessment of the couple's overall relationship using the MSI-R can help to distinguish those couples for whom brief structured interventions may be of limited usefulness. For example, in an outcome study of 26 couples treated at a sexual dysfunctions clinic, Snyder and Berg (1983b) found that couples most likely to profit from a symptom-focused and behaviorally oriented treatment program for sexual dysfunctions were those initially characterized by flexibility in sharing intimacy, expressing affection, and empathizing with their spouse. They suggested two criteria by which the clinician would be prudent to defer brief directive sex therapy in favor of clinical interventions designed to promote more general relationship satisfaction: when the absolute intensity of global relationship distress exceeds moderate levels (GDS>60T); or when the relative elevation of SEX to other clinical scales reveals the couple to have primary distress around nonsexual aspects of their relationship such as communication and intimacy. Similarly, in comparing 32 couples seeking assistance at a nonprofit agency providing financial counseling services with 25 couples in marital therapy, Aniol and Snyder (1997) found that approximately one third of couples seeking financial counseling had general relationship distress (including problem-solving difficulties) above the mean on this dimension for couples entering marital therapy. They noted that the therapist must engage such couples in collaborative assessment and goal setting that address both specific challenges regarding finances as well as more fundamental structural and affective resources within the marriage (e.g., communication skills and intimacy) that enable the couple to meet these challenges. 
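The two deferral criteria suggested by Snyder and Berg (1983b) for brief directive sex therapy can be sketched as a simple screening check. The Python below is only an illustration under stated assumptions: the 60T cutoff for Global Distress comes from the text, but the way the second criterion is operationalized (any of AFC, PSC, or TTO exceeding SEX) is a hypothetical simplification, as are the function and variable names. It is not a published scoring rule and is no substitute for clinical judgment.

```python
from typing import Mapping

# From the text: defer brief directive sex therapy when GDS exceeds 60T.
GDS_CUTOFF_T = 60

# Assumption (not from the source): nonsexual scales used to judge whether
# distress about communication and intimacy is more prominent than SEX.
NONSEXUAL_SCALES = ("AFC", "PSC", "TTO")


def should_defer_brief_sex_therapy(t_scores: Mapping[str, float]) -> bool:
    """Return True if either illustrative deferral criterion is met."""
    # Criterion 1: global relationship distress exceeds moderate levels.
    global_distress_high = t_scores["GDS"] > GDS_CUTOFF_T
    # Criterion 2 (hypothetical operationalization): at least one nonsexual
    # scale is elevated above SEX, suggesting primary nonsexual distress.
    nonsexual_primary = (
        max(t_scores[scale] for scale in NONSEXUAL_SCALES) > t_scores["SEX"]
    )
    return global_distress_high or nonsexual_primary


# Hypothetical T-score profiles, for illustration only.
focused_sexual_distress = {"GDS": 55, "SEX": 68, "AFC": 52, "PSC": 54, "TTO": 50}
pervasive_distress = {"GDS": 64, "SEX": 68, "AFC": 52, "PSC": 54, "TTO": 50}
communication_primary = {"GDS": 58, "SEX": 60, "AFC": 66, "PSC": 63, "TTO": 57}

for label, profile in [
    ("focused sexual distress", focused_sexual_distress),
    ("pervasive distress", pervasive_distress),
    ("communication primary", communication_primary),
]:
    print(label, "-> defer:", should_defer_brief_sex_therapy(profile))
```

In this sketch only the first profile, with moderate global distress and SEX as the most elevated scale, would pass through to brief symptom-focused treatment.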
In addition to scores on the MSI-R, such assessment might include observation of both fiscal knowledge and communication processes in structured tasks involving financial concerns (e.g., negotiating a budget). Couples presenting with primary concerns regarding child rearing, but exhibiting marked deficits in their own communication with each other, require basic interventions in conflict resolution and constructive listening skills in addition to interventions emphasizing knowledge of child development and positive behavior influence techniques. Similarly, partners' views toward marital and parental roles and sharing of child rearing responsibilities, reflected in their respective scores on Role Orientation (ROR), influence both the pace and content of interventions aimed at collaborative parenting.

Finally, from a broader systemic perspective, partners' scores on the Family History of Distress (FAM) scale indicate the extent to which treatment interventions focusing on respective families of origin may be instrumental to successful couple therapy. In addition to reflecting inadequate family models for effective relationship skills, high scores on FAM often suggest enduring conflicts regarding intimacy and autonomy that exert a deleterious influence on the current relationship; in such cases, interventions emphasizing a developmental approach to interpersonal patterns may be particularly beneficial. High scores on FAM also indicate a need to examine the extent to which members of the extended family wield an intrusive role in the couple's daily interactions, suggesting the merit of structural interventions aimed at strengthening appropriate boundaries between the couple and respective families of origin.

Utility of Collateral Interventions.
Mirroring issues of treatment selection and occasions when brief directive interventions may be insufficient are other occasions when couple therapy may benefit from collateral interventions with one or more family members in an individual context. For example, in addition to couple interventions directed by high levels of physical aggression indicated by scores on the Aggression (AGG) scale, one or both partners may benefit from extended individual therapy emphasizing specific anger management techniques. Similarly, specific sexual dysfunctions contributing to high


scores on the Sexual Dissatisfaction (SEX) scale (e.g., primary vaginismus) may warrant collateral interventions on an individual basis. High scores on the Disagreement About Finances (FIN) scale and evidence of extensive financial liabilities exacerbating other relationship conflicts may suggest the usefulness of independent legal and financial counseling concerning debt restructuring. Similarly, prominent concerns regarding child rearing (CCR) or emotional and behavioral adjustment problems of one or more children (DSC) may indicate the usefulness of family therapy or individual adjunctive therapy for the youngster(s) experiencing significant difficulties. In each of these circumstances, an important consideration involves the overall resources of the couple to deal with diverse challenges confronting the family or their own relationship, reflected by the levels of affective, behavioral, and sexual intimacy between partners (AFC, TTO, SEX) and their overall ability to deal constructively with relationship conflicts (PSC, AGG). A further consideration involves the relative generalization of distress across multiple levels of the family system (e.g., involvement of the children or extended family in contributing to conflict versus buffering against relationship distress as reflected by scores on DSC, CCR, and FAM). Scores on the MSI-R can provide useful information regarding the potential benefit of supplementing conjoint couple therapy with alternative treatment modalities. Predicting Treatment Outcome Initial evidence for the MSI's usefulness in predicting treatment outcome was reported by Snyder and Berg (1983b) in a study of 26 couples entering brief directive sex therapy. Couples' pretreatment MSI scores were correlated with posttreatment ratings of dissatisfaction with the frequency of intercourse and individuals' lack of affection for their partner. 
Using multiple regression analysis, pretreatment scores on Affective Communication (AFC), Time Together (TTO), Disagreement About Finances (FIN), and Sexual Dissatisfaction (SEX) accounted for 29% of the variance in couples' posttreatment ratings of residual sexual distress. Similarly, pretreatment scores on four MSI scales (Problem-solving Communication, PSC; Sexual Dissatisfaction, SEX; Family History of Distress, FAM; and Dissatisfaction with Children, DSC) accounted for 30% of the variance in couples' posttreatment ratings of residual lack of affection for their partner. These results suggested that couples most likely to profit from a symptom-focused and behaviorally oriented treatment program for sexual dysfunctions are those initially characterized by flexibility in sharing intimacy, expressing affection, and empathizing with their partner.

Additional evidence regarding the MSI's usefulness in predicting couples' long-term response to therapy was obtained in an NIMH treatment outcome study (Snyder, Mangrum, & Wills, 1993; Snyder, Wills, & Grady-Fletcher, 1991). Data regarding marital status and marital accord were obtained for 55 couples 4 years after they had completed marital therapy. Of these 55 couples, 11 couples (20%) had divorced and an additional 8 couples (15%) who were still married reported significant relationship distress. Couples' scores on the MSI at intake were correlated with their status at termination (distressed vs. nondistressed) and with their status 4 years after completing treatment (distressed or divorced vs. nondistressed) to evaluate the utility of initial MSI assessment data in predicting short- and long-term treatment response. Results are shown in Table 38.4. Overall, pretreatment MSI scores on four profile scales (CNV, GDS, AFC, TTO) predicted initial response at termination for both husbands and wives; the best intake predictor of initial treatment response was the Global Distress (GDS) scale.
By comparison, only Problem-solving Communication (PSC) predicted couples' long-term response


to therapy at 4-year follow-up for both husbands and wives. These results were consistent with previous research indicating that higher levels of negative marital affect at intake, various indicators of spouses' disengagement, and concrete steps taken toward separation or divorce predict poorer treatment outcome. These findings also suggested that, although initial therapy response related particularly strongly to pretreatment affect, long-term response was better predicted by pretreatment conflict resolution skills.

Given its overall predictiveness of both short- and long-term treatment response, additional analyses focused on the Global Distress (GDS) scale for its prediction of couples' eventual divorce following treatment. Individuals' responses on the GDS scale were reanalyzed based on new scoring criteria for the revised MSI-R. Results of these analyses are presented in Table 38.5. Overall, 11 of 55 couples (20%) treated in this study had divorced within 4 years following marital therapy. However, for couples having intake GDS scores ≥ 66T, the percentage of couples eventually divorcing jumped to 45% in contrast to only 14% for couples with intake GDS scores ≤ 65T. Similarly, for couples having termination GDS scores ≥ 61T, the percentage of couples eventually divorcing reached 50% in contrast to only 13% for couples with termination GDS scores ≤ 60T. Although pretreatment scores on the revised GDS scale ≥ 66T are unlikely to comprise a viable exclusionary criterion for accepting couples into treatment, such scores provide valuable information to the therapist and both partners regarding the degree to which this relationship is at increased risk for eventual dissolution.
Similarly, posttreatment GDS scores ≥ 61T indicate residual risk for subsequent deterioration or divorce and may suggest either continued therapy to improve the relationship, interventions to facilitate a functional dissolution of the relationship, or additional assessment and potential reinitiation of treatment 3 to 6 months following suspension of therapy.

TABLE 38.4
Correlation of Intake MSI Measures with Therapy Outcome

Termination Response 4-Year Outcome Husbands Wives Combined Husbands Wives Combined Intake MSI -.40* -.35* -.38* CNV .47* .61* .54* .28 .30* GDS .28 .33* .31* .20 AFC .32* .26 .31 .26* .29* PSC .29 .46* .38* TTO .23 .21* .32* .29* FIN .30 SEX ROR .37* .19 FAM .24* DSC CCR .18

Note. Based on Snyder, Mangrum, and Wills (1993) and Snyder, Wills, and Grady-Fletcher (1991). Inconsistency (INC) and Aggression (AGG) were not scored. Asterisks denote correlations significant at p < .01; all others significant at p < .05. At termination, n = 59 couples; at 4-year follow-up, n = 55 couples. Material from the Marital Satisfaction Inventory, Revised, copyright © 1997 by Western Psychological Services. Used by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, USA. Not to be reprinted in whole or in part for any additional purpose without the expressed, written permission of the publisher. All rights reserved.
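The divorce rates cited for the GDS cutoffs can be reproduced from the couple counts reported for the 4-year follow-up (summarized in Table 38.5). The following Python sketch is offered only as an illustration of that arithmetic. The function names (divorce_rate, relative_risk) and the dictionary labels are hypothetical, and the relative-risk ratio is an added descriptive summary, not a statistic reported in the original study.

```python
def divorce_rate(divorced: int, total: int) -> float:
    """Proportion of couples in a group who divorced within 4 years."""
    return divorced / total


def relative_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Ratio of two divorce rates (illustrative summary, not from the study)."""
    return rate_exposed / rate_unexposed


# (divorced couples, total couples) for each group, as reported in the source.
groups = {
    "all couples": (11, 55),
    "intake GDS <= 65T": (6, 44),
    "intake GDS >= 66T": (5, 11),
    "termination GDS <= 60T": (6, 45),
    "termination GDS >= 61T": (5, 10),
}

rates = {label: divorce_rate(d, n) for label, (d, n) in groups.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.0%}")

# How many times more likely divorce was above each cutoff than below it.
rr_intake = relative_risk(rates["intake GDS >= 66T"], rates["intake GDS <= 65T"])
rr_termination = relative_risk(
    rates["termination GDS >= 61T"], rates["termination GDS <= 60T"]
)
print(f"relative risk at intake cutoff: {rr_intake:.2f}")
print(f"relative risk at termination cutoff: {rr_termination:.2f}")
```

Because the groups above the cutoffs are small (11 couples at intake and 10 at termination), these proportions are probably best read as rough indicators rather than precise estimates.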


TABLE 38.5
Predicting Divorce 4 Years Posttherapy from Couples' Scores on the Revised Global Distress (GDS) Scale at Intake and Termination

Couple's Average GDS Score at Intake

4-Year Outcome        ≤ 54T    55T-65T    ≥ 66T
Happily Married         2        30         4
Distressed Married      0         6         2
Divorced                0         6         5

Base rate for divorce = 11/55 = .20
p(Divorce | Intake GDS ≤ 65T) = 6/44 = .14
p(Divorce | Intake GDS ≥ 66T) = 5/11 = .45

Couple's Average GDS Score at Termination

4-Year Outcome        ≤ 49T    50T-60T    ≥ 61T
Happily Married         5        30         1
Distressed Married      0         4         4
Divorced                1         5         5

Base rate for divorce = 11/55 = .20
p(Divorce | Termination GDS ≤ 60T) = 6/45 = .13
p(Divorce | Termination GDS ≥ 61T) = 5/10 = .50

Note. Based on Snyder, Wills, and Grady-Fletcher (1991). Findings reflect reanalyses of the Global Distress (GDS) scale based on revised scoring criteria. n = 55 couples. Material from the Marital Satisfaction Inventory, Revised, copyright © 1997 by Western Psychological Services. Used by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, USA. Not to be reprinted in whole or in part for any additional purpose without the expressed, written permission of the publisher. All rights reserved.

Integration of Clinical Findings with Other Evaluation Data

MSI-R profiles should be viewed in the context of additional data acquired during interview, through direct observation, and in conjunction with additional responses to interpretive hypotheses provided by the couple. Specific areas of concern denoted by the MSI-R profile may indicate the potential benefit of more detailed assessment in those domains. For example, a variety of instruments are available for detailed evaluation of couples' sexual relationship, leisure time together, verbal and nonverbal communication, marital and parental roles, and similar domains. Several sourcebooks describing such measures are available (cf.
Fredman & Sherman, 1987; Grotevant & Carlson, 1989; Jacob & Tennenbaum, 1988; L'Abate & Bagarozzi, 1993; Straus & Brown, 1978; Touliatos, Perlmutter, & Straus, 1990). However, a common concern regarding these measures involves limited information addressing their reliability, validity, and treatment utility (Snyder & Rice, 1996). In addition, couple therapy often must consider important aspects of individual or family functioning not addressed by the MSI-R. For example, elevations on the two


child-related scales of the MSI-R have been found to relate to a broad spectrum of childhood and adolescent emotional and behavioral difficulties, and may warrant extended psychological evaluation of one or more children in the family (Snyder, Klein et al., 1988). Similarly, individual measures of psychopathology may be useful in identifying partners potentially benefitting from adjunctive psychological or pharmacological interventions. Separate from issues of psychopathology, individual differences in partners' emotional and cognitive styles contributing to relationship conflict can sometimes be elucidated by measures of normal personality functioning. Uses and Limitations of the MSI-R for Treatment Planning in Managed Care Settings The MSI-R presents a low cost, low effort method for gathering sensitive information early in the treatment process. It allows partners to communicate concerns and perspectives they are eager to transmit, and thus facilitates a collaborative alliance with the therapist. The content of profile scales lends itself readily to specific interventions reflecting the couple's unique constellation of relationship strengths and liabilities. As a multidimensional measure assessing relationship behaviors germane to, but not necessarily targeted by, treatment, it is ideally suited as an objective outcome measure at termination and follow-up. A strength of the MSI-R rests in its assessment of partners' subjective experiences of their relationship. However, similar to other self-report measures, the MSI-R is also susceptible to distortions influencing respondents' appraisals. An actuarial approach to profile interpretation and the presence of two validity scales on the MSI-R (INC and CNV) reduce but do not eliminate concerns regarding this potential response bias. Thus, it is important to integrate scores on the MSI-R with other assessment findings and the context in which evaluation data have been obtained. 
For some couples, assessment using the MSI-R may simply confirm relationship concerns already apparent to both partners; hence, the MSI-R may offer fewer unique contributions to treatment planning for couples demonstrating at intake a clear understanding of conflicts contributing both to their own and their partner's dissatisfaction. Other couples may reject this opportunity to engage in a collaborative evaluation of their relationship, or may use information provided by the MSI-R in an antagonistic, destructive manner. For such couples, the clinician's sensitivity and skills play a critical role in framing relationship difficulties and individual concerns in such a way as to maintain the therapeutic alliance and elicit a collaborative response. For many individuals, however, particularly those less able to articulate their own or their partner's concerns, scores on this inventory serve to highlight and place in perspective various sources of relationship distress (Snyder, Lachar, & Wills, 1988).

Providing Couples with Feedback

Clinical assessment should be a collaborative process between the client and clinician. A cogent rationale for structured assessment must be presented, results of the initial assessment must be provided in a comprehensible manner sensitive to the concerns of the respondent, the client should be invited to respond to initial interpretation of findings


with elaborations or challenges, and the clinician and client must then work together to explore the implications of assessment findings for treatment. The relatively atheoretical structure of the MSI-R and the obvious relevance of scale content to concerns typical of couples facilitate the provision of feedback and collaborative assessment process just noted. Various models for timing the initial administration of the MSI-R, providing results to couples in treatment, and incorporating this inventory in evaluating outcome have been presented elsewhere (Snyder, 1981, 1983, 1990, 1997; Snyder et al., 1995; Snyder, Lachar, & Wills, 1988; Wills & Snyder, 1982). One approach is to have both partners complete the MSI-R independently either immediately before or after the initial interview, to score both protocols during the week, and to provide the couple with a description and interpretation of results conjointly during the second session. Although affording advantages of brief assessment and immediate feedback, this approach sometimes limits the amount of interview material incorporated into planning of therapeutic interventions and may raise pragmatic concerns regarding time constraints in test administration. An alternative approach extends the initial assessment and interpretive phase across four sessions. During the initial session, the couple is interviewed conjointly to solicit general background information and problem identification. On the second visit, one individual (usually the more distressed of the two) is interviewed individually while the partner completes the MSI-R; on the third visit, these roles are reversed. During the fourth session, extensive feedback is provided to the couple conjointly, incorporating both interview material and test results to formulate an individualized treatment plan collaboratively. Depending on the length and progress of therapy, the MSI-R may be administered again at 3 to 6 months following the initial evaluation. 
This is frequently a helpful procedure for documenting changes that have occurred and establishing new directions for therapeutic endeavor. Readministration of the MSI-R at termination facilitates a review and integration of gains the couple has acquired. Residual areas of distress may be discussed in terms of constructive alternatives the couple may adopt on their own or as potential areas for further exploration during additional treatment at a subsequent time. A general practice is to provide both partners with their own copies of the computer-based narrative for the MSI-R. In initially presenting results, the MSI-R profile provides the basis for interpretation because results graphically depict partners' respective differences in sources and degrees of relationship distress. Each partner is encouraged to review the narrative report during the week and to pursue reactions, questions, or additional concerns during the following session.

Use of the MSI-R for Treatment Outcome Assessment

Evaluation of the MSI Against Criteria for Outcome Measures

The MSI-R compares favorably to previously identified criteria for evaluating outcome assessment instruments (Newman & Ciarlo, 1994).

Relevance. The MSI-R was developed specifically to assess both the source and extent of relationship distress across multiple domains previously identified in both the clinical and empirical literature. Its rational-deductive approach to scale development

results in test items and scale interpretations that have direct relevance to couples experiencing relationship distress.

Ease of Procedures. Use of the MSI-R requires no special equipment, facilities, or specialized training. The MSI-R AutoScore™ answer form permits traditional hand scoring; interpretive guidelines for distinct scale score ranges are provided in the test manual. Computer-assisted administration, scoring, and interpretation are also available for users with their own microcomputer equipment; alternatively, users can mail or fax answer sheets to the test publisher for computerized scoring and interpretation.

Referents. The MSI-R profile form facilitates objective comparison of individuals' scores to men and women sampled from the general population. Individuals' scores can also be compared to specific criterion groups, including couples beginning or completing marital therapy, couples treated for specific sexual dysfunctions, physically battered women seeking refuge at a spouse abuse shelter, couples in which one or both partners are in individual treatment for nonmarital emotional or behavioral disorders, and parents of psychiatrically hospitalized children or adolescents.

Multiple Perspectives. The MSI-R elicits the subjective experiences of both partners; areas of both convergence and divergence facilitate tailoring interventions specific to the needs of the couple. From an actuarial basis, individuals' scores are empirically linked to their own and their partner's views of the relationship in addition to independent appraisals by couple therapists.

Treatment Linkage. In contrast to global measures of marital distress, the MSI-R identifies distinct areas of relationship concern; both the profile and interpretive report suggest meaningful interventions most likely to result in favorable treatment outcome.

Psychometric Features. The MSI-R scales possess high levels of both internal consistency and temporal stability. The actuarial approach and empirical findings on which interpretation of the MSI-R is based distinguish this instrument from virtually every other measure of marital or family functioning reported in the literature. Potential respondent bias is assessed directly by measures of response inconsistency and social desirability.

Cost. As a self-report measure, the MSI-R permits a wide range of information specific to relationship distress to be obtained at minimal cost. Hand scoring can be conducted by clerical staff. Computer-based interpretation further reduces the professional time required for clinical use. The cost of administering, scoring, and interpreting the MSI-R for both partners using either microdiskettes or mail-in services typically amounts to less than 20% of the fee for one therapy session.

Consumer Acceptance. The MSI-R samples from domains directly relevant to couples' typical presenting complaints; consequently, the instrument has high face validity for couples entering therapy, in contrast to measures of personality or relationship measures linked to particular theoretical or conceptual models. Similarly, dimensions assessed by the MSI-R have high content validity for professionals outside mental health for whom marital therapy outcome evaluation may have direct relevance (e.g., physicians, clergy, attorneys, judges, or departments of social service).

Ease of Interpretive Feedback. Interpretive feedback regarding MSI-R test results is facilitated by the following: face validity of scale content; graphic display of profile scores permitting comparison of partners' results to each other and to couples from the

general population; group mean profiles for specific clinical populations presented in the test manual facilitating couples' comparisons to these groups; shaded ranges of distress on the MSI-R profile form identifying high, moderate, and low scores on each dimension; concise interpretive guidelines for scores on each scale presented in the test manual; and computerized interpretation for the MSI-R using either microdiskettes or fax/mail-in services, providing evaluations of each scale and profile configurations within and between partners.

Usefulness Across Clinical Functions. As a diagnostic and therapeutic procedure, the MSI-R is used in the initial phases of therapy in discussing couples' presenting complaints and in formulating therapeutic goals; it can be used throughout therapy and at termination in the evaluation of change and redirection of treatment interventions. The MSI-R can also be used didactically (for example, reviewing item content from the two communication scales to facilitate discussion of essential components of active listening and conflict resolution). The MSI-R also serves as a screening measure in clinical populations for which relationship concerns may not comprise the primary reason for seeking treatment (e.g., parents of psychiatrically referred children).

Theoretical Compatibility. Because the MSI-R is relationship-specific and focuses on actual elements related to a couple's interaction, it can be readily incorporated by clinicians across a broad spectrum of theoretical orientations. Its relatively atheoretical structure neither requires nor precludes higher order conceptualizations for incorporation in diverse treatment approaches.

Research Findings and Clinical Applications

Evaluating the effectiveness of treatment interventions with a distressed couple should be a process that continues throughout the course of therapy.
As noted earlier, the MSI-R can be readministered at multiple points during treatment to evaluate and consolidate gains that the couple has made and to identify residual areas of distress for further work. This idiographic approach to outcome evaluation emphasizes within-partner change across time. In general, three approaches can be incorporated for evaluating the meaningfulness of change across time, including attainment of statistically reliable change, movement from one scale score range to a different range denoting less distress, and approximation of the individual's profile to the mean profiles for couples terminating treatment or from the general population. Attainment of Statistically Reliable Change. Jacobson and Truax (1991) reviewed developments over the last decade for evaluating meaningful change in psychotherapy. They advocate a reliable change (RC) index, computed as 1.96 times a measure's standard error of difference. When the absolute degree of an individual's change exceeds the RC index, a meaningful difference can be inferred that would occur by chance less than 5% of the time. A less conservative measure of change is the standard error of difference itself. In general, it can be concluded that a change in scores exceeding the standard error of difference would occur by chance less than 32% of the time. Table 38.6 presents both the standard error of difference and more conservative RC index required for each MSI-R scale to infer reliable change on that measure.
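The two thresholds just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any MSI-R scoring software: the function names, the returned labels, and the example T-scores are hypothetical, while the standard errors of difference for GDS and AFC are the values listed in Table 38.6.

```python
# Jacobson and Truax (1991): RC index = 1.96 x standard error of difference.
RC_MULTIPLIER = 1.96


def rc_index(se_diff: float) -> float:
    """Return the reliable change (RC) index for one scale."""
    return RC_MULTIPLIER * se_diff


def classify_change(t1: float, t2: float, se_diff: float) -> str:
    """Label the absolute change between two T-scores (labels are hypothetical)."""
    change = abs(t2 - t1)
    if change > rc_index(se_diff):
        return "exceeds RC index"               # chance probability below 5%
    if change > se_diff:
        return "exceeds SE of difference only"  # chance probability below 32%
    return "within measurement error"


# Standard errors of difference as listed in Table 38.6.
SE_DIFF = {"GDS": 6.66, "AFC": 6.00}

# Hypothetical intake and retest T-scores, for illustration only.
print(round(rc_index(SE_DIFF["GDS"]), 2))        # 13.05
print(classify_change(65, 55, SE_DIFF["GDS"]))   # exceeds SE of difference only
print(classify_change(68, 50, SE_DIFF["AFC"]))   # exceeds RC index
```

A 10-point drop on GDS clears the less conservative threshold but not the RC index, which shows why the choice between the two criteria matters when interpreting moderate change.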

TABLE 38.6
Coefficients of Temporal Stability, Standard Errors of Difference, and Reliable Change (RC) Index for Inferring Meaningful Differences in T-Scores Across Time

Scale   rtt    M1 (SD1)       M2 (SD2)       Standard Error of Difference   Reliable Change (RC) Index
INC     .52    47.4 (9.4)     46.7 (9.4)     9.64                           18.89
CNV     .78    50.5 (9.4)     52.2 (8.9)     6.35                           12.45
GDS     .74    47.9 (8.3)     47.6 (8.1)     6.66                           13.05
AFC     .79    47.9 (9.1)     47.8 (9.0)     6.00                           11.76
PSC     .82    47.8 (9.3)     47.5 (10.0)    5.81                           11.39
AGG     .81    49.7 (9.6)     48.3 (9.6)     5.58                           10.94
TTO     .77    49.3 (9.5)     49.1 (10.2)    6.41                           12.56
FIN     .74    49.6 (9.2)     49.3 (9.3)     6.74                           13.21
SEX     .81    49.8 (9.7)     49.6 (10.3)    5.89                           11.54
ROR     .88    50.5 (9.1)     50.1 (9.3)     4.68                            9.17
FAM     .84    49.3 (9.8)     48.6 (10.5)    5.38                           10.54
DSC     .79    49.4 (10.3)    49.2 (10.0)    6.09                           11.94
CCR     .74    48.6 (8.0)     48.3 (8.0)     6.42                           12.58

Note. For DSC and CCR, n = 153; for remaining scales, n = 210. Mean test-retest interval was 6 weeks. Reliable change (RC) index = 1.96 times the standard error of difference. Material from the Marital Satisfaction Inventory, Revised, copyright © 1997 by Western Psychological Services. Used by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, USA. Not to be reprinted in whole or in part for any additional purpose without the expressed, written permission of the publisher. All rights reserved.

Change in Scale Score Range. In addition to evaluating whether a respondent's scores on the MSI-R reflect reliable change, clinicians can examine whether a given change denotes transition from a distressed to nondistressed (or less distressed) status. This involves determining whether scale scores on a given dimension at Time 1 and Time 2 fall within the same or different ranges, using guidelines developed on the basis of actuarial analyses of scale correlates. Scale ranges for each MSI-R scale (low, moderate, or high) are noted in the test manual as well as by shaded zones on the profile form. For example, the moderate range extends from 50T to 60T for each of the MSI-R scales except Inconsistency (55T-65T), Conventionalization (45T-55T), and Family History of Distress (45T-55T).

Approximation of Profile to Nondistressed or Posttreatment Populations. In addition to comparisons between profiles across time, it can be useful to compare respondents' profiles to group mean profiles for couples beginning and completing therapy in a nomothetic approach to evaluating outcome. Group mean profiles for the 59 couples treated in the Snyder and Wills (1989) study comparing behavioral versus insight-oriented

marital therapies at intake and at termination are presented in Table 38.2. The mean pretreatment profile is similar to those obtained previously for couples in marital therapy (Snyder, 1981) and is characterized by a low score on Conventionalization (40T), moderate to high scores (ranging from 60T-65T) on Global Distress and the two communication scales (AFC and PSC), and moderate scores (55T-60T) on remaining scales of relationship distress (Time Together, TTO; Disagreement About Finances, FIN; Sexual Dissatisfaction, SEX; Dissatisfaction with Children, DSC; and Conflict Over Child Rearing, CCR). By termination, these couples' mean profile showed greatest reduction on Global Distress and scales comprising the affective triad (AFC, PSC, and TTO), with scores on these measures moving to the nondistressed range (50T-55T). Couples' mean termination profile also exhibited a small increase on Conventionalization (5 T-score points) and moderate reductions on scales assessing specific domains of relationship distress (TTO, FIN, SEX, DSC, and CCR). This group termination profile included responses from couples who subsequently divorced within a 4-year period. Overall, couples' MSI-R profiles in response to successful marital therapy could be expected to approach, but most likely not reach, scores of 50T across most scales. Again, it should be emphasized that a high degree of within-group variability exists for couples in therapy across all scales, particularly those assessing specific sources of conflict (FIN, SEX, DSC, and CCR), Role Orientation (ROR), and Family History of Distress (FAM). Consequently, an integrated approach to evaluating outcome is recommended, incorporating both idiographic and nomothetic perspectives (i.e., analysis of changes in an individual's MSI-R profile across time and comparison at termination to group mean profiles for couples completing treatment).
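As a rough sketch of the idiographic half of this integrated approach, the Python snippet below checks whether a scale score has moved into a different range between two administrations. The moderate-range bounds are those stated in the text (50T to 60T for most scales, 55T-65T for INC, and 45T-55T for CNV and FAM). Treating those bounds as inclusive is an assumption made for illustration, as are the function names and the example scores; the test manual remains the authority for actual cutoffs.

```python
# Moderate-range bounds stated in the text; inclusive bounds are an assumption.
DEFAULT_MODERATE = (50, 60)
MODERATE_RANGE = {"INC": (55, 65), "CNV": (45, 55), "FAM": (45, 55)}


def score_range(scale: str, t_score: float) -> str:
    """Return 'low', 'moderate', or 'high' for a T-score on the given scale."""
    lower, upper = MODERATE_RANGE.get(scale, DEFAULT_MODERATE)
    if t_score < lower:
        return "low"
    if t_score > upper:
        return "high"
    return "moderate"


def range_transition(scale: str, t1: float, t2: float) -> tuple[str, str]:
    """Return the (Time 1, Time 2) ranges so a change in range is easy to see."""
    return score_range(scale, t1), score_range(scale, t2)


# Hypothetical T-scores, for illustration only.
print(range_transition("GDS", 64, 54))  # ('high', 'moderate')
print(range_transition("CNV", 40, 45))  # ('low', 'moderate')
print(range_transition("FIN", 48, 47))  # ('low', 'low')
```

Reporting both ranges rather than a single "improved" flag keeps the interpretation with the clinician, because movement into a higher range does not mean the same thing on every scale (Conventionalization, for instance, is a validity scale rather than a measure of distress).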
Additional Considerations

Although generally a reduction is anticipated in relationship distress reflected in the MSI-R profile as treatment continues, there are occasions when an increase in subjective distress on the MSI-R may indicate important therapeutic gains. This interpretation appears most warranted when the respondent initially has completed the MSI-R adopting a defensive response set characterized by high scores on Conventionalization (CNV) and atypically low scores on scales assessing specific areas of relationship distress. Such profiles sometimes occur among couples entering therapy where the more distressed member has initiated treatment but the partner resists acknowledging relationship difficulties.

Case Study

Background Data

Mike and Brenda, both age 35 and physicians at a community hospital, were referred for couple therapy by a colleague. Mike worked in anesthesiology and Brenda worked in internal medicine. They had a 2-year-old son, Tyler, who had been born after several years of unsuccessful attempts at conception.

The couple presented with difficulties in communication characterized by increasing frequency of arguments and emotional withdrawal, as well as difficulties in their sexual relationship. Brenda stated that she had been unhappy with their level of emotional intimacy and Mike's lack of emotional expressiveness for several years, and she had mentioned their need for couple therapy to him previously. She stated that there was no single precipitating event that caused her to seek therapy at this time, but rather that there had been an accumulation of factors. Primary among these had been disappointments at work for both of them, stresses related to their medical practices, and also Brenda's dissatisfaction with salary and other practical considerations of her job. She acknowledged feeling depressed, but believed her depression resulted as much from lack of marital intimacy as it did from job-related stressors. Mike acknowledged that he, too, was unhappy in the marriage. Although he did not experience the same lack of emotional closeness as did Brenda, he was hurt by her apparent anger with him and their frequent arguments. He also reported frustration with a progressive decline in their frequency of sexual relations. Both partners appeared deeply committed to their son and described him as the light of their lives. Brenda described Mike as a wonderful father, and Mike became tearful in discussing the joys he experienced in fatherhood. Brenda stated that Mike was more patient than she and better able to play in childlike ways with Tyler. In describing their families of origin, Mike stated that his father had been a computer consultant and his mother a teacher; he described as one of his earliest memories his father's "verbally malicious personality." Brenda's father sold medical equipment and her mother had been a homemaker. She described warm, secure attachments with both parents and with several siblings.
Both partners expressed a commitment to their marriage and to each other. Both showed considerable intellect and a moderate capacity for introspection, although Mike's understanding of self emphasized cognitive processes to the neglect of emotional components.

Initial Profiles on the MSI-R

Mike's and Brenda's initial profiles on the MSI-R are shown in Fig. 38.1. Mike's scores on the two validity scales indicated a moderate level of inconsistency in responding (INC), which both partners subsequently interpreted as consistent with his tendency to express himself in a cautious and vacillating manner. Typical of couples entering therapy, neither partner showed a tendency to describe their marriage in an unrealistically positive manner (CNV). Both Mike and Brenda acknowledged considerable global distress in their marriage (GDS). Both partners expressed dissatisfaction with the amount of affection shown by their partner (AFC); persons with similar scores typically describe their partner as emotionally distant and uncaring. Brenda's discontent in this regard was particularly striking and suggested that the lack of emotional warmth she experienced from Mike seriously threatened the viability of their marriage. Mike's and Brenda's MSI-R profiles also suggested that the couple's difficulties in sharing intimate feelings likely exacerbated their difficulties in resolving differences (PSC). Patterns of nonconstructive communication indicated by their scores included exchange of angry feelings, failure to acknowledge each other's views, and attributing their partner's behavior to hostile motives.

Fig. 38.1. Intake MSI-R profiles for Mike and Brenda. Material from the Marital Satisfaction Inventory, Revised, copyright © 1997 by Western Psychological Services. Used by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, USA. Not to be reprinted in whole or in part for any additional purpose without the expressed, written permission of the publisher. All rights reserved.

Neither partner, however, expressed concern about disagreements escalating to a level of physical aggression or intimidation (AGG). Overall, Mike and Brenda evaluated specific dimensions of their relationship in a somewhat similar manner. With couples in therapy, this pattern reflects moderate agreement in partners' views of distress and often promotes their collaboration in identifying and prioritizing relationship treatment goals. Both spouses reported moderate dissatisfaction with leisure time spent with their partner (TTO). Their scores suggested few common interests and activities. Mike's and Brenda's responses indicated that it was unlikely that their distress in this area resulted entirely from insufficient time for leisure; instead, they each likely regarded the other as lacking a desire for greater time together. Both partners described finances as an area of relative agreement (FIN). Responses indicated that responsibilities concerning money were likely to be shared by both spouses with substantial agreement on financial priorities. By contrast, both Mike and Brenda reported dissatisfaction with their sexual relationship, with Mike reporting particularly strong concern in this domain (SEX). Their scores suggested that disagreements regarding the frequency or nature of sexual exchanges were frequent, and the couple was likely to have difficulty in discussing sexual matters openly and effectively. Scores on the MSI-R confirmed the couple's reports during interview regarding positive interactions regarding their son, Tyler. Each described the other as a good parent, affectionate to their son, and helpful with responsibilities of child rearing (CCR). Differences in child rearing attitudes were likely to be minor and resolved without arguments. Both Mike and Brenda described generally satisfying relationships with Tyler. Both viewed their son as contributing to their own commitment to the marriage and to their own personal fulfillment (DSC). 
Both Mike and Brenda described themselves as having fairly nontraditional views toward marital and parental roles (ROR). Persons giving similar responses generally believe that both partners ought to have equal influence in decisions regarding family matters. Both partners expressed attitudes favoring a more flexible division of household and child rearing responsibilities, with both pursuing independent careers and sharing equally in responsibilities at home. Finally, Mike and Brenda differed somewhat on the MSI-R in their description of relationships within the families in which they grew up (FAM). Consistent with her reports in interview, Brenda described a fairly happy childhood and positive feelings toward her siblings and parents. By comparison, Mike described significant disruption of relationships within his family. His responses in this regard suggested that nonconstructive ways of interacting in his family of origin should be carefully examined to determine the extent to which these contributed to similar problems in his own marriage.

Course of Therapy

The couple was provided with interpretive feedback based on their MSI-R profiles and agreed to pursue increases in emotional and behavioral intimacy as an initial goal. Discussion of time together and activities of mutual enjoyment served as a vehicle for developing improved communication skills. With Mike's support, Brenda described a pervasive sense of loneliness that contributed to her feelings of depression. Despite her high level of professional attainment, she had felt socially insecure since adolescence. She was uncomfortable initiating friendships with other women in the community, and

this isolation was exacerbated by interpersonal conflicts at work. As a result, Brenda experienced a heightened need for Mike to satisfy her longings for companionship and he sometimes felt overwhelmed by her need. His own fatigue from work and his preference for solitary time conflicted directly with Brenda's wish to do more together. During the first few weeks of therapy, Brenda became less blaming toward Mike for her loneliness as he listened to her concerns in a caring and nondefensive manner. As Mike's empathy for Brenda's loneliness grew, he was able to collaborate with her in considering various social alternatives for them to pursue as a couple as well as for Brenda to pursue on her own. Brenda identified several women in her neighborhood whom she had met briefly previously, and she subsequently approached them about joining them on evening walks. In addition, Mike volunteered to assume more initiative for planning social engagements with other couples they both enjoyed. Mike developed more effective ways of letting Brenda know when he needed time alone, and his need for solitary time decreased as the quality of their relationship improved. After several weeks of therapy, Mike began to explore ways in which his own constricted emotional style had developed in his youth. His father had been a somewhat cold and demanding parent who tolerated little disagreement from his children. As the sole son, Mike had been determined not to suffer his father's harshness as his more emotionally expressive mother and sisters had. He learned to suppress his emotions and, during disagreements, to argue in an intellectual and detached manner. However, Mike also recognized the negative effects of his emotional style on his marriage, and he recalled fondly the warmth he had enjoyed in his relationship with his more nurturant mother. As Mike explored these developmental issues, Brenda developed greater empathy for his difficulties with emotional expressiveness. 
She became less angry at his emotional retreat at the first hint of conflict and found more constructive ways of engaging him in discussions of their respective feelings. Several structured exercises were prescribed during the course of therapy to assist the couple with recognizing diverse feelings, expressing these in a constructive manner, and listening empathically. Over the course of 4 months of therapy, Mike and Brenda consolidated improved patterns of communicating about feelings and resolving relationship conflicts. They implemented and sustained more satisfying methods for increasing time together as a couple. Their overall feelings about their marriage improved considerably, and as they approached termination they were administered the MSI-R a second time to evaluate the impact of therapy.

Termination Profiles on the MSI-R

Termination profiles for Mike and Brenda are shown in Fig. 38.2. Results confirmed dramatic improvements in relationship satisfaction for both partners. Both Mike and Brenda showed moderate reductions in overall marital distress (reductions on GDS of approximately 10 points), and substantial gains in satisfaction with the quality of emotional expressiveness, with their leisure time together, and with their ability to resolve differences in a constructive manner (reductions on AFC, TTO, and PSC of approximately 15 to 25 points). Mike's termination profile showed an increased recognition of relationship struggles in his family of origin (an increase on FAM of approximately 5 points), and he and Brenda discussed alternative ways of interacting with Mike's father that felt more satisfying for both of them.

Fig. 38.2. Termination MSI-R profiles for Mike and Brenda. Material from the Marital Satisfaction Inventory, Revised, copyright © 1997 by Western Psychological Services. Used by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, USA. Not to be reprinted in whole or in part for any additional purpose without the expressed, written permission of the publisher. All rights reserved.

Although both partners showed increased satisfaction with their sexual relationship (reductions on SEX of approximately 10 points), Mike continued to express moderate distress in this area of the marriage. The couple used two additional sessions to discuss their respective concerns and wishes for their sexual exchanges, and both partners expressed confidence that their sexual relationship would continue to improve without further therapy as they pursued additional avenues of emotional and behavioral intimacy.

Conclusions

In contrast to other measures of marital and family functioning, the clinical interpretation and application of the MSI-R rest firmly on extensive empirical findings for this instrument, a unique asset noted in several independent reviews of the MSI-R. As a clinical technique, the MSI-R more clearly directs the focus of therapeutic interventions. As a diagnostic and therapeutic procedure, the MSI-R is used in the initial phases of therapy in discussing couples' presenting complaints and in formulating therapeutic goals; it can be used throughout therapy and at termination in the evaluation of change and redirection of treatment interventions. A distinct advantage of the MSI-R is the inclusion of both broad- and narrow-band scales for assessing global distress and general response characteristics in addition to more specific sources of relationship discord. The relative ease of test administration and scoring makes the MSI-R a cost-effective means of generating objective assessment data across a broad range of issues relevant to both clinicians and couples entering treatment. The relatively atheoretical framework of the MSI-R facilitates its incorporation into various therapeutic contexts adopting different theoretical orientations.
With its emphasis primarily on behavioral and attitudinal components of a couple's relationship, the MSI-R neither assumes nor precludes higher order inferences regarding either intrapsychic or systemic determinants of relationship distress. Additional assessment data from interview or other structured techniques are easily integrated with the MSI-R to suggest potential intrapersonal or systemic dynamics contributing to areas of relationship concern. The computerized report for the MSI-R integrates a broad range of research findings for this instrument in both a psychometrically valid and clinically useful manner. Future investigations examining configural interpretation of MSI-R profiles, incorporation of content through delineation of critical items, external criterion studies of the computerized report, and prediction of differential response to competing therapeutic modalities should all contribute to the clinical and research utility of this instrument.

References

Aniol, J. C., & Snyder, D. K. (1997). Differential assessment of financial and relationship distress: Implications for couples therapy. Journal of Marital and Family Therapy, 23, 347-352.
Basco, M. R., Prager, K. J., Pita, J. M., Tamir, L. M., & Stephens, J. J. (1992). Communication and intimacy in the marriages of depressed patients. Journal of Family Psychology, 6, 184-194.
Bascue, L. O. (1985). Review of the Marital Satisfaction Inventory. Test Critiques, 3, 415-418.
Berg, P., & Snyder, D. K. (1981). Differential diagnosis of marital and sexual distress: A multidimensional approach. Journal of Sex and Marital Therapy, 7, 290-295.

Blattberg, K. J., & Hogan, J. D. (1994). Marital distress across the mid-life transition among middle-class Caucasian women. Psychological Reports, 75, 497-498. Boen, D. L. (1988). A practitioner looks at assessment in marital counseling. Journal of Counseling and Development, 66, 484-486. Boughner, S. R., Hayes, S. F., Bubenzer, D. L., & West, J. D. (1994). Use of standardized assessment instruments by marital and family therapists: A survey. Journal of Marital and Family Therapy, 20, 69-75. Bromberger, J. T., Wisner, K. L., & Hanusa, B. H. (1994). Marital support and remission of treated depression: A prospective pilot study of mothers of infants and toddlers. Journal of Nervous and Mental Disease, 182, 4044. Burman, B., Margolin, G., & John, R. S. (1993). America's angriest home videos: Behavioral contingencies observed in home reenactments of marital conflict. Journal of Consulting and Clinical Psychology, 61, 28-39. Burnett, P. (1987). Assessing marital adjustment and satisfaction: A review. Measurement and Evaluation in Counseling and Development, 20, 113-121. Clipp, E. C., & George, L. K. (1992). Patients with cancer and their spouse caregivers. Cancer, 69, 1074-1079. Dixon, D. N. (1985). Review of the Marital Satisfaction Inventory. In J. V. Mitchell (Ed.), The ninth mental measurements year-book (Vol. 1, pp. 894-895). Lincoln, NE: University of Nebraska Press. Edleson, J. L., Eisikovits, Z. C., Guttmann, E., & Sela-Amit, M. (1991). Cognitive and interpersonal factors in woman abuse. Journal of Family Violence, 6, 167-182. Fowers, B. J. (1990). An interactional approach to standardized marital assessment: A literature review. Family Relations, 39, 368-377. Frank, B., Dixon, D. N., & Grosz, H. J. (1993). Conjoint monitoring of symptoms of premenstrual syndrome: Impact on marital satisfaction. Journal of Counseling Psychology, 40, 109-114. Fredman, N., & Sherman, R. (1987). Handbook of measurements for marriage and family therapy. New York: Brunner/Mazel. 
Grotevant, H. D., & Carlson, C. I. (1989). Family assessment: A guide to methods and measures. New York: Guilford. Hathaway, S. R., & McKinley, J. C. (1967). The Minnesota Multiphasic Personality Inventory manual. New York: Psychological Corporation. Heim, S. C., & Snyder, D. K. (1991). Predicting depression from marital distress and attributional processes. Journal of Marital and Family Therapy, 17, 67-72. Hoover, D. W., & Snyder, D. K. (1991). Validity of the computerized interpretive report for the Marital Satisfaction Inventory: A customer satisfaction study. Psychological Assessment, 3, 213-217. House, E. A. (1986). Sex role orientation and marital satisfaction in dual- and one-provider couples. Sex Roles, 14, 245-259. Houston, B. K., & Kelly, K. E. (1987). Type A behavior in housewives: Relation to work, marital adjustment, stress, tension, health, fear-of-failure and self esteem. Journal of Psychosomatic Research, 31, 55-61. Iverson, A., & Baucom, D. H. (1990). Behavioral marital therapy outcomes: Alternative interpretations of the data. Behavior Therapy, 21, 129-138. Jacob, T., & Tennenbaum, D. L. (1988). Family assessment: Rationale, methods, and future directions. New York: Plenum. Jacobson, N. S., Schmaling, K. B., Holtzworth-Munroe, A., Katt, J. L., Wood, L. F., & Follette, V. M. (1989). Research-structured versus clinically flexible versions of social learning-based marital therapy. Behaviour Research and Therapy, 27, 173-180. Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19. Jones, M. E., & Stanton, A. L. (1988). Dysfunctional beliefs, belief similarity, and marital distress: A comparison of models. Journal of Social and Clinical Psychology, 7, 1-14. Klann, N., & Hahlweg, K. (1995). Survey of the efficacy of marriage counseling: Experiences and obstacles. System FamilieForschung und Therapie, 8, 66-74. 
Kremer, E. F., Sieber, W., & Atkinson, J. H. (1985). Spousal perpetuation of chronic pain behavior. International Journal of Family Therapy, 7, 258-270.

L'Abate, L., & Bagarozzi, D. (1993). Sourcebook of marriage and family evaluation. New York: Brunner/Mazel.

Chapter 39
The Adult Personality Inventory

Samuel E. Krug
Metritech, Inc.

Why is a chapter about normal-range personality in a book on treatment planning and outcomes assessment? At first glance it might seem out of place among chapter titles filled with references to psychiatric disorders, anxiety, depression, and symptom assessment. Was the word "disorder" accidentally dropped from the chapter title during proofreading? Did we catch the editor napping when he accepted this chapter? Is "Adult Personality Inventory" simply the name put on the test booklet to reduce the disagreeable reaction that a title like "Adult Insanity Measure" might evoke?

None of the above is the answer. The answer is that clients present not only with symptoms but also with personalities, and understanding their personalities is as important as understanding their symptoms if effective treatment plans are to be developed.

In the broadest sense, personality may be defined as the set of influences that explain a person's behavior in a specific situation. The situation may involve vocational choice, treatment planning, therapeutic intervention, or, more likely, some combination of these and other concerns. The influences may be intrapersonal characteristics, such as anxiety or intelligence, that evolve from interactions of genetic and environmental causes. The influences may be interpersonal processes that explain the interaction of two or more people. They also may be characteristics that explain a relatively specific and limited area of behavior, such as vocational choice and occupational adjustment. Regardless of the scope or origin of these influences, personality assessment is the process of quantifying them.

The purpose of any assessment is to provide relevant information that helps reduce uncertainty and error in making decisions. Within clinical settings, a large number of decisions about clients must be made.
With substance abuse clients, for example, a simple history that records onset and length of addiction, types of substance(s) used, and similar information is not sufficient to suggest effective treatment procedures. It also does not provide great insight into probable outcomes. Additional information about characteristics such as maturity, adjustment, and sensitivity proves to be very helpful in predicting the course of therapy and structuring a reasonable treatment plan. Other characteristics are helpful in anticipating who will have greater difficulty following

through on a treatment plan. The kinds of characteristics measured by tests of normal-range personality are exactly those that are helpful in reducing uncertainty related to these kinds of decisions. Normal-range personality characteristics are useful in sorting out primary and secondary problems and understanding differences between chronic and acute symptomatology. Knowledge of personality characteristics can often predict the resistance to therapy that some clients experience. Research has also demonstrated that personality can be useful in determining choice of medication as an adjunct to therapy. For example, Neal (1977) validated a formula for predicting choice of medication in the treatment of depression. The formula, which involves three normal-range personality characteristics, accurately identifies patients who are more likely to respond to one class of medications than another.

The Adult Personality Inventory (API) is one instrument for measuring these kinds of characteristics. The API provides a technology for assessing major dimensions of adult personality and reporting them in terms that are understandable and relevant to a wide array of decisions. Although the API is a relative newcomer on the test scene, its roots lie deep within an assessment tradition that spans more than a half century.

Development of the API

Historical Perspective

Raymond B. Cattell is usually credited with one of the most ambitious attempts to map the full domain of normal-range personality. His belief that language must ultimately encapsulate anything of importance about human personality led him to the dictionary. Cattell's work began in the 1940s with the complete lexicon of 17,953 characteristics that Allport and Odbert (1936) extracted from Webster's New Unabridged International Dictionary. Cattell set out to reduce these terms to a more limited and elementary set.
He felt these basic building blocks of personality would have relevance to a wide assortment of socially significant criteria. That conviction was subsequently affirmed when research on scales designed to measure Cattell's constructs empirically confirmed many such connections. Several thousand published articles and books provide a rich database to support the usefulness of Cattell's personality theory and its key constructs (Krug & Johns, 1990). Cattell's own research provides a detailed analysis of how characteristics evolve throughout the life span (Cattell, 1979, 1980). Other studies have explored the relative strength of genetic and environmental factors in shaping and molding these personality factors (Cattell, 1982). Despite this vast accumulation of data and Cattell's passionate attempts to persuade, there is relatively little consensus regarding his assertion that his 16 constructs represent the "source traits" of human personality. Over the years, other theorists have argued for an alternative "primary" conception of personality. Eysenck, for example, championed extroversion, neuroticism, and, somewhat later, psychoticism as fundamental dimensions of personality. Guilford proposed a structure somewhere in between Cattell's and Eysenck's in complexity. More recently, a variety of authors have supported the "five-factor" or "big five" model as representing the final word on personality structure.

Although disagreement exists about whether Cattell's 16 factors represent the primary structure of normal-range adult personality, there is compelling evidence (e.g., Cattell & Krug, 1986) that they represent a well-replicated summarization of the original "personality sphere," a term Cattell used to describe the total domain of human behavior. As noted earlier, their linkage to a wide variety of socially relevant and important criteria is firmly established in the research literature. Much is known about how they relate to similar characteristics assessed by other widely used measures of adult personality. Perhaps more than any other set of personality characteristics, they have been studied across the life span. Cattell and his associates spent a considerable amount of effort mapping their development and growth over time. In short, the 16 dimensions represent one of the most widely and systematically studied sets of scales in existence. For this reason, the 16-dimensional model that emerged from Cattell's research formed the blueprint for API development.

API Test Design

The API blueprint is two-dimensional. One dimension reflects content; the second reflects four levels of assessment. To clarify the content dimension, API item development began by factor analyzing the existing pools of items that had been designed to assess Cattell's core personality constructs. These items had been incorporated into different forms of Cattell's 16 Personality Factor Questionnaire (16PF; Cattell, Eber, & Tatsuoka, 1970). Items with the strongest loadings on the first principal axis factor within each set described content most essential to the definition of each construct. These factor analyses provided the content specifications for item writing by identifying a set of prototypical items for each of the 16 constructs.
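The item-selection step described above (keeping the items that load most strongly on the first factor of a construct's item pool) can be illustrated with a minimal sketch. It is not the procedure actually used for the API: it substitutes the first principal component of the item correlation matrix for a principal axis factor, and the function name `top_loading_items`, the simulated data, and the choice of three items to keep are all assumptions made for illustration.

```python
import numpy as np

def top_loading_items(responses, n_keep):
    """Rank items by absolute loading on the first principal component.

    responses: array of shape (respondents, items) for ONE construct's pool.
    Returns (indices of the n_keep strongest items, all loadings).
    Note: this uses a principal component as a stand-in for the
    principal axis factor mentioned in the text.
    """
    corr = np.corrcoef(responses, rowvar=False)       # item-by-item correlations
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigenvalues in ascending order
    loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])  # loadings on the largest component
    order = np.argsort(-np.abs(loadings))             # strongest loading first
    return order[:n_keep], loadings

# Hypothetical pool: six items driven to different degrees by one trait.
rng = np.random.default_rng(0)
trait = rng.normal(size=500)
weights = np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.0])   # assumed true loadings
noise = rng.normal(size=(500, 6))
responses = trait[:, None] * weights + noise * np.sqrt(1 - weights**2)

keep, loadings = top_loading_items(responses, n_keep=3)
print("prototypical items:", sorted(keep.tolist()))
print("loadings:", np.round(np.abs(loadings), 2))
```

In this simulation the three items built with the largest weights are the ones retained, which mirrors the idea that the strongest-loading items define the core content of a construct.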
Four item formats sampled the four levels of assessment that Werner and Pervin (1986) identified in major personality questionnaires: cognitive (beliefs, opinions), preference (likes, dislikes, wishes), affective (emotions), and behavioral (activities). The test blueprint primarily emphasized the development of items assessing the individual's belief system, which is Werner and Pervin's Category 1. Since Kelly's (1955) seminal work, a substantial body of research has developed regarding the importance of cognitions and how they influence behavior. For example, two individuals may attend parties frequently, but for very different reasons. One individual might go primarily to enjoy the warmth of interaction with others. Another might go primarily to meet contacts for sales. Expanding the number of items in the API that assessed beliefs and opinions increased the predictive power and precision of the instrument. More than half of the items in the API fall within Werner and Pervin's cognitive level. The remaining items are equally distributed across the three other categories.

Description of the API

The API contains 324 items developed from the content and item-type specifications just described. There is no time limit for completion, but most people finish in 45 minutes or less. Because the API items are written at a level that requires fourth-grade reading skills, the test is appropriate for a very broad segment of the population.

The majority of API items were constructed along the lines of traditional personality questionnaires (e.g., "I have more friends than most people"). However, approximately

10% of the items deal specifically with work interests (e.g., "I like jobs that offer challenge and responsibility," "Mechanical things interest me"), and another 10% were designed specifically to measure general cognitive ability. These items represent three domains or item types: verbal analogies (e.g., shoot means the same as: hunters, miss, or fire), arithmetic problems, and verbal reasoning items (e.g., which of the following does not belong with the other two: snake, tiger, wolf?). These items contribute in important ways to scores on several of the career scales, in particular.

API results are reported in terms of three sets of content scales. The three sets comprise 21 scales in all, and each set provides a different, theoretically based template for understanding personality and its contribution to predictability across different kinds of situations. The first five of the Personal Characteristic scales, for example (Extroverted, Adjusted, Tough-Minded, Independent, Disciplined), define a factor space that has been suggested to represent the basic structure of phenotypic personality traits (Costa & McCrae, 1985; Goldberg, 1981, 1993; Norman, 1963). That is, these scales are thought to be located at the highest level of abstraction that is still descriptive of behavior (John, Hampson, & Goldberg, 1991).

Individuals differ in many ways. Within personality study, the interpersonal domain (how people differ in what they do to each other) has long been a topic of special interest. The eight Interpersonal scales (Caring, Adapting, Withdrawn, Submissive, Uncaring, Nonconforming, Sociable, Assertive) were developed to provide a representation of the two-dimensional, circular structure that has been found to represent the basic organization of interpersonal traits (LaForge, 1985; Leary, 1957; Wiggins, 1979, 1985). That is, the eight scales can be represented in a two-dimensional plane, the major axes of which are the Caring-Uncaring and Withdrawn-Assertive dimensions.
Because the scales focus on interpersonal relationships, it is not surprising that they have been found to be very useful in understanding the dynamics of successful and unsuccessful marriages (Krug & Ahadi, 1986).

The third set of scales, the Career scales (Practical, Scientific, Aesthetic, Social, Competitive, Structured), relates to career choice, job satisfaction, and lifestyle preferences. They were developed empirically by studying response patterns that differentiated people in a broad sampling of occupations (Krug, 1995).

Four validity scales (Good Impression, Bad Impression, Infrequency, Uncertainty) provide a check on various patterns of distortion that can invalidate a test.

Technical Characteristics

Both gender-specific and gender-neutral norms exist to convert raw scores to standard scores. A short form of the API, using only the first 189 items, takes about half the time of the long form. Nevertheless, short-form scale reliabilities are only about 10% lower, on average, than reliabilities for the full test. The average correlation between short- and long-form scales is .88. Currently available evidence suggests that ethnicity has relatively little impact on scale elevation (Krug, 1986). Gender differences may be controlled by using separate norms for men and women.

The API cannot be scored by hand. Several scoring services provide different kinds of narrative score reports. In addition, several microcomputer editions (Krug, 1985, 1991b) provide on-site test administration, scoring, and reporting for PC users. Although most API reports are for professional use, one API report was specifically designed to be shared with clients. The format of the report provides background and contextual

information clients use in analyzing the test scores and focusing the information they contain on a particular problem or decision.

Interpreting the API Profile

Figure 39.1 presents a sample API profile. The organization of the profile illustrates the general approach to test interpretation. API scores are reported on a standard score scale that ranges from 1 through 10 with a mean of 5.5 and a standard deviation of 2 in the reference population. The graph delineates three score ranges (low, average, high) to facilitate interpretation of the most noteworthy personality features.

The profile begins with the four validity scales, which are usually examined before the content scales. When scores on the validity scales fall in the high range, that is,

Fig. 39.1. Sample API profile.

scores of eight or above, the first thing to consider is that the responses are unusual, and interpretations of test scores may not lead to valid inferences. For example, high scores on the Uncertainty scale mean the person has used the "uncertain" response much more often than once in about 20 questions, which is the typical rate. One consequence of such high use of the "uncertain" response is that the range of scores on the personality scales is artificially restricted.

Some people are very careful about the way they present themselves to others. They try to emphasize their best features and downplay their worst features. Scores on the Good Impression scale reflect the extent to which these tendencies are operating. High scores are sometimes obtained by status-conscious people who do not deliberately set out to "fake good," but whose choices are influenced by their social self-image. In other cases, high scores may reflect a correct assessment by individuals who have very little negative to say about themselves.

High scores on the Bad Impression scale indicate that the person has been highly self-critical. In some cases, this may be because the individual has deliberately selected answers in an attempt to create an unfavorable profile. In other cases, it may be that the person is experiencing unusual pressure or stress at the present time. A high score on this scale is always something to explore further.

The Infrequency scale contains items that very few people answer in the scored direction. One explanation of a high score is that the person was careless and answered without thinking. In other cases, it may be that the person is experiencing unusual difficulties. As with the Bad Impression scale, a high score on Infrequency is always something to explore further. In the sample profile shown in Fig. 39.1, all the validity scales fall within normal limits.

With respect to the personality scales, both high and low scores are considered equally interpretable.
For example, both high and low Adjustment have important implications for clinical evaluation. As mentioned earlier, the Personal Characteristic scales represent some of the most important dimensions of individual differences with respect to normal-range characteristics. Noting more extreme scores provides a convenient way of summarizing key features of personality functioning.

For example, the sample API profile shown in Fig. 39.1 suggests a person who may be described as strongly self-directed, independent, and self-sufficient. The extreme score on the Independent scale indicates that this person likes to do things her own way and make her own decisions. Consequently, she usually relies on herself, rather than others. It is likely that she finds it very difficult to accept direction from other people. A score of 8.8 on Adjusted suggests that her overall level of emotional maturity is excellent. She shows no evidence of being particularly anxious or distressed at the present time. On the contrary, she describes herself as very stable. She likely has the inner resources to cope with the challenges she encounters. A score of 8.1 on Extroverted indicates that she is outward-oriented and prefers to focus on the world around her rather than the world of inner thoughts and feelings. She probably enjoys being with other people. A score of 7.5 on Creative suggests that she is likely to be innovative and most comfortable when she has the opportunity to explore new ideas. With respect to achievement motivation (i.e., Creative), the client is only slightly higher than average. The score of 6.6 on Tough-Minded suggests a positive balance of objectivity and sensitivity in the decisions she makes and in her approach to people.

The Interpersonal Style scales describe how the individual is likely to relate to others. Table 39.1 provides a capsule interpretive summary for each of these eight scales.
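The 1-to-10 standard-score scale used in the profile (mean 5.5, standard deviation 2, with scores of eight or above treated as high) can be sketched as a simple linear transformation of a raw score. This is only an illustration under stated assumptions: the API's actual norm tables are not given in the chapter, so the norm mean and standard deviation below are made-up values, the clipping at 1 and 10 is assumed, and the low-range cutoff of 3 is a guess by symmetry rather than a figure from the text.

```python
def to_standard_score(raw, norm_mean, norm_sd):
    """Map a raw scale score onto the 1-10 scale (mean 5.5, SD 2).

    norm_mean and norm_sd describe the reference population for the scale;
    the values used in the example below are hypothetical.
    """
    z = (raw - norm_mean) / norm_sd
    return min(10.0, max(1.0, 5.5 + 2.0 * z))  # clipping is an assumption

def score_range(standard_score, high_cutoff=8.0, low_cutoff=3.0):
    """Classify a standard score as low, average, or high.

    high_cutoff=8 follows the text; low_cutoff=3 is an assumed mirror image.
    """
    if standard_score >= high_cutoff:
        return "high"
    if standard_score <= low_cutoff:
        return "low"
    return "average"

# Scores quoted for the sample profile in Fig. 39.1.
sample = {"Adjusted": 8.8, "Extroverted": 8.1, "Creative": 7.5, "Tough-Minded": 6.6}
for scale, score in sample.items():
    print(f"{scale}: {score} -> {score_range(score)}")

# A raw score one standard deviation above a hypothetical norm mean.
print(to_standard_score(raw=30, norm_mean=24, norm_sd=6))
```

Under these assumptions a raw score one standard deviation above the norm mean lands at 7.5, just below the high range.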
The pattern of scores on the Interpersonal Style scales for the sample profile in Fig. 39.1

TABLE 39.1
The API Interpersonal Style Scales

CARING
People who score high on this scale accept others openly and unconditionally. Other words that could be used to describe them include pleasant, companionable, sympathetic, cheerful, perky, neighborly, approachable, forgiving, jovial, outgoing.

ADAPTING
People who score high on this scale appear to be more comfortable taking, rather than giving, orders. Other words that could be used to describe them include silent, nonaggressive, undemonstrative, forceless, nonargumentative, meek, self-effacing, quiet, withdrawn, sensitive to group pressure.

WITHDRAWN
People who score high on this scale describe themselves as insecure in social situations. Other words that could be used to describe them include bashful, uncalculating, shy, silent, antisocial, nonaggressive, disorganized, accommodating, timid, forceless, self-doubting, undemanding, dissocial, introverted.

SUBMISSIVE
People who score high on this scale need the support and approval of other people. They can be easily upset by criticism, even if it is well intentioned. They react poorly to most stressful situations. Other words that could be used to describe them include emotional, timid, self-doubting, inhibited, brooding, moody.

UNCARING
People who score high on this scale are best described as distant, aloof, dispassionate, or cold. Other words that could be used to describe them include domineering, cunning, swell-headed, wily, ruthless, sly, big-headed, cruel, exploitative, hard-hearted, cocky, tricky, dominant, angry, critical.

NONCONFORMING
People who score high on this scale are not sensitive to rules. Other words that could be used to describe them include uncivil, forceful, domineering, impractical, self-assured, self-confident, dominant, outgoing, assertive, overforward, impolite, extraverted, inconsistent, boisterous, sensation seeking, narcissistic, uncontrolled, rebellious.

SOCIABLE
People who score high on this scale describe themselves as outgoing and open. Other words that could be used to describe them include forceful, domineering, self-assured, dominant, outgoing, self-confident, cheerful, enthusiastic, vivacious, flaunty, assertive, extraverted, jovial, boisterous.

ASSERTIVE
High scoring people describe themselves as take-charge persons. Their leadership potential is high. Other words that could be used to describe them include forceful, companionable, firm, self-assured, self-confident, industrious, cheerful, enthusiastic, assertive, approachable, extraverted, jovial, friendly, outgoing, dominant.

suggest an individual who probably enjoys many positive relationships with others. The score of 8.8 on Assertive, which could make her seem too domineering at times, is offset by a nearly equal score on Caring. Consequently, she is likely to be thought of by others as genuine and sincere.

The Career scales assess work setting preferences and, by implication, the kinds of careers an individual may find most appealing. The results are based on preferences, not skills, and do not take into account special abilities or training that may be required. They are designed to suggest options and possibilities the individual may wish to consider in career planning.

People who score high on the Practical scale are attracted to the functional aspects of work settings, that is, getting things done. They describe themselves as down-to-earth, confident, and self-sufficient. They like concrete, physical tasks and working with tools or machines and find production, engineering, transportation, and security compatible career areas.

People who score high on the Scientific scale are attracted to the analytic aspects of work settings, that is, solving problems. They are


stimulated by opportunities to use their investigative and deductive skills to find new solutions to problems, as in research and development, planning, systems analysis, programming, or medical settings.

People who score high on the Aesthetic scale are attracted to the creative aspects of work settings and are likely to be most comfortable in work settings that allow them to express their imagination and creativity. Consequently, they are apt to pursue careers in advertising, public relations, writing, college teaching, or product design and development.

People who score high on the Social scale are attracted to the interpersonal aspects of work settings, that is, working with people. They are more comfortable in settings that allow them to interact with other people, often in roles in which they can take care of other people. Among the career settings they find most congenial are social service agencies, religious organizations, health care, and human resources.

People who score high on the Competitive scale are attracted to the commercial aspects of work. They are likely to be perceived by others as ambitious and goal oriented. They are oriented toward business.

People who score high on the Structured scale are attracted to the detail aspects of work. They have a preference for well-defined activities and unambiguous job requirements. They tend to be precise in the way they approach problems and situations. Compatible career areas include operations and finance, accounting, office management, auditing, and quality assurance.

Reliabilities

Reliabilities of the scales are high. The median reliability across scales, assessed by both internal consistency and test-retest methods, is above .80. Values range to .91 for internal consistency coefficients, which are generally highest for the Personal Characteristic scales and lowest for the Career scales. Test-retest coefficients over 1 to 2 months range to the high .80s. Again, the highest values are found for the Personal Characteristic scales.

Validation

Like other personality tests, API validation rests on an expanding fabric of interlocking research that links its scales to predicted outcomes and to external measures of theoretically related constructs. This information appears in a series of publications to which the interested reader is referred (Krug, 1985, 1986, 1991a, 1995, 1997).

Scales of the API have been extensively correlated with those of other well-established inventories for assessing normal-range characteristics, such as the California Psychological Inventory, the 16 Personality Factor Questionnaire, and the Guilford-Zimmerman Temperament Survey; instruments designed specifically for assessing clinical features and symptomatology, such as the Millon Clinical Multiaxial Inventory, the IPAT Depression Scale, and the Differential Personality Inventory; and instruments designed to measure similar interpersonal structures, such as the Interpersonal Check List. These results are summarized in Krug (1984). The findings are consistent with expectations that arise from the scale names. Among the Personal Characteristic scales, Adjusted and Disciplined are found most often to have correlations in the clinical domain. Within the Interpersonal Style scales, a similar pattern holds for the Withdrawn and Uncaring scales.

Some research with the API relates directly to clinical and counseling outcomes. For example, studies of husbands and wives who have participated in marriage counseling identified a constellation of characteristics that differentiated them from a control group


(Krug & Ahadi, 1986). The main scales involved were Adjusted and Practical, on which the experimental group, as couples, scored lower than the control group, and Aesthetic, Submissive, and Withdrawn, on which the experimental group scored higher than the control group. The combination of scales involved in this pattern immediately suggests a dimension of maladjustment in which anxiety is the most prominent characteristic. The Aesthetic scale and the Practical scale suggest some absorption in fantasy and possibly some lack of reality contact.

Reviews of the API and API-based products can be found in Bolton (1985, 1991), Hilgert (1987), Krug (1991a, 1991b), Meier (1986), Mercadal (1987), and Schlosser (1989).

The API Decision Model Feature

As noted earlier, the purpose of any assessment is to provide relevant information that helps reduce uncertainty and error in making decisions. One of the unique features of the API, the decision model feature, was designed specifically to enhance test validity. In applied settings, it is not enough that psychological tests validly measure the constructs they purport to measure. It is equally important to know how test scales combine to predict a given criterion. This is quite frequently done through validation studies in which large samples of individuals are tested and their results are related to some performance criterion. However, it is too often the case that test data must be used without any validity evidence because the specific criterion, population, or situation is unique.

The "decision model" approach to API interpretation was designed for situations where formal validity evidence was limited or absent. The decision model is a test profile derived from expert judgments about the relative importance of the scales. A decision model is developed by responding to a series of 64 paired comparisons.
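The figure of 64 is what one would expect if comparisons are made only within each of the API's three scale sets and never across them. The short Python sketch below checks that arithmetic. It assumes the sets contain seven, eight, and six scales respectively, a grouping inferred from the scale names used in this chapter rather than taken from the test manual, and the names `SCALE_SETS` and `within_set_pairs` are illustrative only.

```python
from itertools import combinations
from math import comb

# Assumed grouping of the 21 API scales, inferred from the chapter text.
SCALE_SETS = {
    "personal_characteristic": [
        "Extroverted", "Adjusted", "Tough-Minded", "Independent",
        "Disciplined", "Creative", "Enterprising",
    ],
    "interpersonal_style": [
        "Caring", "Adapting", "Withdrawn", "Submissive",
        "Uncaring", "Nonconforming", "Sociable", "Assertive",
    ],
    "career": [
        "Practical", "Scientific", "Aesthetic",
        "Social", "Competitive", "Structured",
    ],
}


def within_set_pairs(scale_sets):
    """List every unordered pair of scales that belong to the same set."""
    pairs = []
    for names in scale_sets.values():
        pairs.extend(combinations(names, 2))
    return pairs


pairs = within_set_pairs(SCALE_SETS)
n_scales = sum(len(names) for names in SCALE_SETS.values())

print(len(pairs))          # 21 + 28 + 15 = 64 within-set comparisons
print(comb(n_scales, 2))   # 210 comparisons if all 21 scales were paired
```

Restricting judgments to pairs within a set cuts the rater's workload to less than a third of the full design.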
Each comparison contrasts one scale with another and the task is to select the scale that describes the more important characteristic for a specified situation. The decision modeling process takes advantage of expert judgment about criteria, situations, and populations for which limited empirical test data exist. It provides an objective process for linking the test user's often quite extensive knowledge of the criterion to the test scales.

The entire decision model development process is usually managed by microcomputer (Krug, 1985). Each scale name is accompanied by a fixed set of adjectives that serve to anchor the concept for the test user. The test user then makes 64 judgments about which of two test characteristics is more relevant or more importantly related to the criterion. These importance judgments are then transformed into a profile that has the same distributional properties (i.e., mean, standard deviation, and range) as empirical test profiles. Characteristics judged more relevant receive higher scores in the model profile. Less relevant characteristics receive lower scores. The mathematical transformation of paired-comparison data to standard scores is formally equivalent to a scaling procedure originally described by Lawshe, Kephart, and McCormick (1949).

Individual test results are then compared statistically to the decision model. The calculation uses the sum of squared differences between the model and individual profile across all 21 test scales.

Although it is simple to use, the decision model process rests on a number of underlying theoretical premises. First, the approach provides a formal structure for conducting some form of criterion analysis, which is the first step in the assessment paradigm (Wiggins, 1973). Because different models can be quickly and easily generated,


the approach provides a practical way to tailor assessment objectives to each situation. In many cases, psychological testing proceeds in too automatic a fashion. Tests are sometimes administered and scored before the user has taken the time to determine whether the characteristics they measure are even appropriate to the situation.

Second, the approach uses the method of paired comparisons rather than subjective estimation (i.e., the method of equal-appearing intervals) to generate the importance judgments. Practical experience and research findings both suggest that people find it easier to decide between two stimuli than to scale a stimulus directly on an underlying latent continuum, especially when a large number of stimuli are involved. Furthermore, there is no guarantee that subjective estimation methods would produce model profiles that are distributed like individual test profiles, especially with regard to the variance of test scores (Torgerson, 1958).

Third, the approach uses a subset of comparisons that limits judgments to logically interrelated sets of test scales. It does not require that all possible comparisons of 21 scales be made, which would require 210 separate judgments and substantially more time. The API contains three distinct sets of test scales that are intended to provide different perspectives on the person: how the person is best described (personal characteristic scales), how the person relates to other people (interpersonal style scales), and what type of work settings seem most compatible (career factor scales). Comparisons are made among all scales within each set, but not across sets of scales.

Fourth, comparisons of model profiles with actual test protocols are carried out across all test scales, not just those identified as above average in importance. The comparison is statistical rather than clinical.
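As a rough illustration of the two computational steps described above (turning importance judgments into a model profile, then matching an individual profile against it by a sum of squared differences), consider the following Python sketch. It is not the API's actual scoring code. The linear rescaling of win counts is a simplified stand-in for the Lawshe, Kephart, and McCormick procedure, the default mean of 5.5 and standard deviation of 2.0 are hypothetical placeholders, and the four-scale data are invented purely to make the example runnable.

```python
from statistics import mean, pstdev


def rescale_wins(wins, target_mean=5.5, target_sd=2.0):
    """Linearly rescale paired-comparison win counts to a chosen mean and SD.

    A simplified stand-in for the published scaling procedure; the default
    target_mean and target_sd are hypothetical placeholders.
    """
    m = mean(wins.values())
    s = pstdev(wins.values())
    if s == 0:  # every scale was judged equally important
        return {name: target_mean for name in wins}
    return {name: target_mean + target_sd * (w - m) / s
            for name, w in wins.items()}


def sum_squared_differences(model, individual):
    """Sum of squared score differences across every scale in the model."""
    missing = set(model) - set(individual)
    if missing:
        raise KeyError(f"individual profile lacks scales: {sorted(missing)}")
    return sum((model[name] - individual[name]) ** 2 for name in model)


# Illustrative data only: four scales, invented win counts and scores.
wins = {"Caring": 3, "Assertive": 2, "Sociable": 1, "Withdrawn": 0}
model = rescale_wins(wins)
client = {"Caring": 8.0, "Assertive": 7.0, "Sociable": 5.0, "Withdrawn": 3.0}

mismatch = sum_squared_differences(model, client)
print(round(mismatch, 2))
```

Smaller values indicate a closer match between the individual and the model. Labeling a given value as high, average, or low similarity requires an empirical reference distribution, which this sketch does not attempt to reproduce.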
A large body of research has repeatedly emphasized the relative superiority of statistical methods of data combination once the relevant characteristics have been identified (e.g., Dawes, 1979; Goldberg, 1970; Meehl, 1954).

Finally, the statistic used to match profiles was standardized on the basis of an empirically obtained distribution of more than 4,000 real comparisons. In this study, the set of models used and individuals sampled represented the full range of situations that would be met in practice. Consequently, ratings of "high," "average," or "low" similarity represent data-based interpretations rather than abstract mathematical relationships.

The decision model feature was initially developed to facilitate decisions related to employment selection. There, the position for which the examinee is being considered is often not well described by a single title or one for which much validation data exist. For example, the title "secretary" spans a wide array of tasks and skills. One secretary may engage primarily in managing correspondence whereas another serves also as an office manager. A third may have primary responsibility for scheduling an executive's day and have considerable latitude in controlling access to that person.

As frequently happens in assessment history, applications and purposes evolve over time. API users discovered that discrepancies between a decision model and a test profile could be productively regarded as a blueprint for growth and development. For example, in the selection of an executive director for a medium-size social services agency, a decision model was constructed by a consensual process of the agency's board of directors. The process involved extended discussions by the board of past situations and future challenges facing the agency and the kinds of characteristics likely to be most productive in those situations, discussions that the consulting psychologist facilitated. The focus on the development of a decision model provided a stimulus for a comprehensive evaluation of the agency's mission and vision for the future. When the selection process was complete, the consulting psychologist provided the successful candidate with API feedback that included the decision model created by the board. The discrepancies


between the two represented personal growth challenges the candidate could reasonably expect to meet in the future.

Another clinical application of the decision modeling feature is in screening. Early illustrations of this approach appear in Krug (1981) and Krug and Behrens (1981). In both cases, empirical data relating test scores to the specific criteria (who should be allowed unescorted access to nuclear power generating facilities and who should be allowed to carry weapons on Coast Guard boarding parties) were extremely limited. The decision modeling approach allowed experts' knowledge about test scale performance in similar but not identical situations to generalize to the specific criteria. In other applications, the decision model feature helps counseling clients quantify ideal self-images.

However it is used, the decision model strategy is a paradigm to guide the test user to more reliable decisions in the absence of better data. It is not a substitute for more formal validation procedures. Nevertheless, it is suggested that the process will result in more reliable and valid judgments than would otherwise be made without any more formal process of criterion analysis and scale combination.

Use of the API for Treatment Planning, Monitoring, and Outcomes Assessment

In many clinical and counseling situations, a primary application of tests of normal-range personality is in identifying and quantifying goals for personal development. Although tests like the API do not address symptomatology directly, the scales hold considerable information about directions for change and growth that become an important part of treatment. Table 39.2 summarizes developmental considerations suggested by high or low scores on the API personal characteristic scales. Because of its obvious clinical implications, Adjustment is not considered in the table. In suggesting developmental possibilities, the table is also useful in framing specific treatment objectives and outcomes.

Tests like the API are probably more helpful in cases where the primary diagnosis is of a personality disorder. They may seem somewhat less useful in diagnoses of depression or anxiety disorder and still less useful in psychotic reactions. Regardless of disorder, however, the scales are helpful in sorting out premorbid characteristics from acute symptomatology. They are also helpful in understanding patterns of interpersonal relationships and occupational adjustment.

Evaluation Against the Newman and Ciarlo Criteria

The 11 criteria suggested by Newman and Ciarlo (1994) for evaluating psychological instruments for treatment outcomes assessment seem best applied to instruments designed specifically for the clinical domain. However, they have relevance as well to instruments like the API that find applications in the clinical domain although not specifically designed for that arena.

The first criterion is that the instrument be relevant to the target group. As noted earlier, the API is appropriate for use with a broad segment of the adult population. The characteristics it measures are basic to understanding personality functioning, and the information it provides is useful in differentiating chronic from acute symptomatology


TABLE 39.2
Developmental Suggestions Related to Score Deviations on API Personal Characteristic Scales

EXTROVERTED
Assesses quality and intensity of interpersonal interaction, activity level, need for stimulation, and capacity for joy.
High score suggestions: May need to take more time for personal reflection. May need to curb strong need for activity, impatience.
Low score suggestions: May need to learn to be more communicative and open. May need to learn to spend less time thinking and more time doing. If trying to convince or persuade, may need to project enthusiasm.

TOUGH-MINDED
Assesses how people make decisions and judgments.
High score suggestions: May need to learn to consider others' feelings and to be more sensitive to others. May need to develop interpersonal skills, such as giving compliments, communicating tactfully.
Low score suggestions: May need to learn assertiveness skills. May need to take a more objective and tough-minded approach with others to avoid being taken advantage of or being taken for granted.

INDEPENDENT
Assesses self-reliance, self-sufficiency.
High score suggestions: May need to develop patience to work cooperatively with others. May need to learn the art of negotiation and compromise, and to acknowledge input from others.
Low score suggestions: May need to learn to make firmer and more confident decisions rather than looking for support or depending on others.

DISCIPLINED
Assesses the lifestyle adopted in dealing with the world; serves as a control over other personality dimensions.
High score suggestions: May need to develop patience for those who ignore standard operating procedures while trying out new techniques. May need to work at being open to other ways of doing things. May need to learn not to judge others by their own strict standards.
Low score suggestions: May need to develop perseverance, planfulness, and get in the habit of setting goals.

CREATIVE
Assesses imagination, unconventionality, innovativeness.
High score suggestions: May need to focus on practical details. May need to try to state things more simply.
Low score suggestions: May need to develop a more future-oriented perspective, prod themselves to look at the benefits of change.

ENTERPRISING
Assesses drive/ambition.
High score suggestions: May need to learn to relax and enjoy doing things for the sheer fun of it.
Low score suggestions: May need help to recognize talents. May need to gain confidence that they can succeed.


and primary from secondary problems. The test requires fourth-grade reading skills and can be administered orally, if necessary.

The API was designed to measure relatively stable features of adult personality, so most scales are unlikely to shift dramatically as a function of treatment. However, there is variation in this respect. The Career scales, as a set, are less likely to show change, for example, than the Interpersonal Style scales. Among all personality scales, the Adjusted scale is perhaps the best single index of current functioning and changes of one to two standard deviations (two to five scale score points) would not be improbable throughout the course of a successful intervention. Still, the API was not designed to be sensitive to day-to-day or short-term fluctuations in the same way as some other instruments. Consequently, it plays its most useful role in helping the clinician understand personality characteristics of the client that need to be considered at the outset of an intervention to plan its course.

The second criterion requires that the instrument be simple to use. In its computerized versions, the entire administration is managed by microcomputer. Before the actual administration, the program provides instructions and a short, sample test to ensure that the client understands response options and how to navigate through the test. After each question is presented, the client selects an answer and verifies that choice before moving on to the next question. Once an answer has been verified, the client cannot return to a question. This provides a maximum amount of standardization and control. Even when the test is administered offline, the process is straightforward. Test manuals and handbooks guide the user through the process and define the conditions for standardized administration.

The items used in the API were carefully selected to represent objective situations to which the client could respond without ambiguity. It is in this sense that the API may be thought to be in line with Newman and Ciarlo's third criterion: The measure should have objective referents.

In their fourth criterion, Newman and Ciarlo stressed the value of employing multiple perspectives in clinical assessment. The API is a vehicle through which the client is able to provide reliable information about many important aspects of life functioning from the client perspective. In its original form, the test is not designed for use as an observer rating instrument. The decision model feature has been used informally in some settings (e.g., marital counseling) to facilitate collection of the kind of data that can be used for direct comparison with the client's self-ratings.

The API rates particularly high on Newman and Ciarlo's sixth criterion, psychometric adequacy. Reliabilities, whether assessed as test-retest coefficients or internal consistency, are high for all content scales. Correlations with a large number of other personality measures confirm that the test validly assesses some of the broadest aspects of normal-range personality functioning. Norms, both gender neutral and gender specific, have been established to represent the adult population. The four validity scales check for some of the most common stylistic tendencies that may confound interpretation of the content scales.

The seventh criterion is concerned with cost-effectiveness. The API, like most objective personality measures, acquits itself well in this arena because little professional time is needed for administration. PC versions of the test that administer, score, and provide narrative reports of results immediately do so for less than $10 per test. This is less than many other tests charge for a mail-in service that turns information around in weeks rather than minutes. And the information the API provides is relevant to a wide range of questions.


The next two criteria deal with how easily test feedback can be shared with nonprofessional audiences, including the client. The API describes features of the personality that are easily understood by a wide variety of audiences. The report formats were developed to convey the information the test provides simply and directly. The reports avoid jargon and use a variety of graphic formats to facilitate client understanding.

Among specific questions Newman and Ciarlo present in relation to their tenth criterion is "do the test results help in planning the array and levels of services, treatments, and intervention styles that might best meet service goals?" It is in this sense that tests like the API are probably most useful in clinical services. Although the test does not focus specifically on current symptoms, knowing something about clients' scores on Disciplined, for example, says a lot about how easily they will adapt to treatment regimens and follow through on treatment plans.

The API is not linked to a particular clinical theory or practice and so meets Newman and Ciarlo's final criterion that instruments should be compatible with a variety of clinical theories and practices.

Conclusions

The API provides a technology for assessing significant aspects of normal-range adult personality and reporting them in terms that are understandable and relevant to practical decisions. Specifically, with respect to treatment outcome assessment, the API can contribute productively to predicting the course of therapy and structuring reasonable treatment plans.

References

Allport, G. W., & Odbert, H. S. (1936). Trait-names: A psycho-lexical study. Psychological Monographs, 47.

Bolton, B. (1985). Review of the Adult Personality Inventory. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (pp. 55-56). Lincoln, NE: Buros Institute.

Bolton, B. (1991). Comments on "The Adult Personality Inventory." Journal of Counseling and Development, 69, 272-273.

Cattell, R. B. (1979). Personality and learning theory: Vol. 1. The structure of personality in its environment. New York: Springer.

Cattell, R. B. (1980). Personality and learning theory: Vol. 2. A systems theory of maturation and structured learning. New York: Springer.

Cattell, R. B. (1982). The inheritance of personality and ability: Research methods and findings. New York: Academic Press.

Cattell, R. B., Eber, H. W., & Tatsuoka, M. M. (1970). Handbook for the 16 Personality Factor Questionnaire. Champaign, IL: Institute for Personality and Ability Testing.

Cattell, R. B., & Krug, S. E. (1986). The number of factors in the 16PF: A review of the evidence with special emphasis on methodological problems. Educational and Psychological Measurement, 46, 509-522.

Costa, P. T., & McCrae, R. R. (1985). The NEO Personality Inventory manual. Odessa, FL: Psychological Assessment Resources.

Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34, 571-582.

Goldberg, L. R. (1970). Man vs. model of man: A rationale, plus some evidence for a method of improving on clinical inference. Psychological Bulletin, 73, 422-432.

Goldberg, L. R. (1981). Language and individual differences: The search for universals in personality lexicons. In L. Wheeler (Ed.), Review of personality and social psychology (Vol. 2, pp. 141-165). Beverly Hills, CA: Sage.

Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist, 48, 26-34.

Hilgert, L. D. (1987). Review of TEST PLUS. Social Science Microcomputer Review, 5, 95-97.

John, O. P., Hampson, S. E., & Goldberg, L. R. (1991). The basic level in personality-trait hierarchies: Studies of trait use and accessibility in different contexts. Journal of Personality and Social Psychology, 60, 348-361.

Kelly, G. A. (1955). The psychology of personal constructs. New York: Norton.

Krug, S. E. (1981). Development of a formal measurement model for security screening in the nuclear power plant environment. Multivariate Experimental Clinical Research, 5, 109-123.

Krug, S. E. (1984). The Adult Personality Inventory manual. Champaign, IL: Institute for Personality and Ability Testing.

Krug, S. E. (1985). TEST PLUS: A microcomputer based system for the Adult Personality Inventory. Champaign, IL: MetriTech.

Krug, S. E. (1986). Preliminary evidence regarding Black-White differences in scores on the Adult Personality Inventory. Psychological Reports, 58, 203-206.

Krug, S. E. (1991a). The Adult Personality Inventory. Journal of Counseling and Development, 69, 266-271.

Krug, S. E. (1991b). Reply to Brian Bolton. Journal of Counseling and Development, 69, 274.

Krug, S. E. (1995). Career assessment and the Adult Personality Inventory. Journal of Career Assessment, 3(2), 176-187.

Krug, S. E. (1997). Interpretive and technical guide for the Adult Personality Inventory. Champaign, IL: MetriTech.

Krug, S. E., & Ahadi, S. A. (1986). Personality patterns among couples participating in a marriage enrichment program. Multivariate Experimental Clinical Research, 8, 168-178.

Krug, S. E., & Behrens, G. M. (1981). Psychological screening for weapons use suitability: A formal decision model. In Proceedings of the 23rd Annual Conference of the Military Testing Association. Arlington, VA.

Krug, S. E., & Johns, E. F. (1990). The 16PF. In C. E. Watkins, Jr., & V. L. Campbell (Eds.), Testing in counseling practice (pp. 63-90). Hillsdale, NJ: Lawrence Erlbaum Associates.

LaForge, R. (1985). The early development of the Freedman-Leary-Coffey interpersonal system. Journal of Personality Assessment, 49, 613-621.

Lawshe, C. H., Kephart, N. C., & McCormick, E. J. (1949). The paired comparison technique for rating performance of industrial employees. Journal of Applied Psychology, 33, 69-77.

Leary, T. (1957). Interpersonal diagnosis of personality. New York: Ronald.

Meehl, P. E. (1954). Clinical versus statistical prediction. Minneapolis: University of Minnesota Press.

Meier, S. T. (1986). Review of TEST PLUS. Computers in Psychiatry/Psychology, 8, 21-22.

Mercadal, D. (1987). Review of TEST PLUS. Psychologists' Software Club Newsletter, November, 2-3.

Neal, L. (1977). Office psychiatry for the primary care physician. In S. E. Krug (Ed.), Psychological assessment in medicine (pp. 28-47). Champaign, IL: Institute for Personality and Ability Testing.

Newman, F. L., & Ciarlo, J. A. (1994). Criteria for selecting psychological instruments for treatment outcomes assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98-110). Hillsdale, NJ: Lawrence Erlbaum Associates.

Norman, W. T. (1963). Toward an adequate taxonomy of personality attributes: Replicated factor structure in peer nomination personality ratings. Journal of Abnormal and Social Psychology, 66, 574-583.

Schlosser, B. (1989). Review of TEST PLUS. The Independent Practitioner (Bulletin of the Division of Independent Practice of the American Psychological Association), 9(4), 6.

Torgerson, W. S. (1958). Theory and methods of scaling. New York: Wiley.

Werner, P. D., & Pervin, L. A. (1986). The content of personality inventory items. Journal of Personality and Social Psychology, 51, 622-628.

Wiggins, J. S. (1973). Personality and prediction: Principles of personality assessment. Reading, MA: Addison-Wesley.

Wiggins, J. S. (1979). A psychological taxonomy of trait-descriptive terms: The interpersonal domain. Journal of Personality and Social Psychology, 37, 395-412.

Wiggins, J. S. (1985). Interpersonal circumplex models: Commentary. Journal of Personality Assessment, 49, 626-631.


For Abby, Katie, and Shelby


Chapter 40
SF-36 Health Survey

John E. Ware, Jr.
Quality Metric, Inc.
Health Assessment Lab
Tufts University School of Medicine
Harvard School of Public Health

The SF-36 health survey is a multipurpose, short-form health survey with only 36 questions. It yields an eight-scale profile of scores, as well as summary measures. It is a generic measure of health status as opposed to one that targets a specific age, disease, or treatment group. Accordingly, the SF-36 has proven useful in comparing general and specific populations, estimating the relative burden of different diseases, differentiating the health benefits produced by a wide range of different treatments, and screening individual patients (Manocchia, Bayliss, et al., 1997).

This chapter summarizes the steps in the construction of the SF-36; how it led to the development of an even shorter (1-page, 2-minute) survey form, the SF-12; the improvements reflected in Version 2.0 of the SF-36; psychometric studies of assumptions underlying scale construction and scoring; how they have been translated in more than 40 countries as part of the International Quality of Life Assessment (IQOLA) Project; and studies of reliability and validity.

SF-36 Literature

The experience to date with the SF-36 has been documented in more than 700 publications, which have been summarized in two annotated bibliographies (Manocchia et al., 1997; Shiely, Bayliss, Keller, Tsai, & Ware, 1996). The most complete information about the history and development of the SF-36, its psychometric evaluation, studies of reliability and validity, and normative data is available in the first of three user's manuals (Ware, Snow, Kosinski, & Gandek, 1993). A second manual documents the development and validation of the SF-36 summary measures and presents norms for those measures (Ware, Kosinski, & Keller, 1994). A third presents similar information for the SF-12 Health Survey, an even shorter version constructed from a subset of 12 items (Ware, Kosinski, & Keller, 1995). One of the most complete independent accounts of SF-36 development, along with a critical commentary, is offered by McDowell and Newell (1996). Additional publications are listed on the SF-36 web page (http://www.sf-36.com) and a third edition of the annotated bibliography is in preparation.


The advantage of a generic survey (such as the SF-36) in estimating disease burden is well illustrated in articles describing more than 130 diseases and conditions. Among the most frequently studied conditions, with more than 20 SF-36 publications each, are arthritis, back pain, depression, diabetes, and hypertension (Manocchia et al., 1997). Translations of the SF-36 are the subject of 148 publications, and one or more articles compare results from the SF-36 with those from 225 other generic and disease-specific instruments (Manocchia et al., 1997).

Construction of the SF-36

The SF-36 was constructed to satisfy minimum psychometric standards necessary for group comparisons. The 8 health concepts were selected from 40 included in the Medical Outcomes Study (MOS; Stewart & Ware, 1992). Those chosen represent the most frequently measured concepts in widely used health surveys and those most affected by disease and treatment (Ware, 1995; Ware et al., 1993). SF-36 items also represent multiple operational definitions of health, including function and dysfunction, distress and well-being, objective reports and subjective ratings, and both favorable and unfavorable self-evaluations of general health status (Ware et al., 1993). Most SF-36 items have their roots in instruments that have been in use since the 1970s and 1980s (Stewart & Ware, 1992), including the General Psychological Well-being Inventory (Dupuy, 1984), various physical and role functioning measures (Hulka & Cassel, 1973; Patrick, Bush & Chen, 1973; Reynolds, Rushing, & Miles, 1974; Stewart, Ware, & Brook, 1981), the Health Perceptions Questionnaire (Ware, 1976), and other measures that proved to be useful during the Health Insurance Experiment (HIE; Brook et al., 1979). MOS researchers selected and adapted questionnaire items from these and other sources and developed new measures for a 149-item Functioning and Well-being Profile (FWBP; Stewart & Ware, 1992).
The FWBP was the source for SF-36 items and instructions, which were first made available in a "developmental" form in 1988 and in "standard" form in 1990 (Ware, 1988; Ware & Sherbourne, 1992). As documented elsewhere (Ware et al., 1993), the standard form eliminated more than one fourth of the words contained in MOS versions of the 36 items and instructions and adopted improvements in format and scoring.

Version 2.0

In 1996, Version 2.0 of the SF-36 was introduced, primarily to improve the two role functioning scales (Ware & Kosinski, 1996). Relative to the standard SF-36 Version 1.0, Version 2.0 includes the following: improvements in instructions and questionnaire items; an improved layout for questions and answers in the self-administered survey; greater comparability with widely used translations and cultural adaptations; five-level response choices in place of dichotomous response choices for seven items in the two role functioning scales; and five-level response choices in place of six-level response choices for the Mental Health and Vitality items. These improvements are briefly explained below.

Layout. All responses to questions in Version 2.0 are printed in a left-to-right (also referred to as "horizontal") format, rather than with the mixture of horizontal and vertical listings of response choices that were printed below questions in the original SF-36. Mixed formats of response choices confuse respondents and cause missing and inconsistent responses, particularly among the elderly. Other improvements in layout
include more consistent use of indenting, numbering of instructions, deletion of item labels, and the formatting of boxes that are checked by respondents.

Type Size and Bolding. A larger type size has been adopted throughout. Only instructions, as opposed to response choices, are bolded to simplify the "look and feel" of Version 2.0. These and other refinements were adopted on the basis of lessons learned in health care and from surveys in other fields.

Wording Changes. Evidence from numerous focus group studies, formal cognitive tests, and empirical studies in more than a dozen countries supports the improvements in item wording and the changes in some terms used to identify health concepts adopted in Version 2.0. These improvements are expected to make the English-language SF-36 easier to understand and administer and more objective. Version 2.0 is also more comparable with translations of the SF-36. Because most of the improvements in item wording were developed during the process of translating and adapting the SF-36 for use in other countries during the International Quality of Life Assessment (IQOLA) Project, Version 2.0 is referred to as the "international version."

Five-Choice Response Scales. There is considerable empirical evidence that the Version 2.0 five-level response scales substantially improve the two SF-36 role functioning scales. These response scales extend the range measured and greatly increase score precision without increasing respondent burden. Specifically, Version 2.0 achieves a four-fold increase in the number of levels defined by both role scales, yields a substantially smaller standard deviation, and substantially reduces the percentage of respondents who score at the ceiling and at the floor of both role scales.
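The gain in levels can be checked with simple arithmetic. The sketch below assumes that a scale is scored as an unweighted sum of its items, so that the number of distinct raw scores equals the sum of (choices - 1) over the items, plus one. The function name is hypothetical; the item counts (4 for Role-Physical, 3 for Role-Emotional) are those reported in Table 40.1.

```python
def scale_levels(choices_per_item):
    """Count the distinct raw scores of an unweighted summated scale.

    Each item adds (choices - 1) steps above its minimum; the extra 1
    accounts for the minimum score itself.
    """
    return sum(c - 1 for c in choices_per_item) + 1


# Version 1.0: dichotomous (2-choice) role items.
rp_v1 = scale_levels([2] * 4)  # Role-Physical, 4 items
re_v1 = scale_levels([2] * 3)  # Role-Emotional, 3 items

# Version 2.0: five-level role items.
rp_v2 = scale_levels([5] * 4)
re_v2 = scale_levels([5] * 3)

print(rp_v1, rp_v2)  # 5 17
print(re_v1, re_v2)  # 4 13
```

Under this simple count the Version 1.0 figures match the 5 and 4 levels listed for the role scales in Table 40.1, and the Version 2.0 figures rise to 17 and 13, which is roughly a three- to four-fold increase. The count does not apply to scales whose items are recoded with unequal weights before summing.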
The elimination of one of the six response choices (e.g., "a good bit of the time") from the mental health and vitality items was based on the finding that this choice is not consistently ordered in relation to the other choices in Version 1.0 or in translations of the SF-36. Eliminating this choice simplified the format of the form with little or no loss of information.

Scoring. Version 2.0 scoring includes algorithms for norm-based scoring for all eight scales as well as the same standardization of scoring (T-scores with M = 50, SD = 10) that has made the SF-36 summary measures much easier to interpret. Version 2.0 scoring software also achieves improved estimation of missing responses and provides respondent-specific data quality indicators.

Comparability of Results. To make Version 1.0 easier to interpret and directly comparable to results based on Version 2.0, cross-sectional and longitudinal norms for general and specific populations are being reestimated for Version 1.0 using norm-based scoring for all eight scales and for the two summary measures. Further, a national calibration study is underway in the United States to evaluate the effect of all improvements and to assure the comparability of average scores across versions.

Psychometric Considerations

SF-36 Measurement Model

Figure 40.1 illustrates the taxonomy of items and concepts underlying the construction of the SF-36 scales and summary measures. The taxonomy has three levels: items, eight scales that aggregate 2 to 10 items each, and two summary measures that aggregate scales. All but one of the 36 items (self-reported health transition) are used to score the eight SF-36 scales. Each item is used in scoring only one scale. The eight scales are hypothesized to form two distinct higher-order clusters due to the physical and mental health variance that they have in common.

Factor analytic studies have confirmed physical and mental health factors that account for 80% to 85% of the reliable variance in the eight scales in the U.S. general population (Ware, Kosinski, et al., 1994), among MOS patients (McHorney, Ware, Rogers, Raczek, & Lu, 1993; Ware, Kosinski, et al., 1994), and in general populations in Sweden (Sullivan et al., 1995) and the United Kingdom (Ware et al., 1994). These studies have also been replicated in more than a dozen other countries (Ware et al., 1998).

Three scales (Physical Functioning, Role-Physical, Bodily Pain) correlate most highly with the physical component and contribute most to the scoring of the Physical Component Summary (PCS) measure (Ware, Kosinski, et al., 1994). The mental component correlates most highly with the Mental Health, Role-Emotional, and Social Functioning scales, which also contribute most to the scoring of the Mental Component Summary (MCS) measure. Three of the scales (Vitality, General Health, and Social Functioning) have noteworthy correlations with both components.

Fig. 40.1. SF-36 and SF-12 measurement model. From Ware, Kosinski, and Keller (1994) and Ware, Kosinski, and Keller (1996). Those items in boxes were selected for SF-12. * Significant correlation with other summary measure.


The importance of these findings is illustrated in the discussion of empirical validity. Specifically, scales that load highest on the physical component are most responsive to treatments that change physical morbidity, whereas scales loading highest on the mental component respond most to drugs and therapies that target mental health.

Scaling and Scoring Assumptions

A major objective in constructing the SF-36 was achievement of high psychometric standards. Guidelines for testing were derived from those recommended for use in validating psychological and educational measures by the American Psychological Association, the American Educational Research Association, and the National Council on Measurement in Education (APA, 1974). Extensive psychometric testing has been conducted on the SF-36 in the United States (Garratt, Ruta, Abdalla, Buckingham, & Russell, 1993; Jenkinson, Coulter, & Wright, 1993; McHorney, Ware, Lu, & Sherbourne, 1994; Wagner, Keller, et al., 1995), and in numerous other countries (Bullinger, 1995; McCallum, 1995; Rampal, Martin, Marquis, Ware, & Bonfils, 1994; Sullivan, 1994; Sullivan, Karlsson, & Ware, 1995).

On the strength of favorable results from tests to date, nearly all studies have used the method of summated ratings and standardized SF-36 scoring algorithms documented elsewhere (Medical Outcomes Trust, 1991; Ware et al., 1993). This method assumes that items shown in the same scale in Fig. 40.1 can be aggregated without score standardization or item weighting. Standardization of items within a scale was avoided by selecting or constructing items with roughly equivalent means and standard deviations. Weighting was avoided by using equally representative items (i.e., items with roughly equivalent relations to the underlying scale dimension). With rare exceptions, all items have been shown to correlate substantially (greater than 0.40, corrected for overlap) with their hypothesized scales (McHorney et al., 1994; Ware et al., 1993).
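The method of summated ratings can be sketched in a few lines. The example below is a minimal illustration, not the published SF-36 scoring algorithm: it sums unweighted item scores and expresses the result as the percentage of the total possible score achieved, which is how Table 40.1 describes the 0 to 100 scale scores. The function name, the item coding of 1 to 3, and the responses are assumptions made for illustration, and the sketch presumes items have already been recoded so that higher values mean better health and that no responses are missing.

```python
def score_scale(item_scores, item_ranges):
    """Return a 0-100 scale score from unweighted item scores.

    item_scores: recoded responses, one per item (higher = better health).
    item_ranges: (lowest, highest) possible value for each item.
    """
    if len(item_scores) != len(item_ranges):
        raise ValueError("one (lowest, highest) range is needed per item")
    raw = sum(item_scores)                          # summated rating
    lowest = sum(lo for lo, _ in item_ranges)       # lowest possible raw score
    highest = sum(hi for _, hi in item_ranges)      # highest possible raw score
    # Percentage of the possible raw score range achieved.
    return 100.0 * (raw - lowest) / (highest - lowest)


# Hypothetical respondent on a 10-item scale with items coded 1 to 3.
responses = [3, 3, 3, 3, 3, 3, 3, 3, 2, 1]
print(score_scale(responses, [(1, 3)] * 10))  # 85.0
```

Because no item is standardized or weighted, every item moves the raw score by the same amount per response step, which is exactly the assumption that the item-level tests described above are meant to justify.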
These item-level results support analyzing the scales as interval-level measures.

Reliability and Confidence Intervals

The reliability of the eight scales and two summary measures has been estimated using both internal consistency and test-retest methods. With rare exceptions, published reliability statistics have exceeded the minimum standard of .70 recommended for measures used in group comparisons in more than 25 studies (Manocchia et al., 1997); most have exceeded .80 (McHorney et al., 1994; Ware et al., 1993). Reliability estimates for physical and mental summary scores usually exceed .90 (Ware, Kosinski, et al., 1994). One review of 15 published studies revealed that the median reliability coefficient for each of the eight scales equaled or exceeded .80, except for Social Functioning, which had a median reliability across studies of .76 (Ware et al., 1993). In addition, a reliability of .93 has been reported for the Mental Health scale using the alternate forms method, suggesting that the internal-consistency method underestimated the reliability of that scale by about 3% (McHorney & Ware, 1995).

The trends in reliability coefficients for the SF-36 scales and summary measures already summarized have also been replicated across 24 patient groups differing in sociodemographic characteristics and diagnoses (McHorney et al., 1994; Ware et al.,
1993; Ware, Kosinski, et al., 1994). Although studies of subgroups indicate slight declines in reliability for more disadvantaged respondents, reliability coefficients consistently exceeded recommended standards for group level analysis. Reliability estimates consistent with these trends have been published in more than 100 studies, including more than 30 test-retest studies (Manocchia et al., 1997).

Standard errors of measurement, 95% confidence intervals for individual scores, and distributions of change scores from test-retest and 1-year stability studies have been published (Brazier et al., 1992; Ware et al., 1993; Ware, Kosinski, et al., 1994). Confidence intervals around individual scores are much smaller for the two summary measures than for the eight scales (+/- 6-7 points vs. +/- 13-32 points, respectively; Ware, Kosinski, et al., 1994). Estimates of sample sizes required to detect differences in average scores of various magnitudes have been documented for five different study designs for each of the eight scales and for the two summary measures (Ware et al., 1993; Ware, Kosinski, et al., 1994).

Validity

Studies of validity are about the meaning of scores and whether or not they have their intended interpretations. Because of the widespread use of the SF-36 across a variety of applications, evidence of all types of validity is relevant. Studies to date have addressed content, concurrent, criterion, construct, and predictive validity. The content validity of the SF-36 has been compared to that of other widely used generic health surveys (Ware, 1994; Ware et al., 1993). Systematic comparisons indicate that the SF-36 includes eight of the most frequently represented health concepts.
Among the content areas included in widely used surveys, but not included in the SF-36, are sleep adequacy, cognitive functioning, sexual functioning, health distress, family functioning, self-esteem, eating, recreation/hobbies, communication, and symptoms/problems that are specific to one condition. Symptoms and problems specific to a particular condition are not included in the SF-36 because it is a generic measure. To facilitate the consideration of concepts not included, the SF-36 users' manuals include tables of correlations of the 8 scales and the 2 summary measures with 32 measures of other general concepts and 19 specific symptoms (Ware et al., 1993; Ware, Kosinski, et al., 1994). SF-36 scales correlate substantially (r = .40 or greater) with most of the omitted general health concepts and with the frequency and severity of many specific symptoms and problems. A noteworthy exception is sexual functioning, which correlates relatively weakly with SF-36 scales and is a good candidate for inclusion in questionnaires that supplement the SF-36.

Because most SF-36 scales were constructed to reproduce longer scales, attention was initially given to how well the short-form versions perform in empirical tests relative to the full-length versions. Relative to the longer MOS measures they were constructed to reproduce, SF-36 scales have been shown to perform with about 80% to 90% empirical validity in studies involving physical and mental health "criteria" (McHorney et al., 1993). The validity of each of the eight scales and the two summary measures has been shown to differ markedly, as would be expected from factor analytic studies of construct validity (see Fig. 40.2; McHorney et al., 1993; Ware, Kosinski, et al., 1994; Ware, Kosinski, Bayliss, et al., 1995).
Specifically, the Mental Health, Role-Emotional, and Social Functioning scales and the MCS summary measure have been shown to be the most valid mental health measures in both cross-cultural and longitudinal tests using the method of known-groups validity. The Physical Functioning, Role-Physical, and Bodily Pain scales and the PCS have been shown to be the most valid physical health measures. Criteria used in the known-groups validation of the SF-36, which include accepted clinical indicators of diagnosis and severity of depression, heart disease, and other conditions, are well documented in peer-reviewed publications and in the two users' manuals (Kravitz et al., 1992; McHorney et al., 1993; Ware et al., 1993; Ware, Kosinski, et al., 1994; Ware, Kosinski, Bayliss, et al., 1995).

Fig. 40.2. Construct validation of the SF-36 two-component model. Source: Ware, Kosinski, and Keller, 1994.

The Mental Health scale has been shown to be useful in screening for psychiatric disorders (Berwick, 1991; Ware, Kosinski, et al., 1994), as has the MCS summary measure (Ware, Kosinski, et al., 1994). For example, using a cutoff score of 42, the MCS had a sensitivity of 74% and a specificity of 81% in detecting patients diagnosed with depressive disorder (Ware, Kosinski, et al., 1994).

Relative to other published measures, SF-36 scales have performed well in most tests published to date (Brazier et al., 1992; Kantz, Harris, Levitsky, Ware, & Davies, 1992; Krousel-Wood, McCune, Abdoh, & Re, 1994; Krousel-Wood & Re, 1994; Weinberger et al., 1991). The second SF-36 annotated bibliography cites studies comparing the SF-36 with 225 other measures.

Predictive validity studies have linked SF-36 scales and summary measures to utilization of health care services (Ware, Kosinski, et al., 1994), the clinical course of depression (Beusterien, Steinwald, & Ware, 1996; Wells, Burnam, Rogers, Hays, & Camp, 1992), loss of job within 1 year (Ware, Kosinski, et al., 1994), and 5-year survival (Ware, Kosinski, et al., 1994).

Results from clinical studies comparing scores for patients before and after treatment have largely supported hypotheses about the validity of SF-36 scales based on factor analytic studies.
For example, clinical studies have shown that three of the scales (Physical Functioning, Role-Physical, and Bodily Pain) with the most physical factor content (Fig. 40.2) tend to be most responsive to the benefits of knee replacement (Kantz et al., 1992), hip replacement (Kantz et al., 1992; Lansky, Butler, & Waller, 1992), and heart valve surgery (Phillips & Lansky, 1992). In contrast, the three scales with the most mental factor content (Mental Health, Role-Emotional, and Social Functioning) in factor analytic studies have been shown to be most responsive in comparisons of patients before and after recovery from depression (Ware et al., 1995); change in the severity of
depression (Beusterien et al., 1996); as well as drug treatment and interpersonal therapy for depression (Coulehan, Schulberg, Block, Madonia, & Rodrigues, 1997).

The discovery that 80% to 85% of the reliable variance in the eight SF-36 scales was accounted for by two factors led to the construction of psychometrically based physical and mental health summary measures. It was hoped that they would make it possible to reduce the number of statistical comparisons involved in analyzing the SF-36 (from eight to two) without substantial loss of information. In both cross-sectional and longitudinal studies reported to date, this appears to be the case (Ware, Kosinski, et al., 1994; Ware, Kosinski, Bayliss, et al., 1995). The advantages and disadvantages of analyzing the eight-scale SF-36 profile versus the two summary measures are illustrated and discussed elsewhere (Ware, Kosinski, et al., 1994; Ware, Kosinski, Bayliss, et al., 1995).

Finally, the SF-36 self-evaluated health transition item (five levels from "much better" to "much worse"), which is not used in scoring the scales or summary measures, has been shown to be useful in estimating average changes in health status during the year prior to its administration. In the MOS, measured changes in health status during a 1-year follow-up period corresponded substantially, on average, to self-evaluated transitions at the end of the year. Using the 0 to 100 General Health Rating Index (GHRI) scale (Davies & Ware, 1981) as a "criterion," those who evaluated their health as "much better" improved an average of 13.2 points. The average change was 5.8 points for those who reported that they were "somewhat better." Average declines of 10.8 and 34.4 points were observed for those who reported that their health was "somewhat worse" and "much worse," respectively. (It should be noted that the latter category had only 29 patients.) Change scores for those choosing the "about the same" category averaged 1.6 points.
These results are encouraging with regard to the use and interpretation of self-evaluated transitions at the group level. Pending results from ongoing studies of the reliability of responses to the SF-36 self-evaluated transition item, it should be interpreted with caution at the individual level. Additional results and their implications are discussed elsewhere (Ware et al., 1993; Ware, Kosinski, et al., 1994).

Administration Methods and Scoring

The SF-36 is suitable for self-administration, computerized administration, or administration by a trained interviewer in person or by telephone, to persons age 14 and older. The SF-36 has been administered successfully in general population surveys in the United States and other countries (Ware, Keller, et al., 1995), as well as to young and old adult patients with specific diseases (McHorney et al., 1994; Ware et al., 1993). It can be administered in 5 to 10 minutes with a high degree of acceptability and data quality (Ware et al., 1993). Indicators of data quality that have yielded satisfactory results in studies to date include very high item completion rates and favorable results for a response consistency index based on 15 pairs of SF-36 items, which is scored at the individual level (Ware et al., 1993). Computer-administered and telephone voice recognition interactive systems of administration are currently being evaluated.

Summary Measures

Table 40.1 summarizes information about the eight SF-36 scales and two summary measures that is important in their use and interpretation. To facilitate interpretation, the eight scales are ordered in terms of their factor content (i.e., construct validity), as they are in the SF-36 profile. The first scale is Physical Functioning (PF), which has been shown to be the best all-around measure of physical health; the last scale, Mental Health (MH), is the most valid measure of mental health in studies to date (McHorney et al., 1993; Ware et al., 1993; Ware, Kosinski, et al., 1994). Interestingly, MH and PF are the poorest measures of the physical and mental components, respectively. Scales in between are ordered according to their validity in measuring physical and mental health. The Vitality and General Health scales have substantial or moderate validity for both components of health status and should be interpreted accordingly.

The number of items and levels and the range of states defined by each scale are also shown in Table 40.1. These attributes have been linked to their empirical validity (McHorney et al., 1992). The most precise (least coarse) scales are those with 20 or more levels (PF, GH, VT, and MH). They also define the widest range of health states and, therefore, usually produce the least skewed score distributions. The relatively coarse role disability scales (RP and RE) each measure only four or five levels across a restricted range and, therefore, usually have the most problems with ceiling and floor effects.

Means and standard deviations for each of the eight scales in the general U.S. adult population are also presented in Table 40.1. These can be used to determine whether a group or individual in question scores above or below the U.S. average. Detailed normative data, including frequency distributions of scores and percentile ranks, are documented in the two users' manuals (Ware et al., 1993; Ware, Kosinski, et al., 1994).

TABLE 40.1
Summary of Information About SF-36 Scales and Physical and Mental Component Summary Measures

Scale                               Items  Levels   Mean(d)  SD    Reliability  CI(a)  r with PCS  r with MCS
Physical Functioning (PF)           10     21       84.2     23.3  .93          12.3   .85         .12
Role-Physical (RP)                  4      5        80.9     34.0  .89          22.6   .81         .27
Bodily Pain (BP)                    2      11       75.2     23.7  .90          15.0   .76         .28
General Health (GH)                 5      21       71.9     20.3  .81          17.6   .69         .37
Vitality (VT)                       4      21       60.9     20.9  .86          15.6   .47         .65
Social Functioning (SF)             2      9        83.3     22.7  .68          25.7   .42         .67
Role-Emotional (RE)                 3      4        81.3     33.0  .82          28.0   .16         .78
Mental Health (MH)                  5      26       74.7     18.1  .84          14.0   .17         .87
Physical Component Summary (PCS)    35     567(b)   50.0     10.0  .92          5.7
Mental Component Summary (MCS)      35     493(b)   50.0     10.0  .88          6.3

Definition (% observed)(c)

PF. Lowest possible score (floor): Very limited in performing all physical activities, including bathing or dressing (0.8%). Highest possible score (ceiling): Performs all types of physical activities including the most vigorous without limitations due to health (38.8%).
RP. Floor: Problems with work or other daily activities as a result of physical health (10.3%). Ceiling: No problems with work or other daily activities (70.9%).
BP. Floor: Very severe and extremely limiting pain (0.6%). Ceiling: No pain or limitations due to pain (31.9%).
GH. Floor: Evaluates personal health as poor and believes it is likely to get worse (0.0%). Ceiling: Evaluates personal health as excellent (7.4%).
VT. Floor: Feels tired and worn out all of the time (0.5%). Ceiling: Feels full of pep and energy all of the time (1.5%).
SF. Floor: Extreme and frequent interference with normal social activities due to physical and emotional problems (0.6%). Ceiling: Performs normal social activities without interference due to physical or emotional problems (52.3%).
RE. Floor: Problems with work or other daily activities as a result of emotional problems (9.6%). Ceiling: No problems with work or other daily activities (71.0%).
MH. Floor: Feelings of nervousness and depression all of the time (0.0%). Ceiling: Feels peaceful, happy, and calm all of the time (0.2%).
PCS. Floor: Limitations in self-care, physical, social, and role activities, severe bodily pain, frequent tiredness, health rated "poor" (0.0%). Ceiling: No physical limitations, disabilities, or decrements in well-being, high energy level, health rated "excellent" (0.0%).
MCS. Floor: Frequent psychological distress, social and role disability due to emotional problems, health rated "poor" (0.0%). Ceiling: Frequent positive affect, absence of psychological distress and limitations in usual social/role activities due to emotional problems, health rated "excellent" (0.0%).

Note. From Ware, Kosinski, and Keller (1994).
(a) CI = 95% confidence interval.
(b) Number of levels observed at baseline; scores rounded to the first decimal place (n = 2,474).
(c) Percentage observed comes from general U.S. population sample.
(d) Scores for eight scales are the percentage of the total possible score achieved for each of these scales. Scores for PCS and MCS are T-scores.

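The relation between reliability and the width of the confidence interval around an individual score can be illustrated with the textbook formula for the standard error of measurement, SEM = SD x sqrt(1 - reliability), with the 95% interval taken as +/- 1.96 SEM. This is a sketch under that assumption, with hypothetical function names; the users' manuals may compute the intervals differently, and the formula gives values close to, but not identical to, those in the CI column of Table 40.1.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Textbook SEM: the part of the score SD attributable to unreliability."""
    return sd * math.sqrt(1.0 - reliability)


def ci95_half_width(sd, reliability):
    """Half-width of an approximate 95% confidence interval for one score."""
    return 1.96 * standard_error_of_measurement(sd, reliability)


# SD and reliability taken from Table 40.1.
pf = ci95_half_width(23.3, 0.93)   # Physical Functioning; table lists 12.3
pcs = ci95_half_width(10.0, 0.92)  # Physical Component Summary; table lists 5.7

print(round(pf, 1), round(pcs, 1))  # 12.1 5.5
```

The comparison shows why the summary measures yield much tighter individual-level intervals than the scales: they combine a smaller standard deviation (10 points by construction) with reliability above .90.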
Table 40.1 illustrates the practical implications of a number of theoretical advantages of the PCS and MCS summary measures, including reliability, as well as the number and range of levels covered.

Norm-Based Scoring and Interpretation

The standardization of mean scores and standard deviations for all SF-36 scales has proven to be very useful when monitoring disease groups and interpreting differences across scales in the SF-36 profile. As documented elsewhere (Ware, Kosinski, et al., 1994), linear transformations were performed to achieve scores of 50 and 10 for means and standard deviations, respectively, in the general U.S. population. This transformation achieves the same mean and standard deviation for all eight scales as published for the physical and mental summary measures.

The advantages of norm-based scoring can be illustrated by comparing the SF-36 profile scored using the original 0 to 100 scoring algorithms based on the summated ratings method and the norm-based scoring algorithms for a sample of asthmatic patients who participated in a clinical trial (Okamoto, Noonan, DeBoisblanc, & Kellerman, 1996). The original SF-36 0 to 100 scoring produced the profile shown in Fig. 40.3a. The shape of this profile (the peaks and valleys resulting from higher and lower scores across scales) reflects at least two different things. First, to some extent, low scores indicate the impact of asthma on SF-36 health concepts. Second, the profile shape reflects arbitrary differences in the ceilings and floors of the different scales. Three scales, namely, General Health (GH), Vitality (VT), and Mental Health (MH), measure relatively wide score ranges and set the ceiling relatively high by measuring very favorable levels of those health concepts (Ware et al., 1993). Other scales, such as Physical Functioning (PF) and Role-Physical (RP), assess a narrower range.
Their most favorable levels (scored 100 using the original SF-36 algorithms) represent the absence of limitations and do not extend the range into well-being. Thus, the average score for each scale differs substantially across the profile using the original SF-36 0 to 100 scoring. A common inference from the profile in Fig. 40.3a, that asthma has a greater impact on the Vitality scale than on the Physical Functioning scale, is incorrect. The age- and gender-adjusted general population norms provide a basis for comparisons across scales (see Fig. 40.3a). For example, the PF scale averages between 80 and 90, whereas the VT average score is just above 60 (on the 100-point score range) in the general population. In relation to these age- and gender-adjusted norms, the impact of asthma is much larger on the Physical Functioning scale than on the Vitality scale, although both are statistically significant. Using the original 0 to 100 scoring, these differences in norms must be kept in mind when interpreting a profile. Differences in standard deviations, which are also substantial across some scales, must also be considered for this purpose.

In norm-based scoring, each scale was scored to have the same average (50) and the same standard deviation (10 points). Without referring to norms, it is clear that anytime a scale score is below 50, health status is below average, and each point is one tenth of a standard deviation. As shown in Fig. 40.3b, with norm-based scoring, differences in scale scores much more clearly reflect the impact of the disease (in this example, the impact of asthma). Clinicians can more quickly and appropriately interpret the effect of asthma on an SF-36 health profile. Because the Physical (PCS) and Mental (MCS) component summary measures take into account the correlation among the eight SF-36 scales, it is clear that asthma affected the physical component of health and (judging from the profile, with five significant differences) affected it very broadly.

The application of norm-based scoring to a clinical trial of treatment effects is illustrated in Fig. 40.3c. Patients treated using an inhaler showed statistically significant improvements relative to baseline after 16 weeks of treatment on three of the eight SF-36 scales, those most closely associated with physical functioning, and on the physical summary measure.

Fig. 40.3a. SF-36 health profile: adults with asthma compared with U.S. norm. Source: Okamoto et al., 1996. *Norm significantly higher.

Fig. 40.3b. Standardized SF-36 profile (Version 2.0): adults with asthma. *Scale significantly below norm.

Fig. 40.3c. Asthma: After inhaler treatment (one year follow-up). *Significant improvement with treatment.

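The linear transformation behind norm-based scoring can be sketched as follows. The general U.S. population means and standard deviations come from Table 40.1; the function name and the two group scores are hypothetical values chosen for illustration (they are not the asthma trial results), and the sketch ignores the age and gender adjustment mentioned in the text.

```python
# General U.S. population (mean, SD) on the 0-100 metric, from Table 40.1.
US_NORMS = {
    "PF": (84.2, 23.3),  # Physical Functioning
    "VT": (60.9, 20.9),  # Vitality
}


def norm_based_score(score, mean, sd):
    """Rescale a 0-100 score so the norm group has mean 50 and SD 10."""
    return 50.0 + 10.0 * (score - mean) / sd


# Hypothetical group averages on the original 0-100 metric.
group = {"PF": 70.0, "VT": 55.0}

for scale, raw in group.items():
    mean, sd = US_NORMS[scale]
    print(scale, raw, round(norm_based_score(raw, mean, sd), 1))
# PF 70.0 43.9
# VT 55.0 47.2
```

On the 0 to 100 metric the hypothetical group looks worse on Vitality (55 vs. 70), yet after the transformation Physical Functioning is the scale farther below the norm of 50. This is the kind of misreading of the original profile that the text cautions against.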
Translations
The International Quality of Life Assessment (IQOLA) Project is translating, validating, and norming the SF-36 Health Survey for use in multinational clinical trials and other international studies (Aaronson et al., 1992; Ware, Gandek, & the IQOLA Project Group, 1994; Ware, Gandek, Keller, & the IQOLA Project Group, 1996; Ware, Keller, et al., 1995). Based at the Health Assessment Lab at New England Medical Center, the project began in 1991 with sponsored investigators from 14 countries: Australia, Belgium, Canada, Denmark, France, Germany, Italy, Japan, The Netherlands, Norway, Spain, Sweden, the United Kingdom (English version), and the United States (English and Spanish versions). In addition, researchers from more than 30 other countries are translating and validating the SF-36 using IQOLA Project methods, including Argentina, Bangladesh, Brazil, Bulgaria, Cambodia, China, Croatia, Czech Republic, Estonia, Finland, Greece, Hong Kong, Hungary, Iceland, Indonesia, Israel, Korea, Mexico, New Zealand, Poland, Portugal, Romania, Russia, Singapore, Slovak Republic, South Africa, Taiwan, Tanzania, Turkey, the United Kingdom (Welsh), the United States (Chinese, Japanese, Vietnamese), and Yugoslavia.
Four major stages of activity are included. First, translation follows a standard protocol, including multiple forward and backward translations. Qualitative and quantitative methods are used to evaluate the quality of a translation and its conceptual equivalence with the original survey. Second, formal psychometric tests of scaling and scoring assumptions are conducted prior to publication of a translation. Third, data from clinical trials and other studies are being analyzed to address issues of validity and comparability across countries. Normative data are being collected in general population surveys in 11 countries for purposes of norm-based interpretation. Published norms will soon be available for 10 countries. English-language, Swedish, and Italian user's manuals are available and others are forthcoming. Published IQOLA Project SF-36 translations and English-language adaptations are distributed royalty-free by the Health Assessment Lab and, on approval by the Scientific Advisory Committee of the Medical Outcomes Trust (MOT), by the MOT.
Currently, published forms include the German (Bullinger, 1995), Spanish (Alonso, Prieto, & Antó, 1995), Swedish (Sullivan et al., 1995), and Italian (Apolone, Cifani, Liberati, & Mosconi, 1997) translations and English-language adaptations for use in Australia/New Zealand, Canada, and the United Kingdom. For information about the availability of SF-36 translations, go to the Internet at http://www.sf36.com.
Conclusions
McDowell and Newell (1996) attributed the "meteoric rise to prominence" observed for the SF-36 Health Survey to a variety of factors. The widespread adoption of the SF-36 in general population surveys and clinical trials is evidence that more practical measurement tools are more likely to be used. The standardization of measurement across studies is producing considerable information about norms and benchmarks useful in comparing "well" and "sick" populations and for estimating the burden of specific conditions. The brevity of the SF-36 was achieved by focusing on only 8 of 40 health concepts studied in the MOS and by measuring each concept with a short-form scale. The scales chosen (excluding General Health) have been shown to explain about two thirds of the reliable variance in individual evaluations of current health status in the United Kingdom, United States, and Sweden (Ware, Keller, et al., 1995). In the United States, addition of 14 multi-item measures (e.g., sleep problems, family and sexual functioning) added only about 5% to the variance explained in general health evaluations.
Although many studies appear to be relying on the SF-36 as the principal measure of health outcome, among the most useful studies are those that use it as a "generic core." A generic core battery of measures makes it possible to compare results across studies and populations and accelerates the accumulation of interpretation guidelines essential to determining the clinical, economic, and social relevance of differences in health status and outcomes. Because it is short, the SF-36 can be reproduced in a questionnaire with ample room for other more precise general and specific measures. Numerous studies (Kantz et al., 1992; Nerenz, Repasky, Whitehouse, & Kahkonen, 1992; Wagner, Keller, et al., 1995) have adopted this strategy and have illustrated the advantages of supplementing it.
How useful is the SF-36 for purposes of comparing general and specific population groups, relative to longer surveys? Some SF-36 scales have been shown to have 10% to 20% less precision than the long-form MOS measures that SF-36 scales were constructed to reproduce (McHorney et al., 1992). This disadvantage of the SF-36 should be weighed against the fact that some of these long-form measures require 5 to 10 times greater respondent burden. Empirical studies of this tradeoff suggest that the SF-36 provides a practical alternative to longer measures and that the eight scales and two summary scales rarely miss a noteworthy difference in physical or mental health status in group level comparisons (Katz, Larson, Phillips, Fossel, & Liang, 1992; Ware et al., 1993; Ware, Kosinski, et al., 1994). Regardless, the fact that the SF-36 represents a documented compromise in measurement precision (relative to longer MOS measures) leading to a reduction in the statistical power of hypothesis testing should be taken into account in planning clinical trials and other studies. To facilitate such planning, five tables of sample size estimates for differences in scores of various amounts for conventional statistical tests are published in the two SF-36 users' manuals (Ware et al., 1993; Ware, Kosinski, et al., 1994).
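To make the planning point concrete, the sketch below estimates the number of respondents needed per group to detect a given difference in mean scale scores. It uses the standard textbook normal approximation for a two-sided, two-sample comparison of means; it is an assumption-laden illustration and not the method used to build the tables in the SF-36 manuals. The function name, default alpha, and default power are choices made here for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes).

    delta: smallest difference in mean scores worth detecting
    sd:    assumed standard deviation of the scores in each group
    """
    if delta <= 0 or sd <= 0:
        raise ValueError("delta and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # On a norm-based scale (SD = 10), a 5-point difference is half an SD.
    print(n_per_group(delta=5, sd=10))
    # A smaller target difference needs a much larger sample.
    print(n_per_group(delta=2, sd=10))
```

Because the required sample size grows with the square of the ratio of the standard deviation to the target difference, even a modest loss of measurement precision translates into noticeably larger studies, which is why the tradeoff deserves attention at the design stage.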
In relation to longer non-MOS measures, such as the Sickness Impact Profile, the SF-36 has performed equally well or better in detecting average group differences or changes over time (Beaton, Bombardier, & Hogg-Johnson, 1994; Katz et al., 1992). The value of general and specific population norms, which was demonstrated well for the Sickness Impact Profile (Bergner, Bobbitt, Carter, & Gilson, 1981) and later for the MOS SF-20 (Stewart, Hays, & Ware, 1988; Stewart et al., 1989) and other measures, has also been demonstrated for the SF-36. In addition to the 20 medical conditions described in the MOS and 14 conditions described in the U.S. population norming survey (Ware, Kosinski, et al., 1994), other publications have reported descriptive data for patients with cardiac disease (Jette & Downing, 1994; Krousel-Wood et al., 1994), depressive disorders (Coulehan et al., 1997), epilepsy (Vickrey et al., 1992; Wagner, Keller, et al., 1995), diabetes mellitus (Jacobson, de Groot, & Samson, 1994; Nerenz et al., 1992), migraine headache (Osterhaus, Townsend, Gandek, & Ware, 1994), heart transplantation (Rector, Ormaza, & Kubo, 1993), ischemic heart disease (Phillips & Lansky, 1992), ischemic stroke (Kappelle et al., 1994), low back pain (Garratt et al., 1993; Lansky et al., 1992), lung disease (Viramontes & O'Brien, 1994), menorrhagia (Garratt, Ruta, Abdalla, & Russell, 1994), orthopedic conditions leading to knee replacement (Kantz et al., 1992), knee surgery (Katz et al., 1992), hip replacement (Katz et al., 1992; Lansky et al., 1992), and renal disease (Benedetti et al., 1994; Kurtin, Davies, Meyer, DeGiacomo, & Kantz, 1992; Meyer et al., 1994). Whereas some of the initial descriptive studies using the SF-36 were performed primarily to validate scale scores (McHorney et al., 1992), on the strength of validation studies to date, SF-36 scales appear to be increasingly accepted as valid health measures for purposes of documenting disease burden.
Much remains to be discovered about population health in comprehensive terms of functional health and well-being, the relative burden of disease, or the relative benefits of alternative treatments. One reason has been the lack of practical measurement tools appropriate for widespread use across diverse populations. The SF-36 was constructed to provide a basis for such comparisons of results. As predicted when it was first published (Ware & Sherbourne, 1992), the SF-36 has been widely adopted because of its brevity and its comprehensiveness. Although these two measurement goals are competing, the SF-36 appears to have achieved a psychometrically sound compromise between them. Population and large group descriptive studies and clinical trials to date demonstrate that the SF-36 is very useful for descriptive purposes such as documenting differences between sick and well patients and for estimating the relative burden of different medical conditions. Although its usefulness in clinical trials was doubted by many, experience to date from more than 50 studies suggests that the SF-36 will also be useful in evaluating the benefits of alternative treatments (Manocchia et al., 1997).
Acknowledgments
Development and validation of the SF-36 Health Survey was supported by a grant from the Henry J. Kaiser Family Foundation to the Health Institute, New England Medical Center (J.E. Ware, Jr., principal investigator). Development of the SF-36 PCS and MCS summary measures was supported by unrestricted research grants for the International Quality of Life Assessment (IQOLA) Project from the Glaxo Research Institute, Research Triangle Park, North Carolina, and Schering-Plough Corporation, Kenilworth, New Jersey (J.E. Ware, Jr., principal investigator). The IQOLA Project is sponsored by unrestricted research grants from Glaxo Wellcome Inc., Research Triangle Park, North Carolina, and Schering-Plough Corporation, Kenilworth, New Jersey. Associate sponsors include Astra, Parke-Davis, Pharmacia & Upjohn, Procter & Gamble Pharmaceuticals, Searle, Solvay Duphar B.V., and Synthelabo. Additional support has also been provided by more than 40 other pharmaceutical companies.
References
Aaronson, N.K., Acquadro, C., Alonso, J., Apolone, G., Bucquet, D., Bullinger, M., Bungay, K., Fukuhara, S., Gandek, B., Keller, S., Razavi, R., Sanson-Fisher, M., Sullivan, S., Wood-Dauphinee, S., Wagner, A., & Ware, J.E. (1992). International Quality of Life Assessment (IQOLA) Project. Quality of Life Research, 1, 349-351.
Alonso, J., Prieto, L., & Antó, J.M. (1995). La versión española del "SF-36 Health Survey" (Cuestionario de Salud SF-36): Un instrumento para la medida de los resultados clínicos [The Spanish version of the SF-36 Health Survey: A measure of clinical outcomes]. Medicina Clínica (Barc), 104(20), 771-776.
American Psychological Association. (1974). Standards for educational and psychological tests. Washington, DC: American Psychological Association.
Apolone, G., Cifani, S., Liberati, M.C., & Mosconi, P. (1997). Questionario sullo stato di salute SF-36. Traduzione e validazione della versione italiana: Risultati del progetto IQOLA [The SF-36 Health Survey: Translation and validation in Italy. Results from the IQOLA Project]. Metodologia e Didattica Clinica, 5, 86-94.
Beaton, D.E., Bombardier, C., & Hogg-Johnson, S. (1994). Choose your tool: A comparison of the psychometric properties of five generic health status instruments in workers with soft tissue injuries. Quality of Life Research, 3, 50-56.
Benedetti, E., Matas, A.J., Hakim, N., Fasola, C., Gillingham, K., McHugh, L., & Najarian, J.S. (1994). Renal transplantation for patients 60 years or older: A single-institution experience. Annals of Surgery, 220(4), 445-460.
Bergner, M., Bobbitt, R.A., Carter, W.B., & Gilson, B.S. (1981). The Sickness Impact Profile: Development and final revision of a health status measure. Medical Care, 19(8), 787-805.
Berwick, D.M. (1991). The double edge of knowledge. Journal of the American Medical Association, 266(6), 841-842.
Beusterien, K.M., Steinwald, B., & Ware, J.E. (1996). Usefulness of the SF-36 Health Survey in measuring health outcomes in the depressed elderly. Journal of Geriatric Psychiatry and Neurology, 9, 1-9.
Brazier, J.E., Harper, R., Jones, N.M.B., O'Cathain, A., Thomas, K.J., Usherwood, T., & Westlake, L. (1992). Validating the SF-36 Health Survey Questionnaire: New outcome measure for primary care. British Medical Journal, 305, 160-164.
Brook, R.H., Ware, J.E., Davies-Avery, A., Stewart, A.L., Donald, C.A., Rogers, W.H., Williams, K.N., & Johnston, S.A. (1979). Overview of adult health status measures fielded in RAND's Health Insurance Study. Medical Care, 17(7, Special Suppl.), 1-131.
Bullinger, M. (1995). German translation and psychometric testing of the SF-36 health survey: Preliminary results from the IQOLA Project. Social Science & Medicine, 41(10), 1359-1366.
Coulehan, J.L., Schulberg, H.C., Block, M.R., Madonia, M.J., & Rodrigues, E. (1997). Treating depressed primary care patients improves their physical, mental, and social functioning. Archives of Internal Medicine, 157, 1113-1120.
Davies, A.R., & Ware, J.E. (1981). Measuring health perceptions in the health insurance experiment (R-2711-HHS). Santa Monica, CA: Rand Corporation.
Dupuy, H.J. (1984). The Psychological General Well-Being (PGWB) index. In N.K. Wenger, M.E. Mattson, C.D. Furberg, & J. Elinson (Eds.), Assessment of quality of life in clinical trials of cardiovascular disease (pp. 170-183). New York: Le Jacq Publishing.
Garratt, A.M., Ruta, D.A., Abdalla, M.I., Buckingham, J.K., & Russell, I.T. (1993). The SF-36 Health Survey Questionnaire: An outcome measure suitable for routine use within the NHS? British Medical Journal, 306, 1440-1444.
Garratt, A.M., Ruta, D.A., Abdalla, M.I., & Russell, I.T. (1994). SF-36 Health Survey Questionnaire: II. Responsiveness to changes in health status in four common clinical conditions. Quality in Health Care, 3, 186-192.
Hulka, B.S., & Cassel, J.C. (1973). The AAFP-UNC study of the organization, utilization and assessment of primary medical care. American Journal of Public Health, 63(6), 494-501.
Jacobson, A.M., de Groot, M., & Samson, J.A. (1994). The evaluation of two measures of quality of life in patients with type I and type II diabetes. Diabetes Care, 17(4), 267-274.
Jenkinson, C., Coulter, A., & Wright, L. (1993). Short Form 36 (SF-36) Health Survey Questionnaire: Normative data for adults of working age. British Medical Journal, 306, 1437-1440.
Jette, D.U., & Downing, J. (1994). Health status of individuals entering a cardiac rehabilitation program as measured by the Medical Outcomes Study 36-Item Short-form Survey (SF-36). Physical Therapy, 74(6), 521-527.
Kantz, M.E., Harris, W.J., Levitsky, K., Ware, J.E., & Davies, A.R. (1992). Methods for assessing condition-specific and generic functional status outcomes after total knee replacement. Medical Care, 30(Suppl. 5), MS240-MS252.
Kappelle, L.J., Adams, H.P., Heffner, M.L., Torner, J.C., Gomez, F., & Biller, J. (1994). Prognosis of young adults with ischemic stroke: A long-term follow-up study assessing recurrent vascular events and functional outcome in the Iowa Registry of Stroke in young adults. Stroke, 25(7), 1360-1365.
Katz, J.N., Larson, M.G., Phillips, C.B., Fossel, A.H., & Liang, M.H. (1992). Comparative measurement sensitivity of short and longer health status instruments. Medical Care, 30(10), 917-925.
Kravitz, R.L., Greenfield, S., Rogers, W.H., Manning, W.G., Zubkoff, M., Nelson, E., Tarlov, A.R., & Ware, J.E. (1992). Differences in the mix of patients among medical specialties and systems of care: Results from the Medical Outcomes Study. Journal of the American Medical Association, 267(12), 1617-1623.
Krousel-Wood, M.A., McCune, T.W., Abdoh, A., & Re, R.N. (1994). Predicting work status for patients in an occupational medicine setting who report back pain. Archives of Family Medicine, 3, 349-355.
Krousel-Wood, M.A., & Re, R.N. (1994). Health status assessment in a hypertension section of an internal medicine clinic. American Journal of the Medical Sciences, 308(4), 211-217.
Kurtin, P.S., Davies, A.R., Meyer, K.B., DeGiacomo, J.M., & Kantz, M.E. (1992). Patient-based health status measures in outpatient dialysis: Early experiences in developing an outcomes assessment program. Medical Care, 30(Suppl. 5), MS136-MS149.
Lansky, D., Butler, J.B.V., & Waller, F.T. (1992). Using health status measures in the hospital setting: From acute care to "outcomes management." Medical Care, 30(Suppl. 5), MS57-MS73.
Manocchia, M., Bayliss, M.S., Connor, J., Keller, S.D., Shiely, J.-C., Tsai, C., Voris, R.A., & Ware, J.E., Jr. (1997). SF-36 health survey annotated bibliography: Second edition (1988-1996). Boston: Health Assessment Lab, New England Medical Center.
McCallum, J. (1995). The SF-36 in an Australian sample: Validating a new, generic health status measure. Australian Journal of Public Health, 19(2), 160-166.
McDowell, I., & Newell, C. (1996). Measuring health: A guide to rating scales and questionnaires (2nd ed.). New York: Oxford University Press.
McHorney, C.A., & Ware, J.E. (1995). Construction and validation of an alternate form general mental health scale for the Medical Outcomes Study Short Form 36-Item Health Survey. Medical Care, 33(1), 15-28.
McHorney, C.A., Ware, J.E., Lu, J.F.R., & Sherbourne, C.D. (1994). The MOS 36-Item Short-form Health Survey (SF-36): III. Tests of data quality, scaling assumptions and reliability across diverse patient groups. Medical Care, 32(1), 40-66.
McHorney, C.A., Ware, J.E., & Raczek, A.E. (1993). The MOS 36-Item Short-form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Medical Care, 31(3), 247-263.
McHorney, C.A., Ware, J.E., Rogers, W., Raczek, A., & Lu, J.F.R. (1992). The validity and relative precision of MOS short- and long-form health status scales and Dartmouth COOP charts: Results from the Medical Outcomes Study. Medical Care, 30(Suppl. 5), MS253-MS265.
Medical Outcomes Trust. (1991). Medical Outcomes Trust: Improving medical outcomes from the patient's point of view. Boston: Author.
Meyer, K.B., Espindle, D.M., DeGiacomo, J.M., Jenuleson, C.S., Kurtin, P.S., & Davies, A.R. (1994). Monitoring dialysis patients' health status. American Journal of Kidney Diseases, 24(2), 267-279.
Nerenz, D.R., Repasky, D.P., Whitehouse, F.W., & Kahkonen, D.M. (1992). Ongoing assessment of health status in patients with diabetes mellitus. Medical Care, 30(Suppl. 5), MS112-MS124.
Okamoto, L.J., Noonan, M., DeBoisblanc, B.P., & Kellerman, D.J. (1996). Fluticasone propionate improves quality of life in patients with asthma requiring oral corticosteroids. Annals of Allergy, Asthma, & Immunology, 76, 455-461.
Osterhaus, J.T., Townsend, R.J., Gandek, B., & Ware, J.E. (1994). Measuring the functional status and well-being of patients with migraine headaches. Headache, 34(6), 337-343.
Patrick, D.L., Bush, J.W., & Chen, M.M. (1973). Toward an operational definition of health. Journal of Health and Social Behavior, 14, 6-21.
Phillips, R.C., & Lansky, D.J. (1992). Outcomes management in heart valve replacement surgery: Early experience. Journal of Heart Valve Disease, 1(1), 42-50.
Rampal, P., Martin, C., Marquis, P., Ware, J.E., & Bonfils, S. (1994). A quality of life study in 581 duodenal ulcer patients. Scandinavian Journal of Gastroenterology, 29(206), 44-51.
Rector, T.S., Ormaza, S.M., & Kubo, S.M. (1993). Health status of heart transplant recipients versus patients awaiting heart transplantation: A preliminary evaluation of the SF-36 Questionnaire. Journal of Heart and Lung Transplantation, 12(6 Pt. 1), 983-986.
Reynolds, W.J., Rushing, W.A., & Miles, D.L. (1974). The validation of a functional status index. Journal of Health and Social Behavior, 15, 271-289.
Shiely, J.-C., Bayliss, M.S., Keller, S.D., Tsai, C., & Ware, J.E. (1996). SF-36 health survey annotated bibliography: The first edition (1988-1995). Boston: The Health Institute, New England Medical Center.
Stewart, A.L., Greenfield, S., Hays, R.D., Wells, K., Rogers, W.H., Berry, S.D., McGlynn, E.A., & Ware, J.E. (1989). Functional status and well-being of patients with chronic conditions: Results from the Medical Outcomes Study. Journal of the American Medical Association, 262(7), 907-913.
Stewart, A.L., Hays, R.D., & Ware, J.E. (1988). The MOS Short-form General Health Survey: Reliability and validity in a patient population. Medical Care, 26(7), 724-735.
Stewart, A.L., & Ware, J.E. (1992). Measuring functioning and well-being: The Medical Outcomes Study approach. Durham, NC: Duke University Press.
Stewart, A.L., Ware, J.E., & Brook, R.H. (1981). Advances in the measurement of functional status: Construction of aggregate indexes. Medical Care, 19(5), 473-488.
Sullivan, M., Karlsson, J., & Ware, J.E. (1995). The Swedish SF-36 Health Survey: I. Evaluation of data quality, scaling assumption, reliability and construct validity across general populations in Sweden. Social Science & Medicine, 41(10), 1349-1358.
Vickrey, B.G., Hays, R.D., Graber, J., Rausch, R., Engel, J., & Brook, R.H. (1992). A health-related quality of life instrument for patients evaluated for epilepsy surgery. Medical Care, 30(4), 299-319.
Viramontes, J.L., & O'Brien, B. (1994). Relationship between symptoms and health-related quality of life in chronic lung disease. Journal of General Internal Medicine, 9(1), 46-48.
Wagner, A.K., Keller, S.D., Kosinski, M., Baker, G.A., Jacoby, A., Hsu, M.-A., Chadwick, D.W., & Ware, J.E. (1995). Advances in methods for assessing the impact of epilepsy and antiepileptic drug therapy on patients' health-related quality of life. Quality of Life Research, 4, 115-134.
Wagner, P.J., Phillips, W., Radford, M., & Hornsby, J.L. (1995). Frequent use of medical services: Patient reports of intentions to seek care. Archives of Family Medicine, 4(7), 594-599.
Ware, J.E. (1976). Scales for measuring general health perceptions. Health Services Research, 11(4), 396-415.
Ware, J.E. (1988). How to score the revised MOS Short-form Health Scale (SF-36). Boston: The Health Institute, New England Medical Center Hospitals.
Ware, J.E. (1994). Tech notes: Confidence intervals for individual scores. Medical Outcomes Trust Bulletin, 2(1), 3.
Ware, J.E. (1995). The status of health assessment 1994. Annual Review of Public Health, 16, 327-354.
Ware, J.E., Gandek, B., & the IQOLA Project Group. (1994). The SF-36 Health Survey: Development and use in mental health research and the IQOLA Project. International Journal of Mental Health, 23(2), 49-73.
Ware, J.E., Gandek, B., Keller, S.D., & the IQOLA Project Group. (1996). Evaluating instruments used cross-nationally: Methods from the IQOLA Project. In B. Spilker (Ed.), Quality of life and pharmacoeconomics in clinical trials (2nd ed., pp. 681-692). New York: Raven.
Ware, J.E., Keller, S.D., Gandek, B., Brazier, J.E., Sullivan, M., & the IQOLA Project Group. (1995). Evaluating translations of health status questionnaires: Methods from the IQOLA Project. International Journal of Technology Assessment in Health Care, 11(3), 525-551.
Ware, J.E., & Kosinski, M. (1996, September 20). The SF-36 Health Survey (Version 2.0) technical note. Boston: Health Assessment Lab (updates September 27, 1997).
Ware, J.E., Kosinski, M., Bayliss, M.S., McHorney, C.A., Rogers, W.H., & Raczek, A. (1995). Comparison of methods for the scoring and statistical analysis of SF-36 health profiles and summary measures: Summary of results from the Medical Outcomes Study. Medical Care, 33(Suppl. 4), AS264-AS279.
Ware, J.E., Kosinski, M., Gandek, B.G., Aaronson, N., Alonso, J., Apolone, G., Bech, P., Bullinger, M., Kaasa, S., Leplège, A., & Sullivan, M. (1998). The factor structure of the SF-36 Health Survey in 10 countries: Results from the International Quality of Life Assessment (IQOLA) Project. Journal of Clinical Epidemiology, 51(11), 1159-1165.


Ware, J.E., Kosinski, M., & Keller, S.D. (1995). SF-12: How to score the SF-12 Physical and Mental Health Summary Scales (2nd ed.). Boston: The Health Institute, New England Medical Center.
Ware, J.E., Kosinski, M., & Keller, S.D. (1996). A 12-item short-form health survey: Construction of scales and preliminary tests of reliability and validity. Medical Care, 34(3), 220-233.
Ware, J.E., Kosinski, M., & Keller, S.D. (1994). SF-36 Physical and Mental Health Summary scales: A user's manual. Boston: The Health Institute.
Ware, J.E., & Sherbourne, C.D. (1992). The MOS 36-Item Short-form Health Survey (SF-36): I. Conceptual framework and item selection. Medical Care, 30(6), 473-483.
Ware, J.E., Snow, K.K., Kosinski, M., & Gandek, B. (1993). SF-36 Health Survey manual and interpretation guide. Boston: New England Medical Center, The Health Institute.
Weinberger, M., Samsa, G.P., Hanlon, J.T., Schmader, K., Doyle, M.E., Cowper, P.A., Uttech, K.M., Cohen, H.J., & Feussner, J.R. (1991). An evaluation of a brief health status measure in elderly veterans. Journal of the American Geriatrics Society, 39(7), 691-694.
Wells, K.B., Burnam, M.A., Rogers, W., Hays, R., & Camp, P. (1992). The course of depression in adult outpatients: Results from the Medical Outcomes Study. Archives of General Psychiatry, 49, 788-794.


Chapter 41
Katz Adjustment Scales
James R. Clopton
Texas Tech University
Roger L. Greene
Pacific Graduate School of Psychology
The Katz Adjustment Scales (Katz & Lyerly, 1963) were originally developed to help assess patients before their admission to a psychiatric hospital and after their return to the community. Although the most common use of these scales in the past has been as a research instrument for assessing patients following their discharge from a psychiatric hospital, the scales have been used in a variety of ways. In recent years, one of the Katz scales (originally called the R1 scale, now Part I of the KAS-R) has demonstrated its usefulness as a standard assessment instrument in the rehabilitation of patients with traumatic brain injury. It also shows promise as a measure to plan and monitor treatment and assess treatment outcome. This chapter summarizes the development of the Katz Adjustment Scales, reviews the reliability and validity data for the scales, and samples from the extensive research on the scales' usefulness in planning and monitoring treatment and assessing treatment outcome. This chapter has been substantially revised because of the recent revision of the Katz Adjustment Scales-Relative Report Form (KAS-R; Katz & Warren, 1997). A discussion of additional research on the Katz scales has been added, and the evaluation of the Katz scales has been updated to reflect new developments.
Overview
Summary of Development
In designing the Katz Adjustment Scales, Katz and Lyerly (1963) sought to assess not only the presence or absence of symptoms of psychopathology, but also whether patients and their family members were satisfied with the patients' daily activities and social functioning. The use of ratings by family members or others familiar with the patient was considered by Katz and Lyerly to be similar to the traditional reliance on such individuals for background information when a highly disturbed patient is admitted to a psychiatric hospital. The Katz scales originally consisted of two sets of five scales, one set to be completed by a family member or other person familiar with the patient (the R scales), and the other set to be completed by the patient (the S scales). The items of the recently revised KAS-R are nearly identical to the items of the original R scales, but there are several additional summary scores in the KAS-R. Furthermore, as is described later, the KAS-R makes use of normative data to provide T-scores, and its manual provides more extensive interpretive guidelines than were available for the original R scales. A description of both the original Katz scales and the revised KAS-R is provided here because past research has used the original scales, but future research and clinical applications will make use of the KAS-R.
The R1 Scale. The 127 items of the R1 scale were designed to assess social behavior and symptoms of psychopathology. Examples of items¹ assessing positive social behaviors are "Shows good judgment" and "Is independent." Other items assess social behavior that would disturb others or could be indicative of psychopathology. For example, the following items are included: "Says the same thing over and over again" and "Believes in strange things." According to the original instructions for the R1 scale, the items were to be rated as either "present" or "absent" when they were completed at the time of hospital admission, but were to be rated on a 4-point scale (i.e., "almost never," "sometimes," "often," or "almost always") when they were completed after discharge from the hospital (Katz & Lyerly, 1963). Katz and Lyerly (1963) were aware of the problems in using information about patients obtained from family members. For example, they noted that family members may be reluctant to make negative judgments about the patient, such as describing the patient as unfriendly.
To limit such difficulties, an attempt was made to select items for the R1 scale that described specific behaviors and that avoided asking the family member to judge the patient. According to Katz and Lyerly (1963), phrases such as "looks like," "acts as if," and "says" were used with some items to emphasize that the relative is to describe the patient's behavior. Examples of items with such phrases are "Looks worn out," "Acts as if he or she sees people or things that aren't there," and "Says that people are talking about him or her."
In the recent revision of the Katz scales, 100 of the original 127 R1 scale items became Part I of the KAS-R (Katz & Warren, 1997). There also is a short form for Part I, which has 48 items. This short form can be used when the patient's relative does not have sufficient time or energy to complete all 100 Part I items. In completing either the full 100 items of Part I, or the 48-item short form, the relative uses the 4-point scale. The selection of the 100 items for Part I of the KAS-R and the development of the 48-item short form is described more fully following the section on the original cluster subscales for the R1 scale.
There are several new features of Part I of the KAS-R. For example, there is an Inconsistent Responding (INC) scale, which consists of 19 pairs of items that would be expected to be answered the same way by a relative describing a patient (e.g., "Is dependable" and "Is responsible"). Three additional subscales and six new indices were developed during the revision of Part I of the KAS-R. These new subscales and indices are described later.
¹ Selected material from the prepublication edition of the Katz Adjustment Scale: Relative Form copyright © 1997 by Western Psychological Services. Reprinted by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, U.S.A. No additional reprinting in whole or in part without the expressed, written permission of the publisher. All rights reserved.
Another new feature of Part I of the KAS-R is a list of critical items. According to the manual, the patient's relative should be interviewed further when any of those 45 items are responded to in the critical direction (e.g., responding "Almost Always" to the item "Says how bad or useless he or she is."). The critical items describe behaviors that could indicate problems needing immediate attention, such as the possibility that the patient has severe cognitive impairment, is overwhelmed with extreme emotional distress, or is prone to violence. Nearly all Part I items describe unusual behaviors that can indicate serious problems, so it seems somewhat arbitrary to designate nearly half of the items in Part I as being "critical" items. Only 20 of these 45 critical items are included in the 48-item short form, so apparently some of these items are less critical than others.
The R2 and R3 Scales. The R2 and R3 scales, now Part II of the KAS-R, use the same 16 items. These items describe self-care responsibilities and activities in the home and community and were adapted from a list prepared by Freeman and Simmons (1958). The items of the R2 scale ask the relative to rate each item on a 3-point scale, depending on whether the patient is "not doing" the activity described, is doing it "some," or is doing it "regularly." For example, relatives are asked to rate the following items: "Dresses and takes care of himself or herself" and "Goes to parties and other social activities." The items of the R3 scale ask the relative to rate each item on a 3-point scale, depending on whether the relative "did not expect" the patient to be doing the activity, expected the patient to be doing it "some," or expected the patient to be doing it "regularly."
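Because the R2 and R3 scales rate the same 16 items on parallel 3-point scales, a relative's report of what the patient does can be lined up item by item with what the relative expected. The sketch below shows one way such a comparison could be computed. The 0/1/2 numeric coding, the function names, and the simple summed gap are assumptions made for illustration; they are not the scoring rules from the Katz scales or the KAS-R manual.

```python
# Assumed numeric coding of the 3-point ratings (illustrative, not from the manual).
PERFORMANCE_CODES = {"not doing": 0, "some": 1, "regularly": 2}
EXPECTATION_CODES = {"did not expect": 0, "some": 1, "regularly": 2}


def expectation_performance_gaps(performance, expectation):
    """Return per-item gaps (expectation minus performance) for paired ratings.

    Both arguments are lists of rating labels covering the same items
    in the same order.
    """
    if len(performance) != len(expectation):
        raise ValueError("both rating lists must cover the same items")
    return [
        EXPECTATION_CODES[e] - PERFORMANCE_CODES[p]
        for p, e in zip(performance, expectation)
    ]


def total_discrepancy(performance, expectation):
    """Sum the per-item gaps; a positive total means expectations exceed activity."""
    return sum(expectation_performance_gaps(performance, expectation))


if __name__ == "__main__":
    # Three hypothetical items rated by one relative.
    doing = ["regularly", "some", "not doing"]
    expected = ["regularly", "regularly", "some"]
    print(expectation_performance_gaps(doing, expected))
    print(total_discrepancy(doing, expected))
```

Under this assumed coding, a positive gap on an item means the relative expected more of that activity than the patient is reported to be doing, and a gap of zero means expectation and activity match.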
The R3 scale was not intended originally to be a separate measure, but was included so that the relative's level of satisfaction with the patient's performance could be assessed by examining the difference between the patient's activities (the R2 scale) and the relative's expectations about the patient's activities (the R3 scale). This indirect method of assessment was intended to make it easier for a relative to indicate dissatisfaction with the level of activity and responsibility shown by the patient. In Part II of the KAS-R (Katz & Warren, 1997), the same 16 items are included in the same order, but the 3-point scales used for the R2 and R3 scales have been expanded to 4-point scales by adding the option "Does Not Apply." This additional option is used when the activity is physically impossible for the patient, is not applicable to the patient (e.g., child care when there are no children), or does not occur in that person's culture. Part II yields scores on three measures: the Level of Performance of Socially Expected Activities (P-Role), the Level of Expectations for Performance of Socially Expected Activities (E-Role), and the Expectation-Performance Discrepancy.

The R4 and R5 Scales. The R4 and R5 scales, now Part III of the KAS-R, function in much the same way as the R2 and R3 scales, and there is some overlap in the two sets of items. However, the focus for the R4 and R5 scales is more on hobbies and leisure activities. Examples of items are "Take part in community or church work" and "Visit friends." The 23 items for these two scales were modeled after the Activities and Attitudes Scale (Cavan, Burgess, Havighurst, & Goldhamer, 1949). The R4 scale asks the relative to rate each item on a 3-point scale, depending on whether the patient is doing the activity "frequently," "sometimes," or "practically never."
The R5 scale is rated on a 2-point scale, depending on whether the relative is satisfied or dissatisfied with the patient's participation in that activity. Dissatisfaction includes wanting the


patient to do either more or less of the activity. The contrast between the indirect method of assessing satisfaction with the R3 scale and the direct assessment for the R5 scale reflects Katz and Warren's (1997) belief that for activities assessed by the R4 scale, it would be "less realistic" to ask about the relative's expectations. The 23 items of the R4 and R5 scales are now Part III of the KAS-R. The patient's relative provides ratings on two 4-point scales for each item. The first rating is for how often the patient is doing the activity (i.e., "frequently," "sometimes," "practically never," and "does not apply"). The second rating indicates whether the relative is "satisfied" with the patient's participation in the activity or would like to see the patient participate "more" or "less" in the activity ("does not apply" is also a response option). Part III produces two scores, one for Level of Performance of Leisure Activities (P-Leisure) and another for Level of Satisfaction with Leisure Activities (S-Leisure).

The S1 Scale. The S1 scale's 55 items, which assess the patient's discomfort with symptoms and problems, were adapted from the Johns Hopkins Symptom Checklist (Parloff, Kelman, & Frank, 1954). The patient rates each item on a 4-point scale, depending on whether the patient has had the symptom during the past few weeks and how often the patient has been bothered by the symptom (i.e., "not had the complaint," "a little," "quite a bit," or "almost all the time").

The S2, S3, S4, and S5 Scales. These four scales are identical in content and scoring to the corresponding R scales completed by relatives, except that the wording was adapted for self-ratings. The items for the S1 scale are available from Western Psychological Services, the publisher of the KAS-R, but the recent revision of the Katz scales focuses only on the R scales and does not include the S scales.
The R scales have been used much more frequently than the S scales in both clinical and research applications. Furthermore, there are so many self-report measures available that clinicians and researchers wanting to use the S1 scale should consider whether some other self-report measure may better serve their needs (W.L. Warren, personal communication, September 15, 1997). Instructions for administering the Katz scales were provided in an appendix to the original monograph (Katz & Lyerly, 1963), and additional instructions were provided by Michaux, Katz, Kurland, and Gansereit (1969). The manual for the KAS-R has clear and thorough instructions for administration of Parts I, II, and III (equivalent to the original five R scales). These instructions make it clear that the relative completing the KAS-R should focus on "the past 2 or 3 weeks." The KAS-R may be administered during an interview if the relative has difficulty reading the items, or if there is some other reason that such an administration is preferable to the standard administration. About 35 to 45 minutes are needed for a relative to complete all three parts. Completing only the 48-item short form for Part I typically requires 10 to 15 minutes. In addition to describing the development of the scales, Katz and Lyerly (1963) reported the results of two studies: an initial validity study (described later) and the use of cluster analysis to develop a more refined scoring method for the R1 scale. The patients in this second study were 100 newly admitted state hospital patients (48 men, 52 women), mostly patients with schizophrenia, who were referred for medication to relieve their "anxiety, agitation, and restlessness" (p. 523). Over half of the informants who completed the R1 scale were spouses of the patients; the other informants were siblings, parents, other relatives, and friends. 
The response format was changed from a 4-point scale to "yes" or "no" to shorten the time it took the relatives to complete the scale.


Cluster analysis of the data obtained from the relatives of the patients was used to identify internally consistent and relatively independent clusters. The 11 cluster subscales identified were: Belligerence, Verbal Expansiveness, Negativism, Helplessness, Suspiciousness, Anxiety, Withdrawal and Retardation, Nervousness, Confusion, Bizarreness, and Hyperactivity. In addition, those R1 scale items that were correlated highly with at least five clusters became a subscale assessing general psychopathology. The internal consistency of the cluster subscales was assessed by analyzing relatives' ratings of the R1 scale for 242 newly admitted patients with symptoms of schizophrenia. The internal consistency coefficients for the cluster subscales ranged from .41 to .81; a consistency coefficient was not reported for the Confusion cluster subscale. A factor analysis of the cluster subscales was performed using the relatives' ratings of the R1 scale for 404 patients from nine hospitals and treatment centers, including the 242 patients whose data had been used to determine the internal consistency of the subscales. (Again, the Confusion cluster scale was omitted.) Three factors were identified that accounted for 57% of the total variance of the subscales. Inspection of the factor loadings for the cluster subscales led Katz and Lyerly (1963) to label the three factors as Social Obstreperousness, Acute Psychoticism, and Withdrawn Depression. Katz and Lyerly (1963) noted that these three factors accounted for only about half of the variance in the R1 scale scores of patients and appeared to be unrelated to three cluster subscales (Suspiciousness, Nervousness, and Hyperactivity). The R1 scale (now, Part I of the KAS-R) is by far the most commonly used of the Katz scales. In fact, the R1 scale has sometimes been referred to simply as "the Katz Adjustment Scale" (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989). 
Therefore, the items for the R1 cluster subscales are presented in Table 41.1, along with the items for R1 subscales developed in other research studies. Several goals were pursued in revising the 127-item R1 scale and turning it into Part I of the KAS-R (W.L. Warren, personal communication, September 16, 1997). An attempt was made to preserve the original cluster subscales and to add other subscales and indices that would be useful clinically. In addition, those items that were redundant or did not contribute to any subscale or index were eliminated. The original cluster subscales identified by Katz and Lyerly (1963) are all present in Part I of the KAS-R and retain the same 77 items, although the order of the items has been slightly altered. Three new subscales, Stability (9 items), Depression (12 items), and Expressive Deficits (6 items), have been constructed using items from the R1 scale that were not included in the original cluster subscales. The development of these new subscales was based on both clinical experience and statistical analyses, such as factor analysis. Although the cluster subscales have been retained intact in the revised KAS-R, they have been combined into six new indices designed to have greater stability than the individual cluster subscales. The Social Aggression Index is the sum of the scores for the Belligerence, Negativism, and Verbal Expansiveness subscales. The Emotionality Index is the sum of two indices, each of which is composed of two subscales: the Anxiety Index (the sum of the Anxiousness and Nervousness subscales) and the Depression Index (the sum of the Depression and Helplessness subscales). The Disorientation/Withdrawal Index is the sum of the scores for the Confusion, Expressive Deficit, and Withdrawal/Retardation subscales. The Severe Pathology Index is the sum of the scores for the Bizarreness, Hyperactivity, and Suspiciousness subscales.
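The way the six Part I indices are built from subscale scores can be summarized in a short sketch. The Python code below is illustrative only and is not taken from the KAS-R manual. It assumes that each index is a simple sum of raw subscale scores, and that the "Anxiousness" subscale named in the description of the Anxiety Index is the Anxiety cluster subscale. The names `INDEX_COMPONENTS` and `compute_indices` and the example scores are hypothetical.

```python
# Illustrative sketch only: index composition as described in the text,
# assuming simple sums of raw subscale scores (an assumption, not the manual).
INDEX_COMPONENTS = {
    "Social Aggression": ["Belligerence", "Negativism", "Verbal Expansiveness"],
    "Anxiety": ["Anxiety", "Nervousness"],
    "Depression": ["Depression", "Helplessness"],
    "Disorientation/Withdrawal": [
        "Confusion", "Expressive Deficit", "Withdrawal/Retardation",
    ],
    "Severe Pathology": ["Bizarreness", "Hyperactivity", "Suspiciousness"],
}


def compute_indices(subscale_scores):
    """Return the six Part I indices for one dict of subscale scores."""
    indices = {
        name: sum(subscale_scores[part] for part in parts)
        for name, parts in INDEX_COMPONENTS.items()
    }
    # Emotionality is the sum of the Anxiety and Depression indices.
    indices["Emotionality"] = indices["Anxiety"] + indices["Depression"]
    return indices


# Made-up subscale scores for one hypothetical patient.
example_scores = {
    "Belligerence": 3, "Negativism": 5, "Verbal Expansiveness": 2,
    "Anxiety": 4, "Nervousness": 3,
    "Depression": 6, "Helplessness": 2,
    "Confusion": 1, "Expressive Deficit": 2, "Withdrawal/Retardation": 4,
    "Bizarreness": 0, "Hyperactivity": 1, "Suspiciousness": 2,
}
example_indices = compute_indices(example_scores)
```

Because the Emotionality Index is defined in terms of two other indices, the sketch computes it last instead of listing its four component subscales a second time.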
Items were selected for the 48-item short form of Part I by identifying those items in each cluster subscale that correlated most highly with the Stability and General


TABLE 41.1
Items for R1 Subscales from Various Research Studies

Subscale: KAS-R item numbers

Cluster Subscales Developed by Katz and Lyerly (1963)
  Belligerence: 20, 23, 46, 59
  Verbal Expansiveness: 42, 43, 87, 89, 94
  Negativism: 17, 21, 22, 64, 70, 71, 74-76
  Helplessness: 32, 39, 50, 84
  Suspiciousness: 18, 19, 44, 90
  Anxiety: 10, 13, 55, 92, 97, 99
  Withdrawal and Retardation: 5, 33-35, 54, 78
  General Psychopathology: 7, 16, 24, 25, 31, 38, 41, 45, 51, 60-63, 68, 69, 77, 79, 81, 83, 85, 86, 95, 96, 100
  Nervousness: 11, 12, 56, 65
  Confusion: 36, 37, 82
  Bizarreness: 14, 15, 47, 57, 98
  Hyperactivity: 3, 4, 8

Factor Subscales Developed by Graham et al. (1973)
  Belligerence: 16, 20-25, 45, 46, 68, 69, 72, 74, 81, 94, 99, 100 (64, 89, 120)
  Social Conformity: 26-30, 66, 70, 73, 80 (49, 65, 87, 96)
  Withdrawal-Retardation: 5, 33-35, 52, 54, 78, 79 (69, 83, 102)
  Fear-Apprehension: 58, 90-93, 97, 98 (122)
  Disruption of Communication: 88, 89 (29, 82, 101)
  Agitation-Depression: 2-4, 8, 50, 57 (16, 35)
  Loss of Control: 82 (14, 61, 96, 126)

Revised Subscales Developed by Vickrey et al. (1992)
  Nervousness: 11, 12, 56 (16, 83)
  Dependency: 31, 32, 39, 40, 52-54, 67, 80, 83, 84 (72, 78, 81, 82)
  Oversensitivity/Fearfulness: 1, 2, 6, 7, 9, 10, 13, 47, 48, 50, 55, 57, 58, 91-93, 97, 99
  Withdrawal: 5, 34, 35, 51, 77-79, 86, 88 (69, 102)
  Abnormal Thought Process: 38, 85, 89, 95 (104)
  Social: 26-30, 66, 70, 73 (49, 65)
  Emotional Lability: 16, 60, 61, 63 (35, 126)
  Bizarreness: 8, 14, 15, 98
  Disorientation: 36, 37, 82 (87, 89)
  Acting Out: 17, 20, 21, 23, 42, 46, 59, 62, 64, 71, 87 (112)
  Irritability: 22, 24, 25, 45, 68, 69, 74, 100 (120)
  Paranoia: 18, 19, 44, 90 (122)
  Sociopathy: 75, 76 (61, 68)
  Hyperactivity: 3, 4 (77, 101)

Factor Subscales Developed by Jackson et al. (1992)
 Emotional/Psychosocial
  Belligerence: 8, 16, 17, 20-27, 31, 59, 61, 62, 64, 66, 68-71, 73-75, 87 (65)
  Apathy/Amotivational Syndrome: 2-6, 8, 9, 12, 32, 51-53 (72)
  Social Irresponsibility: 17, 26-31, 51, 60, 64, 66
  Emotional Sensitivity: 2, 3, 6, 7, 9, 16, 18-20, 50, 74, 77
  Nervousness: 1, 11-13, 27, 32, 56
  Social Withdrawal: 2, 73, 77-79 (69)
  Emotional Incongruity: 8, 17, 31, 60, 63, 64, 75 (35)
  Obstreperousness: 18, 19, 23, 74 (64, 65)
  Resentfulness: 24, 32, 72
  Openness: 1, 16, 65, 75
  Uncooperativeness: 71 (49)
  Determination: 21, 32 (16)
  Resistance: 41, 42, 68 (65)
  Physical Independence: 80
 Physical/Intellectual
  General Cognitive Dysfunction: 34, 36-40, 42, 82-85, 89 (89, 96, 102, 104)
  Speech Dysfunction: 40, 41, 85, 86, 88 (78, 96, 102)
  Arousal Disorder: 49, 53, 57, 67 (14)
  Verbal Expansiveness: 34, 35, 42, 43, 81, 87, 89 (101)
  Motor Retardation: 33, 35 (77, 78, 102)
  Orientation: 34, 36, 81 (87, 89)
  Abnormal Movement: 38, 42, 67 (81, 82)
  Rate of Speech: (101, 102)
  Motor Tremor: 41 (78, 83)
 Psychiatric
  Paranoid Ideation: 44-46, 90-92, 94, 95, 100 (112, 114, 120, 122, 126)
  Psychotic Anxiety: 10, 14, 47, 55, 91, 92, 97
  Bizarreness: 10, 14, 15, 55, 95 (29, 112)
  Psychotic Depression: 47, 48, 60, 90, 95, 100 (120)
  Antisocial Behavior: 76 (61)
  Suicidal Inclination: 58, 97, 99
  Unrealistic Attitude: 94, 96, 98, 100
  Fear of Losing Control: 93, 94 (114)

Component Subscales Developed by Goran and Fabiano (1993)
  Belligerence: 17, 19, 22, 24, 25, 64, 69, 74, 90 (120)
  Apathy/Amotivational Syndrome: 5, 51-53
  Social Irresponsibility: 26-30, 70, 73, 80 (65)
  Orientation: 36, 37, 82 (87, 89)
  Antisocial Behavior: 21, 41, 71, 75, 76, 94, 100
  Speech/Cognitive Dysfunction: 33, 34, 40, 54, 86, 88 (78, 96, 102)
  Bizarreness: 14, 15, 60, 63, 98 (29, 35, 82, 126)
  Paranoid Ideation: 18, 44, 55, 57, 85, 89, 92, 95, 97 (122)
  Verbal Expansiveness: 42, 78, 87 (69, 112, 114)
  Emotional Sensitivity: 2, 6, 7, 9, 11, 12, 16, 50, 56, 61

Note. The numbers in parentheses are R1 scale items that were not retained in the KAS-R. The information in this table was published with permission of authors and publishers from: Graham, J.R., Lilly, R.S., Paolino, A.F., Friedman, I., & Konick, D.S. (1973). Measuring behavior and adjustment in the community: A factor analytic study of the Katz Adjustment Scale (Form R1). Journal of Community Psychology, 1, 48-53. Copyright © 1973 by John Wiley & Sons. Reprinted by permission of John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012. Vickrey, B.G., Hays, R.D., Brook, R.H., & Rausch, R. (1992). Reliability and validity of the Katz Adjustment Scales in an epilepsy sample. Quality of Life Research, 1, 63-72. Reprinted by permission of M. Staquet, Editor, Quality of Life Research. Jackson, H.F., Hopewell, C.A., Glass, C.A., Warburg, R., Dewey, M., & Ghadiali, E. (1992). The Katz Adjustment Scale: Modification for use with victims of traumatic brain and spinal injury. Brain Injury, 6, 109-127. Goran, D.A., & Fabiano, R.J. (1993). The scaling of the Katz Adjustment Scale in a traumatic brain injury rehabilitation sample. Brain Injury, 7, 219-229. Copyright © 1992 and 1993 by Taylor & Francis. Reprinted by permission of Taylor & Francis Group Ltd., 4 John Street, London WC1N 2ET, England. Selected material from the prepublication edition of the Katz Adjustment Scale: Relative Form copyright © 1997 by Western Psychological Services. Reprinted by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, USA. No additional reprinting in whole or in part without the expressed, written permission of the publisher. All rights reserved.


Psychopathology scores and with the Part I indices (Social Aggression, Emotionality, Anxiety, Depression, Disorientation/Withdrawal, and Severe Pathology). An attempt was also made to have at least two items in the short form from each cluster subscale (Katz & Warren, 1997). When the short form is used, scores are provided only for Stability, General Psychopathology, and the Part I indices. The KAS-R manual provides data indicating that those scores from the short form are essentially equivalent to what would have been obtained if all 100 Part I items were used (correlations ranging from .87 to .98). Types of Available Norms Until the recent publication of the KAS-R manual (Katz & Warren, 1997), the only norms available for the Katz scales were those that Hogarty and Katz (1971) provided for the five R scales completed by the relatives of 450 normal adolescents and adults. Individuals with a history of psychiatric treatment were excluded from that normative sample. The influence of ethnicity and religion on the five R scales could not be assessed because nearly all subjects were White and Protestant. However, significant gender, age, marital status, and social class differences in R scale ratings were found. Means and standard deviations for the cluster subscales for the R1 scale and for five measures of social performance (scales R2, R3, R4, and R5, and the difference between scales R2 and R3) were reported separately by gender, age, marital status, and social class. Women were given higher ratings than men on Nervousness, Helplessness, and Anxiety. Compared with the ratings made by their husbands, other informants rated married women as being more suspicious and as having more general psychopathology, had higher expectations for their role performance, and were more dissatisfied with their free-time activity. Compared with other age groups, adolescents were rated as having greater negativism, whereas those over age 50 were rated as being more withdrawn. 
Among the significant differences for the different marital status groups, the small group of individuals who were divorced and remarried were rated as having more negativism and helplessness than individuals in other marital status groups, and their relatives indicated more dissatisfaction with their activities. The greater psychopathology and social dysfunction commonly found among the poor were reflected in the ratings given to individuals in the lower classes indicating more withdrawal, suspiciousness, instability, low performance, and dissatisfaction. Another demographic variable shown to affect the Katz scales is whether individuals come from urban or rural areas (Chu, Sallach, & Klein, 1986). When the R and S scale scores of hospitalized patients with schizophrenia from urban and rural areas were compared, patients from rural areas were found to have better social adjustment than patients from urban areas. Patients from rural areas expected more participation in social activities of themselves than patients from urban areas. Similarly, relatives of patients from rural areas expected the patients to participate more in social activities than did the relatives of patients from urban areas. These findings suggested that the higher expectations held by patients from rural areas and their relatives may have contributed to increased social participation and better adjustment. The KAS-R manual reports normative data obtained from 1,864 adults who were asked to provide ratings for another adult living in the same household (Katz & Warren, 1997). Data were collected from the normal adult population in three counties in different states (Pennsylvania, Maryland, and Hawaii), and sampling techniques were used to


ensure that the people providing KAS-R data were representative of the general population in each location. The Maryland data were collected in the late 1960s; the data from Pennsylvania and Hawaii were collected in the mid-1970s. These normative data have been used as a standardization sample for the KAS-R so that raw scores can now be converted to T-scores. Katz and Warren (1997) provided information about how KAS-R scores differ depending on the age, gender, ethnic background, and socioeconomic status of the individuals whose behavior is being rated. Differences in some KAS-R scores were found for these demographic characteristics. Women received slightly higher ratings on Depression than men (53.9 vs. 50.5), whereas men had slightly higher ratings on Expressive Deficit (52.7 vs. 49.3). Blacks from Pennsylvania obtained higher scores than other ethnic groups for some KAS-R measures (e.g., General Psychopathology, Belligerence, and Nervousness), but the scores of Blacks and Whites from Pennsylvania were roughly equivalent on those measures. Katz and Warren (1997) argued that separate norms were not needed for different demographic groups because the differences in KAS-R scores among those groups, which were often statistically significant, rarely exceeded one third of a standard deviation, suggesting that they lacked "practical significance" (Horst, Tallmadge, & Wood, 1975, cited in Katz & Warren, 1997). Basic Validity and Reliability Information Reliability. The only reliability data reported in the initial Katz and Lyerly (1963) monograph were internal consistency coefficients for the cluster subscales for the R1 scale, which ranged from .41 to .81. Few other studies have examined the reliability of the Katz scales. Crook, Hogarty, and Ulrich (1980) assessed the interrater reliability of the R scales by administering them to both parents of 15 male patients with schizophrenia shortly after the patients' admission to a psychiatric hospital. 
The correlations ranged from .33 to .84 for the R1 cluster subscales, and from .08 to .85 for the other R scales. In general, the correlations indicated acceptable interrater reliability for scales assessing more observable behavior (e.g., the R2 scale and the Hyperactivity subscale from the R1 scale), but showed unacceptably low reliability for scales requiring judgment about the patients' behavior (e.g., the Nervousness, Confusion, and Suspiciousness cluster subscales from the R1 scale). Roughly half of all R1 scale items were rated the same by both parents, and the other half of the items tended to differ by only 1 point on a 4-point scale. Parker and Johnston (1989) also examined the interrater reliability of the R1 scale. Both parents of 24 patients with schizophrenia completed the R1 scale at admission of the patients. These patients were rated as having greater psychopathology by their mothers than by their fathers. The correlations between scale scores from mothers and fathers ranged from .03 to .60 for the cluster subscales, with correlations for only five scales being statistically significant. Compared with the correlations at admission, the correlations between ratings from mothers and fathers were higher when they completed the R1 scale 1 month after discharge (ranging from .15 to .98) and again 9 months after discharge (ranging from .25 to .83). Those higher correlations suggest that the reliability of the R1 scale may be higher when the patient is no longer acutely disturbed, or that there may be convergence in the judgments of parents when they repeatedly complete the R1 scale. Furthermore, for the follow-up ratings, mothers did not report more psychopathology than fathers. Most of the correlations reported by Parker and Johnston


are so low that they raise serious concerns about the reliability of the R1 subscales. Because of the limited evidence of interrater agreement for the R scales, it might be wise to consider pooling the ratings of multiple informants (Fiske, 1975).

Internal consistency data, using the 524 individuals in the Pennsylvania normative sample, are provided in the recently published KAS-R manual (Katz & Warren, 1997). The internal consistency for measures in Parts II and III was excellent (.90 to .98), and was generally quite high for the newly developed Part I indices, which are combinations of cluster subscales. The internal consistency of each of these indices was greater than .80, except for the Disorientation/Withdrawal Index (.73). As would be expected, the internal consistency of the indices and the larger subscales, such as Stability and General Psychopathology, is greater than for the smaller subscales. Some of the smaller subscales have low internal consistency coefficients (e.g., .58 for the three-item Confusion subscale). The internal consistency of the scores reported for the 48-item short form for Part I is generally acceptable. The Stability and General Psychopathology subscales have internal consistency coefficients of .81 and .80, respectively. All of the Part I indices have internal consistency coefficients greater than .76, except for Disorientation/Withdrawal (.68).

Katz and Warren (1997) reported that Cheung (1980) obtained an overall test-retest correlation of .79 for 20 college students in Hong Kong who completed the KAS-R ratings with a 1-week interval. Parker and Johnston (1989) assessed the test-retest reliability of the R1 scale based on ratings made by the parents of patients with schizophrenia. For mothers, the correlation of R1 scale scores at admission and at 1-month follow-up was .50, and the correlation of scores at 1-month and 9-month follow-up was .62.
For fathers, the correlations were .69 between ratings at admission and at 1-month follow-up, and .75 between ratings at 1-month and 9-month follow-up. Only moderate correlations would be expected in test-retest comparisons when clinical groups are tested at different points in treatment because of the expected changes in the patients' adjustment (Katz & Warren, 1997). Validity. In their original monograph, Katz and Lyerly (1963) reported a validity study that sought to determine whether the scales could differentiate between two groups of psychiatric patients known to have different levels of adjustment following their discharge from the hospital. The director of an aftercare clinic identified 15 patients who were judged by the staff to be functioning well in the community and 15 who were judged to have poor or marginal adjustment. Nearly all patients had a diagnosis of schizophrenia when they were admitted to the hospital, and all patients had been out of the hospital for at least 10 months. All R and S scales were administered to the patients and their relatives. However, only two thirds of the relatives (9 for well-adjusted patients and 10 for poorly adjusted patients) completed the scales. Point-biserial correlations indicated that the scores for each of the Katz scales differed in the expected way for the well-adjusted and poorly adjusted patients. Specifically, well-adjusted patients had fewer symptoms and more social and free-time activities than poorly adjusted patients. There were significant differences for all but one scale; also, there were significant differences for the two measures of satisfaction with patients' performance of socially expected activities (R3-R2 for relatives; S3-S2 for patients). The S5 scale, self-ratings of patients' satisfaction with their free-time activities, did not significantly differentiate between well-adjusted and poorly adjusted patients. 
There was a smaller difference between the self-ratings of the two groups of patients on the S1 scale than there was for their relatives' ratings on the R1 scale.
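The point-biserial correlation used in this validity study relates a dichotomous variable (here, well-adjusted vs. poorly adjusted) to a continuous scale score. The Python sketch below shows one common way to compute it. It is an illustration under stated assumptions, not a reconstruction of Katz and Lyerly's analysis: the function name `point_biserial` is hypothetical, and the scores are made-up values chosen only to mirror the group sizes of the relatives' ratings (9 and 10).

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(scores, groups):
    """Point-biserial correlation between scores and a 0/1 group code."""
    if len(scores) != len(groups):
        raise ValueError("scores and groups must have the same length")
    ones = [s for s, g in zip(scores, groups) if g == 1]
    zeros = [s for s, g in zip(scores, groups) if g == 0]
    if not ones or not zeros:
        raise ValueError("both groups need at least one member")
    sd = pstdev(scores)  # population SD of all scores combined
    if sd == 0:
        raise ValueError("scores have no variance")
    p = len(ones) / len(scores)  # proportion of cases coded 1
    return (mean(ones) - mean(zeros)) / sd * sqrt(p * (1 - p))


# Made-up symptom totals, for illustration only (not data from the study).
well_adjusted = [12, 15, 9, 14, 11, 10, 13, 8, 12]
poorly_adjusted = [22, 18, 25, 30, 19, 27, 21, 24, 20, 26]

scores = well_adjusted + poorly_adjusted
groups = [0] * len(well_adjusted) + [1] * len(poorly_adjusted)
r_pb = point_biserial(scores, groups)
```

With the population standard deviation in the denominator, this formula gives the same value as an ordinary Pearson correlation between the scores and the 0/1 group code, so a positive value here means that the group coded 1 has the higher mean score.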


Another early study examined the validity of the R1 scale by comparing relatives' ratings of patients on the R1 scale with other ratings made by psychiatrists and nurses (Katz, Lowery, & Cole, 1966). Specifically, relatives' ratings on the R1 cluster subscales were correlated with nurses' ratings on the Ward Behavior Rating Scale (Burdock, Hakerem, Hardesty, & Zubin, 1959) and with psychiatrists' ratings on the Inpatient Multidimensional Psychiatric Scale (Lorr, Klett, McNair, & Lasky, 1963) and the Clyde Mood Scale (Clyde, 1963). Ratings were obtained from the relatives of 242 patients from nine different hospitals (National Institute of Mental Health, 1964). The correlations between ratings made by relatives and ratings made by psychiatrists and nurses were modest; none was greater than .39. For example, belligerence as rated by relatives correlated .20 with hostility as rated by psychiatrists and .24 with irritability as rated by nurses. Such correlations were treated by Katz et al. (1966) as evidence for the validity of the R1 cluster subscales, even though some unexpected correlations were found and some of the R1 cluster subscales did not have clear counterparts on the measures completed by psychiatrists and nurses.

In addition to examining the correlations between individual measures, Katz et al. (1966) assessed the relation between patient types, which were identified by patterns of R1 subscale scores, and ratings made by psychiatrists and nurses. A principal components analysis of R1 subscale scores was used to obtain a set of six patient types (Type I: agitated, belligerent, suspicious; Type II: withdrawn, periodically agitated; Type III: acute panic state; Type IV: withdrawn, helpless, suspicious; Type V: agitated, helpless; Type VI: agitated, expansive, bizarre, suspicious). Sixty-two percent of the 400 patients from the National Institute of Mental Health (1964) study were classified into one of the six patient types.
For these 248 patients, the Inpatient Multidimensional Psychiatric Scale (IMPS; Lorr et al., 1963) was completed during the patient's first week in the hospital. A multivariate analysis of variance indicated that different symptoms were associated with the six patient types. Katz et al. concluded that there was an expected correspondence between patient types and IMPS scores for four patient types, but not for the other two patient types. Similar differences were found for ratings on the Ward Behavior Rating Scale for the six patient types, except that patterns for the Ward Behavior Rating Scale were not as distinctive as the patterns for the IMPS.

A final question investigated by Katz et al. (1966) was whether their typology of patients was predictive of treatment outcome. Patients received either one of three types of phenothiazine medication or placebo for 6 weeks, at which time they were rated by their psychiatrist on a 7-point scale of improvement (i.e., from "very much improved" to "very much worse"). Data were analyzed for five patient types. (Type V was not included because too few patients were available.) A Drug × Patient Type analysis of variance produced a significant main effect for patient type, which Katz et al. interpreted as meaning that the different types of patients responded differently to psychotropic medication and that the R1 scale predicted response to treatment. However, differences in improvement among the patient types may have been unrelated to the drug treatment the patients received. There was no significant interaction between drug and patient type, and no significant difference in improvement was reported between patients who received phenothiazine medication and patients who received a placebo. Patients with acute panic state showed the most improvement among the patient types, but again that improvement occurred regardless of the type of treatment the patients received.
Vestre and Zimmermann (1969) assessed the validity of the R1 cluster subscales by correlating them with the scales of the Psychotic Reaction Profile (PRP; Lorr, 1961). Soon after 159 patients were admitted to a psychiatric hospital, ratings for the R1


subscales were completed by the relatives of patients, and the PRP was completed by each patient's nurse. Some significant correlations were obtained for parallel subscales between the two instruments. For example, the Agitated Depression scale of the PRP was correlated significantly with the Helplessness, Anxiety, and Nervousness subscales from the R1 scale. Similarly, the Withdrawal scale of the PRP was correlated significantly with the Withdrawal-Retardation subscale from the R1 scale. However, the correlations were modest (none was larger than .32), and some expected relationships were not found. For example, there were zero-order correlations between the Paranoid-Belligerence scale of the PRP and several relevant R1 subscales, such as Belligerence, Negativism, and Suspiciousness.

Zimmermann, Vestre, and Hunter (1975) correlated R1 scale scores with numerous other measures for 149 hospitalized psychiatric patients. About half of the patients were rated by a parent and half were rated by a spouse. Canonical correlations revealed that the R1 cluster subscales were correlated highly (R = .87) with scores on the Interpersonal Checklist (LaForge & Suczek, 1955), which also was completed by the relatives. Relatives' R1 scores were significantly, but less highly, correlated with MMPI scores from the patients, with psychiatric residents' ratings on the Brief Psychiatric Rating Scale (Overall & Gorham, 1962), and with ward nurses' ratings on the PRP (Lorr, 1961). However, the relatives' ratings on the R1 scale were not correlated significantly with global ratings of psychopathology made by the patients, the psychiatric residents, or the nurses.

Because of the parallel scales for patients and their relatives, it appears reasonable to correlate the R scales completed by relatives with the S scales completed by patients.
Such a comparison might be considered as assessment of either reliability (the consistency between two types of raters) or validity (using one set of ratings as a criterion for validating the other set of ratings). However, there are several reasons that the ratings given by relatives will not match patients' self-ratings (Graham, Lilly, Paolino, Friedman, & Konick, 1972). Of most importance, the wording of R1 and S1 items are different, so that most R1 items do not have clear counterparts among the S1 items. Biases and response sets may affect the ratings for the other R and S scales. Patients and their family members may be motivated to give ratings that are either too favorable or too unfavorable, and the biases of patients and their family members may operate in different directions. Even without a conscious attempt to distort the ratings, such biases can lead to a lack of correspondence between the ratings of patients and their family members. For example, patients with low self-esteem might give themselves ratings that are more negative than those given by family members. The R1 scale was used in the revision of the MMPI (Butcher et al., 1989). Individuals in a community sample (823 men and 832 women) who were involved in a couple relationship completed an experimental 704-item form of the MMPI (MMPI-AX) and also completed 110 items that had been adapted from the R1 scale. Factor analysis of the R1 items produced six factors: Dysphoria, Hostility, Sociability, Impulsivity, Conformity, and Antisocial Behavior. Factor scores for these six factors were correlated with scores for MMPI-2 scales. The pattern of correlations showed expected relationships between R1 factors and MMPI-2 scales, but the magnitude of the correlations were modest at best. 
For example, the Dysphoria factor for the R1 scale was significantly correlated with the following MMPI-2 scales: 2 (D), 7 (Pt), 4 (Pd), and the Anxiety (ANX) and Depression (DEP) content scales, but none of the significant correlations exceeded .40 either for men or for women. Although the Katz R1 scale was used by Butcher et al. (1989) to help establish the validity of the MMPI-2 scales, their data can just as easily be seen as helping to establish the validity of the R1 factor subscales.

Basic Interpretative Strategy

Until the recent revision (Katz & Warren, 1997), there were few explicit guidelines for interpretation of the Katz scales. However, interpretation was aided by those studies that identified the dimensions underlying the R1 scale and developed subscales to measure those dimensions.

Research Studies on R1 Subscales. Graham, Lilly, Paolino, Friedman, and Konick (1973) concluded that problems with the development of the Katz and Lyerly (1963) cluster subscales of the R1 scale may have prevented those subscales from accurately reflecting the dimensions of behavior measured by the R1 scale. For example, the sample size was small (100) considering the large number of items (127), and Katz and Lyerly (1963) reported that their clinical opinions influenced the identification of clusters, but they did not report the extent or nature of that influence.

Graham et al. (1973) developed a new set of subscales because of their criticisms of the methodology used by Katz and Lyerly (1963) in developing the cluster subscales. Family members of 464 patients completed the R1 scale approximately 18 months after the patients were discharged from a psychiatric hospital. Factor analysis was used to develop seven independent subscales: Belligerence, Social Conformity, Withdrawal-Retardation, Fear-Apprehension, Disruption of Communication, Agitation-Depression, and Loss of Control (see Table 41.1). Except for the similarity of the Withdrawal-Retardation factor, the cluster subscales developed by Katz and Lyerly (1963) differed greatly from the factor subscales developed by Graham et al. (1973). For example, several Graham et al. subscales (Social Conformity, Disruption of Communication, and Loss of Control) lacked a counterpart among the Katz and Lyerly subscales and consisted mostly of R1 items that are not scored for any of the Katz and Lyerly subscales. Some R1 scale items did not appear in either the cluster subscales of Katz and Lyerly (1963) or the factor subscales of Graham et al. (1973). That omission suggested to Graham et al. (1973) that a shortened version of the R1 scale, perhaps the 70 items found in the seven factor subscales, might be as useful as the original 127-item measure.

Graham et al. (1972) thought that the S1 scale developed by Katz and Lyerly (1963) did not assess as broad a range of information as the R1 scale, but mainly assessed symptoms of psychopathology. Because of that limitation, they revised the S1 scale by rewording the 127 items of the R1 scale for self-rating. In the revised S1 scale, patients rate each item on a 4-point scale (i.e., "almost never," "sometimes," "often," or "almost always"). Graham et al. (1972) used the responses of 464 patients in a short-term psychiatric hospital, the same patients as in the Graham et al. (1973) study, to identify eight factors among their revised S1 scale: Agitation-Depression, Social Conformity, Belligerence, Paranoid Alienation, Loss of Control, Immaturity, Disorientation, and Psychomotor Retardation.

Despite the criticisms that Graham et al. (1973) made of the Katz and Lyerly (1963) cluster subscales, the Katz and Lyerly subscales have received some empirical support (Clum, 1976; Goodman, Ball, & Peck, 1988). Clum (1976) performed two factor analyses of the R1 scale: one for 196 patients at hospital admission, and another for 99 of the same patients at 1-year follow-up. Five factors were found in both analyses: Antisocial Behavior, Anxiety, Retardation, Verbal Expansiveness, and Loss of Control. In general, the factors found by Clum (1976) were a combination of the factors found by Katz and Lyerly (1963). For example, the Anxiety and Nervousness factors found by Katz and Lyerly were combined into a single Anxiety factor found by Clum. Goodman et al.

(1988) obtained ratings of R1 scale items from the relatives of 48 head-injured adults and 48 adults who did not have head injuries but were matched on other variables, and compared how well those two groups of patients were differentiated by the subscales developed by Katz and Lyerly (1963) and by Graham et al. (1973). Discriminant analysis revealed that the Katz and Lyerly (1963) subscales led to a slightly greater classification accuracy than the Graham et al. (1973) subscales (85.4% vs. 74.0%). In contrast to the research of Goodman et al. (1988), other research with patients with neuropsychological impairments has found it useful either to derive new subscales for the R1 scale (Fabiano & Goran, 1992; Goran & Fabiano, 1993; Jackson et al., 1992) or at least to substantially alter the Katz and Lyerly (1963) subscales (Vickrey, Hays, Brook, & Rausch, 1992). Jackson et al. (1992) altered the wording of some R1 items slightly and obtained R1 ratings from the relatives of 463 patients with traumatic brain injury or spinal injury. Ratings were made of the patient's functioning before and after the injury. A factor analysis of difference scores between the pairs of ratings produced 31 factors, which identified both positive (Determination) and negative changes (Apathy/Amotivational Syndrome) that resulted from the brain and spinal injuries of the patients. Subscales developed for the 31 factors (see Table 41.1) led to greater classification accuracy than the Katz and Lyerly (1963) subscales in a discriminant analysis when patients were grouped by type and severity of injury (60.9% vs. 47.2%). Fabiano and Goran (1992; Goran & Fabiano, 1993) developed new subscales for the R1 scale by factor analyzing the ratings provided by the relatives of 88 patients admitted to a rehabilitation program for traumatic brain injury. 
Ten R1 subscales were developed: Belligerence, Apathy/Amotivational Syndrome, Social Irresponsibility, Orientation, Antisocial Behavior, Speech/Cognitive Dysfunction, Bizarreness, Paranoid Ideation, Verbal Expansiveness, and Emotional Sensitivity (see Table 41.1). Although the names of these subscales are the same or similar to the subscales of Katz and Lyerly (1963), these new subscales have greater internal consistency (.75 to .93), more items, and only a modest overlap in items even for subscales with identical names. For example, the Katz and Lyerly (1963) Belligerence subscale has 4 items, whereas the Goran and Fabiano (1993) Belligerence subscale consists of 10 entirely different items. Instead of using factor analysis to develop new R1 subscales, Vickrey et al. (1992) applied multitrait scaling analysis to determine how to revise the Katz and Lyerly (1963) subscales. Using this method with the R1 ratings obtained from the relatives or friends of 328 patients at an epilepsy center led to an increase in the number of subscales from 12 to 14 and in the number of R1 items included in the subscales from 76 to 113 (see Table 41.1). Some revised subscales are highly similar to the Katz and Lyerly (1963) subscales (e.g., Bizarreness) or merely an expanded and renamed version of the original subscales (e.g., Paranoia). In contrast, other revised subscales either combined items from two or more of the original subscales (e.g., Oversensitivity/Fearfulness), or were derived either from the original General Psychopathology subscale or from those items not included in any of the original subscales (e.g., Emotional Lability). Coefficient alphas for the revised subscales were higher than for the original subscales, with more of the revised subscales having an acceptable level of internal consistency (Cronbach's alphas of at least .70 for 12 of the 14 revised subscales, but only 5 of the 12 original subscales). Vickrey et al. 
(1992) applied the revised subscales to the R1 ratings obtained for a second sample of 193 patients with epilepsy. These patients were divided into three groups depending on whether they were having full seizures, simple partial seizures, or no seizures. After adjusting statistically for patients' age, gender, and medication use, the 40 seizure-free patients showed significantly better adjustment on 11 of the 14 revised

subscales than the 54 full-seizure patients. The 42 partial-seizure patients had significantly better adjustment on 8 of the 14 revised subscales than the full-seizure patients, but did not differ significantly from the seizure-free group on any subscale (even using the liberal standard of t tests for comparisons among means).

Aids to Interpretation of the KAS-R. The recently published KAS-R manual provides both general guidelines and explicit statements about the meaning of high and low scores on the various KAS-R subscales and indices (Katz & Warren, 1997). Interpretation of the KAS-R also is greatly enhanced by the conversion of raw scores to T-scores that was made possible by the large standardization sample.

Two general guidelines from the KAS-R manual deserve special attention. First, Katz and Warren (1997) noted that because many of the KAS-R items refer to behaviors that rarely occur in a normal population, even slight elevations on some scales are cause for concern about the patient. A clear example of this "liberal" interpretation is given by Katz and Warren (1997). If any of the three items on the Confusion subscale is rated as occurring more often than "Almost Never," then the patient will receive a T-score of at least 60, indicating the possibility of severe cognitive impairment. The second guideline is the caution that the KAS-R is a measure of adjustment and social relationships in the community, which are often affected by medical problems and mental disorders; however, it is not a screening test for medical problems or mental disorders. That guideline is a useful way of making the point that high scores on KAS-R indices can be caused by more than one type of problem. This caution should not, however, prevent professionals from using information from the KAS-R to make judgments about medical and psychiatric interventions, as is demonstrated in the case example provided later.
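The raw-score-to-T-score conversion mentioned above can be illustrated with a brief sketch. It assumes the conventional linear T-score (mean 50, standard deviation 10 in the normative sample); the KAS-R manual's actual conversion tables are not reproduced here and may differ, and the normative mean and standard deviation in the example are hypothetical values chosen only for illustration. The cutoff of 60 echoes the value discussed in the text, but whether the comparison should be strict is also an assumption.

```python
def t_score(raw, norm_mean, norm_sd):
    """Linear T-score: 50 plus 10 times the z-score in the normative sample.

    norm_mean and norm_sd are placeholders for published norms; the values
    used in the example below are hypothetical, not taken from the manual.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


def is_elevated(t, cutoff=60.0):
    """Flag a T-score at or above the cutoff (assumed, non-strict comparison)."""
    return t >= cutoff


if __name__ == "__main__":
    # Hypothetical subscale: normative mean 12, SD 4, observed raw score 18.
    t = t_score(18, norm_mean=12.0, norm_sd=4.0)
    print(t, is_elevated(t))
```

A raw score 1.5 standard deviations above the hypothetical normative mean maps to a T-score of 65, which would be flagged under this assumed rule.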
The first issue to be considered in interpreting the KAS-R is the validity of the relative's ratings of the patient's behavior. As Katz and Warren (1997) noted, the professional should ask the relative about any rating that is inconsistent with information previously given about the patient and should examine the Inconsistent Responding score to make sure it is not excessively elevated (i.e., that the T-score is less than 75). The two most common ways in which a relative may give invalid responses are not knowing enough about the patient to respond accurately to KAS-R items, and being protective of the patient by concealing serious problems. Either of these problems may lead the relative to not respond to some items. Omitted items should lead to great caution in interpreting KAS-R scores, especially on those subscales that have few items, and the relative should be asked about such omitted items. The KAS-R manual provides interpretations for elevated (T-score > 60) scores on the various subscales and indices. For example, it notes that when the Nervousness subscale is the only elevated score, and there is no evidence of a drug reaction or other medical problem, it is possible that the patient has an adjustment disorder that will respond favorably to short-term treatment. In addition to describing the most common possible meanings of elevations on individual subscales and indices, the KAS-R manual also describes the implications of specific patterns of elevated scores. For example, an elevated T-score on the Severe Psychopathology Index (T > 60) would magnify the serious concerns raised by elevations on other indices, such as Social Aggression, Emotionality, or Disorientation/Withdrawal. In addition to identifying behaviors that indicate the need to evaluate the patient for medical problems or mental disorders, the KAS-R can identify strengths of which the patient's relative is aware. For example, the Stability subscale of Part I, together with

Parts II and III, can identify positive social behaviors, even for a patient who is having severe problems.

Diverse Applications of the Katz Adjustment Scales

The greatest strengths of the Katz scales are their durability as research instruments and the diversity of their research applications. The main applications of the Katz scales have been research on schizophrenia, including cross-cultural studies, and research on neuropsychological rehabilitation. However, the Katz scales have been used to assess a variety of groups: patients in brief and short-term therapy (Exner & Andronikof-Sanglade, 1992), drivers in fatal automobile accidents (Schmidt, Perlin, Townes, Fisher, & Shaffer, 1972), and participants in a cardiac rehabilitation program (Stern & Cleary, 1981). Exner and Andronikof-Sanglade's (1992) study is described in the section on treatment monitoring; the studies by Schmidt et al. (1972) and Stern and Cleary (1981) were described briefly in the first edition of this chapter (Clopton & Greene, 1994, p. 358).

Research with Psychiatric Patients with Schizophrenia in Other Cultures. The Katz R scales were used by Otsuka, Nakane, and Ohta (1994) to investigate the symptoms and social adjustment of 58 individuals with schizophrenia who lived in a rural area of Japan and were receiving outpatient treatment. The family member who had the most contact with the patient completed the five R scales and also completed a measure of expressed emotion (EE). EE scores were significantly correlated with scores for several cluster subscales of the R1 scale. Relatives with higher EE scores rated patients significantly higher on the Belligerence, Negativism, and Helplessness subscales and significantly lower on the Stability subscale. The level of expressed emotion shown by relatives in this study was related more to their perception of the patients' condition than to the patients' level of adjustment as rated by psychiatrists. It is surprising that more studies have not included both measures of EE and ratings on the Katz R scales from patients' relatives, given that each of these types of measure has shown promise in predicting which patients with schizophrenia will relapse (Hooley, 1985; Parker & Hadzi-Pavlovic, 1990; Michaux et al., 1969; L.I. Mintz, Liberman, Miklowitz, & J. Mintz, 1987; Sappington & Michaux, 1975). Two earlier cross-cultural studies (Chu et al., 1986; Katz, Sanborn, & Gudeman, 1970) were described in the first edition of this chapter (Clopton & Greene, 1994, p. 355).

Neuropsychological Assessment and Rehabilitation. During the last 10 to 15 years, the most frequent use of the Katz scales has been in the assessment of individuals with brain and spinal cord injuries. Several studies in this area were reviewed in the first edition of this chapter (Clopton & Greene, 1994, pp. 355, 358). In the interim, even more studies in this area have become available, and there is growing awareness of the usefulness of the Katz scales in neuropsychological assessment and rehabilitation. Most patients with brain or spinal injuries live with relatives following their discharge from the hospital, so the judgments of family members about patients' symptoms and social behavior are important to monitor (Jackson et al., 1992). Emotional and social functioning are important outcomes to assess for patients with neurological conditions because they are often the most critical areas of impairment (Vickrey et al., 1992). R scale scores often have been found to differ in expected ways when patients with brain injuries are compared with normal individuals (Goodman et al., 1988), or when groups of neurological patients whose impairments differ in severity are contrasted (Klonoff,

Costa, & Snow, 1986; Stambrook, Moore, & Peters, 1990). For example, ratings obtained from the wives of men with either severe or moderate closed head injuries were compared by Stambrook et al. (1990) with psychiatric and general population norms for the R scales (Hogarty & Katz, 1971). The 18 patients with severe head injuries had higher scores for the R scales, indicating greater social maladjustment and less social participation than the 25 patients with moderate head injuries. The patients with severe head injuries had a level of impairment that was comparable to psychiatric patients (Hogarty & Katz, 1971).

Use of the Katz Adjustment Scales for Treatment Planning

General Treatment Planning Issues

Even though the Katz scales have been used to assess patients before their admission to a psychiatric hospital, they have been used much more frequently after a patient has been discharged from a psychiatric hospital or has completed a program of neuropsychological rehabilitation. When the Katz scales have been administered prior to hospital admission or at the beginning of rehabilitation, they have been used more often as research instruments than as part of an individualized clinical assessment.

Application of Research on the Katz Adjustment Scales to Important Treatment Planning Issues

Research on the use of the Katz scales in treatment planning is lacking. However, high ratings for particular scales or subscales could be used to identify the particular areas of a patient's behavior that need the most attention during treatment. A special use of the R and S scales in treatment planning could be to assess discrepancies between the views of the patient and the patient's family regarding the patient's symptoms and level of adaptive functioning. Such discrepancies often would have important implications for treatment planning.

One note of caution in using information from the R scales in treatment planning was offered by Clum (1975, 1976), who found that relatives' ratings of psychiatric patients' symptoms on the R1 scale were unrelated to patients' prognosis (length of hospitalization) and also unrelated to patients' scores on the Life Change Inventory (Rahe, 1972), both when the patients were admitted to the hospital and a year after discharge.

There is little basis at present to support the use of the Katz scales for making many of the judgments needed in treatment planning, especially in a managed care setting. For example, there is no research evidence that would support the use of the Katz scales in differentiating between primary and secondary problems, in identifying the appropriate level of care (inpatient vs. partial hospitalization vs. outpatient), in evaluating the patient's ability or willingness to be involved in treatment, or in assessing the need for therapeutic adjuncts, such as medication.

Use of the Katz Adjustment Scales with Other Evaluation Data

An appropriate focus for additional research would be the possibility that the Katz scales, which incorporate information about the patient from relatives, can serve as a supplement to information provided by other measures. For example, it would be helpful to use regression analysis to assess whether predictions of treatment outcome are improved when information from relatives (i.e., KAS-R scores) is added to assessment information obtained from patients and the professionals treating them.

Provision of Feedback Regarding Findings from the Katz Adjustment Scales

There are no guidelines for giving feedback to patients and their families about the data from the Katz scales. Summarizing the information that patients' relatives provide on the KAS-R often will be straightforward and readily accepted. However, an issue that may emerge when feedback is given to patients and their families is the lack of agreement between KAS-R scores provided by the patient's family and the judgments of the patient's behavior made by the treatment team (Katz et al., 1966). There also may be significant discrepancies between the KAS-R ratings given by different family members, or between the ratings made by patients' relatives and patients' assessment of their own functioning (Crook et al., 1980; Parker & Johnston, 1989). Although such discrepancies can have important implications for treatment planning and the outcome of treatment, dealing with these discrepancies in a feedback session may take considerable clinical skill. Families, especially those dealing with the stress of having a family member with severe problems, often find it difficult to acknowledge and accept differences of opinion (Haley & Hoffman, 1967, pp. 97-173; Satir, 1983).

Any discrepancies between the ratings given by relatives and patients' assessment of their own functioning should be explored carefully by the clinician. Patients may have more insight into the behavior in question and thus be providing a more accurate rating of it. On the other hand, patients may lack insight into their own behavior, so that the perspective of family members would be more accurate. Finn (1996) described a process of providing feedback in a therapeutic manner that can be easily adapted to dealing with a situation in which there are discrepancies between the ratings of a patient's behavior.

Limitations and Potential Problems in the Use of the Katz Adjustment Scales for Treatment Planning

As mentioned earlier, the KAS-R has a limitation in that even though the ratings from relatives can indicate deficiencies in patients' functioning, those ratings do not indicate the reason for the deficiencies (Wallace, 1986). For example, a high score on withdrawal would have different meanings for two different patients: For one patient it could indicate an avoidance of other people, and for another it could be due to limited mobility (Jackson et al., 1992). Thus, the information in the ratings, by itself, often will not be specific enough to help in developing appropriate treatment plans.

Use of the Katz Adjustment Scales for Treatment Monitoring

Purpose of Treatment Monitoring

Research evidence about the effectiveness of the Katz scales for monitoring the effects of treatment is scarce, but the KAS-R has potential usefulness in treatment monitoring. Repeated assessment with the KAS-R could be used as a way of monitoring the progress (or lack thereof) seen by patients' relatives (e.g., Parker & Johnston, 1989).

How to Use the Katz Adjustment Scales for Treatment Monitoring

Exner and Andronikof-Sanglade (1992) used some of the Katz scales to monitor progress in psychotherapy in several exceptional ways. Although most research has used the R scales, they employed three S scales (S1, S4, and S5) at the start of therapy, at termination, and at 8 to 12 months following termination. Instead of using the Katz scales with individuals who had chronic impairment (schizophrenia or closed head injuries), they used the Katz scales to investigate the progress of 70 patients in brief or short-term therapy (an average of 14 and 47 weekly sessions, respectively; 35 patients in each type of therapy). Furthermore, they repeatedly assessed the patients with the Katz scales. Before beginning psychotherapy, the S scale scores of patients indicated that they had many troubling symptoms, had a low level of free-time activity, and were dissatisfied with their level of social and free-time activities. At termination of psychotherapy, patients had shown a significant reduction in symptoms, and they were participating significantly more frequently in free-time activities and were significantly more satisfied with those activities.

Considerations Regarding How Often to Monitor Treatment Progress

Little information is available about the test-retest reliability of the Katz scales, and there is little information about how scale scores change over time with repeated administration. Therefore, it is difficult to know how frequently the KAS-R can be administered when it is used to monitor treatment. It also is difficult to know how big a change in KAS-R scores is clinically meaningful for an individual patient.

Potential Use or Limits for Treatment Monitoring in a Managed Care Setting

The KAS-R has potential usefulness for treatment monitoring in a managed care setting. KAS-R scores can be used to supplement other data in justifying the need for treatment, especially now that the use of T-scores based on normative data provides a clear basis

for interpreting the meaning of individual scores. However, more studies that examine the scales' responsiveness to change over time would be useful.

Use of the Katz Adjustment Scales for Treatment Outcomes Assessment

General Issues

The Katz scales often have been used to assess the adjustment of patients following their treatment. Until now, however, the focus of that research has been group comparisons rather than individual patients. The most common criterion with which the Katz scales have been compared is independent ratings of patients' functioning. As reported earlier, Katz and Lyerly (1963) demonstrated that the scale scores differ in expected ways for patients in aftercare who were judged to have different levels of adjustment, and similar differences have been found when groups of neurological patients whose impairments differ in severity have been compared (Klonoff et al., 1986; Stambrook et al., 1990).

Evaluation of the Katz Scales Against NIMH Criteria for Outcomes Measures

The KAS-R successfully meets most NIMH criteria for outcome measures. The scales are relevant and appropriate for assessing treatment outcome in patient samples, especially patients with schizophrenia or head injuries. The methodology for administering and scoring the KAS-R is simple and can easily be implemented uniformly, and the cost of using the scales is low. The meaning of the scale scores is much clearer because of the normative data and the explicit guidelines for interpretation provided in the KAS-R manual. Scale scores are influenced by several demographic variables, so the meaning of scores will not necessarily be consistent across diverse groups of patients, but those group differences are usually so small that their clinical significance may be negligible. The scales can be used readily to assess the perspectives of clinicians, patients, and relatives.
Although there is limited evidence for the reliability and validity of the Katz scales, they have been demonstrated to be sensitive to treatment-related change (Exner & Andronikof-Sanglade, 1992; Hogarty, Goldberg, & Schooler, 1974), and scale scores are related to whether psychiatric patients need to be readmitted to the hospital following their discharge (Michaux et al., 1969; Sappington & Michaux, 1975). The obvious nature of the scale items makes them easy to "fake," either positively or negatively. However, there is a lack of research on the extent to which the KAS-R scores are influenced by these distortions. The meaning of the individual items of the KAS-R is understandable to mental health professionals and to patients and their families. Indeed, a major strength of the KAS-R is that the items were written so that they can be answered readily by nonprofessional raters (Jackson et al., 1992). The KAS-R results can be given quickly and clearly by using the cluster subscales and broader indices for Part I and the summary

scores for Parts II and III. By themselves, the scales have limited value in making clinical diagnoses and treatment recommendations. However, the information they provide can be a useful supplement to other methods of clinical assessment. The development of the scales was not guided by a particular theoretical orientation; therefore, their use is compatible with a wide range of theories of psychopathology.

Research Findings Relevant to Use of the Katz Adjustment Scales as Outcomes Measures

One area of research with the Katz scales has been the assessment of patients (e.g., individuals with schizophrenia) following their discharge from psychiatric hospitals. This section presents studies showing that scores on the Katz scales have been found to be related to whether psychiatric patients need to be readmitted to the hospital following their discharge (Michaux et al., 1969; Sappington & Michaux, 1975).

Hogarty et al. (1974) used a variety of measures, including the cluster subscales for the R1 scale and the other four R scales, to investigate factors that enhanced the adjustment of 120 individuals who had been discharged following inpatient treatment for schizophrenia. Two treatments were investigated: drug treatment (chlorpromazine) and major role therapy, an intervention by social workers focusing on practical problems such as resolving crises and enhancing the patient's ability to be an effective homemaker or wage earner. Most of the significant effects in the data analysis were interactions suggesting that optimal posthospital adjustment required both drug treatment and major role therapy.

Michaux et al. (1969) obtained monthly evaluations with both the R and S scales for 139 psychiatric patients after their release from a state hospital. Several scale scores were significantly related to whether patients stayed out of the hospital during the first year after their discharge, with more significant relations for R scale scores than for S scale scores.
The five best predictors of whether a patient would remain out of the hospital were performance of socially expected activities (the R2 and S2 scales), the patients' expectations about socially expected activities (the S3 scale), and the relatives' ratings of the patients' Nervousness and Bizarreness (cluster subscales for the R1 scale).

The R and S scales were used in a long-term research project with patients in day treatment (Michaux, Chelst, Foster, Prium, & Dasinger, 1973; Sappington & Michaux, 1975). For example, Sappington and Michaux sought to determine (for 86 patients in day treatment and 56 hospital inpatients) whether information gathered at admission could predict which patients would be readmitted to a hospital within a year after leaving treatment. Three types of information were available for patients: self-ratings on the S scales, ratings of patients made by family members on the R scales, and ratings made by professionals. The ratings made by professionals were less able to predict relapse than the R and S scales. The information from the R and S scales that was associated with relapse differed for the inpatients and day treatment patients. Indeed, the investigators reported that "the scores of patients who relapse following day-care treatment resemble those of patients [who do not relapse] following hospital treatment and vice versa" (Sappington & Michaux, 1975, p. 904).

Michaux et al. (1973) used both the R and S scales to assess the adjustment of day treatment and hospital patients 2 and 12 months after they left treatment. At the end of treatment, the 52 hospitalized patients showed greater improvement in symptoms than the 45 day treatment patients. However, the difference in favor of hospital patients decreased at 2 months following treatment; and at 12 months following treatment, day treatment patients had better performance of social activities than hospital patients, according to both self-ratings and relatives' ratings.

Genthner and Graham (1976) compared the response of Black patients and White patients to short-term psychiatric hospitalization. Both the S scales (including the S1 scale as revised by Graham et al., 1972) and the R scales were used to assess adjustment following discharge. The R scale scores did not differ for White patients and Black patients, but Blacks rated themselves as more depressed, more socially conforming, and more disoriented than Whites. Comparisons of Black patients and White patients with the R and S scales were similar to the Black-White comparisons with other measures in this study; there were few significant differences.

Clinical Applications of the Katz Adjustment Scales for Outcomes Assessment

Although scores on the Katz scales have been found to be related to whether psychiatric patients are readmitted to the hospital (Michaux et al., 1969; Sappington & Michaux, 1975), there are few guidelines for using the scale scores of an individual patient to predict posthospital adjustment. Patients who have high scores on a number of scales have poorer outcomes, but it is unclear which scales are most predictive of treatment outcome.

Even though clinical applications of the Katz scales to the problem of monitoring treatment effectiveness and evaluating treatment outcome are lacking, a general approach to measuring change either from admission to discharge or from discharge to follow-up can be recommended. The standard error of measurement (SEM; see Footnote 2) for a scale can be used to estimate how large a change is needed to conclude that an increase or decrease in the scale score represents a genuine change in the patient.
The SEM can be computed from the normative data in the KAS-R manual, from data collected in one's own treatment setting, or from data reported by others (e.g., Goran & Fabiano, 1993). The SEM for a scale, together with the properties of the normal distribution, can help estimate how likely it is that a change in scale scores exceeds a chance fluctuation (Anastasi, 1988, pp. 133-135; Walsh & Betz, 1995, pp. 56-57). When computing the SEM from data reported by others, it is important that the data have been collected from patients in a similar treatment setting. For example, using the data from Goran and Fabiano (1993) to compute SEMs would be appropriate in making judgments about the Katz scale scores for patients in rehabilitation for traumatic brain injury, but could be inappropriate for other types of patients.

Use of Findings from the Katz Adjustment Scales with Other Evaluation Data

Information about how the Katz scales can be combined with other data in assessing treatment outcome is lacking.

2 The formula for the standard error of measurement (SEM) is: $\mathrm{SEM} = SD_{\mathrm{scale}} \sqrt{1 - r_{\mathrm{scale}}}$, where $SD_{\mathrm{scale}}$ is the standard deviation of scores for that scale, and $r_{\mathrm{scale}}$ is the reliability of that scale.
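
The SEM-based approach to judging change can be illustrated with a short Python sketch. It uses the standard formula SEM = SD x sqrt(1 - r) and then asks how likely a change of the observed size would be if only measurement error were operating. Several things here are assumptions rather than anything stated in the chapter or the KAS-R manual: the function names are invented for illustration, the reliability of .80 and the two T-scores are hypothetical values, and the standard error of a difference is taken to be SEM x sqrt(2), which presumes that both administrations carry independent error of equal size.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r), the standard psychometric formula."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def change_z_score(score_before: float, score_after: float, sem: float) -> float:
    """z statistic for an observed change between two administrations.

    Assumption (not from the chapter): both scores carry independent
    measurement error with the same SEM, so the standard error of their
    difference is SEM * sqrt(2).
    """
    return (score_after - score_before) / (sem * math.sqrt(2.0))


def two_tailed_p(z: float) -> float:
    """Probability of a change at least this large from measurement error alone."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical values: T-scores have SD = 10; the reliability of .80
# and the two scores are invented for illustration only.
sem = standard_error_of_measurement(sd=10.0, reliability=0.80)
z = change_z_score(score_before=69.0, score_after=55.0, sem=sem)
print(f"SEM = {sem:.2f}, z = {z:.2f}, p = {two_tailed_p(z):.3f}")
```

In practice the SD and reliability would come from whichever data source best matches the treatment setting, as the text cautions, and a clinician might prefer a more conservative threshold than a conventional two-tailed test.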


Provision of Feedback Regarding Outcomes Assessment Findings

Guidelines for giving feedback to patients and their families about the data they have provided on the Katz scales are lacking. As mentioned earlier, one significant issue that may arise in providing feedback about Katz scores is the discrepancy between the judgments of different people about the patient's behavior.

Limitations and Potential Problems in the Use of the Katz Adjustment Scales for Outcomes Assessment

The biggest limitation to using the KAS-R to assess treatment outcome is that, until recently, the Katz scales were used primarily as research instruments. The recent revision increases the potential of the KAS-R in making judgments about the outcome of an individual's treatment. Now, research is needed to evaluate that potential.

Potential Use as a Data Source for Mental Health Service Report Cards

The aspect of mental health service report cards for which the KAS-R will be most relevant is the assessment of the effectiveness of care. The ratings of patients' relatives could be used to help establish that treatment met the needs of patients. For example, graphs could be used to plot changes in KAS-R scores during treatment. Areas of possible application include inpatient psychiatric care, outpatient psychotherapy, and neuropsychological rehabilitation.

Case Study

The KAS-R manual contains several case studies, including a woman receiving outpatient treatment for dysthymic disorder, a man brought to a clinic for an evaluation of a possible psychotic disorder, and a man with a history of stroke who was being evaluated prior to starting cognitive rehabilitation (Katz & Warren, 1997). It also has the following case study:

The KAS-R results presented in Fig. 41.1 describe a woman who was 30 years old, married and the mother of a 1 year old child. She had been discharged from a psychiatric hospital 4 months previously, and at the time of the evaluation was living at home with her husband.
She had been hospitalized for nearly 3 months recovering from an acute schizophrenic episode and it was her third hospitalization for psychiatric illness, the first occurring at age 18. The most recent episode developed over the past 2 month period following the birth of her child. It was characterized by an exaggerated concern about protecting the infant from infection that resulted in almost total isolation of the child from outsiders. She had completely neglected all of her other duties and responsibilities. She would wander around the neighborhood at night in her nightdress, had bizarre somatic delusions and exhibited an aggressive hostile manner that had not previously been evident. Concern for the welfare of the child strongly influenced the decision that hospitalization was required.


Fig. 41.1. KAS-R scores for the patient described in the case study. Selected material from the prepublication edition of the Katz Adjustment Scale: Relative Form, copyright © 1997 by Western Psychological Services. Reprinted by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, USA. No additional reprinting in whole or in part without the expressed, written permission of the publisher. All rights reserved.


The course of the patient's hospitalization had been uneventful. She was treated with an antipsychotic drug that was continued following discharge. She continued to see her psychiatrist at regular intervals following discharge from the hospital. At the time of the follow-up evaluation, she was described as a rather large, shapeless woman who moved slowly, showed little facial expression, and spoke very softly and hesitantly, if at all. She appeared careless about her grooming and dress. She did not seem to shun social contact, but neither did she appear to actively seek it. The KAS was completed by the patient's 32 year old husband. The INC score of 47T indicated that the informant was consistent in his description of the patient. The GEN score of 61T is slightly elevated and indicates the need to maintain regular follow-up contact with this patient. Further inspection of the KAS-R profile revealed that the primary difficulties in social adjustment for this patient were a lack of social contact and what appeared to be depressive symptoms, indicated by her D/W score of 69T and WDR item cluster score of 70T, her DEPRI Index of 65T, and elevations to 64T and 63T on the DEP and HEL item clusters, respectively. The remaining KAS-R Part I scores were all in the average range, indicating that her previous bizarre delusions and hostile aggression were not a problem at the time of this evaluation. The only critical response given by the informant was "Needs to do things very slowly to do them right" (Almost always). Not surprisingly, this informant rated his wife's performance of socially-expected activities and participation in free-time activities as lower than most people's, obtaining a P-Role score of 38T and a P-Leisure score of 60T. However, his E-Role score of 46T indicated that his expectations for her were not high, and the average EP-Discrepancy (48T) and E-Leisure (52T) scores revealed him to be generally tolerant of her level of functioning.
In the opinion of the clinician who conducted the follow-up evaluation, the patient's elevated KAS-R scores in the areas of impoverished social contact and motor retardation seemed to be related, at least in part, to her continued medication. She was being maintained on a moderate to heavy dosage schedule and appeared to be somewhat sedated and underenergetic. Consequently, the patient's medication schedule was reviewed and the dosage was reduced. (Katz & Warren, 1997, chap. 3, case 3: Follow-up Evaluation After Discharge Following an Acute Schizophrenic Episode. Selected material from the prepublication edition of the Katz Adjustment Scale: Relative Form, copyright © 1997 by Western Psychological Services. Reprinted by permission of the publisher, Western Psychological Services, 12031 Wilshire Boulevard, Los Angeles, California 90025, USA. No additional reprinting in whole or part without the expressed, written permission of the publisher. All rights reserved.)

Conclusions

The Katz R scales have been the most widely used measures of ratings of patients made by their family members (Fiske, 1975; Hogarty, 1975). Despite their widespread use, the Katz scales have problems with their item content and with their reliability and validity. The scales also have significant strengths, such as their use in a wide variety of research and their growing role in the assessment of individuals with head injuries and other neurological conditions.

Evaluation of Item Content

There are several problems with the item content of the Katz scales. When the scales were developed, a clear conceptual basis for defining the scales and selecting items was lacking (Siegrist & Junge, 1990). The R2 and R4 scales overlap in content and lack a clear conceptual difference. Factor analysis could supply a clear understanding of the R1 scale, but information about the factorial structure of that scale has been inconclusive (Siegrist & Junge, 1990). Factor analyses of the R1 scale have produced differing results (Clum, 1976; Fabiano & Goran, 1992; Graham et al., 1973; Jackson et al., 1992; Katz & Lyerly, 1963). Many researchers use the cluster subscales of the R1 scale developed by Katz and Lyerly (1963), although there is little evidence that those subscales are the best representation of the content of the R1 scale.

When the KAS-R is used in neuropsychological assessment and rehabilitation, subscales developed from data on patients with brain and spinal injuries should be considered as a supplement to the standard subscales and indices for Part I. The subscales developed by others may do a better job than the standard KAS-R measures of assessing dimensions of behavior and social adjustment that are highly relevant for individuals with neurological impairments (Goran & Fabiano, 1993; Jackson et al., 1992; Vickrey et al., 1992). One problem in using the alternative subscales, however, is that the R1 scale items for some of those subscales are no longer included in the KAS-R (see items in parentheses in Table 41.1).

Katz and Lyerly (1963) attempted to write items for the R1 scale that would ask the patient's relative to describe instead of judge the patient's behavior. One reason for this emphasis on description was to reduce the influence of the attitudes and value judgments of the person completing the scales. However, inspection of the items and empirical data both indicate that the goal of eliminating judgments was not achieved. Most items ask the relative to report what the patient talks about or what the patient has done, but a substantial number of items ask for evaluations of the patient's behavior. For example, several items ask for a judgment about whether some behavior or emotional symptom occurred "for no reason" or "without reason." Other items ask the relative to judge the quality of the patient's behavior (e.g., "Is dependable" and "Shows good judgment").
Such items can be influenced easily by the attitudes of the relative, reducing interrater reliability (Crook et al., 1980). Few studies have examined the interrater reliability of the R1 scale, but Hogarty and Katz (1971) found significant differences in the ratings provided by different informants. Mothers and siblings gave different ratings of adolescents, and married women were rated differently by their husbands than by other raters.

Despite these limitations, the items of the Katz scales cover a wide range of possible symptoms and behavior patterns. Researchers often have adapted the items of the Katz scales by changing the tense depending on whether current or past behavior is being assessed.

Limited Evidence of Reliability and Validity

Most researchers who have used the Katz scales have assumed that the reliability and validity of the scales were established long ago. Such an assumption has been encouraged by early reviews of the scale (Chen & Bryant, 1976; Hogarty, 1975). In contrast, later reviews faulted the scales for having little evidence of reliability and validity (McDowell & Newell, 1987; Wallace, 1986). The more recent reviews are still largely correct; there are some significant problems with the reliability and validity of the KAS-R.

Some KAS-R measures (e.g., Part II, Part III, and the indices for Part I) have good internal consistency. In contrast, some of the coefficient alphas for the Part I cluster subscales are too low to support the internal consistency of the subscales (Katz & Lyerly, 1963; Katz & Warren, 1997), and there are few data on the test-retest reliability of KAS-R measures. Furthermore, the correlations between the ratings of the mothers and fathers of patients often are too low to provide convincing evidence of interrater reliability (Crook et al., 1980; Parker & Johnston, 1989).


One difficulty found in the early reliability and validity studies is that the optimistic conclusions were not always supported by the data. For example, Katz et al. (1966) concluded that they had confirmed the validity of parents' ratings of psychiatric patients using the R scales. However, the correlations between parents' ratings and ratings made by psychiatrists and nurses were modest (< .40). The studies that have provided the most convincing evidence of validity have found significant relations between scale scores and posthospital adjustment (Michaux et al., 1969; Sappington & Michaux, 1975), or have found that scale scores differ in the expected ways for individuals with different levels of adjustment (Hogarty & Katz, 1971; Katz & Lyerly, 1963; Klonoff et al., 1986; Stambrook et al., 1990). In contrast, convincing evidence for the discriminative validity of Part I subscales and indices is lacking.

Strengths

The greatest strengths of the Katz scales are their endurance and diverse research applications. They have continued to be used in assessing the posthospital adjustment of psychiatric patients since they were developed, and recently have become increasingly accepted for the assessment of patients with head injuries and other neurological conditions. Other strengths are their ease of administration and the potential value of the ratings of relatives' satisfaction with patients' performance of socially expected and leisure activities (Wallace, 1986). Finally, the recent revision of the Katz scales, providing T-scores based on normative data and guidelines for interpretation of the scores for an individual patient, has set the stage for the KAS-R to go from being primarily a research instrument to becoming an instrument for the clinical assessment of patients in a variety of treatment settings.

References

Anastasi, A. (1988). Psychological testing. New York: Macmillan.
Burdock, E.I., Hakerem, G., Hardesty, A.S., & Zubin, J. (1959). Ward Behavior Rating Scale. New York: Psychiatric Institute.
Butcher, J.N., Dahlstrom, W.G., Graham, J.R., Tellegen, A.M., & Kaemmer, B. (1989). MMPI-2: Manual for administration and scoring. Minneapolis: University of Minnesota Press.
Cavan, R.S., Burgess, E.W., Havighurst, R.J., & Goldhamer, H. (1949). Personal adjustment in old age. Chicago: Science Research Associates.
Chen, M.K., & Bryant, B.E. (1976). The measurement of health: A critical and selective overview. International Journal of Epidemiology, 4, 257-264.
Cheung, F.M. (1980). Adjustment and social behavior in the community. Hong Kong: Institute of Social Studies and the Humanities, Social Research Centre, Chinese University of Hong Kong.
Chu, C.C., Sallach, H.S., & Klein, H.E. (1986). Differences in symptomatology and social adjustment between urban and rural schizophrenics. Social Psychiatry, 21, 10-14.
Clopton, J.R., & Greene, R.L. (1994). Katz Adjustment Scales. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 352-370). Hillsdale, NJ: Lawrence Erlbaum Associates.
Clum, G.A. (1975). Intrapsychic and environmental variables as predictors of length of hospitalization. Journal of Consulting and Clinical Psychology, 43, 276.
Clum, G.A. (1976). Role of stress in the prognosis of mental illness. Journal of Consulting and Clinical Psychology, 44, 54-60.


Clyde, D.J. (1963). Manual for the Clyde Mood Scale. Coral Gables, FL: Biometric Laboratory, University of Miami Press.
Crook, T., Hogarty, G.E., & Ulrich, R.F. (1980). Inter-rater reliability of informants' ratings: Katz Adjustment Scales, R Form. Psychological Reports, 47, 427-432.
Exner, J.E., Jr., & Andronikof-Sanglade, A. (1992). Rorschach changes following brief and short-term therapy. Journal of Personality Assessment, 59, 59-71.
Fabiano, R.J., & Goran, D.A. (1992). A principal component analysis of the Katz Adjustment Scale in a traumatic brain injury rehabilitation sample. Rehabilitation Psychology, 37, 75-85.
Finn, S. (1996). Using the MMPI-2 as a therapeutic intervention. Minneapolis, MN: University of Minnesota Press.
Fiske, D.W. (1975). The use of significant others in assessing the outcome of psychotherapy. In I.E. Waskow & M.B. Parloff (Eds.), Psychotherapy change measures (pp. 189-201). Washington, DC: Government Printing Office.
Freeman, H.E., & Simmons, O.G. (1958). Mental patients in the community: Family settings and performance levels. American Sociological Review, 23, 147-154.
Genthner, R.W., & Graham, J.R. (1976). Effects of short-term public psychiatric hospitalization for both Black and White patients. Journal of Consulting and Clinical Psychology, 44, 118-124.
Goodman, W.A., Ball, J.D., & Peck, E. (1988). Psychosocial characteristics of head-injured patients: A comparison of factor structures of the Katz Adjustment Scale. Poster presented at the Annual Meeting of the International Neuropsychology Society, New Orleans. (Abstract: Journal of Clinical and Experimental Neuropsychology, 1988, 10, 42)
Goran, D.A., & Fabiano, R.J. (1993). The scaling of the Katz Adjustment Scale in a traumatic brain injury rehabilitation sample. Brain Injury, 7, 219-229.
Graham, J.R., Lilly, R.S., Paolino, A.F., Friedman, I., & Konick, D.S. (1972). Measuring the adjustment of expatients in the community: A comparison of the factor structures of self-ratings and ratings by others. Journal of Clinical Psychology, 28, 380-384.
Graham, J.R., Lilly, R.S., Paolino, A.F., Friedman, I., & Konick, D.S. (1973). Measuring behavior and adjustment in the community: A factor analytic study of the Katz Adjustment Scale (Form R1). Journal of Community Psychology, 1, 48-53.
Haley, J., & Hoffman, L. (1967). Techniques of family therapy. New York: Basic Books.
Hogarty, G.E. (1975). Informant ratings of community adjustment. In I.E. Waskow & M.B. Parloff (Eds.), Psychotherapy change measures (pp. 202-221). Washington, DC: Government Printing Office.
Hogarty, G.E., Goldberg, S.C., & Schooler, N.R. (1974). Drug and sociotherapy in the aftercare of schizophrenic patients: III. Adjustment of nonrelapsed patients. Archives of General Psychiatry, 31, 609-618.
Hogarty, G.E., & Katz, M.M. (1971). Norms of adjustment and social behavior. Archives of General Psychiatry, 25, 470-480.
Hooley, J.M. (1985). Expressed emotion: A review of the critical literature. Clinical Psychology Review, 5, 119-139.
Horst, D.P., Tallmadge, G.K., & Wood, C.T. (1975). A practical guide for measuring project impact on student achievement (Office of Education Monograph No. 1, Evaluation in Education Series). Washington, DC: U.S. Government Printing Office.
Jackson, H.F., Hopewell, C.A., Glass, C.A., Warburg, R., Dewey, M., & Ghadiali, E. (1992). The Katz Adjustment Scale: Modification for use with victims of traumatic brain and spinal injury. Brain Injury, 6, 109-127.
Katz, M.M., Lowery, H.A., & Cole, J.O. (1966). Behavior patterns of schizophrenics in the community. In M. Lorr (Ed.), Explorations in typing psychotics (pp. 209-230). New York: Pergamon.
Katz, M.M., & Lyerly, S.B. (1963). Methods for measuring adjustment and social behavior in the community: I. Rationale, description, discriminative validity and scale development. Psychological Reports, 13, 503-535.
Katz, M.M., Sanborn, K.O., & Gudeman, H. (1970). Characterizing differences in psychopathology among ethnic groups in Hawaii. Schizophrenia Bulletin, 2, 20-29.
Katz, M.M., & Warren, W.L. (1997). Katz Adjustment Scales Relative Report Form (KAS-R) manual. Los Angeles: Western Psychological Services.
Klonoff, P.S., Costa, L.D., & Snow, W.G. (1986). Predictors and indicators of quality of life in patients with closed-head injury. Journal of Clinical and Experimental Neuropsychology, 8, 469-485.
LaForge, R., & Suczek, R.F. (1955). The interpersonal dimension of personality: III. An interpersonal checklist. Journal of Personality, 24, 94-112.
Lorr, M. (1961). The Psychotic Reaction Profile manual. Beverly Hills, CA: Western Psychological Services.
Lorr, M., Klett, C.J., McNair, D.M., & Lasky, J.J. (1963). Inpatient Multidimensional Psychiatric Scale (Manual). Palo Alto, CA: Consulting Psychologists Press.
McDowell, I., & Newell, C. (1987). Measuring health: A guide to rating scales and questionnaires. New York: Oxford University Press.
Michaux, M.H., Chelst, M.R., Foster, S.A., Prium, R.J., & Dasinger, E.M. (1973). Postrelease adjustment of day and full-time psychiatric patients. Archives of General Psychiatry, 29, 647-651.
Michaux, W.W., Katz, M.M., Kurland, A.A., & Gansereit, K.H. (1969). The first year out: Mental patients after hospitalization. Baltimore: Johns Hopkins University Press.
Mintz, L.I., Liberman, R.P., Miklowitz, D.J., & Mintz, J. (1987). Expressed emotion: A call for partnership among relatives, patients, and professionals. Schizophrenia Bulletin, 13, 227-235.
National Institute of Mental Health-Psychopharmacology Service Center Collaborative Study Group. (1964). Phenothiazine treatment in acute schizophrenia. Archives of General Psychiatry, 10, 246-261.
Otsuka, T., Nakane, Y., & Ohta, Y. (1994). Symptoms and social adjustment of schizophrenic patients as evaluated by family members. Acta Psychiatrica Scandinavica, 89, 111-116.
Overall, J.E., & Gorham, D.R. (1962). The Brief Psychiatric Rating Scale. Psychological Reports, 10, 799-812.
Parker, G., & Hadzi-Pavlovic, D. (1990). Expressed emotion as a predictor of schizophrenic relapse: An analysis of aggregated data. Psychological Medicine, 20, 961-965.
Parker, G., & Johnston, P. (1989). Reliability of parental reports using the Katz Adjustment Scales: Before and after hospital admission for schizophrenia. Psychological Reports, 65, 251-258.
Parloff, M.B., Kelman, H.C., & Frank, J.D. (1954). Comfort, effectiveness and self-awareness as criteria of improvement in psychotherapy. American Journal of Psychiatry, 111, 343-353.
Rahe, R.H. (1972). Subjects' recent life changes and their near future illness reports. Annals of Clinical Research, 4, 250-265.
Sappington, A.A., & Michaux, M.H. (1975). Prognostic patterns in self-report, relative report, and professional evaluation measures for hospitalized and day-care patients. Journal of Consulting and Clinical Psychology, 43, 904-910.
Satir, V. (1983). Conjoint family therapy (3rd ed.). Palo Alto, CA: Science and Behavior Books.
Schmidt, C.W., Perlin, S., Townes, W., Fisher, R.S., & Shaffer, J.W. (1972). Characteristics of drivers involved in single-car accidents. Archives of General Psychiatry, 27, 800-803.
Siegrist, J., & Junge, A. (1990). Measuring the social dimension of subjective health in chronic illness. Psychotherapy and Psychosomatics, 54, 90-98.
Stambrook, M., Moore, A.D., & Peters, L.C. (1990). Social behavior and adjustment to moderate and severe traumatic brain injury: Comparison to normative and psychiatric samples. Cognitive Rehabilitation, 8, 26-30.
Stern, M.J., & Cleary, P. (1981). National exercise and heart disease project: Psychosocial changes observed during a low-level exercise program. Archives of Internal Medicine, 141, 1463-1467.
Vestre, N.D., & Zimmermann, R. (1969). Validity of informants' ratings of the behavior and symptoms of psychiatric patients. Journal of Consulting and Clinical Psychology, 33, 175-179.
Vickrey, B.G., Hays, R.D., Brook, R.H., & Rausch, R. (1992). Reliability and validity of the Katz Adjustment Scales in an epilepsy sample. Quality of Life Research, 1, 63-72.
Wallace, C.J. (1986). Functional assessment in rehabilitation. Schizophrenia Bulletin, 12, 604-630.
Walsh, W.B., & Betz, N.E. (1995). Tests and assessment. Englewood Cliffs, NJ: Prentice-Hall.
Zimmermann, R.L., Vestre, N.D., & Hunter, S.H. (1975). Validity of family informants' ratings of psychiatric patients: General validity. Psychological Reports, 37, 619-630.


For Abby, Katie, and Shelby


Chapter 42
Quality of Life Assessment/Intervention and the Quality of Life Inventory™ (QOLI®)

Michael B. Frisch
Baylor University

Pressures for change in clinical assessment are coming from inside and outside the professions of clinical psychology and behavioral medicine. From within the professions, clinicians are being urged to engage in ongoing patient assessment to ensure that treatment goals, priorities, and strategies are specified, as well as to evaluate the amount, scope, and stability of clinically (rather than statistically) significant change (Frisch, 1992, 1998b; Kazdin, 1992, 1993a). At the same time, assessment is becoming more consumer-oriented, and less provider-oriented. Consumers, third-party payers, and managed care administrators increasingly are demanding that providers document the effectiveness of their interventions in obvious, practical terms that are readily understandable to all involved. These data then will be used to evaluate the effectiveness of particular treatments, providers, and even health plans, and to show the success or lack thereof for a particular clinical case (Dowart, 1996; Moreland, Fowler, & Honaker, 1994; Maruish, 1994). To the extent that many providers lack the will or skills to conduct such assessments as a regular part of their practice (and "treatment," e.g., see Kazdin, 1993a), managed care companies have vowed to find and train providers up to the task (Lowman, 1994).

The routine use of validated quality of life (QOL) measures can help to meet both sets of challenges because measures of life satisfaction and subjective well-being (SWB) have "a validity all their own" (Strupp & Hadley, 1977, p. 188) that is readily understandable to consumers, providers, and payers alike (Moreland et al., 1994). Academics, professionals, and laypersons can see that, when it comes to "important, noticeable change that affects a patient's everyday functioning" (Kazdin, 1993a, p. 27), no concept is more "clinically important" and significant than QOL or SWB because patients wish first and foremost to be happy and content (Kazdin, 1993a; Matarazzo, 1992; Ogles, Lambert, & Masters, 1996; Strupp, 1996; Strupp & Hadley, 1977). As Lambert and his colleagues observed, "contentment (that is, life satisfaction) and subjective well-being (SWB) are the most important indicators of outcome from a patient's perspective. . . . Symptomatic change is only part of the expected result of treatment. . . . Measures of positive mental health and SWB are somewhat independent of negative affect and symptoms" (Ogles et al., 1996, p. 92). Moreland et al. (1994) also asserted that symptom reduction is insufficient to demonstrate clinically significant change; "the absence of disease does not equal good health in both psychology and medicine" (p. 593). They go on to suggest an assessment scheme comparable to the one suggested later, asserting that psychologists must "go beyond curing pathology" to promoting SWB. This is because SWB is important in its own right and has been shown to predict future health problems and r