The use of psychological testing for treatment planning and outcomes assessment


Title: The Use of Psychological Testing for Treatment Planning and Outcome Assessment
Author: Maruish, Mark E.
Publisher: Lawrence Erlbaum Associates, Inc.
ISBN-10: 0805827617
Print ISBN-13: 9780805827613
eBook ISBN-13: 9780585176666
Language: English
Subjects: Psychological tests; Mental illness--Diagnosis; Psychiatry--Differential therapeutics; Mental illness--Treatment--Evaluation; Psychiatric rating scales; Outcome assessment (Medical care); Outcome assessment
Publication date: 1999
LCC: RC473.P79U83 1999eb
DDC: 616.89/075


The Use of Psychological Testing for Treatment Planning and Outcomes Assessment

Second Edition

Edited by
Mark E. Maruish
United Behavioral Health


Copyright © 1999 by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microfilm, retrieval system, or any other means, without prior written permission of the publisher.

Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, NJ 07430

Cover design by Kathryn Houghtaling Lacey

Library of Congress Cataloging-in-Publication Data

The use of psychological testing for treatment planning and outcomes assessment / edited by Mark E. Maruish. -- 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-8058-2761-7 (cloth: alk. paper)
1. Psychological tests. 2. Mental illness--Diagnosis. 3. Psychiatry--Differential therapeutics. 4. Mental illness--Treatment--Evaluation. 5. Psychiatric rating scales. 6. Outcome assessment (Medical care). 7. Outcome assessment. I. Maruish, Mark E. (Mark Edward)
RC473.P79U83 1998
616.89'075--dc21
98-34838
CIP

Books published by Lawrence Erlbaum Associates are printed on acid-free paper, and their bindings are chosen for strength and durability.

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1


CONTRIBUTORS

Thomas M. Achenbach, University of Vermont
Grace G. Aikman, Texas A&M University
Ross B. Andelman, University of California San Francisco
Robert P. Archer, Eastern Virginia Medical School
C. Clifford Attkisson, University of California San Francisco
Cheryl A. Barber, University of Minnesota
Larry E. Beutler, University of California, Santa Barbara
Jacque Bieber, American Academy of Neurology
Gary M. Burlingame, Brigham Young University
James N. Butcher, University of Minnesota
Daniel Carpenter, Merit Behavioral Care Corporation and Cornell University College of Medicine
Dianne L. Chambless, University of North Carolina at Chapel Hill
James A. Ciarlo, University of Denver
James R. Clopton, Texas Tech University
C. Keith Conners, Duke University Medical Center
Melissa A. Culhane, McLean Hospital
Gayle A. Dakof, University of Miami School of Medicine
Allyson Ross Davies, Independent Health Care Consultant
Roger D. Davis, Institute for Advanced Studies in Personology and Psychopathology
Edwin de Beurs, University of North Carolina at Chapel Hill
Leonard R. Derogatis, Clinical Psychometric Research, Inc., and Loyola College of Maryland
Susan V. Eisen, McLean Hospital
William O. Faustman, Palo Alto Veterans Affairs Health Care System and Stanford University School of Medicine
Arthur E. Finch, Brigham Young University
Daniel Fisher, University of California, Santa Barbara
Raymond D. Fowler, American Psychological Association
Michael B. Frisch, Baylor University
Anthony B. Gerard, Western Psychological Services
Antonio Goncalves, University of Miami and Institute for Advanced Studies in Personology and Psychopathology
Ginger Goodrich, University of California, Santa Barbara
Roger L. Greene, Pacific Graduate School of Psychology
Thomas K. Greenfield, University of California San Francisco
Steven R. Hahn, Albert Einstein College of Medicine and Jacobi Medical Center
Nancy M. Hatcher, University of Georgia
George Henly, University of North Dakota
Kay Hodges, Eastern Michigan University
L. Michael Honaker, American Psychological Association
R. W. Kamphaus, University of Georgia
Joel Katz, University of Toronto
Randy Katz, University of Toronto
Kenneth A. Kobak, Dean Foundation
Mark Kosinski, The Health Institute at New England Medical Center and Quality Metric, Inc.


Maria Kovacs, University of Pittsburgh School of Medicine
Kurt Kroenke, Regenstrief Institute for Health Care and Indiana University School of Medicine
Samuel E. Krug, MetriTech, Inc.
David Lachar, University of Texas-Houston Medical School
John M. Lambert, University of Utah
Michael J. Lambert, Brigham Young University
Jeanne M. Landgraf, HealthAct
William W. Latimer, University of Minnesota, Minneapolis
Larry L. Lynn, Clinical Psychometric Research, Inc., and Loyola College of Maryland
John S. March, Duke University Medical Center
Brian J. Marsh, University of South Florida
Mark E. Maruish, United Behavioral Health
Sarah E. Meagher, University of Miami and Institute for Advanced Studies in Personology and Psychopathology
Theodore Millon, Harvard University, University of Miami, and Institute for Advanced Studies in Personology and Psychopathology
Kevin Moreland, Fordham University
Leslie C. Morey, Vanderbilt University
Jack A. Naglieri, Ohio State University
Frederick L. Newman, Florida International University
John E. Overall, The University of Texas Health Science Center at Houston
Ashley E. Owen, University of South Florida
Carleton A. Palmer, University of North Carolina at Chapel Hill
James D. A. Parker, Trent University
Julia N. Perry, University of Minnesota
Steven I. Pfeiffer, Duke University
Cecil R. Reynolds, Texas A&M University
William M. Reynolds, University of British Columbia
Abram B. Rosenblatt, University of California San Francisco
Kathryn L. Savitz, Clinical Psychometric Research, Inc., and Loyola College of Maryland
Brian F. Shaw, University of Toronto
Gill Sitarenios, Multi-Health Systems, Inc.
Douglas K. Snyder, Texas A&M University
Charles D. Spielberger, University of South Florida
Robert L. Spitzer, New York State Psychiatric Institute and Columbia University
Randy D. Stinchfield, University of Minnesota, Minneapolis
Sumner J. Sydeman, University of South Florida
Manuel J. Tejeda, Gettysburg College
John E. Ware, Jr., Quality Metric, Inc., Health Assessment Lab, Tufts University School of Medicine, and Harvard School of Public Health
Irving B. Weiner, University of South Florida
M. Gawain Wells, Brigham Young University
Janet B. W. Williams, New York State Psychiatric Institute and Columbia University
Oliver B. Williams, University of California, Santa Barbara
Kimberly A. Wilson, University of North Carolina at Chapel Hill
Ken C. Winters, University of Minnesota, Minneapolis
Mark Woodward, University of Miami and Institute for Advanced Studies in Personology and Psychopathology
Jill M. Wroblewski, Strategic Advantage, Inc., Minneapolis, MN
Bonnie T. Zima, University of California Los Angeles


CONTENTS

Preface

xiii

Part I: General Considerations

1. Introduction Mark E. Maruish

1

2. Psychological Tests in Screening for Psychiatric Disorder Leonard R. Derogatis and Larry L. Lynn

41

3. Use of Psychological Tests/Instruments for Treatment Planning Larry E. Beutler, Ginger Goodrich, Daniel Fisher, and Oliver B. Williams

81

4. Use of Psychological Tests for Assessing Treatment Outcome Michael J. Lambert and John M. Lambert

115

5. Guidelines for Selecting Psychological Instruments for Treatment Planning and Outcome Assessment Frederick L. Newman, James A. Ciarlo, and Daniel Carpenter

153

6. Design and Implementation of an Outcomes Management System Within Inpatient and Outpatient Behavioral Health Settings Jacque Bieber, Jill M. Wroblewski, and Cheryl A. Barber

171

7. Progress and Outcome Assessment of Individual Patient Data: Selecting Single-Subject Design and Statistical Procedures Frederick L. Newman and Gayle A. Dakof

211

8. Selecting Statistical Procedures for Progress and Outcome Assessment: The Analysis of Group Data Frederick L. Newman and Manuel J. Tejeda

225


Part II: Child and Adolescent Assessment Instrumentation

9. Use of the Children's Depression Inventory Gill Sitarenios and Maria Kovacs

267

10. The Multidimensional Anxiety Scale for Children (MASC) John S. March and James D.A. Parker

299

11. Characteristics and Applications of the Revised Children's Manifest Anxiety Scale (RCMAS) Anthony B. Gerard and Cecil R. Reynolds

323

12. Overview of the Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A) Robert P. Archer

341

13. Studying Outcome in Adolescents: The Millon Adolescent Clinical Inventory and Millon Adolescent Personality Inventory Roger D. Davis, Mark Woodward, Antonio Goncalves, Sarah Meagher, and Theodore Millon

381

14. Personality Inventory for Children, Second Edition (PIC-2), Personality Inventory for Youth (PIY), and Student Behavior Survey (SBS) David Lachar

399

15. The Child Behavior Checklist and Related Instruments Thomas M. Achenbach

429

16. Conners Rating Scales-Revised C. Keith Conners

467

17. Youth Outcome Questionnaire (Y-OQ) M. Gawain Wells, Gary M. Burlingame, and Michael J. Lambert

497

18. Use of the Devereux Scales of Mental Disorders for Diagnosis, Treatment Planning, and Outcome Assessment Jack A. Naglieri and Steven I. Pfeiffer

535

19. Treatment Planning and Evaluation with the BASC: The Behavior Assessment System for Children R. W. Kamphaus, Cecil R. Reynolds, and Nancy M. Hatcher

563

20. Assessing Adolescent Drug Use with the Personal Experience Inventory Ken C. Winters, William W. Latimer, Randy D. Stinchfield, and George Henly

599

21. Child and Adolescent Functional Assessment Scale (CAFAS) Kay Hodges

631


22. The Child Health Questionnaire (CHQ): A Potential New Tool to Assess the Outcome of Psychosocial Treatment and Care Jeanne M. Landgraf

665

Part III: Adult Assessment Instrumentation

23. The SCL-90-R, Brief Symptom Inventory, and Matching Clinical Rating Scales Leonard R. Derogatis and Kathryn L. Savitz

679

24. Symptom Assessment-45 Questionnaire (SA-45) Mark E. Maruish

725

25. Behavior and Symptom Identification Scale (BASIS-32) Susan V. Eisen and Melissa A. Culhane

759

26. Brief Psychiatric Rating Scale William O. Faustman and John E. Overall

791

27. The Outcome Questionnaire Michael J. Lambert and Arthur E. Finch

831

28. Primary Care Evaluation of Mental Disorders (PRIME-MD) Steven R. Hahn, Kurt Kroenke, Janet B.W. Williams, and Robert L. Spitzer

871

29. Beck Depression Inventory and Hopelessness Scale Randy Katz, Joel Katz, and Brian F. Shaw

921

30. Hamilton Depression Inventory Kenneth A. Kobak and William M. Reynolds

935

31. Beck Anxiety Inventory Kimberly A. Wilson, Edwin De Beurs, Carleton A. Palmer, and Dianne L. Chambless

971

32. Measuring Anxiety and Anger with the State-Trait Anxiety Inventory (STAI) and the State-Trait Anger Expression Inventory (STAXI) Charles D. Spielberger, Sumner J. Sydeman, Ashley E. Owen, and Brian J. Marsh

993

33. Minnesota Multiphasic Personality Inventory-2 (MMPI-2) Roger L. Greene and James R. Clopton

1023

34. Treatment Planning and Outcome in Adults: The Millon Clinical Multiaxial Inventory-III Roger D. Davis, Sarah E. Meagher, Antonio Goncalves, Mark Woodward, and Theodore Millon

1051

35. Personality Assessment Inventory Leslie C. Morey

1083


36. Rorschach Inkblot Method Irving B. Weiner

1123

37. Butcher Treatment Planning Inventory (BTPI): An Objective Guide to Treatment Planning Julia N. Perry and James N. Butcher

1157

38. Marital Satisfaction Inventory-Revised Douglas K. Snyder and Grace G. Aikman

1173

39. The Adult Personality Inventory Samuel E. Krug

1211

40. SF-36 Health Survey John E. Ware, Jr.

1227

41. Katz Adjustment Scales James R. Clopton and Roger L. Greene

1247

42. Quality of Life Assessment/Intervention and the Quality of Life Inventory™ (QOLI®) Michael B. Frisch

1277

43. The UCSF Client Satisfaction Scales: I. The Client Satisfaction Questionnaire-8 C. Clifford Attkisson and Thomas K. Greenfield

1333

44. The UCSF Client Satisfaction Scales: II. The Service Satisfaction Scale-30 Thomas K. Greenfield and C. Clifford Attkisson

1347

45. The Consumer Satisfaction Survey Allyson Ross Davies, John E. Ware, Jr., and Mark Kosinski

1369

Part IV: Future Directions

46. Quality of Life of Children: Toward Conceptual Clarity Ross B. Andelman, C. Clifford Attkisson, Bonnie T. Zima, and Abram B. Rosenblatt

1383

47. Future Directions in the Use of Psychological Assessment for Treatment Planning and Outcome Assessment: Predictions and Recommendations Kevin Moreland, Raymond D. Fowler, and L. Michael Honaker

1415

Author Index

1437

Subject Index

1475


For Abby, Katie, and Shelby


PREFACE

Over the past several years the American people have witnessed some rather dramatic changes in the state of their health care system. These changes, prompted by the out-of-control cost of health care services, have reshaped the way in which health care is delivered and paid for. The resulting developments have affected not only physical health care services but behavioral health care services as well. The practice of test-based psychological assessment has not entered this new era unscathed. Limitations placed on total moneys allotted for psychological services have had an impact on the practice of psychological testing. However, for those skilled in its use, psychological testing's ability to help quickly identify psychological problems, plan and monitor treatment, and document treatment effectiveness presents many potentially rewarding opportunities during a time when health care organizations must (a) provide problem-focused, time-limited treatment; (b) demonstrate the effectiveness of treatment to payers and patients; and (c) implement quality improvement initiatives. With the opportunity at hand, it is now up to those with skill and training in psychological assessment to make the most of it as they contribute to (and benefit from) efforts to control health care costs.

However, the task may not be as simple as it would appear. Many psychologists and other professionals schooled and experienced in the use of psychological tests actually have had relatively limited training and experience in the full range of applications of testing to day-to-day clinical practice. For many, formal testing courses and practicum and internship experiences have focused primarily on the use of testing for symptom identification, personality description, and diagnostic purposes. Many of the more experienced professionals are likely to have only limited knowledge of how to use test results for planning, monitoring, and assessing the outcome of psychological interventions.
Consequently, although the basic skills are there, many well-trained clinicians (and graduate students as well) need to develop or expand their testing knowledge and skills so as to be better able to apply them for such purposes. This need served as the impetus for the
development of the first edition of this book, and the development of this second edition attests to its continued presence. In both cases, it was decided that the most informative and useful approach would be one in which aspects of each of four broad topical areas were addressed separately. The first area has to do with general issues and recommendations to be considered in the use of psychological testing for treatment planning and outcome assessment in today's behavioral health care environment. The second and third areas deal with issues related to the use of specific psychological tests and scales for these same purposes. The fourth concerns the future of psychological testing, including new developments on the horizon.

Part I of this second edition represents an update and extension of the first part of the first edition. As in the first edition, it is devoted to general considerations that pertain to the need for and use of psychological testing for treatment planning and outcome assessment. The introductory chapter provides an overview of the status of the health care delivery system today and the ways in which testing can contribute to making that system more cost-effective. Two chapters are devoted to issues related to treatment planning, while four chapters focus on issues related to outcome assessment. The first of the two planning chapters deals with the use of psychological tests for screening purposes in various clinical settings. Screening can serve as the first step in the treatment planning process; for this reason, it is a topic that warrants the reader's attention. The second of these chapters presents a discussion of the research suggesting how testing may be used as a predictor of differential response to treatment and its outcome. Each of these chapters represents an updated version of the original work. The four chapters on the use of testing for outcome assessment are complementary.
The first provides an overview of the use of testing for outcome assessment purposes, discussing some of the history of outcome assessment, its current status, its measures and methods, individualizing outcome assessment, the distinction between clinically and statistically significant differences in outcome assessment, and some outcomes-related issues that merit further research. The next three chapters expand on the groundwork laid in this chapter. The first of these three presents an updated discussion of a set of specific guidelines that can be valuable to clinicians in their selection of psychological measures for assessing treatment outcomes. These same criteria also are generally applicable to the selection of instruments for treatment planning purposes. The other two chapters provide a discussion of statistical procedures and research design issues related to the measurement of treatment progress and outcomes with psychological tests. One is a new chapter that specifically addresses the analysis of individual patient data; the other is an update of the original chapter that deals with the analysis of group data. As before, these discussions are presented with full knowledge of the understandable distaste that many clinicians have for statistics. However, knowledge and skills in this area are particularly important and needed by clinicians wishing to establish and maintain an effective treatment evaluation process within their particular setting.

Completing Part I is another new chapter. It presents considerations relevant to the design, implementation, and maintenance of outcomes management programs in behavioral health care settings.

Parts II and III address the use of specific psychological instruments for treatment planning and outcome assessment purposes. Part II deals with child and adolescent instruments, while Part III focuses on instruments that are intended exclusively or primarily for use with adult populations.
The instruments considered as potential chapter topics were evaluated against several selection criteria, including the popularity of the instrument among clinicians; recognition of its psychometric integrity; the potential, in the case of recently released instruments, for the instrument to become widely accepted and used; the perceived usefulness of the instrument for treatment planning and outcome assessment purposes; and the
availability of a recognized expert on the instrument (preferably its author) to contribute a chapter to this book. In the end, the instrument-specific chapters selected for inclusion were those judged most likely to be of the greatest interest and utility to the majority of the book's intended audience. Each of the original chapters in the first edition had previously met these selection criteria; thus, Parts II and III consist of updated versions of the instrumentation chapters that appeared in the first edition. Both parts also contain several new chapters discussing instruments that were not included in the first edition for one reason or another (e.g., those that were not developed at the time, or those that only recently gained wide acceptance for outcome assessment purposes). Recognition of the potential utility of each of these instruments for treatment planning and/or evaluation served as a significant impetus for revising the original work. A decision regarding the specific content of each of the chapters in Parts II and III was not easy to arrive at. However, in the end, the contributors were asked to address those issues and questions that are of the greatest concern or relevancy for practicing clinicians. Generally, these fall into three important areas: what the instrument does and how it was developed; how one should use this instrument for treatment planning and monitoring; and how one should use it to assess treatment outcomes. Guidelines were provided to assist the contributors in addressing each of these areas. Many of the contributors adhered strictly to these guidelines; others modified the contents of their chapter to reflect and emphasize what they judged to be important for the reader to know about the instrument when using it for planning, monitoring, and/or outcome assessment purposes. 
Some may consider the chapters in Parts II and III to be the "meat" of this work because they provide how-to instructions for tools that are commonly found in the clinician's armamentarium. In fact, these chapters are no more or less important than those found in Part I. They are only extensions and are of limited value outside the context of those first several chapters.

Part IV presents a discussion of the future of psychological assessment. One chapter in this part was written to inform the reader of anticipated advances in the field of testing as well as anticipated legislative and medical care mandates that may affect the manner in which psychological testing will be used in years to come. The other chapter is devoted to reviewing research related to the conceptualization of quality of life (QOL) as it applies to children, and to how it has evolved over the years. The purpose of the authors' endeavor is to present a foundation for the future development of useful measures of child QOL, something that currently appears to be in short supply.

Like the first edition, this book is not intended to be a definitive summary and review. However, it is hoped that the reader will find its chapters useful in better understanding general and test-specific considerations and approaches related to treatment planning and outcome assessment, and in effectively applying them in his or her daily practice. It also is hoped that the book will stimulate further endeavors to investigate the application of psychological testing for these purposes.

Acknowledgments The development of the second edition of this book was a significant undertaking, requiring the efforts and support from a number of people. First and foremost are the contributors. Nearly all of the eminent contributors to the first edition were gracious enough to set aside time in their busy schedules to update and revise their initial work. In addition, I was extremely fortunate to be able to enlist an equally distinguished group of experts in the field of psychological testing
and assessment to contribute new chapters dealing with instruments and topics that have gained attention from the professional community since the publication of the first edition. This project was successful only because of their commitment and willingness to share their knowledge, experience, and insights with this audience. Several other parties deserve particular thanks for their contributions to this endeavor. I thank Magellan Health Services and Strategic Advantage, Inc., for their support during the course of this project. Nancee Meuser once again was kind enough to serve as the "editor's editor," reviewing, editing, and offering suggestions as to how to improve the chapters that I authored. And a special thanks goes to Larry Erlbaum of Lawrence Erlbaum Associates for his encouragement, counsel, and support. Finally, I am grateful to those family members and friends who have been there for me during this project. Without their support, this book would not have been possible. MARK E. MARUISH




PART I GENERAL CONSIDERATIONS


Chapter 1
Introduction

Mark E. Maruish
United Behavioral Health

The cost of health care in the United States has reached astronomical heights. In 1995, approximately $1 trillion, or 14.9% of the gross domestic product, was spent on health care, and a 20% increase is expected by the year 2000 ("Future Targets," 1996). The costs of mental health problems and the need for behavioral health care services in the United States have risen over the past several years and are particularly disconcerting. A Substance Abuse and Mental Health Services Administration (SAMHSA) summary of various findings in the literature indicated that the U.S. bill for mental health disorders in 1990 was $148 billion (Rouse, 1995). This compares to the 1983 direct and indirect mental health costs of $73 billion reported by Harwood, Napolitano, and Kristiansen (cited in Kiesler & Morton, 1988).

The high cost of treating behavioral health problems is not surprising given the prevalence of psychiatric and substance use disorders in this country. The Centers for Disease Control and Prevention (1994) reported on the results of a survey of 45,000 randomly interviewed Americans regarding their quality of life. The survey found that one third of the respondents reported suffering from depression, stress, or emotional problems at least 1 day a month. Eleven percent of the sample reported having these problems more than 8 days a month. Preliminary estimates from SAMHSA's 1995 National Household Survey on Drug Abuse (SAMHSA, 1996) indicated that 12.8 million Americans 12 years and older, or 6.1% of the population, had used illicit drugs within a month of the survey interview. More alarming is the fact that illicit drug use was reported by 10.9% of the 12- to 17-year-old subsample. Also, it was estimated that 11 million Americans, or 5.5% of the population, could be classified as heavy drinkers; that is, they consumed five or more drinks on each of five different days during the same 1-month time period.
The American Psychological Association (APA, 1996) also reported statistics that bear attention. In sum:

• It is estimated that 15% to 18% of Americans suffer from a mental disorder. Fourteen million of these individuals are children.

• Approximately eight million Americans suffer from depression in any given month.


• As many as 20% of Americans will suffer one or more major episodes of depression during their lifetime.

• An estimated 80% of elderly residents in Medicaid facilities were found to have moderate to intensive needs for mental health services.

Moreover, information from various studies indicates that at least 25% of primary health care patients have a diagnosable behavioral disorder ("Leaders Predict," 1996).

The Value of Behavioral Health Care Services

The demand for behavioral health care services also is significant. In analyzing data from a 1987 national survey of 40,000 people in 16,000 households, Olfson and Pincus (1994a, 1994b) found that 3% of the population was seen for at least one psychotherapeutic session that year. Eighty-one percent of these sessions were visits to mental health professionals. The stigma attached to mental health problems and their treatment continues to lessen, so it might well be assumed that utilization of behavioral health care services has increased since the time of that survey.

But what is the value of the services provided to those suffering from mental illness or substance abuse/dependency? Some might argue that the benefit is either minimal or too costly to achieve if significant effects are to be gained. This, however, is in the face of data that indicate otherwise. Numerous studies have demonstrated that treatment of mental health and substance abuse/dependency problems can result in substantial savings when viewed from a number of perspectives. This "cost offset" effect has been demonstrated most clearly in savings in medical care dollars. Given reports that 50% to 70% of typical primary care visits are for medical problems involving psychological factors, the value of medical cost offset is significant (American Psychological Association, 1996).

Moreover, APA also reported that 25% of patients seen by primary care physicians have a disabling psychological disorder, and that depression and anxiety rank among the top six conditions seen by family physicians. The following are just a few of the findings supporting the medical cost savings that can be achieved through the provision of behavioral health care treatment:

• Patients with diagnosable behavioral disorders who are seen in primary care settings use two to four times as many medical resources as those patients without these disorders ("Leaders Predict," 1996b).

• A study by Simon, VonKorff, and Barlow (1995) revealed that the annual health care costs of 6,000 primary care patients with identified depression were nearly twice those of the same number of primary care patients without depression ($4,246 vs. $2,371).

• J. Johnson, Weissman, and Klerman (1992) reported that depressed patients make seven times as many visits to emergency rooms as do nondepressed patients.

• Saravay, Pollack, Steinberg, Weinschel, and Habert (1996) found that cognitively impaired medical and surgical inpatients were rehospitalized twice as many times as cognitively unimpaired patients within a 6-month period. In the same study, depressed medical and surgical inpatients were found to have an average of approximately 12 days of rehospitalization over a 4-year follow-up period. During this same period, nondepressed inpatients averaged only 6 days of rehospitalization.

Demonstrating the potential for additional costs that can accrue from the presence of a behavioral health problem, the health care costs of families with an alcoholic member were found to be twice those of families without alcoholic members in a longitudinal study by Holder and Blose (1986).

Sipkoff (1995) drew several conclusions after reviewing several studies conducted between 1988 and 1994 and listed in the "Cost of Addictive and Mental Disorders and Effectiveness of Treatment" report published by SAMHSA. After meta-analyzing the cost offset effect, Sipkoff found that treatment for mental health problems results in about a 20% reduction in the overall cost of health care. The report also concluded that whereas alcoholics were found to spend twice as much on health care as those without abuse problems, one half of the cost of substance abuse treatment is offset within 1 year by subsequent reductions in the combined medical costs for patients and their families.

Strain et al. (1991) found that screening a group of 452 elderly hip fracture patients for psychiatric disorders prior to surgery, and then providing mental health treatment to the 60% of the sample needing treatment, reduced total medical expenses by $270,000. The cost of the psychological/psychiatric services provided to this group was only $40,000.

Simmons, Avant, Demski, and Parisher (1988) compared the average medical costs that chronic back pain patients at a multidimensional pain center (providing psychological and other types of intervention) incurred during the year prior to treatment with those incurred in the year following treatment. The pretreatment costs per patient were $13,284, whereas posttreatment costs were $5,596.
APA (1996) succinctly summarized what appear to be the prevalent findings of the medical cost offset literature:

Patients with mental disorders are heavy users of medical services, averaging twice as many visits to their primary care physicians as patients without mental disorders. When appropriate mental health services are made available, this heavy use of the system often decreases, resulting in overall health savings. Cost offset studies show a decrease in total health care costs following mental health interventions even when the cost of the intervention is included. In addition, cost offset increases over time, largely because . . . patients continue to decrease their overall use of the health care system, and don't require additional mental health services. (p. 2)

A more detailed discussion of various ways in which behavioral interventions can both maximize care to medical patients and achieve significant economic gains can be found in Friedman, Sobel, Myers, Caudill, and Benson (1995).

The dollar savings that result from medical cost offset are relatively obvious and easy to measure. However, the larger benefits, financial and otherwise, that can also accrue to the community from the treatment of mental health and substance abuse/dependency problems may not be as obvious. One area in which treatment can have a tremendous impact is the workplace. For example, note the following facts compiled by APA (1996):

In 1985, behavioral health problems resulted in over $77 billion in lost income to Americans.

California's stress-related disability claims totaled $350 million in 1989.

In 1980, alcoholism resulted in over 500 million lost work days in the United States.

Major depression cost an estimated $23 billion in lost work days in 1990.

Individuals with major depression are three times more likely than nondepressed individuals to miss time from work and four times more likely to take disability days.

Seventy-seven percent of all subjects from 58 psychotherapy effectiveness studies focusing on the treatment of depression received significantly better work evaluations than depressed subjects who did not receive treatment.

Treatment resulted in a 150% increase in earned income for alcoholics and a 390% increase in income for drug abusers in one study of 742 substance abusers.

On another front, the former director of the Office of National Drug Control Policy reported that for every dollar spent on drug treatment, the United States saves $7 in health care and criminal justice costs (Substance Abuse Funding News, 1995). Also, SAMHSA's summary of the literature on 1990 behavioral health care costs indicated that crime, criminal justice activities, and property loss associated with substance use and mental disorders resulted in a total of $67.8 billion spent or lost (Rouse, 1995).

Society's need for behavioral health care services provides an opportunity for psychologists and other trained behavioral health service providers to become part of the solution to a major health care problem that shows no indication of decline. Each of the helping professions has the potential to make a contribution to this solution. Especially important are those contributions that can be made by psychologists and others trained in the use of psychological tests. For decades, psychologists and other behavioral health care providers have come to rely on psychological assessment as a standard tool to assist in diagnostic and treatment planning activities. However, the care delivery system that has evolved within health care in general, and behavioral health care services in particular, has led to changes in how third-party payers, psychologists, and other service providers think about and/or use psychological assessment in day-to-day clinical practice.
Some question the value of psychological assessment in the current time-limited, capitated service delivery arena, where the focus seemingly has changed from clinical priorities to fiscal priorities (Sederer, Dickey, & Hermann, 1996). Others argue that it is in just such an arena that the benefits of psychological assessment can be most fully realized and contribute significantly to the delivery of cost-effective treatment for behavioral health disorders (Maruish, 1994). On this view, assessment could assist the health care industry in appropriately controlling or reducing the utilization and cost of health care over the long term. As Maruish (1990) observed nearly a decade ago:

consider that the handwriting on the wall appears to be pointing to one scenario. With limited dollars available for treatment, the delivery of cost-efficient, effective treatment will be dependent on the ability to clearly identify the patient's problem(s). Based on this and other considerations, the most appropriate treatment modality . . . must then be determined. Finally, the organization will have to show that it has met the needs of each client. . . . It is in all of these functions (problem identification, triage/disposition, and outcome measurement) that psychological assessment can make a significant contribution to the success of the organization. (p. 5)

It is the latter side of the argument that is supported here, and that provides the basis for this and subsequent chapters within this volume. This introduction is intended to provide students and practitioners of psychology and other behavioral health care professions with an overview of how psychological assessment could, and should, be used in this era of managed behavioral health care to the ultimate benefit of patients, providers, and payers.
On a final introductory note, it is important to understand that the term psychological assessment, as it is used in this chapter, refers to the evaluation of a patient's mental health status using psychological tests or related instrumentation. This evaluation may

be conducted with or without the benefit of patient or collateral interviews, review of medical or other records, and/or other sources of relevant information about the patient.

The Current Practice of Psychological Assessment in Behavioral Health Care Settings

For a number of decades, psychological assessment has been a valued and integral part of the services offered by psychologists and other mental health professionals trained in its use. However, its popularity has not been without its ups and downs. Megargee and Spielberger (1992) described a period of decreased interest in assessment that began in the 1960s. This decline was attributed to a number of factors, including a shift in focus to those aspects of treatment where assessment was thought to contribute little (e.g., a growing emphasis on behavior modification techniques, increased use of psychotropic medications, emphasis on the study of symptoms rather than personality syndromes and structures). But Megargee and Spielberger also noted a resurgence of interest in assessment, including a new realization of how psychological assessment can assist in mental health care interventions today.

Where does psychological assessment currently fit into the daily scope of activities for practicing psychologists? The newsletter Psychotherapy Finances ("Fee, Practice and Managed Care," 1995) reported the results of a nationwide readership survey of 1,700 mental health providers. Sixty-seven percent of the psychologists participating in this survey reported providing psychological testing services. This represents about a 10% drop from the level indicated by a similar 1992 survey published in the same newsletter. Also of interest in the more recent survey are the percentages of professional counselors (39%), marriage and family counselors (16%), psychiatrists (21%), and social workers (13%) offering these same services.
In a 1995 survey by the American Psychological Association's Committee for the Advancement of Professional Practice (Phelps, 1996), 14,000 psychological practitioners responded to questions related to workplace settings, areas of practice concerns, and range of activities. Most of the respondents (40.7%) were practitioners whose primary work setting was independent practice. Other general work settings (government, medical, academic, group practice) were fairly equally represented by the remainder of the sample. The principal professional activity reported by the respondents was psychotherapy, with 43.9% of the sample acknowledging involvement in this service. Assessment, which was reported by 14% of the sample, was the second most prevalent activity.

Differences in the two samples utilized in the aforementioned surveys may account for inconsistencies in their findings. Psychologists who subscribe to Psychotherapy Finances may represent that subsample of the APA survey respondents who are more involved in the delivery of clinical services. Certainly the fact that only about 44% of the APA respondents offer psychotherapy services supports this hypothesis. Regardless of which set of findings one accepts, psychological assessment does not appear to be utilized as much as in the past, and one does not need to look hard to find at least one reason why. One of the major changes that has come about in the U.S. health care system during the past several years has been the creation and proliferation of managed care organizations (MCOs). The most significant direct effects of managed care include reductions in the length and amount of service, reductions in accessibility to particular modalities (e.g., reduced number of outpatient visits per case), and profession-related changes in

the types of services managed by behavioral health care providers (Oss, 1996). Overall, the impact of managed behavioral health care on the services offered by psychologists and other behavioral health care providers has been tremendous. In the APA survey reported earlier (Phelps, 1996), approximately 79% of the respondents reported that managed care had at least some impact on their work.

How has managed care negatively impacted the use of psychological assessment? It is not clear from the results of the APA or Psychotherapy Finances surveys, but perhaps others can offer some insight. Ficken (1995) commented on how the advent of managed care has limited the reimbursement for (and therefore the use of) psychological assessment. In general, he saw the primary reason for this as a financial one. In an era of capitated behavioral health care coverage, the amount of money available for behavioral health care treatment is limited. MCOs therefore require a demonstration that the money spent on testing will produce a greater amount of savings in treatment costs. This author is unaware of any published or unpublished research to date that can provide this demonstration. In addition, Ficken noted that much of the information obtained from psychological assessment is not relevant to the treatment of patients within a managed care environment. Understandably, MCOs are reluctant to pay for gathering such information.

Werthman (1995) provided similar insights into this issue, noting that managed care . . . has caused [psychologists] to revisit the medical necessity and efficacy of their testing practices. Currently, the emphasis is on the use of highly targeted and focused psychological and neuropsychological testing to sharply define the "problems" to be treated, the degree of impairment, the level of care to be provided and the treatment plan to be implemented.
The high specificity and "problem-solving" approach of such testing reflects MCOs' commitment to effecting therapeutic change, as opposed to obtaining a descriptive narrative with scores. In this context, testing is perceived as a strong tool for assisting the primary provider in more accurately determining patient "impairments" and how to "repair" them. (p. 15)

In general, Werthman (1995) viewed psychological assessment as being no different from other forms of patient care, thus making it subject to the same scrutiny, the same demands for demonstrating medical necessity and/or utility, and the same consequent limitations imposed by MCOs on other covered services.

The foregoing representations of the current state of psychological assessment in behavioral health care delivery could be viewed as an omen of worse things to come. They are not. Rather, the limitations being imposed on psychological assessment and the demand for justification of its use in clinical practice reflect part of the customers' dissatisfaction with the way things were done in the past. In general, the tightening of the purse strings is a positive move for both behavioral health care and the profession of psychology. It is a wake-up call to those who have contributed to the health care crisis by uncritically performing costly psychological assessments, being unaccountable to the payers and recipients of those services, and generally failing to perform those services in the most responsible, cost-effective way possible. Providers need to evaluate how they have used psychological assessment in the past and then determine the best way to use it in the future. The present situation is thus an opportunity for providers to reestablish the value of the contributions they can make to improve the quality of care delivery through their knowledge and skills in the area of psychological assessment.
The sections that follow convey one vision of the present and future opportunities for psychological assessment in behavioral health care and the means of best achieving

them. In doing so, the context for the remaining chapters is established. The views advanced here are based on knowledge of and experience with current psychological assessment practices, as well as directions provided by the current literature. Some practitioners will disagree with the views put forth, given their own experience and thinking on the matters discussed. It is hoped, however, that even those in disagreement will be challenged to defend their positions to themselves and, as a result, further refine their thinking and approach to the use of assessment within their practices.

Psychological Assessment as a Treatment Adjunct: An Overview

Traditionally, the role of psychological assessment in therapeutic settings has been quite limited. Those who did not receive their clinical training within the past few years probably were taught that the value of psychological assessment is found only at the "front end" of treatment. That is, they likely were instructed in the power and utility of psychological assessment as a means of assisting in the identification of symptoms and their severity, personality characteristics, and other aspects of the individual (e.g., intelligence, vocational interests) that are important in understanding and describing the patient at a specific point in time. Based on these data and information obtained from patient and collateral interviews, medical records, and the individual's stated goals for treatment, a diagnostic impression was given and a treatment plan was formulated and placed in the patient's chart, hopefully to be reviewed at various points during the course of treatment. In some cases, the patient was assigned to another practitioner within the same organization or referred out, never to be seen or contacted again, much less reassessed by the person who performed the original assessment.
Fortunately, during the past few years, psychological assessment has come to be recognized for more than just its usefulness at the beginning of treatment. Its utility has been extended beyond being a mere tool for describing an individual's current state, to a means of facilitating treatment and the understanding of behavioral health care problems throughout and beyond the episode of care. Generally speaking, several psychological tests that are now commercially available can be employed as tools to assist in clinical decision making and outcomes assessment and, more directly, as treatment techniques in and of themselves. Each of these uses contributes value to the therapeutic process.

Psychological Assessment for Clinical Decision Making

Traditionally, psychological assessment has been used to assist psychologists and other behavioral health care clinicians in making important clinical decisions. The types of decision making for which it has been used include those related to screening, treatment planning, and monitoring of treatment progress. Generally, screening may be undertaken either to identify the patient's need for a particular service or to determine the likely presence of a particular disorder or other behavioral/emotional problems. More often than not, a positive finding on screening leads to a more extensive evaluation of the patient in order to confirm with greater certainty the existence of the problem or to further delineate its nature. The value of screening lies in the fact

that it permits the clinician to quickly and economically identify, with a fairly high degree of confidence (depending on the particular instrumentation used), those who are likely to need care or at least further evaluation.

In many instances, psychological assessment is performed in order to obtain information that is deemed useful in the development of a specific treatment plan. Typically, this type of information is not easily (if at all) accessible through other means or sources. When combined with other information about the patient, information obtained from a psychological assessment can aid in understanding the patient, identifying the most important problems and issues that need to be addressed, and formulating recommendations about the best means of addressing them.

Another way psychological assessment plays a valuable role in clinical decision making is in treatment monitoring. Repeated assessment of the patient at regular intervals during the treatment episode can provide the clinician with feedback regarding therapeutic progress. Based on the findings, the therapist will be encouraged either to continue with the original therapeutic approach or, in the case of no change or exacerbation of the problem, to modify or abandon the approach in favor of an alternate one.

Psychological Assessment as a Treatment Technique

The degree to which the patient is involved in the assessment process is changing. One reason for this is the relatively recent revision of the ethical standards of the American Psychological Association (American Psychological Association, 1992). This revision includes a mandate for psychologists to provide feedback to clients whom they assess. According to Ethical Standard 2.09, "Psychologists ensure that an explanation of the results is provided using language that is reasonably understandable to the person assessed or to another legally authorized person on behalf of the client" (p. 8).
Finn and Tonsager (1992) offered other reasons for the recent interest in providing patients with assessment feedback. These include the recognition of patients' right to see their medical and psychiatric health care records, as well as clinically and research-based findings and impressions suggesting that "therapeutic assessment" (described later) facilitates patient care. Finn and Tonsager also referred to Finn and Butcher's (1991) summary of the potential benefits that may accrue from providing feedback to patients about their results. The benefits cited include increased feelings of self-esteem and hope, reduced symptomatology and feelings of isolation, increased self-understanding and self-awareness, and increased motivation to seek or be more actively involved in mental health treatment. In addition, Finn and Martin (1997) noted that the therapeutic assessment process provides a model for relationships that can result in increased mutual respect, can lead to increased feelings of mastery and control, and can decrease feelings of alienation.

Recently, empirical studies and other published works have addressed the therapeutic benefits that can be realized directly from discussing psychological assessment results with the patient. Although not the focus of this book, the use of psychological assessment as a treatment also bears mention here. Therapeutic use of assessment generally involves a presentation of assessment results (including assessment materials such as test protocols, profile forms, and other assessment summary materials) directly to the patient, an elicitation of the patient's reactions to them, and an in-depth discussion of the meaning of the results in terms of patient-defined assessment goals. In essence, assessment data can serve as a catalyst for the therapeutic

encounter via the objective feedback that is provided to the patient, the patient self-assessment that is stimulated, and the opportunity for patient and therapist to arrive at mutually agreed-on therapeutic goals.

The use of psychological assessment as a means of therapeutic intervention has received particular attention primarily through the work of Finn and his associates (Finn, 1996a, 1996b; Finn & Martin, 1997; Finn & Tonsager, 1992). In discussing what he termed "therapeutic assessment" using the MMPI-2, Finn (1996b) outlined a procedure whose goal is to "gather accurate information about clients . . . and then use this information to help clients understand themselves and make positive changes in their lives" (p. 3). Elaborating on this procedure and extending it to the use of any test, Finn and Martin (1997) described therapeutic assessment as

collaborative, interpersonal, focused, time limited, and flexible. It is . . . very interactive and requires the greatest of clinical skills in a challenging role for the clinician. It is unsurpassed in a respectfulness for clients: collaborating with them to address their concerns (around which the work revolves), acknowledging them as experts on themselves and recognizing their contributions as essential, and providing to them usable answers to their questions in a therapeutic manner.

Simply stated, Finn and his colleagues' therapeutic assessment procedure may be considered an approach to the assessment of mental health patients in which the patient is not only the primary provider of information needed to answer questions, but also is actively involved in formulating the questions to be answered by the assessment. Feedback regarding the results of the assessment is provided to the patient and is considered a primary, if not the primary, element of the assessment process. Thus, the patient becomes a partner in the assessment process and, as a result, therapeutic and other benefits accrue.
Finn's clinical and research work primarily has focused on therapeutic assessment techniques using the MMPI-2. However, it appears that the same techniques can be employed with other instruments or batteries of instruments that provide multidimensional information relevant to patients' concerns. Thus, the work of Finn and his colleagues can serve as a model for deriving direct therapeutic benefits from the psychological assessment experience using any of several commercially available and public domain instruments.

Psychological Assessment for Outcomes Assessment

Currently, one of the most common reasons for conducting psychological assessment in the United States is to assess the outcomes of behavioral health care treatment. It is difficult to open a trade paper or health care newsletter, or to attend a professional conference, without being presented with a discussion about either how to "do outcomes" or what the results of a certain facility's outcomes study have revealed. The interest in and focus on outcomes assessment most probably can be traced to the continuous quality improvement (CQI) movement that was initially implemented in business and industrial settings. The impetus for the movement was a desire to produce quality products in the most efficient manner, resulting in increased revenues and decreased costs.

In health care, outcomes assessment has multiple purposes, not the least of which is as a tool for marketing the organization's services. Related to this, provider organizations vying for lucrative contracts from a third party frequently must present outcomes data demonstrating the effectiveness of their services. Equally important are data that demonstrate patient satisfaction. But perhaps the most important potential

use of outcomes data within provider organizations (although not always recognized as such) is the knowledge it can yield about what does and does not work. In this regard, outcomes data can serve as a means for ongoing program evaluation. It is the knowledge obtained from outcomes data that, if attended to and acted on, can lead to improvement in the services offered by the organization. When used in this manner, outcomes assessment can become an integral component of the organization's CQI initiative.

More importantly, however, for individual patients, outcomes assessment provides a means of objectively measuring how much improvement they have made from the time of treatment initiation to the time of treatment termination, and in some cases extending to some time after termination. Feedback to this effect may serve to instill in patients greater self-confidence and self-esteem, and/or a more realistic view of where they are (from a psychological standpoint) at that point in time. It also may serve as an objective indicator to patients of the need for continued treatment.

The purpose of the foregoing discussion was to present a broad overview of psychological assessment as a multipurpose behavioral health care tool. Depending on the individual clinician or provider organization, it may be employed for one or more of the purposes just described. The preceding overview should provide a context for a better understanding of the more in-depth and detailed discussion about each of these applications that follows. Before beginning this discussion, however, it is important to briefly review the types of instrumentation most likely to be used in therapeutic psychological assessment, as well as the significant considerations and issues related to the selection and use of this instrumentation. This should further facilitate an understanding of what is presented in the remainder of the chapter.
General Considerations for the Selection and Use of Assessment Instrumentation

Major test publishers regularly release new instrumentation for facilitating and evaluating behavioral health care treatment. Thus, availability of instrumentation for these purposes is not an issue. However, selection of the appropriate instrument(s) for one or more of the purposes already described is a matter requiring careful consideration. Inattention to an instrument's intended use, its demonstrated psychometric characteristics, its limitations, and other aspects related to its practical application can result in misguided treatment and potentially harmful consequences for a patient.

Several types of instruments could be used for the general assessment purposes described earlier. For example, neuropsychological instruments might be used to assess memory deficits that could affect the clinician's decision to perform further testing, the goals established for treatment, and the approach to treatment selected. Tests designed to provide estimates of level of intelligence might be used for the same purposes. It is beyond the scope of this chapter (and this book) to address, even in the most general way, all of the types of tests, rating scales, and other instrumentation that might be employed in a therapeutic environment. Instead, the focus here is on the general classes of instrumentation that have the greatest applicability in the service of patient screening as well as in the planning, monitoring, and evaluation of psychotherapeutic interventions. To a limited extent, specific examples of such instruments are presented. This is followed by a brief overview of criteria and considerations that will assist clinicians in

selecting the best instrumentation for their intended purposes. Newman and Ciarlo present a more detailed discussion of this topic in chapter 5.

Instrumentation for Behavioral Health Care Assessment

The instrumentation required for any assessment application will depend on the general purpose(s) for which the assessment is being conducted and the level of informational detail that is required for those purpose(s). Generally, the types of instrumentation that would serve the purposes of assessment may be classified into one of four general categories. As already mentioned, other types of instrumentation are frequently used in clinical settings for therapeutic purposes. However, this discussion is limited to those more commonly used for screening, treatment planning, treatment monitoring, and outcomes assessment.

Psychological/Psychiatric Symptom Measures. Probably the most frequently used instruments for each of the four stated purposes are measures of psychopathological symptomatology. These are the types of instruments on which the majority of the clinician's psychological assessment training has likely been focused, and they were developed to assess the problems that typically prompt people to seek treatment. There are several subtypes of these measures of psychological/psychiatric symptomatology. The first is the comprehensive multidimensional measure, which is typically a lengthy, multiscale instrument that measures, and provides a graphical profile of the patient on, several psychopathological symptom domains (e.g., anxiety, depression) or disorders (e.g., schizophrenia, antisocial personality). Summary indices sometimes are also available to provide a more global picture of the individual's psychological status or level of distress.
Probably the most widely used and/or recognized of these measures are the Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1951) and its restandardized revision, the MMPI-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989), the Millon Clinical Multiaxial Inventory-III (MCMI-III; Millon, 1994), and the Personality Assessment Inventory (PAI; Morey, 1991).

Multiscale instruments of this type can serve a variety of purposes that facilitate therapeutic interventions. They may be used on initial contact with the patient to screen for the need for service and, at the same time, yield information useful for treatment planning. Indeed, some such instruments (e.g., the MMPI-2) may make available supplementary, content-related, and/or special scales that can assist in addressing specific treatment considerations (e.g., motivation for treatment). Other multiscale instruments might be useful in identifying specific problems that may be unrelated to the patient's chief complaints (e.g., low self-esteem). They also can be administered numerous times during the course of treatment to monitor the patient's progress toward achieving established goals and to assist in determining what adjustments (if any) must be made to the clinician's approach. In addition, use of such an instrument in a pre- and posttreatment fashion can provide information related to the outcomes of an individual patient's treatment. At the same time, data obtained in this fashion can be analyzed with the results of other patients to evaluate the effectiveness of an individual therapist, a particular therapeutic approach, or an organization.

Abbreviated multidimensional measures are similar to the comprehensive multidimensional measure in many respects. First, by definition, they contain multiple scales for
measuring a variety of symptom domains and disorders. They also may allow for the derivation of an index of the patient's general level of psychopathology or distress. In addition, they may be used for screening, treatment planning and monitoring, and outcomes assessment purposes, just like the comprehensive instruments. The distinguishing feature of the abbreviated instrument is its length. By definition, these instruments are relatively short, easy to administer, and (usually) easy to score. Their brevity does not allow for an in-depth assessment of patients and their problems, but that is not what these instruments were designed to do.

Probably the most widely used of these brief instruments are Derogatis' family of symptom checklist instruments. These include the original Symptom Checklist-90 (SCL-90; Derogatis, Lipman, & Covi, 1973) and its revision, the SCL-90-R (Derogatis, 1983). Both instruments contain a checklist of 90 psychological symptoms, most of which are scored on the instruments' nine symptom scales. An even briefer version has been developed for each of these instruments. The first is the Brief Symptom Inventory (BSI; Derogatis, 1992), which was derived from the SCL-90-R. In a health care environment that is cost-conscious and unwilling to make too many demands on a patient's time, this 53-item instrument is gaining popularity over its longer 90-item parent. Similarly, a brief form of the original SCL-90 has been developed. Titled the Symptom Assessment-45 Questionnaire (SA-45; Strategic Advantage, Inc., 1996), its development did not follow Derogatis' approach to the development of the BSI; instead, cluster analytic techniques were used to select five items for assessing each of the nine symptom domains found on the three Derogatis checklists.
The major strength of the abbreviated multiscale instruments is their ability to survey several psychological symptom domains and disorders broadly and very quickly. Their value is most clearly evident in settings where both the time and dollars available for assessment services are quite limited. These instruments provide a great deal of information quickly, and they are much more likely than their lengthier counterparts to be completed by patients. This last point is particularly important if there is an interest in monitoring treatment or assessing outcomes, both of which require at least two assessments to obtain the necessary information.

Measures of General Health Status and Role Functioning. During the past decade, there has been increasing interest in the assessment of health status in physical and behavioral health care delivery systems. Initially, this interest was shown primarily within organizations and settings focused on the treatment of physical diseases and disorders. In recent years, behavioral health care providers have recognized the value of assessing the patient's general level of health. It is important to recognize that the term health means more than just the absence of disease or debility. According to the World Health Organization (WHO; as cited in Stewart & Ware, 1992), it also implies a state of well-being throughout the individual's physical, psychological, and social spheres of existence. Dickey and Wagenaar (1996) noted how this concept of health recognizes the importance of eliciting the patient's point of view in assessing health status. They also pointed to similar conclusions reached by Jahoda (1958) specific to the area of mental health. Here, individuals' self-assessment relative to how they feel they should be is an important component of "mental health." Measures of health status and physical functioning can be classified into one of two groups: generic and condition-specific.
Probably the most widely used and respected generic health status measures are the 36-item Medical Outcomes Study Short Form Health Survey (SF-36; Ware & Sherbourne, 1992; Ware, Snow, Kosinski, & Gandek,
1994) and the 39-item Health Status Questionnaire 2.0 (HSQ; Health Outcomes Institute, 1993; Radosevich, Wetzler, & Wilson, 1994). Aside from minor variations in the scoring of one of the instruments' scales (i.e., Bodily Pain) and the HSQ's inclusion of three depression screening items, the two measures are essentially identical. Each assesses eight dimensions (four addressing mental health-related constructs and four addressing physical health-related constructs) that reflect the WHO concept of "health."

Role functioning has recently gained attention as an important variable to address when assessing the impact of a physical or mental disorder on an individual's life functioning. How the disorder affects the person's ability to work, perform daily tasks, or interact with others is important to know in devising a treatment plan and monitoring progress over time. The SF-36 and HSQ both address these issues with scales designed for this purpose.

In response to concerns that even these two relatively brief measures are too lengthy for regular administration in clinical and research settings, a 12-item, abbreviated version of each has been developed. The SF-12 (Ware, Kosinski, & Keller, 1995) was developed for use in large-scale, population-based research where monitoring of health status at a broad level is all that is required. Similarly, a 12-item version of the HSQ, the HSQ-12 (Radosevich & Pruitt, 1996), was developed for similar uses. (Interestingly, despite being derived from essentially the same instrument, there is only 50% item overlap between the two abbreviated forms.) Both instruments are relatively new, but the data supporting their use that have been gathered to date are promising.

Condition-specific health status and functioning measures have been in use for a number of years. Most have been developed for use with physical rather than mental disorders, diseases, or conditions.
However, condition-specific measures of mental health status and functioning are beginning to appear. A major source of this type of instrument is the Minnesota-based Health Outcomes Institute (HOI), a successor to the health care think tank InterStudy. In addition to the HSQ and the HSQ-12, HOI serves as the distributor/clearinghouse for the condition-specific "Technology of Patient Experience (TyPE) specifications." The available TyPEs that would be most useful to behavioral health care practitioners currently include those developed by a team of researchers at the University of Arkansas Medical Center for use with depressive, phobic, and alcohol/substance use disorders. TyPEs for other specific psychological disorders are currently under development at the University of Arkansas for distribution through HOI.

Quality of Life Measures. Andrews, Peters, and Teesson (1994) indicated that most definitions of "quality of life" (QOL) describe a multidimensional construct encompassing physical, affective, cognitive, social, and economic domains. Objective measures of QOL focus on the environmental resources required to meet a person's needs and can be completed by someone other than the patient. Subjective measures of QOL assess patients' satisfaction with the various aspects of their lives and thus must be completed by the patients themselves. Andrews et al. (1994) drew other distinctions in the QOL arena. One has to do with the difference between QOL and health-related quality of life (HRQL); the other (as with health status measures) has to do with the distinction between generic and condition-specific measures of QOL. QOL measures differ from HRQL measures in that the former assess the whole "fabric of life," whereas the latter assess quality of life as it is affected by a disease or disorder, or by its treatment. Generic measures are designed to assess aspects of life that are generally relevant to most people;
condition-specific measures are focused on aspects of the lives of particular disease/disorder populations. However, as Andrews et al. pointed out, there tends to be a great deal of overlap between generic and condition-specific QOL measures.

Service Satisfaction Measures. With the expanding interest in assessing the outcomes of treatment, it is not surprising to see an accompanying interest in assessing patients' and (in some instances) their families' satisfaction with the services received. In fact, many professionals and organizations equate satisfaction with outcomes and frequently consider it the most important outcome. In a recent survey of 73 behavioral health care organizations, 71% of the respondents indicated that their outcomes studies included measures of patient satisfaction (Pallak, 1994).

Although service satisfaction is frequently viewed as an outcome, it should not be classified as such. Rather, it should be considered a measure of the overall therapeutic process, encompassing the patient's (and at times, others') view of how the service was delivered, the capabilities and attentiveness of the service provider, the benefits of the service (if any), and any number of other selected aspects of the service received. Patient satisfaction surveys do not answer the question, "What was the result of the treatment rendered to the patient?"; they answer the question, "How did the patient feel about the treatment he or she received?" Thus, they serve an important program evaluation/improvement function.

The number of questionnaires currently in use to measure patient satisfaction is countless. This reflects the attempts of individual health care organizations to develop customized measures that assess variables important to their particular needs, which in turn reflects a response to outside demands to "do something" to demonstrate the effectiveness of their services.
Often, this "something" has not been evaluated to determine its basic psychometric properties. As a result, there exist numerous survey options to choose from, but very few that have actually demonstrated their validity and reliability as measures of service satisfaction. Fortunately, a few instruments have been investigated for their psychometric integrity. Probably the most widely used and researched patient satisfaction instrument designed for use in behavioral health care settings is the eight-item version of the Client Satisfaction Questionnaire (CSQ-8; Attkisson & Zwick, 1982; Nguyen, Attkisson, & Stegner, 1983). The CSQ-8 was derived from the original 31-item CSQ (Larsen, Attkisson, Hargreaves, & Nguyen, 1979), which also yielded two longer 18-item alternate forms, the CSQ-18A and CSQ-18B (LeVois, Nguyen, & Attkisson, 1981). The more recent work of Attkisson and his colleagues at the University of California, San Francisco is the Service Satisfaction Scale-30 (SSS-30; Greenfield & Attkisson, 1989), a 30-item, multifactorial scale that yields information about several aspects of satisfaction with the services received, such as perceived outcome and the manner and skill of the clinician.

Guidelines for Instrument Selection

Regardless of the type of instrument being considered for use in the therapeutic environment, clinicians frequently must choose among many product offerings. But what are the general criteria for the selection of an assessment instrument? What should guide the clinician's selection of an instrument for a specific purpose? As part of their training, psychologists and other mental health professionals have been educated about the
psychometric properties important to consider when determining the appropriateness of an instrument for its intended use. However, this is just one of several considerations that should be taken into account in evaluating an instrument for a specific therapeutic use.

Guidance regarding instrument selection has been offered by many experts. Probably the most thorough and clinically relevant guidelines for the selection of psychological assessment instruments come from the National Institute of Mental Health (NIMH)-supported work of Ciarlo, Brown, Edwards, Kiresuk, and Newman (1986). Newman, Ciarlo, and Carpenter present an updated summary and synopsis of this NIMH work in chapter 5. The criteria they describe deal generally with applications, methods and procedures, and psychometric features, as well as cost and utility considerations. Although these selection criteria were originally developed for evaluating instruments for outcomes assessment purposes, most also have relevance to the selection of instrumentation for screening, treatment planning, and treatment monitoring.

The work of Ciarlo and his colleagues provides more extensive instrument selection guidelines than most. Others who have addressed the issue have arrived at recommendations that serve to reinforce and/or complement those listed in the NIMH document. For example, Andrews' work in Australia has led to significant contributions to the body of outcomes assessment knowledge. As part of this, Andrews et al. (1994) identified six general "qualities of consumer outcome measures" that are generally in concordance with those from the NIMH study: applicability, acceptability, practicality, reliability, validity, and sensitivity to change.
Ficken (1995) indicated that instruments used for screening purposes should (a) possess high levels of sensitivity and specificity to diagnostic criteria from the Diagnostic and Statistical Manual of Mental Disorders (4th ed., DSM-IV; American Psychiatric Association, 1994) or the most up-to-date version of the International Classification of Diseases (ICD); (b) focus on hard-to-detect (in a single office visit) but treatable disorders associated with imminent harm to self or others, significant suffering, or a decrease in productivity; (c) require no more than 10 minutes to administer; and (d) have an administration protocol that integrates easily into the organization's work flow. Other sources (Burlingame, Lambert, Reisinger, Neff, & Mosier, 1995; Schlosser, 1995; Sederer et al., 1996) discuss this matter further.

Psychological Assessment as a Tool for Screening

Among the most significant ways in which psychological assessment can contribute to an economic and efficient behavioral health care delivery system is its ability to quickly identify individuals in need of mental health or substance use treatment services, and/or to determine the likelihood of the presence of a specific disorder or condition. The most important aspect of any screening procedure is the efficiency with which it can provide information useful to clinical decision making. In the field of psychology, the most efficient and thoroughly investigated screening procedures involve the use of psychological test instruments. The power or utility of a psychological screener lies in its ability to determine, with a high level of probability, whether respondents do or do not have a particular disorder or condition, or whether they are or are not members of a group with clearly defined characteristics.
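The "level of probability" at stake here is captured by the classification efficiency statistics discussed later in this chapter and in chapter 2: sensitivity, specificity, positive predictive power, and negative predictive power, all of which follow directly from a 2 x 2 table of screener decisions against true diagnostic status. The short sketch below illustrates the arithmetic only; the function name and the validation counts are invented for the example and do not describe any published instrument.

```python
# Hypothetical illustration of a screener's classification efficiency
# statistics, computed from a 2x2 validation table. All counts are
# invented for the example.

def classification_stats(tp, fp, fn, tn):
    """Return (sensitivity, specificity, PPP, NPP) from a 2x2 table.

    tp: screener positive, disorder present (true positives)
    fp: screener positive, disorder absent  (false positives)
    fn: screener negative, disorder present (false negatives)
    tn: screener negative, disorder absent  (true negatives)
    """
    sensitivity = tp / (tp + fn)  # P(positive screen | disorder present)
    specificity = tn / (tn + fp)  # P(negative screen | disorder absent)
    ppp = tp / (tp + fp)          # P(disorder present | positive screen)
    npp = tn / (tn + fn)          # P(disorder absent | negative screen)
    return sensitivity, specificity, ppp, npp

# A hypothetical screener validated on 1,000 patients, 100 of whom
# actually have the disorder (a 10% base rate):
sens, spec, ppp, npp = classification_stats(tp=90, fp=90, fn=10, tn=810)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
      f"PPP={ppp:.2f} NPP={npp:.2f}")
```

Note what the worked numbers show: even with sensitivity and specificity of .90, the positive predictive power at this 10% base rate is only .50, because an uncommon condition generates many false positives relative to true positives. This is one reason positive screener results call for confirmation by a qualified clinician rather than being treated as diagnoses.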
The most commonly used screeners in daily clinical practice are those designed to identify some specific aspect of psychological functioning or disturbance, or to provide a broad overview of the respondent's point-in-time mental status. Examples of problem-specific
screeners include the Beck Depression Inventory (BDI; Beck, Rush, Shaw, & Emery, 1979) and the State-Trait Anxiety Inventory (STAI; Spielberger, 1983). Examples of screeners for more generalized psychopathology or distress include the SA-45 and the BSI.

Research-Based Use of Psychological Screeners

Establishing a system for screening for a particular disorder or condition involves determining what it is clinicians want to screen in or screen out, the level of probability at which they feel comfortable making that decision, and how many misclassifications (or what percentage of errors) they are willing to tolerate. Once it is decided what particular disorder or condition will be screened, clinicians must evaluate the instrument's classification efficiency statistics (sensitivity, specificity, positive predictive power [PPP], and negative predictive power [NPP]) to determine whether a given instrument is suitable for the intended purpose(s). These statistics and related issues are described in detail in chapter 2.

Implementation of Screeners into the Daily Work Flow of Service Delivery

The utility of a screening instrument is only as good as the degree to which it can be integrated into an organization's daily regimen of service delivery. This, in turn, depends on a number of factors. The first is the degree to which administration and scoring of the screener are quick and easy, and the amount of time required to train the provider's staff to incorporate the screener successfully into the daily work flow. The second factor relates to the instrument's use. Generally, screeners are developed to assist in determining the likelihood that the patient does or does not have the specific condition or characteristic the instrument is designed to identify.
Use for any other purpose (e.g., assigning a diagnosis based solely on screener results, determining the likelihood of the presence of other characteristics) only serves to undermine the integrity of the instrument in the eyes of staff, payers, and other parties with a vested interest in the screening process. The third factor has to do with the ability of the provider to act on the information obtained from the screener. It must be clear how the clinician should proceed based on the information available. The final factor is staff acceptance of and commitment to the screening process. This comes only with a clear understanding of the importance of the screening, the usefulness of the obtained information, and how the screening process is to be incorporated into the organization's daily work flow.

Ficken (1995) provided an example of how screeners can be integrated into an assessment system designed to assist primary care physicians in identifying patients with psychiatric disorders. This system (which also allows for the incorporation of practice guidelines) seems to take into account the first three utility-related factors mentioned earlier. It begins with the administration of a screener that is highly sensitive and specific to DSM- or ICD-related disorders. These screeners should require no more than 10 minutes to complete, and "their administration must be integrated seamlessly into the standard clinical routine" (p. 13). Somewhat similar to the sequence described by Derogatis and DellaPietra (1994), positive findings would lead to a second level of testing. Here, another screener that meets the same requirements as those for the first
screener and that also affirms or rules out a diagnosis would be administered. Positive findings would lead to additional assessment for treatment planning purposes. Consistent with standard practice, Ficken recommended confirmation of screener findings by a qualified psychologist or physician.

Psychological Assessment as a Tool for Treatment Planning

Problem identification through the use of screening instruments is only one way in which psychological assessment can facilitate the treatment of behavioral health problems. When employed by a trained clinician, psychological assessment also can provide information that greatly facilitates and enhances the planning of a specific therapeutic intervention for the individual patient. It is through the implementation of a tailored treatment plan that the patient's chances of problem resolution are maximized.

The importance of treatment planning has received significant attention during recent years. The reasons for this recognition were summarized (Maruish, 1990) as follows: "Among important and interrelated reasons . . . [are] concerted efforts to make psychotherapy more efficient and cost effective, the growing influence of 'third parties' (insurance companies and the federal government) that are called upon to foot the bill for psychological as well as medical treatments, and society's disenchantment with open-ended forms of psychotherapy without clearly defined goals" (p. iii).

The role that psychological assessment can play in planning a course of treatment for behavioral health care problems is significant. Butcher (1990) indicated that information available from instruments such as the MMPI-2 not only can assist in identifying problems and in establishing communication with the patient, but also can help ensure that the plan for treatment is consistent with the patient's personality and external resources.
In addition, psychological assessment may reveal potential obstacles to therapy, areas of potential growth, and problems of which the patient may not be consciously aware. Moreover, both Butcher (1990) and Appelbaum (1990) viewed testing as a means of quickly obtaining a second opinion. Other benefits of psychological assessment identified by Appelbaum include assistance in identifying patient strengths and weaknesses, identification of the complexity of the patient's personality, and establishment of a reference point during the therapeutic episode. The type of treatment-relevant information that can be derived from patient assessment and the manner in which it is applied are quite varied, a fact that will become evident later. Regardless, Strupp (see Butcher, 1990) probably provided the best summary of the potential contribution of psychological assessment to treatment planning, stating that "careful assessment of a patient's personality resources and liabilities is of inestimable importance. It will predictably save money and avoid misplaced therapeutic effort; it can also enhance the likelihood of favorable treatment outcomes for suitable patients" (pp. v-vi).

Assumptions About Treatment Planning

The introduction to this section presented a broad overview of ways in which psychological assessment can assist in the development and successful implementation of treatment plans for behavioral health care patients. These and other benefits are discussed
in greater detail later. However, it is important first to clarify what treatment planning is, along with some of the general, implicit assumptions that typically can be made about this important therapeutic activity. For the purpose of this discussion, the term treatment planning is defined as that part of a therapeutic episode in which a set of goals is developed for an individual presenting with mental health or substance abuse problems, and the specific means by which the therapist or other resources will assist the patient in achieving those goals are identified. The following are some general assumptions underlying the treatment planning process:

1. Patients are experiencing behavioral health problems that have been identified either by themselves or by another party. Common external sources of problem identification include a spouse, parent, teacher, employer, and the legal system.

2. Patients experience some degree of internal and/or external motivation to eliminate or reduce the identified problems. An example of external motivation to change is the potential loss of a job or dissolution of a marriage if problems are not resolved to the satisfaction of the other party.

3. The goals of treatment are tied either directly or indirectly to the identified problems.

4. The goals of treatment have definable criteria for achievement, are indeed achievable by the patient, and are developed by the patient in collaboration with the clinician.

5. The prioritization of goals is reflected in the treatment plan.

6. Patients' progress toward achievement of the treatment goals can be tracked and compared against an expected path of improvement in either a formal or an informal manner. This expected path of improvement may be based on the clinician's experience or (ideally) on objective data gathered on similar patients.

7.
Deviations from the expected path of improvement will lead to a modification of the treatment plan, followed by subsequent monitoring to determine the effectiveness of the alteration.

These assumptions should not be considered exhaustive, nor are they likely to reflect what actually occurs in all situations. For example, some patients seen for therapeutic services have no motivation to change. As may be seen in juvenile detention settings or in cases where children are brought to treatment by their parents, their participation in treatment is forced, and they may exert no effort to change. In the more extreme cases, they might in fact engage in intentional efforts to sabotage the therapeutic intervention. In other cases, it is likely that some clinicians continue to identify and prioritize treatment goals without the direct input of the patient. Regardless, the preceding assumptions have a direct bearing on the manner in which psychological assessment can best serve treatment planning efforts.

The Benefits of Psychological Assessment for Treatment Planning

As pointed out earlier, there are several ways in which psychological assessment can assist in the planning of treatment for behavioral health care patients. The more common and evident contributions can be organized into four general categories: problem identification, problem clarification, identification of important patient characteristics, and monitoring of treatment progress.

Problem Identification. Probably the most common use of psychological assessment in the service of treatment planning is problem identification. Often, the use of psychological testing per se is not needed to identify what problems patients are experiencing.
They either will tell the clinician directly without questioning, or they will admit their problem(s) when questioned during a clinical interview. However, this is not always the case. The value of psychological testing becomes apparent in those cases where patients are hesitant or unable to identify the nature of their problems. With a motivated and engaged patient who responds openly and honestly to items on a well-validated and reliable test, the process of identifying what led the patient to seek treatment may be greatly facilitated. Cooperation shown during testing may be attributable to the nonthreatening nature of questions presented on paper or a computer monitor (as opposed to those posed by another human being); the subtle, indirect qualities of the questions themselves (compared to those asked by the clinician); or a combination of these factors.

In addition, the nature of some of the more commonly used psychological test instruments allows for the identification of secondary, but significant, problems that might otherwise be overlooked. Multidimensional inventories such as the MMPI-2 and the PAI are good examples of these types of instruments. Moreover, these instruments may be sensitive to other patient symptoms, traits, or characteristics that may exacerbate or otherwise contribute to the patient's problems.

Note that the type of problem identification described here is different from that conducted during screening. Whereas screening is focused on determining the presence or absence of a single problem, problem identification generally takes a broader view and investigates the possibility of multiple problem areas. At the same time, there also is an attempt to determine problem severity and the extent to which the problem area(s) affect the patient's ability to function.

Problem Clarification. Psychological testing can often assist in the clarification of a known problem.
Through tests designed for use with populations presenting problems similar to those of the patient, aspects of identified problems can be elucidated. Information gained from these tests can improve the patient's and the clinician's understanding of the problem and lead to the development of a better treatment plan. The three most important types of information that can be gleaned for this purpose are the severity of the problem, the complexity of the problem, and the degree to which the problem impairs the patient's ability to function in one or more life roles.

The manner in which a patient is treated depends a great deal on the severity of the problem. In particular, problem severity plays a significant role in determining the setting in which the behavioral health care intervention is provided. Patients whose problems are so severe that they are considered a danger to themselves or others are often best suited for inpatient treatment, at least until dangerousness is no longer an issue. Similarly, problem severity may be a primary criterion signaling the need to evaluate a medication adjunct to treatment. Severity also may have a bearing on the type of psychotherapeutic approach taken by the clinician. For example, it may be more productive for the clinician to take a supportive role with severe cases; all things being equal, a more confrontive approach may be more appropriate with patients whose problems are mild to moderate in severity.

As alluded to earlier, the problems of patients seeking behavioral health care services are frequently multidimensional. Patient and environmental factors that play into the formation and maintenance of a psychological problem, along with the problem's relation to other conditions, all contribute to its complexity. Knowing the complexity

< previous page

page_19

next page >

< previous page

page_20

next page > Page 20

of the target problem is invaluable in developing an effective treatment plan. Again, multidimensional instruments or batteries of tests, each measuring specific aspects of psychological dysfunction, serve this purpose well. As with problem severity, knowledge of the complexity of a patient's problems can help the clinician and patient in many aspects of treatment planning, including determination of appropriate setting, therapeutic approach, need for medication, and other important decisions. However, possibly of equal importance to the patient and other concerned parties (spouse, employer, school, etc.) is the extent to which these problems affect patients' ability to function in their role as parent, child, employee, student, friend, and so on. Information gathered from the administration of measures designed to assess role functioning clarifies the impact of the patient's problems and serves to establish role-specific goals. It also can identify other parties that may serve as potential allies in the therapeutic process. In general, the most important role-functioning domains to assess are those related to work or school performance, interpersonal relationships, and activities of daily living (ADLs). Identification of Important Patient Characteristics. The identification and clarification of the patient's problems are of key importance in planning a course of treatment. However, there are numerous other types of patient information not specific to the identified problem that can be useful in planning treatment and that can be easily identified through the use of psychological assessment instruments. The vast majority of treatment plans are developed or modified with consideration to at least some of these nonpathological characteristics. The exceptions are generally found with clinicians or programs that take a "one-size-fits-all" approach to treatment.
Probably the most useful type of information not specific to the identified problem that can be gleaned from psychological assessment is the identification of patient characteristics that can serve as assets or areas of strength for patients in working to achieve their therapeutic goals. For example, Morey and Henry (1994) pointed to the utility of the PAI's Nonsupport scale in identifying whether patients perceive an adequate social support network, this being a predictor of positive therapeutic progress. Other examples include "normal" personality characteristics, such as information that can be obtained from Gough, McClosky, and Meehl's Dominance (1951) and Social Responsibility (1952) scales developed for use with the MMPI/MMPI-2. Greene (1991) indicated that those with high scores on the Dominance scale are described as "being able to take charge of responsibility for their lives. They are poised, self-assured, and confident of their own abilities" (p. 209). Gough and his colleagues interpreted high scores on the Social Responsibility scale as being indicative of individuals who, among other things, trust the world, are self-assured and poised, and believe that individuals must carry their share of duties. Thus, scores on these and similar types of scales may reveal important aspects of patient functioning that can be used to effect therapeutic change. Similarly, knowledge of the patient's weaknesses or deficits also may shape the type of treatment plan that is developed. Greene and Clopton (1994) provided numerous types of deficit-relevant information from the MMPI-2 Content scales that have implications for treatment planning. For example, a clinically significant score (T > 64) on the Anger scale should lead clinicians to consider the inclusion of training in assertiveness and/or anger control as part of the patient's treatment.
On the other hand, uneasiness in social situations, as indicated by a significantly elevated score on either the Low Self-Esteem or Social Discomfort scale, suggests that a supportive approach to the intervention would be beneficial, at least initially.
Moreover, use of specially designed scales and procedures can provide information related to the patient's ability to become engaged in the therapeutic process. For example, the Therapeutic Reactance Scale (Dowd, Milne, & Wise, 1991) and the MMPI-2 Negative Treatment Indicators Content Scale developed by Butcher and his colleagues (Butcher, Graham, Williams, & Ben-Porath, 1989) may be useful in determining whether the patient is likely to resist therapeutic intervention. Morey and Henry (1994) presented algorithms utilizing PAI T-scores that may be useful in making statements about the presence of characteristics that bode well for the therapeutic endeavor (e.g., sufficient distress to motivate engagement in treatment, the ability to form a therapeutic alliance). Other types of patient characteristics that can be identified through psychological assessment have implications for selecting the best therapeutic approach for a given patient and thus can contribute significantly to the treatment planning process. Moreland (1996) pointed out how psychological assessment can assist in determining if the patient deals with problems through internalizing or externalizing behaviors. He noted that all things being equal, internalizers would probably profit most from an insight-oriented approach rather than a behaviorally oriented approach. The reverse would be true for externalizers. And through their work over the years, Beutler and his colleagues (Beutler & Clarkin, 1990; Beutler, Wakefield, & R.E. Williams, 1994; Beutler & O.B. Williams, 1995) identified several patient characteristics that are important to matching patients and treatment approaches for maximized therapeutic effectiveness. These are addressed in detail in chapter 3. Monitoring of Progress Along the Path of Expected Improvement. Information from repeated testing during the treatment process can help the clinician to determine if the treatment plan is appropriate for the patient at a given point in time. 
Thus, many clinicians use psychological assessment to determine whether their patients are showing the expected improvement as treatment progresses. If not, adjustments can be made. These adjustments may reflect the need for more intensive or aggressive treatment (e.g., increased number of psychotherapeutic sessions each week, addition of a medication adjunct), less intensive treatment (e.g., reduction or discontinuation of medication, transfer from inpatient to outpatient care), or a different therapeutic approach (e.g., changing from humanistic therapy to cognitive-behavioral therapy). Regardless, any modifications require later reassessment of the patient to determine if the treatment revisions have impacted patient progress in the expected direction. This process may be repeated any number of times. These "in-treatment" reassessments also can provide information relevant to the decision of when to terminate treatment. The goal of monitoring is to determine whether treatment is "on track" with expected progress at a given point in time. When and how often clinicians might assess the patient is dependent on a few factors. The first is the instrumentation. Many instruments are designed to assess the patient's status at the time of testing. Items on these measures are generally worded in the present tense (e.g., "I feel tense and nervous," "I feel that my family loves and cares about me"). Changes from one day to the next on the construct(s) measured by these instruments should be reflected in the test results. Other instruments, however, ask the patient to indicate if a variable of interest has been present, or how much or to what extent it has occurred during a specific time period in the past. The items usually are asked in the context of something like "During the past month, how often have you . . . ?" or "During the past week, to what extent has . . . ?" 
Readministration of these interval-of-time-specific measures or subsets of items within them should be undertaken only after a period of time equivalent to or longer than the
time interval to be considered in responding to the items. For example, an instrument that asks the patient to consider the extent to which certain symptoms have been problematic "during the past 7 days" should not be readministered for at least 7 days. The responses from a readministration that occurs less than 7 days after the first administration would include patients' consideration of their status during the previously considered time period. This may make interpretation of the change of symptom status (if any) from the first to the second administration difficult, if not impossible. Methods to determine if clinically significant change has occurred from one point in time to another have been developed and can be used for treatment monitoring. These methods are discussed in the outcomes assessment section of this chapter and in chapter 7. However, another approach to monitoring therapeutic change, referred to as the glide-path approach, may be superior. The term glide path refers to the narrow path of descent that airplanes must follow when landing. Deviation from the flight glide path requires corrections in the plane's speed, altitude, and/or attitude in order to land safely. R.L. Kane (personal communication, July 22, 1996) indicated that just as pilots have the instrumentation to alert them about the plane's position on the glide path, the clinician may use assessment instruments to track how well the patient is following the "glide path of treatment." The glide path, in this case, represents expected improvement over time in one or more measurable areas of functioning (e.g., symptom severity, social role functioning, occupational performance). The expectations would be based on objective data obtained from similar patients at various points during their treatment and would allow for minor deviations from the path. The end of the glide path is one or more specific goals that are part of the treatment plan. 
Thus, "arrival" at the end of the glide path signifies the attainment of specific treatment goal(s).

Psychological Assessment as a Tool for Outcomes Management

The 1990s have witnessed accelerating growth in the level of interest in, and development of, behavioral health care outcomes programs. Cagney and Woods (1994) attributed this to four major factors. First, behavioral health care purchasers are asking for information regarding the value of the services they buy. Second, an increasing number of purchasers are requiring a demonstration of patient improvement and satisfaction. Third, MCOs need data demonstrating that their providers render efficient and effective services. And fourth, outcomes information will be needed for the "quality report cards" that MCOs anticipate they will be required to provide in the future. In short, fueled by soaring health care costs, there has been an increasing need for providers to demonstrate that what they do is effective. All of this has occurred within the context of the CQI movement, in which there have been similar trends in the level of interest and growth. As noted previously, the interest in and necessity for outcomes measurement and accountability in this era of managed care provide a unique opportunity for psychologists to use their training and skills in assessment (Maruish, 1994). However, the extent to which psychologists and other trained professionals become key and successful contributors to an organization's outcomes initiative (whatever that might be) will depend on their understanding of what "outcomes" and their measurement and applications are all about.
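The glide-path comparison described above lends itself to a small computational sketch: each observed score is checked against an expected-improvement band derived, in practice, from data on similar patients. Everything below (the trajectory values, tolerances, and patient scores) is invented for illustration and is not drawn from any published norms.

```python
# Hypothetical expected-improvement "glide path" for a symptom measure
# on which lower scores indicate improvement:
# (session number, expected score, allowed deviation from expectation)
EXPECTED_PATH = [(1, 70, 5), (4, 60, 5), (8, 50, 6), (12, 42, 6)]

def off_path_sessions(observed):
    """Return the sessions at which the observed score falls outside
    the expected band, signaling a possible need to revisit the
    treatment plan.

    observed: dict mapping session number -> observed score.
    """
    flagged = []
    for session, expected, tolerance in EXPECTED_PATH:
        score = observed.get(session)
        if score is not None and abs(score - expected) > tolerance:
            flagged.append(session)
    return flagged

# A patient who tracks expectations early but stalls by session 8:
flags = off_path_sessions({1: 68, 4: 62, 8: 61, 12: 60})  # [8, 12]
```

In practice the band would come from aggregated data on comparable patients, and minor deviations (within tolerance) would not trigger any change in the plan.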
What are Outcomes?

Before discussing outcomes, it is important to have a clear understanding of what is meant by the term. Experience has shown that its meaning varies depending on the source. Donabedian (1985) identified three dimensions of quality of care. The first is structure. This refers to various aspects of the organization providing the care, including how the organization is "organized," the physical facilities and equipment, and the number and professional qualifications of its staff. Process refers to the specific types of services provided to a given patient (or group of patients) during a specific episode of care. These might include various tests and assessments (e.g., psychological tests, lab tests, magnetic resonance imaging), therapeutic interventions (e.g., group psychotherapy, medication), and discharge planning activities. Processes that address treatment complications (e.g., drug reactions) also are included here. Outcomes, on the other hand, refers to the results of the specific treatment that was rendered. As for the relation between these three facets of quality of care, Brook, McGlynn, and Cleary (1996) noted that "if quality-of-care criteria based on structural or process data are to be credible, it must be demonstrated that variations in the attribute they measure lead to differences in outcome. If outcome criteria are to be credible, it must be demonstrated that differences in outcome will result if the processes of care under the control of health professionals are altered" (p. 966). The outcomes, or results, of treatment should not be thought of as change in only a single aspect of functioning. Treatment may impact multiple facets of a patient's life. Stewart and Ware (1992) identified five broad aspects of general health status: physical health, mental health, social functioning, role functioning, and general health perception.
Treatment may affect each of these aspects of health in different ways, depending on the disease or disorder being treated and the effectiveness of the treatment. Some specific aspects of functioning related to these five areas of general health status that are commonly measured include feelings of well-being, psychological symptom status, use of alcohol and other drugs, functioning on the job or at school, marital/family relationships, utilization of health care services, and ability to cope. In considering the types of outcomes that might be assessed in behavioral health care settings, a substantial number of clinicians probably would identify symptomatic change in psychological status as being the most important. However important change in symptom status may have been in the past, psychologists and other behavioral health care providers have come to realize that changes in many other aspects of functioning identified by Stewart and Ware (1992) are equally important indicators of treatment effectiveness. As Sederer et al. (1996) noted:

Outcome for patients, families, employers, and payers is not simply confined to symptomatic change. Equally important to those affected by the care rendered is the patient's capacity to function within a family, community, or work environment or to exist independently, without undue burden to the family and social welfare system. Also important is the patient's ability to show improvement in any concurrent medical and psychiatric disorder. . . . Finally, not only do patients seek symptomatic improvement, but they want to experience a subjective sense of health and well being. (p. 2)

A much broader perspective is offered in Faulkner and Gray's The 1995 Behavioral Outcomes and Guidelines Sourcebook (Migdail, Youngs, & Bengen-Seltzer, 1995):

Outcomes measures are being redefined from a vague "is the patient doing better?" to more specific questions, such as, "Does treatment work in ways that are measurably valuable to the
patient in terms of daily functioning level and satisfaction, to the payer in terms of value for each dollar spent, to the managed care organization charged with administering the purchaser's dollars, and to the clinician charged with demonstrating value for hours spent?" (p. 1)

Thus, "outcomes" holds a different meaning for each of the different parties who have a stake in behavioral health care delivery; what is measured generally depends on the purpose(s) for which outcomes assessment is undertaken. As is shown here, these vary greatly.

Outcomes Assessment: Measurement, Monitoring, and Management

Just as it is important to be clear about what is meant by outcomes, it is equally important to clarify the three general purposes for which outcomes assessment may be employed. The first is outcomes measurement. This involves nothing more than pre- and posttreatment assessment of one or more variables to determine the amount of change that has occurred (if any) in these variables as a result of therapeutic intervention. A more useful approach is that of outcomes monitoring. This refers to "the use of periodic assessment of treatment outcomes to permit inferences about what has produced change" (Dorwart, 1996, p. 46). Like treatment progress monitoring used for treatment planning, outcomes monitoring involves the tracking of changes in the status of one or more outcomes variables at multiple points in time. Assuming a baseline assessment at the beginning of treatment, reassessment may occur one or more times during the course of treatment (e.g., weekly, monthly), at the time of termination, and/or during one or more periods of posttermination follow-up. Whereas treatment progress monitoring is used to determine deviation from the expected course of improvement, outcomes monitoring focuses on revealing aspects about the therapeutic process that seem to affect change. The third, and most useful, purpose of outcomes assessment is that of outcomes management.
Dorwart (1996) defined outcomes management as "the use of monitoring information in the management of patients to improve both the clinical and administrative processes for delivering care" (pp. 46-47). Whereas Dorwart appeared to view outcomes management as relevant to the individual patient, it is a means to improve the quality of services offered to the patient population(s) served by the provider, not to any one patient. Information gained through the assessment of patients can provide the organization with indications of what works best, with whom, and under what set of circumstances, thus helping to improve the quality of services for all patients. In essence, outcomes management can serve as a tool for those organizations with an interest in implementing a CQI initiative (discussed later).

The Benefits of Outcomes Assessment

The implementation of any type of outcomes assessment initiative within an organization does not come without effort from and cost to the organization. However, if implemented properly, all interested parties (patients, clinicians, provider organizations, payers, and the health care industry as a whole) should find the yield from the outlay of time and money to be substantial. Cagney and Woods (1994) identified several benefits to patients, including enhanced health and quality of life, improved health care quality, and effective use of the dollars paid into benefits plans. For providers, the outcomes data can result in
improved clinical skills, information related to the quality of care provided and to local practice standards, increased profitability, and decreased concerns over possible litigation. Outside of the clinical context, benefits also can accrue to payers and MCOs. Cagney and Woods (1994) saw the potential payer benefits as including healthier workers, improved health care quality, increased worker productivity, and reduced or contained health care costs. As for MCOs, the benefits include increased profits, information that can shape the practice patterns of their providers, and a decision-making process based on delivering quality care.

The Therapeutic Use of Outcomes Assessment

The foregoing overview provides the background necessary for discussing the use of outcomes data from psychological assessment in day-to-day clinical practice. Whereas the previous review focused on both the individual patient and patient populations, the discussion now narrows to the use of outcomes assessment primarily in service to the individual patient. The reader interested in issues related to large, organization-wide outcomes studies conducted for outcomes management purposes (as defined earlier) is referred to chapter 8. The reader also is encouraged to seek other sources of information that specifically address that topic (see, e.g., Migdail et al., 1995; Newman, 1994). There is no one system or approach to the assessment of treatment outcomes for an individual patient that is appropriate for all providers of behavioral health care services. Because of the various types of outcomes of interest, the reasons for assessing them, and the manner in which they may impact decisions, any successful and useful outcomes assessment approach must be customized. Customization should reflect the needs of the primary benefactor of the assessment information (i.e., patient, payer, or provider), with consideration to the secondary stakeholders in the therapeutic endeavor.
Ideally, the identified primary benefactor would be the patient. Although this is not always the case, it appears that only rarely would the patient not benefit from involvement in the outcomes assessment process. Following are considerations and recommendations for the development and implementation of an outcomes assessment initiative by behavioral health care providers. Although space limitations do not allow a comprehensive review of all issues and solutions, the information that follows can be useful to psychologists and others with similar training who wish to incorporate outcomes assessment into their standard therapeutic routine. Purpose of the Outcomes Assessment. There are numerous reasons for assessing outcomes. For example, in a recent survey of 73 behavioral health care organizations, the top five reasons (in descending order) identified by the participants as to why they had conducted an outcomes program were evaluation of outcomes for patients, evaluation of provider effectiveness, evaluation of integrated treatment programs, management of individual patients, and support of sales and marketing efforts (Pallak, 1994). However, from the clinician's standpoint, a couple of purposes are worth noting. In addition to monitoring the course of progress during treatment, clinicians may employ outcomes assessment to obtain a direct measure of how much patient improvement has occurred as the result of a completed course of treatment intervention. Here, the findings are of more benefit to the clinician than to patients themselves because a pre- and posttreatment approach to the assessment is utilized. The information will not lead to any change in the patient being assessed, but the feedback it provides to clinicians could assist them in the treatment of other patients in the future.
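Where a pre- and posttreatment comparison of this kind is used, a common question is whether the observed difference exceeds what measurement error alone could produce. One widely used statistic for this purpose is the reliable change index associated with Jacobson and Truax (methods of this kind are taken up in chapter 7). The sketch below is illustrative only; the function name, scores, normative standard deviation, and reliability figure are all assumed for the example.

```python
import math

def reliable_change_index(pre_score, post_score, sd, reliability):
    """Reliable change index: the pre-to-post difference divided by
    the standard error of the difference between two scores.

    sd          -- normative standard deviation of the measure
    reliability -- the measure's (e.g., test-retest) reliability
    """
    se_measurement = sd * math.sqrt(1 - reliability)
    s_diff = math.sqrt(2) * se_measurement
    return (post_score - pre_score) / s_diff

# Hypothetical symptom inventory (lower scores = improvement) with an
# assumed normative SD of 10 and reliability of .90:
rci = reliable_change_index(pre_score=72, post_score=58, sd=10, reliability=0.90)
reliably_changed = abs(rci) > 1.96  # change unlikely to be measurement error alone
```

An index beyond roughly plus or minus 1.96 is conventionally taken to indicate change unlikely to be due to measurement error, although that cutoff is itself a convention.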
Another common reason for outcomes assessment is to demonstrate the patient's need for therapeutic services beyond that which is typically covered by the patient's health care benefits. When assessment is conducted for this reason, the patient and the clinician both may benefit from the outcomes data. However, the type of information that a third-party payer requires for authorization of extended benefits may not always be useful, relevant, or beneficial to the patient or the clinician. What to Measure. The specific aspects or dimensions of patient functioning that are measured as part of outcomes assessment will depend on the purpose for which the assessment is being conducted. As discussed earlier, probably the most frequently measured variable is that of symptomatology or psychological/mental health status. After all, disturbance or disruption in this dimension is probably the most common reason why people seek behavioral health care services in the first place. However, there are other reasons for seeking help. Common examples include difficulties in coping with various types of life transitions (e.g., a new job, a recent marriage or divorce, other changes in the work or home environment), an inability to deal with the behavior of others (e.g., spouse, children), or general dissatisfaction with life. Additional assessment of related variables therefore may be necessary, or may even take precedence over the assessment of symptoms or other indicators. Nevertheless, in the vast majority of the cases seen for behavioral health care services, the assessment of the patient's overall level of psychological distress or disturbance will yield the most singularly useful information. This is regardless of whether it is used for outcomes measurement, outcomes monitoring, outcomes management, or to meet the requirements of third-party payers. 
Indices such as the Positive Symptom Total (PST) or Global Severity Index (GSI), which are part of both the SA-45 and the BSI, can provide this type of information efficiently and economically. For some patients, measures of one or more specific psychological disorders or symptom clusters are at least as important as, if not more important than, overall symptom or mental health status. Here, if interest is in only one disorder or symptom cluster (e.g., depression), clinicians may choose to measure only that particular set of symptoms using an instrument designed specifically for that purpose (e.g., use of the BDI with depressed patients). For those interested in assessing the outcomes of treatment relative to multiple psychological dimensions, the administration of more than one disorder-specific instrument or a single, multiscale instrument that assesses all or most of the dimensions of interest would be required. Again, instruments such as the SA-45 or the BSI can provide a quick, broad assessment of several symptom domains. Although much lengthier, other multiscale instruments, such as the MMPI-2 or the PAI, permit a more detailed assessment of multiple disorders or symptom domains using one inventory. In many cases, the assessment of mental health status alone is adequate for outcomes assessment purposes. There are other instances in which changes in psychological distress or disturbance either (a) provide only a partial indication of the degree to which therapeutic intervention has been successful and are not of interest to the patient or a third-party payer; (b) are unrelated to the reason why the patient sought services in the first place; or (c) are otherwise inadequate or unacceptable as measures of improvement in the patient's condition. One may find that for some patients, improved functioning on the job, at school, or with family or friends is much more relevant and important than symptom reduction.
For other patients, improvement in their quality of life or sense of well-being is more meaningful.
It is not always a simple matter to determine exactly what should be measured. However, careful consideration of the following questions should greatly facilitate the decision:

1. Why did the patient seek services? People pursue treatment for many reasons. The patient's stated reason for seeking therapeutic assistance may be the first clue in determining what is important to measure.

2. What does the patient hope to gain from treatment? The patients' stated goals for the treatment they are about to receive may be a primary consideration in the selection of outcomes to assess.

3. What are the patient's criteria for successful treatment? The patient's goals for treatment may provide only a broad target for the therapeutic intervention. Having the patient identify exactly what would have to happen to consider treatment successful and no longer necessary would help in specifying the most important constructs and/or behaviors to assess.

4. What are the clinician's criteria for the successful completion of the current therapeutic episode? What patients identify as being important to accomplish during treatment might reflect a lack of insight into their problems, or it might be inconsistent with what an impartial observer would consider indicative of meaningful improvement. In such cases, it probably would be more appropriate for the clinician to determine what constitutes therapeutic success and the associated outcomes variables.

5. What are the criteria for the successful completion of the current therapeutic episode by significant third parties? From a strict therapeutic perspective, this should be given the least amount of consideration. From a more realistic perspective, the expectations and limitations that one or more third parties have for the treatment rendered cannot be overlooked. The expectations and limitations set by the patient's parents/guardian, significant other, health care plan, guidelines of the organization in which the clinician practices, and possibly other external forces may significantly play into the decision about when to terminate treatment.

6. What, if any, are the outcomes initiatives within the provider organization? One cannot ignore any outcomes programs that have been initiated by the organization in which the therapeutic services are delivered. Regardless of the problems and goals of the individual patient, organization-wide studies of treatment effectiveness may dictate the gathering of specific types of outcomes data from patients who have received services.

Note that the selection of the variables to be assessed may address more than one of the previous issues. Ideally, this is what should happen. However, clinicians need to ensure that the task of gathering outcomes data does not become too burdensome. As a general rule, the more outcomes data they attempt to gather from a given patient or collateral, the less likely it is that they will obtain any data at all. The key is to identify the point at which the amount of data that can be obtained from a patient and/or collaterals and the ease with which it can be gathered are optimized. How to Measure. Once the decision of what to measure has been made, clinicians must then decide how it should be measured. In many cases, the most important data will be that obtained directly from the patient using self-report instruments. Underlying this assertion is the assumption that valid and reliable instrumentation, appropriate to the needs of the patient, is available to the clinician; the patient can read at the level required by the instruments; and the patient is motivated to respond honestly to the questions asked. Barring one or more of these conditions, other options should be considered. Other types of data-gathering tools may be substituted for self-report measures.
Rating scales completed by the clinician or other members of the treatment staff may provide information that is as useful as that elicited directly from the patient. In those
cases in which the patient is severely disturbed, unable to give valid and reliable answers (e.g., younger children), unable to read, or otherwise an inappropriate candidate for a self-report measure, clinical rating scales can serve as a valuable substitute for gathering information about the patient. Related to these instruments are parent-completed inventories for child and adolescent patients. These are particularly useful in obtaining information about the child's or teen's behavior that might not otherwise be known. Collateral rating instruments and parent report instruments can also be used to gather information in addition to that obtained from self-report measures. When used in this manner, these instruments provide a mechanism by which the clinician, other treatment staff, and/or parents, guardians, or other collaterals can contribute data to the outcomes assessment endeavor. This not only results in the clinician or provider organization having more information on which to evaluate the outcomes of therapeutic intervention, but it also gives the clinician an opportunity to ensure that the perspectives of the treatment provider and/or relevant third parties are considered in this evaluation. Another potential source of outcomes information is administrative data. In many of the larger provider organizations, this information can easily be retrieved through the organization's management information systems (MISs). Data related to the patient's diagnosis, dose and regimen of medication, physical findings, course of treatment, resource utilization, treatment costs, and other types of data typically stored in these systems can be useful in evaluating the outcomes of therapeutic intervention. When to Measure. There are no hard and fast rules or widely accepted conventions related to when outcomes should be assessed. The common practice is to assess the patient at least at treatment initiation and termination/discharge.
Obviously, at the beginning of treatment, the clinician should obtain a baseline measure of whatever variables will be measured at termination. At minimum, this allows for "outcomes measurement" as previously described. As has also been discussed, additional assessment of the patient on the variables of interest can take place at other points in time, that is, at other times during treatment and on postdischarge follow-up. Many would argue that postdischarge/posttermination follow-up assessment provides the best or most important indication of the outcomes of therapeutic intervention.

Two types of comparisons may be made on follow-up. The first is a comparison of the patient's status on the variables of interest, either at the time of treatment initiation or at the time of discharge or termination, to the patient's status at the time of follow-up assessment. Either way, these follow-up data provide an indication of the more lasting effects of the intervention. Generally, the variables of interest for this type of comparison include symptom presence and intensity, feelings of well-being, frequency of substance use, and social or role functioning.

The second type of posttreatment investigation involves comparing the frequency or severity of some aspect(s) of the patient's life circumstances, behavior, or functioning during an interval of time prior to treatment to that occurring during an equivalent period of time immediately preceding the postdischarge assessment. This approach is commonly used in determining the medical cost offset benefits of treatment. For example, the number of times a patient was seen in an emergency room for psychiatric problems during the 3-month period preceding the initiation of outpatient treatment can be compared to the number of emergency room visits during the 3-month period preceding the postdischarge follow-up assessment.
Not only can this provide an indication of the degree to which treatment has helped patients deal with their problems, it also can demonstrate how much medical expenses have been reduced through the patients' decreased use of costly emergency room services.
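The pre/post comparison just described reduces to simple arithmetic over equivalent time intervals. The following is a minimal sketch in Python; the visit counts and the per-visit cost are hypothetical illustration values, not figures from the text beyond the example of 10 pretreatment emergency room visits.

```python
# Sketch of a medical cost offset comparison over equivalent pre- and
# posttreatment intervals. All figures are hypothetical illustrations.

def cost_offset(pre_visits: int, post_visits: int, cost_per_visit: float) -> dict:
    """Compare event counts for two equivalent time intervals and estimate savings."""
    avoided = pre_visits - post_visits
    pct_reduction = 100.0 * avoided / pre_visits if pre_visits else 0.0
    return {
        "visits_avoided": avoided,
        "percent_reduction": pct_reduction,
        "estimated_savings": avoided * cost_per_visit,
    }

# 10 ER visits in the 3 months before treatment vs. 2 in the 3 months
# preceding the postdischarge follow-up; $850 per visit is assumed.
result = cost_offset(pre_visits=10, post_visits=2, cost_per_visit=850.0)
```

With these values, eight avoided visits represent an 80% reduction; the key design point, as the text notes, is that the two observation windows must be of equal length.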


In general, postdischarge outcomes assessment probably should take place no sooner than 1 month after treatment has ended. When feasible, waiting 3 to 6 months to assess the variables of interest is preferred; a longer interval between discharge and postdischarge follow-up should provide a more valid indication of the lasting effects of treatment. Assessments conducted to determine the frequency at which some behavior or event occurs (as may be needed to determine cost offset benefits) should be administered no sooner than the reference time interval used in the baseline assessment. For example, suppose that a patient reports 10 emergency room visits during the 3-month period prior to treatment. If a clinician wants to know whether the patient's emergency room visits have decreased after treatment, the assessment cannot take place any earlier than 3 months after treatment termination.

How to Analyze Outcomes Data. There are two general approaches to the analysis of treatment outcomes data. The first is to determine whether changes in patient scores on outcomes measures are statistically significant; the second is to establish whether these changes are clinically significant. Standard tests of statistical significance are important in the analysis of group or population change data; this topic is addressed in chapter 8. Clinical significance is more relevant to change in an individual patient's scores. Because this chapter focuses on the individual patient, this section centers on determining clinically significant change as the result of treatment.

The issue of clinical significance has received a great deal of attention in psychotherapy research during the past several years. This is at least partially owing to the work of Jacobson and his colleagues (Jacobson, Follette, & Revenstorf, 1984, 1986; Jacobson & Truax, 1991) and others (e.g., Christensen & Mendoza, 1986; Speer, 1992; Wampold & Jenson, 1986).
Their work came at a time when researchers began to recognize that traditional statistical comparisons do not reveal a great deal about the efficacy of therapy. In discussing the topic, Jacobson and Truax broadly defined the clinical significance of treatment as "its ability to meet standards of efficacy set by consumers, clinicians, and researchers" (p. 12). Further, they noted that although there is little consensus in the field regarding what these standards should be, various criteria have been suggested:

a high percentage of clients improving . . . ; a level of change that is recognizable by peers and significant others . . . ; an elimination of the presenting problem . . . ; normative levels of functioning at the end of therapy . . . ; high end-state functioning at the end of therapy . . . ; or changes that significantly reduce one's risk for various health problems. (p. 12)

From their perspective, Jacobson and his colleagues (Jacobson et al., 1984; Jacobson & Truax, 1991) felt that clinically significant change could be conceptualized in one of three ways. For clinically significant change to have occurred, the measured level of functioning following the therapeutic episode would either (a) fall outside the range of the dysfunctional population by at least two standard deviations from the mean of that population, in the direction of functionality; (b) fall within two standard deviations of the mean of the normal or functional population; or (c) be closer to the mean of the functional population than to that of the dysfunctional population. Jacobson and Truax viewed the third option as the least arbitrary, and they provided different recommendations for determining cutoffs for clinically significant change, depending on the availability of normative data. At the same time, these same investigators noted the importance of considering the change in the measured variables of interest from pre- to posttreatment in addition to


the patient's functional status at the end of therapy. To this end, Jacobson et al. (1984) proposed the concomitant use of a reliable change (RC) index to determine whether change is clinically significant. This index, modified on the recommendation of Christensen and Mendoza (1986), is nothing more than the pretest score minus the posttest score, divided by the standard error of the difference of the two scores. The RC index and its use are discussed in detail in chapter 4.

Psychological Assessment as a Tool for Continuous Quality Improvement

Implementing a regimen of psychological testing for planning treatment and/or assessing its outcome has a place in all organizations where the delivery of cost-efficient, quality behavioral health care services is a primary goal. However, additional benefits can accrue from testing when it is incorporated within an ongoing program of service evaluation and continuous quality improvement (CQI). Although espoused by Americans, the CQI philosophy was initially implemented by the Japanese in rebuilding their economy after World War II. Today, many U.S. organizations have sought to balance quality with cost by implementing CQI procedures. Simply put, CQI may be viewed as a process of continuously setting goals, measuring progress toward the achievement of those goals, and subsequently reevaluating those goals in light of the progress made.

Underlying the CQI process are a few simple assumptions. First, organizations that can produce high-quality products or services at the lowest possible cost have the best chance of surviving and prospering in today's competitive market. Second, it is less costly to prevent errors than to correct them, and the process of preventing errors is a continuous one. Third, it is assumed that the workers within the organization are motivated and empowered to improve the quality of their products or services based on the information they receive about their work.
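The reliable change index and cutoff criteria described in the preceding section can be made concrete in a few lines of code. This is a minimal sketch in Python; the pre/post scores, normative means, standard deviations, and test-retest reliability are hypothetical illustration values, not parameters of any published instrument.

```python
import math

def rc_index(pre: float, post: float, sd: float, reliability: float) -> float:
    """Reliable change index: pretest minus posttest score, divided by the
    standard error of the difference (Christensen-Mendoza modification)."""
    se_measurement = sd * math.sqrt(1.0 - reliability)    # SE of measurement
    se_difference = math.sqrt(2.0 * se_measurement ** 2)  # SE of the difference
    return (pre - post) / se_difference

def functional_cutoff(mean_dys: float, sd_dys: float,
                      mean_fun: float, sd_fun: float) -> float:
    """Cutoff for the third conceptualization: the point beyond which a score
    is closer to the functional population's mean than to the dysfunctional one's."""
    return (sd_dys * mean_fun + sd_fun * mean_dys) / (sd_dys + sd_fun)

# Hypothetical symptom measure (lower = better): dysfunctional norms
# M = 30, SD = 8; functional norms M = 10, SD = 6; reliability r = .88.
rc = rc_index(pre=32.0, post=14.0, sd=8.0, reliability=0.88)
c = functional_cutoff(mean_dys=30.0, sd_dys=8.0, mean_fun=10.0, sd_fun=6.0)

reliable = abs(rc) > 1.96    # change larger than measurement error alone
crossed_cutoff = 14.0 < c    # posttest score is on the functional side
clinically_significant = reliable and crossed_cutoff
```

With these illustration values the RC index is roughly 4.6 and the cutoff falls near 18.6, so the change would count as both reliable and, under the third (least arbitrary) criterion, clinically significant.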
More information about CQI can be found in several sources (e.g., Berwick, 1989; Dertouzos, Lester, & Solow, 1989; Donabedian, 1980, 1982, 1985; P.L. Johnson, 1989; Scherkenback, 1987; Shewhart, 1939; Walton, 1986).

A continuous setting, measurement, and reevaluation of goals, characteristic of the CQI process, is being employed by many health care organizations as part of their efforts to survive in a competitive, changing market. At least in part, this move also reflects what InterStudy (a predecessor of the Health Outcomes Institute) described as a "shifting from concerns about managing costs in isolation to a more comprehensive view that supplements an understanding of costs with an understanding of the quality and value of care delivered" (1991, p. 1). InterStudy defined quality as a position or view that should lead all processes within a system. In the case of the health care system, the most crucial of these processes is patient care. InterStudy pointed out that with a CQI orientation, these processes must be well defined, agreed on, and implemented unvaryingly when delivering care. They also should provide measurable results that will subsequently lead to conclusions about how the processes might be altered to improve the results of care. InterStudy considered CQI to imply "a system that articulates the connections between inputs and outputs, between processes and outcomes . . . , a way of organizing information in order to discover what works, and what doesn't" (p. 1).


In behavioral health care, as in other arenas of health care, CQI is concerned with the services delivered to customers. Here, the "customer" may include not only the patient being treated, but also the employer through whom the health care plan is offered and the third-party payer who selects or approves the service providers who can be accessed by individuals seeking care under the plan. It should be evident from the discussion presented throughout this chapter that psychological testing can help the provider focus on delivering the most efficient and effective treatment in order to satisfy the needs of all "customers." It thus can contribute greatly to the CQI effort.

Perhaps the most apparent way in which testing can augment the CQI process is through its contributions to outcomes assessment. Through the repeated administration of tests to all patients at intake and again at one or more points during or after the treatment process, an organization can obtain a good sense of how effective individual clinicians, treatment programs/units, and/or the organization as a whole are in providing services to patients. This testing might include not only problem-oriented measures but also measures of patient satisfaction. Considered in light of other, nontest data, the results may lead to changes in service delivery goals, such as the implementation of more effective problem identification and treatment planning procedures. An example is Newman's (1991) graphic demonstration of how data used to support treatment decisions can be extended to indicate how various levels of depression (as measured by the Beck Depression Inventory) may best be served by different types of treatment (e.g., inpatient vs. outpatient).

Future Directions

The ways in which psychologists and other behavioral health care clinicians conduct the types of psychological assessment described here have continued to undergo dramatic changes during the 1990s.
This should come as no surprise to anyone who spends a few minutes a day skimming the newspaper or watching the evening news. The health care revolution started gaining momentum at the beginning of the decade and has not slowed since that time, and there are no indications that it will subside in the foreseeable future. From the beginning, there was no real reason to think that behavioral health care would be spared from the effects of the health care revolution, and there is no good reason why it should have been spared. The behavioral health care industry certainly has contributed its share of waste, inefficiency, and lack of accountability to the problems that led to the revolution. Now, like other areas of health care, it is forced to "clean up its act." Although some consumers of mental health or chemical dependency services have benefited from the revolution, others have not. Regardless, the way in which health care is delivered and financed has changed, and psychologists and other behavioral health care professionals must adapt to survive in the market.

Some of those involved in the delivery of psychological assessment services may wonder (with some fear and trepidation) where the revolution is leading the behavioral health care industry and, in particular, how their ability to practice will be affected. At the same time, others eagerly await the inevitable advances in technology and other resources that will come with the passage of time. What ultimately will occur is open to speculation. However, close observation of the practice of psychological assessment and the various industries that support it has led to a few predictions as to where


the field of psychological assessment is headed, along with the implications for patients, clinicians, and provider organizations.

What the Industry Is Moving Away From

One way of discussing what the field is moving toward is to first consider what it is moving away from. In the case of psychological assessment, two trends have become quite clear. First, since the early 1990s, the use of (and reimbursement for) psychological assessment has gradually been curtailed. In particular, this has been the case with the indiscriminate administration of lengthy and expensive psychological test batteries. Payers began to demand evidence that the knowledge gained from administering these instruments in fact contributes to the delivery of cost-effective, efficient care to patients. There is no indication that this trend will stop.

Second, the form of assessment commonly used is moving away from the lengthy, multidimensional objective instruments (e.g., MMPI) and time-consuming projective techniques (e.g., Rorschach) that previously represented the standard of practice. The type of assessment authorized now usually involves brief, inexpensive, yet well-validated problem-oriented instruments. This reflects modern behavioral health care's time-limited, problem-oriented approach to treatment. Today, the clinician can no longer afford to spend a great deal of time in assessment when the patient is allowed only a limited number of payer-authorized sessions. Thus, brief instruments will become more commonly employed for problem identification, progress monitoring, and outcomes assessment in the foreseeable future.

Trends in Instrumentation

In addition to the move toward brief, problem-oriented instruments, another trend in the selection of instrumentation is the increasing use of public domain tests, questionnaires, rating scales, and other measurement tools.
Typically, these "free-use" instruments were not developed with the same rigor that commercial test publishers apply in developing psychometrically sound instruments. Consequently, they commonly lacked the validity and reliability data necessary to judge their psychometric integrity. Recently, however, there has been significant improvement in the quality and documentation of the public domain and other free-use tests that are available. Instruments such as the SF-36/SF-12 and HSQ/HSQ-12 health measures are good examples of such tools. These and instruments such as the Behavior and Symptom Identification Scale (BASIS-32; Eisen, Grob, & Klein, 1986) and the Outcome Questionnaire (OQ-45.1; Lambert, Lunnen, Umphress, Hansen, & Burlingame, 1994) have undergone psychometric scrutiny and have gained widespread acceptance as a result. Although copyrighted, these instruments may be used for a nominal one-time or annual licensing fee; thus, they generally are treated much like public domain assessment tools. In the future, other high-quality, useful instruments will likely be made available for use at little or no cost.

As for the types of instrumentation that will be needed and developed, some changes can probably be expected. Accompanying the increasing focus on outcomes assessment is a recognition by payers and patients that positive change in several areas of functioning is at least as important as change in level of symptom severity when evaluating


treatment effectiveness. For example, employers are interested in patients' ability to resume the functions of their jobs, whereas family members likely are concerned with patients' ability to resume their roles as spouses or parents. Increasingly, measurement of the patient's functioning in areas other than psychological/mental status has come to be included in behavioral health care outcomes systems. Probably the most visible indications of this are the incorporation of the SF-36 or HSQ in various behavioral health care studies and the fact that at least three major psychological test publishers now offer HSQ products in their clinical product catalogs. Other public domain and commercially available nonsymptom-oriented instruments, especially those emphasizing social and occupational role functioning, will likely appear in increasing numbers over the next several years.

Other types of instrumentation also will become prominent. These may well include measures of variables that support outcomes and other assessment initiatives undertaken by provider organizations. What one organization or provider believes is important, or what payers determine is important for reimbursement or other purposes, will dictate what is measured. Instrumentation also may include measures useful in predicting outcomes for individuals seeking specific psychotherapeutic services from those organizations.

Conclusions

The health care revolution has brought mixed blessings to those in the behavioral health care professions. It has limited reimbursement for services rendered and has forced many to change the way they practice their profession. At the same time, it has led to revelations about the cost savings that can accrue from the treatment of mental health and substance use disorders. This has been the bright spot in an otherwise bleak picture for some behavioral health care professionals.
But for psychologists and others trained in psychological assessment procedures, the picture appears somewhat different. They now have additional opportunities to contribute to the positive aspects of the revolution and to gain from the "new order" it has imposed. By virtue of their training and through the application of appropriate instrumentation, they are uniquely qualified to support or otherwise facilitate multiple aspects of the therapeutic process.

Earlier in this chapter, some of the types of psychological assessment instruments that are commonly used in therapeutic endeavors were identified. These included both brief and lengthy (multidimensional) symptom measures, as well as measures of general health status, quality of life, role functioning, and patient satisfaction. Also identified were different sets of general criteria that can be applied when selecting instruments for use in therapeutic settings. The main intent of this chapter, however, was to present an overview of the various ways in which psychological assessment can be used to facilitate the selection, implementation, and evaluation of appropriate therapeutic interventions in behavioral health care settings.

Generally, psychological assessment can assist the clinician in three important clinical activities: clinical decision making, treatment (when used as a specific therapeutic technique), and treatment outcomes evaluation. Regarding the first of these activities, three important clinical decision-making functions can be facilitated by psychological assessment: screening, treatment planning, and treatment monitoring. The first of these can be served by the use of brief instruments designed to identify, with a high degree of


certainty, the likely presence (or absence) of a particular condition or characteristic. Here, the diagnostic efficiency of the instrument used (as indicated by its PPP and NPP) is of great importance. Through their ability to identify and clarify problems, as well as other important treatment-relevant patient characteristics, psychological assessment instruments also can be of great assistance in planning treatment. In addition, treatment monitoring, or the periodic evaluation of the patient's progress during the course of treatment, can be served well by the application of psychological assessment instruments.

Second, assessment may be used as part of a therapeutic technique. In what Finn termed "therapeutic assessment," situations in which patients are evaluated via psychological testing are used as opportunities for the assessment process itself to serve as a therapeutic intervention. This is accomplished by involving the patient as an active participant in the assessment process, not just as the object of the assessment.

Third, psychological assessment can be employed as the primary mechanism by which the outcomes or results of treatment are measured. However, use of assessment for this purpose is not a cut-and-dried matter. Issues pertaining to what to measure, how to measure, and when to measure require considerable thought before undertaking a plan to assess outcomes. Guidelines for resolving these issues are presented, as is information on how to determine whether the measured outcomes of treatment are indeed "significant." The role that outcomes assessment can play in an organization's CQI initiative also was discussed.

The final section of the chapter shared some thoughts about where psychological assessment is headed in the future. In general, what is foreseen is the appearance of more high-quality, affordable instrumentation designed to assess various aspects of a patient's functioning.
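The diagnostic efficiency figures mentioned above, positive and negative predictive power (PPP and NPP), follow from an instrument's sensitivity and specificity together with the base rate of the condition in the screened population. The following is a minimal sketch in Python; the operating characteristics and base rate are hypothetical illustration values.

```python
# Sketch of screening efficiency via Bayes' rule. Sensitivity, specificity,
# and base rate below are hypothetical illustration values.

def predictive_powers(sensitivity: float, specificity: float,
                      base_rate: float) -> tuple[float, float]:
    """Return (PPP, NPP) given a test's sensitivity, specificity, and the
    base rate (prevalence) of the condition among those screened."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    true_neg = specificity * (1.0 - base_rate)
    false_neg = (1.0 - sensitivity) * base_rate
    ppp = true_pos / (true_pos + false_pos)   # P(condition | positive screen)
    npp = true_neg / (true_neg + false_neg)   # P(no condition | negative screen)
    return ppp, npp

# A screen with 90% sensitivity and 85% specificity, applied where the
# condition's base rate is 10%.
ppp, npp = predictive_powers(sensitivity=0.90, specificity=0.85, base_rate=0.10)
```

The example illustrates why base rates matter in screening: with a 10% base rate, even this sensitive and specific screen yields a PPP of only .40, although its NPP exceeds .98.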
There is no doubt that the practice of psychological assessment has been dealt a blow in recent years. However, clinicians trained in the use of psychological tests and related instrumentation have the skills to take these powerful tools, apply them in ways that will benefit those suffering from mental health and substance abuse problems, and demonstrate their value to patients and payers. Only time will tell whether they will be successful in this demonstration. Meanwhile, the field will continue to make advancements that facilitate and improve the quality of its work.

A Final Word

As suggested earlier, psychologists' training in psychological testing should provide them with an edge in surviving the evolving revolution in mental health service delivery. Maximizing their ability to use the "tools of the trade" to facilitate problem identification, subsequent planning of appropriate treatment, and measurement and documentation of the effectiveness of their efforts can only aid clinicians in their quest for optimal efficiency and quality of service. It is hoped that the information and guidance provided by the many distinguished contributors to this volume will assist practicing psychologists, psychologists-in-training, and other behavioral health care providers in maximizing the resources available to them and thus in prospering in the emerging new health care arena. This is a time of uncertainty and perhaps some anxiety. It is also a time of great opportunity. How practitioners choose to face the current state of affairs is a matter of personal and professional choice.


Acknowledgment

This chapter is adapted from M.E. Maruish, "Therapeutic Assessment: Linking Assessment and Treatment," in M. Hersen & A. Bellack (Series Eds.) & C.R. Reynolds (Vol. Ed.), Comprehensive clinical psychology: Vol. 4. Assessment (in press), with permission from Elsevier Science Ltd., The Boulevard, Langford Lane, Kidlington OX5 1GB, UK.

References

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
American Psychological Association. (1992). Ethical principles. Washington, DC: Author.
American Psychological Association. (1996). The costs of failing to provide appropriate mental health care. Washington, DC: Author.
Andrews, G., Peters, L., & Teesson, M. (1994). The measurement of consumer outcomes in mental health. Canberra, Australia: Australian Government Publishing Service.
Appelbaum, S.A. (1990). The relationship between assessment and psychotherapy. Journal of Personality Assessment, 54, 791-801.
Attkisson, C.C., & Zwick, R. (1982). The Client Satisfaction Questionnaire: Psychometric properties and correlations with service utilization and psychotherapy outcome. Evaluation and Program Planning, 6, 233-237.
Beck, A.T., Rush, A.J., Shaw, B.F., & Emery, G. (1979). Cognitive therapy of depression. New York: Guilford.
Berwick, D.M. (1989). Sounding board: Continuous improvement as an ideal in health care. New England Journal of Medicine, 320, 53-56.
Beutler, L.E., & Clarkin, J. (1990). Systematic treatment selection: Toward targeted therapeutic interventions. New York: Brunner/Mazel.
Beutler, L.E., Wakefield, P., & Williams, R.E. (1994). Use of psychological tests/instruments for treatment planning. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 55-74). Hillsdale, NJ: Lawrence Erlbaum Associates.
Beutler, L.E., & Williams, O.B. (1995). Computer applications for the selection of optimal psychosocial therapeutic interventions. Behavioral Healthcare Tomorrow, 4, 66-68.
Brook, R.H., McGlynn, E.A., & Cleary, P.D. (1996). Quality of health care: Part 2. Measuring quality of care. New England Journal of Medicine, 335, 966-970.
Burlingame, G.M., Lambert, M.J., Reisinger, C.W., Neff, W.M., & Mosier, J. (1995). Pragmatics of tracking mental health outcomes in a managed care setting. Journal of Mental Health Administration, 22, 226-236.
Butcher, J.N. (1990). The MMPI-2 in psychological treatment. New York: Oxford University Press.
Butcher, J.N., Dahlstrom, W.G., Graham, J.R., Tellegen, A.M., & Kaemmer, B. (1989). MMPI-2: Manual for administration and scoring. Minneapolis, MN: University of Minnesota Press.
Butcher, J.N., Graham, J.R., Williams, C.L., & Ben-Porath, Y. (1989). Development and use of the MMPI-2 content scales. Minneapolis, MN: University of Minnesota Press.
Cagney, T., & Woods, D.R. (1994). Why focus on outcomes data? Behavioral Healthcare Tomorrow, 3, 65-67.
Centers for Disease Control and Prevention. (1994, May 27). Quality of life as a new public health measure: Behavioral risk factor surveillance system. Morbidity and Mortality Weekly Report, 43, 375-380.
Christensen, L., & Mendoza, J.L. (1986). A method of assessing change in a single subject: An alteration of the RC index [Letter to the editor]. Behavior Therapy, 17, 305-308.
Ciarlo, J.A., Brown, T.R., Edwards, D.W., Kiresuk, T.J., & Newman, F.L. (1986). Assessing mental health treatment outcomes measurement techniques (DHHS Publication No. ADM 86-1301). Washington, DC: U.S. Government Printing Office.
Derogatis, L.R. (1983). SCL-90-R: Administration, scoring and procedures manual II. Baltimore: Clinical Psychometric Research.


Derogatis, L.R. (1992). BSI: Administration, scoring and procedures manual II. Baltimore: Clinical Psychometric Research.
Derogatis, L.R., & DellaPietra, L. (1994). Psychological tests in screening for psychiatric disorder. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 22-54). Hillsdale, NJ: Lawrence Erlbaum Associates.
Derogatis, L.R., Lipman, R.S., & Covi, L. (1973). SCL-90: An outpatient psychiatric rating scale: Preliminary report. Psychopharmacology Bulletin, 9, 13-27.
Dertouzos, M.L., Lester, R.K., & Solow, R.M. (1989). Made in America: Regaining the productive edge. Cambridge, MA: MIT Press.
Dickey, B., & Wagenaar, H. (1996). Evaluating health status. In L.I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 55-60). Baltimore: Williams & Wilkins.
Donabedian, A. (1980). Explorations in quality assessment and monitoring: The definition of quality and approaches to its assessment (Vol. 1). Ann Arbor, MI: Health Administration Press.
Donabedian, A. (1982). Explorations in quality assessment and monitoring: The criteria and standards of quality (Vol. 2). Ann Arbor, MI: Health Administration Press.
Donabedian, A. (1985). Explorations in quality assessment and monitoring: The methods and findings in quality assessment: An illustrated analysis (Vol. 3). Ann Arbor, MI: Health Administration Press.
Dorwart, R.A. (1996). Outcomes management strategies in mental health: Applications and implications for clinical practice. In L.I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 45-54). Baltimore: Williams & Wilkins.
Dowd, E.T., Milne, C.R., & Wise, S.L. (1991). The Therapeutic Reactance Scale: A measure of psychological reactance. Journal of Counseling and Development, 69, 541-545.
Eisen, S.V., Grob, M.C., & Klein, A.A. (1986). BASIS: The development of a self-report measure for psychiatric inpatient evaluation. The Psychiatric Hospital, 17, 165-171.
Fee, practice and managed care survey. (1995, January). Psychotherapy Finances, 21(1), Issue 249.
Ficken, J. (1995). New directions for psychological testing. Behavioral Health Management, 20, 12-14.
Finn, S.E. (1996a). Assessment feedback integrating MMPI-2 and Rorschach findings. Journal of Personality Assessment, 67, 543-557.
Finn, S.E. (1996b). Manual for using the MMPI-2 as a therapeutic intervention. Minneapolis, MN: University of Minnesota Press.
Finn, S.E., & Butcher, J.N. (1991). Clinical objective personality assessment. In M. Hersen, A.E. Kazdin, & A.S. Bellack (Eds.), The clinical psychology handbook (2nd ed., pp. 362-373). New York: Pergamon.
Finn, S.E., & Martin, H. (1997). Therapeutic assessment with the MMPI-2 in managed health care. In J.N. Butcher (Ed.), Personality assessment in managed care (pp. 131-152). Minneapolis, MN: University of Minnesota Press.
Finn, S.E., & Tonsager, M.E. (1992). Therapeutic effects of providing MMPI-2 test feedback to college students awaiting therapy. Psychological Assessment, 4, 278-287.
Friedman, R., Sobel, D., Myers, P., Caudill, M., & Benson, H. (1995). Behavioral medicine, clinical health psychology, and cost offset. Health Psychology, 14, 509-518.
Future targets behavioral health field's quest for survival. (1996, April 8). Mental Health Weekly, pp. 1-2.
Gough, H.G., McClosky, H., & Meehl, P.E. (1951). A personality scale for dominance. Journal of Abnormal and Social Psychology, 46, 360-366.
Gough, H.G., McClosky, H., & Meehl, P.E. (1952). A personality scale for social responsibility. Journal of Abnormal and Social Psychology, 47, 73-80.
Greene, R.L. (1991). The MMPI-2/MMPI: An interpretive manual. Boston: Allyn & Bacon.
Greene, R.L., & Clopton, J.R. (1994). Minnesota Multiphasic Personality Inventory-2. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 137-159). Hillsdale, NJ: Lawrence Erlbaum Associates.
Greenfield, T.K., & Attkisson, C.C. (1989). Progress toward a multifactorial service satisfaction scale for evaluating primary care and mental health services. Evaluation and Program Planning, 12, 271-278.
Hathaway, S.R., & McKinley, J.C. (1951). MMPI manual. New York: The Psychological Corporation.
Health Outcomes Institute. (1993). Health Status Questionnaire 2.0 manual. Bloomington, MN: Author.
Holder, H.D., & Blose, J.O. (1986). Alcoholism treatment and total health care utilization and costs: A 4-year longitudinal analysis of federal employees. Journal of the American Medical Association, 256, 1456-1460.
InterStudy. (1991). Preface. The InterStudy Quality Edge, 1, 1-3.
Jacobson, N.S., Follette, W.C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336-352.
Jacobson, N.S., Follette, W.C., & Revenstorf, D. (1986). Toward a standard definition of clinically significant change [Letter to the editor]. Behavior Therapy, 17, 309-311.
Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
Jahoda, M. (1958). Current concepts of mental health. New York: Basic Books.
Johnson, J., Weissman, M., & Klerman, G.J. (1992). Service utilization and social morbidity associated with depressive symptoms in the community. Journal of the American Medical Association, 267, 1478-1483.
Johnson, P.L. (1989). Keeping score: Strategies and tactics for winning the quality war. New York: Harper & Row.
Kiesler, C.A., & Morton, T.L. (1988). Psychology and public policy in the "health care revolution." American Psychologist, 43, 993-1003.
Lambert, M.J., Lunnen, K., Umphress, V., Hansen, N.B., & Burlingame, G.M. (1994). Administration and scoring manual for the Outcome Questionnaire (OQ-45.1). Salt Lake City, UT: IHC Center for Behavioral Healthcare Efficacy.
Larsen, D.L., Attkisson, C.C., Hargreaves, W.A., & Nguyen, T.D. (1979). Assessment of client/patient satisfaction: Development of a general scale. Evaluation and Program Planning, 2, 197-207.
Leaders predict integration of MH, primary care by 2000. (1996, April 8). Mental Health Weekly, pp. 1, 6.
LeVois, M., Nguyen, T.D., & Attkisson, C.C. (1981). Artifact in client satisfaction assessment: Experience in community mental health settings. Evaluation and Program Planning, 4, 139-150.
Maruish, M. (1990). Psychological assessment: What will its role be in the future? Assessment Applications, Fall, 7-8.
Maruish, M.E. (1994). Introduction. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 3-21). Hillsdale, NJ: Lawrence Erlbaum Associates.
Megargee, E.I., & Spielberger, C.D. (1992). Reflections on 50 years of personality assessment and future directions for the field. In E.I. Megargee & C.D. Spielberger (Eds.), Personality assessment in America (pp. 170-190). Hillsdale, NJ: Lawrence Erlbaum Associates.
Migdail, K.J., Youngs, M.T., & Bengen-Seltzer, B. (Eds.). (1995). The 1995 behavioral outcomes and guidelines sourcebook. New York: Faulkner & Gray.
Millon, T. (1994). MCMI-III manual. Minneapolis, MN: National Computer Systems.
Moreland, K.L. (1996). How psychological testing can reinstate its value in an era of cost containment. Behavioral Healthcare Tomorrow, 5, 59-61.
Morey, L.C. (1991). The Personality Assessment Inventory professional manual. Odessa, FL: Psychological Assessment Resources.
Morey, L.C., & Henry, W. (1994). Personality Assessment Inventory. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 185-216). Hillsdale, NJ: Lawrence Erlbaum Associates.
Newman, F.L. (1991). Using assessment data to relate patient progress to reimbursement criteria. Assessment Applications, Summer, 4-5.
Newman, F.L. (1994). Selection of design and statistical procedures for progress and outcome assessment. In M.E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 111-134). Hillsdale, NJ: Lawrence Erlbaum Associates.
Nguyen, T.D., Attkisson, C.C., & Stegner, B.L. (1983). Assessment of patient satisfaction: Development and refinement of a service evaluation questionnaire. Evaluation and Program Planning, 6, 299-313.
Olfson, M., & Pincus, H.A. (1994a). Outpatient psychotherapy in the United States: I. Volume, costs, and user characteristics. American Journal of Psychiatry, 151, 1281-1288.
Olfson, M., & Pincus, H.A. (1994b). Outpatient psychotherapy in the United States: II. Patterns of utilization. American Journal of Psychiatry, 151, 1289.
Oss, M.E. (1996). Managed behavioral health care: A look at the numbers. Behavioral Health Management, 16, 16-17.
Pallak, M.S. (1994). National outcomes management survey: Summary report. Behavioral Healthcare Tomorrow, 3, 63-69.
Phelps, R. (1996, February). Preliminary practitioner survey results enhance APA's understanding of health care environment. Practitioner Focus, 9, 5.
Radosevich, D., & Pruitt, M. (1996). Twelve-item Health Status Questionnaire (HSQ-12) Version 2.0 user's guide. Bloomington, MN: Health Outcomes Institute.
Radosevich, D.M., Wetzler, H., & Wilson, S.M. (1994). Health Status Questionnaire (HSQ) 2.0: Scoring comparisons and reference data. Bloomington, MN: Health Outcomes Institute.
Rouse, B.A. (Ed.). (1995). Substance abuse and mental health statistics sourcebook (DHHS Publication No. SMA 95-3064). Washington, DC: Superintendent of Documents, U.S. Government Printing Office.
Saravay, S.M., Pollack, S., Steinberg, M.D., Weinschel, B., & Habert, M. (1996). Four-year follow-up of the influence of psychological comorbidity on medical rehospitalization. American Journal of Psychiatry, 153, 397-403.
Scherkenbach, W.W. (1987). The Deming route to quality and productivity: Road maps and roadblocks. Rockville, MD: Mercury Press/Fairchild Publications.
Schlosser, B. (1995). The ecology of assessment: A "patient-centric" perspective. Behavioral Healthcare Tomorrow, 4, 66-68.
Sederer, L.I., Dickey, B., & Hermann, R.C. (1996). The imperative of outcomes assessment in psychiatry. In L.I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 1-7). Baltimore: Williams & Wilkins.
Shewhart, W.A. (1939). Statistical methods from the viewpoint of quality control. Washington, DC: U.S. Department of Agriculture Graduate School.
Simmons, J.W., Avant, W.S., Demski, J., & Parisher, D. (1988). Determining successful pain clinic treatment through validation of cost effectiveness. Spine, 13, 34.
Simon, G.E., VonKorff, M., & Barlow, W. (1995). Health care costs of primary care patients with recognized depression. Archives of General Psychiatry, 52, 850-856.
Sipkoff, M.Z. (1995, August). Behavioral health treatment reduces medical costs: Treatment of mental disorders and substance abuse problems increases productivity in the workplace. Open Minds, 12.
Speer, D.C. (1992). Clinically significant change: Jacobson and Truax (1991) revisited. Journal of Consulting and Clinical Psychology, 60, 402-408.
Spielberger, C.D. (1983). Manual of the State-Trait Anxiety Inventory: STAI (Form Y). Palo Alto, CA: Consulting Psychologists Press.
Stewart, A.L., & Ware, J.E., Jr. (1992). Measuring functioning and well-being. Durham, NC: Duke University Press.
Strain, J.J., Lyons, J.S., Hammer, J.S., Fahs, M., Lebovits, A., Paddison, P.L., Snyder, S., Strauss, E., Burton, R., & Nuber, G. (1991). Cost offset from a psychiatric consultation-liaison intervention with elderly hip fracture patients. American Journal of Psychiatry, 148, 1044-1049.
Strategic Advantage, Inc. (1996). Symptom Assessment-45 Questionnaire manual. Minneapolis, MN: Author.
Substance Abuse Funding News. (1995, December 22). Brown resigns drug post. p. 7.
Substance Abuse and Mental Health Services Administration. (1996, August). Preliminary estimates from the 1995 National Household Survey on Drug Abuse (SAMHSA Advanced Rep. No. 18). Rockville, MD: Author.
Walton, M. (1986). The Deming management method. New York: Dodd, Mead.
Wampold, B.E., & Jenson, W.R. (1986). Clinical significance revisited [Letter to the editor]. Behavior Therapy, 17, 302-305.
Ware, J.E., Kosinski, M., & Keller, S.D. (1995). SF-12: How to score the SF-12 physical and mental summary scales (2nd ed.). Boston: New England Medical Center, The Health Institute.
Ware, J.E., & Sherbourne, C.D. (1992). The MOS 36-Item Short Form Health Survey (SF-36): I. Conceptual framework and item selection. Medical Care, 30, 473-483.
Ware, J.E., Snow, K.K., Kosinski, M., & Gandek, B. (1993). SF-36 Health Survey manual and interpretation guide. Boston: New England Medical Center, The Health Institute.
Werthman, M.J. (1995). A managed care approach to psychological testing. Behavioral Health Management, 15, 15-17.

For Abby, Katie, and Shelby


Chapter 2
Psychological Tests in Screening for Psychiatric Disorder

Leonard R. Derogatis
Larry L. Lynn
Clinical Psychometric Research, Inc., and Loyola College of Maryland

From a historical perspective, psychology represents one of the pioneering disciplines in the development of screening models and selection algorithms. Early on, psychologists recognized one of the fundamental tenets of screening: Early detection can substantially improve the probability of delivering effective treatment. Psychologists were also among the first to formally describe the methodologies of establishing the reliability and validity of screening tests, and the differential implications of false positive and false negative errors in prediction. In addition, psychologists have argued consistently in favor of the inherent cost-efficiency of psychiatric screening programs, a critical issue with current health care costs approaching runaway proportions. Concerning the issue of cost, the evidence is extremely compelling that the implementation of screening paradigms for psychiatric disorder, particularly in primary care populations, can substantially reduce costs and measurably improve treatment effectiveness of primary medical disorders.

Psychiatric screening programs represent an effective response to the reality that the majority of individuals with psychiatric disorders in society (many estimate as high as 75%) are never seen by a mental health professional. The large majority of such cases in the health system are attended to by primary care physicians. So, in spite of possessing highly reliable, valid methods for the evaluation of psychological status, treatment planning and outcomes assessment essentially address just the "tip of the iceberg."
Assessment methods are irrelevant for a large majority of those who would derive benefit from their utilization, because the psychiatric disorders of these individuals are rarely recognized by the primary care professionals who provide their health care. This being the case, routine screening for mental disorders in cohorts known to be at elevated risk (e.g., college students, patients with chronic medical illnesses, specific elderly populations) would identify a significantly larger proportion of covert psychological disorders, and do so at an earlier stage of their illnesses, thereby reducing the potential for cumulative morbidity.


Overview of Screening

The Concept of Screening

Screening has been defined traditionally as "the presumptive identification of unrecognized disease or defect by the application of tests, examinations or other procedures which can be applied rapidly to sort out apparently well persons who probably have a disease from those who probably do not" (Commission on Chronic Illness, 1957, p. 45). Screening is an operation conducted in an ostensibly well population in order to identify occult instances of the disease or disorder in question. Some authorities make a distinction between screening and case finding, which is specified as the ascertainment of disease in populations comprised of patients with other disorders. Under such a distinction, the detection of psychiatric disorders among medical patients would more precisely fit the criteria for case finding than screening. In actual implementation there appears to be little real difference between the two processes, so this chapter uses the term screening for both operations.

Regardless of its specific manifestation, the screening process represents a relatively unrefined sieve designed to segregate the cohort under assessment into "positives," who presumptively have the condition, and "negatives," who are ostensibly free of the disorder. Screening is not a diagnostic procedure per se. Rather, it represents a preliminary filtering operation that identifies those individuals with the highest probability of having the disorder in question for subsequent specific diagnostic evaluation. Individuals found negative by the screening process are usually not evaluated further.

The conceptual underpinning for screening rests on the premise that the early detection of unrecognized disease in apparently healthy individuals carries with it a measurable advantage in achieving effective treatment and/or cure of the condition. Although logical, this assumption is not always valid.
In certain conditions, early detection does not measurably improve the capacity to alter morbidity or mortality, either because diagnostic procedures are unreliable, or because effective treatments for the condition are not yet available. In an attempt to facilitate a better appreciation of the particular health problems that lend themselves to effective screening systems, the World Health Organization (WHO) published guidelines for effective health screening programs (Wilson & Jungner, 1968). The following is a version of these criteria:

1. The condition should represent an important health problem that carries with it notable morbidity and mortality.
2. Screening programs must be cost-effective; that is, the incidence/significance of the disorder must be sufficient to justify the costs of screening.
3. Effective methods of treatment must be available for the disorder.
4. The test(s) for the disorder should be reliable and valid, so that detection errors (i.e., false positives or false negatives) are minimized.
5. The test(s) should have high cost-benefit; that is, the time, effort, and personal inconvenience to the patient associated with taking the test should be substantially outweighed by its potential benefits.
6. The condition should be characterized by an asymptomatic or benign period, during which detection will significantly reduce morbidity and/or mortality.
7. Treatment administered during the asymptomatic phase should demonstrate significantly greater efficacy than that dispensed during the symptomatic phase.


Some authorities are not convinced that psychiatric disorders, and the screening systems designed to detect them, conclusively meet all of the aforementioned criteria. For example, the efficacy of treatments for certain psychiatric conditions (e.g., schizophrenia) is arguable, and it has not been definitively demonstrated for some conditions that treatments initiated during asymptomatic phases (e.g., "maintenance" antidepressant treatment) are more efficacious than treatment initiated during acute episodes of manifest symptoms. Nevertheless, it is generally understood that psychiatric conditions and the screening paradigms designed to identify them do meet the WHO criteria in most instances, and the consistent implementation of screening systems can substantially improve the quality and cost-efficiency of health care.

The impetus for the development and routine implementation of effective psychiatric screening systems for medical and community cohorts arises not only from the increases in morbidity and mortality associated with undetected psychiatric disorders (Hawton, 1981; Kamerow, Pincus, & MacDonald, 1986; Regier, Robert, et al., 1988), but also from several additional factors. First, it is currently well-established that between two thirds and three quarters of individuals with psychiatric disorders either go completely untreated or are treated by nonpsychiatric physicians; these individuals are never seen by a mental health care professional (B.P. Dohrenwend & B.S. Dohrenwend, 1982; Regier, Goldberg, & Taube, 1978; Weissman, Myers, & Thompson, 1981; Yopenic, Clark, & Aneshensel, 1983). Second, although there is a significant correlation, or comorbidity, between physical illness and psychiatric disorder (J.E. Barrett, J.A. Barrett, Oxman, & Gerber, 1988; Fulop & Strain, 1991; T.L. Rosenthal et al., 1992), the detection by primary care physicians of the most prevalent psychiatric disorders (i.e., anxiety and depressive disorders) is routinely poor (Linn & Yager, 1984; Nielson & Williams, 1980). It is not unusual to find recognition rates in medical cohorts falling below 50%. In addition, research has consistently demonstrated higher prevalence rates for psychiatric disorders among the medically ill (K.B. Wells, Golding, & Burnam, 1988), and has confirmed that high health care utilizers have elevated levels of psychological distress and psychiatric diagnoses (Katon et al., 1990). Moreover, costs for medical patients with comorbid psychiatric disorders can be dramatically higher than those of comparable patients free of psychological comorbidities (Allison et al., 1995).

Effective psychiatric screening programs designed with current methods would not only significantly reduce psychiatric and medical morbidity, but would almost certainly have a beneficial impact on health care costs (Allison et al., 1995). The number of unnecessary diagnostic tests would be diminished, lengths of stay and numbers of readmissions would be reduced (Saravay, Pollack, Steinberg, Weinschel, & Habert, 1996), and the demand for health care among the groups with the highest utilization rates would be decreased. More importantly, those individuals in the community whose disorders currently go undetected would be identified early, and treated before the pervasive morbidity associated with chronicity sets in.

The Epidemiologic Screening Model

Because most psychologists are not closely familiar with screening paradigms, the basic epidemiologic screening model is reviewed. Essentially, a cohort of individuals who are apparently well, or in the instance of case finding present with a condition distinct from the index disorder, are evaluated by a "test" to determine if they are at high risk for a particular disorder or disease.
As already outlined, the disorder must have sufficient incidence or consequence to be considered a serious public health problem, and must be characterized by a distinct early or asymptomatic phase during which detection will substantially improve the results of treatment. The screening test itself (e.g., pap smear, Western blot) should be both reliable (i.e., consistent in its performance from one administration to the next) and valid (i.e., be capable of identifying those with the index disorder, and eliminating individuals who do not have the condition). In psychometric terms, this form of validity has been traditionally referred to as "predictive," or "criterion-oriented," validity. In epidemiologic models, the predictive validity of the test is apportioned into two distinct partitions: the degree to which the test correctly identifies those individuals who actually have the disorder, termed its sensitivity, and the extent to which those free of the condition are correctly identified as such, or its specificity. Correctly identified individuals with the index disorder are referred to as true positives, and those accurately identified as being free of the disorder are termed true negatives. Misidentifications of healthy individuals as affected are labeled false positives, and affected individuals missed by the test are referred to as false negatives. It should be noted that each type of prediction error carries with it a socially determined value or significance, termed its utility, and these utilities need not be equal. The basic fourfold epidemiologic table, as well as the algebraic definitions of each of these validity indices, are given in Table 2.1.

TABLE 2.1
Epidemiologic Screening Model

                           Actual Cases    Actual Noncases
    Test Positive               a                 b
    Test Negative               c                 d

Note. Sensitivity (Se) = a/(a + c); false negative rate (1 - Se) = c/(a + c); specificity (Sp) = d/(b + d); false positive rate (1 - Sp) = b/(b + d); positive predictive value (PPV) = a/(a + b); negative predictive value (NPV) = d/(c + d).
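The definitions in Table 2.1 translate directly into code. The following sketch is purely illustrative (the function names and cell counts are hypothetical, not from the chapter); it computes the fourfold-table indices and then uses the Bayes form of the positive predictive value to show how the base rate of the disorder alters a test's performance:

```python
# Validity indices from the fourfold screening table (Table 2.1).
# Cell labels follow the chapter: a = true positives, b = false positives,
# c = false negatives, d = true negatives. All counts below are invented.

def screening_indices(a, b, c, d):
    return {
        "sensitivity": a / (a + c),           # Se
        "specificity": d / (b + d),           # Sp
        "false_negative_rate": c / (a + c),   # 1 - Se
        "false_positive_rate": b / (b + d),   # 1 - Sp
        "ppv": a / (a + b),                   # predictive value of a positive
        "npv": d / (c + d),                   # predictive value of a negative
    }

def ppv_at_prevalence(se, sp, prevalence):
    # Bayes form of PPV: expected true positives over all test positives
    # at a given base rate of the disorder.
    true_pos = se * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

ix = screening_indices(a=90, b=80, c=10, d=820)
print(f"Se = {ix['sensitivity']:.2f}, Sp = {ix['specificity']:.2f}, "
      f"PPV = {ix['ppv']:.2f}")

# The same hypothetical test (Se = .90, Sp = .90) applied at falling base rates:
for prev in (0.50, 0.10, 0.02):
    print(f"prevalence {prev:.2f} -> PPV {ppv_at_prevalence(0.90, 0.90, prev):.2f}")
```

Note that a test with sensitivity and specificity of .90 flags mostly true cases at a 50% base rate but mostly false positives at a 2% base rate; this interaction of validity and prevalence is why the predictive values are treated as separate indices.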
Sensitivity and specificity are a screening test's most fundamental validity indices; however, other parameters can markedly affect a test's performance. In particular, the prevalence, or base rate, of the disorder in the population under evaluation can have a powerful effect on the results of screening. Two other indicators of test performance, predictive value of a positive and predictive value of a negative, reflect the interactive effects of test validity and prevalence. These indices are also defined in Table 2.1, although their detailed discussion is postponed until a later section.

Screening Tests for Psychiatric Disorders

A History of Screening Measures

The predecessors of modern psychological screening instruments date back to the late 19th and early 20th centuries. Sir Francis Galton (1883) created the prototype psychological questionnaire as part of an exposition for the World Fair. The first self-report symptom inventory, the Personal Data Sheet, was developed by Woodworth (1918) as part of the effort to screen American soldiers entering World War I for psychiatric disorders. At approximately the same time, the psychiatrist Adolph Meyer constructed the first psychiatric rating scale, the Phipps Behavior Chart, at Johns Hopkins (Kempf, 1914-1915). Since these pioneering efforts, many hundreds of analogous tests and rating scales have been developed and published. A number have become well-validated and widely used. The current chapter briefly reviews a small number of these instruments in an effort to familiarize the reader with the nature of screening measures available. This is not the appropriate place for a comprehensive review of psychological screening tests (see Sederer & Dickey, 1996; Spilker, 1996; Zalaquett & Wood, 1997); rather, the goal is to introduce a number of instruments judged to be exemplary of their class.

General Psychometric Principles

Fundamental to a realistic appreciation of the psychometric basis for psychiatric screening is the realization that we are first and foremost involved in psychological measurement. Psychologists are rigorously schooled in the awareness that the principles underlying psychological assessment are no different from those that govern any other form of scientific measurement. However, a major distinction that characterizes psychological measurement resides in the object of measurement: It is usually a hypothetical construct. By contrast, measurement in the physical sciences usually involves tangible entities, which are measured via ratio scales with true zeros and equal intervals and ratios throughout the scale continuum (e.g., weight, distance, velocity). In quantifying hypothetical constructs (e.g., anxiety, depression, impulsivity), measurement occurs on ordinal (approaching interval) scales, which by their nature are less sophisticated, and have substantially larger errors of measurement (Luce & Narens, 1987).
Psychological measurement is no less scientific due to this fact; however, it is less precise than measurement in the physical sciences.

Reliability. All scientific measurement is based on consistency or replicability; reliability concerns the degree of replicability inherent in measurement. To what degree would a symptom inventory provide the same results on readministration? To what extent do two clinicians agree on a psychiatric rating scale? Conceived differently, reliability can be thought of as the converse of measurement error. It represents that proportion of variation in measurement that is due to true variation in the attribute under study, as opposed to random or systematic error variance. Reliability can be conceptualized as the ratio of true score variation to the total measurement variance. It specifies the precision of measurement and thereby sets the theoretical limit of measurement validity.

Validity. Just as reliability indicates the consistency of measurement, validity reflects the essence of measurement: the degree to which an instrument measures what it is designed to measure. It specifies how well an instrument measures a given attribute or characteristic of interest. Establishing the validity of a screening instrument is more complex and programmatic than determining its reliability, and rests on more elaborate theory. Although the validation process involves many types of validity experiments, the most explicitly applicable to the screening process is predictive validity. Essentially, the predictive validity of an assessment device hinges on its degree of correlation with an external reference criterion, some sort of gold standard. In the case of screening tests, the external criterion usually takes the form of a comprehensive laboratory and/or clinical diagnostic evaluation that definitively establishes the presence or absence of the index condition.

Critical to a genuine appraisal of predictive validity is the realization that it is highly specific in nature. To say that a particular screening test is "valid" has little or no scientific meaning; tests are valid only for specific purposes. An explicit note of validation specificity is made here only because there appears to be some confusion on this issue relative to psychological tests. Psychological tests employed in screening for psychiatric disorder(s) must be validated specifically in terms of the diagnostic assignments they are designed to predict. A specific unidimensional test (e.g., a depression scale) should be validated in terms of its ability to accurately predict clinical depressions; it would be of little value in screening for other psychiatric disorders except by virtue of the high comorbidity of depression with numerous other conditions (Maser & Cloninger, 1990), and the pervasive nature of depressive symptoms in many medical illnesses.

Generalizability. Like reliability and validity, generalizability is a fundamental psychometric characteristic of test instruments used in psychiatric screening paradigms. Many clinical conditions and manifestations are systematically altered as a function of parameters such as age, sex, race, and the presence or absence of a comorbid medical illness. When validity coefficients (i.e., sensitivity and specificity) for a particular test are established relative to a specific diagnostic condition, they may vary considerably if the demographic and health characteristics of the cohort on which they were established are altered significantly. To cite examples, it is well-established that men are more constrained than women in reporting emotional distress.
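Separate norms are the standard remedy for such group differences. As a purely hypothetical sketch (the reference means and standard deviations below are invented for illustration), a raw distress score can be re-expressed as a T-score (mean 50, SD 10) against its own group's norms:

```python
# Hypothetical gender-specific norms for a raw distress score.
# The means and SDs are invented; real instruments publish such
# reference values in their norm tables.
NORMS = {
    "male": {"mean": 12.0, "sd": 5.0},
    "female": {"mean": 16.0, "sd": 6.0},
}

def t_score(raw, group):
    # Standardize against the group's own reference distribution,
    # then rescale to T units (mean 50, SD 10).
    ref = NORMS[group]
    return 50 + 10 * (raw - ref["mean"]) / ref["sd"]

# The same raw score is more deviant relative to the (invented) male norms:
print(f"raw 18, male norms:   T = {t_score(18, 'male'):.1f}")
print(f"raw 18, female norms: T = {t_score(18, 'female'):.1f}")
```

A screening cutoff expressed in T units then carries a comparable meaning in both groups, which is the point of developing group-specific norms.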
Well-constructed tests measuring symptomatic distress develop distinct sets of norms for the two genders to deal with this effect (Nunnally, 1978). Another illustration resides in the change of the phenomenologic characteristics of depression across age: Depression in the very young tends toward less dramatic affective display, and progresses through the classic clinical delineations of young and middle adult years, to the geriatric depressions of the elderly, which are more likely to be characterized by dementia-like cognitive dysfunctions. Any single test is unlikely to perform with the same degree of validity across shifts in relevant parameters; therefore, generalizability must be established empirically and cannot merely be assumed. It is interesting to note that in a recent treatise on modernizing the conceptualization of validity in psychological assessment, Messick (1995) integrated generalizability, along with external criterion validity, as one of six discernible aspects of construct validity.

Self-Report Versus Clinical Judgment. Although advocates and adherents argue the differential merits of self-report versus clinician ratings, a great deal of evidence suggests that the two techniques have strengths and weaknesses of roughly the same magnitude. Neither approach can be said to function more effectively overall in screening for psychiatric disorder. Each screening situation must be assessed independently, and the circumstances of each must be objectively weighed to determine which instrument modality is best suited for any particular screening implementation.

Traditionally, self-report inventories have been more frequently used as screening tests than clinical rating scales. This is probably so because the self-report modality of measurement has much to recommend it to the task of screening. Self-report measures tend to be brief, inexpensive, and are tolerated well by the individuals being screened. These features lend the important attributes of cost-efficiency and cost-benefit to self-report. Self-report scales are also transportable; they may be used in a variety of settings, and they minimize professional time and effort. In addition, their administration, scoring, and evaluation require little or no professional input. Recently, such tests have been adapted for use on personal computers. Interactive computerized testing enables test administration, scoring, evaluation, and storage of results entirely by computer, reducing both professional and technical time. Finally, perhaps the greatest advantage of self-report resides in the fact that the test is being completed by the only individuals experiencing the phenomena: the respondents themselves. Clinicians, no matter how skilled or well-trained, can never know the actual experience of respondents; rather, they must be satisfied with an apparent, or deduced, representation of the phenomena.

This last feature of self-report tests can also represent their greatest potential source of error, that is, patient bias in reporting. Because the test respondent is providing the test data, an opportunity exists to consciously or unconsciously distort the responses given. Although patient bias does represent a potential difficulty for self-report, empirical studies have indicated that such distortions represent a problem only in situations where there is obvious personal gain associated with response distortions. Otherwise, this problem usually does not represent a major source of bias (L.R. Derogatis, Lipman, Rickels, Uhlenhuth, & Covi, 1974a). There is also the possibility that response sets, such as acquiescence or attempts at impression management, may result in systematic response distortions, but such effects tend to add little error variance in most realistic clinical screening situations. Probably the greatest limitation of self-report arises from the inflexibility of the format: A line of questioning cannot be altered or modified depending on how the individual responds to previous questions.
In addition, only denotative responses can be registered; facial expressions, tone of voice, attitude and posture, and the cognitive/emotional status of respondents are not integral aspects of the test data. This inflexibility extends to the requirement that respondents be literate enough to read the questions.

The psychiatric rating scale or interview is a viable alternative to self-report instruments in designing a screening paradigm. The clinical rating scale introduces professional judgment into the screening process and is inherently more flexible than self-report. The clinician has both the expertise and the freedom to delve in more detail into any area of history, thought, or behavior that may yield relevant information on the respondent's mental status. Clinicians also have the capacity to clarify ambiguous answers and probe areas of apparent contradiction. In addition, because of clinicians' sophistication in psychopathology and human behavior, more complex and sophisticated instrument designs may be utilized in developing psychiatric rating scales.

On the negative side, just as self-report is subject to patient bias, clinical rating scales are subject to equally powerful interviewer biases. Training sessions and videotaped interviews may be utilized in an attempt to reduce systematic errors of this type; however, interviewer bias can never be completely eliminated. Furthermore, the very fact that a professional clinician is required to make the ratings significantly increases the costs of screening. Lay interviewers have been trained to do such evaluations in some instances, but they are rarely as skilled as professionals, and the costs of their training and participation must be weighed into the equation as well. Finally, the more flexibility designed into the interview, the more time the clinician is likely to need to complete the ratings.
At some point on this continuum, the "test" will no longer fit the format of a screening instrument, and it will begin to take on the characteristics of a comprehensive diagnostic interview. Both self-report and clinical interview modalities are designed to quantify the respondents' status in such a way as to facilitate a valid evaluation of their "caseness."
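The cutoff-based logic that underlies both screening formats can be made concrete with a short sketch. The following Python is purely illustrative — the scores, criterion labels, and cutoff are invented, not drawn from any instrument in this chapter — and shows how a screen's decisions, compared against criterion "caseness," yield the sensitivity and specificity figures by which screening instruments are usually judged.

```python
# Sketch: evaluating a screening cutoff against a criterion diagnosis.
# All scores, the cutoff, and the criterion labels below are hypothetical.

def screen_positive(total_score: int, cutoff: int) -> bool:
    """A screen is 'positive' when the summed score meets the cutoff."""
    return total_score >= cutoff

def sensitivity_specificity(scores, is_case, cutoff):
    """Compare cutoff-based screening decisions with criterion 'caseness'.

    Sensitivity = true positives / all actual cases.
    Specificity = true negatives / all actual non-cases.
    """
    tp = sum(1 for s, c in zip(scores, is_case) if c and screen_positive(s, cutoff))
    fn = sum(1 for s, c in zip(scores, is_case) if c and not screen_positive(s, cutoff))
    tn = sum(1 for s, c in zip(scores, is_case) if not c and not screen_positive(s, cutoff))
    fp = sum(1 for s, c in zip(scores, is_case) if not c and screen_positive(s, cutoff))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical sample: summed symptom scores and criterion diagnoses.
scores  = [30, 12, 25, 8, 19, 5, 22, 14]
is_case = [True, False, True, False, True, False, False, False]

sens, spec = sensitivity_specificity(scores, is_case, cutoff=16)
```

Raising the cutoff would trade sensitivity for specificity; choosing where to set it is exactly the actuarial decision a screening program must make.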
Both approaches lend themselves to actuarial quantitative methods, which allow a normative framework to be established within which to evaluate individuals. Most importantly, both approaches work. The nature of the screening task, the resources at hand, and the experience of the clinicians or investigators involved determine which method will work best in any particular situation.

Screening Tests

This section provides a brief synopsis of each of seven popular psychological tests and rating scales that are frequently employed as screening instruments. It is not intended as a comprehensive review, but rather outlines each measure and provides some information about its background and psychometric characteristics. In the case of the commercially available tests (e.g., SCL-90-R, BSI, GHQ, BDI, BASIS-32), detailed discussions and comprehensive psychometric data are available in their published manuals; scholarly reviews provide analogous information for the others (CES-D, HAS, HRDS). Five of the screening tests discussed here are self-report, and the remaining two are clinician rated. Table 2.2 provides a brief summary of instrument characteristics.

TABLE 2.2
Psychiatric Screening Tests in Common Use With Medical and Primary Care Populations

Instrument  Author/Date          Mode          Description             Time        Application       Sensitivity/Specificity
SCL-90-R    Derogatis (1975)     Self          90 items, multidim.     15-20 min.  1, 2, 3, 4, 7     .73/.91
BSI         Derogatis (1975)     Self          53 items, multidim.     10-15 min.  1, 2, 3, 4, 7     .72/.90
GHQ         Goldberg (1972)      Self          60, 30, 12 items,       5-15 min.   1, 2, 3, 4, 5     .69-1.0/.75-.92
                                               multidim.
CES-D       Radloff (1977)       Self          20 items, unidim.       10 min.     1, 2, 3, 4        .83-.97/.61-.90
BDI         Beck (1961)          Self          21 items, unidim.       5-10 min.   2, 3, 4           .76-.92/.64-.80
BASIS-32    Eisen & Grob (1989)  Self          32 items, multidim.     5-20 min.   3, 5, 7           NA/NA
HAS         Hamilton (1959)      Clin. rating  14 items, bidim.        20 min.     1, 2, 3, 4, 6     .91/.94
HRDS        Hamilton (1960)      Clin. rating  21 items, unidim.       30+ min.    1, 2, 3, 4, 5, 7  .94-1.0/1.0

Note. 1 = Community adults, 2 = Community adolescents, 3 = Inpatient/outpatient, 4 = Medical patients, 5 = Elderly, 6 = Children, 7 = College students.

SCL-90-R/BSI. The Symptom Checklist-90-Revised (SCL-90-R; Derogatis, 1977, 1983, 1994) is a 90-item, multidimensional, self-report symptom inventory derived from the Hopkins Symptom Checklist (L.R. Derogatis, Lipman, Rickels, Uhlenhuth, & Covi, 1974b) that was first published in 1975. The inventory measures symptomatic distress in terms of nine primary dimensions and three global indices of distress. The dimensions include Somatization, Obsessive-compulsive, Interpersonal Sensitivity, Depression,
Anxiety, Hostility, Phobic Anxiety, Paranoid Ideation, and Psychoticism. Several matching clinical rating scales, such as the Derogatis Psychiatric Rating Scale and the SCL-90 Analogue Scale, which measure the same nine dimensions, are also available. Norms for the SCL-90-R have been developed for adult community nonpatients, psychiatric outpatients, psychiatric inpatients, and adolescent nonpatients.

The Brief Symptom Inventory (BSI; L.R. Derogatis, 1993; L.R. Derogatis & Melisaratos, 1983; L.R. Derogatis & Spencer, 1982) is the brief form of the SCL-90-R. The BSI measures the same nine symptom dimensions and three global indices using only 53 items. Dimension scores on the BSI correlate highly with comparable SCL-90-R scores (L.R. Derogatis, 1993), and the brief form shares most psychometric characteristics of the longer scale. Most recently, an 18-item version of the BSI has been developed and normed (L.R. Derogatis, 1997).

Both the SCL-90-R and the BSI have been used as outcome measures in an extensive array of research studies, among them a number of investigations focusing specifically on screening (L.R. Derogatis et al., 1983; Kuhn, Bell, Seligson, Laufer, & Lindner, 1988; Royse & Drude, 1984; Zabora, Smith-Wilson, Fetting, & Enterline, 1990). To date, the SCL-90-R has been utilized in over 1,000 published research studies, with over 500 available in a published bibliography (L.R. Derogatis, 1990). The BSI has also demonstrated sensitivity to psychological distress in numerous clinical and research contexts (Cochran & Hale, 1985; O'Hara, Ghonheim, Heinrich, Metha, & Wright, 1989; Piersma, Reaume, & Boes, 1994). Both the SCL-90-R and the BSI have been translated into 26 languages.

General Health Questionnaire (GHQ). The GHQ was originally developed as a 60-item, multidimensional, self-report symptom inventory by Goldberg (1972).
Subsequent to its original publication, four subscales were derived through factor analysis (Goldberg & Hillier, 1979): Somatic Symptoms, Anxiety and Insomnia, Social Dysfunction, and Severe Depression. The GHQ is one of the most widely used screening tests for psychiatric disorder internationally, its popularity arising in part from the fact that several brief forms are available (e.g., the GHQ-30 and GHQ-12). The more recent brief forms retain the basic four-subscale format of the longer parent scale, but avoid including physical symptoms as indicators of distress (Malt, 1989). The GHQ has been validated for use in screening and outcome assessment in numerous populations, including the traumatically injured, cancer patients, geriatric populations, and many community samples (Goldberg & Williams, 1988).

Center for Epidemiologic Studies-Depression Scale (CES-D). The CES-D was developed by Radloff and her colleagues (Radloff, 1977). It is a brief, unidimensional, self-report depression scale comprising 20 items that assess the respondent's perceived mood and level of functioning during the past 7 days. Four fundamental dimensions (Depressed Affect, Positive Affect, Somatic Problems, and Interpersonal Problems) have been identified as basic to the CES-D, which also yields a total aggregate score. The CES-D has been used effectively as a screening test with a number of community samples (Comstock & Helsing, 1976; Frerichs, Areshensel, & Clark, 1981; Radloff & Locke, 1985), as well as with medical (Parikh, Eden, Price, & Robinson, 1988) and clinic populations (Roberts, Rhoades, & Vernon, 1990). Recently, Shrout and Yager (1989) demonstrated that the CES-D could be shortened to 5 items and still maintain adequate sensitivity and specificity, as long as prediction was limited to traditional two-class categorizations. Generally, an overall score of 16 has been used as the cutoff for depression, with approximately 15% to 20% of community populations scoring ≥ 16.
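The conventional CES-D decision rule just described can be sketched in a few lines of Python. The item responses below are hypothetical, and the simple sum glosses over details of the published scale (such as the reverse scoring of its positive-affect items), so this illustrates the cutoff logic only:

```python
# Illustrative sketch of the conventional CES-D screening rule:
# 20 items, each rated 0-3 for the past week, summed to a 0-60 total;
# totals of 16 or more are conventionally treated as a positive screen.
# (Reverse scoring of positive-affect items is omitted for brevity.)
CESD_CUTOFF = 16

def cesd_total(item_scores):
    """Sum 20 item ratings into a total score."""
    if len(item_scores) != 20:
        raise ValueError("the CES-D has 20 items")
    if any(not 0 <= s <= 3 for s in item_scores):
        raise ValueError("each item is rated on a 0-3 scale")
    return sum(item_scores)

# Hypothetical respondent.
responses = [1, 0, 2, 1, 0, 1, 2, 0, 1, 1, 0, 2, 1, 0, 1, 2, 1, 0, 1, 1]
total = cesd_total(responses)
screens_positive = total >= CESD_CUTOFF  # 18 >= 16, so positive here
```

A positive screen under this rule flags possible depression for fuller evaluation; it is not itself a diagnosis.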
Beck Depression Inventory (BDI). The BDI is a unidimensional, self-report depression inventory employing 21 items to measure the severity of depression. Pessimism, guilt, depressed mood, self-deprecation, suicidal thoughts, insomnia, somatization, and loss of libido are some of the symptom areas covered by the BDI. The BDI was developed by Beck and his colleagues (A.T. Beck et al., 1961). A short (13-item) version of the BDI was introduced in 1972 (A.T. Beck & R.W. Beck, 1972), with additional psychometric evaluation accomplished subsequently (Reynolds & Gould, 1981). A revised version of the BDI has also been published (A.T. Beck & Steer, 1993). The BDI is characterized as being most appropriate for measuring severity of depression in patients who have been clinically diagnosed with depression. It has been utilized to assess depression worldwide with numerous community and clinical populations (Steer & A.T. Beck, 1996).

Each of the items represents a characteristic symptom of depression on which respondents rate themselves on a 4-point (i.e., 0-3) scale; these scores are then summed to yield a total depression score. Beck's rationale for this system is that the frequency of depressive symptoms is distributed along a continuum from "nondepressed" to "severely depressed." In addition, the number of symptoms is viewed as correlating with intensity of distress and severity of depression.

The BDI has been used as a screening device with renal dialysis patients, as well as with medical inpatients and outpatients (Craven, Rodin, & Littlefield, 1988). More recently, Whitaker et al. (1990) used the BDI with a group of 5,108 community adolescents and noted that it performed validly in screening for major depression in this previously undiagnosed population.
In screening community populations, scores in the range of 17 to 20 are generally considered suggestive of dysphoria, whereas scores greater than 20 are felt to indicate the presence of clinical depression (Steer & A.T. Beck, 1996).

Behavior and Symptom Identification Scale (BASIS-32). The BASIS-32 was designed and developed by Eisen and Grob (1989) to evaluate the outcome of mental health interventions from the perspective of the patient. It is a 32-item, self-report inventory that assesses current "difficulty in the major symptom and functioning domains that lead to the need for inpatient psychiatric treatment" (Eisen, 1996, p. 65). A 1-week time window is typically utilized with the BASIS-32, and respondents rate the degree of difficulty they have been experiencing on each item via 5-point (0-4) scales. The BASIS-32 contains five subscales derived through cluster analysis: Relation to Self and Others, Daily Living and Role Functioning, Anxiety and Depression, Impulsive and Addictive Behavior, and Psychosis. The five domains are not consonant with diagnostic entities, but reflect problems and manifestations of mental illness that are central to the majority of psychiatric illnesses.

The BASIS-32 has been utilized primarily with psychiatric hospital inpatients, and to a lesser degree with psychiatric outpatients and day-hospital patients. Fundamental psychometric characteristics of the scale are quite respectable (Eisen, 1996), and its sensitivity to change has been demonstrated in large-sample improvement profiles (Eisen & Dickey, 1996). Although developed and validated within psychiatric inpatient populations, research on the use of the BASIS-32 with outpatient cohorts is in progress, and may confirm its utility for psychiatric populations in general.
By integrating items reflecting problems in daily living and functional status with those representing formal psychiatric symptoms, the BASIS-32 could fulfill a long-standing need for a brief outcomes measure focused on psychosocial integration.
Hamilton Anxiety Scale (HAS). The HAS is a 14-item clinician rating scale published in 1959 by Hamilton. Each item represents a clinical feature of anxiety, and requires the clinician to rate the client on a 5-point scale from (0) "not present" to (4) "very severe." The items reflect both somatic (e.g., cardiovascular, respiratory, gastrointestinal, and genitourinary) and psychic/cognitive (e.g., memory and concentration impairment) manifestations of anxiety, and the HAS was designed to yield two separate subscores for "psychic anxiety" and "somatic anxiety." The HAS has been used with children as well as adults (Kane & Kendall, 1989), coronary artery bypass patients (Erikkson, 1988), general medical/surgical patients (Bech, Grosby, Husum, & Rafaelson, 1984), psychiatric outpatients (Riskind, Bech, Brown, & Steer, 1987), and many other groups. In addition to these applications, the HAS has become accepted as a standard outcome measure in clinical anxiolytic drug trials.

Hamilton Rating Scale for Depression (HRDS/HAM-D). The HRDS is similar to the HAS in that both provide quantitative assessments of the severity of a clinical disorder. The HRDS was developed by Hamilton in 1960 and later revised (Hamilton, 1967). It consists of 21 items, each measuring a depressive symptom; Hamilton recommended scoring only 17 of the items because of the uncommon nature of the remainder (e.g., depersonalization). Hedlund and Vieweg (1979) reviewed the psychometric and substantive properties of the HRDS in two dozen studies and gave it a very favorable evaluation. More recently, Bech (1987) completed a similar review and concluded that the HRDS is an extremely useful scale for measuring depression. A Structured Interview Guide for the HRDS is also available (Williams, 1988); it provides standardized instructions for administration and has been shown to improve interrater reliability.
Just as the HAS has with anxiety, the HRDS has become a standard outcome measure in antidepressant drug trials.

Psychological Screening in Specific Settings

Community Settings

By far the most comprehensive data on the prevalence of psychiatric disorders in the community come from the NIMH Epidemiologic Catchment Area (ECA) investigation, a study of psychiatric disorders in the community involving nearly 20,000 individuals. These results make explicit the fact that psychiatric disorders are highly prevalent in society, regardless of whether lifetime (Robins et al., 1984), 6-month (Myers et al., 1984; Blazer et al., 1984), or 1-month (Regier et al., 1988) prevalence estimates are assessed. Detailing the latter, the 1-month prevalence for any psychiatric disorder, across all demographic parameters, was 15.4%, which is similar to European and Australian estimates ranging from 9% to 16% (Regier et al., 1988). In terms of specific diagnoses, the overall rate for affective disorders was 5.1%, whereas that for anxiety disorders was 7.3% (Regier et al., 1988). Six-month prevalence estimates for affective disorders ranged from 4.6% to 6.5% across the five ECA sites (Myers et al., 1984), whereas 6-month estimates for anxiety disorders, recently updated by Weissman and Merikangas (1986), reveal rates for panic disorder ranging from 0.6% to 1.0%. Agoraphobia showed prevalences from 2.5% to 5.8% across the various ECA sites.
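Base rates of this magnitude bear directly on screening: the proportion of positive screens that are true cases depends on prevalence as well as on a test's sensitivity and specificity. The sketch below applies Bayes' rule using the SCL-90-R operating characteristics listed in Table 2.2 (.73/.91); the prevalence figures are the 1-month community estimate cited above and an illustrative 25% rate of the kind reported for medical cohorts.

```python
# Positive predictive value (PPV) of a screen via Bayes' rule:
# P(case | positive) = sens*p / (sens*p + (1 - spec)*(1 - p)).
def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    true_positives = sens * prevalence
    false_positives = (1.0 - spec) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)

SENS, SPEC = 0.73, 0.91  # SCL-90-R values from Table 2.2

# At the 15.4% one-month community prevalence, roughly 60% of positive
# screens are true cases; at an illustrative 25% medical-cohort
# prevalence, about 73%.
ppv_community = positive_predictive_value(SENS, SPEC, 0.154)
ppv_medical = positive_predictive_value(SENS, SPEC, 0.25)
```

This base-rate effect is one reason the same instrument can perform quite differently when moved from a community to a medical setting.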
Clearly, these data demonstrate that psychiatric disorders are a persistent and demonstrable problem affecting substantial numbers of the community population. Unfortunately, there is no effective system for screening individuals in the community per se; no action can be taken until they seek medical advice or treatment for a disorder, and thereby formally enter the health care system. At that point, primary care "gatekeepers" have the first, and in most instances the only, opportunity to identify psychiatric morbidity.

Medical Settings

In medical populations, prevalence estimates of psychiatric disorder are substantially higher than community rates. This is particularly true of anxiety and depressive disorders, which by far account for the majority of psychiatric diagnoses assigned to medical patients (Barrett et al., 1988; Derogatis et al., 1983; Von Korff, Dworkin, & Kruger, 1988). In recent reviews of psychiatric prevalence in medical populations, J.E. Barrett et al. (1988) observed prevalence rates of 25% to 30%, and L.R. Derogatis and Wise (1989) reported prevalence estimates for a broad range of medical cohorts varying from 22% to 33%. Derogatis and Wise concluded that, "in general, it appears that up to one-third of medical inpatients reveal symptoms of depression, while 20 to 25% manifest more substantial depressive symptoms" (p. 101). Concerning anxiety, Kedward and Cooper (1966) observed a prevalence rate of 27% in their study of a London general practice, whereas Schulberg and his colleagues (1985) observed a combined rate of 8.5% for phobic and panic disorders among American primary care patients. In another contemporary review, Wise and Taylor (1990) concluded that from 5% to 20% of medical inpatients suffer the symptoms of anxiety, and 6% receive formal anxiety diagnoses.
They further determined that depressive phenomena are even more prevalent among medical patients, citing reported rates of depressive syndromes of 11% to 26% in inpatient samples. With such prevalence rates, and the acknowledged escalations in morbidity and mortality associated with psychiatric disorders, there is little doubt that screening programs for psychiatric disorders in medical populations could achieve impressive utility. Potential therapeutic gains associated with psychiatric screening would be further enhanced by the fact that attendant related problems, such as substance abuse, inappropriate diagnostic tests, and high utilization of health care services, also would be minimized. Particularly dramatic gains could be realized in specialty areas where estimated prevalence rates are over 50% (e.g., HIV: Lyketsos, Hutton, Fishman, Schwarz, & Trishman, 1996; or obesity/weight reduction: Goldstein, Goldsmith, Anger, & Leon, 1996). In general, early and accurate identification of occult mental disorders in individuals with primary medical conditions would lead to a significant improvement in their well-being, and would also help relieve the fiscal and logistic strain on the health care system.

Physician Recognition of Psychiatric Disorder. It is now well established that in the United States, primary care physicians represent a de facto mental health care system (Burns & Burke, 1985; Regier et al., 1982; Regier et al., 1978). There is reliable evidence that one fifth to one third of the primary care population suffer from at least one psychiatric condition (typically an anxiety or depressive disorder; Derogatis & Wise, 1989), rendering the competence with which primary care physicians recognize psychiatric
disorders a critical issue. Unfortunately, current evidence suggests that only a fraction of prevalent psychiatric disorders are detected in primary care, a deficiency with considerable implications for both the physical and psychological health of patients (Seltzer, 1989). Because undetected psychiatric disorders are associated with increased morbidity and mortality and with enhanced use of health care facilities (Katon et al., 1990; Wells et al., 1988), the "costs" of failing to detect them are substantially magnified.

Unaided Physician Recognition. During the past decade, a substantial number of studies have documented both the magnitude and the nature of the problem of undetected psychiatric disorder among primary care physicians (Davis, Nathan, Crough, & Bairnsfather, 1987; Jones, Badger, Ficken, Leepek, & Andersen, 1987; Kessler, Amick, & Thompson, 1985; Schulberg et al., 1985). The data from these studies establish rates of accurate physician diagnosis of psychiatric conditions ranging from a low of 8% (Linn & Yager, 1984) to a high of 53%, observed by Shapiro et al. (1987) with an elderly cohort. More recently, in a study focused on primary care physicians, Yelin et al. (1996) observed that 44% of over 2,000 primary care patients who screened positive for clinical anxiety on the SCL-90-R (Derogatis, 1994) had been previously assigned a mental health diagnosis. Although an improvement, these data also underscore the fact that 56% of these patients' mental conditions went undiagnosed. Although the methodology and precision of studies of this phenomenon continue to improve (Anderson & Harthorn, 1989; Rand, Badger, & Coggins, 1988), rates of accurate physician diagnosis have remained for the most part unacceptably low. A summary of these investigations, along with their characteristics and accurate detection rates, appears in Table 2.3.

TABLE 2.3
Recent Research on Rates of Accurate Identification of Psychiatric Morbidity in Primary Care

Investigator                Study Sample                                       Criteria  Correct Diagnosis (%)
Andersen & Harthorn (1989)  120 primary care physicians                        DKI       33% affective disorder
Davis et al. (1987)         377 family practice patients                       Zung SDS  15% mild symptoms; 30% severe symptoms
Jones et al. (1987)         20 family physicians/51 patients                   DIS       21%
Rand et al. (1988)          36 family practice residents/520 patients          GHQ       16%
Kessler et al. (1985)       1,452 primary care patients                        GHQ       19.7%
Linn & Yager (1984)         150 patients in a general medical clinic           Zung      8%
Schulberg et al. (1985)     294 primary care patients                          DIS       44%
Shapiro et al. (1987)       1,242 patients at university internal medicine     GHQ       53%
                            clinic
Zung et al. (1983)          41 family medicine patients                        Zung SDS  15%

Aided Physician Recognition. The data from the aforementioned investigations strongly suggest that proactive steps must be taken to facilitate the accurate recognition of psychiatric conditions by primary care doctors. This is particularly the case in light of contemporary changes in health care, which suggest that in the future nonpsychiatric physicians will be playing a greater rather than a lesser role in this regard. If
primary care physicians cannot correctly identify psychiatric conditions, then they can neither adequately treat them personally nor refer patients to appropriate mental health professionals. Such a situation will ultimately serve to further degrade the quality of the health care system, and will help deny effective treatment to those who in many ways need it most.

There is some evidence that primary care physicians can accurately estimate both the prevalence of psychiatric disorders and the nature of these conditions. They estimate prevalence to be between 20% and 25% in their patient populations, and perceive anxiety and depressive disorders to be the most prevalent conditions they encounter (Fauman, 1983; Orleans, George, Haupt, & Brodie, 1985). In an effort to identify and overcome the problems inherent in detecting psychiatric conditions in primary care, a number of investigators have studied the effects of introducing a diagnostic aid for primary care doctors, in the form of results from a psychological screening test. Although far from unanimous, the studies completed during the past decade have concluded that, in the appropriate situation, screening tests can significantly improve physician detection of psychiatric conditions. Linn and Yager (1984), using the Zung SDS, found an increase from 8% to 25% correct diagnosis in a cohort of 150 general medical patients. Similarly, Zung, Magill, Moore, and George (1983) reported an increase in correct identification from 15% to 68% in family medicine outpatients with depression. Likewise, Moore, Silimperi, and Bobula (1978) observed an increase in correct diagnostic identification from 22% to 56% working with family practice residents. Not all studies have shown such dramatic improvements in diagnostic accuracy, however. Hoeper, Nyczi, and Cleary (1979) found essentially no improvement in diagnosis associated with making GHQ results available to doctors, and Shapiro et al.
(1987) reported only a 7% increase in accuracy when GHQ scores were made accessible. The question of aided recognition of psychiatric disorders is a complex one, with numerous patient and doctor variables playing an important role. Nonetheless, the results of the studies on aided recognition appear promising, and an excellent contemporary review of the issues involved has been written by Anderson and Harthorn (1990).

Problems Unique to Psychiatric Disorders. The prototypic psychiatric disorder is a hypothetical construct, with few pathognomonic clinical or laboratory indicators and a pathophysiology and etiology that are only dimly discernible. For these reasons, singular problems arise in the detection of psychiatric disorders, particularly in medical patients. To begin with, the highly prevalent anxiety and depressive disorders have a multitude of associated somatic symptoms, which are difficult to differentiate from those arising from verifiable physical causes. Schurman, Kramer, and Mitchell (1985) indicated that 72% of visits to primary care doctors that resulted in a psychiatric diagnosis presented with somatic symptoms as the primary complaint. Katon et al. (1990) and Bridges and Goldberg (1984) both indicated that presentation with somatic symptoms as the primary complaint is a key reason for misdiagnosis of psychiatric disorders in primary care. In their study of high health care utilizers, Katon et al. (1990) reported that the high-utilization group had SCL-90-R scores elevated by over three quarters of a standard deviation, not only on the Anxiety and Depression subscales, but on the Somatization subscale as well.

A second problem, more specific to the chronically or terminally ill, has to do with the misperception of clinical depressions as demoralization reactions (Derogatis & Wise, 1989). Most serious chronic illnesses, and those that inevitably result in mortality, have
as a natural aspect of the illness a period of disaffection and demoralization. These negative affective responses are a natural reaction to the loss of vitality and well-being associated with being chronically ill and, where appropriate, to the anticipated loss of life itself. Physicians familiar with caring for such patients (e.g., cancer and emphysema patients) frequently mistake true clinical depressions, for which effective treatments are available, for reactive demoralized states that are a natural part of the illness. They then fail to initiate a therapeutic regimen on the grounds that such mood states are part of the primary medical condition. There is good evidence that such reactive states can be reliably distinguished from major clinical depressions (Snyder, Strain, & Wolf, 1990), and patients suffering such painful comorbid conditions are done a substantial disservice if physicians fail to adequately diagnose and treat their disorders.

Although understandable, this composite of problems has a highly regressive impact on the overall health care system. As it is now structured, it is a system in which the preponderance of psychiatric disorders are seen by primary care physicians, who undeniably leave a majority of these conditions undetected. Of the cases they do identify, only a small minority are ever referred to mental health specialists, even though such conditions are known to be of a chronic and recurrent nature and primary care physicians admit they feel less than fully competent to treat them. Undetected or improperly treated anxiety and depressive disorders are known to be disproportionately associated with substance abuse, alcoholism, excessive diagnostic tests, suicide, excessive utilization of the health care system, and spiraling health care costs.
We must greatly improve the proficiency of our primary care physicians in detecting psychiatric disorders if our hopes of ever developing an efficient, cost-effective system are to be anything but illusory.

Academic Settings

Recollections of college days usually bring to mind idyllic images of youthful abandon and the pursuit of personal growth and pleasure, unencumbered by the tedious demands and stresses of everyday adult life. Unfortunately, the realities of contemporary student life paint a different portrait. The period of modern undergraduate and graduate studies represents a phase in the life cycle of rapid change, high stress, and previously unparalleled demands on an individual's coping resources. In light of this reality, it is not surprising that it also represents a phase of life associated with a high incidence of psychiatric morbidity.

Numerous studies have reported prevalence rates of psychological disorders in university populations. Telch, Lucas, and Nelson (1989) investigated panic disorder in a sample of 2,375 college students and found that 12% reported at least one panic attack in their lifetime; furthermore, 2.36% of the sample met DSM-III-R criteria for panic disorder. Craske and Krueger (1990) reported a lifetime prevalence of nocturnal panic attacks of 5.1% in their 294 undergraduates. The prevalence of daytime panic attacks was also 5.1%, but only 50% of those reporting nocturnal panic also reported daytime panic.

Disorders that are especially salient in college populations include addiction, eating disorders, and depression. West, Drummond, and Eames (1990) found that 25.6% of men and 14.5% of women in a sample of 270 college students reported drinking large quantities of alcohol weekly. This same sample indicated that 20% of men and 6% of women had damaged property after drinking in the past year. Seay and T. Beck (1984) administered the Michigan Alcohol Screening Test to 395 undergraduates and discovered
that 25% were problem drinkers and 7% were alcoholics; however, only 1% were aware they had a drinking problem. Eating disorders, especially bulimia, are relatively common in college populations, because the average age at onset falls between adolescence and early adulthood (American Psychiatric Association, 1987). In a study of 1,040 college students, Striegel-Moore, Silberstein, Frensch, and Rodin (1989) found rates of bulimia of 3.8% for females and 0.2% for males. In a study of 69 college women, Schmidt and Telch (1990) reported the prevalence of personality disorders in three groups: In a group defined as bulimic, 61% of subjects met criteria for at least one personality disorder; in a group of "nonbulimic binge eaters," 13% met criteria for a disorder; and in the control group, only 4% of individuals met criteria for a personality disorder. Most of the bulimics who exhibited a personality disorder (57%) met criteria for borderline personality disorder.

V. Wells, Klerman, and Deykin (1987) reported that 33% of their sample of 424 adolescents met the standard criteria for depression using the CES-D; the rate fell to 16% when more stringent duration criteria were applied. Even more troubling are the results of a study by McDermott, Hawkins, Littlefield, and Murray (1989), which revealed that 65% of the 331 college women and 51% of the 241 college men they surveyed met criteria for depression using the CES-D. Furthermore, 10% of this sample reported contemplating self-injurious behavior during the previous week; suicidal ideation was reported by 8%, and 1% said they thought about suicide "most or all of the time" during the past week.

The results of these studies make it apparent that university students suffer from a considerable prevalence of psychiatric morbidity. A critical question then becomes: To what degree is this morbidity detected by university health centers?
As in the community, university physicians carry much of the burden for detecting psychological disorders because of the nature of their contacts with students. These physicians invariably treat many patients who present with somatic complaints that are actually manifestations of an underlying psychological disorder. The relative homogeneity of the age of the student group does carry some advantages with it, but beyond that fact, university physicians are essentially in the same position relative to recognizing psychological disorders as their primary care counterparts. There are very few published data regarding the accuracy of university physicians' diagnoses; however, it is probably safe to assume that they are approximately as precise as their primary care colleagues. As reviewed earlier, primary care doctors show rates of accurate diagnosis ranging from 8% (Linn & Yager, 1984) to 53% (Shapiro et al., 1987), with a majority of the rates remaining below 33% (L.R. Derogatis, DellaPietra, & Kilroy, 1992). This is obviously an unsatisfactory level of detection, given the prevalence of disorder in college populations.

Aided Recognition of Disorders in Academic Settings.

The use of aided recognition screening paradigms appears to be a strategy that may improve the rate of detection of psychological disorders in this important population. As evidenced by analogous approaches in primary care, significant improvement in recognition rates can follow from the implementation of such systems. The required components are screening tests that are valid indicators of psychiatric morbidity in this age group, and a systematic and meaningful system of application within universities. Several screening instruments have been used successfully with adolescent and college student populations and have been shown to have adequate sensitivity and specificity. The BDI and the CES-D are two unidimensional measures that have been used with college populations. Whitaker et al.
(1990) used the BDI with 5,108 adolescents and


found it to have moderate validity in this population. Schmidt and Telch (1990) also used the BDI to measure depression in college women with bulimia. McDermott et al. (1989) used the CES-D to investigate health-related practices and events, and depression, in college students. They judged the scale to be "practical and reliable." The General Behavior Inventory (Depue, Krauss, Spoont, & Arbisi, 1989) was used to detect unipolar and bipolar conditions in college students and was found to have adequate sensitivity (i.e., approximately .76) and high specificity (i.e., .99).

Two multidimensional inventories that have also proven useful with college populations are the GHQ and the SCL-90-R. The GHQ was used by Szulecka, Springett, and De Pauw (1986) to identify first-year undergraduates who might be good candidates for psychotherapy, whereas the SCL-90-R and its briefer counterpart, the BSI, have been utilized in a number of studies of distress in university students. For example, Benjamin, Kaszniak, Sales, and Shanfield (1986) used the BSI to measure distress in law and medical students, and Johnson, Ellison, and Heikkinen (1989) employed the SCL-90-R to describe the type and severity of psychological symptoms in university students attending a counseling center.

The substantial prevalence of psychiatric disorders in university populations is well documented, and a variety of instruments currently available can improve the rate of detection of psychological disorders in university students. It is now incumbent on academic decision makers to formally integrate such measures into university mental health evaluation systems.

Implementation of a Screening System.

In implementing a university-based screening system, there are two broad approaches available. The first is to take a preventive posture. Szulecka et al. (1986) used this method with entering students at Nottingham University. At time of registration, students were given the GHQ.
Students scoring high (i.e., showing more distress) were then split into two groups: an intervention group (IG) and a control group (CG). There was also a matched group (MG) of students scoring low on the GHQ. The IG subjects were offered an interview to discuss their feelings about the GHQ and adjustment to college life. They were made aware of counseling and support services on campus. At the end of the year, all students again took the GHQ.

Although many results failed to reach statistical significance, they supported a number of clinically relevant trends. Compared to the CG, the IG showed more improvement in GHQ scores at follow-up, made fewer consultations to physicians, had fewer withdrawals from the university, and had fewer students fail out of school. Student reaction was positive, with many students viewing the program as "evidence of care," which also "strengthened confidence in the Health Centre." The results imply that identifying vulnerable students on admission and offering help can have beneficial effects. Although the CG made more consults to physicians, they were less likely to return, showing that distressed students may not always follow through in seeking help. For this reason, an active outreach program seems essential.

A second approach to the problem that integrates an active outreach component is described by L. Clark, Levine, and Kinney (1988-1989). This process is described with a specific focus on the prevention, identification, and treatment of bulimia; however, it reflects a generic set of procedures that can be applied to psychiatric disorders in general. Clark et al. addressed the problem from multiple sources. They enlisted faculty, staff, the library, campus media, counselors, physicians, and peers, thereby heightening awareness of the resources available for treating psychological problems. Examples of services that could be developed and brought to bear on the problem include courses, workshops,


and public lectures about psychological disorders (given by faculty), information sessions (given by counselors, ministers, residence hall coordinators), accessible reading materials and lists of people/organizations to contact (in the library), public service announcements (in all campus media), and peer support groups. Once students are made aware of the nature of psychological problems and the availability of a full range of services, counselors, physicians, and "recovered" patients can provide screening, therapy, medical evaluation and treatment, and support.

Obviously, these two approaches to implementing screening are not mutually exclusive, and combining certain aspects would probably lead to an even more effective process of identifying and treating psychological disorders on campus. Matriculating students could be assessed prior to admission, and those identified as "at risk" could be offered interventions. Those who "slip through" the screening process, or those who develop psychological problems after entering school, would hopefully seek help as a result of the campus' campaign for awareness. Such a comprehensive program would ensure that a large majority of students in need are reached.

Screening for Suicidal Behavior

"Suicidal behavior" is a phrase that strongly affects most physicians, psychologists, and other health care professionals. Suicide has always been a perplexing subject for members of the health care community because of its perceived unpredictability and its inherent life-threatening nature. Chiles and Strosahl (1995) defined suicidal behavior as a "broad spectrum of thoughts, communications, and acts . . . ranging from the least common, completed suicide . . . to the more frequent, suicidal communications . . . and the most frequent, suicidal ideation and verbalizations" (p. 51).
Chiles and Strosahl reported that the rate of suicide has remained stable in the United States for the past 20 years at approximately 12.7 deaths per 100,000, and ranks as the eighth leading cause of death in the general population. Suicide ranks as the third leading cause of death for individuals from 18 to 24 years old, and the suicide rate in the elderly (i.e., over 65) is approximately double that of the 18- to 24-year-old population.

Suicidal behavior is generally broken down into three categories. The first type of suicidal behavior is suicidal ideation, or thoughts about suicide. Relatively little is known about the predictive value of suicidal ideation. The second type of suicidal behavior concerns suicide attempts, which tend to be more common in females and younger individuals. Chiles and Strosahl (1995) indicated that approximately 50% of those who attempt suicide have no formal mental health diagnosis. The last category of suicidal behavior is completed suicide, which is more common in males and older individuals, many of whom have formal psychiatric diagnoses.

Evidence also suggests that whites and divorced or separated persons are at increased risk for suicide, as are individuals with diagnoses of depression, drug abuse, panic disorder, generalized anxiety disorder, phobias, posttraumatic stress disorder, obsessive-compulsive disorder, somatoform disorder, and dysthymic disorder (Lish et al., 1996). Other risk factors include loss of a spouse (which increases risk for up to 4 years), unemployment, physical illness, bereavement, and physical abuse. Personality traits associated with suicide include poor problem solving, dichotomous thinking, and feelings of helplessness (Chiles & Strosahl, 1995).

Lish et al. (1996) noted that 82% of the people who commit suicide have visited primary care physicians within the past 6 months, 53% within 1 month, and 40% 1 week prior to the suicide. This situation makes it imperative that primary care physicians


are able to screen for and recognize the risk factors involved in suicidal behavior. Because medical illness is itself a risk factor for suicide, primary care physicians are more likely than mental health professionals such as psychologists or psychiatrists to see cases of suicidal behavior in their initial stages. Moreover, primary care physicians are usually not well trained in the identification of mental health disorders, so they are at increased risk for missing the risk factors associated with suicidal behavior.

A major problem affecting screening for suicidal behavior is the phenomenon of low base rates (see later). The problem arises from the low prevalence of suicide in the general population to be screened. Chiles and Strosahl (1995) reported a lifetime prevalence of suicide between 1% and 12%, and Lish et al. (1996) reported a 7.5% prevalence of suicidal behavior in a VA hospital sample. As discussed later in the chapter, with prevalences this low, even the most valid screening tests will produce an unacceptably high number of false positives for every true positive identified.

One of the most common techniques used to estimate or predict suicidal behavior is the profile of risk factors. As already mentioned, age (the younger and the older are at higher risk for suicidal behavior) and race (whites, Hispanics, and Asians are two times more likely to attempt suicide than African Americans; Lish et al., 1996) have significant predictive value. In addition, those with a mental health diagnosis are 12 times more likely to attempt suicide, those who have had previous mental health treatment are 7 times more likely to attempt suicide, and those in poor physical health are 4 times more likely to attempt suicide.
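The arithmetic behind the base-rate problem is easy to verify with Bayes' rule. In the sketch below, the 90% sensitivity and specificity and the 1% prevalence are illustrative values chosen for the example, not figures from any study cited in this chapter.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive screen is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Illustrative values only: a quite valid screen (90% sensitivity,
# 90% specificity) applied where the index condition has a 1% prevalence.
ppv = positive_predictive_value(0.90, 0.90, 0.01)
print(round(ppv, 3))  # 0.083: roughly 11 false positives per true positive
```

At a 1% base rate, fewer than 1 in 10 positive screens marks a true case; raising the effective prevalence, for example by screening a higher-risk group first, is what raises the predictive value of a positive result.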
Even with these ratios, Chiles and Strosahl (1995) noted that profiling is not sufficiently powerful to accurately predict suicide in individuals; it is better suited to documenting differential rates of occurrence of suicidal behavior across groups. Key risk factors addressed as part of an overall evaluation of suicidal behavior include an individual's positive evaluation of suicidal behavior, low ability to tolerate emotional pain, high levels of hopelessness, a sense of inescapability, and low survival and coping beliefs.

Although a clinical interview with a detailed history of treatment and previous suicidal behaviors appears to be the most effective predictor of current suicidal behavior, this process can be very time consuming and is neither cost-effective nor practical for screening purposes. Several brief instruments have been found to be useful in predicting suicidal behavior, including the Beck Hopelessness Scale (BHS; A.T. Beck, Kovacs, & Weissman, 1975) and the BDI (A.T. Beck & Steer, 1993). Westefeld and Liddel (1994) noted that the 21-item BDI may be particularly useful for screening for suicidal behavior in college students. L.R. Derogatis and M.F. Derogatis (1996) documented the utility of the SCL-90-R and the BSI in screening for suicidal behavior. A number of investigators have reported that the primary symptom dimensions and the global scores of the SCL-90-R/BSI are capable of discriminating suicidal behavior in individuals diagnosed with depression and panic disorder (Bulik, Carpenter, Kupfer, & Frank, 1990; Noyes, Christiansen, Clancy, Garvey, & Suelzer, 1991). Similarly, Swedo et al. (1991) found that all SCL-90-R subscales successfully distinguished suicide attempters from controls in an adolescent population, and the majority of subscales were effective in discriminating "attempters" from an intermediate at-risk group.
Adolescents and adults who attempted suicide tended to perceive themselves as more distressed and hopeless on the SCL-90-R than the at-risk group, a finding confirmed by Cohen, Test, and Brown (1990) using the BSI. Several other instruments have been used to screen for suicidal behavior, including the SCREENER (Lish et al., 1996) and the College Student Reasons for Living Inventory (Westefeld, Cardin, & Deaton, 1992). Lish et al. (1996) used the SCREENER to screen


for psychiatric disorders and to determine the possibility of suicidal behavior. The SCREENER screens for DSM-IV Axis I conditions and is available in 96-item and 44-item forms. It contains three questions that address death or suicide and one that addresses suicidal ideation directly. Lish et al. stated that clinicians should screen for substance abuse and anxiety disorders as well as major depression, because these disorders all increase the risk of suicidal behavior in a primary care setting.

The College Student Reasons for Living Inventory (CSRLI) is a college student version of the 47-item Reasons for Living Inventory (Westefeld et al., 1992). The CSRLI produces a total score and six subscale scores: Survival and Coping Beliefs, College and Future-related Concerns, Moral Objections, Responsibility to Friends and Family, Fear of Suicide, and Fear of Social Disapproval. Westefeld, Bandura, Kiel, and Scheel (1996) collected additional data on the CSRLI and found that college students who were at higher risk for suicidality endorsed fewer reasons for living. Westefeld et al. stated that the data support using the CSRLI as an effective screening tool in a college setting.

Two other screening methods for suicidal behavior are worth brief mention. The first is a computer interview for suicide risk prediction developed by Greist et al. (1973). This method uses a computer-administered interview that consists of both open-ended and multiple-choice questions. The program used a "branching" system of questions, based on answers previously given, to determine suicidal risk. The method was well received by patients and predicted 70% of suicide attempts correctly, whereas clinicians predicted only 40% accurately. A final method, in contradistinction to the traditional profiling approach mentioned previously, is the "manifest predictors" method of Bjarnason and Thorlindsson (1994).
They suggested the use of "manifest predictors," such as school postures (two questions), leisure time (two questions about music and two about what one does with leisure time), peer and parent relationships (eight questions), consumption (five questions involving smoking, alcohol, caffeine, and skipping meals), and contact with suicidal behavior (three questions), as a complement to the more commonly used "latent" predictors (i.e., depression and hopelessness).

Screening for Cognitive Impairment

Screening for cognitive impairment, especially when dealing with geriatric populations, is extremely important because it is estimated that up to 70% of patients with an organic mental disorder (OMD) go undetected (Strain et al., 1988). Some OMDs are reversible if discovered early enough, so screening programs in high-risk populations can have very high utility. Even in conditions found to be irreversible, early detection and diagnosis can help in the development of a treatment plan and the education of family members.

Instruments with a General Versus Specific Focus.

There are several instruments available that provide quick and efficient screening of cognitive functioning. Most of these address the general categories of cognitive functioning covered in the standard mental status examination, including attention, concentration, intelligence, judgment, learning ability, memory, orientation, perception, problem solving, psychomotor ability, reaction time, and social intactness (McDougall, 1990). However, not all instruments include items from all of these categories. These general instruments can be contrasted with another class of cognitive screening measures characterized by a more specific focus. For example, the Stroke Unit Mental Status Examination (SUMSE) was


designed specifically to identify cognitive deficits and plan rehabilitation programs for stroke patients (Hajek, Rutman, & Scher, 1989). Another example of a screening instrument with a specific focus is the Dementia of Alzheimer's Type Inventory (DAT), designed to distinguish Alzheimer's disease from other dementias (Cummings & Benson, 1986). Historically, such specific measures tended to be less common, owing to their limited range of applicability; more recently, they have become more popular as they have come to be used in conjunction with general measures.

Unlike other screening tests, the great majority of cognitive impairment scales are administered by an examiner. Of the instruments reviewed here, none are self-report measures; there are no pencil-and-paper inventories that can be completed by the respondent alone. Instead, these screening measures are designed to be administered by a professional and require a combination of oral and written responses. Most of the tests are highly transportable, however, and can be administered by a wide variety of health care workers. Nine cognitive impairment screening measures are described next. This is not intended to be an exhaustive review, but rather to provide some data on the nature of each measure and its psychometric properties (see Table 2.4).

Mini-Mental State Examination (MMSE).

The MMSE was developed by M. Folstein, S. Folstein, and McHugh (1975) to determine the level of cognitive impairment. It is an 11-item scale measuring six aspects of cognitive function: orientation, registration, attention and calculation, recall, language, and praxis. Scores can range from 0 to 30, with lower scores indicating greater impairment.
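As a minimal, purely illustrative sketch of how such a score is used in screening, a respondent is flagged for fuller evaluation when the score falls at or below a chosen cutoff. The cutoff of 23 below is a common convention in the MMSE literature, not a value given in this chapter, and would need adjustment for the population being screened.

```python
def flag_for_evaluation(score, cutoff=23):
    """Return True if an MMSE-style score (0-30, lower = more impaired)
    falls at or below the screening cutoff."""
    if not 0 <= score <= 30:
        raise ValueError("score must be between 0 and 30")
    return score <= cutoff

print(flag_for_evaluation(21))  # True: flagged for follow-up evaluation
print(flag_for_evaluation(28))  # False: screens negative
```

A flag from such a screen is a prompt for a fuller diagnostic workup, not a diagnosis in itself.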
TABLE 2.4
Screening Instruments for Cognitive Impairment

Instrument  Author                     Description                                                        Application  Sensitivity/Specificity
----------  -------------------------  -----------------------------------------------------------------  -----------  -----------------------
MMSE        M. Folstein et al. (1975)  11 items; designed to determine level of cognitive impairment       1, 3, 4      .83/.89
CCSE        Jacobs et al. (1977)       30 items; designed to detect presence of organic mental disorder    2, 3, 4, 5   .73/.90
SPMSQ       Pfeiffer (1975)            10 items; designed to detect presence of cognitive impairment       1            .55-.88/.72-.96
HSCS        Faust & Fogel (1989)       15 items; designed to estimate presence, scope, and severity of     2            .94/.92
                                       cognitive impairment
MSQ         Kahn et al. (1960)         10 items; designed to quantify dementia                             1, 4, 5, 6   .55-.96/NA

Note. 1 = community populations, 2 = cognitively intact, 3 = hospital inpatients, 4 = medical patients, 5 = geriatric, 6 = long-term care patients.

The MMSE has proved successful at assessing levels of cognitive impairment in many populations, including community residents (Kramer, German, Anthony, Von Korff, & Skinner, 1985), hospital patients (Teri, Larson, & Reifler, 1988), residents of long-term care facilities (Lesher & Whelihan, 1986), and neurological patients (Dick et al., 1984). However, Escobar et al. (1986) suggested using another instrument with Spanish-speaking individuals, as the MMSE may overestimate dementia in this population. Roca et al. (1984) also recommended other instruments for patients with less than 8 years of schooling for similar reasons. In contrast, the MMSE may underestimate cognitive


impairment in psychiatric populations (Faustman, Moses, & Cernansky, 1990). The MMSE appears to have lower sensitivity with mildly impaired individuals, who are less likely to be labeled as demented (Doyle, Dunn, Thadani, & Lenihan, 1986). As such, the MMSE is most useful for patients with moderate to moderately severe dementia. Fuhrer and Ritchie (1993) confirmed that the MMSE was more discriminating for moderate dementias as opposed to milder cases, but did not find a significant difference associated with education. The authors also noted that cutoff scores for the MMSE require adjustment when comparisons involve clinical samples with base rates higher than the 6% prevalence observed in the general population.

Cognitive Capacity Screening Examination (CCSE).

The CCSE is a 30-item scale designed to detect diffuse organic disorders, especially delirium, in medical populations. The instrument was developed by Jacobs, Bernhard, Delgado, and Strain (1977) and is recommended if delirium is suspected. The items include questions of orientation, digit recall, serial sevens, verbal short-term memory, abstractions, and arithmetic, all of which are helpful in detecting delirium (Baker, 1989). The CCSE has been used with geriatric patients (McCartney & Palmateer, 1985), as well as hospitalized medical-surgical patients (Foreman, 1987). In a comparison study of several brief screening instruments, the CCSE was shown to be the most reliable and valid (Foreman, 1987). Like the MMSE, the CCSE is influenced by the educational level of the subject. However, unlike the MMSE, the CCSE cannot differentiate levels of cognitive impairment or types of dementias, and is most appropriate for cognitively intact patients (Judd et al., 1986).

Short Portable Mental Status Questionnaire (SPMSQ).

The SPMSQ (Pfeiffer, 1975) is a 10-item scale for use with community and/or institutional residents.
This scale is unique in that it has been used with rural and less-educated populations (Baker, 1989). The items assess orientation and recent and remote memory; however, visuospatial skills are not tested. The SPMSQ is a reliable detector of organicity (Haglund & Schuckit, 1976), but it should not be used to predict the progression or course of the disorder (Berg, Edwards, Danziger, & Berg, 1987).

High Sensitivity Cognitive Screen (HSCS).

This scale was designed to be as sensitive and comprehensive as lengthier instruments while still being clinically convenient. It was developed by Faust and Fogel (1989) for use with 16- to 65-year-old, native English-speaking subjects with at least an eighth-grade education who are free from gross cognitive dysfunction. The 15 items include reading, writing, immediate and delayed recall, and sentence construction tasks, among others. The HSCS has shown adequate reliability and validity and is best used to estimate the presence, scope, and severity of cognitive impairment (Faust & Fogel, 1989). The HSCS cannot pinpoint specific areas of involvement and, as with most of these scales, should represent a first step toward cognitive evaluation, not a substitute for a standard neuropsychological assessment.

Mental Status Questionnaire (MSQ).

The MSQ is a 10-item scale developed by Kahn, Goldfarb, Pollock, and Peck (1960). It has been used successfully with medical geriatric patients (LaRue, D'Elia, Clark, Spar, & Jarvik, 1986), community residents (Shore, Overman, & Wyatt, 1983), and long-term care patients (Fishback, 1977). Disadvantages of this measure include its sensitivity to the education and ethnicity of the subject, its reduced sensitivity with mildly impaired individuals, and its omission of tests of retention, registration, and cognitive processing (Baker, 1989).
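The sensitivity and specificity figures reported for these instruments (see Table 2.4) are derived from a 2 x 2 classification of screening decisions against criterion diagnoses. A minimal sketch follows; the counts are invented for illustration (chosen only to mirror the .83/.89 entries reported for the MMSE), not data from any study cited here.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Invented example: 100 impaired and 100 unimpaired patients screened.
# The test flags 83 of the impaired and clears 89 of the unimpaired.
sens, spec = sensitivity_specificity(83, 17, 89, 11)
print(sens, spec)  # 0.83 0.89
```

Sensitivity and specificity are properties of the test itself; as this chapter emphasizes elsewhere, what a positive result means in practice also depends on the base rate of the disorder in the screened population.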


Other Instruments.

Three measures have been developed that are particularly appropriate for primary care use, because their main function is simply to detect or rule out the presence of dementia. FROMAJE (Libow, 1981) classifies individuals into normal, mild, moderate, and severe dementia groups and has been used successfully with long-term care patients (Rameizl, 1984). The Blessed Dementia Scale (Blessed, Tomlinson, & Roth, 1968) measures changes in activities and habits, personality, and interests and drives, and is useful for determining the presence of dementia, though not its progression. Finally, the Global Deterioration Scale (GDS; Reisberg, Ferris, deLeon, & Crook, 1982) can be used to distinguish between normal aging, age-associated memory impairment, and primary degenerative disorder (such as Alzheimer's disease). The GDS is useful for assessing the magnitude and progression of cognitive decline (Reisberg, 1984).

Recently, other innovative approaches have been proposed. With specific reference to Alzheimer's disease, Steffens and his colleagues (1996) proposed using the Telephone Interview for Cognitive Status in conjunction with a videotaped mental status exam. These researchers believe that a telephone-based methodology shows some promise and may help with physician time constraints. Another innovative new scale is the Chula Mental Test (CMT), developed by Jitapunkul, Lailert, Worakul, Srikiatkhachorn, and Ebrahim (1996) for use with elderly respondents from underdeveloped countries. Jitapunkul et al. commented that most cognitive screening measures have been based on highly developed, Western notions of cognitive dysfunction. As such, they may not be culturally or linguistically relevant in other countries. The CMT is a 13-item scale that is less biased toward education and literacy, which aids in minimizing false positives when screening in underdeveloped countries.
The CMT tests for remote memory, orientation, attention, language, abstract thinking, judgment, and general knowledge.

Two other measures of particular interest to the field of psychiatry are the Neurobehavioral Cognitive Status Examination (NCSE; Mitrushina, Abara, & Blumenfeld, 1995) and the Cognitive Levels Scale (C. Allen & R. Allen, 1987). The Cognitive Levels Scale is designed to measure cognitive impairment and social dysfunction in patients with mental disorders. Cognitive impairment is classified according to six levels (profoundly disabled to normal) and has implications for patients' functioning at home and at work. The NCSE samples 10 cognitive domains: orientation, attention, comprehension, repetition, naming, construction, memory, calculation, similarities, and judgment. The NCSE is capable of screening intact individuals as negative within approximately 5 minutes because, by design, it introduces a demanding item at the beginning of each substantive domain. The NCSE has been applied with neurological, medical, and psychiatric patients, and has been found capable of discriminating patients with organic mental disorder from those free of the disorder. Although the NCSE has established a record of high sensitivity, low specificity has been characteristic of the test (Mitrushina et al., 1995).

Cognitive Screening in Geriatric Populations

As alluded to in the beginning of the chapter, an important consideration in any screening paradigm concerns the prevalence of the index disorder in the population under investigation. The prevalence of cognitive disorders is relatively dramatic in elderly populations. Fuhrer and Ritchie (1993) noted a 6% prevalence rate for dementia in the general patient population, which may rise to as high as 14% to 18% in the elderly (Jagger, Clarke, & Anderson, 1992). In studying delirium, Hart et al. (1995) indicated


a prevalence of 10% to 13% in the general patient population, which they estimate may rise to as high as 15% to 30% in elderly patients.

Screening the geriatric patient can often be a challenging enterprise for a number of diverse reasons. First, these patients often present with sensory, perceptual, and motor problems that seriously constrain the use of standardized tests. Poor vision, diminished hearing, and other physical handicaps can undermine the appropriateness of tests that are dependent on these skills. Similarly, required medications can cause drowsiness or reduced alertness, or in other ways interfere with optimal cognitive functioning. Illnesses such as heart disease and hypertension, common in the elderly, have also been shown to affect cognitive functioning (Libow, 1977). These limitations call for screening instruments that are flexible enough to be adapted to the patient with handicaps or illnesses, yet sufficiently standardized to allow normative comparisons.

Another difficulty with this population involves distinguishing cognitive impairment from aging-associated memory loss and from characteristics of normal aging. This distinction requires a sensitive screening instrument, as the differences between these conditions are often subtle. Normal aging and dementia can be differentiated through their different effects on such functions as language, memory, perception, attention, information-processing speed, and intelligence (Bayles & Kaszniak, 1987). The Global Deterioration Scale is a screening test designed for this specific purpose. It has been shown to describe the magnitude of cognitive decline and to predict functional ability (Reisberg, Ferris, deLeon, & Crook, 1988).

A final problem encountered when screening in geriatric populations is the comorbidity of depression. Depression is one of several disorders in the elderly that may imitate dementia, resulting in a syndrome known as "pseudodementia."
These patients have no discernible organic impairment, and the symptoms of dementia will usually remit when the underlying affective disorder is treated. Variability of task performance can distinguish these patients from truly demented patients, who tend to have an overall lowered performance level on all tasks (C. Wells, 1979). If depression is suspected, it should be the focus of a distinct diagnostic workup.

Recently, a number of new instruments have been developed that help address these problems. One of these, the Cognitive Test for Delirium (CTD; Hart et al., 1995), appears promising. The CTD is a 9-item, examiner-administered assessment that evaluates orientation, attention span, memory, comprehension, and vigilance. The CTD is completely nonverbal and requires only 10 to 15 minutes of administration time. Through the application of ROC analysis (discussed later), Hart et al. (1995) were able to establish an optimal cutoff score of less than 19 to discriminate delirium from other disorders. They also reported that the CTD correlates highly with the MMSE in delirium and dementia patients, and that it achieved a sensitivity and specificity of 100% and 95%, respectively, in an implementation with dementia in ICU patients.

Cognitive Screening Among Inpatient Medical Populations

When attempting to screen for cognitive impairment in medical populations, several of the limitations mentioned earlier as pertaining to geriatric populations will also apply, because the groups often overlap. Medical patients are often constrained by their illness and may not be able to respond to the test in the required manner. In addition, these patients are often bedridden, necessitating the use of a portable, bedside instrument. Perhaps the most demanding issue when evaluating this population is discriminating between the dementing patient and the patient with acute confusional states, or delirium.
This is particularly important not only because of the increased occurrence of delirium in medical patients, but because, if left untreated, delirium can progress to an irreversible condition. Delirium can have multiple etiologies, such as drug intoxication, metabolic disorders, fever, cardiovascular disorders, or effects of anesthesia. The elderly and medical patients in general are both susceptible to misuse or overuse of prescription drugs, as well as to metabolic or nutritional imbalances. Hypothyroidism, hyperparathyroidism, and diabetes are a few of the medical conditions often mistaken for dementia (Albert, 1981). In addition, cognitive impairment can also be caused by infections, such as pneumonia. Fortunately, three cardinal characteristics enable practitioners to distinguish dementia from delirium. The first is the rate of onset of symptoms: delirium is marked by acute or abrupt onset, whereas dementia has a more gradual progression. The second is impairment of attention: delirious patients have special difficulty sustaining attention on tasks such as serial sevens and digit span. The third is nocturnal worsening, which is characteristic of delirium but not of dementia (Mesulam & Geschwind, 1976).

Cognitive Screening in Primary Care Settings

As already mentioned, many cases of cognitive impairment go undetected. This may be because the early stages of cognitive dysfunction are often quite subtle, and many of these cases first present to primary care physicians (Mungas, 1991), whose principal focus tends to be on other systems. Also, many physicians are unfamiliar with the available procedures for detecting cognitive impairment, whereas others are reluctant to add a formal cognitive screening to their schedule of procedures.
Although brief, the 10 to 30 minutes required by most cognitive screening instruments remains a formidable requirement given that, on average, a family practice physician spends 7 to 10 minutes with each patient. Because cognitive screening techniques are highly transportable and actuarial in nature, and may be administered by a broad range of health care professionals, the solution to introducing such screening in primary care may be to train nurses or physician's assistants to conduct it. Such an approach would not add to the burden of physicians, and would at least allow such programs to be initiated so that their utility can be realistically evaluated.

Methodological Issues in Screening for Psychiatric Disorders

The Problem of Low Base Rates

Over 40 years ago, a paper appeared in the psychological literature (Meehl & Rosen, 1955) that sensitized psychologists to the dramatic impact of low base rates on the predictive validity of psychological tests. Meehl and Rosen demonstrated that attempts to predict rare attributes or events, even with highly valid tests, would result in substantially more misclassifications than correct classifications if the prevalence of the event was sufficiently low. Knowledge and understanding of this important but little-known fact remained limited to a few specialists at the time. However, Vecchio (1966) published a report in the medical literature dealing with essentially the same phenomenon. Because Vecchio's report dealt substantively with screening tests in medicine, the information reached a much wider audience. As a result, knowledge of the special relation between low base rates and the predictive validity of screening tests has since become well established.

To be precise, low prevalence does not equally affect all aspects of a test's validity; its impact is felt only in the validity partition that deals with correctly classifying positives, or "cases." Predictive validity concerning negatives, or "noncases," is minimally impaired because with extremely low prevalence, even a test with moderate validity will perform adequately. This relation is summarized in Table 2.5, which is a synopsis of data originally given by Vecchio (1966).

TABLE 2.5
Predictive Values of Positive and Negative Tests at Varying Prevalence (Base) Rates

Prevalence or Base Rate (%)   Predictive Value of a + (%)   Predictive Value of a - (%)
  1                                    16.1                          99.9
  2                                    27.9                          99.9
  5                                    50.0                          99.7
 10                                    67.9                          99.4
 20                                    82.6                          98.7
 50                                    95.0                          95.0
 75                                    98.3                          83.7
100                                   100

Note. Synopsis of data originally presented by Vecchio (1966). Sensitivity and specificity = 95%.

In the example developed by Vecchio, the sensitivity and specificity of the screening test are given as .95, values that do not represent realistic validity coefficients for a psychological screening test. Table 2.6 provides a more realistic example of the relation between prevalence and positive predictive value, based on a hypothetical cohort of N = 1,000, with validity coefficients (i.e., sensitivity and specificity) more consistent with those that might be genuinely anticipated for such tests. The data of Tables 2.5 and 2.6 make it clear that as prevalence drops below 10%, the predictive value of a positive experiences a precipitous decline.
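The predictive values in Tables 2.5 and 2.6 follow directly from Bayes' theorem; a minimal sketch of that arithmetic (illustrative code, not part of the chapter's method):

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' theorem."""
    tp = sensitivity * prevalence                 # true-positive proportion
    fp = (1 - specificity) * (1 - prevalence)     # false-positive proportion
    tn = specificity * (1 - prevalence)           # true-negative proportion
    fn = (1 - sensitivity) * prevalence           # false-negative proportion
    return tp / (tp + fp), tn / (tn + fn)         # (PPV, NPV)

# Vecchio's example (Table 2.5): sensitivity = specificity = .95
for prev in (0.01, 0.02, 0.05, 0.10, 0.20, 0.50):
    ppv, npv = predictive_values(0.95, 0.95, prev)
    print(f"prevalence {prev:>4.0%}: PPV = {ppv:5.1%}, NPV = {npv:6.2%}")
# At 1% prevalence the PPV is about 16%, i.e., roughly 5 of 6 positives are false.

# More realistic coefficients (Table 2.6): sensitivity = .80, specificity = .90
for prev in (0.30, 0.05, 0.01):
    ppv, _ = predictive_values(0.80, 0.90, prev)
    print(f"prevalence {prev:>4.0%}: PPV = {ppv:5.1%}")
```

Running the sketch reproduces the tabled values: the PPV collapses as prevalence falls, while the NPV stays near 100% throughout.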
In the first example, when prevalence reaches 1%, the predictive value of a positive is only 16%, which means in practical terms that in such situations 5 out of 6 positives will be false positives. The predictive value of a negative remains extremely high throughout the range of base rates depicted, and is essentially unaffected by low-prevalence situations. The example of Table 2.6 is more realistic in that the validity coefficients are more analogous to those commonly reported for psychological screening tests. In the screening situation depicted there, the predictive value of a positive drops from 77% when prevalence is 30% (e.g., the rate of psychiatric disorders among specialized medical patients) to 7.5% when prevalence falls to 1%. In the latter instance, 12 out of 13 positives would be false positives.

TABLE 2.6
Relation of Prevalence (Base Rate) and Positive Predictive Value
Assumed test sensitivity = .80; assumed test specificity = .90 (N = 1,000)

Prevalence = .30
            Actual Pos.   Actual Neg.   Total
Test +          240            70         310
Test -           60           630         690
Total           300           700       1,000
Pos. Predict. Val. = 240/310 = 77%

Prevalence = .05
            Actual Pos.   Actual Neg.   Total
Test +           40            95         135
Test -           10           855         865
Total            50           950       1,000
Pos. Predict. Val. = 40/135 = 30%

Prevalence = .01
            Actual Pos.   Actual Neg.   Total
Test +            8            99         107
Test -            2           891         893
Total            10           990       1,000
Pos. Predict. Val. = 8/107 = 7.5%

Sequential Screening: A Technique for Low Base Rates

Although screening for psychiatric disorders in general is not usually affected by problems of low base rates, there are specific mental health phenomena (e.g., suicide) and diagnostic categories (e.g., panic disorder) whose prevalences are quite low. In addition, as Baldessarini, Finklestein, and Arana (1983) noted, the nature of the population being screened can markedly affect the quality of screening outcomes. A good example of this distinction is provided by the dexamethasone suppression test (DST) when used as a screen for major depressive disorder (MDD). The DST functions relatively effectively as a screen for MDD on inpatient affective disorders units, where the prevalence of the disorder is quite high. In general medical practice, however, where the prevalence of MDD is estimated to be about 5%, the DST results in unacceptable rates of misclassification. The validity of the DST is insufficient to support effective screening performance in populations with low base rates of MDD. A method designed to help overcome low base rate problems is commonly referred to as sequential screening. In a sequential screening paradigm, there are two phases of screening and two screening tests. Phase I involves a less refined screen, whose primary purpose is to correctly identify individuals without the condition and eliminate them from consideration in Phase II evaluation. The initial screening also has the important effect of raising the prevalence of the index condition in the remaining cohort. In Phase II, a separate test of equal or superior sensitivity is then utilized.
Because the base rate of the index condition has been significantly raised by Phase I screening, the performance of the Phase II screen will involve much lower levels of false positive misclassification. A hypothetical example of sequential screening is given in Table 2.7.

Table 2.7 Hypothetical Example of Sequential Screening as a Strategy for Dealing with Low Base Rates
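The two-phase arithmetic behind such a sequential screen can be sketched in a few lines. The cohort size (N = 10,000), 4% prevalence, and validity coefficients (sensitivity = specificity = .90) below are those of the hypothetical example, and the sketch assumes the two tests' errors are independent:

```python
def screen(cases, noncases, sensitivity, specificity):
    """Apply one screening phase; return (true positives, false positives)."""
    tp = cases * sensitivity            # cases correctly flagged
    fp = noncases * (1 - specificity)   # noncases incorrectly flagged
    return tp, fp

# Cohort: N = 10,000 with 4% prevalence of the index condition.
cases, noncases = 400, 9600

# Phase I: coarse screen; most true negatives are eliminated here.
tp1, fp1 = screen(cases, noncases, 0.90, 0.90)
ppv1 = tp1 / (tp1 + fp1)   # ~27% -- low, because of the 4% base rate

# Phase II: only the 1,320 Phase I positives are rescreened, so the
# effective base rate has risen to ~27% before the second test is applied.
tp2, fp2 = screen(tp1, fp1, 0.90, 0.90)
ppv2 = tp2 / (tp2 + fp2)   # ~77% -- far fewer false positives

print(f"Phase I PPV = {ppv1:.1%}; Phase II PPV = {ppv2:.1%}")
```

The gain comes entirely from the first sieve raising the base rate in the residual cohort; neither test alone becomes any more valid.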
In Phase I of the hypothetical screening, a highly valid instrument with sensitivity and specificity equal to .90 is used in a large population cohort (N = 10,000) with a prevalence of 4% for the index condition. Because of the low base rate, the predictive value of a positive is only 27.2%, meaning essentially that fewer than 1 out of every 3 positives will be true positives. The 1,320 individuals screened positive from the original cohort of 10,000 subsequently become the cohort for Phase II screening. With an equally valid, independent test (sensitivity and specificity = .90) and a base rate of 27.2%, the predictive value of a positive in Phase II rises to 77%, representing a substantial increase in the level of screening performance.

Sequential screening essentially zeros in on a high-risk subgroup of the population of interest by means of a series of consecutive sieves. These have the effect of eliminating from consideration individuals with low likelihood of having the disorder, and simultaneously raising the base rate of the condition in the remaining sample. Sequential screening can become expensive because of the increased number of screening tests that must be administered. However, in certain situations where prevalence is low (e.g., HIV screening in the general population) and the validity of the screening test is already close to maximum, it may be the only method available to minimize errors in classification.

ROC Analysis

Although some screening tests operate in a qualitative fashion, depending on the presence or absence of a key indicator, psychological screening tests function, as do many others, along a quantitative continuum. The individual being screened must obtain a probability, or "score," above some criterion threshold, or "cutoff," to be considered a "positive," or a "case." The cutoff value is usually determined to be the value that will maximize correct classification and minimize misclassification relative to the index disorder.
If the relative consequences of one type of error are considered more costly than the other (i.e., the consequences have dramatically different utilities; e.g., false negative = missed fatal but potentially curable disease), the cutoff value will often be adjusted to take this differential utility into account. Although quantitative methods exist to estimate optimal threshold values (e.g., Weinstein et al., 1980), traditionally thresholds have been selected by simple inspection of cutoff tables and their associated sensitivities and specificities. The selection of a cutoff value automatically determines both the sensitivity and specificity of the test, because it defines the rates of correct identification and misclassification. Actually, an entire distribution of cutoffs is possible, with corresponding sensitivities and specificities. Further, as touched on in the previous section, test performance (i.e., the errors associated with a particular cutoff value) is highly affected by the prevalence, or base rate, of the disorder under study. Viewed from this perspective, a test should not be characterized by a single sensitivity and specificity; rather, it should be perceived as possessing distributions of sensitivities and specificities associated with the distribution of possible threshold values and the distribution of prevalences. Receiver Operating Characteristic (ROC) analysis is a method that enables visualization of the entire distribution of sensitivity/specificity combinations for all possible cutoff values and prevalences. As such, it enables the selection of a criterion threshold based on substantially more information, and represents a much more sophisticated clinical decision process. ROC analysis was first developed by Swets (1964) in the context of signal detection paradigms in psychophysics. Subsequently, applications of the technique were developed in the areas of radiology and medical imaging (Hanley
& McNeil, 1982; Metz, 1978; Swets, 1979). Mari and Williams (1986) and Murphy et al. (1987) introduced and applied ROC analysis to the task of screening for psychiatric disorders. More recently, Somoza and his colleagues (Somoza, 1994, 1996; Somoza & Mossman, 1990a, 1990b, 1991; Somoza, Steer, A. T. Beck, & D. A. Clark, 1994) published an extensive series of in-depth reports integrating ROC analysis with information theory to optimize the performance of diagnostic and screening tests. In this informative series, these investigators reviewed the topics of construction of tests (Somoza & Mossman, 1990a, 1990b), the effects of prevalence (Mossman & Somoza, 1991), optimizing information yield (Somoza & Mossman, 1992a, 1992b), and maximizing expected utility (Mossman & Somoza, 1992), among others.

Typically, an ROC curve is developed by plotting corresponding values of a test's sensitivity (true positive rate) on the vertical axis against the complement of its specificity (false positive rate) on the horizontal axis, for the entire range of possible cutting scores from lowest to highest (see Fig. 2.1). A number of computer programs (e.g., Somoza & Mossman, 1991) are available to generate and plot ROC curves. The ROC curve demonstrates the discriminative capacity of the test at each possible definition of threshold (cutoff score) for psychiatric disorder. If the discriminative capacity of the test is no better than chance, the curve will follow a diagonal straight line from the origin of the graph (lower left) to its uppermost right corner. This line is termed the "line of no information." The ROC curve rises from the origin (point 0, 0) to its termination point (1, 1) on the plane so defined. To the extent that a test has discriminative ability, the curve will bow in a convex manner toward the upper left corner of the graph.
The greater the deviation toward the upper left corner, the greater the discriminative ability of the test for the particular application at hand. An ROC summary statistic describing the discriminative capacity of a test is referred to as the "area under the curve" (AUC). The AUC may be interpreted as the probability that a randomly chosen positive (or "case") will demonstrate a higher score than a randomly chosen negative. When the ROC curve follows the line of no information, the AUC is .50. In the situation of theoretically optimal discrimination, the ROC curve would follow the ordinate of the graph from point 0, 0 to point 0, 1, and then move at right angles to point 1, 1. In this situation, the AUC would equal 1.0.
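The construction just described can be illustrated with a small worked sketch. The score distributions below are invented purely for illustration, and the AUC is computed through its probability interpretation (equivalent to the Mann-Whitney U statistic) rather than by geometric integration:

```python
# Hypothetical screening scores (higher = more pathology); invented data.
cases    = [22, 25, 18, 27, 20, 24, 19, 26]   # individuals with the disorder
noncases = [10, 14, 17, 8, 21, 12, 15, 9]     # individuals without it

def roc_points(cases, noncases):
    """(false positive rate, sensitivity) at every possible cutoff score."""
    points = []
    for cutoff in sorted(set(cases + noncases + [0, 99])):
        sens = sum(s >= cutoff for s in cases) / len(cases)
        fpr = sum(s >= cutoff for s in noncases) / len(noncases)
        points.append((fpr, sens))
    return sorted(points)   # from (0, 0) up to (1, 1)

def auc(cases, noncases):
    """Probability that a randomly chosen case outscores a randomly
    chosen noncase (ties count half) -- the AUC, per its definition."""
    pairs = [(c, n) for c in cases for n in noncases]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c, n in pairs)
    return wins / len(pairs)

print(auc(cases, noncases))   # 0.953125: strong, but imperfect, discrimination
```

A chance-level test would yield an AUC near .50 on the same data; plotting the `roc_points` output traces the convex curve described above.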

Fig. 2.1. ROC curves for two hypothetical psychiatric screening tests. From "Performance of Screening and Diagnostic Tests" by J. M. Murphy et al., 1987, Archives of General Psychiatry, 44, pp. 550-555. Copyright © 1987 by American Medical Association. Reprinted by permission.

Although ROC analysis has been introduced to the area of screening for psychiatric disorders only within the past decade, investigators have found numerous applications for the technique. In addition to simply describing the distribution of validity coefficients for a single test, ROC analysis has been used to compare various screening tests (Somoza et al., 1994; Weinstein, Berwick, Goldman, Murphy, & Barsky, 1989), aid in the validation of new tests, compare different scoring methods for a particular test (Birtchnell, Evans, Deahl, & Master, 1989), contrast the screening performance of a test in different populations (Burnam, Wells, Leake, & Landsverk, 1988; Hughson, Cooper, McArdle, & Smith, 1988), and assist in validating a foreign-language version of a standard test (Chong & Wilkinson, 1989). ROC analysis has also been effectively integrated with paradigms from information theory to maximize information yield in screening (Somoza & Mossman, 1992a, 1992b), and with decision-making models to optimize expected utilities of screening outcomes (Mossman & Somoza, 1992). Although ROC analysis does not represent a definitive solution for the complex problems of psychiatric screening, it does significantly increase the information available to the decision maker and provides a relatively precise and sophisticated method for making decisions.

Conclusions

Currently, little doubt remains that psychiatric disorders meet the WHO criteria for conditions appropriate for the development of effective health screening programs (Wilson & Jungner, 1968). The magnitude of the health problem they represent is extensive, and the morbidity, mortality, and costs associated with these conditions are imposing. There are now valid, cost-efficient psychological tests to effectively identify these conditions in medical and community settings, and the efficacy of treatment regimens for most psychiatric conditions is constantly improving (Regier et al., 1988).
Although evidence concerning the incremental advantage of early detection remains somewhat equivocal, evidence is compelling that, left to their natural courses, such conditions will result in chronic, compound morbidities of both a physical and psychological nature (L. R. Derogatis & Wise, 1989; Katon et al., 1990; Regier et al., 1988). As indicated earlier, it is of little ultimate consequence to develop effective systems of treatment planning and outcomes assessment if the majority of individuals who would benefit from their utilization are lost to the system. In large measure, this undesirable reality has to do with the fact that a substantial majority of patients with psychiatric conditions are never seen by mental health professionals, and up to 20% are never seen by any health care professional. The majority of individuals with psychiatric morbidity seen in the health care system are attended by primary care physicians, who have been insufficiently trained to recognize or effectively treat these conditions. A substantial proportion of these cases go unrecognized, and of those in whom a correct diagnosis is made, only a minority are referred to mental health professionals. Typically, primary care physicians prefer to treat these cases personally, even though they do not feel confident in doing so. Primary care physicians are playing an increasingly prominent role as "gatekeepers" in the health care system relative to psychiatric disorders, and all evidence points toward their continuing in this role in the future. This being the case, it seems imperative that effective methods be developed and introduced to facilitate these professionals' diagnostic and treatment decisions concerning psychiatric disorders. Although biological markers may
ultimately bring enhanced refinement to the identification of psychiatric morbidity (Jefferson, 1988; Tollefson, 1990), such a reality remains futuristic. Available psychological screening techniques, however, can deliver valid, cost-effective identification of these conditions now. Considering the costs and potential savings involved, such systems should be extensively implemented as soon as possible.

References

Albert, M. (1981). Geriatric neuropsychology. Journal of Consulting and Clinical Psychology, 49, 835-850.
Allen, C., & Allen, R. (1987). Cognitive disabilities: Measuring the social consequences of mental disorders. Journal of Clinical Psychiatry, 48, 185-190.
Allison, T. G., Williams, D. E., Miller, T. D., Patten, C. A., Bailey, K. R., Squires, R. W., & Gau, G. T. (1995). Medical and economic costs of psychologic distress in patients with coronary artery disease. Mayo Clinic Proceedings, 70, 734-742.
American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., rev.). Washington, DC: Author.
Anderson, S. M., & Harthorn, B. H. (1989). The recognition, diagnosis, and treatment of mental disorders by primary care physicians. Medical Care, 27, 869-886.
Anderson, S. M., & Harthorn, B. H. (1990). Changing the psychiatric knowledge of primary care physicians: The effects of a brief intervention on clinical diagnosis and treatment. General Hospital Psychiatry, 12, 177-190.
Baker, F. (1989). Screening tests for cognitive impairment. Hospital and Community Psychiatry, 40, 339-340.
Baldessarini, R. J., Finklestein, S., & Arana, G. W. (1983). The predictive power of diagnostic tests and the effect of prevalence of illness. Archives of General Psychiatry, 40, 569-573.
Barrett, J. E., Barrett, J. A., Oxman, T. E., & Gerber, P. D. (1988). The prevalence of psychiatric disorders in a primary care practice. Archives of General Psychiatry, 45, 1100-1106.
Bayles, K., & Kaszniak, A. (1987).
Communication and cognition in normal aging and dementia. Boston: Little, Brown.
Bech, P. (1987). Observer rating scales of anxiety and depression with reference to DSM-III for clinical studies in psychosomatic medicine. Advances in Psychosomatic Medicine, 17, 55-70.
Bech, P., Grosby, H., Husum, B., & Rafaelson, L. (1984). Generalized anxiety and depression measured by the Hamilton Anxiety Scale and the Melancholia Scale in patients before and after cardiac surgery. Psychopathology, 17, 253-263.
Beck, A. T., & Beck, R. W. (1972). Screening depressed patients in family practice: A rapid technic. Postgraduate Medicine, 52, 81-85.
Beck, A. T., Kovacs, M., & Weissman, A. (1975). Hopelessness and suicidal behavior: An overview. Journal of the American Medical Association, 234, 1146-1149.
Beck, A. T., & Steer, R. A. (1993). Manual for the Beck Depression Inventory. San Antonio, TX: The Psychological Corporation.
Beck, A. T., Ward, C., Mendelson, M., Mock, J. E., & Erbaugh, J. K. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.
Benjamin, G., Kaszniak, A., Sales, B., & Shanfield, S. (1986). The role of legal education in producing psychological distress among law students and lawyers. American Bar Foundation Research Journal, 2, 225-252.
Berg, G., Edwards, D., Danzinger, W., & Berg, L. (1987). Longitudinal change in three brief assessments of SDAT. Journal of the American Geriatrics Society, 35, 205-212.
Bjarnason, T., & Thorlindsson, T. (1994). Manifest predictors of past suicide attempts in a population of Icelandic adolescents. Suicide and Life Threatening Behavior, 24, 350-357.
Birtchnell, J., Evans, C., Deahl, M., & Master, N. (1989). The Depression Screening Instrument
(DSI): A device for the detection of depressive disorders in general practice. Journal of Affective Disorders, 16, 269-281.
Blazer, D., George, G. K., Landerman, R., Pennybacker, M., Melville, M. L., Woodbury, M., Manton, K. G., Jordan, K., & Locke, B. (1984). Psychiatric disorders: A rural/urban comparison. Archives of General Psychiatry, 41, 959-970.
Blessed, G., Tomlinson, B., & Roth, M. (1968). The association between quantitative measures of dementia and of senile change in the cerebral gray matter of elderly subjects. British Journal of Psychiatry, 114, 797-811.
Bridges, K., & Goldberg, D. (1984). Psychiatric illness in inpatients with neurological disorders: Patients' views on discussion of emotional problems with neurologists. British Medical Journal, 289, 656-658.
Bulik, C. M., Carpenter, L. L., Kupfer, D. J., & Frank, E. (1990). Features associated with suicide attempts in recurrent major depression. Journal of Affective Disorders, 18, 27-29.
Burnam, M. A., Wells, K. B., Leake, B., & Landsverk, J. (1988). Development of a brief screening instrument for detecting depressive disorders. Medical Care, 26, 775-789.
Burns, B. J., & Burke, J. D. (1985). Improving mental health practices in primary care. Public Health Reports, 100, 294-299.
Chiles, J. A., & Strosahl, K. D. (1995). The suicidal patient: Principles of assessment, treatment, and case management (pp. 50-105). Washington, DC: American Psychiatric Press.
Chong, M., & Wilkinson, G. (1989). Validation of 30- and 12-item versions of the Chinese Health Questionnaire (CHQ) in patients admitted for general health screening. Psychological Medicine, 19, 495-505.
Clark, L., Levine, M., & Kinney, N. (1988-1989). A multifaceted and integrated approach to the prevention, identification, and treatment of bulimia on college campuses. Journal of College Student Psychotherapy, 3, 257-298.
Cochran, C. D., & Hale, W. D. (1985). College student norms on the Brief Symptom Inventory. Journal of Clinical Psychology, 31, 176-184.
Cohen, L. J., Test, M. A., & Brown, R. L. (1990). Suicide and schizophrenia: Data from a prospective community treatment study. American Journal of Psychiatry, 147, 602-607.
Commission on Chronic Illness. (1957). Chronic illness in the United States (Vol. 1). Cambridge, MA: Commonwealth Fund, Harvard University Press.
Comstock, G. W., & Helsing, K. J. (1976). Symptoms of depression in two communities. Psychological Medicine, 6, 551-564.
Craske, M., & Krueger, M. (1990). Prevalence of nocturnal panic in a college population. Journal of Anxiety Disorders, 4, 125-139.
Craven, J. L., Rodin, G. M., & Littlefield, C. (1988). The Beck Depression Inventory as a screening device for major depression in renal dialysis patients. International Journal of Psychiatry in Medicine, 18, 365-374.
Cummings, J., & Benson, F. (1986). Dementia of the Alzheimer type: An inventory of diagnostic clinical features. Journal of the American Geriatrics Society, 34, 12-19.
Davis, T. C., Nathan, R. G., Crough, M. A., & Bairnsfather, L. E. (1987). Screening depression with a new tool: Back to basics with a new tool. Family Medicine, 19, 200-202.
Depue, R., Krauss, S., Spoont, M., & Arbisi, P. (1989). General Behavior Inventory identification of unipolar and bipolar affective conditions in a nonclinical university population. Journal of Abnormal Psychology, 98, 117-126.
Derogatis, L. R. (1975a). The Brief Symptom Inventory (BSI). Baltimore, MD: Clinical Psychometric Research.
Derogatis, L. R. (1975b). The SCL-90-R. Baltimore, MD: Clinical Psychometric Research.
Derogatis, L. R. (1977). SCL-90-R: Administration, scoring and procedures manual-I. Baltimore: Clinical Psychometric Research.
Derogatis, L. R. (1983). SCL-90-R: Administration, scoring and procedures manual-II. Baltimore: Clinical Psychometric Research.
Derogatis, L. R. (1990). SCL-90-R: A bibliography of research reports 1975-1990. Baltimore: Clinical Psychometric Research.
Derogatis, L. R. (1993).
BSI: Administration, scoring and procedures manual for the Brief Symptom Inventory (3rd ed.). Minneapolis, MN: National Computer Systems.

Derogatis, L. R. (1994). SCL-90-R: Administration, scoring and procedures manual (3rd ed.).
Minneapolis, MN: National Computer Systems.
Derogatis, L. R. (1997). The Brief Symptom Inventory-18 (BSI-18). Minneapolis, MN: National Computer Systems.
Derogatis, L., DellaPietra, L., & Kilroy, V. (1992). Screening for psychiatric disorder in medical populations. In M. Fava, G. Rosenbaum, & R. Birnbaum (Eds.), Research designs and methods in psychiatry (pp. 145-170). New York: Elsevier.
Derogatis, L. R., & Derogatis, M. F. (1996). SCL-90-R and the BSI. In B. Spilker (Ed.), Quality of life and pharmacoeconomics (pp. 323-335). Philadelphia: Lippincott-Raven.
Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974a). The Hopkins Symptom Checklist (HSCL): A self-report symptom inventory. Behavioral Science, 19, 1-15.
Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974b). The Hopkins Symptom Checklist (HSCL). In P. Pichot (Ed.), Psychological measurements in psychopharmacology (pp. 79-111). Basel: Karger.
Derogatis, L. R., & Melisaratos, N. (1983). The Brief Symptom Inventory: An introductory report. Psychological Medicine, 13, 595-605.
Derogatis, L. R., Morrow, G. R., Fetting, J., Penman, D., Piasetsky, S., Schmale, A. M., Henrichs, M., & Carnicke, C. L. M. (1983). The prevalence of psychiatric disorders among cancer patients. Journal of the American Medical Association, 249, 751-757.
Derogatis, L. R., & Spencer, P. M. (1982). BSI administration and procedures manual-I. Baltimore: Clinical Psychometric Research.
Derogatis, L. R., & Wise, T. N. (1989). Anxiety and depressive disorders in the medical patient. Washington, DC: American Psychiatric Press.
Dick, J., Guiloff, R., Stewart, A., Blackstock, J., Bielawska, C., Paul, E., & Marsden, C. (1984). Mini-Mental State Examination in neurological patients. Journal of Neurology, Neurosurgery, and Psychiatry, 47, 496-499.
Dohrenwend, B. P., & Dohrenwend, B. S. (1982). Perspectives on the past and future of psychiatric epidemiology.
American Journal of Public Health, 72, 1271-1279.
Doyle, G., Dunn, S., Thadani, I., & Lenihan, P. (1986). Investigating tools to aid in restorative care for Alzheimer's patients. Journal of Gerontological Nursing, 12, 19-24.
Eisen, S. V. (1996). Behavior and Symptom Identification Scale (BASIS-32). In L. I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 65-69). Baltimore: Williams & Wilkins.
Eisen, S. V., & Dickey, B. (1996). Mental health outcome assessment: The new agenda. Psychotherapy, 33, 181-189.
Eisen, S. V., & Grob, M. C. (1989). Substance abuse in an inpatient population. McLean Hospital Journal, 14, 1-22.
Eriksson, J. (1988). Psychosomatic aspects of coronary artery bypass graft surgery: A prospective study of 101 male patients. Acta Psychiatrica Scandinavica, 77(Suppl. 340), 112.
Escobar, J., Burnam, A., Karno, M., Forsythe, A., Landsverk, J., & Golding, J. (1986). Use of the Mini-Mental State Examination (MMSE) in a community population of mixed ethnicity: Cultural and linguistic artifacts. Journal of Nervous and Mental Disease, 174, 607-614.
Fauman, M. A. (1983). Psychiatric components of medical and surgical practice: II. Referral and treatment of psychiatric disorders. American Journal of Psychiatry, 140, 760-763.
Faust, D., & Fogel, B. (1989). The development and initial validation of a sensitive bedside cognitive screening test. Journal of Nervous and Mental Disease, 177, 25-31.
Faustman, W., Moses, J., & Csernansky, J. (1990). Limitations of the Mini-Mental State Examination in predicting neuropsychological functioning in a psychiatric sample. Acta Psychiatrica Scandinavica, 81, 126-131.
Fishback, D. (1977). Mental Status Questionnaire for Organic Brain Syndrome, with a New Visual Counting Test. Journal of the American Geriatrics Society, 35, 167-170.
Folstein, M., Folstein, S., & McHugh, P. (1975). Mini-Mental State. Journal of Psychiatric Research, 12, 189-198.
Foreman, M. (1987).
Reliability and validity of mental status questionnaires in elderly hospitalized patients. Nursing Research, 36, 216-220.
Frerichs, R. R., Aneshensel, C. S., & Clark, V. A. (1981). Prevalence of depression in Los Angeles County. American Journal of Epidemiology, 113, 691-699.

Fulop, G., & Strain, J. J. (1991). Diagnosis and treatment of psychiatric disorders in medically ill inpatients. Hospital and Community Psychiatry, 42, 389-394.
Fuhrer, R., & Ritchie, K. (1993). Re: C. Jagger et al.'s article "Screening for dementia: A comparison of two tests using Receiver Operating Characteristic (ROC) analysis" (Vol. 7, pp. 659-665). International Journal of Geriatric Psychiatry, 8, 867-868.
Galton, F. (1883). Inquiries into human faculty and its development. New York: Macmillan.
Goldberg, D. (1972). The detection of psychiatric illness by questionnaire. Oxford, England: Oxford University Press.
Goldberg, D., & Hillier, V. F. (1979). A scaled version of the General Health Questionnaire. Psychological Medicine, 9, 139-145.
Goldberg, D., & Williams, P. (1988). A user's guide to the General Health Questionnaire. Windsor: NFER-Nelson.
Goldstein, L. T., Goldsmith, S. J., Anger, K., & Leon, A. C. (1996). Psychiatric symptoms in clients presenting for commercial weight reduction treatment. International Journal of Eating Disorders, 20, 191-197.
Greist, J. H., Gustafson, D. H., Stauss, F. F., Rowse, G. L., Laughren, T. P., & Chiles, J. A. (1973). A computer interview for suicide-risk prediction. American Journal of Psychiatry, 130, 1327-1332.
Haglund, R., & Schuckit, M. (1976). A clinical comparison of tests of organicity in elderly patients. Journal of Gerontology, 31, 654-659.
Hajek, V., Rutman, D., & Scher, H. (1989). Brief assessment of cognitive impairment in patients with stroke. Archives of Physical Medicine and Rehabilitation, 70, 114-117.
Hamilton, M. (1959). The assessment of anxiety states by rating. British Journal of Medical Psychology, 32, 50-55.
Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23, 56-62.
Hamilton, M. (1967). Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6, 278-296.
Hanley, J. A., & McNeil, B. J. (1982).
The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Diagnostic Radiography, 143, 29-36. Hart, R. P., Levenson, J. L., Sessler, C. N., Best, A. M., Schwartz, S. M., & Rutherford, L. E. (1995). Validation of a cognitive test for delerium in medical ICU patients. Psychosomatics, 37, 533-546. Hawton, K. (1981). The long term outcome of psychiatric morbidity detected in general medical patients. Journal of Psychosomatic Research, 25, 237-243. Hedlund, J. L., & Vieweg, M.D. (1979). The Hamilton Rating Scale for Depression: A comprehensive review. Journal of Operational Psychiatry, 10, 149-165. Hoeper, E. W., Nyczi, G. R., & Cleary, P. D. (1979). Estimated prevalence of RDC mental disorders in primary care. International Journal of Mental Health, 8, 6-15. Hughson, A.V.M., Cooper, A. F., McArdle, C. S., & Smith, D. C. (1988). Validity of the General Health Questionnaire and its subscales in patients receiving chemotherapy for early breast cancer. Journal of Psychosomatic Research, 32, 393-402. Jacobs, J., Berhard, M., Delgado, A., & Strain, J. (1977). Screening for organic mental syndromes in the medically ill. Annals of Internal Medicine, 86, 40-46. Jagger, C., Clarke, M., & Anderson, J. (1992). Screening for dementia: A comparison of two tests using Receiver Operating Characteristic (ROC) analysis. International Journal of Geriatric Psychiatry, 7, 659-665. Jefferson, J. W. (1988). Biologic systems and their relationship to anxiety. Psychiatric Clinics of North America, 11, 463-472. Jitapunkul, S., Lailert, C., Worakul, P, Srikiatkhachorn, A., & Ebrahim, S. (1996). Chula Mental Test: A screening test for elderly people in less developed countries. International Journal of Geriatric Psychiatry, 11, 715-720. Johnson, R., Ellison, R., & Heikkinen, C. (1989). Psychological symptoms of counseling center clients. Journal of Counseling Psychology, 36, 110-114. Jones, L. R., Badger, L. W., Ficken, R. P., Leepek, J. D., & Anderson, R. L. (1987). 
Inside the hidden mental health network: Examining mental health care delivery of primary care physicians. General Hospital Psychiatry, 9, 287-293.

Judd, B., Meyer, J., Rogers, R., Gandhi, S., Tanahashi, N., Mortel, K., & Tawaklna, T. (1986). Cognitive performance correlates with cerebrovascular impairments in multi-infarct

< previous page

page_74

next page >

< previous page

page_75

next page > Page 75

dementia. Journal of the American Geriatrics Society, 34, 355-360. Kahn, R., Goldfarb, A., Pollack, M., & Peck, A. (1960). Brief objective measures for the determination of mental status in the aged. American Journalof Psychiatry, 117, 326-328. Kamerow, D. B., Pincus, H. A., & MacDonald, D. I. (1986). Alcohol abuse, other drug abuse, and mental disorders in medical practice: Prevalence, cost, recognition, and treatment. Journal of the American Medical Association, 255, 2054-2057. Kane, M. T., & Kendall, P. C. (1989). Anxiety disorders in children: A multiple-baseline evaluation of a cognitive-behavioral treatment. Behavior Therapy, 20, 499-508. Katon, W., Von Korff, M., Lin, E., Lipscomb, P., Russo, J., Wagner, E., & Polk, E. (1990). Distressed high utilizers of medical care: DSM-III-R diagnoses and treatment needs. General Hospital Psychiatry, 12, 355-362. Kedward, H. B., & Cooper, B. (1966). Neurotic disorders in urban practice: A 3 year followup. Journal of College of General Practice, 12, 148-163. Kempf, E. J. (1914-1915). The behavior chart in mental diseases. American Journal of Insanity, 7, 761-772. Kessler, L. G., Amick, B. C., & Thompson, J. (1985). Factors influencing the diagnosis of mental disorder among primary care patients. Medical Care, 23, 50-62. Kramer, M., German, P., Anthony, J., Von Korff, M., & Skinner, E. (1985). Patterns of mental disorders among the elderly residents of eastern Baltimore. Journal of the American Geriatrics Society, 11, 236-245. Kuhn, W. F., Bell, R. A., Seligson, D., Laufer, S. T., & Lindner, J. E. (1988). The tip of the iceberg: Psychiatric consultations on an orthopedic service. International Journal of Psychiatry in Medicine, 18, 375-378. LaRue, A., D'Elia, L., Clark, E., Spar, J., & Jarvik, L. (1986). Clinical tests of memory in dementia, depression, and healthy aging. Psychology and Aging, 1, 69-77. Lesher, E., & Whelihan, W. (1986). Reliability of mental status instruments administered to nursing home residents. 
Journal of Consulting and Clinical Psychology, 54, 726-727. Libow, L. (1981). A rapidly administered, easily remembered mental status evaluation: FROMAJE. In L. S. Libow, & F. T. Sherman (Eds.), The core of geriatric medicine (pp. 85-91). St. Louis: C. V. Mosby. Libow, L. (1977). Senile dementia and pseudosenility: Clinical diagnosis. In C. Eisdorfer & R. Friedel (Eds.), Cognitive and emotional disturbance in the elderly. Chicago: Year Book Medical Publishing. Linn, L., & Yager, J. (1984). Recognition of depression and anxiety by primary care physicians. Psychosomatics, 25, 593-600. Lish, J. D., Zimmerman, M., Farber, N. J., Lush, D. T., Kuzma, M. A., & Plescia, G. (1996). Suicide screening in a primary care setting at a Veterans Affairs medical center. Psychosomatics, 37, 413-424. Luce, R. D., & Narens, L. (1987). Measurement scales on the continuum. Science, 236, 1527-1532. Lyketsos, C. G., Hutton, H., Fishman, M., Schwartz, J., & Trishman, G. J. (1996). Psychiatric morbidity on entry to an HIV primary care clinic. AIDS, 10, 1033-1039. Madri, J. J., & Williams, P. (1986). A comparison of validity of two psychiatric screening questionnaires. Journal of Chronic Disorders, 39, 371-378. Malt, U. F. (1989). The validity of the General Health Questionnaire in a sample of accidentally injured adults. Acta Psychiatrica Scaninavica, 80, 103-112. Maser, J. D., & Cloninger, C. R. (1990). Comorbidity of mood and anxiety disorders. Washington, DC: American Psychiatry Press. McCartney, J., & Palmateer, L. (1985). Assessment of cognitive deficit in geriatric patients: A study of physician behavior. Journal of the American Geriatrics Society, 33, 467-471. McDermott, R., Hawkins, W., Littlefield, E., & Murray, S. (1989). Health behavior correlates of depression among university students. Journal of American College Health, 38, 115-119. McDougall, G. (1990). A review of screening instruments for assessing cognition and mental status in older adults. Nurse Practitioner, 15, 18-28. 
Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194-216. Messick, S. (1995). Validity of psychological assessment: Validation of inferences from

< previous page

page_76

next page > Page 76

persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749. Mesulam, M., & Geschwind, N. (1976). Disordered mental status in the postoperative period. Urologic Clinics of North America, 3, 199-215. Metz, C. E. (1978). Basic principles of ROC analysis. Seminars in Nuclear Medicine, 8, 283-298. Mitrushina, M., Abara, J., & Blumenfeld, A. (1995). Cognitive screening of psychiatric patients. Journal of Psychiatric Research, 29, 13-22. Moore, J. T., Silimperi, D. R., & Bobula, J. A. (1978). Recognition of depression by family medicine residents: The impact of screening. Journal of Family Practice, 7, 509-513. Mossman, D., & Somoza, E. (1991). Neuropsychiatric decision making: The role of disorder prevalence in diagnostic testing. Journal of Neuropsychiatry and Clinical Neurosciences, 3, 84-88. Mossman, D., & Somoza, E. (1992). Balancing risks and benefits: Another approach to optimizing diagnostic tests. Journal of Neuropsychiatry and Clinical Neurosciences, 4, 331-335. Mungas, D. (1991). In-office mental status testing: A practical guide. Geriatrics, 46, 54-66. Murphy, J. M., Berwick, D. M., Weinstein, M. C., Borus, J. F., Budman, S. H., & Klerman, G. L. (1987). Performance of screening and diagnostic tests. Archives of General Psychiatry, 44, 550-555. Myers, J. K., Weissman, M. M., Tischler, G. L., Holzer, C. E., III, Leaf, P. J., Orvaschel, H., Anthony, J. C., Boyd, J. H., Burke, J. D., Kramer, M., & Stoltzman, R. (1984). Six-month prevalence of psychiatric disorders in three communities. Archives of General Psychiatry, 41, 959-970. Nielson, A. C., & Williams, T. A. (1980). Depression in ambulatory medical patients. Archives of General Psychiatry, 37, 999-1009. Noyes, R., Chrisiansen, J., Clancy, J., Garvey, M. J., & Suelzer, M. (1991). Predictors of serious suicide attempts among patients with panic disorder. Comprehensive Psychiatry, 32, 261-267. Nunnally, J. (1978). Psychometric theory. New York: McGraw-Hill. 
O'Hara, M. N., Ghonheim, M. M., Hinrich, J. V., Metha, M. P., & Wright, E. J. (1989). Psychological consequences of surgery. Psychosomatic Medicine, 51, 356-370. Orleans, C. T., George, L. K., Houpt, J. L., & Brodie, H. (1985). How primary care physicians treat psychiatric disorders: A national survey of family practitioners. American Journal of Psychiatry, 142, 52-57. Parikh, R. M., Eden, D. T., Price, T. R., & Robinson, R. G. (1988). The sensitivity and specificity of the center for epidemiologic studies depression scale in screening for post-stroke depression. International Journal of Psychiatry in Medicine, 18, 169-181. Pfeiffer, E. (1975). A short portable mental status questionnaire for the assessment of organic brain deficit in elderly patients. Journal of the American Geriatrics Society, 23, 433-441. Piersma, H. L., Reaume, W. M., & Boes, J. L. (1994). The Brief Symptom Inventory (BSI) as an outcome measure for adult psychiatric inpatients. Journal of Clinical Psychology, 50, 555-563. Radloff, L. S. (1977). The CES-D scale: A self report depression scale for research in the general population. Applied Psychological Measurement, 1, 385-401. Radloff, L. S., & Locke, B. Z. (1985). The community mental health assessment survey and the CES-D scale. In M. M. Weissman, J. K. Myers, & C. G. Ross (Eds.), Community survey of psychiatric disorder. New Brunswick: Rutgers University Press. Rameizl, P. (1984). A case for assessment technology in long-term care: The nursing perspective. Rehabilitation Nursing, 9, 29-31. Rand, E. H., Badger, L. W., & Coggins, D. R. (1988). Toward a resolution of contradictions: Utility of feedback from the GHQ. General Hospital Psychiatry, 10, 189-196. Regier, D. A., Boyd, J. H., Burke, J. D., Rae, D. S., Myers, J. K., Kramer, M., Robins, L. N., George L. K., Karno, M., & Locke, B. Z. (1988). One month prevalence of mental disorders in the United States. Archives of General Psychiatry, 45, 977-986. Regier, D. A., Goldberg, I. D., Burns, B. 
J., Hankin, J., Hoeper, E. W., & Nyez, G. R. (1982). Specialist/generalist division of responsibility for patients with mental disorders, Archives of General Psychiatry, 39, 219-224. Regier, D., Goldberg, I., & Taube, C. (1978). The defacto U.S. mental health services system:

< previous page

page_77

next page > Page 77

A public health perspective. Archives of General Psychaitry, 35, 685-693. Regier, D. A., Robert, M. A., Hirschfeld, Goodwin, F. K., Burke, J. D., Lazar, J. B., & Judd, L. L. (1988). The NIMH depression awareness, recognition, and treatment program: Structure, aims, and scientific basis. American Journal of Psychiatry, 145, 1351-1357. Reisberg, B. (1984). Stages of cognitive decline. American Journal of Nursing, 84, 225-228. Reisberg, B., Ferris, S., deLeon, M., & Crook, T. (1982). The Global Deterioration Scale for assessment of primary degenerative dementia. American Journal of Psychiatry, 139, 1136-1139. Reisberg, B., Ferris, S., deLeon, M., & Crook, T. (1988). Global Deterioration Scale (GDS). Psychopharmacology Bulletin, 24, 661-663. Reynolds, W. M., & Gould, J. W. (1981). A psychometric investigation of the standard and short form Beck Depression Inventory. Journal of Consulting Clinical Psychology, 49, 306-307. Riskind, J. H., Beck, A. T., Brown, G., & Steer, R. A. (1987). Taking the measure of anxiety and depression: Validity of the reconstructed Hamilton scales. Journal of Nervous and Mental Disorders, 175, 474-479. Roberts, R. E., Rhoades, H. M., & Vernon, S. W. (1990). Using the CES-D Scale to screen for depression and anxiety: Effects of language and ethnic status. Psychiatry Research, 31, 69-83. Robins, L. N., Helzer, J. E., Weissman, M. M., Orvaschel, H., Greenberg, E., Burke, J. D., & Regier, D. A. (1984). Lifetime prevalence of specific psychiatric disorders in three sites. Archives of General Psychiatry, 41, 949-958. Roca, P., Klein, L., Kirby, S., McArthur, J., Vogelsang, G., Folstein, M., & Smith, C. (1984). Recognition of dementia among medical patients. Archives of Internal Medicine, 144, 73-75. Rosenthal, T. L., Miller, S. T., Rosenthal, R. H., Sadish, W. R., Fogelman, B. S., & Dismuke, S. (1992). Assessing emotional interest at the internist's office. Behavioral Research and Therapy, 29, 249-252. Royce, D., & Drude, K. (1984). 
Screening drug abuse clients with the Brief Symptom Inventory. International Journal of Addiction, 19, 849-857. Saravay, S. M., Pollack, S., Steinberg, M. D., Weinschel, B., & Habert, B. A. (1996). Four-year follow-up of the influence of psychological comorbidity on medical rehospitalization. American Journal of Psychiatry, 153, 397-403. Schmidt, N., & Telch, M. (1990). Prevalence of personality disorders among bulimics, non-bulimic binge eaters, and normal controls. Journal of Psychopathology and Behavioral Assessment, 12, 160-185. Schulberg, H. C., Saul, M., McClelland, M., Ganguli, M., Christy, W., & Frank, R. (1985). Assessing depression in primary medical and psychiatric practices. Archives of General Psychiatry, 42, 1164-1170. Schurman, R. A., Kramer, P., & Mitchell, J. B. (1985). The Hidden Mental Heath Network. Archives of General Psychiatry, 42, 89-94. Seay, T., & Beck, T. (1984). Alcoholism among college students. Journal of College Student Personnel, 25, 9092. Sederer, L. I., & Dickey, B. (1996). Outcomes assessment in clinical practice. Baltimore: Williams & Wilkins. Seltzer, A. (1989). Prevalence, detection and referral of psychiatric morbidity in general medical patients. Journal of the Royal Society of Medicine, 82, 410-412. Shapiro, S., German, P., Skinner, E., Von Korff, M., Turner, R., Klein, L., Teitelbaum, M., Kramer, M., Burke, J., & Burns, B. (1987). An experiment to change detection and management of mental morbidity in primary care. Medical Care, 25, 327-339. Shore, D., Overman, C., & Wyatt, R. (1983). Improving accuracy in the diagnosis of Alzheimer's disease. Journal of Clinical Psychiatry, 44, 207-212. Shrout, P. E., & Yager, T. J. (1989). Reliability and validity of screening scales: Effect of reducing scale length. Journal of Clinical Epidemiology, 42, 69-78. Snyder, S., Strain, J. J., & Wolf, D. (1990). Differentiating major depression from adjustment disorder with depressed mood in the medical setting. 
General Hospital Psychiatry, 12, 159-165. Somoza, E. (1994). Classification of diagnostic tests. International Journal of Biomedical Computing, 37, 41-55. Somoza, E. (1996). Eccentric diagnostic tests: Redefining sensitivity and specificity. Medical Decision Making, 16, 15-23.

< previous page

page_78

next page > Page 78

Somoza, E., & Mossman, D. (1990a). Introduction to neuropsychiatric decision making: Binary diagnostic tests. Journal of Neuropsychiatry and Clinical Neurosciences, 2, 297-300. Somoza, E., & Mossman, D. (1990b). Optimizing REM latency as a diagnostic test for depression using ROC analysis and information theory. Biological Psychiatry, 27, 990-1006. Somoza, E., & Mossman, D. (1991). Biological markers and psychiatric diagnosis: Risk-benefit analysis using ROC analysis. Biological Psychiatry, 29, 811-826. Somoza, E., & Mossman, D. (1992a). Comparing and optimizing diagnostic tests: An information-theoretic approach. Medical Decision Making, 12, 179-188. Somoza, E., & Mossman, D. (1992b). Comparing diagnostic tests using information theory: The INFO-ROC technique. Journal of Neuropsychiatry and Clinical Neurosciences, 4, 214-219. Somoza, E., Steer, R. A., Beck, A. T., & Clark, D. A. (1994). Differentiating major depression and panic disorders by self-report and clinical rating scales: ROC analysis and information theory. Behavioral Research and Therapy, 32, 771-782. Spilker, B. (1996). Quality of life and pharmacoeconomics in clinical trials. Philadelphia: Lippincott-Raven. Steer, R. A., & Beck, A. T. (1996). Beck Depression Inventory. In L. I. Sederer & B. Dickey (Eds.), Outcomes assessment in clinical practice (pp. 100-104). Baltimore: Williams & Wilkens. Steffens, D. C., Welsh, K. A., Burke, J. R., Helms, M. J., Folstein, M. F., Brandt, J., McDonald, W. M., & Breitner, J. C. (1996). Diagnosis of Alzheimer's disease in epidemiologic studies by staged review of clinical data. Neuropsychiatry, Neuropsychology and Behavioral Neurology, 2, 107-113. Strain, J. J., Fulop, G., Lebovits, A., Ginsberg, B., Robinson, M., Stern, A. Charap, P., & Gany, F. (1988). Screening devices for diminished cognitive capacity. General Hospital Psychiatry, 10, 16-23. Striegel-Moore, R., Silberstein, L., Frensch, P., & Rodin, J. (1989). 
A prospective study of disordered eating among college students. International Journal of Eating Disorders, 8, 499-509. Swedo, S. E., Rettew, D. C., Kuppenheimer, M., Lum, D., Dolan, S., & Goldberger, E. (1991). Can adolescent suicide attempters be distinguished from at-risk adolescents. Pediatrics, 88, 620-629. Swets, J. A. (1964). Signal detection and recognition by human observers. New York: Wiley. Swets, J. A. (1979). ROC analysis applied to the evaluation of medical imaging techniques. Investigatory Radiology, 14, 109-121. Szulecka, T., Springett, N., & De Pauw, K. (1986). Psychiatric morbidity in first-year undergraduates and the effect of brief psychotherapeutic interventionA pilot study. British Journal of Psychiatry, 149, 75-80. Telch, M., Lucas, J., & Nelson, P. (1989). Non-clinical panic in college students: An investigation of prevalence and symptomatology. Journal of Abnormal Psychology, 98, 300-306. Teri, L., Larson, E., & Reifler, B. (1988). Behavioral disturbance in dementia of the Alzheimer type. Journal of the American Geriatrics Society, 36, 1-6. Tollefson, G. D. (1990). Differentiating anxiety and depression. Psychiatric Medicine, 8, 27-39. Vecchio, T. J. (1966). Predictive value of a single diagnostic test in unselected populations. New England Journal of Medicine, 274, 1171. Von Korff, M., Dworkin, S. F., & Kruger, A. (1988). An epidemiologic comparison of pain complaints. Pain, 32, 173-183. Weinstein, M. C., Berwick, D. M., Goldman, P. A., Murphy, J. M., & Barsky, A. J. (1989). A comparison of three psychiatric screening tests using Receiver Operating Characteristic (ROC) analysis. Medical Care, 27, 593-607. Weinstein, M. C., Fineberg, H. V., Elstein, A. S., Frazier, H. S., Neuhauser, D., Neutra, R. R., & McNeil, B. J. (1980). Clinical decision analysis, Philadelphia: Saunders. Weissman, M. M., & Merikangas, K. R. (1986). The epidemiology of anxiety and panic disorder: An update. Journal of Clinical Psychiatry, 47, 11-17. Weissman, M. 
M., Myers, J. K., & Thompson, W. D. (1981). Depression and its treatment in a U.S. urban community. Archives of General Psychiatry, 38, 417-421. Wells, C. (1979). Pseudodementia. American Journal of Psychiatry, 136, 895-900. Wells, K. B., Golding, J. M., & Burnam, M. A. (1988). Psychiatric disorders in a sample of

< previous page

page_79

next page > Page 79

the general population with and without chronic medical conditions. American Journal of Psychiatry, 145, 976-981. Wells, V., Klerman, G., & Deykin, E. (1987). The prevalence of depressive symptoms in college students. Social Psychiatry, 22, 20-28. West, R., Drummond, C., & Eames, K. (1990). Alcohol consumption, problem drinking and anti-social behaviour in a sample of college students. British Journal of Addiction, 85, 479-486. Westefeld, J. S., Bandura, A., Kiel, J. T., & Scheel, K. (1996). The College Student Reason for Living Inventory: Additional psychometric data. Journal of College Student Development, 37, 348-350. Westefeld, J. S., Cardin, D., & Deaton, W. L. (1992). Development of the College Student Reasons for Living Inventory. Suicide and Life-Threatening Behavior, 22, 442-452. Westefeld, J. S., & Liddell, D. L. (1994). The Beck Depression Inventory and its relationship to college student suicide. Journal of College Student Development, 35, 145-146. Whitaker, A., Johnson, J., Shaffer, D., Rapoport, J., Kalikow, K., Walsh, B., Davies, M., Braiman, S., & Dolinsky, A. (1990). Uncommon trouble in young people: Prevalence estimates of selected psychiatric disorders in a non-referred adolescent population. Archives of General Psychiatry, 47, 487-496. Williams, J. B. (1988). A structured interview guide for the Hamilton Depression Rating Scale. Archives of General Psychiatry, 45, 742-747. Wilson, J. M., & Junger, F. (1968). Principles and practices of screening for diseases. (Public Health Papers No. 34). Geneva: WHO. Wise, M. G., & Taylor, S. E. (1990). Anxiety and mood disorders in mentally ill patients. Journal of Clinical Psychiatry, 51, 27-32. Woodworth, R. S. (1918). Personal Data Sheet. Chicago: Stoelting. Yelin, E., Mathias, S. D., Buesching, D. P., Rowland, C., Calucin, R. Q., & Fifer, S. (1996). The impact of the employment of an intervention to increase recognition of previously untreated anxiety among primary care physicians. 
Social Science in Medicine, 42, 1069-1075. Yopenic, P. A., Clark, C. A., & Aneshensel, C. S. (1983). Depression problem recognition and professional consultation. Journal of Nervous and Mental Disorders, 171, 15-23. Zabora, J. R., Smith-Wilson, R., Fetting, J. H., & Enterline, J. P. (1990). An efficient method for psychosocial screening of cancer patients. Psychosomatics, 31, 1992-1996. Zalaquett, C. P., & Wood, R. J. (1997). Evaluating stress: A book of resources. Lanham, Md: Scarecrow. Zung, W.K.W. (1965). A self-rating depression scale. Archives of General Psychiatry, 12, 63-70. Zung, W., Magill, M., Moore, J., & George, D. (1983). Recognition and treatment of depression in a family practice. Journal of Clinical Psychiatry, 44, 3-6.

< previous page

page_79

next page >


For Abby, Katie, and Shelby



Chapter 3
Use of Psychological Tests/Instruments for Treatment Planning
Larry E. Beutler, Ginger Goodrich, Daniel Fisher, and Oliver B. Williams
University of California, Santa Barbara

The advent of the descriptive diagnostic system of DSM-III, biological psychiatry, and managed health care has conspired to produce a decline of third-party support for the use of formal psychological tests as routine diagnostic procedures in mental health. Descriptive diagnosis and the symptom focus of biological treatments eliminated the need for complex tests to identify covert and highly abstract psychic processes, as had previously been required in the diagnosis of disorders such as schizophrenia and neurosis. It is paradoxical that the same psychiatric and managed care forces that initially voiced concerns about maintaining reliable and valid diagnostic data have consistently preferred the use of subjective and unreliable unstructured clinical interviews to gather these data, instead of empirically established, reliable, and valid psychological tests. The virtual exclusion of formal tests from the list of routinely approved intake procedures underlines the signal failure of psychological assessment to establish itself as making a meaningful contribution to treatment planning. To capitalize on the empirical advantages of psychological tests over unstandardized clinical methods, the nature and goals of the assessment process must change. The omnibus, broad-ranging instruments that have long served this tradition must give way to assessment procedures that are short, practical, and treatment centered. A new patient presents the clinician with a number of important questions, and answering them is made easier by the use of reliable and empirically based assessment procedures: Is this condition treatable? Is psychotherapy or pharmacotherapy an appropriate treatment modality? What about family therapy? Should the treatment focus on the patient's symptoms, on broader symptoms of depression and anxiety, or even on the resolution of underlying, dynamic conflicts? Should the patient be hospitalized for further evaluation, or be seen by a neurologist or other medical specialist? What markers will tell practitioners when treatment can be safely terminated? The immediate challenge to the clinician is to decide on the most productive intervention with which to commence treatment and engage the client. Simultaneously, the clinician must formulate a treatment plan that will be maximally effective in addressing the client's needs. In pursuing these objectives, it is implicitly acknowledged that treatments


that are effective for one client or problem may be ineffective for another. In recognition of this fact, health care researchers have attempted to develop guidelines that will assist clinicians by identifying both those treatments that have the highest likelihood of success and those that might be either inappropriate or minimally effective. The emerging field of Prescriptive Treatment Planning is devoted to the prescription of effective treatments and the proscription of ineffective ones (Beutler & Clarkin, 1990; Beutler & Harwood, 1995; Frances, Clarkin, & Perry, 1984). Because of their psychometric qualities relative to unstructured interview methods, and because they are adaptable to complex statistical manipulations, psychological tests are ideal for the task of developing standardized procedures for differentially assigning or recommending psychosocial treatments (e.g., Beutler & Berren, 1995; Butcher, 1990; Graham, 1987, 1993; Groth-Marnat, 1997). However, most of the indicators and signs that are employed by clinicians in making differential mental health treatment decisions from psychological tests are based on clinical experiences and conjectures rather than on empirical evidence of improved treatment efficacy or effectiveness. Accordingly, this chapter is devoted to providing the clinician with a representative overview of the research that suggests that test performance may predict both treatment outcome and, more importantly, a differential response to available treatments. It also reports an initial effort to develop a method that consolidates, in one measurement device, information that is currently available through a large battery of contemporary tests.

Predictive Dimensions in Differential Therapeutics

Psychological tests have traditionally been used to address questions within five domains: diagnosis, etiology or causes of behavior, prognosis and course, treatment planning, and functional impairment (Beutler & Rosner, 1995).
Of these, as noted, questions of diagnosis and differential diagnosis have always been primary. Psychological tests like the Rorschach and Thematic Apperception Test (TAT) were developed and came to be relied on to uncover covert thought disorder, underlying dynamic conflicts, and pathological ideation and impulses associated with various diagnoses. These tests purported to be able to reveal these hidden processes much more validly than unstructured interviews. In response, diagnoses came frequently to depend on their being able to do so. As the diagnostic system became less reliant on evidence of covert, underlying processes, with the advent of the third edition of the Diagnostic and Statistical Manual (DSM-III; APA, 1983), weaknesses of projective tests became apparent (Butcher, 1995; Groth-Marnat, 1997; Nezworski & Wood, 1995). Even beyond the contributions of DSM-III, the seemingly unbridled expansion and growing complexity of the diagnostic system itself has raised concerns about the very processes of constructing psychiatric diagnoses. Contemporary disorders and their criteria (DSM-IV; APA, 1994) represent a consensual opinion of a committee of psychiatric experts, the majority vote of whom determines whether a given pattern of symptoms should be accorded the status of a socially viable "syndrome" or "disorder." The committee's decisions to recognize a given cluster of symptoms as a diagnosable condition have traditionally been based on (a) the presence and frequency of the symptoms, (b) an analysis of the symptoms' social significance and interpersonal effects, and, where the empirical evidence has warranted, (c) the specificity of the symptomatic response to various classes of drugs.


However, the committees that have been responsible for the development of the various DSMs have largely ignored empirical information about patient characteristics and traits (e.g., coping styles, resistance, conflicts) that have been useful in the selection and use of various psychotherapeutic procedures. Consequently, even if a diagnosis is reliable (and there is still debate about this, e.g., Follette & Houts, 1996; Wells & Sturm, 1996), it provides little information on which to develop a differentially sensitive psychotherapeutic program. Although a patient with a diagnosis of major depression with vegetative signs can be expected to respond better to tricyclic antidepressants than to anxiolytics (e.g., Wells & Sturm, 1996), the diagnostic label does not allow a clinician to select among cognitive, interpersonal, or relationship psychotherapies. The symptoms that determine diagnosis are quite insensitive to the qualities and characteristics that determine how well prospective patients will respond to treatment. The treatments themselves are cross-cutting. Their use is not bound by or specific to patient diagnoses and, in fact, diagnoses may be poor indicators for their implementation. It is unlikely that a cognitive therapy can be constructed to be so specific that it would work well for those with depression but not for those with anxiety, personality disorders, minor depressions, and eating disorders. Cognitive therapy has been applied to all of these disorders, as well as to such widely different conditions as tics, sexual dysfunctions, sleep disorders, impulse control disorders, substance abuse disorders, and adjustment disorders: virtually any condition in which thought and/or behavior is disrupted. Such theoretically diverse treatments as cognitive therapy, behavior therapy, psychodynamic therapy, and interpersonal therapy have all been advocated as treatments for the same multitude of diagnostic conditions.
This cross-diagnostic application of psychotherapies does not mean that specific treatment indicators are unavailable. Indeed, some such indicators are present, but these indicating conditions were ignored in constructing the diagnostic criteria. Most clinicians realize the weaknesses and limitations of diagnosis and develop a large and rich array of treatment possibilities as they seek and obtain extradiagnostic information. This information is consolidated into both a patient formulation and a treatment plan. However, the patient formulations, as well as the resulting treatment plans, vary widely from clinician to clinician, even within a given theoretical framework (e.g., Caspar, 1995; Horowitz et al., 1984; Luborsky, 1996; Masterson, Tolpin, & Sifneos, 1991; Vaillant, 1997). The uniqueness of these diverse formulations reflects the combined influence of therapists' personal theories of psychopathology and of the variations that exist among different formal systems. The failure of clinicians to rely on empirically derived dimensions of personality and prediction in constructing their formulations of patients and treatments probably reflects the absence both of knowledge about how to define and use such empirical predictors and of discriminating measures that are simple to administer and that reliably capture some of the patient characteristics related to treatment outcomes. Many authors have attempted to define the extradiagnostic dimensions that may allow a clinician to predict the differential effects of applying different therapeutic procedures. Most of these efforts have provided guidelines for the application of different procedures within a single theoretical framework, and few have attempted to incorporate the breadth of interventions that characterize widely different theories.
For example, Hollon and Beck (1986) suggested the conditions under which cognitive therapy might be directed to schematic change versus changes in automatic thoughts, and Strupp and Binder (1984) introduced guidelines within which the psychodynamic therapist may differentially offer interpretations or support. However, because any single theoretical
framework is less than comprehensive of the many foci, procedures, and strategies that are advocated by the available array of psychotherapies, these mono-theoretical guidelines are necessarily incomplete and weakened. Recognizing the limitations that exist when only procedures advocated by a single theory can be selected for use, in recent years there has emerged a strong movement toward "technical eclecticism" among practitioners and researchers alike (Norcross & Goldfried, 1992; Stricker & Gold, 1993). The several approaches that constitute this movement, although diverse in type, share the objective of developing guidelines for the selection of maximally effective interventions from the broadest possible array of proven procedures, regardless of the theories that gave rise to them. These guidelines specify the characteristics of patients and of therapeutic situational demands that, according to various theories of intervention, best fit one another. The various models differ in their level of technical specificity and in the nature of the constructs that they select as most important. For example, Lazarus (1981) developed one of the more widely recognized integrative models, Multimodal Therapy (MMT). MMT offers a general framework within which patient experience and problems are defined, and relates these general dimensions to the use of different models and techniques of treatment. Specifically, MMT provides a structured means for assessing the relative and absolute levels of problems in seven general domains of experience, the collection of which is described by the acronym BASIC I.D. (Behaviors, Affects, Sensory experiences, Imagery, Cognitions, Interpersonal relationships, and need for Drugs). The clinician observes the levels of disturbance in each domain and then determines their interrelations in the form of a firing or triggering order, describing the pattern of behavior that occurs when the problems arise.
Then, the model proposes classes of interventions that correspond with the dimensions of patient experience affected by the problem. Thus, experiential procedures may be used when sensory and affective experiences are disturbed, behavioral interventions may be used when behavioral symptoms are disruptive, cognitive change procedures may be used when dysfunctional thoughts are observed, and so forth. In contrast to the focus on problem activation that defines the integration of procedures within MMT, other approaches have emphasized other methods of defining how procedures are integrated. In some cases, this has resulted in a less specific relation being posited between patient dimensions and the nature of treatment techniques than that proposed by Lazarus. Some, for example, have identified stages of problem development or resolution as an organizing principle. These stage models vary from one to another by virtue of the degree of emphasis they place on patient variables (Prochaska, 1984) or intervening therapy goals (Beitman, 1987) as the stage indicators. Prochaska (1984), more specifically, identified broad classes of intervention that may be recommended as a function of the patient's stage of problem resolution. Thus, behavioral strategies are recommended when the patient is in a stage of active problem resolution, strategies that raise awareness are used when the patient is in a preconceptual (precontemplative) stage of problem resolution, insight strategies are used when one is in the process of problem contemplation and cognitive exploration, and so forth. Beitman (1987), on the other hand, applied the concept of stages to the organization of the course of psychotherapy rather than to the stage of problem resolution achieved by the patient. 
Accordingly, he emphasized that the early sessions should focus on relationship development, after which the therapist should proceed through helping the patient recognize patterns, change those patterns, and prepare for termination. Beutler and Clarkin (1990) suggested a resolution of these viewpoints, offering the possibility that there may be interrelationships among patient and treatment stages. The
resolution of such differences depends on success in developing psychological tests that can reliably identify patient and problem information that is directly usable in planning treatments.

Patient Predisposing Variables

Psychometrically stable measurements of treatment-relevant patient dimensions (i.e., predisposing variables) potentially could be used to identify markers for the application of different interventions. Unavoidably, however, the list of patient, treatment, and matching dimensions that have been correlated with treatment effects is virtually limitless (Beutler, 1991). In an effort to bring some order to the many variables and diverse hypotheses associated with the several models of differential treatment assignment, and to place them in the perspective of empirical research, Beutler and Clarkin (1990) grouped the patient characteristics presented by the different theories into a series of superordinate and subordinate categories. This classification included seven relatively specific classes of patient variables that are distinguished both by their susceptibility to measurement using established psychological tests and by their ability to predict differential responses to psychosocial treatment (Beutler & Hodgson, 1993; Gaw & Beutler, 1995). These categories included: Functional Impairment, Subjective Distress, Problem Complexity, Readiness for/Stage of Change, Potential to Resist Therapeutic Influences, Social Support, and Coping Styles. These "patient predisposing" dimensions provide points of reference for organizing the topics of this chapter as the use of psychological tests for treatment planning is considered.
Table 3.1 summarizes some representative instruments that may be used for assessing these various dimensions.1

1 Because of limited space, this discussion is restricted to psychological interventions and initial predisposing variables. This chapter gives only cursory attention to how tests have been used for the selection of medical/somatic interventions (e.g., hospitalization, ECT, medication, etc.), for establishing DSM diagnoses, or for making treatment alterations in midcourse. Likewise, the discussion does not include differential treatment planning for children and adolescents, explorations of the relation between treatment-initiated changes and subsequent modifications of treatment plans, or explorations of the relation between psychotherapy process events and subsequent outcomes. Refer to Beutler and Clarkin (1990) for a more extensive consideration of both patient and treatment variables within these latter classes of treatments.

Functional Impairment

Traditionally, the literature on the role of problem severity in treatment success has confounded two aspects of patient functioning: level of impairment and subjective distress (e.g., Beutler, Wakefield, & Williams, 1994). Not surprisingly, therefore, research on this topic has produced mixed results. To clarify the differing roles of impairment and distress, this discussion follows Strupp, Horowitz, and Lambert (1997) and distinguishes between external ratings of social functioning (i.e., observed impairment) and self-reports of unhappiness and subjective distress. Level of patient impairment in social functioning is a variable of considerable importance to treatment studies. It is typically measured by the Global Assessment of Functioning (GAF) scale from the DSM-IV or by specific problem-centered measures,
TABLE 3.1
Representative Tests for Measuring Patient/Problem Dimensions

Test                 Dimension(s) Measured
ADIS**               Functional Impairment
BDI*                 Functional Impairment, Subjective Distress
HRSD**               Functional Impairment
SCL-90-R*            Functional Impairment, Subjective Distress
STAI*                Subjective Distress
Stages of Change*    Readiness for Change
TRS*                 Resistance Potential
CCRT**               Problem Complexity
MMPI*                Problem Complexity, Resistance Potential
CPI*                 Resistance Potential
FES*                 Social Support

Note. ADIS = Anxiety Disorders Interview Schedule; BDI = Beck Depression Inventory; HRSD = Hamilton Rating Scale for Depression; SCL-90-R = Symptom Checklist-90-Revised; STAI = State-Trait Anxiety Inventory; TRS = Therapeutic Reactance Scale; CCRT = Core Conflictual Relationship Theme; MMPI = Minnesota Multiphasic Personality Inventory; CPI = California Psychological Inventory; FES = Family Environment Scale.
* Self-report instrument. ** Observer-report instrument.

such as the Anxiety Disorders Interview Schedule (ADIS; DiNardo, O'Brien, Barlow, Waddell, & Blanchard, 1983) and the Hamilton Rating Scale for Depression (HRSD; Hamilton, 1967). Changes on these indices of impairment reflect treatment improvement, but initial level of functional impairment may also serve as an index that can be used for planning the intensity and modality of treatment. This conclusion derives from three lines of evidence. First, there is a negative relation between level of impairment and amount of treatment-related improvement, quite independently of the type of treatment. This finding has been obtained in conditions as diverse as bulimia (Fahy & Russell, 1993), obsessive-compulsive disorder (Keijsers, Hoogduin, & Schaap, 1994), major depression (Beutler, Kim, Davison, Karno, & Fisher, 1996), and substance abuse (McLellan, Woody, Luborsky, O'Brien, & Druley, 1983). Indeed, McLellan et al. determined that measures of functional impairment were the single best (negative) predictors of treatment outcome. Second, impairment level has implications for determining the intensity and length of treatment that will be helpful.
In spite of the poor prognosis, there is evidence that persistent treatment may eventually induce an effect among moderately impaired patients (Gaw & Beutler, 1995; Keijsers et al., 1994). Shapiro et al. (1994), for example, compared behavioral and psychodynamic-interpersonal therapies applied in a format of either 8 or 16 weeks' duration. The more intensive and lengthy treatment showed the most positive effects among those with high levels of impairment, regardless of the model or type of treatment implemented. Those with low levels of impairment did not benefit from intensified treatment. Third, functional impairment is an important variable in anticipating patient relapse. Maintenance of treatment effects appears to be negatively affected by the initial severity of patient impairment. Even the positive effects of intensive treatment among patients with high levels of impairment may be negated with time. Thus, the good results obtained by Shapiro et al. (1994), favoring more intensive treatment among more impaired
individuals, largely disappeared after a year (Shapiro et al., 1995). Likewise, T. A. Brown and Barlow (1995) demonstrated both initial therapeutic gains among patients with panic disorder and a negative relation between impairment and maintenance of benefit 2 years later. They concluded that even when treatment is able to induce an initial positive effect among those with high levels of initial impairment, these improvements are not well maintained when compared to patients who were less severely impaired. In all of these studies, unfortunately, even the most intensive treatment was short term and infrequent compared to conventional standards for treating those with severe problems. The intensive treatment studied by Shapiro and his group ran just 16 sessions and certainly would not be considered "intense" by most practitioners. Studying a treatment whose length and frequency were more typical of the intensity applied to such problems in usual practice might have improved the results, both by facilitating the initial gains and by maintaining the effects of treatment longer than the short-term treatments studied in these investigations. Another important area of study is the exploration of whether impairment level can help one selectively apply different treatments. Some evidence, for example, suggests that level of impairment may differentially respond to different classes of antidepressant drugs (e.g., Ackerman, Greenland, Bystritsky, Morgenstern, & Katz, 1994). Of note in this line of investigation are studies that have attempted to support the conventional wisdom that level of functional impairment is a negative or contra-indicator for using psychosocial treatments and a positive indicator for applying psychopharmacological interventions.
The well-known National Institute of Mental Health (NIMH) collaborative study of depression (Elkin et al., 1989), for example, using composite measures of initial impairment level, determined that those with severe symptoms responded more rapidly to tricyclic antidepressants than to psychotherapy. However, this conclusion is not strongly supported by other lines of research. For example, several studies have used clinician ratings of "endogeneity" as a measure of impairment level, and have explored this measure as an index for predicting the differential efficacy of medical and psychosocial interventions among depressed patients. The studies indicate that pharmacotherapy achieves its greatest efficacy among patients with endogenous symptoms, as compared to those with less severe reactive depressions. Surprisingly, however, they have uniformly failed to find the hypothesized difference between various forms of cognitive therapy and pharmacotherapy among the most seriously symptomatic participants (see Simons & Thase, 1992). Other evidence confirms the surprising observation that psychosocial interventions are at least as effective as antidepressant and antianxiety medication among most nonsomatic patients (Antonuccio, Danton, & DeNelsky, 1995; Elkin et al., 1989; Nietzel, Russell, Hemmings, & Gretter, 1987; Robinson, Berman, & Neimeyer, 1990). In contrast, there is also promising evidence that functional impairment may be a mediator of differential effects attributed to various psychosocial models of treatment (e.g., Fremouw & Zitter, 1978; Joyce & Piper, 1996; McLellan et al., 1983; Shoham-Salomon & Rosenthal, 1987; Woody et al., 1984). Of special note, unimpaired object (interpersonal) relations (Joyce & Piper, 1996) and absence of comorbid personality disorders (Woody et al., 1984) have been found to enhance the power of dynamically oriented psychotherapy. 
The opposite relationships may also hold: poor interpersonal relationships and complex personality disorders may respond poorly to psychodynamic and insight treatments, compared to other psychosocial interventions. For example, Kadden, Cooney, Getter, and Litt (1989) found that patients who exhibited sociopathic personality patterns responded better to cognitive-behavioral coping skills training than
to an insight-oriented, interpersonal, interactional group therapy. Likewise, in a study of acutely impaired psychiatric inpatients, Beutler, M. Frank, Scheiber, Calvert, and Gaines (1984) concluded that in this population, experiential-expressive interventions are not as effective as interactive, process-oriented, or behaviorally oriented therapies. Patients treated with experiential-expressive therapies showed increased symptoms and deterioration at the end of treatment, whereas these negative effects were not found among those treated with the other interventions.

Subjective Distress

Patient distress is a cross-cutting, cross-diagnostic index of well-being. It is poorly correlated with external measures of impairment, and it is a transitory or changeable symptom state (Strupp et al., 1997; Lambert, 1994). In clinical research, the Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961), the SCL-90-R (Derogatis, 1994), and the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970) have been used most often for assessing subjective distress. Interestingly, theoretical perspectives have emphasized the importance of distress as a motivating variable in keeping a patient engaged in treatment (J. D. Frank & J. B. Frank, 1991), as well as a measure of improvement. There is at least modest support for this latter proposition and, unlike the retarding effect of patient level of impairment, moderate amounts of subjective distress have generally been found to be a positive correlate of subsequent improvement (Lambert, 1994). Specifically, there is reasonably consistent evidence that psychosocial treatments achieve their greatest effects among those with relatively high initial levels of subjective distress (e.g., Klerman, 1986; Klerman, DiMascio, Weissman, Prusoff, & Paykel, 1974; Lambert & Bergin, 1983). These findings are especially strong among those with ambulatory depressions, general anxiety, and diffuse medical symptoms.
Using the BDI as a measure of distress, for example, Parker, Holmes, and Manicavasagar (1986) found that initial depression severity was positively correlated with treatment response among general medical patients. Likewise, Mohr et al. (1990) observed that the likelihood (though not the magnitude) of response to treatment was positively and linearly associated with general symptom severity on the SCL-90-R among patients with moderately severe depression. Even further, among patients with mild and moderate impairment levels, research evidence suggests that psychosocial interventions are as effective as antidepressant and antianxiety medications (Elkin et al., 1989; Nietzel et al., 1987; Robinson et al., 1990). These findings are not entirely uniform, however, and some evidence indicates that subjective distress may relate to outcome in a curvilinear fashion, particularly when personality disturbance or somatic symptoms are present. Hoencamp, Haffmans, Duivenvoorden, Knegtering, and Dijken (1994), for example, found that whereas a positive, linear relation existed between distress and improvement among depressed and anxious patient groups, a curvilinear relation characterized those who had a comorbid personality disorder (with the exception of obsessive-compulsive personality disorder) and those whose complaints were weighted heavily in the direction of somatic symptoms. At least among nonsomatic patients, subjective distress has been implicated in the prediction of differential responses to various forms of psychotherapeutic treatment. For example, in the NIMH collaborative study of moderate depression, subjective distress, as measured by the BDI, differentiated the efficacy of the psychotherapeutic treatments (Imber et al., 1990). Those patients with the most severe distress were most effectively
treated by interpersonal psychotherapy, as compared to cognitive therapy. Beutler et al. (1996), employing a similar sample, demonstrated that level of subjective distress was positively related to the efficacy of self-directed, supportive forms of treatment, but was not substantially related to the effects of cognitive and experiential treatments. The consistency of the foregoing relation is mitigated by the pattern of response among patients for whom somatic symptoms are prominent. Blanchard, Schwarz, Neff, and Gerardi (1988) determined that subjective distress (measured by the STAI) was negatively, rather than positively, associated with improvement among patients with irritable bowel syndrome. Those patients whose subjective anxiety did not exceed moderate limits were most likely to benefit from behavioral and self-regulatory treatment. Similarly, using the BDI as a subjective measure of distress/depression, Jacob, Turner, Szekely, and Eidelman (1983) suggested that those with low distress levels were more likely than those with high levels to benefit from self-monitored relaxation as a treatment for headaches. Patients with moderate and high levels of subjective distress were most inconsistently benefitted by behavioral and psychotherapeutic treatments.

Readiness for Change

Prochaska and colleagues (1984; Prochaska & DiClemente, 1986; Prochaska, DiClemente, & Norcross, 1992) suggested that a patient's progress in treatment is a function of how well the intervention method used fits the patient's position along a progressive series of stages reflecting personal efforts to change. Prochaska and his colleagues identified five stages or phases through which a person progresses in seeking to change an aspect of his or her life. The Stages of Change Questionnaire (Prochaska, Velicer, DiClemente, & Fava, 1988) is designed to assess these stages of readiness and differential receptivity to different interventions.
The patient's stage of change is taken as an indication of the level of a patient's receptivity to different strategies of influence. It is thought that an individual normally proceeds sequentially through the stages, sometimes recycling several times, in the course of intentionally implementing change. These stages of readiness include: precontemplation, contemplation, preparation, action, and maintenance. Prochaska and his colleagues posed two hypotheses regarding the stage of readiness achieved by a patient: (a) More advanced stages of readiness are associated with a greater likelihood of improvement; and (b) the stage of readiness serves as an indicator for the use of specific therapeutic interventions. In support of the first of these propositions, they demonstrated that among patients seeking help to quit smoking, those who progressed to a higher stage of readiness during the early phase of treatment also doubled the likelihood of making improvement within the subsequent 6 months (Prochaska et al., 1992). Research support of the proposition that a patient's pretreatment stage of readiness predicts a differential response to specific interventions has been more difficult to obtain. Prochaska (1984) initially postulated that action-oriented (behavior) therapies were best suited to individuals who had achieved the preparation and action stages of readiness, but would be less suited to patients who were in the precontemplation or contemplation stages of readiness. In turn, consciousness-raising and motivation enhancement techniques (e.g., insight-oriented therapies) were posited to be most effective for patients in these early stages of readiness. Prochaska's proposals have stimulated a good deal of research on how people prepare themselves for making changes and how these processes may be used for treatment
planning (e.g., O'Connor, Carbonari, & DiClemente, 1996; Prochaska, Rossi, & Wilcox, 1991). Findings provide modest support for the value of fitting some intervention strategies to the patient's stage of readiness. For example, Prochaska et al. (1988) successfully demonstrated that patients' stage of change contributed to the relative effectiveness of different treatments for reducing smoking. Project MATCH (Project MATCH Research Group, 1997) compared the patient's pretreatment readiness for change to the effectiveness of various types of intervention. The findings were only partially supportive of Prochaska's predictions: Patients who were identified as having little readiness for change (precontemplative and contemplative stages) responded better to procedures designed to enhance motivation and encourage contemplation than to the more action-oriented procedures of cognitive-behavioral therapy. Patients at the action stage, however, did not show the expected preference for cognitive therapy, and the significant fit obtained between stage and therapy strategy proved to be time dependent, emerging only during the last month of the follow-up.

Problem Complexity

In addition to the severity of symptomatic presentation and the stage of problem resolution achieved, problems also vary in their complexity (Beutler, 1983; Beutler & Clarkin, 1990). Complexity is indexed by the concomitant presence of personality disorders, by evidence of chronicity of major disorders, and by evidence that interpersonal and conflictual patterns recur in persistent and pervasive ways. Recurrent patterns that indicate complex, problematic behavior are thought to be evoked by the similarity of symbolic meanings given to evoking cues, rather than by obvious similarities in overt stimulus characteristics (Barber, 1989; Crits-Christoph & Demorest, 1991). Complex patterns are expressed in a similar way across a large number and variety of social systems, transcending specific events and situations.
The Minnesota Multiphasic Personality Inventory (MMPI; Butcher, 1990) and other omnibus personality measures have been used to assess the chronicity of problems, a variable relevant to predicting the value of symptom-focused interventions. Knight-Law, Sugerman, and Pettinati (1988), for example, found that the effectiveness of behavioral, symptom-focused interventions was highest among those patients whose MMPIs indicated that their problems were reactive and situational. Similar evidence that situation-specific problems are more responsive to behavioral treatments than chronic and recurrent ones has accrued among individuals with complex somatic symptoms (LaCroix, Clarke, Bock, & Doxey, 1986), those who abuse alcohol (Sheppard, Smith, & Rosenbaum, 1988), those with eating disorders (Edwin, Anderson, & Rosell, 1988), and patients with chronic back pain (Trief & Yuan, 1983). Although omnibus personality tests are useful in assessing chronicity, they are limited in identifying the significance of pervasive dynamic conflicts. Among the instruments that are designed to determine the presence and pervasiveness of interpersonal, conflictual themes, the Core Conflictual Relationship Theme (CCRT) method (Barber, 1989; Crits-Christoph & Demorest, 1988; Crits-Christoph, Demorest, & Connolly, 1990; Crits-Christoph, Luborsky, Dahl, Popp, Mellon, & Mark, 1988; Luborsky, 1996; Luborsky, Crits-Christoph, & Mellon, 1986) is probably the most promising. The CCRT is based either on clinician ratings or self-reports, and is designed to define patterns related to complex, dynamically oriented problems. The method identifies three sequential aspects of recurring interpersonal behaviors: the organizing wishes that motivate the interaction,
the actions anticipated from others if these wishes are expressed, and the acts of self that either follow or prevent these acts of others. The pervasiveness of a given theme across a variety of interpersonal relationships can be viewed as an index of problem complexity. Treatments vary widely in the breadth of their objectives, ranging from symptomatic to thematic. These variations in breadth are reminiscent of corresponding variations in problem complexity, suggesting a link between the two. For example, psychosocial interventions, as a rule, are aimed at broader objectives than medical ones (DeRubeis et al., 1990; Simons, Garfield, & Murphy, 1984), and treatments that are oriented toward insight and awareness focus on broader themes than behavioral and cognitive ones (e.g., Caspar, 1995; Luborsky, 1996; Strupp & Binder, 1984). The obvious similarity between problem complexity and treatment focus suggests an optimal fit between problem complexity and the breadth of treatment focus applied. Thus, high problem complexity should favor psychosocial over pharmacological interventions, and systemic or dynamic treatments over symptom-focused procedures (Gaw & Beutler, 1995). In the absence of research that bears directly on this hypothesis, evidence for its validity is, necessarily, indirect. One line of supportive research has revealed that recurrent themes are useful as guides for constructing dynamic interventions. Two relevant findings indicate that treatment outcome is enhanced as a function of the level of correspondence between the interpretation offered by the therapist and the dynamic theory directing therapy (Goldfried, 1991), and of the level of correspondence between the interpretation offered and the most pervasive (independently determined) theme (Crits-Christoph, Cooper, & Luborsky, 1988). Collectively, these data suggest that the degree of problem complexity is positively related to the effectiveness of broad-band, psychodynamic interventions.
A second line of investigation reveals that problem complexity is inversely related to the impact of narrow-band treatments. Using comorbidity (a coexisting personality or somatic disorder) as an index of complexity, several studies (e.g., Fahy & Russell, 1993; Fairburn, Peveler, Jones, Hope, & Doll, 1993; Hoencamp et al., 1994) have found that, in treating patients with cognitive-behavioral therapy (a symptom-focused treatment), complexity was a negative indicator of improvement. Wilson (1996) conceded that cognitive-behavioral treatment has been observed to have poor effects on such patients, but he pointed out that such complexity is a negative prognostic factor for all interventions. Whereas this point is well taken and indicates the need to standardize the means of identifying problem complexity, the collection of results suggests a promising mediating role of problem complexity in predicting (or controlling) the benefit of treatments that vary in breadth of focal objectives.

Reactant/Resistance Tendencies

Several investigations have explored the predictive role of patient resistance to psychosocial interventions in selecting therapy procedures. These studies vary in the degree to which they consider resistance to be a statelike or traitlike variable (e.g., Beutler, Engle, et al., 1991; Miller, Benefield, & Tonigan, 1993). Both aspects of resistance must be considered, however. Traitlike resistance has been particularly promising both as an indicator of poor prognosis and as a mediator of differential treatment response (Arkowitz, 1991). For example, Khavin (1985) compared the psychological characteristics of 50 young adult
males who were being treated for stuttering with psychosocial interventions and concluded that resistance-prone patients did poorly in all forms of intervention. The Ego Strength (ES) subscale of the MMPI (Barron, 1953) and the Therapeutic Reactance Scale (TRS; Dowd, Milne, & Wise, 1991) were designed specifically as traitlike measures of prognosis to be used to predict resistance to treatment. Although the success of single-scale MMPI indices has been mixed (Graham, 1987), the TRS is better founded in theory and holds considerable promise for predicting differential response to directive and nondirective therapies (e.g., Dowd, Wallbrown, Sanders, & Yesenosky, 1994; Horvath & Goheen, 1990; Hunsley, 1993; Tracey, Ellickson, & Sherry, 1989). Most of the studies of this proposition involve patients with transitory and acute conditions, however. Empirical validation of this test among representative clinical populations is still needed. To broaden the applicability of the TRS and to define the correlates of resistance traits, Dowd and his colleagues (Dowd et al., 1994) inspected correlates of this test when compared to established measures of personality traits. For example, they regressed their own measure and a German-language measure of resistance traits (Fragebogen zur Messung der psychologischen Reaktanz [Questionnaire for the Measurement of Psychological Reactance]; Merz, 1983) on scores from the California Psychological Inventory-Revised (CPI-R; Gough, 1987) scales. The results indicated that resistance-prone individuals were relatively less concerned about "impression management" and relatively more likely to resist rules and social norms than people who had low resistance potential. Moreover, high trait-reactant individuals preferred work settings that allowed them to exercise personal freedom and initiative.
These findings suggest that highly resistant individuals would respond poorly to therapies that are highly therapist-controlled and directive (Beutler, 1983, 1991; Shoham-Salomon & Hannah, 1991). Several researchers have extended this hypothesis to postulate that resistance-prone individuals would respond well to paradoxical interventions that capitalize on their oppositional tendencies (e.g., Shoham-Salomon, Avner, & Neeman, 1989; Swoboda, Dowd, & Wise, 1990). Horvath and Goheen (1990) supported this hypothesis, finding that clients whose TRS scores indicated high levels of traitlike resistance responded well to a paradoxical intervention, maintaining their improvements beyond the period of active treatment. Less reactant clients exposed to the same treatment deteriorated after active treatment stopped. The reverse pattern was found among those treated with a nonparadoxical, stimulus control intervention. A prospective test of the hypothesis that clients who varied on measures of resistance potential would respond in opposite ways to directive and nondirective therapies was undertaken by Beutler, Engle, et al. (1991), using a combination of MMPI subscales as an index of resistance. They demonstrated that manualized therapies differing in level of therapist directiveness were differentially effective in reducing depressive symptoms. Among highly resistance-prone depressed subjects, the nondirective therapy surpassed the directive ones in effecting change in depressive symptoms, but the reverse was true among patients low in resistance. This result was cross-validated at a 1-year follow-up of depression severity and relapse (Beutler, Machado, Engle, & Mohr, 1993), and was also extended to a cross-cultural sample using several alternative measures of resistance (Beutler, Mohr, Grawe, Engle, & MacDonald, 1991).
In a study of brief directive and confrontational motivational interviews in the treatment of problem drinkers, Miller and his colleagues (Miller et al., 1993) concluded that clients' in-therapy resistance was associated with poor outcomes at the 1-year follow-up. However, the failure to measure traitlike resistance potential makes it difficult to say whether the pattern could have been altered by adjusting the nature of the intervention.


Beutler, Sandowicz, Fisher, and Albanese (1996) reviewed a variety of measures of patient statelike and traitlike resistance, along with associated evidence of their value in making differential treatment decisions. They concluded that the evidence strongly supports the role of both resistance traits and states, as measured by a variety of instruments, as contraindicators for the use of directive interventions among psychotherapy patients. They also concluded that the evidence supports the value of differentially applying directive interventions to the treatment of low-resistance patients and self-directed procedures to the treatment of high-resistance patients.

Social Support

The level of social and interpersonal support from others has also been widely postulated as a predictor of therapeutic outcome and maintenance. Numerous studies have found that the presence of social support can improve outcomes in psychotherapy and decrease the likelihood of relapse (e.g., Sherbourne, Hays, & Wells, 1995; Vallejo, Gasto, Catalan, Bulbena, & Menchan, 1991). However, a close inspection of this literature suggests that some methods of measuring social support are better than others in making this prediction. Measures of social support rely either on external evidence of resource availability (objective support), such as proximity of family members, marriage, social network participation, and so on (e.g., Ellicott, Hammen, Gitlin, G. Brown, & Jamison, 1990), or on self-reports (e.g., R. H. Moos & B. S. Moos, 1986) by patients themselves (subjective support). These two methods of measurement play different roles in predicting response to treatment. For example, using the Family Environment scale (FES; R. H. Moos & B. S. Moos, 1986), R. H.
Moos (1990) found that the proximal availability of at least one objectively identified confidant and family support member, and the level of satisfaction derived from these relationships, each significantly and independently increased the likelihood of improvement among depressed patients. In comparing the predictive power of subjective and objective measures of support, Hooley and Teasdale (1989) found that the impact of subjective social support exceeded that of objective measures in the treatment of depressed patients. For example, the quality of the marital relationship, rather than its mere presence, predicted relapse rates. Nor are all indices of marital quality of equal importance: the level of perceived personal criticism from spouses accounted for more of the variance in relapse rates than did the presence of less personal marital conflict. The relative importance of subjective social support as a predictor of outcome also found support in a study by Hoencamp et al. (1994), in which perceived lack of family support had a significant negative association with treatment outcome among moderately depressed outpatients. Surprisingly, however, this study found that the quality of perceived contact with children was negatively related to outcome: those patients who reported having a poor relationship with their children improved more than those who sought and felt support from their children. This is an interesting finding, but one whose interpretation is still uncertain. Clinicians should not prematurely reject the role of objective social support. Billings and R. H. Moos (1984) investigated the relation between the availability of social support networks and the chronicity and impairment associated with depression.
Although both chronic and nonchronic patients reported having fewer available social resources than nondepressed controls, only among patients with nonchronic depression was the severity of the problem related to the availability of social resources. These findings raise some


interesting questions about the role of social support in the etiology of chronic depression, and about the possibility of a differential effect of activating support systems in the treatment of depression along the continuum of chronicity. Pursuing this point further, one of the most interesting aspects of social support may be its potential as an index for the differential assignment of intensive and short-term treatments. Moos (1990), for example, found that social support availability was related to the optimal duration of treatment, its level serving as either an indicator or a contraindicator for the application of long-term treatment. Depressed patients who lacked social support continued to improve as a direct function of the number of weeks of treatment, whereas those patients who had satisfying support systems achieved an asymptotic level of benefit early in treatment and failed to benefit from continuing treatment. Interestingly, these latter patients were at risk for deterioration during long-term therapy. Another line of research in the domain of social support suggests that the way patients use the resources available to them may also be important, independent of the availability of these resources. Longabaugh and his colleagues (Longabaugh, Beattie, Noel, Stout, & Malloy, 1993) compared conventional measures of objective or subjective social support to the patient's level of social investment (the effort expended in maintaining involvement with others). Social investment was measured by assessing both the amount of time a person spent close to another person (an aspect of objective support) and the patient's subjective perception of the quality of that relationship. Social investment thus incorporated concepts relating to both objective and subjective social support. Longabaugh et al. compared the independent roles of social support and social investment both in prognosis and in differential response to psychotherapies.
They found that both of these concepts predicted a differential response to relationship-enhancement and cognitive-behavioral therapies, but social investment had a more central and pervasive mediating role in this process. Among those who experienced little satisfying support from others, cognitive-behavioral therapy was more effective than relationship-enhancement therapy. This effect was not apparent among those who felt supported by others. However, these effects were partially ameliorated by the presence of high social investment. Among individuals who were judged to have high social investment, regardless of the level of available social support, relationship-enhancement therapy was more effective than cognitive therapy. A correspondent match between social investment and type of therapy also improved maintenance effects.

Coping Styles

People adopt characteristic ways of responding in times of distress. Coping styles embody a collection of both conscious and nonconscious behaviors that endure across situations and times (Butcher, 1990). These traitlike qualities go by various titles, but generally reflect qualities that vary from extroversion and impulsivity, on one hand, to self-constraint and emotional withdrawal, on the other (H. J. Eysenck & S. B. G. Eysenck, 1969). This dimension of external to internal patterns of behavior can be assessed with omnibus personality measures, including such instruments as the Eysenck Personality Inventory (EPI; H. Eysenck & S. B. G. Eysenck, 1964), the MMPI, the California Personality Inventory (CPI), the NEO Personality Inventory (NEO-PI; Costa & McCrae, 1985), and the Millon Clinical Multiaxial Inventory-III (MCMI-III; Millon, 1994).


The CPI and MMPI have been most often used in the study of differential response to psychotherapy. Research of this type suggests that the effects of behaviorally and insight-oriented psychotherapies are differentially moderated by patient coping style, ranging from externalized and impulsive to internalized and seclusive. For example, in a well-controlled study of interpersonal and behavioral therapies, Kadden, Cooney, Getter, and Litt (1989) determined that high and low scores among alcoholic subjects on the CPI Socialization subscale, a measure of sociopathic impulsivity, predicted response to treatments based on behavioral and interpersonal insight models, respectively. Continued improvement over a 2-year follow-up period was also found to be greatest among compatibly matched client-therapy dyads (Cooney, Kadden, Litt, & Getter, 1991). Other studies have also confirmed this relation and expanded the role of patient coping style as a predictor of differential response to various psychotherapies. For example, Beutler and his colleagues (e.g., Beutler, Engle et al., 1991) found that depressed patients who scored high on the MMPI externalization subscales responded better to cognitive-behavioral treatment than to insight-oriented therapies, and the reverse was found with patients who scored low on this dimension. Both Beutler and Mitchell (1981) and Calvert, Beutler, and Crago (1988) found a similar pattern among mixed psychiatric inpatients and outpatients using the MMPI. Similarly, Longabaugh et al. (1994) found that alcoholics who were characterized as being impulsive and aggressive (externalizing behaviors) drank less frequently and with less intensity after receiving cognitive-behavioral treatment than after receiving relationship-enhancement therapy. The reverse was found with alcoholic clients who did not have these traits.
Similarly, Barber and Muenz (1996) found cognitive therapy to be more effective than interpersonal therapy among patients who employed direct avoidance (externalization) as a coping mechanism, whereas interpersonal therapy was more effective among obsessively constricted (internalization) patients. Barber and Muenz noted the similarity to the findings of Beutler et al. and advanced an interpretation based on the theory of opposites: individuals respond to interventions that run counter to their behavior and thereby undermine their own customary styles. For avoidant clients, cognitive therapy pushes them to confront anxiety-provoking situations through homework and specific instructions, whereas obsessive clients, who tend toward rigidity and intellectualization, are encouraged to depart from these defenses through interpersonal, insight-oriented interpretations. This latter interpretation has received some indirect support in studies of patient preferences for treatment type. Tasca and associates (Tasca, Russell, & Busby, 1994) found that externalizers preferred a process-oriented psychodynamic group over a structured activity-oriented group when allowed to make a choice, whereas internalizers preferred a cognitive-behavioral intervention. In each case, clients preferred the therapy suggested as least effective by other research, perhaps because it posed less threat to their normal defenses. Further research is called for on this interesting paradox, however.

Combinations of Matching Dimensions

The theoretical literature suggests that matching patients to treatments on a number of dimensions at once may enhance outcome more than matching on any single dimension (Beutler & Clarkin, 1990). There is some evidence to support this suggestion, and this


literature suggests that various matching dimensions may add independent predictive power to treatment outcomes. One ongoing research program is designed to test the independent and collective contributions of several of the matching dimensions discussed in this chapter across samples of patients with depression and substance abuse. Both individual and joint effects of various matching dimensions were revealed in an initial study that included cohabiting couples, one of whom was a problem drinker (Karno, 1997). The Couples Alcoholism Treatment (CAT) program employed an attribute × treatment interaction (ATI) design with manualized cognitive therapy (CT; Wakefield, Williams, Yost, & Patterson, 1996) and family systems therapy (FST; Rohrbaugh, Shoham, Spungen, & Steinglass, 1995), and it carefully monitored outcomes to test the mediating roles of patient variables on response to treatments. The separate and combined effects of matching four client characteristics to corresponding aspects of treatment were assessed: level of functional impairment, matched with the number and frequency of treatment sessions; level of initial subjective distress, matched with therapist focus on increasing or decreasing arousal; level of traitlike resistance, matched with level of therapist directiveness; and patient coping style, matched with the relative therapeutic focus on symptom change or insight. For the major analysis, the model of psychotherapy was ignored in favor of looking more specifically at the actual in-therapy behaviors of therapists using the various models. It was reasoned from prior research that the various models overlap in the particular procedures and therapeutic styles represented. Thus, the distinctiveness of the model was considered to be less sensitive to differences among therapist behaviors than direct observation (Beutler, Machado, & Neufeldt, 1994).
Thus, all four matching variables were studied by measuring patient variables before treatment and directly observing the nature of the therapy process. Patient variables were assessed by using standardized tests (e.g., MMPI, BSI, etc.), as suggested in the foregoing sections. Ratio scores reflecting the amount of a given therapeutic activity relative to a corresponding patient quality (e.g., amount of directiveness per patient nondefensiveness, amount of behavioral focus relative to patient externalization, amount of emotional focus relative to patient initial distress level, intensity of treatment relative to level of functional impairment, etc.) were used to assess the degree of correspondence between each patient and treatment dimension. Hierarchical linear modeling was used to assess the contributions of each patient and therapy dimension separately, and of the four matching dimensions jointly. Effects were assessed and modeled over time (20 treatment sessions and a 1-year follow-up). In order to ensure a wide distribution of therapeutic procedures, the two treatments were designed to differ in two dimensions: CT was symptom focused and FST was system focused; CT was therapist directed and FST was patient directed. By chance, they also differed in intensity or concentration (the number of weeks required to complete the planned weekly and biweekly sessions), with the 20 sessions of FST taking longer to complete than the 20 sessions of CT. Individual therapists also differed, both within and between treatments, in their levels of directiveness, their application of insight-oriented procedures, and their success in raising patient emotions and arousal. The sample consisted of 62 male and 12 female alcoholics and their partners. The outcome measures reflected the substance abuse status and general psychiatric functioning of the identified alcoholic patients.
All identified patients were alcohol dependent, most (85%) were European American, and they had been partnered for an average of 8.3 years. Nearly half of the patients used illicit drugs in addition to being dependent on alcohol.
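The ratio-score logic described earlier (the amount of an observed therapy activity divided by the corresponding patient quality measured at intake) can be sketched as follows. All variable names, scales, and values here are hypothetical illustrations; the study's actual coding scheme is not reproduced:

```python
# Illustrative sketch of ratio scores for patient-treatment correspondence.
# Each score is an observed therapy activity divided by the matching patient
# quality. The dimension names mirror the chapter's four matching dimensions;
# the numeric values and rating scales are invented for this example.

def ratio_score(therapy_activity: float, patient_quality: float) -> float:
    """Amount of a therapeutic activity per unit of a patient quality."""
    if patient_quality <= 0:
        raise ValueError("patient quality must be positive on this scale")
    return therapy_activity / patient_quality

# Hypothetical intake ratings (patient) and observed process ratings (therapy).
patient = {"nondefensiveness": 4.0, "externalization": 2.0,
           "distress": 5.0, "impairment": 2.5}
process = {"directiveness": 3.0, "behavioral_focus": 3.0,
           "emotional_focus": 4.0, "intensity": 5.0}

# The four activity/quality pairings named in the text.
pairs = [("directiveness", "nondefensiveness"),
         ("behavioral_focus", "externalization"),
         ("emotional_focus", "distress"),
         ("intensity", "impairment")]

matches = {f"{a}/{q}": ratio_score(process[a], patient[q]) for a, q in pairs}
for name, score in matches.items():
    print(f"{name}: {score:.2f}")
```

A higher or lower ratio on a dimension would then enter the longitudinal model as a predictor of change, rather than serving as a fixed cutoff.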


The growth curve modeling procedure revealed a relatively steady decline of symptoms throughout treatment, independent of treatment type or level of fit between patient and treatment qualities. Estimated abstinence rates were quite low, but the mean rates were consistent with other research on substance abuse treatment (Bellack & Hersen, 1990). At termination, abstinence rates were 42.9% and 37.5% for clients who had received CT and FST, respectively. At follow-up, the rates were 39.3% and 29.7%. An even lower rate of change was noted on general symptoms, independent of alcoholic symptoms. These low average rates of change, along with wide variations of outcomes within treatments, were not unexpected and precisely underline the need to match patients with those treatments for which they are best suited. Analyses of the independent effects of patient and therapy variables revealed that level of patient distress and amount of patient impulsivity were inhibitors of treatment benefit, whereas treatment intensity and level of behavioral/symptom orientation were both associated with the level of improvement achieved. In all instances, there were wide differences from patient to patient, indicating the presence of patient variables that were selectively determining treatment response. The analysis of the matching dimensions revealed that by the 6-month posttreatment assessment, three of the four matching dimensions studied proved to be related to desirable changes in alcohol usage. Specifically:

1. The correspondent match between level of initial severity (functional impairment) and level of care (average time to complete 20 planned sessions) predicted improvement in substance abuse. Patients whose level of care corresponded with the amount of impairment in functioning (high functioning with low-intensity therapies and low functioning with high-intensity procedures) tended to show more alcohol-related improvements than those who did not correspond on this patient-treatment dimension.

2. The match between patient resistance and treatment directiveness predicted change in alcohol use. Resistant patients who received nondirective interventions and nonresistant patients who received directive interventions reduced consumption and abuse more than those patients who were mismatched to level of therapist directiveness.

3. When patients were separated into two groups, abstinent and nonabstinent, the relation between therapist activation of affect and patient initial distress emerged as a significant predictor. Patients with low levels of distress treated with emotionally activating procedures and those whose high level of distress was treated with emotion-reduction procedures were more likely to benefit than their mismatched counterparts.

Collectively, the matching dimensions alone accounted for 76% of the variance in alcohol-related changes. Therapist directiveness was also a positive predictor of treatment benefit, independent of its fit with patient qualities. Likewise, low levels of patient impairment were a predictor of positive change. By adding these two independent effects to the equation, it was possible to account for 82% of the variance in outcome, an astonishingly high rate of prediction. In comparison to changes in alcohol abuse, general changes in psychiatric functioning were not as efficiently predicted. This is not to say that the patient, treatment, and matching variables were unimportant, however. One patient variable (impulsivity), two treatment variables (arousal induction and symptom-focused interventions), and one matching dimension (distress × stress-reduction procedures) accounted for nearly 50% of the variance in outcomes. Improvement was significantly but negatively related to initial patient impulsivity/externalization, and positively related to the use of both arousal induction procedures and symptom-focused interventions.
At the same time, the level of initial patient distress and the corresponding amount of emphasis on procedures that reduced or raised distress predicted improved psychiatric functioning.
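The "variance accounted for" figures above reflect how explained variance grows as independent predictors are added to a regression model. The sketch below illustrates that logic only; the data are small synthetic numbers, not the study's data, and the variable names are invented for the example:

```python
# Hedged illustration of incremental variance accounted for (R-squared) as
# predictors are added, analogous to the chapter's 76% -> 82% result.
# Synthetic data; not the CAT study's measures or effect sizes.
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """Proportion of variance in y explained by a least-squares fit on X."""
    X1 = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 40
match = rng.normal(size=n)           # stand-in for a matching-dimension score
directiveness = rng.normal(size=n)   # stand-in for an independent treatment effect
y = 2.0 * match + 1.0 * directiveness + rng.normal(scale=0.8, size=n)

r2_match = r_squared(match.reshape(-1, 1), y)
r2_full = r_squared(np.column_stack([match, directiveness]), y)
print(f"matching dimension only: R^2 = {r2_match:.2f}")
print(f"plus treatment effect:   R^2 = {r2_full:.2f}")
```

Because least-squares fitting can only improve (or leave unchanged) the fit when a predictor is added, the second R-squared is at least as large as the first; the question in the study was whether the increment was substantial.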


Taken collectively, the foregoing results provide support for the conclusion that patient distress and impairment, coping style, and resistance behaviors are implicated in predictable ways in the selection of psychotherapeutic strategies for treating depression, substance abuse, or both. Patient coping style, therapist emotional focus, and treatment intensity and support are all qualities that can be identified as valuable regardless of the nature of patients and their problems. An ideal treatment directly focuses on particularly disruptive social symptoms, such as those involved in drug abuse or acting-out behaviors; attempts to enhance emotional arousal and processing; adapts the level of emotional focus to the level of patient subjective distress; adapts the symptomatic versus insight/awareness focus of treatment to the level of patient externalization; and adapts the level of confrontation and directiveness to the patient's level of resistant traits.

Selection of Appropriate Instruments

The selection of appropriate instruments to measure the patient's presenting symptoms, personality traits, and transitional states is an important concern for the clinician. In each of the previous sections, representative instruments have been presented whose use has been supported empirically. The clinician must keep in mind several important considerations when selecting and using instruments for treatment planning purposes (see Cattell & Johnson, 1986; Goldstein & Hersen, 1990). First, the clinician must select instruments that together measure a variety of dimensions, such as those presented in Table 3.1, which have been selected because of their potential importance in making the required treatment decisions. Some instruments measure more than one dimension, but few measure all of the qualities recommended here.
Even if they did, one must also consider the advantages and costs of including multiple instruments that reflect different points of view. Both observer ratings and self-report measures should be used whenever possible in order to prevent the unchecked influence of a single perspective from biasing the results. Clinicians require focused tools that are specific to the task of assessing relevant patient qualities and that can be used to guide treatment decisions. Hayes, Nelson, and Jarrett (1987) argued that "the role of clinical assessment in treatment utility has been buried by conceptual confusion, poorly articulated methods, and inappropriate linkage to structural psychometric criteria" (p. 973). A clinician-based, research-informed method of identifying traits and relatively enduring states of patients is being developed that will allow clinicians to select psychotherapeutic strategies that best fit different patients. This effort has been based on the method of Systematic Treatment Selection (STS) originally outlined by Beutler and Clarkin (1990) and revised by Gaw and Beutler (1995) and by Beutler, Consoli, and Williams (1995). In its final form, the STS software program2 will help clinicians develop empirically based and validated treatment plans. Clinicians will enter patient information obtained through their usual assessment procedures using an interactive computer interface. The source of this information is designed to be flexible, relying on clinical observations as well as including the option of supplementing these observations with results from standardized psychological tests. Output will include an (editable) intake report and a proposed treatment program that includes recommendations about level of care, treatment intensity, format and modality, medical considerations, risk assessment, and a variety of appropriate research-based treatment packages. The program will also allow a clinician to project the course and length of treatment and, through subsequent patient monitoring, the projections will reveal the degree to which the patient's gains are within the limits expected within a given clinic. The program will flag nonresponders for further treatment refinement. Clinician profiling, therapist selection, and problem charting are additional features that will be present in the stand-alone version of the program. Although standardized instruments are available for assessing the various patient dimensions identified by the STS model, these instruments frequently are unnecessarily long and provide a good deal of superfluous information. Hence, a single instrument whose subscales are designed to reveal treatment-relevant characteristics promises to be more time efficient than those conventionally used in patient diagnostic assessment. The concluding section of this chapter describes a few of the most promising dimensions in the STS model, and describes the development and psychometric properties of this assessment procedure, the STS Clinician Rating Form, as applied to the following dimensions that have been reviewed previously: Functional Impairment, Subjective Distress, Problem Complexity, Resistance Potential, Social Support, and Coping Style.

2 The STS Treatment Planning software will be distributed by New Standards, Inc.

Methods

Patient Participants. Participants in this study (Fisher, Beutler, & Williams, in press) included both patients and clinicians. Two archival samples and one prospective patient sample were utilized in developing the measure.
Sample 1 was the main, prospective sample; archival Samples 2 and 3 were used to increase sample size and generalizability for the reliability and construct validity assessments of the STS Clinician Rating Form. The intake data gathered in the separate samples were given to trained clinicians and provided the basis for the completion of the STS Clinician Rating Form (STS). The STS ratings allowed the disparate information and tests derived from three different samples of patients to be translated into a common set of treatment-relevant dimensions. Sample 1 comprised ambulatory outpatients who presented with nonsubstance abuse primary diagnoses and average intellectual ability, and who were able to read at a sixth-grade level or higher. Patients were diagnosed as having major depression (37%), dysthymia (37%), anxiety disorders (8%), or transient situational disturbances and personality disorders (18%). An initial sample of 48 individuals was screened, and 46 elected to participate. The participants were largely Caucasian (84%) or Latino (11%), young adults (mean age = 34.55 years, SD = 11.71), and female (31 females, 15 males). Sample 2 consisted of 105 individuals entering the CAT project, the previously discussed, federally funded study on the treatment of alcoholism (Beutler, Patterson, et al., 1993). The participants from this sample comprised those who initially underwent intake evaluation and were identified as having substance abuse or substance dependence diagnoses. The 90 male participants initially assessed for inclusion had an average age of 37.78 years (SD = 8.81), and the 15 females had an average age of 40.00 years (SD = 7.06). Eighty-two percent of the participants were Caucasian.


Sample 3 consisted of 63 individuals who were reliably diagnosed as having a major depressive disorder (Beutler, Engle et al., 1991). These individuals were recruited and treated as part of a federally funded, randomized clinical trial of cognitive, experiential, and self-directed therapies. Referred individuals were screened by telephone and then assessed by an independent clinician, using a variety of standardized interviews and tests, to ensure compliance with depressive diagnostic and severity criteria. There were 22 male and 41 female participants, ranging in age from 22 to 76 years. They averaged 48.77 (SD = 14.95) and 45.41 (SD = 45.41) years of age, respectively. The sample was predominantly Caucasian (92%).

Clinician Raters. Experienced (over 5 years) professional psychologists who were affiliated with the Psychotherapy Research Program at the University of California, Santa Barbara, were recruited and paid to complete the STS Clinician Rating Form on the three samples. They included three females and one male: One was Asian American and one was African American; two were self-employed in full-time clinical or consultative practice; one was a consultant to educational institutions; one was engaged in full-time clinical research; and one directed the outpatient clinic from which Sample 1 was obtained. Clinicians were trained to ensure a common level of familiarity with the various instruments and concepts that served as the basis for rating patient variables. Clinicians were first given a detailed description of the patient variables that were assessed in the STS Clinician Rating Form, and questions about the various constructs measured were discussed and answered. Readings were provided as necessary to supplement training.
Specific training on the STS Clinician Rating Form proceeded by having clinicians complete ratings on various (nonstudy) patients, drawing from intake interviews (videotapes), whatever psychological tests were available, and clinical notes. They discussed the ratings and then rerated the tapes until they were able to produce criterion levels of initial agreement (κ > .70) on one of the samples. At various points, the clinician ratings were compared to expert-derived criteria to ensure the achievement of accuracy.

Patient Variables. The reliability phase of the study was based on Sample 1 data and focused on clinician responses to the STS Clinician Rating Form. To extend generalization, the construct validity phase used the additional samples, to the degree that they included data that were translatable to the current study. No satisfactory, standardized measures of level of social support, complexity, or level of functional impairment were available in the various samples, so these constructs were not assessed except through the STS Clinician Rating Form. Thus, information is available on reliability and expert-criterion validity, but not on other aspects of validity, for these dimensions. The patient variables included in the construct validity phase were subjective distress, coping style (internal and external dimensions), and trait resistance. These three dimensions were assessed in two ways: criterion measures of the constructs from standardized, self-report tests were contrasted with ratings on the STS Clinician Rating Form as an evaluation of the construct validity of the latter instrument.

Psychological Test Measures. The selection of criterion scores for the patient dimensions was constrained by the need to use instruments that had been administered in the various samples at the time of intake.
Two of the three samples (Samples 1 and 2) included a variety of specific subscales from the MMPI-2, and these samples played a part in the discriminant validity phase of this study. Because of this same constraint, potentially important measures like the State-Trait Anxiety Inventory (STAI) were not available
as measures of subjective distress. The particular tests and scores used to operationalize the dimensions of subjective distress, coping style, and resistance consisted of composite self-report scores, in order to obtain more stable and representative measures than could be provided by a single test or subtest score. Subjective distress was indexed by two different measures, reflecting, respectively, statelike and traitlike qualities. The Pt subscale from the MMPI-2 (Butcher, 1990; Graham, 1993) was extracted as one measure of subjective distress. A second measure was based on the Beck Depression Inventory (Beck et al., 1961). Coping style was represented by two separate composite indicators, reflecting the separate dimensions of Externalization and Internalization. Both dimensions were extracted from the MMPI and MMPI-2 (Graham, 1993). These two dimensions of coping were used to separately assess the construct validity of comparable STS scales, and both were then combined as a ratio to index a relative measure of Externalization, as originally suggested by Welsh (1952). The mean elevations of four separate MMPI scales were used to index each dimension, based on prior research with this instrument (Beutler, Engle et al., 1991; Beutler & Mitchell, 1981; Calvert et al., 1988; Welsh, 1952). Externalization was indexed by a combination of Scales 3 (Hy), 4 (Pd), 6 (Pa), and 9 (Ma). Internalization was indexed by a combination of Scales 1 (Hs), 2 (D), 7 (Pt), and 0 (Si). A single measure of coping style was constructed, following the rationale of Welsh (1952), as a ratio of Externalization to Internalization scores.

Resistance traits have been the most difficult of the dimensions to measure (Beutler, Sandowicz et al., 1996). To reflect this complex dimension, a composite scale was constructed from the MMPI/MMPI-2.
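The coping-style composites just described reduce to simple scale means and a ratio. A minimal sketch, assuming MMPI-2 T-scores are available as a dictionary keyed by scale abbreviation (the scale groupings follow the text, after Welsh, 1952; the profile values below are hypothetical, not study data):

```python
# Externalization = mean of Scales 3 (Hy), 4 (Pd), 6 (Pa), 9 (Ma);
# Internalization = mean of Scales 1 (Hs), 2 (D), 7 (Pt), 0 (Si).
# The E/I ratio then indexes relative externalization.

EXTERNALIZING = ("Hy", "Pd", "Pa", "Ma")
INTERNALIZING = ("Hs", "D", "Pt", "Si")

def coping_style(t_scores):
    """Return (externalization, internalization, E/I ratio) from MMPI-2 T-scores."""
    ext = sum(t_scores[s] for s in EXTERNALIZING) / len(EXTERNALIZING)
    internal = sum(t_scores[s] for s in INTERNALIZING) / len(INTERNALIZING)
    return ext, internal, ext / internal  # ratio > 1.0 suggests an externalizing style

# Hypothetical profile of an externalizing patient (T-scores)
profile = {"Hy": 68, "Pd": 72, "Pa": 65, "Ma": 70,
           "Hs": 55, "D": 60, "Pt": 58, "Si": 50}
ext, internal, ratio = coping_style(profile)
```
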
Through consultation with experts in the field and reviews of prior research, scales were identified that conceptually reflected aspects of patient resistance. An intercorrelation of many of these scales, using the MMPI-2 normative sample, allowed for the identification of several that reflected a common dimension. These subscales were moderately correlated with the Therapeutic Reactance Scale (TRS; Dowd et al., 1991), with correlations ranging from .31 to .64 (M = .53; all ps < .05). The selected subscales were subjected to a factor analysis, using the combined samples, and a composite factor that included the following scales was obtained: Cn (Control), Do (Dominance, weighted inversely), and Pa (Paranoia). The mean of these scales was then used as the final measure of resistance traits in the analysis of construct validity.

STS Clinician Rating Form. The STS Clinician Rating Form was developed in a series of steps, beginning with a compilation of items that appeared to relate to the various targeted dimensions. The initial pool of 260 items was reduced to 226 by visual inspection, eliminating obvious overlaps and duplications. Dimensions assessed by the rating form included the patient's degree of Functional Impairment, Depression, Primary Problem/Disorder, Area of Social Impairment (e.g., family, partner, work, legal), Level of Social Support, Self-reported Distress, Clinician-rated Distress, Self-esteem, Externalization, Internalization, and Traitlike Level of Resistance. The items on these scales require clinician ratings based on all information available to them. The STS was designed to allow the clinician to subjectively weight all the information available at the time of intake to make summary ratings on a common set of scales.
For the current study, clinicians were provided with a videotape of an early interview with the patient, clinical notes from the intake clinician, and whatever intake tests were part of the protocol governing the collection of data for that sample. For
those portions of the criteria validity assessment that did not include relations with psychological tests, all three samples were used (N = 216). The STS items were presented in a checklist format, and clinicians were required to provide a dichotomous rating (present/not present) on each of the 226 initial items.

Procedures. The procedures for addressing the questions posed in this study are described in sequence.

Interrater Reliability of STS. Interrater reliability assessment took place in two stages using Sample 1 data. In the first stage, raters were compared to a sample of cases on which a criterion-level standard of accuracy had been established. In the second stage, clinician raters were paired randomly with one another and interrater reliabilities were computed on all pairs. In both stages, clinicians were independently provided with the following information and asked to review it before being presented with the STS Clinician Rating Form to complete: intake notes by the original intake clinician, a history of the patient's problem, the intake psychological test data, and a videotape of the intake session. Both the overall reliability of the STS Clinician Rating Form and the reliability of the specific subscales were computed. Sample 1 cases were assigned to rater pairs to ensure that each clinician rated at least 10 cases and that all possible pairs were represented. In addition to making independent ratings, the clinician pairs also met and made consensual ratings for each patient on the STS Clinician Rating Form items.

Construct Validity. To assess discriminant and convergent validity, Samples 1 and 2 were collapsed. STS Clinician Rating Forms were completed on all patients after the clinician reviewed the intake materials described in the previous paragraphs.
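The interrater reliabilities computed over these dichotomous (present/not present) item ratings are chance-corrected agreement coefficients. Cohen's kappa, the standard statistic for two raters, can be sketched as follows (the rating vectors below are hypothetical, not study data):

```python
def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters' dichotomous ratings (1 = present, 0 = absent)."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n  # raw agreement
    p1, p2 = sum(rater1) / n, sum(rater2) / n                   # base rates of "present"
    # Chance agreement: both say "present" by chance, or both say "absent" by chance
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings on eight checklist items
r1 = [1, 1, 0, 0, 1, 0, 1, 0]
r2 = [1, 1, 0, 0, 1, 0, 0, 1]
kappa = cohen_kappa(r1, r2)  # 6/8 observed agreement, .50 expected -> kappa = .50
```
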
The material supplied to the raters who assessed patients in Sample 2 was similar to that used for rating patients in Sample 1, varying only in the specific psychological tests employed by the respective research protocols. The intake material included a history of the patient and problem, a battery of intake tests used in the original research protocol, and a videotape of the patient's intake evaluation. The summary scores from the STS Clinician Rating Form (representing Subjective Distress, Internalization, Externalization, and Resistance Traits) were compared both against each other (discriminant validity) and against the same constructs derived from standardized self-report tests (convergent validity). For these analyses, the STS Clinician Rating Form scores from a primary rater were used for Sample 2, and the consensual ratings were used for Sample 1. In Samples 2 and 3, individual clinicians rated each sample, with a randomly selected 20% of cases being double rated to check reliabilities.

Results

Reliability Assessment. To indicate level of reliability, three types of concordance estimates were calculated: overall agreement of each possible rater pair, agreement of each rater with the sum of all other raters, and specific levels of interrater agreement for each of the dimensions of interest to this study. These same data were used to calculate overall and rater-level concordance estimates; the mean coefficient of agreement of each rater with every other rater served
this purpose. The reliabilities of the separate STS subscales were also computed separately to derive an index of scale reliabilities. Overall interrater agreement was computed on all subscales of the STS Clinician Rating Form using Sample 1 data. The calculations reflected the mean degree of agreement across all rater pairs. The mean interrater concordance (κ) coefficients ranged from .79 (Functional Impairment) to .99 (Presence of Eating Disorder), with an average coefficient of concordance of .84. Rater-level agreement was computed for each of the five individual raters, averaging across their pairings with each of the other raters. Based on a sample of 15 pairings for each rater, the mean coefficients of concordance ranged from .80 to .89. Specific levels of agreement were assessed by comparing raters against one another, again relying on Sample 1 data. The mean levels of interrater agreement were .79 (Functional Impairment), .82 (Subjective Distress), .83 (Complexity), .80 (Resistance), .75 (Social Support), .86 (Internalization), and .86 (Externalization).

Construct Validity. Criterion validity of the entire STS Clinician Rating Form was assessed in Sample 1 by comparing the ratings of clinicians to an "expert" standard. In addition, the construct validity of four STS Clinician Rating Form dimensions (Subjective Distress, Externalization, Internalization, Resistance Traits) was examined through four types of evidence: agreement with expert criteria ratings, convergent validity, discriminant validity, and concurrent validity.

Expert Criteria Validity. Overall expert criteria agreement was calculated by averaging each rater's concordance estimate with two randomly selected "expert-rated" cases from Sample 1. Twelve cases were used as criterion samples for checking rater accuracy. The criterion of accuracy was the consensual ratings of the two expert raters who had the greatest familiarity with the cases.
The mean individual concordance estimates (κ) with these criteria ranged from .69 to .80 (two raters). The overall mean concordance coefficient across the 260 STS items was .77, indicating a satisfactory level of criterion agreement across clinician raters. When these criteria ratings were applied to the seven focal scales, the following mean κ values were obtained: .75 (Functional Impairment), .84 (Subjective Distress), .69 (Complexity), .83 (Resistance Potential), .76 (Social Support), .85 (Internalization), and .86 (Externalization).

Convergent Validity. The test of convergent validity proceeded in two steps. The first step consisted of correlating each item with a summary score for each dimension, drawing from all three samples. These data were used to reduce the items used in each scale; items that did not correlate significantly (p ≤ .01) with the summary criteria were eliminated. Based on this step, the Subjective Distress subscale was reduced from 40 to 29 items, the Externalization subscale was reduced to 21 items, the Internalization subscale was reduced to 12 items, and the Resistance Traits subscale was reduced to 24 items. This refined item listing was used in all subsequent validation steps of the current project. The next step consisted of a comparison of the refined STS scales to the psychological test criteria. In this step, only Samples 1 and 2 were used, because the criterion measures for the four dimensions were available only in those samples. A series of Pearson product-moment correlations was computed between these refined STS dimensions and the independently derived criteria from standardized psychological tests of the same dimensions. The correlations are reported in Table 3.2.
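A Pearson product-moment correlation of this kind can be computed directly from paired scores. A minimal sketch (the paired values below are hypothetical, not study data):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical STS ratings paired with test-criterion scores for five patients
sts = [12, 18, 9, 22, 15]
test = [58, 66, 52, 74, 61]
r = pearson_r(sts, test)
```
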
TABLE 3.2
Intercorrelations of STS and Psychological Test Constructs

STS Ratings           Subjective        Subjective      Externalization  Internalization  Resistance
                      Distress (MMPI)   Distress (BDI)  (MMPI)           (MMPI)           Traits (MMPI)
Subjective Distress   .63** (n = 89)    .65** (n = 90)  .42** (n = 89)   .64** (n = 89)   .47** (n = 81)
Externalization       -.07 (n = 89)     -.06 (n = 90)   .35** (n = 89)   -.18 (n = 81)    .21 (n = 81)
Internalization       .34** (n = 89)    .36** (n = 90)  -.11 (n = 89)    .42** (n = 89)   .20 (n = 81)
Resistance            .09 (n = 89)      .07 (n = 90)    .33* (n = 89)    .08 (n = 89)     .43** (n = 81)
*p < .01. **p < .001.

The results of these analyses were mixed. The STS clinician rating of subjective distress correlated (p < .001) at the highest levels with the external criteria (r = .63 and .65 with Pt and the BDI, respectively). The correlations of the other STS dimensions with the relevant external self-report criteria were all significant, but weaker. They varied from .35, for the correspondence between the STS and MMPI indices of Externalization, to .43 for correspondent measures of resistance traits. As a further check, composite ratios of external to internal coping styles were constructed for both the STS and the MMPI. These two summary measures of relative coping style correlated at a level of .46 (p < .001).

Discriminant Validity. Another aspect of construct validity is the determination of how the various subscales and measures relate to one another. Discriminant validity requires that the constructs be relatively independent and reveal a prescribed pattern of relations with one another. Specifically, it was expected that the two coping style dimensions (Internalization and Externalization) would be significantly and negatively correlated; Subjective Distress was expected to be moderately correlated with Internalization but not Externalization; and Resistance Traits were expected to be moderately correlated with Externalization but not with either Internalization or Subjective Distress.
The expected relations were obtained, as revealed in Table 3.3. Internalization and Externalization were negatively correlated at a moderate level (r = -.44); Subjective Distress was correlated with Internalization (r = .48) but not with Externalization (r = -.03); and Resistance Traits were highly correlated with Externalization (r = .70) but only modestly with the other dimensions (rs = .21 and -.26). Thus, the pattern of intercorrelations supported the discriminant validity of the three dimensions (collapsing Internalization and Externalization). A second test of discriminant validity cross-matched the STS dimensions with the external psychological tests. The same pattern of relations should be revealed as was found when the STS dimensions were intercorrelated with one another. A look back at Table 3.2 reveals that these patterns were in evidence, though they were not as striking as the patterns based on the internal correlations of the STS dimensions. Specifically, (a) STS Internalization and Externalization were correlated in a negative direction (r = -.18) as
expected; (b) STS subjective distress was quite highly correlated with MMPI Internalization (r = .64), but it was also correlated in a positive direction with MMPI Externalization (r = .42); and (c) STS resistance traits were correlated with MMPI externality but not with either the MMPI indicators of distress or internalization.

TABLE 3.3
Intercorrelation of STS Dimensions

Dimension             Internalization   Externalization   Resistance Traits
Subjective Distress   .48** (n = 93)    -.03 (n = 93)     .21* (n = 93)
Internalization                         -.44** (n = 204)  -.26 (n = 204)
Externalization                                           .70** (n = 141)
*p < .01. **p < .001.

Concurrent Validity. An additional test of the STS's construct validity was undertaken using all three samples. The samples reflected major depression (Sample 3), mixed psychiatric patients (Sample 1), and alcoholics (Sample 2); thus, certain differences in levels of the four dimensions were expected across samples: (a) the homogeneous depressed sample (Sample 3) would have higher Subjective Distress scores than the other two groups, with the alcoholic sample (Sample 2) having the lowest; (b) the alcoholic sample (Sample 2) should have the highest Externalization scores and the homogeneous depressed sample (Sample 3) should earn the highest Internalization scores; and (c) the alcoholic sample (Sample 2) should also be distinguished by relatively high levels of Resistance Traits, compared to the other groups. Means and standard deviations are reported in Table 3.4. A series of four one-way analyses of variance comparing the three samples on these dimensions demonstrated the expected relations, revealing a significant sample effect for all variables: (a) Subjective Distress, F(2, 203) = 20.16, p < .001; (b) Internalization, F(2, 203) = 12.38, p < .001; (c) Externalization, F(2, 203) = 26.36, p < .001; and (d) Resistance Traits, F(2, 203) = 13.33, p < .001.
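Each one-way analysis of variance above reduces to an F ratio of between-group to within-group variance. A minimal sketch (the group scores below are hypothetical, not the study data):

```python
def one_way_anova_f(groups):
    """F ratio for a one-way ANOVA over a list of score lists (one list per sample)."""
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    # Between-group sum of squares, weighted by group size
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares (deviations from each group's own mean)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Three hypothetical samples differing in mean level
f = one_way_anova_f([[8, 10, 12], [11, 13, 15], [15, 17, 19]])
```

The F value would then be referred to an F distribution with (k - 1, n - k) degrees of freedom for the significance test.
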
In all cases, a post hoc Tukey test revealed significant differences (ps < .05) favoring the expected group pattern.

TABLE 3.4
Means and Standard Deviations of Samples by STS Dimension

Variable              Sample 1       Sample 2       Sample 3
Subjective Distress   10.68 (4.60)   9.17 (6.37)    13.55 (4.34)
Internalization       6.85 (5.13)    5.01 (4.02)    8.46 (4.73)
Externalization       5.89 (5.11)    12.73 (8.60)   5.19 (5.30)
Resistance Traits     10.76 (7.03)   14.34 (9.25)   7.43 (6.07)
Note. Because of the nonavailability of all self-report measures, only Samples 1 and 2 were used in the computations for Subjective Distress.

The foregoing results are promising and suggest the value of the STS treatment planning procedure and software. However, the predictive validity of the various patient dimensions is still under investigation, and at the time of this writing, a validation of the predictive algorithms for developing treatment plans is being completed. By the time
of its release, a small normative database representing the three samples described in the foregoing, plus additional beta samples of specialized psychiatric patients, will be available. This normative base of approximately 300 individuals will allow the initiation of clinical applications. The next step planned in the construction and validation of the STS is the development of an interactive database that will systematically modify the algorithms with each new patient entry to improve and refine the predictive efficiency of the program for each clinical setting. That is, with continued use in a given system, local normative data will supplement the general database as the source of predictive algorithms, allowing increasing specificity in the predictions and treatment recommendations with each use. A feature will be included to allow the program to use local resources and norms to identify specific therapists whose success rates are highest for patients who represent different patterns of demography, functional impairment, distress, social support, coping style, and resistance potential.

Conclusions

Psychological tests have been used widely in the prediction of response to treatment. This chapter has summarized the status of research on some of the more promising of these dimensions and their associated measures. Seven dimensions appear promising for use in planning treatment: Functional Impairment, Subjective Distress, Readiness for (or Stage of) Change, Problem Complexity, Resistance Potential or Inclination, Social Support Level, and Coping Style. Some of these dimensions, such as social support and coping style, have individual components that appear promising as well. To summarize, the following conclusions appear to be justified:

1. Functional Impairment. Impairment level serves as an index of progress in treatment, as well as a predictor of outcome. High impairment may also serve as a contraindicator for the use of insight- and relationship-oriented psychotherapies. Severely impairing symptoms seem to indicate the value of pharmacological interventions or problem-oriented approaches, and mild to moderate levels of impairment may be conducive to or predictive of a positive response to a variety of psychotherapy models.

2. Subjective Distress. Subjective distress is directly related to improvement among nonsomatic depressed and anxious patients. Distress is more complexly related to improvement among those with somatic complaints. Distress level may be a particular marker for the differential application of self-directed and traditional therapies, and may indicate the need to use procedures that either lower or raise distress to enhance patient motivation. High distress serves as a positive marker for self-directed treatment among nonsomatic patients, whereas low distress may serve such a function among somatically disturbed patients.

3. Readiness for Change. Contemporary studies on patient stages of change are very promising. The higher the stage or readiness level of the patient, the more positive the outcomes of treatment. Some limited evidence suggests that those at the precontemplative (preconceptual) and contemplative (conceptual) stages of change may be particularly suited for interventions that raise consciousness and facilitate self-exploration.

4. Problem Complexity. Both symptom-focused psychopharmacological interventions and symptom-focused psychological interventions may be indicated most clearly among patients whose conditions are acute, or are relatively uncomplicated by concomitant personality disorder, interpersonal conflict, or dynamic conflicts associated with symbolic internal conflicts. At the same time, comparisons across treatment models suggest that treatment efficacy is enhanced when the breadth of the interventions used corresponds with the complexity of the problem presented.

5. Resistance Potential. Traitlike resistance is a reasonably good predictor of the differential effects of directive and nondirective therapies. Therapist guidance, use of status, and control are
contraindicated among resistance-prone individuals. On the other hand, these patients respond quite well to paradoxical and nondirective interventions, whereas patients low in resistance proneness respond well to directive interventions and therapist guidance. State reactions that suggest the presence of resistance also may be indicators for altering how material is presented within treatment sessions.

6. Social Support. Dimensions of objective social support, subjectively experienced support, and social investment have implications for treatment planning, even serving as indices and predictors of differential response to various psychosocial interventions. Some aspects of support even serve as contraindicators for long-term treatment. Patients with low levels of objective social support, or who feel unsupported by those around them, are candidates for long-term or intensive treatment; their level of improvement corresponds with the intensity of treatment. However, those with good support systems do not respond well to intensive or long-term treatments. Their improvement appears to reach an asymptote and may even decline with continuing treatment. Level of social investment may outweigh actual support availability, however, and may be a mediator that increases the value of relationship-oriented psychotherapy over behavioral and symptomatically oriented treatments.

7. Coping Style. Patient level of impulsivity, or coping style, has consistently been found to be a differential predictor of the value of cognitive-behavioral and relationship-, or insight-oriented, treatments. Comparatively, externalizing, impulsive patients respond better to behaviorally oriented therapies, whereas constricted and introspective patients tend to respond better to insight- and relationship-oriented therapies.

The types of tests used to assess these various dimensions often have implications for assessment of outcome itself.
Measures of Functional Impairment, Subjective Distress, and Problem Complexity may be especially valuable for assessing outcome or predicting prognosis in treatment generally. In contrast, both statelike measures of resistance and readiness for change, and trait measures of resistance potential, complexity, and coping style, appear promising for selecting treatments that vary in directiveness and insight foci, respectively. The measures used to assess all of these patient dimensions are often drawn from omnibus personality measures and, in the case of coping style measures, may reflect complex processes that encompass both unconscious and conscious experience.

Taken together, the research reported in this brief and selective review suggests that various combinations of dimensions allow discrimination among treatment variables and may point to directions in which the development and application of treatments may evolve in clinical practice. Accordingly, this chapter has reported the initial development of the STS, a clinician-based measure that promises to tap a multitude of relevant treatment planning dimensions, with the potential to enhance the efficiency of treatment. This procedure is currently being developed as a computer-based assessment tool to complement the use of standardized tests. The initial data presented here are promising, but suggest that the information derived from clinicians may not be as sensitive as that based on patient self-reports. A combination of established tests and structured clinician judgments may be optimal for deriving the dimensions that can be used in treatment planning.

References

Ackerman, D. L., Greenland, S., Bystritsky, A., Morgenstern, H., & Katz, R. J. (1994). Predictors of treatment response in obsessive-compulsive disorder: Multivariate analyses from a multicenter trial of clomipramine. Journal of Clinical Psychopharmacology, 14, 247-253.
American Psychiatric Association (1983).
Diagnostic and statistical manual of mental disorders (3rd ed.). Washington, DC: Author.
American Psychiatric Association (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Antonuccio, D. O., Danton, W. G., & DeNelsky, G. Y. (1995). Psychotherapy versus medication for depression: Challenging the conventional wisdom with data. Professional Psychology: Research and Practice, 26, 574-585.
Arkowitz, H. (1991, August). Psychotherapy integration: Bringing psychotherapy back to psychology. Paper presented at the annual meeting of the American Psychological Association, San Francisco, CA.
Barber, J. P. (1989). The Central Relationship Questionnaire (version 1.0). Unpublished manuscript, University of Pennsylvania, School of Medicine, Philadelphia.
Barber, J. P., & Muenz, L. R. (1996). The role of avoidance and obsessiveness in matching patients to cognitive and interpersonal psychotherapy: Empirical findings from the treatment for depression collaborative research program. Journal of Consulting and Clinical Psychology, 64, 951-958.
Barron, F. (1953). An ego strength scale which predicts response to psychotherapy. Journal of Consulting Psychology, 17, 327-333.
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-569.
Beitman, B. D. (1987). The structure of individual psychotherapy. New York: Guilford.
Bellack, A. S., & Hersen, M. (Eds.). (1990). Handbook of comparative treatments for adult disorders. New York: Wiley.
Beutler, L. E. (1983). Eclectic psychotherapy: A systematic approach. New York: Pergamon.
Beutler, L. E. (1991). Have all won and must all have prizes? Revisiting Luborsky et al.'s verdict. Journal of Consulting and Clinical Psychology, 59, 226-232.
Beutler, L. E., & Berren, M. (1995). Integrative assessment of adult personality. New York: Guilford.
Beutler, L. E., & Clarkin, J. (1990). Systematic treatment selection: Toward targeted therapeutic interventions. New York: Brunner/Mazel.
Beutler, L.
E., Consoli, A. J., & Williams, R. E. (1995). Integrative and eclectic therapies in practice. In B. Bongar & L. E. Beutler (Eds.), Comprehensive textbook of psychotherapy: Theory and practice (pp. 274-292). New York: Oxford University Press.
Beutler, L. E., Engle, D., Mohr, D., Daldrup, R. J., Bergan, J., Meredith, K., & Merry, W. (1991). Predictors of differential and self-directed psychotherapeutic procedures. Journal of Consulting and Clinical Psychology, 59, 333-340.
Beutler, L. E., Frank, M., Scheiber, S. C., Calvert, S., & Gaines, J. (1984). Comparative effects of group psychotherapies in a short-term inpatient setting: An experience with deterioration effects. Psychiatry, 47, 66-76.
Beutler, L. E., & Harwood, T. M. (1995). Prescriptive psychotherapies. Applied and Preventive Psychology, 4, 89-100.
Beutler, L. E., & Hodgson, A. B. (1993). Prescriptive psychotherapy. In G. Stricker & J. R. Gold (Eds.), Comprehensive handbook of psychotherapy integration (pp. 151-163). New York: Plenum.
Beutler, L. E., Kim, E. J., Davison, E., Karno, M., & Fisher, D. (1996). Research contributions to improving managed health care outcomes. Psychotherapy, 33, 197-206.
Beutler, L. E., Machado, P. P. P., Engle, D., & Mohr, D. (1993). Differential patient X treatment maintenance of treatment effects among cognitive, experiential, and self-directed psychotherapies. Journal of Psychotherapy Integration, 3, 15-32.
Beutler, L. E., Machado, P. P. P., & Neufeldt, S. (1994). Therapist variables. In A. E. Bergin & S. L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 229-269). New York: Wiley.
Beutler, L. E., & Mitchell, R. (1981). Psychotherapy outcome in depressed and impulsive patients as a function of analytic and experiential treatment procedures. Psychiatry, 44, 297-306.
Beutler, L. E., Mohr, D. C., Grawe, K., Engle, D., & MacDonald, R. (1991). Looking for differential effects: Cross-cultural predictors of differential psychotherapy efficacy.
Journal of Psychotherapy Integration, 1, 121-142.
Beutler, L. E., & Rosner, R. (1995). Introduction. In L. E. Beutler & M. Berren (Eds.), Integrative assessment of adult personality (pp. 1-24). New York: Guilford.
Beutler, L. E., Sandowicz, M., Fisher, D., & Albanese, A. L. (1996). Resistance in psychotherapy:
What can be concluded from empirical research? In Session: Psychotherapy in Practice, 2, 77-86.
Beutler, L. E., Wakefield, P., & Williams, R. E. (1994). Use of psychological tests/instruments for treatment planning. In M. E. Maruish (Ed.), Use of psychological testing for treatment planning and outcome assessment (pp. 55-74). Hillsdale, NJ: Lawrence Erlbaum Associates.
Billings, A. G., & Moos, R. H. (1984). Chronic and nonchronic unipolar depression: The differential role of environmental stressors and resources. Journal of Nervous and Mental Disease, 172, 65-75.
Blanchard, E. B., Schwarz, S. P., Neff, D. F., & Gerardi, M. A. (1988). Prediction of outcome from the self-regulatory treatment of irritable bowel syndrome. Behaviour Research and Therapy, 26, 187-190.
Brown, T. A., & Barlow, D. H. (1995). Long-term outcome in cognitive-behavioral treatment of panic disorder: Clinical predictors and alternative strategies for assessment. Journal of Consulting and Clinical Psychology, 63, 754-765.
Butcher, J. N. (1990). The MMPI-2 in psychological treatment. New York: Oxford University Press.
Butcher, J. N. (Ed.). (1995). Clinical personality assessment: Practical approaches. New York: Oxford University Press.
Calvert, S. J., Beutler, L. E., & Crago, M. (1988). Psychotherapy outcome as a function of therapist-patient matching on selected variables. Journal of Social and Clinical Psychology, 6, 104-117.
Caspar, F. (1995). Plan analysis: Toward optimizing psychotherapy. Seattle: Hogrefe & Huber.
Cattell, R. B., & Johnson, R. C. (Eds.). (1986). Functional psychological testing. New York: Brunner/Mazel.
Cooney, N. L., Kadden, R. M., Litt, M. D., & Getter, H. (1991). Matching alcoholics to coping skills or interactional therapies: Two-year follow-up results. Journal of Consulting and Clinical Psychology, 59, 598-601.
Costa, P. T., & McCrae, R. R. (1985). The NEO Personality Inventory manual. Odessa, FL: Psychological Assessment Resources.
Crits-Christoph, P., Cooper, A., & Luborsky, L. (1988). The accuracy of therapists' interpretations and the outcome of dynamic psychotherapy. Journal of Consulting and Clinical Psychology, 56, 490-495.
Crits-Christoph, P., & Demorest, A. (1988, June). The development of standard categories for the CCRT method. Paper presented at the Society for Psychotherapy Research, Santa Fe, NM.
Crits-Christoph, P., & Demorest, A. (1991). Quantitative assessment of relationship theme components. In M. J. Horowitz (Ed.), Person schemas and maladaptive interpersonal patterns (pp. 197-212). Chicago: University of Chicago Press.
Crits-Christoph, P., Demorest, A., & Connolly, M. B. (1990). Quantitative assessment of interpersonal themes over the course of psychotherapy. Psychotherapy, 27, 513-521.
Crits-Christoph, P., Luborsky, L., Dahl, L., Popp, C., Mellon, J., & Mark, D. (1988). Clinicians can agree in assessing relationship patterns in psychotherapy. Archives of General Psychiatry, 45, 1001-1004.
Derogatis, L. R. (1994). SCL-90: Administration, scoring and procedures manual (3rd ed.). Minneapolis, MN: National Computer Systems.
DeRubeis, R. J., Evans, M. D., Hollon, S. D., Garvey, M. J., Grove, W. M., & Tuason, V. B. (1990). How does cognitive therapy work? Cognitive change and symptom change in cognitive therapy and pharmacotherapy for depression. Journal of Consulting and Clinical Psychology, 58, 862-869.
DiNardo, P. A., O'Brien, G. T., Barlow, D. H., Waddell, M. T., & Blanchard, E. B. (1983). Reliability of DSM-III anxiety disorder categories using a new structured interview. Archives of General Psychiatry, 40, 1070-1075.
Dowd, E. T., Milne, C. R., & Wise, S. L. (1991). The Therapeutic Reactance Scale: A measure of psychological reactance. Journal of Counseling and Development, 69, 541-545.
Dowd, E. T., Wallbrown, F., Sanders, D., & Yesenosky, J. M. (1994). Psychological reactance and its relationship to normal personality variables. Cognitive Therapy and Research, 18, 601-612.
Edwin, D., Anderson, A. E., & Rosell, F. (1988). Outcome prediction of MMPI in subtypes of anorexia nervosa. Psychosomatics, 29, 273-282.


Elkin, I., Shea, T., Watkins, J. T., Imber, S. D., Sotsky, S. M., Collins, J. F., Glass, D. R., Pilkonis, P. A., Leber, W. R., Docherty, J. P., Feister, S. J., & Parloff, M. B. (1989). National Institute of Mental Health treatment of depression collaborative research program. Archives of General Psychiatry, 46, 971-982.
Ellicott, A., Hammen, C., Gitlin, M., Brown, G., & Jamison, K. (1990). Life events and the course of bipolar disorder. American Journal of Psychiatry, 147, 1194-1198.
Eysenck, H., & Eysenck, S.B.G. (1964). Manual of the Eysenck Personality Inventory. London: University of London Press.
Eysenck, H. J., & Eysenck, S.B.G. (1969). Personality structure and measurement. San Diego: Knapp.
Fahy, T. A., & Russell, G.F.M. (1993). Outcome and prognostic variables in bulimia nervosa. International Journal of Eating Disorders, 14, 135-145.
Fairburn, C. G., Peveler, R. C., Jones, R., Hope, R. A., & Doll, H. A. (1993). Predictors of 12-month outcome in bulimia nervosa and the influence of attitudes to shape and weight. Journal of Consulting and Clinical Psychology, 61, 696-698.
Fisher, D., Beutler, L. E., & Williams, O. B. (in press). STS Clinician Rating Form: Patient assessment and treatment planning. Journal of Clinical Psychology.
Follette, W. C., & Houts, A. C. (1996). Models of scientific progress and the role of theory in taxonomy development: A case study of the DSM. Journal of Consulting and Clinical Psychology, 64, 1120-1132.
Frances, A., Clarkin, J., & Perry, S. (1984). Differential therapeutics in psychiatry. New York: Brunner/Mazel.
Frank, J. D., & Frank, J. B. (1991). Persuasion and healing (3rd ed.). Baltimore: Johns Hopkins University Press.
Fremouw, W. J., & Zitter, R. E. (1978). A comparison of skills training and cognitive restructuring-relaxation for the treatment of speech anxiety. Behavior Therapy, 9, 248-259.
Gaw, K. F., & Beutler, L. E. (1995). Integrating treatment recommendations. In L. E. Beutler & M. Berren (Eds.), Integrative assessment of adult personality (pp. 280-319). New York: Guilford.
Goldfried, M. R. (1991). Research issues in psychotherapy integration. Journal of Psychotherapy Integration, 1, 5-25.
Goldstein, G., & Hersen, M. (Eds.). (1990). Handbook of psychological assessment (2nd ed.). New York: Pergamon.
Gough, H. G. (1987). California Psychological Inventory administrator's guide. Palo Alto, CA: Consulting Psychologists Press.
Graham, J. R. (1987). The MMPI: A practical guide (2nd ed.). New York: Oxford University Press.
Graham, J. R. (1993). The MMPI-2: Assessing personality and psychopathology. New York: Oxford University Press.
Groth-Marnat, G. (1997). Handbook of psychological assessment (3rd ed.). New York: Wiley.
Hamilton, M. (1967). Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6, 278-296.
Hayes, S. C., Nelson, R. O., & Jarrett, R. B. (1987). The treatment utility of assessment: A functional approach to evaluating assessment quality. American Psychologist, 42, 963-974.
Hoencamp, E., Haffmans, P. M. J., Duivenvoorden, H., Knegtering, H., & Dijken, W. A. (1994). Predictors of (non-) response in depressed outpatients treated with a three-phase sequential medication strategy. Journal of Affective Disorders, 31, 235-246.
Hollon, S. D., & Beck, A. T. (1986). Research on cognitive therapies. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 443-482). New York: Wiley.
Hooley, J. M., & Teasdale, J. D. (1989). Predictors of relapse in unipolar depressives: Expressed emotion, marital distress, and perceived criticism. Journal of Abnormal Psychology, 98, 229-235.
Horowitz, M., Marmar, C., Krupnick, J., Wilner, N., Kaltreider, N., & Wallerstein, R. (1984). Personality styles and brief psychotherapy. New York: Basic Books.
Horvath, A. O., & Goheen, M. D. (1990). Factors mediating the success of defiance- and compliance-based interventions. Journal of Counseling Psychology, 37, 363-371.
Hunsley, J. (1993). Treatment acceptability of symptom prescription techniques. Journal of Counseling Psychology, 40, 139-143.

Imber, S. D., Pilkonis, P. A., Sotsky, S. M., Elkin, I., Watkins, J. T., Collins, J. F., Shea, M. T., Leber, W. R., & Glass, D. R. (1990). Mode-specific effects among three treatments for depression. Journal of Consulting and Clinical Psychology, 58, 352-359.
Jacob, R. G., Turner, S. M., Szekely, B. C., & Eidelman, B. H. (1983). Predicting outcome of relaxation therapy in headaches: The role of "depression." Behavior Therapy, 14, 457-465.
Joyce, A. S., & Piper, W. E. (1996). Interpretive work in short-term individual psychotherapy: An analysis using hierarchical linear modeling. Journal of Consulting and Clinical Psychology, 64, 505-512.
Kadden, R. M., Cooney, N. L., Getter, H., & Litt, M. D. (1989). Matching alcoholics to coping skills or interactional therapies: Post-treatment results. Journal of Consulting and Clinical Psychology, 57, 698-704.
Karno, M. (1997). Identifying patient attributes and elements of psychotherapy that impact the effectiveness of alcoholism treatment. Unpublished doctoral dissertation, University of California, Santa Barbara.
Keijsers, G.P.J., Hoogduin, C.A.L., & Schaap, C.P.D.R. (1994). Predictors of treatment outcome in the behavioural treatment of obsessive-compulsive disorder. British Journal of Psychiatry, 165, 781-786.
Khavin, A. B. (1985). Individual-psychological factors in prediction of help to stutterers. Voprosy Psikhologii, 2, 133-135.
Klerman, G. L. (1986). Drugs and psychotherapy. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 777-818). New York: Wiley.
Klerman, G. L., DiMascio, A., Weissman, M. M., Prusoff, B., & Paykel, E. S. (1974). Treatment of depression by drugs and psychotherapy. American Journal of Psychiatry, 131, 186-191.
Knight-Law, A., Sugerman, A. A., & Pettinati, H. M. (1988). An application of an MMPI classification system for predicting outcome in a small clinical sample of alcoholics. American Journal of Drug and Alcohol Abuse, 14, 325-334.
LaCroix, J. M., Clarke, M. A., Bock, J. C., & Doxey, N. C. (1986). Predictors of biofeedback and relaxation success in multiple-pain patients: Negative findings. International Journal of Rehabilitation Research, 9, 376-378.
Lambert, M. J. (1994). Use of psychological tests for outcome assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 75-97). Hillsdale, NJ: Lawrence Erlbaum Associates.
Lambert, M. J., & Bergin, A. E. (1983). Therapist characteristics and their contribution to psychotherapy outcome. In C. E. Walker (Ed.), The handbook of clinical psychology (Vol. 1, pp. 205-241). Homewood, IL: Dow Jones-Irwin.
Lazarus, A. A. (1981). The practice of multimodal therapy. New York: McGraw-Hill.
Longabaugh, R., Beattie, M., Noel, N., Stout, R., & Malloy, P. (1993). The effect of social investment on treatment outcome. Journal of Studies on Alcohol, 54, 465-478.
Longabaugh, R., Rubin, A., Malloy, P., Beattie, M., Clifford, P. R., & Noel, N. (1994). Drinking outcomes of alcohol abusers diagnosed as antisocial personality disorder. Alcoholism: Clinical and Experimental Research, 18, 778-785.
Luborsky, L. (1996). The symptom-context method. Washington, DC: American Psychological Association.
Luborsky, L., Crits-Christoph, P., & Mellon, J. (1986). The advent of objective measures of the transference concept. Journal of Consulting and Clinical Psychology, 54, 39-47.
Masterson, J. F., Tolpin, M., & Sifneos, P. E. (1991). Comparing psychoanalytic psychotherapies. New York: Brunner/Mazel.
McLellan, A. T., Woody, G. E., Luborsky, L., O'Brien, C. P., & Druley, K. A. (1983). Increased effectiveness of substance abuse treatment: A prospective study of patient-treatment "matching." Journal of Nervous and Mental Disease, 171, 597-605.
Merz, J. (1983). Fragebogen zur Messung der psychologischen Reaktanz [Questionnaire for the measurement of psychological reactance]. Diagnostica, 29, 75-82.
Miller, W. R., Benefield, G., & Tonigan, J. S. (1993). Enhancing motivation for change in problem drinking: A controlled comparison of two therapist styles. Journal of Consulting and Clinical Psychology, 61, 455-461.
Millon, T. (1994). Millon Clinical Multiaxial Inventory-III (MCMI-III) manual. Minneapolis, MN: National Computer Systems.


Mohr, D. C., Beutler, L. E., Engle, D., Shoham-Salomon, V., Bergan, J., Kaszniak, A. W., & Yost, E. (1990). Identification of patients at risk for non-response and negative outcome in psychotherapy. Journal of Consulting and Clinical Psychology, 58, 622-628.
Moos, R. H. (1990). Depressed outpatients' life contexts, amount of treatment and treatment outcome. Journal of Nervous and Mental Disease, 178, 105-112.
Moos, R. H., & Moos, B. S. (1986). Family Environment Scale manual (2nd ed.). Palo Alto, CA: Consulting Psychologists Press.
Nezworski, M. T., & Wood, J. M. (1995). Narcissism in the comprehensive system for the Rorschach. Clinical Psychology: Science and Practice, 2, 179-199.
Nietzel, M. T., Russell, R. L., Hemmings, K. A., & Gretter, M. L. (1987). Clinical significance of psychotherapy for unipolar depression: A meta-analytic approach to social comparison. Journal of Consulting and Clinical Psychology, 55, 156-161.
Norcross, J. C., & Goldfried, M. R. (Eds.). (1992). Handbook of psychotherapy integration. New York: Basic Books.
O'Connor, E. A., Carbonari, J. P., & DiClemente, C. C. (1996). Gender and smoking cessation: A factor structure comparison of processes of change. Journal of Consulting and Clinical Psychology, 64, 130-138.
Parker, G., Holmes, S., & Manicavasagar, V. (1986). Depression in general practice attenders: "Caseness," natural history and predictors of outcomes. Journal of Affective Disorders, 10, 27-35.
Prochaska, J. O. (1984). Systems of psychotherapy: A transtheoretical analysis (2nd ed.). Homewood, IL: Dorsey.
Prochaska, J. O., & DiClemente, C. C. (1986). The transtheoretical approach. In J. C. Norcross (Ed.), Handbook of eclectic psychotherapy (pp. 163-200). New York: Brunner/Mazel.
Prochaska, J. O., DiClemente, C. C., & Norcross, J. C. (1992). In search of how people change: Applications to addictive behaviors. American Psychologist, 47, 1102-1114.
Prochaska, J. O., Rossi, J. S., & Wilcox, N. S. (1991). Change processes and psychotherapy outcome in integrative case research. Journal of Psychotherapy Integration, 1, 103-120.
Prochaska, J. O., Velicer, W. F., DiClemente, C. C., & Fava, J. (1988). Measuring process of change: Applications to the cessation of smoking. Journal of Consulting and Clinical Psychology, 56, 520-528.
Project MATCH Research Group. (1997). Matching alcoholism treatments to client heterogeneity: Project MATCH posttreatment drinking outcomes. Journal of Studies on Alcohol, 58, 7-29.
Robinson, L. A., Berman, J. S., & Neimeyer, R. A. (1990). Psychotherapy for the treatment of depression: A comprehensive review of controlled outcome research. Psychological Bulletin, 108, 30-49.
Rohrbaugh, M., Shoham, V., Spungen, C., & Steinglass, P. (1995). Family systems therapy in practice: A systemic couples therapy for problem drinking. In B. Bongar & L. E. Beutler (Eds.), Comprehensive textbook of psychotherapy: Theory and practice (pp. 228-253). New York: Oxford University Press.
Shapiro, D. A., Barkham, M., Rees, A., Hardy, G. E., Reynolds, S., & Startup, M. (1994). Effects of treatment duration and severity of depression on the effectiveness of cognitive-behavioral and psychodynamic-interpersonal psychotherapy. Journal of Consulting and Clinical Psychology, 62, 522-534.
Shapiro, D. A., Rees, A., Barkham, M., Hardy, G., Reynolds, S., & Startup, M. (1995). Effects of treatment duration and severity of depression on the maintenance of gains after cognitive-behavioral and psychodynamic-interpersonal therapy. Journal of Consulting and Clinical Psychology, 63, 378-387.
Sheppard, D., Smith, G. T., & Rosenbaum, G. (1988). Use of MMPI subtypes in predicting completion of a residential alcoholism treatment program. Journal of Consulting and Clinical Psychology, 56, 590-596.
Sherbourne, C. D., Hays, R. D., & Wells, K. B. (1995). Personal and psychosocial risk factors for physical and mental health outcomes and course of depression among depressed patients. Journal of Consulting and Clinical Psychology, 63, 345-355.
Shoham-Salomon, V., Avner, R., & Neeman, K. (1989). "You are changed if you do and changed if you don't": Mechanisms underlying paradoxical interventions. Journal of Consulting and Clinical Psychology, 57, 590-598.
Shoham-Salomon, V., & Hannah, M. T. (1991). Client-treatment interactions in the study of differential change processes. Journal of Consulting and Clinical Psychology, 59, 217-225.
Shoham-Salomon, V., & Rosenthal, R. (1987). Paradoxical interventions: A meta-analysis. Journal of Consulting and Clinical Psychology, 55, 22-28.
Simons, A. D., Garfield, S. L., & Murphy, G. E. (1984). The process of change in cognitive therapy and pharmacotherapy for depression. Archives of General Psychiatry, 41, 45.
Simons, A. D., & Thase, M. E. (1992). Biological markers, treatment outcome, and 1-year follow-up in endogenous depression: Electroencephalographic sleep studies and response to cognitive therapy. Journal of Consulting and Clinical Psychology, 60, 392-401.
Spielberger, C. D., Gorsuch, R. L., & Lushene, R. E. (1970). The State-Trait Anxiety Inventory (STAI) test manual for Form X. Palo Alto, CA: Consulting Psychologists Press.
Stricker, G., & Gold, J. R. (Eds.). (1993). Comprehensive handbook of psychotherapy integration. New York: Plenum.
Strupp, H. H., & Binder, J. L. (1984). Psychotherapy in a new key. New York: Basic Books.
Strupp, H. H., Horowitz, L. M., & Lambert, M. J. (1997). Measuring patient changes in mood, anxiety, and personality disorders: Toward a core battery. Washington, DC: American Psychological Association.
Swoboda, J. S., Dowd, E. T., & Wise, S. L. (1990). Reframing and restraining directives in the treatment of clinical depression. Journal of Counseling Psychology, 37, 254-260.
Tasca, G. A., Russell, V., & Busby, K. (1994). Characteristics of patients who choose between two types of group psychotherapy. International Journal of Group Psychotherapy, 44, 499-508.
Tracey, T. J., Ellickson, J. L., & Sherry, P. (1989). Reactance in relation to different supervisory environments and counselor development. Journal of Counseling Psychology, 36, 336-344.
Trief, P. M., & Yuan, H. A. (1983). The use of the MMPI in a chronic back pain rehabilitation program. Journal of Clinical Psychology, 39, 46-53.
Vaillant, L. M. (1997). Changing character. New York: Basic Books.
Vallejo, J., Gasto, C., Catalan, R., Bulbena, A., & Menchon, J. M. (1991). Predictors of antidepressant treatment outcome in melancholia: Psychosocial, clinical, and biological indicators. Journal of Affective Disorders, 21, 151-162.
Wakefield, P. J., Williams, R. E., Yost, E. B., & Patterson, K. M. (1996). Couple therapy for alcoholism: A cognitive-behavioral treatment manual. New York: Guilford.
Wells, K. B., & Sturm, R. (1996). Informing the policy process: From efficacy to effectiveness data on pharmacotherapy. Journal of Consulting and Clinical Psychology, 64, 638-645.
Welsh, G. S. (1952). An anxiety index and an internalization ratio for the MMPI. Journal of Consulting Psychology, 16, 65-72.
Wilson, G. T. (1996). Treatment of bulimia nervosa: When CBT fails. Behaviour Research and Therapy, 34, 197-212.
Woody, G. E., McLellan, A. T., Luborsky, L., O'Brien, C. P., Blaine, J., Fox, S., Herman, I., & Beck, A. T. (1984). Severity of psychiatric symptoms as a predictor of benefits from psychotherapy: The Veterans Administration-Penn Study. American Journal of Psychiatry, 141, 1172-1177.


For Abby, Katie, and Shelby


Chapter 4

Use of Psychological Tests for Assessing Treatment Outcome

Michael J. Lambert
Brigham Young University

John M. Lambert
University of Utah

Outcome assessment is a branch of applied psychology that illuminates the strength of the effects of psychological interventions on patient functioning. In psychotherapy research, this assessment is done in the context of specific research designs aimed at answering specific questions of theoretical importance. The broad questions that are addressed in outcome assessment require an equally broad range of research designs to draw conclusions. Although assessment of outcome occurs only within the context of a particular research strategy (single case design, program evaluation, comparative outcome study, etc.), a limited set of procedures and principles guides the selection of outcome measures. This chapter offers a brief history of outcome assessment, followed by guidelines for selecting outcome measures and recommendations for assessing outcome in psychotherapy. A great deal of psychotherapy research has been undertaken over the past 50 years; as a result, much is known about outcome measurement. This knowledge can aid the interested practitioner in selecting and using outcome measures in this age of accountability.

An Overview of Outcome Assessment

The problems associated with assessing the changing psychological status of patients are, as Luborsky (1971) suggested, a "hardy perennial" in the field of psychotherapy. Historically, psychotherapists have devoted themselves to defining and perfecting treatments, not to systematically assessing the consequences of these treatments. Likewise, social or personality psychologists historically have developed assessment devices in contexts devoid of interest in personality change or symptomatic improvement. Personality psychologists have been more interested in static traits and stability than in change per se.
Occasional exceptions to this trend can be found (e.g., Worchel & Byrne, 1964), but for the most part little real effort has been expended on developing measures for the
purpose of measuring change. Outcome assessment is the neglected domain between these two fields of study. Although measurement and quantification are central properties of empirical science, the earliest attempts at quantifying treatment gains lacked scientific rigor. The field gradually has moved from complete reliance on therapist ratings of gross and general improvement to the use of outcome indices of specific symptoms that are quantified from a variety of viewpoints, including the patient, outside observers, relatives, physiological indices, and environmental data such as employment records. The data generated from these viewpoints are always subject to the limitations inherent in the methodology; none is "objective" or most authoritative. Reliance on multiple viewpoints is an improvement over previous measurement methods, which were difficult to replicate because of their lack of clear operational definitions and lack of systematic means of data collection. Psychotherapy outcome assessment, with some recent notable exceptions such as the Consumer Reports satisfaction survey (Seligman, 1995), has moved from being based on simple posttherapy ratings to relying on complex and multifaceted assessments of change. In the past, attempts at measuring change have reflected the fashionable theoretical positions of the day. Early studies relied on devices developed out of Freudian dynamic psychology. These devices (e.g., Rorschach and Thematic Apperception Test [TAT]) largely have been discarded as measures of outcome because of their poor psychometric qualities, reliance on inference, and the fact that they mainly reflected the interest of orientations that emphasized unconscious processes. Even if scoring systems such as Exner's for the Rorschach have overcome some of the psychometric problems associated with projective testing, these devices are not used in outcome studies because of practical constraints (they are time-intensive).
The use of these measures was followed by the use of devices consistent with client-centered theory (e.g., the Q-sort technique), behaviorism (behavioral monitoring), and more recently, cognitive theories with their emphasis on automatic thoughts. Although outcome assessment always will be guided by "in-vogue" theoretical positions, the field as a whole has moved a long way from its early theoretical foundations. Nonetheless, there are important lessons to be learned from past attempts at measuring change. It would be unfortunate if nothing from the past was used to guide current attempts to measure patient gains. Contemporary research reflects some significant lessons from early scientific efforts. For example, the Clinical Research Branch of the National Institute of Mental Health (NIMH) sponsored an Outcome Measures Project in 1975 in order to better evaluate the effectiveness of different psychotherapies through the potential creation of a core battery of instruments to ease the comparison and integration of research findings (Waskow & Parloff, 1975). Since that time there have been many developments in the field, yet a core battery has not been attained. Recently, the core battery idea has been brought up again and work has been intensified in order that a core battery of tests and measures might be narrowed and operationalized. In 1994 the American Psychological Association (APA) supported a conference at Vanderbilt University to discuss some questions like the following: Would a core battery be most useful as a small number of instruments to be used for all studies of outcome, or perhaps in the form of algorithms or flow diagrams that consider any aspect of a person's functioning essential and specific to a certain diagnostic category? What needs to be measured? What criteria should be used to evaluate outcome measures? What instruments should be used? 
These questions were discussed and proposals were made by different groups of experts working on separate diagnostic categories of anxiety
disorders, mood disorders, and personality disorders (Horowitz, Strupp, Lambert, & Elkin, 1997; see Strupp, Horowitz, & Lambert, 1997). The history of assessing outcome suggests several guidelines for the use of tests in future research and practice. The most important of these guidelines show up in the current tendencies to clearly specify what is being measured, so that replication is possible; measure change from multiple perspectives; employ different types of rating scales and methods; employ symptom-based atheoretical measures; and examine, to some extent, patterns of change over time. These practices are an improvement over the past, and they are highlighted further in the following sections.

The Current State of Outcome Assessment: Diversity if Not Chaos

Common Measures of Outcome

All measures of outcome have weaknesses, yet using measures that have a history of frequent use will provide advantages that are not available with new or infrequently used measures. Primary among these advantages is easy comparison across studies in order to judge the degree to which patients begin therapy at equivalent levels of pathology and the degree to which patients show comparable changes following treatment (across studies). Several surveys have summarized measures that occur frequently in studies of psychotherapy. Lambert (1983) reported that the following self-report scales were the most commonly used outcome measures in the Journal of Consulting and Clinical Psychology from 1976 to 1980: State-Trait Anxiety Inventory (STAI), Minnesota Multiphasic Personality Inventory (MMPI), Rotter Internal-External Locus of Control, S-R Inventory of Anxiousness, and the Beck Depression Inventory (BDI). In their review of 21 separate American journals published between 1983 and 1988, J. E. Froyd, Lambert, and J. D. Froyd (1996) summarized usage data from 334 outcome studies.
The most frequently used self-report scales were the BDI, STAI, Symptom Checklist-90 (SCL-90), Locke-Wallace Marital Adjustment Inventory, and the MMPI. In another review of articles in the Journal of Consulting and Clinical Psychology (1986-1991), Lambert and McRoberts (1993) reviewed 116 studies of psychotherapy with adults. The frequency of outcome measures, categorized by source, is presented in Table 4.1. As can be seen, the measures employed continue to be similar within the category of self-report methodology. Clearly, the BDI, STAI, and SCL-90 remain the most popular measures used across a broad sampling of disorders. As one moves within a specific disorder, the listing of most frequently used scales can be expected to change, although it appears that the scales just mentioned, by virtue of their focus on anxiety or depression, remain relevant and popular. Beyond self-report methodology, there is less consensus within categories of usage. The Hamilton Rating Scale for Depression (Hamilton, 1967) was used frequently in the studies reviewed by Lambert (1983) and J. E. Froyd et al. (1996). In the more recent survey, it remains relatively popular, either in the hands of the therapist or through the use of expert raters. The Locke-Wallace Marital Adjustment Inventory (Locke & Wallace, 1959) is the most frequently used specific scale employed with significant others,


TABLE 4.1
Commonly Used Inventories and Methods of Assessment

Self-report (N = 384)                                   No.    %
  Beck Depression Inventory                              40   10.4
  Experimenter-created scales or questionnaires          37    9.6
  State-Trait Anxiety Inventory                          27    7.0
  SCL-90-R                                               14    3.6
  Diary of behavior and/or thoughts                      12    3.1
  Minnesota Multiphasic Personality Inventory             6    1.6
  Dysfunctional Attitude Scale                            6    1.6
  Hassles Scale                                           5    1.3
  Schedule for Affective Disorders and Schizophrenia      5    1.3

Instrumental (N = 50)
  Heart rate                                              9   18
  Blood pressure                                          7   14
  Weight                                                  5   10
  Respiration rate                                        5   10
  Saliva composition                                      3    6
  CO level                                                2    4

Significant others (N = 15)
  Problem checklist by informant                          6   40
  Information on specific behavior                        5   33
  Single use of measure of family functioning
    (e.g., Family Life Symptom Checklist, Family
    Environment Scale, Family Adjustment Questionnaire)   3   20

Therapist (N = 66)
  Interview-global or level of functioning ratings       35   53
  Hamilton Rating Scale for Depression*                  14   21

Trained observer (N = 67)
  Rating of behavior or subject characteristics          27   40.3
  Frequency of specific behavior                         13   19
  Interview of subject                                   12   17.9

Note. From Lambert and McRoberts (1993).
* This scale also was counted as a trained observer measure when it was administered by someone other than the therapist.


but it is typically employed in marital therapies where both partners are the recipients of treatment. Despite the fact that some measures are used repeatedly, their frequency is still not high. For example, of the 384 uses of self-report scales reported in Table 4.1, the MMPI made up only 1.6% of the total. It is only a commonly used measure in the relative sense of the word. It is startling to discover the seemingly endless number of measures used to objectify outcome. J. E. Froyd et al. (1996) consulted journals that were representative of a broad range of therapy as practiced and reported in contemporary professional literature. A total of 1,430 outcome measures were identified and represented a wide variety of patient diagnoses, treatment modalities, and therapy types. Of this rather large number, 840 different measures were used just once. A second review, which examined the more homogeneous set of studies of agoraphobia outcome published during the 1980s (Ogles, Lambert, Weight, & Payne, 1990), located 106 studies using 98 unique outcome measures. This occurred in a well-defined, limited disorder treated with an equally narrow range of interventions, mainly behavioral and cognitive-behavioral therapies. Similar conclusions have been drawn by E. A. Wells, Hawkins, and Catalano (1988), who reported more than 25 ways to measure drug usage in addiction outcome research. The proliferation of outcome measures (a sizable portion of which were unstandardized scales) is overwhelming, and the count could be expanded further if consideration is given to the fact that some measures (e.g., the Hamilton Rating Scale for Depression [HRSD]) are actually not single scales, but scales with multiple variants. C. T. Grundy, Lunnen, Lambert, Ashton, and Tovey (1994), for example, found more than a dozen versions of the HRSD.
Those who assess change have not agreed on a standard battery of tests and procedures even within homogeneous patient populations, but progress is being made to these ends as mentioned previously (Strupp et al., 1997). The seeming disarray of instruments is partly a function of the complex and multifaceted nature of psychotherapy outcome as reflected in the divergence in clients and their problems, treatments, and underlying assumptions and techniques, and the multidimensionality of the change process itself. But it also represents the struggle (failure) of scientists and practitioners to agree on valued outcomes. Indeed, measuring the outcomes of psychotherapy promises to be a hardy perennial for years to come.

Change Is Complex

Although most current outcome research studies focus on seemingly homogeneous samples of patients (e.g., unipolar depression, agoraphobia), it is clear that each patient is unique and brings unique problems to treatment. For example, although the major complaint of a person is summed up as "depression" and this person meets diagnostic criteria for major depression, this same patient can have serious interpersonal problems, somatic concerns, evidence of anxiety, financial difficulties, problems at work, problems parenting children, substance abuse, and so on. These diverse problems may be addressed in therapy, and proper assessment of outcome may require that changes in all these problems be measured. Obviously, this is a demanding task that cannot be accomplished fully in any particular study or by the practitioner. The complexity of human behavior and the complexity of theories and conceptions of human behavior invite incredible complexity in operationalizing the changes that occur as a result of psychotherapy. For example, Williams (1985) documented considerable evidence that, even within the seemingly limited diagnosis of agoraphobia, there is considerable diversity among
patients. He noted that there is considerable diversity in the situations that provoke panic across patients, including numerous phobias that often appear as simple phobias (e.g., fear of flying, heights). At the same time, the most frequent panic-provoking situation (driving on freeways) was rated as "no problem" by nearly 30% of agoraphobics. The typical agoraphobic is severely handicapped in some situations, moderately handicapped in others, and not at all restricted in still others. Furthermore, agoraphobics have, as a primary or secondary fear, many fears that are common to social phobia (e.g., fear of causing a public disturbance, of being stared at). They also have many somatic complaints for which they often and persistently seek medical consultation in search of a physical diagnosis, even after agoraphobia is diagnosed. These fears overlap with both hypochondriacal and hysterical disorders. "The configuration of fears in agoraphobics is so highly idiosyncratic that it is substantially true that no two agoraphobics have exactly the same pattern of phobias, and that two people with virtually no overlapping areas of phobia disability can both be called agoraphobic" (Williams, 1985, p. 112). It is clear that however specific a diagnosis may seem, the term does not denote only a precise set of symptoms that occur independently of other symptoms. Thus, a single measure cannot hope to capture the complications of psychological functioning or adequately evaluate therapeutic change, because no single measure of disability can routinely capture the complexity of the individual patient. Given the great complexity in the persons to be treated, it can be suggested that researchers begin studying outcome by identifying major targets of treatment, while accepting that the resulting picture of change will be far from complete. Although this fact cannot be changed, its implications can be recognized.
For example, those who produce as well as consume psychotherapy research (e.g., the insurance industry, government policymakers, etc.) need to show due modesty in the conclusions they draw from research. Given repeated failures to capture the complexity of patient functioning and change, it is little wonder that many practitioners are not avid consumers of psychotherapy research. Nevertheless, being critical of psychotherapy research does not offer a solution to the problem of complexity. One can deal with some of the complexities of outcome assessment by employing a conceptual scheme that helps organize issues and procedures.

Conceptualizing Measures and Methods

Various nonbinding conceptual schemes have been put forth in order to better organize the multidimensional chaos (see Lambert, Ogles, & Masters, 1992; see also Schulte, 1995). For example, McLellan and Durell (1996) suggested four areas for assessment of outcome: reduction of symptoms; improvement in health and in personal and social functioning; cost of care; and reduction in public health and safety threats. Docherty and Streeter (1996), on the other hand, suggested seven dimensions that must be considered in outcome assessment: symptomatology, social/interpersonal functioning, work functioning, satisfaction, treatment utilization, health status/global well-being, and health-related quality of life. These two examples take a rather broad view of the topic. The following is a conceptual scheme that concerns itself more narrowly with issues surrounding the collection of outcome data in research that focuses on the individual patient rather than the health system or the administrative area.

The conceptual scheme offered purposely ignores psychotherapy theories and paradigms. Yet, theoretical concerns play a major role in determining what clinicians value as outcome appraisers (Cohen, 1980). Theory may play a prominent role in choice of outcome assessment measures especially to therapists who subscribe to a particular position. However, the use of instruments that are relevant within a theoretical system will not certify effectiveness or prove convincing to the parties interested in the effectiveness of differing forms of psychotherapy. Third-party reimbursers, clinicians, academicians, and the public at large will be influenced significantly if changes in clients can be shown through research to be practically important. Instruments associated with a variety of sources, numerous content areas and social levels, and varying time orientations will be most credible in demonstrating the value of psychotherapy. Table 4.2 presents an evolving and useful conceptual scheme that organizes several important dimensions in measuring outcome (Ogles, Lambert, & Masters, 1996). These dimensions are social level of outcome assessment, the methods (technology) used in outcome assessment, sources that generate outcome data, content of outcome measure, and time orientation. The first three areas are discussed fully. The last two dimensions can be briefly summed up. Time orientation is a dimension that describes the degree to which an instrument measures statelike, unstable constructs versus traitlike, stable constructs. The main question of research for this dimension is whether particular patient attributes remain consistent over time and whether change following treatment is characterized by different patterns (e.g., Howard, Leuger, Maling, & Martinovich, 1993, suggested that changes in patient morale occur early while changes in functioning take longer). 
The content dimension considers behavioral, cognitive, and affective faculties, including physiology as a subset of behavior, to answer the question: What psychological area is being measured?

TABLE 4.2
Scheme for Organizing and Selecting Outcome Measures

Social Level     Source            Technology     Content    Time Orientation
Intrapersonal    Self              Global         Cognition  Trait
Interpersonal    Therapist         Specific       Affect     State
Social role      Trained observer  Observation    Behavior
                 Relevant other    Status
                 Institutional

Note. Each area can be divided into subcategories (e.g., kinds of intrapersonal events, kinds of interpersonal measures, etc.), represented in the original table by numbered entries beneath each column heading.
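To make the scheme concrete, any given instrument can be classified on all five dimensions of Table 4.2 at once. The sketch below is ours, not the chapter's: the class, field names, and the example classification of the Beck Depression Inventory are illustrative assumptions, not a taxonomy endorsed by the authors.

```python
from dataclasses import dataclass

@dataclass
class OutcomeMeasure:
    """One outcome measure classified on the five dimensions of Table 4.2."""
    name: str
    social_level: str      # intrapersonal / interpersonal / social role
    source: str            # self / therapist / trained observer / relevant other / institutional
    technology: str        # global / specific / observation / status
    content: str           # cognition / affect / behavior
    time_orientation: str  # trait / state

# Illustrative classification (our judgment, for demonstration only):
bdi = OutcomeMeasure(
    name="Beck Depression Inventory",
    social_level="intrapersonal",
    source="self",
    technology="specific",
    content="affect",
    time_orientation="state",
)
print(bdi.source, bdi.technology)  # -> self specific
```

A battery assembled for a study could then be audited for coverage, e.g., flagging that every measure draws on the same source (self-report), one of the chapter's central concerns.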

Social Level of Outcomes Assessment

The first dimension listed in Table 4.2 is entitled "Social Level." Social level areas covered by outcome measures can be divided into intrapersonal, interpersonal, and social role performance. Thus, social level can be seen as a dimension that reflects the need to assess changes that occur within the client, in the client's intimate relationships, and, more broadly, in the client's participation in community and social roles. This dimension can be considered a continuum that represents the degree to which an instrument measures subjective discomfort, intrapsychic attributes, and bodily experiences, versus characteristics of the client's participation in the interpersonal world. It is a matter of intellectual curiosity and values, if not empirical importance, to know about the social level of the changes that are targeted and modified in treatment efforts. Empirically, the results of outcome studies are more impressive when social level is measured broadly, considering more than a single area, because interventions can have side effects as well as more and less extensive effects. Certainly, these social level areas reflect the values and interests of clients, mental health providers, third-party payers, government agencies, and society at large. In the J. E. Froyd et al. (1996) review, 74% of outcome measures focused on the intrapersonal social level, whereas 17% and 9% focused on interpersonal and social role performance, respectively. To date, changes in these last two areas have been underrepresented in outcome research. Several issues other than social level are represented in the conceptual scheme. Another dimension of central importance in outcome assessment is the source from which data are generated.

Change Should be Measured from Multiple Perspectives

In the ideal study of change, all the parties involved who have information about change might be represented.
This would include the client, therapist, relevant (significant) others, trained judges (or observers), and societal agencies that store information, such as employment and educational records. Unlike the physical sciences, measurement in psychotherapy is highly affected by the politics and biases of those providing the data. It is seldom possible to merely observe the phenomena of interest without seeing them through some filtering lens. The source dimension is a sort of hierarchy, beginning with those most involved with therapy (client and therapist) and moving to the more remote (others and institutions). Yet, it is a fluid dimension, and the ordering of the specific sources could change in a particular setting or circumstance. In the aforementioned review of 116 outcome studies found in the Journal of Consulting and Clinical Psychology (JCCP) between 1986 and 1991, Lambert and McRoberts (1993) examined research practices related to assessment source. Specific outcome measures were classified into five source categories: self-report, trained observer, significant other, therapist, or instrumental (a category including societal records or instruments, such as physiological recording devices). Frequency data were then computed on the usage of specific instruments and instrument sources across studies. As might be expected, the most popular source for outcome data was the client. In fact, 25% of the studies used client self-report data as the sole source for evaluation. Of the studies that relied solely on self-report, three fourths used more than a single self-report scale. The next most frequent procedure employed two data sources simultaneously (self-report and observer ratings). This combination occurred in 20% of the studies,
followed by self-report and therapist ratings at 15% and self-report and instrumental sources at 8%. A self-report scale was used alone or in combination in over 90% of the studies. Significant other ratings rarely were employed. They were utilized alone or in combination with some other data sources in about 9% of the studies reported. The therapist rated outcome alone or in combination with other measures in about 25% of the studies. Impressively, 30% of the studies used six or more instruments to reflect changes in patients. The most ambitious effort had a combination of 12 distinct measures to assess changes following psychotherapy. Clearly, one of the most important conclusions to be drawn from past psychotherapy outcome research is that the results of studies can be misunderstood easily and even misrepresented through failure to appreciate the effects that different perspectives can have in reflecting the degree of change that results from therapy. The necessary and, to some degree, common practice of applying multiple criterion measures in research studies (Lambert, 1983) has made it obvious that multiple measures from different sources do not yield unitary results. For example, in studies using multiple criterion measures, a specific treatment used to reduce seemingly simple fears may result in a decrease in behavioral avoidance of the feared object (provided by observers), but may not affect the self-reported level of discomfort associated with the feared object (Mylar & Clement, 1972; Ross & Proctor, 1973; Wilson & Thomas, 1973). Likewise, a physiological indicator of fear may show no change in response to a feared object as a result of treatment, whereas improvement in subjective self-report will be marked (Ogles et al., 1990). Relying on different sources of assessment can have an impact on conclusions (e.g., Glaister, 1982). 
In a review of the effects of relaxation training, Glaister found that relaxation, in contrast to other procedures (mainly exposure), had its principal impact on physiological indices of change. Indeed, it was superior to other treatments in 11 of 12 comparisons, whereas the other (exposure) conditions were superior in 28 of 38 comparisons using verbal reports of improvement by the patients. On behavioral measures (including assessor ratings), neither exposure nor relaxation appeared superior. Farrell, Curran, Zwick, and Monti (1983), although showing that raters can discriminate social skill deficits from anxiety level on the Simulated Social Skills Test, also found that there was poor correspondence between self-ratings and behavior ratings of these variables. This lack of convergence between measurement methods also was apparent when physiological measures were added (Monti, Wallender, Ahern, Abrams, & Munroe, 1983). Little convergent validity was found across measurement methods. It appears that different measures of the same target problem often disagree (e.g., self-report of sexual arousal and physiological measures; Sabalis, 1983). This conclusion is supported further by factor analytic studies that have combined a variety of outcome measures. The main factors derived from some "older" studies that employed factor analytic data tend to be associated closely with the measurement method or the source of observation used in collecting data, rather than being identified by some theoretical or conceptual variable that would be expected to cut across techniques of measurement (Cartwright, Kirtner, & Fiske, 1963; Forsyth & Fairweather, 1961; Gibson, Snyder, & Ray, 1955; Shore, Massimo, & Ricks, 1965). A more recent example was reported by Pilkonis, Imber, Lewis, and Rubinsky (1984), who factor analyzed 15 scales representing a variety of traits and symptoms from the client, therapist, expert judges, and significant others.
These scales were reduced to three factors that most clearly represented the source of data, rather than the content of the scale. Beutler and Hamblin (1986) reported similar results.

However, few studies have recognized or adequately dealt with the complexities that result from divergence between sources, although creative efforts and some progress have been made. Berzins, Bednar, and Severy (1975) directly addressed the issue of consensus among criterion measures. They studied the relation among outcome measures in 79 client-therapist dyads using the MMPI, the Psychiatric Status Schedule, and the Current Adjustment Rating Scale. Sources of outcome measurement involved the client, therapist, and trained outside observers. Data from all three sources and a variety of outcome measures showed generally positive outcomes for the treated group as a whole at termination. There was the usual lack of consensus between criterion measures. However, the primary thesis of Berzins et al. (1975) was that problems of intersource consensus can be resolved through the application of alternatives to conventional methods of analysis. Their principal components analysis showed four components: changes in patients' experienced distress as reported by clients on a variety of measures; changes in observable maladjustments as noted by psychometrist, client, and therapist (an instance of intersource agreement); changes in impulse expression (an instance of intersource disagreement between psychometrist and therapist); and changes in self-acceptance (another type of client-perceived change). The practical implication of these results is that a single criterion might suffice for measuring changes in one area of interest, such as maladjustment, whereas relying on a single criterion would be misleading in another area of functioning, such as impulse control. J. Mintz, Luborsky, and Christoph (1979) addressed the question of intersource consensus by analyzing data from two large uncontrolled studies of psychotherapy: the Penn Psychotherapy Project and the Chicago study reported by Cartwright et al. (1963).
They reported that there was substantial agreement among the viewpoints of patient, therapist, and outside raters when outcome was defined broadly as posttherapy adjustment or overall benefit. They concluded that, contrary to common opinion, consensus measures of psychotherapy outcome could be defined meaningfully. Despite this consensus, they noted that "distinct viewpoints do exist" (p. 331). In fact, when considering the effect sizes reported by J. Mintz et al. (1979), it is clear that estimates of improvement varied: effect sizes for pre- to posttreatment change ranged from .52 to .93. The lowest effect size came from the MMPI Hypochondriasis scale and the highest from the Inventory of Social and Psychological Functioning, an observer rating of social adjustment. In addition, although correlations between viewpoints were statistically significant, they were often low. For example, in the Chicago data (N = 93), correlations between viewpoints on ratings of adjustment ranged from .39 to .59. The lack of consensus across sources of outcome evaluation, especially when each source presumably is assessing the same phenomena, has been viewed as a threat to the validity of data. Indeed, it appears that outcome data provide evidence about changes made by the individual, as well as information about the differing value orientations and motivations of the individuals providing outcome data. This issue has been dealt with in several ways, ranging from discussion of "biasing motivations" and ways to minimize bias, to discussions of the value orientation of those involved (Docherty & Streeter, 1996; Strupp & Hadley, 1977). The consistency of findings illuminating factors associated with the source of ratings, rather than the content of patient problems, highlights the need to pay careful attention to divergence of changes that follow psychological interventions, and to the way information from different perspectives is analyzed and reported in outcome studies.
The fact that source factors have been replicated across a variety of scales, patient populations, and three or four decades also suggests that these findings are robust. It is clear that outcome studies need to collect outcome data
from a variety of sources. Finding ways to combine these data to estimate overall change remains a task for future research.

Technology of Change Measures

In addition to selecting different sources to reflect change, the technology used in devising scales can have an impact on the final index of change. Measures vary in the degree to which they show large versus small effects in studies of outcome. Smith, Glass, and Miller (1980) suggested that several factors associated with rating scales affect estimates of psychotherapy outcome. When summed, these factors were labeled "reactivity." The variables of importance were the degree to which a measure could be influenced by either the client or therapist, the similarity between therapy goals and the measure, and the degree of blinding in the assessment process. The correlation between these dimensions (ratings of reactivity) and effect size was .18, a statistically significant and substantial relation. The type of outcome measure also was categorized. The measures showing the highest effect sizes were ratings of fear and anxiety, vocational or personal development, emotional somatic complaints, and measures of global adjustment. Those with the smallest effect sizes were personality traits, life indicators of adjustment, work, and school achievement. Table 4.2 lists several different technologies (or procedures) that have been employed in outcome measurement. These include evaluation (global ratings, including measures of client satisfaction), description (specific symptom indexes), observation (behavioral counts), and status (physiological and institutional measures). Unfortunately, these procedures for collecting outcome data on patient change vary simultaneously on several dimensions, making it difficult to isolate the aspect of the measurement method that may be most important. For example, J. Mintz, L. I.
Mintz, Arruda, and Hwang (1993), studying treatments of depression, showed that symptomatology regularly remitted before positive changes were found in social and work functioning. However, the extent to which content versus technology played a role in the difference is not clear. A broad dimension on which technologies vary appears to be a direct-indirect dimension. Here, the data are seen as possibly reflecting a bias determined by the propensity of subjects to produce effects consciously. Thus, global ratings of outcome and client satisfaction measures call for (either implicitly or explicitly) raters (usually clients) to directly evaluate outcome. Their attention is drawn to the question, "Did I get better in therapy?" In contrast, specific symptom indices focus the raters' attention (before and after treatment) on the status of specific symptoms and signs at the time the rating is made, without explicit reference to the outcome of therapy. Although there is still knowledge at posttesting that the therapy (or even the therapist) is being evaluated directly, the tendency to rate change is diminished compared with global ratings. Observer ratings in the form of behavioral counts can be even more objective if enough attention is devoted to the procedures that are used. Ideally, these observer ratings call for counting behaviors in real-life circumstances, in which the patients do not know they are being observed or have plenty to focus on besides the impression they are making on the observer. Physiological monitoring usually is not under the conscious control of the patient or, at the very least, presents a real and serious challenge to conscious distortion. Institutional measures, such as grade point average (GPA), are usually the culmination of a host of complex behaviors influenced by a wide variety of
factors, and usually this type of behavior is produced without reference to the research project. Therefore, this type of data may be the least reactive data that can be collected. Green, Gleser, Stone, and Siefert (1975) compared final status scores, pretreatment to posttreatment difference scores, and direct ratings of global improvement in 50 patients seen in brief crisis-oriented psychotherapy. The Hopkins Symptom Checklist was filled out by the patient, whereas a research psychiatrist rated the patient on the Psychiatric Evaluation Form and the Hamilton Depression Rating Scale. Ratings of global improvement were made by the patient and the therapist. Green et al. (1975) concluded that the type of rating scale used has a great deal to do with the percentage of patients considered improved: more so, in fact, than improvement per se. They also suggested that outcome scores have more to do with the finesse of rating scales than with whether ratings are objective. Global improvement ratings by therapists and patients showed very high rates of improvement, with no patients claiming to do worse. When patients had to rate their symptoms more specifically, however, as with the Hopkins Symptom Checklist, they were likely to indicate actual intensification of some symptoms and to provide more conservative data than gross estimates of change (see Garfield, Prager, & Bergin, 1971). Ratings of client satisfaction are valuable as indices of outcome and often produce results correlated with other technologies and methods of assessing outcome (Berger, 1983). Waxman (1996), for example, showed that of 22 patients having the poorest treatment response (as measured by the Brief Symptom Inventory), 21 were also the most dissatisfied with treatment.
But, because satisfaction scales usually do not provide the kind of theoretically or practically important information desired in outcome research (e.g., the specific kinds of symptoms or problems that are changing during therapy), they are not valued highly in theoretically oriented work. Nevertheless, they can provide important data about satisfaction and limited information about improvement, albeit softer data than that usually sought in formal research (Lambert, Christensen, & DeJulio, 1983). Although measures of satisfaction and gross ratings of outcome have been eschewed in efficacy research (e.g., clinical trials), they are frequently mentioned as important by those conducting research in managed care organizations (Gerarty, 1996). However, patient satisfaction is often independent of other clinical outcomes. It appears that staff (not therapist) capabilities, facility amenities, and, to a large degree, patient expectations account for a significant portion of the variance in patient satisfaction scores (Hall, Elliot, & Stiles, 1992; Hsieh & Kagle, 1991). Satisfaction ratings also are highly sensitive to methodological circumstances. For example, phone follow-ups are overly sensitive to interviewer behavior, and they vary a great deal as a consequence of time from discharge, anonymity of feedback, face-to-face encounter with the therapist, and the perceived purpose of the data-gathering effort.

Outcome Assessment Must be Sensitive to Change

As the preceding discussion suggests, a central issue in outcome assessment is the degree to which different measures and measurement methods are likely to reflect changes that actually occur as a result of participation in therapy. For example, if the Beck Depression Inventory (a self-report instrument) is chosen as an outcome measure, will it reflect the same degree of change as the Hamilton Rating Scale for Depression (a clinician rating)? Will gross ratings of overall change provided by the patient show larger or smaller
amounts of improvement than a scale that measures change on specific symptoms? To what extent are the conclusions drawn in comparative outcome studies determined by the specific measures selected by researchers? Do the techniques of meta-analysis actually allow clinicians to summarize across the different outcome measures employed in different studies (essentially combining them), and thereby facilitate accurate conclusions about differential treatment effects? There is a growing body of evidence to suggest that there are reliable differences in the sensitivity of instruments to change. In fact, the differences between measures are not trivial, but large enough to raise questions about the interpretation of research studies. Two examples of such differences will make the importance of instrument selection clear. Table 4.3 presents data from the Ogles et al. (1990) review of agoraphobia outcome studies published in the 1980s. The effect sizes presented (based on pretest-posttest differences) show remarkable disparity in estimates of improvement as a function of the outcome instrument or method of measurement selected for a study. The two extremes, the Fear Survey Schedule (M = .99) and the Phobic Anxiety and Avoidance Scale (M = 2.66), suggest different conclusions. The average patient taking the Fear Survey Schedule moved from the mean (50th percentile) of the pretest group to the 16th percentile after treatment. In contrast, the average patient being assessed with measures of phobic anxiety/avoidance moved from the 50th percentile of the pretest group to essentially the 0th percentile of the pretest group following treatment. Comparisons between the measures depicted in Table 4.3 are confounded somewhat by the fact that the data were aggregated across all studies that used either measure. But similar results can be found when aggregation is restricted to studies that administered both measures to the same patient sample.
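The percentile translations above assume normally distributed scores: a pretest-posttest effect size is the number of pretest standard deviations by which the average patient improves, and the standard normal cumulative distribution function converts that shift into a percentile standing within the untreated group. A minimal Python sketch (the function name is ours) reproduces the figures cited:

```python
from statistics import NormalDist

def posttest_percentile(effect_size: float) -> float:
    """Percentile standing of the average treated patient within the
    pretest distribution, assuming normal scores where lower values
    indicate less pathology."""
    return 100 * (1 - NormalDist().cdf(effect_size))

# Fear Survey Schedule, mean ES = .99: roughly the 16th percentile
print(round(posttest_percentile(0.99)))     # -> 16
# Phobic Anxiety/Avoidance, mean ES = 2.66: well below the 1st percentile
print(round(posttest_percentile(2.66), 1))  # -> 0.4
```

The same arithmetic makes clear why the two scales "suggest different conclusions": identical patients could appear modestly or dramatically improved depending solely on which instrument is scored.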
TABLE 4.3
Overall Effect Size (ES) Means and Standard Deviations by Scale

Scale                           N     M(ES)   SD(ES)
Phobic anxiety and avoidance    65    2.66    1.83
Global Assessment Scale         31    2.30    1.14
Self-rating severity            52    2.12    1.55
Fear Questionnaire              56    1.93    1.30
Anxiety during BAT              48    1.36     .85
Behavioral Approach Test        54    1.15    1.07
Depression measures             60    1.11     .72
Fear Survey Schedule            26     .99     .47
Heart rate                      21     .44     .56

Note. Based on Ogles, Lambert, Weight, and Payne (1990). N = the number of treatments whose effects were measured by each scale. BAT = Behavioral Avoidance Test.

Table 4.4 presents data from a comparison of three frequently employed measures of depression: the Beck Depression Inventory (BDI) and the Zung Self-Rating Scale for Depression (ZSRS), both self-report inventories, and the Hamilton Rating Scale for Depression (HRSD), an expert judge rating (Lambert, Hatch, Kingston, & Edwards, 1986). Meta-analytic results suggest that the most popular dependent measures used to assess depression following treatment provide reliably different pictures of change. It appears that the HRSD, as employed by trained professional interviewers, provides a significantly larger index of change than the BDI and ZSRS. Because the amount of actual improvement that patients experience after treatment
is never known, these findings are subject to several different interpretations. It may mean that the HRSD overestimates patient improvement, but it could be argued just as easily that the HRSD accurately reflects improvement and the BDI and ZSRS underestimate the amount of actual improvement. Both over- and underestimation also may be suggested, with true change falling somewhere in between the HRSD estimate and those provided by the BDI and ZSRS. However, there are reliable differences between measures, and these differences need to be explored and understood. Do self-report scales generally produce smaller effects than expert judge ratings? Is the difference due to the fact that the content of the scales is not identical or that different sources are providing the data? Additional meta-analytic data suggest further differences between the size of treatment effects produced by different outcome measures (cf. Miller & Berman, 1983; Ogles et al., 1990; D. A. Shapiro & D. Shapiro, 1982; Smith et al., 1980).

TABLE 4.4
Matched Pairs of Mean Effect Size (ES) Values

Scale Pair   N    M(ES)           SD(ES)       t
HRSD/ZSRS    17   0.94*/0.62*     0.61/0.30    1.88
BDI/HRSD     49   1.16**/1.57**   0.86/1.08    2.11
ZSRS/BDI     13   0.46/0.52       0.70/1.03    1.65

Note. HRSD = Hamilton Rating Scale for Depression; ZSRS = Zung Self-Rating Scale; BDI = Beck Depression Inventory. From Lambert, Hatch, Kingston, and Edwards (1986). Reprinted by permission of the American Psychological Association and authors. N = the number of treatments whose effects were measured by each pair of depression scales. Values are derived from studies in which subjects' depression was measured on two scales at a time; effect sizes represent within-study comparisons. *p < .05. **p < .25.
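The values compared in Table 4.4 are within-group (pretest-posttest) effect sizes. A common convention, which we assume here for illustration, divides the pre-post change by the pretest standard deviation; the numbers below are hypothetical, not drawn from the studies cited:

```python
def pre_post_effect_size(mean_pre: float, mean_post: float, sd_pre: float) -> float:
    """Within-group effect size: pretest-posttest change expressed in
    pretest standard-deviation units. Positive values indicate
    improvement when lower scores mean less pathology."""
    return (mean_pre - mean_post) / sd_pre

# Hypothetical depression scores for one treated group:
# mean 28.0 at intake, 16.0 at termination, pretest SD 10.0.
es = pre_post_effect_size(28.0, 16.0, 10.0)
print(es)  # -> 1.2
```

Computed this way for two scales administered to the same sample, a pair of effect sizes is directly comparable, which is what makes the matched-pair contrasts in Table 4.4 (e.g., 1.16 for the BDI vs. 1.57 for the HRSD) informative about scale sensitivity rather than sample differences.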
Abstracting from these and related studies, the following conclusions tentatively can be drawn: (a) Therapist and expert judge-based data, in which judges are aware of the treatment status of clients, produce larger effect sizes than self-report data, data produced by relevant (significant) others, institutional data, or instrumental data. (b) Gross ratings of change produce larger estimates of change than ratings on specific dimensions or symptoms. (c) Change measures based on the specific targets of therapy (such as individualized goals or anxiety-based measures taken in specific situations) produce larger effect sizes than more distal measures, including tests of personality. (d) Life adjustment measures that tap social role performance in the natural setting (e.g., GPA) produce smaller effect sizes than more laboratory-based measures. (e) Measures collected soon after therapy show larger effect sizes than measures collected at a later date. (f) Physiological measures such as heart rate usually show relatively small treatment effects. These tentative conclusions are worthy of continual exploration in future research. This research should replicate past research in at least one regard: instructing those who provide data (such as clients) to report their status or judgments as honestly as possible. Past research has been interested in discovering the truth about outcomes, not in providing inflated estimates of treatment effects. Many measures are very susceptible to the instructional set given to those who are providing the data. Further research is needed to clarify the various factors that inflate and deflate estimates of change. For now, however, it is clear that dependent measures are not equivalent in their tendencies to reflect change and that meta-analysis, because it
typically is used to combine different measures, cannot overcome the differences between measures. The thoughtful researcher and consumer of research will give careful attention to the way in which the technology, source, social level, time orientation, and content of measurement affect estimates of improvement and the meaning of the results of outcome studies. The effect size and practical importance of treatment effects remain highly dependent on which dependent measures are used to assess change.

Practical Advances that May Close the Gap Between Research and Practice

Outcome assessment, as part of an applied science of psychotherapy research, should have a direct impact on clinical practice. Two developments in outcome assessment may play an important role in bridging the gap between research and practice: individualizing outcome assessment and assessing clinically significant outcome.

Individualizing Outcome Assessment

As already pointed out, state-of-the-art assessment of outcome relies heavily on the application of atheoretical, monotrait, standardized scales applied with homogeneous patient samples. This practice can be contrasted with the practice of relying on a careful analysis of the unique goals of an individual patient. The possibility of tailoring change criteria to each individual in therapy was mentioned frequently in the 1970s and seemed to offer intriguing alternatives for resolving several recalcitrant dilemmas in measuring change. In the 1980s and early 1990s, there was a new surge of interest in making change measures more idiographic. This interest has been bolstered by the influx of general articles on qualitative research methods (e.g., Polkinghorne, 1991) and the desire to make psychotherapy research more responsive to the needs of the clinician and general clinical practice. Typical of these approaches is the case-formulation method advocated by Persons (1991).
She criticized outcome research for being incompatible with psychotherapy as it actually is practiced. Among her criticisms was the overreliance of research on standardized measures of outcome. She noted that even patients with a homogeneous disorder have a wide range of problems, including work problems, social isolation, financial stresses, medical problems, and tension in relationships with parents, spouse, children, or friends, to name a few. She argued that the typical standardized assessment procedure ignores most of these difficulties, whereas the therapist does not. She further noted that assessment procedures in psychotherapy, as guided by theory, are idiographic and multifaceted, not standardized and limited to a single problem or related set of symptoms. Her suggestions for improving psychotherapy research called for individualization of outcome: Each patient will have a different set of problems assessed with a different set of measures. Her suggestions have not gone unchallenged (Garfield, 1991; Herbert & Mueser, 1991; Messer, 1991; Schacht, 1991; Silverman, 1991). But Persons (1991) was hardly the first to make such recommendations. For example, Strupp, Schacht, and Henry (1988) argued for the principle of problem-treatment-outcome congruence. Unfortunately, their proposals and those of Persons have yet to face the foreboding task of
empirical application. Similar, if not more practical, approaches were undertaken in the 1970s and 1980s with mixed success. One method that has received widespread attention and use is Goal Attainment Scaling (GAS; Kiresuk & Sherman, 1968). GAS requires that a number of treatment goals be set up prior to intervention. These goals are formulated by an individual or a combination of clinicians, client, and/or a committee assigned to the task. For each goal specified, a scale with a graded series of likely outcomes, ranging from least to most favorable, is devised. These goals are formulated and specified with sufficient precision that an independent observer can determine the point at which the patient is functioning at a given time. The procedure also allows for transformation of the overall attainment of specific goals into a standard score. In using this method for the treatment of obesity, for example, one goal could be the specification and measurement of weight loss. A second goal could be reduction of depressive symptoms as measured by a single symptom scale, such as the BDI. Marital satisfaction could be assessed if the patient has serious marital problems. The particular scales and behaviors examined could be varied from patient to patient, and, of course, other specific types of diverse measures from additional points of view can be included. Several methodological issues need to be attended to while using GAS or similar methodology in controlled research (Cytrynbaum, Ginath, Birdwell, & Brandt, 1979). GAS has been applied within a variety of settings with varied success. Woodward, Santa-Barbara, Levin, and Epstein (1978) examined the role of GAS in studying family therapy outcome. Woodward et al. used content analysis to analyze the nature of the goals that were set and the kind of goals set by therapists of different types. In their study, which focused on termination and 6-month follow-up goals, 270 families were considered. 
This resulted in an analysis of 1,005 goals. Woodward et al., advocates of GAS, reported reliable ratings reflecting diverse changes in the families studied. They also noted that GAS correlated with other measures of outcome, and thus seemed to be valid. This study also suggests an advantage of GAS: It not only is applicable with individuals, but can be used to express change in larger systems. Thus, it has been recommended for use in marital and family therapy (Russell, Olson, Sprenkle, & Atilano, 1983). It continues to be applied with families as a way to express changes in the family as a whole, rather than limiting assessment to the identified patient (Fleuridas, Rosenthal, G. K. Leigh, & T. E. Leigh, 1990). More critical analyses show that GAS suffers from many of the same difficulties as other individualized goal-setting procedures. The correlations between goals seem to be around .65, raising the question of their independence. Goals judged either too easy or too hard to obtain are often included for analysis, but, most important, goal attainment is judged on a relative rather than an absolute basis so that behavior change is confounded with expectations as well as importance (Clark & Caudrey, 1986). Further, the choice and attainment of goals are related to client as well as therapist characteristics that affect goal setting as well as change rating. Calsyn and Davidson (1978) reviewed and assessed GAS as an evaluative procedure. They suggested that GAS has poor reliability because there is insufficient agreement between raters on the applicability of predefined content categories to particular patients. In addition, the interrater agreement for goal attainment ranged from r = .51 to .85, indicating variability between those making ratings (e.g., therapist, client, expert judge). In general, studies that have correlated GAS improvement ratings with other ratings of improvement, such as MMPI scores, client satisfaction, and therapist improvement
ratings, have failed to show substantial agreement, and coefficients frequently have been below .30 (Fleuridas et al., 1990). In addition, Calsyn and Davidson pointed out that the use of GAS also frequently eliminates the use of statistical procedures, such as analysis of covariance, that could otherwise correct for sampling errors. Because of this problem, as well as the unknown effects of low reliability, it is suggested that GAS be used only in conjunction with standard scales applied to all patients. Suggestions for the use of GAS in psychotherapy research have been made by Mintz and Kiesler (1982), and the interested researcher may wish to review their recommendations or the review by Calsyn and Davidson (1978). Since these reviews, Lewis, Spencer, Haas, and DiVittis (1987) described methods of data gathering and scale construction that they felt increase the reliability and validity of GAS. They applied GAS in conjunction with family-based interventions with inpatients. Specific procedures for goal creation and later evaluation increased reliability and validity, without reducing the advantages of individualized goals. Among the innovations they suggested was the use of GAS ratings only at follow-up, with evaluations of the pattern of adjustment built into goal expectations and evaluations. GAS still is being applied in a variety of settings, such as inpatient and school settings (Maher & Barbrack, 1984), with a variety of patient groups and treatment methods, such as group therapy (Flowers & Booarem, 1990) or with the severely mentally retarded (Bailey & Simeonsson, 1988). However, examination of these studies reveals widespread modification in its use, so that it is misleading to consider it a single method: GAS is a variety of different methods for recording and evaluating client goal attainment. It is not possible to compare goal attainment scores accurately from one study to the next.
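The standard-score transformation of goal attainment mentioned earlier is the Kiresuk-Sherman summary T-score. A minimal sketch, using the formula conventionally reported in the GAS literature (attainment levels coded -2 to +2, and an assumed average intercorrelation of .30 among goal scales); the goals and weights below are hypothetical.

```python
import math

def gas_t_score(attainments, weights, rho=0.30):
    """Kiresuk-Sherman Goal Attainment Scaling summary T-score.
    attainments: outcome level for each goal, coded -2 (much less than
        expected) through +2 (much more than expected).
    weights: relative importance assigned to each goal.
    rho: assumed average intercorrelation among goal scales (.30 is the
        value conventionally used in the GAS literature).
    A score of 50 means goals were, on average, attained as expected."""
    numerator = 10 * sum(w * x for w, x in zip(weights, attainments))
    denominator = math.sqrt((1 - rho) * sum(w ** 2 for w in weights)
                            + rho * sum(weights) ** 2)
    return 50 + numerator / denominator

# Hypothetical obesity case from the text: weight loss exceeded
# expectations (+1), BDI-measured depression met them (0), and marital
# satisfaction fell short (-1), with weight loss weighted double.
overall = gas_t_score(attainments=[1, 0, -1], weights=[2, 1, 1])
print(round(overall, 1))
```

The transformation makes the overall attainment of an individualized goal set comparable across patients, but, as the critiques above note, it does not make the underlying goals themselves comparable.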
In addition to the previously stated problems, several issues remain: individualized goals may amount to little more than poorly defined subjective decisions by patient or clinician; units of change derived from individually tailored goals are unequal and therefore hardly comparable; different goals are differentially susceptible to psychotherapy influence; goals tend to change early in therapy (which requires revision of the GAS goals); and, because some therapies have a unitary goal or set of goals, the status of individually tailored goals is tenuous. Effective individualization of goals for the purpose of empirically assessing patient change remains an ideal, rather than a reality. The clinician may be better off having a wide range of standardized scales available. Among these scales would be those itemized in this book. At the very least, a clinician should have available a measure of depression (e.g., BDI), a measure of anxiety (e.g., STAI), and a measure of relationship adjustment (e.g., Marital Adjustment Inventory) for adult patients.

Clinical Versus Statistical Significance

Most psychotherapy research is aimed at questions of theoretical interest. Is dynamic therapy more effective than cognitive therapy? Is exposure in vivo necessary for fear reduction? These and a host of similar questions give rise to the research designs that have been used in outcome research. The data acquired in outcome studies are submitted to statistical tests of significance. Group means are compared, the within-group and between-group variability are considered, and the resulting numerical figure is compared with a preset critical value. When the magnitude of the distance between groups is sufficiently large, it is agreed that the results are not likely to be due to chance fluctuations in sampling; thus, statistical significance has been demonstrated. This is the
standard for most research and is an important part of the scientific process. However, a common criticism of outcome research is that the results of studies, because they typically are reported in terms of statistical significance, obscure both the clinical relevance of the findings and the impact of the treatment on specific individuals. Unfortunately, statistically significant improvements do not necessarily equal practically important improvements for the individual client. Therefore, statistically significant findings may be of limited practical value. This fact raises questions about the real contributions of empirical studies for the practice of psychotherapy. It is conceivable that, in a well-designed study, small differences between large groups after treatment could produce findings that reach statistical significance, whereas the real-life difference between patients receiving different treatments is trivial in terms of the reduction of painful symptoms. For example, a behavioral method of treatment for obesity may create a statistically significant difference between treated and untreated groups if all treated subjects lost 10 pounds and all untreated subjects lost 5 pounds. However, the clinical utility of an extra 5-pound weight loss is debatable, especially in the clinically obese patient. This dilemma goes to the core of outcome assessment: adequate definitions and quantification of improvement. Numerous attempts have been aimed at translating reports of treatment effects into metrics that reflect the importance of the changes that are made: (a) In the earliest studies of therapy outcome, patients were categorized "posttherapy" with gross ratings of "improved," "cured," and the like, implying meaningful change. The lack of precision in such ratings, however, resulted in their waning use (Lambert, 1983). 
(b) Those interested in operant conditioning and single-subject designs developed concepts such as ''social validity" to describe practically important improvement (Kazdin, 1977; Wolf, 1978). However, decisions about the importance of change remained somewhat subjective and unquantified. (c) Some disorders easily lend themselves to analysis of important changes, because improvement can be defined as the absence of a behavior, such as cessation of drinking, smoking, or drug use. Unfortunately, most symptoms targeted in psychotherapy cannot be defined and measured so clearly, and even where the absence of a behavior can be easily quantified, there is a lack of consensus about the proper procedures. E.A. Wells et al. (1988), for example, reported identifying 25 different ways of estimating drug use cessation. There is growing recognition that the concept of clinical significance is important and that many different approaches can be used to operationalize it (Jacobson, 1988). Several related methods were compared in a special issue of Behavioral Assessment (e.g., Kendall & Grove, 1988). A discussion and illustration of some of these methods clarify their contemporary use. Jacobson, Follette, and Revenstorf (1984) brought clinical significance into prominence by proposing statistical methods that would illuminate the degree to which individual clients recovered at the end of therapy (see also Jacobson & Truax, 1991). Recovery was proposed to be a posttest score that was more likely to belong to the functional than dysfunctional population of interest. Estimating clinical significance requires norms for the functional sample and presumes certain assumptions about the test scores have been met. For change to be clinically significant, a patient must change enough so that one can be confident that the change exceeds measurement error (calculated by a statistic titled the reliable change index, or RCI). 
When a patient moves from one distribution (dysfunctional) to another (functional), and the change reliably exceeds measurement error (the RCI is calculated by dividing the absolute magnitude of change by the standard error of measurement), change is viewed as clinically significant.
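The RCI computation just described can be sketched as follows. Note that the sketch follows this chapter's wording (change divided by the standard error of measurement); some published formulations instead divide by the standard error of the difference, sqrt(2) times the SE, so values are not comparable across conventions. All numbers below are hypothetical.

```python
import math

def reliable_change_index(pre, post, sd, reliability):
    """RCI as described in the text: the absolute magnitude of pre-post
    change divided by the standard error of measurement,
    SE = SD * sqrt(1 - reliability).
    (Other formulations use the standard error of the difference,
    sqrt(2) * SE; check which convention a given study follows.)"""
    standard_error = sd * math.sqrt(1 - reliability)
    return abs(post - pre) / standard_error

# Hypothetical example: a depression score drops from 24 to 12 on a scale
# with a normative SD of 10 and test-retest reliability of .86.
rci = reliable_change_index(pre=24, post=12, sd=10, reliability=0.86)
print(round(rci, 2))  # well above 1.96, so the change exceeds measurement error
```

An RCI beyond 1.96 gives confidence (at the .05 level) that the observed change is not attributable to measurement error alone; clinical significance additionally requires crossing into the functional distribution.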
The patient is more likely functional than dysfunctional. This method provides a way to assess the reliability of an individual client's pre- and posttreatment change, and the practical meaningfulness of this change. Jacobson, Follette, Revenstorf, Baucom, et al. (1984) applied their criteria of clinical significance to studies of behavioral marital therapy. They were able to develop cutoff scores based on the normative data of functional and dysfunctional couples who had taken the Locke-Wallace Marital Adjustment Inventory. A growing number of studies have employed these techniques with various treatment samples with considerable success (Lacks & Powlishta, 1989; Mavissakalian, 1986; Perry, Shapiro, & Firth, 1986; Schmaling & Jacobson, 1987). Although this method has received favorable reviews (Ankuta & Abeles, 1993; Goldfried, Greenberg, & Marmar, 1990; Lambert & Hill, 1994; Lambert, Shapiro, & Bergin, 1986), there are several limitations that have gone unaddressed. Jacobson, Follette, and Revenstorf's (1984) proposal does not provide an operationalization of a comparative social standard. This failure to define normative samples results in three specific problems: an inability to identify and use relevant normative samples across studies, the restriction of the social validation methodology by the use of only one dysfunctional and one functional sample, and the lack of a procedure to determine the distinctness of samples. Although Jacobson, Follette, and Revenstorf (1984) proposed the use of a functional and a dysfunctional sample for "whatever variable is being used to measure the clinical problem" (p. 340), they did not specify what they meant by "functional" or provide a method for determining the social relevance of the samples. In a later article, Jacobson and Revenstorf (1988) revealed their awareness of this problem, yet still provided no suggestions as to how to arrive at this agreed-on use of samples.
The second problem, using only two normative samples to represent extremes (functional and dysfunctional), has produced a great deal of criticism. For example, Wampold and Jenson (1986) suggested that using two sample distributions relies on the assumption that the population forms a bimodal distribution. They concluded that this is seldom the case, thus this methodology has limited utility. Other writers (e.g., Hollon & Flick, 1988) pointed out that using only two poorly defined, extreme samples results in unstable cutoff points. Another criticism of using two extreme samples is the inability to identify individuals who may make clinically meaningful change but do not change enough to enter the functional sample's distribution. Jacobson and Revenstorf (1988) recognized this limitation and identified it as one of the most fundamental questions regarding the viability of their method. The final problem is the lack of a procedure to determine the distinctness of samples. This prevents identifying whether sample overlap exists to such an extent that comparisons between samples become meaningless. Although Jacobson, Follette, and Revenstorf (1984) cautioned that using two samples as social standards for comparisons is justified only when they form distinct distributions, they offered no method to operationalize distinctness. Given the importance of distinct samples, a method for determining distinctness needs to be developed. To further the objective of social validation and overcome some of its problems, Tingey, Lambert, Burlingame, and Hansen (1996a) proposed three guidelines focusing on the derivation of relevant social standards: guidelines for defining and identifying relevant normative samples, utilizing the social validation methodology more fully by employing multiple normative samples, and providing a procedure for determining the distinctness of these samples.
Identifying Relevant Normative Samples

In addition to matching clients' demographics, the proposed social validation methodology requires that clinicians identify normative samples that show a level of performance that is important or relevant to society. Two basic factors are involved in applying this guideline: choosing a specifying factor (a characteristic or symptom to be studied) and identifying an impact factor (a particular effect or impact the specifying factor has on society). Specifying and impact factors are essentially an extension of Kendall and Grove's (1988) classification of outcome assessment. The specifying factor is the characteristic or symptom being measured via an outcome instrument (e.g., depression, anxiety). To be useful, specifying factors must be clearly defined and exist in varying amounts or degrees across society. Although the specifying factor need not necessarily be observable, it must at least be conceptually sound and justifiable. The impact factor is how the specifying factor affects individuals and society, such as its impact on treatment utilization or job performance. The impact factor is generally defined by some behavior resulting from various levels of the specifying factor. The behavior must be considered important or relevant by some segment of society, and vary proportionally with different levels of the specifying factor. One way to determine the impact factor is to observe important behaviors that covary with different levels of the specifying factor. One relevant behavior might be involvement in treatment. It is reasonable to conclude that individuals involved in varying intensities of treatment (the impact factor), such as inpatient, partial hospitalization, outpatient, or no treatment, would demonstrate different levels of pathological symptomatology (the specifying factor).
Thus, a reduction in a client's score from levels of symptomatic distress found in inpatient populations to that normally exhibited by outpatient populations would imply a significant positive change in impact. The value of these two guidelines is that they introduce the possibility of deriving multiple normative samples using an impact factor, and thereby anchor the assessment of clinically significant change to important social standards.

Multiple Normative Samples

The use of multiple normative samples as standards requires an expanded definition of clinically significant change. Tingey et al. (1996a) further proposed that clinically significant change be defined as movement from one socially relevant sample into another based on the impact factor selected, rather than from a "dysfunctional" to a "functional" distribution as proposed by Jacobson and Revenstorf (1988). These multiple samples would be organized along a rational or empirical continuum representing low to high impact, and correspondingly, low to high levels of the specifying factor.

Illustrative Example

Demonstrating the process for establishing a continuum will help clarify the aforementioned guidelines and extensions. The basic steps involve selecting a specifying factor that is defined by a reliable outcome instrument; identifying an impact factor (a behavior relevant to society that covaries with different levels of the specifying factor) and normative samples demonstrating different levels of this factor; determining the statistical distinctness of these socially relevant samples; calculating RCIs for all possible sample pairs; and calculating cutoff points between adjacent sample pairs along the continuum. In accord with the first step, the specifying factor selected was general symptoms of psychological distress as measured by the Symptom Checklist-90-Revised (SCL-90-R; Derogatis & Melisaratos, 1983).
The SCL-90-R defines a clear, discrete specifying factor, global psychological distress, and measures it in a continuous fashion using a Likert scale with 90 items sampling a variety of problem areas. In addition, it is a
frequently used instrument in psychotherapy research (J. E. Froyd et al., 1996; Lambert et al., 1983). As such, it is well suited for developing a normative continuum. Next, a socially relevant impact factor that logically results from symptoms of psychological distress and covaries with them is treatment. It is reasonable to conclude that groups of clients involved in varying intensities of treatment (from none to inpatient treatment) with varying costs to society would show substantial differences on their SCL-90-R scores. Using this rationale, four socially relevant normative samples were identified: a sample of healthy persons who were carefully screened to exclude those in treatment (asymptomatic); an unselected sample (with regard to treatment participation) of community adults (mildly symptomatic; Derogatis & Melisaratos, 1983); people who were receiving outpatient group treatment (moderately symptomatic; Burlingame & Barlow, 1996); and a sample of persons receiving inpatient treatment (severely symptomatic; Derogatis & Melisaratos, 1983). As cited earlier, the last three samples were taken from existing literature. The second sample, mildly symptomatic, is labeled such due to the estimate that 20% of the general population experience significant levels of psychological distress (Saunders, Howard, & Newman, 1988), and people with disorders who might have been undergoing treatment were not screened out. This community-based sample provides "normal" control statistics for the norms of the SCL-90-R. The first sample, asymptomatic, was collected by Tingey et al. (1996a) and requires more explanation. The asymptomatic sample consists of specially screened individuals who were not in any treatment. The screening occurred at several levels. Initially, a subject pool was derived by approaching 30 licensed psychologists from a Western state and asking them to nominate individuals from the community whom they felt were psychologically healthy and high functioning.
Nominated individuals were then contacted by phone and, after agreeing to participate in the study, were given a screening interview. The interview excluded subjects if they were pregnant, currently taking any psychotropic medication, or had ever been diagnosed with a mental disorder. Subjects passing the phone screening were invited to a local mental health clinic, where they were further screened using a test battery including the Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961) and the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, Lushene, & Jacobs, 1983). These instruments were selected in an attempt to screen out subjects who were symptomatic for anxiety and depression, the two most common symptoms of psychopathology. Cutoff scores for the test battery were based on norms reported in the literature. For the BDI, subjects scoring nine or under were considered free of depression and were included in the asymptomatic sample (Beck, 1972; Beck & Beamesderfer, 1974). A score of 35 was selected as the cutoff for both the State and Trait sections of the STAI in that it corresponds to the mean score of working adult females (Spielberger et al., 1983). There were 82 subjects who passed all screening criteria. This sample had a mean age of 43.23 (SD = 13.89) and consisted of 46.9% males and 53.1% females. A more complete description of this sample and methodology of data collection can be found in Tingey (1989). Descriptive data for these four samples on the SCL-90-R Global Severity Index (GSI) are reported in Table 4.5. This index was chosen because it is the usual score reported in outcome studies to express patient improvement, and because it "represents the best single indicator of the current level or depth of the disorder" (Derogatis & Melisaratos, 1983, p. 11).
As postulated, the four samples indeed form a continuum of increasing symptomatology with a progressive increase in the means of adjacent samples as one ascends the continuum.
TABLE 4.5
Normative Continuum of SCL-90-R Global Severity Index (GSI) Raw Scores Across Samples

Sample                               GSI Mean (SD)
Asymptomatic (N = 82)                0.19 (0.16)
Mildly Symptomatic (N = 974)         0.31 (0.31)
Moderately Symptomatic (N = 97)      0.79 (0.45)
Severely Symptomatic (N = 313)       1.30 (0.82)

Note. From Tingey, Lambert, Burlingame, & Hansen (1996a).

With the samples identified and the continuum formed, the third step was to determine the distinctness of adjacent samples using t and d statistics. Each sample in the continuum met two criteria when compared with the adjacent sample: all t values were significant at the .05 alpha level, and all d values surpassed a criterion of .5, signifying a moderate effect according to Cohen (1980). Once relevant normative samples were identified and their distinctness statistically verified, the last two steps of generating RCIs and cutoff points were completed. As these procedures are quite complex, the reader is referred to Tingey et al. (1996a) for further explanation. Jacobson and Revenstorf (1988) suggested establishing a confidence band based on the RCI around the cutoff to alleviate the error associated with a discrete cutting point. The cutoffs between adjacent samples define the point where it is statistically more likely for a score to be in one, as opposed to the other, overlapping distribution (Jacobson, Follette, & Revenstorf, 1984; Jacobson & Truax, 1991). The cutoff points plus the confidence bands around them indicate the boundary between two adjacent normative distributions that must be crossed from pre- to posttreatment in order for this difference to be considered clinically significant using the most stringent criterion (Jacobson & Revenstorf, 1988). Table 4.6 presents the different cutoff points and confidence bands for adjacent sample pairs. Figure 4.1 illustrates this material in graphic form.
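The last two steps can be sketched computationally, assuming the Jacobson, Follette, and Revenstorf (1984) cutoff formula (the point at which a score becomes more likely under the adjacent distribution) and the Jacobson and Revenstorf (1988) band of cutoff plus or minus half the reliable change value. Fed the means and SDs from Table 4.5, the sketch reproduces the cutoffs reported for the continuum.

```python
def cutoff(mean_a, sd_a, mean_b, sd_b):
    """Jacobson et al. (1984) cutoff between two overlapping normative
    distributions: c = (sd_a * mean_b + sd_b * mean_a) / (sd_a + sd_b),
    the point equally likely under either distribution."""
    return (sd_a * mean_b + sd_b * mean_a) / (sd_a + sd_b)

def confidence_band(c, rc):
    """Jacobson & Revenstorf (1988): band of cutoff +/- half the reliable
    change value, softening the error around a discrete cutting point."""
    return (c - rc / 2, c + rc / 2)

# SCL-90-R GSI means and SDs from Table 4.5.
samples = [("Asymptomatic", 0.19, 0.16),
           ("Mildly Symptomatic", 0.31, 0.31),
           ("Moderately Symptomatic", 0.79, 0.45),
           ("Severely Symptomatic", 1.30, 0.82)]

for (name_a, m_a, s_a), (name_b, m_b, s_b) in zip(samples, samples[1:]):
    print(f"{name_a} vs. {name_b}: cutoff = {cutoff(m_a, s_a, m_b, s_b):.2f}")
# Reproduces the cutoffs reported for the continuum: 0.23, 0.51, and 0.97.
# With RC = .43 for the moderate/severe pair, confidence_band(0.97, 0.43)
# gives roughly the 0.76-1.19 band reported in Table 4.6.
```

That the computed values match the published cutoffs and bands is a useful check that the continuum can be rederived from the normative descriptive statistics alone.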
In keeping with Jacobson and Revenstorf (1988), confidence bands have been included in both Table 4.6 and Fig. 4.1. However, Tingey et al. (1996a) did not propose that a client should be required to exceed both the RCI and the confidence band to be considered improved. With general normative standards identified and RCIs, cutoff points, and confidence bands calculated, clinician-based or research-based outcome data can be compared to them as a standard way of judging client change. Any set of outcome data that used the SCL-90-R can be compared to the normative standards identified here to assess their clinical significance. To illustrate, selected data from a group therapy process and outcome study (Burlingame & Barlow, 1996) were applied. In this study, 97 outpatients underwent group treatment for 15 weeks. They were assessed with the SCL-90-R at pretest, after 8 weeks of group treatment, after termination (at 15 weeks), and 6 months
following termination.

TABLE 4.6
Cutoff Points and Confidence Bands for the Continuum Sample Pairs

Sample Pair                              Cutoff^a    Confidence Band^b
Asymptomatic vs. Mildly Symptomatic      0.23        0.15-0.31
Mildly vs. Moderately Symptomatic        0.51        0.38-0.64
Moderately vs. Severely Symptomatic      0.97        0.76-1.19

Note. From Tingey, Lambert, Burlingame, and Hansen (1996a).
^a Cutoff = (Jacobson et al., 1984).
^b Confidence Band = Cutoff ± 1/2 RC (Jacobson & Revenstorf, 1988).

For illustrative purposes, the course of therapy for four of the treated patients is illustrated in Fig. 4.2. These four patients were selected because they showed different courses in treatment. As can be seen in Fig. 4.2, two began treatment in the severely symptomatic range (c and d) and two began in the moderately symptomatic range (a and b). Patient d showed little improvement across the course of

Fig. 4.1. SCL-90-R GSI cutoffs and confidence bands for the four normative samples. From Tingey, Lambert, Burlingame, and Hansen (1996a).

Fig. 4.2. Illustration of clinically significant change patterns for four patients using multiple normative sample cutoffs and repeated measures. From Tingey, Lambert, Burlingame, and Hansen (1996a).

treatment but showed rather dramatic improvement when assessed at follow-up. At follow-up, d had crossed the cutoff separating the severe and moderate distributions, as her change score exceeded the RCI of .42. Patient c, on the other hand, started within the severely symptomatic distribution and made continual progress. Patient c met the criteria for clinically significant change after 8 weeks of treatment. By termination, c had not passed the cutoff for entry into the asymptomatic distribution, but this criterion was reached during the follow-up period. Patient b began within the moderately disturbed sample distribution and had the possibility of meeting the criteria for either clinically significant deterioration or improvement. Patient b showed a deteriorating course 8 weeks into the treatment, passing the cutoff (.97) as well as changing more than .43 on the GSI. At termination, b's status was less clear. His posttreatment score was still past the cutoff (.97), but his RCI was now less than the .43 necessary to consider the change reliable. By the follow-up testing he again met the criteria for clinically significant change (deterioration).
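The joint decision rule applied to these patients (cross the cutoff between adjacent normative distributions and change more than the reliable-change threshold) can be sketched as follows; the GSI values are illustrative readings of Fig. 4.2, not exact study data.

```python
def classify_change(pre, post, cutoff, rc_threshold):
    """Clinically significant change requires BOTH crossing the cutoff
    between adjacent normative distributions AND a pre-post difference
    exceeding the reliable-change threshold. Lower GSI = less distress."""
    crossed = (pre > cutoff) != (post > cutoff)
    reliable = abs(post - pre) > rc_threshold
    if crossed and reliable:
        return "improved" if post < pre else "deteriorated"
    return "no clinically significant change"

MOD_SEVERE_CUTOFF = 0.97   # moderate vs. severe cutoff from Table 4.6
RC = 0.43                  # reliable-change threshold on the GSI (from text)

# Patient b's deteriorating course at 8 weeks: crossed the cutoff and
# changed more than .43, so the change is clinically significant.
print(classify_change(0.80, 1.40, MOD_SEVERE_CUTOFF, RC))
# At termination b remained past the cutoff relative to pretreatment, but
# the change no longer exceeded the threshold, so his status was unclear.
print(classify_change(0.80, 1.10, MOD_SEVERE_CUTOFF, RC))
```

The same function, applied against each adjacent cutoff in turn, would locate a patient's movement anywhere along the four-sample continuum.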


Patient a's scores illustrate movement from the moderately symptomatic category into the asymptomatic group. The course of improvement appeared to be relatively fast with regard to meeting the criteria for clinically significant improvement (from moderately to mildly symptomatic). By follow-up, a had passed the cutoff and moved into the asymptomatic distribution but was not below the confidence band, making his status with regard to this distribution uncertain. On the basis of meeting the criteria for clinically significant change during the active phase of treatment, patients c, b, and a, along with patients who made similar changes, were studied intensively with regard to their participation in group process and their individual pretherapy characteristics. Had their scores been monitored during treatment, some attempt to prevent the deterioration of Patient b might have been possible. This possibility is enhanced by the development and use of cutoff scores and an estimate of reliable change.

The addition of a continuum in the assessment of reliable change and clinical significance adds a considerable amount of information and softens the criteria for achieving clinical change. Clients do not necessarily have to enter the most functional sample distribution to be considered clinically improved. This addresses a complaint Jacobson, Follette, and Revenstorf (1986) made about their method being too stringent and unable to identify subjects who "gain substantial benefits from psychotherapy . . . even though they remain somewhat dysfunctional" (p. 311). In practical applications this may be particularly useful because clinically significant improvement can be reached by movement from the distribution of patients who often need very restrictive and expensive inpatient settings into the distribution of people who require less expensive outpatient treatment.
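The classification logic illustrated in Fig. 4.2 can be sketched in a few lines of code. The three cutoffs are the GSI values from Table 4.6; the RCI value (.43) is the GSI figure cited above, and the example scores are invented for illustration.

```python
# Continuum classification sketch: Table 4.6 cutoffs place a GSI score in one
# of four normative distributions; a pre/post difference counts as reliable
# change only if it exceeds the RCI. Example scores are hypothetical.

CUTOFFS = [(0.23, "asymptomatic"),
           (0.51, "mildly symptomatic"),
           (0.97, "moderately symptomatic")]
RCI = 0.43  # reliable change index for the GSI

def category(gsi):
    """Return the normative sample a GSI score falls into."""
    for cutoff, label in CUTOFFS:
        if gsi < cutoff:
            return label
    return "severely symptomatic"

def clinically_significant_change(pre, post):
    """Reliable change (|pre - post| > RCI) plus movement across a cutoff."""
    reliable = abs(pre - post) > RCI
    if not reliable:
        return "no reliable change"
    moved = category(pre) != category(post)
    direction = "improvement" if post < pre else "deterioration"
    return f"clinically significant {direction}" if moved else f"reliable {direction}"

# A deteriorating course from the moderate into the severe distribution:
print(clinically_significant_change(0.90, 1.40))  # clinically significant deterioration
```

A client can thus be credited with clinically significant improvement for crossing any one of the three cutoffs, not only for entering the asymptomatic distribution.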
At the other end of the continuum, employee assistance programs often treat patients who at the commencement of treatment are moderately or even mildly symptomatic. The continuum makes it possible to track movement of their scores into the mildly symptomatic or asymptomatic norms (i.e., even these patients can be identified as making clinically significant change). A continuum also introduces the possibility of quantifying different degrees of clinical change and of detecting deterioration. With more than two normative samples, it now becomes possible for a client's score at posttreatment to have moved across more than one distribution in a positive direction, or to have moved in a negative direction into a less functional distribution. This valuable information could be particularly useful in differentiating the effects of different treatment types, or in evaluating the effectiveness of specific process variables (cf. deterioration rates in the NIMH collaborative depression treatment project; Ogles, Lambert, & Sawyer, 1995). Overall, a continuum appears to provide a more precise and relevant perspective on clinical change.

One advantage of defining cutoff points that establish standards for identifying clinically significant change lies in clinical practice. As managed health networks and government agencies insist on tracking patient outcome, it appears more likely that individual providers will search for a standard way of evaluating patient progress. Figure 4.2 provides an interesting and easy way for clinicians to use standard indices to identify patients' pretreatment status and track their progress over time while using one possible standard for improvement.

Several limitations and needs for future research require further attention. First, the Ns that make up normative samples should be substantial. Several samples rather than single samples of inpatients, outpatients, community, and asymptomatic individuals would be ideal.
The popularity of the SCL-90-R and several other measures makes this feasible. Investigations of the BDI (Seggar & Lambert, 1994), STAI


(Condon & Lambert, 1994) and the Achenbach Child Behavior Checklist (E.M. Grundy & Lambert, 1994) suggest that these scales have ample normative data. On the other hand, it is difficult to find scores for untreated subjects on the Hamilton Rating Scale for Depression and, occasionally, scores on these scales do not discriminate between inpatients and outpatients (C.T. Grundy, Lambert, & E. Grundy, 1996). Large sample sizes would also help to ensure that the distributions approximate normality. Much of the work on clinical significance assumes normality of distributions, and the consequences of violating this assumption have not been investigated.

The size of the correlation coefficient selected to calculate the RCI has a large impact on estimates of reliable change. Rather than selecting a single reliability coefficient, it may be best to take the median figure from a number of studies of reliability. It can be argued that the coefficient should reflect stability over the course of treatment, a time period that could reach 20 weeks. It can also be argued that the coefficient should be based on 1 or 2 weeks, that is, on the time frame in which patients are being asked to rate their symptoms. Even shorter intervals for patient populations may be appropriate. However, 1-week test-retest data are generally not available, and an approximate substitute that approaches this very tight time reference will need to be identified. In this respect, it may be appropriate to use an alpha coefficient to represent such a short time frame. The coefficient should reflect measurement error for an individual's score at the time that individual is assessed rather than reflecting changes in the score that could be presumed to be due to treatment.

Another important issue in defining clinically significant change is the effect of having a limited range of scores, such as those in the more functional distributions of the SCL-90-R and other specific tests that are used in outcome research.
Although the normative samples plotted in Fig. 4.1 meet the proposed criteria of distinctiveness, the sizable reduction in differences between means for adjacent normative samples (e.g., severe to moderate, moderate to mild, mild to asymptomatic), coupled with a parallel reduction in variance across samples, provides ample evidence for a floor effect on the GSI. This phenomenon mandates the use of several RCI indices rather than one standard figure for the test (GSI) as a whole. Using one RCI computed from all the normative samples would result in an RCI that is too large to be useful at the lower end of the symptomatic (GSI) continuum. Most outcome measures were made for studying the presence rather than the absence of symptoms, suggesting the need to expand the capacity of measures to properly assess this end of the continuum.

A final issue of importance is the value and desirability of placing confidence bands around the cutoff scores. Using bands around a cutoff score increases the degree of certainty that patients are classified correctly. Although the idea of using confidence bands has appeal and data on confidence bands have been presented for illustrative purposes, the use of bands results in practical problems that argue against their value in research and clinical settings. One practical problem is that bands result in unclear classification of patient status prior to and following therapy. It is desirable to calculate the percentage of patients who are among the ranks of the disturbed sample both before and after treatment. The use of a band makes classification ambiguous and needlessly complex. At pretreatment, it may result in large numbers of patients who cannot be considered dysfunctional. At posttreatment, it results in patients having to meet three criteria for improvement instead of two. For many patients (those above the cutoff, but within the band) it will not be possible to meet all three criteria.
It is recommended that researchers use only the cutoff score and the RCI in presenting data about clinically significant change.
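As a minimal sketch of why the choice of reliability coefficient matters, the Jacobson and Truax (1991) reliable change threshold can be computed from a scale's standard deviation and the chosen coefficient. The SD and reliability values below are illustrative, not drawn from any particular normative sample.

```python
from math import sqrt

def rci_threshold(sd, reliability, z=1.96):
    """Smallest difference score that counts as reliable at the 95% level."""
    se = sd * sqrt(1 - reliability)   # standard error of measurement
    s_diff = sqrt(2 * se ** 2)        # standard error of the difference score
    return z * s_diff

# The same scale yields very different thresholds depending on whether a
# long-interval test-retest r or a short-interval coefficient is chosen.
for r in (0.80, 0.90, 0.95):
    print(f"reliability {r:.2f}: change must exceed {rci_threshold(0.6, r):.2f}")
```

Because the threshold shrinks as reliability rises, a coefficient contaminated by treatment-related change (rather than pure measurement error) will overstate the amount of change required to be called reliable.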


Similar and Related Procedures

Additional examples of estimating clinically significant change have been published in recent years. These methods have emphasized the use of normative comparison. Examples include the use of social drinking behaviors as criteria for outcome in the treatment of problem drinking, or the use of definitions of adequate sexual performance (e.g., ratio of orgasms to attempts at sex, or time to orgasm following penetration; Sabalis, 1983). These criteria are based on data about the normal functioning of individuals and can be applied easily and meaningfully with a number of disorders where normal or ideal functioning is readily apparent and easily measured (e.g., obesity, suicidal behavior).

Normative comparison also can be used to quantify clinical significance more generally. This strategy involves comparing the behavior of clients before and after treatment to that of a sample of nondisturbed "normal" peers. This method has the advantage that comparisons can be based directly on the psychological tests commonly used to measure therapy outcome, if a standardization sample of nonpatients is also available. Usually the procedure involves comparing the end-state functioning of treated clients to various control groups. Thus, standards of clinical improvement can be based on normative data and posttreatment status gathered through meta-analysis of multiple samples of patients, instead of the magnitude of change of specific individual patients.

For example, Trull, Nietzel, and Main (1988) reported a meta-analytic review of 19 studies of agoraphobia that used the Fear Questionnaire. Self-reported posttreatment adjustment of agoraphobics was compared with two normative samples. The normative samples were based on college students (at two universities) and a community sample drawn randomly from the phone directory. Both samples included subjects who had never received treatment for a phobic condition.
As might be expected, the community sample was more disturbed than the college sample, probably because agoraphobia prohibits or inhibits attendance in college classes. As a consequence, in this study estimates of clinically significant change via normative comparison turned out to be a function of which normative group was used for comparison. Agoraphobics, treated mainly with exposure, improved during treatment. The average agoraphobic started at the 99th percentile of the college norms and improved to the 98.7th percentile at the end of treatment. The average agoraphobic also started at the 97th percentile of the community norms and progressed to the 68th percentile at posttreatment and to the 65.5th percentile at follow-up.

Using similar methodology, Nietzel, Russell, Hemmings, and Gretter (1987) studied the clinical significance of psychotherapy for unipolar depression. They compared the posttherapy adjustment of depressed and nondepressed adults who took the BDI. In all, 28 published studies were used to calculate composite BDI norms; these were compared with outcomes from 31 outcome studies that yielded 60 effect sizes. Three normative groups could be identified: a nondistressed group; a general population group (consisting mostly of collegiate subjects); and a situationally distressed group (e.g., pregnant women), which turned out to be very similar to the general population samples. Comparisons contrasting the depressed patients with the normative samples suggested that the various treatments (all of which appeared similar in their effectiveness) produced clinically significant changes in relation to the general population. In fact, the average depressed patient moved from the 99th percentile of the general population norms to the 76th percentile of this reference sample. These gains were maintained at follow-up. In reference to the nondistressed group, the same improvements were much less remarkable.
The average patient only moved from the 99th percentile to the 95th percentile.
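Normative comparison of this kind reduces to locating a score within a reference distribution. A minimal sketch, assuming approximate normality and using invented means and SDs rather than the actual Fear Questionnaire or BDI norms:

```python
from statistics import NormalDist

def percentile_in_norm(score, norm_mean, norm_sd):
    """Percentile of a score within a normative sample (normal approximation)."""
    return 100 * NormalDist(norm_mean, norm_sd).cdf(score)

college = (20.0, 8.0)     # hypothetical college-sample mean and SD
community = (28.0, 10.0)  # hypothetical community-sample mean and SD

post = 38.0  # a treated patient's posttreatment score
print(f"vs. college norms:   {percentile_in_norm(post, *college):.1f}th percentile")
print(f"vs. community norms: {percentile_in_norm(post, *community):.1f}th percentile")
```

The same posttreatment score lands at very different percentiles depending on the reference sample, which is exactly the dependence on normative group reported in these reviews.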


Nietzel et al. concluded that clinically significant improvement depends on the nature of the normative sample. Obviously, selection of normative samples has a substantial impact on estimates of meaningful improvement.

A recent study combining various methods of calculating clinical significance illustrates the potential of using more than one procedure. Scott and Stradling (1990) studied and contrasted the effects of cognitive therapy offered in either an individual or group format to patients who were depressed. They reported results from an analysis of BDI scores. Patients were assigned to either of the treatment groups or a wait-list control while still receiving their customary treatment from their general practitioner (which included tricyclic medication in about one half of the patients). Besides the usual group comparisons based on inferential statistics, Scott and Stradling reported clinically significant improvements as well. They reported the percentage of patients reaching various cutoff scores on the BDI. Using Kendall, Hollon, Beck, Hammen, and Ingram's (1987) criteria for nondepression, mild depression, moderate depression, and severe depression, Scott and Stradling were able to show obvious differences between wait-list and psychotherapy outcomes over the 12 weeks of treatment and at 1-year follow-up.

Scott and Stradling (1990) also applied the RCI as a primary criterion, showing that patient change was of a great enough magnitude that patients could reasonably be considered to have left the ranks of the dysfunctional. In fact, using the RCI, they estimated that 100% of those in the group treatment and 84% of those in individual treatment manifested clinically significant improvement. Fifty-three percent on the wait list showed similar improvement. In addition, 5% of the wait-list subjects deteriorated, whereas none of the treatment subjects did. Although there have been favorable reviews, numerous problems remain with this methodology.
These include the complexities created by the fact that researchers use multiple outcome measures, each one possibly providing different information about the individual and the group as a whole. One measure may show clinically significant change for the group or a specific individual, whereas another does not. Other problems include the use of discrete cutoff points and their derivation; the problems that result from score distributions that are not normal; and the limitations of floor and ceiling effects in many of the most frequently used tests. This latter problem is especially serious, because many tests are weighted heavily toward pathology and were not developed for use with people who represent the actualized end of the continuum of functioning. In some instances, it is this actualized end of the continuum that represents the patients' nondisturbed peers. Moreover, there is considerable controversy about procedural and statistical analyses (cf. Lacks & Powlishta, 1989) that have substantial impact on estimates of clinical significance. Some procedures provide more conservative criteria, whereas others are lenient. Thus, statistical methods do not eliminate the seemingly inevitable application of values to operationalizing outcome. These statistical methods do, however, make the judgments explicit and replicable, so that researchers can equate clinical significance across studies.

Finally, the use of multiple normative samples requires an expanded definition of clinically significant change. These procedures have been criticized by Follette and Callaghan (1996), who argued that an enlarged definition of clinical significance obscures its underlying principle: to measure change in a way that is meaningful to the client. Martinovich, Saunders, and Howard (1996) argued that the extensions advocated by Tingey et al. (1996a) compound the problems associated with clinical significance rather than solving them.
They suggested several methods for solving some of the problems inherent in this methodology. Tingey, Lambert, Burlingame, and Hansen


(1996b) responded to these and other criticisms. The interested reader will find the interchange quite helpful in evaluating the uses of, and problems with, clinical significance. The development of statistically defined clinically significant change, although not without controversy, should be applauded and encouraged. Clinicians now have, at their disposal, normative data that allow for estimating clinically significant improvement on a few important outcome measures (e.g., BDI, SCL-90-R, Locke-Wallace Marital Adjustment Inventory, Fear Questionnaire, Child Behavior Checklist, Hamilton Depression Rating Scale, Outcome Questionnaire, and Inventory for Interpersonal Problems; the interested reader can find these cutoffs in Ogles et al., 1996). Data on clinical significance may stimulate research applications in private practice settings, as well as improve the translation of research findings into clinician-friendly facts.

Issues in Need of Research

There is a long list of research topics that would be welcome in the struggle to make outcome assessment more useful to clinicians and to policymakers. Two topics, cost-effectiveness and provider profiling, are highlighted here.

Cost-Effective Care as an Outcome

Among those outcomes that can be monitored, the cost-effectiveness of treatments is an interesting but rarely studied phenomenon. In a society that is becoming more preoccupied with cost containment, the cost-effectiveness of treatment is an important outcome. The value of care is often defined as the trade-off between quality of care (or traditional clinical outcomes) and dollars spent on care. To health plans and employers (if not patients), the value of care, or cost-effectiveness of care, should be as important as absolute costs for deciding on a treatment: after all, there is little point in spending money on something that provides no benefit just because it is cheap.
Cost-effectiveness data are particularly important when the effects of different treatments are equal, a state of affairs that is common in psychotherapy and pharmacotherapy (Lambert & Bergin, 1994). Perhaps the best example of research on this topic is the Rand Medical Outcome Study, which examined outcomes in depression (K.B. Wells, Strum, Sherbourne, & Meredith, 1996). This study examined systems of care, patient case mix, process of care, utilization, and clinical outcomes in an indirect structural analysis in order to develop models that inform policy and treatment decisions. Wells et al. found that treatment increases mental health services utilization and costs, regardless of provider specialty (general practitioner, psychiatrist, or other mental health specialist). The lowest costs, but also the worst outcomes for depression, were found in the general medical sector; the highest costs, but also the best outcomes, occurred in psychiatry. When cost-effectiveness ratios were calculated, the greatest "value" was to be found in the other mental health specialist provider group (psychologists, etc.). Wells et al. also estimated that quality improvement programs or decisions can make substantial improvements in cost-effectiveness or value. Without quality improvement that takes into account the cost-benefit ratio of different treatments, the current tendency to shift treatment toward


general medical practitioners may continue because it reduces costs. Cost-benefit studies show that such decisions worsen patient functioning. Although the results of the Rand study are complex (depending on type of depression, follow-up time period, etc.), it is obvious that they provide a rich source of data. Using the data-analytic procedures employed by Wells and colleagues, it is possible to calculate the amount of money it costs to reduce a single symptom by a particular amount (e.g., what it costs to reduce the number of headaches a person has by one per week through the use of medication versus biofeedback). It also would be possible to estimate the cost of bringing a depressed patient into a normal state of functioning (and keeping them there). Moreover, it is possible to compare the costs associated with specific treatment strategies, that is, the cost-effectiveness of group versus individual cognitive behavior therapy.

The Rand study of depression was a large-scale, extensive effort costing approximately $4 million to complete (K.B. Wells et al., 1996). Numerous other studies have been conducted on a variety of other disorders, such as chronic pain (Texidor & Taylor, 1991) and psychosomatic disorders (Pautler, 1991), but none have reached the scope of the Rand study. The limited number of studies and their diversity make it difficult to identify the best methods of estimating costs. As in the area of clinical outcome measurement, there are few agreed-on methods of estimating treatment costs. The Rand study defined health as the number of serious functioning limitations. Costs were based on the "direct" costs of providing services to treat depression, and the value of care was estimated as the cost of reducing one or more functioning limitations.
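The kind of "value" calculation described above can be sketched as cost per unit of clinical improvement. Every figure below is invented for illustration and is not drawn from the Rand data.

```python
def cost_effectiveness(total_cost, units_improved):
    """Dollars spent per unit of clinical improvement (lower = better value)."""
    return total_cost / units_improved

# Hypothetical (cost per treated episode, mean functioning limitations removed)
providers = {
    "general medical":     (900.0, 1.0),
    "psychiatry":          (2400.0, 3.0),
    "other mental health": (1200.0, 2.5),
}

for name, (cost, gain) in providers.items():
    print(f"{name}: ${cost_effectiveness(cost, gain):.0f} per limitation removed")
```

Note that a sector can have both the highest absolute costs and the best value, which is why ratios rather than raw costs inform such decisions.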
Other researchers estimate cost as the average cost of providing treatment for an episode of illness per patient, computed by adding up staff expenses (including benefits) and then dividing by the number of patients treated in a year (Melson, 1995). Researchers have also attempted to estimate social costs, such as those that arise from lost productivity, use of social services and the criminal justice system, and use of other health services.

Cost-benefit analysis combined with estimates of outcome based on clinical significance could be usefully applied in the managed mental health setting to understand the consequences of rationing treatment. What is the "value" of fewer sessions versus more sessions for the long-term adjustment of patients? The complexity of these issues is beyond the scope of this chapter. Suffice it to say that estimates of the cost, cost-effectiveness, and medical cost offsets of psychotherapy are important topics in the assessment of psychotherapy outcome. McKenzie (1995), for example, argued that the relative equivalence of outcome in group and individual psychotherapy can be a powerful argument for the use of group therapy when the cost of delivering treatment and the cost-benefit of group therapy are considered. At the very least, this finding emphasizes the importance of research aimed at selecting patients who are most suitable for group treatment.

The Importance of Tracking Outcome for Quality Care: The Case for Provider Profiling

Considerable psychotherapy outcome research shows that a major contributor to patient improvement and deterioration is the individual therapist (Lambert & Bergin, 1994). Despite current emphasis on "empirically validated" treatments (Task Force, 1996), manual-based therapy (Lambert, 1998; Wilson, in press), and treatment guidelines that assume the curative power of therapy rests on treatment techniques, ample evidence


suggests the importance of particular therapists for positive outcomes (Garfield, 1996; Lambert & Okiishi, 1997). One clear implication of this finding is that it is important to use psychological tests to track patient outcome (by provider) for the purpose of increasing the quality of services (Clement, 1996; Lambert & Brown, 1996). This type of research can be expected to directly modify the practices of clinicians, while research on specific disorders (clinical trials) will be much slower in having a real impact on clinical practice. It is important to use outcome measures that can provide clinicians with information about the effectiveness of their practice.

Figure 27.6 in chapter 27 (this volume) presents data on patients and their clinicians who work in different outpatient clinics. The data presented suggest clear differences in average pretherapy levels of disturbance and also in the average amount of change associated with particular therapists. These data alert the clinicians to the fact that they have different outcomes and suggest the need for particular clinicians to explore the reasons for poorer outcomes in their patients (relative to other clinicians). Figure 27.4 from chapter 27 (this volume) suggests the unusual rapidity of improvement in patients treated by therapist L.J. This provider profile, based on repeated measurement of patient progress, shows an unusual pattern of improvement and calls for exploration of the methods used by L.J. Without the use of a reliable tracking device, along with criteria of successful outcome, it would be far more difficult to compare patient/clinician success rates. It is not difficult to see the advantages of this methodology for clinicians, health systems, and, most importantly, patients. Tracking patient outcome through the use of meaningful outcome measures can result in improved clinical decision making and quality of patient care.
Conclusions

The assessment of psychotherapy outcome is an important endeavor, impacting both the science and practice of mental health services. Outcome assessment is based on a rich tradition of research, and has shown steady improvement as a scientific endeavor over the last five decades. The phenomenon of measuring change and improvement is a fascinating, although presently chaotic, topic of scientific inquiry. It is hoped that it will continue to be a fruitful ground for collaborative efforts and important discoveries. This exciting area of inquiry is wide open to the energetic and gifted student. Important discoveries await the determined and patient researcher.

References

Ankuta, G.Y., & Abeles, N. (1993). Client satisfaction, clinical significance, and meaningful change in psychotherapy. Professional Psychology: Research and Practice, 24, 70-74.
Bailey, D.B., & Simeonsson, R.J. (1988). Investigation of use of goal attainment scaling to evaluate individual progress of clients with severe and profound mental retardation. Mental Retardation, 26, 289-295.
Beck, A.T. (1972). Depression: Causes and treatment. Philadelphia: University of Pennsylvania Press.
Beck, A.T., & Beamesderfer, A. (1974). Assessment of depression: The depression inventory. In P. Pichot (Ed.), Psychological measurements in psychopharmacology, modern problems in pharmacopsychiatry (Vol. 7, pp. 151-169). Basel, Switzerland: Karger.


Beck, A.T., Ward, C.H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.
Berger, M. (1983). Toward maximizing the utility of consumer satisfaction as an outcome measure. In M.J. Lambert, E.R. Christensen, & S.S. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 56-80). New York: Wiley.
Berzins, J.I., Bednar, R.L., & Severy, L.J. (1975). The problem of intersource consensus in measuring therapeutic outcomes: New data and multivariate perspectives. Journal of Abnormal Psychology, 84, 10-19.
Beutler, L.E., & Hamblin, D.L. (1986). Individualized outcome measures of internal change: Methodological considerations. Journal of Consulting and Clinical Psychology, 54, 48-53.
Burlingame, G.M., & Barlow, S.H. (1996). Outcome and process differences between professional and nonprofessional therapists in time-limited group psychotherapy. International Journal of Group Psychotherapy, 46, 455-478.
Calsyn, R.J., & Davidson, W.S. (1978). Do we really want a program evaluation strategy based on individualized goals? A critique of goal attainment scaling. Evaluation Studies: Review Annual, 1, 700-713.
Cartwright, D.S., Kirtner, W.L., & Fiske, D.W. (1963). Method factors in changes associated with psychotherapy. Journal of Abnormal and Social Psychology, 66, 164-175.
Clark, M.S., & Caudrey, D.J. (1986). Evaluation of rehabilitation services: The use of goal attainment scaling. International Rehabilitative Medicine, 5, 41-45.
Clement, P.W. (1996). Evaluation in private practice. Clinical Psychology: Science and Practice, 3, 146-159.
Cohen, L.H. (1980). Methodological prerequisites for psychotherapy outcome research. Knowledge: Creation, Diffusion, and Utilization, 2, 263-272.
Condon, K.M., & Lambert, M.J. (1994, June). Assessing clinical significance: Application to the State-Trait Anxiety Inventory. Paper presented at the annual meeting of the Society for Psychotherapy Research, York, England.
Cytrynbaum, S., Ginath, Y., Birdwell, T., & Brandt, L. (1979). Goal attainment scaling: A critical review. Evaluation Quarterly, 3, 5-40.
Derogatis, L.R., & Melisaratos, N. (1983). The Brief Symptom Inventory: An introductory report. Psychological Medicine, 13, 595-605.
Docherty, J.P., & Streeter, M.J. (1996). Measuring outcomes. In L.I. Sederer & B. Dickey (Eds.), Outcome assessment in clinical practice (pp. 8-18). Baltimore: Williams & Wilkins.
Farrell, A.D., Curran, J.P., Zwick, W.R., & Monti, P.M. (1983). Generalizability and discriminant validity of anxiety and social skills ratings in two populations. Behavioral Assessment, 6, 1-14.
Fleuridas, C., Rosenthal, D.M., Leigh, G.K., & Leigh, T.E. (1990). Family goal recording: An adaptation of goal attainment scaling for enhancing family therapy and assessment. Journal of Marital and Family Therapy, 16, 389-406.
Flowers, J.V., & Booarem, C.D. (1990). Four studies toward an empirical foundation for group therapy. Journal of Social Service Research, 13, 105-121.
Follette, W.C., & Callaghan, G.M. (1996). The importance of the principle of clinical significance, defining significant to whom and for what purpose: A response to Tingey, Lambert, Burlingame, and Hansen. Psychotherapy Research, 6, 133-143.
Forsyth, R.P., & Fairweather, G.W. (1961). Psychotherapeutic and other hospital treatment criteria: The dilemma. Journal of Abnormal and Social Psychology, 62, 598-604.
Froyd, J.E., Lambert, M.J., & Froyd, J.D. (1996). A review of practices of psychotherapy outcome measurement. Journal of Mental Health, 5, 11-15.
Garfield, S.L. (1991). Psychotherapy models and outcome research. American Psychologist, 46, 1350-1351.
Garfield, S.L. (1996). Some problems associated with validated forms of psychotherapy. Clinical Psychology: Science and Practice, 3, 245-250.
Garfield, S.L., Prager, R.A., & Bergin, A.E. (1971). Evaluation of outcome in psychotherapy. Journal of Consulting and Clinical Psychology, 37, 307-313.
Gerarty, R.D.
(1996). The use of outcome assessment in managed care: Past, present, and future. In L.I. Sederer & B. Dickey


(Eds.), Outcome assessment in clinical practice (pp. 129-138). Baltimore: Williams & Wilkins.
Gibson, R.L., Snyder, W.U., & Ray, W.S. (1955). A factor analysis of measures of change following client-centered psychotherapy. Journal of Counseling Psychology, 2, 83-90.
Glaister, B. (1982). Muscle relaxation training for fear reduction of patients with psychological problems: A review of controlled studies. Behavior Research and Therapy, 20, 493-504.
Goldfried, M.R., Greenberg, L.S., & Marmar, C. (1990). Individual psychotherapy: Process and outcome. Annual Review of Psychology, 41, 659-688.
Green, B.C., Gleser, G.C., Stone, W.N., & Siefert, R.F. (1975). Relationships among diverse measures of psychotherapy outcome. Journal of Consulting and Clinical Psychology, 43, 689-699.
Grundy, C.T., Lambert, M.J., & Grundy, E. (1996). Assessing clinical significance: Application to the Hamilton Rating Scale for Depression. Journal of Mental Health, 5, 25-33.
Grundy, C.T., Lunnen, K.M., Lambert, M.J., Ashton, J.E., & Tovey, D.R. (1994). The Hamilton Rating Scale for Depression: One scale or many? Clinical Psychology: Science and Practice, 1, 197-205.
Grundy, E.M., & Lambert, M.J. (1994, June). Assessing clinical significance: Application to the Child Behavior Checklist. Paper presented at the annual meeting of the Society for Psychotherapy Research, York, England.
Hall, M.C., Elliot, K.M., & Stiles, G.W. (1992). Hospital patient satisfaction: Correlates, dimensionality, and determinants. Journal of Hospital Marketing, 1, 77-90.
Hamilton, M. (1967). Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6, 278-296.
Herbert, L.D., & Mueser, K.T. (1991). The proof is in the pudding: A commentary on persons. American Psychologist, 46, 1347-1348.
Hollon, S.D., & Flick, S.N. (1988). On the meaning and methods of clinical significance. Behavioral Assessment, 10, 197-206.
Horowitz, L.M., Strupp, H.H., Lambert, M.J., & Elkin, I.
(1997). Overview and summary of the core-battery conference. In H.H. Strupp, L.M. Horowitz, & M.J. Lambert (Eds.), Measuring patient changes in mood, anxiety, and personality disorders: Toward a core battery (pp. 11-54). Washington, DC: American Psychological Association. Howard, K.I., Lueger, R.J., Maling, M.S., & Martinovich, Z. (1993). A phase model of psychotherapy outcome: Causal mediation of change. Journal of Consulting and Clinical Psychology, 61, 678-685. Hsieh, M.O., & Kagle, J.D. (1991). Understanding patient satisfaction and dissatisfaction with health care. Health & Social Work, 16, 281-290. Jacobson, N.S. (1988). Defining clinically significant change: An introduction. Behavioral Assessment, 10, 131-132. Jacobson, N.S., Follette, W.C., & Revenstorf, D. (1984). Psychotherapy outcome research: Methods for reporting variability and evaluating clinical significance. Behavior Therapy, 15, 336-352. Jacobson, N.S., Follette, W.C., & Revenstorf, D. (1986). Toward a standard definition of clinically significant change. Behavior Therapy, 17, 308-311. Jacobson, N.S., Follette, W.C., Revenstorf, D., Baucom, D.H., Hahlweg, K., & Margolin, G. (1984). Variability in outcome and clinical significance of behavioral marital therapy: A reanalysis of outcome data. Journal of Consulting and Clinical Psychology, 52, 497-504. Jacobson, N.S., & Revenstorf, D. (1988). Statistics for assessing the clinical significance of psychotherapy techniques: Issues, problems, and new developments. Behavioral Assessment, 10, 133-145. Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19. Kazdin, A.E. (1977). Assessing the clinical or applied importance of behavior change through social validation. Behavior Modification, 1, 427-452. Kendall, P.C., Hollon, S., Beck, A.T., Hammen, C., & Ingram, R.E. (1987). 
Issues and recommendations regarding use of the Beck Depression Inventory. Cognitive Therapy and Research, 11, 289-300. Kendall, P.C., & Grove, W.M. (1988). Normative comparisons in therapy outcome. Behavioral Assessment, 10, 147-158. Kiresuk, T.J., & Sherman, R.E. (1968). Goal attainment scaling: A general method for evaluating comprehensive community mental health programs. Community Mental Health Journal, 4, 443-453. Lacks, P., & Powlishta, K. (1989). Improvement following behavioral treatment for insomnia: Clinical significance, long-term maintenance, and predictors of outcome. Behavior Therapy, 20, 117-134. Lambert, M.J. (1983). Introduction to assessment of psychotherapy outcome: Historical perspective and current issues. In M.J. Lambert, E.R. Christensen, & S.S. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 3-32). New York: Wiley Interscience. Lambert, M.J. (1998). Manual-based treatment and clinical practice: Hangman of life or promising development? Clinical Psychology: Science and Practice, 5, 391-395. Lambert, M.J., & Bergin, A.E. (1994). The effectiveness of psychotherapy. In A.E. Bergin & S.L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 143-189). New York: Wiley. Lambert, M.J., & Brown, G.S. (1996). Data-based management for tracking outcome in private practice. Clinical Psychology: Science and Practice, 3, 172-178. Lambert, M.J., Christensen, E.R., & DeJulio, S.S. (Eds.). (1983). The assessment of psychotherapy outcome. New York: Wiley. Lambert, M.J., Hatch, D.R., Kingston, M.D., & Edwards, B.C. (1986). Zung, Beck, and Hamilton rating scales as measures of treatment outcome: A meta-analytic comparison. Journal of Consulting and Clinical Psychology, 54, 54-59. Lambert, M.J., & Hill, C.E. (1994). Assessing psychotherapy outcomes and processes. In A.E. Bergin & S.L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 72-113). New York: Wiley. Lambert, M.J., Ogles, B.M., & Masters, K.S. (1992). 
Choosing outcome assessment devices: An organizational and conceptual scheme. Journal of Counseling and Development, 70, 527-532. Lambert, M.J., & McRoberts, C.H. (1993, April). Outcome measurement in JCCP: 1986-1991. Paper presented at the meetings of the Western Psychological Association, Phoenix, AZ. Lambert, M.J., & Okiishi, J.C. (1997). The effects of the individual psychotherapist and implications for future research. Clinical Psychology: Science and Practice, 4, 66-75. Lambert, M.J., Shapiro, D.A., & Bergin, A.E. (1986). The effectiveness of psychotherapy. In S.L. Garfield & A.E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 157-212). New York: Wiley. Lewis, A.B., Spencer, J.H., Haas, G.L., & DiVittis, A. (1987). Goal attainment scaling: Relevance and replicability in follow-up of inpatients. Journal of Nervous and Mental Disease, 175, 408-418. Locke, H.J., & Wallace, K.M. (1959). Short-term marital adjustment and prediction tests: Their reliability and validity. Marriage and Family Living, 21, 251-255. Luborsky, L. (1971). Perennial mystery of poor agreement among criteria for psychotherapy outcome. Journal of Consulting and Clinical Psychology, 37, 316-319. Maher, C.A., & Barbrack, C.R. (1984). Evaluating the individual counseling of conduct problem adolescents: The goal attainment scaling method. Journal of School Psychology, 22, 285-297. Martinovich, Z., Saunders, S., & Howard, K.I. (1996). Some comments on "assessing clinical significance." Psychotherapy Research, 6, 124-132. Mavissakalian, M. (1986). Clinically significant improvement in agoraphobia research. Behavior Research and Therapy, 24, 369-370. McKenzie, K.R. (1995). Effective use of group therapy in managed care. Washington, DC: American Psychiatric Press. McLellan, A.T., & Durell, J. (1996). Outcome evaluation in psychiatric and substance abuse treatments: Concepts, rationale, and methods. In L.J. Sederer & B. 
Dickey (Eds.), Outcome assessment in clinical practice (pp. 34-44). Baltimore: Williams & Wilkins. Melson, S.J. (1995). Brief day treatment for nonpsychotic patients. In K.R. McKenzie (Ed.), Effective use of group therapy in managed care (pp. 113-128). Washington, DC: American Psychiatric Press.


Messer, S.B. (1991). The case formulation approach: Issues of reliability and validity. American Psychologist, 46, 1348-1350. Miller, R.C., & Berman, J.S. (1983). The efficacy of cognitive behavior therapies: A quantitative review of the research evidence. Psychological Bulletin, 94, 39-53. Mintz, J., & Kiesler, D.J. (1982). Individualized measures of psychotherapy outcome. In P.C. Kendall & J.N. Butcher (Eds.), Handbook of research methods in clinical psychology (pp. 491-534). New York: Wiley. Mintz, J., Luborsky, L., & Christoph, P. (1979). Measuring the outcomes of psychotherapy: Findings of the Penn Psychotherapy Project. Journal of Consulting and Clinical Psychology, 47, 319-334. Mintz, J., Mintz, L.I., Arvuda, M.J., & Huang, S.S. (1993). Treatments of depression and the functional capacity to work. Archives of General Psychiatry, 49, 761-768. Monti, P.M., Wallander, J.L., Ahern, D.K., Abrams, D.B., & Munroe, S.M. (1983). Multi-modal measurement of anxiety and social skills in a behavioral role-play test. Generalizability and discriminant validity. Behavioral Assessment, 6, 15-25. Mylar, J.L., & Clement, P.W. (1972). Prediction and comparison of outcome in systematic desensitization and implosion. Behavior Research and Therapy, 10, 235-246. Nietzel, M.T., Russell, R.L., Hemmings, K.A., & Gretter, M.L. (1987). Clinical significance of psychotherapy for unipolar depression: A meta-analytic approach to social comparison. Journal of Consulting and Clinical Psychology, 55, 156-161. Ogles, B.M., Lambert, M.J., & Masters, K.S. (1996). Assessing outcome in clinical practice. Boston: Allyn & Bacon. Ogles, B.M., Lambert, M.J., & Sawyer, M.D. (1995). The clinical significance of the NIMH treatment of depression collaborative research program data. Journal of Consulting and Clinical Psychology, 63, 317-325. Ogles, B.M., Lambert, M.J., Weight, D.G., & Payne, I.R. (1990). Agoraphobia outcome measurement: A review and meta-analysis. 
Psychological Assessment: A Journal of Consulting and Clinical Psychology, 2, 317-325. Pautler, T. (1991). A cost effective mind-body approach to psychosomatic disorders. In K.N. Anchor (Ed.), The handbook of medical psychotherapy: Cost effective strategies in mental health (pp. 231-248). Toronto: Hogrefe & Huber. Perry, G., Shapiro, D.A., & Firth, J. (1986). The case of the anxious executive: A study from the research clinic. British Journal of Medical Psychology, 59, 221-233. Persons, J.B. (1991). Psychotherapy outcome studies do not accurately represent current models of psychotherapy: A proposed remedy. American Psychologist, 46, 99-106. Pilkonis, P.A., Imber, S.D., Lewis, P., & Rubinsky, P. (1984). A comparative outcome study of individual, group, and conjoint psychotherapy. Archives of General Psychiatry, 41, 431-437. Polkinghorne, D.E. (1991). Two conflicting calls for methodological reform. The Counseling Psychologist, 19, 103-114. Ross, S.M., & Proctor, S. (1973). Frequency and duration of hierarchy item exposure in a systematic desensitization analogue. Behavior Research and Therapy, 11, 303-312. Russell, C.S., Olson, D.H., Sprenkle, D.H., & Atilano, R.B. (1983). From family symptom to family system: Review of family therapy research. American Journal of Family Therapy, 11, 3-14. Sabalis, R.F. (1983). Assessing outcome in patients with sexual dysfunctions and sexual deviations. In M.J. Lambert, E.R. Christensen, & S.S. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 205-262). New York: Wiley. Saunders, S.M., Howard, K.I., & Newman, F.L. (1988). Evaluating the clinical significance of treatment effects: Norm and normality. Behavioral Assessment, 10, 207-218. Schacht, T.E. (1991). Formulation-based psychotherapy research: Some further considerations. American Psychologist, 46, 1346-1347. Schmaling, K.B., & Jacobson, N.S. (1987, November). 
The clinical significance of treatment gains resulting from parent-training interventions for children with conduct problems: A reanalysis of outcome data. Paper presented at the annual meeting of the Association for the Advancement of Behavior Therapy, Boston. Schulte, D. (1995). How treatment success could be assessed. Psychotherapy Research, 5, 281-296.


Scott, M.J., & Stradling, S.G. (1990). Group cognitive therapy for depression produces clinically significant reliable change in community-based settings. Behavioral Psychotherapy, 18, 1-19. Seggar, L., & Lambert, M.J. (1994, June). Assessing clinical significance: Application to the Beck Depression Inventory. Paper presented at the annual meeting of the Society for Psychotherapy Research, York, England. Seligman, M.E.P. (1995). The effectiveness of psychotherapy: The Consumer Reports study. American Psychologist, 50, 965-974. Shapiro, D.A., & Shapiro, D. (1982). Meta-analysis of comparative therapy outcome studies: A replication and refinement. Psychological Bulletin, 92, 581-604. Shore, M.F., Massimo, J.L., & Ricks, D.F. (1965). A factor analytic study of psychotherapeutic change in delinquent boys. Journal of Clinical Psychology, 21, 208-212. Silverman, W.K. (1991). Persons's description of psychotherapy outcome studies does not accurately represent psychotherapy outcome studies. American Psychologist, 46, 1351-1352. Smith, M.L., Glass, G.V., & Miller, T. (1980). The benefits of psychotherapy. Baltimore: Johns Hopkins University Press. Spielberger, C.D., Gorsuch, R.L., Lushene, R.E., Vagg, P.R., & Jacobs, G.A. (1983). Manual for the State-Trait Anxiety Inventory (Form Y). Palo Alto, CA: Consulting Psychologists Press. Strupp, H.H., & Hadley, S.W. (1977). A tripartite model of mental health and therapeutic outcomes: With special reference to negative effects in psychotherapy. American Psychologist, 32, 187-196. Strupp, H.H., Schacht, T.E., & Henry, W.P. (1988). Problem-treatment-outcome congruence: A principle whose time has come. In H. Dahl, H. Kaechele, & H. Thomas (Eds.), Psychoanalytic process research strategies (pp. 1-14). Berlin: Springer. Strupp, H.H., Horowitz, L.M., & Lambert, M.J. (Eds.). (1997). Measuring patient changes in mood, anxiety, and personality disorders: Toward a core battery. Washington, DC: American Psychological Association. 
Task Force on Promotion and Dissemination of Psychological Procedures. (1996). An update on empirically validated therapies. The Clinical Psychologist, 49, 5-22. Texidor, M., & Taylor, C. (1991). Chronic pain management: The interdisciplinary approach and cost effectiveness. In K.N. Anchor (Ed.), The handbook of medical psychotherapy: Cost effective strategies in mental health (pp. 89-100). Toronto: Hogrefe & Huber. Tingey, R.C. (1989). Assessing clinical significance: Extension in methods and application to the SCL-90-R. Dissertation Abstracts International, 50, 04B. Tingey, R.C., Lambert, M.J., Burlingame, G.M., & Hansen, N.B. (1996a). Assessing clinical significance: Proposed extensions to method. Psychotherapy Research, 6, 109-123. Tingey, R.C., Lambert, M.J., Burlingame, G.M., & Hansen, N.B. (1996b). Clinically significant change: Practical indicators for evaluating psychotherapy outcome. Psychotherapy Research, 6, 144-153. Trull, T.J., Nietzel, M.T., & Main, A. (1988). The use of meta-analysis to assess the clinical significance of behavior therapy for agoraphobia. Behavior Therapy, 19, 527-538. Wampold, B.E., & Jenson, W.R. (1986). Clinical significance revisited. Behavior Therapy, 17, 302-305. Waskow, I.E., & Parloff, M.B. (1975). Psychotherapy change measures. Rockville, MD: National Institute of Mental Health. Waxman, H.M. (1996). Using outcomes assessment for quality improvement. In L.J. Sederer & B. Dickey (Eds.), Outcome assessment in clinical practice (pp. 25-33). Baltimore: Williams & Wilkins. Wells, E.A., Hawkins, J.D., & Catalano, R.F. (1988). Choosing drug use measures for treatment outcome studies: 1. The influence of measurement approach on treatment results. International Journal of Addictions, 23, 851-873. Wells, K.B., Sturm, R., Sherbourne, C.D., & Meredith, L.A. (1996). Caring for depression: A Rand study. Cambridge, MA: Harvard University Press. Williams, S.L. (1985). On the nature and measurement of agoraphobia. 
Progress in Behavior Modification, 19, 109-144. Wilson, G.T. (in press). Manual-based treatment and clinical practice. Clinical Psychology: Science and Practice, 5. Wilson, G.T., & Thomas, M.G. (1973). Self versus drug-produced relaxation and the effects of instructional set in standardized systematic desensitization. Behavior Research and Therapy, 11, 279-288. Wolf, M.M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 203-214. Worchel, P., & Byrne, D. (Eds.). (1964). Personality change. New York: Wiley. Woodward, C.A., Santa-Barbara, J., Levin, S., & Epstein, N.B. (1978). The roles of goal attainment scaling in evaluating family therapy outcome. American Journal of Orthopsychiatry, 48, 464.


Chapter 5
Guidelines for Selecting Psychological Instruments for Treatment Planning and Outcome Assessment

Frederick L. Newman
Florida International University

James A. Ciarlo
University of Denver

Daniel Carpenter
Merit Behavioral Care Corporation and Cornell University College of Medicine

Envision the situation if oncological medicine were forced to base treatment decisions on just diagnosis and cost containment rather than on clinical status and outcome: "Mr. Smith, 90% of the tumor is now benign. Only 10% of the tumor remains malignant. Unfortunately, you have used up the 20 sessions of radiation treatment allowed under your managed care plan for this year. Please come back next year when your insurance eligibility has been renewed" (Newman & Carpenter, 1997, p. 1040). Few would argue against employing health status, biological, or behavioral criteria when setting eligibility and level-of-care requirements for oncological medicine (or for the delivery of a baby, or for most nonelective surgery). The same logical arguments can and should be offered to support the delivery of mental health services. There are psychological assessment techniques that can be used to provide valid evidence that such criteria have been met. But how does one decide which instrument is most suitable to the circumstances? This chapter provides guidelines that can be used to select one or more instruments that are most suitable to the population being served and the treatment goals of the service. The guidelines can also be used to evaluate the appropriateness of instruments that are currently in use or proposed. One theme of the chapter is that the guidelines must be understood within the current demands on clinical practice and the delivery of mental health services. One contextual constraint is the effort to contain costs through managed care. 
The good news is the availability of psychological assessment instruments that may be applied in managed care settings to determine eligibility and level of care (see reviews by Howard, Moras, Brill, Martinovich, & Lutz, 1996; Newman & Tejeda, 1996). Additional good news is that many employers and behavioral health insurers now appear to understand that simple cost containment for one episode of care could lead to greater long-term expense. This is particularly true for the growing number of behavioral managed care programs that serve persons with severe and persistent mental illnesses. These insurers
are seeking valid procedures to address three basic questions: Should the illness be treated? What interventions are needed, by whom, and where should they be delivered? What are the outcome criteria? Or, more generally, who is eligible for what level of care? Level of care can be described as the amount and type of clinical or support resources that ought to be allocated to achieve a satisfactory outcome. Many researchers are actively involved in addressing these questions. The Internet bulletin boards serving mental health services researchers (e.g., OUTCMTEN) have active interchanges about which instruments are appropriate under what circumstances. When arguing for a particular level of care, however, practitioners find little evidence in the research literature to guide their decisions (Newman & Tejeda, 1996). Traditional clinical research designs fix the treatment dosage level and run a horse race between experimental and control conditions, or among several alternative treatments, to determine which achieves the best outcomes with that dosage. Yet, in practice, the clinician works with the patient to achieve an agreed-on level of functioning, a reduction in symptom distress, or both. There is a need to modify research strategies on mental health services effectiveness so that it is possible to address questions such as: What type and amount of treatment will achieve a given behavioral criterion for XX% of the patients who meet the entry level of functioning? To be effective (and cost-effective) in the selection of an instrument or instruments, an additional series of questions must be systematically addressed:

1. What psychological and community functioning domains do clinicians wish to assess for this patient or this group of patients?
2. What are the behaviors that clinicians expect to impact?
3. What clinical or program decisions will be supported by an assessment of the person's psychological state or functional status?
4.
What is the most cost-effective means for performing these assessments?

Eleven guidelines are offered for instrument selection. The guidelines were originally developed by a panel of experts assembled by the National Institute of Mental Health (Ciarlo, Brown, Edwards, Kiresuk, & Newman, 1986)1 and were more recently updated to consider the potential impact of managed care (Newman & Carpenter, 1997). This chapter provides additional updates to the guidelines in terms of two demands on the clinical community: managed care and consumer choice. These are not, and should not be, independent. The assessment techniques used by managed care to determine eligibility, level of care, progress, and outcome should also be used as part of a delivery system report card to inform consumers and other purchasers of mental health services (Dewan & Carpenter, 1997; Mulkern, Leff, Green, & Newman, 1995; Newman, DeLiberty, Hodges, McGrew, & Tejeda, 1997). Consumer groups are requesting that the report card go beyond satisfaction with the manner in which services are provided (although that is also important) to incorporate the quality and long-term effects of the services themselves. Thus, the proper selection of psychological assessment techniques is critical to managed care and to consumer choice. The guidelines are summarized in Table 5.1 and are organized under five groupings: Applications of Measures, Methods and Procedures, Psychometric Features, Cost Considerations, and Utility Considerations. It should be obvious that the guidelines are not independent of each other. Yet, each focuses on unique concerns that will help readers

1 Members of the expert panel were A. Broskowski, J.A. Ciarlo, G.B. Cox, H.H. Goldman, W.A. Hargreaves, I. Elkins, J. Mintz, F.L. Newman, and J.W. Zinober.
consider the demands of their own situation, the literature, and the relation of that guideline to the other guidelines.

TABLE 5.1
Guidelines for the Development, Selection, and/or Use of Progress-Outcome Measures

Applications
1. Relevance to target group and independent of treatment provided, although sensitive to treatment-related changes.
Methods and procedures
2. Simple, teachable methods.
3. Use of measures with objective referents.
4. Use of multiple respondents.
5. More process-identifying outcome measures.
Psychometric features
6. Psychometric strength: Reliable, valid, sensitive to treatment-related change, and nonreactive.
Cost considerations
7. Low costs.
Utility considerations
8. Understanding by nonprofessional audiences.
9. Easy feedback and uncomplicated interpretation.
10. Useful in clinical services.
11. Compatibility with clinical theories and practices.

Guideline 1
Relevance to Target Group

What are the characteristics of a target group that require its own assessment approach?

An outcome measure or set of measures should be relevant and appropriate to the target group(s) whose treatment is being studied; that is, the most important and frequently observed symptoms, problems, goals, or other domains of change for the group(s) should be addressed by the measure(s). . . . Other factors being equal, use of a measure appropriate to a wider range of client groups is preferred. . . . Measures (should be) . . . independent of the type of treatment service provided are to be preferred. (Ciarlo et al., 1986, p. 26)

Common wisdom holds that treatment selection, and a person's probable response to treatment, should be based on both clinical and demographic characteristics (Beutler & Clarkin, 1990). A target group can be described as a cluster of persons with similar clinical-demographic characteristics that are expected to have a similar response to treatment. Beutler and Clarkin (1990) provided guidelines that follow from the clinical literature. 
Combined use of needs assessment information from epidemiological surveys with expert panels has been employed to identify target groups requiring similar systems of services for persons with a severe mental illness (Newman, Griffin, Black, & Page, 1989; Uehara, Smukler, & Newman, 1994). Another approach is to link the epidemiological data with historic levels of care, or to use a combination of both (Uehara et al., 1994). A second feature of a target group that must be considered is personal characteristics that are known to influence how the information is collected. Differences in age, ethnicity (related to language and meaning), comorbidity with a physical illness or developmental disability, and past experiences can all influence the administration of a procedure. The
instruments discussed in this text provide an excellent platform for selecting such measures. However, if the particular target group served by a given program or practice has qualities that differ markedly from those of the overall target group, then a more detailed review of the literature cited within the reference lists at the end of a chapter will be required. Texts such as this one, and Ciarlo et al. (1986), contain sections and reference lists that provide greater detail on the limits of the techniques and their potential use with other populations.

Guideline 2
Simple, Teachable Methods

Ciarlo et al. (1986) pointed out that this second guideline was readily agreed on by all of the panelists working on these guidelines, but the development of training manuals and methods for assuring the quality of instrument administration at that time was seen as weak. Since then, the development of computer-assisted administration of assessment techniques has enhanced the reliability and validity of implementation by standardizing the way in which queries are presented. The long-standing difficulty of bridging the ethnic and cultural differences between the clinician and the patient can also be helped with the culturally sensitive selection of an assessment technique from a computerized menu of instruments. Even with the development of computer-assisted methods, the traditional guidelines for developing training materials and for controlling administrative quality must still be applied (see texts such as Nunnally & Bernstein, 1994, or Cronbach, 1970). Self-report measures (e.g., the SCL-90-R, BASIS-32, Beck Depression Inventory, MMPI-2) or measures completed by a significant other (e.g., the parents when using the Child Behavior Checklist; Achenbach & Edelbrock, 1983) that have survived scrutiny and are considered to have adequate psychometric quality usually have good instructions and administration manuals. 
But if the recommended guidelines for administration are ignored, then there are potentially disastrous effects on measure reliability and validity. For example, the instructions for most self-report instruments strongly recommend completion independent of guidance or advice from others, preferably in isolation. This requirement has not always been adhered to adequately. It is possible that the use of computers to collect self-report information will also increase the fidelity of the data collection. This is one area where computer-assisted applications are particularly useful. Many people are accustomed to interacting with a machine that asks them questions, often of a quite personal nature (Locke et al., 1992; Navaline et al., 1994). Measures completed by an independent clinical observer or by the treating clinician can be very useful, but often the instructions on the instrument's use, training, and quality control procedures are poorly developed. On the one hand, such measures seek to make use of the professional's trained observations. On the other hand, such scales tend to be more reactive to clinician judgment bias (Newman, 1983; Patterson & Sechrest, 1983). Procedures for surfacing judgment biases in a staff training format are discussed in Newman (1983) and detailed in Newman and Sorensen (1985). When an assessment instrument is to be used as the basis for determining the level of need and reimbursement in a managed care environment, controls over training and use of a clinician rating assessment instrument are necessary to prevent improper use of the instrument. A good example of where such controls are currently employed is
with the Indiana Division of Mental Health's implementation of a managed care program, the Hoosier Assurance Plan (HAP). The HAP provides state funds to cover services to adults with a serious mental illness or a chronic addiction, and to children and adolescents with a severe emotional disorder. Two key features of the plan are that the consumer is to have informed choice2 of service provider, and that the level of reimbursement is determined by the level of need demonstrated by the array of factor scores on an assessment instrument (one instrument for children and adolescents, and another for adults). The assessment instruments employed by HAP underwent 2 years of extensive pilot testing to assure that the psychometric qualities met the standards set by the advisory committee (Newman et al., 1997). To assure the integrity of the clinician's rating of the consumer, two controls have been implemented. First, all clinicians must be trained to an established criterion in the use of the instrument, with evidence of such achievement available for an audit. To support the local service program training efforts, each program is expected to have one or more clinical staff trained as trainers, where the training-of-trainers program, funded by the state, includes training packets, training vignettes, and guidelines on how to conduct the staff training programs. Second, an independent audit team (staffed by registered nurses with specific training) conducts a review of a random sample of cases on site at each service provider. A report of the audits goes both to the service program and to the state office of mental health. At this time, the reports have led to changes in training and supervision practices at the local level, with no official actions by the state office of mental health; however, it is understood that repeated problems could lead to actions by that office. 
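Agreement between a clinician's original ratings and an auditor's re-ratings is commonly summarized with a chance-corrected index such as Cohen's kappa. A minimal sketch (the ratings below are hypothetical illustrations, not HAP data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(freq_a) | set(freq_b))
    return (observed - expected) / (1 - expected)

# Two raters scoring ten cases on a 0-3 impairment scale (hypothetical)
clinician = [3, 2, 3, 1, 0, 2, 3, 1, 2, 0]
auditor = [3, 2, 2, 1, 0, 2, 3, 1, 2, 1]
print(round(cohens_kappa(clinician, auditor), 2))  # prints 0.73
```

Values near 1.0 indicate near-perfect agreement beyond chance; an audit program of this kind might set a minimum acceptable kappa before a clinician's ratings are accepted.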
Thus, training in the use of the instrument and audits of both the local program's application of training procedures and the assessment instruments were seen as necessary controls. These controls were employed in the pilot work, and high levels of reliability and validity were observed (Newman et al., 1997).

2 Informed choice is supported by an annual Hoosier Assurance Plan Provider Profile Report Card, which contains information on the performance of each service provider as measured by data from two sources. The first contains the results of telephone interviews of a random sample of consumers asking about the impact of the services on their functioning and quality of life. The sample is stratified by service program, and the interviews are conducted by the University of Indiana Survey Research Unit. The second source is the reported baselines and the 90-day changes in factor scores from the clinical assessments performed on all consumers covered by the Hoosier Assurance Plan (Indiana's managed care program). A copy of the Provider Profile Report Card, the adult instrument, and the training manual can be obtained from The Evaluation Center @ HSRI, 2336 Mass Avenue, Cambridge, MA 02140.

Guideline 3
Use of Measures with Objective Referents

An objective referent is one for which concrete examples are given for each level of a measure, or at least at key points on the rating scale. A major asset of objective referents is the potential to develop reliable and usable norms for an instrument, which is particularly critical when applied to managed care eligibility and level of care decisions. One of the best examples of a scale with objective referents is the Child and Adolescent Functional Assessment Scale (CAFAS; Hodges, 1996; Hodges & Gust, 1995). Examples of behaviors are provided at each of four levels of impairment for each of 35 categories of behavior. For example, under the most severe level of impairment for Unsafe/Potentially


Unsafe Behavior, there are five examples of behaviors at this level, two of which are the following:

114: Dangerous behavior caused harm to a household member.
117: Sexually assaulted/abused another household member, or attempted to (e.g., a sibling).

Another approach involves the development of multiple items within a class of behaviors. The rater is provided one referent behavior in an item and then asked to identify either the behavior's frequency (e.g., x times in the last 24 hours, the last week, or the last 30 days), the similarity of the observed behavior to the referent behavior (e.g., most like to least like the referent behavior), or the intensity of the referent behavior (e.g., from not evident to mild to severe). Which approach is best suited to a particular program is an empirical issue (see Guidelines 6-11).

Clinicians often proclaim the attractiveness of instruments that are individualized to the client. The most attractive features of these instruments are that the measures can be linked more directly to the consumer's own behaviors and life situation, and that treatment selection, course, and outcome can be individualized to the consumer. In fact, a consistent finding in the literature is that when the client and the clinician have agreed on the problems and goals, there is a significant positive effect on outcome (Mintz & Kiesler, 1981). Measures of this sort include target complaints (severity of), goal-attainment scaling, problem-oriented records, and global improvement ratings. The major problem with such measures involves the issue of generalizability. Specifically, is the change in the severity of one person's complaint-problem comparable to a like degree of change in another person's complaint-problem?
Although the issue of generalizability plagues all measures, without objective referents the distribution of outcomes becomes free-floating across settings or clinical groups (Cytrynbaum, Ginath, Birdwell, & Brandt, 1979), thereby limiting the utility of the measures. There are arguments on the other side of this issue, but mostly when data aggregation is involved. Several meta-analytic studies, where effect size is standardized, have been very informative without specifically identifying the behaviors that were modified (e.g., Lipsey & Wilson, 1993). Howard, Kopta, Krause, and Orlinsky (1986) studied the relation of "dosage" (number of visits) to outcome across studies where the measure of outcome was simply whether improvement was observed. But data aggregation methods may be useful only for addressing research and policy questions; they do not satisfy the need of a clinician or a clinical service to communicate with a patient or an insurer about an individual patient's eligibility for care. Even with objective referents, local conditions, including statewide funding practices or community standards of "normal functioning," will transform the distributions of any measure (Newman, 1980). Thus, studies that identify local "norms" should become standard practice for any measure intended to set funding guidelines, to set standards for treatment review, or to conduct evaluation research (Newman, 1980; Newman, Kopta, McGovern, Howard, & McNeilly, 1988).

Individualized problem identification and goal setting do have beneficial outcomes, so it may be possible to have the best of both worlds. This can be accomplished by using both an individualized instrument and a nationally normed instrument with objective referents. Sechrest (personal communication, November 1993) recommended that by using both individualized instruments and instruments with national norms, two demands can be satisfied.
One demand is that of identifying the individual characteristics of the patient in terms that are most useful in the local situation. The


other is to relate these characteristics to the patient's performance on a standardized instrument. In both cases, the other guidelines that support the reliable and valid application of either assessment technique must also be applied.

Guideline 4: Use of Multiple Respondents

A number of theorists and researchers have noted that measures should be obtained from the principal stakeholders (client, therapist, significant other/collateral, research evaluator) because each views the process and outcomes of treatment differently (Ciarlo et al., 1986; Ellsworth, 1975; Lambert, Christensen, & DeJulio, 1983; Strupp & Hadley, 1977). The importance of this guideline varies by target group and by the clinical situation involved. For example, a second informant can be helpful when assessing behaviors that are socially undesirable or about which someone might generally be guarded, reticent, or simply unaware. In the assessment of children, Achenbach and Edelbrock (1983) considered the parents of psychologically troubled children as primary observers, whereas teachers are considered secondary. Similar issues are being addressed in the development of assessment scales for the elderly, where the adult children of the frail elderly are considered major stakeholders whose assessments take precedence over self-reports (R. A. Kane & R. L. Kane, 1981; Lawton & Teresi, 1994; Mangen & Peterson, 1984). A number of researchers have contrasted the views of the four major respondents: client, treating clinician, significant other, and independent clinical observer.
Turner, McGovern, and Sandrock (1982) found a high level of agreement across different scales originally designed for use by one of the respondent groups, as evidenced by high canonical correlations across respondents (e.g., the SCL-90-R for clients, the Colorado Clinical Rating Scale and the Global Assessment Scale for clinicians, and the Personal Adjustment and Role Skills scale for significant others) when instructions were modified to fit each of the respondents. High coefficients were obtained when observers described specific behaviors (where the scale had objective referents), and lower coefficients were obtained when observers described how another person felt (e.g., that she/he felt "happy" or "sad").

The major advantages achieved by obtaining measures from multiple observers are: (a) each observer's experiences result in a unique view of the client (although Turner et al., 1982, suggested that these views can be highly similar); (b) concurrent validation of the client's behavioral status and changes can be obtained; (c) responses are likely to be more honest if all of the respondents are aware that there are multiple respondents; and (d) discrepancies between informants can alert the clinician to potential problem areas to be addressed in treatment (e.g., client: "I am sleeping OK!"; spouse: "He paces all night long!"). A major disadvantage of using multiple sources is higher cost, particularly in terms of the time and effort of data collection and analysis. There is also the added logistical problem of attempting to collect functional status data from multiple respondents at the same time, so that the same states are being observed. The time and effort costs are becoming more manageable with the use of computer-assisted testing and scoring procedures; however, the additional costs of hardware and software must be considered.
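The canonical-correlation approach used in such cross-respondent comparisons can be sketched numerically. The following is a minimal illustration, not a reconstruction of the Turner et al. analysis: the data, sample size, and variable groupings are invented, and the QR/SVD route shown is just one standard way to obtain canonical correlations.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two sets of column variables
    (e.g., a client's subscale scores in X, a clinician's ratings in Y).
    After centering, the singular values of Qx.T @ Qy -- where Qx and Qy
    are orthonormal bases for the two column spaces -- are the canonical
    correlations (the standard QR-based formulation)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Hypothetical data: 40 cases, 3 client-report subscales, and 2 clinician
# ratings, all driven in part by the same latent severity factor.
rng = np.random.default_rng(7)
severity = rng.normal(size=(40, 1))
client_scores = severity @ np.ones((1, 3)) + rng.normal(scale=0.5, size=(40, 3))
clinician_scores = severity @ np.ones((1, 2)) + rng.normal(scale=0.5, size=(40, 2))
print(canonical_correlations(client_scores, clinician_scores))
```

A first canonical correlation near 1.0 would indicate that the two respondents' batteries order the same cases in essentially the same way, which is the pattern of agreement Turner et al. reported.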


Guideline 5: More Process-Identifying Outcome Measures

Measure(s) that provide information regarding the means or processes by which treatments may produce positive effects are preferred to those that do not (Ciarlo et al., 1986, p. 28). The basic concept here is, at the least, controversial. On one side of the issue, Orlinsky and Howard (1986) argued that there ought to be a relation between process and outcomes. Behavioral and cognitive-behavioral treatments employing self-management, homework assignments, and self-help group feedback often use measures with objective behavioral referents as both process and outcome measures. On the other side, Stiles and Shapiro (1995) argued that the most important interpersonal and relationship ingredients (processes) that occur during psychotherapy (and possibly other psychosocial intervention sessions as well) are not expected to correlate with outcome. Adequate empirical support for either side of the argument is still lacking, and the positions appear to be theory related (Newman, 1995).

It is probably best to consider this guideline in terms of measuring treatment progress, or attainment of intermediate goals of the treatment plan. Steady progress toward these goals ought to be an integral part of the conversation between patient and clinician, and it is a key concern of managed care companies. Howard et al. (1996) described how clinicians can map an individual client's progress on a standardized measure relative to the progress expected from a database that included a large sample of consumers who had similar clinical characteristics at the start of their therapeutic episode. (Details of this procedure are described by Newman and Dakof in chap. 7, this volume.) The approach proposed by Howard et al. (1996) is particularly attractive to managed care companies because it makes it possible to describe empirically, and in a timely fashion, whether a consumer's progress is satisfactory.
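The kind of expected-progress comparison just described can be sketched as follows. This is only an illustrative stand-in: the exponential-decay curve, its parameters, and the tolerance band are invented for the example, not taken from the Howard et al. patient-profiling model, which was estimated from a large clinical database.

```python
import math

def expected_score(session, intake_score, asymptote=20.0, rate=0.35):
    """Hypothetical expected symptom score (higher = worse): exponential
    decay from the intake score toward an asymptote as sessions accumulate."""
    return asymptote + (intake_score - asymptote) * math.exp(-rate * session)

def progress_status(session, observed, intake_score, band=5.0):
    """Flag whether an observed score falls within a tolerance band
    around the expected trajectory for a client with this intake score."""
    expected = expected_score(session, intake_score)
    if observed > expected + band:
        return "behind expectation"
    if observed < expected - band:
        return "ahead of expectation"
    return "on track"

# A hypothetical client who began treatment at a score of 60:
for session, observed in [(0, 60.0), (2, 43.0), (4, 52.0)]:
    print(session, observed, progress_status(session, observed, 60.0))
```

In the Howard et al. approach, the expected curve and band come from consumers with similar intake characteristics; a score falling outside the band in the "behind" direction is the empirical trigger for reviewing the treatment strategy.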
If the observed progress is satisfactory, treatment can continue; if it is not, it may be advisable to shift the treatment strategy. A strong argument can be made that behavioral "markers" of progress or risk level should be taken regularly during the course of treatment to decide whether a change in therapeutic strategy should be considered. Examples of behavioral markers include signs or levels of suicidality, depression, anxiety, substance abuse, interpersonal functioning, or community functioning (Lambert & Lambert, chap. 5, this volume; Maruish, 1994). These "markers" do not necessarily describe the actual therapeutic process. Instead, they are global indicators describing whether the person is functioning adequately to warrant continuing, versus altering, the planned treatment. Certainly, programs serving consumers with serious and persistent illnesses should adopt a strategy of regularly collecting such "progress" measures.

Guideline 6: Psychometric Strengths

The measure used should meet minimum criteria of psychometric adequacy, including: a) reliability (test-retest, internal consistency, or inter-rater agreement where appropriate); b) validity (content, concurrent, and construct validity); c) demonstrated sensitivity to treatment-related change; and d) freedom from response bias and non-reactivity (insensitivity) to extraneous situational factors that may exist (including physical settings, client expectation, staff behavior, and accountability pressures). The measure should be difficult to intentionally fake, either positively or negatively. (Ciarlo et al., 1986, p. 27)
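Two of the reliability criteria in the quotation above have simple, standard computational forms: internal consistency is usually reported as Cronbach's alpha, and test-retest reliability as the Pearson correlation between two administrations. The formulas below are the textbook ones; the item-score data are invented for illustration.

```python
def cronbach_alpha(rows):
    """Cronbach's alpha: rows are respondents, columns are item scores.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(rows[0])
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = sum(variance([r[j] for r in rows]) for j in range(k))
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_vars / total_var)

def pearson_r(x, y):
    """Pearson correlation, e.g., between test and retest total scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Invented 5-item scores for six respondents, with invented retest totals.
scores = [
    [2, 3, 2, 3, 2], [4, 4, 5, 4, 4], [1, 2, 1, 1, 2],
    [5, 4, 5, 5, 4], [3, 3, 2, 3, 3], [2, 2, 3, 2, 2],
]
test_totals = [sum(r) for r in scores]
retest_totals = [13, 22, 8, 24, 15, 10]
print(round(cronbach_alpha(scores), 2), round(pearson_r(test_totals, retest_totals), 2))
```

Inter-rater agreement for categorical ratings would instead use a chance-corrected index such as Cohen's kappa; the same few lines of arithmetic apply once the rating pairs are tabulated.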


Two issues are discussed under the topic of psychometric features. First, it is important to use measures of high psychometric quality. Second, the psychometric quality of the local application of an instrument is related to the quality of services.

Measures of high psychometric quality are important. On the surface, no one should argue for lowering the standards for an instrument's psychometric qualities. Yet the more reactive, less psychometrically rigorous global measures (e.g., global improvement ratings, global level-of-functioning ratings) tend to be more popular with upper level decision makers (e.g., program managers, legislators). Although it is possible to exert reasonable control over the application of these measures to assure psychometric quality (Newman, 1980), if such control is not enforced, then psychometric quality suffers (Green, Nguyen, & Attkisson, 1979).

The psychometric quality of an instrument as implemented in a local program is related to the program's quality of care. This is a bold, double-edged assertion. On the one hand, the selection and use of an instrument of poor psychometric quality could depreciate the quality of care, because it is likely that the wrong information about a person would be transmitted. On the other hand, it is also possible that a service providing poor quality care can depreciate the psychometric quality of the assessment techniques. If local data collection produces low reliability and validity estimates, and there exists evidence that the instrument has adequate reliability and validity in another context, then one of two possibilities needs to be considered. It is possible that the lower levels of reliability and validity are simply a matter of shoddy data collection and management.
But, as argued below, it is also possible that if the psychometric quality of an instrument is lower in its local application than in its national norms, the quality and effectiveness of the clinical services themselves should be questioned.

There are three conditions that, when satisfied, increase both the quality of care and the psychometric quality of assessment data. First, a clinical service should have clearly defined the target groups it can treat (see Guideline 1). Second, the service should have clearly defined progress and outcome goals for each target group in terms that are observable and measurable. Third, the leadership and staff of a psychological service should have identified one or more instruments whose interpretive language is useful in supporting clinical communication about patient status relative to service goals. When one or more instruments are selected within the context of the first two assumptions, then the instruments can support reliable and useful communication, which in turn should promote a high quality of care.

To illustrate the relation between the quality of care and the quality of an instrument as implemented, consider the issue of psychometric reliability as it might relate to program quality. If reliability of communication is low (between the client and therapist, or among two or more treatment staff), then it is likely that there is inconsistent communication or understanding about the client's psychological and functional status, the treatment's intention, and/or the client's progress and outcome. If there is inconsistent communication or understanding regarding these aspects between client and therapist or among clinical staff, then a poor outcome is the most likely result (Mintz & Kiesler, 1981).

How does the use of standardized measures fit into the picture of increasing the accuracy of clinical communication? There are two points to be made.
First, careful selection of the progress-outcome measures must be preceded by, and based on, a clear statement of a program's purpose and goals. Second, the language describing the functional domains (i.e., factor structure) covered by the instruments represents an agreed-upon vocabulary for staff to use when communicating with and about clients. If


the language of communication is related to the language of the instruments, then any inconsistency in use of the instruments would reflect an inconsistency in the communication with clients and among staff when discussing clients.3

Another consideration concerns an instrument's validity in its local application. If locally established estimates of instrument validity among services or within a service deviate from established norms, then the service staff's concept of "normal" needs to be studied. Classical examples of such differences are those found in estimates of community functioning between inpatient and outpatient staff (Newman, Heverly, Rosen, Kopta, & Bedell, 1983). Kopta, Newman, McGovern, and Sandrock (1986) found that when there were multiple frames of reference among clinicians of different theoretical orientations, there were different syntheses of the clinical material within a session and different proposed intervention strategies and treatment plans. McGovern, Newman, and Kopta (1986) found that differences in attributions of problem causality and treatment outcome responsibility were related to judgments regarding the clinicians' choices of treatment strategies. These differences in frames of reference influence (i.e., probably reduce) estimates of the concurrent validity of measures in use, as well as interrater reliability. However, reduced coefficients of reliability and validity are not as serious an issue as the potential negative impact on services when purpose, language, and meaning lack clarity among service staff.

A two-part recommendation should be considered. First, the leadership of a service program should implement or refine operations that satisfy the three assumptions identified earlier: obtain a clear target group definition by service staff, provide operational definitions of treatment goals and objectives, and work toward selection of instruments whose structure and language reflect the first two assumptions.
The program should also incorporate staff supervision and development procedures that will identify when and how differences in frames of reference and language meaning are occurring. Such staff development exercises can serve to document the level of measure reliability, and they can be conducted at relatively low cost (see Newman & Sorensen, 1985). The exercises contrast staff assessments and treatment plans for the same set of patient profiles. The patient profiles can be presented via taped interviews or via written vignettes. Green and Gracely (1987) found that a two-page profile was as effective as a taped interview (and far less costly) for estimating interrater reliability. Methods for constructing such profiles and analyzing the results are described in Newman and Sorensen (1985) and in Heverly, Fitt, and Newman (1984). The data from these exercises can also be used to assess the degree to which the local use of the instruments matches the national norms.

3 Work with several colleagues has focused on both the methods and results of studies identifying factors influencing differences in clinicians' perceptions. The theoretical arguments and historical research basis for this line of work are discussed in Newman (1983). The procedures for conducting these studies as staff development sessions are detailed in Newman and Sorensen (1985) and in Heverly et al. (1984). Examples of studies on factors influencing clinical assessment and treatment decisions include Heverly et al. (1984); Kopta, Newman, McGovern, and Angle (1989); Kopta et al. (1986); McGovern et al. (1986); Newman et al. (1983); and Newman et al. (1988).

Guideline 7: Low Measure Costs Relative to Its Uses

How much should be spent on collecting, editing, storing, processing, and analyzing progress-outcome information? The answer to this question must be considered in terms of the five important functions that the data support: screening-treatment planning,


quality assurance, program evaluation, cost containment (utilization review), and revenue generation. Given these functions, a better question might be: What is the investment needed to assure a positive return on these functions?

Several pressures on mental health (and physical health) services indicate that an investment in the use of progress-assessment instruments can be cost beneficial. The first is the requirement for an initial assessment to justify entry into services and development of a treatment plan for reimbursable clients. For persons with a serious and persistent mental illness, funded placement (e.g., by Medicaid in most states) in extended community services (waivered services) requires a diagnostic and functional assessment. Most third-party payers will reimburse judicious use of such activities and the affiliated resource costs if it can be shown that the testing is a cost-effective means of making screening (utilization review) decisions. Justification for continued care is also required by both public and private third-party payers. Again, the cost of the assessment can often be underwritten by the cost containment-quality assurance agreement with the third-party payer.

The second pressure is the emerging litigious culture that requires increasing levels of accountability for treatment interventions. The legal profession is divided on this issue. One view says that the less hard data a service program has, the less liability it would have for its actions. The credo here appears to be, "Do not put anything in writing unless required to do so by an authority that will assume responsibility." The other view says that a service program increases its liability if it does not have hard evidence to justify its actions. There is little doubt that the former has been the more popular view until recently.
With increased legal actions by consumer groups on the "right to treatment," there is likely to be an increased need for data that can justify the types and levels of treatment provided. A parallel force is exerted by increased budgetary constraints from both private and public sources of revenue for mental health services. Pressures to enforce application of cost containment-utilization review guidelines appear to be far stronger than pressures for assuring quality of care. Although the literature has indicated the efficacy of many mental health interventions, empirical literature supporting the cost-effectiveness and cost benefits of these services still lags (Newman & Howard, 1986; Yates & Newman, 1980).

When Ciarlo assembled the panel of experts for NIMH, a cost estimate of 0.5% of an agency's total budget was considered a fair estimate of affordable costs for collecting and processing progress-outcome data. This was to include the costs of test materials and training of personnel, as well as collecting and processing the data. This estimate was made at a time when the public laws governing the disbursement of Federal Block Grant funds required that 5% of the agency's budget go toward evaluations of needs and program effectiveness.

Three notable changes in service delivery have occurred since the panel of experts met. One is that the Health Care Financing Administration (HCFA) and other third-party payers now require an assessment procedure that will identify and deflect those who do not require care, or that will identify the level of care required for clients applying for service. They do offer limited reimbursement for such assessment activities. The second change focuses on the use of assertive case management or continuous treatment team approaches for persons who have a serious and persistent mental illness or who are substance abusers. Here the client-tracking procedures can be part of the reimbursed overhead costs.
The third, and perhaps the most powerful, impetus for change is the new National Committee for Quality Assurance (NCQA) standards for accrediting managed behavioral


health organizations. The standards require more clinical and quality studies than in the past. Managed care contractors can expect to face rising expectations for outcome data collection and analysis, auditing of medical records, and adherence to practice guidelines.

Assessment and client-tracking procedures are logically compatible activities. The requirement for initial and updated assessments to justify levels of care can be integrated with the client-tracking requirements of case management or treatment team approaches. If a cost-effective technique for integrating the assessment and client-tracking procedures is instituted, then the costs of testing become part of the costs of coordinating and providing services. It is possible that if the costs considered here were restricted to just the costs of purchasing the instrument and the capacity to process the instrument's data (and not the professionals' time), then the costs might not exceed the 0.5% estimate. Proper cost-estimation studies need to be done to provide an empirical basis for identifying the appropriate levels of costs.

Guideline 8: Understanding by Nonprofessional Audiences

The scoring procedures and presentation of the results should be understandable to stakeholders at all levels. These stakeholders include consumers and their significant others, third-party payers, and administrative and legislative policymakers at the local, state, and federal levels.

Consumers

The analysis and interpretation of the results should be understandable at the individual consumer level. Two lines of reasoning support this claim. The first is the increased belief in, and legal support for, the consumer's right to know about the assessment's results and the associated selection of treatment and services. An understandable descriptive profile of the client can be used in a therapeutically positive fashion. Examples for the client's or family member's consideration might include the following:

1.
Does the assessment score(s) indicate my need for, progress in, or success with treatment, or the need for continued treatment?

2. Does a view of my assessment score(s) over time describe how I functioned in the past relative to how I am doing now?

3. Does the assessment score(s) help me communicate how I feel or function to those who are trying to serve, treat, or assist me (including my family)?

4. Does the assessment help me understand what I can expect in the future?

Third Parties

A second aspect of this guideline is the advantage of being able to aggregate understandable test results over groups of consumers in order to communicate evaluation research results to influential stakeholders (e.g., regulators, third-party payers, legislators, employers, citizens, or consumer groups). This includes needs assessment for program and


budget planning (Newman et al., 1989; Uehara et al., 1994). It also includes evaluating program effectiveness and/or cost-effectiveness among service alternatives for policy analysis and decisions (Newman & Howard, 1986; Yates, 1996; Yates & Newman, 1980). Budget planners and policy decision makers require easily understood data. They are often reluctant to rely solely on expert opinion to interpret the data, and some even prefer to do it themselves. Examples of questions that the data should ideally be able to address include:

Do the scores show whether clients have improved functioning to a level where they either require less restrictive care or no longer require care?

Do the measures assess and describe consumers' functioning in socially significant areas, for example, independent living, vocational productivity, and appropriate interpersonal and community behaviors?

Would the measures permit comparisons of relative program effectiveness among similar programs that serve similar clients?

In summary, it is important to ensure not only that the test results are understandable to those at the front-line level (consumers, their families, and service staff) but also that the aggregate data are understandable to budget planners and policymakers.

Guideline 9: Easy Feedback and Uncomplicated Interpretation

The discussion under Guideline 8 is also relevant here, but here the focus is on presentation. Do the instrument and its scoring procedures provide reports that are easily interpreted? Does the report stand on its own, without further explanation or training? For example, complex "look-up" tables are less desirable than a graphic display describing the characteristics of a client or a group of clients relative to a recognizable norm. Computerized scoring and profile printouts in both narrative and graphic form are becoming more common, which is to be commended. This trend underscores the importance of Guideline 9.
Another recent development is the trend toward report cards on mental health services for use by consumers, their families, or funding agencies (Dewan & Carpenter, 1997). In the past, such report cards focused on the types of consumers served and their level of satisfaction with a service. Recently, there has been a movement to provide report cards that describe the impact of the services on consumers' quality of life and functioning (DeLiberty, 1998; Mulkern, Leff, Green, & Newman, 1995). The use of report cards is sufficiently new that systematic research on the quality and impact of instruments used as part of a report card has yet to be done. Nevertheless, report cards on HMOs in general health care are substantial enough to have reached wide distribution in the popular press, as was the case with the Wall Street Journal during 1997.

There are two important cautionary notes about the relation between what is communicated by a report and the actual underlying variables captured by the scale. First, the language of the presentation should not be so "user friendly" that it misrepresents the data. The language used to label figures and tables must be carefully developed so that the validity of the instrument's underlying constructs is not violated. A related problem arises when it is assumed that the language used in the report matches the language used by patients or family members in their effort to understand and cope with their distress. For example, an elevated SCL-90 Depression subscale score might not match the patient's experience of elevated depression. It is important not to allow the language of test results to mask issues that are clinically important, as well as important to the patient.
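As a concrete illustration of the kind of aggregation behind such report cards, the sketch below rolls individual baseline and 90-day factor scores up into per-provider summary rows, in the spirit of the Hoosier Assurance Plan report card described earlier. All provider names, scores, and the two summary columns are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (provider, baseline score, 90-day score); higher = worse.
records = [
    ("Provider A", 62, 48), ("Provider A", 55, 41), ("Provider A", 70, 60),
    ("Provider B", 58, 55), ("Provider B", 64, 62),
]

by_provider = defaultdict(list)
for provider, baseline, day90 in records:
    by_provider[provider].append((baseline, day90))

print(f"{'Provider':<12}{'N':>3}{'Mean baseline':>15}{'Mean 90-day change':>20}")
for provider, rows in sorted(by_provider.items()):
    baselines = [b for b, _ in rows]
    changes = [d - b for b, d in rows]  # negative change = improvement
    print(f"{provider:<12}{len(rows):>3}{mean(baselines):>15.1f}{mean(changes):>20.1f}")
```

The cautionary notes above still apply to a table like this: a column label such as "Mean 90-day change" must not overstate what the underlying factor scores actually measure.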


Guideline 10: Useful in Clinical Services

The assessment instrument(s) used should support the clinical processes of a service with minimum interference. An important selection guideline is whether the instrument's language, scoring, and presentation of results support clinical decisions and communication. Those who need to communicate with each other include not only the clinical and service staff working with the client, but also clients and their collaterals/significant others. The following clinically relevant questions might be considered when discussing an instrument's utility:

Will the test results describe the likelihood that the client needs services and would be responsive to available services?

Do the test results help in planning the array and levels of services, treatments, and intervention styles that might best meet service goals?

Do the test results provide sufficient justification for the planned treatment to be reimbursed by third-party payers?

Is the client responding to treatment as planned, and if not, what areas of functioning are or are not responding as expected?

An ideal instrument meeting this guideline would support these processes sufficiently well that the effort required to collect and process the data would not be seen as a burden. The logic here is complementary to Guideline 7; that is, the measure should have low costs relative to its uses in screening-treatment planning, quality assurance, cost control, and revenue generation. Here, however, the emphasis is on utilization of the measure's results. The more the instrument is seen as supporting these functions, the less expensive and intrusive it will be perceived to be by clinical staff.
Guideline 11 Compatibility with Clinical Theories and Practices

An instrument that is compatible with a variety of clinical theories and practices should have wider interest and acceptance by a broad range of clinicians and stakeholders than one based on only one concept of treatment improvement. The former would provide a basis for evaluative research by contrasting the relative effectiveness of different treatment approaches or strategies.

How does one evaluate the level of compatibility? A first step is to inquire about the context in which the instrument was developed and the samples used in developing norms. For example, if the normative sample were clients on inpatient units, then it would probably be too limited because inpatient care is now seen as the most restrictive and infrequently used level of a continuum of care. The broader the initial sampling population used in the measure's development, the more generalizable the instrument. Ideally, there should be available norms for both clinical and nonclinical populations. For example, if an instrument is intended for a population with a chronic physical disability (e.g., wheelchair bound), then for sampling purposes the definition of a normal functioning population might change to persons with the chronic physical disability who function well in the community (Saunders, Howard, & Newman, 1988).

Another indicator of measure compatibility is whether there is evidence that its use in treatment/service planning and review matches the research results published in refereed journals. This is especially important when the data are used to contrast the outcomes of two or more therapeutic (or service) interventions. In reviewing this type of research, look first at the types of clients served, the setting, and the type of diagnoses


and problems treated. Also note the differences in standard deviations among the groups in this literature. Evidence of compatibility would be indicated by similar (homogeneous) variations among the treatment groups. Homogeneity would indicate that errors of measurement (and/or individual differences and/or item difficulty) were not biased by the therapeutic intervention that was employed. One note of caution here: It is possible for a measure to have homogeneity of variance within and across treatment groups and still lack equal sensitivity to the respective treatment effects. If a measure is not sensitive to treatment effects, its use as a progress or outcome assessment instrument is invalid. Methods for assessing these features are discussed by Newman and Tejeda (chap. 8, this volume) and go beyond the purposes of this chapter.

Conclusions

These guidelines are designed to support the evaluation of an assessment instrument and are not presented as firm rules of conduct. Few, if any, instruments can fully meet all the guidelines. It is expected, however, that if the guidelines are used as a means of drawing together available information on an instrument, they will decrease the number of unexpected or unpleasant surprises in the adaptation (or adoption) and use of a measure.

The application of the 11 guidelines has its own costs. Although a master's degree level of psychometric training is sufficient background to assemble the basic information on an instrument's ability to meet these guidelines, a full explication of the guidelines requires broader input. Some of the guidelines require clinical supervisors and managers to review clinical standards, program procedures, and policies. Other guidelines will require an interchange among clinical, supervisory, and fiscal management personnel in areas where they have had little prior experience (e.g., the costs and worth of testing).
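The homogeneity-of-variance evidence described under Guideline 11 can be checked numerically. The sketch below is illustrative only; the group names and outcome scores are hypothetical, not taken from any study cited here. It computes a Brown-Forsythe/Levene-type F statistic by running a one-way ANOVA on each score's absolute deviation from its group median; a small F is consistent with homogeneous variances across treatment groups.

```python
# Illustrative sketch (hypothetical group names and scores): a Brown-Forsythe /
# Levene-type test of homogeneity of variance across treatment groups.
from statistics import median, mean

groups = {
    "cbt":  [12, 15, 11, 14, 13, 16, 12, 15],
    "ipt":  [14, 13, 15, 12, 16, 14, 13, 15],
    "meds": [13, 14, 12, 15, 14, 13, 16, 12],
}

# Absolute deviations from each group's median (Brown-Forsythe variant).
devs = {g: [abs(x - median(xs)) for x in xs] for g, xs in groups.items()}

all_devs = [d for ds in devs.values() for d in ds]
grand = mean(all_devs)
k = len(devs)       # number of groups
n = len(all_devs)   # total observations

# One-way ANOVA on the deviations: between- vs. within-group variability.
ss_between = sum(len(ds) * (mean(ds) - grand) ** 2 for ds in devs.values())
ss_within = sum((d - mean(ds)) ** 2 for ds in devs.values() for d in ds)
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))

print(f"Levene-type F = {f_stat:.3f}")  # small F => variances look homogeneous
```

As the chapter cautions, a small F here is necessary but not sufficient evidence: groups can show homogeneous variances while the measure still lacks equal sensitivity to the respective treatment effects.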
Recent experience involving consumers and families of consumers as participants in the selection or modification of instruments has been extremely useful (Mulkern, Leff, Green, & Newman, 1995; Newman et al., 1997). For example, Teague and colleagues (Teague, 1995; Teague, Drake, & Ackerson, 1995) found that self-management skills were as important an outcome concern as level of symptom distress across all subgroups of consumers, family members, and clinical service providers. This finding was confirmed in a study in Indiana, where the advisory panel that guided the development of the assessment instruments (which included both professionals and consumer representatives) insisted that both self-management and level of problem severity be considered together in the assessment of each problem area (Newman et al., 1997). Although there are as yet no controlled studies on whether timely feedback of consumer assessment information will positively influence consumer, clinician, managerial, or policy decision making, the ultimate benefits to clients and stakeholders of applying these guidelines are well worth the costs.

References

Achenbach, T.M., & Edelbrock, C.S. (1983). Manual for the Child Behavior Checklist and Revised Behavior Profile. Burlington: Department of Psychology, University of Vermont.

Beutler, L.E., & Clarkin, J.F. (1990). Systematic treatment selection: Toward targeted therapeutic interventions. New York: Brunner/Mazel.


Ciarlo, J.A., Brown, T.R., Edwards, D.W., Kiresuk, T.J., & Newman, F.L. (1986). Assessing mental health treatment outcome measurement techniques (DHHS Publication No. ADM 86-1301). Washington, DC: Superintendent of Documents, U.S. Government Printing Office.

Cronbach, L.J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.

Cytrynbaum, S., Ginath, T., Birdwell, J., & Brandt, L. (1979). Goal attainment scaling: A critical review. Evaluation Quarterly, 3, 5-40.

DeLiberty, R. (1998). Developing a public mental health report card: The Hoosier Assurance Plan provider profile report card. Managed Care Quarterly, 6, 1-7.

Dewan, N., & Carpenter, D. (1997). Value account of health care services. In L.J. Dickstein, M.B. Riba, & J.M. Oldham (Series Eds.) & J.F. Clarkin & J. Docherty (Vol. Eds.), Psychologic/biologic testing: Issues for psychiatrists. Annual Review of Psychiatry (Vol. 16, pp. 81-101).

Ellsworth, R.B. (1975). Consumer feedback in measuring the effectiveness of mental health programs. In M. Guttentag & E.L. Struening (Eds.), Handbook of evaluation research (Vol. 2, pp. 239-224). Beverly Hills, CA: Sage.

Green, R.S., & Gracely, E.J. (1987). Selecting a rating scale for evaluating services to the chronically mentally ill. Community Mental Health Journal, 23, 91-102.

Green, R.S., Nguyen, T.D., & Attkisson, C.C. (1979). Harnessing the reliability of outcome measures. Evaluation and Program Planning, 2, 137-142.

Heverly, M.A., Fitt, D.X., & Newman, F.L. (1984). Constructing case vignettes for evaluating clinical judgement. Evaluation and Program Planning, 7, 45-55.

Hodges, K. (1996). Child and Adolescent Functional Assessment Scale (CAFAS): Miniscale version. 2140 Old Earhart Road, Ann Arbor, MI 48105.

Hodges, K., & Gust, J. (1995). Measures of impairment for children and adolescents. Journal of Mental Health Administration, 22, 403-413.

Howard, K.I., Kopta, S.M., Krause, M.S., & Orlinsky, D.E. (1986).
The dose-effect relationship in psychotherapy. American Psychologist, 41, 159-164.

Howard, K.I., Moras, K., Brill, P., Martinovich, Z., & Lutz, W. (1996). Evaluation of psychotherapy: Efficacy, effectiveness, and patient progress. American Psychologist, 51, 1059-1064.

Kane, R.A., & Kane, R.L. (1981). Assessing the elderly: A practical guide to measurement. Lexington, MA: Lexington Books.

Kopta, S.M., Newman, F.L., McGovern, M.P., & Angle, R.S. (1989). The relationship between years of psychotherapy experience and conceptualizations, interventions, and treatment plan costs. Professional Psychology, 29, 59-61.

Kopta, S.M., Newman, F.L., McGovern, M.P., & Sandrock, D. (1986). Psychotherapeutic orientations: A comparison of conceptualizations, interventions and recommendations for a treatment plan. Journal of Consulting and Clinical Psychology, 54, 369-374.

Lambert, M., Christensen, E., & DeJulio, R. (Eds.). (1983). The assessment of psychotherapy outcome. New York: Wiley.

Lawton, M.P., & Teresi, J.A. (Eds.). (1994). Annual review of gerontology and geriatrics: Focus on assessment techniques. New York: Springer.

Lipsey, M., & Wilson, D. (1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48, 1181-1209.

Locke, S.E., Kowaloff, H.B., Hoff, R.G., Safran, C., Popovsky, M.A., Cotton, D.J., Finklestein, D.M., Page, P.L., & Slack, W.V. (1992). Computer-based interview for screening blood donors for risk of HIV transmission. Journal of the American Medical Association, 264, 1301-1305.

Mangen, D.J., & Peterson, W.A. (1984). Health, program evaluation, and demography. Minneapolis: University of Minnesota Press.

Maruish, M. (Ed.). (1994). Use of psychological testing for treatment planning and outcome assessment. Mahwah, NJ: Lawrence Erlbaum Associates.

McGovern, M.P., Newman, F.L., & Kopta, S.M. (1986). Meta-theoretical assumptions and psychotherapy orientation: Clinician attributions of patients' problem causality and


responsibility for treatment outcome. Journal of Consulting and Clinical Psychology, 54, 476-481.

Mintz, J., & Kiesler, D.J. (1981). Individualized measures of psychotherapy outcome. In P. Kendall & J.N. Butcher (Eds.), Handbook of research methods in clinical psychology (pp. 491-534). New York: Wiley.

Mulkern, V., Leff, S., Green, R.S., & Newman, F.L. (1995). Section II: Performance indicators for a consumer-oriented mental health report card: Literature review and analysis. In V. Mulkern (Ed.), Stakeholders' perspectives on mental health performance indicators. Cambridge, MA: The Evaluation Center @ HSRI.

Navaline, H.A., Snider, E.C., Christopher, J.P., Tobin, D., Metzger, D., Alterman, A.I., & Woody, G.E. (1994). Preparation for AIDS vaccine trials: An automated version of the Risk Assessment Battery (RAB): Enhancing the assessment of risk behaviors. AIDS Research and Human Retroviruses, 10, S281-S283.

Newman, F.L. (1994). Disabuse of the drug metaphor in psychotherapy: An editorial dilemma. Journal of Consulting and Clinical Psychology, 62, 940-941.

Newman, F.L. (1983). Therapists' evaluations of psychotherapy. In M. Lambert, E. Christensen, & R. DeJulio (Eds.), The assessment of psychotherapy outcome (pp. 497-534). New York: Wiley.

Newman, F.L. (1980). Global scales: Strengths, uses and problems of global scales as an evaluation instrument. Evaluation and Program Planning, 3, 257-268.

Newman, F.L., & Carpenter, D. (1997). In L.J. Dickstein, M.B. Riba, & J.M. Oldham (Series Eds.) & J.F. Clarkin & J. Docherty (Vol. Eds.), Psychologic/biologic testing: Issues for psychiatrists. Annual Review of Psychiatry (Vol. 16, pp. 59-79).

Newman, F.L., DeLiberty, R., Hodges, K., McGrew, J., & Tejeda, M.J. (1997). The Indiana Hoosier Assurance Plan packet: A technical report. Cambridge, MA: The Evaluation Center @ HSRI.

Newman, F.L., & Tejeda, M.J. (1996).
The need for research designed to support decisions in the delivery of mental health services. American Psychologist, 51, 1040-1049.

Newman, F.L., Heverly, M.A., Rosen, M., Kopta, S.M., & Bedell, R. (1983). Influences on internal evaluation data dependability: Clinicians as a source of variance. In A.J. Love (Ed.), Developing effective internal evaluation: New directions for program evaluation (No. 20). San Francisco: Jossey-Bass.

Newman, F.L., Griffin, B.P., Black, R.W., & Page, S.E. (1989). Linking level of care to level of need: Assessing the need for mental health care for nursing home residents. American Psychologist, 44, 1315-1324.

Newman, F.L., Fitt, D., & Heverly, M.A. (1987). Influences of patient, service program and clinician characteristics on judgments of functioning and treatment recommendations. Evaluation and Program Planning, 10, 260-267.

Newman, F.L., Kopta, S.M., McGovern, M.P., Howard, K.I., & McNeilly, C. (1988). Evaluating the conceptualizations and treatment plans of interns and supervisors during a psychology internship. Journal of Consulting and Clinical Psychology, 56, 659-665.

Newman, F.L., & Howard, K.I. (1986). Therapeutic effort, outcome and policy. American Psychologist, 41, 181-187.

Newman, F.L., & Sorensen, J.E. (1985). Integrated clinical and fiscal management in mental health: A guidebook. Norwood, NJ: Ablex.

Nunnally, J.C., & Bernstein, I.H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Orlinsky, D.E., & Howard, K.I. (1986). Process and outcome in psychotherapy. In S.L. Garfield & A.E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 331-381). New York: Wiley.

Patterson, D.R., & Sechrest, L. (1983). Nonreactive measures in psychotherapy outcome research. Clinical Psychology Review, 3, 391-416.

Saunders, S.M., Howard, K.I., & Newman, F.L. (1988). Evaluating the clinical significance of treatment effects: Norms and normality. Behavioral Assessment, 10, 207-218.

Stiles, W.B., & Shapiro, D.A.
(1994). Disabuse of the drug metaphor: Psychotherapy process-outcome correlations. Journal of Consulting and Clinical Psychology, 62, 942-948.

Strupp, H.H., & Hadley, S.W. (1977). A tripartite model of mental health and therapeutic outcome. American Psychologist, 32, 187-196.


Teague, G.B. (1995, May). Outcome assessments in New Hampshire. Presentation at the National Conference of Mental Health Statistics, Washington, DC.

Teague, G.B., Drake, R.E., & Ackerson, T.H. (1995). Evaluating use of continuous treatment teams for persons with mental illness and substance abuse. Psychiatric Services, 46, 689-695.

Turner, R.M., McGovern, M.P., & Sandrock, D. (1982). A multiple perspective analysis of schizophrenic symptomatology and community functioning. American Journal of Community Psychology, 11, 593-607.

Uehara, E., Smukler, M., & Newman, F.L. (1994). Linking resource use to consumer level of need in a local mental health system: Field test of the "LONCA" case mix method. Journal of Consulting and Clinical Psychology, 62, 695-709.

Yates, B.T. (1996). Analyzing costs, procedures, processes, and outcomes in human services. Thousand Oaks, CA: Sage.

Yates, B.T., & Newman, F.L. (1980). Findings of cost-effectiveness and cost-benefit analyses of psychotherapy. In G. VandenBos (Ed.), Psychotherapy: From practice to research to policy (pp. 163-185). Beverly Hills, CA: Sage.


Chapter 6

Design and Implementation of an Outcomes Management System Within Inpatient and Outpatient Behavioral Health Settings

Jacque Bieber
American Academy of Neurology

Jill M. Wroblewski
Strategic Advantage, Inc., Minneapolis, MN

Cheryl A. Barber
University of Minnesota

The increasing focus on measuring outcomes in behavioral health care is the result of increased pressure from accreditation bodies, managed care interests, consumers, and government bodies for behavioral health care providers to demonstrate results. This so-called move to measure has been driven by the need for quality and accountability in the health care industry in general. Dennis O'Leary, president of the Joint Commission on Accreditation of Healthcare Organizations (JCAHO), predicted that "a successful health care organization's future will increasingly depend on its ability to conduct measurement activities that affirm the quality and value of its services and provide direction for improvement efforts" (1997, p. i).

In behavioral health care, the various parties involved all have their own purposes for measuring outcomes. Patients and their families want to know if a particular provider will be able to help them feel better and experience a higher quality of life; in addition, measures make the sometimes intangible changes in a patient's life more tangible and comparable. Payers and managed care companies want to know whether they are getting results and value commensurate with the dollars spent. Regulatory bodies and accreditation agencies want to see evidence that quality is an integral component of how a given provider conducts business.

Meeting the Increasing Demand for Quality and Accountability

To meet these increasing demands for quality and accountability, behavioral health care providers need systems to monitor and report their own performance over time and to implement continuous efforts that will improve that performance.
Health care organizations, from providers to accreditation bodies to managed care companies, are expanding their quality efforts to include measurement. The lexicon refers to such efforts


as quality or performance measurement systems. Three dimensions (structure, process, and outcome) provide the basis for quality measurement (Donabedian, 1985). The movement in behavioral health care is to require measures in all three of these dimensions, providing a multidimensional assessment of quality. Given this framework, structure and process provide the context within which outcomes information becomes useful in the movement for continuous improvement/quality. To better understand the value of outcomes measures in the quality movement requires definition of all three dimensions:

1. Structure refers to the characteristics of the environment in which care is delivered, whether at a community, institutional, or provider level. Is the environment able to support quality care by ensuring that the right mix of health professionals, settings, locations, and methods of payment is available to meet the cultural, safety, and convenience needs of a given population? How do the characteristics of the population served impact the outcomes (Quality Measurement Advisory Service, QMAS, 1996)?

2. Process refers to the discrete activities that contribute to patient care: what is done to, for, or by patients (Joint Commission on Accreditation of Healthcare Organizations, JCAHO, 1997a; Migdail, Youngs, & Bengen-Seltzer, 1995). Process measures assess whether the admission, assessment, diagnosis, treatment, and disposition processes currently in place are provided skillfully and humanely to the people who need them and in ways that are responsive to people's preferences. Process measures also determine whether provisions exist to inform, involve, and support patients in the decision-making process (QMAS, 1997).

3. Within this context, outcomes can be defined as the intermediate or ultimate results of efforts to prevent, diagnose, and treat various health problems (McGlynn, 1996).
Although a consensual definition of outcomes does not exist, certain components of existing definitions are commonly agreed on. For instance, it is generally accepted that outcomes are results that may be favorable or adverse (e.g., an increase or decrease in symptoms, risk factors, or functionality) and can be applied to individuals, subsets of a population, or an entire population. Moreover, outcomes can cover various lengths of time, such as from intake to discharge, or from intake to 6 months after the date of admission (Donabedian, 1985; JCAHO, 1997a; Migdail et al., 1995; National Committee for Quality Assurance, NCQA, 1997a). Finally, outcomes measures provide an accurate, reliable, quantitative assessment of the results of the procedures and treatments rendered.

Outcomes Management Systems

An outcomes management system applies outcomes measures to the health care decision-making process. Even though the term outcomes is loosely used to apply to measures ranging from cost, recidivism rates, and length of stay to functionality, patient satisfaction, and symptom reduction (Davis & Fong, 1996), the overriding concern of organizations that measure outcomes is the need to demonstrate value. Namely, is the quality of the care worth the dollars invested? This is the resounding question asked by payers who are reluctant to include mental health in their benefits packages, as well as by consumers who have lost faith in providers' ability to deliver the right treatment in the best way possible. Consumers are assessing health care using the standard of value-based purchasing, and providers are being required to implement credible quality and outcomes measures to demonstrate that prescribed care is necessary and effective (QMAS, 1997). Good measures of outcomes make it possible for consumers to compare the performance of various health plans and providers and for providers and payers to better understand what does and does not work.
However, it is only when empirical evidence is provided that it becomes possible to identify practices in need of change and make continuous improvement a reality.
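The kind of provider comparison described above can be illustrated with a brief sketch. The provider names and intake/discharge scores below are hypothetical; the fragment averages individual pre-to-post change scores by provider and reports a standardized mean change, the sort of summary a report card or provider profile might present.

```python
# Illustrative sketch (hypothetical records): aggregating individual pre/post
# symptom scores to compare average improvement across providers.
from collections import defaultdict
from statistics import mean, stdev

# (provider, intake score, discharge score); lower scores = fewer symptoms.
records = [
    ("Clinic A", 28, 14), ("Clinic A", 31, 20), ("Clinic A", 25, 12),
    ("Clinic B", 27, 22), ("Clinic B", 30, 26), ("Clinic B", 26, 21),
]

by_provider = defaultdict(list)
for provider, pre, post in records:
    by_provider[provider].append(pre - post)   # positive = improvement

for provider, changes in by_provider.items():
    d = mean(changes) / stdev(changes)         # standardized mean change
    print(f"{provider}: mean improvement {mean(changes):.1f} points "
          f"(standardized effect {d:.2f}, n={len(changes)})")
```

Note that a raw comparison like this ignores case mix; as discussed later in the chapter, risk adjustment is needed before such differences can fairly be attributed to quality of care.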


Within the controlled environment, the structure and process practices can be tested and developed. On dissemination, they can be applied in a real-world setting that includes an outcomes management component. Practitioners and researchers alike can examine the results of the outcomes studies to modify the structure and process of treatment as they are applied in a real-world or research setting.

Benefits of Measuring Individual Patient Outcomes

Although the demand for mental health services to be empirically based and scientific means additional work for the providers of such services, these professionals also see the benefits of measuring both individual and aggregated patient outcomes. For instance, behavioral health care professionals may believe that psychotherapy is a powerful and useful service, but unless they can demonstrate this value to consumers and payers, continued financial and political support for this service, and perhaps others, will likely decrease. Thus, accountability can be viewed as a vital tool in promoting the behavioral health care profession's survival.

In order to be accountable, the full range of providers, from individual practitioners to full-continuum facilities, need to find new ways to incorporate measurement into their day-to-day operations. More specifically, providers need to demonstrate accountability at the beginning, middle, and end of each episode of care. On an individual basis, the therapist will need to be skilled at rapid assessment and intervention: two components of the brief and accountable therapy model required by most managed care companies (Johnson, 1995). Yet, with the increasing demand to deliver more for less, how will providers be able to incorporate outcomes measurement into their daily workflow? One way of managing the cost and time required to implement measurement on an individual patient basis is to use the same measurement tool for assessment and outcomes.
At intake, such a tool can be used to support triage decisions, speed up the authorization process, and focus therapeutic time and effort on a specific, limited area of change that the client and therapist will work to achieve (Johnson, 1995). Brief assessment tools that measure a patient's symptoms or level of functioning can quickly establish a therapeutically relevant focus for treatment planning and provide a baseline from which to monitor an individual patient's progress during and after treatment (Fischer & Corcoran, 1994a, 1994b). As a client and health care professional act on a treatment plan, changes will occur in the client's symptoms and/or level of functioning. Outcomes measures make it possible to monitor these changes as they take place and to maintain or modify the treatment plan in midcourse, if necessary. Outcomes data also can support treatment termination. Sauber (1996) found that the use of concurrent outcomes measures allows clinicians to make sound judgments about each individual case by relating the information received from outcomes measures to the care process and to make any changes necessary.

Benefits of Aggregating Patient Outcomes Measures

When outcomes measures that assess individual patients' statuses pre- and posttreatment are aggregated, it is possible to demonstrate results for a specific program, level of care, clinician, or hospital. By aggregating the data across all of the patients who received a


specific treatment, the effectiveness of the treatment can be determined. Comparisons can be used to track trends over time, compare one entity with another, or serve as the basis for continuous improvement by helping identify which programs, clinicians, or facilities are producing the best results in terms of quality and cost. This information can then be included in published practice guidelines, making broad-based use of the best practices possible. Aggregated measures also support providers who need to meet regulatory requirements that call for performance measurement and continuous improvement efforts. Finally, outcomes measures that state results in a concise, user-friendly format serve as effective communication tools in discussions with referring sources and third-party payers.

Predictive Modeling Capabilities

Over the past 10 years, the aim of professionals involved in outcomes measurement has been to develop a system of prescriptive care that identifies practices associated with better outcomes for particular types of patients and bridges the gap between actual and optimal practice. The first step in determining what constitutes the optimal effective treatment is to determine what treatments are effective for specific patient groups. Although it is not exhaustive in scope, a list of such treatments for specific disorders has been developed by the Task Force on Psychological Interventions of Division 12 (Clinical Psychology) of the American Psychological Association (APA). This task force set out to develop a list of effective psychotherapies, referred to as empirically validated therapies (EVTs), to be used to educate clinical psychologists, third-party payers, and the public (Chambless, 1995).
The information was not intended to serve as guidelines, standards, or official policy (Chambless et al., 1996); rather, its purpose was to highlight the fact that a variety of psychotherapies have been proven efficacious by research in the past 15 years (Chambless, 1995). The task force identified effective treatments for given disorders but did not identify those that are better within disorders or which patient characteristics will influence the choice of therapy. Although treatment matching (i.e., identification of the optimal treatment for an individual based on the patient's characteristics) has not yet been realized, creating a list of EVTs is certainly a first step in this effort.

Findings related to treatment matching are still extremely limited. Beutler (1991), a leader in this field, noted that the lack of success may be due to failure to specify and measure meaningful populations and interventions. Continued scientific research and identification of reliable treatment-by-consumer interactions will provide the information needed for the development of empirically based treatment matching. As the development continues, treatment-matching characteristics should include those that predict the need for varying treatment approaches (as cited in Lyons, Howard, O'Mahoney, & Lish, 1997).

Patient characteristics also are used in outcomes management systems for the purpose of risk adjustment. Risk adjustment corrects results for differences in a population with respect to patient mix, allowing for more valid comparisons across groups. The first step in this process is to identify what patient characteristics make treatment outcomes more or less difficult to achieve, regardless of quality of treatment. Characteristics that should be considered include the severity, acuteness, and complexity of the disorder. In developing appropriate risk-adjustment models, patient groups that are typically less well served


or more difficult to treat effectively are often identified. This information may lead to valuable findings for research related to treatment matching (Lyons et al., 1997).

There is a good deal of skepticism related to treatment matching, which is not surprising, given the scientific and practical problems of reaching agreement even on what outcomes should be measured and how. Clinicians have traditionally regarded their work as part science and part art. They sometimes resent measurement, viewing it as an intrusion into their field as well as an attempt to quantify what cannot be quantified (Sauber, 1996). Until researchers can demonstrate how outcomes measures and treatment matching will produce results that are superior in terms of cost and treatment effectiveness, clinicians will not likely be convinced that predictive modeling is worth the time and effort required to bring it to fruition.

Common Reasons for Implementing an Outcomes System

Outcomes measures can help clinicians and researchers alike determine whether a given treatment worked, whether an individual patient improved, or whether a new type of psychopharmaceutical drug worked better than another (Sauber, 1996). The new market-driven demand for outcomes measures in behavioral health care has dramatically changed and broadened the focus of outcomes research, from measuring the effects of treatment in ideal settings (i.e., efficacy studies) to using outcomes to justify behavioral health care services for entire populations. Whereas efficacy studies assess outcomes of treatment delivered under ideal conditions, effectiveness studies assess outcomes of treatment delivered under ordinary circumstances, typical of everyday practice (Institute of Medicine, IOM, 1997; McGlynn, 1996). Because efficacy studies are costly, however, many treatment strategies have not been evaluated across the various diagnoses, severities, and patient characteristics that are present in the average practice.
Yet, with the advent of clinical management information systems, it should be possible to implement outcomes measures that can improve accountability and lead to an improved understanding of which practices are most effective in producing positive outcomes with different kinds of patients (Docherty & Streeter, 1996; Kane, Bartlett, & Potthoff, 1995).

One of the current goals of the psychology industry is to integrate efficacy and effectiveness research (Fowler, 1996). The first step toward achieving this goal would be to continue the basic research into structure, process, and outcomes in a clinical trial setting, where a specific group of patients can be treated in strictly controlled settings. Doing so would help determine the treatment characteristics associated with particular outcomes for a given case mix. When such results are collected and applied to practice environments, where patient characteristics, treatment protocols, and practitioners vary in many areas, it is difficult to evaluate the impact of each structure and process component of treatment (McGlynn, 1996).

In spite of the difficulty of applying the results of controlled studies in less than controllable environments, quality assessment systems must begin to incorporate measures of client outcomes in their measurement programs (IOM, 1997). Continuous improvement efforts are attempting to evaluate the reasons actual practices may fail to achieve the results obtained in clinical trials (i.e., assessing the gap between efficacy and effectiveness) and to look at ways to improve the delivery of care in real settings. The Committee on Quality Assurance and Accreditation Guidelines for
Managed Behavioral Healthcare contended that "outcomes research is vitally important to improve the base of evidence related to treatment effectiveness. Outcomes research is needed to provide explicit direction in identifying performance indicators associated with good outcomes for different patient characteristics, types of treatment programs, and types of managed care organizations" (IOM, 1997, p. 5). The committee also recommended that outcomes measures increasingly should be based on evidence from research. According to Migdail et al. (1995), "The best process indicators focus on processes that are linked to patient outcomes, meaning that a scientific basis exists for believing that the process, when provided effectively, will increase the probability of a desired outcome" (p. 483).

In recent years, as outcomes measures have extended beyond clinical efficacy research to initiatives for creating a behavioral health care delivery system that is more efficient and effective, a number of approaches have been developed for various stakeholders in order to assess the quality of behavioral health care. These approaches include profiling, report cards, instrument panels, score cards, and benchmarking. (They are discussed in detail in the following sections.) Accreditation standards for behavioral health care also have been developed in recent years, providing a national "seal of approval" used by various stakeholders to assess the value of services provided. Outcomes measures can be used in the process of continuous quality improvement for behavioral health care organizations to assess and improve the quality of care.

In addition, outcomes can be used internally for clinical decision making in behavioral health care. Clinicians and researchers must be able to demonstrate that the treatment provided to patients will lead to more successful outcomes. By tracking outcomes of patients throughout treatment, so-called real-time clinical decision making is possible.
In this context, outcomes measures provide clinicians with an effective tool for evaluating and monitoring the progress of treatment for patients.

Generating Report Cards of Provider Performance

The restructuring of health care in the 1980s and early 1990s is responsible for bringing quality and accountability to the forefront in this industry. Today, controlling costs and improving the quality of care are the aims of health care reform. Purchasers have demanded ways to measure the effectiveness of these efforts. Rather than make selections based solely on the costs or sizes of networks, purchasers wanted ways to measure the value of the health care services purchased. In response to this demand, various health care and government organizations have gathered and published data that allow consumers, purchasers, and state agencies to compare the performances of competing health plans and provider services. Key stakeholders, such as consumers and purchasers, also have pushed the agenda toward presenting objective information on the quality of health plans and provider services.

The documents that have resulted from these initiatives are known as report cards (Mental Health Statistics Improvement Program Task Force, MHSIP, 1996). They are brief, standardized forms of profiling used to compare providers, health care systems, and health care plans. Eisen and Dickey (1996) described profiling as an analytical method of comparing practice patterns, client outcomes, accessibility to care, appropriate utilization of services, and satisfaction with care of providers. Profiling is a comparison of two or more groups of clients, providers, payers, geographic regions, or systems of care. From such comparisons, standards and guidelines are developed that
are based on expert opinion, research, and normative data from the general population or population subgroups.

Report cards provide comparative data for all types of performance measures. In behavioral health care report cards, those measures include access to mental health services, appropriateness of services provided, patient outcomes, consumer satisfaction, quality of care, and prevention. Information about these performance measures is used primarily by the following stakeholders in the behavioral health care arena:

Consumers: To compare and evaluate alternative behavioral health service options.
Advocacy groups: To promote better services.
Health care purchasers: To evaluate managed care organizations or provider systems.
Providers: To monitor the performance of their systems over time and implement continuous quality improvement by identifying areas that need change and by developing strategies to increase the quality of care.
Government mental health agencies: To monitor quality and desired outcomes across different provider systems.
Managed care organizations: To monitor quality and desired outcomes across different provider systems.
Public health officials: To assess whether the nation's health care goals are being met (Eisen & Dickey, 1996; MHSIP, 1996).

Prior to the mid-1990s, medical/surgical report cards included few performance indicators for behavioral health care (Panzarino, 1996). In spring 1995, the American Managed Behavioral Healthcare Association (AMBHA; 1995) addressed this void by becoming one of the first groups to initiate a report card in the behavioral health care field. The AMBHA's goal was to produce a set of nonfinancial indicators that would provide accountability in behavioral health care. Around the same time, the Center for Mental Health Statistics Improvement Program (MHSIP) was working on its own report card; in spring 1996, the Consumer-oriented Mental Health Report Card was released.
This initiative involved consumers in every stage of the development process, which distinguished it from earlier initiatives. In 1994, the General Accounting Office (GAO) reported in a summary of report card initiatives that "individual consumers have had minimal input into selecting report card indicators, and little is known about their needs or interests. . . . As a result, their needs may not be met" (as cited in MHSIP, 1996, p. 3). A collaborative was formed, which consisted of consumers and professionals from MHSIP of the Center for Mental Health Services. This group's mandate was to construct a report card that addressed the needs of mental health consumers. Adults with serious mental illnesses and children with serious emotional disturbances were targeted and included as active participants in the process. (Many other report card efforts had focused on the general population, so the needs of people with serious mental illnesses were not adequately addressed.) The product of this effort, the Consumer-oriented Mental Health Report Card, was designed to help mental health consumers, advocates, health care purchasers, providers, and state mental health agencies compare and evaluate mental health services based on concerns that are important to consumers (MHSIP, 1996).

Other national-level report cards that currently include or plan to include a behavioral health care component include the following:

1. National Report Card (The Impact of Managed Care on People with Severe Mental Illness): National Alliance for the Mentally Ill (NAMI)
2. Health Plan Employer Data and Information Set (HEDIS) and Quality COMPASS: National Committee for Quality Assurance (NCQA)
3. Foundation for Accountability (FACCT)
4. Consumer Assessment of Health Plans Study (CAHPS): Agency for Health Care Policy and Research (AHCPR)

At the state level, about half of all U.S. states have developed or currently are developing report cards or performance outcome measurement systems. At the county level, the National Association of County Behavioral Healthcare Directors (NACBHD) has contracted with the Evaluation Center at Health Services Research Institute to develop candidate indicators for county performance outcomes (National Technical Assistance Center, 1996). In addition, many managed care plans have developed their own report cards, as have some hospitals, health care provider groups, and business coalitions (Atlantic Information Services, AIS, 1996b).

Score Cards and Instrument Panels.

Score cards and instrument panels comprise an additional category of aggregate measurement that has emerged in recent years to help health care delivery system administrators better manage both cost and quality. Instrument panels and score cards provide information on the overall system and on key processes within it, as determined by a leadership group within a health care organization. A key feature of such measures is that they provide critical, real-time information to management, which supports action-oriented decision making. These decision-support tools provide performance rankings for overall health care systems or specific processes or outcomes, such as patient access, clinical processes, and customer satisfaction. These measures are put in place to help identify and manage variation in the system (Lyons et al., 1997).
Nelson, Batalden, Plume, Mihevc, and Swartz (1995) suggested that "clinical leaders can quickly scan the control chart instrument panel to check for special-cause variation, to identify desirable or undesirable trends and to observe the level of variation and average performance" (p. 159). For example, based on the results of a score card or instrument panel, a leadership team will want to investigate why little or no improvement occurred in areas that had been targeted for change. The group also will want to observe if any key measures show too much variation or are performing at unacceptable levels (Nelson et al., 1995).

Nelson et al. (1995) differentiated report cards and instrument panels as different tools that are suitable for stakeholders with different needs. Table 6.1 presents a comparison of these tools based on four issues: main user, main use, time frame, and focus. In sum, purchasers use report cards to make judgments about whom to select as a provider, to evaluate past performance, and to promote accountability. Providers, on the other hand, use instrument panels to monitor and control critical processes, to correctly identify trends (i.e., positive or negative), and to promote better quality and value.

TABLE 6.1
A Comparison of Report Cards and Instrument Panels

Issue        Report Card                 Instrument Panel
Main user    Purchasers                  Providers
Main use     Judgment, accountability    Control, improvement
Time frame   Past                        Present, future
Focus        Outcomes, charges           Process, outcomes, costs
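The special-cause screening that Nelson et al. (1995) describe for control-chart instrument panels can be sketched in miniature. The clinic, the monthly satisfaction scores, and the choice of a ten-month baseline below are hypothetical illustrations, not data or procedures from the chapter:

```python
# A minimal sketch of control-chart screening for an instrument panel.
# All names and numbers are hypothetical.

def control_limits(baseline, k=3.0):
    """Center line and +/- k-sigma limits computed from a stable baseline period."""
    n = len(baseline)
    mean = sum(baseline) / n
    sd = (sum((x - mean) ** 2 for x in baseline) / (n - 1)) ** 0.5
    return mean, mean - k * sd, mean + k * sd

# Mean patient-satisfaction scores (1-5 scale) for one hypothetical clinic:
# ten baseline months establish the limits; two recent months are screened.
baseline = [4.1, 4.3, 4.0, 4.2, 4.1, 4.2, 4.4, 4.1, 4.0, 4.2]
recent = {"Nov": 4.1, "Dec": 3.1}

center, lower, upper = control_limits(baseline)
for month, score in recent.items():
    # Points outside the limits signal special-cause variation worth investigating;
    # points inside reflect ordinary (common-cause) month-to-month variation.
    status = "special-cause" if (score < lower or score > upper) else "common-cause"
    print(f"{month}: {score:.1f} ({status})")
```

With these invented numbers, November falls inside the limits while December's drop falls below the lower limit, which is exactly the kind of point a leadership team would flag for investigation rather than react to routine variation.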


Benchmarking.

The process of benchmarking involves finding the best performer in an industry for a particular measure or indicator (Nelson, 1996). Once the best case has been identified, the underlying processes responsible for generating consistently superior results can be examined and replicated, allowing application of the model in other like settings. In short, benchmarking is the hunt for the best process. That process is found when an organization's measured outcomes are consistently the best of the best. These best practices then can be incorporated into practice guidelines or implemented as individual clinical process improvements.

The issues associated with successful benchmarking are numerous: Who should participate? What questions will efforts address? What tools will be used? How will information be disseminated? How will the results be used? (Gift & Mosel, 1997). The need to collect comparable data requires a collaborative approach. More and more providers are collaborating on projects, in which the member groups' data are pooled and each member then is able to benchmark its results against those from similar organizations (Trabin, 1997). As noted by Trabin and Kramer (1997):

Busy behavioral healthcare organizations with narrow profit margins do not have the wherewithal to create experimental or even quasi-experimental designed outcomes research. They need assistance in identifying the middle group between clinical trials research on one extreme and outcomes evaluation designs that are so poor as to not be worth doing on the other extreme. An approach being widely considered as an alternative to experimental design is benchmarking against the outcomes results of similar organizations that agree to use the same evaluation methodology. . . . The large outcome databases used by organizations that benchmark against each other will not only become repositories of valuable data, but will also be primary generators of new knowledge for our field. (p.
348)

A number of collaborative efforts are underway in the behavioral health care marketplace. The Outcomes Management Program (OMP), implemented by the University of Cincinnati Quality Center in collaboration with Mesa Mental Health Professionals in Albuquerque, New Mexico, and the Council of Behavioral Group Practices, currently uses its benchmark findings in the following ways (Zieman, Kramer, & Daniels, 1997): to comply with NCQA standards for MBHOs, to conduct quality liaison activities with payers, to develop protocols, to profile providers, and to develop networks.

The Practice Research Network (PRN), established by the American Psychiatric Association (APA), states that its purposes are to help generate psychiatric research that supports clinical decisions in a timely fashion and to consider policy issues. Presently, the network has over 500 members, who have been randomly selected and/or systematically recruited (Zarin, Pincus, & West, 1997). The Pennsylvania Psychological Association has developed another network of practitioners with support from the American Psychological Association's Committee for the Advancement of Professional Practice and the Pennsylvania State University. This network of practitioners and researchers is interested in the application of outcomes and other specialized clinical research (Ragusea, 1997).

Assisting in Clinical Decision Making

Outcomes measures can assist in clinical decision making in a number of ways. Before examining those possibilities, however, it is important to understand the context in which they will occur. The realities of the current health care environment have specific
effects on clinical practice and decision making in behavioral health care. Consider the following:

Health care, the largest service industry in the United States, is becoming a market-driven industry, which is changing the power paradigm and decision-making process. Payers, providers, and patients each have power in the new paradigm.
Health care consumers want convenience, quality, information, control, and support when they purchase health care, and all for a good price.
It is difficult to measure the quality of health care, as the treatment applied often disappears at the point of delivery.
Managed care companies have succeeded in making brief episodes of care the norm.
Because all of health care is a young science (and thus has limited empirical underpinnings), "some of today's medical advice may be dogma rather than science, based on beliefs about the relationship between cause and effect rather than on facts" (Herzlinger, 1997, p. 72).
Medicine is in the process of industrializing how it does business. To be successful in this effort, health care will need accurate measures of the costs and benefits of various methods and will need to be able to compare outcomes measures, as well (Lyons et al., 1997).
Efficiency and effectiveness have become essential ingredients of successful health care organizations.

What do all these developments mean for the behavioral health care practitioner? Is there a panacea for practitioners who are willing to "grapple with the need to be customer driven" (Chowanec, Neunaber, & Krajl, 1994, p. 47), that is, willing to see patients as consumers who are important partners in health care? What part can outcomes management play in helping practitioners address the transformation in behavioral health care that is currently underway? Assessment and outcomes measures can support brief therapy by providing a real-time tool to focus the efforts and energies of practitioners and consumer-patients on well-defined needs.
For instance, assessment data can be collected readily by having patients complete questionnaires, which can be scored within 5 to 15 minutes via computer or paper-and-pencil-based fax-back systems. The resulting patient profile then is available to support treatment-planning and authorization decisions. A number of the primary principles of delivery that Cummings believed are necessary for brief, effective therapy are readily supported by assessment and outcomes measures. These measures can help focus the psychotherapy, engage patients as partners in treatment, and ensure that they receive appropriate amounts of therapy at given sessions within lifelong relationships. Outcomes measures can be used from episode to episode to track an individual consumer-patient's progress over a number of years (Cummings & Sayama, 1995).

In addition, outcomes measures can serve as communication tools, a necessary ingredient in accountability. Outcomes measures provide a common way for professionals to talk to each other as well as to those who review their work (Johnson, 1995). Outcomes measures can help the practitioner and consumer-patient measure progress toward a plan; modify the plan, as needed; support managed care authorizations for further care; and/or demonstrate that it is appropriate to terminate treatment.

Outcomes measures also can serve as the basic measurement tool for developing practice guidelines and continuous improvement. "The relative effectiveness of different approaches needs to be known, and routine monitoring of outcomes in a treatment setting or system of care can be a basis for continuous quality improvement within a treatment delivery organization" (IOM, 1997, p. 232).


Although the use of outcomes measures in clinical practice is still in its infancy, a number of organizations (including Charter Behavioral Health System and Kaiser Permanente) have initiated efforts to implement outcomes management systems. Considerable research is still needed, however, before the measurement of outcomes will be an integral part of all behavioral health care operations. This was the conclusion stated by the Institute of Medicine's (IOM) Committee on Quality Assurance and Accreditation Guidelines for Managed Behavioral Health Care (1997) when it released its recommendations in the area of outcomes research. The committee recommended continued support and development of collaborative health services research on the effectiveness of different treatment strategies for a variety of practitioner types and for consumers with different needs.

Continuous Quality Improvement

Continuous quality improvement is a managerial approach in which the goal is to put processes and systems in place that will prevent problems from happening, making corrective efforts unnecessary. Contrast this approach with quality assurance efforts that involve inspection of people and products, often after the fact. Continuous quality improvement is the process of assessing an organization's key systems, processes, and outcomes in an effort to understand how functions, processes, and structures influence performance so that revisions can be made to improve performance. A continuous feedback loop provides the foundation for efforts at continuous quality improvement (Lyons et al., 1997).

Efforts toward continuous quality improvement started in the 1920s, when statistical process control methods were introduced at Bell Laboratories. W. Edwards Deming brought these quality measurement techniques to Japanese manufacturers, who applied them to production processes and created superior quality in Japanese products.
American manufacturers embraced these quality management principles in the late 1970s and early 1980s. The health care industry followed suit in the early 1980s, but improvements in the quality of care came slowly. It was found that the tools that could be easily transferred from a Japanese manufacturing plant to an American one could not be so easily transferred to health care. Health care professionals needed to develop new tools for continuous quality improvement and create methods of incorporating quality improvement into their workflow. In fact, the development of measurement tools and processes is an evolutionary process that is still in its early stages in the health care field.

When continuous improvement processes are in place, data drive decision making by helping monitor critical processes and tracking the effects of changes in the delivery of service (IOM, 1997). In a health care setting, tracking outcomes must be logically linked with the processes and structures that exist in order to improve the quality or performance of an individual or organization. Lyons et al. (1997) stressed the need to reduce practice pattern variability as a key component of any continuous improvement effort:

Taking an important lesson from the quality improvement initiatives in manufacturing, healthcare has learned that variation is the enemy of quality and that the practice of medicine must undergo an industrialization process. This process, now fully underway, promises to rationalize healthcare
delivery, provide accurate measures of the costs and benefits of interventions, and compare outcomes across providers. (p. 12)

A number of quality tools and methods have been introduced within the field of health care to manage variation, from practice guidelines to critical pathways. Nelson (1996) provided five methods to manage the structure and process of an organization, to provide feedback on the results of implementing the practices and interventions aimed at improving the process and structure of an organization, and then to insert that feedback and feed forward into the system (see Fig. 6.1). This last stage, feeding forward, gives the clinician specific, timely information about the patient's clinical status, functioning and general well-being, health habits, and ratings of care for the problems experienced to date. The clinician can then use this information to design the plan of care that best matches the patient's needs and also provides a qualitative health profile for a given patient population (Nelson, 1996).

This set of quality improvement tools is representative of work underway throughout the health care industry to ensure standardization and support continuous quality improvement efforts. Continued efforts are needed as more and more accreditation agencies and purchasers are expecting health care organizations to have formal quality improvement programs in place. For example, the NCQA (1997c) described quality management and improvement as "an integrative process that links knowledge, structures, processes, and outcomes to enhance quality throughout an organization." According to the committee, a well-structured quality improvement program provides the framework within which an organization can assess and improve the quality of clinical care, clinical services, and member services.
This structure includes a clear definition of the authority of the quality improvement program, its relationships to other organizational components, and its accountability to the highest level of governance within the organization. (pp. 41-42)

Improvements in patient care and outcomes are the desired end results of continuous quality improvement efforts.

Accreditation Requirements

In 1997, the NCQA released new standards for managed behavioral health care organizations and the JCAHO released a manual of standards for behavioral health care. Previously, no national measures had existed for assessing the quality and accountability of behavioral health care services. The release of the NCQA and JCAHO standards clearly demonstrates recognition of the importance of assuring that quality performance standards are developed and processed for positive behavioral health care outcomes. Accreditation tends to set the minimum standards and provide a national seal of approval that can be used to assess the value of services offered by providers and managed care organizations (Drissel & Taylor, 1997). Mathios (1997) asserted that "accrediting organizations raise the minimum standards under which treatment delivery and managed care organizations must function" (p. 75). In fact, accreditation is required by corporate and government purchasers who bid out major contracts for behavioral health care services.

Recently, national organizations that accredit various health/social service providers have attempted to develop core competencies that are expected to be consistent across agencies. For example, the Commission for the Accreditation of Rehabilitation Facilities

[Fig. 6.1. Basic framework of health care delivery and context of improvement models. Reprinted with permission from "Using Outcomes Measurement to Improve Quality and Value" (No. 71, p. 113) by E.C. Nelson, in D.M. Steinwachs, L.M. Flynn, G.S. Norquist, & E.A. Skinner (Eds.), Using Client Outcomes Information to Improve Mental Health and Substance Abuse Treatment: New Directions for Mental Health Services. San Francisco: Jossey-Bass. Copyright © 1996, Jossey-Bass Inc., Publishers. All rights reserved.]

(CARF) and Council on Accreditation of Services for Families and Children, Inc. (COA) are in the process of developing cooperative arrangements with the JCAHO to recognize each other's accreditation status. Such arrangements will protect small providers against the increased costs of meeting multiple accreditation requirements.

Accreditation Bodies.

Accreditation bodies also are placing more emphasis on performance and outcomes measures and less on standards that measure processes and structures. The goal of this trend is to encourage administrators and clinicians to review their organization's performance over time and identify opportunities for improvement. The following items describe the major accreditation agencies for behavioral health care in the United States:

Joint Commission on the Accreditation of Healthcare Organizations (JCAHO). The JCAHO is the largest standards-setting and accrediting body for health care in the nation. In 1997, it introduced ORYX, the JCAHO performance measurement system, which requires organizations to collect and submit data about the results of their care for the purpose of reviewing quality performance over time. The accreditation process will require performance measurement in the future (JCAHO, 1997b; Moore, 1997).

Commission for Accreditation of Rehabilitation Facilities (CARF). The CARF is a private, nonprofit, international accreditation commission that was established to improve community-based rehabilitation programs designed for people who are chronically and persistently mentally ill. It, too, is developing performance-based indicators as standards for services.

National Committee for Quality Assurance (NCQA). The NCQA accredits health maintenance organizations (HMOs), health plans, provider networks, managed care organizations (MCOs), and managed behavioral health care organizations (MBHOs). It focuses on performance and outcomes measurement at the administrative level.
The NCQA publishes the Health Plan Employer Data and Information Set (HEDIS), a type of report card that gives purchasers and consumers information based on standardized performance measures organized by domains in which to compare the performance of MCOs. The standards are based on a population-based continuous quality improvement management model that begins with needs assessment at the preventive level as well as the treatment level. Doing so helps promote population-based improvements. To date, most of HEDIS has focused on medical, rather than behavioral, care; only recently was HEDIS expanded to include several behavioral measures. Although HEDIS data collection is not required for NCQA accreditation, most managed care organizations utilize HEDIS measures. Continuous quality improvement is a core principle of the NCQA's Behavioral Health Accreditation Standards (NCQA, 1997c).

Council on Accreditation of Services for Families and Children, Inc. (COA). The COA accredits high quality, often lower cost, less intensive providers of a full range of social service and behavioral health care programs for children and families. It accredits many services for which no other accreditor has standards (Mathios, 1997). The COA is attempting to apply more outcomes measures in its standards (Drissel & Taylor, 1997).

Council on Quality and Leadership in Support of People with Disabilities. The Council on Quality and Leadership in Support of People with Disabilities accredits organizations that provide services and support to this population. The council uses a unique accreditation process in which staff meet with individuals with disabilities to identify how they define the outcomes expected from the services and support provided them. The council then provides feedback and consultation to the organization in the form of steps toward quality improvement.
Accreditation is given to those organizations that demonstrate responsiveness to clients' priority outcomes, rather than compliance
with organizational processes. The purpose of this approach is to focus organizations on the design of best practices to optimize outcomes (Mathios, 1997).

Common Trends and Barriers to Accreditation Guidelines.

Drissel and Taylor (1997) suggested a trend in which accrediting agencies review continuous quality improvement against two criteria: (a) "how individual components are involved in setting and monitoring the performance of their part of the overall system" and (b) "how information on performance is fed back to the organization to help maintain or improve the overall quality of service and operations" (p. 13).

The IOM's Committee on Quality Assurance and Accreditation Guidelines for Managed Behavioral Health Care (1997) pointed to the burden of accreditation requirements that often overlap in systems performing multiple functions. Namely, the cost is prohibitive for organizations to obtain more than one type of accreditation to satisfy employers, states, and other stakeholders, which makes multiple accreditation unrealistic. The committee suggested that such issues call into question the utility and validity of accreditation. In sum, "the accreditation industry is faced with pressure to focus its standards on the relevant issues, collaborate with similar organizations, and consolidate the multitude of accreditation standards to reduce overlap and redundancy" (pp. 214-215).

Supporting Marketing Efforts

Using outcomes measures to support marketing efforts is becoming the norm, rather than the exception, in the behavioral health care field. In today's market-driven environment, consumer-patients are looking for ways to shop more effectively for their health care needs. Outcomes systems provide tools that can help consumers get the most for their money.
For instance, report card systems that compare one provider or health plan with another provide consumers with information about what types of results they may expect from a given provider or plan, which they can then use to comparison shop. Organizations that participate in report card systems have taken a large step toward demonstrating their accountability to consumers. When results are tracked over a period of time and needed improvements are made, this information can be communicated to the market. Providers that meet the performance measurement portion of accreditation and regulatory requirements can market this benefit to consumers in brochures, magazines, trade shows, and public forums. Certainly, individual organizations will have different opportunities for using outcomes measures to support their marketing efforts; nonetheless, the benefits of implementing outcomes management systems can and should be communicated to consumers.

Questions Commonly Addressed by Outcomes Systems

As noted, some accrediting organizations and benchmarking collaboratives are striving to implement outcomes systems that have common data elements and thus make reference comparisons possible. However, little standardization is in place at the data-element level. Different systems continue to measure data in different ways. Most are, however, attempting to answer the same basic questions.


How Much Improvement Has Occurred as a Result of Treatment?

All stakeholders are interested in improving the conditions of patients: Payers want to know whether patients needed treatment and, if treatment took place, how much change occurred. Patients, along with their families, friends, and employers, want to know the level of improvement. Clinicians want to monitor the level of improvement throughout treatment. Outcomes measures can be used to satisfy all these needs by monitoring patients throughout treatment to determine if improvement is taking place and, if so, to what degree. Outcomes measures also can indicate whether patients are not improving or are deteriorating, or whether they have improved to the point where treatment is no longer needed. Clinicians can use all of this information to better plan and manage the treatment process. Patients and others interested in their well-being will also find this information useful, as will potential patients and payers who want to compare the results of one provider's treatment with those of another. Nelson (1996), a proponent of using outcomes measures to improve the value of mental health care, believed there is currently a high demand from society to prove that treatment is beneficial. Nelson pointed out that "because of the social prejudice surrounding mental illness, the need to demonstrate striking proof of treatment benefit is heightened" (p. 122).

What Type(s) of Patients Benefited Most from the Treatment, Organization, and/or Individual Provider? What Type(s) Benefited Least?

Smith (1996) stated that there are three objectives of patient outcomes assessment: to assess patient characteristics, to assess treatment processes, and to assess the effects of routine care in order to improve the outcomes produced by mental health and substance abuse treatment (pp. 59-60).
In addition, Smith noted that outcomes assessment is being used to improve care by linking providers and patients with various treatment approaches and outcomes. Continued research is needed to determine the most effective means of matching patients with specific characteristics to those providers or modalities that will best serve their needs. Doing so continues to be a significant goal of the industry's measurement efforts. As discussed earlier, Nelson (1996) proposed five models for producing "better-value health care": guidelines, protocols, benchmarking, feedback/outcome measures, and feed forward (see Fig. 6.1). Yet, according to Nelson, "the gap between the potential gains and those realized to date is staggering" (p. 120). This statement applies to all the process improvement methods currently in place in the mental health field, particularly those that attempt to match patients who have specific characteristics and diagnoses with specific providers and treatments. What Nelson referred to as feed forward is the model that could bridge this gap. Although this is the least well known of the five models, "it has the potential to be the most powerful and helpful, because it can be used to improve care of individual patients immediately, as well as to create an information feedback loop to improve the care for future patients" (p. 119).


The IOM's Committee on Quality Assurance and Accreditation Guidelines for Managed Behavioral Health Care (1997) cited the need to identify which treatment modalities are used and to track how they impact outcomes for patients with various characteristics. More successful and cost-effective outcomes will be possible when research can help answer questions regarding which types of patients are best served with which treatment processes and protocols.

How Satisfied Are Patients with the Products and Services They Receive?

Measuring consumers' satisfaction with the products and services they receive has become commonplace for successful businesses. The notion that consumers have values and perspectives worthy of soliciting and acknowledging can be traced to the 1950s, when Drucker (1954) suggested to the business world that the customer was "the reason for being." The health care industry has only recently embraced the idea that consumer-patient satisfaction is a crucial factor in the perceived value of services offered and that the consumer-patient deserves to have a voice in health care decision making. Today, consumer-patients can make choices about their health care purchases and are demanding to become involved in the decision-making process. Thus, measuring how satisfied they are throughout the treatment process is an essential ingredient of health care in a competitive, industrialized delivery environment. Patients, along with their family and friends, who are satisfied with the quality of care they have received are likely to recommend the provider to others, as the occasion presents itself. They are also likely to return to the same provider, if the need arises. Perhaps just as importantly, satisfied patients are less likely to complain to others about the services they received. Surveys are often used to assess overall patient satisfaction.
To be useful, a survey should cover a wide range of issues, including access to services, level of patient involvement in decision making, respect for individual and cultural differences, and the degree to which the care delivered was acceptable, met expectations, and produced desired results. Satisfaction surveys generally show that consumers have positive feelings about behavioral health care services (Elbeck & Fecteau, 1990; Polowczyk, Brutus, Orvieto, Vidal, & Cipriani, 1993). The main issue to consider when creating patient questionnaires is whether the information gathered will make it possible to modify services in ways that will improve customers' levels of satisfaction. It is not enough to find out whether people are satisfied, or even which issues prompt their complaints. Additional information about patients' expectations is needed to consider alternatives and thus improve the quality of care. Survey questions must be worded and ordered purposefully to elicit this type of information from patients (MacStravic, 1991).

What Is the Value of the Services Offered?

The perception of value is different for all of the major stakeholders in behavioral health care. For instance, does the consumer-patient value the changes that have occurred from receiving services? Do family members perceive that positive changes have taken place? Do employers see more productive employees? Do patients at one clinic function better than those at another? Does the payer think the services provided were worth the investment?


TABLE 6.2
How Stakeholders Can Use Outcomes Measures

Users               Uses
Consumers           Track progress; discuss options with clinician; identify preferred provider
Payers/Purchasers   Assess effectiveness of provider to support referrals; track progress
Clinicians          Support treatment decisions; track progress; assess effectiveness of methods
Management          Support staffing and resourcing decisions; compare modalities; assess effectiveness

Outcomes measures can encompass a number of domains. Nonetheless, the primary goal of treatment and thus the primary measure of outcome should be whether a reduction in symptoms, disability, risk factors, and/or social consequences of a given disorder occurred as a result of treatment. To what level have functionality and quality of life improved? Given stakeholders' diverse perspectives on the value of services provided, it follows that they will all have different uses for outcomes information (see Table 6.2). Thus, one of the major tasks of designing and implementing an outcomes system is to identify the needs and relative value that outcomes measures may have for various stakeholders.

How Much Is the Cost of Treatment Offset by Savings in Other Areas of Patients' Functioning?

Payers are primarily interested in how much care costs and the exact services provided for that cost. Obviously, then, they are also interested in measures of controlling cost. One promising measure has come from research regarding the use of psychotherapy to relieve patients' distress and thus their need for medical services. This concept is known as medical cost offset. Some of the original research in this area was conducted by Follette and Cummings as early as 1967 with Kaiser Permanente's department of psychotherapy. They found that the use of medical resources by persons with emotional distress was significantly higher than that by the average health plan user.
Follette and Cummings also found that there were significant declines in medical resource utilization rates for patients who received psychotherapy, as compared with those patients in the control group who did not receive psychotherapy (as cited in Cummings & Sayama, 1995). In 1981, Cummings and VandenBos summarized Kaiser Permanente's 20 years of research into this issue, concluding that the absence of psychotherapy can "leave patients with little alternative but to translate stress into physical symptoms that will command the attention of a physician" (as cited in Cummings & Sayama, 1995, p. 25). Other researchers have replicated these findings. The Dean Foundation for Health, Research, and Education examined the use of general medical services by depressed and nondepressed patients. They found that the general medical costs of patients with depression can be reduced if those patients are diagnosed and adequately treated


("Depression Treatment," 1997). In 1989, Fiedler and Wight reported that Medicaid patients who received mental health intervention in addition to being hospitalized for physical maladies saved a cumulative average of $1,500 over the following 2.5-year period (as cited in "Mental Health Benefit," 1993). And Holder and Blose reported the results of a 3-year study conducted by Aetna, which saw the average subject's health costs drop from $242 the year prior to receiving mental health treatment to $162 per year for the 2 years following such treatment (as cited in "Mental Health Benefit," 1993). The idea that providing behavioral health care treatment can decrease total health care costs, even when the cost of the behavioral health care intervention is included, is becoming widely accepted. Continued research into this important paradox in the health care industry should be a critical component of any outcomes management system.

Measuring Outcomes

There are three important questions to ask about measuring outcomes: What outcomes are of interest? Who will provide and use the information? And when should information be gathered and disseminated? Each question is discussed briefly here and in more detail later. In deciding what outcomes are of interest, it is important to realize that the area of mental health and substance abuse covers a broad range of outcome domains. To reflect the effects of treatment in a comprehensive and valid way, a system should include a broad range of outcome dimensions (Docherty & Streeter, 1996). Domains beyond treatment effectiveness (e.g., consumer satisfaction) also must be considered. Answering the question of who requires determining the source or sources of the information. Depending on the domains of interest, collecting information from multiple sources may be necessary; for example, subjective quality of life measures can only be collected from individual patients.
Moreover, collecting information from multiple sources will increase the depth of the system. If data on a given domain are available from multiple sources, collecting them will increase the robustness of the data by providing varying perspectives on that domain. Finally, determining when measures will be collected is equally important. The first decision related to the timing of measures is to determine at which time points (e.g., baseline, end of treatment, short-term follow-up, long-term follow-up) data will be collected. In order to capture the complete spectrum of effect, baseline and multiple subsequent time points must be measured. Some treatments have only long-term effects, whereas others may have short-term effects that diminish over time. Beyond determining time points, it must be decided when, relative to the beginning or end of treatment, these measurements will be taken. Answers to these three questions about outcomes will depend, in part, on the goals of the system, the population of interest, the resources available (both time and money), and the business flow. In addition, there are specific issues related to each area of what to measure, from whom to measure, and when to measure.

What to Measure

The domains measured as outcomes and reported in the professional literature highlight the multidimensionality of mental health and substance abuse treatment. It is well accepted that mental illness and substance abuse affect patients in many ways beyond


the symptoms of the primary illness. Although this may be true of most illnesses, it is especially true of mental illness and substance abuse. Thus, a true measure of the effectiveness of treatment includes assessment not only of the clinical status of the disorder but also of daily functioning, quality of life, and interpersonal relationships. This emphasizes the need for measurement of multiple domains within the area of clinical outcomes. As the move to achieve standardized data continues, measurement will need to move from collection of satisfaction data only to collection of more clinically related outcomes and a broader range of outcome domains (IOM, 1997). Most commonly measured outcomes can be categorized into one of five broad domains: clinical status, functionality, health-related quality of life, satisfaction, and resource use/treatment utilization (Coughlin, Simon, Soto, & Youngs, 1997). Each of these domains is considered in more detail later.

Although such a discussion is outside the scope of outcomes measures, it is important to note here the need for additional basic data. These core data should include patient identifiers and descriptors. Regardless of which domains are collected, supplemental information should be collected on all patients in the system to serve the following four main goals: to describe the population, to identify subpopulations of interest, to provide data for predictive modeling, and to provide data for risk adjustment. If one of the goals of a system is to assess the impact of treatment process on outcomes, additional information related to treatment characteristics must also be included.

Clinical Status. Clinical status is a broad domain that includes not only the symptoms of the disorder, but also areas such as severity, stability, and complications.
The most common and perhaps simplest measure of clinical status is that of symptoms, which may include evaluation with respect to frequency, severity, and duration (Coughlin et al., 1997). Measures of symptoms simply evaluate the events or attributes due to the disorder experienced by the patient, whereas other measures of clinical status evaluate the severity of the symptoms, the stability of the patient's status, and any clinical complications that may arise. Severity is measured as the duration of symptoms produced by the illness and the corresponding impairment in life. The same level of severity does not imply the same level of symptoms, nor does the same level of symptoms imply the same level of severity (Coughlin et al., 1997). Change in symptom severity is often the most important and observable short-term or immediate measure of effectiveness in mental health treatment. In addition, measures of symptoms typically are objective in nature and can be collected from independent evaluators, often in a simple checklist format. Stability is a measure of the degree to which the patient's status will not change or fluctuate with respect to the disorder. Whereas severity and symptoms tend to be static measures (i.e., what is the clinical status at this moment), stability can be thought of as an expectation of severity or symptoms in the future. Complications consist of any undesirable results that occur while the patient is being treated. These results are not necessarily limited to those that occur directly from treatment or the treatment setting. A simple example would be the side effects experienced from taking a certain medication.

Functionality. Functionality can be thought of broadly as individuals' ability to maintain their independence and roles within life (Andrews, Peters, & Teesson, 1994). More specifically, being functional means being able to perform regular and expected tasks and responsibilities. Exactly what comprises regular and expected functions must


be defined. As with clinical status, the domain of functionality is multidimensional. In general, functionality can be divided into three areas: physical, social/interpersonal, and role (work or school). Physical functionality is defined as the ability to perform usual physical activities, such as climbing stairs. Clearly, when multiple age groups are combined, especially people from the geriatric population, the expectations of normal functioning vary. The geriatric individual may not be able to perform typical adult physical functioning due to complications related to age, not clinical condition. Therefore, both the population of interest and the normal expected functioning of that population must be considered when measuring physical functionality. Social/interpersonal functionality is defined as an individual's ability to interact and maintain relationships with others, including friends, family, and support networks (Coughlin et al., 1997). Unlike physical and role functioning, expected social and interpersonal functioning is more consistent across age groups; however, this area of functionality may vary across cultural groups. Role functionality is defined as people's ability to perform their primary role, typically that of student or employee. Role functioning is difficult to measure, because it has different meanings and priorities across and within age groups. For example, the primary role function of an adolescent is to attend school, whereas that for an adult is typically to be employed. It may be unclear whether the priority an adolescent assigns to attending school is commensurate with that which an adult assigns to being employed (e.g., the adult may be financially dependent on continuing that role). Measuring role functioning is further complicated by the relatively high rate of unemployment within the population of adults with mental health and substance abuse problems; thus, this role is applicable only to a subset of the adult population.
The inclusion of a geriatric population further complicates the ability to measure role functioning, which is even less consistently defined within the elderly population. Expected functions related to physical, social/interpersonal, and role functionality clearly differ across age groups and populations, which may necessitate different tools for assessment. Differentiating patients' ability from what they actually do is not necessary when measuring functionality, as mental disorders affect both. Moreover, the outcome of interest is truly the level of functioning achieved (Andrews et al., 1994). Functionality is important not only to patients and the people around them but also to society as a whole. Increasing patients' abilities to care for themselves, physically and financially, and to interact with others decreases the responsibilities of caregivers and the community.

Quality of Life. As with functionality, quality of life is a multidimensional domain, encompassing the physical, cognitive, affective, social, and economic areas of a patient's life. When asked to list the factors that influence quality of life, patients identified physical and mental health, living environment, income, happiness, and meaningful work (Cook & Jonikas, 1996). Although quality of life overlaps with functionality (Coughlin et al., 1997) and clinical status, they differ in that quality of life measures typically emphasize the patient's subjective outlook and values as they relate to functioning and clinical or health status. Health-related quality of life measures evaluate quality of life with specific regard for the disorder being treated. Some measures of quality of life emphasize the objective view of what resources are needed to achieve desired needs. Measures of burden experienced by the care provider, for example, can be included in the domain of quality of life (Andrews et al., 1994).


Measures of quality of life and general health concepts are not well understood, have poor specificity, and generally are not well accepted by the scientific community. Even so, quality of life measures are very important to the patient, because they represent individuals' beliefs about their own status (Dornelas, Correll, Lothstein, Wilber, & Goethe, 1996). Lansky noted that although health status measures may have limited use as outcomes, they can be useful as predictive or risk-adjustment variables (Lansky, Butler, & Waller, 1992).

Satisfaction. Although the patient has always been viewed as providing important information related to mental health services, it was not until Attkisson and Zwick (1982) published the Client Satisfaction Questionnaire (CSQ) that satisfaction became an accepted outcomes domain. In recent years, satisfaction has become increasingly important in the health care marketplace, both to patients and payers (AIS, 1996a). Satisfaction is, by definition, subjective, depending on individual values and perspectives. Moreover, measures of satisfaction can only be provided by the patient. In general, though, the domain of satisfaction measures whether a patient's expectations were met (Lyons et al., 1997). This should include measures of whether patients received what they wanted, when they wanted it, and whether what was received met their expectations. Satisfaction tends to be an easy area in which to collect information, for two main reasons: (a) the goal of information collection is clear, and (b) some basic commonality exists in measuring satisfaction, regardless of consumer differences. For example, timeliness of service and courtesy of staff are important aspects of consumer satisfaction in all service-related areas, not just mental health. The dimensions of satisfaction as described by Lyons et al.
(1997), based in part on the satisfaction measure developed by the Rand Corporation for the Medical Outcomes Study (Ware & Hays, 1988), are summarized in Table 6.3. These dimensions are consistent with those outlined by the Mental Health Statistics Improvement Program (MHSIP) of the Center for Mental Health Services. MHSIP (1996) identified the four areas of service of most importance to consumers: access, appropriateness, outcomes, and prevention. (Prevention, although important to consumers, is not readily measured as an outcome by consumers; thus, its omission is acceptable.)

TABLE 6.3
Dimensions of Patient Satisfaction

Dimension                            Description
Technical quality                    Expertise of a service, including policies and procedures
Competence                           Skill, level of training, knowledge of provider
Interpersonal quality                Personable qualities of the people encountered; focus is on people, not professionals
Access                               Ease of obtaining desired services; more objective than other areas
Availability and choice of service   Choices available and level of involvement in choice; similar to access
Duration of care                     Duration of care was as expected; too little or too much can affect satisfaction
Benefit/value                        Gain from service, usually improvement in clinical outcomes; assess gain in conjunction with cost; data are limited

Some practitioners may argue that patients' satisfaction with their treatment is not important to its effectiveness. Yet, research findings suggest that satisfaction and effectiveness


are closely related. For instance, Pickett, Lyons, Polonus, Seymour, and Miller (1995) found satisfaction to be a strong correlate of perceived clinical benefit. Furthermore, Andrews et al. (1994) suggested that satisfaction may be an indicator of long-term outcomes, as satisfaction with services can influence care-seeking behavior and adherence to treatment regimens.

Resource Use/Treatment Utilization. The domain of utilization typically measures the amount of health care services, medical and behavioral, used for a particular episode of care, but it can also include measures of concurrent services and end-of-treatment level of care (Lyons et al., 1997). This domain tends to be most important to payers and providers in assessing costs, with respect to both time and money. When combined with other data, such as measures of change in clinical status, resource utilization measures provide the information necessary to begin assessment of efficiency and cost offset. A measure of the health care services involved within a single episode of care is one of the simplest types of utilization measures. Furthermore, data such as length of stay often are measured to properly analyze other outcomes. Failure to measure length of stay for a given episode of care makes it difficult, if not impossible, to interpret other outcomes (e.g., clinical improvement; Lyons et al., 1997). Although measuring utilization outside the current episode of care is desirable and may provide the most valuable information, the deterrents to doing so are numerous. For example, properly measuring rehospitalization following treatment requires collaborative efforts and combining data from numerous facilities and services. Thus, a number of issues will arise, including legal and ethical issues related to patient confidentiality and practical issues in combining multiple databases.
In sum, the process of measuring utilization outside the current episode is time-consuming, complicated, and expensive (Lyons et al., 1997).

From Whom to Measure

Clinical outcome data can be collected from three primary sources: the clinical staff, the patient, and sources collateral to the patient (i.e., sources connected to the patient). Each will provide important data and, at times, may be the only source of data in a given domain. Moreover, sources will provide varying perspectives and rate the outcome as it relates to them. For example, when rating response to treatment, the therapist will rate the degree of change due to treatment, patients will focus on their experienced change in subjective state, and patients' friends or relatives will rate the change as it directly affects them (Docherty & Streeter, 1996). Clearly, it would not be surprising for the ratings of response to treatment to vary among these three sources. Although these ratings may be contradictory at face value, when considered from the vantage point of the source of the information, each will provide valid and important information. In fact, when it is possible to obtain multiple sources of information, doing so is preferable to having just one source. Each source of information has strengths and weaknesses, which vary depending on the domain being measured (Docherty & Streeter, 1996). The following sections address each type of source in more detail.

The Clinical Staff. The clinical staff may include not only therapists but anyone who provides clinical services. This staff is considered the best source of information for objective clinical measures, such as symptoms or disease severity. Although the


clinician generally can be expected to be more objective than the patient, the clinician's objectivity in evaluation over the course of treatment may be questionable. Obtaining a second opinion from an external observer would theoretically eliminate this concern; however, in practice, obtaining such information is expensive and difficult. The one exception may be an independent evaluation from a case manager, if the usual business flow includes such an individual (Lyons et al., 1997). Clinical staff information is typically available in three forms: administrative records, including insurance records and encounter files; medical records, including clinical notes and laboratory tests; and specifically designed surveys, including research study forms and other documents developed to collect outcomes data (McGlynn, 1996). As outcomes measures continue to be required for accreditation purposes, information from clinical staff will be increasingly available. However, as is true whenever data are used for a purpose other than that for which they were collected, clinical data collected from sources other than surveys share a number of inherent problems when used in an outcomes system: lack of standardization, inability to link across sources, uneven quality or detail, and lack of important elements (McGlynn, Damberg, Kerr, & Schenker, 1996). Although collecting data from clinical staff through surveys allows for customizing the information collected, doing so typically is more expensive, requires additional work by clinical staff, and may yield a low response rate.

The Patient. It is generally agreed that patients are a reliable source of information for issues related to their symptoms, functioning, and satisfaction (Docherty & Streeter, 1996; Panzarino, 1996). Patients also can provide essential data about their thoughts, feelings, and behavior that are often unavailable from other sources.
However, due to the nature of mental health disorders, collecting accurate information from patients is especially complicated in this area of health care. The results will depend on the severity and type of disorder; for instance, the ability of an individual with a severe mental health problem (e.g., psychosis) to provide valid information (especially at the initiation of treatment) may be questionable (McGlynn, 1996). An additional complication related to patient-reported data is bias due to false or incorrect reporting. False reporting occurs when a patient knowingly reports incorrect data. Incorrect reporting occurs when a patient does not provide correct data, perhaps due to an inaccurate memory (Azar, 1997) or misinterpretation of a question. To minimize such bias, extra care must be taken in collecting data from patients. Such care can take the form of providing an appropriate setting for answering questions, giving clear instructions, or scheduling data collection at a time when the patient will be as able as possible to provide accurate information.

Collateral to the Patient. As noted earlier, collateral sources are those individuals who are connected to patients and have a stake in their treatment, such as relatives, spouses, friends, and employers. Because mental illness and substance abuse often affect the people close to patients (in addition to the patients themselves), these people are able to provide additional information related to outcomes. Patients' family and friends often are most interested in the same things as the patients: their functional status, quality of life, and satisfaction. However, the collateral source's assessment of the patient's level of functioning may differ from the patient's own assessment (McGlynn, 1996). Actually, an evaluation of the patient's functionality with respect to employment or social functioning may be best conducted by someone other than the patient, such as an employer, family member, or friend. Such an individual will
be able to provide a more objective evaluation than will the patient (providing, of course, that the collateral source is close enough to the patient to provide valid information). Although collateral information brings an added dimension to the outcomes measured, obtaining consistent information from collateral sources is challenging. The first issue relates to the availability of data. Some patients may have no collateral sources available, whereas other patients may be unwilling to provide contact information for such outside sources. Even when information about collateral sources is provided, response rates from these individuals are typically lower than those obtained from patients. Collateral data also have inherent variability based on the relationship of the person to the patient. Just as a patient and a clinician will have different perspectives on the same outcomes, so will a spouse and a sibling providing information about the same individual. Even the amount of time the collateral source spends with the patient will vary and thus affect the accuracy of the information provided. Yet another complication is determining how to use contradictory data when quantitative information is collected from collateral sources (e.g., relapse rates in substance abuse patients). Making this determination is especially difficult when collateral data are not available for all patients. Even when all these complications are considered, collateral sources provide important data that should not be excluded. Adding basic information about the collateral source's formal and practical relationships to the patient will increase the value of the data collected. Although collecting collateral data is ideal, it is difficult in practice (Lyons et al., 1997). One of the most practical and useful means of collecting collateral information is through a satisfaction instrument.
When to Measure

Determining when to measure outcomes depends on the outcomes being measured, the intervention being examined, and the logical model of how one affects the other (McGlynn, 1996). In general, outcomes should be measured as early as possible, when it is meaningful and convenient, and on a fixed schedule. The use of multiple timepoints is desirable in theory but increases the burden on respondents, jeopardizes compliance rates, and complicates data management (Lyons et al., 1997). However, conducting a baseline measure and multiple measures following treatment is essential for demonstrating valid change and effectiveness.

Baseline. The timing of baseline assessments of patients with mental health and substance abuse problems is complicated. Ideally, baseline data should be collected prior to treatment (e.g., at the time of intake) for the purpose of establishing an accurate pretreatment status. But sometimes data collection may have to be delayed until the patient is stable; in these cases, the immediate effect of treatment (i.e., stabilization) will not be represented in the patient data. Following a delay, the baseline assessment should be conducted as soon as the patient is stable enough to provide valid information. When assessing inpatients, allowing time for the baseline assessment is necessary; however, a maximum time from intake should be established (say, 24 to 48 hours from intake), after which data will not be used to assess pretreatment status.

End of Treatment (Short-Term Follow-up). End-of-treatment measures evaluate the immediate effects of treatment interventions. In inpatient settings, these assessments can be built into discharge practices and collected routinely. In other settings, however,
final treatments often are not planned, making end-of-treatment evaluations difficult. This is especially true of data collected from either the patient or a collateral source. Moreover, some outpatient treatments are long term, which may make it difficult to conduct end-of-treatment evaluations in time-limited studies (Lyons et al., 1997). Such problems can be avoided by scheduling assessments at regular times during treatment (e.g., monthly). Some domains, such as satisfaction, need to be measured only at the end of treatment; even so, there are still important issues related to the timing of such measures. Although collecting satisfaction results at the time of discharge from a facility or at a final treatment provides the highest response rate, the results tend to have an upward bias, indicating higher satisfaction than the true level, due to lack of anonymity and fear of retaliation. To maximize the response rate and minimize bias, Dornelas et al. (1996) suggested mailing the patient a questionnaire 4 to 10 days after discharge.

Follow-up. As noted earlier, the ability to show an immediate effect of treatment is of great importance in assessment. In today's health care marketplace, however, many consumers and payers also require a demonstration of sustained effectiveness, as shown by follow-up data. Some domains related to effectiveness require long-term follow-up in order to show effects. For example, although interpersonal functionality may change with improvement in symptoms, functionality will continue to improve after symptom recovery. Demonstrating effectiveness in domains such as long-term functionality, future resource utilization, and relapse rates requires long-term follow-up assessment. Whereas collecting data for baseline (intake) and end-of-treatment (discharge) measures can be built into usual clinical practice and business flow, data collection for follow-up measures is inherently more complicated.
Collecting follow-up data involves numerous logistical and clinical issues. For example, determining the optimal time to collect follow-up information is complicated. Although it is desirable to show the longest sustained effect possible with treatment, the greater the time between treatment and follow-up, the lower the expected response rate, the greater the chance of external factors affecting the outcome, and the lower the ability to attribute change to treatment. Thus, without a sufficient sample size, long-term follow-up (i.e., more than 6 months) may not be feasible, and flexibility in determining the timing of the follow-up may be limited (Dornelas et al., 1996). The population under consideration also will affect the decision of when to follow up. Follow-up assessment on measures such as relapse, for instance, will not be meaningful if it is not timed appropriately for the population being studied. The nature of the population will affect the expected response rate as well and thus should be considered in determining the timing of the assessment (Dornelas et al., 1996). For instance, it can be costly or even impossible to locate a transient population for the purpose of conducting a 6-month follow-up; a 30- to 60-day follow-up may be appropriate for this population. However, a population from an employee assistance program (EAP) may be available for a 6-month follow-up. As discussed earlier, most systems of outcome measures use the episode of care as the framework for when to measure outcomes. However, decisions related to the timing of outcome measures could be affected by a newer concept, the episode of illness, which may encompass many episodes of care and cut across a number of levels of care and various providers. As continuum-of-care concepts and complete-care provider networks continue to grow, so will measurements related to the episode of illness.
If possible, an outcomes system should allow for measurement of episode of care as well as progress throughout episode of illness (Docherty & Streeter, 1996).
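The episode-of-care versus episode-of-illness distinction lends itself to a simple data model. The sketch below is purely illustrative: the class names, fields, and domain labels are this example's assumptions, not part of any published system. The idea is that each episode of illness aggregates one or more episodes of care, so an outcome domain can be summarized within a single episode of care or pooled across the whole illness.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Measurement:
    """One outcome assessment (e.g., a symptom scale score)."""
    when: date
    domain: str      # e.g., "symptoms", "functioning", "satisfaction"
    score: float

@dataclass
class EpisodeOfCare:
    """One course of treatment at one level of care (e.g., an inpatient stay)."""
    level_of_care: str
    admit: date
    discharge: Optional[date] = None
    measurements: List[Measurement] = field(default_factory=list)

@dataclass
class EpisodeOfIllness:
    """May span several episodes of care across levels of care and providers."""
    patient_id: str
    episodes: List[EpisodeOfCare] = field(default_factory=list)

    def all_measurements(self, domain: str) -> List[Measurement]:
        """Pool one domain's measurements across all episodes of care,
        in time order, so progress can be tracked across the illness."""
        pooled = [m for ep in self.episodes
                  for m in ep.measurements if m.domain == domain]
        return sorted(pooled, key=lambda m: m.when)
```

With a structure like this, an end-of-treatment score for a single episode of care and a trajectory across the whole episode of illness come from the same stored data, which is the flexibility the passage above calls for.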

Steps in Designing and Implementing an Outcomes System

Designing and implementing an outcomes system must begin with identifying its purpose and goals. "Why is the system being implemented?" is the question that needs to be asked, discussed, clarified, and answered. Consider that the nature and design of an outcomes system implemented primarily to meet regulatory body requirements will be considerably different from one created to demonstrate accountability to multiple stakeholders. Moreover, it is possible for the stakeholders within one health care system to have different goals. Some may want to measure the outcomes of the tasks and processes performed; others may want to know consumer-patients' perceptions of the treatment delivered; and still others may want to know how often and how well the processes in place are performed. Closely related to the question of why the system is being developed is the question of how the information will be used. More specifically, is the outcomes management system a subset of quality improvement efforts? Will reports be used to demonstrate accountability to the external marketplace or to meet regulatory requirements? Or will information from reports be used to support internal continuous improvement efforts? Is a set of measures being established to further the scientific efforts of the professional community regarding the effectiveness of new treatments? (Solberg, Mosser, & McDonald, 1997). The answers to these questions are so tightly aligned with the question of purpose that it is difficult, if not impossible, to separate the two. The only way to approach these questions is to solicit input from those stakeholders who will be gathering the data and using the information once it has been collected and compiled. How does management want to use the information? What types of resistance are present in the areas where the data will be collected?
What purpose will outcomes measures have for the people responsible for data quality? Because different stakeholders likely will have different needs and the needs of these various stakeholders will need to be served by the same system, it is vital that a clear, written definition be produced in the form of a design document, identifying what each stakeholder expects to achieve. Those individuals who are designing the system need to identify the various stakeholders' needs, prioritize them, and then plan and communicate whose needs will be met, as well as how and when it can be done. This design document will serve as the blueprint for developing and implementing the outcomes management system; as such, it should be considered a working document that can be changed as the process unfolds. In finalizing the design document, it is important to consider whether individual stakeholders' needs are aligned with the overall vision of top management. If that vision is not clear or has not been adequately communicated, it will be necessary to resolve these issues. The overriding purpose of the outcomes system must be clearly stated before it can be determined what part each stakeholder will play in achieving that purpose. To meet accountability, research, and continuous improvement needs, it may be necessary to establish different measures and assign responsibility for them to different functions within the organization. A board of directors or steering committee might be put in place to coordinate and facilitate the efforts of these various functions, as required. The size and complexity of the organization will dictate whether such delineation of responsibility is needed. Likewise, how much the organization can successfully manage
and resource is a determining factor in making preliminary decisions about who will receive what information once the outcomes management system is in operation. Table 6.4 provides a summary of the types of purposes that can be met by various measures.

TABLE 6.4
Characteristics of Measurement for Improvement, Accountability, and Research

Who? Audience (customers)
Improvement: Medical group: quality improvement team, providers and staff, administrators
Accountability: Purchasers, payers, patients/members, medical groups
Research: Science community, general public, users (clinicians)

Why? Purpose
Improvement: Understanding of (a) process and (b) customers; motivation and focus; baseline; evaluation of changes
Accountability: Comparison; basis for choice; reassurance; spur for change
Research: New knowledge, without regard for its applicability

What? Scope
Improvement: Specific to an individual medical site and process
Accountability: Specific to an individual medical group and process
Research: Universal (though often limited generalizability)

Measures
Improvement: Few; easy to collect; approximate
Accountability: Very few; complex collection; precise and valid
Research: Many; complex collection; very precise and valid

Time period
Improvement: Short, current
Accountability: Long, past
Research: Long, past

Confounders
Improvement: Consider but rarely measure
Accountability: Describe and try to measure
Research: Measure or control

How? Measures
Improvement: Internal
Accountability: External and usually involved in the selection of measures
Research: External and at least prefer to control both process and collection

Sample size
Improvement: Small
Accountability: Large
Research: Large

Collection process
Improvement: Simple and requires minimal time, cost, and effort; usually repeated
Accountability: Complex and requires moderate expertise and cost
Research: Extremely complex and expensive; may be planned for several repeats

Need for confidentiality
Improvement: Very high (organization and people)
Accountability: None for the objects of comparison; the goal is exposure
Research: High, especially for individual subjects

Note. From "The Three Faces of Performance Measurement: Improvement, Accountability, and Research," by L. I. Solberg, G. Mosser, and S. McDonald, 1997, Joint Commission Journal on Quality Improvement, 23, p. 141. Oakbrook Terrace, IL: Joint Commission on Accreditation of Healthcare Organizations. Reprinted with permission.

Measures of Quality Improvement, Accountability, and Research

Quality Improvement. Quality improvement measures focus on processes that consist of a complex series of linked steps. By examining processes in improving care, rather than people, fear of blame is removed; thus, everyone will be able to concentrate on improvement rather than defensiveness. As summarized by Solberg et al. (1997), "The most powerful improvements usually come from an understanding of processes and
from efforts to systematize them. . . . A focus on process also forces one to pay more attention to the desires of the customer and to the use of a data-driven scientific approach to change rather than a reliance on hunches and tampering" (p. 138). Individual measures should be specific to the process being implemented and involve key steps in the process. The data gathered must be specific to a group, facility, or unit with an individual implementation scheme. Data from multiple groups or sites usually are not useful for improvement purposes, because the different entities have their own implementation needs.

Accountability. Accountability measures are useful for external groups, such as managed care companies, employers, and consumer groups. Yet any one of these organizations also may be interested in how it compares with other similar organizations. Therefore, some overlap between accountability and quality improvement initiatives may be acceptable, because the comparison may initiate a move for internal improvement. Outcomes measures for accountability are not intended to be confidential and thus may produce fear and defensiveness. Fear that the information will be misused is common among staff in an organization. Moreover, staff may become defensive, trying to show that the information is wrong or that their group is different, when too much emphasis is placed on outcomes for accountability, particularly in the early stages of implementing an outcomes system. In contrast, measures for quality improvement purposes should not be released to the public, because doing so could be damaging to improvement efforts in the organization. (A reasonable exception to this policy is when some of the information provided by the quality improvement process is needed for external groups.)

Research. The purpose of outcomes measurement for research is to produce new knowledge of general value within the field.
Clinicians are trained in the research process and the application of information from research to a real-world setting. However, the target populations for such research would be limited compared to those used for measuring process improvement or accountability, as a more rigorous study design would need to be put in place. Outcomes measurement does not have the rigor of efficacy studies, and a clear understanding of which measures will be used for what purpose (improvement, accountability, or research) is essential to success.

One way to gather information from various stakeholders is to involve them in focus groups. These groups can be used throughout the design and development process to examine the overall purpose of the system, to formulate specific questions that various stakeholders want answered by the outcomes system, or to plan the actual implementation (i.e., who does what, for whom, etc.). The following are some of the questions focus groups might address:

Who will be involved? Specifically, who will provide the data? Who will use the information? Who will be responsible for the results?
What key questions will be answered, and for what group of stakeholders?
What tools or measures will be used?
When will data be collected, and how will they be processed?
What is the best way to incorporate data collection into the current workflow?

Employees who are involved in decision making will have ownership of the process and be more likely to support the changes needed for the outcomes management system to be successful.

In designing an outcomes management system, decisions must be made to prioritize the measurement needs of various stakeholders. In doing so, two important (albeit obvious) limitations should be considered: "Not everything can be measured at the onset," and "A perfect solution is not possible" (Sauber, 1996, p. 13). Acknowledging these limitations will allow those involved to narrow the scope of the system, making it possible to achieve some of the goals. The most important point to keep in mind is to begin measurement with the most important and achievable goals. Implementing a partial system is better than not implementing any system at all. After the system is in place and users have begun to utilize it, incremental expansions and improvements can be made (Sauber, 1996). Moreover, implementing a partial system and building on it allows for improvement as experience grows.

Creating a Work Plan

Once the goals of the system have been explicitly defined and information has been collected from various stakeholders regarding their needs and views, those individuals responsible for the outcomes management system must create a work plan or program. That plan should carefully delineate and define all the steps that will be taken to implement the system. In particular, the work plan should provide operational definitions of the three major steps in the outcomes management system: data collection, data processing/reporting, and data storage. It is impossible to define one area independent of the others, because all three are interrelated. Nonetheless, each will be discussed in a following section.

Data Collection. To collect outcomes data of high quality and quantity, a careful implementation strategy must be developed. The operational details of implementing an outcomes system should be well planned and discussed with all the staff involved in the process.
The first step in planning for data collection is to answer a few preliminary questions:

What populations are served by the organization being measured, and which populations will be included in the measures?
What services are offered to these various populations, and which ones do you want to measure?

Once these preliminary decisions have been made, it will be possible to identify and define the domains that could be measured to achieve the stated purpose and goals of the outcomes management system. In selecting domains to measure, it is important to ensure that the information collected will be relevant and understandable to the various stakeholders. A consensus on which domains should be included probably will not be possible, due in part to the stakeholders' differing needs and goals. Table 6.5 summarizes a number of questions that should be considered when designing the data collection process. After answering these questions, a workflow plan or diagram should be developed for the outcomes system. The following issues need to be analyzed: the flow of patients through the system; the staff's operational workflow and areas of responsibility; and interfaces with other departments and the flow of external resources (e.g., information processing, paper/information) to and from other departments. The workflow plan should identify each department that will be involved in the data collection process (see Fig. 6.2). Within the plan, appropriate staff should be identified to coordinate the collection of staff and patient data.
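The preliminary decisions just described (which populations, which services, which domains, from which source, and at which timepoints) can be made explicit in a small configuration structure. The sketch below is hypothetical: the population, service, domain, and timepoint labels are invented for illustration and are not drawn from any standard instrument or system.

```python
# Hypothetical data collection plan: for each (population, service) pair,
# map each outcome domain to its data source and measurement timepoints.
collection_plan = {
    ("adult", "inpatient psychiatric"): {
        "symptom severity": {"source": "patient",
                             "when": ["intake", "discharge", "6-month follow-up"]},
        "functioning":      {"source": "clinician",
                             "when": ["intake", "discharge"]},
        "satisfaction":     {"source": "patient",
                             "when": ["post-discharge mailing"]},
    },
    ("adult", "EAP outpatient"): {
        "symptom severity": {"source": "patient",
                             "when": ["intake", "monthly", "6-month follow-up"]},
    },
}

def timepoints(population: str, service: str, domain: str) -> list:
    """Look up when a given domain is measured for a population/service pair."""
    return collection_plan[(population, service)][domain]["when"]
```

Writing the plan down in this form makes the coverage decisions reviewable by stakeholders and lets the data collection software derive its schedules from a single source rather than from scattered assumptions.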

TABLE 6.5
Questions to Consider in Designing the Data Collection Process

What measures will be collected?
  What domains are required to meet the goals of the system?
  What additional information is needed to support the system?
  What additional domains may be of use or interest and easily built into the system?
  How will these domains be measured?
From whom will data be collected?
  Who can provide the data needed?
  Who is the population of interest: all respondents or a subpopulation?
  Will data be collected from a sample of the population or from all respondents within the population?
How will the data be collected, and who will be responsible for ensuring quality data?
  Do these data exist in another system, or will they need to be collected from the source?
  What type of instrument will be used to collect the information from the sources: self-report forms? interview format?
When will the data be collected: at intake? throughout the treatment process? at discharge? during follow-up?
  At what times is it necessary to collect the data?
What logistical or attitudinal barriers are likely to impact the data collection process?

The success of any outcomes system is contingent on incorporating data collection into the day-to-day flow of operations, so performing a workflow analysis may be one of the most critical steps in the process. Within a single organization, it is possible that the workflow may vary from facility to facility and/or provider to provider. Therefore, multiple workflow processes may need to be examined and different implementation schemes developed to account for variances. Once the workflow diagram or diagrams have been completed, it will be possible to identify the best place to integrate data collection into the intake, treatment, and discharge processes. Large systems that are actively trying to improve their data collection processes should review the workflow plans of other facilities that provide good as well as poor qualities and quantities of data.
The information garnered from such reviews can be used to implement the best practices. Once the workflow analysis has been completed, it will be necessary to develop explicit definitions of all the data elements. Regardless of how obvious and intuitive a data element may seem, it still may be misinterpreted unless clearly defined. The following are examples of data element definitions:

Admission date: The date the patient was admitted to the health care organization as an inpatient or outpatient.
Discharge date: The date the patient was discharged from the health care organization as an inpatient or outpatient.
Date of birth: The month, day, and year the patient was born.
Diagnosis: The patient's primary ICD-9 code.
Discharge disposition: The code indicating patient status as of the ending service date of the period covered by the present episode of care.

Developing operational definitions of the data elements to be collected is important to assure consistency as well (Gift & Mosel, 1997).

Fig. 6.2. Adult psychiatric inpatient workflow diagram.

Data Processing/Reporting. The workflow plan must incorporate the processing and reporting of outcomes data into the overall scheme of implementation. Data can be processed locally or at a database center for a multifacility organization. A neutral database center also can be used for a multifacility or multiorganizational outcomes collaborative. Again, the workflow of each department involved in processing and reporting data needs to be reviewed and a plan developed for each segment of the outcomes system. The goal should be achieving a smooth integration among collecting, processing, and reporting data. Before data collection actually begins, a detailed plan of the data analysis and reporting strategy should be developed. The following questions will guide the report planning process:

What are the key questions to be answered?
How will the data be analyzed?
Who is the primary audience for the report?
How will the report be used?
In what format (e.g., charts, tables) will the results be presented and communicated?
How often will a new report be generated?

Determining the answers to such key questions and ensuring that valid analytic methods have been used are basic requirements of any report. Beyond these requirements, the most important element in creating a successful reporting system is to tailor the report to the audience. A report must be useful and valid from the user's perspective. Thus, the level of complexity and depth of information presented will depend on the audience and its abilities and needs. For example, clinicians, managers, and purchasers will want detailed information they can understand, trust, and act on: not just general statements but specific information (e.g., about how patients admitted on Saturday feel about their waiting time, the information they received about the diagnosis and treatment plan, etc.). However, the CEO of an institution may want only the "bottom line," a general summary of how the institution is performing. Regardless of the audience, some basic guidelines should be followed in developing a valid and useful report. Whenever possible, comparative measures should be provided for all results.
The valid comparative group may be a predefined benchmark or goal, historic results, or similar respondents from another group, depending on the goals of the system. Creating a guide for the use and interpretation of the report is also recommended. This guide should be brief yet address both the strengths and limitations of using and interpreting the report.

Data Storage. Designing an effective and useful database for an outcomes system usually requires consultation with specialists, as database design and administration is a specialized area within computer science. Nonetheless, several important issues should be considered when designing and developing a database for an outcomes system. An outcomes system database serves two primary goals: to accurately maintain all necessary information and to support analysis. The design of the database must balance these two goals. Thus, it should be developed after the data collection and reporting segments of the outcomes system have been planned. Database design should include input not only from the database experts but also from the researchers who designed the data collection instruments as well as those who will analyze the data. Ideally, a database should be designed to allow flexibility with respect to questionnaire modifications. Thus, as the outcomes system grows and improves, the database will be enhanced such that all data can be maintained and accessed. At the outset, the individuals responsible for maintaining the database should be identified, including those responsible for quality assurance. A quality assurance plan for
the database should be developed while the database is in the design stage. This plan should identify how the database will be assessed for quality, how often it will be assessed, how the quality assessments will be documented, and what measures will be taken if the database fails any of the quality assessments. A database that assures reproducibility of reports in the future, even as the database grows, is advantageous in assessing the accuracy and quality of reports and analyses. Thus, it is important to consider the issue of reproducibility of reports when developing a data quality assurance plan. When the outcomes system involves using information collected from another database, additional issues must be considered. For instance, if individual observations from two databases are to be linked, the two databases must be compatible. Furthermore, a method must be developed and tested to link individuals or individual stays using unique identifiers. The design of the outcomes database also may be restricted by the need to be compatible with other databases.

Avoiding Implementation Problems

The checklist in Table 6.6 should be consulted to avoid problems and issues that may impede implementation of the outcomes system.

Patient-Related Issues

A number of patient-related issues need to be addressed when designing and implementing an outcomes system, including security and confidentiality of data and patient resistance to completing questionnaires. Security is defined as the means for preventing unauthorized access; confidentiality is defined as restricting access to information only to those individuals with appropriate reasons to have such access (NCQA, 1997a, p. 41).

Security and Confidentiality. Maintaining the security and confidentiality of patient information needs to be considered in all applications within an outcomes system. Some researchers fear that the existence of large databases that contain patient information from several organizations may allow for breaches in patient confidentiality. The NCQA (1997a) emphasized the necessity of making a clear commitment to protecting the confidentiality of patient information. Internal policies should be developed that strike a balance between a patient's right to privacy and an organization's need to understand who is utilizing its services, under what conditions, and to what ends (Lyons et al., 1997).

Patient Resistance. Another patient-related issue is the reluctance some patients may have about providing information about their current condition. Lyons et al. (1997) discovered that some patients were afraid that if they reported they were doing well, their benefits would end. Other patients feared that if they reported they were not improving or were doing poorly, their treatment would end because it was not working. To improve the collection of data from patients, patients need to understand that the information they provide will be used to improve the quality of the services they receive. Careful explanations, in person and in writing, of how the information will be used are a crucial aspect of implementing an outcomes system.
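The point made above about linking observations from two databases through unique identifiers can be illustrated with a minimal sketch. It assumes each source holds at most one row per patient under a shared identifier, and it surfaces unmatched records for quality review instead of silently dropping them; the field names are hypothetical.

```python
def link_records(clinical_rows, survey_rows, key="patient_id"):
    """Join two datasets on a shared unique identifier.

    Assumes at most one row per patient in each source. Returns
    (linked, unmatched_clinical, unmatched_survey) so that records
    failing to link are reported rather than silently discarded.
    """
    survey_by_id = {row[key]: row for row in survey_rows}
    linked, unmatched_clinical = [], []
    for row in clinical_rows:
        match = survey_by_id.pop(row[key], None)  # claim the matching survey row
        if match is None:
            unmatched_clinical.append(row)
        else:
            linked.append({**row, **match})       # merge the two records
    unmatched_survey = list(survey_by_id.values())
    return linked, unmatched_clinical, unmatched_survey
```

Testing a routine like this on known inputs before production use is one concrete form of the "developed and tested" requirement mentioned above, and the unmatched lists feed naturally into the quality assurance plan.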
TABLE 6.6 Checklist for Implementation of the Outcomes System 1. Planning Stages A. Develop a well-organized, comprehensive, and sensible plan. Incomplete or poorly developed plans that are shown to staff/providers will generate unnecessary concern and resistance. B. Involve staff in the design of the outcomes management system, and ask for their commitment to the new process. Receive feedback on the design of the outcomes systemfrom supporters as well as nonsupportersto allow for fine-tuning. Demonstrate how the outcomes system can help meet regulations and requirements to continuously improve the quality of patient care. C. Pilot test the measures and methods to fine-tune the implementation plan. For the pilot phase, enlist staff who are highly motivated as well as staff who are resistant to the new outcomes system. (By using staff who are resistant, potential shortcomings and problems can be addressed early.) Collect feedback from patients, staff, and managers concerning the pilot project. 2. Instrument Selection A. Select patient-response instruments that take less than 20 minutes to complete, as longer instruments probably will meet with noncompliance and yield poor data. Select staff-response instruments that take only about 5 minutes per patient. Staff may be resistant if data collection and processing require more time than they feel is necessary. B. Involve the staff who will be collecting and using information in building meaningful reports. Select and design reports at the beginning of the implementation stage. Although reports can be modified later, it is important to start with a few that will be widely accepted and used to support important initiatives. 3. Staff Training A. Train staff regarding all aspects of the system that will affect their work, including data collection and reporting. B. Develop a procedures manual. Include Quick Start Guides to provide easy and quick access to key tasks. 
Include a workflow diagram that incorporates data collection and reporting into daily operations.

4. Implementation Stage

A. Assign one person to be responsible for the quality and completeness of data collected in each organization. This person's duties should include the following: facilitate the data collection process by providing tools and updates to those who need them; ensure that data are complete before submitting them for processing; identify and correct operational problems; and facilitate the timeliness of data collection and processing.

B. Encourage staff participation during data collection. Share results with staff so they are aware of the benefits of successfully implementing an outcomes system. Encourage staff to review and understand reports so they know how the results of their data collection efforts are being utilized. Share process improvements that occur as a result of data collection efforts.

Training

The fundamental question for the trainer concerns how value for performance improvement processes and measures can be created in a population that knows it must show favorable outcomes and patient satisfaction results to remain in business but has little experience and, in many cases, little desire to use the information to operate its organization.
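Item 4.A of the checklist above makes one person responsible for verifying that data are complete before they are submitted for processing. A minimal sketch of such a pre-submission check follows; the field names and record layout are hypothetical illustrations, not drawn from any particular outcomes system.

```python
# Hypothetical pre-submission completeness check (item 4.A of Table 6.6).
# The required field names below are illustrative assumptions only.

REQUIRED_FIELDS = ("patient_id", "session_date", "total_score")

def find_incomplete(records):
    """Return (index, missing_fields) for each record that must be
    corrected before the batch is submitted for processing."""
    problems = []
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        if missing:
            problems.append((i, missing))
    return problems

batch = [
    {"patient_id": "A01", "session_date": "1999-03-01", "total_score": 24},
    {"patient_id": "A02", "session_date": "", "total_score": 31},
]
print(find_incomplete(batch))  # -> [(1, ['session_date'])]
```

A check of this kind lets the designated data coordinator return incomplete records to the collecting staff immediately, rather than discovering gaps after processing.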

Building value starts with communicating the purpose and anticipated benefits of implementing an outcomes management system to everyone who will be involved in it, from the initial design stage through the implementation and evaluation stages of the process. All of the interactions that take place early in implementation lay the groundwork for subsequent phases; thus, training starts early and can become part of the focus group data collection process conducted at the beginning of the design phase.

Adults like to know why changes are being made and how the changes will affect their jobs. It is only when individuals understand the benefits they will enjoy that true commitment will occur. The trainer should share some of those benefits with individuals (e.g., that the organization is implementing an outcomes system to compete in the current market, to reduce cost and/or increase revenue, or to provide the information so essential to doing business in a managed care environment to the people who need it). It is important for the organization to communicate to its employees realistic expectations and goals, both the macro- and micro-purposes of the change. Individuals will play important roles in the success of outcomes measurement, so they must understand what specific part they will play in implementing the system.

When conducting training, it is important to identify learners' current level of knowledge and begin teaching from that point forward. Consider what three to five things learners must know to be motivated and able to collect high-quality data. Include all employees who will be responsible for providing clinical data, implementing patient surveys, or managing medical records. Design training materials so that current staff can use them to train new staff when needed. Depending on the work environment, it might be possible to train in half-day sessions, via teleconferencing, or via telephone conferencing.
The training should be timed so that data collection will begin within a week of the training. It is also important to have a support line in place to answer questions once the actual implementation begins.

Employees will probably resist changes if they feel threatened. Trainers should understand ways to minimize such resistance. One approach is to identify informal leaders who support the move to outcomes measurement and arrange to have them participate in training sessions. If employees are willing to treat the organization's outcomes measurement results with the same importance as its financial results, not only will the quality of patient care be improved but the "bottom line" will be as well.

Conclusions

The need for quality and accountability in behavioral health care drives outcomes management efforts. Taking measurement from the research or assessment environment to the brief therapy world of today's practitioner is a challenging task. With managed care becoming a reality of the behavioral health care industry, the idea of devoting the precious commodities of time and money to measurement can be overwhelming. Yet a paradox exists: with measurement and continuous improvement becoming an integral part of a market-driven health care industry, as evidenced by their inclusion in the accreditation process for payers and providers alike, it is not possible to compete in the behavioral health care arena without an outcomes management system. Thus, the challenge is not whether to design and implement an outcomes management system, but how to design one that is meaningful, cost-effective, and can be implemented in an efficient manner.

Chapter 7

Progress and Outcome Assessment of Individual Patient Data: Selecting Single-Subject Design and Statistical Procedures

Frederick L. Newman
Florida International University

Gayle A. Dakof
University of Miami School of Medicine

One need not apologize for an interest in exploring and making inferences about observations on the individual consumer. There is a long, rich history of individual-subject research in psychology, starting with the psychophysics and sensory psychology studies performed in Europe and the United States during the 1800s (Osgood, 1953; Stevens, 1951). The research methods, mostly relying on extensive within-person replication techniques, were sufficiently rigorous that much of this work withstood the tests of replication and generalizability over many individuals (Stevens, 1966).

The psychophysics studies were also characterized by their focus on discovering what can be identified as trait, rather than state, characteristics of their subjects. State characteristics (i.e., characteristics distinguished by the environmental and modifiable character of the individual) became the domain of those interested in the areas of behavior change (e.g., behavior modification, learning theory, human judgment and decision making, health and clinical psychology, psychopharmacology, and some subsets of physiological psychology). Interest in how behavioral patterns or state characteristics change over time became problematic for single-subject methods as practiced by the early psychophysics researchers. Simply stated, replications over time were expected to show change.

Three traditions of single-subject clinical research have emerged:

1. The use of clinical case notes over the course of treatment: the procedure used by those who were developing or studying psychodynamic theories, particularly psychoanalytic theories.

2.
The counting of discrete observable behavioral events over time and under different conditions controlled by the experimenter: the procedure employed by behavior analysts and therapeutic process researchers (many of whom would not wish to be identified as belonging to the same category as the behavior analysts, and vice versa).

3. The scores from a standardized instrument contrasted with established empirical norms: the procedure employed by neuropsychologists to understand the current intellectual or cognitive state of the individual, or by industrial-organizational researchers working on matching personal characteristics to a job.
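The third tradition works by converting an individual's raw score into a standardized metric relative to normative data, commonly a T-score (mean 50, SD 10). A brief sketch of that conversion follows; the normative mean and standard deviation used here are hypothetical, not the norms of any published instrument.

```python
# Sketch of tradition 3: contrasting one individual's raw score with
# published norms via a T-score. The normative values are hypothetical.

NORM_MEAN = 24.0  # hypothetical normative sample mean
NORM_SD = 6.0     # hypothetical normative sample standard deviation

def t_score(raw):
    """T-score: mean 50, SD 10 relative to the normative sample."""
    z = (raw - NORM_MEAN) / NORM_SD
    return 50 + 10 * z

# A raw score 1.5 SD above the normative mean yields T = 65
print(t_score(33.0))  # -> 65.0
```

Whether a given T-score justifies treatment then depends on the cutoff the clinician or payer adopts (e.g., comparing the score against a clinical rather than a nonclinical normative group).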

The Use of Clinical Case Notes

Reporting of individual clinical case studies in the form of written narratives, taken from the theorist's own clinical notes or those of a close colleague, has been an integral part of the historical development of clinical psychology and psychiatry (e.g., as presented in the psychoanalytic and psychodynamic literature). Unfortunately, the evidence needed to support replication has been a subject of some controversy regarding the scientific credibility of those who have used this approach (e.g., the controversy about whether Freud and Jung reported the evidence in and from their case notes properly). The major difficulty in the early attempts to use clinical case studies as scientific evidence to construct or test theories was the lack of agreement on what methods would be necessary and sufficient to argue for the data's validity. Clinical case notes, in narrative form, have too many "alternative explanations," including the frailty of human judgment in general (Meehl, 1954; Newman, 1983) and the inherent conflict of interest in having the clinician who is treating the person be both the observer and the synthesizer of the observations into a written narrative.

Observing and Counting Discrete Behaviors Over Time

Behavior analysts have been employing single-subject techniques for some time (L.W. Craighead, W.E. Craighead, Kazdin, & Mahoney, 1994; Hersen & Barlow, 1976; Hersen, Michaelson, & Bellack, 1994; Kazdin, 1992). The tradition of those employing behavior analysis techniques is to select a discrete, readily observable behavior that can be tracked reliably over time and is sensitive to the environmental intervention under the control of the researcher/clinician. A review of the behavior analysis literature did not find single-subject studies that employed the types of psychological measures that are the focus of this volume. Instead, the investigator selects a marker behavior that is the most suitable indicator of an underlying construct.
In the case of phobic, avoidance, or even abusive behaviors, selection of a behavioral marker to track over the course of treatment is quite straightforward. When the type of distress or functioning is more generic (e.g., depression, or interpersonal attachment/commitment), the selection of one or more marker behaviors may be more problematic. Texts on behavior-analytic techniques (e.g., L.W. Craighead et al., 1994) caution the reader that the selection of the marker behavior or indicator must make sense both theoretically and empirically (for its psychometric qualities).

Contrasting Psychological Test Scores for an Individual on a Standardized Instrument with Published Norms

This approach represents the core of the tradition of what is popularly called psychological testing. There are several examples distributed through this text (e.g., the chapters on the MMPI-2, MMPI-A, and Conners Rating Scale). A major function of this tradition is to compare the results of a psychological test for an individual with those of either a clinical or a nonclinical norm to justify the need for treatment to a third-party payer

or an institution (e.g., a hospital or school). This approach is also used to justify continued treatment. Using the results of such tests to indicate when an individual's treatment should stop is not part of the tradition of psychological testing. Use of psychological testing in follow-up to treatment appears to be confined to treatment outcome studies in which results for groups of consumers under one or more treatment conditions are compared.

One conclusion that could be drawn from the traditional uses of single-subject studies is that psychological testing of the sort covered in this text cannot be employed in single-subject research. That conclusion cannot yet be drawn, for two reasons. First, there is a great need for single-subject studies in developing new treatments (Greenberg & Newman, 1996). Second, there are important applications for use of such tests by the individual clinician (or a clinical team) in tracking the progress of an individual consumer/client and in providing justification to a third-party payer for initiating, continuing, and shifting treatment strategies (Newman & Tejeda, 1996).

Selection and Use of Psychological Tests in Single-Subject Clinical and Research Applications

Clinicians, supervisors, and mental health administrators arguably need to know whether or not the services they provide ameliorate the consumer's symptoms and improve the consumer's capacity to manage community functioning. Such knowledge should be useful in treatment planning and decision making. For instance, if the intervention yields little or no improvement where such improvement should be expected, given what is known in the literature, then the service provider(s) would want to consider a different intervention strategy.
If the person is in individual therapy, the clinician might decide that progress could be achieved if the family is brought into the therapy process, or if therapeutic contact is made more frequent or changed to another mode, such as group or family therapy. If observed behavior patterns move sufficiently in a negative direction, then more intensive or restrictive treatment (e.g., initiating or increasing the dosage of a medication, or the use of day treatment or inpatient treatment) might be justified. On the other hand, the charting of the person's functioning might lead the clinician to realize that there has been sufficient symptom reduction and improvement in the management of day-to-day functioning to warrant a discussion of termination from treatment. Too frequently, however, the clinician has no method for systematically charting individual consumers' progress.

Well-established psychological instruments, such as those reviewed in this book, are ideally suited to charting the progress of an individual, and hence to assisting service providers in clinical decision making (e.g., when to refer the person to a more or less intensive treatment, when to terminate treatment, when to refer to another type of intervention). The psychological tests discussed here are widely used, reliable, and valid measures with established norms. Moreover, most of these tests are easy to administer, score, and interpret. Thus, it is a relatively straightforward procedure to administer and score a few carefully selected instruments repeatedly throughout treatment, plot the scores on a graph, and use this information to assess the consumer's progress. What follows are basic guidelines for designing a procedure to systematically collect and analyze data to assess individual patient progress.
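The charting procedure just described (repeated administration of an instrument, with scores plotted against time) can be sketched in a few lines. Everything in this sketch is hypothetical: the scores, the 2-week administration interval, and the cutoff separating clinically elevated scores from the normative range.

```python
# Sketch of charting one consumer's repeated scores on a single instrument.
# The scores, 2-week interval, and cutoff of 18 are hypothetical.

CUTOFF = 18  # assumed score above which distress is clinically elevated

def chart(scores, cutoff=CUTOFF, interval_weeks=2):
    """Return one text line per administration, flagging elevated scores."""
    lines = []
    for n, s in enumerate(scores):
        flag = "ELEVATED" if s > cutoff else "within range"
        lines.append(f"week {n * interval_weeks:2d}: score {s:2d} ({flag})")
    return lines

# Successive administrations over the first 8 weeks of treatment
for line in chart([27, 25, 21, 17, 15]):
    print(line)
```

Even a plain-text chart of this kind makes a trend visible at a glance; the same series could of course be drawn as a line graph for case records or supervision.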

Selection of Instrumentation

As discussed earlier, the tests and measures reviewed in the current volume can be very useful to individual clinicians because they are widely used and have well-established psychometric properties and norms. But a more important consideration in single-subject research is that the selected test must demonstrate sensitivity to change. In psychotherapeutic work, especially in the early phase of treatment, important changes might be quite subtle. Marked changes in conduct disorder (see examples using the Child Behavior Checklist, or CBCL, by Achenbach; the Child and Adolescent Functional Assessment Scale, or CAFAS, by Hodges; or the Beck Depression Inventory) are not observable until substance use/abuse is curtailed and/or the family becomes engaged in the treatment (Liddle, Rowe, Dakof, & Lyke, 1998).

Moreover, the clinician must make sure the selected test measures a behavior, attitude, or emotion expected to change as a result of the treatment. For example, in Multidimensional Family Therapy (Liddle & Saba, 1981) for adolescent drug abuse, the clinician targets parenting behaviors. Specifically, the therapists work to change how parents control (i.e., set limits and monitor), nurture (i.e., express love and concern), and grant autonomy to their acting-out adolescent. It is important, then, that the clinician measure these specific parenting behaviors, rather than overall family closeness, such as is measured by the Family Environment Scale (Moos, 1994).

The clinician must also be careful to select a measure that is sensitive to the types of changes targeted in the intervention. For example, youth referred for substance abuse treatment frequently have behavior problems and symptoms that are comorbid with the drug use. One youth might be particularly aggressive and violent; in this case, one of the first treatment goals would be to reduce the violent and aggressive behaviors.
Aggressive behavior should be assessed with this particular youth, using a measure such as the aggressive behavior scale of Achenbach's Child Behavior Checklist (CBCL) or Youth Self-Report (YSR). Another drug-using youth might present with depression and aggression; in this case, the clinician would be wise to measure both aggression and depression. Thus, as in any research, the researcher (or in this case the clinician) must carefully select measures that assess the targets of the intervention (see chap. 5, by Newman, Ciarlo, & Carpenter, for a discussion of the guidelines for selecting a measure).

Selection of Criteria on Which to Base Clinical Decision Making

Targeting Clinical Decisions. The biggest challenge in what is proposed here is the absence of clear standards for decision making with regard to admissions, referral to more or less intense treatment settings, or termination of treatment. This is no longer an era in which a clinician, a group of clinicians, or a service agency simply opens the doors of a service and provides treatment for all those who enter. Decisions must be made about what the needs are (usually requiring some epidemiological needs-assessment estimates) and to what degree these needs are being met by existing service providers. The epidemiological technology of assessing mental health service needs should provide a profile of the major psychosocial characteristics of those who need a service. These characteristics typically include age, gender, socioeconomic status, major symptoms/diagnostic groups, community functioning level, and level of social support. Once a decision is made that there is a group of people who can be served, a set of measures and instruments needs to be selected to support four distinct sets of clinical decisions:

1. What are the psychosocial characteristics of those who should be admitted into the service?

2. What are the characteristics that indicate the need for referral to a more intensive-restrictive service?

3. What are the characteristics that indicate the need for referral to a less intensive-restrictive service?

4. What are the characteristics that indicate that services are no longer needed?

To address each of these questions, operational definitions and measures are needed to estimate level of functioning, symptoms, socialization, and the individual's overall ability to manage day-to-day affairs (Newman, Hunter, & Irving, 1987). Moreover, each of the last three questions also raises issues of the amount (dosage) and type of therapeutic effort, and the amount of change in symptom distress, functioning, or self-management achieved over time (Carter & Newman, 1976; Howard, Kopta, Krause, & Orlinsky, 1986; Howard, Moras, Brill, Martinovich, & Lutz, 1996; Newman & Howard, 1986; Newman & Sorensen, 1985; Newman & Tejeda, 1996; Yates, 1996; Yates & Newman, 1980).

Figure 7.1 illustrates how an individual consumer may progress relative to the four aforementioned questions. This example uses technology that is currently available, with one limitation to be discussed later. The intent of Fig. 7.1 is to show that, by employing existing technology, it is possible to describe the impact of two or more interventions in terms of their cost-effectiveness in achieving observable criteria. The example illustrated in Fig. 7.1 offers a graphic analysis in the form of a progress-outcome report for three hypothetical individuals. These three persons have been identified as having similar characteristics at the outset of treatment (i.e., all three have similar initial levels of functioning and have service plans for a 26-week period, such that the expected cost of administering the treatment plan is the same).
However, two of the individuals differ with regard to their performance over the course of treatment. Person 1 is expected to improve in overall functioning to a point where she/he can either be referred to a service that is less intensive, or she/he may be terminated from services because his/her functioning and/or self-management is adequate to be independent of formal treatment supports. Person 2 has maintained functioning or level of self-management within the bounds described as acceptable for community functioning. Person 3 represents a person whose treatment goals were similar to those of Person 2, and is discussed later.

The vertical axis of Fig. 7.1 represents a person's overall ability to function in the community. It is understood that this global measure must be supported by a multidimensional view of the persons within each group (Newman & Ciarlo, 1994). The horizontal axis represents the cumulative costs of providing services from the beginning of this episode of treatment. The two horizontal dashed lines inside the box represent the behavioral criteria set a priori, that is, the lower and upper bounds of functioning that this mental health service is designed to serve. If consumers behave at a level below the lower dashed line, then they should be referred to a more intensive service. Likewise, if consumers behave at a level above the upper dashed line, then either services should be discontinued or a referral to a less intensive-expensive service should be considered. Finally, the vertical dashed line inside the box represents the cumulative costs of the planned services for the 26-week period. In this hypothetical case, all three groups have treatment plans with the same expected costs. The circled numbers within Fig. 7.1 track the average progress of consumers within each of the groups at successive 2-week intervals.
The vertical placement of the circled number represents the group's average level of functioning at that time and the horizontal placement represents the group's average costs of services up to that point. The sequence of 13 circled numbers represents the average progress of a person within 2-week intervals

over the 26 weeks of care.

Fig. 7.1. Hypothetical example of an analysis of changes in functioning and costs relative to managed care behavioral criteria for three consumers. Person 1 met the planned objective of improved functioning such that the person was able to terminate services within 6 months. Person 2 was able to maintain functioning over the same 6 months while using about the same amount of resources. Person 3 required additional resources to maintain adequate levels of community functioning. Adapted from "The Need for Research Designed to Support Decisions in the Delivery of Mental Health Services" by F.L. Newman & M.J. Tejeda, 1996, American Psychologist, 51. Copyright © 1996 by the American Psychological Association. Reprinted with permission.

For Person 1, with a 6-month objective of improvement above the upper dashed line, the objective is met. For Person 2, with the 6-month objective of maintaining functioning within the range represented by the area between the two dashed lines, the objective also is met. Person 3 is an example of a client who exceeded the bounds of the intended service-treatment plan after 4 months, 2 months shy of the objective. This person was able to maintain community functioning, but only with the commitment of additional resources. A continuous quality improvement program should focus a post hoc analysis on such consumers to determine whether they belong to another cost-homogeneous subgroup, whether the services were provided as intended, and whether modification of current practices is needed.
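The referral logic implied by the two dashed lines can be written out as a small decision function. The sketch below is an illustrative Python rendering, not part of the original analysis; the function name, the score scale, and the bound values (40 and 70) are hypothetical.

```python
def referral_decision(functioning, lower_bound, upper_bound):
    """Apply the a priori managed-care criteria illustrated in Fig. 7.1.

    Scores below the lower bound indicate a need for a more
    intensive-restrictive service; scores above the upper bound indicate
    that services can be discontinued or stepped down; scores between
    the bounds indicate that the current service remains appropriate.
    """
    if functioning < lower_bound:
        return "refer to more intensive service"
    if functioning > upper_bound:
        return "discontinue or refer to less intensive service"
    return "continue current service"


# A consumer's hypothetical functioning scores at successive 2-week
# intervals, evaluated against illustrative bounds of 40 and 70.
scores = [38, 44, 52, 61, 68, 73]
decisions = [referral_decision(s, 40, 70) for s in scores]
```

Paired with the cumulative cost of services at each 2-week interval, a series of such decisions reproduces the kind of progress-outcome report shown in Fig. 7.1.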

A major difficulty in attempting to enact these recommendations is that of obtaining a believable database to determine the appropriate progress-outcome criteria or the expected social or treatment costs. Some have used expert panels to set the first draft of such criteria (e.g., Newman, Griffin, Black, & Page, 1989; Uehara, Smukler, & Newman, 1994). Howard et al. (1986) used baseline data from prior studies, along with measures taken on a nonpatient population. But both strategies have problems. The research literature is largely based on studies where the dosage was fixed within a study, thereby constraining inferences about what type and amount of effort can achieve specific behavioral criteria. Howard's early work on a dosage and phase model of psychotherapy was statistically confounded by combining the results of controlled studies, where the number of sessions was fixed, with data from naturalistic studies. Some of the naturalistic data came from persons who had no limit on sessions, whereas others had limits on the number of sessions imposed by third-party payers. Based on previous experience, it would appear that the dosage data as reported are probably trustworthy. However, there are no studies where behavioral criteria were set a priori or in which the type and amount of effort were treated as the dependent variables in estimating the success or failure to achieve these criteria. Yet the logic of managed care and the logic of the National Institute of Mental Health (NIMH) practice guidelines would require that such study results be available to set criteria for using a particular intervention strategy or to set reimbursement standards. This void must be filled by data from well-designed efficacy and cost-effectiveness studies that can provide empirical support for setting behavioral outcome criteria for managed care programs. Without such data there is faint hope of changing the current practice of setting dosage guidelines independent of behavioral criteria.
Patient Profiling. The technique developed by Howard et al. (1996) is sufficiently different from the example given in Fig. 7.1 to warrant a more detailed discussion. Howard et al. (1996) introduced the technique as a new paradigm of "patient-focused research." The paradigm addressed the question of "Is this patient's condition responding to the treatment that is being applied?" (p. 1060). The technique makes use of the progress and outcomes of a large database (N > 6,500) of adults who received outpatient psychotherapy and the dose-response curves that were obtained for those in the database. Based on the prior research reported by Howard's group, a predictive model of expected change based on seven intake characteristics can be generated (W. Lutz, personal communication, January 14, 1998). The seven intake variables are: (1) Level of Well-Being, a subscale of the Mental Health Index (Howard, Lueger, Maling, & Martinovich, 1993); (2) Level of Functioning, a subscale of the Mental Health Index (Howard et al., 1993); (3) Symptom Severity, a subscale of the Mental Health Index (Howard et al., 1993); (4) prior psychotherapy (none, 1-3 months, 4-6 months, 6-12 months, 12+ months); (5) chronicity ("How long have you had this problem?"); (6) expectation ("How well do you expect to feel emotionally and psychologically after therapy is completed?"); and (7) the clinician's rating of the Global Assessment of Functioning (Axis V of DSM-IV). Prior to the initial therapy session, the person is asked to complete the Mental Health Index, which includes items covering six of the seven variables (plus other areas of functioning). The values obtained on these six variables, plus the clinician's rating of the person's Global Assessment of Functioning (GAF), are entered as predictor variables into a growth curve modeling program (e.g., HLM; Bryk & Raudenbush, 1992) to obtain an estimate of the expected dose-response curve for that person.
Formally, this is called a Level 2 analysis because the predicted outcome for the individual uses the results of the subset of those people within the database of approximately 6,500 people

who have scores on the seven predictor variables similar to those given by the person under study. The HLM program also permits an estimate of what is described as a "failure boundary," based on the growth curves of those persons for whom the expected growth curve was either flat (i.e., a slope of zero) or negative. Once the expected dose-response curve for those who showed improvement and the failure boundaries are estimated, it is then a matter of obtaining and plotting the scores a person obtains on the Mental Health Index after every fourth session. The plot of these data every four sessions represents the individual's growth curve, which can be contrasted with the expected growth curve and the "failure boundary" curve. Figures 7.2 and 7.3 provide examples of outcome relative to the expected and "failure" growth curves for a hypothetical person undergoing psychotherapy treatment. The patient profiling method, though elegant, would appear to be beyond the resources of most clinicians working in private individual or group practices, small public clinics, or hospital settings without access to a large database from which to estimate the growth curves. Moreover, most professionals currently working in the real world of clinical service delivery were not trained to use the latest statistical packages involving growth curve analysis. In spite of the limitations already mentioned (the need for measures with appropriate norms, for standards on which to base clinical decisions, and for adequate criteria for clinical decision making based on a large sample), we still recommend that careful assessment of patient progress be collected over the course of treatment. Graphing these measures will then assist in clinical decision making.
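The core of this profiling computation can be illustrated without an HLM package. The following Python sketch uses synthetic normative data and an ordinary least-squares log-linear growth model as a deliberately simplified stand-in for the Level 2 hierarchical analysis; all numbers, and the use of group means as the expected curve and the failure boundary, are illustrative assumptions rather than the published method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic normative pool: 200 prior patients scored on a Mental Health
# Index (MHI) at sessions 1, 5, 9, ... 25 (every fourth session).
sessions = np.array([1, 5, 9, 13, 17, 21, 25])
true_slopes = rng.normal(6.0, 5.0, size=200)  # individual growth rates
mhi = (50.0 + true_slopes[:, None] * np.log(sessions)
       + rng.normal(0.0, 4.0, size=(200, sessions.size)))

# Estimate each prior patient's growth slope on the log-session scale.
est_slopes = np.polyfit(np.log(sessions), mhi.T, 1)[0]

# Expected curve: mean trajectory of improvers; "failure boundary":
# mean trajectory of those with flat or negative estimated growth.
expected_curve = mhi[est_slopes > 0].mean(axis=0)
failure_boundary = mhi[est_slopes <= 0].mean(axis=0)

def on_track(observed_scores, checkpoint):
    """True if the patient's MHI at this checkpoint exceeds the boundary."""
    return observed_scores[checkpoint] > failure_boundary[checkpoint]
```

Plotting an individual's observed scores against `expected_curve` and `failure_boundary` reproduces the kind of display shown in Figs. 7.2 and 7.3; a true Level 2 analysis would additionally condition the normative pool on the seven intake variables.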

Fig. 7.2. Course of Mental Health Index for a person for whom treatment progress and outcome had a good prognosis and the observed progress and outcome was better than expected. From "Evaluation of Psychotherapy: Efficacy, Effectiveness, and Patient Progress" by Howard et al., 1996, American Psychologist, 51, p. 1062. Copyright © 1996 by the American Psychological Association. Reprinted with permission.

Fig. 7.3. Course of Mental Health Index for a person for whom treatment progress and outcome had a poor prognosis and the observed progress and outcome was even worse than expected. From "Evaluation of Psychotherapy: Efficacy, Effectiveness, and Patient Progress" by Howard et al., 1996, American Psychologist, 51, p. 1062. Copyright © 1996 by the American Psychological Association. Reprinted with permission.

Clinical Significance as an Outcome Criterion. Still another approach is to identify progress or outcome on a standardized measure relative to a criterion of clinical significance (Jacobson & Truax, 1991). This can be done when the assessment instrument has been administered to both a clinical and a nonclinical group. A clinically significant change in behavior has occurred when a person's score on the psychological test differs significantly from the pretest average of the clinical group and is more like the distribution of the nonclinical group than the distribution of the clinical group. When there are no norms on a nonclinical group, the convention recommended by Jacobson and Truax is to say that clinically significant change has occurred if the difference score from pretreatment to posttreatment is reliable and equal to or greater than two standard deviations in the direction of improved functioning. Howard and his colleagues (1996) suggested that change equal to or greater than 1.8 standard deviations in a positive direction would represent a score that is more likely to belong to the nonclinical group than to the clinical group prior to treatment.

Practical Considerations: What Data to Collect and When? There is a diversity of opinion among clinicians about whether, how often, and when (prior to or following a therapy session) to collect client self-report and/or clinician data on a standardized instrument. Having gone this far into the current

chapter, it is probably safe to assume that readers hold some interest in collecting intake and outcome information from the consumer. From experience, a key issue underlying the question of whether to collect such data on a standardized instrument is whether the participating clinician feels that the information is useful to therapy. To state the obvious: Those who do not see any reason or use for such information in their treatment strategy do not willingly collect such data, and those who find such information useful to their therapeutic intervention do routinely collect it. Those who find such information useful tend to describe their theoretical orientation as behavioral or cognitive-behavioral, although there are many exceptions among colleagues within all theoretical orientations. As Beutler (1991) pointed out, most practicing clinicians make use of a number of different techniques, behaving quite eclectically in their actual practice, even when there is a claim that one theoretical orientation guides the core of their treatment strategy. A recent Internet discussion (W. Lutz, personal communication, January 14, 1998) indicated that this diversity exists even among those who say they have a major professional interest in outcome evaluation. On the basis of this nonscientific review of the issues and the demands of the graphic and statistical analytic methods proposed, the following guidelines are offered:

1. Collect data at intake prior to the initial interview to obtain a baseline measure.

2. Select a battery of instruments for the intake assessment, but only select easily completed instruments for progress evaluation.

a. The battery of instruments used at intake should be useful for treatment planning and for identifying those intake variables (and circumstances) that help identify the subset of persons for whom normative data (on clinical or nonclinical populations, or both) exist. Thus, a key concern in selecting these instruments is whether such normative data are available.

b. The measures administered during and following treatment must cover the domain(s) that are the foci of treatment. Because the focus of treatment may not be obvious prior to the first clinical interview, the selection and administration will probably follow the intake interview. This requires that the clinician have available a collection of instruments ready for such selection.

3. The person needs to be told in advance that time will be spent before and after the initial interview in performing the tasks required of the instruments.

4. The recommended frequency of administering the instrument should vary in accord with how the instrument is used and what kind of information the clinician (or researcher) is seeking. Howard recommended every fourth session as ideal for estimating change relative to the dose-response curves his group has been studying. Others (e.g., Richard Hunter in Newman, Hunter, & Irving, 1987) have argued for the use of such an instrument at every session that the person is capable of completing such a form. Cognitive-behavioral therapists have indicated that having the person complete the instrument prior to each session provides information that can guide the direction that therapy might take on that day. Because it is seldom clear when therapy will end, do not expect to easily obtain information from the last session. Thus, a strategy of routinely collecting information on a standardized instrument (e.g., at each session or at every fourth session) is reasonable.
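The Jacobson and Truax criterion described earlier can be computed directly from scores collected this way. The Python sketch below implements the standard reliable change index and the weighted cutoff c from Jacobson and Truax (1991); the 1.96 threshold, the sample values, and the assumption that higher scores mean better functioning are conventions adopted for illustration.

```python
import math

def reliable_change_index(pre, post, sd_pre, reliability):
    """Jacobson & Truax (1991): change divided by the standard error of
    the difference between two scores on the same test."""
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0 * se_measurement ** 2)
    return (post - pre) / s_diff

def cutoff_c(mean_clin, sd_clin, mean_norm, sd_norm):
    """Point between the clinical and nonclinical means, weighted by the
    two standard deviations (usable when both sets of norms exist)."""
    return (sd_norm * mean_clin + sd_clin * mean_norm) / (sd_clin + sd_norm)

def clinically_significant(pre, post, sd_pre, reliability,
                           mean_clin, sd_clin, mean_norm, sd_norm):
    """Reliable change (|RCI| > 1.96) plus crossing into the nonclinical
    range, assuming higher scores indicate better functioning."""
    rci = reliable_change_index(pre, post, sd_pre, reliability)
    c = cutoff_c(mean_clin, sd_clin, mean_norm, sd_norm)
    return abs(rci) > 1.96 and post > c
```

When no nonclinical norms are available, the two-standard-deviation convention described earlier substitutes for the cutoff c.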
The development of new statistical routines describing the trajectory of change over time, but not requiring equal intervals between times of observation, will allow the scheduling of assessment periods in a way that is tailored more specifically to the treatment strategy. Suppose a therapist subscribed to the phase model recommended by Howard et al. (1993), and wanted to schedule assessment periods according to what they might expect

to be critical times for change within specific domains of functioning. For example, one strategy might start with a schedule of such assessments every session for the first 4 weeks during the remoralization phase, during the remediation (of symptoms) phase, and then once every 8 to 12 weeks during the rehabilitation (habit modification) phase. The application of patient profiling recommended by Howard et al. (1996) can readily handle the fitting of these data to the normative data to produce an expected growth curve, even though the intervals between data collection occasions vary.

Treatment Innovation

This chapter has focused on how to use psychological tests to track patient progress and inform clinical decision making. The proposed procedures also can facilitate treatment improvement. If clinicians track individual patient progress over time in the ways discussed, they can use the data collected to compare successful outcomes with unsuccessful outcomes and to identify patterns that might lead to treatment innovations. Although there is a substantial body of research indicating the benefits of specific psychotherapeutic interventions (Shadish et al., 1997), few would deny that there is still considerable room for improvement. For example, a review of interventions with children and adolescents revealed that only 5% to 30% of youth participating in such treatments evidence clinically significant change (Kazdin, 1987). The situation with child as well as adult treatment has fueled a movement toward the development of treatment innovations (Beutler & Clarkin, 1990; Onken, Blaine, & Boren, 1993). Kazdin (1994) identified seven steps to developing effective treatments: conceptualization of the dysfunction, research on processes related to the dysfunction, conceptualization of the treatment, specification of the treatment, tests of treatment process, tests of treatment outcome, and tests of boundary conditions and moderators.
Repeated administration of the tests reviewed in the current volume, then, can guide treatment innovation and development.

Conclusions

It is somewhat amazing that the technology of single-subject research has not advanced further given its long history in the behavioral sciences. But then, the elegant examples and graphic models provided by early psychophysics and behavioral researchers set a baseline that has served us well. The key requirements of any research paradigm rest on the care with which the researcher provides adequate operational definitions of exogenous and endogenous variables, along with the care in collecting and presenting the data. These same requirements are particularly important in single-subject research and for the presentation of psychological testing data on a single individual at one point in time and over time. Thus, the traditional presentation of a profile on an individual relative to empirical norms, which has been the standard in the reporting of psychological tests, is still a worthy tool. The traditional staple of behavior analysts of plotting behaviors over time has been augmented by plotting the results of a psychological test on an individual over time relative to the changes expected in a normative treatment group

as recommended by Howard et al. (1996) and shown in Figs. 7.2 and 7.3, or relative to a standard as recommended by Newman and Tejeda (1996) and shown in Fig. 7.1. The introduction of the logic of "clinically significant change" by Jacobson and Truax (1991) identified another benchmark for both group and single-subject research that should influence the interpretation of single-subject research and clinical data. The critical question regarding the individual person becomes: Are the changes in behavior, as represented by psychological testing over time, clinically significant? But what is the standard that distinguishes a clinically significant from a nonsignificant outcome? Empirical norms, as exemplified in Figs. 7.2 and 7.3, should be the first choice. At a minimum, an operationally defined standard based on a consensus of experts (Newman, Hunter, & Irving, 1987; Uehara et al., 1994) could also be considered, but only as a temporary standard while an effort is made to collect normative data. A basic standard set by the traditions of psychophysics and behavioral analysis is that of simplicity in the graphic display of the results. The integration of graphics with readily available spreadsheets and statistical packages on the PC has made the technology of simple graphic displays available to everyone. An expanded use of this technology in both clinical practice and clinical research should become so common in the near future that a chapter such as this will need to take the discussion to a new plane. Hopefully, this will mean exploring other applications of graphic analysis, such as statistical quality control theory (Green, in press).

Acknowledgments

Thanks are owed to our thoughtful and patient colleagues for their recommendations and comments on this chapter: Michael Dow, Howard Liddle, and Manuel J. Tejeda.

References

Beutler, L.E. (1991). Have all won and must all have prizes?
Revisiting Luborsky et al.'s verdict. Journal of Consulting and Clinical Psychology, 59, 226-232.

Beutler, L.E., & Clarkin, J.F. (1990). Systematic treatment selection: Toward targeted therapeutic interventions. New York: Brunner/Mazel.

Bryk, A.S., & Raudenbush, S.W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.

Carter, D.E., & Newman, F.L. (1976). A client oriented system of mental health service delivery and program management: A workbook and guide (Series FN No. 4, DHEW Pub. No. ADM 76-307). Rockville, MD: Mental Health Service System Reports.

Craighead, L.W., Craighead, W.E., Kazdin, A.E., & Mahoney, M.J. (1994). Cognitive and behavioral interventions: An empirical approach to mental health problems. Boston: Allyn & Bacon.

Green, R.S. (in press). The application of statistical process control to manage global client outcomes in behavioral healthcare. Evaluation and Program Planning.

Greenberg, L., & Newman, F.L. (1996). An approach to psychotherapy process research: Introduction to the special series. Journal of Consulting and Clinical Psychology, 64, 435-438.

Hersen, M., & Barlow, D.H. (1976). Single case experimental designs: Strategies for studying behavioral change. New York: Pergamon Press.

Hersen, M., Michaelson, L., & Bellack, A.S. (1994). Issues in psychotherapy research. New York: Plenum.

Howard, K.I., Kopta, S.M., Krause, M.S., & Orlinsky, D.E. (1986). The dose-effect

relationship in psychotherapy. American Psychologist, 41, 159-164.

Howard, K.I., Lueger, R.J., Maling, M.S., & Martinovich, Z. (1993). A phase model of psychotherapy outcome: Causal mediation of change. Journal of Consulting and Clinical Psychology, 61, 678-685.

Howard, K.I., Moras, K., Brill, P.L., Martinovich, Z., & Lutz, W. (1996). Evaluation of psychotherapy: Efficacy, effectiveness, and patient progress. American Psychologist, 51(10), 1059-1064.

Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.

Kazdin, A.E. (1987). Comparative outcome studies of psychotherapy: Methodological issues and strategies. Journal of Consulting and Clinical Psychology, 54, 95-105.

Kazdin, A.E. (1992). Research design in clinical psychology. Needham Heights, MA: Allyn & Bacon.

Kazdin, A.E. (1994). Methodology, design and evaluation in psychotherapy research. In A.E. Bergin & S.L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 543-594). New York: Wiley.

Liddle, H.A., Rowe, C.L., Dakof, G.A., & Lyke, J. (1998). Translating parenting research into clinical interventions for families of adolescents. Clinical Child Psychology and Psychiatry, 3, 419-443.

Liddle, H.A., & Saba, G. (1981). Systemic chic: Family therapy's new wave. Journal of Strategic and Systemic Therapies, 1, 36-69.

Meehl, P.E. (1954). Clinical versus statistical prediction. Minneapolis: University of Minnesota Press.

Moos, R.H. (1994). Editorial: Treated or untreated, an addiction is not an island unto itself. Addiction, 89, 507-509.

Newman, F.L. (1983). Level of functioning scales: Their use in clinical practice. In P.A. Keller & L.G. Ritt (Eds.), Innovations in clinical practice: A source book. Sarasota, FL: Professional Resource Exchange.

Newman, F.L., Griffin, B.P., Black, R.W., & Page, S.E. (1989).
Linking level of care to level of need: Assessing the need for mental health care for nursing home residents. American Psychologist, 44, 1315-1324.

Newman, F.L., & Howard, K.I. (1986). Therapeutic effort, outcome and policy. American Psychologist, 41, 181-187.

Newman, F.L., Hunter, R.H., & Irving, D. (1987). Simple measures of progress and outcome in the evaluation of mental health services. Evaluation and Program Planning, 10, 209-218.

Newman, F.L., & Sorensen, J.L. (1985). Integrated clinical and fiscal management in mental health. Norwood, NJ: Ablex.

Newman, F.L., & Tejeda, M.J. (1996). The need for research designed to support decisions in the delivery of mental health services. American Psychologist, 51, 1040-1049.

Onken, L.S., Blaine, J.D., & Boren, J.J. (1993). Behavioral treatments for drug abuse and dependence (NIDA Research Monograph No. 137). Rockville, MD: U.S. Department of Health and Human Services.

Osgood, C.E. (1953). Method and theory in experimental psychology. New York: Oxford University Press.

Shadish, W.R., Matt, G.E., Navarro, A.M., Siegle, G., Crits-Christoph, P., Hazelrigg, M.D., Jorm, A.F., Lyons, L.C., Nietzel, M.T., Prout, H.T., Robinson, L., Smith, M.L., Svartberg, M., & Weiss, B. (1997). Evidence that therapy works in clinically representative conditions. Journal of Consulting and Clinical Psychology, 65(3), 355-365.

Stevens, S.S. (1951). Handbook of experimental psychology. New York: Wiley.

Stevens, S.S. (1966). Metric for the social consensus. Science, 151, 530-541.

Uehara, E.S., Smukler, M., & Newman, F.L. (1994). Linking resource use to consumer level of need: Field test of the "LONCA" method. Journal of Consulting and Clinical Psychology, 62, 695-709.

Yates, B.T. (1996). Analyzing costs, procedures, processes, and outcomes in human services. Applied Social Research Series (Vol. 42). Thousand Oaks, CA: Sage.

Yates, B.T., & Newman, F.L. (1980). Findings of cost-effectiveness and cost-benefit analyses of psychotherapy. In G. VandenBos (Ed.), Psychotherapy: From practice to research to policy (pp. 163-185). Beverly Hills, CA: Sage.

Chapter 8
Selecting Statistical Procedures for Progress and Outcome Assessment: The Analysis of Group Data

Frederick L. Newman
Florida International University

Manuel J. Tejeda
Gettysburg College

The selection of appropriate statistical procedures in the analysis of psychological test results must be driven by the clinical and decision/administrative environments in which the procedures are used. This chapter provides recommendations and guidelines for selecting statistical procedures useful in two such environments: screening and treatment planning, and progress and outcome assessment. Each environment has unique demands warranting different, though not necessarily independent, statistical approaches. Concurrently, there must be a common concern regarding a measure's psychometric qualities and its relation to outcome. For screening and treatment planning, there is greater concern with predicting the likely outcome of treatment and the concomitant costs of resources to be consumed in treatment. For progress and outcome applications, there is the additional requirement of sensitivity to the rate and direction of change relative to treatment goals. The discussion in this section focuses on the issues of analysis that should be addressed and on guidelines for evaluating and selecting statistical procedures within each application. The chapter is organized such that the more commonly used statistical models, for example, traditional regression or analysis of variance, are discussed under a number of different clinical topics (issues or questions). In each instance, the exact form of the analysis takes on the format best suited to the clinical issue under discussion. Moreover, each analysis is usually contrasted with alternative approaches regarding assumptions, interpretations, and practicality. It should be noted that the best approach depends on the specific clinical issue under investigation.
A major theme of this chapter is that the clinical question should drive the selection of the analytic approach.

Approach to Presenting the Statistical Material

The logic of presentation is first to discuss a specific clinical or mental health service issue and then to recommend one or more statistical procedures that can address the issue. The language of the mathematical expression underlying the statistical procedure

serves to bridge the clinical issue with the statistical procedure. The expressions are presented here for three reasons. First, the clinical focus of the discussion is designed to help readers understand the logical link between the clinical issue and the statistical procedure. Second, the discussion is designed to provide readers with a sufficient understanding of the statistical logic and vocabulary to read and use a statistical computer package manual and related texts, or to converse with their resident statistician. Third, the discussion should help readers understand where the link between the clinical issue and the statistical procedure is strong and where it is weak. References and computational details, along with examples of how each technique is used in clinical research applications, are provided.

Discussion on selecting statistical procedures will follow from two baselines. One is a formal conceptualization of measurement: What does the instrument seek to measure, and what are the potential sources of error in the measurement? The second baseline is the clinical or service management question that is being asked of the measure. What follows is a definition of the general model and notation used throughout the chapter. Note that this model and notation are introduced for descriptive purposes. It is not the only model, or necessarily the best model for all situations. It is, however, a model that lends itself to a discussion of treatment planning and outcome assessment in mental health services. Collins and Horn (1991) provided a good review of alternative models. Suppose that at a specific time (t) researchers are interested in obtaining a measure Yijkt that proposes to describe a particular domain of human functioning (d) on the ith individual who belongs to a particular target group (bk) and is receiving a specific treatment (aj).
The measurement model describes the influence of the person belonging to the kth target group and being under jth treatment at time t, on the observed behavior (Yijkt) of the domain called d, as follows:

Yijkt = adjt + bdikt + abdijkt + eijkt
The term abdijkt is the interaction of the jth treatment and the kth target group that influences the functional domain (d) for the ith subject at the time t, when the measure was taken. As a final note, Yijkt must contain the characteristics of the traditional operational definition. The measurement of Yijkt is bound by the same issues regarding accurate and valid measurement as other variables in the equation, such as subject characteristics and treatment assignment. There are two features of this model that are different from the one offered in standard texts. First, the time that the measure is obtained (t) is included as a subscript in each term, and as such appears to be a constant. Within any single measurement occasion t is indeed fixed, but t is included here as a reminder that all measures, particularly clinical measures of functional status, are time dependent. Such measures reflect states rather than traits (Kazdin, 1986). A more formal statement of the model could have treated time as an additional element of the model, thereby adding complexity to the presentation. Another tactic could have left t out completely. However, the temporal nature of the measure of functional status is important in most clinical service applications. Thus, a simple subscript t is used to indicate the temporal status of the measurement model. The clinical and statistical issues involved in measuring changes in functional status over time (progress and outcome) are discussed later, when the focus is on the impact of treatment and for whom the treatment works best. Second, this measurement model adds the parameter d to represent a particular domain of functioning (behavior). As with the use of the time element in the expression, d, a specific functional domain, is added for emphasis. Each of the elements in the


expression (treatment or target population characteristic in this case) should be seen as interacting with the measure of the individual on a specific functional domain. The inclusion of d is a reminder that the model may not hold if the observation made (Yijkt) does not actually represent the behavioral domain of interest. Prior to treatment the model reduces to:
Yikt = bdikt + eikt
where the term bdikt is the true value of the domain for the ith person belonging to the bk target population, at a given time (t). The last term, eikt, is an error term for the ith person that combines potential differences (error) due to at least three (potentially interacting) features: (a) item (measure) difficulty (bit) at that time; (b) imprecise measurement at that time (mit); and (c) individual differences introduced by that person (the individual's state) at that time (dit). The three potentially confounding components of the error term are discussed later. As additional target population characteristics are considered, the potential for each to interact with the other terms will only add to the model's complexity. Because most scoring procedures attempt to derive a composite score for a set of factors, they add sources of variation such as client characteristics, which typically compound error (i.e., combined with b, m, and d). Because these sources of error are nonsystematic, they cannot simply be subtracted from aggregate scores. The number of sources of variance expands with the addition of just one additional client characteristic (G1). This would add two fixed interactions with treatment effects (aGjl and abGjkl) and the potential for up to seven random interactions with the three confounded components of error (item difficulty b, error of measurement m, and individual differences d). Despite what appears to be intractable complexity, there are a number of measures available that have demonstrated sufficient psychometric quality, with sufficiently strong treatment effects and client effects but small random error effects. These instruments may be applied to a fairly wide range of client characteristics. This is consistent with the outcome assessment recommendation of an NIMH expert panel (see Newman, Ciarlo, & Carpenter, 1998) that an ideal measure should be able to serve a wide range of client groups.
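Because the roles of the systematic and error terms can be hard to see in the abstract, a small simulation may help. All effect sizes, cell counts, and error levels below are invented for illustration and are not taken from the chapter:

```python
import numpy as np

# Illustrative sketch only: simulate the additive model
# (Y = treatment effect + group effect + interaction + error) and watch
# how random error dilutes the systematic portion of the observed variance.
rng = np.random.default_rng(0)

n = 2000                      # clients per treatment-by-group cell (invented)
a = {0: 0.0, 1: 1.0}          # treatment effects (invented)
b = {0: 0.0, 1: 0.5}          # target-group effects (invented)
ab = 0.25                     # treatment-by-group interaction, cell j=1, k=1

shares = []
for error_sd in (0.5, 2.0):   # small vs. large random error (e)
    cells = []
    for j in (0, 1):
        for k in (0, 1):
            mu = a[j] + b[k] + (ab if j == 1 and k == 1 else 0.0)
            cells.append(mu + rng.normal(0.0, error_sd, n))
    total_var = np.var(np.concatenate(cells))
    systematic_var = np.var([c.mean() for c in cells])  # variance of cell means
    shares.append(systematic_var / total_var)
    print(f"error sd {error_sd}: systematic share of variance = {shares[-1]:.2f}")
```

With small error the treatment and group terms dominate the observed variance; with large error they are swamped, which is the practical sense in which random error limits what any analysis of Yijkt can recover.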
The wide applicability of these measures also decreases the costs of providing different measures for each of the client groups and increases the measure's utility (e.g., in planning or evaluating a program of services).

Measurement and Statistics

Despite advances in methodology and analysis, measurement remains the foundation of statistical inference. The validity of these measures dictates the extent and nature of inferences that can be made as a result of the analysis. Thus, the importance of measurement issues cannot be overstated when considering the presentation of statistical material. Fluency with measurement issues will impact both the design of studies and the subsequent statistics used in the analysis of their data. The interaction among design, measurement, and statistics is often overlooked when planning a study. Briefly, consider the impact of an invalid or unavailable measure on design and statistics. Clearly, a study could not be designed if measures were unavailable. Likewise, interpretation of results would be impossible in the presence of an invalid instrument.


In terms of the organization of this chapter, screening is synonymous with measurement, and treatment planning is based on valid measurement. Moreover, progress and outcome analyses as discussed in the subsequent section are impossible without valid and reliable measures. Thus, the next sections present a careful review of the foundations of measurement as well as new advances in the field of psychometrics, linking these points to the discussion of statistical analysis.

Screening and Treatment Planning

Primary Objectives

There are two primary objectives. The first is to provide reliable and valid evidence as to the appropriateness of the client's eligibility for a treatment. If appropriate and eligible, then the second objective is to obtain evidence as to which treatments or services would best help the client progress toward the outcome goals. At a minimum, statistical procedures required to assess reliability and validity must be evaluated in the context of these two objectives.

Scale Reliability and Consistency of Clinical Communication

Although a scale may have a published history of reliability, it is sometimes useful to determine if the local applications of the scale are reliable and, if not, to identify the factors contributing to reduced reliability. Note that a number of forms of reliability exist. The current discussion refers to estimates of internal consistency, particularly what is commonly called Cronbach's alpha. However, other forms of reliability provide valuable information. The coefficient of concordance (sometimes referred to as parallel forms) provides information about how sections of a measure relate to one another and whether measurement is consistent over sections measuring the same construct. Similarly, the coefficient of stability (sometimes referred to as test-retest) provides information about change in measurement over time when no intervention has occurred.
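As a concrete, hedged sketch of two of these coefficients, Cronbach's alpha and a stability (test-retest) coefficient can be computed directly from item-score matrices. The data, sample sizes, and resulting reliabilities below are simulated for illustration only:

```python
import numpy as np

# Hedged sketch: internal consistency (Cronbach's alpha) and a test-retest
# stability coefficient on simulated item responses; all values are invented.
rng = np.random.default_rng(42)

n_people, n_items = 500, 10
trait = rng.normal(0.0, 1.0, n_people)                     # true score
time1 = trait[:, None] + rng.normal(0.0, 1.0, (n_people, n_items))
time2 = trait[:, None] + rng.normal(0.0, 1.0, (n_people, n_items))

def cronbach_alpha(items):
    """Internal consistency from an n_people x n_items score matrix."""
    k = items.shape[1]
    sum_item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - sum_item_vars / total_var)

alpha = cronbach_alpha(time1)
stability = np.corrcoef(time1.sum(axis=1), time2.sum(axis=1))[0, 1]
print(f"alpha = {alpha:.2f}, coefficient of stability (test-retest) = {stability:.2f}")
```

Both coefficients land near .90 here because the simulated items share a single strong trait; with noisier items, alpha and the stability coefficient fall together.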
Such forms of reliability are not discussed here, but nevertheless add to the overall psychometric evaluation of an instrument. By conducting studies to assess the scale's reliability, it is possible to investigate the consistency of staff communication about the client. The concern for staff communication emanates from the need for staff to maintain a consistent frame of reference when discussing a client's strengths, problems, treatment goals, and progress. Moreover, if done properly, it will be possible to identify the factors depreciating the reliability of staff communication, thereby determining the need for staff development and future training. The general concept of reliability can be described as a proportion. The proportion's numerator describes the amount of variability due to true measurement of individuals' functional characteristics, bd. The denominator describes the referent, or baseline variability. The referent is the variability due to the functional characteristics plus the variability due to extraneous factors, e (e.g., those factors that reduce the consistency with which the instrument is employed). Essentially, the variance that the items have


in common is divided by the total variance of the items in the scale. The proportion is expressed as follows:
reliability = s2bd / (s2bd + s2e)
Reliability increases directly with variation due to differences in the functional characteristics in the target population, s2bd, and decreases with variability due to extraneous factors, s2e. Internal consistency considers the variability among items, Ip, used to estimate the functional characteristic, d. The expected value of a set of unbiased items, Ip, equals the expected value of d for the ith person in the bk target group. As the correlation, rId, increases, the items are said to have increased internal consistency with each other and with their estimation of d and therefore to have increased reliability. This can be seen in the following expression:

where (1 - r2Id) is the proportion of the total variance that is not described by the relation between I and d across individuals. Earlier in the chapter, "item difficulty" was identified as a potential source of error variance in the basic measurement equation. Item difficulty in the present context could be described as the extent to which the internal consistency of items varies among persons functioning at different levels within the target population at time t, bkt. Reliability-internal consistency estimates could be used to estimate item difficulty effects. This could be done by obtaining the reliability-internal consistency estimates for persons functioning at the lowest, middle, and highest third of the distribution on a global measure of functioning (e.g., using the current DSM-IV Axis V). If the proportions do not differ significantly, then item difficulty, as defined here, is not a significant source of error variance. (It must be noted that in other environments, e.g., educational performance testing and personnel selection, having item difficulty as a significant source of variance is considered to be desirable.) Procedures for estimating internal reliability are part of most statistical packages (e.g., SPSS, SYSTAT/TESTAT, SAS) as either a stand-alone computer program or part of a factor analysis program.

Interrater Reliability and Clinical Communication

The second major concern regarding reliability in the screening and treatment planning processes is interrater reliability. This is particularly relevant where raters are members of a clinical team. If their judgments vary, then treatment goals and treatment actions will differ in ways not necessarily related to client needs. For interrater reliability, the descriptive expression is:
interrater reliability = s2bd / [s2bd + S2(rater × bd) + s2e]
The magnitude of the interaction, S2(rater × bd), decreases as agreement among raters increases. Thus, interrater reliability increases as this interaction decreases. Four major


features of an instrument are said to increase interrater reliability. The first is to increase the internal consistency of the items, particularly by anchoring the items to objective referents that have the same meaning to all team members. The second is to develop the instrument's instructions so as to minimize the influence of inappropriate differences among raters. Unfortunately, the only means of uncovering "inappropriate" rater behaviors is through the experience of pilot testing the instrument in a variety of situations. Thus, it is also important to look at the testing history of the instrument. The third is training and retraining to correct for initial rater differences and the "drift" in the individual clinician's frame of reference that can occur over time. Fourth and last, it is critical to anchor observations in behaviors as much as possible to reduce inference. The greater the inference required, the greater the likelihood of error being introduced. High-inference coding procedures generally produce lower reliability estimates than low-inference coding procedures because perception varies across individual raters, as do the intentions of those being rated. To maximize interrater reliability, training manuals with example cases are very important. As described in Newman and Sorensen (1985), it is possible to fold some of the training and retraining activities into treatment team meetings and case conferences. Discussions of methods of assessing factors that may be influencing interrater reliability are presented in Newman (1983) and in Newman and Sorensen (1985). The major computer packages provide programs that permit partitioning the sources of variance (univariate or multivariate) due to differences among raters when multiple-rater data are entered for the same individuals.

Measurement Model Testing

Measurement modeling currently represents one of the forefronts of psychometrics in terms of construct validation.
This section provides an introductory review of measurement modeling and testing. Remember that the purpose here is to detail the statistical process in an effort to refresh the reader in the analytic method, or to provide enough foundation that a conversation with a resident statistician is possible during analyses. Measurement modeling is a fundamental step to be conducted prior to data analysis with regard to research questions. Psychometric examination remains a universal first step in data analysis. Measurement modeling is conducted via the more general technique of structural equation modeling. A number of software packages for structural equation modeling are available, including the more common LISREL (developed by Jöreskog & Sörbom), EQS (developed by Bentler), and AMOS (developed by Arbuckle). Because these packages are under constant review and revision, it is best to seek the latest information about each. There are also shareware programs available on the Internet. Each program handles data slightly differently, and each places different constraints on variance. Users of these programs are urged to familiarize themselves with the subtleties of the program they use in order to understand their results. Measurement modeling initially involves identifying the factor structure of each of several instruments and subsequently defining relations among instruments based on similar constructs. Consider a 10-item instrument composed of two 5-item scales that measure M and N. In principal axis factor analysis, the 10 items would form a linear composite that may or may not reflect the two constructs, M and N. Factor analysis


partitions variance and creates factors mathematically. Conversely, measurement modeling imposes a structure on the data forcing the program to consider only the solution of two factors composed of the designated 5-item sets. (The terms latent variable, construct, and factor are often used interchangeably, with latent variable used more often in structural equation modeling.) The formal mathematical statement that represents how well a set of items measures a construct plus error is
X = Lxξ + d (ξ being the vector of latent factors)
where X represents a vector of observed item responses; Lx represents a matrix of factor loadings, imposed by the user, that specifies the contribution of each latent factor to each item; and d represents a vector of disturbance terms, or error. This equation can easily be expanded to include various measures at the item level. Because it involves vectors, the equation represents an unlimited set of items relating to an unlimited set of constructs. In measurement modeling, it is desirable to determine how constructs relate to one another; thus, any number of constructs can be introduced in order to examine their interrelations. For example, another measure with two subscales, D and E, could be added to M and N. All items could then be included in the analysis and a four-factor model of M, N, D, and E could be tested. Or, if M and D supposedly measure the same construct, it would be possible to test a three-factor model with the items that loaded on M and D loading on a single factor. The power of measurement modeling is to define the interrelations of the data based on constructs prior to intervention and to create latent variables based on diverse measures that might also include diverse methods. In the previous example, M could be self-report and D could be clinician ratings. In examining factor structures via measurement modeling, there is also interest in assessing how well the a priori structure is reflected in the data. That assessment is termed "fit." A large number of fit indices have been developed, each of which is designed to reflect the amount by which a model reduces the covariance among items in a set of observed (or item-level) data by describing the results in terms of the factors or constructs instead of the individual items. Thus, a fit index of .90 often suggests that a hypothesized model has reduced covariance by approximately 90%.
In recent years, fit indices have been developed to allow for model comparisons and assessment of parsimony as well as covariance reduction. Like structural equation modeling in general, a full discussion of fit indices is impossible here. In general, however, the comparative fit index (CFI; Bentler, 1990) and the nonnormed fit index (NNFI; Bentler & Bonett, 1980) represent two important indices in the assessment of most measurement models, with values approaching 1.00 representing the best fit. What does the testing of a measurement model mean in terms of construct validation? Measurement modeling provides a method of testing whether various sources of data (e.g., self-report and observer) are related to one another, as well as whether various constructs are independent or interrelated. In a nutshell, the measurement model represents the measures and the interrelations of the measures employed in any study. A measurement model supported by a fit index exceeding .90 provides evidence that multiple measures of the same construct, whether differing by source or method, are related to one another. Thus, the measures that meet a conservative criterion such as .90 provide evidence of construct validity.
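The idea of imposing a two-factor structure on a 10-item instrument can be sketched in a simulation. The loadings, factor correlation, and error variances below are invented, and the check shown (comparing within-scale and cross-scale item correlations) is a crude stand-in for the formal fit assessment a structural equation modeling package would perform:

```python
import numpy as np

# Illustrative sketch of the measurement equation: items 1-5 load on latent
# factor M, items 6-10 on latent factor N; all parameter values are invented.
rng = np.random.default_rng(7)

n = 1000
phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])                      # latent factor covariance
xi = rng.multivariate_normal([0.0, 0.0], phi, n)  # latent scores for M and N
Lx = np.zeros((10, 2))
Lx[:5, 0] = 0.8                                   # loadings of items 1-5 on M
Lx[5:, 1] = 0.8                                   # loadings of items 6-10 on N
d = rng.normal(0.0, 0.6, (n, 10))                 # disturbance (unique) terms

X = xi @ Lx.T + d                                 # observed item responses

R = np.corrcoef(X, rowvar=False)
within = R[:5, :5][np.triu_indices(5, k=1)].mean()  # same-factor item pairs
between = R[:5, 5:].mean()                          # cross-factor item pairs
print(f"mean within-scale r = {within:.2f}, mean cross-scale r = {between:.2f}")
```

Same-scale items correlate more strongly than cross-scale items, which is exactly the pattern a confirmatory two-factor model would be asked to fit, and the size of the cross-scale correlations reflects the latent factor correlation.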


Other Forms of Scale Validation

It is often inexpensive to collect some additional data to estimate the instrument's concurrent and construct validity as a screening instrument. This can be done by first identifying variables that ought to be related to (i.e., predict) the instrument's factor scores (scores derived from the factor analysis). Then a set of multivariate regression or analysis of variance equations is developed to estimate whether relations that ought to exist do so. Variables often used in such validation analyses for populations of persons with a severe mental illness are: prior psychiatric history (e.g., hospitalizations, number of episodes per year), major diagnosis (e.g., schizophrenic versus nonschizophrenic, with or without a dual diagnosis of substance abuse), employment (school attendance) history, scores on known measures (e.g., Beck Depression Inventory, SCL-90-R, State-Trait Anxiety Inventory, MMPI-2, current Global Assessment of Functioning), and social support (yes-no, number of contacts per week/month, or number of housing moves over the last 6 or 12 months). In the present context, these variables are referred to as predictor variables, that is, variables that the literature indicates should predict differences in scores on the instrument for persons in the target population. The selection of the predictor variables must be tailored specifically to the target population and the types of service screening decisions that need to be made. Once selected, the use of multivariate analysis of variance or regression analysis requires the user to identify these variables in a prediction equation. For the present case, the instrument's factor scores should be listed on the left side of the expression and the predictor variables are listed as a sum on the right:
(Y1, Y2, . . . , Yp) = b0 + b1X1 + b2X2 + . . . + bqXq + e
Note that in the true multivariate case, each of the predictor variables is correlated or regressed on the family of measures Y1,Y2, . . . , Yp via a statistic called a canonical correlation, that is, a correlation between an independent variable and a set of dependent variables. The canonical correlation has an interesting clinical interpretation. First, assuming that the set of dependent measures in the multivariate analysis represents a profile of one or more behavioral domains of clinical interest, then the canonical correlation represents a description of the strength of the relation of that variable with the clinical profiles represented by the set of dependent measures. That knowledge could lead to developing hypotheses about how the manipulation or control of that predictor might influence outcome as represented by that set of dependent measures. If the predictor is a potential moderator variable, then the interpretation might be one of trying to determine for whom an intervention (treatment) works best. Some sets of predictor variables can be considered as main effects and some are best considered as a first-order interaction. For example, DSM-III-R Axis V (Global Assessment of Functioning) at admission is often a significant effect when considered as a first-order interaction with either current social support or prior psychiatric history on multiple factored scales for the seriously mentally ill (Newman, DeLiberty, Hodges, & McGrew, 1997; Newman, Griffin, Black, & Page, 1989; Newman, Tippett, & Johnson, 1992). Thus, the expression for this example would be:
(Y1, Y2, . . . , Yp) = b0 + b1(Axis V) + b2(prior psychiatric history) + b3(Axis V × prior psychiatric history) + e
To do this for a community-based sample, ordinal classes of Axis V scores (1-35, 36-50, 51-65, 66+) and prior hospitalization (none in the last 12 months, once, two plus) were created


that would produce a sufficient number of subjects within each combination (cell) of the interaction (Newman et al., 1992; Newman et al., 1997). For the mixed nursing home and community-based sample, the ordinal classes on Axis V were more detailed at the lower end (1-25, 26-40, 41-60, 61+; Newman et al., 1989).

One Application of Reliable Instruments: Discriminant Analysis

The focus here is on how well the instrument's scoring procedures, often factor scores, will lead to correctly placing a client into an appropriate service modality or program. The key questions here are: How does one define a correct placement? What is meant by an appropriate service modality or program? There have been two general approaches to answering these questions. One is to contrast the recommendations or predictions made by the instrument with the placement recommendations of an experienced group of clinicians. Suppose researchers wish to evaluate a scale with p factors in terms of how likely it is to correctly place a person in the appropriate service modality. For a scale that discriminates well, a discriminant score, Di, can be estimated for an individual from the scale's p factor scores, Xj (j = 1, 2, . . . , p), where the value of Di implies a recommendation of the most appropriate service modality placement. The computation of Di for the ith person is the sum of each of that person's factor scores, Xij, weighted by the factor's coefficient, bj. Thus,
Di = b1Xi1 + b2Xi2 + . . . + bpXip
To determine which service modality group assignment, Gk, is most appropriate, a Bayesian rule is applied that gives the probability of each group assignment for a given value of Di. In a discriminant analysis, the probability for each group assignment, Gk, is computed for each value of Di, and the group with the highest probability, relative to the other possible group assignments, is labeled the "appropriate" service modality. The specific expression of Bayes' rule for estimating the probability of an assignment to the Gk service modality group for the ith person is:
P(Gk | Di) = p(Di | Gk)P(Gk) / Σl p(Di | Gl)P(Gl)
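A hedged numeric sketch of this placement logic follows. The weights, group means, standard deviations, and prior probabilities are invented for illustration; in practice they would be estimated from cases already classified by experienced clinicians, and the class-conditional densities p(Di | Gk) are assumed normal here:

```python
import numpy as np

# Sketch: a discriminant score D_i is a weighted sum of factor scores, and
# Bayes' rule converts it into posterior probabilities for each service
# modality group. All numbers below are hypothetical.
b = np.array([0.6, 0.3, 0.1])            # factor-score weights (invented)
x_i = np.array([2.0, 1.0, 0.5])          # one client's factor scores (invented)
D_i = b @ x_i                            # discriminant score

groups = {"outpatient": (0.5, 0.6), "day treatment": (1.5, 0.6),
          "residential": (2.5, 0.6)}     # (mean, sd) of D within each group
priors = {"outpatient": 0.6, "day treatment": 0.3, "residential": 0.1}

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Bayes' rule: P(G_k | D_i) = p(D_i | G_k)P(G_k) / sum_l p(D_i | G_l)P(G_l)
numer = {g: normal_pdf(D_i, mu, sd) * priors[g] for g, (mu, sd) in groups.items()}
total = sum(numer.values())
posterior = {g: v / total for g, v in numer.items()}
placement = max(posterior, key=posterior.get)
print(posterior, "->", placement)
```

The group whose posterior probability is highest, relative to the other possible assignments, is labeled the "appropriate" service modality for this client.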
The second approach to validating the use of the instrument's factor structure (and the discriminant function analysis results) in service modality placement is to estimate the relations between the test scores, the program of services, and client outcome. If the relation between service program and outcome is improved by knowing the screening test results, then the instrument can be viewed as beneficial in the screening and treatment planning process. Collecting data for this approach takes considerable time. Thus, the discriminant analysis is best applied first. However, because successful outcome is considered to be the gold standard, the second approach should be planned and conducted as a long-term evaluation of a screening instrument's worth.

Linking Levels of Need to Care: Cluster Analysis

Another approach can be employed when using a multidimensional instrument to recommend a more complex array of treatments and services (e.g., services for persons with a serious and persistent mental illness). If an instrument has an internally consistent


factor structure, then a cluster analysis technique can be employed to identify a mix of consumers with similar factor scores that are likely to have similar treatment and service resource needs (Newman et al., 1989; Uehara, Smukler, & Newman, 1994). The approach uses one or more panels of experienced clinicians in a structured group decision process to identify a treatment plan for a person who is described as having a given level of problem severity or functioning within one of the factors. For example, the panel might be given the following description of a person with a moderate level of depression:

Signs of Depression: insomnia-hypersomnia, low energy-fatigue; decreased productivity, concentration; loss of interest in usual activities and in sex; tearfulness and brooding.

Moderate: The signs are less severe than intense (the previous task considered an intense level of depression) and will often, after a few days or weeks, shift to either periods of normal behavior or moderate manic levels. Because the severity and duration of the signs are less than severe, the person is generally more cooperative with therapeutic efforts, but will seldom seek out assistance himself.

The panel is provided a list of available services and the professional disciplines of those who could provide each service. Using nominal group procedures, the panel is then asked to indicate which services would be provided for the next 90 days, by whom, how often, and with what duration (e.g., 1 hour per day, week, month, or for 90 days). In the study by Newman et al. (1989) of nursing home and community care in Utah, 31 services were available.
The panels developed an array of services for each of three problem intensities within each of 11 factors: psychotic signs, confusion, depression with suicide ideation, agitated-disruptive behavior, inappropriate social behavior, dangerousness-victimization, personal appearance, community functioning capability, social-interpersonal adaptability, activities of daily living, and medical-physical. This exercise led to 33 treatment service plans. The final step in linking level of need to level of care is to cluster together those treatment service plans that require similar resources (mostly professional personnel) for similar amounts of time. To perform a clustering of similar treatment plans within the total of 33 treatment plans, a common measure of therapeutic effort was developed based on the costs of each unit of service recommended in each of the 33 treatment plans. The employment costs of the professional(s) in a given service were estimated by the usual accounting procedures based on the salaries of who did what, with what resources, how frequently, and for what duration. To complete the cluster analysis, a matrix of service costs for each of the 33 combinations of factor intensity by service was set up. The matrix is shown in Table 8.1. The statistical cluster analysis used here sorts through the columns of the matrix and clusters those columns that have the smallest adjacent cell differences (distances). The various statistical software packages permit the user to employ any one of a number of rules for assessing the distances between the cell entries in adjacent columns. The Utah study employed a Euclidean measure of distance contrasting the costs of each service in a pair of columns, A and B: Distance(A, B) = sqrt[Σi (ciA - ciB)^2], where i = 1, 2, . . . , 31 services and ciA is the cost of the ith service in column A. It was found that the 33 columns of 31 service-cost cell entries formed six clusters.
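The clustering of cost columns can be sketched with a toy matrix. The costs below are invented (three services by four plans, rather than the 31 services and 33 plans of the Utah study), and the single-threshold grouping rule is a deliberately crude stand-in for the linkage rules offered by standard packages:

```python
import numpy as np

# Toy sketch: columns are factor-by-intensity treatment plans, rows are
# per-service costs; plans whose cost profiles sit close together in
# Euclidean distance are grouped into one program modality.
costs = np.array([            # invented dollar figures
    [100.0, 110.0,  10.0,  15.0],
    [ 50.0,  55.0, 200.0, 190.0],
    [  0.0,   5.0,  80.0,  85.0],
])

n_plans = costs.shape[1]
dist = np.zeros((n_plans, n_plans))
for a in range(n_plans):
    for b in range(n_plans):
        dist[a, b] = np.sqrt(((costs[:, a] - costs[:, b]) ** 2).sum())

# Group plans whose distance to any existing cluster member falls under a
# threshold (a single-link style sketch, not a full hierarchical algorithm).
threshold = 50.0
clusters = []
for p in range(n_plans):
    for c in clusters:
        if any(dist[p, q] < threshold for q in c):
            c.append(p)
            break
    else:
        clusters.append([p])
print(clusters)
```

Here plans 0 and 1 share one staffing profile and plans 2 and 3 another, so two program modalities emerge; the Utah study's six clusters were found the same way, only over a far larger cost matrix.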
In other words, there were six patterns (clusters) of services that could be considered together as program modalities to provide coverage for all of the consumers sampled. Stated another way, the services listed under a cluster had similar staffing and scheduling requirements for consumers with a given


TABLE 8.1
Schemata of the Data Matrix Used in the Cluster Analysis

                       Factor 1                     Factor 2                     Factor 3
             Minimal  Moderate  Severe    Minimal  Moderate  Severe    Minimal  Moderate  Severe
Service 1      $xx      $xx      $xx        $xx      $xx      $xx        $xx      $xx      $xx
Service 2      $xx      $xx      $xx        $xx      $xx      $xx        $xx      $xx      $xx
    .           .        .        .          .        .        .          .        .        .
Service 31     $xx      $xx      $xx        $xx      $xx      $xx        $xx      $xx      $xx

(The full matrix contains 33 columns: three intensity levels for each of the 11 factors.)


set of characteristics. Most important, the consumer characteristics could be described by their scores on the multifactored scale.

Closing Thoughts on Evaluating Measures

The well-known influences of initial impressions, when documented by a psychological assessment technique, can be a wonderful asset or an expensive liability. Unless the consumer chooses to terminate treatment (and many do), the initial course of treatment typically persists without major modification for months and often years. Using an assessment technique to screen and to plan treatment is a social commitment by the clinician(s) and those who pay for the service. The strengths, problems, and goals identified with the support of the assessment should guide treatment. Reliability and validity studies are conducted on the instrument so that there are hard data to support the decision to influence the life of another person, the client, and to use the clinical and economic resources needed in doing so. A theme that runs through the first part of this chapter is that there are methods to determine if an instrument is effectively doing the job. Even if there are studies in the published literature that demonstrate the instrument's reliability and validity, it is desirable to perform studies on its local application. An additional theme is that some of these methods can be used as the basis for staff training and development as well as an empirical basis for utilization and quality assurance review. Thus, the statistical procedures described here can be useful in the internal evaluation of a program's quality as well as in studies on the instrument's application. It might be informative to consider a brief example of an internal evaluation study on the consistency of clinical communication in a program serving persons with a severe mental illness. Consistent communication across members of a treatment service team is considered to be important in supporting this population of consumers.
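One way to carry out such an internal check is to partition the rating variance from a raters-by-clients matrix and form an intraclass correlation. The ratings below are invented for illustration, and the single-rating, two-way formulation shown is one of several defensible choices:

```python
import numpy as np

# Hedged sketch: partition rating variance into client, rater, and residual
# components, then form an intraclass correlation for single ratings.
ratings = np.array([          # rows: 3 raters; columns: 6 clients (invented)
    [4.0, 2.0, 5.0, 3.0, 1.0, 4.0],
    [4.0, 3.0, 5.0, 3.0, 2.0, 4.0],
    [3.0, 2.0, 4.0, 2.0, 1.0, 3.0],
])

n_raters, n_clients = ratings.shape
grand = ratings.mean()
client_ms = n_raters * ((ratings.mean(axis=0) - grand) ** 2).sum() / (n_clients - 1)
rater_ms = n_clients * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n_raters - 1)
resid = ratings - ratings.mean(axis=0) - ratings.mean(axis=1)[:, None] + grand
resid_ms = (resid ** 2).sum() / ((n_raters - 1) * (n_clients - 1))

# Two-way random-effects intraclass correlation for a single rating
icc = (client_ms - resid_ms) / (
    client_ms + (n_raters - 1) * resid_ms
    + n_raters * (rater_ms - resid_ms) / n_clients)
print(f"single-rating ICC = {icc:.2f}")
```

A low value on an instrument known to be reliable under controlled conditions points to local rater drift, and to a training need, rather than to the instrument itself.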
Suppose that it is found that the interrater reliability among members of the service team is low on an instrument that has been demonstrated to be reliable and valid for this population under well-controlled conditions. In this case, it might be suspected that members of the service staff have different frames of reference when describing a consumer's functioning. If such differences (inconsistencies) exist and go undetected, then the team members will treat the consumer differently based on their different frames of reference. Because consistency is vital to successful outcome in mental health service planning and delivery, inconsistent communication will probably lead to a breakdown in the quality of care.

Progress and Outcome

The section begins with a description of four assumptions about defining treatment progress and outcome goals that need to be addressed prior to selecting a statistical procedure for assessing questions regarding client progress and outcome. The remainder of the chapter focuses on a sequence of six clinical service questions that set the stage for selecting statistical procedures. Recommendations for selecting a statistical procedure are given under each of the following six questions: Did change occur, by how much, and was it sustained? For whom did it work best? What was the nature of the change?


What effort (dose) was expended? Did the state(s) or level(s) of functioning stabilize? What occurred during the process?

Specifying Treatment Service Goals

The selection of a statistical approach to describe treatment progress or outcome should depend on the anticipated goal(s) of treatment. There are four assumptions that must be explicit when specifying a treatment goal and selecting a statistical approach. The first is that the consumers (clients, patients) are initially at a clinically unsatisfactory psychological state and/or level of functioning, and that there is a reasonable probability of change or at least stabilization by a given therapeutic intervention. Second, an agreed-on satisfactory psychological state or level of functioning, observably different from the initial state, can be defined for that individual or for that clinical population. Third, it is assumed that a measure, scale, or instrument is available that can reliably and validly describe the status of the person in that target population at any designated time. The fourth assumption is that the instrument's score(s) describing an individual at an "unsatisfactory" state is reliably different from the score(s) describing that same individual at a "satisfactory" state, and the scores are not limited by "ceiling" and "floor" effects. If each of the assumptions is met, then specifying treatment goals and selecting the statistical approach to estimate the relative effectiveness of client progress or outcome as described in the specific goals can proceed. To support this approach, the remainder of this chapter organizes the discussion of the statistical procedures around the six generic questions regarding the achievement of specific treatment goal(s) for specific therapeutic intervention(s). The six questions and related statistical approaches are ordered from a macrolevel to a microlevel of investigation.
This is done to give a context for determining what should be studied and how the data should be analyzed. A statistical procedure must fit the context of the question being asked. The statistical literature has a rich history of controversy and discussion on which aspects of change should be investigated and how each aspect should be analyzed (Collins & Horn, 1991; Cronbach & Furby, 1970; Francis, Fletcher, Strubing, Davidson, & Thompson, 1991; Lord, 1963; Rogosa, Brandt, & Zimowski, 1982; Rogosa & Willett, 1983, 1985; Willett, 1988; Willett & Sayer, 1994; Zimmerman & Williams, 1982a, 1982b). Historical controversies can be avoided by carefully articulating the research or evaluation question. The criterion for a well-developed question is that it frames the appropriate unit and level of analysis relevant to the question. To use the following section, the reader should first formulate an initial draft of the question(s) and the unit of analysis. Next, the investigator should check that the four assumptions have been made explicit, modifying the question(s), if necessary. Then, investigators can match their research question(s) to those given next and consider the recommendations offered. Although there is no "perfect" method, the discussion should provide the reader with guidelines for identifying the best method for their own situation.

Question 1: Did Change Occur, by How Much, and Was It Sustained?

Did specific domains of consumer functioning or states change by the end of therapy? If so, by how much? Were the changes sustained over time? These are often considered to be the first questions that those developing a therapeutic innovation seek to address:


Does the therapy make a difference? There are two general approaches that have been employed when addressing this question. One is to investigate the magnitude of the difference between the pretreatment and the posttreatment scores on the selected measure across subjects. The second is to contrast the trends on the status measures taken on subjects over time (pre-, during, post-, and follow-up to treatment). Each approach has its strengths and limitations, but they both have the potential to provide a gross estimation as to whether the therapeutic intervention makes a difference.

Difference Scores. The difference score, Di, is often considered the most basic unit of analysis, with Di = (Xi1 - Xi2), where Xi1 and Xi2 are the observations recorded at Time 1 (usually prior to treatment) and at Time 2 (usually at the end of treatment) for the ith person. The mean values of Di can be contrasted between groups or within a single group against an expected outcome of no difference (D = 0.0, or an equal number of positive and negative values of D). There are two issues to be addressed here. One is to decide whether to use Di as the basic unit of analysis. The second is the research design used to address the question. There is an extensive and controversial literature on whether to use Di. Discussion has focused on two features of Di: the reliability of the difference score is inversely related to the correlation between the pre- and posttreatment measures, and the potential for a correlation between initial (pretreatment) status and the magnitude of the difference score. The potential bias introduced by these two features led Cronbach and Furby (1970) to recommend that the difference score not be used at all. They recommended that researchers instead concentrate on between-group contrasts of outcome (posttreatment) measures.
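As a concrete sketch of the within-group contrast described above, the mean difference score can be tested against zero with a one-sample t statistic. The scores below are hypothetical, and the example uses only the Python standard library:

```python
import math
from statistics import mean, stdev

def difference_scores(pre, post):
    """D_i = X_i1 - X_i2, following the chapter's notation
    (Time 1 = pretreatment, Time 2 = posttreatment)."""
    return [x1 - x2 for x1, x2 in zip(pre, post)]

def one_sample_t(d):
    """t statistic for H0: mean(D) = 0."""
    n = len(d)
    se = stdev(d) / math.sqrt(n)  # standard error of the mean difference
    return mean(d) / se

# Hypothetical symptom-severity scores (higher = more severe).
pre  = [10, 12, 14, 16]
post = [6, 9, 10, 13]
d = difference_scores(pre, post)
t = one_sample_t(d)
print(round(mean(d), 2), round(t, 2))
```

With SciPy available, `scipy.stats.ttest_rel(pre, post)` computes the same paired t statistic along with a p value.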
Others have argued for the use of alternatives such as a residualized gain score, where the difference score is adjusted for initial, pretest differences (Webster & Bereiter, 1963). There is, however, an opposing point of view. Rogosa et al. (1982), Rogosa and Willett (1983, 1985), Willett (1988, 1989), and Zimmerman and Williams (1982a) collectively developed arguments with sufficient empirical support to conclude that difference scores were being damned for the wrong reasons. They also provided strong evidence that some of the most popular solutions (e.g., the residual gain score) have worse side effects than the problems they sought to solve. The potential inverse relation between the reliability of D and the correlation between pre- and posttreatment scores is not necessarily a problem. The difference score is an unbiased estimator of change (a process), and scores at each of the other times (pre- and post-) are estimators of status (not a process) at those two respective times. When there is low reliability in D, it is to be interpreted as no consistent change. But as Zimmerman and Williams (1982a) showed, it is possible for the reliability of the difference score to exceed the reliability of the pretreatment score or the reliability of the posttreatment score. When this occurs, it is still valid to conclude that there is reliable change for persons even though there were unreliable measures obtained at Time 1 and at Time 2. Although there is a problem of measurement error at each of these times, it can still be concluded that there is a consistent change (a reliable process) among the inconsistent measures of status at each time. The second problem regarding Di pertains to the correlation between the magnitude of a difference score and the value of the pretreatment score.
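Both claims about the reliability of D can be illustrated with the classical psychometric formula for the reliability of a difference score. The formula is not given in the chapter, and all reliability and correlation values below are hypothetical; the sketch uses stdlib Python only:

```python
def difference_reliability(sx, sy, rxx, ryy, rxy):
    """Classical reliability of D = X - Y, from the component
    standard deviations (sx, sy), component reliabilities
    (rxx, ryy), and the pre-post correlation (rxy)."""
    num = sx**2 * rxx + sy**2 * ryy - 2 * sx * sy * rxy
    den = sx**2 + sy**2 - 2 * sx * sy * rxy
    return num / den

# A higher pre-post correlation lowers difference-score reliability.
low_corr  = difference_reliability(1.0, 1.0, 0.7, 0.7, 0.1)
high_corr = difference_reliability(1.0, 1.0, 0.7, 0.7, 0.5)

# With unequal variances and a negative pre-post correlation, the
# difference score can be MORE reliable than either component,
# as Zimmerman and Williams (1982a) argued.
exceeds = difference_reliability(1.0, 2.0, 0.7, 0.7, -0.2)
print(round(low_corr, 3), round(high_corr, 3), round(exceeds, 3))
```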
It is intuitively obvious that a person with a low pretreatment score, indicative of more severe psychological maladjustment, would appear to have a greater probability of obtaining a higher score on the second occasion (posttreatment). Despite this obvious relation, Rogosa and Willett (1983) showed that there is a "negative bias" in the estimate of the correlation


between the initial score and the difference score. Negative bias must not be misinterpreted as a negative correlation (Francis et al., 1991). Rogosa and Willett (1983) amply demonstrated that a raw difference score is not the best statistic for estimating the correlation between initial status and change. A number of texts recommend using a residual gain score to estimate the differences. The residual gain score is calculated by adjusting the difference score by the correlation between initial level and either the posttest score or the difference score. However, Rogosa and Willett (1985) and Francis et al. (1991) concluded that the residual gain score is a poor choice and should be avoided. There are two critical flaws with using the residual gain score. First, when used in the context of clinical practice, it describes a state of affairs that does not exist in reality (adjusting subjects to be at the same initial level, which is seldom, if ever, true). Second, it adjusts a measure of change (a process measure) with a status measure (pretreatment scores). The resulting statistic is no longer an unbiased measure of the change process because it was adjusted by a measure of status, which contains its own unique sources of errors. Should a difference score be used? The answer is yes, if the research question is simply stated (e.g., "Is there change related to treatment?"). Unfortunately, the issues related to a treatment intervention are often more complex. At a minimum, the investigator typically questions the treatment's differential effects with regard to one or more consumer characteristics over the course of treatment. Recently, investigators have become interested in whether the change is sustained after treatment has formally stopped. Some of these concerns can be assessed at the level of the first question (Did change occur, by how much, and was it sustained?).
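To make the contrast concrete, here is a minimal stdlib-Python sketch (with hypothetical scores) computing both the raw difference score and the residual gain score that the authors above advise against:

```python
from statistics import mean

def raw_difference(pre, post):
    # D_i = pre_i - post_i, as defined earlier in the chapter.
    return [x1 - x2 for x1, x2 in zip(pre, post)]

def residual_gain(pre, post):
    """Residual gain: the part of each posttest score not predicted
    from the pretest by least-squares regression of post on pre."""
    mx, my = mean(pre), mean(post)
    sxx = sum((x - mx) ** 2 for x in pre)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
    b = sxy / sxx                 # regression slope of post on pre
    a = my - b * mx               # intercept
    return [y - (a + b * x) for x, y in zip(pre, post)]

pre  = [20, 18, 15, 12, 10]
post = [14, 13, 12, 9, 8]
print(raw_difference(pre, post))
resid = residual_gain(pre, post)
print(abs(round(sum(resid), 6)))  # least-squares residuals sum to zero
```

The sum-to-zero check is just a least-squares property; the point of the passage is that these residuals, unlike D, are no longer unbiased estimates of the change process.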
In these instances, the issues when addressing these refined questions pertain to design: What groups need to be contrasted? When does one need to sample consumer behaviors? In other instances, the refined question requires going to another level of focus.

Comparing Trends in Status Measures Over Time. For those question refinements that can still be articulated as "Did change occur, by how much, and was it sustained?," issues of design and sampling time frames need to be identified. One design issue that can be easily dealt with is whether a Solomon four-group design is required in the evaluation of a therapeutic intervention. The Solomon four-group design attempts to control for or estimate the effects of carryover from pretest to posttest. Half of the subjects in the treatment group and half of the subjects in the control group are randomly assigned to a "no pretreatment" test condition and half to both a pre- and a posttreatment test condition. The carryover effects of testing that are estimated via a Solomon four-group design may not be a factor in most treatment research. Most consumers who enter therapy are not naive as to what their problems are, nor are they naive as to the general purpose of the intervention. This is particularly common in the experience of those working with persons with a severe mental illness or with those who are substance abusers. Moreover, work by Saunders (1991) has clearly shown that the majority of those who enter psychotherapy and go beyond two sessions have had prior experience with at least the intake process for psychotherapy. Given this, serious researchers studying a particular therapy or therapies should, if the opportunity presents itself, run a pilot study with a Solomon four-group design to assure themselves that such carryover effects are not a significant source of variance. There are two remaining design issues: What groups ought to be sampled? When does one collect data?
The quick answer to the first is to sample those groups that satisfy the question and eliminate alternative explanations. As discussed earlier, it is


best to partition consumers by levels of any characteristic that is expected to modify the impact of the treatment. Enough has been written about the difficulty of interpreting single-group results so that most researchers will understand the need to develop either a "waiting-list" control or a "treatment as usual" control to contrast with an innovative treatment. There is also a quick answer to the second question: If possible, collect data two or more times during treatment and two or more times after treatment. An optimal minimum is to collect data at four times: pretreatment, halfway through treatment, posttreatment, and at least once in follow-up (e.g., 6 months following treatment termination). In this case, the inclusion of a fifth time point (say, 1 year following treatment) allows for the testing of the stability of posttreatment effects. A typical design here would be a mixed between-group repeated measures design with two between-group variables (treatment variable, aj, and an initial consumer characteristic identified as a "moderator" variable, bk) and one within-group (time) variable of two, three, or four levels. Here, an analysis of variance or multivariate analysis of variance of the linear and quadratic trends can describe between-group differences in the direction and rates of change over time. Between-group contrasts of the linear trends within each group offer a test of whether direction and magnitudes of change vary as a function of groups. Between-group contrasts of the quadratic trends within each group would describe whether there are significant differences among groups in how their initial changes were modified over time. For the analysis of functional status over the three times (pretreatment, midtreatment, and immediately posttreatment), the between-group contrast of quadratic trends would describe change over the course of treatment.
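The linear and quadratic trend contrasts mentioned above can be sketched with orthogonal polynomial weights for three equally spaced time points. This is a hedged stdlib-Python illustration with hypothetical scores, not a full trend ANOVA:

```python
from statistics import mean

# Orthogonal polynomial contrast weights for 3 equally spaced times.
LINEAR = (-1, 0, 1)
QUADRATIC = (1, -2, 1)

def contrast(scores, weights):
    # One subject's scores weighted by the contrast and summed.
    return sum(w * s for w, s in zip(weights, scores))

# Hypothetical severity scores at pre, mid, and post for one group.
group = [
    [10, 8, 6],   # steady linear improvement
    [12, 10, 8],
    [14, 12, 10],
]

lin = [contrast(s, LINEAR) for s in group]
quad = [contrast(s, QUADRATIC) for s in group]
print(mean(lin), mean(quad))
```

Note that perfectly linear change yields a zero quadratic contrast. For four equally spaced points, the corresponding weights are (-3, -1, 1, 3) for the linear trend and (1, -1, -1, 1) for the quadratic trend.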
When considering the changes between pretreatment and follow-up, evidence of regressive or maintenance trends could be tested. As was true of other forms of univariate and multivariate analysis of variance, the standard statistical packages have programs that can perform these analyses. One problem with these forms of analysis is that they do not tolerate missing data. They discard all subjects with data missing at any one point in time. But, if the research questions are at the macrolevel of group trends and effects, then these designs will be adequate. From experience, most investigators using these designs are initially satisfied, but then wish to understand some of the differences among subjects within the groups. Here, the analysis of variance methods are limited, and the more microlevel questions discussed later are found to be more satisfying. In recent years, random regression models have increasingly dominated the way in which change is described and analyzed. Random regression models define the set of general statistical models that allow change to be modeled over time such that individual trajectories are compared to one another, and then groups of trajectories are compared for statistical differences. The most common of the random regression models seen in the literature is the Hierarchical Linear Model (HLM; Bryk & Raudenbush, 1987). HLM has found success in the outcome literature because of its ability to model individual and group trajectories and relate these to important clinical questions. Hierarchical Linear Models are so named because of the hierarchical structure of the data required for analyses. In the present scope of mental health outcome, consider one typical hierarchical nesting structure occurring in these studies. Specifically, repeated measures of a symptom are nested within individuals, and individuals are nested within groups (such as treatment vs. no treatment, or innovative treatment vs. treatment as usual).
The discussion of HLM begins here in a limited scope by examining the question of whether change occurred. HLM is discussed later in its full application in the section, "Question 3: What was the nature of the change?"


HLM provides for statistical analyses very similar to repeated measures analysis of variance (ANOVA). However, HLM moves beyond ANOVA-based models by relieving the restrictions these models place on data. ANOVA-based models require that time points be unvarying when data are collected. Therefore, data collected from subjects at 6 months postadmission result in a hard-and-fast rule of a 6-month interval that must be applied throughout the study. However, psychotherapy, as an example, rarely follows a clean delineation of time. Termination from psychotherapy varies and thus is difficult to characterize as simply X months after admission. For the sake of analysis, the inclusion of data collected at termination in a series of data collection time points is not immediately possible because of the ANOVA restrictions. Another situation where HLM provides advantages over ANOVA-based techniques involves missing data. Missing observations have a serious impact on ANOVA-based techniques. The result of a single missing observation is the listwise (complete) deletion of the case. Radical loss of power can result when observations are missing at a single time point, even if data at subsequent time points are present. Because HLM extrapolates trends based on available observations, missing data do not result in a harsh elimination of observations. Rather, whatever data are available are used in the estimation of the group's change. Thus, data points are recovered in HLM when cases are lost due to any form of attrition, and power is retained.
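The core idea (individual trajectories estimated from whatever observations each subject has, then summarized by group) can be sketched in stdlib Python. This two-stage, slopes-as-outcomes approximation is not full HLM (no variance partitioning or shrinkage of individual estimates), and all data are hypothetical:

```python
from statistics import mean

def ols_slope(times, scores):
    """Least-squares slope of one subject's trajectory, using only
    the time points that subject actually has."""
    mt, ms = mean(times), mean(scores)
    num = sum((t - mt) * (s - ms) for t, s in zip(times, scores))
    den = sum((t - mt) ** 2 for t in times)
    return num / den

# (times in months, scores) per subject; note the varying and
# unevenly spaced time points that repeated measures ANOVA
# would force into a single fixed schedule.
treated = [
    ([0, 3, 6], [20, 17, 14]),        # improving 1 point per month
    ([0, 2, 5, 7], [18, 16, 13, 11]),  # same rate, different schedule
]
control = [
    ([0, 3, 6], [20, 20, 20]),        # no change
    ([0, 4, 8], [19, 19, 19]),
]

treated_slopes = [ols_slope(t, s) for t, s in treated]
control_slopes = [ols_slope(t, s) for t, s in control]
print(mean(treated_slopes), mean(control_slopes))
```

With only two balanced time points, each slope reduces to a rescaled difference score, which is one way to see why the advantages of trajectory models emerge only with three or more observations.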
However, with only two, or even three, time points, HLM is not so defensibly superior to ANOVA-based techniques that one can conclude that either is better. Beyond the fact that they have served as an historical bedrock, ANOVA-based techniques are rudimentary in describing whether change occurred (the point of Question 1). HLM, too, will describe whether two groups differed significantly from one another over two time points. The advantages of HLM are not immediately apparent until three or more time points are included in the analyses. Then, HLM can answer important questions about trajectories and the nature of change that can only be described in a very rudimentary fashion with ANOVA-based techniques.

Question 2: For Whom Did It Work Best?

What were the characteristics that differentiated those who did from those who did not achieve a satisfactory state or level? There are two levels of discussion required of this question. The first is a review of the historical issues and findings of clinical investigations on the interactions between consumer characteristics and treatment. This concludes with recommendations focusing on how a treatment's theory can be used to identify design variables and classes of consumer variables that may interact with and impact on the outcome of treatment. The second level of discussion focuses on analytic methods that follow from how the design is developed. The study's design, of course, would follow from its theoretical rationale. When investigating the relations between initial status and change, the research question should be refined to focus on the consumer characteristic that predicts different rates of change. The refined question considers the characteristic related to initial status


as a moderator variable (i.e., that client attribute that alters the effects of the treatment variable). In other words, individuals in one category of a moderator variable experience a different outcome than individuals in other categories of the moderator variable. The logic here focuses on the interaction between the moderator variable and the treatment variable. For the simplest case, the values of Dijk would be predicted by the consumer's initial level on that characteristic, bk, which moderates the potential impact of the treatment, aj, and this influence will be observed as an interaction effect, abjk. This conceptualization results in an expression that can then be evaluated by either a regression or analysis of variance using the standard statistical computer packages:

Dijk = M + aj + bk + abjk + eijk

where M is the grand mean and eijk is random error.

There is a strong logical argument for incorporating a well-conceived consumer characteristic as a moderator variable in most therapeutic intervention studies. The field has long claimed that individual therapeutic approaches are not all things to all people. But it is also becoming apparent that this question needs to be refined. A consumer characteristic moderator variable is just that: a variable that potentially moderates the degree to which the therapeutic intervention will have impact because of a characteristic brought into therapy by the consumer. It also could be argued that if the theoretical construct underlying the therapeutic intervention is adequately developed, then the moderator variable(s) should be easily identified. It is also possible that the variable might be a mediating variable rather than a moderating variable. Here a mediating variable is one whose refinement, presence, or absence in the design is required in order for the effect of the therapeutic intervention to be observed. Shadish and Sweeney (1991) showed that effect sizes in psychotherapy studies were directly related to moderator variables such as the outcome measure selected, the standardization of the therapeutic interventions, and the setting in which the study was conducted. By incorporating an appropriate consumer characteristic as a moderator variable in the research design, the investigator will be able to obtain estimates of relations that can then be employed in a testable structural equation describing how the various variables come together to produce a therapeutic outcome.

Client-Treatment Interaction. The client-treatment interaction literature is mixed. Significant interactions of client characteristics and treatment approaches, particularly within the psychotherapy interventions, are typically not found (Garfield, 1986; Shadish & Sweeney, 1992; Shoham-Salomon, 1991).
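For a 2 x 2 case (treatment vs. control crossed with a dichotomous consumer characteristic), the moderation logic reduces to a difference-in-differences of cell means. A minimal stdlib-Python sketch with hypothetical difference scores:

```python
from statistics import mean

# Hypothetical difference scores D for each cell of a 2 x 2 design:
# treatment (a) crossed with a dichotomous client characteristic (b).
cells = {
    ("treatment", "high"): [8, 9, 10],
    ("treatment", "low"):  [3, 4, 5],
    ("control",   "high"): [2, 3, 4],
    ("control",   "low"):  [2, 3, 4],
}

m = {cell: mean(scores) for cell, scores in cells.items()}

# Interaction as a difference-in-differences of cell means:
# (treatment effect among "high") minus (treatment effect among "low").
effect_high = m[("treatment", "high")] - m[("control", "high")]
effect_low  = m[("treatment", "low")]  - m[("control", "low")]
interaction = effect_high - effect_low
print(effect_high, effect_low, interaction)
```

In a dummy-coded regression of D on a, b, and their product, the coefficient on the product term equals this same difference-in-differences, which is where the centering advice later in the chapter applies.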
A similar situation exists in educational research regarding student aptitude by instructional technique interactions (Cronbach & Snow, 1977; Snow, 1991). This has fascinated and puzzled investigators for some time and a number of logical explanations have been offered, often at odds with each other. Smith and Sechrest (1991) questioned whether such interactions exist. Beutler (1991) argued that there are a number of significant examples with important theoretical impact, albeit few, to recommend continuing development of research techniques that can surface the nature of these interactions. Across critics and defenders of investigating the client characteristic by treatment interaction, there appears to be agreement on the issues that need to be addressed if such research is to be done. All agree that better theory development is needed, where the investigator ought to articulate the answers to several questions: Which consumer characteristics will moderate differential outcome and why? Which behavioral progress or outcome measures are sufficiently sensitive to detect the interaction effects? What courses or rates of change are expected among the client-by-treatment groups and why?


The studies that are the exceptions (i.e., showing significant interactions) selected dependent measures that were directly related to the theory being tested. The dependent measures were of two types. In one type, the investigators selected measures that described client behaviors closely linked to the theory underlying the intervention (e.g., Shoham-Salomon, Avner, & Neeman, 1989; Shadish & Sweeney, 1991). The second type of study employed measures of effort as the unit of analysis. Howard, Kopta, Krause, and Orlinsky (1986) showed that persons treated for anxiety had different dose-effect curves than those treated for depression. Turner, Newman, and Foa (1983) contrasted the costs of follow-up treatment for persons who had successfully completed a flooding treatment for their obsessive-compulsive behaviors, but who had different styles of conceptualizing information. The two conceptualization styles contrasted were the one-dimensional cognitive style expected of persons with obsessive-compulsive behaviors and a "normal" three-dimensional cognitive style. Although all subjects achieved the experimental criteria of successful outcome in terms of their obsessive-compulsive behaviors (e.g., excessive hand-washing), as Beutler (1991) predicted, the amount and costs of follow-up psychotherapy over the next 12 months differed between groups. Ironically, most of the subjects with a one-dimensional cognitive style sought out additional psychotherapy care focusing on general anxiety over the next 12 months. Those who exhibited the three-dimensional cognitive style typically avoided follow-up mental health (psychotherapy) care. A 12-month follow-up showed that those who engaged in follow-up psychotherapy had lower levels of general anxiety. Newman, Heverly, Rosen, Kopta, and Bedell (1983) analyzed intake, termination, and service cost data on 949 clients in New Jersey community mental health programs.
Clients with unstable employment, a history of aggressive behaviors, and an unwillingness to be in treatment (as perceived by the intake clinician) had statistically higher costs of services during the period from intake to discharge. A major problem in testing for interaction effects is the fact that tests of interaction effects under traditional analysis of variance and regression techniques have low power (McClelland & Judd, 1993; Jaccard & Wan, 1995). Through the use of simulation studies, these investigators provided ample demonstration that tests for interactions and moderator effects typically have low power, even when difficulties due to measurement reliability are minimal. When measurement reliability is a problem, the power problem worsens. But there are some alternative techniques emerging. Jaccard and Wan (1995) found that structural equation approaches are more powerful in detecting such interactions. Even more exciting are the findings by Willett and Sayer (1994), who discovered that structural equation modeling can also be employed to test differences in patterns of change among different subgroups within and between treatment groups. Willett and Sayer ended with the strong recommendation that "we can measure 'change' and we should" (p. 379). Several authors (Bryk & Raudenbush, 1987; Lyons & Howard, 1991; Willett & Sayer, 1994) have described procedures that can be called on to identify when a source of error variance may be covering the interaction of a moderator and a treatment variable that should be investigated in follow-up research. However, as indicated earlier, one alternative explanation for large error variance is measurement (un)reliability (Lyons & Howard, 1991). In explaining the lack of significant interaction effects, Beutler (1991) also noted that most consumers who are involved in research are typically motivated to achieve a satisfactory psychological or functioning status. The same could be assumed of the


treating clinician. Although the beginning and the end points of the treatment may look similar for consumers with different characteristics, the processes for getting to the end point may be different. Beutler recommended that investigators consider obtaining more data to describe the progress during treatment. The form of analysis should focus on the course and rate of change during the treatment time frame. If this argument is convincing, then the reader should also consider the analyses discussed under the next question, "What is the rate of change?" In fact, the examples discussed in that section did show rates of change that were related to client characteristics (Francis et al., 1991; Willett, Ayoub, & Robinson, 1991). The next several sections explore the various alternatives to assess the question, For whom?

Regression or Analysis of Variance of "For Whom?" A number of authors have argued that it is best to use a traditional approach to describe and test for the nature of an interaction (Cronbach, 1987; Rosnow & Rosenthal, 1989). The expression recommended by Rosnow and Rosenthal (1989) to describe an interaction score, ABjk, for the jkth cell, influenced jointly by the jth treatment level and the kth level of client characteristic is:

ABjk = Mjk - Mj - Mk + M

where Mjk is the jkth cell mean, Mj and Mk are the jth row (treatment) and kth column (client characteristic) means, and M is the grand mean.

An analysis of the interaction's significance is simply that of creating an F test of the proportion:

F = MSAB / MSwithin

The numerator represents an estimate of the interaction after adjusting for the main effects of treatment and client characteristics. The denominator is an estimate of the error of prediction that is typical (averaged) across combinations of row and column effects. When the ratio is much larger than 1.00, it can be assumed that the differences among the cell means, after adjusting for main effects, are greater than differences due to measurement error. Thus, the F ratio provides an estimate of whether treatment effects differ in some systematic way over levels of the client characteristic. There is one worrisome assumption in the analysis of interaction effects. It is technically identified as the assumption of independence. This assumption holds that there is no correlation between error of measurement within the cells and any of the three between-group sources of variance: levels of treatment effect, levels of client characteristic, and combinations of treatment by client characteristic. Why is this considered to be important? First, most tests for a significant interaction effect (e.g., the F test given earlier) are not valid because both the numerator and the denominator terms are influenced in different ways by the relations between random error and one or more of the independent variables. When this occurs, the statistical tests of the hypotheses will be either positively or negatively biased, depending on the nature of the relation. A second issue regarding the independence assumption is that the research question needs to be reconsidered, given that the existence of such correlations indicates that the interaction is not one of simple mean differences. If one or more correlations are significant, then the mean effects are likely to be confounded with one or more of three sources of random error: interactions with item difficulty, error of measurement, or temporal effects.
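The cell-residual view of the interaction can be sketched directly in stdlib Python with hypothetical cell means: each residual is the cell mean minus its row and column means plus the grand mean, and the residuals sum to zero across every row and column.

```python
from statistics import mean

# Hypothetical cell means M[j][k]: rows = treatment levels,
# columns = client-characteristic levels.
M = [
    [9.0, 4.0],
    [3.0, 3.0],
]

row_means = [mean(row) for row in M]
col_means = [mean(col) for col in zip(*M)]
grand = mean(x for row in M for x in row)

# Interaction residual for each cell:
# AB_jk = M_jk - rowmean_j - colmean_k + grand
AB = [
    [M[j][k] - row_means[j] - col_means[k] + grand
     for k in range(len(M[0]))]
    for j in range(len(M))
]
for row in AB:
    print(row)
```

Dividing the mean square of these residuals by the pooled within-cell mean square yields the F ratio the text describes.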
Myers and Well (1991) recommended a rough rule of thumb to determine when to be concerned about the existence of heterogeneity of variance that may be related to such correlations (i.e., heteroscedasticity). They suggested creating a ratio of the largest to the smallest within-cell variance (what is sometimes called the F-MAX test); if the ratio is greater than 4:1 for equal-n designs, or 2:1 for unequal-n designs, then there is reason for concern. Here Myers and Well joined Kenny (1979) in advising a direct analysis of scatter plots and the correlations between individual observations and group means. Specifically, consider calculating each of three correlations: Xijk with Aj treatment groups across levels of Bk, Xijk with Bk client characteristics over levels of Aj, and Xijk with all levels of ABjk. If the magnitude of a correlation is equal to or greater than .2, then look at the covariance structure of the treatment and client characteristics with the measures of change.

Hoyle (R. Hoyle, University of Kentucky, personal communication, November 1993) pointed out that an additional problem in regression with interaction terms is multicollinearity. Frequently, the interaction term will be correlated .90 or more with one of the main effect terms. A strategy recommended by Hoyle is to center the two main effect terms (i.e., subtract the mean from each subject's score) before creating the interaction term.

In summary, analysis of variance and regression models are direct methods of testing for an interaction effect. The scarcity of significant treatment by client characteristic interactions suggests that either the theories relating treatment to client characteristic are fuzzy, or the measurement technique selected is inappropriate, or the analysis of variance model (with easily violated assumptions) is inadequate. Another possibility is that Smith and Sechrest (1991) were correct when they asked whether there are any significant sources of interaction variance to detect.

Structural Equations.
Structural equation modeling (SEM) represents a method that can describe the magnitude of the relations among independent variables as antecedent to outcome (i.e., they are either causal to, or mediators of, outcome). Consider the hypothetical example shown in Fig. 8.1, where there are two paths to outcome (effect) from the treatment variable (cause). One path goes directly from the treatment variable to the outcome measure. The strength of this relation is described by the parameter (a). When estimated, the value of (a) describes the amount of change in outcome that can be predicted by one unit of change in treatment. When treatment is a dichotomous variable (experimental versus control), the value of (a) would be the coefficient that best predicts effect size in a point-biserial regression equation. The coefficient represents the mean difference between groups on the dependent variable. When the treatment variable is ordinal or quantitative (e.g., dosage), the value of (a) could be the coefficient in a regular regression equation. At this point, note that analysis of variance and, particularly, regression models can be thought of as elementary structural equation models.

The second path to outcome in Fig. 8.1 includes the client characteristic. This route to outcome has two coefficients, (b) and (c). In structural equation analysis, one will often develop a picture of the alternative paths to be considered, and then develop and contrast structural equations that predict the relations for each path. In a formal structural equation analysis, one could contrast the strength of each of the outcome prediction equations developed to represent each of the paths to the outcome variable(s). The critical issue is whether considering the client characteristic enhances the prediction of outcome over the more direct path as described by (a).
If the combined relations (b) and (c) have greater predictive value for outcome than (a) alone, then the addition of the client characteristic will improve the prediction of outcome for a given treatment level.
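The comparison just described can be simulated. In this sketch, the client characteristic (an invented "susceptibility" score, echoing the hypnotherapy example) genuinely carries information about outcome, so adding the second path improves prediction over the direct path (a) alone; all names and coefficients are illustrative assumptions, not the authors' data.

```python
import numpy as np

# Simulated illustration: does adding a client-characteristic path improve
# prediction of outcome beyond the direct treatment path (a)?
rng = np.random.default_rng(1)
n = 300
treatment = rng.integers(0, 2, n).astype(float)   # experimental vs. control
suscept = rng.normal(0.0, 1.0, n)                 # invented characteristic
# Outcome improves with treatment, and more so for susceptible clients,
# so paths (b) and (c) carry real information here.
outcome = treatment + 0.8 * suscept * treatment + rng.normal(0.0, 1.0, n)

def r_squared(predictors, y):
    """R-squared from an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r2_direct = r_squared([treatment], outcome)                     # path (a)
r2_full = r_squared([treatment, suscept * treatment], outcome)  # add (b), (c)
print("R2 direct: %.2f, R2 with characteristic: %.2f" % (r2_direct, r2_full))
```

In a formal SEM analysis the contrast would be between fitted path models rather than R-squared values, but the logic is the same: the client characteristic earns its place only if it improves prediction.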


Fig. 8.1. Two paths that could be used to describe relations between treatment and outcome. A hypothetical example of hypnotherapy focusing on classroom anxiety, with the client characteristic of susceptibility to hypnosis, illustrates the causal model.

According to Kenny (1979, p. 44), there are five basic steps in creating a structural model to perform a path analysis:

1. From theory, draw a set of structural equations that describes relations among the independent variables (called exogenous variables) and the effect (dependent or endogenous variables). This step is no different from what has been emphasized throughout this chapter. The prediction of an influential client characteristic must be based on strong theoretical assumptions of how the client characteristic will modify the effects of the treatment on a given set of observable behaviors.

2. Choose a measurement and design model. The measures must be directly related to those that the treatment is supposed to influence, and the incorporation of these measures into the design model must also be consistent with the theory of the intervention. Here the reader is asked to recall the four assumptions regarding selection of measures for an outcome analysis described at the beginning of the chapter. The design model (true or quasi-experiment) needs to be one that is sufficiently powerful to detect potential differences. Not only do researchers need to identify groups that will eliminate alternative hypotheses, but also levels of the client characteristics that offer the potential to show differences.

3. Respecify the structural model to conform to design and measurement specifications. As is true in any test of a model, the reality of collecting data in the real world takes hold. This step is stated formally to remind the researcher that an error of the third kind can be easily committed: to test a prediction with the wrong statistical model.

4. Check that there are sufficient numbers of predicted relations (each called an identification) to test the model. One of the dangers of using structural modeling is that a model can be created that has more unknowns than can be estimated. The major restriction here is the degrees of freedom in the covariance matrix, where the number of predicted relations cannot exceed the number of terms identified to covary with each other. It is possible that a study will not produce sufficient information to reach a conclusion about the best model.

5. Estimate the covariances between the measured variables, and from the covariances, estimate the parameters of the model and test hypotheses. This is the bottom line, where researchers can decide whether the data support a conclusion that considering the client characteristic enhances the prediction of treatment on outcome.

There are two major limitations of structural modeling. First, the investigator must be able to develop very specific predictions. The discipline of developing such specific predictions has not been a common practice in clinical research; thus, its use here will require the researcher to be more explicit in what is being predicted and why. Second, there must be sufficient data to provide stable estimates of correlations (covariances). This often requires sample sizes in the hundreds, rather than those frequently obtained in treatment outcome studies. The reader is referred to the classic texts written on structural modeling (Bentler, 1989; Bollen, 1989; Hayduk, 1987; Jöreskog & Sörbom, 1988; Kenny, 1979). Those not familiar with the matrix algebra might find the texts by Kenny (1979) or Hayduk (1987) the best starting point.

With new advances in technology, structural equation modeling programs such as LISREL, AMOS, and EQS are becoming increasingly accessible. That accessibility has been translated into increasing ease of use via graphical interfaces that allow drawings, rather than programming code, to create models (as in EQS and AMOS). In the history of SEM programs, early versions of LISREL, for example, required the user to learn an arcane programming code consisting of the names of the matrices, numerical values, and a few descriptors (Jöreskog & Sörbom, 1988, 1993).
Although LISREL continues to read this code, the move toward a graphical interface is part of the package that was available at the time of this writing. There is a troubling side to this ease of use. SEM is more than a simple test of parameters generated from covariance matrices resulting in a set of fit indices. SEM requires that the user develop a theoretical rationale, based on support from the empirical and theoretical literature, for the model being tested. Previously, the development of the model required substantial effort on the part of the investigator, as computer programming code was generated to represent the predicted relations. Today, that effort has been replaced by graphical interfaces that allow multiple models to be generated in a single sitting of about an hour. The rapidity with which models can be developed places demands on the investigator: each model must be given careful thought, with support from the literature. This is, lamentably, not done as often as it should be (Scandura & Tejeda, 1997).

The latest versions of SEM software permit automatic modification of the model under test, such that a reasonable post hoc alternative can be considered. When this happens and the user pursues the model that results from the automatic modification, the confirmatory logic of SEM is lost and the analysis becomes exploratory. It is critical to note that, unlike traditional hypothesis testing, SEM provides the opportunity to confirm and to falsify theoretical models. Thus, the construction of models based on automatic modifications sidesteps the important duty of falsifying existing theories in an effort to improve understanding of social, psychological, and behavioral processes: The use of automatic modification is pure mathematical speculation and solely exploratory. The reader may wonder whether these concerns are warranted.
Figure 8.2 provides real data suggesting that the latent variable of bigotry is positively related to the latent variable of violent intentions, and that the latent variable of self-esteem is negatively related to the latent variable of violent intentions. Bigotry, self-esteem, and violent intentions each have two indicators, depicted by squares (the convention in SEM). Without entering into detail, the model fits the data quite well, with most fit indices exceeding .95.

Fig. 8.2. A structural equation model describing the relations of bigotry and self-esteem to violent intentions.

Figure 8.3 provides the results of automatic modifications that have produced fit indices exceeding .99; however, the resulting diagram now includes two new factors and a host of new paths. Whereas Fig. 8.2 provides an interpretable model, Fig. 8.3 presents a nearly saturated model of relations that fails to be parsimonious. Do "Something" and "Something Else" really add to the understanding of bigotry, self-esteem, and violent intentions? Although this example represents an extreme in the use of automatic modifications, more conservative uses of automatic modification can result in equal gibberish.

Therefore, given that technological changes are inevitable, what recommendations can be made about the use of SEM, knowing the concerns just mentioned? First, nothing replaces good theory and careful reflection. The cornerstone of structural equation models must remain careful assertions derived from theory and empirical findings. Second, it is profoundly helpful to construct a graphical representation of any structural equation, including the measurement model. By producing a graphic, hypothesized relations become very clear. This is exceptionally helpful if the model will be run by another party, such as a statistician. Rival models should be depicted graphically as well. Third, negative findings should result in reflection as well as publication. Findings that fail to support theory form the basis of the theory's refinement. This refinement should not be driven by mathematics, but rather by careful thought.

Question 4 is addressed later in the chapter; first consider Question 3: What was the nature of the change?
There are several variations on this theme: What was the rate and character of the change? What did or did not change? Are the rates of change consistent within a treatment group?


Fig. 8.3. A modified structural equation model recommended by an automatic modification.

Are the rates of change different among treatment groups? How quickly do individuals or groups of subjects achieve a satisfactory level and then stabilize at that level? The character of the predicted changes is limited only by the constraints of the investigator's questions and the study's design. For most studies, a simple linear change is predicted: What are the rates of change over time? For others, both a linear and a quadratic function are of interest. For example, the investigator could ask whether, in addition to potential differences in the rate of change over time, there are differences in how soon the predicted behaviors plateau (i.e., reach an asymptotic level). It is also possible that an investigator will predict three types of change over time: first an initial increase, then a plateau, and then a continued increase. This is what is called a cubic function, with two prominent changes in direction. Consider this example: After an initial performance increase due to symptom relief, performance either plateaus or declines slightly when the client discovers the "tougher problems" (e.g., that they must take some degree of responsibility for managing the factors influencing the problem). Further improvement can only occur when a strategy to deal with the "tougher problems" becomes evident.

As the research question moves from linear to quadratic to cubic predictions, the demands on the study's design increase from at least two waves of data collection (i.e., at two different time points) to three and four waves. As is discussed later, most methodologists argue that at least three waves of data collection (pre-, during-, and posttreatment) are better than two. But many practitioners might argue that taking repeated measures on a client is intrusive on the clinical process. Another strategy might be to collect pre-, post-, and 6- or 12-month follow-up data. But collecting follow-up data is costly and often results in more missing data.

There are two general approaches recommended to investigate the nature of change. Each approach has a different emphasis on the character of change, and its own assets and liabilities. The traditional approach is a repeated measures analysis of variance (univariate or multivariate), where between-group linear or curvilinear trends are contrasted among treatment groups (Myers & Well, 1991; O'Brien & Kaiser, 1985). A more recent development is growth curve analysis. This technique can be employed to describe changes in performance (for individuals and for groups) over time as part of an ongoing process (Bryk & Raudenbush, 1987; Francis et al., 1991; Rogosa & Willett, 1985; Willett et al., 1991). The analysis of variance model uses the semantics of, and restricts inferences to, differences among group trends over time relative to random variances of the trends among subjects within the groups. The semantics of growth curve analysis emphasize change as a process. The unit of analysis is the measure of change in the target behaviors over time for each subject. Growth curve analysis can develop hypotheses and describe results in terms of the process of change in individuals' behavior over time, as well as contrast observed change processes among treatment groups.

The starting point for growth curve analysis is to select a growth model (or models) that can be used to estimate the measure of the change process for each individual. Most applications have employed a simple linear or quadratic expression to describe the change process, but almost any mathematical expression can be used (e.g., linear, quadratic, cubic, exponential, Markovian). The major restriction is that there needs to be a theoretical basis for the model.
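The candidate change functions (linear, quadratic, cubic) can be compared numerically on a single subject's repeated measures. In this sketch the scores are invented to follow the improve-plateau-improve pattern described above, so a cubic tracks the data better than a straight line.

```python
import numpy as np

# Eight invented measurement waves following an improve-plateau-improve
# pattern: early symptom relief, a plateau while "tougher problems" are
# confronted, then renewed improvement.
t = np.arange(8, dtype=float)
y = np.array([2.0, 4.0, 5.5, 5.8, 5.9, 6.3, 7.5, 9.0])

def residual_ss(degree):
    """Least squares polynomial fit of given degree; residual sum of squares."""
    coefs = np.polyfit(t, y, degree)
    resid = y - np.polyval(coefs, t)
    return float(resid @ resid)

rss = {degree: residual_ss(degree) for degree in (1, 2, 3)}  # lin, quad, cubic
for degree, value in rss.items():
    print("degree %d: residual SS = %.3f" % (degree, value))
```

Because the polynomials are nested, the residual sum of squares can only shrink as the degree rises; the substantive question, as the text emphasizes, is whether theory licenses the more complex change function, not merely whether it fits better.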
For the purposes of exposition, discussion will center on predictions of linear change, using an example of treatment of families at risk of maladaptive parenting, child abuse, or neglect (Willett et al., 1991). These investigators were interested in within-family growth of family functioning (Ayoub & Jacewitz, 1982) over the course of treatment as it is related to entry level of family violence/maltreatment. The basic linear model used to describe growth in family functioning (FF) for the ith family over time, t, in months was:

FFit = π0i + π1i(t) + εit     (Equation 14)
The term π1i is the slope of Equation 14 and is the unit of measure of the analysis. When this term is positive, change is in a positive direction, representing increasing levels of family functioning over the months of treatment. The estimate of π1i is obtained by using an ordinary least squares procedure, where the monthly level of family functioning is entered to estimate the slope of the best-fitting straight line. This is easily estimated with any of the standard computer packages (e.g., MINITAB, SAS, SPSS, or SYSTAT). Stable estimates of the slope parameter can be found when measures are taken at three or more different times. Two-point estimations (e.g., pre- and posttreatment) typically do not have sufficient stability. Moreover, as Beutler (1991) suggested, the course and rate of progress (change) during the time period between the pre- and posttreatment assessments could be where differences due to client characteristics are detected. This was the case in the two studies described next.

Once the estimate of the change parameter is obtained (π1i in this case), it can be entered into its own prediction equation to fit the study design. The study conducted by Willett et al. (1991) focused on describing growth rates as they related to entry levels of family functioning (FF), violence/maltreatment (VM), and number of distressed parenting problems (DD). The between-family linear regression model used to describe the predictive relation was:

π1i = β0 + β1(FF)i + β2(VM)i + β3(DD)i + ri
The investigators found each of these factors to be significant contributors to rates of change in family functioning. For example, the growth rates of families with four or more parenting problems were slower, requiring more treatment to achieve a satisfactory level of family functioning. Although the example used here focused on a measure of linear change for one treatment intervention, it should be obvious that other measures of change can be estimated as well. For example, Francis et al. (1991) employed a quadratic function predicting the effect of treatment on the level of visual motor impairment (Yit) at time t, for children with head injuries:

Yit = π0i + π1i(t) + π2i(t²) + εit
Each subject's slope coefficient, π1i, and quadratic coefficient, π2i, were entered into a regression equation with three patient characteristic variables as predictors: age at onset of injury, initial severity of injury, and evidence of pupil contraction impairment. Age and initial severity were significant predictors of both the slope and quadratic coefficients. These investigators used a hierarchical linear model (HLM) developed by Bryk and Raudenbush (1987) and a software package for HLM (Bryk, Raudenbush, Seltzer, & Congdon, 1986) to test the predictors of the rate coefficients. This software package also provides estimates of the proportions of "true" to "total" variance in rate measures accounted for by the model. In this case, the models developed by Francis et al. (1991) accounted for 79.4% of the variance among the subjects' rates of change.

There are several additional advantages of growth curve analysis when contrasted with the traditional trend analysis of variance models. First, the number of repeated measures per subject does not have to be equal, nor does the interval between measures need to be the same. All that is required is a sufficient number of repeated measures to estimate the coefficients of change specified in the prediction model. This is not to say that the investigator need not be concerned about how many measures are taken, or when in the course of treatment the observations are made. Both concerns, along with the structure of the prediction model itself, will influence the precision of the rate measures. If the number and spacing of observations is too haphazard, the precision will deteriorate sufficiently to prevent any significant findings from being detected. However, no data are lost due to minor variations in the number of observations or due to slight variations in spacing between observations.
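The second stage of such an analysis can also be sketched: once a rate of change has been estimated for each subject, the rates themselves become the dependent variable in a between-subjects regression on a client characteristic. The data here are simulated so that more parenting problems predict slower growth, echoing the Willett et al. finding; all values are invented.

```python
import numpy as np

# Simulated second-stage data: per-subject rate estimates regressed on the
# number of parenting problems at entry (an invented client characteristic).
rng = np.random.default_rng(7)
n = 60
problems = rng.integers(0, 8, n).astype(float)           # characteristic
rates = 2.0 - 0.25 * problems + rng.normal(0.0, 0.3, n)  # estimated slopes

X = np.column_stack([np.ones(n), problems])  # intercept + predictor
beta, *_ = np.linalg.lstsq(X, rates, rcond=None)
print("intercept %.2f, change in rate per problem %.2f" % (beta[0], beta[1]))
```

A dedicated multilevel package (such as the HLM software cited in the text) improves on this two-step sketch by weighting each subject's rate estimate by its precision, but the logic of the analysis is the same.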
The HLM computer program does adjust for the degree of precision, by considering the number and spacing of observations, when estimating each rate measure and when conducting the test of the prediction equation. Thus, the investigator can inspect the degree to which variation in the number and spacing of observations has detracted from the estimation of the model's fit to the data.

In summary, each of the two approaches has its uses. The best one is that which best fits the question raised by the investigator. The investigator, however, ought to experiment with the logic of both forms of analysis (differences in trend vs. mean differences in rates). The logic of repeated measures trend analysis of variance focuses on the average between-group differences in outcome over fixed intervals of time. If measures are taken at more than two intervals, then trend analysis can also test for between-group differences in trends; however, all subjects must have an equal number of observations taken at equal intervals. Subjects with missing data are discarded from the entire analysis of trends. The growth curve analytical technique changes the focus (and unit) of analysis from the magnitude of a behavioral measure to its direction and rate of change. Growth curve analysis does not require that all subjects have the same number of observations or that the spacing between observations be exactly the same, although excessive variation in either will deteriorate the precision, and therefore the power, to detect significant differences. The technique of growth curve analysis is still too new to determine whether actual studies will be as productive as its early billing promises. But if Beutler (1991) is correct, then these forms of analyses may bring to the surface the elusive treatment-by-client characteristic interactions.

Question 4 What Effort (Dosage) Was Expended?

What was the amount of time or effort expended for a consumer to achieve a "satisfactory" psychological state or level of functioning? Although the issues underlying this question have been proclaimed for some time (Carter & Newman, 1975; Fishman, 1975; Yates, 1980; Yates & Newman, 1980), it was not until the late 1980s that this question began to be recognized as part of a fundamental issue of outcome research (Howard et al., 1986; Newman & Howard, 1986). Recent interest appears to be centered on the economic concern regarding the worth of the investment in mental health care, rather than a scientific concern with how much is enough. The text by Yates (1996) is a wonderful exception. Yates provided a clear description of the steps needed for the researcher to understand how to analyze the costs incurred in the therapeutic and material efforts included in the procedures, processes, and outcomes of clinical and human services. Measures of effort are often easy to develop and are readily available to the researcher if there is a plan to collect the effort data.
There are three major classes of effort measures that have served as either predictor or dependent variables in psychotherapy and mental health services research. They are dosage (i.e., the number of therapeutic events provided over the period of a clinical service episode), the level of treatment restrictiveness (i.e., the use of environmental manipulations to control the person's behavior during the clinical service episode), and the cumulative costs of the resources invested in treatment (i.e., the type of staff, staff time, and material resources consumed during the clinical service episode; Newman & Howard, 1986). Dosage is the measure most frequently employed when only a single modality is considered (e.g., number of days of inpatient or nursing home treatment). Restrictiveness measures can be developed at a sophisticated or a simple level. Hargreaves and his colleagues (Hargreaves, Gaynor, Ransohoff, & Attkisson, 1984; Ransohoff, Zackary, Gaynor, & Hargreaves, 1982) had panels of clinical experts, employing a magnitude estimation technique, scale the levels of restrictiveness for interventions designed to serve the seriously mentally ill. Newman et al. (1983) used a simpler approach to quantifying level of restrictiveness by giving a value of 1 to an outpatient visit, 2 to day treatment, and 3 to inpatient care in the treatment plans proposed by 174 clinicians for a standardized set of 18 cases. To create a dependent measure that combines dosage with restrictiveness of effort, these dosage and restrictiveness scores were cross-multiplied. Significant relations were found between this dependent measure and three predictor variables: levels of functioning at intake, level of social support, and level of cooperativeness at the start of treatment.
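The simple weighting scheme just described can be written out directly: each modality receives a restrictiveness value (1 = outpatient, 2 = day treatment, 3 = inpatient), and the dosage (number of service events) is cross-multiplied with that value and summed over the episode. The client records below are invented for illustration.

```python
# Restrictiveness weights following the simple Newman et al. (1983) scheme
RESTRICTIVENESS = {"outpatient": 1, "day_treatment": 2, "inpatient": 3}

def effort_score(episode):
    """Sum of dosage x restrictiveness over a clinical service episode."""
    return sum(n_events * RESTRICTIVENESS[modality]
               for modality, n_events in episode.items())

# Two invented clients with different mixes of care
client_1 = {"outpatient": 12}                 # 12 x 1 = 12
client_2 = {"inpatient": 5, "outpatient": 4}  # 5 x 3 + 4 x 1 = 19
print(effort_score(client_1), effort_score(client_2))  # 12 19
```

The combined score treats one inpatient day as equivalent to three outpatient visits; whether that exchange rate is clinically defensible is exactly the kind of question the magnitude-estimation scaling by Hargreaves and colleagues was designed to answer more rigorously.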


A measure of the costs of resources consumed combines the concepts of dosage and restrictiveness, because the costs of staff time and the resources used to exert environmental control during the clinical service episode are summed to calculate the costs. However, the concept of employing costs as an empirical measure of therapeutic effort is still sufficiently new to the field that there appear to be some misconceptions inhibiting its use in research. Newman and Howard (1986) described the three popular incorrect perceptions:

1. Confusion of costs with revenues. Revenues are the monies that come to the service from many different sources (payment of fees charged, grants, gifts, interest on cash in the bank).

2. Confusion of costs with fees charged. Fees charged may or may not cover costs of services provided. Profits accrue when fees collected are greater than costs, and deficits accrue when they are less. If all clients have similar diagnoses and problems, if they receive the same type and amount of treatment, and if the fees charged equal the costs of the resources used, then and only then do costs equal fees. However, for mental health programs and for private practice, the costs of the clinical efforts vary across consumers and therapeutic goals.

3. Confusion of costs and fees in private practice. This is being recognized as a myth by more and more private practice clinicians. Unfortunately, this myth is being perpetuated by third-party payer reimbursement practices, where a single reimbursement rate is set for a broad spectrum of diagnoses, independent of clients' levels of psychosocial functioning, social circumstances, and therapeutic goals. It is intuitively obvious to most private practice clinicians that not all clients require the same levels of care to achieve a satisfactory psychological or functioning level.
It is also obvious that it is more profitable to restrict a practice to those consumers who can be profitably treated within the limits set by reimbursable fees rather than by the treatment goals achieved. Unprofitable clients might, unfortunately, be referred elsewhere.

There are three statistical approaches that can be usefully applied to measures of effort: probit analysis, focusing on the cumulative proportion of the sample that has achieved a criterion of success (or failure) at each level (dose) of the intervention's events; log-linear analysis, using a multidimensional test of the independence of two or more variables (each having two or more levels) in predicting two or more classes of outcomes; and univariate and multivariate regression and variance analysis, focusing on the unique characteristics and limitations of applying these traditional approaches to analyzing effort data.

The shape of the distributions of measures of effort is of some concern. They are typically positively skewed, with as much as 3% to 10% of the subjects having effort measures three or more standard deviations above the median. The first two approaches are less affected by the precise shape of the distribution of the measure of effort than the last. The focus of questions addressed by probit and log-linear analyses is on the relative frequency of observations that fall within given classes or ranges of outcome. The distribution of the measures of effort is more important for univariate and multivariate regression and variance analyses, which carry the usual assumptions of parametric statistics (e.g., normality, independence between within-group error variance and group assignment). The difficulty of analyzing extremely skewed distributions is typically dealt with by one of two methods. One is to drop the "outliers," that is, those subjects with extreme scores (e.g., the top 10% or 20%), from the analysis.
Another approach is to transform the values to produce a more normal-appearing distribution; the arcsine and the log transformations are two popular choices. These approaches may have negative consequences: either dropping data that should be considered, or transforming the conceptual base of the measure to mean something other than it was originally conceptualized to mean.
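The skewness problem, and what a log transform does to it, can be shown on simulated cost-of-care figures. The data here are drawn from a lognormal distribution purely for illustration, so the raw values have the long right tail typical of effort measures while their logs are symmetric.

```python
import numpy as np

# Simulated cost-per-episode figures with a long right tail
rng = np.random.default_rng(3)
costs = rng.lognormal(mean=7.0, sigma=1.0, size=500)  # dollars per episode

def sample_skewness(x):
    """Third standardized moment: positive values mean a long right tail."""
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())

raw_skew = sample_skewness(costs)
log_skew = sample_skewness(np.log(costs))
print("skewness raw: %.2f, log-transformed: %.2f" % (raw_skew, log_skew))
```

The transform tames the tail, but, as the text cautions, the analysis is then about log-dollars rather than dollars, and any conclusions must be interpreted on that transformed scale.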


Investigators often will invoke either of these approaches to deal with the statistical issues without considering the implications for the clinical aspects of the study. Although probit and log-linear analyses do not require throwing away data, they do use a log transformation of the data during the analytic process. For the examples reviewed here, the probit analyses do appear to have preserved the conceptual basis of the studies as they were designed by the investigators. It also can be argued that both probit and log-linear analyses have their own set of negatives, the principal one being that relatively large samples are required to assure that observed differences in relative frequencies are stable. With these notes of caution, the conceptual basis underlying, and the applications of, each of the three approaches are described.

Probit Analysis. The basic unit of probit analysis is the proportion of subjects within a specific group to achieve a satisfactory level of psychological or social functioning after a given dosage of treatment has been provided. Howard et al. (1986) described this relationship as "the amount of treatment (dose) needed to achieve a specific percentage of patient improvement (effect)" (p. 160). A probit model is created that uses the observed proportions of subjects in the jth group to achieve a satisfactory outcome at the ith dosage level:

probit(Pij) = Aj + Bj log(dosei)
To estimate the values of Aj and Bj in the model for the jth group, the observed values (proportions of subjects achieving a satisfactory criterion at each dosage) are entered into a maximum likelihood procedure. Once estimated for a group, a model is created that generates a function describing the expected relations between dose and the proportion achieving a satisfactory outcome. The probit analysis provided in most statistical packages will generate a set of probit values and the estimated standardized proportions of persons expected to achieve a measured satisfactory state or level of functioning for each successive dose level, along with the 95% confidence intervals about each probit value within a given group. Thus, for each group, the analysis provides dosage-by-success-rate functions, along with a 95% confidence interval about each function. The extent of nonoverlap between the 95% interval envelopes for two or more groups will describe the statistical significance of between-group differences. Howard et al. (1986) found significantly different dose-effect functions for three diagnostic groups receiving psychotherapy: depression, anxiety, and borderline psychotic.

There are three major limitations of probit analysis in addressing the question, "How much is enough?" One is that relatively large samples are needed to refine the dosage levels, probably more than 50 subjects per treatment group. The second is that only between-group main effects can be tested. However, it is possible to contrast the overlap in the 95% confidence intervals across dose levels (the "confidence interval envelope") among any two or more groups. This results in a test of simple effects among groups in a design with two or more between-group variables; therefore, experimentwise error rates are an important concern and should be controlled (e.g., by employing a Bonferroni correction of Type I error rates per comparison).
The third limitation is that probit analysis is only applicable to measures of frequency such as the dose-effect relations. It cannot be applied to measures of intensity such as the restrictiveness or the cumulative cost classes of effort measures. The next two sets of approaches can be used for all three classes of effort measures.
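As a rough illustration of the dose-effect logic (not Howard et al.'s actual procedure, which used maximum likelihood; the doses and proportions below are invented), the probit-transformed success proportions can be fit against log dose:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical dose-effect data for one diagnostic group: at each dose
# (number of sessions), the cumulative proportion of patients who have
# met the improvement criterion.
doses = np.array([1, 2, 4, 8, 16, 26, 52], dtype=float)
prop_improved = np.array([0.10, 0.18, 0.30, 0.45, 0.62, 0.74, 0.85])

# Probit transform: the z-score whose cumulative normal probability
# equals the observed proportion.
probits = norm.ppf(prop_improved)

# Fit probit = A + B * log(dose) by ordinary least squares.  (A full
# probit analysis would estimate A and B by maximum likelihood, as the
# text notes; this linearized version is only a sketch.)
B, A = np.polyfit(np.log(doses), probits, deg=1)

# Expected proportion improved at a new dose, e.g., 30 sessions:
expected = norm.cdf(A + B * np.log(30))
print(f"A={A:.3f}, B={B:.3f}, expected P(improved | 30 sessions)={expected:.2f}")
```

A positive slope B means more sessions raise the expected success rate, which is the dose-effect relationship the text describes.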


Log-Linear Analysis of Effort Measures. The basic unit of measure when applying a log-linear analytic approach is the rank order of the magnitude of the effort measure when considering all subjects across all groups. For example, if there are 320 subjects in four groups, 80 per group, then the rank order values on the effort measure can vary from 1 to 320. Once subjects receive their rank order score, any between-group rank order (nonparametric) statistical approach can be applied. Here, log-linear analysis is considered because of its ability to handle higher level designs, with at least one treatment and one consumer characteristic variable. Table 8.2 provides an example of the general form of an analysis that can be considered by log-linear analysis.

Consider that researchers are working with persons who have a serious and persistent mental illness and who are entering a community support program. At admission, they are first evaluated for their levels of interpersonal (including communication) skills, along with other characteristics. Half of those who score at a low level of interpersonal skills (Client Group B-1) are randomly assigned to a program that focuses on social and community functioning skills, where the treatment team works out of an office adjacent to a consumer-run drop-in center (Group A-1, B-1). The remaining half of the consumers with low interpersonal skills are assigned to a program whose treatment team interventions focus on symptom control, with a case manager who works out of a community mental health center (Group A-2, B-1). The same random assignment to Groups A-1 and A-2 is made for those clients scoring at the moderate to high levels of interpersonal skills (i.e., assigned to Groups A-1, B-2 and A-2, B-2, respectively). Thus, in this example there are two treatment groups and two levels of a client characteristic (i.e., the moderator variable of entry-level interpersonal skills, low or moderate-high).
The client characteristic is expected to interact with the effects of the therapeutic intervention.

TABLE 8.2
The Frequencies of Subjects in Each Group for the Successive Quartiles When Ranked by the "Cumulative Costs of the Clinical Service Episode"

                                              Quartile Ranking of Cumulative Costs of Service Episode
Treatment Group       Client Group                 Q-1             Q-2             Q-3             Q-4
                                                   Lowest Cost     26th to 50th    51st to 75th    Highest Cost
                                                   1st to 25th     percentile      percentile      76th to 100th
                                                   percentile                                      percentile
A-1 Social            B-1, Low Interpersonal       f[111]          f[112]          f[113]          f[114]
Rehabilitation        Skills
Treatment Team        B-2, Moderate to High        f[121]          f[122]          f[123]          f[124]
                      Interpersonal Skills
A-2 Symptom           B-1, Low Interpersonal       f[211]          f[212]          f[213]          f[214]
Control, Case         Skills
Manager at CMHC       B-2, Moderate to High        f[221]          f[222]          f[223]          f[224]
                      Interpersonal Skills
Sum of columns equals:                             25% of Total    25% of Total    25% of Total    25% of Total

Note: The cell frequencies are described as f(ijk), for the ith quartile, in the jth treatment group and the kth level of the client characteristic.
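The ranking-and-quartiling step behind Table 8.2 can be sketched in Python. The costs are simulated; the group means, sample sizes, and lognormal spread are all hypothetical choices, not values from the chapter.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)

# Hypothetical 6-month cumulative service costs for the 320 consumers in
# the text's example: 4 groups (treatment A-1/A-2 x skills B-1/B-2), 80
# consumers each.  Group means differ so the table has structure.
group_means = {"A1B1": 900.0, "A1B2": 700.0, "A2B1": 1200.0, "A2B2": 800.0}
costs, labels = [], []
for g, m in group_means.items():
    costs.append(rng.lognormal(np.log(m), 0.5, size=80))
    labels += [g] * 80
costs = np.concatenate(costs)
labels = np.array(labels)

# Rank all 320 consumers together and cut the ranks into quartiles, so
# each column of the table holds 25% of the total sample.
quartile = np.searchsorted(np.quantile(costs, [0.25, 0.50, 0.75]), costs)

# Table 8.2's layout: one row per group, one column per quartile.
table = np.array([[np.sum((labels == g) & (quartile == q)) for q in range(4)]
                  for g in group_means])
print(table)

# Likelihood-ratio (L^2) test of independence of rows and columns.
L2, p, df, _ = chi2_contingency(table, lambda_="log-likelihood")
print(f"L2={L2:.1f}, df={df}, p={p:.4f}")
```

With cost distributions that differ across groups, the quartile frequencies depart from the 20-per-cell expectation and the independence test rejects, mirroring the logic described in the text.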


The cumulative costs of treatment for each of the 320 consumers can be calculated for the first 6 months of treatment. These include the costs of personnel time and the materials consumed by agency personnel while serving the consumers over their first 6 months in the community support program. If cumulative costs of serving these consumers over the 6-month period were independent of either treatment or interpersonal communication skills, then it could be expected that the 80 subjects within each of the four groups would be evenly distributed across the cells of Table 8.2 (i.e., 20 subjects per cell). Because the columns represent the four respective quartiles, the columns will always sum to 25% of the sample. Under the null hypothesis for all effects, there would be 20 consumers in each cell, indicating that the distribution of cumulative service costs is independent of either treatment or client characteristic. The most cost-efficient group would be the row with the largest observed cell frequencies in the lower quartile cells (Q-1 and Q-2) and the smallest observed cell frequencies in the higher quartile cells (Q-3 and Q-4). Although the logic here is that of a chi-square test of independence (testing whether row assignments are independent of column outcomes), the multidimensional classification (treatment group-by-client characteristic) nullifies the simple test of independence provided by the ordinary chi-square test. Log-linear analysis can provide a test of whether a cell frequency is independent of the association among the combinations of column and multiple row classifications that define the cell. As with the classical test of independence (chi-square), observed cell values are contrasted with expected cell values. Given that the cells are embedded in a multiway classification scheme (three or more classes), the expected cell values need to be adjusted for main and first-order interaction effects.
The mathematical technique employed is to model each cell frequency using a natural log transformation of the observed frequencies. This permits the development of additive rather than exponential models to describe the relations among classifications. Considering the example, there is interest in testing the independence of each of the two main effects (treatment type and consumer characteristic) and their interaction with the quartile ranking of cumulative service costs. The likelihood that the magnitude of service costs (level of Qi) is associated with type of treatment (Aj), or client characteristic (Bk), or both is being assessed. The natural log of the expected cell frequency, f, for the ijkth cell is

ln f(ijk) = µ + ωQ(i) + ωA(j) + ωB(k) + ωQA(ij) + ωQB(ik) + ωAB(jk),
where µ is the average of all of the natural log frequencies within the table. Each of the omega terms is a parameter estimated for one of the marginal effects. Each of the marginal effects is obtained in a fashion similar to a univariate analysis of variance. This can be seen, by example, in computing the parameter for Aj: ωA(j) = (µj − µ), where µj is the average of the natural log of the cell frequencies contained within Aj. The test statistic for the interaction of costs by treatment (Q by A) would be derived from the ordinary chi-square test for independence,

χ² = Σ (fobs − fexp)² / fexp,
but employing the natural log values. An alternative and widely used statistic, the likelihood ratio L², is computed as

L² = 2 Σ fobs ln(fobs / fexp).

Most statistical packages require that the user identify a design describing the interactions of interest and set a hierarchical order for the effects of interest. For example, the highest order interaction is Q-by-A-by-B, but there are only two other terms of interest: Q × A and Q × B. As is true of any hierarchical model, if the full (second-order) interaction of Q × A × B is significant, then follow-up tests must use contingency tables investigating simple main effects and their interactions. Because the follow-up tests could inflate Type I error, two precautions are recommended. First, plan the follow-up tests in advance, restricting the number of planned comparisons to the degrees of freedom available (three in the example used here). Second, adjust the Type I error level of the follow-up tests to be more conservative using the Bonferroni correction (e.g., dividing the Type I error rate used to test the interaction by the number of degrees of freedom: .05 ÷ 3 = .017).

The two major limitations of the log-linear approach are that relatively large samples are required to obtain stable results, and that the analysis leads to conclusions regarding treatment cost-efficiency rather than cost-effectiveness or benefit. The issue of sample size can sometimes be handled by careful consideration of expected outcomes. As with most chi-square techniques, expected cell frequencies should be greater than or equal to five for the highest order of classification (the ijkth cell in the current example). It is possible to establish a model that excludes certain cells from the analysis, provided that it can be logically defended why those cells ought to contain a count approaching zero. This could happen when contrasting a very inexpensive procedure with a very expensive procedure, where the investigator is interested in the middle-level cost values and the interactions with two or more predictor variables.
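For a single two-way margin, the likelihood-ratio statistic and the Bonferroni-adjusted follow-up criterion described above can be computed directly. The observed frequencies here are invented for illustration.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical observed frequencies: 2 treatment groups x 4 cost quartiles.
obs = np.array([[30, 24, 16, 10],
                [10, 16, 24, 30]], dtype=float)

# Expected frequencies under independence of row and column classifications.
exp = obs.sum(axis=1, keepdims=True) * obs.sum(axis=0, keepdims=True) / obs.sum()

# Likelihood-ratio statistic: L^2 = 2 * sum(f_obs * ln(f_obs / f_exp)).
L2 = 2.0 * np.sum(obs * np.log(obs / exp))
df = (obs.shape[0] - 1) * (obs.shape[1] - 1)
p = chi2.sf(L2, df)
print(f"L2={L2:.2f}, df={df}, p={p:.4f}")

# Bonferroni correction for three planned follow-up comparisons, as the
# text recommends: .05 / 3, roughly .017 per comparison.
alpha_per_test = 0.05 / 3
print(f"per-comparison alpha = {alpha_per_test:.3f}")
```

Here one group is concentrated in the cheap quartiles and the other in the expensive ones, so L² is large relative to its chi-square reference distribution with 3 degrees of freedom.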
In this case, it is quite possible for the expected cell frequencies in the lowest quartile for the very expensive procedure to have expected values under five. Most computer packages permit the user to specify the cell structure and the model to be tested. If there is a good rationale for zero-frequency cells, then this option should be used in analyzing the data.

The issue that this design attends only to cost-efficiency and not to cost-effectiveness is best handled by treating this analysis as part of a larger analysis in which tests of treatment effectiveness are considered alongside the tests of treatment costs, as illustrated by Fishman (1975). He used a strategy that considers the results of a treatment effectiveness study along with a cost-efficiency analysis (see Table 8.3). Fishman recommended a two-dimensional array contrasting the results of the cost study (the columns in Table 8.3) with the results of the effectiveness study (the rows in Table 8.3). For seven of the nine combinations of dual cost-outcome analyses, an investigator or policymaker would be able to decide on the most cost-effective choice. The need to consider an outcome (effectiveness) study along with a cost-efficiency study should also be kept in mind when performing the variance and regression analyses of costs (see the discussion that follows).

Multivariate and Univariate Regression and Analysis of Variance of Effort Measures. There are some interesting possibilities when considering regression or variance analysis with cumulative costs, dosage, or restrictiveness measures. One possibility is to use an effort measure alongside progress or outcome measures in a multivariate regression or variance analysis. This would address the research issue of whether the independent (treatment or consumer characteristic) variables produce differences in client outcome profiles alongside differences in the cumulative costs of serving them.
It is obvious that when consumers improve quickly, less long-term effort is needed; and when they are slow to react to treatment and slow to change, more or extended effort is required. Thus, it is defensible to include an effort measure along with the progress or outcome measure in the prediction equation that defines the regression or the variance analysis.
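The joint decision logic that Fishman's two-dimensional array supports can be written out as a simple lookup. This is a sketch: the category labels are illustrative, not Fishman's own wording.

```python
# Sketch of the joint cost/effectiveness decision array attributed to
# Fishman (1975): cross the effectiveness result (rows) with the
# cumulative-cost result (columns) for treatments A and B.
DECISION = {
    ("A better", "A cheaper"): "Choose A",
    ("A better", "equal cost"): "Choose A",
    ("A better", "B cheaper"): "No decision",
    ("equal outcome", "A cheaper"): "Choose A",
    ("equal outcome", "equal cost"): "Choose either",
    ("equal outcome", "B cheaper"): "Choose B",
    ("B better", "A cheaper"): "No decision",
    ("B better", "equal cost"): "Choose B",
    ("B better", "B cheaper"): "Choose B",
}

def choose(outcome: str, cost: str) -> str:
    """Return the decision for one (effectiveness, cost) combination."""
    return DECISION[(outcome, cost)]

# Seven of the nine combinations yield a decision; two do not.
decisive = sum(v != "No decision" for v in DECISION.values())
print(choose("A better", "A cheaper"), decisive)
```

The two undecidable cells are the ones where the better treatment is also the more expensive one, which is exactly where a cost-effectiveness ratio rather than this array would be needed.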


TABLE 8.3
Decisions Possible When Cumulative Costs and Outcome Effectiveness Are Jointly Compared

                           Cumulative Episode Costs
Outcome                    A < B            A = B            A > B
A better than B            Choose A         Choose A         No decision
A equal to B               Choose A         Choose either    Choose B
B better than A            No decision      Choose B         Choose B

Another possibility is to investigate the covariance structures of effort with consumer characteristics as they relate to outcome in a multiple regression analysis. Here the interaction (covariance) of the level of effort with the level of client characteristic is treated as a predictor variable, and one or more client progress or outcome behavior measures serve as the dependent variable. The specific test is whether the slopes of the regression coefficients on one predictor dimension (e.g., dosage) differ across levels of the other dimension (e.g., initial severity of disorder prior to treatment).

Finally, a major cautionary note regarding positively skewed distributions needs to be restated here. Dropping the data for the outlying top 5% to 20% has been used in many diagnostic-related grouping (DRG) studies and is accepted in some quarters. Others have felt that eliminating data should be avoided and that a transformation approximating normality should be used instead. Some outliers are so extreme that even accepted transformations (e.g., arcsine, log, or natural log) do not sufficiently modify the distribution to be acceptably normal. In these cases, it is often necessary to drop the drastically extreme cases and apply a normalizing transformation as well.

Question 5: Did the Person's State(s) or Levels of Functioning Stabilize?

Although traditional research and statistical methods are designed to test for differences in behaviors or rates rather than for stability, it is possible to evaluate a prediction regarding stabilizing functional behaviors.
The key is to carefully develop questions that follow the logic of the treatment goal of stability and to identify and collect the data needed for the corresponding dependent measures. The investigative methods and statistical analysis will follow from well-formulated questions. The following is a presentation of several lines of questions.

Contrasting Measures of Variations in Behaviors for Specific Periods of Time. Researchers often ask themselves, "Did fluctuations per unit of time (day, week, month, year) in psychological state or functioning change (decrease) as a result of the intervention? If so, what was the duration, amount, or rate of this change in variation from an unstable to a stable state within and across groups?" Here, trend or growth curve analysis can be applied, depending on the specification of the dependent measure. Some examples are: count the number of fluctuations in a target behavior of a given magnitude per unit time; measure the duration of time the person remains within a given range of functioning; or estimate the rate of change from an unstable state to a stable state following a crisis.
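Each of the three example measures just listed can be computed from a weekly functioning series. The following Python sketch uses invented scores, an arbitrary 8-point fluctuation threshold, and an arbitrary 50-60 "stable" range; none of these values come from the chapter.

```python
import numpy as np

# Hypothetical weekly functioning score for one person (higher = better).
scores = np.array([50, 42, 58, 45, 60, 52, 55, 54, 56, 55, 54, 55.0])

# 1. Count fluctuations of a given magnitude per unit time: week-to-week
#    changes whose absolute size is at least 8 points.
changes = np.abs(np.diff(scores))
n_fluctuations = int(np.sum(changes >= 8))

# 2. Duration the person remains within a given range of functioning:
#    longest run of consecutive weeks with scores between 50 and 60.
in_range = (scores >= 50) & (scores <= 60)
longest = run = 0
for ok in in_range:
    run = run + 1 if ok else 0
    longest = max(longest, run)

# 3. Rate of change from an unstable to a stable state: slope of the
#    week-to-week fluctuation sizes over time (negative = stabilizing).
slope = np.polyfit(np.arange(len(changes)), changes, deg=1)[0]

print(n_fluctuations, longest, round(slope, 2))
```

In this series the early weeks swing widely and the later weeks settle into the 50-60 band, so the fluctuation-size slope is negative, the pattern a stabilization hypothesis would predict.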


Contrasting Differences in Odds or Probabilities. Results may also generate another question (stated in its formal null form): "What are the odds during a given unit of time of no housing or civil crisis occurring for individuals within and across groups?" Here logit, probit, or log-linear analytical approaches could be employed to contrast the outcomes of two or more treatment and/or client groups. As is true of all procedures that analyze relative frequencies or probabilities within and across categories, the definitions of categories or baseline conditions are very important. How does one define a housing or a civil crisis? Two criteria should be reviewed. First, the scheme for categorization must identify individuals as belonging to one and only one group, and as falling in one and only one outcome category. Formally, this is the criterion for all events to be mutually exclusive and exhaustive. The second criterion is to assure that assumptions of independence or dependence are logically defensible in terms of the clinical theory.

Still another line of questioning would follow traditional statistical tests where changes in the magnitude of the behaviors can be contrasted between groups over time: "What is the number of productive employment hours (independent of wages earned) by an individual?" Here univariate or multivariate analysis of variance or regression analysis can be readily applied. If sample size and precision permit, then structural equation modeling can also be applied. Moreover, the logic of the question could easily be modified to apply growth curve analysis to the rates of change in these magnitude measures.

Thus, for Question 5, the issue is not so much one of what statistics to use, but one of carefully formulating a question (prediction) that can be analyzed. Logical traps that would require acceptance of the null hypothesis to demonstrate the worth of an intervention must be avoided.
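The within- and across-group odds contrast described above can be sketched in a few lines. The crisis counts are hypothetical, chosen only so the arithmetic is visible.

```python
import numpy as np

# Hypothetical counts: in a 6-month window, how many consumers in each
# group had no housing/civil crisis vs. at least one crisis.  The two
# outcome categories are mutually exclusive and exhaustive, as the
# text's first criterion requires.
#                  no crisis, crisis
counts = np.array([[68, 12],    # treatment group
                   [52, 28]])   # comparison group

# Odds of "no crisis" within each group, and the odds ratio across groups.
odds = counts[:, 0] / counts[:, 1]
odds_ratio = odds[0] / odds[1]
print(f"odds: {odds[0]:.2f} vs {odds[1]:.2f}; odds ratio = {odds_ratio:.2f}")
```

An odds ratio well above 1 says the treatment group's odds of staying crisis-free exceed the comparison group's; a logit or log-linear model would then test whether that ratio is reliably different from 1.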
Predictions of stabilized functioning can be, and should be, tested if they are to be considered as reasonable treatment or service goals. There is a balance required. On the one hand, researchers do not want to compromise a clinical theory by the way they fashion a testable question. On the other hand, a good clinical theory ought to be testable. When confronting treatment goals of "stable functioning," most researchers have to learn to revise the way they formulate a testable question.

Question 6: What Occurred During the Process of Treatment?

What characteristics changed during the process of treatment? At what stages in the process did the change occur? How did the changes relate to final outcome characteristics?

Historical Notes on Process Research and Its Measures. To date, most "process" research has been conducted within the context of specific forms of psychotherapy: individual psychotherapy (Orlinsky & Howard, 1986), group psychotherapy (Kaul & Bednar, 1986), and family therapy (Gurman, Kniskern, & Pinsof, 1986). No controlled research has been published outside of the psychotherapy literature. For example, none was found for the treatment team, case management, or psychosocial approaches used in treating persons with a serious mental illness. The focus of psychotherapy process research has, for the most part, been on various aspects of the relation between the therapist and the client or the client's social system (e.g., family). The process measures employed are seldom standardized in the same fashion as the outcome measures discussed (for the most part) in this text. Instead, the process measures focus on observable behaviors taken from videotapes or


transcripts, or from reports by the client or therapist about what occurred during or between the therapeutic interactions. Although reliability studies are frequently reported on the process measures, the basis for establishing the validity of these measures is not clearly understood. Some session report techniques have been extensively studied (e.g., Orlinsky and Howard's Therapy Session Report, 1975). However, the majority of the techniques reported in the literature were specifically designed for a particular study. Some of the more popular instruments used in recent years are the Therapy Session Report (Orlinsky & Howard, 1975), the Vanderbilt Negative Indicators Scale (Sachs, 1983), the Structural Analysis of Social Behavior (Benjamin, Foster, Roberto, & Estroff, 1986), the Helping Alliance Scale (Luborsky, Crits-Christoph, Alexander, Margolis, & Cohen, 1983), and the Working Alliance Scale (Horvath & Greenberg, 1988). Others have developed systematic taxonomies for evaluating the content or tone of therapy sessions (Elliot, 1985; Stiles & Shapiro, 1994). Although none have norms that can be applied in the same fashion as those used with traditional psychological testing, each has a record of several published studies showing some degree of significant discriminative validity.

Should process be related to outcome? Orlinsky and Howard (1986) and Silberschatz (1994) argued strongly that process ought to be related to outcome. Stiles and Shapiro (1994) argued that a true process measure should not be correlated with outcome. It will be left to the reader to decide which side of the argument to take, or to join those who still see it as an issue to be empirically settled (Newman, 1994).

Orlinsky and Howard (1986) offered a fruitful "generic model" that outlines the process-outcome research literature. The outline presented in the generic model, and the literature cited, is recommended as a good starting point when designing a study on process-outcome relationships.
The generic model has five interrelated components describing the therapeutic process:

(1) The therapeutic contract is the purpose, format, terms and limits of the therapeutic enterprise. (2) Therapeutic interventions comprises the "business" of helping carried on under the terms of the therapeutic contract. (3) The therapeutic bond is an aspect of the relationship that develops between the participants as they perform their respective parts in the therapeutic interventions. (4) Patient's self-relatedness refers to the patient's ability to absorb the impact of therapeutic interventions and their therapeutic bond. (5) Therapeutic realizations, such as insight, catharsis, discriminative learning, and so on, occur within the session and presumably are productive of changes in the patient's life or personality. (Orlinsky & Howard, 1986, pp. 312-313)

The research covered within each of the five areas is well documented in Orlinsky and Howard. This literature, for the most part, used the statistical procedures already covered in this chapter. Thus, no further discussion is needed beyond the recommendation that the generic model be used to identify an area of interest in process research and that the studies cited be reviewed as a guide for designs and statistical models. Having said this, consider one additional approach that was not covered by the literature in Orlinsky and Howard (1986) but nevertheless shows significant promise.

Interpersonal Interaction Analysis in Therapy. Most psychosocial therapies involve the exchange of information and feelings among those involved. The interpersonal interaction process in therapy is not haphazard when done by a professional, but rather is goal directed. It should follow a predictable pattern over therapy sessions, with understandable variations as a function of the content of the material covered within a therapy session and/or the stage of the treatment.


The content of a therapy session to be analyzed is what is said, by whom, and in what context (i.e., within the context of what was said before). During a session, people (clients or therapists) can react to what others said or to their own line of thought from one utterance to another. Consider a second example, shown in Table 8.4, that was offered by Canfield, Walker, and Brown (1991) as they attempted to describe the interpersonal and the intrapersonal interactions that can take place during a therapy session. The investigators classified all of the words in an utterance by the client and by the therapist according to the degree of positive or negative valence (from +4 to -4) each word represented on each of three dimensions: emotional, cognitive, and contract. Table 8.4 describes the results of this analysis in one therapy session by giving the correlations between pairs of classes of utterances during the verbal interactions between the client and therapist, as well as the correlations of pairs of intrapersonal utterances.

The data in Table 8.4 are presented as three correlation matrices. The 36 (6 × 6) entries on the lower left represent the contingent relationships between the client and the therapist. According to Canfield et al. (1991),

the correlation of .40 between positive emotion and positive cognition in the client's intrapersonal correlation matrix indicates that if a client utterance was rated high in positive emotion, it was likely also to be rated high in positive cognition. Furthermore, the correlation of 0.46 in the therapist's intrapersonal correlation matrix between negative emotion and negative contract indicates that when the therapist utterance was rated high in negative emotion, it was likely to be rated high in contract as well. (p. 62)

The major asset of developing the assessment of within-session interactions in this manner is that once the correlation matrix has been developed, all of the procedures of regression, variance, or path analysis can be applied. Consider some examples: Will the matrices be the same over different stages of the treatment process? Will sessions rated "rough" have a different correlation matrix than "information" sessions? What is the content of sessions that contained a "critical incident" that changed the focus of therapy?

It is recommended that the reader focus on the methods rather than on the specific content of the analysis. Other investigators may choose to use different systems for classifying utterances within therapy sessions. The form of the analysis could still be applied to other classification systems. Although the amount of effort required for these forms of analyses is high, the approach offers great potential for understanding the basic ingredients of the therapeutic process and their interactions. When, in the not-too-distant future, the spoken word can be inexpensively digitized onto computer files for analysis, the applications of these procedures should become as common as outcome studies.

Conclusions

The application of statistical methods to the analysis of data collected to address clinical issues has traditionally been awkward. Statistics are typically taught with examples from the literature. This chapter reversed the order: The clinical issue, along with its theoretical basis and clinical empirical findings, was presented first. Using this base, the chapter explored the relative merits of various statistical methods that could be applied to data collected to address each clinical issue. The expectation is that if clinical researchers use this approach, they may have a better chance of generating studies, and analyses of the data, that maintain the integrity of the study's clinical issues.


TABLE 8.4
Interpersonal Interaction-Contingency Matrix (Lower-Left 6 × 6 Entries, Rows 7-12 by Columns 1-6); Intrapersonal Correlation Matrix for the Client (Rows 1-6); and Intrapersonal Correlation Matrix for the Therapist (Rows 7-12, Columns 7-11)

Variable                 1      2      3      4      5      6      7      8      9      10     11
Client utterances
1. +Emotion
2. -Emotion             .39
3. +Cognition           .40    .17
4. -Cognition           .06    .07    .08
5. +Contract            .29    .36    .29    .23
6. -Contract            .25    .41    .14    .18    .44
Therapist utterances
7. +Emotion             .48    .23    .15    .12    .12    .01
8. -Emotion             .09    .01    .14    .10    .02    .09    .30
9. +Cognition           .03    .17    .10    .04    .34    .05    .19    .28
10. -Cognition          .10    .02    .14    .07    .07    .18    .23    .11    .06
11. +Contract           .40    .17    .20    .11    .12    .22    .33    .44    .13    .28
12. -Contract           .05    .08    .29    .10    .00    .07    .17    .46    .20    .12    .37

Note: Columns 1-6 index the client utterance classes and columns 7-11 the therapist utterance classes, in the order listed at the left (positive and negative Emotion, Cognition, and Contract).
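A correlation matrix of the Table 8.4 kind can be assembled once utterances have been rated. The sketch below uses simulated ratings (not Canfield et al.'s data) and, for brevity, keeps one signed rating per dimension rather than their separate positive/negative utterance classes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical session: each client utterance rated from -4 to +4 on
# the three dimensions the text names (emotion, cognition, contract).
n_utterances = 120
emotion = rng.integers(-4, 5, size=n_utterances).astype(float)
# Make cognition ratings co-vary with emotion so the matrix has structure.
cognition = np.clip(0.5 * emotion + rng.normal(0, 2, n_utterances), -4, 4)
contract = rng.integers(-4, 5, size=n_utterances).astype(float)

# The correlation matrix over utterances is the unit of analysis to
# which regression, variance, or path models can then be applied.
R = np.corrcoef(np.vstack([emotion, cognition, contract]))
print(np.round(R, 2))
```

Once such matrices exist for many sessions, the questions the text poses (do "rough" and "information" sessions differ? do the matrices shift across treatment stages?) become comparisons between matrices.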


Acknowledgments

Thanks are owed to our thoughtful and patient colleagues: Siobhan A. Morse, for her recommendations and comments on drafts of this chapter, and José Szapocznik and colleagues at the Center for Family Studies, for their support and comments.

References

Ayoub, C., & Jacewitz, J. (1982). Families at risk of poor parenting: A descriptive study of 60 at-risk families in a model prevention program. Child Abuse and Neglect, 6, 413-422.
Benjamin, L., Foster, S., Roberto, L., & Estroff, S. (1986). Breaking the family code: Analysis of videotapes of family interactions by structural analysis of social behavior (SASB). In L. Greenberg & W. Pinsof (Eds.), The psychotherapeutic process: A research handbook (pp. 391-438). New York: Guilford.
Bentler, P.M. (1989). EQS structural equations program manual. Los Angeles: BMDP Statistical Software.
Bentler, P.M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.
Bentler, P., & Bonett, D. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Beutler, L.E. (1991). Have all won and must all have prizes? Revisiting Luborsky et al.'s verdict. Journal of Consulting and Clinical Psychology, 59, 226-232.
Bollen, K.A. (1989). Structural equations with latent variables. New York: Wiley.
Bryk, A.S., & Raudenbush, S.W. (1987). Application of hierarchical linear models to assessing change. Psychological Bulletin, 101, 147-158.
Bryk, A.S., Raudenbush, S.W., Seltzer, M., & Congdon, R.J. (1986). An introduction to HLM: Computer program and user's guide. Chicago: University of Chicago.
Canfield, M.L., Walker, W.R., & Brown, L.G. (1991). Contingency interaction analysis in psychotherapy. Journal of Consulting and Clinical Psychology, 59, 58-66.
Carter, D.E., & Newman, F.L. (1975). A client-oriented system of mental health service delivery and program management: A workbook and guide (Series FN No. 4, DHHS Publication No. 80-307). Rockville, MD: Mental Health Service System Reports.
Collins, L.M., & Horn, J.L. (Eds.). (1991). Best methods for the analysis of change: Recent advances, unanswered questions, future directions. Washington, DC: American Psychological Association.
Cronbach, L.J. (1987). Statistical tests for moderator variables: Flaws in analyses recently proposed. Psychological Bulletin, 102, 414-417.
Cronbach, L.J., & Furby, L. (1970). How we should measure "change" - or should we? Psychological Bulletin, 74, 68-80.
Cronbach, L.J., & Snow, R.E. (1977). Aptitudes and instructional methods: A handbook for research on interactions. New York: Irvington.
Elliot, R. (1985). Helpful and non-helpful events in brief counseling interviews: An empirical taxonomy. Journal of Counseling Psychology, 32, 307-321.
Fishman, D.B. (1975). Development of a generic cost-effectiveness methodology for evaluating patient services of a community mental health center. In J. Zusman & C.R. Wurster (Eds.), Evaluation in alcohol, drug abuse, and mental health service programs (pp. 139-159). Lexington, MA: Heath.
Francis, D.J., Fletcher, J.M., Stuebing, K.K., Davidson, K.C., & Thompson, N.M. (1991). Analysis of change: Modeling individual growth. Journal of Consulting and Clinical Psychology, 59, 27-37.
Garfield, S.L. (1986). Research on client variables in psychotherapy. In S.L. Garfield & A.E. Bergin (Eds.), Handbook of psychotherapy and behavior change (pp. 503-543). New York: Wiley.
Gurman, A.S., Kniskern, D.P., & Pinsof, W.M. (1986). Research on marital and family therapies. In S.L. Garfield & A.E. Bergin (Eds.),

< previous page

page_263

next page >

< previous page

page_264

next page > Page 264

Handbook of psychotherapy and behavior change (pp. 525-564). New York: Wiley. Hargreaves, W.A., Gaynor, J., Ransohoff, R., & Attkisson, C.C. (1984). Restrictiveness of care among the severely mentally disabled. Hospital and Community Psychiatry, 35, 706-709. Hayduk, L.A. (1987). Structural equation modeling with LISREL: Essentials and advances. Baltimore, MD: Johns Hopkins University Press. Howard, K.I., Kopta, S.M., Krause, M.S., & Orlinsky, D.E. (1986). The dose-effect relationship in psychotherapy. American Psychologist, 41, 159-164. Horwrath, A.O., & Greenberg, L. (1986). The development of the Working Alliance Inventory. In L. Greenberg & W.M. Pinsof (Eds.), The psychotherapeutic process (pp. 529-556). New York: Guilford. Jaccard, J., & Wan, C.K. (1995). Measurement error in the analysis of interaction effects between continuous predictors using multiple regression: Multiple indicator and structural equation approaches. Psychological Bulletin, 117, 348-357. Jöreskog, K.G., & Sorbom, D. (1988). LISREL VII. Chicago: SPSS. Jöreskog, K.G., & Sorbom, D. (1993). LISREL 8: Structural qquation modeling with simple command language. Chicago: Scientific Software. Kaul, T.J., & Bednar, R.L. (1986). Research on group and related therapies. In S.L. Garfield & A.E. Bergin (Eds.), Handbook of psychotherapy and behavior change (pp. 671-714). New York: Wiley. Kazdin, A.E. (1986). The evaluation of psychotherapy: Research design and methodology. In S.L. Garfield & A.E. Bergin (Eds.), Handbook of psychotherapy and behavior change (pp. 23-68). New York: Wiley. Kenny, D.A. (1979). Correlation and causation. New York: Wiley. Laborsky, L., Crits-Christoph, P., Alexander, L., Margolis, M., & Cohn, M. (1983). Two helping alliance methods for predicting outcomes of psychotherapy: A counting signs versus a global rating method. Journal of Nervous and Mental Diseases, 171, 480-492. Lord, F.M. (1963). Elementary models for measuring change. In C.W. 
Harris (Ed.), Problems in measuring change (pp. 21-38). Madison: University of Wisconsin Press. Lyons, J.S., & Howard, K.I. (1991). Main effects analysis in clinical research: Statistical guidelines for disaggregating treatment groups. Journal of Consulting and Clinical Psychology, 59, 745-748. McClelland, G.H., & Judd, C.M. (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114, 376-390. Myers, J.L., & Well, A.D. (1991). Research design and Statistical analysis. New York: Harper-Collins. Newman, F.L. (1983). Therapists' evaluations of psychotherapy. In M. Lambert, E. Christensen, & R. DeJulio (Eds.), The Assessment of psychotherapy outcome (pp. 497-534). New York: Wiley. Newman, F.L. (1994). When is observing non-significance enough? (Introduction to Special Feature). Journal of Consulting and Clinical Psychology, 62, 941. Newman, F.L., Ciarlo, J.A., Carpenter, D. (1998). Guidlines for selecting psychological instruments for treatment planning and outcome. In M. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates. Newman, F., DeLiberty, R., Hodges, K., & McGrew, J. (1997, June). Hoosier assurance plan: Linking level of care to level of need. Paper presented at the National Conference for Mental Health Statistics, Washington, DC. Newman, F.L., Griffin, B.P., Black, R.W., & Page, S.E. (1989). Linking level of care to level of need: Assessing the need for mental health care for nursing home residents. American Psychologist, 44, 1315-1324. Newman, F.L., Heverly, M.A., Rosen, M., Kopta, S.M., & Bedell, R. (1983). Influences on internal evaluation data dependability: Clinicians as a source of variance. In A.J. Love (Ed.), Developing effective internal evaluation: New directions for program evaluation (No. 20, pp. 61-69). San Francisco: Jossey-Bass. Newman, F.L., & Howard, K.I. (1986). 
Therapeutic effort, treatment outcome, and national health policy. Journal of Consulting and Clinical Psychology. 41, 181-187. Newman, F.L., & Sorensen, J.E. (1985). Integrated clinical and fiscal management in

< previous page

page_264

next page >

< previous page

page_265

next page > Page 265

mental health: A guidebook. Norwood, NJ: Ablex. Newman, F.L., Tippet, M.T., & Johnson, D.A. (1992, July). A screening instrument for consumer placement in a level of CSP: Psychometric properties. Paper presented at the NIMH National Conference on Mental Health Statistics in Washington, DC. O'Brien, R.G., & Kaiser, M.K. (1985). MA-NOVA method for analyzing repeated measures designs. Psychological Bulletin, 97, 316-333. Orlinsky, D.E., & Howard, K.I. (1975). Varieties of psychotherapeutic experience. New York: Teacher's College Press. Orlinsky, D.E., & Howard, K.I. (1986). Process and outcome in psychotherapy. In S.L. Garfield and A.E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 311-381). New York: Wiley. Ransohoff, P., Zackary, R.A., Gaynor, J.A., & Hargreaves, W.A. (1982). Measuring the restrictiveness of psychiatric care. Hospital and Community Psychiatry, 33, 361-366. Rogosa, D.R., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 92, 726-748. Rogosa, D.R., & Willett, J.B. (1983). Demonstrating the reliability of the difference score in the measurement of change. Journal of Educational Measurement, 20, 335-343. Rogosa, D.R., & Willett, J.B. (1985). Understanding correlates of change by modeling individual differences in growth. Psychometrika, 50, 61-72. Rosnow, R.L., & Rosenthal, R. (1989). Definition and interpretation of interaction effects. Psychological Bulletin, 105, 143-146. Sachs, J.S. (1983). Negative factors in brief psychotherapy: An empirical assessment. Journal of Consulting and Clinical Psychology, 51, 557-564. Saunders, S.M. (1991). The process of seeking psychotherapy: Routes, difficulty and social support. Unpublished doctoral dissertation, Northwestern University, Psychology Department, Evanston, IL. Scandura, T.A., & Tejeda, M.J. (1997, August). Models as fiction in structural equation modeling. 
Paper presented at the annual meeting of the Academy of Management, Boston, MA. Shadish, W.R., & Sweeney, R.B. (1991). Mediators and moderators in meta-analysis: There's a reason we don't let dodo birds tell us which psychotherapies should have prizes. Journal of Consulting and Clinical Psychology, 59, 883-893. Shoham-Soloman, V. (1991). Introduction to special section on client-therapy interaction research. Journal of Consulting and Clinical Psychology, 59, 203-204. Shoham-Soloman, V., Avner, R., & Neeman, R. (1989). Your are changed if you do and changed if you don't: Mechanisms underlying paradoxical interventions. Journal of Consulting and Clinical Psychology, 57, 590-598. Silbershatz, G. (1994). Spurious or uncorrelated? Comments on Stiles and Shapiro. Journal of Consulting and Clinical Psychology, 62, 949-951. Smith, B., & Sechrest, L. (1991). Treatment of aptitude x treatment interactions. Journal of Consulting and Clinical Psychology, 59, 233-244. Snow, R.E. (1991). Aptitude-treatment interaction as a framework for research on individual differences in psychotherapy. Journal of Consulting and Clinical Psychology, 59, 205-216. Stiles, W.B., & Shapiro, D.A. (1994). Abuse of the drug metaphor: Psychotherapy process-outcome correlations. Journal of Consulting and Clinical Psychology 62, 942-948. Turner, R.M., Newman, F.L., & Foa, E. (1983). Relating obsessive-compulsive emotional structure to the cost and outcome of long term behavior therapy. Journal of Consulting and Clinical Psychology, 59, 233-244. Uehara, E., Smukler, M., & Newman, F.L. (1994). Linking resource use to consumer level of need in a local mental health system: Field test of the "LONCA" case mix method. Journal of Consulting and Clinical Psychology, 62, 695-709. Webster, H., & Bereiter, C. (1963). The reliability of changes measured by mental test scores. In C.W. Harris (Ed.), Problems in measuring change (pp. 39-59). Madison, WI: University of Wisconsin Press. Willett, J.B. (1988). 
Questions and answers in the measurement of change. In E.Z. Rothkopf (Ed.), Review of research in education (Vol. 15, pp. 345-422). Washington, DC: American Educational Research Association.

< previous page

page_266

next page > Page 266

Willett, J.B. (1989). Some results on reliability for the longitudinal measurement of change: Implications for the design of studies of individual growth. Educational and Psychological Measurement, 49, 587-602. Willett, J.B., Ayoub, C.C., & Robinson, D. (1991). Using growth modeling to examine systematic differences in growth: An example of change in the functioning of families at risk of maladaptive parenting, child abuse, or neglect. Journal of Consulting and Clinical Psychology, 59, 38-47. Willett, J.B., & Sayer, A.G. (1994). Using covariance structure analysis to detect correlates and predictors of individual change over time. Psychological Bulletin, 116, 363-381. Yates, B.T. (1980). Improving effectiveness and reducing costs in mental health. Springfield, IL: Thomas. Yates, B.T. (1996). Analyzing costs, procedures, processes, and outcomes in human services. Thousand Oaks, CA: Sage. Yates, B.T., & Newman, F.L. (1980). Findings of cost-effectiveness and cost-benefit analyses of psychotherapy. In G. VandenBos (Ed.), Psychotherapy: From practice to research to policy (pp. 163-185). Beverly Hills, CA: Sage. Zimmerman, D.W., & Williams, R.H. (1982a). Gain scores in research can be highly reliable. Journal of Educational Measurement, 19, 149-154. Zimmerman, D.W., & Williams, R.H. (1982b). The relative error magnitude in three measures of change. Psychometrika, 47, 141-147. [Page 266(a)]

PART II CHILD AND ADOLESCENT ASSESSMENT INSTRUMENTATION


Chapter 9

Use of the Children's Depression Inventory

Gill Sitarenios
Multi-Health Systems, Inc.

Maria Kovacs
University of Pittsburgh, School of Medicine

From a clinical perspective, a syndrome refers to a characteristic constellation of psychopathologic symptoms and signs. A depressive syndrome typically implies the presence of dysphoric mood, and complaints such as a sense of worthlessness or hopelessness, preoccupation with death or suicide, difficulties in concentration or making decisions, disturbance in patterns of sleep and food intake, and reduced energy. A disorder implies the presence of a particular syndrome that has been shown to have the characteristics of a diagnosable condition; that is, it has a recognizable pattern of onset and course, clear negative consequences with respect to the individual's functioning, distinct biologic or related correlates, an association with known etiologic or risk factors, and a course that may be altered in predictable ways by various treatments.

Major depressive disorder and dysthymic disorder are two forms of depressive disorders that affect children and adults. Episodes of major depression in childhood last about 10 months on average and may have psychotic or melancholic features associated with them (Kovacs, Obrosky, Gatsonis, & Richards, 1997). Major depression often is comorbid with other disorders, most commonly with disorders of anxiety and conduct (Kovacs, Gatsonis, Paulauskas, & Richards, 1989; Puig-Antich, 1982; Strober & Carlson, 1982). Major depression in childhood is associated with a high rate of recovery; there is, however, a very high risk of episode recurrence, and an increased risk for the development of other related disorders (Kovacs, 1996a, 1996b; Kovacs et al., 1989; Strober & Carlson, 1982). Compared to major depression, dysthymic disorder is a milder and possibly less impairing form of depression.
However, dysthymia usually lasts longer than major depression, with an average duration of about 3 1/2 years or longer (Kovacs et al., 1997). Similar to major depression, dysthymia has a high rate of eventual recovery. But, it also is associated with a high rate of comorbid psychiatric disorders, and dysthymia increases the risk for major depression and other related conditions (Kovacs, Akiskal, Gatsonis, & Parrone, 1994; Kovacs et al., 1997). Weiss et al. (1991) noted that depression in childhood, which was once thought to be rare or nonexistent, is now the subject of much clinical and research activity and is currently recognized by almost all authoritative sources (e.g., DSM-IV). In fact, estimates


of prevalence rates of depressive disorders in children have been found to be quite high (e.g., see Kashani et al., 1981), and some clinicians have diagnosed them as early as preschool age (e.g., Kashani & Carlson, 1985). The pattern of symptoms seen in childhood depression is similar to that seen in adults, with similar affective, cognitive, behavioral, and somatic complaints (Kaslow, Rehm, & Siegel, 1984), and there appears to be little variability in the associated features of the disorder across the life span (Kovacs, 1996b). Depressive disorders can disrupt the functioning of children and adolescents in a number of areas, most notably in school, and cause significant developmental delays. Moreover, children who have depressive disorders may have trouble "catching up" in development (Kovacs & Goldston, 1991, p. 389).

Assessment of Depression Using Self-Report

Assessment of depression can focus on the early identification of the extent and severity of depressive symptoms, the diagnosis of depression and associated disorders, and monitoring the effectiveness of interventions. Self-rated inventories have long been a part of the assessment of depressive symptoms in adults (e.g., Beck Depression Inventory; Beck, 1967). Such inventories typically are easy to administer, inexpensive, and readily analyzable. Because they quantify the severity of the depressive syndrome, they have been used for descriptive purposes, to assess treatment outcome, to test research hypotheses, and to select research subjects. However, because self-rated inventories do not assess the temporal features, the onset, the course, or the contributing factors of the syndrome being examined, they cannot yield diagnostic information. For children, self-report inventories nonetheless provide especially useful information, in that many features of depression are internal and not easily identified by informants such as parents or teachers.
Moreover, according to psychological models, children's self-perceptions are of predictive value in their own right (Kovacs, 1992; C.F. Saylor, Finch, Baskin, Furey, & Kelly, 1984a). The Children's Depression Inventory (CDI) has been one of the most widely used and cited inventories of depression. According to Fristad, Emery, and Beck (1997), the CDI was used in over 75% of the studies with children in which self-report depression inventories were employed. The initial version of the CDI was developed in 1977. Formal publication of the instrument in 1992 increased its accessibility. This chapter provides a timely opportunity to summarize the research history and usage of the CDI since its inception and publication. The CDI and its components, as well as its various forms, associated manuals, and scoring forms, are described in the first part of this chapter. Current research and theory related to the CDI are also highlighted. The CDI manual (Kovacs, 1992) includes an annotated bibliography of about 150 related research studies up to the end of 1991 and, according to recent literature (Fristad et al., 1997), at least 200 additional articles pertaining to the CDI have been published since that time. Other goals of this chapter are to examine current usage of the CDI, distinguish proper usage of the instrument from improper usage, and address questions frequently asked by practitioners. The CDI can be useful in the early identification of symptoms and in the monitoring of treatment effectiveness. The CDI also can play a role in the diagnostic process but, as already noted, should not be used alone to diagnose a


depressive disorder. Finally, this chapter describes the ongoing development of the CDI, including anticipated accessories, future research directions, and extended applications.

Summary of the Development of the CDI

The Beck Depression Inventory (Beck, 1967), a clinically based, 21-item, self-rated symptom scale for adults, was the starting point for the development of a paper-and-pencil tool appropriate for children. The research literature supported the decision to use an "adult" scale as the model, given that there appeared to be much overlap between the salient manifestations of depressive disorders in juveniles and in adults (Kovacs & Beck, 1977). Scale construction proceeded in four phases.

Phase I

The first version of the children's inventory (March 1975) was derived with the help of a group of 10- to 15-year-old "normal" youths and similar-age children from an urban inpatient and partial hospitalization program. After the purpose of the scale revision project was explained individually to the children, they were asked for advice on how the items could be worded to make them "clear to kids." Although in this phase of scale construction the Beck item on sexual interest was replaced by an item on loneliness, the content and format of 20 items of the "adult" scale were essentially retained. However, five "Appendix" items, adapted from Albert and Beck (1975), were added concerning school and peer functioning. Piloting yielded further semantic changes.

Phase II

Data from normal youth and children who were under psychiatric-psychological care were used, along with a semantic and conceptual item analysis, to produce a second major revision (February 1976) that also included a new item on self-blame.
This version of the inventory was administered to 39 children aged 8 to 13 who were consecutively admitted to a child guidance center's hospitalization units, 20 "normal" 8- to 13-year-olds with no history of psychiatric contacts, and 127 10- to 13-year-old fifth- and sixth-grade students in the Toronto public school system. The resultant data were analyzed according to standard psychometric principles, and the findings were used to derive a completely new version of the scale. Two of the original 21 items ("shame" and "weight loss") and two of the Appendix items ("family fights" and "self-blame") were replaced by four new items that had face validity and appeared age appropriate (e.g., "feeling unloved"). The CDI item-choice distributions in these samples also revealed that the items could be recast into a three-choice format: one choice reflects "normalcy," the middle choice pertains to definite although not disabling symptom severity, and the other response option reflects a clinically significant complaint. In order to prevent response bias, approximately 50% of the items (randomly selected) were worded so that the first response choice suggested the most pathology, whereas the response-choice order was reversed for the remaining items.


Phase III

The newly modified version of the CDI (May 1977) was again pilot tested and sent to colleagues for a critique. A cover page was added with revised instructions and a sample item. Based on the results of piloting, the items were further refined and reworded in order to improve face validity and comprehensibility.

Phase IV

One minor change preceded preparation of the final version of the CDI (August 1979). The score values were eliminated from the inventory and scoring templates were developed.

Current Work

Since the initial development of the CDI, additional psychometric analyses have been conducted. Based on these analyses, five factors have been identified and are fully described in the CDI manual (Kovacs, 1992). A short form of the CDI has been derived as well, and software has been developed for online administration, scoring, and reporting. The instrument is now available in several foreign languages.

Overview of the CDI

The CDI is appropriate for children and adolescents from age 7 to 17. The instrument quantifies a range of depressive symptoms, including disturbed mood, problems in hedonic capacity and vegetative functions, low self-evaluation, hopelessness, and difficulties in interpersonal behaviors. Several items pertain to the consequences of depression with respect to contexts that are specifically relevant to children (e.g., school). Each of the 27 CDI items consists of three choices, keyed 0 (absence of a symptom), 1 (mild symptom), or 2 (definite symptom), with higher scores indicating increasing severity and yielding a total scale score that can range from 0 to 54. In addition to the total score, the CDI also yields scores for five factors or subscales. These factors are labeled Negative Mood, Interpersonal Problems, Ineffectiveness, Anhedonia, and Negative Self-esteem.
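The quantification just described (27 items keyed 0-2, summed to a 0-54 total, with roughly half the items printing the most severe choice first) can be sketched as follows. The helper names and sample responses are illustrative assumptions, not the published CDI scoring key or templates.

```python
# Illustrative CDI-style scoring sketch (hypothetical helpers; not the
# published scoring materials described in the chapter).

def key_response(position, severe_first):
    """Map a printed response position (0, 1, 2) to its severity key.

    For items printed with the most pathological choice first (to prevent
    response bias), the printed order must be reversed before summing.
    """
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1, or 2")
    return 2 - position if severe_first else position

def cdi_total(item_keys):
    """Sum 27 item severity keys (each 0, 1, or 2); the total ranges 0-54."""
    if len(item_keys) != 27:
        raise ValueError("expected 27 item keys")
    if any(k not in (0, 1, 2) for k in item_keys):
        raise ValueError("each item key must be 0, 1, or 2")
    return sum(item_keys)

# Hypothetical protocol: mostly symptom-free, five mild and two definite symptoms.
responses = [0] * 20 + [1] * 5 + [2] * 2
print(cdi_total(responses))                 # 9
print(key_response(0, severe_first=True))   # 2: first printed choice was most severe
```

A protocol of all zeros yields the floor of 0, and all definite symptoms yield the ceiling of 54, matching the range stated above.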
Although author-approved definitions of these subscales have been available for some time, the definitions have not been widely published (although they are given in the recent Software User's Manual; Kovacs, 1995). Therefore, these definitions are provided in Table 9.1.

Reliability

Psychometric information on reliability is directly related to the proper use and interpretation of an instrument. The reliability of the CDI has been examined in terms of internal consistency, test-retest reliability, and standard error.


TABLE 9.1 Definitions of the Subscales of the CDI

Negative Mood: This subscale reflects feeling sad, feeling like crying, worrying about "bad things," being bothered or upset by things, and being unable to make up one's mind.

Interpersonal Problems: This subscale reflects problems and difficulties in interactions with people, including trouble getting along with people, social avoidance, and social isolation.

Ineffectiveness: This subscale reflects negative evaluation of one's ability and school performance.

Anhedonia: This subscale reflects "endogenous depression," including impaired ability to experience pleasure, loss of energy, problems with sleeping and appetite, and a sense of isolation.

Negative Self-esteem: This subscale reflects low self-esteem, self-dislike, feelings of being unloved, and a tendency to have thoughts of suicide.

Internal Consistency. Internal consistency refers to the extent to which all items on a given instrument consistently measure the same dimension. Kovacs (1992) summarized several research studies that reported alpha reliability statistics for the CDI. Alpha coefficients from .60 to .70 are usually taken to indicate satisfactory reliability, .70 to .80 good reliability, and .80 to .95 excellent reliability. The majority of the studies reported total score alpha values over .80, and all of the values were greater than .70. For instance, Kovacs (1985) found the total score coefficient alpha to be .86 with a heterogeneous, psychiatrically referred sample of children, .71 with a pediatric-medical outpatient group, and .87 with a large sample of public school students (n = 860). Although the internal consistency of the CDI Total Score has often been reported, data on alpha coefficients for the five factor scores have been less available. Therefore, the internal consistency of the five subscales was assessed using two large data sets: the CDI normative sample of 1,266 children and an independent sample of 894 Canadian children. The reliability values obtained are shown in Table 9.2, along with a summary of alpha values previously reported for the CDI Total Score.
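The coefficient alpha statistics discussed above follow the standard Cronbach formula, which can be computed directly from an item-score matrix. The sketch below uses fabricated toy responses (six respondents, four items), not CDI data.

```python
# Cronbach's alpha from a respondents-by-items score matrix; the standard
# formula: alpha = k/(k-1) * (1 - sum(item variances) / variance of totals).
# The toy data are fabricated for illustration only.

def cronbach_alpha(scores):
    """scores: list of respondents, each a list of k item scores."""
    k = len(scores[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

toy = [
    [0, 0, 1, 0], [1, 1, 1, 0], [2, 1, 2, 2],
    [0, 1, 0, 0], [2, 2, 2, 1], [1, 0, 1, 1],
]
print(round(cronbach_alpha(toy), 2))  # 0.87
```

By the benchmarks cited above, a value of .87 would fall in the "excellent" range, although with only four toy items this is purely illustrative.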
Although reliability values for the five subscales are not as high as those for the CDI Total Score, the findings for the subscales are satisfactory. Furthermore, the alpha values obtained from the two samples are very similar.

TABLE 9.2 Estimates of Internal Consistency of the CDI and the Five CDI Factors

Total CDI: alphas ranging from .71 to .89 (Kovacs, 1992)
Negative Mood: Normative Sample: .62; Canadian Sample: .65
Interpersonal Problems: Normative Sample: .59; Canadian Sample: .60
Ineffectiveness: Normative Sample: .63; Canadian Sample: .59
Anhedonia: Normative Sample: .66; Canadian Sample: .64
Negative Self-esteem: Normative Sample: .68; Canadian Sample: .66

Test-Retest Reliability. The CDI is completed based on the respondent's feelings, moods, and functioning during the 2-week period just prior to the test administration. Thus, the inventory measures state symptoms, rather than traits, which are less changeable over time. Because the CDI measures a state rather than a trait, the retest interval
for assessing reliability should be short (2 to 4 weeks). In the research reviewed by Kovacs (1992), studies using such short intervals found test-retest correlations between .56 and .87 (an outlier of .38 was obtained in one study), and the median test-retest correlation was .75. Thus, the CDI has acceptable short-term stability.

Standard Error. Two types of standard error (Lord & Novick, 1968) are most relevant to the CDI: standard error of measurement (SEM1) and standard error of prediction (SEM2). The standard error of measurement (SEM1) represents the standard deviation of observed scores if the true score is held constant. In the case of the CDI, this means that if parallel forms of the scale were used to assess the same individual at the same time, then about 68% of the scores would fall within ±1 SEM1 unit of the score obtained on the CDI scale, and about 95% of the scores would fall within ±1.96 SEM1 units. The standard error of prediction (SEM2) has particular relevance because SEM2 has an intimate connection to outcome assessment. SEM2 represents the standard deviation of predicted scores if the obtained score is held constant. That is, if 100 individuals were reassessed on the CDI, about 68% of the retest scores would fall within ±1 SEM2 unit of the respective predicted scores and about 95% of the retest scores would fall within ±1.96 SEM2 units of the predicted scores. Thus, the SEM2 value is one way of assessing how much CDI scores can be expected to change due to random fluctuation. Any change in CDI scores that far exceeds the expected random fluctuation is most likely attributable to a significant change in the status of the individual's symptomatology. The absolute value for the standard error of measurement (SEM1) or the standard error of prediction (SEM2) varies according to both the estimate of reliability and the estimate of the population standard deviation used in the calculation.
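The two standard-error quantities just defined can be sketched with the textbook formulas in the Lord and Novick (1968) tradition. The SD, alpha, and test-retest inputs below are assumed illustrative values, not the exact quantities the authors used, and the SEM2 definition shown (SD times the square root of 1 minus r squared) is one common convention; the chapter's exact computational choices are not fully reproduced here.

```python
# Sketch of standard-error formulas (assumed illustrative inputs, not the
# chapter's exact computation).
import math

def sem_measurement(sd, reliability):
    """SEM1: SD of observed scores around a fixed true score."""
    return sd * math.sqrt(1.0 - reliability)

def sem_prediction(sd, retest_r):
    """SEM2 under a common definition: sd * sqrt(1 - r**2)."""
    return sd * math.sqrt(1.0 - retest_r ** 2)

sd, alpha, r = 7.2, 0.86, 0.75  # assumed values for illustration only
sem1 = sem_measurement(sd, alpha)
sem2 = sem_prediction(sd, r)
print(round(sem1, 1))  # 2.7 with these assumed inputs

# A retest change larger than about 1.96 * SEM2 exceeds the expected random
# fluctuation, the chapter's criterion for a meaningful change in status.
print(round(1.96 * sem2, 1))
```

The practical use is the band in the final lines: a pre-to-post change smaller than the band is consistent with measurement noise alone.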
The previously noted SEM1 value was calculated based on the median Cronbach alpha for the CDI Total Score, shown in Table 9.2, and SEM2 values were derived using the median 2- to 4-week test-retest reliability estimate for the CDI Total Score. The resultant standard error values are presented in Table 9.3.

TABLE 9.3 Standard Error Values for the CDI Total Score

Gender (Age Group)   Standard Error of Measurement (SEM1)   Standard Error of Prediction (SEM2)
Boys (overall)       2.9                                    3.8
Boys (7-12)          2.8                                    3.7
Boys (13-17)         3.1                                    4.2
Girls (overall)      2.6                                    3.5
Girls (7-12)         2.7                                    3.6
Girls (13-17)        2.4                                    3.2
Overall              2.7                                    3.7

Validity

The validity of an instrument is assessed by estimating the extent to which it correctly measures the construct or constructs that it purports to assess. Because constructs cannot


be directly observed, several options are available to assess the validity of an instrument: namely, its correlation with other scales purported to measure the same construct (construct validity: other depression scales), its correlation with scales purported to measure related constructs (construct validity: related constructs), its correlation with independent ratings of behavior (construct validity: other measures), factor analytic support for its subscale structure (factorial validity), and its ability to predict appropriate behaviors (predictive validity). Thus, the validity of a test rests on accumulated evidence from a number of studies using various methodologies (Campbell & Fiske, 1959). The CDI has been utilized in hundreds of clinical and experimental research studies and its validity has been well established using a variety of techniques. Overall, the weight of the evidence indicates that the inventory assesses important constructs that have strong explanatory and predictive utility in the characterization of depressive symptoms in children and adolescents. Table 9.4 shows a listing of some of the research related to different aspects of validity. Also, see Barreto (1994) for a brief review of validity information, and C.F. Saylor, Finch, Baskin, Furey, et al. (1984a) and C.F. Saylor, Finch, Spirito, and Bennett (1984c), who used the multitrait-multimethod approach to assess the construct validity of the CDI. Further validation data pertinent to specific uses of the CDI are presented later.

CDI Short Form

The 10-item CDI Short Form was developed to enable more rapid and economical assessment of depressive symptoms than the long form. The CDI Short Form can be used when a quick screening measure is desired or when the examiner's time with the child is limited. The short form takes 5 to 10 minutes to administer, about half the time it takes to administer the long version. However, the long and short forms generally provide comparable results.
That is, the correlation between the CDI Total Score and the CDI Short Form total score was r = .89 (Kovacs, 1992).

Administration of the CDI

Reading Level

Past computations of the reading level for the CDI have produced different grade readability estimates (Berndt, Schwartz, & Kaiser, 1983; Kazdin & Petti, 1982). A first-grade reading level for the CDI is most frequently cited (e.g., Kovacs, 1992). Variable assessments of the instrument's reading level probably reflect the use of different reading level formulae. The Dale-Chall formula (Dale & Chall, 1948) has been found to be the most valid and accurate of the nine commonly utilized readability formulas (e.g., Harrison, 1980). It is based on semantic (word) difficulty and syntactic (sentence) difficulty. Usually, two 100-word samples are taken to calculate the reading level using the Dale-Chall formula (Chall & Dale, 1995). However, to provide greater accuracy, the computation reported here used all of the CDI items. In accordance with the Dale-Chall


standard procedure for determining reading level, the number of complete sentences was counted and divided into the number of words to determine the average sentence length (WDS/SEN). Next, the number of "unfamiliar" words (UFMWDS) was counted. A word is considered unfamiliar if it does not appear on a list of 3,000 "familiar" words compiled by Dale (revised in 1983). "Familiar" words are known by at least 80% of children in the fourth grade. Consideration of the number of familiar and unfamiliar words in a sample of text increases the accuracy of the reading level assessment. The grade level was determined using the Dale-Chall formula, commonly given as:

Grade score = 0.1579 × (UFMWDS / WDS × 100) + 0.0496 × (WDS / SEN) + 3.6365

The resulting score is then converted to a grade equivalent using Dale and Chall's published correction table.

TABLE 9.4
Studies Containing Information Relevant to the Validity of the CDI

Construct Validity

CDI compared to other measures of childhood depression (CBCL, CDS, BDI, MMPI-D, DSRS, RADS, Hamilton, CES-D, SAS):
Asarnow & Carlson, 1985; Bartell & Reynolds, 1986; Bodiford, Eisenstadt, Johnson, & Bradlyn, 1988; Faulstich, Carey, Ruggiero, Enyart, & Gresham, 1986; Felner, Rowlison, Raley, & Evans, 1988; Haley, Fine, Marriage, Moretti, & Freeman, 1985; Hammen, Adrian, Gordon, Burge, Jaenicke, & Hiroto, 1987; Hepperlin, Stewart, & Rey, 1990; Lipovsky, Finch, & Belter, 1989; Nieminen & Matson, 1989; Rotundo & Hensley, 1985; Seligman, Peterson, Kaslow, Tanenbaum, Alloy, & Abramson, 1984; Shain, Naylor, & Alesi, 1990; Weiss & Weisz, 1988; Weissman, Orvaschel, & Padian, 1980; Wolfe, Finch, C.F. Saylor, Blount, Pallmeyer, & Carek, 1987; Worchel, Hughes, Hall, S.B. Stanton, H. Stanton, & Little, 1990

CDI compared to measures of related constructs:
Anxiety (RCMAS, STAI): Blumberg & Izard, 1986; Eason, Finch, Brasted, & C.F. Saylor, 1985; Felner, Rowlison, Raley, & Evans, 1988; Kovacs, 1985; Norvell, Brophy, & Finch, 1985; Ollendick & Yule, 1990; Wolfe, Finch, C.F. Saylor, Blount, Pallmeyer, & Carek, 1987
Self-concept (Piers-Harris): Allen & Tarnowski, 1989; Elliott & Tarnowski, 1990; Knight, Hensley, & Waters, 1988; Kovacs, 1985; McCauley, Mitchell, Burke, & Moss, 1988; Rotundo & Hensley, 1985; C.F. Saylor, Finch, Baskin, Furey, & Kelly, 1984a; C.F. Saylor, Finch, Spirito, & Bennett, 1984c; Wolfe, Finch, C.F. Saylor, Blount, Pallmeyer, & Carek, 1987
Self-esteem (Coopersmith): Kaslow, Rehm, & Siegel, 1984; Kovacs, 1985
Self-esteem (Self-Esteem Inventory): Kazdin, French, Unis, & Esveldt-Dawson, 1983; Reynolds, Anderson, & Bartell, 1985
Attributional style (CASQ): Bodiford, Eisenstadt, Johnson, & Bradlyn, 1988; Curry & Craighead, 1990; Gladstone & Kaslow, 1995; Hammen, Adrian, & Hiroto, 1988; Kuttner, Delameter, & Santiago, 1989; McCauley, Mitchell, Burke, & Moss, 1988; Nolen-Hoeksema, Girgus, & Seligman, 1986
Hopelessness (Hopelessness Scale): Elliott & Tarnowski, 1990; Kazdin, French, Unis, & Esveldt-Dawson, 1983; Kazdin, French, Unis, Esveldt-Dawson, & Sherick, 1983; McCauley, Mitchell, Burke, & Moss, 1988; Spirito, Overholser, & Hart, 1991
Perceived competence (Perceived Competence Scale): Fauber, Forehand, Long, Burke, & Faust, 1987
Social adjustment (Social Adjustment Scale): Weissman, Orvaschel, & Padian, 1980

CDI compared to behavioral measures/observations of depressive behavior/symptoms:
Parent/teacher ratings or observations: Blumberg & Izard, 1986; Huddleston & Rust, 1994; Ines & Sacco, 1992; Renouf & Kovacs, 1994; Reynolds, Anderson, & Bartell, 1985; Sacco & Graves, 1985; Shah & Morgan, 1996; Slotkin, Forehand, Fauber, McCombs, & Long, 1988
Therapist/staff ratings: Breen & Weinberger, 1995
Perceptions of relationships/adjustment: Stocker, 1994
Interview findings: Hodges, 1990
Peer reports: C.F. Saylor, Finch, Baskin, Furey, & Kelly, 1984a

Factorial Validity
Kovacs, 1992; Carey, Faulstich, Gresham, Ruggiero, & Enyart, 1987; Helsel & Matson, 1984; C.F. Saylor, Finch, Spirito, & Bennett, 1984c; Weiss & Weisz, 1988; Weiss, Weisz, Politano, Carey, Nelson, & Finch, 1991

Predictive Validity
Longitudinal procedure used: Devine, Kempton, & Forehand, 1994; DuBois, Felner, Bartels, & Silverman, 1995; Mattison, Handford, Kales, Goodman, & McLaughlin, 1990; Reinherz, Frost, & Pakiz, 1991
Statistical prediction procedure used: Marciano & Kazdin, 1994; Slotkin, Forehand, Fauber, McCombs, & Long, 1988

The Dale-Chall procedure produced a third-grade reading level for the CDI, suggesting that the often-cited first-grade reading level for the CDI is not definitive. Administrators/practitioners should not assume that all younger children will be able to understand the language of the inventory. For 7- and 8-year-olds and children with reading difficulties, it is recommended (Kovacs, 1992) that the administrator read the instructions and the CDI items aloud while children read along on their own form.
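The computation just described can be sketched as follows. This is a minimal illustration, not the authors' actual program: the coefficients are those of the widely cited 1948 Dale-Chall regression equation, and the small familiar-word set in the example stands in for Dale's 3,000-word list.

```python
import re

def dale_chall_score(text, familiar_words):
    """Raw Dale-Chall readability score for a sample of text.

    familiar_words: a stand-in here for Dale's list of ~3,000 words
    known to at least 80% of fourth graders.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    avg_sentence_len = len(words) / len(sentences)          # WDS / SEN
    unfamiliar = [w for w in words if w.lower() not in familiar_words]
    pct_unfamiliar = 100.0 * len(unfamiliar) / len(words)   # UFMWDS as a percentage
    # 1948 regression equation; the result is then mapped to a grade
    # band via Dale and Chall's published correction table.
    return 0.1579 * pct_unfamiliar + 0.0496 * avg_sentence_len + 3.6365
```

In commonly reproduced versions of the correction table, a score of roughly 4.9 or below corresponds to fourth grade and below.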


Administration Methods

One way to administer the CDI is to allow children to indicate their responses on a special QuikScore™ form (Kovacs, 1992). The QuikScore™ form is self-contained and includes all materials needed to score and profile the CDI; conversion to T-scores is made automatically on the form. The CDI also can be computer administered and scored using an IBM-compatible microcomputer (Kovacs, 1995). Regardless of which format is chosen, the administrator should make sure the child carefully reads the instructions and fully understands the inventory. As already noted, for younger children or those with reading difficulties, it may be necessary to read the instructions and the items aloud while children read along on their own form or the computer screen. After reading each item, children select one of the three response options provided. Children may say that none of the choices in a given item really applies to them. In such a case, they should be instructed to select the choice that fits them best. Although the CDI is most often administered on an individual basis, group administration is permitted (e.g., Friedman & Butler, 1979; C.F. Saylor, Finch, Baskin, C.B. Saylor, et al., 1984b). Additionally, with nonclinical populations, some test administrators have considered inclusion of the suicide item to be inappropriate; in such instances, it may be preferable to use the CDI Short Form, which does not include this item.

Applicable Populations

In interpreting clinically significant patterns of total scale and factor scores on the CDI, it is important to consider the child's background, including socioeconomic status, country of origin, and ethnicity. The norms presented in the main manual for the CDI (Kovacs, 1992) are based on a select sample of North American children. The validity of the instrument for other groups of children is suggested by research studies with different populations.
In general, this body of research, cited in Table 9.5 and Table 9.6, shows very widespread applicability of the CDI. Table 9.5 lists research citations in connection with the use of the CDI with children from different cultures and from different countries. The CDI research includes data on children who were African American, North American, Spanish, German, Australian, Egyptian, Japanese, Brazilian, Icelandic, Croatian, and French. These references should be consulted to aid in the interpretation of CDI results regarding those populations. Table 9.5 also cites some of the translated versions of the CDI. Table 9.6 lists some of the research on the CDI with children in special circumstances. Data have been obtained from samples of low socioeconomic status; urban and rural residents; those in public housing situations; and children with mental retardation, learning disability, or emotional problems. In addition to the studies cited in Table 9.6, the CDI has also been used in situations where a family member or the child has cancer (Siegel, Karus, & Raveis, 1996; Polaino & del-Pozo-Armentia, 1992), with children going through the tribulations of parental divorce (e.g., Pons-Salvador & del-Barrio, 1993), and with children who have insulin-dependent diabetes mellitus (Kovacs, Iyengar, Stewart, Obrosky, & Marsh, 1990).


TABLE 9.5
Research Reports on the Use of the CDI with Children of Different Ethnic and National Backgrounds

Abdel-Khalek, 1996: n = 1,981, Arabic version, Kuwaiti students
Abdel-Khalek, 1993: n = 2,558*, Arabic version
Arnarson, Smari, Einarsdottir, & Jonasdottir, 1994: n = 436, Icelandic version
Canals, Henneberg, Fernandez-Ballart, & Domenech, 1995: n = 534, Spanish sample
Chartier & Lassen, 1994: n = 792*, North American sample
DuRant, Getts, Cadenhead, Emans, & Woods, 1995: n = 225, African American sample
Fitzpatrick, 1993: n = 221, African American sample
Frias, Mestre, del-Barrio, & Garcia-Ros, 1992: n = 1,286, Spanish sample
Ghareeb & Beshai, 1989: n = 2,029*, Arabic version
Goldstein, Paul, & Sanfilippo-Cohn, 1985: n = 85, African American sample
Gouveia, Barbosa, de-Almeida, & de Andrade-Gaiao, 1995: n = 305, Brazilian version
Koizumi, 1991: n = 1,090*, Japanese version
Lobert, 1989, 1990: n = 128, German version
Mestre, Frias, & Garcia-Ros, 1992: n = 952*, Spanish sample
Oy, 1991: n = 432, Turkish sample
Reicher & Rossman, 1991: n = 658, German version
Reinhard, Bowi, & Rulcovius, 1990: n = 84, German version
Saint-Laurent, 1990: n = 470, French version
Sakurai, 1991: n = 237, Japanese version
Spence & Milne, 1987: n = 386*, Australian sample
Steinsmeier-Pelster, Schurmann, & Urhahne, 1991: n = 319, German sample
Steinsmeier-Pelster, Schurmann, & Duda, 1991: n = 918, German version
Worchel, Hughes, Hall, S.B. Stanton, H. Stanton, & Little, 1990: n = 135, Hispanic sample
Zivcic, 1993: n = 480, Croatian version

* Sample sufficient to be considered normative data for this group.
TABLE 9.6
Some Research Reports on the Use of the CDI with Special Groups

Benavidez & Matson, 1993: n = 25, Mentally retarded children
DuRant, Getts, Cadenhead, Emans, & Woods, 1995: n = 225, Public housing
Goldstein, Paul, & Sanfilippo-Cohn, 1985: n = 85, Learning disabled children
Meins, 1993: n = 798, Mentally retarded adults administered modified version of CDI
Mestre, Frias, & Garcia-Ros, 1992: n = 25, Mentally retarded children
Nelson, Politano, Finch, Wendel, & Mayhall, 1987: n = 535, Emotionally disturbed children
Oy, 1991: n = 432, Different socioeconomic status
Politano, Nelson, Evans, Sorenson, & Zeman, 1985: n = 551, Emotionally disturbed children
C.F. Saylor, Finch, Spirito, & Bennett, 1984c: n = 154, Children with emotional-behavioral problems


Approaches to CDI Interpretation

The manner in which CDI results are used or interpreted is generally a function of the setting in which the instrument was administered and the ostensible reason for the administration. Consequently, the interpretive focus can be a detailed consideration of a given child's responses to each individual item. The interpretation also may emphasize the total CDI T-score or the individual CDI factor T-scores, each of which ranks the child in comparison to "normal" age- and gender-matched peers.

Determining the Validity of the Results

Regardless of the interpretive focus, CDI results need to be examined in the context of potential threats to validity. One approach is determination of the quality of the completed inventory. Another is examination of the Inconsistency Index.

Procedural Issues. The following issues should be kept in mind in determining the quality of the completed CDI:

1. Has the inventory been filled in properly? Missing items will invalidate the total score. Although the administrator may prorate a missing item (e.g., by taking the average score on all remaining items and assigning that value to the missing item), subsequent interpretation must take the missing item(s) into account.

2. Is there an apparent response bias? Response bias may be operating if a child consistently checks the first, the middle, or the last option on each item. Random checking of options, which may be inferred from apparently contradictory answers to similar items, may represent biased responding as well. Such patterns invalidate the CDI Total Score.

3. Are there any suggestions of lack of truthfulness? In a clinical setting that involves testing a child who has been referred, this possibility may arise, as indicated by the child "denying" every symptom or endorsing the most severe option to all, or almost all, items.
In such instances, inquiry into the child's expectations of the evaluation may be more informative than a focus on the CDI score itself.

4. Is the testing environment appropriate for psychological examination? As with all forms of psychological assessment, the CDI should be completed in a setting that is free from distraction, affords the child the requisite privacy, and is reasonably comfortable. An unsuitable testing environment is likely to threaten the validity of the child's responses and must be considered in score interpretation.

The Inconsistency Index. Children may exaggerate or misrepresent symptoms in some circumstances. As a result, some self-rated instruments include special items or scales to identify distorted responses (e.g., Beitchman, 1996; Reynolds & Richmond, 1985). Alternatively, for some instruments (e.g., the MMPI-2 VRIN and TRIN scales: Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989; the MASC Inconsistency Index: March, 1997), an inconsistency index has been developed that does not usually require special items. Inconsistency indexes are based on the premise that the most similar, or most highly correlated, items on a measure elicit similar (although not necessarily identical) responses. If statistical procedures reveal a large discrepancy in the responses for several correlated item pairs, then inconsistent and possibly invalid responding must be considered.
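Returning briefly to procedural issue 1 above, the proration of missing items can be sketched as a hypothetical helper, assuming the usual 0-2 scoring of CDI items (neither the function nor its names come from the CDI materials themselves):

```python
def prorate_cdi_total(responses):
    """Total CDI score with skipped items prorated.

    responses: one entry per CDI item (27 on the full form), each
    0, 1, or 2, with None marking a skipped item.  Each skipped item
    is assigned the mean of the answered items, per the proration
    approach described above.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no items were answered")
    mean_score = sum(answered) / len(answered)
    return sum(answered) + mean_score * responses.count(None)
```

As noted above, interpretation must still take into account that items were missing, even after proration.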


An Inconsistency Index exists for the CDI. Each of the five scales on the CDI (i.e., Negative Mood, Interpersonal Problems, Ineffectiveness, Anhedonia, and Negative Self-esteem) contains a set of items that are highly correlated with one another. If a pair of items is highly correlated, then a child whose response is indicative of a symptom on one item of the pair should give a response indicative of a symptom on the other item of the pair. Although such consistency is generally expected, some inconsistency can and will occur to a limited extent, and its magnitude can be assessed through the CDI Inconsistency Index (Kovacs, 1995). The index is generated by a computer algorithm that takes into account the factor loadings of the items. The highly correlated item sets used to measure consistency are as follows:

Negative Mood: Items 1, 8, 10, and 11
Interpersonal Problems: Items 5, 26, and 27
Ineffectiveness: Items 15, 23, and 24
Anhedonia: Items 16, 19, 20, and 22
Negative Self-esteem: Items 7, 9, 14, and 25

In the normative sample for the CDI, only 89 of 1,266 children (6.9%) scored greater than or equal to 7 on the Inconsistency Index, and only 36 of 1,266 (2.8%) scored greater than or equal to 9. Based on these data, the results from the Inconsistency Index are assessed as follows:

If the Inconsistency Index is less than 7, the responses are considered sufficiently consistent.
If the index is greater than or equal to 7 but less than 9, the responses are somewhat inconsistent.
If the index is greater than or equal to 9, the responses are very inconsistent.

A high Inconsistency Index score should not be interpreted to mean that the CDI results should be disregarded.
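Because the index value itself is produced by the publisher's scoring algorithm (which weights item factor loadings and is not public), only the published item sets and interpretive cutoffs can be encoded directly; a minimal sketch:

```python
# Highly correlated item sets per CDI scale, as listed above.
CONSISTENCY_ITEM_SETS = {
    "Negative Mood": (1, 8, 10, 11),
    "Interpersonal Problems": (5, 26, 27),
    "Ineffectiveness": (15, 23, 24),
    "Anhedonia": (16, 19, 20, 22),
    "Negative Self-esteem": (7, 9, 14, 25),
}

def classify_inconsistency(index_value):
    """Map a CDI Inconsistency Index value to its published interpretation.

    The index value must come from the scoring software; its
    computation is not reproduced here.
    """
    if index_value < 7:
        return "sufficiently consistent"
    elif index_value < 9:
        return "somewhat inconsistent"
    return "very inconsistent"
```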
Inconsistent responding can occur for a variety of reasons, including the child being unable to concentrate on the task or to understand the instructions. Such considerations must be part of interpreting the Inconsistency Index for a respondent.

Interpretive Steps

Interpretation of CDI results in the context of community-based or epidemiologic studies is straightforward insofar as such studies usually employ clinically validated cutoff scores or normative T-scores to define "caseness," and is not discussed in this chapter. Likewise, when the CDI is used as a screening instrument, a priori defined raw cutoff scores (or T-scores) are generally employed, with no need for specific interpretations. Because most questions regarding CDI score interpretation arise in the context of clinical assessment, and for clinical purposes such as planning interventions or evaluations, pertinent information on these aspects of CDI use is now described in detail.

Interpretation of Total Scores and Factor Scores as T-Scores. Normative data tables are incorporated into the Profile Form for the CDI. These tables use T-scores, which are standardized to have a mean of 50 and a standard deviation of 10. The normative tables automatically compare the child being assessed to children in the normative sample of the same gender and age, and allow each component in the profile to be compared to every other. T-scores above 65 are generally considered clinically significant when the child being studied is from a "high base-rate" group, such as children in a clinical setting. When the child is believed to be from a "low base-rate" group, such as children without identified behavioral problems, a much higher cutoff


(e.g., a T-score of 70 or 75) should be used for inferring clinical problems. High scores suggest a problem, whereas low scores indicate its absence. It should be noted that the T-scores used with the CDI are linear T-scores. Linear T-scores do not transform the shape of the underlying distributions: although each scale has been rescaled to a mean of 50 and a standard deviation of 10, a variable that is not normally distributed in the raw data remains nonnormally distributed after the transformation.

As a rule of thumb, CDI T-scores can be interpreted using the guidelines in Table 9.7. These interpretations reflect how an individual child's score compares to those of children of the same age range and gender from the normative sample. Note, however, that the suggested adjectives are guidelines; there is no reason to believe there is a perceptible difference between, for instance, a T-score of 55 and a T-score of 56. These guidelines should therefore not be used as absolute rules.

For many clinical tests, it is common practice to interpret the overall profile based on the most elevated test scores. In such a case, a clinically elevated test score (in the metric of T-scores) is defined as one above 65. If, for a given set of scores, no test scores are above a T-score of 65, the profile is usually considered to be "normal." A profile in which a single T-score is elevated above 65 is usually considered to have a "1-point" code, and is referred to by the single elevated scale. In general, given the high correlations among the CDI factors, such profiles should be relatively rare and, when encountered, may be viewed as an indication of only moderate evidence of a problem. When two or more subscale scores are clinically elevated, the profile is usually categorized by the two factors that are the highest and is called a "2-point code."
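The elevation conventions just described can be sketched as follows; the cutoff values and the linear T-score definition are from the text, while the function names, scale names, and scores are illustrative only:

```python
def linear_t(raw, norm_mean, norm_sd):
    """Linear T-score: rescales to mean 50, SD 10 without changing
    the shape of the raw-score distribution."""
    return 50 + 10 * (raw - norm_mean) / norm_sd

def profile_code(factor_t, cutoff=65):
    """Code a profile by its clinically elevated scales.

    factor_t: dict mapping scale name -> T-score.  A cutoff of 65
    suits "high base-rate" (clinical) groups; a higher cutoff
    (e.g., 70 or 75) is appropriate for "low base-rate" groups.
    """
    elevated = sorted(
        ((t, name) for name, t in factor_t.items() if t > cutoff),
        reverse=True,
    )
    if not elevated:
        return "normal"
    # One elevation yields a 1-point code; otherwise take the two highest.
    return "/".join(name for _, name in elevated[:2])
```

For example, `profile_code({"Anhedonia": 72, "Ineffectiveness": 68, "Negative Mood": 55})` yields the 2-point code "Anhedonia/Ineffectiveness".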
TABLE 9.7
Interpretive Guidelines for CDI T-scores

T-score     Interpretation of Overall Symptoms/Complaints*
Above 70    Very much above average
66-70       Much above average
61-65       Above average
56-60       Slightly above average
45-55       Average
40-44       Slightly below average
35-39       Below average
30-34       Much below average
Below 30    Very much below average

* Compared to children of similar age and gender in the normative sample.

Although 2-point codes have not usually been employed with the CDI, some clinical practitioners may find it useful to do so. Experience with inventories such as the MMPI or the Personality Inventory for Children (PIC) indicates that 2-point codes tend to be useful and robust ways of categorizing clinically meaningful patterns of behavior (Lachar & Gdowski, 1979). In general, therefore, thoughtful examination of the CDI subscale profile should be more informative than consideration of the total score alone. The CDI subscale T-score profile can be used to indicate specific areas of vulnerability as well as areas of strength. For example, from a clinical perspective, elevated T-scores on the Anhedonia factor or the Ineffectiveness factor may be particularly important. Because the Anhedonia factor contains items traditionally associated with "endogenous" depression, a child with a
high T-score on this factor may be at particular risk for a serious depressive episode. A high score on the Ineffectiveness factor may indicate notable functional impairment, which may warrant additional interventions for a particular child. Concomitantly, in interpreting the CDI profile, a child who has elevated T-scores on both of the aforementioned scales may be of greater clinical concern than a child who has an elevated score on the Anhedonia factor but an average score on the Ineffectiveness factor. In the former case, the child may be evidencing both functional impairment and troublesome depressive symptoms, whereas in the latter case, the troublesome depressive symptoms (an area of vulnerability) are somewhat counteracted by reasonably maintained functioning (an area of strength).

Examination of Total Raw Score and Item Response Pattern. A practitioner conducting a clinical assessment may decide to focus on the raw CDI score and individual item responses. For example, a total CDI score of 20 may result if a child endorses only 10 items, but each to its most severe degree. Alternatively, a child may receive a score of 20 by endorsing up to 20 items, but each to a mild degree. Examining the number of items, and which response options, contributed to the total CDI score can provide useful information about the extent and severity of the child's complaints and symptoms. The examiner also may find it helpful to group the items endorsed by a child into phenomenologically meaningful categories. This approach can provide an additional perspective on the nature of the child's complaints. For example, if most or all endorsed CDI items pertain to physical and neurovegetative symptoms (somatic complaints; problems with sleep, appetite, or energy), then a pediatric examination may be warranted. If all items with symptomatic responses relate to school or peer problems, then a closer examination of those aspects of the child's life may be in order. Ex